I've got a script which trolls our log files looking for Python stack
dumps. For each dump it finds, it computes a signature (basically, a
call sequence which led to the exception) and uses this signature as a
dictionary key. Here's the relevant code (abstracted slightly for
readability):
    def main(args):
        crashes = {}
        [...]
        for line in open(log_file):
            if does_not_look_like_a_stack_dump(line):
                continue
            lines = traceback_helper.unfold(line)
            header, stack = traceback_helper.extract_stack(lines)
            signature = tuple(stack)
            if signature in crashes:
                count, header = crashes[signature]
                crashes[signature] = (count + 1, header)
            else:
                crashes[signature] = (1, header)
You can find traceback_helper at
https://bitbucket.org/roysmith/python-tools/src/4f8118d175ed/logs/traceback_helper.py
The stack that's returned is a list. It's inherently a list, per the
classic definition:
Er, no, it's inherently a blob of multiple text lines. Sure, you've built
it a line at a time by using a list, but I've already covered that case.
Once you've identified a stack, you never append to it, sort it, delete
lines in the middle of it... none of these list operations are meaningful
for a Python stack trace. The stack becomes a fixed string, and not just
because you use it as a dict key, but because inherently it counts as a
single, immutable blob of lines.
A tuple of individual lines is one reasonable data structure for a blob
of lines. Another would be a single string:
    signature = '\n'.join(stack)
Depending on what you plan to do with the signatures, one or the other
implementation might be better. I'm sure that there are other data
structures as well.
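As a rough sketch of the two options: both a tuple of lines and a single joined string are hashable, so either works as a dict key, while the list itself does not. The sample stack lines and the variable names below are made up for illustration and are not taken from the original script.

```python
# Hypothetical stack lines, standing in for whatever extract_stack() returns.
stack = [
    'File "./prattle.py", line 873, in select',
    'File "./prattle.py", line 787, in do_callback',
]

tuple_signature = tuple(stack)
string_signature = '\n'.join(stack)

crashes = {}
crashes[tuple_signature] = 1    # fine: a tuple of strings is hashable
crashes[string_signature] = 1   # fine: a string is hashable

try:
    crashes[stack] = 1          # a list is mutable, hence unhashable
except TypeError as err:
    list_error = str(err)

# The two forms are interconvertible, so nothing is lost either way.
assert tuple(string_signature.split('\n')) == tuple_signature
```

Which form is more convenient probably depends on whether you later want to look at individual lines (tuple) or just print or compare the whole thing (string).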
* It's variable length. Different stacks have different depths.
Once complete, the stack trace is fixed length, but that fixed length is
different from one stack to the next. Deleting a line would make it
incomplete, and adding a line would make it invalid.
* It's homogeneous. There's nothing particularly significant about each
entry other than it's the next one in the stack.
* It's mutable. I can build it up one item at a time as I discover
them.
The complete stack trace is inhomogeneous and immutable. I've already
covered immutability above: removing, adding or moving lines will
invalidate the stack trace. Inhomogeneity comes from the structure of a
stack trace. The mere fact that each line is a string does not mean that
any two lines are equivalent. Different lines represent different things.
    Traceback (most recent call last):
      File "./prattle.py", line 873, in select
        selection = self.do_callback(cb, response)
      File "./prattle.py", line 787, in do_callback
        raise callback
    ValueError: what do you mean?
is a valid stack. But:
    Traceback (most recent call last):
        raise callback
        selection = self.do_callback(cb, response)
      File "./prattle.py", line 787, in do_callback
    ValueError: what do you mean?
      File "./prattle.py", line 873, in select
is not. A stack trace has structure. The equivalent here is the
difference between:
    ages = [23, 42, 19, 67,  # age, age, age, age
            17, 94, 32, 51,  # ...
            ]
    values = [23, 1972, 1, 34500,  # age, year, number of children, income
              35, 1985, 0, 67900,  # age, year, number of children, income
              ]
A stack trace is closer to the second example than the first: each item
may be the same type, but the items don't represent the same *kind of
thing*.
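One way to make that difference visible in code is to keep the list for the ages but treat each group of four values as a record. This is only a sketch: the Person type is made up for illustration, and its field names are simply the ones from the comments above.

```python
from collections import namedtuple

# Homogeneous: every item is an age, so any item can stand in for any other.
ages = [23, 42, 19, 67, 17, 94, 32, 51]
oldest = max(ages)  # meaningful, because all items are the same kind of thing

# Inhomogeneous: position determines meaning, so give the positions names.
# "Person" and its field names are hypothetical, chosen for illustration.
Person = namedtuple('Person', 'age year children income')
people = [
    Person(23, 1972, 1, 34500),
    Person(35, 1985, 0, 67900),
]
total_income = sum(p.income for p in people)
```

Taking max() over the flat values list would run without error, but the answer would not mean anything.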
You could make a stack trace homogeneous with a little work:
- drop the Traceback line and the final exception line;
- parse the File lines to extract the useful fields;
- combine them with the source code.
Now you have a blob of homogeneous records, here shown as lines of text
with ! as field separator:
    ./prattle.py ! 873 ! select ! selection = self.do_callback(cb, response)
    ./prattle.py ! 787 ! do_callback ! raise callback
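The three steps above could be sketched roughly like this. The to_records function and the FILE_LINE pattern are hypothetical names, not part of traceback_helper. The sketch assumes that every File line is followed by exactly one source line, which real logs do not guarantee.

```python
import re

# Matches the 'File "...", line N, in func' lines of a traceback.
FILE_LINE = re.compile(
    r'\s*File "(?P<path>.+)", line (?P<lineno>\d+), in (?P<func>.+)')

def to_records(trace_text):
    """Return a tuple of (path, lineno, func, source) tuples.

    Assumes each File line is followed by exactly one source line.
    """
    lines = trace_text.splitlines()
    body = lines[1:-1]  # drop the Traceback header and the exception line
    records = []
    # Pair each File line with the source line that follows it.
    for file_line, source_line in zip(body[0::2], body[1::2]):
        match = FILE_LINE.match(file_line)
        if match is None:
            raise ValueError('not a File line: %r' % file_line)
        records.append((match.group('path'), int(match.group('lineno')),
                        match.group('func'), source_line.strip()))
    return tuple(records)

trace = '''Traceback (most recent call last):
  File "./prattle.py", line 873, in select
    selection = self.do_callback(cb, response)
  File "./prattle.py", line 787, in do_callback
    raise callback
ValueError: what do you mean?'''

for record in to_records(trace):
    print(' ! '.join(str(field) for field in record))
```

The result is returned as a tuple of tuples rather than a list of lists, so it can still be used directly as a dict key.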
But there's really nothing you can do about the immutability. There isn't
any meaningful reason why you might want to take a complete stack trace
and add or delete lines from it.