Python libraries for log mining and event abstraction? (possibly OT)

F

felciano

Hi --

I am trying to do some event abstraction to mine a set of HTTP logs.
We have a pretty clean stateless architecture with user IDs that
allows us to understand what is retrieved on each session, and should
allow us to detect the higher-order user activity from the logs.
Ideally I'd love a python toolkit that has abstracted this out into a
basic set of API calls or even a query language.

An simple example is: find all instances of a search request, followed
by a 2+ search requests with additional words in the search string,
and group these into a higher-order "Iterative Search Refinement"
event (i.e. the user got too many search results to start with, and is
adding additional words to narrow down the results). So what I need is
the ability to select temporally-related events out of the event
stream (e.g. find searches by the same user within 10 second of each
other), further filter based on additional criteria across these event
(e.g. select only search events where there are additional search
criteria relative to the previous search), and a way to annotate, roll-
up or otherwise group matching patterns into a higher-level event.
Some of these patterns may require non-trivial criteria / logic not
supported by COTS log analytics, which is why I'm trying a toolkit
approach that allows customization.

I've been hunting around Google and the usual open source sites for
something like this and haven't found anything (in python or
otherwise). This is surprising to me, as I would think many people
would benefit from something like this, so maybe I'm just describing
the problem wrong or using the wrong keywords. I'm posting this to
this group because it feels somewhat AI-ish (temporal event
abstraction, etc) and that therefore pythonistas may have experience
with (there seems to be a reasonably high correlation there). Further,
if I can't find anything I'm going to have to build it myself, and it
will be in python, so any pointers on elegant design patterns for how
to do this using pythonic functional programming would be appreciated.
Barring anything else I will start from itertools and work from
there.

That said, I'm hoping to use an existing library rather than re-invent
the wheel. Any suggestions on where to look for something like this?

Thanks!

Ramon
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,821
Latest member
AleidaSchi

Latest Threads

Top