S
Scooter
I'm attempting to reformat an apache log file that was written with a
custom output format. I'm attempting to get it to w3c format using a
python script. The problem I'm having is the field-to-field matching.
In my python code I'm using split with spaces as my delimiter. But it
fails when it reaches the user agent because that field itself
contains spaces. But that user agent is enclosed with double quotes.
So is there a way to split on a certain delimiter but not to split
within quoted words.
i.e. a line might look like
2009-09-29 12:00:00 - GET / "Mozilla/4.0 (compatible; MSIE 7.0;
Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC
5.0; .NET CLR 3.0.04506; .NET CLR 3.5.21022)" http://somehost.com 200
1923 1360 31715 -
custom output format. I'm attempting to get it to w3c format using a
python script. The problem I'm having is the field-to-field matching.
In my python code I'm using split with spaces as my delimiter. But it
fails when it reaches the user agent because that field itself
contains spaces. But that user agent is enclosed with double quotes.
So is there a way to split on a certain delimiter but not to split
within quoted words.
i.e. a line might look like
2009-09-29 12:00:00 - GET / "Mozilla/4.0 (compatible; MSIE 7.0;
Windows NT 6.0; GTB5; SLCC1; .NET CLR 2.0.50727; Media Center PC
5.0; .NET CLR 3.0.04506; .NET CLR 3.5.21022)" http://somehost.com 200
1923 1360 31715 -