Python 2.4.1 hang


Mahesh

Hi,

This is on WinXP SP1.

I needed to get to the POST body and while I was trying out various
regular expressions, one of them caused Python to hang. The Python
process was taking up 100% of the CPU. I couldn't even see the "Max
recursion depth exceeded" message. Is this a bug? Code below:

import re

s = \
"""POST /TradeManagement-RT3/ReportController.Servlet HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/vnd.ms-excel, application/vnd.ms-powerpoint,
application/msword, */*
Referer: https://dummy.com/TradeManagement-RT3/PrelimReportSearch.jsp
Accept-Language: en-us
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR
1.0.3705)
Host: dummy.com
Content-Length: 267
Connection: Keep-Alive
Cache-Control: no-cache
Cookie:
SMSESSION=RNwINGbjtRijIplKDE8EZ79NtSJREhKu4OogQqQD1PTukTIE3pclwfkkj2b5YSFscbW97A8QQxk1066rmtqwatlBQsnxr2h6fvAPiazYWex297WmDDjPd05RjXNhVqiPmhxSN9nOVP6Pq1igcFC5b3R4AFHWFcz+lW1QyUz+1yeaLfupKDkwaV7jP0qgPbccioWUpEmn762OyqCnehjuVBJ9hDBGO07Bx8Pv/tHd0l6xjFOt6YbtHG9IfMaKhrnPwmdtyo8c/4trmRNO84BoqwhtOzhrJTVuPjzYN2uxg04ZgAt+j75gSA9OPYYymirfwx5zBnhHvQz7ezGQqUPe45l3CvnRhkFVM/kOAYdY2Cdlv+15EMCVqJLT+2cMRCPPY+vlqlgsY30h5V9NWiG+AdXKQ8LEUnPEhnSYhyIo1a3FzB1yr+E/CZfXkNi1lMrG0HiwU+NJVK7rY0deee7gFeiq8T0660eq2WOVF7USMESTOAbSDsR6Ejo+rRscvHfX7uzvu1pRw0Phw7ffF0pr/nBhunq4v7/dW6WXOzvWAEocBXK9/Hl5Ua63X/UxXVs8g6psI0mqoRziFWw+O4t4jjn1fS2e1YvvtAGRPIcNeEEPSCgqEhSUKoGz1qysPoK87MgflIaHt/PsOeRCYSS/53B87RH9RrcaJEgrHyIBZNuzEjD1AG4Uud5oKi88902RW3IATHnH1E4UntvEdo/NbCcNbgN/dGWEvBnBzLn6KYxd4PxG0pQ3vr3qDDa7v0i9eXq9t6++tlM1tIS/XIHZc4bfGKPdZC30Dtw7HwUc7bl74/SHVEEcgzgXJPkCH2zSHaxyot3sqGHCwDa3AmuUkaPSC+iviVHlTe3Uk4KsOnnG94UIwB4yv+mlkXqnw0JwausWVuetCIm+cDIuvZXgRYghjZnNcNsji0k15ddr8j4=;
CCTMRT3=O0PVLBMWBVD2115CLLS4REI

EntrySourceDescription=All&SEARCHReportType=All&SEARCHReportStatus=All&available=IC&SEARCHReportCommodity=All&SortContract_Year=1&SortTrade_Price=&SortOrder_Type=&SortBuy_Sell_Ind=&SortAccount_Number=&SortExternal_TradeId=&GroupBy=contract&command=PrelimReportCommand"""

#pattern_str = "^POST.*\\r\\n\\r((\\n)|(\\n[^\r]*))"
#pattern_str = "^POST.*\\n((\\n)|(\\n[^\r]*&))"
pattern_str = "^POST(.*\\n*)+\\n\\n" # <--- Offending pattern

pattern = re.compile(pattern_str)

match = pattern.match(s)

if match:
    print match.groups()
 

Fredrik Lundh

Mahesh said:
I needed to get to the POST body and while I was trying out various
regular expressions, one of them caused Python to hang. The Python
process was taking up 100% of the CPU. I couldn't even see the "Max
recursion depth exceeded" message. Is this a bug?

no, it's just a very stupid way to implement a trivial operation.
import re

s = \
"""POST /TradeManagement-RT3/ReportController.Servlet HTTP/1.1
<snip>

#pattern_str = "^POST.*\\r\\n\\r((\\n)|(\\n[^\r]*))"
#pattern_str = "^POST.*\\n((\\n)|(\\n[^\r]*&))"
pattern_str = "^POST(.*\\n*)+\\n\\n" # <--- Offending pattern

the .* is a variable-length match. so is the \n* after it. and then you're putting
both inside a repeated capturing group, and applying the whole thing to a moderately
large string. the poor engine has to check zillions of ways of dividing the text
between the repetitions before it finds something that works.
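To put a rough number on the "zillions": ignoring the newlines, a run of n ordinary characters can be carved into non-empty `.*` pieces, one per repetition of the group, in 2**(n-1) different ways, and a failing attempt can end up trying all of them. For a body line a couple of hundred characters long, that never finishes. A minimal Python 3 sketch (not from the thread; `HEADER_END` and `post_body` are hypothetical names) that avoids the problem by removing the ambiguity: each header line has to start with something other than CR or LF, so there is only one way to match it, and a match that fails gives up after roughly one pass over the text.

```python
import re

# Hypothetical pattern: the request line, then any number of non-blank
# header lines, then the blank line that ends the headers. Each header line
# must begin with a character other than CR or LF and runs to the next LF,
# so it can only be matched one way and there is nothing to backtrack over.
HEADER_END = re.compile(r"\APOST[^\n]*\n(?:[^\r\n][^\n]*\n)*\r?\n")


def post_body(message):
    """Return the text after the header block, or None if there is none."""
    m = HEADER_END.match(message)
    return message[m.end():] if m else None


sample = "POST /x HTTP/1.1\nHost: dummy.com\nContent-Length: 7\n\na=1&b=2"
print(post_body(sample))
```

Unlike `.*` inside `(...)+`, the `[^\n]*\n` piece cannot stop in the middle of a line, which is what keeps the number of possibilities linear in the length of the input.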

if you want to split on "\r\n\r\n", use split:

header, body = message.split("\r\n\r\n", 1)
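Two things to watch with that one-liner: unpacking a `split` result into two names only works when the separator occurs exactly once, and the sample string in the original post is a Python triple-quoted literal, so its lines end in a bare "\n" rather than "\r\n". A minimal Python 3 sketch that copes with both, using `str.partition` (which always splits at the first occurrence only); the function name `split_http_message` is hypothetical:

```python
def split_http_message(message):
    """Split a raw HTTP message into (header block, body) at the first blank line.

    Tries the CRLF separator that HTTP specifies first, then falls back to a
    bare-LF blank line. If neither is present, the whole message is treated
    as headers and the body is empty.
    """
    for sep in ("\r\n\r\n", "\n\n"):
        head, found, body = message.partition(sep)
        if found:
            return head, body
    return message, ""


head, body = split_http_message("POST /x HTTP/1.1\nHost: dummy.com\n\na=1&b=2")
print(repr(body))
```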

for more robust code, consider using the rfc822 module:

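import rfc822, StringIO

f = StringIO.StringIO(message)
request = f.readline()       # the "POST ... HTTP/1.1" request line
header = rfc822.Message(f)   # reads header lines up to the blank line
body = f.read()              # whatever is left is the POST body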
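The `rfc822` module only exists in Python 2 (it was deprecated in favour of the `email` package as of Python 2.3 and removed in Python 3). A rough Python 3 equivalent of the snippet above reads the request line off with `readline()` as before and hands the rest of the stream to `email.message_from_file`. The function name `parse_http_request` is hypothetical, and `email` is a MIME parser rather than an HTTP one, so treat this as a sketch rather than a full HTTP request parser:

```python
import email
from io import StringIO


def parse_http_request(message):
    """Return (request line, header Message, body) for a raw HTTP request string."""
    f = StringIO(message)
    request_line = f.readline().rstrip("\r\n")  # e.g. "POST /path HTTP/1.1"
    # The email parser reads header lines up to the first blank line and
    # stores everything after it as the message payload.
    headers = email.message_from_file(f)
    return request_line, headers, headers.get_payload()


raw = "POST /x HTTP/1.1\r\nHost: dummy.com\r\nContent-Length: 7\r\n\r\na=1&b=2"
request_line, headers, body = parse_http_request(raw)
print(request_line, headers["Host"], body)
```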

</F>
 

Mahesh

Yes, it is stupid, but I am debugging some poorly written C++ code, so I
cannot change it. It was easier for me to use python to try various
combinations (since the C++ code uses a non-standard re engine). I just
chanced upon the problem and was curious as to what Python was up to.

Thanks for clearing that up.
 
