Adrian Petru Dimulescu
Hello,
I have a regex problem that looks like an infinite loop. I use Ruby 1.8.2. The
regular expression I used was:
[tT]he\s+(([\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*((\s*(\,|and|or)\s*)*[\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*)*))\s+((\w+\s+){0,3}\s*(proteins|genes|protein|gene))
I try to match this regex against the string, without the quotes (the
following should be a whole single line):
"to this end , NK_CTL clones derived from four donors ( KK , GG , GF ,
and DP ) were tested for their ability to lyse the TAP2_deficient
RMA_S\HLA_E cell_line incubated with serial_dilutions of the VMAPRTLIL ,
VMAPRTLVL , VMAPRTLLL , and VMAPRALLL peptides ."
The regex is not supposed to match this particular string. Instead of
returning no match, it hangs at 100% CPU, while on lots of other lines
of text it works fine. Am I doing something wrong?
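One thing that may be worth checking, as a guess rather than a confirmed diagnosis: the separator group (\s*(\,|and|or)\s*)* is allowed to match nothing, so the outer repetition can carve a single word into several consecutive "identifier" pieces. The sketch below only shows that this ambiguity exists for one word from the test string. TOKEN and splits_into? are names made up for this illustration, and it does not prove that this is what makes Ruby 1.8.2 hang.

```ruby
# The identifier sub-pattern, copied from the regex in this message.
TOKEN = /[\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*/

# Hypothetical helper: can `word` be read as exactly `n` identifier
# pieces placed back to back, with no separator between them?
def splits_into?(word, n)
  re = Regexp.new('\A' + TOKEN.source * n + '\z')
  (re =~ word) ? true : false
end

word = 'VMAPRTLIL'
(1..4).each do |n|
  puts "#{word} as #{n} piece(s): #{splits_into?(word, n)}"
end
```

If the same word can be consumed as one, two or three pieces, a failing match may have to try every combination across all the words before giving up, which would look like a hang on a long enough line.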
I tried the same regex in Perl. It complained that the string contains
an unknown control sequence, \H (which does indeed appear in the text at
some point). If I replace the "\H" with, let's say, "-H", the regular
expression runs to completion in Perl without finding anything, which is
the expected result, as I said. Ruby hangs even if I change the "\" to "-".
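For reference, a small sketch of how Ruby itself treats the "\H" depending on the quoting. The variable names are only for illustration. In a single-quoted literal the backslash stays in the string; in a double-quoted literal the unknown escape is reduced to a plain "H".

```ruby
single = 'RMA_S\HLA_E'   # single quotes: backslash is kept literally
double = "RMA_S\HLA_E"   # double quotes: unknown escape \H becomes "H"
dashed = single.sub('\\', '-')  # the "\" replaced by "-", as tried above

puts single.length   # 11 characters, backslash included
puts double.length   # 10 characters, backslash dropped
puts dashed
```

Either way the backslash only changes one character of the input, which fits the observation that replacing it does not change Ruby's behaviour.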
Needless to say, I would much appreciate some help on this one. Feel free
to ask for an explanation of that complicated regex, or for any other
information, if needed.
Attached is the Ruby script that hangs.
Best regards,
Adrian Dimulescu.
regex-hangs.rb:

#! /usr/bin/ruby -w

string = 'to this end , NK_CTL clones derived from four donors ( KK , GG , GF , and DP ) were tested for their ability to lyse the TAP2_deficient RMA_S\HLA_E cell_line incubated with serial_dilutions of the VMAPRTLIL , VMAPRTLVL , VMAPRTLLL , and VMAPRALLL peptides .'
string =~ /[tT]he\s+(([\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*((\s*(\,|and|or)\s*)*[\w\d\_]+(?:[a-zA-Z][\dA-Z]|[\dA-Z][a-zA-Z])[\w\d\_]*)*))\s+((\w+\s+){0,3}\s*(proteins|genes|protein|gene))/
print "i found: " + $&