regex problem unresolved

Ela · May 10, 2008

Dear Gurus,

I was suggested by experts to post runnable codes to seek advice.
Unfortunately, after 4-day trial, I'm unable to break down the large codes
written by others to a smaller one. And as what xhoster inferred exactly,
the matching string comes from a data file and therefore cannot be known in
advance.

It is a cyclic problem that if I had known which line of the perl codes that
makes regex fail, I might have already solved half of the problem. But I'm
really saying the truth that no line number printed. And there's a single
line of error saying "Invalid [] range "l-c" in regex; marked by <-- HERE in
m/^3-oxoacyl-[acyl-c <-- HERE arrier protein] reductase fabg1$/"

Xho suggested 2 solutions. One is index instead of regex that I still don't
know what it means. Another one is to upgrade perl, which is impossible.

In fact, since input data file is needed and "oxoacyl" does exist in that
data file confirmed by search, I believe the fetch program first grabs all
the keywords and later uses the variable one by one to match.

I appreciate your comments/critics what can be done to maximize the chance
to solve this problem.

Sincerely, Ela !_!

Ben Bacarisse · May 10, 2008

[I have not followed the previous thread so this may be a big
mistake. I hope your explanation here stands on its own.]

I was suggested by experts to post runnable codes to seek advice.

That is the best way, but...

Unfortunately, after 4-day trial, I'm unable to break down the large codes
written by others to a smaller one.

OK.

And as what xhoster inferred exactly,
the matching string comes from a data file and therefore cannot be known in
advance.

It is a cyclic problem that if I had known which line of the perl codes that
makes regex fail, I might have already solved half of the problem. But I'm
really saying the truth that no line number printed. And there's a single
line of error saying "Invalid [] range "l-c" in regex; marked by <-- HERE in
m/^3-oxoacyl-[acyl-c <-- HERE arrier protein] reductase fabg1$/"

First, do you understand the error message? A regexp can include [a-b] to
match a range of characters, but the character b must be "bigger than"
a. Obviously c comes before l so the regexp is invalid. So far so good.
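
A minimal sketch of that point, assuming the pattern only becomes known at run time (the pattern list and the %result hash are made up for illustration). Wrapping the compile in a block eval lets the program survive a reversed range and report it:

```perl
use strict;
use warnings;

# Hypothetical patterns: one valid range, one reversed range.
my @patterns = ('[c-l]', '[l-c]');

my %result;
for my $pattern (@patterns) {
    # qr// compiles the pattern at run time; a reversed range makes it
    # die, so trap the failure with a block eval.
    my $compiled = eval { qr/$pattern/ };
    $result{$pattern} = defined $compiled ? 'ok' : "failed: $@";
    print "$pattern => $result{$pattern}\n";
}
```

The first pattern compiles, the second is rejected with the same "Invalid [] range" message quoted above.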

Xho suggested 2 solutions. One is index instead of regex that I still don't
know what it means. Another one is to upgrade perl, which is
impossible.

In fact, since input data file is needed and "oxoacyl" does exist in that
data file confirmed by search, I believe the fetch program first grabs all
the keywords and later uses the variable one by one to match.

Does the code contain [l-c] in a regexp or does the program generate
these REs at run time from some other data? If the former, you need
to correct it, because it makes no sense (like writing 42/0). If the
latter, your problem is not to do with the regexp itself but the code
that builds it. The problem could then be absolutely anywhere and no
one here will be able to help without seeing some more code.

At a wild guess, your program just needs to quote [s in the strings it
reads before turning them into regexps...

Jens Thoms Toerring · May 10, 2008

Ela said:
I was suggested by experts to post runnable codes to seek advice.
Unfortunately, after 4-day trial, I'm unable to break down the large codes
written by others to a smaller one. And as what xhoster inferred exactly,
the matching string comes from a data file and therefore cannot be known in
advance.

It is a cyclic problem that if I had known which line of the perl codes that
makes regex fail, I might have already solved half of the problem. But I'm
really saying the truth that no line number printed. And there's a single
line of error saying "Invalid [] range "l-c" in regex; marked by <-- HERE in
m/^3-oxoacyl-[acyl-c <-- HERE arrier protein] reductase fabg1$/"

Xho suggested 2 solutions. One is index instead of regex that I still don't
know what it means. Another one is to upgrade perl, which is impossible.

In fact, since input data file is needed and "oxoacyl" does exist in that
data file confirmed by search, I believe the fetch program first grabs all
the keywords and later uses the variable one by one to match.

Unfortunately, your problem description is still rather vague
which makes it difficult to come up with an answer. So let
me start with a summary of what I think I understood you're
trying to do:

You have a file 1 from which you read a string, let's call it
$match_string. That string you somehow have to use to find another
string within a file 2, let's call it $string_to_test.

Something from what you write above makes it look as if you
are only interested in exact matches, i.e. when $match_string
is identical to $string_to_test (the use of '^' and '$' in
the regular expression make it look a bit like that). In that
case the simplest comparison would be a plain

if ( $match_string eq $string_to_test ) ...

If, on the other hand you want to find $match_string anywhere
within $string_to_test then, as Xho pointed out, the index()
function probably is the best choice, see

perldoc -f index

since that will get rid of all the problems involved with
using a regex.
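
To make the two alternatives concrete, here is a small sketch. The two string values are invented stand-ins for whatever the real program reads from its files:

```perl
use strict;
use warnings;

# Hypothetical values standing in for the strings read from the two files.
my $match_string   = '3-oxoacyl-[acyl-carrier protein] reductase fabg1';
my $string_to_test = 'putative 3-oxoacyl-[acyl-carrier protein] reductase fabg1 homolog';

# Exact match: plain string equality, no regex involved.
my $is_equal = ($match_string eq $string_to_test) ? 1 : 0;

# Substring match: index() returns the zero-based position of the first
# occurrence, or -1 if $match_string does not occur at all.
my $position = index($string_to_test, $match_string);

print "identical\n" if $is_equal;
print "found at offset $position\n" if $position >= 0;
```

Neither test treats '[', ']' or '-' specially, so the brackets in the data cannot trigger a regex error.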

If you insist on using a regex then you will have to escape
all characters in $match_string that would be interpreted
by the regular expression matching system like e.g. the '-',
'[', ']' etc. before you use it in the regex.
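
One way to do that escaping is the built-in quotemeta() function. This is only a sketch with an invented sample string, not the original program's code:

```perl
use strict;
use warnings;

# Hypothetical string as it might be read from the data file.
my $match_string = '3-oxoacyl-[acyl-carrier protein] reductase fabg1';

# quotemeta() backslash-escapes every non-word character, so '[', ']'
# and '-' lose their special meaning inside the regex.
my $escaped = quotemeta $match_string;

my $string_to_test = '3-oxoacyl-[acyl-carrier protein] reductase fabg1';
my $matched = ($string_to_test =~ /^$escaped$/) ? 1 : 0;

print "escaped: $escaped\n";
print "matched: $matched\n";
```

Writing \Q$match_string\E inside the pattern has the same effect as calling quotemeta() first.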

But if, finally, the file 1 already contains strings meant to
be regular expressions then it looks as if this file simply
contains flawed regex strings and there's hardly anything
you can do to solve the problem (unless you know exactly
which flaws are to be expected and can correct them before
you use $match_string in a regular expression).

Regards, Jens

Jürgen Exner · May 10, 2008

Ela said:
the matching string comes from a data file and therefore cannot be known in
advance.

Good to know.

It is a cyclic problem that if I had known which line of the perl codes that
makes regex fail, I might have already solved half of the problem. But I'm
really saying the truth that no line number printed. And there's a single
line of error saying "Invalid [] range "l-c" in regex; marked by <-- HERE in
m/^3-oxoacyl-[acyl-c <-- HERE arrier protein] reductase fabg1$/"

Then your data file contains an illegal regular expression:
^3-oxoacyl-[acyl-carrier protein] reductase fabg1$
is not a valid RE. Sorry, but that's what it is. Broken data!

Some options:
- fix that broken data file. If that data file is supposed to contain
REs, then you have to make sure that it does contain REs and not illegal
expressions
- invent some method to convert the broken RE into a valid RE before
applying it in the pattern match. I cannot tell you how because I do not
know what the intended behaviour would be.

BUT: You do know what a character class is, don't you? Is your data file
really meant to match exactly one character of the set a, c, e,
l, n, p, o, i, r, t, y, and space at that spot? Because that is what is
defined in that character class (ignoring the error for the time being)!
I have a very strong feeling that you did _NOT_ mean to use a regular
expression match in the first place but that you simply meant to compare
the text with some other text. After all, the original author went even
so far as to anchor the RE at the beginning and end of the string and the RE
doesn't use any meaningful RE metacharacters at all.

Therefore I suggest to investigate option 3:
- don't use any RE match but a simple string comparison instead unless
you need the added functionality of an RE match

Xho suggested 2 solutions. One is index instead of regex that I still don't
know what it means.

See perldoc -f index
index() is _the_ standard way to check if one string is a substring of
another string.

However in your case because the RE is anchored at both ends a simple
'eq' would work even better.

Another one is to upgrade perl, which is impossible.

You still might consider doing it at least temporarily just to chase
down this bug.

I appreciate your comments/critics what can be done to maximize the chance
to solve this problem.

Think very, very hard if you really, really meant RE pattern matching or
if you meant to do literal string comparison. These are two very
different things and I have a very strong feeling that you are trying to
use a hammer to drive a screw.

jue

Ben Bullock · May 11, 2008

I was suggested by experts to post runnable codes to seek advice.
Unfortunately, after 4-day trial, I'm unable to break down the large
codes written by others to a smaller one. And as what xhoster inferred
exactly, the matching string comes from a data file and therefore cannot
be known in advance.

So, to make this clear, somewhere in your code there is something of the
form of

$matchingstring = <DATAFILE>;

if (/$matchingstring/) {
    print "Hallo Ela!\n";
}

It is a cyclic problem that if I had known which line of the perl codes
that makes regex fail, I might have already solved half of the problem.
But I'm really saying the truth that no line number printed.

Jürgen Exner dug up that this isn't printed with Perl 5.6. It's a shame
if you can't upgrade. Could you please confirm that you are using Perl
5.6 by printing the results of

perl --version

And there's
a single line of error saying "Invalid [] range "l-c" in regex; marked
by <-- HERE in m/^3-oxoacyl-[acyl-c <-- HERE arrier protein] reductase fabg1$/"

What you need to do is to use \Q \E around the string which is causing
the problems. In my example code:

if (/\Q$matchingstring\E/) {
    print "Hallo Ela!\n";
}

But that doesn't help you to find the line with the problem.

Xho suggested 2 solutions. One is index instead of regex that I still
don't know what it means.

He means you should use the function "index" instead of / / in the above
code.

Another one is to upgrade perl, which is
impossible.

As far as I know, it's possible to install Perl into a local account. It
doesn't have to be installed globally. The last time I did this was about
1996, so I'm not 100% sure about nowadays, but I'm fairly sure it should
be possible.

In fact, since input data file is needed and "oxoacyl" does exist in
that data file confirmed by search, I believe the fetch program first
grabs all the keywords and later uses the variable one by one to match.

So the problem is where the match is taking place? To find the line where
the match is taking place involves carefully looking at the source code.
Unfortunately there is no particular magic trick which will solve this
problem for you. You will have to actually understand what the program is
doing to solve this problem. In these circumstances one useful start is
to sprinkle "print" statements throughout the code.
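
A slightly more targeted variant of the print approach, sketched under the assumption that the suspect matches can be routed through one helper. The name safe_match is invented here and is not part of the original program:

```perl
use strict;
use warnings;

# Hypothetical helper: try a pattern read from data and report where it
# was used instead of letting the program die without a line number.
sub safe_match {
    my ($text, $pattern) = @_;
    my $matched = eval { $text =~ /$pattern/ ? 1 : 0 };
    if (!defined $matched) {
        # caller() gives the file and line of the code that called us.
        my (undef, $file, $line) = caller;
        print STDERR "bad pattern '$pattern' used at $file line $line: $@";
        return;
    }
    return $matched;
}

my $result = safe_match('some text',
    '^3-oxoacyl-[acyl-carrier protein] reductase fabg1$');
print "pattern could not be compiled\n" unless defined $result;
```

The helper returns 1 or 0 for a normal match and undef when the pattern itself is invalid, so the caller can tell the two cases apart.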

Ben Morrow · May 11, 2008

Quoth Ben Bullock said:
As far as I know, it's possible to install Perl into a local account. It
doesn't have to be installed globally. The last time I did this was about
1996, so I'm not 100% sure about nowadays, but I'm fairly sure it should
be possible.

It's certainly possible (I've got about 16 versions of perl installed
here under my own user account) but it's by no means easy if you aren't
familiar with building perl from source.

So the problem is where the match is taking place? To find the line where
the match is taking place involves carefully looking at the source code.
Unfortunately there is no particular magic trick which will solve this
problem for you. You will have to actually understand what the program is
doing to solve this problem. In these circumstances one useful start is
to sprinkle "print" statements throughout the code.

Another would be to put something like

use Carp qw/confess/;

$SIG{__DIE__} = sub { confess $_[0] };

somewhere near the top of the program, which should give you a complete
stack trace when the error occurs.
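
For illustration, a self-contained sketch of what that looks like when the bad pattern is interpolated at run time. The subroutine check_keyword and its arguments are invented, and the eval is there only so the sketch can print the trace and exit cleanly:

```perl
use strict;
use warnings;
use Carp qw/confess/;

# Turn every fatal error into one with a full stack trace.
$SIG{__DIE__} = sub { confess $_[0] };

# Hypothetical subroutine standing in for the code that builds the regex.
sub check_keyword {
    my ($keyword, $text) = @_;
    # Dies here if $keyword is not a valid regex.
    return $text =~ /^$keyword$/;
}

# Trap the error only so this sketch can show the trace and carry on;
# in the real program the trace would simply go to STDERR.
my $ok = eval {
    check_keyword('3-oxoacyl-[acyl-carrier protein] reductase fabg1', 'x');
    1;
};
my $trace = $ok ? '' : $@;
print $trace;
```

The trace names each subroutine on the call path together with the file and line it was called from, which is the piece of information missing from the bare error message.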

Ben

Jürgen Exner · May 11, 2008

Ben Bullock said:
And there's
a single line of error saying "Invalid [] range "l-c" in regex; marked
by <-- HERE in m/^3-oxoacyl-[acyl-c <-- HERE arrier protein] reductase fabg1$/"

What you need to do is to use \Q \E around the string which is causing
the problems. In my example code:

if (/\Q$matchingstring\E/) {
    print "Hallo Ela!\n";
}

Actually I respectfully disagree. If the OP wants to compare two strings
then index() or even a simple 'eq' is the tool of choice.

Using the \Q...\E method will work, but it is like filing the fin of a
hammer to fit a Phillips screw head because you realized that pounding in
the screw doesn't work and you don't know how to use a screwdriver (I
know you do, but speaking in general).

jue

Jürgen Exner · May 11, 2008

Ben Morrow said:
Another would be to put something like

use Carp qw/confess/;

$SIG{__DIE__} = sub { confess $_[0] };

somewhere near the top of the program, which should give you a complete
stack trace when the error occurs.

Good advice, I will save this somewhere for myself.
However the OP seems to be pretty inexperienced wrt. programming. I
doubt that he will be able to put a stack trace to any use.

jue

Ben Morrow · May 11, 2008

Quoth Jürgen Exner said:
Ben Morrow said:

Another would be to put something like

use Carp qw/confess/;

$SIG{__DIE__} = sub { confess $_[0] };

somewhere near the top of the program, which should give you a complete
stack trace when the error occurs.

Good advice, I will save this somewhere for myself.

I have a little module Carp::AllVerb installed locally that just does
exactly this for __DIE__ and __WARN__; then when I'm having trouble
debugging something I can run

perl -MCarp::AllVerb foo

and get lotsa stack traces.

However the OP seems to be pretty inexperienced wrt. programming. I
doubt that he will be able to put a stack trace to any use.

This is possibly true; however, it should at least get him(?) a source
line number for the error, which we seem to have established his version
of perl doesn't give on its own.

Ben

Ben Bullock · May 11, 2008

Actually I respectfully disagree. If the OP wants to compare two strings
then index() or even a simple 'eq' is the tool of choice.

Well, the thing is that we actually don't know what the original poster's
code is doing anyway. His main problem seemed to be finding the line of code
where the problem occurred, rather than the "comparing two strings" part.

Using the \Q...\E method will work, but it is like filing the fin of a
hammer to fit a Phillips screw head because you realized that pounding in
the screw doesn't work and you don't know how to use a screwdriver

OK, but my \Q \E solution has one merit: it minimizes the editing of the
program source, and hence minimizes the debugging work for Ela, who
already seems fairly confused (didn't understand about regex and index,
etc.). That was the reason I chose it.

xhoster · May 12, 2008

Jürgen Exner said:
Ben Morrow said:

Another would be to put something like

use Carp qw/confess/;

$SIG{__DIE__} = sub { confess $_[0] };

somewhere near the top of the program, which should give you a complete
stack trace when the error occurs.

Ah, very nice. I tried this, but I couldn't get it to work so I gave up.
It turns out I was using a hard-coded /[l-c]/, which caused compile
time errors which by-pass the sig handler. When I switched to stuffing
the illegal regex-string into a variable and then invoking /$x/, it worked
like a charm.

Good advice, I will save this somewhere for myself.
However the OP seems to be pretty inexperienced wrt. programming. I
doubt that he will be able to put a stack trace to any use.

Well, if nothing else she could come back to us armed with a stack trace
and ask for more help.

Xho

Ben Morrow · May 12, 2008

Quoth (e-mail address removed):

Jürgen Exner said:

Ben Morrow said:

Another would be to put something like

use Carp qw/confess/;

$SIG{__DIE__} = sub { confess $_[0] };

somewhere near the top of the program, which should give you a complete
stack trace when the error occurs.

Ah, very nice. I tried this, but I couldn't get it to work so I gave up.
It turns out I was using a hard-coded /[l-c]/, which caused compile
time errors which by-pass the sig handler.

You can catch these by putting the assignment in a BEGIN block (or a
module, of course).

Ben
