reducing a regex

Jerry Preston

Hi!

I am trying to figure out how to reduce or simplify the following:


/\+:\s+(\w+)\s+:\s+(\w+)\s+(\w+)\s+(\w+)\s+:\s+(\d)-(\d+)-(\d+)\s+:\s+=\s+(\d+)\s+(\w+)\s+(\w+)\s+\[\s+(-?\d+.\d+),\s+(-?\d+.\d+)/;

Any ideas?

Thanks,

Jerry
 
Gunnar Hjalmarsson

Jerry said:
I am trying to figure out how to reduce or simplify the following:

/\+:\s+(\w+)\s+:\s+(\w+)\s+(\w+)\s+(\w+)\s+:\s+(\d)-(\d+)-(\d+)\s+:\s+=\s+(\d+)\s+(\w+)\s+(\w+)\s+\[\s+(-?\d+.\d+),\s+(-?\d+.\d+)/;

Any ideas?

Would you mind sharing your own thoughts on the subject?

How on earth would anybody else be able to suggest anything without
knowing what the regex is supposed to accomplish?
 
Sherm Pendley

Jerry said:
I am trying to figure out how to reduce or simplify the following:

.... snip ...
Any ideas?

It's difficult to give any specific advice without your own idea of what the
regex should be doing.

But in general, I've found it useful to use the /x modifier with really
large and/or hairy regexes. That will allow you to split it across multiple
lines, and include comments for each subexpression.

sherm--
 
Jerry Preston

Sorry!

Basically I am looking for a line like the following:

+: word : word word word : x-xxxxxxx-xx : = x word word [ x.xxx, xx.xx ]

Word can be any text, with or without numbers.

Each x is a digit.

Jerry
 
Anno Siegel

Jerry Preston said:
Sorry!

Basically I am looking for a line like the following:

+: word : word word word : x-xxxxxxx-xx : = x word word [ x.xxx, xx.xx ]

Word can be any text, with or without numbers.

Each x is a digit.

Then why don't you give an example that your regex actually matches?

'+: word : word word word : 9-9999999-99 : = 9 word word [ 9.999, 99.99 ]'

would have worked. What is still missing is any specification of what
in the example is variable and what is fixed, and which properties
you want the regex to check.
I am trying to figure out how to reduce or simplify the following:
/\+:\s+(\w+)\s+:\s+(\w+)\s+(\w+)\s+(\w+)\s+:\s+(\d)-(\d+)-(\d+)\s+:\s+=\s+(\d+)\s+(\w+)\s+(\w+)\s+\[\s+(-?\d+.\d+),\s+(-?\d+.\d+)/;

You forgot to quote the two periods towards the end of the regex. They
would match any character.
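
To make the difference concrete, here is a minimal sketch. The test strings are made up for illustration and the patterns are anchored so that only whole strings count. An unescaped `.` accepts any character (except a newline) between the digit runs, while `\.` requires a literal period.

```perl
use strict;
use warnings;

# Hypothetical test strings; only the first is a real decimal number.
my @samples = ('9.999', '9x999', '9-999');

my @loose  = grep { /^-?\d+.\d+$/  } @samples;   # unescaped dot: any character
my @strict = grep { /^-?\d+\.\d+$/ } @samples;   # escaped dot: literal period

print "loose:  @loose\n";     # all three strings pass
print "strict: @strict\n";    # only 9.999 passes
```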

my @matches = /([\w.]+)/g;

extracts the same fields. If it does what you want, I don't know.
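
As a quick check, this sketch runs the one-line extraction against the example line quoted above. In list context a match with /g returns every captured substring, so the result can be counted directly.

```perl
use strict;
use warnings;

# The example line from this post.
my $line = '+: word : word word word : 9-9999999-99 : = 9 word word [ 9.999, 99.99 ]';

# List context plus /g: one element per match of the capture group.
my @matches = $line =~ /([\w.]+)/g;

print scalar(@matches), " fields: @matches\n";
```

On this particular line it yields 12 fields, the same count as the capture groups in the long regex.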

Anno
 
Gunnar Hjalmarsson

Jerry said:
I am trying to figure out how to reduce or simplify the following:

/\+:\s+(\w+)\s+:\s+(\w+)\s+(\w+)\s+(\w+)\s+:\s+(\d)-(\d+)-(\d+)\s+:\s+=\s+(\d+)\s+(\w+)\s+(\w+)\s+\[\s+(-?\d+.\d+),\s+(-?\d+.\d+)/;

Basically I am looking for a line like the following:

+: word : word word word : x-xxxxxxx-xx : = x word word [ x.xxx, xx.xx ]

Word can be any text, with or without numbers.

Each x is a digit.

Okay. This would match:

/^\+:.+\d\d\.\d\d \]$/

and it's simpler. But is it sufficient? I don't know.
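
A small sketch of what that short pattern accepts and rejects. The second and third test lines are invented for illustration; only the first comes from this thread.

```perl
use strict;
use warnings;

my $re = qr/^\+:.+\d\d\.\d\d \]$/;

my @lines = (
    '+: word : word word word : 9-9999999-99 : = 9 word word [ 9.999, 99.99 ]',
    '+: anything at all 12.34 ]',              # accepted: the middle is unchecked
    'word : 9-9999999-99 [ 9.999, 99.99 ]',    # rejected: no leading "+:"
);

# 1 for a match, 0 otherwise, in the same order as @lines.
my @ok = map { $_ =~ $re ? 1 : 0 } @lines;

printf "%-8s %s\n", ($ok[$_] ? 'match' : 'no match'), $lines[$_] for 0 .. $#lines;
```

It only validates the start and end of the line, so it cannot stand in for the long regex if the individual fields are needed.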

If you want to capture all that, you *should* probably not try to
simplify it very much. But, as Sherm said, it can be written more clearly:

my @array = m{
    \+:\s+
    (\w+)                     # First word
    \s+:\s+
    (\w+)\s+(\w+)\s+(\w+)     # group of 3 words
    \s+:\s+
    (\d)-(\d+)-(\d+)          # article No.
    \s+:\s+=\s+
    (\d+)\s+(\w+)\s+(\w+)     # ...
    \s+\[\s+
    (-?\d+\.\d+)              # number
    ,\s+
    (-?\d+\.\d+)              # last number
}x;
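
One possible further step, not suggested by anyone in this thread, is to build the pattern from small named pieces with qr//. The sketch below assumes the periods are meant literally and uses illustrative variable names. It also shows that a failed match in list context returns an empty list, which makes the check easy.

```perl
use strict;
use warnings;

# Reusable pieces (names are illustrative, not from the thread).
my $word = qr/(\w+)/;
my $num  = qr/(-?\d+\.\d+)/;    # periods escaped

my $re = qr{
    \+: \s+ $word                              # first word
    \s+ : \s+ $word \s+ $word \s+ $word        # group of 3 words
    \s+ : \s+ (\d)-(\d+)-(\d+)                 # article No.
    \s+ : \s+ = \s+ (\d+) \s+ $word \s+ $word
    \s+ \[ \s+ $num , \s+ $num                 # the two numbers
}x;

my $line = '+: word : word word word : 9-9999999-99 : = 9 word word [ 9.999, 99.99 ]';

# The list assignment evaluates to the number of captures, 0 on failure.
my @fields = $line =~ $re
    or die "line did not match\n";

print scalar(@fields), " fields: @fields\n";
```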

HTH
 
Uri Guttman

"AS" == Anno Siegel said:
I am trying to figure out how to reduce or simplify the following:
/\+:\s+(\w+)\s+:\s+(\w+)\s+(\w+)\s+(\w+)\s+:\s+(\d)-(\d+)-(\d+)\s+:\s+=\s+(\d+)\s+(\w+)\s+(\w+)\s+\[\s+(-?\d+.\d+),\s+(-?\d+.\d+)/;

AS> You forgot to quote the two periods towards the end of the regex. They
AS> would match any character.

AS> my @matches = /([\w.]+)/g;

AS> extracts the same fields. If it does what you want, I don't know.

not exactly. he has some grabbed numbers with optional leading - signs
and [\w.]+ won't grab them. so [\w.-]+ could work. but we both don't
know the real spec so we can't be sure. i wonder if he really wants to
grab all those things and if they are needed to actually match the
string. me thinks the OP just thinks you have to match each part in
detail.
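
A sketch comparing the two character classes on a line with a negative number. The test line is an assumed variation of the earlier example, and the third pattern is only one possible middle ground, not something proposed in the thread.

```perl
use strict;
use warnings;

# Earlier example line, with the first number made negative for illustration.
my $line = '+: word : word word word : 9-9999999-99 : = 9 word word [ -9.999, 99.99 ]';

my @plain  = $line =~ /([\w.]+)/g;            # drops the minus sign
my @dashed = $line =~ /([\w.-]+)/g;           # keeps it, but 9-9999999-99 stays one field
my @either = $line =~ /(-?\d+\.\d+|\w+)/g;    # signed decimals first, then plain words

print scalar(@plain),  " fields: @plain\n";
print scalar(@dashed), " fields: @dashed\n";
print scalar(@either), " fields: @either\n";
```

On this line the first and third patterns give 12 fields and the second gives 10, so which one fits depends on whether the hyphenated number should be split.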

uri
 
Anno Siegel

Uri Guttman said:
I am trying to figure out how to reduce or simplify the following:

/\+:\s+(\w+)\s+:\s+(\w+)\s+(\w+)\s+(\w+)\s+:\s+(\d)-(\d+)-(\d+)\s+:\s+=\s+(\d+)\s+(\w+)\s+(\w+)\s+\[\s+(-?\d+.\d+),\s+(-?\d+.\d+)/;

AS> You forgot to quote the two periods towards the end of the regex. They
AS> would match any character.

AS> my @matches = /([\w.]+)/g;

AS> extracts the same fields. If it does what you want, I don't know.

not exactly. he has some grabbed numbers with optional leading - signs
and [\w.]+ won't grab them.

Oh, right. I didn't see them in the monster regex.
so [\w.-]+ could work. but we both don't
know the real spec so we can't be sure. i wonder if he really wants to
grab all those things and if they are needed to actually match the
string. me thinks the OP just thinks you have to match each part in
detail.

That's the point I was hoping to make by giving a pointedly unspecific
regex. It doesn't hurt the point much that it is a little too unspecific.

Anno
 
Uri Guttman

AS> my @matches = /([\w.]+)/g;
AS> extracts the same fields. If it does what you want, I don't know.

not exactly. he has some grabbed numbers with optional leading - signs
and [\w.]+ won't grab them.

AS> Oh, right. I didn't see them in the monster regex.

another reason to either not use monster regexes or to enable /x.
so [\w.-]+ could work. but we both don't know the real spec so we
can't be sure. i wonder if he really wants to grab all those things
and if they are needed to actually match the string. me thinks the
OP just thinks you have to match each part in detail.

AS> That's the point I was hoping to make by giving a pointedly
AS> unspecific regex. It doesn't hurt the point much that it is a
AS> little too unspecific.

yep.

uri
 
Jerry Preston

What I want to pull out are the 12 items from the following:

/\+:\s+(\w+)\s+:\s+(\w+)\s+(\w+)\s+(\w+)\s+:\s+(\d)-(\d+)-(\d+)\s+:\s+=\s+(\d+)\s+(\w+)\s+(\w+)\s+\[\s+(-?\d+.\d+),\s+(-?\d+.\d+)/;
#   +: word : word word word : x-xxxxxxx-xx : = x  word word [ x.xxx, xx.xxx ]
#      $1     $2   $3   $4     $5 $6     $7     $8 $9   $10    $11    $12

Sorry for making this harder than it needs to be. I hope this helps.

Jerry
 
Uri Guttman

JP> What I want to pull out are the 12 items from the following:
JP> /\+:\s+(\w+)\s+:\s+(\w+)\s+(\w+)\s+(\w+)\s+:\s+(\d)-(\d+)-(\d+)\s+:\s+=\s+(\d+)\s+(\w+)\s+(\w+)\s+\[\s+(-?\d+.\d+),\s+(-?\d+.\d+)/;
JP> #   +: word : word word word : x-xxxxxxx-xx : = x  word word [ x.xxx, xx.xxx ]
JP> #      $1     $2   $3   $4     $5 $6     $7     $8 $9   $10    $11    $12

JP> Sorry for making this harder than it needs to be. I hope this helps.

that isn't much better than just the plain regex. i can tell you want
words and : and such from reading the regex. a spec tells you what the
input data is and why you need to parse it.

and the $1 stuff is useless as we can count.

converting it to /x style would be a great first step.

a good second step would be to use anno's and my simplification to a
single grab with /g. it looks like that is all you really need for that
string. and you just store it in an array instead of 12 $1 style
vars. in fact you never want to have so many $1 vars around as it can
get annoying.
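
a minimal sketch of that second step, using the single /g grab and a hash slice so each field gets a name instead of a $1 style variable. the field names are invented for illustration, since the thread never says what the data means, and the sample line is the one posted earlier.

```perl
use strict;
use warnings;

my $line = '+: word : word word word : 9-9999999-99 : = 9 word word [ 9.999, 99.99 ]';

# Hypothetical field names; replace them once the real spec is known.
my @names = qw(tag w1 w2 w3 part_a part_b part_c count w4 w5 num1 num2);

my @fields = $line =~ /([\w.]+)/g;
die "expected ", scalar(@names), " fields, got ", scalar(@fields), "\n"
    unless @fields == @names;

my %rec;
@rec{@names} = @fields;    # hash slice: pair each name with its field

print "$_ = $rec{$_}\n" for @names;
```

note that [\w.]+ drops a leading minus sign, so the pattern would need adjusting if negative numbers can appear.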

so what is this input string? why do you need those fields? specify it
in english and not with 'word : ' stuff.

uri
 