Difficult regular expression problem!

Fritz Bayer

Hello,

I have the problem that the regular expression (word1|word2|word3)? is
not being recalled when later being referenced using $1.

Here is a very simple example:

$context = "This is a very simple sentence. Always, the second up to
the last word of this sentence should be captured!"

$content =~ s/(simple|easy|plain|simplitic)?.*Always(.*)captured!/
print($1 . "\n")/iesg;

Now, I would expect this to print "simple" followed by ",
the second up to the last word of this sentence should be " to the
console.

However, this does not happen. I used $& to look at what gets matched
and it seems nothing at all.

If I take the question mark out, the expression matches, but the
problem is that I also want the following variation of the above
sentence to match:

$context = "This is a very STUPID sentence. Always, the second up to
the last word of this sentence should be captured!"

The reason is that I want to capture any expression which matches the
regular expression:

Always(.*)variations!

Now, if this regular expression is preceded by one of the words in
the list, then I would like to know about it and capture/print it out.

Do you know a regular expression to achieve this?

Thanks,
Fritz
 
Sandman

Hello,

I have the problem that the regular expression (word1|word2|word3)? is
not being recalled when later being referenced using $1.

Here is a very simple example:

$context = "This is a very simple sentence. Always, the second up to
the last word of this sentence should be captured!"

$content =~ s/(simple|easy|plain|simplitic)?.*Always(.*)captured!/
print($1 . "\n")/iesg;

Now, I would expect this to print "simple" followed by ",
the second up to the last word of this sentence should be " to the
console.

First, you're doing the substitution on $content, while the string is in
$context. Second, why do a substitution at all? Why not just match the words
you're looking for?



#!/usr/bin/perl
use strict;
use warnings;

# Kept on one line: without /s, "." does not match a newline.
my $string = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";

if ($string =~ m/^.*?(simple|easy|plain|simplistic).*?Always(.*)captured!$/i) {
    print "$1$2\n";
}

__END__
Output: simple, the second up to the last word of this sentence should be


But the above matches only this particular string; it does not implement the
rule the sentence describes (capture from the second word up to the last word).
For that, you may want to try something like this:


#!/usr/bin/perl
use strict;
use warnings;

# Kept on one line: without /s, "." does not match a newline.
my $string = "This is a very simple sentence. Always, the second up to the last word of this sentence should be captured!";

if ($string =~ m/^.*?(simple|easy).*?\. [^\s]*?\s(.*?)\s[^\s]*?$/i) {
    print "$1 $2\n";
}

__END__
Output: simple the second up to the last word of this sentence should be


(Note that the variations of 'simple' were removed to fit the line length.)
 
Gunnar Hjalmarsson

Fritz said:
I have the problem that the regular expression (word1|word2|word3)?
is not being recalled when later being referenced using $1.

Here is a very simple example:

$context = "This is a very simple sentence. Always, the second up
to the last word of this sentence should be captured!"

$content =~ s/(simple|easy|plain|simplitic)?.*Always(.*)captured!/
print($1 . "\n")/iesg;

If capturing things is what you want to do, why are you using the s///
operator?
Now, I would expect this to print "simple" followed by ",
the second up to the last word of this sentence should be " to the
console.

Why would it print $2 if you don't tell it to do so?
However, this does not happen. I used $& to look at what gets
matched and it seems nothing at all.

I would disagree. The regex matches, and $2 has content.
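
The point that the regex matches while only $2 has content can be checked with a short sketch. It assumes the variable names are unified as $content, shortens the word list, and uses a hypothetical helper named captures(). The match is attempted at position 0 first; there the optional group is allowed to match nothing, and the rest of the pattern then succeeds, so the engine never reaches a starting position where "simple" would be captured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns both captures (either may be undef)
# for a shortened version of the pattern from the question.
sub captures {
    my ($text) = @_;
    return unless $text =~ /(simple|easy|plain)?.*Always(.*)captured!/is;
    return ($1, $2);
}

my $content = "This is a very simple sentence. Always, the second up to "
            . "the last word of this sentence should be captured!";

my ($word, $rest) = captures($content);

# The optional group matched nothing at position 0, so $word is undef.
print "word: ", (defined $word ? "'$word'" : 'undef'), "\n";
print "rest: '$rest'\n";
```

Under these assumptions the first capture stays undefined while the second holds the text between "Always" and "captured!".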
If I take the question mark out, the expression matches, but
the problem is that I also want the following variation of the
above sentence to match:

$context = "This is a very STUPID sentence. Always, the second up
to the last word of this sentence should be captured!"

Okay. Using ? in combination with .* seems not to be a good idea.
Maybe you should use two regexes, testing the presence of "simple"
etc. separately:

my ($word) = $content =~ /(simple|easy|plain|simplitic).*Always/;
print "Found ", ($word ? "'$word'" : 'nothing'),
" before 'Always'.\n";
 
Anno Siegel

Fritz Bayer said:
Hello,

I have the problem that the regular expression (word1|word2|word3)? is
not being recalled when later being referenced using $1.

Here is a very simple example:

$context = "This is a very simple sentence. Always, the second up to
^
That should be "$content", as has been noted.
the last word of this sentence should be captured!"

$content =~ s/(simple|easy|plain|simplitic)?.*Always(.*)captured!/
print($1 . "\n")/iesg;

Now, I would expect this to print "simple" followed by ",
the second up to the last word of this sentence should be " to the
console.

First off, you seem to think that "print" returns the string printed.
It doesn't. See "perldoc -f print". Further, there is no way for the
text from the second capture to show up in the replacement. It is
captured in $2, which is nowhere mentioned. Third, why are you using
/g on the substitution? It's not going to do anything.
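
A minimal sketch of the first point, assuming a shortened test string and illustrative variable names: with /e, the match is replaced by the return value of the code, and print returns a true value on success rather than the text it printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $content = "This is a very simple sentence.";

# With /e the replacement is the *return value* of the code.
# print returns a true value on success, so that value replaces the match.
(my $with_print = $content) =~ s/(simple)/print("")/e;

# Returning a string from the code keeps real text in the result.
(my $with_value = $content) =~ s/(simple)/uc $1/e;

print "$with_print\n";
print "$with_value\n";
```

In the first result the word "simple" is gone and only print's return value is left in its place; in the second it is replaced by "SIMPLE".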
However, this does not happen. I used $& to look at what gets matched
and it seems nothing at all.

Then simplify the problem. Why are you still working with a series of
alternatives for the first match? "simple" would suffice for a test.
Why are you testing with a two-line string (and need /s in consequence)?
Make it a single line and leave off /s.
If I take the question mark out, the expression matches, but the
problem is that I also want the following variation of the above
sentence to match:

The question mark allows the *following* ".*" to match the whole sentence.
After that, "Always" won't match any more, so the whole match fails.
Make ".*" non-greedy by placing a "?" after it, and it matches again.
$context = "This is a very STUPID sentence. Always, the second up to
the last word of this sentence should be captured!"

The reason is that I want to capture any expression which matches the
regular expression:

Always(.*)variations!

Now, if this regular expression is preceded by one of the words in
the list, then I would like to know about it and capture/print it out.

Do you know a regular expression to achieve this?

Apparently you are trying to write a substitution (not just a regex),
that does this along the way. If that doesn't succeed, simplify the
problem. Study the regex part by itself until it matches what you want.
Then base a substitution on that result. You won't need /e in the
final solution, far less "print".

Anno
 
Gunnar Hjalmarsson

Anno said:
The question mark allows the *following* ".*" to match the whole
sentence.

Yes, if you are talking about the first sentence.
After that, "Always" won't match any more, so the whole match
fails. Make ".*" non-greedy by placing a "?" after it, and it
matches again.

Not true. "Always" always matches. Making the first ".*" non-greedy
makes no difference, since there is only one "Always".
 
Gunnar Hjalmarsson

Gunnar said:
Making the first ".*" non-greedy makes no difference, since there
is only one "Always".

I believe you could add: Changing greediness *never* makes a regex
match that didn't match before you changed greediness - and vice
versa. Greediness may make a difference with respect to *what* is
being matched.
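
A small sketch of that claim, using a made-up test string: both the greedy and the non-greedy pattern match, but they capture different digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "a1b2b3";

# Greedy: .* takes as much as possible, so the capture follows the last 'b'.
my ($greedy) = $text =~ /a.*b(\d)/;

# Non-greedy: .*? takes as little as possible, so the capture follows the first 'b'.
my ($lazy) = $text =~ /a.*?b(\d)/;

print "greedy: $greedy, lazy: $lazy\n";   # greedy: 3, lazy: 2
```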
 
Anno Siegel

Gunnar Hjalmarsson said:
Yes, if you are talking about the first sentence.


Not true. "Always" always matches. Making the first ".*" non-greedy
makes no difference, since there is only one "Always".

You are right. Trying to explain an effect that doesn't exist leads
to spurious explanations.

Anno
 
Fritz Bayer

Christian Winter said:
Is it intentional that one scalar is called $context,
whereas you apply your regex-substitution to $content?

-Christian

Sorry, this is a mistake; of course they should be the same. This
was meant to be an illustrative example.
 
Fritz Bayer

Gunnar Hjalmarsson said:
If capturing things is what you want to do, why are you using the s///
operator?


Why would it print $2 if you don't tell it to do so?


I would disagree. The regex matches, and $2 has content.


Okay. Using ? in combination with .* seems not to be a good idea.
Maybe you should use two regexes, testing the presence of "simple"
etc. separately:

my ($word) = $content =~ /(simple|easy|plain|simplitic).*Always/;
print "Found ", ($word ? "'$word'" : 'nothing'),
" before 'Always'.\n";


You are right. I'm sorry for not having set up the example correctly. Of
course, $1 and $2 should be printed out and the scalar names should
be the same.

So let me try to explain the problem using words. What I'm trying to
achieve can be summarized as follows:

1. I'm looking for a pattern (let's call it A), which I want to capture
and print out to standard output. It recurs several times in the
text.

2. If I find the pattern A in the text, then I want to know whether
or not a certain word (let's call it B) out of a word list (let's call
it C) precedes pattern A.

3. In the end I want the following result: If A is found, preceded by
a word B out of C, then print A;B;. If A is found, but none of the words
out of C precedes A, then just print A;;.

For example:

"This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn' buy
it for 1000 Dollars. I definitely would by an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)"

Here is the regex, which of course is not working (just to give you an
idea):
/.*?(computer|apartment|miro|somethingelsemissing).*?(\d+)\s+Dollars?/
print ($1 . ";" . $2 . ";\n")/gies

I would like the following result:

50;computer;
1000;;
3000;apartment;
1500;Miro;
5000000;;
50;;
 
Glenn Jackman

Fritz Bayer said:
1. I'm looking for a pattern (let's call it A), which I want to capture
and print out to standard output. It recurs several times in the
text.

2. If I find the pattern A in the text, then I want to know whether
or not a certain word (let's call it B) out of a word list (let's call
it C) precedes pattern A.

3. In the end I want the following result: If A is found, preceded by
a word B out of C, then print A;B;. If A is found, but none of the words
out of C precedes A, then just print A;;.

For example:

"This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn' buy
it for 1000 Dollars. I definitely would by an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)"

Here is the regex, which of course is not working (just to give you an
idea):
/.*?(computer|apartment|miro|somethingelsemissing).*?(\d+)\s+Dollars?/
print ($1 . ";" . $2 . ";\n")/gies

I would like the following result:

50;computer;
1000;;
3000;apartment;
1500;Miro;
5000000;;
50;;

I don't think regular expressions can do it.
You can split on the dollar value and look at the preceding piece of
text for the item:

my $text = <<EXAMPLE;
This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn' buy
it for 1000 Dollars. I definitely would by an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)
EXAMPLE

my @list = split /(\d+)\s+Dollars?/, $text;
my $item_re = qr{(computer|apartment|miro|somethingelsemissing)}io;
while (1) {
    my $piece = shift @list;
    my $amount = shift @list;
    last unless defined $amount;
    my ($thing) = ($piece =~ $item_re);
    $thing ||= '';
    print "$amount;$thing;\n";
}
 
Gunnar Hjalmarsson

Fritz said:
1. I'm looking for a pattern (let's call it A), which I want to
capture and print out to standard output. It recurs several times
in the text.

2. If I find the pattern A in the text, then I want to know
whether or not a certain word (let's call it B) out of a word list
(let's call it C) precedes pattern A.

3. In the end I want the following result: If A is found, preceded
by a word B out of C, then print A;B;. If A is found, but none of the
words out of C precedes A, then just print A;;.

For example:

"This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn'
buy it for 1000 Dollars. I definitely would by an apartment for
3000 Dollars or a Miro for 1500 Dollars but not for 5000000
Dollars. For 50 Dollars you can hire me as a perl programmer - but
I guess I'm not worth the Dollar:)"

Here is the regex, which of course is not working (just to give you an
idea):
/.*?(computer|apartment|miro|somethingelsemissing).*?(\d+)\s+Dollars?/
print ($1 . ";" . $2 . ";\n")/gies

I would like the following result:

50;computer;
1000;;
3000;apartment;
1500;Miro;
5000000;;
50;;

As I indicated in my previous reply, I believe that you'd better
capture the respective patterns in two steps. This would do it:

while (/(.*?)(\d+)\s+Dollars?/igs) {
    print "$2;",
        $1 =~ /(computer|apartment|miro|somethingelsemissing)/i,
        ";\n";
}
 
Fritz Bayer

Gunnar Hjalmarsson said:
As I indicated in my previous reply, I believe that you'd better
capture the respective patterns in two steps. This would do it:

while (/(.*?)(\d+)\s+Dollars?/igs) {
    print "$2;",
        $1 =~ /(computer|apartment|miro|somethingelsemissing)/i,
        ";\n";
}

OK, I get the point and you are right - that would work, so it's a
solution. Thanks!

Just one more thing. Haven't you forgotten to print out $1, and how
come you can append the regex using a comma (,)?
 
Gunnar Hjalmarsson

Fritz said:
OK, I get the point and you are right - that would work, so it's a
solution. Thanks!

Just one more thing. Haven't you forgotten to print out $1, and how
come you can append the regex using a comma (,)?

The above code prints the specified result. Didn't you try it?

Your questions indicate that you should study perldoc for a couple of
reasons.

1. The print() function "prints a string or a list of strings", in
this case a (comma separated) list. See "perldoc -f print".

2. One of the elements in the list that is passed to print() is the
return value (in list context) from the second pattern match, i.e. the
captured word. Read about the m// operator in "perldoc perlop".
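
The list-context behaviour described in point 2 can be sketched as follows. The two test strings and the shortened word list are illustrative assumptions, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $item_re = qr/(computer|apartment|miro)/i;

# In list context a successful match returns the list of captures ...
my @hit  = "I would buy a used computer for " =~ $item_re;

# ... and a failed match returns the empty list.
my @miss = "but I would not buy it for " =~ $item_re;

# print receives one flattened list, so an empty list adds nothing.
print "50;",   @hit,  ";\n";   # 50;computer;
print "1000;", @miss, ";\n";   # 1000;;
```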

I could have assigned what's captured from the pattern matches to
variables first, so this does the same thing:

while (/(.*?)(\d+)\s+Dollars?/igs) {
    my $amount = $2;
    my ($word) =
        $1 =~ /(computer|apartment|miro|somethingelsemissing)/i;
    $word ||= '';
    print "$amount;$word;\n";
}

HTH
 
Anno Siegel

Glenn Jackman said:
[...]
For example:

"This is an example, which hopefully helps me and you to solve my
problem. I would buy a used computer for 50 Dollars but I wouldn' buy
it for 1000 Dollars. I definitely would by an apartment for 3000
Dollars or a Miro for 1500 Dollars but not for 5000000 Dollars. For 50
Dollars you can hire me as a perl programmer - but I guess I'm not
worth the Dollar:)"

Here is the regex, which of course is not working (just to give you an
idea):
/.*?(computer|apartment|miro|somethingelsemissing).*?(\d+)\s+Dollars?/
print ($1 . ";" . $2 . ";\n")/gies

I would like the following result:

50;computer;
1000;;
3000;apartment;
1500;Miro;
5000000;;
50;;

I don't think regular expressions can do it.

They can, but it's not necessarily the best way to do it. Here is
one way:

my $pattern = qr/\d+/;
my $context = qr/computer|apartment|miro|somethingelsemissing/i;

while ( $sentence =~ /(?:($context).*?($pattern))|($pattern)/gs ) {
    my $amount = $2 || $3;
    my $article = $1 || '';
    print "$amount;$article;\n";
}

The loop body could be re-written to be executed in the replacement
part of a s///e, but that's too horrible to write down.

Anno
 
