look-ahead search for overlapping

Anno Siegel

John W. Krahn said:
Huub said:
Codesample: s/(?=([\w\b\]{3}))[\w\b]{1}/(\1)/g
Input: this is a test for fun
Desired output: this is a is a test a test for test for fun for fun


Apologies. I did not realize that random string of words represented
both your input and output.

I still, however, don't understand what you're trying to do. In
precisely what manner does the output relate to the input? It looks
like your output has random pieces of the input interspersed into the
input itself. You need to define how that output is generated.

Paul Lalli

What I'm trying to do is read 3 words, print the 3 words, lose the 1st
word, read the 4th word, print the 3 words, lose the new 1st word, read
the new 4th word, print the new 3 words, etc. What the script does is
basically the same, but for letters. So far I can't figure out how to do
it with words.

$ perl -le'
$_ = q/this is a test for fun/;
print;
s/(\w+)(?=(\W+\w+\W+\w+))/$1$2/g;
print;
'
this is a test for fun
this is a is a test a test for test for fun for fun

The regex solutions in this thread are impressive, but in actual code
I'd use a combination of split() and array handling. That deals with
two partial problems separately: splitting the original string into
words (or characters, or whatever), and generating groups of three
from a list:

$_ = 'this is a test for fun';
my $n = 2; # one less than the number of words per group

my @l = split();
my @res = join ' ' => map @l[ $_ .. $_ + $n] => 0 .. $#l - $n;

print "@res\n";

This doesn't generate groups of less than three at the end. If that
is desired one could probably add one or more empty strings to @l.
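
A minimal sketch of the split-and-slice idea as a reusable sub, in case it helps. The name word_groups and its $min parameter are illustrative assumptions, not something from the posts above. $min sets the smallest group allowed at the end, so passing 2 gives the shorter trailing group without padding @l.

```perl
use strict;
use warnings;

# Hypothetical helper: return every run of $size consecutive words in $text.
# $min (default: $size) is the smallest group still accepted at the end.
sub word_groups {
    my ($text, $size, $min) = @_;
    $min = $size unless defined $min;

    my @words      = split ' ', $text;        # split on runs of whitespace
    my $last_start = $#words - ($min - 1);    # last index a group may start at

    my @groups;
    for my $i (0 .. $last_start) {
        my $end = $i + $size - 1;
        $end = $#words if $end > $#words;     # clip the window at the last word
        push @groups, join ' ', @words[$i .. $end];
    }
    return @groups;
}

print join(' ', word_groups('this is a test for fun', 3, 2)), "\n";
# prints: this is a is a test a test for test for fun for fun
```

Calling word_groups($text, 3) without the third argument returns only the full groups of three, which is what the map version above produces.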

Anno
 
Huub

$ perl -le'$_ = q{this is a test for fun};
s/(\w+\W+)(?=((?:\w+(?:\W+|$)){2}))/$1$2/g; print;'
this is a is a test a test for test for funfor fun
$

I'm trying to use this reg.exp. and read from a textfile of unknown size
with an unknown number of words and writing the result to another
textfile. This is what I've tried:

while ($woord = <INPUT_FILE>)
{
$woord = s/(\w+\W+)(?=((?:\w+(?:\W+|$)){2}))/$1$2/g;
print OUTPUT_FILE $woord
}

resulting in this error:

Use of uninitialized value in substitution (s///) at triplet.pl line 32.

I suppose this is referring to the reg.exp. but so far I can't figure it
out. How can I find this using perldoc?

Thank you for helping.
 
Paul Lalli

Huub said:
while ($woord = <INPUT_FILE>)
{
$woord = s/(\w+\W+)(?=((?:\w+(?:\W+|$)){2}))/$1$2/g;
print OUTPUT_FILE $woord
}

resulting in this error:

Use of uninitialized value in substitution (s///) at triplet.pl line 32.

I suppose this is referring to the reg.exp. but so far I can't figure it
out.

No, it's referring to the $_ variable. You used the assignment
operator (=), instead of the binding operator (=~). When a pattern
match or search-and-replace is not bound to any explicit variable, it
is automatically performed on $_. The code you ran is equivalent to:
$woord = ( $_ =~ s/(\w+\W+)(?=((?:\w+(?:\W+|$)){2}))/$1$2/g);
Whereas what you meant was:
$woord =~ s/(\w+\W+)(?=((?:\w+(?:\W+|$)){2}))/$1$2/g;
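
A small sketch of the difference, using a simpler pattern than the one in the thread so the effect is easy to see. The variables $count and $result are names made up for this illustration.

```perl
use strict;
use warnings;

my $woord = "this is a test\n";

# Bound with =~: s/// edits $woord in place and returns the number of
# substitutions it made.
my $count = ($woord =~ s/\s+(?=\w)/_/g);
print $woord;            # this_is_a_test
print "$count\n";        # 3

# Plain assignment: s/// runs against $_ and only its return value
# (again a count) ends up in the variable on the left.
$_ = 'a b c';
my $result = s/ /-/g;
print "$_ $result\n";    # a-b-c 2
```

With the assignment form the original contents of the left-hand variable are simply overwritten by that count, which is why the loop in the question lost its input lines.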

How can I find this using perldoc?

I don't know what you mean by 'this'. The proper syntax for pattern
matches and regexps can be found in any of:
perldoc perlre
perldoc perlretut
perldoc perlreref

Paul Lalli
 
Tad McClellan

Huub said:
while ($woord = <INPUT_FILE>)
{
$woord = s/(\w+\W+)(?=((?:\w+(?:\W+|$)){2}))/$1$2/g;
print OUTPUT_FILE $woord
}

resulting in this error:

Use of uninitialized value in substitution (s///) at triplet.pl line 32.


That is not an error message.

That is a warning message.

I suppose this is referring to the reg.exp. but so far I can't figure it
out.


It is referring to one of the operands to the s/// operator.

If you do not bind the s/// to a string, it will attempt the match
against the string in $_.

You have not put anything into $_, so perl warns you about that.

I expect that you do want to bind the s/// to $woord:

$woord =~ s/(\w+\W+)(?=((?:\w+(?:\W+|$)){2}))/$1$2/g;
       ^^
       ^^
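
For reference, a hedged sketch of the whole loop with the binding operator in place. The sub name copy_with_triplets is hypothetical, and the chomp is an assumption added here so the trailing newline stays out of the pattern's reach; the demonstration uses in-memory filehandles rather than the real INPUT_FILE and OUTPUT_FILE.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in to $out, applying the
# substitution from this thread to each line.
sub copy_with_triplets {
    my ($in, $out) = @_;
    while (my $woord = <$in>) {
        chomp $woord;    # assumption: handle the newline separately
        $woord =~ s/(\w+\W+)(?=((?:\w+(?:\W+|$)){2}))/$1$2/g;    # =~, not =
        print {$out} "$woord\n";
    }
    return;
}

# Demonstration on in-memory filehandles instead of real files.
my $input = "this is a test for fun\n";
open my $in,  '<', \$input     or die "Cannot read input: $!";
open my $out, '>', \my $output or die "Cannot write output: $!";
copy_with_triplets($in, $out);
close $out;
print $output;
# prints: this is a is a test a test for test for funfor fun
```

The pattern is used unchanged, so the output has the same missing space before the final pair as the one-liner shown earlier. Because the loop works line by line, groups never span a line break.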
 
Huub

Paul said:
No, it's referring to the $_ variable. You used the assignment
operator (=), instead of the binding operator (=~). When a pattern
match or search-and-replace is not bound to any explicit variable, it
is automatically performed on $_. The code you ran is equivalent to:
$woord = ( $_ =~ s/(\w+\W+)(?=((?:\w+(?:\W+|$)){2}))/$1$2/g);
Whereas what you meant was:
$woord =~ s/(\w+\W+)(?=((?:\w+(?:\W+|$)){2}))/$1$2/g;

Thank you very much. I was looking for this, but I didn't know what it
was called, and searching for '=~' to confirm my idea somehow gave no
results. That was enough to make me doubt using it.

I don't know what you mean by 'this'. The proper syntax for pattern
matches and regexps can be found in any of:
perldoc perlre
perldoc perlretut
perldoc perlreref

Thank you.
 
