Why doesn't this work?

Sandman

This works:
###
#!/usr/bin/perl
use strict;
use warnings;
my $string = "12.00 Simpsons 12.30 Fresh prince 14.00 Superbowl";
my @list = split / ?(?=\d\d\.\d\d)/, $string;
foreach (@list){
print "$_\n";
}
###

And outputs:
12.00 Simpsons
12.30 Fresh prince
14.00 Superbowl


This doesn't work:
###
#!/usr/bin/perl
use strict;
use warnings;
my $string = "author:Jane Smith institution:university of Cambridge year:1976";
my @list = split / ?(?=\w+:)/, $string;
foreach (@list){
print "$_\n";

}
###

It outputs:

a
u
t
h
o
r:Jane Smith
i
n
s
t
i
t
u
t
i
o
n:university of Cambridge
y
e
a
r:1976


What makes it so different? The variable-length match (\w+)? I thought variable
length was only forbidden in look-behind, not in look-ahead, or am I
misunderstanding something completely?
 
Paul Lalli

Sandman said:
This works:
###
#!/usr/bin/perl
use strict;
use warnings;
my $string = "12.00 Simpsons 12.30 Fresh prince 14.00 Superbowl";
my @list = split / ?(?=\d\d\.\d\d)/, $string;

"Split on any position that matches an optional space, followed by
exactly 2 digits, a period, and 2 digits"

Obviously, this only happens twice in the string, so you get your three
strings.
foreach (@list){
print "$_\n";
}
###

And outputs:
12.00 Simpsons
12.30 Fresh prince
14.00 Superbowl


This doesn't work:
###
#!/usr/bin/perl
use strict;
use warnings;
my $string = "author:Jane Smith institution:university of Cambridge year:1976";
my @list = split / ?(?=\w+:)/, $string;

"Split on any position that matches an optional space, followed by one
or more word characters and a :"

Take the position just after 'a': the optional space matches nothing
(it's not there), and the look-ahead sees one or more word characters
and a colon ('uthor:'). Therefore, the string is split after 'a'.

Next position: again no space, and the look-ahead sees one or more
word characters and a colon ('thor:'). Therefore, the string is split
after 'u'.

And so on, up to the split after 'o', where the look-ahead sees 'r:'.
After 'r' there is no word character left before the colon, so
'r:Jane Smith' stays in one piece.

Why are you making the matching space optional? That would seem to be
a very necessary component of your split. You want to split on every
space that's followed by letters and a colon. The space is not
optional.

Remove the first ? from that regexp, and you get the results you
desired.
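
As a minimal sketch of that fix, here is the second script with the space made mandatory. Everything else is unchanged from the original post:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "author:Jane Smith institution:university of Cambridge year:1976";

# The space is no longer optional, so a split only happens at a space
# that is directly followed by word characters and a colon.
my @list = split / (?=\w+:)/, $string;

foreach (@list) {
    print "$_\n";
}
```

This should print the three fields, one per line: author:Jane Smith, institution:university of Cambridge, and year:1976.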
What makes it so different? The variable-length match (\w+)? I thought variable
length was only forbidden in look-behind, not in look-ahead, or am I
misunderstanding something completely?

It's the fact that \d\d\.\d\d could only match a very specific set of
substrings: 12.00, 12.30, 14.00. \w+:, on the other hand, can match
author:, uthor:, thor:, hor:, etcetera.
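
One way to see this for yourself is to put a capture inside the look-ahead and collect what it finds at each position. This is only an illustrative sketch; the variable names @times and @keys are made up for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $schedule = "12.00 Simpsons 12.30 Fresh prince 14.00 Superbowl";
my $record   = "author:Jane Smith institution:university of Cambridge year:1976";

# Capturing inside the look-ahead shows what it matches at each position
# without consuming any characters.
my @times;
while ($schedule =~ /(?=(\d\d\.\d\d))/g) {
    push @times, $1;
}

my @keys;
while ($record =~ /(?=(\w+:))/g) {
    push @keys, $1;
}

print "times: @times\n";
print "keys:  @keys\n";
```

The first list holds only the three times, while the second starts with author:, uthor:, thor:, hor:, or:, r: and carries on the same way through institution: and year:.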

Paul Lalli
 
Paul Lalli

Paul said:
"Split on any position that matches an optional space, followed by
exactly 2 digits, a period, and 2 digits"

Obviously, this only happens twice in the string, so you get your three
strings.

Correction. Because the space is optional, it actually matches three
times. However, there's a slight amount of magic involved in split
here, as documented in perldoc -f split:
Empty leading (or trailing) fields are produced when
there are positive width matches at the beginning (or
end) of the string; a zero-width match at the
beginning (or end) of the string does not produce an
empty field.

Because the match was zero-width (the space was optional and the
look-ahead doesn't add to the match width), there was no empty leading
field produced. This explains why you did not get an empty string as
the first element of @list.
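
A small sketch of the difference, using a shortened version of the schedule string. The second pattern is not from the original post; it is only there to force a positive-width match at the start of the string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "12.00 Simpsons 12.30 Fresh prince";

# Zero-width match at the very start: no empty leading field.
my @zero_width = split /(?=\d\d\.\d\d)/, $string;

# Positive-width match at the very start (the time and the space after it
# are consumed): an empty leading field is produced.
my @positive_width = split /\d\d\.\d\d /, $string;

print scalar(@zero_width), " fields, first is [$zero_width[0]]\n";
print scalar(@positive_width), " fields, first is [$positive_width[0]]\n";
```

The first split should give two fields starting with "12.00 Simpsons ", and the second should give three fields, the first of which is the empty string.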

Paul Lalli
 
Sandman

Paul Lalli said:
Remove the first ? from that regexp, and you get the results you
desired.

Of course! Argh! Now I feel stupid. :p
It's the fact that \d\d\.\d\d could only match a very specific set of
substrings: 12.00, 12.30, 14.00. \w+:, on the other hand, can match
author:, uthor:, thor:, hor:, etcetera.

Yeah, it all makes sense to me. Now. :)
 
