empty first element after split

Michael Hamer

Hi,

the Perl Cookbook suggests in Recipe 11.10 "Reading and Writing Hash
Records to Text Files" the following code to read a hash from a file:

$/ = "";                # paragraph read mode
while (<>) {
    my @fields = split /^([^:]+):\s*/m;
    shift @fields;      # for leading null field
    push(@Array_of_Records, { map /(.*)/, @fields });
}

or, a bit simpler for testing:

$text = "a:b\nc:d\ne:f";
my @fields = split /^([^:]+):\s*/m, $text;
foreach $key (@fields) {
    print "field is $key\n";
}

This code works as intended, but I don't understand it. Why is
$fields[0] empty after the split? I would have expected it to contain
the "a" but that is found in $fields[1].
 
Jim Gibson

It is because this code is using split in an inverted fashion.
Normally, split is looking for substrings separated by delimiters,
returning the substrings and discarding the delimiters. Here,
parentheses are used to capture a portion of the delimiters, and split
is returning the captured portion intermixed with the substrings.
Therefore, the first field ('a' in your example) is actually part of a
delimiter, not part of a substring, and it is the portion of the string
that precedes the first delimiter that ends up in $fields[0]. Since
there is nothing before the 'a', $fields[0] is the empty string.
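
To make the difference concrete, here is a minimal sketch using the test string from the question. The names @plain, @captured and show() are only for illustration and are not part of the Cookbook recipe.

```perl
use strict;
use warnings;

my $text = "a:b\nc:d\ne:f";

# Without capturing parentheses the delimiters (the keys) are discarded.
my @plain = split /^[^:]+:\s*/m, $text;

# With capturing parentheses the captured part of each delimiter is
# returned in between the substrings.
my @captured = split /^([^:]+):\s*/m, $text;

# Hypothetical helper: quote each field and make newlines visible.
sub show {
    return join ', ', map { my $s = $_; $s =~ s/\n/\\n/g; "'$s'" } @_;
}

print show(@plain), "\n";       # '', 'b\n', 'd\n', 'f'
print show(@captured), "\n";    # '', 'a', 'b\n', 'c', 'd\n', 'e', 'f'
```

In both cases the first element is the (empty) text in front of the first delimiter.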
 
Heiko Eißfeldt

Michael Hamer said:
Why is $fields[0] empty after the split? I would have expected it to contain
the "a" but that is found in $fields[1].

Because there is nothing to match before the first delimiter.
A simple split returns the items separated by the delimiter, without the
delimiter parts.
If you also want parts of the delimiter, you need to capture them.
If you want parts from the delimiter also, you need to capture them.

The delimiter is ^([^:]+):\s* which matches a:. Everything
before the : is captured, so you get
undef, a, b, c, d, e, f
 
Dave B

My understanding is that, since the separator is not the single blank " ",
and the string starts with a delimiter, there's always a null field "before"
the first delimiter, and split by default does not remove leading empty
fields (this is similar to what happens with awk when the separator is not
the default blank).

If you split on /:/, you should get no empty leading fields (though there
may be other reasons to prefer the original method).
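
A small sketch of those points, assuming the same test string as in the question (the array names and the ":x:y" and "x:y::" strings are made up for illustration):

```perl
use strict;
use warnings;

my $text = "a:b\nc:d\ne:f";

# Splitting on a bare colon: no leading empty field, but each value
# stays glued to the key on the following line.
my @by_colon = split /:/, $text;      # ("a", "b\nc", "d\ne", "f")

# A delimiter at the very start of the string yields a leading empty field.
my @leading = split /:/, ":x:y";      # ("", "x", "y")

# Trailing empty fields, by contrast, are removed by default.
my @trailing = split /:/, "x:y::";    # ("x", "y")

print scalar(@by_colon), " ", scalar(@leading), " ", scalar(@trailing), "\n";    # 4 3 2
```

The "b\nc" element shows one reason to prefer the original pattern: a plain /:/ does not separate a value from the next key.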
 
Dave B

Heiko Eißfeldt said:
The delimiter is ^([^:]+):\s* which matches a:. Everything
before the : is captured, so you get
undef, a, b, c, d, e, f

I think the first field is the empty string, rather than undef (but I may be
wrong).
 
John W. Krahn

Dave said:
Heiko Eißfeldt said:
The delimiter is ^([^:]+):\s* which matches a:. Everything
before the : is captured, so you get
undef, a, b, c, d, e, f

I think the first field is the empty string, rather than undef (but I may be
wrong).

No, you are correct.


John
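
A quick way to check this yourself, sketched with the test string from the original post:

```perl
use strict;
use warnings;

my @fields = split /^([^:]+):\s*/m, "a:b\nc:d\ne:f";

# The leading field is defined and has length zero, i.e. it is the
# empty string and not undef.
print defined($fields[0]) ? "defined" : "undef", "\n";    # defined
print length($fields[0]), "\n";                           # 0
```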
 
Tad J McClellan

my @fields = split /^([^:]+):\s*/m;
shift @fields; # for leading null field

Others have answered your real question, but I'll point out
that those 2 lines can be replaced with:

my(undef, @fields) = split /^([^:]+):\s*/m; # ignore leading null field
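
For illustration, a self-contained sketch of that one-liner feeding the Cookbook's map. The $record string and the %hash name are invented sample data, not taken from the book.

```perl
use strict;
use warnings;

# Hypothetical record in "Key: value" form, one field per line.
my $record = "Name: Alice\nAge: 30\n";

# The leading empty field is assigned to undef and thrown away.
my (undef, @fields) = split /^([^:]+):\s*/m, $record;

# /(.*)/ returns everything up to the first newline, which strips the
# trailing "\n" from each value.
my %hash = map { /(.*)/ } @fields;

print "$_ => $hash{$_}\n" for sort keys %hash;
# Age => 30
# Name => Alice
```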
 
Dr.Ruud

Tad J McClellan wrote:
Michael Hamer:
my @fields = split /^([^:]+):\s*/m;
shift @fields; # for leading null field

Others have answered your real question, but I'll point out
that those 2 lines can be replaced with:

my(undef, @fields) = split /^([^:]+):\s*/m; # ignore leading null field

The "[^:]+" can match a newline, so a "^([^:]+):" in a multiline context
is probably better written as "^(.+?):".



$ echo -ne "A: aaa\nB: bbb\n bbb\nC: ccc\n" |
perl -le'
undef $/;
$_ = <>;
print;
print "<$_>" for split /^([^:]+):\s*/m
'
A: aaa
B: bbb
 bbb
C: ccc

<>
<A>
<aaa
>
<B>
<bbb
>
< bbb
C>
<ccc
>
$ echo -ne "A: aaa\nB: bbb\n bbb\nC: ccc\n" |
perl -le'
undef $/;
$_ = <>;
print;
print "<$_>" for split /^(.+?):\s*/m
'
A: aaa
B: bbb
 bbb
C: ccc

<>
<A>
<aaa
>
<B>
<bbb
 bbb
>
<C>
<ccc
>

A further variant, that captures a header field value that is white
space only:

$ echo -ne "A: aaa\nB: bbb\n bbb\nC:\n \nD: ddd\n" |
perl -le'
undef $/;
$_ = <>;
print;
print "<$_>" for split /^(.+?):[^\n\S]*/m
'
A: aaa
B: bbb
 bbb
C:
 
D: ddd

<>
<A>
<aaa
>
<B>
<bbb
 bbb
>
<C>
<
 
>
<D>
<ddd
>
 
