empty first element after split

Michael Hamer · Jul 11, 2008

Hi,

the Perl Cookbook suggests in Recipe 11.10 "Reading and Writing Hash
Records to Text Files" the following code to read a hash from a file:

$/ = ""; # paragraph read mode
while (<>) {
my @fields = split /^([^:]+):\s*/m;
shift @fields; # for leading null field
push(@Array_of_Records, { map /(.*)/, @fields });
}

or, a bit simpler for testing:

my $text = "a:b\nc:d\ne:f";
my @fields = split /^([^:]+):\s*/m, $text;
foreach my $key (@fields)
{
print "field is $key\n";
}

This code works as intended, but I don't understand it. Why is
$fields[0] empty after the split? I would have expected it to contain
the "a" but that is found in $fields[1].

Jim Gibson · Jul 11, 2008

The first field is empty because this code uses split in an inverted fashion.
Normally, split looks for substrings separated by delimiters,
returning the substrings and discarding the delimiters. Here,
parentheses are used to capture a portion of each delimiter, and split
returns the captured portions intermixed with the substrings.
Therefore, the first field ('a' in your example) is actually part of a
delimiter, not part of a substring, and it is the portion of the string
that precedes the first delimiter that ends up in $fields[0]. Since
there is nothing before the 'a', there is nothing in $fields[0].
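A minimal sketch of the difference described above, using the test string from the question. The helper `show` is hypothetical, written only to make the newlines that remain inside the fields visible.

```perl
use strict;
use warnings;

my $text = "a:b\nc:d\ne:f";

# Hypothetical helper: wrap each field in [...] and show "\n" literally.
sub show {
    return join ' ', map { my $s = $_; $s =~ s/\n/\\n/g; "[$s]" } @_;
}

# No parentheses: split discards each delimiter ("a:", "c:", "e:").
# The first delimiter starts at offset 0, so the text before it is "".
my @plain = split /^[^:]+:\s*/m, $text;

# With parentheses: the captured key is returned between the fields
# on either side of its delimiter, so the keys reappear in the list.
my @captured = split /^([^:]+):\s*/m, $text;

print show(@plain), "\n";      # [] [b\n] [d\n] [f]
print show(@captured), "\n";   # [] [a] [b\n] [c] [d\n] [e] [f]
```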

Heiko Eißfeldt · Jul 11, 2008

Because there is nothing before the first delimiter.
A simple split delivers the items separated by the delimiter, without the
delimiter parts.
If you want parts of the delimiter too, you need to capture them.

The delimiter is ^([^:]+):\s* which matches a:. Everything
before the : is captured, so you get
undef, a, b, c, d, e, f
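The same rule can be seen with a plain comma delimiter, which takes the multi-line regex out of the picture. A small sketch; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Without parentheses: each "," is discarded, and the first field is
# whatever precedes the first delimiter ("" when the string starts with ",").
my @leading = split /,/, ",a,b";
my @no_lead = split /,/, "x,a,b";

# With parentheses: the captured delimiter is returned between fields.
my @with_cap = split /(,)/, ",a,b";

print join('|', @leading),  "\n";   # |a|b
print join('|', @no_lead),  "\n";   # x|a|b
print join('|', @with_cap), "\n";   # |,|a|,|b
```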

Dave B · Jul 11, 2008

My understanding is that, since the separator is not the single blank " ",
and the string starts with a delimiter, there's always a null field "before"
the first delimiter, and split by default does not remove leading empty
fields (this is similar to what happens with awk when the separator is not
the default blank).

If you split on /:/, you should get no empty leading fields (though there
may be other reasons to prefer the original method).
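A quick sketch of what splitting on /:/ gives for the test string. The /[:\n]/ variant is not from the thread; it is shown only for comparison. One plausible reason to prefer the original pattern is that it splits only at the first colon of each line, so a value such as "http://x" would stay intact, whereas /[:\n]/ would break it apart.

```perl
use strict;
use warnings;

my $text = "a:b\nc:d\ne:f";

# Splitting on ":" alone: no empty leading field, because the string
# does not start with ":", but each newline stays inside a field.
my @colon = split /:/, $text;          # ("a", "b\nc", "d\ne", "f")

# Splitting on either ":" or newline gives alternating keys and values
# for this simple input (a variant not suggested in the thread).
my @either = split /[:\n]/, $text;     # ("a", "b", "c", "d", "e", "f")
my %h = @either;

print scalar(@colon), " fields with /:/\n";                      # 4 fields with /:/
print join(",", map { "$_=$h{$_}" } sort keys %h), "\n";          # a=b,c=d,e=f
```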

Dave B · Jul 11, 2008

Heiko Eißfeldt said:
The delimiter is ^([^:]+):\s* which matches a:. Everything
before the : is captured, so you get
undef, a, b, c, d, e, f

I think the first field is the empty string, rather than undef (but I may be
wrong).

John W. Krahn · Jul 11, 2008

Dave said:
I think the first field is the empty string, rather than undef (but I may be
wrong).

No, you are correct.

John
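A small sketch confirming that the leading field is a defined empty string. For contrast, undef does show up when a capture group does not take part in a particular delimiter match; that second case is adapted from the example in the perlfunc entry for split.

```perl
use strict;
use warnings;

my @fields = split /^([^:]+):\s*/m, "a:b\nc:d\ne:f";

# The leading field is a defined, zero-length string, not undef.
printf "defined: %s, length: %d\n",
    (defined $fields[0] ? "yes" : "no"), length $fields[0];
# defined: yes, length: 0

# undef appears only for a capture group that did not participate
# in the delimiter match (here, one of two alternative groups).
my @mixed = split /(-)|(,)/, "1-10,20", 3;
print join(" ", map { defined $_ ? "'$_'" : "undef" } @mixed), "\n";
# '1' '-' undef '10' undef ',' '20'
```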

Tad J McClellan · Jul 12, 2008

my @fields = split /^([^:]+):\s*/m;
shift @fields; # for leading null field

Others have answered your real question, but I'll point out
that those 2 lines can be replaced with:

my(undef, @fields) = split /^([^:]+):\s*/m; # ignore leading null field
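A sketch checking that the one-statement form gives the same list as split followed by shift, and then building one record the way the Cookbook loop does. Note that map /(.*)/ keeps each field only up to a newline, because . does not match "\n".

```perl
use strict;
use warnings;

my $text = "a:b\nc:d\ne:f";

# The two-statement form from the Cookbook recipe.
my @fields = split /^([^:]+):\s*/m, $text;
shift @fields;    # drop the leading empty field

# The one-statement form: assigning to undef discards the first value.
my (undef, @fields2) = split /^([^:]+):\s*/m, $text;

# As in the recipe: strip trailing newlines and pair keys with values.
my %record = map /(.*)/, @fields2;

print "same\n" if join('|', @fields) eq join('|', @fields2);         # same
print join(",", map { "$_=$record{$_}" } sort keys %record), "\n";   # a=b,c=d,e=f
```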

Dr.Ruud · Jul 12, 2008

Tad J McClellan wrote:

Michael Hamer:

my @fields = split /^([^:]+):\s*/m;
shift @fields; # for leading null field

Others have answered your real question, but I'll point out
that those 2 lines can be replaced with:

my(undef, @fields) = split /^([^:]+):\s*/m; # ignore leading null field

The "[^:]+" can match a newline, so a "^([^:]+):" in a multiline context
is probably better written as "^(.+?):".

$ echo -ne "A: aaa\nB: bbb\n bbb\nC: ccc\n" |
perl -le'
undef $/;
$_ = <>;
print;
print "<$_>" for split /^([^:]+):\s*/m
'
A: aaa
B: bbb
 bbb
C: ccc

<>
<A>
<aaa
>
<B>
<bbb
>
< bbb
C>
<ccc
>
$ echo -ne "A: aaa\nB: bbb\n bbb\nC: ccc\n" |
perl -le'
undef $/;
$_ = <>;
print;
print "<$_>" for split /^(.+?):\s*/m
'
A: aaa
B: bbb
 bbb
C: ccc

<>
<A>
<aaa
>
<B>
<bbb
 bbb
>
<C>
<ccc
>

A further variant, which also captures a header field value that consists
only of whitespace:

$ echo -ne "A: aaa\nB: bbb\n bbb\nC:\n \nD: ddd\n" |
perl -le'
undef $/;
$_ = <>;
print;
print "<$_>" for split /^(.+?):[^\n\S]*/m
'
A: aaa
B: bbb
 bbb
C:
 
D: ddd

<>
<A>
<aaa
>
<B>
<bbb
 bbb
>
<C>
<
 
>
<D>
<ddd
>
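The same comparison can be condensed to just the captured header names, as a self-contained Perl sketch. The helper `names_for` is hypothetical; it drops the leading empty field and keeps every other element.

```perl
use strict;
use warnings;

my $header = "A: aaa\nB: bbb\n bbb\nC: ccc\n";

# Hypothetical helper: split with the given pattern, drop the leading
# empty field, and return every other element (the captured names).
sub names_for {
    my ($re, $text) = @_;
    my (undef, @f) = split $re, $text;
    return @f[ grep { $_ % 2 == 0 } 0 .. $#f ];
}

my @greedy = names_for(qr/^([^:]+):\s*/m, $header);
my @lazy   = names_for(qr/^(.+?):\s*/m,   $header);

# Show embedded newlines so the difference is visible on one line.
for my $pair ([ '[^:]+' => \@greedy ], [ '.+?' => \@lazy ]) {
    my ($label, $names) = @$pair;
    print "$label: ", join(' | ', map { my $s = $_; $s =~ s/\n/\\n/g; $s } @$names), "\n";
}
# [^:]+: A | B |  bbb\nC
# .+?: A | B | C
```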

empty leading field from split()	1	Nov 2, 2006
FAQ 4.31 How can I split a [character] delimited string except when inside [character]?	0	Apr 13, 2011
URI queries with varied amounts of named values	10	Apr 3, 2009
'Needless flexibilities' and structured records [very long]	10	Mar 15, 2013
Could someone help me with this source code?	5	Jan 20, 2007
format results of duplicate ldapsearch records	1	Jul 15, 2005
Is Scanner's nextLine() Supposed to Return True with Unread Empty Lines?	1	Mar 13, 2011
sendmail won't send me email but will the person filling out form	0	Nov 6, 2005