Can this be combined into one statement?

John Black · Oct 28, 2013

Simple question I think. I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it. There
must be a way to write these two lines as one line (skipping the intermediate @line_arr
step):

@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

but I have not hit upon a working syntax. What is it? Thanks.

John Black

Jürgen Exner · Oct 28, 2013

John Black said:
$file = split(/\s+/, $line)[-1];

You almost got it:
$file = (split(/\s+/, $line))[-1];

jue
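A minimal, self-contained sketch of the parenthesized form. The outer parentheses turn the split call into a list, and [-1] is then a list slice that takes its last element. The helper name last_field and the sample line are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the last whitespace-separated field of a line.
sub last_field {
    my ($line) = @_;
    # (split ...) is a list; [-1] slices out its last element.
    # An empty line gives an empty list, so the result is undef.
    return (split(/\s+/, $line))[-1];
}

my $line = "-rw-r--r-- 1 jb staff 1234 Oct 28 report.txt";
print last_field($line), "\n";    # prints report.txt
```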

Dr.Ruud · Oct 28, 2013

John Black said:
$file = split(/\s+/, $line)[-1];

There are several ways to get there, examples:

$file = ( split " ", $line )[-1];

( $file ) = $line =~ /.*(\S+)/;

Dr.Ruud · Oct 29, 2013

I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line
[...] I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

but I have not hit upon a working syntax. What is it? Thanks.

There are several ways to get there, examples:

$file = ( split " ", $line )[ -1 ];

( $file ) = $line =~ /.* (\S+) /x;

Also see Text::CSV_XS.

Dr.Ruud · Oct 29, 2013

$ perl -wle '$line = "a\tb c\td e\tfilename";
($file) = $line =~ /.*(\S+)/;
print $file'
e

Did you actually try your examples?

Yeah, but only badly.

($file) = $line =~ /.*\s(\S+)/;
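The difference between the buggy and the corrected regex can be checked directly on the same tab-containing test line: without the \s, the greedy .* gives back only one character to (\S+). A minimal sketch:

```perl
use strict;
use warnings;

my $line = "a\tb c\td e\tfilename";

# Greedy .* swallows everything, then backtracks just far enough for
# (\S+) to match a single character.
my ($buggy) = $line =~ /.*(\S+)/;

# Requiring a whitespace character before the capture makes .* give
# back the whole last field.
my ($fixed) = $line =~ /.*\s(\S+)/;

print "$buggy\n";    # prints e
print "$fixed\n";    # prints filename
```

Note that /.*\s(\S+)/ needs at least one whitespace character before the last field, so it does not match a line that consists of a single word.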

Dr.Ruud · Oct 29, 2013

On 2013-10-28 23:03, John Black wrote:

( $file ) = $line =~ /.* (\S+) /x;

Correction:

( $file ) = $line =~ /.*\s (\S+) /x;

Dr.Ruud · Oct 29, 2013

$file = ( split " ", $line )[-1];

$ perl -wle '$line = "a\tb c\td e\tfilename";
$file = ( split " ", $line )[-1];
print $file'
filename

That works as meant, see "perldoc -f split" about the specialness of a
single space as the first parameter of split.

But to take the original post literally ("spaces"), it should capture
"e\tfilename", so then the split should be done with / +/.

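A small sketch comparing the separators discussed above on the same line. Per "perldoc -f split", a single-space string pattern splits on runs of whitespace like /\s+/ but skips leading whitespace, while / +/ splits on literal spaces only. The variable names and the "  x y" sample are made up.

```perl
use strict;
use warnings;

my $line = "a\tb c\td e\tfilename";

# ' ' is special: split on runs of any whitespace, ignoring leading whitespace.
my @ws_fields = split ' ', $line;
# / +/ splits on runs of literal spaces only; tabs stay inside the fields.
my @sp_fields = split / +/, $line;

print "$ws_fields[-1]\n";    # filename
print "$sp_fields[-1]\n";    # e, a tab, then filename

# With leading whitespace, /\s+/ produces an empty first field; ' ' does not.
my @regex_fields = split /\s+/, "  x y";    # ("", "x", "y")
my @awk_fields   = split ' ',   "  x y";    # ("x", "y")
print scalar(@regex_fields), " vs ", scalar(@awk_fields), "\n";    # 3 vs 2
```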
Rainer Weikusat · Oct 29, 2013

John Black said:
I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it.

Grab a sequence of non-whitespace characters anchored at the end of the
string?

$line =~ /(\S+)$/

Rainer Weikusat · Oct 29, 2013

John Black said:
@line_arr = split(/\s+/, $line); [*]
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1]; [**]

but I have not hit upon a working syntax.

Since nobody wrote this so far: The first split call ([*]) runs split in
list context, hence, it returns a list of strings created by it. But the
second ([**]) runs it in scalar context and then, it splits into @_ and
returns the number of fields found in the input.

$file = (split(' ', $line))[-1]

works as intended: the outer brackets make the split call part of a list,
so it is evaluated in list context and [-1] picks out the last field.
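The context difference can be seen in a few lines. This is a sketch assuming a reasonably recent perl, where split in scalar context simply returns the number of fields (as far as I know, the implicit split into @_ was removed in perl 5.12).

```perl
use strict;
use warnings;

my $line = "alpha beta gamma";

# Scalar context: split returns the number of fields it found.
my $count = split ' ', $line;

# List context, forced by the parentheses, then sliced with [-1].
my $last = (split ' ', $line)[-1];

print "$count\n";    # 3
print "$last\n";     # gamma
```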

John Black · Oct 29, 2013

Rainer Weikusat said:
Grab a sequence of non-whitespace characters anchored at the end of the
string?

$line =~ /(\S+)$/

ooo, nice. However, if there happens to be any whitespace between the last field and the end
of the line, I don't think this will work. But I think the split method would still be ok.
I don't know if there ever will be any spaces after the filename but probably better to use
code that would handle it.

John Black

John Black · Oct 29, 2013

Jürgen Exner said:
You almost got it:
$file = (split(/\s+/, $line))[-1];

jue

Thanks all. This works. The answer was kind of obvious but I thought I tried that. Maybe I
put the first open paren after the split?

John Black

Peter J. Holzer · Oct 29, 2013

Rainer Weikusat said:
John Black said:

@line_arr = split(/\s+/, $line); [*]
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1]; [**]

but I have not hit upon a working syntax.

Since nobody wrote this so far: The first split call ([*]) runs split in
list context, hence, it returns a list of strings created by it. But the
second ([**]) runs it in scalar context

On my systems (perl 5.8.0 to 5.14.2) the second call doesn't run at all.
It's a syntax error.

hp
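This is easy to confirm without breaking a script: compiling each form through a string eval puts the parse error into $@ instead of aborting. A minimal sketch; the sample data is made up.

```perl
use strict;
use warnings;

# String eval compiles the code at run time, so a syntax error is
# reported in $@ instead of stopping the whole script from compiling.
my $bad_ok  = eval q{ my $f = split(/\s+/, "a b c")[-1]; 1 };
my $bad_err = $@;

my $good_ok = eval q{ my $f = (split(/\s+/, "a b c"))[-1]; $f eq "c" };

print $bad_ok  ? "unparenthesized form compiled\n"
               : "unparenthesized form: $bad_err";
print $good_ok ? "parenthesized form returns the last field\n"
               : "parenthesized form failed: $@\n";
```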

Rainer Weikusat · Oct 29, 2013

John Black said:
ooo, nice. However, if there happens to be any whitespace between the last field and the end
of the line, I don't think this will work.

$line =~ /(\S+)\s*$/
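A short sketch contrasting the two anchored regexes on a line with trailing blanks; last_word is a hypothetical helper name and the sample line is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: last run of non-whitespace, tolerating trailing blanks.
sub last_word {
    my ($line) = @_;
    # \s* absorbs trailing whitespace (including a newline) before the $ anchor.
    my ($word) = $line =~ /(\S+)\s*$/;
    return $word;    # undef if the line is empty or only whitespace
}

my $padded = "a b filename  \n";

# The stricter /(\S+)$/ fails here because of the trailing spaces.
my ($strict) = $padded =~ /(\S+)$/;

print defined $strict ? "$strict\n" : "no match\n";    # no match
print last_word($padded), "\n";                         # filename
```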

John Black · Oct 29, 2013

Rainer Weikusat said:
$line =~ /(\S+)\s*$/

Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
split, which ends up extracting a bunch of fields that are never used here.

John Black

Peter J. Holzer · Oct 29, 2013

John Black said:
Rainer Weikusat said:
$line =~ /(\S+)\s*$/

Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
split, which ends up extracting a bunch of fields that are never used here.

OTOH the regexp probably needs to do a lot of backtracking, so you might
lose that bet.

Let's see:

#!/usr/bin/perl
use warnings;
use strict;

use Benchmark ':all';

# 1000 test lines, each with 1-10 "words" of 1-10 "a" characters.
my @lines;

for (1 .. 1000) {
    my $line = "";
    my $nwords = rand(10) + 1;
    for my $iw (1 .. $nwords) {
        $line .= "a" x (rand(10) + 1);
        # 0-2 random spaces, plus one more while $iw < $nwords
        $line .= " " x (rand(3) + ($iw < $nwords));
    }
    push @lines, $line;
}

cmpthese(-5,
    {
        'split' => sub {
            for my $line (@lines) {
                my $file = (split(/\s+/, $line))[-1];
            }
        },
        'match' => sub {
            for my $line (@lines) {
                my ($file) = $line =~ /(\S+)\s*$/;
            }
        }
    }
);
__END__

        Rate match split
match  208/s    --  -67%
split  625/s  200%    --

Yup, split is about 3 times faster for this particular set of strings
(may be wildly different for other strings).

hp

John Black · Oct 29, 2013

Peter J. Holzer said:
[...]
Yup, split is about 3 times faster for this particular set of strings
(may be wildly different for other strings).

This is one of the things I love about math and computers. You can prove your case. I stand
corrected. My laptop got:

        Rate match split
match  261/s    --  -39%
split  427/s   64%    --

BTW, what is the -5 option doing in the cmpthese function? I thought the first param was the
number of iterations, but then negative doesn't make sense?

John Black
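On the -5 question: per the Benchmark module's documentation, a negative count is read as a minimum amount of CPU time rather than an iteration count, so cmpthese(-5, ...) runs each snippet for at least 5 CPU seconds and then compares the rates. A minimal sketch with a shorter -1 so it finishes quickly; the sample line is made up.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "a b c d e filename";

# -1: run each sub for at least 1 CPU second.
# A positive count such as 100_000 would run exactly that many iterations.
my $rows = cmpthese(-1, {
    'split' => sub { my $f = (split ' ', $line)[-1] },
    'match' => sub { my ($f) = $line =~ /(\S+)\s*$/ },
});

# cmpthese prints the chart and also returns it as an array of rows
# (header row first), which is handy for further processing.
print scalar(@$rows), " rows\n";    # 3 rows
```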

Rainer Weikusat · Oct 29, 2013

Ben Morrow said:
Peter J. Holzer said:
Yup, split is about 3 times faster for this particular set of strings
(may be wildly different for other strings).

Interestingly, perl is much better at optimising /.*\s(\S+)/ (it only
has to backtrack over the last word, instead of the whole string), so
that comes out faster again:

          Rate \S+\s*$ split .*\s\S+
\S+\s*$  274/s      --  -66%    -66%
split    794/s    190%    --     -2%
.*\s\S+  812/s    197%    2%      --

Not much, though.

I tried this as well: The more words are on such a line, the better the
"Don't backtrack!" match becomes.

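Speed aside, the three extractors benchmarked above do not behave identically on every input. A sketch (hypothetical names, made-up test lines) running them side by side: /.*\s(\S+)/ needs at least one whitespace character before the last word, so it finds nothing on a one-word line, while the other two still return the word.

```perl
use strict;
use warnings;

# Three ways of taking the last field (hypothetical names for illustration).
my %extract = (
    'split'   => sub { (split ' ', $_[0])[-1] },
    'anchor'  => sub { my ($f) = $_[0] =~ /(\S+)\s*$/;  $f },
    'dotstar' => sub { my ($f) = $_[0] =~ /.*\s(\S+)/;  $f },
);

my @cases = ("a b filename", "a b filename  ", "  filename", "filename");

for my $line (@cases) {
    for my $name (sort keys %extract) {
        my $got = $extract{$name}->($line);
        printf "%-8s %-18s %s\n", $name, "'$line'",
            defined $got ? $got : '(no match)';
    }
}
```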
John Black · Oct 30, 2013

Ben Morrow said:
Interestingly, perl is much better at optimising /.*\s(\S+)/ (it only
has to backtrack over the last word, instead of the whole string) [...]

Why does /(\S+)\s*$/ have to backtrack over "the whole string" whereas /.*\s(\S+)/ does not?
I'm sure I don't understand regex backtracking...

John Black

John Black · Oct 31, 2013

Ben Morrow said:
Consider a string like "foo bar baz ". For /\S+\s*$/ perl tries the
following sequence of matches:

\S+     \s*     $
"foo"   " "     no match, backtrack
"fo"    ""      no match, backtrack
"f"     ""      no match, backtrack

Now perl has tried all the matches starting at the beginning of the
string, so it has to move along the string and try again. It skips over
characters matching \S, since it's already tried all possible end-points
for \S+ in this word, then it skips over characters not matching \S,
since they can't possibly match, and starts again with:

"bar"   " "     no match, backtrack
"ba"    ""      no match, backtrack
"b"     ""      no match, backtrack

And again:

"baz"   " "     match

With more words in the string, or longer words, this would take more
attempts. OTOH, with /.*\s\S+/ it tries these matches:

.*              \s      \S+
"foo bar baz "          no match, backtrack
"foo bar baz"   " "     no match, backtrack
"foo bar ba"            no match, backtrack
"foo bar b"             no match, backtrack
"foo bar "              no match, backtrack
"foo bar"       " "     "baz"

which only ever has to backtrack over the last word. In the specific
case of a very long last word preceded by a small number of short words
it would come out slower than the first match, but otherwise it comes
out faster.

You can see what perl is doing by running something like

perl -Mre=debug -e'"foo bar baz " =~ /.*\s\S+/'

though it takes a bit of practice to get used to interpreting the
output.

Ben

Ben, thanks for the detailed explanation. This is good stuff to keep in mind in a
performance-critical loop, but if I were doing this again, I would still go with /(\S+)\s*$/
because it is (to me) much clearer about what it's doing. The $ anchor makes it obvious
that we are grabbing the word at the end of the line. The other regex matches every word on
the line, and then you have to deduce that, of those, the word it will return is the last one
because of Perl's default greedy matching policy, which makes it less intuitive and
readable.

John Black

Charles DeRykus · Oct 31, 2013

Ben Morrow said:
Consider a string like "foo bar baz ". For /\S+\s*$/ perl tries the
following sequence of matches: [...]

With more words in the string, or longer words, this would take more
attempts.
...

I thought a possessive quantifier would help with this more intuitive
alternative: (\S+)\s*$ -> (\S++)\s*+$. But unless there's some basic
error on my part, the possessive replacement ate the proverbial dust,
coming out slower even than the backtracking regex. Maybe there are still
caching issues, as mentioned here:

http://www.perlmonks.org/bare/?node_id=664545
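A quick sketch checking that the possessive rewrite returns the same field as the original regex. Possessive quantifiers (++, *+) need perl 5.10 or later; they stop backtracking within a single match attempt, but the engine still retries the pattern at later starting positions, so the overall search is not necessarily shorter.

```perl
use strict;
use warnings;
use 5.010;    # possessive quantifiers and say() need perl 5.10+

my $line = "foo bar baz ";

# Original (backtracking) form and the possessive form.
my ($plain)      = $line =~ /(\S+)\s*$/;
my ($possessive) = $line =~ /(\S++)\s*+$/;

say $plain;         # baz
say $possessive;    # baz
```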

Insert NULL into mySQL datetime	3	Dec 25, 2013
Can someone tell me if this a real tracker? Or is it one designed to show you a different message at certain times, ie. acting like one?	0	Jan 10, 2021
To update one file with the another file's data..	4	Jan 17, 2008
non-static method cannot be referenced from a static context	2	Jan 31, 2010
Inserting into a database	1	Sep 13, 2006
FAQ 4.34 How do I extract selected columns from a string?	0	Apr 27, 2011
mathematical operation in a perl one liner in substitute	9	Dec 17, 2008
Can this be done (by a noob :))	27	Jul 30, 2010
