Can this be combined into one statement?


John Black

Simple question I think. I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it. There
must be a way to write these two lines as one line (skipping the intermediate @line_arr
step):

@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

but I have not hit upon a working syntax. What is it? Thanks.

John Black
 

Jürgen Exner

John Black said:
Simple question I think. I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it. There
must be a way to write these two lines as one line (skipping the intermediate @line_arr
step):

@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

You almost got it:
$file = (split(/\s+/, $line))[-1];

jue
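
For reference, a minimal sketch of the list-slice form suggested above. The sample value of $line is made up for illustration; the real input format is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample line, invented for this example.
my $line = "-rw-r--r--  1 john users  1024 report.txt";

# The parentheses around split(...) form a list, and [-1] slices its last element.
my $file = (split(/\s+/, $line))[-1];

print "$file\n";
```

Without the extra parentheses, the subscript has no list to apply to.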
 

Dr.Ruud

Simple question I think. I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it. There
must be a way to write these two lines as one line (skipping the intermediate @line_arr
step):

@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

but I have not hit upon a working syntax. What is it? Thanks.

There are several ways to get there, examples:

$file = ( split " ", $line )[-1];

( $file ) = $line =~ /.*(\S+)/;
 

Dr.Ruud

I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line
[...] I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

but I have not hit upon a working syntax. What is it? Thanks.

There are several ways to get there, examples:

$file = ( split " ", $line )[ -1 ];

( $file ) = $line =~ /.* (\S+) /x;


Also see Text::CSV_XS.
 

Dr.Ruud

$ perl -wle '$line = "a\tb c\td e\tfilename";
($file) = $line =~ /.*(\S+)/;
print $file'
e

Did you actually try your examples?

Yeah, but only badly.

($file) = $line =~ /.*\s(\S+)/;
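
A small sketch of why the extra \s matters, using the same test string as above. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = "a\tb c\td e\tfilename";

# Greedy .* takes the whole string, then gives back just one character
# so that \S+ has something to match.
my ($greedy) = $line =~ /.*(\S+)/;

# Requiring whitespace before the capture forces .* to back off over a whole word.
my ($fixed) = $line =~ /.*\s(\S+)/;

print "without \\s: $greedy\n";   # e
print "with \\s:    $fixed\n";    # filename
```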
 

Dr.Ruud

$file = ( split " ", $line )[-1];

$ perl -wle '$line = "a\tb c\td e\tfilename";
$file = ( split " ", $line )[-1];
print $file'
filename

That works as meant, see "perldoc -f split" about the specialness of a
single space as the first parameter of split.


But to take the original post literally ("spaces"), it should capture
"e\tfilename", so then the split should be done with / +/.
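
To make the difference concrete, here is a short sketch comparing the separators mentioned above. The second test string with leading spaces is an invented example.

```perl
use strict;
use warnings;

my $line = "a\tb c\td e\tfilename";

# ' ' is the special case: split on any whitespace, skipping leading whitespace.
my $any_ws = (split ' ', $line)[-1];

# / +/ splits on literal spaces only, so tabs stay inside the fields.
my $spaces = (split / +/, $line)[-1];

# With leading whitespace, /\s+/ yields an empty first field but ' ' does not.
my @special = split ' ',   "  x y";
my @regex   = split /\s+/, "  x y";

print "any whitespace: $any_ws\n";
print "spaces only:    $spaces\n";
print scalar(@special), " fields vs ", scalar(@regex), " fields\n";
```

For the last field the leading empty field makes no difference, but it does if you ever index from the front.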
 

Rainer Weikusat

John Black said:
Simple question I think. I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it. There
must be a way to write these two lines as one line (skipping the intermediate @line_arr
step):

@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

but I have not hit upon a working syntax.

Grab a sequence of non-whitespace characters anchored at the end of the
string?

$line =~ /(\S+)$/
 

Rainer Weikusat

John Black said:
Simple question I think. I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it. There
must be a way to write these two lines as one line (skipping the intermediate @line_arr
step):

@line_arr = split(/\s+/, $line); [*]
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1]; [**]

but I have not hit upon a working syntax.

Since nobody wrote this so far: The first split call ([*]) runs split in
list context, hence, it returns a list of strings created by it. But the
second ([**]) runs it in scalar context and then, it splits into @_ and
returns the number of fields found in the input.

$file = (split(' ', $line))[-1]

works as intended because the split is evaluated inside a list because
of the outer brackets.
 

John Black

John Black said:
Simple question I think. I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it. There
must be a way to write these two lines as one line (skipping the intermediate @line_arr
step):

@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

but I have not hit upon a working syntax.

Grab a sequence of non-whitespace characters anchored at the end of the
string?

$line =~ /(\S+)$/

ooo, nice. However, if there happens to be any whitespace between the last field and the end
of the line, I don't think this will work. But I think the split method would still be ok.
I don't know if there ever will be any spaces after the filename but probably better to use
code that would handle it.

John Black
 

John Black

John Black said:
Simple question I think. I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it. There
must be a way to write these two lines as one line (skipping the intermediate @line_arr
step):

@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

You almost got it:
$file = (split(/\s+/, $line))[-1];

jue

Thanks all. This works. The answer was kind of obvious but I thought I tried that. Maybe I
put the first open paren after the split?

John Black
 

Peter J. Holzer

John Black said:
@line_arr = split(/\s+/, $line); [*]
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1]; [**]

but I have not hit upon a working syntax.

Since nobody wrote this so far: The first split call ([*]) runs split in
list context, hence, it returns a list of strings created by it. But the
second ([**]) runs it in scalar context

On my systems (perl 5.8.0 to 5.14.2) the second call doesn't run at all.
It's a syntax error.

hp
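
One way to check this without aborting a script is a string eval, sketched below. The sample string is invented, and the scalar-context line only shows what a plain scalar assignment from split returns.

```perl
use strict;
use warnings;

my $line = "a b c";
my $file;

# String eval, so the compile error is caught at run time instead of killing the script.
my $compiled = eval q{ $file = split(/\s+/, $line)[-1]; 1 };
my $error    = $@;

print "unparenthesised form compiled: ", ($compiled ? "yes" : "no"), "\n";
print "error: $error" if $error;

# A valid scalar-context call returns the number of fields.
my $count = split(/\s+/, $line);
print "fields: $count\n";

# With the list parentheses, the slice works.
$file = (split(/\s+/, $line))[-1];
print "last: $file\n";
```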
 

Rainer Weikusat

John Black said:
John Black said:
Simple question I think. I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it. There
must be a way to write these two lines as one line (skipping the intermediate @line_arr
step):

@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

but I have not hit upon a working syntax.

Grab a sequence of non-whitespace characters anchored at the end of the
string?

$line =~ /(\S+)$/

ooo, nice. However, if there happens to be any whitespace between the last field and the end
of the line, I don't think this will work.

$line =~ /(\S+)\s*$/
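
A quick sketch of the trailing-whitespace case, comparing both patterns with the split form. The sample line is hypothetical.

```perl
use strict;
use warnings;

my $line = "a b file.txt  ";   # invented line with two trailing spaces

# Fails: after any run of \S, the end of the string is not reached.
my ($anchored) = $line =~ /(\S+)$/;

# Allows optional whitespace between the last word and the end.
my ($tolerant) = $line =~ /(\S+)\s*$/;

# split drops trailing empty fields, so the last element is still the filename.
my $via_split = (split(/\s+/, $line))[-1];

print "anchored: ", (defined $anchored ? $anchored : "no match"), "\n";
print "tolerant: $tolerant\n";
print "split:    $via_split\n";
```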
 

John Black

John Black said:
Simple question I think. I have a string $line that has some number of fields separated by
one or more spaces. The filename is the last field on the line and I want to grab it. There
must be a way to write these two lines as one line (skipping the intermediate @line_arr
step):

@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];

but I have not hit upon a working syntax.

Grab a sequence of non-whitespace characters anchored at the end of the
string?

$line =~ /(\S+)$/

ooo, nice. However, if there happens to be any whitespace between the last field and the end
of the line, I don't think this will work.

$line =~ /(\S+)\s*$/

Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
split, which ends up extracting a bunch of fields that are never used here.

John Black
 

Peter J. Holzer

John Black said:
@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1];
[...]

$line =~ /(\S+)\s*$/

Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
split, which ends up extracting a bunch of fields that are never used here.

OTOH the regexp probably needs to do a lot of backtracking, so you might
lose that bet.

Let's see:


#!/usr/bin/perl
use warnings;
use strict;

use Benchmark ':all';

my @lines;

for (1 .. 1000) {
    my $line = "";
    my $nwords = rand(10) + 1;
    for my $iw (1 .. $nwords) {
        $line .= "a" x (rand(10) + 1);
        $line .= " " x (rand(3) + ($iw < $nwords));
    }
    push @lines, $line;
}

cmpthese(-5,
    {
        'split' => sub {
            for my $line (@lines) {
                my $file = (split(/\s+/, $line))[-1];
            }
        },
        'match' => sub {
            for my $line (@lines) {
                my ($file) = $line =~ /(\S+)\s*$/;
            }
        }
    }
);
__END__

       Rate match split
match 208/s    --  -67%
split 625/s  200%    --


Yup, split is about 3 times faster for this particular set of strings
(may be wildly different for other strings).

hp
 

John Black

@line_arr = split(/\s+/, $line);
$file = $line_arr[-1];

but I've tried various syntaxes like:

$file = split(/\s+/, $line)[-1]; [...]

$line =~ /(\S+)\s*$/

Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
split, which ends up extracting a bunch of fields that are never used here.

OTOH the regexp probably needs to do a lot of backtracking, so you might
lose that bet.

This is one of the things I love about math and computers. You can prove your case. I stand
corrected. My laptop got:

       Rate match split
match 261/s    --  -39%
split 427/s   64%    --

BTW, what is the -5 option doing in the cmpthese function? I thought the first param was the
number of iterations, but then negative doesn't make sense?

John Black
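
On the -5 question: as far as I recall from the Benchmark documentation, a zero or negative count is read as a minimum number of CPU seconds to run each piece of code, not as an iteration count. A small sketch under that assumption, using -1 to keep the run short; the sample lines are invented.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented sample data, just to give the two subs something to do.
my @lines = map { join(" ", ("word") x $_) . " " } 1 .. 50;

# Negative count: run each sub for at least 1 CPU second, then compare the rates.
my $rows = cmpthese(-1, {
    split => sub {
        for my $line (@lines) {
            my $file = (split(/\s+/, $line))[-1];
        }
    },
    match => sub {
        for my $line (@lines) {
            my ($file) = $line =~ /(\S+)\s*$/;
        }
    },
});
```

With a positive count, each sub would instead run exactly that many times.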
 

Rainer Weikusat

Ben Morrow said:
Quoth "Peter J. Holzer":
Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
split, which ends up extracting a bunch of fields that are never used here.

OTOH the regexp probably needs to do a lot of backtracking, so you might
lose that bet.

Let's see: [...]
       Rate match split
match 208/s    --  -67%
split 625/s  200%    --


Yup, split is about 3 times faster for this particular set of strings
(may be wildly different for other strings).

Interestingly, perl is much better at optimising /.*\s(\S+)/ (it only
has to backtrack over the last word, instead of the whole string), so
that comes out faster again:

           Rate \S+\s*$   split .*\s\S+
\S+\s*$ 274/s      --    -66%    -66%
split   794/s    190%      --     -2%
.*\s\S+ 812/s    197%      2%      --

Not much, though.

I tried this as well: The more words are on such a line, the better the
"Don't backtrack!" match becomes.
 

John Black

Ben Morrow said:
Quoth "Peter J. Holzer":

Yep, I thought of this after posting. Thanks. I like this. I bet it's faster than using
split, which ends up extracting a bunch of fields that are never used here.

OTOH the regexp probably needs to do a lot of backtracking, so you might
lose that bet.

Let's see: [...]
       Rate match split
match 208/s    --  -67%
split 625/s  200%    --


Yup, split is about 3 times faster for this particular set of strings
(may be wildly different for other strings).

Interestingly, perl is much better at optimising /.*\s(\S+)/ (it only
has to backtrack over the last word, instead of the whole string), so
that comes out faster again:

           Rate \S+\s*$   split .*\s\S+
\S+\s*$ 274/s      --    -66%    -66%
split   794/s    190%      --     -2%
.*\s\S+ 812/s    197%      2%      --

Not much, though.

I tried this as well: The more words are on such a line, the better the
"Don't backtrack!" match becomes.

Why does /(\S+)\s*$/ have to backtrack over "the whole string" whereas /.*\s(\S+)/ does not?
I'm sure I don't understand regex backtracking...

John Black
 

John Black

Consider a string like "foo bar baz ". For /\S+\s*$/ perl tries the
following sequence of matches:

    \S+     \s*     $
    "foo"   " "     no match, backtrack
    "fo"    ""      no match, backtrack
    "f"     ""      no match, backtrack

Now perl has tried all the matches starting at the beginning of the
string, so it has to move along the string and try again. It skips over
characters matching \S, since it's already tried all possible end-points
for \S+ in this word, then it skips over characters not matching \S,
since they can't possibly match, and starts again with:

    "bar"   " "     no match, backtrack
    "ba"    ""      no match, backtrack
    "b"     ""      no match, backtrack

And again:

    "baz"   " "     match

With more words in the string, or longer words, this would take more
attempts. OTOH, with /.*\s\S+/ it tries these matches:

    .*                \s      \S+
    "foo bar baz "                      no match, backtrack
    "foo bar baz"     " "               no match, backtrack
    "foo bar ba"                        no match, backtrack
    "foo bar b"                         no match, backtrack
    "foo bar "                          no match, backtrack
    "foo bar"         " "     "baz"

which only ever has to backtrack over the last word. In the specific
case of a very long last word preceded by a small number of short words
it would come out slower than the first match, but otherwise it comes
out faster.

You can see what perl is doing by running something like

perl -Mre=debug -e'"foo bar baz " =~ /.*\s\S+/'

though it takes a bit of practice to get used to interpreting the
output.

Ben

Ben, thanks for the detailed explanation. This is good stuff to keep in mind when in a
performance-critical loop, but if I were doing this again, I would still go with /(\S+)\s*$/
because it is (to me) much clearer about what it's doing. The $ anchor makes it obvious
that we are grabbing the word at the end of the line. The other regex matches every word on
the line and then you have to deduce that of those, the word it will return is the last one
because of Perl's default greedy matching policy, which makes it less intuitive and
readable.

John Black
 

Charles DeRykus

Consider a string like "foo bar baz ". For /\S+\s*$/ perl tries the
following sequence of matches:

    \S+     \s*     $
    "foo"   " "     no match, backtrack
    "fo"    ""      no match, backtrack
    "f"     ""      no match, backtrack

Now perl has tried all the matches starting at the beginning of the
string, so it has to move along the string and try again. It skips over
characters matching \S, since it's already tried all possible end-points
for \S+ in this word, then it skips over characters not matching \S,
since they can't possibly match, and starts again with:

    "bar"   " "     no match, backtrack
    "ba"    ""      no match, backtrack
    "b"     ""      no match, backtrack

And again:

    "baz"   " "     match

With more words in the string, or longer words, this would take more
attempts.
...

I thought a possessive quantifier would help with this more intuitive
alternative: (\S+)\s*$ -> (\S++)\s*+$. But, unless there's some basic
error on my part, then the possessive replacement ate the proverbial
dust even of the backtracking regex. Maybe there are still caching
issues as mentioned here:

http://www.perlmonks.org/bare/?node_id=664545
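
For anyone wanting to try the possessive variant, a minimal sketch that only checks that both patterns capture the same word on the example string from the explanation above. It says nothing about speed. Possessive quantifiers need Perl 5.10 or later.

```perl
use strict;
use warnings;

my $line = "foo bar baz ";

# Ordinary greedy quantifiers, which may backtrack.
my ($plain) = $line =~ /(\S+)\s*$/;

# Possessive quantifiers: once matched, \S++ and \s*+ never give characters back.
my ($possessive) = $line =~ /(\S++)\s*+$/;

print "plain:      $plain\n";
print "possessive: $possessive\n";
```

The engine still retries from later start positions, so the possessive form finds the last word too.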
 
