Can this be combined into one statement?

Rainer Weikusat · Nov 1, 2013

Charles DeRykus said:
Consider a string like "foo bar baz ". For /\S+\s*$/ perl tries the
following sequence of matches:

Click to expand...

[...]

I thought a possessive quantifier would help with this more intuitive
alternative: (\S+)\s*$ -> (\S++)\s*+$. But, unless there's some basic
error on my part, then the possessive replacement ate the proverbial
dust even of the backtracking regex.

BTW:
-----------
use Benchmark qw(cmpthese);

use constant MAX_WORDS => 15;
use constant MAX_LEN => 20;
use constant LINES => 100;

my (@lines, $line, $n, $nw, $x);

srand(0xdeafbabe); # repeatable

$n = LINES;
do {
$nw = int(rand(MAX_WORDS - 2)) + 2;
$line = '';
do {
$line .= 'x' x (int(rand(MAX_LEN - 1)) + 1);
$line .= ' ' x (rand(5) + 1) if $nw > 1;
} while --$nw;

push(@lines, $line);
# print STDERR ("'$line'\n");
} while --$n;

cmpthese(-3,
{
split => sub {
$x = (split(/\s+/, $_))[-1] for @lines;
},

renb => sub {
/.*\s(\S+)$/ and $x = $1 for @lines;
},

rere => sub {
reverse($_) =~ /^(\S+)/ and $x = reverse($1) for @lines;
}});

Tim McDaniel · Nov 2, 2013

Ben, thanks for the detailed explanation. This is good stuff to keep
in mind when in a performance critical loop, but if I were doing this
again, I would still go with /(\S+)\s*$/ because it is (to me) much
more clear about what its doing.

I think the split(...) version is clearer still, and that's what I
would use.

C.DeRykus · Nov 2, 2013

Charles DeRykus said:
Charles DeRykus said:

On 10/30/2013 11:29 AM, Ben Morrow wrote:

Why does /(\S+)\s*$/ have to backtrack over "the whole string" whereas
/.*\s(\S+)/ does not?
I'm sure I don't undertand regex backtracking...

Consider a string like "foo bar baz ". For /\S+\s*$/ perl tries the
following sequence of matches:

Click to expand...

[...]

I thought a possessive quantifier would help with this more intuitive

Click to expand...

alternative: (\S+)\s*$ -> (\S++)\s*+$. But, unless there's some basic

Click to expand...

error on my part, then the possessive replacement ate the proverbial

Click to expand...

dust even of the backtracking regex.

Click to expand...

BTW:
-----------
use Benchmark qw(cmpthese);

use constant MAX_WORDS => 15;
use constant MAX_LEN => 20;
use constant LINES => 100;

my (@lines, $line, $n, $nw, $x);
srand(0xdeafbabe); # repeatable
$n = LINES;

do {
$nw = int(rand(MAX_WORDS - 2)) + 2;
$line = '';
do {
$line .= 'x' x (int(rand(MAX_LEN - 1)) + 1);
$line .= ' ' x (rand(5) + 1) if $nw > 1;
} while --$nw;
push(@lines, $line);
# print STDERR ("'$line'\n");
} while --$n;

cmpthese(-3,
{
split => sub {
$x = (split(/\s+/, $_))[-1] for @lines;
},

renb => sub {
/.*\s(\S+)$/ and $x = $1 for @lines;
},

rere => sub {
reverse($_) =~ /^(\S+)/ and $x = reverse($1) for @lines;
}});

'rere' proves going 'reverse' can be a great idea.

(even does well as MAX_LENGTH grows... of course
split wins if you get crazy big)

Rainer Weikusat · Nov 3, 2013

I think the split(...) version is clearer still, and that's what I
would use.

But that's really nothing except /.*\s(\S+)\s*/ written in a different
way: It parses the string from left to right in order to create a list
of words aka

split(/\s+/, $line)

this has to happen in list context, otherwise, it won't work as intended
which makes

(split(/\s+/, $line))

this will look very much like a spurious pair of parentheses to someone
who is not familiar with the split documentation. Lastly, after the list
has been 'captured' in this way, it is indexed by -1

(split(/\s+/, $line))[-1]

In Perl, this means 'count backwards from the end of the list'. In other
languages, it means something different or is invalid altogether.

All of this combined doesn't exactly strike me as a particularly clear
way to express 'get the last word from a string'. Imagine these were
words written on paper and you were asked to cut the last word off the
sentence with a pair of scissors. The idea to start cutting the whole
strip of paper into pieces and drop them all on the floor except the
last one could be made into a nice sketch (skit?) if the piece of paper
was long enough and most people watching this would consider the person
doing it less-than-intellectually-gifted and "but's that what I always
do" (aka 'the construct is familiar to me') wouldn't improve this.

Rainer Weikusat · Nov 4, 2013

C.DeRykus said:
]

rere => sub {
reverse($_) =~ /^(\S+)/ and $x = reverse($1) for @lines;
}});

Click to expand...

'rere' proves going 'reverse' can be a great idea.

(even does well as MAX_LENGTH grows... of course
split wins if you get crazy big)

It is mostly a demonstration that at least some matches anchored at the
end of the string are theoretically trivial to optimize -- revert the
pattern and match backwards from the end.

Charlton Wilbur · Nov 4, 2013

[wrapping a function in parentheses to induce list context]

RW> this will look very much like a spurious pair of parentheses to
RW> someone who is not familiar with the split
RW> documentation.

[indexing a list with -1]

RW> All of this combined doesn't exactly strike me as a particularly
RW> clear way to express 'get the last word from a string'.

My audience, when am writing Perl, is generally people who are competent
in Perl. If I insisted on avoiding constructions that depended on
context or on Perl's builtin data structures, I might as well be writing
in Java.

At a former place of employment, writing in C, I wrote the comment

/* breadth-first search rather than depth-first search because
average-case performance is the same but worst-case performance is
far better */

and this was called out as needing more explanation by a (thoroughly
incompetent) manager who had never encountered the terms "breadth-first"
and "depth-first."[1] I retorted "A programmer who has never encountered
the concept 'breadth-first' or 'depth-first' and who can't be bothered
to look them up and learn has no business modifying this code."

RW> Imagine these were words written on paper and you were asked to
RW> cut the last word off the sentence with a pair of scissors. The
RW> idea to start cutting the whole strip of paper into pieces and
RW> drop them all on the floor except the last one could be made
RW> into a nice sketch (skit?) if the piece of paper was long enough
RW> and most people watching this would consider the person doing it
RW> less-than-intellectually-gifted

Imagine that you put a bunch of words on index cards and told a person
to sort them. He would not bother with mergesort or quicksort or bubble
sort, but would use insertion sort; and if you explained that quicksort
would sort the cards in-place while mergesort had better average-case
performance, he would look at you as less-than-intellectually-gifted.
While he may be justified in this, the simple fact is that physical
solutions have different constraints than digital solutions, and
attempting to construct an argument by analogy where you pretend that
the digital solution has the same constraints as the physical solution
is, frankly, moronic.

Charlton

[1] His degree was in music, which explains his ignorance but not his
incompetence; my degree is also in music.

C.DeRykus · Nov 4, 2013

It is mostly a demonstration that at least some matches anchored at the
end of the string are theoretically trivial to optimize -- revert the
pattern and match backwards from the end.

Yes, definitely a potential speedup although the more familiar regex anchored at the end is likely the most intuitive and popular solution.

[A very long time ago, there was a discussion of a nice-to-have regex modifier that'd search backwards
and handle the messy reversals automatically. IMO that would be potentially faster and an intuitive win too ]

gamo · Nov 4, 2013

El 01/11/13 22:34, Rainer Weikusat escribió:

use Benchmark qw(cmpthese);

use constant MAX_WORDS => 15;
use constant MAX_LEN => 20;
use constant LINES => 100;

my (@lines, $line, $n, $nw, $x);

srand(0xdeafbabe); # repeatable

$n = LINES;
do {
$nw = int(rand(MAX_WORDS - 2)) + 2;
$line = '';
do {
$line .= 'x' x (int(rand(MAX_LEN - 1)) + 1);
$line .= ' ' x (rand(5) + 1) if $nw > 1;
} while --$nw;

push(@lines, $line);
# print STDERR ("'$line'\n");
} while --$n;

cmpthese(-3,
{
split => sub {
$x = (split(/\s+/, $_))[-1] for @lines;
},

renb => sub {
/.*\s(\S+)$/ and $x = $1 for @lines;
},

rere => sub {
reverse($_) =~ /^(\S+)/ and $x = reverse($1) for @lines;

},
gamo => sub {
s/\s+$//;
$x = substr($_, rindex($_, " ") );

}});

If this is correct, consider it.

Best regards

gamo · Nov 5, 2013

El 05/11/13 00:08, gamo escribió:

El 01/11/13 22:34, Rainer Weikusat escribió:

cmpthese(-3,
{
split => sub {
$x = (split(/\s+/, $_))[-1] for @lines;
},

renb => sub {
/.*\s(\S+)$/ and $x = $1 for @lines;
},

rere => sub {
reverse($_) =~ /^(\S+)/ and $x = reverse($1) for @lines;

Click to expand...

},
gamo => sub {
### s/\s+$//;
### $x = substr($_, rindex($_, " ") );

$x = substr($_,1+rindex($_," ")) for @lines;

If this is correct, consider it.

Of course it's not. It's slower, but if no \s is allowed at end of
lines, it could be faster. My apologies.

Rainer Weikusat · Nov 5, 2013

gamo said:
El 05/11/13 00:08, gamo escribió:

El 01/11/13 22:34, Rainer Weikusat escribió:

cmpthese(-3,
{
split => sub {
$x = (split(/\s+/, $_))[-1] for @lines;
},

renb => sub {
/.*\s(\S+)$/ and $x = $1 for @lines;
},

rere => sub {
reverse($_) =~ /^(\S+)/ and $x = reverse($1) for @lines;

Click to expand...

},
gamo => sub {
### s/\s+$//;
### $x = substr($_, rindex($_, " ") );

Click to expand...

$x = substr($_,1+rindex($_," ")) for @lines;

If this is correct, consider it.

Click to expand...

Of course it's not. It's slower, but if no \s is allowed at end of
lines, it could be faster. My apologies.

The example code doesn't produce lines with spaces at the end. If it
did, the s/// would be a dubious idea because it would change the input
data. Apart from that, using rindex here is a good idea speedwise
provided that words are really separated by spaces and not by
'whitespace characters'. In an ASCII/ISO8859-1 environment, that would
be [ \t\v] (written as regex character class). I have no idea what
exactly a vertical tab (\v) is supposed to be and until recently, Perl
didn't support it, anyway, but horizontal tabs (\t) are not that
uncommon.

gamo · Nov 5, 2013

El 05/11/13 12:39, Rainer Weikusat escribió:

gamo said:
gamo said:

El 05/11/13 00:08, gamo escribió:

El 01/11/13 22:34, Rainer Weikusat escribió:
cmpthese(-3,
{
split => sub {
$x = (split(/\s+/, $_))[-1] for @lines;
},

renb => sub {
/.*\s(\S+)$/ and $x = $1 for @lines;
},

rere => sub {
reverse($_) =~ /^(\S+)/ and $x = reverse($1) for @lines;
},
gamo => sub {
### s/\s+$//;
### $x = substr($_, rindex($_, " ") );

Click to expand...

$x = substr($_,1+rindex($_," ")) for @lines;

}});

If this is correct, consider it.

Click to expand...

Of course it's not. It's slower, but if no \s is allowed at end of
lines, it could be faster. My apologies.

Click to expand...

The example code doesn't produce lines with spaces at the end. If it
did, the s/// would be a dubious idea because it would change the input
data. Apart from that, using rindex here is a good idea speedwise
provided that words are really separated by spaces and not by
'whitespace characters'. In an ASCII/ISO8859-1 environment, that would
be [ \t\v] (written as regex character class). I have no idea what
exactly a vertical tab (\v) is supposed to be and until recently, Perl
didn't support it, anyway, but horizontal tabs (\t) are not that
uncommon.

So, I win

jesus@casa:~/src$ perl cmp.pl
Rate split renb rere gamo
split 13395/s -- -6% -28% -80%
renb 14306/s 7% -- -23% -79%
rere 18620/s 39% 30% -- -72%
gamo 66776/s 399% 367% 259% --

Well, no big deal. I could make a program 15 times _slower_ using threads.

Rainer Weikusat · Nov 5, 2013

gamo said:
El 05/11/13 12:39, Rainer Weikusat escribió:

gamo said:

El 05/11/13 00:08, gamo escribió:
El 01/11/13 22:34, Rainer Weikusat escribió:
cmpthese(-3,
{
split => sub {
$x = (split(/\s+/, $_))[-1] for @lines;
},

renb => sub {
/.*\s(\S+)$/ and $x = $1 for @lines;
},

rere => sub {
reverse($_) =~ /^(\S+)/ and $x = reverse($1) for @lines;
},
gamo => sub {
### s/\s+$//;
### $x = substr($_, rindex($_, " ") );

$x = substr($_,1+rindex($_," ")) for @lines;

}});

If this is correct, consider it.

Of course it's not. It's slower, but if no \s is allowed at end of
lines, it could be faster. My apologies.

Click to expand...

The example code doesn't produce lines with spaces at the end. If it
did, the s/// would be a dubious idea because it would change the input
data. Apart from that, using rindex here is a good idea speedwise
provided that words are really separated by spaces and not by
'whitespace characters'. In an ASCII/ISO8859-1 environment, that would
be [ \t\v] (written as regex character class). I have no idea what
exactly a vertical tab (\v) is supposed to be and until recently, Perl
didn't support it, anyway, but horizontal tabs (\t) are not that
uncommon.

Click to expand...

So, I win

rindex doesn't handle 'whitespace', only the exact string passed to
it. Taking this into account, if you're posting was not supposed to
present another useful idea but to communicated something about you, it
would be "you cheat".

HTH.

gamo · Nov 5, 2013

El 05/11/13 17:11, Rainer Weikusat escribió:

rindex doesn't handle 'whitespace', only the exact string passed to
it. Taking this into account, if you're posting was not supposed to
present another useful idea but to communicated something about you, it
would be "you cheat".

HTH.

I swear that I want to cheat in puzzles as many times as I could. The
only problem is that there are problems in which I don't manage how to
do it. Would you mind to read again the MCE post and help me to cheat
in it? I could post a real problem, with a linear heuristic solution.

Thanks ;-)

Can anyone code this for me ?	1	Dec 6, 2024
Insert NULL into mySQL datetime	3	Dec 25, 2013
Can someone tell me if this a real tracker? Or is it one designed to show you a different message at certain times, ie. acting like one?	0	Jan 10, 2021
To update one file with the another file's data..	4	Jan 17, 2008
non-static method cannot be referenced from a static context	2	Jan 31, 2010
Inserting into a database	1	Sep 13, 2006
FAQ 4.34 How do I extract selected columns from a string?	0	Apr 27, 2011
mathematical operation in a perl one liner in substitute	9	Dec 17, 2008

Can this be combined into one statement?

Rainer Weikusat

Tim McDaniel

C.DeRykus

Rainer Weikusat

Rainer Weikusat

Charlton Wilbur

C.DeRykus

gamo

gamo

Rainer Weikusat

gamo

Rainer Weikusat

gamo

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads