Regexp question

Mike Mimic

Hi!

I would like to write a regexp that changes a string
like this

abcde abcde { abcde a b c d } abcde abcde

change into

abcde abcde {abcdeabcd} abcde abcde

Every space character between { and } should be deleted.

I have come to this:

1 while $string =~ s/({\S*?)\s(\S*?})/$1$2/g;

Is there some other (better) way? I do not like loops
that may not be necessary.


Mike
 
Peter J. Acklam

Mike Mimic said:
Every space character between { and } should be deleted.

I have come to this:

1 while $string =~ s/({\S*?)\s(\S*?})/$1$2/g;

I'm not sure it's better, but it's an alternative

s/ \s+ (?= [^{]* } ) //xg;
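
For anyone trying this out, here is a minimal sketch of the lookahead idea wrapped in a sub (the sub name and the second input are made up for illustration). A run of whitespace is deleted only when a "}" follows it with no "{" in between, so the pattern never checks for an opening brace.

```perl
use strict;
use warnings;

# Delete whitespace runs that are followed by a "}" with no "{" in between.
# strip_before_close is a hypothetical helper name, not from the thread.
sub strip_before_close {
    my ($s) = @_;
    $s =~ s/ \s+ (?= [^{]* \} ) //xg;
    return $s;
}

print strip_before_close("abcde abcde { abcde a b c d } abcde abcde"), "\n";

# A stray "}" with no opening "{" is treated the same way (hypothetical input):
print strip_before_close("a b } c d"), "\n";
```

The closing brace is escaped here only as a precaution; the behaviour should be the same as the one-liner above.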

Peter
 
Tad McClellan

Mike Mimic said:
abcde abcde { abcde a b c d } abcde abcde

change into

abcde abcde {abcdeabcd} abcde abcde

Every space character between { and } should be deleted.

I have come to this:

1 while $string =~ s/({\S*?)\s(\S*?})/$1$2/g;

Is there some other (better) way (I do not like loops
which are maybe not necessary).


s< ( { [^}]* } ) >
< ($a = $1) =~ tr/ //d; $a >gxe;


I wouldn't call that "better" though...
 
Anno Siegel

Tad McClellan said:
Mike Mimic said:
abcde abcde { abcde a b c d } abcde abcde

change into

abcde abcde {abcdeabcd} abcde abcde

Every space character between { and } should be deleted.

I have come to this:

1 while $string =~ s/({\S*?)\s(\S*?})/$1$2/g;

Is there some other (better) way (I do not like loops
which are maybe not necessary).


s< ( { [^}]* } ) >
< ($a = $1) =~ tr/ //d; $a >gxe;


I wouldn't call that "better" though...

It's hard... How about this:

s< ( { [^}]* } ) >
< join '', split ' ', $1 >gxe;
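
A small sketch comparing the two /e variants side by side, in case the difference matters for your data. The input containing a tab is an assumption for illustration: tr/ //d deletes only literal space characters, while split ' ' splits on any run of whitespace, so the two can give different results.

```perl
use strict;
use warnings;

# Hypothetical input: note the tab between "a" and "b".
my $line = "abcde abcde { abcde a\tb c d } abcde abcde";

# tr variant: deletes only literal space characters inside each {...}
(my $with_tr = $line) =~ s<(\{[^}]*\})><(my $x = $1) =~ tr/ //d; $x>ge;

# split/join variant: split ' ' splits on any run of whitespace
(my $with_split = $line) =~ s<(\{[^}]*\})><join '', split ' ', $1>ge;

print "$with_tr\n";
print "$with_split\n";
```

With only plain spaces inside the braces, both variants should produce the same string.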

Anno
 
Mike Mimic

Hi!
perldoc Benchmark

Thanks. I have tested it and, to my surprise,
I found out that my version is the fastest
(although it uses a while loop). I thought
yours would be.


Mike
 
Mike Mimic

Hi!

Purl said:
Perhaps your method is quicker because
it does not work?

Yes. I forgot to mention this. Sorry. While
I was testing (benchmarking) I found this
error. But the tests were with the correct
version:

1 while $string =~ s/({[^}]*)\s+([^}]*})/$1$2/g;


Mike
 
Jeff 'japhy' Pinyan

[posted & mailed]

Yes. I forgot to mention this. Sorry. While
I was testing (benchmarking) I found this
error. But the tests were with the correct
version:

1 while $string =~ s/({[^}]*)\s+([^}]*})/$1$2/g;

I would go one step further in optimizing this, but will let you take care
of actually seeing if it works faster.

Your regex is in three parts:

( { [^}]* )   # capture "{" followed by 0 or more non-"}" characters
\s+           # match one or more spaces
( [^}]* } )   # capture 0 or more non-"}" characters followed by "}"

On a string such as "{ this is a match }", here is how the string is
broken down into the three parts of your regex:

$1:  "{ this is a match"
\s+: " "
$2:  "}"

Do you understand what this is showing? Your first [^}]* character class
is matching too many characters. I would suggest you change the class to
[^}\s]*, so that the \s+ *can* match multiple spaces.
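
To see the breakdown for yourself, a sketch along these lines should do. The extra parentheses around \s+ are an addition for illustration, so that each of the three pieces can be printed.

```perl
use strict;
use warnings;

my $text = "{ this is a match }";

# Same patterns as discussed, with \s+ wrapped in its own group
# (an addition for illustration) so each piece can be inspected.
my @orig  = $text =~ /(\{[^}]*)(\s+)([^}]*\})/;
my @tight = $text =~ /(\{[^}\s]*)(\s+)([^}]*\})/;

print "original: [$orig[0]] [$orig[1]] [$orig[2]]\n";
print "tighter:  [$tight[0]] [$tight[1]] [$tight[2]]\n";
```

With the original class, the first group runs all the way to the last space; with [^}\s]* it stops at the first one, which leaves the whole run of whitespace to \s+.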

Your regex ends up executing as many times as there are spaces in braces.

my $str = "{ this and that and those }";
my $count = $str =~ tr/ //;

my $mimic = qr/({[^}]*)\s+([^}]*})/;
my $japhy = qr/({[^}\s]*)\s+([^}]*})/;

for my $regex ($mimic, $japhy) {
    my $times = braces($str, $regex);
    print "$count vs. $times\n";
}

sub braces {
    my ($s, $rx) = @_;
    my $t = 0;
    while (my $x = $s =~ s/$rx/$1$2/g) { $t += $x }
    return $t;
}

The output is:

17 vs. 17 (your regex)
17 vs. 6 (my regex)
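
The gap between the two counts only shows up when the spaces come in runs. Here is a minimal sketch with a made-up test string (3 runs, 7 spaces in total; the string and the passes() helper are assumptions for illustration, not the data used above):

```perl
use strict;
use warnings;

# Hypothetical test string: 3 runs of spaces inside the braces, 7 spaces total.
my $str = "{  this   and  that}";

# Count how many times the substitution succeeds before nothing is left to remove.
sub passes {
    my ($s, $rx) = @_;
    my $n = 0;
    $n++ while $s =~ s/$rx/$1$2/;
    return ($n, $s);
}

my $greedy = qr/(\{[^}]*)\s+([^}]*\})/;     # first class may swallow spaces
my $tight  = qr/(\{[^}\s]*)\s+([^}]*\})/;   # first class stops at whitespace

for my $case ([greedy => $greedy], [tight => $tight]) {
    my ($name, $rx) = @$case;
    my ($n, $out) = passes($str, $rx);
    print "$name: $n passes, result $out\n";
}
```

The greedy class leaves only one character for \s+ on each pass, so it needs one pass per space; the tighter class removes a whole run per pass.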

I would like to believe, then, that my code executes faster than yours.

(Though I still think that

s<({[^}]*})><(my $x = $1) =~ tr/ //d; $x>eg;

should be faster, since it only requires ONE pass over the string,
although I guess the evaluation aspect slows things down.)
 
Mike Mimic

Hi!

Purl said:
I believe your benchmark tests are flawed.

I am ashamed. I read the benchmark results upside down.

So I tested everything again. Here are results.

I have not included Purl Guru's way because it fails
on strings with two or more {} sections.

I also have not included Peter J. Acklam's because
it fails on a string (or section) with no opening bracket.
It tests only for the closing bracket.

The code:

use Benchmark qw(cmpthese);

our $string = "abcde abcde { abcde a b c d } abcde abcde { abc de a b c d } abcde abcde";

cmpthese(500000, {
    # reg1*: repeat the substitution until nothing matches
    'reg1a' => sub { local $string = $string; 1 while $string =~ s/({[^}]*)\s+([^}]*})/$1$2/g; },
    'reg1b' => sub { local $string = $string; 1 while $string =~ s/({[^}]*?)\s+([^}]*?})/$1$2/g; },
    'reg1c' => sub { local $string = $string; 1 while $string =~ s/({[^}\s]*)\s+([^}]*})/$1$2/g; },
    # reg2*, reg3: one pass, clean each {...} section inside the replacement
    'reg2a' => sub { local $string = $string; $string =~ s|({[^}]+})|($_=$1)=~tr/ //d;$_|ge; },
    'reg2b' => sub { local $string = $string; $string =~ s|({[^}]+})|($_=$1)=~s/ //g;$_|ge; },
    'reg3'  => sub { local $string = $string; $string =~ s|({[^}]+})|join('', split(' ', $1))|ge },
});

Results:

         Rate reg1a reg1b reg1c  reg3 reg2b reg2a
reg1a  3552/s    --  -48%  -53%  -82%  -84%  -89%
reg1b  6882/s   94%    --  -10%  -64%  -68%  -78%
reg1c  7635/s  115%   11%    --  -60%  -65%  -76%
reg3  19254/s  442%  180%  152%    --  -11%  -39%
reg2b 21753/s  512%  216%  185%   13%    --  -31%
reg2a 31560/s  788%  359%  313%   64%   45%    --

So the fastest really is the one using tr/ //d.


Mike
 
Mike Mimic

Hi!

Purl said:
Who is responsible for providing clear
and concise parameters with which to work?

I know. :) Sorry. I was not saying that it is a
mistake or anything, only that it is not suitable
for my problem.

Thanks to all of you.


Mike
 
Mike Mimic

Hi!

Purl said:
Irrelevant. Your comments do not comply with
the originating author's stated parameters.

That was my mistake. It should handle multiple
brackets. But it is not important anymore (I did it
with the tr/// trick), and I will probably be able
to implement a substring solution if I need
more speed (for now it is OK).

Thanks.


Mike
 
