Regexp question

Mike Mimic · Apr 10, 2004

Hi!

I would like to write a regexp that would change a string
like this

abcde abcde { abcde a b c d } abcde abcde

into

abcde abcde {abcdeabcd} abcde abcde

Every space character between { and } should be deleted.

I have come up with this:

1 while $string =~ s/({\S*?)\s(\S*?})/$1$2/g;

Is there another (better) way? I do not like loops
that may not be necessary.

Mike
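A quick way to see what this first attempt does is to run it on the example string. In the minimal Perl sketch below the braces are escaped as \{ and \}, which matches the same characters. Because \S*? cannot step over a space, the pattern can only match a {...} section that contains exactly one whitespace character, so the example comes back unchanged (the problem Purl points out later in the thread).

```perl
use strict;
use warnings;

# Example string from the question and the result asked for.
my $input    = 'abcde abcde { abcde a b c d } abcde abcde';
my $expected = 'abcde abcde {abcdeabcd} abcde abcde';

# First attempt, braces escaped. \S*? cannot match a space, so a match
# needs "{", non-spaces, ONE whitespace character, non-spaces, "}".
my $string = $input;
1 while $string =~ s/(\{\S*?)\s(\S*?\})/$1$2/g;

print "got:      $string\n";
print "expected: $expected\n";

# A section with exactly one space is the only kind it can fix.
(my $one_space = '{ab cd}') =~ s/(\{\S*?)\s(\S*?\})/$1$2/g;
print "$one_space\n";
```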

Peter J. Acklam · Apr 10, 2004

Mike Mimic said:
Every space character between { and } should be deleted.

I have come up with this:

1 while $string =~ s/({\S*?)\s(\S*?})/$1$2/g;

I'm not sure it's better, but it's an alternative

s/ \s+ (?= [^{]* } ) //xg;

Peter
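Peter's substitution deletes a whitespace run whenever a } can be reached to its right without passing a {. The sketch below wraps it in a helper (the name squeeze_lookahead is made up for this example) and runs it on the question's string, on two separate sections, and on a stray } with no opening brace, which is the case Mike later rejects it for.

```perl
use strict;
use warnings;

# Peter's substitution in a hypothetical helper: a whitespace run is
# deleted if a "}" comes before any "{" to its right.
sub squeeze_lookahead {
    my ($s) = @_;
    $s =~ s/ \s+ (?= [^{]* \} ) //xg;
    return $s;
}

my $example = squeeze_lookahead('abcde abcde { abcde a b c d } abcde abcde');
my $two     = squeeze_lookahead('a { a b c } b { b c d } e');
my $stray   = squeeze_lookahead('x y } z');   # "}" with no opening "{"

print "$example\n$two\n$stray\n";
```

The first two come out as intended; in the third, the spaces before the unmatched } are removed as well ("xy} z"), because the lookahead never checks for an opening brace.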

Tad McClellan · Apr 10, 2004

Mike Mimic said:
abcde abcde { abcde a b c d } abcde abcde

into

abcde abcde {abcdeabcd} abcde abcde

Every space character between { and } should be deleted.

I have come up with this:

1 while $string =~ s/({\S*?)\s(\S*?})/$1$2/g;

Is there another (better) way? I do not like loops
that may not be necessary.

s< ( { [^}]* } ) >
< ($a = $1) =~ tr/ //d; $a >gxe;

I wouldn't call that "better" though...
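Tad's version captures each whole {...} section and deletes the spaces in a copy with tr///. Below is the same idea as a self-contained sketch, with a lexical $copy in place of $a (the package variable that sort uses); the helper name squeeze_sections is hypothetical.

```perl
use strict;
use warnings;

# Match each "{...}" section and return a copy of it without spaces.
sub squeeze_sections {
    my ($s) = @_;
    $s =~ s< ( \{ [^}]* \} ) >
           < (my $copy = $1) =~ tr/ //d; $copy >gxe;
    return $s;
}

my @cases = (
    [ 'abcde abcde { abcde a b c d } abcde abcde', 'abcde abcde {abcdeabcd} abcde abcde' ],
    [ 'a { a b c } b { b c d } e',                 'a {abc} b {bcd} e' ],
    [ 'x y } z',                                   'x y } z' ],   # no "{": left alone
);

for my $case (@cases) {
    my ($in, $want) = @$case;
    my $got = squeeze_sections($in);
    printf "%-4s %s\n", ($got eq $want ? 'ok' : 'FAIL'), $got;
}
```

Because the pattern only matches text that starts with { and ends with }, spaces outside the braces, including around an unmatched }, are never touched.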

Anno Siegel · Apr 11, 2004

Tad McClellan said:
Mike Mimic said:

abcde abcde { abcde a b c d } abcde abcde

into

abcde abcde {abcdeabcd} abcde abcde

Every space character between { and } should be deleted.

I have come up with this:

1 while $string =~ s/({\S*?)\s(\S*?})/$1$2/g;

Is there another (better) way? I do not like loops
that may not be necessary.


s< ( { [^}]* } ) >
< ($a = $1) =~ tr/ //d; $a >gxe;

I wouldn't call that "better" though...

It's hard... How about this:

s< ( { [^}]* } ) >
< join '', split ' ', $1 >gxe;

Anno
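Anno's split/join version and Tad's tr/ //d version agree as long as the braces contain only spaces. They differ on other whitespace: split ' ' splits on runs of any whitespace, while tr/ //d deletes only the space character. A short comparison on a section that also contains a tab:

```perl
use strict;
use warnings;

# A brace section containing a tab as well as spaces.
my $input = "a { b\tc  d } e";

# Anno's version: split ' ' splits on runs of any whitespace and skips
# leading whitespace, so tabs and newlines are removed as well.
(my $by_split = $input) =~ s< ( \{ [^}]* \} ) >
                            < join '', split ' ', $1 >gxe;

# Tad's version: tr/ //d deletes only the space character.
(my $by_tr = $input) =~ s< ( \{ [^}]* \} ) >
                         < (my $copy = $1) =~ tr/ //d; $copy >gxe;

print "split/join: $by_split\n";
print "tr/ //d:    $by_tr\n";
```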

Mike Mimic · Apr 11, 2004

Hi!

Anno said:
It's hard... How about this:

s< ( { [^}]* } ) >
< join '', split ' ', $1 >gxe;

And which of those is the fastest?

Mike

Peter J. Acklam · Apr 11, 2004

Mike Mimic said:
Anno said:

It's hard... How about this:
s< ( { [^}]* } ) >
< join '', split ' ', $1 >gxe;


And which of those is the fastest?

perldoc Benchmark

Peter

Mike Mimic · Apr 11, 2004

Hi!

perldoc Benchmark

Thanks. I have tested it, and to my surprise
I found out that my version is the fastest
(although it uses a while loop). I thought
yours would be.

Mike

Mike Mimic · Apr 11, 2004

Hi!

Purl said:
Perhaps your method is quicker because
it does not work?

Yes, I forgot to mention this, sorry. While
I was testing (benchmarking) I found this
error, but the tests were run with the corrected
version:

1 while $string =~ s/({[^}]*)\s+([^}]*})/$1$2/g;

Mike
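The corrected loop can be checked on the example from the question, on the two-section string Mike uses at the end of the thread, and on a stray closing brace. Each pass removes one whitespace character from every {...} section, and s///g returns a false value once nothing matched, which ends the loop. The helper name squeeze_loop is hypothetical; the braces are escaped but the pattern is otherwise the one above.

```perl
use strict;
use warnings;

# The corrected loop, wrapped in a helper for testing.
sub squeeze_loop {
    my ($s) = @_;
    # Each pass deletes one whitespace character from every {...}
    # section; s///g returns false when nothing matched, ending the loop.
    1 while $s =~ s/(\{[^}]*)\s+([^}]*\})/$1$2/g;
    return $s;
}

my @cases = (
    [ 'abcde abcde { abcde a b c d } abcde abcde', 'abcde abcde {abcdeabcd} abcde abcde' ],
    [ 'a { a b c } b { b c d } e',                 'a {abc} b {bcd} e' ],
    [ 'x y } z',                                   'x y } z' ],
);

for my $case (@cases) {
    my ($in, $want) = @$case;
    my $got = squeeze_loop($in);
    printf "%-4s %s\n", ($got eq $want ? 'ok' : 'FAIL'), $got;
}
```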

Jeff 'japhy' Pinyan · Apr 11, 2004

[posted & mailed]

Yes, I forgot to mention this, sorry. While
I was testing (benchmarking) I found this
error, but the tests were run with the corrected
version:

1 while $string =~ s/({[^}]*)\s+([^}]*})/$1$2/g;

I would go one step further in optimizing this, but I will leave it to you
to check whether it actually runs faster.

Your regex is in three parts:

( { [^}]* )   # capture "{" followed by 0 or more non-"}" characters
\s+           # match one or more whitespace characters
( [^}]* } )   # capture 0 or more non-"}" characters followed by "}"

On a string such as "{ this is a match }", here is how the string is
broken down into the three parts of your regex:

$1: "{ this is a match"
\s+: " "
$2: "}"

Do you understand what this is showing? Your first [^}]* character class
is matching too many characters. I would suggest you change the class to
[^}\s]*, so that the \s+ *can* match multiple spaces.
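Matching once with each pattern and printing the captures makes the difference concrete. This is a small sketch; apart from escaping the braces, the two patterns are the ones discussed here.

```perl
use strict;
use warnings;

my $str = '{ this is a match }';

# Mike's pattern: the greedy [^}]* runs up to the "}" and gives back
# just the last space so that \s+ can match it.
my ($m1, $m2) = $str =~ /(\{[^}]*)\s+([^}]*\})/;

# japhy's change: [^}\s]* stops at the first whitespace character, so
# \s+ gets the first whole run of spaces.
my ($j1, $j2) = $str =~ /(\{[^}\s]*)\s+([^}]*\})/;

print "mimic: \$1=[$m1] \$2=[$m2]\n";
print "japhy: \$1=[$j1] \$2=[$j2]\n";
```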

Your regex ends up executing as many times as there are spaces in braces.

use strict;
use warnings;

my $str = "{ this and that and those }";
my $count = $str =~ tr/ //;

my $mimic = qr/({[^}]*)\s+([^}]*})/;
my $japhy = qr/({[^}\s]*)\s+([^}]*})/;

for my $regex ($mimic, $japhy) {
    my $times = braces($str, $regex);
    print "$count vs. $times\n";
}

sub braces {
    my ($s, $rx) = @_;
    my $t = 0;
    while (my $x = $s =~ s/$rx/$1$2/g) { $t += $x }
    return $t;
}

The output is:

17 vs. 17 (your regex)
17 vs. 6 (my regex)
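With single spaces the two patterns do the same amount of work; the gap only appears when spaces come in runs. A sketch that builds a string with explicit runs (7 spaces in 4 runs, chosen for illustration) and counts substitutions the same way braces() does shows Mike's pattern making one substitution per space and japhy's one per run, with the same final result.

```perl
use strict;
use warnings;

# Built piece by piece so the runs of spaces are unambiguous:
# "{ a" + 2 spaces + "b" + 3 spaces + "c }" = 7 spaces in 4 runs.
my $str    = '{ a' . (' ' x 2) . 'b' . (' ' x 3) . 'c }';
my $spaces = ($str =~ tr/ //);   # tr with an empty replacement only counts

my $mimic = qr/(\{[^}]*)\s+([^}]*\})/;
my $japhy = qr/(\{[^}\s]*)\s+([^}]*\})/;

# Repeat s///g until nothing matches and add up how many
# substitutions were made. Returns (count, final string).
sub count_subs {
    my ($s, $rx) = @_;
    my $total = 0;
    while (my $n = $s =~ s/$rx/$1$2/g) { $total += $n }
    return ($total, $s);
}

my ($mimic_subs, $mimic_out) = count_subs($str, $mimic);
my ($japhy_subs, $japhy_out) = count_subs($str, $japhy);

print "$spaces spaces\n";
print "mimic: $mimic_subs substitutions -> $mimic_out\n";
print "japhy: $japhy_subs substitutions -> $japhy_out\n";
```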

I would like to believe, then, that my code executes faster than yours.

(Though I still think that

s<({[^}]*})><(my $x = $1) =~ tr/ //d; $x>eg;

should be faster, since it only requires ONE pass over the string,
although I guess the evaluation aspect slows things down.)
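For comparison only, and not part of this 2004 discussion: since Perl 5.14, tr/// accepts an /r flag that returns a modified copy instead of changing its operand, which removes the temporary variable from the one-pass version. A minimal sketch, assuming Perl 5.14 or later:

```perl
use strict;
use warnings;
use 5.014;   # tr///r needs Perl 5.14 or later

my $string   = 'abcde abcde { abcde a b c d } abcde abcde { abc de a b c d } abcde abcde';
my $expected = 'abcde abcde {abcdeabcd} abcde abcde {abcdeabcd} abcde abcde';

# One pass: every {...} section is replaced by a space-free copy of itself.
# tr///dr returns the copy and leaves the read-only $1 untouched.
(my $squeezed = $string) =~ s|(\{[^}]*\})|$1 =~ tr/ //dr|ge;

print "$squeezed\n";
```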

Mike Mimic · Apr 12, 2004

Hi!

Purl said:
I believe your benchmark tests are flawed.

I am ashamed. I read the benchmark results upside down.

So I tested everything again. Here are the results.

I have not included Purl Guru's way because it fails
on a string with two or more {} sections.

I also have not included Peter J. Acklam's because
it fails on a string (or section) with no opening bracket:
it checks only for the closing bracket.

The code:

use strict;
use warnings;
use Benchmark qw(cmpthese);

our $string = "abcde abcde { abcde a b c d } abcde abcde { abc de a b c d } abcde abcde";

cmpthese(500000, {
    'reg1a' => sub { local $string = $string;
                     1 while $string =~ s/({[^}]*)\s+([^}]*})/$1$2/g; },
    'reg1b' => sub { local $string = $string;
                     1 while $string =~ s/({[^}]*?)\s+([^}]*?})/$1$2/g; },
    'reg1c' => sub { local $string = $string;
                     1 while $string =~ s/({[^}\s]*)\s+([^}]*})/$1$2/g; },
    'reg2a' => sub { local $string = $string;
                     $string =~ s|({[^}]+})|($_=$1)=~tr/ //d;$_|ge; },
    'reg2b' => sub { local $string = $string;
                     $string =~ s|({[^}]+})|($_=$1)=~s/ //g;$_|ge; },
    'reg3'  => sub { local $string = $string;
                     $string =~ s|({[^}]+})|join('', split(' ', $1))|ge; },
});

Results:

         Rate reg1a reg1b reg1c  reg3 reg2b reg2a
reg1a  3552/s    --  -48%  -53%  -82%  -84%  -89%
reg1b  6882/s   94%    --  -10%  -64%  -68%  -78%
reg1c  7635/s  115%   11%    --  -60%  -65%  -76%
reg3  19254/s  442%  180%  152%    --  -11%  -39%
reg2b 21753/s  512%  216%  185%   13%    --  -31%
reg2a 31560/s  788%  359%  313%   64%   45%    --

So the fastest really is the one with tr/ //d.

Mike
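Mike's first round of timings included a version that did not work, so it can pay to confirm that every candidate produces the expected string before timing it. A sketch of such a check: each benchmarked substitution is rewritten as a sub that takes and returns a string (lexical copies replace local and $_, braces are escaped; the hash name %candidate is just for this example).

```perl
use strict;
use warnings;

my $input    = 'abcde abcde { abcde a b c d } abcde abcde { abc de a b c d } abcde abcde';
my $expected = 'abcde abcde {abcdeabcd} abcde abcde {abcdeabcd} abcde abcde';

# Each candidate takes a string and returns the rewritten copy.
my %candidate = (
    reg1a => sub { my $s = shift; 1 while $s =~ s/(\{[^}]*)\s+([^}]*\})/$1$2/g;   return $s },
    reg1b => sub { my $s = shift; 1 while $s =~ s/(\{[^}]*?)\s+([^}]*?\})/$1$2/g; return $s },
    reg1c => sub { my $s = shift; 1 while $s =~ s/(\{[^}\s]*)\s+([^}]*\})/$1$2/g; return $s },
    reg2a => sub { my $s = shift; $s =~ s|(\{[^}]+\})|(my $x = $1) =~ tr/ //d; $x|ge; return $s },
    reg2b => sub { my $s = shift; $s =~ s|(\{[^}]+\})|(my $x = $1) =~ s/ //g; $x|ge; return $s },
    reg3  => sub { my $s = shift; $s =~ s|(\{[^}]+\})|join('', split(' ', $1))|ge; return $s },
);

# Names of the candidates whose output differs from the expected string.
my @mismatches = grep { $candidate{$_}->($input) ne $expected } sort keys %candidate;

print @mismatches ? "mismatch: @mismatches\n" : "all six candidates agree\n";
```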

Mike Mimic · Apr 12, 2004

Hi!

Purl said:
Who is responsible for providing clear
and concise parameters with which to work?

I know.

Sorry. I was not saying that it is a
mistake or anything, only that it is not suitable
for my problem.

Thanks to all of you.

Mike

Mike Mimic · Apr 13, 2004

Hi!

Purl said:
Irrelevant. Your comments do not comply with
the originating author's stated parameters.

That was my mistake. It should handle multiple
bracket sections. But it is not important anymore (I did it
with the tr/// trick), and I will probably be able
to implement the substring solution if I need
more speed (for now it is OK).

Thanks.

Mike

Mike Mimic · Apr 13, 2004

Hi!

Purl said:
My method handles multiple brackets with ease
and good efficiency.

What about "a { a b c } b { b c d } e"?

Mike

How Python works: What do you know about support for negative indices?	13	Sep 10, 2010
replacing char no 2 in a string?	3	Oct 12, 2011
scanf in python	13	Jul 21, 2008
ctype performance benchmark	2	Jul 17, 2009
boost::archive::xml_iarchive	0	Oct 25, 2012
Can someone tell me what's wrong with this question on StackOverflow?	0	Aug 19, 2023
Sorting a row of letters, using a blank space, with some added rules	6	Apr 27, 2006
Print triangle of star/blank inside the rectangle of char	3	Mar 28, 2007
