Space (\s) count problem

Huub · Sep 11, 2005

Hi,

I want to read the white space character, but it doesn't work. So far I
have this code for the actual reading.

while (<INPUT_FILE>)
{
$spatie++ if m/\s/m
}

The line to be read is: dit is een test voor TIRCIN
The result for $spatie is 1 instead of 5. Where do I go wrong?

Thanks,

Huub

Joe Smith · Sep 11, 2005

Huub said:
$spatie++ if m/\s/m

The line to be read is: dit is een test voor TIRCIN

You used the /m option instead of /g.
You did not include any code to repeat matches.

$spatie++ while m/\s/g;

It is recommended to use tr/// for counting characters.
-Joe
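To make Joe's two suggestions concrete, here is a minimal sketch that is not from the thread. The variable names $by_match and $by_tr are made up for illustration, and the sample line is hard-coded instead of being read from INPUT_FILE. Note that a line read from a file still carries its trailing newline, which \s also matches, so the sketch chomps it first.

```perl
use strict;
use warnings;

# Sample line from the question; a line read from a file would also end in "\n".
my $line = "dit is een test voor TIRCIN\n";
chomp $line;    # drop the newline, which \s would otherwise count too

# With /g in scalar context, each pass of the loop finds the next match.
my $by_match = 0;
$by_match++ while $line =~ /\s/g;

# tr/// with an empty replacement list returns the number of matching
# characters and leaves the string unchanged. tr does not understand \s,
# so the whitespace characters are listed explicitly.
my $by_tr = ( $line =~ tr/ \t\n\r\f// );

print "match loop: $by_match, tr: $by_tr\n";    # both report 5 here
```

Without the chomp, both counts would come out one higher for a line read from a file.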

John Bokma · Sep 11, 2005

Huub said:
Hi,

I want to read the white space character, but it doesn't work. So far I
have this code for the actual reading.

You want to count, not read:

perldoc -q count

Brian Wakem · Sep 11, 2005

John said:
You want to count, not read:

perldoc -q count

A few months ago I had to write something to count the number of times a
certain string was found in another string. The script would have to do it
10s of thousands of times in each execution on strings of ~3KB.
It had to be fast, and after much benchmarking, I found the fastest method
(on my machine anyway) is one not in the faq, but to substitute the string
with itself and count the number of successes.

$matches += ($string =~ s!\Q$kw!$kw!g);

Which was ~12% faster than anything in the FAQ, if I remember correctly.

Why this is faster than :-

$matches++ while $string =~ m!\Q$kw!g;

I have no idea.

A. Sinan Unur · Sep 11, 2005

Brian Wakem said:
A few months ago I had to write something to count the number of times
a certain string was found in another string. The script would have
to do it 10s of thousands of times in each execution on strings of
~3KB. It had to be fast, and after much benchmarking, I found the
fastest method (on my machine anyway) is one not in the faq, but to
substitute the string with itself and count the number of successes.

$matches += ($string =~ s!\Q$kw!$kw!g);

Which was ~12% faster than anything in the FAQ, if I remember correctly.

Why this is faster than :-

$matches++ while $string =~ m!\Q$kw!g;

I have no idea.

Did you try using index?

<untested>
$matches++ while -1 =! index $string, $kw;
</untested>

What were the results?

Sinan

Brian Wakem · Sep 11, 2005

A. Sinan Unur said:
Did you try using index?

<untested>
$matches++ while -1 =! index $string, $kw;
</untested>

What were the results?

Sinan

Infinite loop.

A. Sinan Unur · Sep 11, 2005

Brian Wakem said:
....
Infinite loop.

Well, I said it was untested, didn't I? :) Sorry about that.

A rudimentary benchmark I ran with the corrected code showed that index
is about 30% slower than the substitution based solution you posted.

So much for early Sunday morning inspiration.

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark 'cmpthese';

my $s = join('', '012345678901234567890123456789012' x 100);
my $k = '0123456789';

cmpthese -1, {
use_regex => \&use_regex,
use_index => \&use_index,
};

sub use_index {
my ($m, $i) = (0, -1);
while(-1 != ($i = index($s, $k, $i + 1))) {
$m += 1;
}
return $m;
}

sub use_regex {
my $m = ($s =~ s!\Q$k!$k!g);
}

__END__

D:\Home\asu1\UseNet\clpmisc> perl -v

This is perl, v5.8.7 built for MSWin32-x86-multi-thread
(with 7 registered patches, see perl -V for more detail)

Copyright 1987-2005, Larry Wall

Binary build 813 [148120] provided by ActiveState
http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Jun 6 2005 13:36:37

D:\Home\asu1\UseNet\clpmisc> ppp
            Rate use_index use_regex
use_index 5957/s        --      -31%
use_regex 8653/s       45%        --

John W. Krahn · Sep 12, 2005

A. Sinan Unur said:
A rudimentary benchmark I ran with the corrected code showed that index
is about 30% slower than the substitution based solution you posted.

So much for early Sunday morning inspiration.

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark 'cmpthese';

my $s = join('', '012345678901234567890123456789012' x 100);
my $k = '0123456789';

cmpthese -1, {
use_regex => \&use_regex,
use_index => \&use_index,
};

sub use_index {
my ($m, $i) = (0, -1);
while(-1 != ($i = index($s, $k, $i + 1))) {
$m += 1;
}
return $m;
}

That will match overlapping patterns. A more accurate comparison would be:

sub use_index {
my ( $m, $i ) = ( 0, -length $k );
while ( -1 != ( $i = index $s, $k, $i += length $k ) ) {
$m++;
}
return $m;
}

An even faster solution is to call length() only one time:

sub use_index {
my $len = length $k;
my ( $m, $i ) = ( 0, -$len );
while ( -1 != ( $i = index $s, $k, $i += $len ) ) {
$m++;
}
return $m;
}

sub use_regex {
my $m = ($s =~ s!\Q$k!$k!g);
}

John
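The difference between the two index loops only shows up when the keyword can overlap itself. A small sketch, not from the thread, with hypothetical helper names count_overlapping and count_disjoint, and assuming the keyword is never the empty string:

```perl
use strict;
use warnings;

# Count matches, allowing overlaps: resume one character past each hit.
sub count_overlapping {
    my ( $s, $k ) = @_;
    my ( $m, $i ) = ( 0, -1 );
    $m++ while -1 != ( $i = index( $s, $k, $i + 1 ) );
    return $m;
}

# Count non-overlapping matches: resume just past the end of each hit.
# Assumes $k is not the empty string.
sub count_disjoint {
    my ( $s, $k ) = @_;
    my $len = length $k;
    my ( $m, $pos ) = ( 0, 0 );
    while ( ( my $found = index( $s, $k, $pos ) ) != -1 ) {
        $m++;
        $pos = $found + $len;
    }
    return $m;
}

my $overlapping = count_overlapping( 'aaaa', 'aa' );    # hits at 0, 1, 2
my $disjoint    = count_disjoint( 'aaaa', 'aa' );       # hits at 0, 2

# The substitution approach never rescans replaced text, so it agrees
# with the non-overlapping count.
my $copy     = 'aaaa';
my $by_subst = ( $copy =~ s/aa/aa/g );

print "$overlapping $disjoint $by_subst\n";             # 3 2 2
```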

Anno Siegel · Sep 12, 2005

Brian Wakem said:
A few months ago I had to write something to count the number of times a
certain string was found in another string. The script would have to do it
10s of thousands of times in each execution on strings of ~3KB.
It had to be fast, and after much benchmarking, I found the fastest method
(on my machine anyway) is one not in the faq, but to substitute the string
with itself and count the number of successes.

$matches += ($string =~ s!\Q$kw!$kw!g);

Which was ~12% faster than anything in the FAQ, if I remember correctly.

Why this is faster than :-

$matches++ while $string =~ m!\Q$kw!g;

I have no idea.

An alternative that avoids the unnecessary substitution is

$matches = () = $string =~ m!\Q$kw!g;

These do all their counting internally in a C level loop. The while-loop
is Perl level, hence slower.

The only problem with this perfectly plausible explanation is that
benchmarks don't support it. I'm finding the substitution and the
global matching solution in the same ballpark but the explicit loop
beats them by a factor of two.

Anno

#!/usr/bin/perl
use strict; $| = 1;
use Benchmark qw( cmpthese);

our $string = ' xxx xxx xx xxx xxx xxxxxxxx x'
x 100;
our $kw = 'xxx';

goto bench;

my $n1 = 0; $n1 ++ while $string =~ /$kw/g;
my $n2 = $string =~ s!\Q$kw!$kw!g;
my $n3 = () = $string =~ m!\Q$kw!g;
print "$n1, $n2, $n3 (600)\n";
exit;

bench:
cmpthese -3, {
loop => 'my $m1 = 0; $m1 ++ while $string =~ /$kw/g;',
subst => 'my $m2 = $string =~ s!\Q$kw!$kw!g;',
match => 'my $m3 = () = $string =~ m!\Q$kw!g;',
};
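Since the goto in the script skips its own sanity check, here is a minimal standalone sketch of the three counting forms on a single copy of the test string as it appears in the post. The names $loop, $subst and $match are chosen here for illustration. The list-assignment form works because a list assignment evaluated in scalar context yields the number of elements on its right-hand side.

```perl
use strict;
use warnings;

my $string = ' xxx xxx xx xxx xxx xxxxxxxx x';    # one copy of the test string
my $kw     = 'xxx';

# Explicit Perl-level loop over successive /g matches.
my $loop = 0;
$loop++ while $string =~ /\Q$kw/g;

# Substitute the keyword with itself; s///g returns the number of substitutions.
my $subst = ( $string =~ s/\Q$kw/$kw/g );

# Match in list context, assign to the empty list, and take the count.
my $match = () = $string =~ /\Q$kw/g;

print "$loop, $subst, $match\n";    # 6, 6, 6
```

With the string repeated 100 times, as in the benchmark, each form should report 600.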

Brian Wakem · Sep 12, 2005

Anno said:
An alternative that avoids the unnecessary substitution is

$matches = () = $string =~ m!\Q$kw!g;

That's about 50% slower than substitution on my machine.

Huub · Sep 12, 2005

Joe said:
You used the /m option instead of /g.
You did not include any code to repeat matches.

$spatie++ while m/\s/g;

It is recommended to use tr/// for counting characters.
-Joe

I can't find tr///. Apart from that, I want to actually read each
character (into an array) until the white space is reached. So if the
array is e.g. @woord1(1..$max1), does @woord1[1] = m/\w/g fill that
element of the array with the read character?

Brian Wakem · Sep 12, 2005

Huub said:
I can't find tr///. Apart from that, I want to actually read each
character (into an array) until the white space is reached. So if the
array is e.g. @woord1(1..$max1), does @woord1[1] = m/\w/g fill that
element of the array with the read character?

That's still a rather cryptic explanation, but it sounds like you should be
using 'split'.
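A minimal sketch of what the split suggestion might look like, assuming the goal is to break the line into words and then a word into characters. The variable names @words and @chars are hypothetical, and the sample line is the one from the original question.

```perl
use strict;
use warnings;

my $line = "dit is een test voor TIRCIN";

# split with the special pattern ' ' splits on runs of whitespace
# and ignores leading whitespace.
my @words = split ' ', $line;

# split with an empty pattern breaks a string into single characters.
my @chars = split //, $words[0];

print scalar(@words), " words; first word as characters: @chars\n";
# 6 words; first word as characters: d i t
```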

Tad McClellan · Sep 12, 2005

Huub said:
I can't find tr///.

perldoc -f tr

tr/// The transliteration operator. Same as "y///". See perlop.

perldoc perlop

Apart from that, I want to actually read each
character (into an array) until the white space is reached.

Then you misled us with your poor choice of Subject header.

So if the
array is e.g. @woord1(1..$max1),

Array indexes start at zero in Perl.

Arrays in Perl are indexed inside of [square] brackets, not (parentheses).

Please post Real Perl here.

does @woord1[1] = m/\w/g fill that
element of the array with the read character?

What happened when you tried it?

@woord1 = m/\G\S/g; # match & save the leading non-space characters
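A runnable sketch of that last line, binding the match to an explicit variable instead of $_ (the name $line is an assumption made here). In list context, /g returns every match, and \G forces each match to start where the previous one ended, so collection stops at the first whitespace character.

```perl
use strict;
use warnings;

my $line = "dit is een test voor TIRCIN";

# Collect the leading non-space characters, one per array element.
my @woord1 = $line =~ m/\G\S/g;

print "@woord1\n";        # d i t
print "$woord1[0]\n";     # d  (indexes start at zero; a single element uses $)
```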

Huub · Sep 12, 2005

perldoc -f tr

tr/// The transliteration operator. Same as "y///". See perlop.

perldoc perlop

I have tried to get into the FAQ html, but get a timeout each time.

Then you misled us with your poor choice of Subject header.

Not really. This is merely a part of the problem. I still want to count
white space.

So if the
array is e.g. @woord1(1..$max1),

Array indexes start at zero in Perl.

Arrays in Perl are indexed inside of [square] brackets, not (parentheses).

I started out with [] but got an error. So I tried this and didn't get an
error.

Please post Real Perl here.

I thought I did.

does @woord1[1] = m/\w/g fill that
element of the array with the read character?

What happened when you tried it?

Nothing: no error, no result.

@woord1 = m/\G\S/g; # match & save the leading non-space characters

OK.

Tad McClellan · Sep 12, 2005

Huub said:
I have tried to get into the FAQ html, but get a timeout each time.

There is no reference to the Perl FAQ in what I wrote.

There is no reference to HTML in what I wrote.

Have we entered the "Twilight Zone"?

The docs for your perl should be on your hard disk already, no need
for an internet connection to access them.

Not really.

Yes really.

You said you wanted to count characters.

Someone told you how to count characters.

You said that you want to store characters too.

If you keep changing the specification the problem will never be solved.

array is e.g. @woord1(1..$max1),

Arrays in Perl are indexed inside of [square] brackets, not (parentheses).

I started out with [] but got an error. So I tried this and didn't get an
error.

You should avoid practicing cargo-cult programming such as that.

A. Sinan Unur · Sep 12, 2005

Huub <"h.v.niekerk at hccnet.nl"> wrote in

[ please provide a proper attribution when quoting others. ]

I have tried to get into the FAQ html, but get a timeout each time.

The documentation should be installed on your computer. If your *nix
distro requires you to install a separate package to get Perl
documentation, please do so.

Sinan
