Space (\s) count problem

Huub

Hi,

I want to read the white space character, but it doesn't work. So far I
have this code for the actual reading.

while (<INPUT_FILE>)
{
$spatie++ if m/\s/m
}

The line to be read is: dit is een test voor TIRCIN
The result for $spatie is 1 instead of 5. Where do I go wrong?

Thanks,

Huub
 
Joe Smith

Huub said:
$spatie++ if m/\s/m

The line to be read is: dit is een test voor TIRCIN

You used the /m option instead of /g.
You did not include any code to repeat matches.

$spatie++ while m/\s/g;

It is recommended to use tr/// for counting characters.
-Joe
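
For reference, a minimal sketch of both counting approaches on the sample line from the question. The variable names are illustrative, and the trailing newline is an assumption about how the line arrives from the file. Note that tr/// takes a character list rather than a regex, so \s cannot be used there and the whitespace characters are spelled out.

```perl
use strict;
use warnings;

# Sample line from the question; a line read from a file normally
# still carries its trailing newline.
my $line = "dit is een test voor TIRCIN\n";

# tr/// with an empty replacement list and no /d leaves the string
# unchanged and returns the number of characters it matched.
my $spaces = ( $line =~ tr/ // );            # plain spaces only
my $white  = ( $line =~ tr/ \t\n\r\f// );    # spaces, tabs, newlines, CR, FF

# Regex alternative: step through every \s match with /g.
my $regex_count = 0;
$regex_count++ while $line =~ /\s/g;

print "spaces=$spaces white=$white regex=$regex_count\n";
# prints: spaces=5 white=6 regex=6
```

The newline counts as whitespace, so chomp the line first if only the spaces between words should be counted.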
 
John Bokma

Huub said:
Hi,

I want to read the white space character, but it doesn't work. So far I
have this code for the actual reading.

You want to count, not read:

perldoc -q count
 
Brian Wakem

John said:
You want to count, not read:

perldoc -q count


A few months ago I had to write something to count the number of times a
certain string was found in another string. The script would have to do it
10s of thousands of times in each execution on strings of ~3KB.
It had to be fast, and after much benchmarking, I found the fastest method
(on my machine anyway) is one not in the faq, but to substitute the string
with itself and count the number of successes.

$matches += ($string =~ s!\Q$kw!$kw!g);

Which was ~ 12% faster than anything in faq if I remember correctly.

Why this is faster than :-

$matches++ while $string =~ m!\Q$kw!g;

I have no idea.
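
A small sketch of what the substitution trick relies on, using made-up data: s///g returns the number of substitutions it made (an empty, false value when nothing matched), and \Q keeps regex metacharacters in the keyword from being interpreted. The data and variable names are illustrative only.

```perl
use strict;
use warnings;

my $string = 'a.b a.b axb';    # illustrative data
my $kw     = 'a.b';            # keyword containing a regex metacharacter

# Replace each occurrence with itself; the return value is the count.
my $matches = 0;
$matches += ( $string =~ s!\Q$kw!$kw!g );

# Without \Q the dot matches any character, so 'axb' is counted too.
my $unquoted = 0;
$unquoted++ while $string =~ /$kw/g;

print "quoted=$matches unquoted=$unquoted string=$string\n";
# prints: quoted=2 unquoted=3 string=a.b a.b axb
```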
 
A. Sinan Unur

Brian Wakem said:
A few months ago I had to write something to count the number of times
a certain string was found in another string. The script would have
to do it 10s of thousands of times in each execution on strings of
~3KB. It had to be fast, and after much benchmarking, I found the
fastest method (on my machine anyway) is one not in the faq, but to
substitute the string with itself and count the number of successes.

$matches += ($string =~ s!\Q$kw!$kw!g);

Which was ~ 12% faster than anything in faq if I remember correctly.

Why this is faster than :-

$matches++ while $string =~ m!\Q$kw!g;

I have no idea.

Did you try using index?

<untested>
$matches++ while -1 =! index $string, $kw;
</untested>

What were the results?

Sinan
 
Brian Wakem

A. Sinan Unur said:
Did you try using index?

<untested>
$matches++ while -1 =! index $string, $kw;
</untested>

What were the results?

Sinan


Infinite loop.
 
A. Sinan Unur

Brian Wakem said:
....
Infinite loop.

Well, I said it was untested, didn't I :)) Sorry about that.

A rudimentary benchmark I ran with the corrected code showed that index
is about 30% slower than the substitution based solution you posted.

So much for early Sunday morning inspiration.

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark 'cmpthese';

my $s = join('', '012345678901234567890123456789012' x 100);
my $k = '0123456789';

cmpthese -1, {
    use_regex => \&use_regex,
    use_index => \&use_index,
};

sub use_index {
    my ($m, $i) = (0, -1);
    while (-1 != ($i = index($s, $k, $i + 1))) {
        $m += 1;
    }
    return $m;
}

sub use_regex {
    my $m = ($s =~ s!\Q$k!$k!g);
}

__END__

D:\Home\asu1\UseNet\clpmisc> perl -v

This is perl, v5.8.7 built for MSWin32-x86-multi-thread
(with 7 registered patches, see perl -V for more detail)

Copyright 1987-2005, Larry Wall

Binary build 813 [148120] provided by ActiveState
http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Jun 6 2005 13:36:37

D:\Home\asu1\UseNet\clpmisc> ppp
            Rate use_index use_regex
use_index 5957/s        --      -31%
use_regex 8653/s       45%        --
 
John W. Krahn

A. Sinan Unur said:
A rudimentary benchmark I ran with the corrected code showed that index
is about 30% slower than the substitution based solution you posted.

So much for early Sunday morning inspiration.

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark 'cmpthese';

my $s = join('', '012345678901234567890123456789012' x 100);
my $k = '0123456789';

cmpthese -1, {
    use_regex => \&use_regex,
    use_index => \&use_index,
};

sub use_index {
    my ($m, $i) = (0, -1);
    while (-1 != ($i = index($s, $k, $i + 1))) {
        $m += 1;
    }
    return $m;
}

That will match overlapping patterns. A more accurate comparison would be:

sub use_index {
    my ( $m, $i ) = ( 0, -length $k );
    while ( -1 != ( $i = index $s, $k, $i += length $k ) ) {
        $m++;
    }
    return $m;
}

An even faster solution is to call length() only one time:

sub use_index {
    my $len = length $k;
    my ( $m, $i ) = ( 0, -$len );
    while ( -1 != ( $i = index $s, $k, $i += $len ) ) {
        $m++;
    }
    return $m;
}

sub use_regex {
    my $m = ($s =~ s!\Q$k!$k!g);
}
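
To illustrate the overlapping versus non-overlapping distinction, here is a minimal sketch on a tiny string. count_with_step is a hypothetical helper written for this illustration, not code from the thread. Regex matching with /g and s///g count non-overlapping matches, which is why advancing by the keyword length is the fairer comparison.

```perl
use strict;
use warnings;

# Count occurrences of $k in $s, advancing the search position by
# $step after each hit. (Hypothetical helper for illustration.)
sub count_with_step {
    my ( $s, $k, $step ) = @_;
    my ( $m, $i ) = ( 0, 0 );
    while ( -1 != ( $i = index $s, $k, $i ) ) {
        $m++;
        $i += $step;
    }
    return $m;
}

my ( $s, $k ) = ( 'aaaa', 'aa' );
my $overlapping     = count_with_step( $s, $k, 1 );            # hits at 0, 1, 2
my $non_overlapping = count_with_step( $s, $k, length $k );    # hits at 0, 2

print "overlapping=$overlapping non_overlapping=$non_overlapping\n";
# prints: overlapping=3 non_overlapping=2
```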


John
 
Anno Siegel

Brian Wakem said:
A few months ago I had to write something to count the number of times a
certain string was found in another string. The script would have to do it
10s of thousands of times in each execution on strings of ~3KB.
It had to be fast, and after much benchmarking, I found the fastest method
(on my machine anyway) is one not in the faq, but to substitute the string
with itself and count the number of successes.

$matches += ($string =~ s!\Q$kw!$kw!g);

Which was ~ 12% faster than anything in faq if I remember correctly.

Why this is faster than :-

$matches++ while $string =~ m!\Q$kw!g;

I have no idea.

An alternative that avoids the unnecessary substitution is

$matches = () = $string =~ m!\Q$kw!g;

These do all their counting internally in a C level loop. The while-loop
is Perl level, hence slower.

The only problem with this perfectly plausible explanation is that
benchmarks don't support it. I'm finding the substitution and the
global matching solution in the same ballpark but the explicit loop
beats them by a factor of two.

Anno

#!/usr/bin/perl
use strict; $| = 1;
use Benchmark qw( cmpthese);

our $string = ' xxx xxx xx xxx xxx xxxxxxxx x'
x 100;
our $kw = 'xxx';

goto bench;

my $n1 = 0; $n1 ++ while $string =~ /$kw/g;
my $n2 = $string =~ s!\Q$kw!$kw!g;
my $n3 = () = $string =~ m!\Q$kw!g;
print "$n1, $n2, $n3 (600)\n";
exit;

bench:
cmpthese -3, {
    loop  => 'my $m1 = 0; $m1 ++ while $string =~ /$kw/g;',
    subst => 'my $m2 = $string =~ s!\Q$kw!$kw!g;',
    match => 'my $m3 = () = $string =~ m!\Q$kw!g;',
};
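
The "= () =" construct used above may look odd, so here is a minimal sketch of how it counts, run on a single repetition of the benchmark string (the single-space layout of the string is assumed from the post). A list assignment evaluated in scalar context yields the number of elements on its right-hand side, even when the left-hand side is the empty list.

```perl
use strict;
use warnings;

# One repetition of the benchmark string (spacing assumed).
my $string = ' xxx xxx xx xxx xxx xxxxxxxx x';
my $kw     = 'xxx';

# In list context, m//g returns every match.
my @all = $string =~ m!\Q$kw!g;

# Assigning that list to () in scalar context yields its element count.
my $count = () = $string =~ m!\Q$kw!g;

print scalar(@all), " $count\n";
# prints: 6 6
```

Six matches per repetition times 100 repetitions gives the 600 that the sanity check in the script expects.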
 
Brian Wakem

Anno said:
An alternative that avoids the unnecessary substitution is

$matches = () = $string =~ m!\Q$kw!g;


That's about 50% slower than substitution on my machine.
 
Huub

Joe said:
You used the /m option instead of /g.
You did not include any code to repeat matches.

$spatie++ while m/\s/g;

It is recommended to use tr/// for counting characters.
-Joe

I can't find tr///. Apart from that, I want to actually read each
character (into an array) until the white space is reached. So if the
array is e.g. @woord1(1..$max1), does @woord1[1] = m/\w/g fill that
element of the array with the read character?
 
Brian Wakem

Huub said:
I can't find tr///. Apart from that, I want to actually read each
character (into an array) until the white space is reached. So if the
array is e.g. @woord1(1..$max1), does @woord1[1] = m/\w/g fill that
element of the array with the read character?


That's still a rather cryptic explanation, but it sounds like you should be
using 'split'.
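
A minimal sketch of what split could look like here, assuming the goal is to get at the words of the line and then the characters of a word. The variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "dit is een test voor TIRCIN\n";

# The special pattern ' ' splits on runs of whitespace and ignores
# leading whitespace; trailing empty fields are dropped.
my @words = split ' ', $line;

# An empty pattern splits a string into its individual characters.
my @chars = split //, $words[0];

print scalar(@words), " words; first word letters: @chars\n";
# prints: 6 words; first word letters: d i t
```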
 
Tad McClellan

Huub said:
I can't find tr///.

perldoc -f tr

tr/// The transliteration operator. Same as "y///". See perlop.

perldoc perlop

Apart from that, I want to actually read each
character (into an array) until the white space is reached.


Then you misled us with your poor choice of Subject header.

So if the
array is e.g. @woord1(1..$max1),


Array indexes start at zero in Perl.

Arrays in Perl are indexed inside of [square] brackets, not (parentheses).

Please post Real Perl here.

does @woord1[1] = m/\w/g fill that
element of the array with the read character?


What happened when you tried it?


@woord1 = m/\G\S/g; # match & save the leading non-space characters
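
A short sketch of what that line yields, assuming the text to match is in $_ (the sample text is illustrative). In list context m//g returns all matches, and \G forces each match to start where the previous one ended, so matching stops at the first whitespace character.

```perl
use strict;
use warnings;

$_ = "dit is een test";

# Collect the leading non-space characters, one per array element.
my @woord1 = m/\G\S/g;

print "@woord1\n";       # prints: d i t
print "$woord1[0]\n";    # first element has index 0; prints: d
```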
 
Huub

perldoc -f tr

tr/// The transliteration operator. Same as "y///". See perlop.

perldoc perlop

I have tried to get into the FAQ html, but get a timeout each time.
Then you misled us with your poor choice of Subject header.

Not really. This is merely a part of the problem. I still want to count
white space.
So if the
array is e.g. @woord1(1..$max1),



Array indexes start at zero in Perl.

Arrays in Perl are indexed inside of [square] brackets, not (parentheses).

I started out with [] but got an error. So I tried this and didn't get an
error.
Please post Real Perl here.

I thought I did.
does @woord1[1] = m/\w/g fill that
element of the array with the read character?



What happened when you tried it?

Nothing: no error, no result.
@woord1 = m/\G\S/g; # match & save the leading non-space characters

OK.
 
Tad McClellan

Huub said:
I have tried to get into the FAQ html, but get a timeout each time.


There is no reference to the Perl FAQ in what I wrote.

There is no reference to HTML in what I wrote.

Have we entered the "Twilight Zone"?

The docs for your perl should be on your hard disk already, no need
for an internet connection to access them.

Not really.


Yes really.

You said you wanted to count characters.

Someone told you how to count characters.

You said that you want to store characters too.

If you keep changing the specification the problem will never be solved.

array is e.g. @woord1(1..$max1),
Arrays in Perl are indexed inside of [square] brackets, not (parentheses).

I started out with [] but got an error. So I tried this and didn't get an
error.


You should avoid practicing cargo-cult programming such as that.
 
A. Sinan Unur

Huub <"h.v.niekerk at hccnet.nl"> wrote in

[ please provide a proper attribution when quoting others. ]
I have tried to get into the FAQ html, but get a timeout each time.

The documentation should be installed on your computer. If your *nix
distro requires you to install a separate package to get Perl
documentation, please do so.

Sinan
 
