Searching a sorted array of strings

K

Keith Keller

actually, I have the (rambling) Camel book, and the Perldoc FAQ online
right here. I finished this routine, Paul, and posted it to share. Did I
ask for constructive comments? Why did I know I would immediately be met
with "RTFM" and "you cant do that"? anyway, enough of that...

If you don't want comments, constructive or otherwise, don't post your
code.

And Paul didn't say "you can't do that", he said "you spent a lot of
time figuring out how to do something that's in TFM". Maybe that
doesn't help you get that time back, but perhaps it encourages someone
else to RTFM before embarking on a similar task.

--keith
 
O

one man army

_that_ looks useful

It is. Shocking, I know, that the answer given in the FAQ might be
useful. Again, quoting from the FAQ:
===========================
That being said, there are several ways to approach this. If you are
going to make this query many times over arbitrary string values, the
fastest way is probably to invert the original array and maintain a
hash whose keys are the first array's values.

@blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
for (@blues) { $is_blue{$_} = 1 }
==========================[/QUOTE]

compared the functions, in case it is useful to know...

Benchmark: timing 2000000 iterations of a, b...
a-ARRAY: 5 wallclock secs ( 0.13 usr + 0.11 sys = 0.24 CPU)
@ 8333333.33/s (n=2000000)
(warning: too few iterations for a reliable count)
b-HASH: 1 wallclock secs ( 0.20 usr + -0.02 sys = 0.18 CPU) @
11111111.11/s (n=2000000)
(warning: too few iterations for a reliable count)
Rate a b
a 8333333/s -- -25%
b 11111111/s 33% --
 
P

Paul Lalli

one said:
compared the functions, in case it is useful to know...

Benchmark: timing 2000000 iterations of a, b...
a-ARRAY: 5 wallclock secs ( 0.13 usr + 0.11 sys = 0.24 CPU)
@ 8333333.33/s (n=2000000)
(warning: too few iterations for a reliable count)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

b-HASH: 1 wallclock secs ( 0.20 usr + -0.02 sys = 0.18 CPU) @
11111111.11/s (n=2000000)
(warning: too few iterations for a reliable count)

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Did you see those lines? Did you not understand what they told you?
Any comparison this gave you is worthless. You need to increase your
iterations.


Paul Lalli
 
C

Ch Lamprecht

one said:
compared the functions, in case it is useful to know...

Benchmark: timing 2000000 iterations of a, b...
a-ARRAY: 5 wallclock secs ( 0.13 usr + 0.11 sys = 0.24 CPU)
@ 8333333.33/s (n=2000000)
(warning: too few iterations for a reliable count)
b-HASH: 1 wallclock secs ( 0.20 usr + -0.02 sys = 0.18 CPU) @
11111111.11/s (n=2000000)
(warning: too few iterations for a reliable count)
Rate a b
a 8333333/s -- -25%
b 11111111/s 33% --

Hi,
which functions are you comparing ?
Using the code below I get output as follows:

Rate match_City match_city_2
match_City 96006/s -- -85%
match_city_2 632111/s 558% --

ripon
ripon
yorba linda
yorba linda
not found
Not Found


#

use strict;
use warnings;
use Benchmark qw:)all) ;
my @CA_Cities = (
"adelanto", "agoura hills", "alameda", "alamo", "albany",
"alhambra",
"newman", "newport beach", "nipomo", "norco", "north auburn",
"north highlands", "norwalk", "novato", "oakdale", "oakland",
"oceanside", "oildale", "ojai", "olivehurst", "ontario", "opal cliffs",
"orange cove", "orangevale", "orcutt", "orinda", "orland", "orosi",
"oroville east", "oxnard", "pacific grove", "pacifica", "palm desert",
"palmdale", "palo alto", "palos verdes estates", "paradise", "paramount",
"parkway-south sacramento", "parlier", "pasadena", "patterson",
"petaluma", "pico rivera", "piedmont", "pinole", "pismo beach",
"placentia", "placerville", "pleasant hill", "pleasanton", "pomona",
"porterville", "portola hills", "poway", "prunedale", "quartz hill",
"ramona",
"rancho cordova", "rancho cucamonga", "rancho mirage",
"rancho san diego", "rancho santa margarita", "red bluff", "redding",
"redlands", "redondo beach",
"redwood city", "reedley", "rialto", "richmond", "ridgecrest",
"rio del mar", "ripon",
"west whittier-los nietos", "westlake village", "westminster",
"westmont",
"whittier", "wildomar", "willowbrook", "willows", "windsor", "winter
gardens","winters", "winton", "woodcrest", "woodlake", "woodland",
"yorba linda", "yreka", "yuba city", "yucaipa", "yucca valley"
);
my %h_Cities = map { $_ => 1 } @CA_Cities;
use integer;
sub match_City {

my $findme = $_[0];
my( $start, $end ) = ( 0, $#CA_Cities );
my $idx ;
$findme = lc $findme;

while ( $start < $end -1) {
$idx = ( $end + $start ) / 2 ;

if ( $CA_Cities[ $idx ] lt $findme ) {
$start = $idx;

} elsif ( $CA_Cities[ $idx ] gt $findme ) {
$end = $idx;

} else {
return $findme;
}
}
return "not found";
}


sub match_city_2{
my $findme = $_[0];
$findme = lc $findme;
if (exists ($h_Cities{$findme})){
return $findme;
} else {
return "Not Found";
}
}
cmpthese(500000, {
'match_City' =>sub{match_City( "fred" )},
'match_city_2' => sub {match_city_2( "fred" )}
});


## Test this
for ("Ripon","yorba linda","fred"){
print "\n" . match_City( $_ );
print "\n" . match_city_2( $_ );
}
 
T

Tad McClellan

one man army said:
It took me a
while to write this simple routine, making most every mistake on the
way. Perhaps it will be useful to someone.
I tried to use qw( "name" "name space" ); and had problems, so I use the
long form.


You cannot use qw() if your strings contain spaces.

Constructive comments welcome.

Hmmm.


#! /usr/bin/perl -w
use warnings;


Keep the pragma, lose the -w.


I reduced the LOC a bit.

Here is my dense, terse, sup smart CRYPTO code.

----------------------------------
#!/usr/bin/perl
use warnings;
use strict;

my %CA_Cities = map { chomp; $_, 1 } <DATA>; # hashes are for lookups

print "\n" . match_City( "ripon" );
print "\n" . match_City( "riPoN" );
print "\n" . match_City( "Ripon" );
print "\n" . match_City( "yorba linda" );
print "\n" . match_City( "Fred" );

sub match_City {
exists $CA_Cities{ lc $_[0] } ? lc $_[0] : "not found\n";
}

__DATA__
adelanto
agoura hills
alameda
alamo
albany
alhambra
newman
newport beach
nipomo
norco
north auburn
north highlands
norwalk
novato
oakdale
oakland
oceanside
oildale
ojai
olivehurst
ontario
opal cliffs
orange cove
orangevale
orcutt
orinda
orland
orosi
oroville east
oxnard
pacific grove
pacifica
palm desert
palmdale
palo alto
palos verdes estates
paradise
paramount
parkway-south sacramento
parlier
pasadena
patterson
petaluma
pico rivera
piedmont
pinole
pismo beach
placentia
placerville
pleasant hill
pleasanton
pomona
porterville
portola hills
poway
prunedale
quartz hill
ramona
rancho cordova
rancho cucamonga
rancho mirage
rancho san diego
rancho santa margarita
red bluff
redding
redlands
redondo beach
redwood city
reedley
rialto
richmond
ridgecrest
rio del mar
ripon
west whittier-los nietos
westlake village
westminster
westmont
whittier
wildomar
willowbrook
willows
windsor
winter
gardens
winters
winton
woodcrest
woodlake
woodland
yorba linda
yreka
yuba city
yucaipa
yucca valley
 
R

robic0

You cannot use qw() if your strings contain spaces.
u mean "
Constructive comments welcome.

Hmmm.


#! /usr/bin/perl -w
use warnings;


Keep the pragma, lose the -w.


I reduced the LOC a bit.

Here is my dense, terse, sup smart CRYPTO code.

----------------------------------
#!/usr/bin/perl
use warnings;
use strict;

my %CA_Cities = map { chomp; $_, 1 } <DATA>; # hashes are for lookups

print "\n" . match_City( "ripon" );
print "\n" . match_City( "riPoN" );
print "\n" . match_City( "Ripon" );
print "\n" . match_City( "yorba linda" );
print "\n" . match_City( "Fred" );

sub match_City {
exists $CA_Cities{ lc $_[0] } ? lc $_[0] : "not found\n";
}

__DATA__
adelanto
agoura hills
alameda
alamo
albany
alhambra
newman
newport beach
nipomo
norco
north auburn
north highlands
norwalk
novato
oakdale
oakland
oceanside
oildale
ojai
olivehurst
ontario
opal cliffs
orange cove
orangevale
orcutt
orinda
orland
orosi
oroville east
oxnard
pacific grove
pacifica
palm desert
palmdale
palo alto
palos verdes estates
paradise
paramount
parkway-south sacramento
parlier
pasadena
patterson
petaluma
pico rivera
piedmont
pinole
pismo beach
placentia
placerville
pleasant hill
pleasanton
pomona
porterville
portola hills
poway
prunedale
quartz hill
ramona
rancho cordova
rancho cucamonga
rancho mirage
rancho san diego
rancho santa margarita
red bluff
redding
redlands
redondo beach
redwood city
reedley
rialto
richmond
ridgecrest
rio del mar
ripon
west whittier-los nietos
westlake village
westminster
westmont
whittier
wildomar
willowbrook
willows
windsor
winter
gardens
winters
winton
woodcrest
woodlake
woodland
yorba linda
yreka
yuba city
yucaipa
yucca valley
----------------------------------
 
D

DJ Stunks

Tad said:
Here is my dense, terse, sup smart CRYPTO code.

omg ur l33t! dat's m4d sk1llz, d00d! w00t! w00t!

lmao
exists $CA_Cities{ lc $_[0] } ? lc $_[0] : "not found\n";

so question: does exists() return faster than simply pulling the value
(since all valid values were previously set to true)? I don't know the
hash internals...

where'd you get that list? could be handy...

Tks,
-jp
 
D

DJ Stunks

DJ said:
where'd you get that list? could be handy...

um, just go ahead and ignore that question.... hahahaha

for some reason I thought this was the "thousands" list doofus
mentioned...

-jp
 
U

Uri Guttman

"r" == robic0 <robic0> writes:

r> u mean "

no, he means qw(). wtf does " have to do with this? oops, it is robic0,
winner of maroon troll of the year.

and of course a 1 line useless reply to a full quoted email.

uri
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,183
Messages
2,570,967
Members
47,520
Latest member
KrisMacono

Latest Threads

Top