Searching a sorted array of strings

O

one man army

Hi All-
I am an infrequent user of Perl. I sat down this morning to do a
simple task, which started with referring to some data. It took me a
while to write this simple routine, making most every mistake on the
way. Perhaps it will be useful to someone.
I tried to use qw( "name" "name space" ); and had problems, so I use the
long form. Constructive comments welcome.

The array is almost a thousand elements actually, edited for usenet
<- snip ->
#! /usr/bin/perl -w

use strict;
use warnings;

my @CA_Cities = (
"adelanto", "agoura hills", "alameda", "alamo", "albany", "alhambra",
"newman", "newport beach", "nipomo", "norco", "north auburn",
"north highlands", "norwalk", "novato", "oakdale", "oakland",
"oceanside", "oildale", "ojai", "olivehurst", "ontario", "opal cliffs",
"orange cove", "orangevale", "orcutt", "orinda", "orland", "orosi",
"oroville east", "oxnard", "pacific grove", "pacifica", "palm desert",
"palmdale", "palo alto", "palos verdes estates", "paradise", "paramount",
"parkway-south sacramento", "parlier", "pasadena", "patterson",
"petaluma", "pico rivera", "piedmont", "pinole", "pismo beach",
"placentia", "placerville", "pleasant hill", "pleasanton", "pomona",
"porterville", "portola hills", "poway", "prunedale", "quartz hill",
"ramona",
"rancho cordova", "rancho cucamonga", "rancho mirage",
"rancho san diego", "rancho santa margarita", "red bluff", "redding",
"redlands", "redondo beach",
"redwood city", "reedley", "rialto", "richmond", "ridgecrest",
"rio del mar", "ripon",
"west whittier-los nietos", "westlake village", "westminster",
"westmont",
"whittier", "wildomar", "willowbrook", "willows", "windsor", "winter
gardens","winters", "winton", "woodcrest", "woodlake", "woodland",
"yorba linda", "yreka", "yuba city", "yucaipa", "yucca valley"
);

sub match_City {

my( $findme ) = @_;
my( $start, $end ) = ( 0, $#CA_Cities );
my( $idx );
my $cnt = 0;

# convert to lowercase
$findme =~ s/([A-Z]+)/\L$1/g;
print "\nSEARCH\n";
print "findme= " . $findme . "\n";

while ( $start != $end && $cnt < $#CA_Cities) {

#note start and end never become equal in alot of cases
$idx = int (( $end - $start ) / 2) + $start;

print "a= " . $CA_Cities[ $idx ] . " ";

if ( $CA_Cities[ $idx ] eq $findme ) {
return $findme;

} elsif ( $CA_Cities[ $idx ] gt $findme ) {
$end = $idx;

} elsif ( $CA_Cities[ $idx ] lt $findme ) {
$start = $idx;
}

print "idx= " . $idx . " start= " . $start . " end= " . $end . "
cnt= " . $cnt ."\n";
$cnt++;
}

return "not found\n";
}

## Test this
print "\n" . match_City( "ripon" );
print "\n" . match_City( "riPoN" );
print "\n" . match_City( "Ripon" );
print "\n" . match_City( "yorba linda" );
print "\n" . match_City( $CA_Cities[ int(rand $#CA_Cities) ] );
print "\n" . match_City( $CA_Cities[ int(rand $#CA_Cities) ] );
print "\n" . match_City( $CA_Cities[ int(rand $#CA_Cities) ] );
print "\n" . match_City( "Fred" );
 
P

Paul Lalli

one said:
Subject: Searching a sorted array of strings

Did you check the Perl FAQ before sitting down to write your own
routine?

$ perldoc -q contain
Found in /opt/perl/lib/5.6.1/pod/perlfaq4.pod
How can I tell whether a list or array contains a certain
element?
I am an infrequent user of Perl. I sat down this morning to do a
simple task, which started with referring to some data. It took me a
while to write this simple routine, making most every mistake on the
way. Perhaps it will be useful to someone.
I tried to use qw( "name" "name space" ); and had problems,

You can't use the qw// shortcut for strings that contain a space.
Sorry, you just can't.
so I use the
long form. Constructive comments welcome.

The array is almost a thousand elements actually, edited for usenet
<- snip ->
#! /usr/bin/perl -w

use strict;
use warnings;

my @CA_Cities = (
"adelanto", "agoura hills", "alameda", "alamo", "albany", "alhambra",

<snip>

my %CA_Cities = map { $_ => 1 } ("adelanto", "agoura hills", "alameda",
alamo" said:
sub match_City {

my( $findme ) = @_;

# convert to lowercase
$findme =~ s/([A-Z]+)/\L$1/g;

$findme = lc $findme;
print "\nSEARCH\n";
print "findme= " . $findme . "\n";

if (exists ($CA_Cities{$findme}){
return $findme;
} else {
return "Not Found\n";
}

}
## Test this
print "\n" . match_City( "ripon" );
print "\n" . match_City( "riPoN" );
print "\n" . match_City( "Ripon" );
print "\n" . match_City( "yorba linda" );
print "\n" . match_City( $CA_Cities[ int(rand $#CA_Cities) ] );
print "\n" . match_City( $CA_Cities[ int(rand $#CA_Cities) ] );
print "\n" . match_City( $CA_Cities[ int(rand $#CA_Cities) ] );
print "\n" . match_City( "Fred" );

This is why you check the FAQ for a language *before* you start using
it - particularly if you're not especially familiar with that language.

Paul Lalli
 
O

one man army

Paul Lalli said:
Did you check the Perl FAQ before sitting down to write your own
routine?

$ perldoc -q contain
Found in /opt/perl/lib/5.6.1/pod/perlfaq4.pod
How can I tell whether a list or array contains a certain
element?
Yep. Not sure why perldoc fails. here is what my terminal said

b$ perldoc -q contains
No documentation found for "perlfaq1".
No documentation found for "perlfaq2".
No documentation found for "perlfaq3".
No documentation found for "perlfaq4".
No documentation found for "perlfaq5".
No documentation found for "perlfaq6".
No documentation found for "perlfaq7".
No documentation found for "perlfaq8".
No documentation found for "perlfaq9".
b$ perldoc -q contain
No documentation found for "perlfaq1".
No documentation found for "perlfaq2".
No documentation found for "perlfaq3".
No documentation found for "perlfaq4".
No documentation found for "perlfaq5".
No documentation found for "perlfaq6".
No documentation found for "perlfaq7".
No documentation found for "perlfaq8".
No documentation found for "perlfaq9".


Not sure why the web fails:

http://perldoc.perl.org/search.html?q=contain

-- Perl 5.8.6 documentation --

Search results

No matches found for your query "contain"
 
O

one man army

and I did look at that entry, actually, after both searches failed. And
that entry says "use a hash, here is how" or "use grep, which is slower."

So my example stands, it is what I said it is, searching a sorted
array of strings in Perl.
 
O

one man army

Paul Lalli said:
Did you check the Perl FAQ before sitting down to write your own
routine?
This is why you check the FAQ for a language *before* you start using
it - particularly if you're not especially familiar with that language.

Paul Lalli
<< FLAME >>
actually, I have the (rambling) Camel book, and the Perldoc FAQ online
right here. I finished this routine, Paul, and posted it to share. Did I
ask for constructive comments? Why did I know I would immediately be met
with "RTFM" and "you cant do that"? anyway, enough of that...
<< END FLAME >>
 
D

DJ Stunks

one said:
and I did look at that entry, actually, after both searches failed. And
that entry says "use a hash, here is how" or "use grep, which is slower."

So my example stands, it is what I said it is, searching a sorted
array of strings in Perl.

I don't understand...

Are you asking us how you _should_ search for a string in an array?
because if so the answer *is* use a hash and replace your entire
subroutine with a simple membership test.

Or, are you posting like some kind of a Public Service Announcement
-slash- FYI -slash- GPL code contribution on how one _could_ go about
searching for a string in an array? because that's going to start
another 50+ message thread...

The FAQ answers were generated for a reason, and by better coders than
you or I.

-jp
 
D

DJ Stunks

one said:
<< FLAME >>
actually, I have the (rambling) Camel book, and the Perldoc FAQ online
right here. I finished this routine, Paul, and posted it to share. Did I
ask for constructive comments? Why did I know I would immediately be met
with "RTFM" and "you cant do that"? anyway, enough of that...
<< END FLAME >>

oh, I see. it's like that.

well, allow me to kick this shit off right away. :)

why do you think we need advice from "an infrequent user of Perl" who
"[makes] most every mistake on the way" and who writes semi-coherent
36-line subroutines for something which would be done in 1 very
coherent line if only he were able to actually RTFM instead of bitching
about being told to RTFM.

later fool,
say hi to robic and mark for me :)

-jp
 
P

Paul Lalli

one said:
<< FLAME >>

Well, good of you to let us all know what kind of person you are in
advance.
actually, I have the (rambling) Camel book, and the Perldoc FAQ online
right here.

Shame you didn't use them.
I finished this routine, Paul, and posted it to share.

Yes, you did. I'm still confused as to why. The Perl FAQ I pointed
you to has a far better solution to your stated goal. I am at a loss
as to what your point is.
Did I ask for constructive comments?

Yes. Which is exactly what I gave. There is nothing more constructive
than telling future readers "This solution is sub-optimal. Here is a
better one."
Why did I know I would immediately be met with "RTFM"

Probably because you didn't... or at least, didn't take the next
natural step and follow the advice given in TFM.
and "you cant do that"?

Well, since you snipped the portion of my reply that's relavent here,
let me explain. You stated you had tried to use the qw// shortcut.
That is admirable, as it's a useful tool. But you also stated that you
encountered problems (neglecting, btw, to expand upon what those
problems were - but I did extrapolate). My "You can't do that" was
intended to inform you that you did nothing wrong in your attempt to
use qw//, that it is a feature/bug/limitation of the operator that it
cannot be used for strings which contain a space character. I do
apologize if you misinterpreted that statement.
anyway, enough of that...

Quite.

Paul Lalli
 
P

Paul Lalli

one said:
Yep. Not sure why perldoc fails. here is what my terminal said

b$ perldoc -q contains
No documentation found for "perlfaq1".

Your perldoc installation is broken. I suggest you speak with your
system administrator about repairing it.
Not sure why the web fails:

http://perldoc.perl.org/search.html?q=contain

-- Perl 5.8.6 documentation --

Search results

No matches found for your query "contain"

Honestly, that surprises me too. Apparently that search is optimized
only for whole-word queries. Regardless, the fact that I typed
"perldoc -q contains" was not meant to suggest that you should have
typed that exact phrase. I was merely pointing you to the FAQ you
would have found if you had browsed the Perl FAQ before beginning your
task:

http://perldoc.perl.org --> FAQs
http://perldoc.perl.org/index-faq.html --> "Perlfaq4 - Data
Manipulation"
How can I tell whether a certain element is contained in a list or
array?

Paul Lalli
 
O

one man army

DJ Stunks said:
I don't understand...

I am not saying anyone should do anything, of course not! Its not that
complicated... I wanted to do something, here is the working code to do
that. Maybe someone will search and find it. Maybe no one cares.
Whatever. It seems like an array of strings and searching it _is_ a
reasonable thing to request in a programming language. Just because a
Hash is easy in Perl doesn't make it an array of strings. BUT MINUTIA, I
dont really care.

Since I am getting the hang of Perl today, things like string compare,
the number of elements in an array, using the er, subroutine
capabilities of Perl to get an arguement - thats all good. Beginner
land, remember? :)

And actually, I am a really great coder. Just not in Perl. :)

So you say really preprocess the array to make it a Hash, like

%CA_Cities = ( 'San Francisco' => 1, 'Los Angeles' => 2 );

or

@CA_Cities = ( "San Francisco", 1, 'Los Angeles', 2 );
%CA2 = @CA_Cities;

and that is the WAY. I can try that, too.
 
D

DJ Stunks

one said:
I finished this routine, Paul, and posted it to share. Did I
ask for constructive comments?

lmao

if only there were some way we could go back in time.......

oh wait:
Hi All-
I am an infrequent user of Perl. I sat down this morning to do a
simple task, which started with referring to some data. It took me a
while to write this simple routine, making most every mistake on the
way. Perhaps it will be useful to someone.
I tried to use qw( "name" "name space" ); and had problems, so I use the
long form. Constructive comments welcome.

no wonder you can't code worth shit, you've got amnesia! you better
get yourself to the OR, stat!

-jp
 
A

A. Sinan Unur

It took me a while to write this simple routine, making most every
mistake on the way. Perhaps it will be useful to someone.

Not likely.
Constructive comments welcome.

Don't reinvent the wheel.
#! /usr/bin/perl -w

use strict;
use warnings;

my @CA_Cities = (

As the Paul mentioned, use a hash.

Sinan
 
P

Paul Lalli

one said:
Whatever. It seems like an array of strings and searching it _is_ a
reasonable thing to request in a programming language. Just because a
Hash is easy in Perl doesn't make it an array of strings. BUT MINUTIA, I
dont really care.

You're not getting this. What the FAQ is telling you is that if you
already have an array of strings, the best way to search it is to first
convert that array into a hash.
Since I am getting the hang of Perl today, things like string compare,
the number of elements in an array, using the er, subroutine
capabilities of Perl to get an arguement - thats all good. Beginner
land, remember? :)

Do you seriously not realize how exceedingly arrogant it is to admit
that you are an extreme beginner and in the same breath suggest that
you hope someone else finds your code and uses it? Do you not think
that there are many more seasoned and experienced programmers who have
better solutions to such problems?

And actually, I am a really great coder. Just not in Perl. :)

Good. So get better at it, and *then* start offering your advice.
Until then, learn.
So you say really preprocess the array to make it a Hash, like

%CA_Cities = ( 'San Francisco' => 1, 'Los Angeles' => 2 );

or

@CA_Cities = ( "San Francisco", 1, 'Los Angeles', 2 );
%CA2 = @CA_Cities;

No. No one is telling you you have to redefine your entire structure.
You are perfectly able to use your existing structure.

@CA_Cities = ('San Francisco', 'Los Angeles', 'Anaheim', ); #etc - as
defined before
%CA2 = map {$_ => 1} @CA_Cities;

Paul Lalli
 
O

one man army

DJ Stunks said:
well, allow me to kick this shit off right away. :)

why do you think we need advice from "an infrequent user of Perl"

?!? How is posting a simple working routine giving you advice? That
FAQ entry is not that great, so 'RTFM' is not a huge contribution to the
subject, actually.
later fool,

a fool for writing some working code and posting it?

This is weird. Did I say I was just trying some simple stuff up front?
Bravado for its own sake? no learning in that...
 
D

DJ Stunks

one said:
I wanted to do something, here is the working code to do
that. Maybe someone will search and find it.

first, it's a FAQ because it's Frequently Asked. So I imagine many
people will search for this topic. With any luck, anyone finding your
crummy so-called answer will also see Paul's response.

Since I am getting the hang of Perl today,
Beginner land, remember? :)

non-asshole beginners find that if they listen to the people who
actually know what they're talking about they learn much faster.
And actually, I am a really great coder. Just not in Perl. :)

How many more of these "great coders" can we take this week? GOD.

So you say really preprocess the array to make it a Hash, like

%CA_Cities = ( 'San Francisco' => 1, 'Los Angeles' => 2 );

why would you make Los Angeles correspond to a 2? what's the matter
with you?

*sigh*
-jp
 
O

one man army

Paul Lalli said:
You're not getting this. What the FAQ is telling you is that if you
already have an array of strings, the best way to search it is to first
convert that array into a hash.

....
No. No one is telling you you have to redefine your entire structure.
You are perfectly able to use your existing structure.

@CA_Cities = ('San Francisco', 'Los Angeles', 'Anaheim', ); #etc - as
defined before
%CA2 = map {$_ => 1} @CA_Cities;
_that_ looks useful
Paul Lalli

You're not getting this.

You're not getting me. I am not arrogant, I said I was using the
language today, and here is something that works. Whats arrogant about
that? Where did I say I was telling anybody to do it this way? This is
twiste..
I stood up for myself ny telling you RTFM was not such a big
contribution, for that you have to character assisinate? no way.
 
P

Paul Lalli

one said:
?!? How is posting a simple working routine giving you advice?

!!! You started this thread with, and I quote:
You don't consider that giving advice to future people with the same
problem?
That
FAQ entry is not that great, so 'RTFM' is not a huge contribution to the
subject, actually.

If you believe the FAQ entry is sub-optimal, by all means, explain what
you consider to be its deficiencies, start a conversation in which you
can suggest some modifications, and possibly even submit a patch via
perlbug.

Perl is open-source afterall.
a fool for writing some working code and posting it?

A fool for posting sub-optimal code, suggesting/hoping that people use
it, and for not checking the FAQ and following its advice before
embarking on a programming task.
This is weird. Did I say I was just trying some simple stuff up front?

Yes. And? Simple stuff shouldn't be done as well as complicated
stuff?

Paul Lalli
 
P

Paul Lalli

one said:
_that_ looks useful

It is. Shocking, I know, that the answer given in the FAQ might be
useful. Again, quoting from the FAQ:
===========================
That being said, there are several ways to approach this. If you are
going to make this query many times over arbitrary string values, the
fastest way is probably to invert the original array and maintain a
hash whose keys are the first array's values.

@blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
for (@blues) { $is_blue{$_} = 1 }
==========================

Which is simply a different way of writing the same thing I did.
You're not getting me. I am not arrogant,

I'm sorry, but you are. Just because you don't realize it does not
make it so.
I said I was using the
language today, and here is something that works.

Yes, but it doesn't work as well as the existing solutions.
Whats arrogant about
that? Where did I say I was telling anybody to do it this way?

In your original post, in which you hoped someone would find and use
the code you had just written.
This is twiste..
I stood up for myself ny telling you RTFM was not such a big
contribution,

Really? Even after you just admitted that TFM was "useful"? This has
nothing to do with "standing up for oneself". This is about optimal
programming methods, and suggesting that people use sub-optimal
methods, because you either didn't find or didn't understand the FAQ.
for that you have to character assisinate? no way.

I have done no such thing.

Paul Lalli
 
A

A. Sinan Unur

I wanted to do something, here is the working code to
do that. Maybe someone will search and find it. Maybe no one cares.
Whatever. It seems like an array of strings and searching it _is_ a
reasonable thing to request in a programming language. Just because a
Hash is easy in Perl doesn't make it an array of strings. BUT MINUTIA,
I dont really care.

So, you are not really planning to be around when some inexperienced
programmer finds, and uses your code, huh? Will you provide free and
unlimited support for this program?

Sinan
 
R

RedGrittyBrick

one said:
So you say really preprocess the array to make it a Hash, like

%CA_Cities = ( 'San Francisco' => 1, 'Los Angeles' => 2 );

Unless I have misread the thread, it was Paul Lalli who suggested:

my %CA_Cities = map { $_ => 1 } ("adelanto", "agoura hills", "alameda",
"alamo", "albany", "alhambra", <etc>);

which is different in that some sequence numbering of cities not
recorded (since order presumably is of no special interest).
or

@CA_Cities = ( "San Francisco", 1, 'Los Angeles', 2 );
%CA2 = @CA_Cities;

and that is the WAY. I can try that, too.

You have a few constructions where I don't understand why you do things
that way:

I see no need to use double-quotes around "San Francisco". That kind of
inconsistency is something I try to avoid.

Furthermore your
@CA_Cities = ( 'San Francisco', 1, 'Los Angeles', 2 );
is the same as your earlier
@CA_Cities = ( 'San Francisco' => 1, 'Los Angeles' => 2 );
so it makes sense to stick to one style, the latter notation is more
readable to me at least.

Both expressions (with pairs of strings and numbers) are more naturally
used to assign values directly to a hash, so you could replace your
@CA_Cities = ( 'San Francisco', 1, 'Los Angeles', 2 );
%CA2 = @CA_Cities;
with
%CA2 = ( 'San Francisco', 1, 'Los Angeles', 2 );
avoiding the need for an intermediate array. Again I'd use the 'foo'=>1
form.

I'd simply have a hash rather than both a hash *and* an array each
containing the same list of city names. Unless I thought that there was
some useful benefit in storing a list of city names twice over.

I think Pauls suggestion is a lot less typing and the resulting hash
seems to me to be equally useful for the purpose stated (and trivially
modifiable for an ascending sequence number if needed for some other
unstated reason).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,969
Messages
2,570,161
Members
46,708
Latest member
SherleneF1

Latest Threads

Top