Sorting an associative array

Nathan Olson

I've got an associative array whose keys are movie titles. I'd like to sort
the keys in an order that takes into account titles beginning with "A" or
"The." In other words, "The Graduate" ought to be sorted with the 'g's, not
the 't's. Is there any way to do this that doesn't involve sorting manually?


Thanks in advance,
Nate Olson
 
BZ

Nathan Olson wrote in comp.lang.perl.misc:
I've got an associative array whose keys are movie titles. I'd like to sort
the keys in an order that takes into account titles beginning with "A" or
"The." In other words, "The Graduate" ought to be sorted with the 'g's, not
the 't's. Is there any way to do this that doesn't involve sorting manually?

Something like this should work:

sort {
$a =~ s/^(a|the)\s+//;
$b =~ s/^(a|the)\s+//;
$a <=> $b
} keys %hash;
 
Gunnar Hjalmarsson

Nathan said:
I've got an associative array whose keys are movie titles. I'd like
to sort the keys in an order that takes into account titles
beginning with "A" or "The." In other words, "The Graduate" ought
to be sorted with the 'g's, not the 't's.

my @sortedtitles = sort {
($a =~ /(?i:the|a)?\s*(.+)/)[0]
cmp
($b =~ /(?i:the|a)?\s*(.+)/)[0]
} keys %movies;
 
John Bokma

Nathan said:
I've got an associative array whose keys are movie titles. I'd like to sort
the keys in an order that takes into account titles beginning with "A" or
"The." In other words, "The Graduate" ought to be sorted with the 'g's, not
the 't's. Is there any way to do this that doesn't involve sorting manually?

Create a lookup table (an array) of arrays, each holding the title as its
first element and, as its second, the title with "A ", "The ", etc.
removed. Sort it on the *second* element.

Next use the first element to index your assoc array.

Google for 'Schwartzian Transform' for some nice examples.
for example: http://www.stonehenge.com/merlyn/UnixReview/col06.html
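
A minimal sketch of the lookup-table approach described above, written as a Schwartzian Transform. The %movies contents and the year values are hypothetical sample data, and the prefix pattern only covers "A" and "The":

```perl
use strict;
use warnings;

# Hypothetical sample data: title => release year.
my %movies = (
    'The Graduate'        => 1967,
    'A Fish Called Wanda' => 1988,
    'Casablanca'          => 1942,
    'Theodora Goes Wild'  => 1936,
);

my @sorted_titles =
    map  { $_->[0] }                  # 3. keep only the original title
    sort { $a->[1] cmp $b->[1] }      # 2. compare the stripped titles as strings
    map  {
        # 1. build the sort key once per title; the copy leaves $_ untouched
        (my $key = $_) =~ s/^(?:a|the)\s+//i;
        [ $_, $key ];
    }
    keys %movies;

# The original titles are still valid hash keys.
print "$_ ($movies{$_})\n" for @sorted_titles;
```

Each title is stripped exactly once, and the list that comes out still contains the unmodified keys, so it can be used to index the hash directly.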
 
John Bokma

BZ said:
Nathan Olson wrote in comp.lang.perl.misc:


Something like this should work:

sort {
$a =~ s/^(a|the)\s+//;
$b =~ s/^(a|the)\s+//;
$a <=> $b
} keys %hash;

That performs O(n log n) substitutions, because the block runs on every
comparison. It might be faster to create a look-up table, with O(n)
substitutions, and use that to sort.
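
One way to sketch that look-up table is a second hash that maps each title to its stripped form. The %hash contents and the name %sort_key are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data; the values are placeholders.
my %hash = map { $_ => 1 } ('The Birds', "A Bug's Life", 'Alien', 'Brazil');

# One substitution per title: O(n) work before the sort starts.
my %sort_key;
for my $title (keys %hash) {
    (my $stripped = $title) =~ s/^(?:a|the)\s+//i;
    $sort_key{$title} = $stripped;
}

# The comparator now only does hash lookups and string comparisons.
my @sorted = sort { $sort_key{$a} cmp $sort_key{$b} } keys %hash;

print "$_\n" for @sorted;
```

The sort itself still makes O(n log n) comparisons, but each comparison is now a pair of hash lookups rather than two regex substitutions.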
 
Gunnar Hjalmarsson

Gunnar said:
my @sortedtitles = sort {
($a =~ /(?i:the|a)?\s*(.+)/)[0]
cmp
($b =~ /(?i:the|a)?\s*(.+)/)[0]
} keys %movies;

Correction: Make that

my @sortedtitles = sort {
($a =~ /(?:(?i:the|a)\s+)?(.+)/)[0]
cmp
($b =~ /(?:(?i:the|a)\s+)?(.+)/)[0]
} keys %movies;
 
John Bokma

Gunnar said:
Nathan said:
I've got an associative array whose keys are movie titles. I'd like
to sort the keys in an order that takes into account titles
beginning with "A" or "The." In other words, "The Graduate" ought
to be sorted with the 'g's, not the 't's.


my @sortedtitles = sort {
($a =~ /(?i:the|a)?\s*(.+)/)[0]

This fails (?) with Thesomething and Asomething. Yeah, I know that the
specs said A and The., but you missed the dot after The too, so that's
not an excuse :-D

It also strips spaces in front of titles starting with spaces (ok there
probably are none like that)
 
John Bokma

Gunnar said:
Gunnar said:
my @sortedtitles = sort {
($a =~ /(?i:the|a)?\s*(.+)/)[0]
cmp
($b =~ /(?i:the|a)?\s*(.+)/)[0]
} keys %movies;

Correction: Make that

my @sortedtitles = sort {
($a =~ /(?:(?i:the|a)\s+)?(.+)/)[0]
cmp
($b =~ /(?:(?i:the|a)\s+)?(.+)/)[0]
} keys %movies;

"Someone and the thingy"

(don't you need ^ ?)

And how to index %movies? I guess that the OP needs to access the info
in the %movies assoc.
 
Gunnar Hjalmarsson

John said:
Gunnar said:
my @sortedtitles = sort {
($a =~ /(?:(?i:the|a)\s+)?(.+)/)[0]
cmp
($b =~ /(?:(?i:the|a)\s+)?(.+)/)[0]
} keys %movies;

"Someone and the thingy"

(don't you need ^ ?)

No. Unless a key starts with 'A ' or 'The ', (.+) captures the whole
key. (But ^ wouldn't have hurt, for the sake of clarity...)

And how to index %movies? I guess that the OP needs to access the
info in the %movies assoc.

Not sure what you mean. @sortedtitles can now be used to access the
info in %movies in the desired order.
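
A short sketch of that usage, assuming hypothetical %movies data (title => release year) and adding the ^ anchor for clarity. Because the comparator only matches and never substitutes, every element of @sortedtitles is still an exact key:

```perl
use strict;
use warnings;

# Hypothetical sample data: title => release year.
my %movies = (
    'The Graduate'       => 1967,
    'Jaws'               => 1975,
    "A Hard Day's Night" => 1964,
);

# The match returns the captured remainder without modifying $a or $b.
my @sortedtitles = sort {
    ($a =~ /^(?:(?i:the|a)\s+)?(.+)/)[0]
    cmp
    ($b =~ /^(?:(?i:the|a)\s+)?(.+)/)[0]
} keys %movies;

# Walk the hash in the desired order.
for my $title (@sortedtitles) {
    print "$title: $movies{$title}\n";
}
```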
 
Gunnar Hjalmarsson

BZ said:
Nathan Olson wrote in comp.lang.perl.misc:

Something like this should work:

sort {
$a =~ s/^(a|the)\s+//;
$b =~ s/^(a|the)\s+//;
$a <=> $b
} keys %hash;

Did you try it?

- It does not replace case insensitively.
- It sorts strings numerically.

Besides that, since all the elements in the returned list are no
longer an (exact) key in the hash, how would you use the list?
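
A sketch of how the three objections could be addressed while keeping the same overall shape. The %hash contents are hypothetical, and the copies $x and $y are names introduced here for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data; the values are placeholders.
my %hash = ('the Apartment' => 1, 'Vertigo' => 1, 'A Star Is Born' => 1);

my @sorted = sort {
    # Work on copies: $a and $b are aliases to the list elements,
    # so substituting on them directly would alter the returned titles.
    (my $x = $a) =~ s/^(?:a|the)\s+//i;   # /i makes the match case-insensitive
    (my $y = $b) =~ s/^(?:a|the)\s+//i;
    $x cmp $y;                            # cmp compares strings; <=> compares numbers
} keys %hash;

print "$_\n" for @sorted;
```

This still runs two substitutions per comparison, so it addresses correctness only, not the cost of repeated substitutions.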
 
John Bokma

Gunnar said:
John said:
Gunnar said:
my @sortedtitles = sort {
($a =~ /(?:(?i:the|a)\s+)?(.+)/)[0]
cmp
($b =~ /(?:(?i:the|a)\s+)?(.+)/)[0]
} keys %movies;


"Someone and the thingy"

(don't you need ^ ?)

No. Unless a key starts with 'A ' or 'The ', (.+) captures the whole
key. (But ^ wouldn't have hurt, for the sake of clarity...)

Ah, grmbl, indeed.

Not sure what you mean. @sortedtitles can now be used to access the info
in %movies in the desired order.

Indeed, my mistake again, it's a match :-(
 
John W. Krahn

Nathan said:
I've got an associative array whose keys are movie titles. I'd like to sort
the keys in an order that takes into account titles beginning with "A" or
"The." In other words, "The Graduate" ought to be sorted with the 'g's, not
the 't's. Is there any way to do this that doesn't involve sorting manually?

my @sorted_movie_titles =
map { s/^[^\0]+\0//; $_ }
sort
map { (my $x = $_) =~ s/^(?:a|the)\s*//i; "$x\0$_" }
keys %hash;



John
 
Gunnar Hjalmarsson

John said:
map { (my $x = $_) =~ s/^(?:a|the)\s*//i; "$x\0$_" }
------------------------------------------^

Seems as if you made a mistake similar to the one I made in my first post.

map { (my $x = $_) =~ s/^(?:a|the)\s+//i; "$x\0$_" }
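
A small sketch of why the quantifier matters, using hypothetical titles. With \s* the whitespace is optional, so any title that merely begins with the letters "a" or "the" loses them:

```perl
use strict;
use warnings;

my %stripped;
for my $title ('The Graduate', 'Theodora Goes Wild', 'Alien') {
    (my $loose = $title) =~ s/^(?:a|the)\s*//i;   # \s*: whitespace optional
    (my $tight = $title) =~ s/^(?:a|the)\s+//i;   # \s+: whitespace required
    $stripped{$title} = { loose => $loose, tight => $tight };
    printf "%s: \\s* gives [%s], \\s+ gives [%s]\n", $title, $loose, $tight;
}
```

With \s*, "Theodora Goes Wild" is reduced to "odora Goes Wild" and "Alien" to "lien". With \s+, both are left alone and only "The Graduate" is shortened.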
 
John Bokma

Purl said:
I wish he would. He does need to learn Perl.

True, every day I learn more and more Perl. I started 10 years ago, and
wish I had more time to study, now and over the past 10 years. I have stuck
with it for 10 years because there is so much to learn and the
language keeps amazing me.

He is about the only one left harassing our family
on a daily basis. I have noted he has spent around
four to six hours today, probing our server trying
to find a way in.

Funny, I have been awake for a few hours, had breakfast, spent time with my
partner, and read a bit on Usenet.

Psychotic obsession is a type of mental disturbance.

You keep on proving that you know zero about even the basics of
psychology/psychiatry.

Personally, I would much rather discuss Perl

Then why don't you start learning this language?

I am a bit disturbed

Yes, I think by now the entire Perl community is aware of that fact.

Perhaps he will take your advice. Highly doubtful considering
the type of psychosis he and others display.

Most people who suffer from psychosis are able to think correctly and
validly within their reality, which is not that far removed from "reality".
 
John W. Krahn

Gunnar said:
------------------------------------------^

Seems as if you made a mistake similar to the one I made in my first post.

map { (my $x = $_) =~ s/^(?:a|the)\s+//i; "$x\0$_" }

Ah yes, thanks.


John
 
Gunnar Hjalmarsson

Purl said:
Gunnar said:
my @sortedtitles = sort {
($a =~ /(?:(?i:the|a)\s+)?(.+)/)[0]
cmp
($b =~ /(?:(?i:the|a)\s+)?(.+)/)[0]
} keys %movies;

à bientôt will give you fits, but hardly a point of critique.

Both Perl 5.6 and Perl 5.8 choke on accented characters
on my system.

What?? Are you implying that not all movies are made in the US and
have all American titles? ;-)

Assuming that you are right about that, what exactly do you mean by
"choke"? I notice that "use locale;" makes a difference on my box with
respect to the sort order (I'm Swedish, so it probably turns on a
Swedish locale), but I don't get any errors or warnings.
 
Gunnar Hjalmarsson

Purl said:
I am always right.

Sorry, forgot that.

Returned sort order is wrong for accented characters. This is
assuming "a" and "à" should be grouped together, but which should
be first, is a "who's on first" question.

Then it has nothing to do with the sorting code and it has everything
to do with locales. Choose a suitable locale, and you're done.
 
Gunnar Hjalmarsson

Purl said:
Strikes me a and à should be grouped together at the beginning
of a sort list. Which should be first, I do not have a clue. I
suppose for you, à should be before a in a list. Maybe second
because à is a more busy letter?

Actually, 'à' is not part of the Swedish alphabet (we have å, ä and ö
besides the ASCII letters), but intuitively I'd say that 'a' should
come first.

Does your locale group those two together?

Yes.

Playing with the list you posted in another message:

$, = ' ';
print sort qw(a ab b bc à àb);

does not change the order, which is as expected since Perl ignores all
locales by default. But

use locale;
$, = ' ';
print sort qw(a ab b bc à àb);

outputs:
a à ab àb b bc

on my box.
 
