Sort Map on Value

W

Wojtek

I have a Person object. It contains a pKey (unique) and a name (may
repeat, ie John Smith). The Person object will be held in a collection.
New (or old) people will be added in any order, however I want the
output to be sorted by name. Since the name can repeat I cannot use it
as a key, instead I want to use the pKey.

Normally (sorted on the key) I would use a TreeMap, but I want to use
the key to find a Person, yet sort on the Perons name:

TreeMap<String,Person> people = new TreeMap<String,Person>();
....

Person addPerson(String pKey, String name)
{
Person person = people.get(pKey);

if ( person == null )
{
person = new Person(pKey,name);
people.put(pKey,person);
}

return person;
}

I thought of creating my own Comparitor, however the TreeMap insists
that the comparitor needs to sort on the String (pKey) rather than the
value (Person).

I know I can:
- use two collections, one which is used for lookup the other for
sorting
- ignore the TreeMap and use a simple Map, then array sort the
values().toArray(). (Thanks Roedy)
- make a key which is the (name + pKey) but this would create large
keys.

I was hoping there was a more elegant way.

Yes, a Google search turned this up with solutions, however most of
those did not use Generics :-(
 
W

Wojtek

Eric Sosman wrote :
Wojtek said:
I have a Person object. It contains a pKey (unique) and a name (may repeat,
ie John Smith). The Person object will be held in a collection. New (or
old) people will be added in any order, however I want the output to be
sorted by name. Since the name can repeat I cannot use it as a key, instead
I want to use the pKey.

Normally (sorted on the key) I would use a TreeMap, but I want to use the
key to find a Person, yet sort on the Perons name:

Eva first, then Juan. :)

Map<PKey,Person> map = ...
Person p1 = ...
map.put(p1.getKey(), p1);
Person p2 = ...;
map.put(p2.getKey(), p2);
...

Person[] array = map.values().toArray(new Person[0]);
Arrays.sort(array, new Comparator<Person>() {
public int compare(Person p1, Person p2) {
return p1.getName().compareTo(p2.getName());
}
});
for (Person p : array) {
... visit in name order ...
}

In other words, just maintain the Map in the perfectly ordinary
way. When you want to traverse all the Persons in name order (which
is inherently a "batch" or "mass" operation), grab them from the Map,
sort them in whatever order you like, and traverse to heart's content.

Which is what Roedy's solution stated:
 
A

Andreas Leitgeb

I haven't tried it myself, but isn't the TreeMap designed to be
able to use a custom Ordering? Create a Comparator like Eric did
(for Arrays.sort), but pass it to the TreeMap constructor...

Map persons=new TreeMap<Person>(new Comparator<Person>() {
public int compare(Person p1, Person p2) {
return p1.getName().compareTo(p2.getName());
}
});
persons.put( ... );
 
A

Andreas Leitgeb

Andreas Leitgeb said:
I haven't tried it myself, but isn't the TreeMap designed to be
able to use a custom Ordering? Create a Comparator like Eric did
(for Arrays.sort), but pass it to the TreeMap constructor...
Map persons=new TreeMap<Person>(new Comparator<Person>() {
public int compare(Person p1, Person p2) {
return p1.getName().compareTo(p2.getName());
}
});
persons.put( ... );


Oops, my mistake, TreeMap's Comparable is for the keys, not for the values.
but you can do a custom comparable that just looks up the keys in the treemap
to get to the name for compareTo().

Unless I'm missing something else this time.

It depends on how often you need the sorted sequence. If rarely, then the
separate array may actually be better. If you need it often, or just the first
or last one each time after another person has been added or removed, then my
approach (if it even works) may be better performant. :)

Also you can add each Person to two structures: a map for quick retrieval per
key, and some sorted list, that's perhaps faster to keep sorted, than sorting
it from scratch after every add/remove.
 
T

Tom Anderson

I have a Person object. It contains a pKey (unique) and a name (may
repeat, ie John Smith). The Person object will be held in a collection.
New (or old) people will be added in any order, however I want the
output to be sorted by name. Since the name can repeat I cannot use it
as a key, instead I want to use the pKey.

Normally (sorted on the key) I would use a TreeMap, but I want to use
the key to find a Person, yet sort on the Perons name:

TreeMap<String,Person> people = new TreeMap<String,Person>();
...

Person addPerson(String pKey, String name)
{
Person person = people.get(pKey);

if ( person == null )
{
person = new Person(pKey,name);
people.put(pKey,person);
}

return person;
}

I thought of creating my own Comparitor, however the TreeMap insists that the
comparitor needs to sort on the String (pKey) rather than the value (Person).

I know I can:
- use two collections, one which is used for lookup the other for sorting
- ignore the TreeMap and use a simple Map, then array sort the
values().toArray(). (Thanks Roedy)
- make a key which is the (name + pKey) but this would create large keys.

They wouldn't be large - you could write a little class which holds a
reference to a pKey and a name; since these are both part of the Person
anyway, the overhead of the key object is just an object header and two
pointers. Cheap as chips!

The problem is that there's no way to do lookups just by pKey - you'd need
to know the name to be able to make a key to do a lookup. Since the
comparison would have to be by name, with ties being broken by pKey,
there's no way to cheat and do a lookup just with the pKey (without doing
a linear search).
I was hoping there was a more elegant way.

Do you allocate the pKeys? If there was a way to allocate them such that
they were monotonic with the names, then you could just use the pKeys as
keys, and a map sorted by pKey would be sorted by name too. A simple way
to do this would be to make the pKeys a name with a per-name counter
appended to the end. If you're worried about size, compress the name
somehow (i posted a funky coding scheme that is basically order-preserving
Huffman a while ago), although this may be too complicated. Or just use
the little pair class mentioned above.

You could copy and paste LinkedHashMap, and modify it to keep its linked
list in sorted order (using a supplied Comparator), rather than insertion
order. That would involve an O(n^2) sort, though, which would lose heavily
if you do a large proportion of puts.

tom
 
J

Joshua Cranmer

Andreas said:
Oops, my mistake, TreeMap's Comparable is for the keys, not for the values.
but you can do a custom comparable that just looks up the keys in the treemap
to get to the name for compareTo().

I believe the TreeMap would be calling the compare function in looking
up a key, so you'd need a different map to back the lookup.
 
L

Lew

Eric said:
Wojtek said:
Eric Sosman wrote :
Wojtek wrote:
[...]
In other words, just maintain the Map in the perfectly ordinary
way. When you want to traverse all the Persons in name order (which
is inherently a "batch" or "mass" operation), grab them from the Map,
sort them in whatever order you like, and traverse to heart's content.

Which is what Roedy's solution stated:
ignore the TreeMap and use a simple Map, then array sort the
values().toArray(). (Thanks Roedy)

Right. But Wojtek also expressed some uncertainty about how to use
generics in this solution, so I thought it'd be helpful to illustrate
more fully.

Or you could convert values() to a List that you Collections.sort().
 
A

Andreas Leitgeb

Joshua Cranmer said:
I believe the TreeMap would be calling the compare function in looking
up a key, so you'd need a different map to back the lookup.

Why that? Isn't the lookup by key still hashCode()/equals()-based?

Maybe indeed I'd need a different map for lookup, but then more likely
because during a put()-operation it will need compare() and by that
time, we might find the TreeMap in an inconsistent state for a get().
 
L

Lew

Andreas said:
Why that? Isn't the lookup by key still hashCode()/equals()-based?

Maybe indeed I'd need a different map for lookup, but then more likely
because during a put()-operation it will need compare() and by that
time, we might find the TreeMap in an inconsistent state for a get().
 
W

Wojtek

Tom Anderson wrote :
They wouldn't be large - you could write a little class which holds a
reference to a pKey and a name; since these are both part of the Person
anyway, the overhead of the key object is just an object header and two
pointers. Cheap as chips!

Hmm, so I could use:

TreeMap<Person,Person> with a custom Comparitor using person.getName(),
and an overridden Person.hashCode() returning a hash of the pKey, and
then for a lookup:

people.get(new Person(pKey,"");
 
W

Wojtek

Wojtek wrote :
Tom Anderson wrote :

Hmm, so I could use:

TreeMap<Person,Person> with a custom Comparitor using person.getName(), and
an overridden Person.hashCode() returning a hash of the pKey, and then for a
lookup:

people.get(new Person(pKey,""));

Wait, I need to handle hash collisions. WHat does a Map use for hash
collisions? The object serial number? Anything I can override?
 
L

Lew

Wojtek said:
Wait, I need to handle hash collisions. WHat does a Map use for hash
collisions? The object serial number? Anything I can override?

Equality.

Logically, presence in the map is determined by 'hashCode()' to find the
bucket, and 'equals()' to find which item in that bucket list equals the test
object. A 'put()' will put an absent object in the bucket list at the next
available location. According to the Javadocs, a 'SortedMap' such as
'TreeMap' will use a 'compare()' or 'compareTo()' to do the equality
comparison, so such methods must be consistent with 'equals()'.
 
W

Wojtek

Lew wrote :
Equality.

Logically, presence in the map is determined by 'hashCode()' to find the
bucket, and 'equals()' to find which item in that bucket list equals the test
object. A 'put()' will put an absent object in the bucket list at the next
available location. According to the Javadocs, a 'SortedMap' such as
'TreeMap' will use a 'compare()' or 'compareTo()' to do the equality
comparison, so such methods must be consistent with 'equals()'.

Ok, so this should work then:

class PersonMapKey implements Comparable<PersonMapKey>
{
private String ivKey;
private String ivName;

PersonMapKey( String key, String name )
{
super();

ivKey = key;
ivName = name;
}

private String getKey()
{
return ivKey;
}

private String getName()
{
return ivName;
}

@Override
public int hashCode()
{
return ivKey.hashCode();
}

@Override
public boolean equals( Object key )
{
return ivKey.equals( ((PersonMapKey) key).getKey() );
}

@Override
public int compareTo( PersonMapKey personKey )
{
return ivName.compareTo( personKey.getName() );
}
}

....

Person addPerson( String key String name )
{
PersonMapKey mapKey = new PersonMapKey( key, name );
Person person = ivPersonMap.get( mapKey );

if person == null)
{
person = new Person( key, name );
ivPersonMap.put( mapKey, person );
}

return person;
}
 
L

Lew

Wojtek said:
Ok, so this should work then:

class PersonMapKey implements Comparable<PersonMapKey>
{
private String ivKey;
private String ivName;

PersonMapKey( String key, String name )
{
super();

This idiom puzzles me. Why call 'super()' explicitly? Particularly since the
class extends 'Object'?
ivKey = key;
ivName = name;
}

private String getKey()
{
return ivKey;
}

private String getName()
{
return ivName;
}

@Override
public int hashCode()
{
return ivKey.hashCode();
}

@Override
public boolean equals( Object key )
{
return ivKey.equals( ((PersonMapKey) key).getKey() );
}

@Override
public int compareTo( PersonMapKey personKey )
{
return ivName.compareTo( personKey.getName() );
}

Is this consistent with 'equals()'?
 
A

Andreas Leitgeb

And that after I believed to know all the Maps by heart already...

I stand corrected.
 
L

Lew

Andreas said:
And that after I believed to know all the Maps by heart already...

I stand corrected.

I'm pretty sure that the business about using 'compare[To]()' applies only to
SortedMaps.
 
A

Andreas Leitgeb

Lew said:
Andreas said:
And that after I believed to know all the Maps by heart already...
I stand corrected.
I'm pretty sure that the business about using 'compare[To]()' applies only to
SortedMaps.

Yes, of course. I thought I had known those<* extends SortedMap>, as well. :)
Now back to studying...
 
D

Donkey Hottie

Lew said:
Or you could convert values() to a List that you
Collections.sort().

That would be extra code to run. The source code of Collections.sort() shows
that the List will be converted into an array, and then sorted using
Arrays.sort().

Just saying.

(I discovered this when I was wondering how sorting a LinkedList might work,
as it takes the same amount of time than sorting an ArrayList.. they get
sorted via an array).
 
A

Andreas Leitgeb

Donkey Hottie said:
(I discovered this when I was wondering how sorting a LinkedList might work,
as it takes the same amount of time than sorting an ArrayList.. they get
sorted via an array).

As another sidenote, quicksort could be implemented directly on (doubly-)linked
lists, if I happen not to be mistaken this time.

Java prefers (for good reasons) a stable sort like mergesort, which furthermore
also hasn't got those O(n^2) worst cases, so it doesn't really matter what could
work with linked lists, unless one implements his own quicksort.
 
E

Eric Sosman

Andreas said:
As another sidenote, quicksort could be implemented directly on (doubly-)linked
lists, if I happen not to be mistaken this time.

Java prefers (for good reasons) a stable sort like mergesort, which furthermore
also hasn't got those O(n^2) worst cases, so it doesn't really matter what could
work with linked lists, unless one implements his own quicksort.

Topicality drift: Quicksort on a singly- or doubly-linked list
can be implemented stably; in fact, stability pretty much "falls out"
of the obvious implementation. (So does "Dutch national flag"
partitioning.)

The hard part in a list-based Quicksort is choosing a good
partitioning key. Sampling schemes like the familiar "median of
first, middle, last" require navigating through the list to obtain
the samples. "Pick a random node's key" similarly requires
traversing half the list on average (a quarter if it's doubly
linked). Choosing a random key for stage S while performing the
partitioning at stage S-1 requires generating a random number for
every item.

As far as I can see, you're stuck with O(N) "overhead" just to
choose a suitable partitioning key for an N-node sub-list. (Well,
I guess you *could* just grab the key of the list's first node in
O(1) time, but the risk of degeneracy rises dramatically ...)

The reason Quicksort is "quick" on arrays is that its inner
loops are short and have good locality; bloat the inner loops with
random number generation, or apply them to a linked list that's
scattered all over memory, and Quicksort starts to lose its appeal.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,810
Latest member
Kassie0918

Latest Threads

Top