HashMap with primitive int key

Andrew E · Nov 28, 2005

Dear All,

I have a class like this:

class CustomerOrder {
int globalID;
...
}

I have an array of these.
I'd like to store an index of globalIDs, so I can do:

CustomerOrder o = index.get(123)

I could modify my class to be:

class CustomerOrder {
Integer globalID;
...
}

Perhaps it is my C++ background, but the idea of allocating an object for every
ID when I can just use an int bothers me.

I figure a HashMap is the right tool of choice, e.g.:

HashMap<Integer, CustomerOrder> index = new ...

Has anyone out there already written a modified hashmap class that uses int as
the key?

Thanks for any tips

Andrew

Oliver Wong · Nov 28, 2005

Andrew E said:
Dear All,

I have a class like this:

class CustomerOrder {
int globalID;
...
}

I have an array of these.
I'd like to store an index of globalIDs, so I can do:

CustomerOrder o = index.get(123)

I could modify my class to be:

class CustomerOrder {
Integer globalID;
...
}

Perhaps it is my C++ background, but the idea of allocating an object for
every
ID when I can just use an int bothers me.

I figure a HashMap is the right tool of choice, e.g.:

HashMap<Integer, CustomerOrder> index = new ...

Has anyone out there already written a modified hashmap class that uses
int as
the key?

Thanks for any tips

If you use Integer.valueOf(int), it'll re-use instances of Integer when
possible rather than creating a new instance every time with the
constructor. See the Flyweight design pattern.

- Oliver

Chris Smith · Nov 28, 2005

Andrew E said:
I could modify my class to be:

class CustomerOrder {
Integer globalID;
...
}

Perhaps it is my C++ background, but the idea of allocating an object for every
ID when I can just use an int bothers me.

It's possible to write a custom hashing class that uses int as the key
type... but it's probably not a good idea. Instead, just eat the cost
of the new objects. Short-lived object allocation has very different
performance characteristics in C++ and Java, so your intuition may be
off here.

In any case, it's easy enough to fix if you find out it's a performance
problem.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

zero · Nov 29, 2005

Andrew E said:
Dear All,

I have a class like this:

class CustomerOrder {
int globalID;
...
}

I have an array of these.
I'd like to store an index of globalIDs, so I can do:

CustomerOrder o = index.get(123)

I could modify my class to be:

class CustomerOrder {
Integer globalID;
...
}

Perhaps it is my C++ background, but the idea of allocating an object
for every ID when I can just use an int bothers me.

I figure a HashMap is the right tool of choice, e.g.:

HashMap<Integer, CustomerOrder> index = new ...

Has anyone out there already written a modified hashmap class that
uses int as the key?

Thanks for any tips

Andrew

In JDK 1.5 your original code would work fine because Java creates an
Integer object automatically. Of course this doesn't really change
anything, it just hides the object from you.

Obviously, a primitive int does not have a hash code - unless you define
the int itself as its own hash code. It wouldn't be too hard to re-
implement HashMap to use a raw int as key, but this would only be a good
idea if you can be sure you won't have too many collisions. And even
then, I think it wouldn't be worth the effort.

Andrew E · Nov 29, 2005

Chris said:
It's possible to write a custom hashing class that uses int as the key
type... but it's probably not a good idea. Instead, just eat the cost
of the new objects. Short-lived object allocation has very different
performance characteristics in C++ and Java, so your intuition may be
off here.

Thanks for your reply Chris, and thanks to Oliver and zero too.

Its interesting that you wrote there are different performance characteristics
for short-lived objects between C++ and Java.

Can you expand on that, or suggest some references to read? Perhaps my C++
background is hampering me here, as I look at 'new Integer(1)' etc and cringe.
I'd like to learn more in this area.
Any tips?

Thanks

Andrew

Ian Pilcher · Nov 29, 2005

zero said:
Obviously, a primitive int does not have a hash code - unless you define
the int itself as its own hash code. It wouldn't be too hard to re-
implement HashMap to use a raw int as key, but this would only be a good
idea if you can be sure you won't have too many collisions. And even
then, I think it wouldn't be worth the effort.

Have you looked at what Integer.hashCode() returns?

Chris Smith · Nov 29, 2005

Andrew E said:
Its interesting that you wrote there are different performance characteristics
for short-lived objects between C++ and Java.

Can you expand on that, or suggest some references to read? Perhaps my C++
background is hampering me here, as I look at 'new Integer(1)' etc and cringe.
I'd like to learn more in this area.

Sure.

Obviously, Java and C++ differ in that Java is designed as a garbage
collected language, while C++ is designed with a manually managed heap.
It's a common, but very wrong, assumption that garbage collection is
something that's ADDED in Java, and that otherwise memory allocation
looks about the same. Neither language specifies the management of the
heap, but in fact, the heap looks very different in typical Java than in
typical C++.

C++ keeps a linked list of available memory for a process, and
allocating an object means starting somewhere in that list and walking
through the list to look for some available bit of memory. This is,
worst case, actually O(n) on the total number blocks of memory currently
in use by the system -- though clearly it's not quite that bad in common
practice. In return for the terrible asymptotic complexity of memory
allocation, deallocation in C++ is cheap, running in constant time.

(I'm not considering calling destructors, which obviously could run in
non-constant time, to be part of deallocation, nor do I consider
constructors as part of allocation.)

Java doesn't maintain a linked list, at least for newly allocated
objects. Instead, it just has a block of memory and keeps lopping off
more objects until it runs out. That block of memory is called the
nursery. So allocation is constant time, and with an extremely low
constant at that! Andrew Appel famously demonstrated that on processors
with auto-increment addressing modes, allocation can actually be
performed by dedicated CPU hardware with no performance cost whatsoever,
so it may actually be said to be O(0), or free. By contrast,
deallocation is more expensive; it is O(m+n), where m is the size of the
root set -- often considered to be constant -- and n is the number of
objects that will NOT be deallocated. Hence, the only cost of
allocating a small, short-lived object is that the next minor
deallocation (garbage collection) will need to happen a little sooner.

Now, clearly that allocation is not free... more frequent garbage
collections do add up. However, the impact is certainly far less than
frequently allocating and freeing small objects in C++.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

zero · Nov 29, 2005

Have you looked at what Integer.hashCode() returns?

lol no I haven't, and I'm quite sure it's just the integer itself. But
that doesn't change the fact that a raw int doesn't have a hash code.

Thomas Hawtin · Nov 29, 2005

Andrew said:
Its interesting that you wrote there are different performance characteristics
for short-lived objects between C++ and Java.

Can you expand on that, or suggest some references to read?

"Java theory and practice: Urban performance legends, revisited" is a
recent, widely linked article that comes top of my google search for
Java allocation.

http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html?ca=dgr-lnxw01JavaUrbanLegends

In Sun's JVM at least: Each thread has Thread Local Allocation Buffer
(TLAB). A Processor Local Allocation Buffer (PLAB) is also an effective
alternative, using some cunning tricks. Small objects are allocated
within the thread's TLAB. The fast-path for allocating these objects is
effectively just bumping a pointer (with some complications for
interactions with the garbage collector). When the TLAB is exhausted, a
new area is allocated. Periodically the GC will copy out any live
objects within the old area. The area now containing copied and dead
objects can just be used a fresh area of memory. It's not necessary to
so much as look at dead objects.

The details are quite complicated. For instance, working out when
objects from other areas reference newly created objects. Still, where
you have lots of short lived objects, the performance easily beats
malloc-style algorithms.

Perhaps my C++
background is hampering me here, as I look at 'new Integer(1)' etc and cringe.
I'd like to learn more in this area.

That should read 'Integer.valueOf(1)'.

Tom Hawtin

Thomas Hawtin · Nov 29, 2005

Chris said:
C++ keeps a linked list of available memory for a process, and
allocating an object means starting somewhere in that list and walking
through the list to look for some available bit of memory. This is,
worst case, actually O(n) on the total number blocks of memory currently
in use by the system -- though clearly it's not quite that bad in common
practice. In return for the terrible asymptotic complexity of memory
allocation, deallocation in C++ is cheap, running in constant time.

I think C/C++ tends to be better implemented than that.

I remember Acorn (ARM) C malloc having a description of the algorithm
used. For small sizes it maintained an array of linked lists. A separate
list for each allocation size (rounded to a certain power of two). Small
allocations consisted of: checking value is size is small enough,
rounding to the fixed power of two, loading the table address, checking
that there is a node in that list and switching to the next link. That
should be about as fast as the modern Java implementation, and have much
better cache locality. Obviously preemptive multi-threading would have
made it more difficult.

Tom Hawtin

Roedy Green · Nov 30, 2005

A Processor Local Allocation Buffer (PLAB) is also an effective
alternative, using some cunning tricks.

then there is one more trick to allocate objects on the stack so they
automatically get popped off when the method ends without using
garbage collection.

Roedy Green · Nov 30, 2005

lol no I haven't, and I'm quite sure it's just the integer itself. But
that doesn't change the fact that a raw int doesn't have a hash code.

Integer in contrast does have an hashCode, the integer value itself

public int hashCode() {
return value;

Long xors the low and high halves:

public int hashCode() {
return (int)(value ^ (value >>> 32));
}

key-value pair , with key as primitive data type	1	Nov 16, 2007
Integer as key in HashMap	12	Jan 17, 2007
C# MVC Razor page form	0	Feb 9, 2024
need clarification on HashMap storage-retrieval	7	Sep 3, 2008
HashMap problem: insert with hash code retrieve by index	4	Apr 27, 2008
How HashMap performs a get?	14	Jul 19, 2006
Updating an object in a HashMap	15	Feb 13, 2008
How easy it is to manipulate a hashmap?	3	Mar 2, 2005

HashMap with primitive int key

Andrew E

Oliver Wong

Chris Smith

zero

Andrew E

Ian Pilcher

Chris Smith

zero

Thomas Hawtin

Thomas Hawtin

Roedy Green

Roedy Green

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads