HashMap with primitive int key

A

Andrew E

Dear All,

I have a class like this:

class CustomerOrder {
int globalID;
...
}

I have an array of these.
I'd like to store an index of globalIDs, so I can do:

CustomerOrder o = index.get(123)

I could modify my class to be:

class CustomerOrder {
Integer globalID;
...
}

Perhaps it is my C++ background, but the idea of allocating an object for every
ID when I can just use an int bothers me.

I figure a HashMap is the right tool of choice, e.g.:

HashMap<Integer, CustomerOrder> index = new ...


Has anyone out there already written a modified hashmap class that uses int as
the key?

Thanks for any tips :)

Andrew
 
O

Oliver Wong

Andrew E said:
Dear All,

I have a class like this:

class CustomerOrder {
int globalID;
...
}

I have an array of these.
I'd like to store an index of globalIDs, so I can do:

CustomerOrder o = index.get(123)

I could modify my class to be:

class CustomerOrder {
Integer globalID;
...
}

Perhaps it is my C++ background, but the idea of allocating an object for
every
ID when I can just use an int bothers me.

I figure a HashMap is the right tool of choice, e.g.:

HashMap<Integer, CustomerOrder> index = new ...


Has anyone out there already written a modified hashmap class that uses
int as
the key?

Thanks for any tips :)

If you use Integer.valueOf(int), it'll re-use instances of Integer when
possible rather than creating a new instance every time with the
constructor. See the Flyweight design pattern.

- Oliver
 
C

Chris Smith

Andrew E said:
I could modify my class to be:

class CustomerOrder {
Integer globalID;
...
}

Perhaps it is my C++ background, but the idea of allocating an object for every
ID when I can just use an int bothers me.

It's possible to write a custom hashing class that uses int as the key
type... but it's probably not a good idea. Instead, just eat the cost
of the new objects. Short-lived object allocation has very different
performance characteristics in C++ and Java, so your intuition may be
off here.

In any case, it's easy enough to fix if you find out it's a performance
problem.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Z

zero

Andrew E said:
Dear All,

I have a class like this:

class CustomerOrder {
int globalID;
...
}

I have an array of these.
I'd like to store an index of globalIDs, so I can do:

CustomerOrder o = index.get(123)

I could modify my class to be:

class CustomerOrder {
Integer globalID;
...
}

Perhaps it is my C++ background, but the idea of allocating an object
for every ID when I can just use an int bothers me.

I figure a HashMap is the right tool of choice, e.g.:

HashMap<Integer, CustomerOrder> index = new ...


Has anyone out there already written a modified hashmap class that
uses int as the key?

Thanks for any tips :)

Andrew

In JDK 1.5 your original code would work fine because Java creates an
Integer object automatically. Of course this doesn't really change
anything, it just hides the object from you.

Obviously, a primitive int does not have a hash code - unless you define
the int itself as its own hash code. It wouldn't be too hard to re-
implement HashMap to use a raw int as key, but this would only be a good
idea if you can be sure you won't have too many collisions. And even
then, I think it wouldn't be worth the effort.
 
A

Andrew E

Chris said:
It's possible to write a custom hashing class that uses int as the key
type... but it's probably not a good idea. Instead, just eat the cost
of the new objects. Short-lived object allocation has very different
performance characteristics in C++ and Java, so your intuition may be
off here.

Thanks for your reply Chris, and thanks to Oliver and zero too.

Its interesting that you wrote there are different performance characteristics
for short-lived objects between C++ and Java.

Can you expand on that, or suggest some references to read? Perhaps my C++
background is hampering me here, as I look at 'new Integer(1)' etc and cringe.
I'd like to learn more in this area.
Any tips?

Thanks :)
Andrew
 
I

Ian Pilcher

zero said:
Obviously, a primitive int does not have a hash code - unless you define
the int itself as its own hash code. It wouldn't be too hard to re-
implement HashMap to use a raw int as key, but this would only be a good
idea if you can be sure you won't have too many collisions. And even
then, I think it wouldn't be worth the effort.

Have you looked at what Integer.hashCode() returns?
 
C

Chris Smith

Andrew E said:
Its interesting that you wrote there are different performance characteristics
for short-lived objects between C++ and Java.

Can you expand on that, or suggest some references to read? Perhaps my C++
background is hampering me here, as I look at 'new Integer(1)' etc and cringe.
I'd like to learn more in this area.

Sure.

Obviously, Java and C++ differ in that Java is designed as a garbage
collected language, while C++ is designed with a manually managed heap.
It's a common, but very wrong, assumption that garbage collection is
something that's ADDED in Java, and that otherwise memory allocation
looks about the same. Neither language specifies the management of the
heap, but in fact, the heap looks very different in typical Java than in
typical C++.

C++ keeps a linked list of available memory for a process, and
allocating an object means starting somewhere in that list and walking
through the list to look for some available bit of memory. This is,
worst case, actually O(n) on the total number blocks of memory currently
in use by the system -- though clearly it's not quite that bad in common
practice. In return for the terrible asymptotic complexity of memory
allocation, deallocation in C++ is cheap, running in constant time.

(I'm not considering calling destructors, which obviously could run in
non-constant time, to be part of deallocation, nor do I consider
constructors as part of allocation.)

Java doesn't maintain a linked list, at least for newly allocated
objects. Instead, it just has a block of memory and keeps lopping off
more objects until it runs out. That block of memory is called the
nursery. So allocation is constant time, and with an extremely low
constant at that! Andrew Appel famously demonstrated that on processors
with auto-increment addressing modes, allocation can actually be
performed by dedicated CPU hardware with no performance cost whatsoever,
so it may actually be said to be O(0), or free. By contrast,
deallocation is more expensive; it is O(m+n), where m is the size of the
root set -- often considered to be constant -- and n is the number of
objects that will NOT be deallocated. Hence, the only cost of
allocating a small, short-lived object is that the next minor
deallocation (garbage collection) will need to happen a little sooner.

Now, clearly that allocation is not free... more frequent garbage
collections do add up. However, the impact is certainly far less than
frequently allocating and freeing small objects in C++.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Z

zero

Have you looked at what Integer.hashCode() returns?

lol no I haven't, and I'm quite sure it's just the integer itself. But
that doesn't change the fact that a raw int doesn't have a hash code.
 
T

Thomas Hawtin

Andrew said:
Its interesting that you wrote there are different performance characteristics
for short-lived objects between C++ and Java.

Can you expand on that, or suggest some references to read?

"Java theory and practice: Urban performance legends, revisited" is a
recent, widely linked article that comes top of my google search for
Java allocation.

http://www-128.ibm.com/developerworks/java/library/j-jtp09275.html?ca=dgr-lnxw01JavaUrbanLegends

In Sun's JVM at least: Each thread has Thread Local Allocation Buffer
(TLAB). A Processor Local Allocation Buffer (PLAB) is also an effective
alternative, using some cunning tricks. Small objects are allocated
within the thread's TLAB. The fast-path for allocating these objects is
effectively just bumping a pointer (with some complications for
interactions with the garbage collector). When the TLAB is exhausted, a
new area is allocated. Periodically the GC will copy out any live
objects within the old area. The area now containing copied and dead
objects can just be used a fresh area of memory. It's not necessary to
so much as look at dead objects.

The details are quite complicated. For instance, working out when
objects from other areas reference newly created objects. Still, where
you have lots of short lived objects, the performance easily beats
malloc-style algorithms.
Perhaps my C++
background is hampering me here, as I look at 'new Integer(1)' etc and cringe.
I'd like to learn more in this area.

That should read 'Integer.valueOf(1)'.

Tom Hawtin
 
T

Thomas Hawtin

Chris said:
C++ keeps a linked list of available memory for a process, and
allocating an object means starting somewhere in that list and walking
through the list to look for some available bit of memory. This is,
worst case, actually O(n) on the total number blocks of memory currently
in use by the system -- though clearly it's not quite that bad in common
practice. In return for the terrible asymptotic complexity of memory
allocation, deallocation in C++ is cheap, running in constant time.

I think C/C++ tends to be better implemented than that.

I remember Acorn (ARM) C malloc having a description of the algorithm
used. For small sizes it maintained an array of linked lists. A separate
list for each allocation size (rounded to a certain power of two). Small
allocations consisted of: checking value is size is small enough,
rounding to the fixed power of two, loading the table address, checking
that there is a node in that list and switching to the next link. That
should be about as fast as the modern Java implementation, and have much
better cache locality. Obviously preemptive multi-threading would have
made it more difficult.

Tom Hawtin
 
R

Roedy Green

A Processor Local Allocation Buffer (PLAB) is also an effective
alternative, using some cunning tricks.

then there is one more trick to allocate objects on the stack so they
automatically get popped off when the method ends without using
garbage collection.
 
R

Roedy Green

lol no I haven't, and I'm quite sure it's just the integer itself. But
that doesn't change the fact that a raw int doesn't have a hash code.

Integer in contrast does have an hashCode, the integer value itself

public int hashCode() {
return value;

Long xors the low and high halves:

public int hashCode() {
return (int)(value ^ (value >>> 32));
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top