Cannot seem to lock HashMap

B

byoder

Basically the situation is that I have a HashMap (yes I know this is
not synchronized and therefore must do it manually). The HashMap is
used by two threads running concurrently, and one thread is adding
values to the map while the other is iterating over the values. Now
typically I would expect to get the ConcurrentModificationException -
but I have tried to synchronize the object manually without any luck.

Perhaps I don't understand locking, but according to Java 1.5 API this
should work, I wrote the following test class to demonstrait what I am
trying to do:

import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class TestIndex {

public static HashMap<Date, Date> values = new HashMap<Date, Date>();

public static void main(String[] args) {
new Thread_1().start();
new Thread_2().start();
}

public static class Thread_1 extends Thread {

public Thread_1() {
super("Thread_1");
this.setDaemon(false);
}

public void run() {
System.out.println("Thread_1 run...");
try {
for (int i=0; i<1000000; i++) {
TestIndex.values.put(new Date(),new Date());
}

} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Thread_1 END");
}
}

public static class Thread_2 extends Thread {

public Thread_2() {
super("Thread_2");
this.setDaemon(false);
}

public void run() {
System.out.println("Thread_2 run...");
try {
for (int i=0; i<1000000; i++) {

Map m = Collections.synchronizedMap(TestIndex.values);
synchronized (m) {
HashMap<Date, Date> newMap = new HashMap<Date, Date>(m);
}

}

} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Thread_2 END");
}
}

}
 
S

SadRed

Basically the situation is that I have a HashMap (yes I know this is
not synchronized and therefore must do it manually). The HashMap is
used by two threads running concurrently, and one thread is adding
values to the map while the other is iterating over the values. Now
typically I would expect to get the ConcurrentModificationException -
but I have tried to synchronize the object manually without any luck.

Perhaps I don't understand locking, but according to Java 1.5 API this
should work, I wrote the following test class to demonstrait what I am
trying to do:

import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class TestIndex {

public static HashMap<Date, Date> values = new HashMap<Date, Date>();

public static void main(String[] args) {
new Thread_1().start();
new Thread_2().start();
}

public static class Thread_1 extends Thread {

public Thread_1() {
super("Thread_1");
this.setDaemon(false);
}

public void run() {
System.out.println("Thread_1 run...");
try {
for (int i=0; i<1000000; i++) {
TestIndex.values.put(new Date(),new Date());
}

} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Thread_1 END");
}
}

public static class Thread_2 extends Thread {

public Thread_2() {
super("Thread_2");
this.setDaemon(false);
}

public void run() {
System.out.println("Thread_2 run...");
try {
for (int i=0; i<1000000; i++) {

Map m = Collections.synchronizedMap(TestIndex.values);
synchronized (m) {
HashMap<Date, Date> newMap = new HashMap<Date, Date>(m);
}

}

} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Thread_2 END");
}
}

}

Use this technique mentioned in the API documentation of the HashMap
class:
Map m = Collections.synchronizedMap(new HashMap(...));
 
L

Lew

Rather than "locking" you should be thinking "synchronization".

What are you trying to do? It's not clear from the code.
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class TestIndex {

public static HashMap<Date, Date> values = new HashMap<Date, Date>();

public static void main(String[] args) {
new Thread_1().start();
new Thread_2().start();
}

public static class Thread_1 extends Thread {

public Thread_1() {
super("Thread_1");
this.setDaemon(false);
}

public void run() {
System.out.println("Thread_1 run...");
try {
for (int i=0; i<1000000; i++) {
TestIndex.values.put(new Date(),new Date());
}

Nothing is synchronized here.

OK, here you create a Map from your "values" Map a million times, doubly
synchronized (?) and copied into another Map, which latter is then discarded.

Your Thread_2 class creates a million copies of your values Map, possibly at
various stages of its population with values, inside a synchronized block that
synchronizes on a synchronized Map.

What are you really trying to do?
 
O

Owen Jacobson

Basically the situation is that I have a HashMap (yes I know this is
not synchronized and therefore must do it manually). The HashMap is
used by two threads running concurrently, and one thread is adding
values to the map while the other is iterating over the values. Now
typically I would expect to get the ConcurrentModificationException -
but I have tried to synchronize the object manually without any luck.

Perhaps I don't understand locking, but according to Java 1.5 API this
should work, I wrote the following test class to demonstrait what I am
trying to do:

import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class TestIndex {

public static HashMap<Date, Date> values = new HashMap<Date, Date>();

public static void main(String[] args) {
new Thread_1().start();
new Thread_2().start();
}

public static class Thread_1 extends Thread {

public Thread_1() {
super("Thread_1");
this.setDaemon(false);
}

public void run() {
System.out.println("Thread_1 run...");
try {
for (int i=0; i<1000000; i++) {
TestIndex.values.put(new Date(),new Date());
}

} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Thread_1 END");
}
}

public static class Thread_2 extends Thread {

public Thread_2() {
super("Thread_2");
this.setDaemon(false);
}

public void run() {
System.out.println("Thread_2 run...");
try {
for (int i=0; i<1000000; i++) {

Map m = Collections.synchronizedMap(TestIndex.values);
synchronized (m) {
HashMap<Date, Date> newMap = new HashMap<Date, Date>(m);
}

}

} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Thread_2 END");
}
}

}

Your Thread_1 thread is not participating in any synchronization
whatsoever, which means your Thread_2 reader thread is subject to race
conditions in a number of ways. As SadRed suggested, use the
synchronizedMap wrapper provided by Collections; alternately, have
Thread_1 also synchronize on the map while inserting entries:

public void run() {
System.out.println("Thread_1 run...");
try {
for (int i=0; i<1000000; i++) {
synchronized (TestIndex.values) {
TestIndex.values.put(new
Date(),new Date());
}
}

} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Thread_1 END");
}
 
G

Gordon Beaton

As to what I am trying to do: Basically I want to use the HashMap vs.
Hashtable because it is faster, and because there is really only one
situation that I have that needs to be synchronized. That situation
is when I need to clone my HashMap. Unfortunatly when cloning my
HashMap it is very likely that another thread is adding or removing
values to the HashMap. So I need to somehow lock down the HashMap so
that it cannot be modified while the clone method is called.

Then you also must synchronize operations that add or remove values.

If you synchronize only while cloning, then only cloning becomes an
exclusive operation (i.e. you can't clone the same table twice at the
same time).

If you neglect to synchronize add or remove, then those operations are
not affected by the synchronization on clone.

/gordon

--
 
B

byoder

Basically the situation is that I have a HashMap (yes I know this is
not synchronized and therefore must do it manually). The HashMap is
used by two threads running concurrently, and one thread is adding
values to the map while the other is iterating over the values. Now
typically I would expect to get the ConcurrentModificationException -
but I have tried to synchronize the object manually without any luck.
Perhaps I don't understand locking, but according to Java 1.5 API this
should work, I wrote the following test class to demonstrait what I am
trying to do:
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
public class TestIndex {
public static HashMap<Date, Date> values = new HashMap<Date, Date>();
public static void main(String[] args) {
new Thread_1().start();
new Thread_2().start();
}
public static class Thread_1 extends Thread {
public Thread_1() {
super("Thread_1");
this.setDaemon(false);
}
public void run() {
System.out.println("Thread_1 run...");
try {
for (int i=0; i<1000000; i++) {
TestIndex.values.put(new Date(),new Date());
}
} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Thread_1 END");
}
}
public static class Thread_2 extends Thread {
public Thread_2() {
super("Thread_2");
this.setDaemon(false);
}
public void run() {
System.out.println("Thread_2 run...");
try {
for (int i=0; i<1000000; i++) {
Map m = Collections.synchronizedMap(TestIndex.values);
synchronized (m) {
HashMap<Date, Date> newMap = new HashMap<Date, Date>(m);
}

} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Thread_2 END");
}
}

Your Thread_1 thread is not participating in any synchronization
whatsoever, which means your Thread_2 reader thread is subject to race
conditions in a number of ways. As SadRed suggested, use the
synchronizedMap wrapper provided by Collections; alternately, have
Thread_1 also synchronize on the map while inserting entries:

public void run() {
System.out.println("Thread_1 run...");
try {
for (int i=0; i<1000000; i++) {
synchronized (TestIndex.values) {
TestIndex.values.put(new
Date(),new Date());
}
}

} catch (Exception e) {
e.printStackTrace();
}
System.out.println("Thread_1 END");
}- Hide quoted text -

- Show quoted text -

As to what I am trying to do: Basically I want to use the HashMap vs.
Hashtable because it is faster, and because there is really only one
situation that I have that needs to be synchronized. That situation
is when I need to clone my HashMap. Unfortunatly when cloning my
HashMap it is very likely that another thread is adding or removing
values to the HashMap. So I need to somehow lock down the HashMap so
that it cannot be modified while the clone method is called.

I tried to use the Collections.synchronizedMap() - with a synchronized
block of code, thinking that this would "lock" the HashMap from
allowing other threads to put values until after the synchronized
block has completed. However it now occurs to me that I have to do
this in both threads, otherwise it will not work. This leads me to
think I should just use Hashtable since I cannot lock HashMap when I
need to (all or nothing).

The above code is just an example to throw the concurrent exception, I
know that it is just "throwing away" the HashMap it creates in thread
2. Again this is just for demo purposes.
 
B

byoder

Thanks for the reply - but I tried to synchronize on both HashMap
put() and clone() operations, and that still does not work (I still
get ConcurrentModificationException). Any ideas?

I was finally able to get something else working - although it is slow
(see final example at end of posting). This just uses the wrapper
class to do the locking, as the underlying Hashmap is not made public.

.... CHANGES ONLY MADE TO TestContainer class ...

public static class TestContainer {

private HashMap<Date, Date> _values;

public TestContainer() {
_values = new HashMap<Date, Date>();
}

public TestContainer(HashMap<Date, Date> v) {
_values = v;
}

private void put(Date key, Date value) {

Map<Date, Date> m = Collections.synchronizedMap(_values);

synchronized (m) {
m.put(key, value);
}
}

public Date get(Date key) {
return _values.get(key);
}

public Object clone() {

TestContainer eStore = null;

Map<Date, Date> m = Collections.synchronizedMap(_values);

synchronized (m) {
HashMap<Date, Date> cValues = new HashMap<Date, Date>(m);
m.remove(ITFGlobalID.GID_FLAG);
eStore = new TestContainer(cValues);
}
return eStore;

}
}



Also here is my final code, that DOES work, but is not ideal (seems to
run slow).

package com.sscims.casetracker;

import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

import com.sscims.jeb.enterprise.ITFGlobalID;

public class TestIndex {

public static void main(String[] args) {

TestContainer c = new TestContainer();
new Thread_1(c).start();
new Thread_2(c).start();
}

public static class Thread_1 extends Thread {

TestContainer c;

public Thread_1(TestContainer c) {
super("Thread_1");
this.setDaemon(false);
this.c = c;
}

public void run() {
Date start = new Date();
System.out.println("Thread_1 run...");
try {
for (int i=0; i<1000000; i++) {
Date key = new Date();
c.put(key,new Date());
c.get(key);
}

} catch (Exception e) {
e.printStackTrace();
}
Date end = new Date();
long diff = end.getTime() - start.getTime();
System.out.println("Thread_1 END: " + diff + " total time");
}
}

public static class Thread_2 extends Thread {

TestContainer c;

public Thread_2(TestContainer c) {
super("Thread_2");
this.setDaemon(false);
this.c = c;
}

public void run() {
Date start = new Date();
System.out.println("Thread_2 run...");
try {
for (int i=0; i<1000000; i++) {
c.clone();
}

} catch (Exception e) {
e.printStackTrace();
}
Date end = new Date();
long diff = end.getTime() - start.getTime();
System.out.println("Thread_2 END: " + diff + " total time");
}
}

public static class TestContainer {

private HashMap<Date, Date> _values;

public TestContainer() {
_values = new HashMap<Date, Date>();
}

public TestContainer(HashMap<Date, Date> v) {
_values = v;
}

private synchronized void put(Date key, Date value) {
_values.put(key, value);
}

public Date get(Date key) {
return _values.get(key);
}

public synchronized Object clone() {
synchronized (this) {
HashMap<Date, Date> cValues = new HashMap<Date, Date>(_values);
cValues.remove(ITFGlobalID.GID_FLAG);
return new TestContainer(cValues);
}
}
}
}
 
B

byoder

I found the following from Sun:

"Note that constructors cannot be synchronized - using the
synchronized keyword with a constructor is a syntax error.
Synchronizing constructors doesn't make sense, because only the thread
that creates an object should have access to it while it is being
constructed."

[http://java.sun.com/docs/books/tutorial/essential/concurrency/
syncmeth.html]

So it would seem that I should use clone() method, but this is not
available from the Map that is returned from the
Collections.synchronizedMap method. So it seems that I cannot do what
I would like because of this. I think I either have to use Hashtable,
or use a wrapper class (above example) to do the locking.

Thanks everybody...
 
L

Lew

I found the following from Sun:

"Note that constructors cannot be synchronized - using the
synchronized keyword with a constructor is a syntax error.
Synchronizing constructors doesn't make sense, because only the thread
that creates an object should have access to it while it is being
constructed."

[http://java.sun.com/docs/books/tutorial/essential/concurrency/
syncmeth.html]

So it would seem that I should use clone() method, but this is not
available from the Map that is returned from the
Collections.synchronizedMap method. So it seems that I cannot do what
I would like because of this. I think I either have to use Hashtable,
or use a wrapper class (above example) to do the locking.

Constructors have nothing to do with your issue.

Your problem stemmed from not synchronizing your puts on the Map.

You might also consider just declaring the Map instance variable as a
synchronized Map in the first place, thus avoiding the Hashtable gotchas.
(Speed isn't one of them.)
public static class TestContainer {

A "static" class? Was this a nested class? If so, you should post the outer
class, too, to make an SSCCE. If not, then you should post code that will
compile.

And stop embedding TAB characters in Usenet posts.

private final Map said:
public TestContainer() {
this( new HashMap said:
}

public TestContainer( Map<Date, Date> v) { //again, use the interface
if ( v == null )
{
throw new IllegalArgumentException( "Map must not be null." );
}
values = Collections.synchronizedMap( v );
}

private void put(Date key, Date value) {
values.put( key, value );
}

public Date get(Date key) {
return values.get(key);

See? /Now/ this is synchronized. It wasn't before.
}

public Object clone() {
synchronized ( values ) { // prevent concurrent mod during clone()
Map<Date, Date> cValues = new HashMap<Date, Date>( values );
values.remove(ITFGlobalID.GID_FLAG);
return new TestContainer( cValues );

Another approach is to copy the Map in the constructor, thus avoiding most of
the issue.
 
L

Lew

I found the following from Sun:

"Note that constructors cannot be synchronized - using the
synchronized keyword with a constructor is a syntax error.
Synchronizing constructors doesn't make sense, because only the thread
that creates an object should have access to it while it is being
constructed."

[http://java.sun.com/docs/books/tutorial/essential/concurrency/
syncmeth.html]

So it would seem that I should use clone() method, but this is not
available from the Map that is returned from the
Collections.synchronizedMap method. So it seems that I cannot do what
I would like because of this. I think I either have to use Hashtable,
or use a wrapper class (above example) to do the locking.

Constructors have nothing to do with your issue.

Your problem stemmed from not synchronizing your puts or gets on the Map.

You might also consider just declaring the Map instance variable as a
synchronized Map in the first place, thus avoiding the Hashtable gotchas.
(Speed isn't one of them.)
public static class TestContainer
{
/*
A "static" class? Was this a nested class? If so, you should post the outer
class, too, to make an SSCCE. If not, then you should post code that will
compile.

And stop embedding TAB characters in Usenet posts.
*/
public TestContainer()
{
this( new HashMap said:
}
>
public TestContainer( Map<Date, Date> v)
{
//again, use the interface
if ( v == null )
{
throw new IllegalArgumentException( "Map must not be null." );
}
values = Collections.synchronizedMap( v );
}

private void put(Date key, Date value) {
values.put( key, value );

// See? /Now/ this is synchronized. It wasn't before.
}

public Date get(Date key) {
return values.get(key);

// See? /Now/ this is synchronized. It wasn't before.
}

public Object clone()
{
// prevent concurrent mod during clone()
synchronized ( values )
{
Map<Date, Date> cValues = new HashMap<Date, Date>( values );
values.remove(ITFGlobalID.GID_FLAG);
return new TestContainer( cValues );

Another approach is to copy the Map in the constructor, thus avoiding most of
the issue.
 
B

byoder

Lew -

Sorry for the TABS (I copied code from Eclipse). I do agree that your
code may work, but I would be paying penalty of having to synchronize
(lock) for every call that uses "values".

But I found solution that allows me to ONLY synchronize methods that
either depend on the map NOT getting modified, and methods that modify
the map. To do this I must synchronize all methods that can modify
the map, but all methods that don't need to synchronize the map (such
as get() don't depend on any locking) are not synchronized so there is
no locking and thus are faster.

Any methods that depend on the map data TO NOT CHANGE I explicitly
lock the container class. This is the best approach because I don't
have to pay penalty for locking (synchronizing) all of my methods -
which is the solution I was looking for as speed is very important to
me in this code.

See example (two classes):

public class TestIndex {

public static void main(String[] args) {

TestContainer c = new TestContainer();
new Thread_1(c).start();
new Thread_2(c).start();
}

public static class Thread_1 extends Thread {

TestContainer c;

public Thread_1(TestContainer c) {
super("Thread_1");
this.setDaemon(false);
this.c = c;
}

public void run() {
Date start = new Date();
System.out.println("Thread_1 run...");
try {
for (int i=0; i<1000000; i++) {
Date key = new Date();
c.put(key,new Date());
c.get(key);
}

} catch (Exception e) {
e.printStackTrace();
}
Date end = new Date();
long diff = end.getTime() - start.getTime();
System.out.println("Thread_1 END: " + diff + " total time");
}
}

public static class Thread_2 extends Thread {

TestContainer c;

public Thread_2(TestContainer c) {
super("Thread_2");
this.setDaemon(false);
this.c = c;
}

public void run() {
Date start = new Date();
System.out.println("Thread_2 run...");
try {
for (int i=0; i<1000000; i++) {
c.clone();
}

} catch (Exception e) {
e.printStackTrace();
}
Date end = new Date();
long diff = end.getTime() - start.getTime();
System.out.println("Thread_2 END: " + diff + " total time");
}
}
}

public class TestContainer {

private HashMap<Date, Date> _values;

public TestContainer() {
_values = new HashMap<Date, Date>();
}

public TestContainer(HashMap<Date, Date> v) {
_values = v;
}

public void put(Date key, Date value) {
_values.put(key, value);
}

public Date get(Date key) {
return _values.get(key);
}

public Object clone() {

HashMap<Date, Date> cValues = new HashMap<Date,
Date>(_values.size());

synchronized (this) {
cValues.putAll(_values);
}
return new TestContainer(cValues);
}
}
 
L

Lew

Lew -

Sorry for the TABS (I copied code from Eclipse). I do agree that your
code may work, but I would be paying penalty of having to synchronize
(lock) for every call that uses "values".

It is necessary to synchronize every access to a shared resource if you want
to avoid memory model anomalies.
But I found solution that allows me to ONLY synchronize methods that
either depend on the map NOT getting modified, and methods that modify
the map. To do this I must synchronize all methods that can modify
the map, but all methods that don't need to synchronize the map (such
as get() don't depend on any locking) are not synchronized so there is
no locking and thus are faster.

This means that you risk having the get() return different values than were put().
Any methods that depend on the map data TO NOT CHANGE I explicitly
lock the container class. This is the best approach because I don't
have to pay penalty for locking (synchronizing) all of my methods -

The "penalty" is not so large, and certainly smaller than getting the wrong
results.
which is the solution I was looking for as speed is very important to
me in this code.

But correct results are not?
See example (two classes):

public class TestIndex {

public static void main(String[] args) {

TestContainer c = new TestContainer();
new Thread_1(c).start();
new Thread_2(c).start();
}

public static class Thread_1 extends Thread {

TestContainer c;

public Thread_1(TestContainer c) {
super("Thread_1");
this.setDaemon(false);
this.c = c;
}

public void run() {
Date start = new Date();
System.out.println("Thread_1 run...");
try {
for (int i=0; i<1000000; i++) {
Date key = new Date();
c.put(key,new Date());
c.get(key);
}

} catch (Exception e) {
e.printStackTrace();
}
Date end = new Date();
long diff = end.getTime() - start.getTime();
System.out.println("Thread_1 END: " + diff + " total time");
}
}

public static class Thread_2 extends Thread {

TestContainer c;

public Thread_2(TestContainer c) {
super("Thread_2");
this.setDaemon(false);
this.c = c;
}

public void run() {
Date start = new Date();
System.out.println("Thread_2 run...");
try {
for (int i=0; i<1000000; i++) {
c.clone();
}

} catch (Exception e) {
e.printStackTrace();
}
Date end = new Date();
long diff = end.getTime() - start.getTime();
System.out.println("Thread_2 END: " + diff + " total time");
}
}
}

public class TestContainer {

private HashMap<Date, Date> _values;

Apparently you overlooked the suggestion to declare this as a Map rather than
a HashMap.
public TestContainer() {
_values = new HashMap<Date, Date>();
}

public TestContainer(HashMap<Date, Date> v) {
_values = v;
}

public void put(Date key, Date value) {
_values.put(key, value);

This unsynchronized access will cause trouble.
}

public Date get(Date key) {
return _values.get(key);

This unsynchronized access will cause trouble.
}

public Object clone() {

HashMap<Date, Date> cValues = new HashMap<Date,
Date>(_values.size());

This unsynchronized access will cause trouble.

Declare the variable as the interface type, instantiate as the concrete type.
synchronized (this) {
cValues.putAll(_values);

Synchronizing on only one access to a protected resource but leaving the rest
unsynchronized causes bugs.
}
return new TestContainer(cValues);
}
}

Your code is subject to a host of ills because you aren't synchronizing your
access to a shared resource.
 
B

byoder

You are correct, I posted the wrong version of my code. The "public
void put()..." should have read "public synchronized void put()..." -
so you are correct on this point.

No, not really. I DO NOT care if the value returned from get() is "up-
to-date" or not, think of it like "dirty read". My requirements are
very simple...

a) I must be able to put NULL values (not keys) in my map. Therefore
I cannot use ConcurrentHashMap, Hashtable.
b) I don't want to synchronize ALL methods, such as get() methods,
since I don't care if the data has changed - I allow dirty read (old
data).
c) I must never throw ConcurrentModificationException. Therefore I
must be sure that I synchronize methods that either (a) modify the
data (the put method) OR (b) methods that rely on the data not getting
modified during execution (the clone method).

HashMap states the following:

"If multiple threads access this map concurrently, and at least one of
the threads modifies the map structurally, it must be synchronized
externally. (A structural modification is any operation that adds or
deletes one or more mappings; merely changing the value associated
with a key that an instance already contains is not a structural
modification.) This is typically accomplished by synchronizing on
some object that naturally encapsulates the map".

"synchronizing on some object" is done via my TestContainer class, so
I beleive I have accomplished this. And Sun also states that "If no
such object exists, the map should be 'wrapped' using the
Collections.synchronizedMap" - but this is an alternative, NOT in
addition to. See http://java.sun.com/j2se/1.5.0/docs/api/java/util/HashMap.html

So that said I feel I am OK to allow the get() method to run
unsynchronized, but in short I agree the clone and put methods (in the
TestContainer class) MUST BOTH be synchronized which they were not in
my prior post.

Thanks for the conversation - this has been real
 
P

Patricia Shanahan

You are correct, I posted the wrong version of my code. The "public
void put()..." should have read "public synchronized void put()..." -
so you are correct on this point.


No, not really. I DO NOT care if the value returned from get() is "up-
to-date" or not, think of it like "dirty read". My requirements are
very simple...

You seem to be assuming atomic behavior, so that the get result is
either the old or the new value. There is nothing in the API
documentation that promises atomic get.

As far as I can tell, the actual get implementation has real
vulnerabilities, even in a simple single processor situation, without
getting into the complexities of memory order.

Consider the following two lines from the get implementation:

int i = indexFor(hash, table.length);
Entry<K,V> e = table;

Suppose there is an interrupt and context switch between the two lines.
Some other thread does a put that causes a rehash with a larger table
size. The hash may no longer belong in bucket i. If get searches the
wrong bucket it will return null regardless of whether the key is
actually present or not.

This could happen with everything else synchronized, as long as get
itself is not synchronized.

Patricia
 
L

Lew

Patricia said:
You are correct, I posted the wrong version of my code. The "public
void put()..." should have read "public synchronized void put()..." -
so you are correct on this point.


No, not really. I DO NOT care if the value returned from get() is "up-
to-date" or not, think of it like "dirty read". My requirements are
very simple...

You seem to be assuming atomic behavior, so that the get result is
either the old or the new value. There is nothing in the API
documentation that promises atomic get.

As far as I can tell, the actual get implementation has real
vulnerabilities, even in a simple single processor situation, without
getting into the complexities of memory order.

Consider the following two lines from the get implementation:

int i = indexFor(hash, table.length);
Entry<K,V> e = table;

Suppose there is an interrupt and context switch between the two lines.
Some other thread does a put that causes a rehash with a larger table
size. The hash may no longer belong in bucket i. If get searches the
wrong bucket it will return null regardless of whether the key is
actually present or not.

This could happen with everything else synchronized, as long as get
itself is not synchronized.


What we are trying to point out is that you can't only synchronize some of the
methods that access the Map and not others. It's all or nothing. That's the
way it is. You cannot safely synchronize only some of the methods that access
the Map.
 
B

byoder

This is a good point, but I am not sure if you are correct (this is
getting a bit out of my league). It appears that the HashMap.Entry
DOES depends on the table.length - so it is sensitive to changes in
size.

But I cannot say for sure if this will cause the behavior or problems
you mention above (return NULL becuase of unsynchronized get()
method). One thing that leads me to think you ARE correct is the
implementation of Hashtable HAS a synchronized get() method. But it
seems strange to me that HashMap would have such a BIG vunerability in
it - but perhaps that is what you get when using unsynchronized
HashMap vs. Hashtable.

Any Java experts care to expand on this?

I think that tomorrow I will take the HashMap source code and run some
tests to prove or disprove this theory, since I cannot tell by looking
at it. But if true, then I will need to also synchronize the get()
method in my wrapper class.

Below methods are from HashMap.java (1.5.0_12):

public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key ||
key.equals(k)))
return e.value;
}
return null;
}

static int indexFor(int h, int length) {
return h & (length-1);
}

public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key ||
key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}

modCount++;
addEntry(hash, key, value, i);
return null;
}

void addEntry(int hash, K key, V value, int bucketIndex) {
Entry<K,V> e = table[bucketIndex];
table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
if (size++ >= threshold)
resize(2 * table.length);
}
 
P

Patricia Shanahan

But I cannot say for sure if this will cause the behavior or problems
you mention above (return NULL becuase of unsynchronized get()
method). One thing that leads me to think you ARE correct is the
implementation of Hashtable HAS a synchronized get() method. But it
seems strange to me that HashMap would have such a BIG vunerability in
it - but perhaps that is what you get when using unsynchronized
HashMap vs. Hashtable.

I am not suggesting a vulnerability in the sense of a failure of HashMap
to conform to its contract. The scenario requires a structural change to
the map while get is running. The API documentation says "If multiple
threads access this map concurrently, and at least one of the threads
modifies the map structurally, it must be synchronized externally.". The
vulnerability lies in trying to use HashMap in a way that is contrary to
its contract.

The scenario is a simple single processor case. How well do you
understand the Java memory model? See
http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.4
I think that tomorrow I will take the HashMap source code and run some
tests to prove or disprove this theory, since I cannot tell by looking
at it. But if true, then I will need to also synchronize the get()
method in my wrapper class.

Synchronization bugs may be hard to reproduce, and remember that you
don't just have to deal with the one scenario I suggested. In order to
not synchronize get you would need to look at its interaction with every
method in HashMap.

An easier approach might be to first do a performance test simply
synchronizing get. It may run fast enough. For the default load level,
the average chain length it needs to search is 0.75 entries. If get is
too slow to synchronize, I would suspect a bad hashCode function in the
key class, and try to fix that.
Below methods are from HashMap.java (1.5.0_12):

public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key ||
key.equals(k)))
return e.value;
}
return null;
}

static int indexFor(int h, int length) {
return h & (length-1);
}

I looked at 1.5.0_3. The same point applies to the later code - consider
a resize happening anywhere between indexFor committing to its result
and the end of the search.

This brings out an important point. If you are going to depend for
correctness of your program on assumptions about how API methods are
implemented, you cannot just review one implementation. You need to
review EVERY version on which you intend to support your program.

Patricia
 
S

sentientholon

This is a good point, but I am not sure if you are correct (this is
getting a bit out of my league). It appears that the HashMap.Entry
DOES depends on the table.length - so it is sensitive to changes in
size.

But I cannot say for sure if this will cause the behavior or problems
you mention above (return NULL becuase of unsynchronized get()
method). One thing that leads me to think you ARE correct is the
implementation of Hashtable HAS a synchronized get() method. But it
seems strange to me that HashMap would have such a BIG vunerability in
it - but perhaps that is what you get when using unsynchronized
HashMap vs. Hashtable.

The unsynchronized behavior of HashMap is simply there so you can use
it without synchronization when you don't need to. The wrapper
provided by Collections.synchronizedMap(...) is sufficient for safe
concurrent use, as long as everybody who interacts with the underlying
Map does so through the same wrapper (duh).

As Patricia has noted - yes, accesses have to be synchronized as well
as mutations, because a get is completely non-atomic in this
situation. I believe there are some utility libraries out there that
provide some "dirty read" types of functionality, by doing various
kinds of copy-on-write tricks, but the Collections API doesn't promise
anything of the sort. Maybe something like this is what you are
looking for?

http://commons.apache.org/collections/api-release/org/apache/commons/collections/FastHashMap.html

Fred
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,190
Members
46,740
Latest member
AdolphBig6

Latest Threads

Top