Proposed new Java feature

M

Mike Schilling

First a few observations:

1. ThreadLocals may be, in general, an abomination, but there are situation
where they're the least of evils. And if you're building a framework in
which third-party code runs (e.g. a webserver), there's no way to disallow
their use.

2. ThreadLocals interact badly with ThreadPools, because the ThreadLocals
keep their value when the tyhread is put back into the pool. This can lead
to leaks and even potential security issues.

3. The current implementation of ThreadLocal is this: each thread contains a
map from the ThreadLocal instance to its value for that thread. (This is
slightly less intiutive than giving a ThreadLocal instance a map from Thread
to value, but has the advantage that the map, which is only used within one
thread, needn't be synchronized.)

Proposed feature: a static method on Thread that clears all ThreadLocals for
the current thread. By 3 it's trivial to implement. By 1 and 2 it's
needed. And ThreadPool implementations can simply call it directly before
putting the Thread back into the pool.
 
R

Robert Klemme

First a few observations:

1. ThreadLocals may be, in general, an abomination, but there are situation
where they're the least of evils. And if you're building a framework in
which third-party code runs (e.g. a webserver), there's no way to disallow
their use.

"abomination" is a too strong word: they are a tool with particular
usefulness and particular issues. They should definitively be used
sparingly because they carry state in a kind of hidden way. But there
are good use cases (e.g. attaching transaction context to a thread).
2. ThreadLocals interact badly with ThreadPools, because the ThreadLocals
keep their value when the tyhread is put back into the pool. This can lead
to leaks and even potential security issues.

I would actually consider this good interaction with thread pools: the
local stays around for as long as the thread lives. If you introduce
security issues this way than you are probably not using thread locals
properly. There are two things that you generally need to consider with
thread locals which both result from the fact that the life time of a
thread local value extends across a current method call (i.e. earlier
and later):

1. You need to be ready to calculate the value any time because it might
be the first time that you access it in the current thread.

2. You need to be aware that the ThreadLocal value will stay around
longer than the current method call. So if you want things removed from
it after the current call terminates you better ensure it's done
(usually in a finally block).
3. The current implementation of ThreadLocal is this: each thread contains a
map from the ThreadLocal instance to its value for that thread. (This is
slightly less intiutive than giving a ThreadLocal instance a map from Thread
to value, but has the advantage that the map, which is only used within one
thread, needn't be synchronized.)

Even more important: this construction will allow for GC of all thread
local objects when the thread dies. This is important since a
ThreadLocal instance often lives much longer than threads which might
use it (especially if it is a static member which is a typical use case).
Proposed feature: a static method on Thread that clears all ThreadLocals for
the current thread. By 3 it's trivial to implement. By 1 and 2 it's
needed. And ThreadPool implementations can simply call it directly before
putting the Thread back into the pool.

I am not convinced this is a good idea: the current design ensures that
all ThreadLocals are completely independent from each other. By
introducing this clear all method you can generate side effects on other
thread locals that might not be wanted - this could at least make things
significantly more inefficient because values have to be recalculated
much more often than intended. It may in fact introduce functional
bugs: Consider a thread context which is stored in a ThreadLocal before
your current method was invoked in order to carry it forward to methods
much deeper on the call stack (e.g. a method on a JCA connection). You
decide to do Thread.clearAllLocals() in this thread. The JCA method
cannot properly deal with the TX because the thread local value is gone
and the caller relies on the ThreadLocal to be still there and when it's
gone the TX cannot be properly finished.


Side note: it happened to me more than once that I found Java's standard
library design or implementation weird. And there are in fact bad
quirks (Vector, Hashtable) but often when I think longer about how they
did it I have to say it is done properly the way they did it. So the
standard lib is definitively better than one often thinks.

Kind regards

robert
 
M

markspace

Proposed feature: a static method on Thread that clears all ThreadLocals for
the current thread.


I can see your points. However, I don't have any real experience with
ThreadLocal, and when a neophyte agrees with your argument, that's a red
flag.

Here's a blog where someone seems to have the same issue as you.

<http://weblogs.java.net/blog/jjvian...-bad-idea-or-dealing-apparent-glassfish-memor>

At the end of the comments, there's a suggestion to use
ThreadLocal::remove(), with the implication that it allows the thread
local variable to be garbage collection. Is there a reason that doesn't
work for you?

My other thought is that "for the current thread" could be improved with
"for a given thread." So, inside an Executor, I can just call

Thread t = ...
// .. use the thread ..
Thread.removeLocals( t );
// now add the thread back into the pool...

And this seems better because I don't have to rely on the users of a
thread remembering to do it themselves. External control seems better here.
 
T

Tom Anderson

Proposed feature: a static method on Thread that clears all ThreadLocals
for the current thread. By 3 it's trivial to implement. By 1 and 2
it's needed. And ThreadPool implementations can simply call it directly
before putting the Thread back into the pool.

It's a good idea. Tomcat already has some sort of eldritch hack to do
exactly this to request threads; it logs a warning if it finds undeleted
entries. There's a bit about it here:

http://wiki.apache.org/tomcat/MemoryLeakProtection

They, and other container manufacturers, might be glad of an official way
of doing this.

tom
 
M

Mike Schilling

markspace said:
I can see your points. However, I don't have any real experience with
ThreadLocal, and when a neophyte agrees with your argument, that's a red
flag.

Here's a blog where someone seems to have the same issue as you.

<http://weblogs.java.net/blog/jjvian...-bad-idea-or-dealing-apparent-glassfish-memor>

At the end of the comments, there's a suggestion to use
ThreadLocal::remove(), with the implication that it allows the thread
local variable to be garbage collection. Is there a reason that doesn't
work for you?

That acts on an individual ThreadLocal (and works quite well), but it
doens't allow removing all ThreadLocals that might have been accumlated.
My other thought is that "for the current thread" could be improved with
"for a given thread." So, inside an Executor, I can just call

Thread t = ...
// .. use the thread ..
Thread.removeLocals( t );
// now add the thread back into the pool...

And this seems better because I don't have to rely on the users of a
thread remembering to do it themselves. External control seems better
here.

Same comment. What I'm asking for is Thread.removeLocals(), which doesn't
currently exist.
 
M

Mike Schilling

Robert Klemme said:
I would actually consider this good interaction with thread pools: the
local stays around for as long as the thread lives. If you introduce
security issues this way than you are probably not using thread locals
properly. There are two things that you generally need to consider with
thread locals which both result from the fact that the life time of a
thread local value extends across a current method call (i.e. earlier and
later):


OK. Now, coinsider these two cases (for, say, a webserver):

1. I create a new thread to handle each new request.
2. I optimize (1) by using a thread pool to minimize thread creation.

I want those two to behave identically (other than performance). To acheive
that, I need to be able to kill all the ThreadLocals when putting the
Threads back into the pool for later reuse. Otherwise

A. The ThreadLocals for threads in the pool cause packratting.
B. A reused thread contains context created during its previous use. This
may be context that the user correspnding to the request currently being
handled by the thread should not be able to see.
 
D

Daniel Pitts

That acts on an individual ThreadLocal (and works quite well), but it
doens't allow removing all ThreadLocals that might have been accumlated.
You're basically saying "This type of resource can leak if not cleared
appropriately, so there should be a 'Release all resources' method."

When paraphrased that way, does that make it clearer why it isn't a good
idea? It would be about the same as a "File.closeAll()" or a
"Socket.closeAll()" call. Extremely dangerous and only a crutch for not
doing the right thing to begin with.
 
M

Mike Schilling

Daniel Pitts said:
You're basically saying "This type of resource can leak if not cleared
appropriately, so there should be a 'Release all resources' method."

When paraphrased that way, does that make it clearer why it isn't a good
idea? It would be about the same as a "File.closeAll()" or a
"Socket.closeAll()" call. Extremely dangerous and only a crutch for not
doing the right thing to begin with.

Or a "collect all unused memory" call . Clearly, that's a crutch for not
keeping track of memory allocation properly in the first place. And the
fact that files and sockets are closed when a process exits is yet another
crutch.
 
E

Eric Sosman

Or a "collect all unused memory" call . Clearly, that's a crutch for not
keeping track of memory allocation properly in the first place. And the
fact that files and sockets are closed when a process exits is yet another
crutch.

I'm with Daniel. Your code uses classes that you wrote, and
classes you got from somewhere else -- Sioux Unusuals, perhaps.
And if those classes use ThreadLocals for their own purposes, and
you blithely destroy them all, what happens then? Or what about
the class that invoked your code, passing an InheritableThreadLocal?
Is it a good idea to change parts of your invoker's state that you
don't understand, that you aren't even aware of?
 
M

Mike Schilling

Eric Sosman said:
I'm with Daniel. Your code uses classes that you wrote, and
classes you got from somewhere else -- Sioux Unusuals, perhaps.
And if those classes use ThreadLocals for their own purposes, and
you blithely destroy them all, what happens then? Or what about
the class that invoked your code, passing an InheritableThreadLocal?
Is it a good idea to change parts of your invoker's state that you
don't understand, that you aren't even aware of?

Consider the use case again. I've written a container. It procures a
thread and lets third-party code run in it. Currently I have two choices:

1. Create a fresh thread each time. This kills all the ThreadLocals that
might be lying around, but doesn't perform very well.
2. Use a ThreadPool. This improves performance at the cost of letting
ThreadLocals packrat (which, in the worst case, holds entire ClassLoaders in
memory).

I'm greedy; I want the performance without the memory issues. Or, to say it
another way, when getting a Thread from a ThreadPool, it should be
completely indistinguishable whether it's a recycled or brand-new thread.
 
E

Eric Sosman

Consider the use case again.

I understood it fine the first time around, thanks. If you
think of anything new to add, pray do so.
[... reiteration of old material ...]

I notice that you do not address any of the issues Daniel or I
raised; you just repeat what you "want." I know three-year-olds
who can do that, but I don't rush to grant their every whim.
 
M

markspace

I understood it fine the first time around, thanks. If you
think of anything new to add, pray do so.


This is uncharitable. In particular because I think his point went
whooshing over your head, and you didn't understand it at all.

He's saying the thread is done. Kaput. Any Sioux Unusual class using
the thread will loose its thread locals, because at this point it
expects the thread to die.

So instead of allowing the thread to die, we re-use it. But what would
Sioux Unusual do? How would it expect this thread to be matched up with
the same object ever again? There's no guarantee of that. Thread
assignment from the executor is random. Worse, another Sioux Unusual
object could see the thread local from some different object, and assume
it's been assigned from its own invocation, when it clearly hasn't.

Really, the problem is that Sioux Unusual is broken for this
application, and shouldn't be used when threads are gong to be re-used
and randomly reassigned. But that's not what he's talking about. He's
talking about the case where he doesn't need the thread locals to
persist, but he does need to use them, and to release them too.

The posts by myself and Tom should indicate that Mike is not alone in
this need. He's got a real problem. That you insist that his
requirements should be obviated because you or someone else might have
some different requirement... well that just doesn't make sense.
 
T

Tom Anderson

I understood it fine the first time around, thanks.

It seems to me that you did not.
If you think of anything new to add, pray do so.

Mike added this pithy statement of the goal:

Or, to say it another way, when getting a Thread from a ThreadPool, it
should be completely indistinguishable whether it's a recycled or
brand-new thread.

At the moment, it is *not* possible to achieve that, because ThreadLocals
will survive recycling.

One way of looking at a thread pool is that it provides a supply of fresh
short-lived virtual threads, implemented on top of long-lived real
threads. Every time we take a thread out of the pool, we want it to appear
brand-new. Deleting all ThreadLocals when a real thread is recycled is not
"extremely dangerous", because at that moment, the *virtual* thread is
dying, and a new one is being created; that makes it an entirely
appropriate thing to do.

It is certainly true that it is "only a crutch for not doing the right
thing to begin with", but the sad fact is that programmers persist in not
doing the right thing to begin with, and it is useful for framework
writers to be able to limit the damage they can cause.

tom
 
M

Mike Schilling

Eric said:
Consider the use case again.

I understood it fine the first time around, thanks. If you
think of anything new to add, pray do so.
[... reiteration of old material ...]

I notice that you do not address any of the issues Daniel or I
raised; you just repeat what you "want." I know three-year-olds
who can do that, but I don't rush to grant their every whim.

Note that it makes your objections moot. There's no way to enforce that the
called code be "correct", and there is no invoker to worry about.
 
R

Robert Klemme

OK. Now, coinsider these two cases (for, say, a webserver):

1. I create a new thread to handle each new request.
2. I optimize (1) by using a thread pool to minimize thread creation.

I want those two to behave identically (other than performance). To acheive
that, I need to be able to kill all the ThreadLocals when putting the
Threads back into the pool for later reuse. Otherwise

No, you don't. If you employ proper cache size management schemes (e.g.
via a LRU) then you do not get rid of the ThreadLocal after each
request. If for any reason this is not possible and you want to get rid
of state after the request then it should be done per individual
ThreadLocal and not globally for all ThreadLocals.
A. The ThreadLocals for threads in the pool cause packratting.

Then you do not employ proper cache size management. You always need to
be aware of the life time difference between a method call and a
ThreadLocal and code appropriately. Not doing this is not following the
contract of ThreadLocal.
B. A reused thread contains context created during its previous use. This
may be context that the user correspnding to the request currently being
handled by the thread should not be able to see.

Then you are not using ThreadLocal properly, i.e. have omitted proper
cleanup (e.g. because you either did not do it or did not do it in a
safe way, i.e. finally block).

Kind regards

robert
 
R

Robert Klemme

This is uncharitable. In particular because I think his point went
whooshing over your head, and you didn't understand it at all.

He's saying the thread is done. Kaput. Any Sioux Unusual class using the
thread will loose its thread locals, because at this point it expects
the thread to die.

How can you know if you did not create the thread? If the thread dies
as in "terminates and then is GC'ed" there is no need for additional
cleanup because GC will do that just fine.
So instead of allowing the thread to die, we re-use it. But what would
Sioux Unusual do? How would it expect this thread to be matched up with
the same object ever again? There's no guarantee of that. Thread
assignment from the executor is random. Worse, another Sioux Unusual
object could see the thread local from some different object, and assume
it's been assigned from its own invocation, when it clearly hasn't.

Really, the problem is that Sioux Unusual is broken for this
application, and shouldn't be used when threads are gong to be re-used
and randomly reassigned. But that's not what he's talking about. He's
talking about the case where he doesn't need the thread locals to
persist, but he does need to use them, and to release them too.

But he cannot know how long ThreadLocals need to stay which are not
created in his code. If cleanup is needed then it must be done per
ThreadLocal and never globally.
The posts by myself and Tom should indicate that Mike is not alone in
this need. He's got a real problem. That you insist that his
requirements should be obviated because you or someone else might have
some different requirement... well that just doesn't make sense.

If there is a real problem then it's not using ThreadLocal properly.
That should be fixed. And not a dangerous functionality added to
standard library where changes are enormous to wreck havoc on all sorts
of code.

Kind regards

robert
 
R

Robert Klemme

Mike added this pithy statement of the goal:

Or, to say it another way, when getting a Thread from a ThreadPool, it
should be completely indistinguishable whether it's a recycled or
brand-new thread.

At the moment, it is *not* possible to achieve that, because
ThreadLocals will survive recycling.

But that is actually a good thing: ThreadLocal's purpose is to attach
state to a thread which can be accessed without synchronization (if not
shared). Consider attaching a database connection to a thread. It
would be disastrous for the connection itself to just clear the object
reference through some global "remove all" and you would pay a high
price for opening new connections over and over again. If the database
connection is supposed to be attached to a thread for only a certain
amount of time (e.g. a method call) then it must be coded to be properly
cleaned up; in case of a JDCB connection that means invoking commit() or
rollback() and then close(). Only after that you can safely clear the
reference. But this entirely depends on the ThreadLocal at hand and
needs to be done specifically for individual ThreadLocals - it cannot be
done globally.
One way of looking at a thread pool is that it provides a supply of
fresh short-lived virtual threads, implemented on top of long-lived real
threads. Every time we take a thread out of the pool, we want it to
appear brand-new. Deleting all ThreadLocals when a real thread is
recycled is not "extremely dangerous", because at that moment, the
*virtual* thread is dying, and a new one is being created; that makes it
an entirely appropriate thing to do.

It *is* dangerous as I have explained above. The observable state of
each of these virtual threads should of course be identical but that
does not mean that the technical state must be identical (that's
basically what caching is all about). For example, in case of the DB
connection you might open the connection in initialValue() or some other
logic so the code using the ThreadLocal always sees a properly
initialized connection.
It is certainly true that it is "only a crutch for not doing the right
thing to begin with", but the sad fact is that programmers persist in
not doing the right thing to begin with, and it is useful for framework
writers to be able to limit the damage they can cause.

Actually the global remove would cause more damage than it prevents.
You cannot know about all the other ThreadLocals used in code and a
framework writer who attempts to just clear state of other modules
without even knowing them should be tarred and feathered.

Cheers

robert
 
M

markspace

How can you know if you did not create the thread?


But we are creating the thread. It's part of a thread pool for general
use. That's his use case; it's the fundamental crux of his request.

If the thread dies as
in "terminates and then is GC'ed" there is no need for additional
cleanup because GC will do that just fine.


Threads in a thread pool don't die, they are re-used. Hence the need
for manual (even external) clean-up.

But he cannot know how long ThreadLocals need to stay which are not
created in his code.


When the thread is returned to the pool, all thread locals must be
marked as invalid. Period.

If there is a real problem then it's not using ThreadLocal properly.
That should be fixed. And not a dangerous functionality added to
standard library where changes are enormous to wreck havoc on all sorts
of code.


I can think of a few scenarios where it would be useful to use both
thread locals and a thread pool, so while maybe "dangerous" it's still
something we should investigate properly. Hacks like the reflective
code I linked to earlier that dump private fields seem a lot more
"dangerous" to me, yet they are currently needed.
 
R

Robert Klemme

But we are creating the thread. It's part of a thread pool for general
use. That's his use case; it's the fundamental crux of his request.

Maybe there is a misunderstanding: I read your statement to mean that
"Sioux Unusual class" assumes the thread to be dead while it was created
by someone else (presumably the thread pool). So it cannot know
anything about thread creation and dead because it does not have any
control over the thread's lifetime.
I can think of a few scenarios where it would be useful to use both
thread locals and a thread pool, so while maybe "dangerous" it's still
something we should investigate properly.

It's certainly a useful idiom: the thread pool saves the overhead of
creating and destroying threads (and can also control the amount of
concurrency in an application) and ThreadLocals cache state which can be
accessed without synchronization overhead. Killing that every time a
thread returns to the pool will make caching much less efficient.
Hacks like the reflective code
I linked to earlier that dump private fields seem a lot more "dangerous"
to me, yet they are currently needed.

IMHO they are OK as long as they are used to log debugging warnings
during development but as I said, a general mechanism to clear all
thread locals does have more drawbacks than advantages. Now you are
hunting memory leaks through badly coded thread locals, then you might
have to hunt down weird application behavior because of state
disappearing which is expected to be still there. Remember that *any*
method in the thread's call stack can invoke cleaner which means that
all methods upwards the call stack will not find their ThreadLocals back
once control returns to them.

If we think about extending ThreadLocal's functionality at all then I
would think in the direction of registering a cleanup handle (callback
interface) with a ThreadLocal (or defining an empty method in
ThreadLocal which can be overridden). This handle would be invoked by
the Thread itself prior to termination but could also be invoked by a
thread pool or other framework. Even that functionality might introduce
bad bugs since - again - all methods on the call stack would be able to
invoke it. At least the creator of the ThreadLocal would have a chance
to detect the situation and report a proper error (illegalstate for
example).

Illustration

public class ThreadLocal<T> {
static class ThreadLocalMap {

private void remove(ThreadLocal<X> key) {
Entry[] tab = table;
int len = tab.length;
int i = key.threadLocalHashCode & (len-1);
for (Entry e = tab;
e != null;
e = tab[i = nextIndex(i, len)]) {
if (e.get() == key) {
// invoke cleanup code:
key.cleanup((X) e.value);

e.clear();
expungeStaleEntry(i);
return;
}
}

}

/**
* This method does nothing. Sub classes must override as they see fit.
* @param the value of a thread which must be cleared.
* @see #remove()
*/
protected void cleanup(T val) {
// nop
}

// Now we can even allow a cleanup all
public static removeAll() {
// pseudo code
for (final ThreadLocal<?> tl : allLocalsOfThisThread()) {
tl.remove();
}
}
}

However I think about it: the idea of simply clearing ThreadLocal
references still does not become a good idea. The creator of a
ThreadLocal needs to consider how it is used and how cleanup is done.
Everything else introduces dangerous side effects which lead to bugs
which are at least as hard to find as those memory leaks.

Kind regards

robert
 
M

markspace

However I think about it: the idea of simply clearing ThreadLocal
references still does not become a good idea. The creator of a
ThreadLocal needs to consider how it is used and how cleanup is done.
Everything else introduces dangerous side effects which lead to bugs
which are at least as hard to find as those memory leaks.


Naw, I just disagree. Saying that the creator is responsible for
cleanup is like saying that because some objects need an explicit
"close," we should force those semantics onto all objects.

If you have a thread local that requires special semantics, go ahead and
provide the special code for those objects. The rest of the code can
use a common cleanup up mechanism that MOST objects require: just GC them.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,226
Members
46,815
Latest member
treekmostly22

Latest Threads

Top