Semantics of STL containers (std::map) in a multithreaded scenario

D

Dilip

I understand the C++ standard does not talk about threading. My
question here is directed more towards what happens when a STL
container is used in a certain way. I'd appreciate any thoughts. I
re-iterate I don't want to probe into what C++ standard says when I
trample some data from multiple threads, I simply want to know if I
have understood this right.

I have a "std::map<somedatatype, someotherdatatype> myMap", where stuff
gets inserted regularly. In a multi-threaded environment, there are
other parts of code that simultaneously read from this map.

I would like to know if I have understood this right. Consider a
scenario like this:

// thread 1 in one part of the application

busy_inserting_stuff_into_myMap

// while thread 2 in another part of the application

1. gets a reference to the myMap
2. starts iterating by performing an ordinary loop:

std::map<>::iterator itrbegin = myMap.begin();
std::map<>::iterator itrend = myMap.end();
std::map<>::iterator itr;
for (itr = itrbegin; itr != itr.end(); ++itr)
{
}

Since thread 1 is regularly inserting stuff, is there a chance thread 2
finds all its iterators invalidated while its in the process of
looping? In other words what is the effect of inserting into a map
from one thread while simultaneously looping through the same map in
another thread? I don't care if thread 2 misses some info during the
looping (since the map gets updated all the time...)

thanks!
 
D

Dilip

**sigh**

I said:
I
re-iterate I don't want to probe into what C++ standard says when I
trample some data from multiple threads, I simply want to know if I
have understood this right.

and ended up asking:
In other words what is the effect of inserting into a map
from one thread while simultaneously looping through the same map in
another thread? I don't care if thread 2 misses some info during the
looping (since the map gets updated all the time...)

Apologies. If someone still wants to clarify this, I'd appreciate it a
lot. thanks!
 
T

Thomas J. Gritzan

Dilip said:
Since thread 1 is regularly inserting stuff, is there a chance thread 2
finds all its iterators invalidated while its in the process of
looping? In other words what is the effect of inserting into a map
from one thread while simultaneously looping through the same map in
another thread? I don't care if thread 2 misses some info during the
looping (since the map gets updated all the time...)

<OT>
You should not write to a variable in one thread while reading the same
variable in other threads. Its not just about invalidating iterators.

You could even run into problems when writing to one single integer.
Thread A writes to variable "size", and Thread B starts reading it just
when Thread A wrote the first byte of "size" and left the other bytes
unchanged. So Thread B gets a byte-wise mixture of the old and the new
values of "size".

Use some kind of mutex when you access variables or objects from more
than one thread.
</OT>

Thomas
 
D

Dilip

Thomas said:
Dilip schrieb:
<OT>
You should not write to a variable in one thread while reading the same
variable in other threads. Its not just about invalidating iterators.

Understood. Sadly, I am in an unenviable position of retrofitting
multithreading into a heavily C++/STL dependant application and I
frequently keep running into corner cases like this :-(
 
D

Dilip

Thomas said:
Dilip schrieb:
<OT>
You should not write to a variable in one thread while reading the same
variable in other threads. Its not just about invalidating iterators.

I just looked at the docs for std::map and it looks like for
insertion/removal no iterators are invalidated. So atleast in this
particular case, aren't I safe?
 
T

Thomas J. Gritzan

Dilip said:
I just looked at the docs for std::map and it looks like for
insertion/removal no iterators are invalidated. So atleast in this
particular case, aren't I safe?

In a single thread? Yes.
 
A

Andre Kostur

Understood. Sadly, I am in an unenviable position of retrofitting
multithreading into a heavily C++/STL dependant application and I
frequently keep running into corner cases like this :-(

You need to consult with the documentation for the platform(s) that your
STL is implemented on and see what threading guarantees it provides.
 
A

Andre Kostur

I just looked at the docs for std::map and it looks like for
insertion/removal no iterators are invalidated. So atleast in this
particular case, aren't I safe?

No. You have no control as to what's happening behind the scenes. The
only promise you have is that after the insert/erase is completed, no
other iterators into that map have been invalidated by that operation.
(Note I said _other_ iterators. In the case of erase, the iterator you're
working on becomes invalid....)
 
R

red floyd

Dilip said:
I just looked at the docs for std::map and it looks like for
insertion/removal no iterators are invalidated. So atleast in this
particular case, aren't I safe?

No, because your inserter thread could be involved in a race with your
reader thread.

Inserter thread:
Insert-Start -------------------------------- Insert Finished

-------------------- IncrementIterator ----------------------
Reader Thread

See what happens? Not good.
 
H

Howard Hinnant

"Dilip said:
**sigh**

I said:


and ended up asking:


Apologies. If someone still wants to clarify this, I'd appreciate it a
lot. thanks!

I haven't prototyped this scenario, but I have experience implementing
std::map. I strongly suspect that this scenario could very rarely
introduce cases where the increment of the read thread gets redirected
into oblivion (say while the insert is rotating a sub-tree). Note that
"very rarely" is a worst case scenario. It means your testing probably
won't expose the bug. I wish I could give you a firmer answer than
that, but quite frankly that's a research project (which I lack
resources for). Retrofitting multithreading is not a pretty place to be.

-Howard
 
J

Joe Seigh

Dilip said:
I understand the C++ standard does not talk about threading. My
question here is directed more towards what happens when a STL
container is used in a certain way. I'd appreciate any thoughts. I
re-iterate I don't want to probe into what C++ standard says when I
trample some data from multiple threads, I simply want to know if I
have understood this right.

I have a "std::map<somedatatype, someotherdatatype> myMap", where stuff
gets inserted regularly. In a multi-threaded environment, there are
other parts of code that simultaneously read from this map.

I would like to know if I have understood this right. Consider a
scenario like this: ....
Since thread 1 is regularly inserting stuff, is there a chance thread 2
finds all its iterators invalidated while its in the process of
looping? In other words what is the effect of inserting into a map
from one thread while simultaneously looping through the same map in
another thread? I don't care if thread 2 misses some info during the
looping (since the map gets updated all the time...)

It's undefined what may happen if you do that. What actually will
happend depends on the implementation. Probably something not nice.

You might want to take a look at the synchronization wrappers for
Java collection classes to see how they do it. Basically they
make all the methods synchronized except the iterator which requires
you to explicity get the lock before doing the interation or risk
getting a ConcurrentModificationException.

If you did your own interator implementation you could possibly
acquire the lock when you get the iterator and release it when
the iterator is d'tored, making more RAII like.

You can go lock-free but not yet. We're not there yet. :)
 
R

rdilipk

Howard said:
I strongly suspect that this scenario could very rarely
introduce cases where the increment of the read thread gets redirected
into oblivion (say while the insert is rotating a sub-tree).

Howard

Bingo! Thats _exactly_ what I was worrying about. But after some
reflection I think I am in a much deeper morass. Multiple writer
threads are messing around with various parts of the map -- while this
is happening, whether or not iterators are invalidated, any reader
thread is going to see inconsistencies. So I am in party land
already..

Sorry group for the threading distraction....
 
P

Pete Becker

Dilip said:
I just looked at the docs for std::map and it looks like for
insertion/removal no iterators are invalidated. So atleast in this
particular case, aren't I safe?

No, there's a deeper problem. The map is guaranteed to be properly
sorted when the code returns from an insertion or removal, but during
that operation it doesn't have to be. So halfway through an insertion it
can be in a nonsensical state, and if you try to read it from another
thread you'll get nonsense. The brute force solution is to make all
external operations on the map atomic, by locking a mutex when the
operation starts, and unlocking it when the operation ends. That will
work for your scenario, since you said you don't care about missing
updates in your reader. But more generally, multi-threading has to be
designed in from the top down, not hacked in from the bottom. As it is,
you're trying to put your socks on after you've put your shoes on.
 
?

=?iso-8859-1?q?Kirit_S=E6lensminde?=

Dilip said:
I just looked at the docs for std::map and it looks like for
insertion/removal no iterators are invalidated. So atleast in this
particular case, aren't I safe?

For my company's framework (FOST.3) we've been through this too. What
we did was to define a synchronisation object that allowed a limited
number of read locks or a single write lock. You can have a number of
readers, but only a single write thread at a time accessing the data
structure.

If you're writing about as often as you're reading this may not be the
best way. It really depends also on the larger context. Re-architecting
may be an option to sidestep the issue altogether, or using a different
map implementation. Some CPUs offer the possibility of lockless
structures, but they're very hard to implement properly.
 
D

Dilip

Pete said:
No, there's a deeper problem. The map is guaranteed to be properly
sorted when the code returns from an insertion or removal, but during
that operation it doesn't have to be. So halfway through an insertion it
can be in a nonsensical state, and if you try to read it from another
thread you'll get nonsense.

Pete.. another poster (redfloyd?) also pointed out the same thing. I
am trying to wrap my head around this one. Just to make sure I
understand: there is no guarantee what can happen if you try to
perform any operation on a map when another operation is an interrupted
state, right? That is an incomplete operation leaves the internal
state of the map inconsistent. Just so that I can convince my
colleagues with some esoteric knowledge can you give me an example of
what _might_ be the state of a map when a thread performing an insert
operation is interrupted because its time-slice ran out and another
thread mistakenly tries to read from the same map?
The brute force solution is to make all
external operations on the map atomic, by locking a mutex when the
operation starts, and unlocking it when the operation ends.

If what I understood is correct do we even have a choice? I mean if
you are destined to use a map in a multithreaded environment what else
can one do? Right now I am trying to do something like this:

typedef std::map<std::string, SomeComplexStruct*> myMap;

I protect any access to the map whenever an element is being _inserted_
for the first time.

However if its just an update, I protect only the _find_ operation.
Later instead of locking the whole map, I lock only that instance of
SomeComplexStructure I retrieved and make whatever changes I need to.

Is this a reasonable approach? I understand this is fast getting OT
for this newsgroup -- I am not sure what to do if I need to continue to
engage Pete in this discussion. I am using the Dinkumware library
anyway.
That will
work for your scenario, since you said you don't care about missing
updates in your reader. But more generally, multi-threading has to be
designed in from the top down, not hacked in from the bottom. As it is,
you're trying to put your socks on after you've put your shoes on.

Well yeah as I am starting to discover the hole I stuck my foot is not
an opening of a shoe -- its a goddamn banyan tree. Forget putting my
socks, I don't even know how to take my foot out :-(
 
D

Dilip

Kirit said:
For my company's framework (FOST.3) we've been through this too. What
we did was to define a synchronisation object that allowed a limited
number of read locks or a single write lock. You can have a number of
readers, but only a single write thread at a time accessing the data
structure.

That smells like a ReaderWriterLock, right? I decided it might not be
of use to me since I make way too many updates to the map.
 
P

Pete Becker

Dilip said:
Pete.. another poster (redfloyd?) also pointed out the same thing. I
am trying to wrap my head around this one. Just to make sure I
understand: there is no guarantee what can happen if you try to
perform any operation on a map when another operation is an interrupted
state, right? That is an incomplete operation leaves the internal
state of the map inconsistent. Just so that I can convince my
colleagues with some esoteric knowledge can you give me an example of
what _might_ be the state of a map when a thread performing an insert
operation is interrupted because its time-slice ran out and another
thread mistakenly tries to read from the same map?

Well, without looking at the code, one possibility is that one of the
tree nodes will have a child pointer that points to its parent. Another
is that some node will have multiple parents.

It's not a matter of what the map's internal structure might be, but of
whether there's anything you can make sense of. For example, if writing
the value of an int is not atomic, then it's possible to get a time
slice between writing two parts of its value; if another thread then
looks at the stored value, it will be nonsense.

unsigned int i = 0;

in one thread:
i = 0xffffffff;
If it gets sliced between writing the upper and lower halves of the
value, then the other thread sees a value of 0xffff0000 or 0x0000ffff.
Neither one makes sense. Same problem with pointer updates, which occur
frequently when rebalancing a tree.
 
?

=?iso-8859-1?q?Kirit_S=E6lensminde?=

Dilip said:
That smells like a ReaderWriterLock, right? I decided it might not be
of use to me since I make way too many updates to the map.

It is indeed exactly that sort of lock. One exclusive writer and many
readers.

You're going to need to use something like this though or re-architect
so that you don't need to lock at all. I'm afraid those are your only
safe options. To start with get it working correctly, then see if it is
too slow. If so then re-tihink what you're doing.

You will need to work out what the correct way to gain the locks is
going to be. Normally I would say that a writer attempting a lock would
stop any new readers, but depending on your application you may want to
allow them. Reducing the number of readers that you allow may allow the
writer to get in more quickly and improve the overall speed. Similarly,
increasing them might work better in your application. You'll need to
experiment to find out.

Clearly you also need to ensure that the locks are held for the
smallest possible time.
 
S

Steve Pope

Dilip said:
Just to make sure I
understand: there is no guarantee what can happen if you try to
perform any operation on a map when another operation is an interrupted
state, right? That is an incomplete operation leaves the internal
state of the map inconsistent.

This is true, however my (perhaps simplistic) concept of
multithreading is that it does not involve "interrupts"
in the way that multitasking does. Two threads operate
without interruption or context switching upon the same memory.
Neither need be in an interrupted state for a conflict to
occur. Were this not true you could solve all your problems
with the appropriate kernel calls to raise/lower priority
level.

Steve
 
J

Joe Seigh

Pete said:
Well, without looking at the code, one possibility is that one of the
tree nodes will have a child pointer that points to its parent. Another
is that some node will have multiple parents.

It's not a matter of what the map's internal structure might be, but of
whether there's anything you can make sense of. For example, if writing
the value of an int is not atomic, then it's possible to get a time
slice between writing two parts of its value; if another thread then
looks at the stored value, it will be nonsense.

unsigned int i = 0;

in one thread:
i = 0xffffffff;
If it gets sliced between writing the upper and lower halves of the
value, then the other thread sees a value of 0xffff0000 or 0x0000ffff.
Neither one makes sense. Same problem with pointer updates, which occur
frequently when rebalancing a tree.

That's a known issue. The usual solution is to use atomic types
with the right guarantees, i.e. atomicity and acquire or release
semantics if needed. STL don't use atomic types so you don't know
if they can be used in this way since it's compiler and platform
dependent.

The more insideous problem is you can't just arbitrarily move things
around and expect that the proper semantics will hold. For the
purposes of illustration I'll use a binary search on a sorted array
as it's equivalent to a binary tree and easier to show in text.

You have a sorted array X[] = (1, 2).
Thread 1 wants to find if the value 2 exists.
Thread 1 examines X[0] and finds 1.
Thread 2 deletes the value 1 and adds the value 5.
X[] = (2, 5).
Thread 1 examines X[1] and finds 5 and concludes
the value 2 does not exist in the set when in fact
it always was there.

While it may be acceptable to not see a value being concurrently
added, or see a value that's being concurrently deleted, this
isn't one of thost cases.

PCOW (partial copy on write) will solve this problem although
the keeping the tree balanced is a little tricky. It's almost
certain that STL doesn't deal with this problem correctly without
a conventional lock based solution.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,736
Latest member
AdolphBig6

Latest Threads

Top