is +=1 thread safe

A

AlFire

Hi,

I have a piece of software which uses threads in very massive way - like
hundreds of them generated every second.

there is also a piece of code which maintains the number of outstanding
threads, simply

counter+=1 is executed when before starting the thread and counter-=1
after it finishes.

all is very simple and by the end of the program life I expect the
counter to zero out.

however I am getting values -1, -2, 1 ,2 ,3 and quite often 0 as expected.

I guarded those statement with Lock.{acquire,release} and now it always
returns 0.


But I still can not believe that +=1 is not a thread safe operation.


Any clue?
 
D

Diez B. Roggisch

AlFire said:
Hi,

I have a piece of software which uses threads in very massive way - like
hundreds of them generated every second.

there is also a piece of code which maintains the number of outstanding
threads, simply

counter+=1 is executed when before starting the thread and counter-=1
after it finishes.

all is very simple and by the end of the program life I expect the
counter to zero out.

however I am getting values -1, -2, 1 ,2 ,3 and quite often 0 as expected.

I guarded those statement with Lock.{acquire,release} and now it always
returns 0.


But I still can not believe that +=1 is not a thread safe operation.

don't confuse augmented assignment with incrementation as it is offered
by C (if your data-type actually fits into a single addressable memory
spot, that is)

python's += works like this


a += b <=> a = a.__iadd__(b)

Thus you actually get a situation where the expression on the right is
evaluated but not yet assigned - and then another thread can take over
control, computing with the old value of a.

Diez
 
J

John Nagle

AlFire said:
Hi,
all is very simple and by the end of the program life I expect the
counter to zero out.

however I am getting values -1, -2, 1 ,2 ,3 and quite often 0 as expected.

I guarded those statement with Lock.{acquire,release} and now it always
returns 0.


But I still can not believe that +=1 is not a thread safe operation.


Any clue?

"counter += 1" is not an atomic operation, apparently. Sorry.

However, appending and removing from a list or dict are atomic operations,
because if they were not, memory allocation would break.

John Nagle
 
G

Gary Herron

AlFire said:
Hi,

I have a piece of software which uses threads in very massive way -
like hundreds of them generated every second.

there is also a piece of code which maintains the number of
outstanding threads, simply

counter+=1 is executed when before starting the thread and counter-=1
after it finishes.

all is very simple and by the end of the program life I expect the
counter to zero out.

however I am getting values -1, -2, 1 ,2 ,3 and quite often 0 as
expected.

I guarded those statement with Lock.{acquire,release} and now it
always returns 0.


But I still can not believe that +=1 is not a thread safe operation.


Any clue?
Of course it's not thread safe. For the same reason and more basic,
even the expression i++ is not thread safe in C++.

Any such calculation, on modern processors, requires three operations:
retrieve value of i into a register,
increment the register
write the value into i.

If a thread is interrupted anywhere within that sequence, and another
thread access i, you have a conflict. (And indeed, hardware interrupts
can occur between any two instructions.)

Gary Herron
 
M

Marc 'BlackJack' Rintsch

Of course it's not thread safe. For the same reason and more basic,
even the expression i++ is not thread safe in C++.

Any such calculation, on modern processors, requires three operations:
retrieve value of i into a register,
increment the register
write the value into i.

There are no modern processors with an opcode for incrementing a memory
location!? At least my C64 can do that. ;-)

Ciao,
Marc 'BlackJack' Rintsch
 
P

Paul Rubin

AlFire said:
But I still can not believe that +=1 is not a thread safe operation.

In CPython I believe it is thread safe, because of the global
interpreter lock. Thread switches can happen only between bytecode
executions, not in the middle of a bytecode, even though executing a
bytecode can take several machine instructions.
 
H

Hrvoje Niksic

Paul Rubin said:
In CPython I believe it is thread safe, because of the global
interpreter lock. Thread switches can happen only between bytecode
executions, not in the middle of a bytecode, even though executing a
bytecode can take several machine instructions.

It's safe when writing extensions in C (because of the GIL), but not
when writing Python code. Python switches threads after a number of
bytecode instructions, so Python code behaves somewhat like C
multithreaded code on a single CPU -- although there is no true
parallelism, you still have context switches at arbitrary places to
worry about, and still need locks because of them. Since the +=
operator is not compiled into a single bytecode instruction, it needs
the lock. x+=1 gets compiled into the following bytecode:

3 0 LOAD_GLOBAL 0 (x)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_GLOBAL 0 (x)

If two threads execute that code simultaneously, a thread switch could
easily occur some time between LOAD_GLOBAL and STORE_GLOBAL, causing
the variable to be incremented by 1 instead of by 2.
 
D

Diez B. Roggisch

Duncan said:
The operation i++ in C/C++ implies two things: it increments the memory
location and returns the new result. That means there are at least two ways
to generate code for this:

increment memory
load new value

or

load value
incremenent
store

Neither of these is going to be thread-safe (even on a C64) unless you
protect the operations somehow. The latter is probably preferred as it only
implies two operations to memory whereas the former implies three.

This is not entirely right. increment is usually atomic (at least for
single-core processors), so you don't suffer from a load/store being
interrupted.

Of course it can happen that the previous or later loading of the value
for computational purposes is interrupted by another thread. But you
have the guarantee that all incs (and decs) are counted, in contrast to
what the OP observed with python.

Diez
 
A

Arnaud Delobelle

Marc 'BlackJack' Rintsch said:
There are no modern processors with an opcode for incrementing a memory
location!? At least my C64 can do that. ;-)

Indeed! I remember a simple use was to make the border change colour
very fast, a v. cool effect when you're 12!

CLV
LOOP: INC $D020
BVC LOOP

(from memory, untested!)
 
P

Paul Rubin

Hrvoje Niksic said:
Since the += operator is not compiled into a single bytecode
instruction, it needs the lock.

Aha, you are right. What I was remembering is that xrange.next
is atomic in CPython, i.e. you can say something like

counter = xrange(10000)

and then

a = counter.next()

doesn't need a lock. I am personally squeamish about relying on things
like this but apparently it is a standard idiom. I will guess, but
I haven't checked and I don't remember hearing, that itertools.count()
also works like that.
 
A

AlFire

Duncan Booth wrote:
[...]
is equivalent to:

x = x.__iadd__(1)

thx all for answers and hints ...
Generating hundreds of threads is, BTW, a very good way to get poor
performance on any system. Don't do that. Create a few threads and put the
actions for those threads into a Queue. If you want the threads to execute
in parallel investigate using sub-processes.


I know that limitation. However I am bridging to existing software which
is hard to be changed. and on powerful machine I have at hand it works
quite fast.
The threading module already has a function to return the number of Thread
objects currently alive.

I have threads within threads - so it does not suit me :-(.
 
M

Marc 'BlackJack' Rintsch

Indeed! I remember a simple use was to make the border change colour
very fast, a v. cool effect when you're 12!

CLV
LOOP: INC $D020
BVC LOOP

(from memory, untested!)

That works but I think

LOOP: INC $D020
JMP LOOP

is a bit more straight forward. Shorter in opcodes, equal in bytes, the
loop is as fast but you save the two cycles of the CLV. :)

Ciao,
Marc 'BlackJack' Rintsch
 
A

Arnaud Delobelle

Marc 'BlackJack' Rintsch said:
That works but I think

LOOP: INC $D020
JMP LOOP

is a bit more straight forward. Shorter in opcodes, equal in bytes, the
loop is as fast but you save the two cycles of the CLV. :)

Ah you're right! I thought I remembered that Bxx instructions took 2
cycles whereas a JMP takes 3. I've checked it on the wonderful wide
web and Bxx takes 2 cycles if the branch fails, but 3 cycles if the
branch succeeds :) (or even 5 if it branches to a new page).

To bring this back on topic, I would qualify my code snippet as more
"Pythonesque" than yours ;)
 
A

Alexander Schmolck

AlFire said:
I have threads within threads - so it does not suit me :-(.

How about using a scalar numpy array? They are mutable, so I assume that x +=
1 should be atomic.

'as
 
G

Gary Herron

Alexander said:
How about using a scalar numpy array? They are mutable, so I assume that x +=
1 should be atomic.

No NO NO! The only way to increment a variable in memory is through a
three step process:

Load a register from a memory location
Increment the register
Store the value back into memory.

Ages ago, there were architectures that would do an increment on a
memory location in an atomic way, but with the current (RISC)
architectures these are three separate operations.

A good compiler may be able to eliminate some of these operations by
keeping the variable in question in a register, but this just worsens
the problem with respect to threads since each thread has it's own set
of register values.

Gary Herron
 
A

AlFire

Alexander said:
How about using a scalar numpy array? They are mutable, so I assume that x +=
1 should be atomic.
I ended up with following:

counter=[] counter=0

counter.append(None) counter+=1

counter.pop() counter-=1

len(counter) counter

A.
 
C

Carl Banks

No NO NO! The only way to increment a variable in memory is through a
three step process:

Yes, this will be atomic. It's a pretty good idea, in fact. The
underlying increment operation is protected by the GIL; it could be
three, forty, or a hundred steps and it'd be atomic at the Python
level.


Carl Banks
 
G

Gary Herron

Carl said:
Yes, this will be atomic. It's a pretty good idea, in fact. The
underlying increment operation is protected by the GIL; it could be
three, forty, or a hundred steps and it'd be atomic at the Python
level.

But... It's not!

A simple test shows that. I've attached a tiny test program that shows
this extremely clearly. Please run it and watch it fail. In this test,
two threads both increment a shared global variable 1 million times.
This should end with a value of 2 million, but it rarely does so.
(Actually it has *never* done so yet.)

The GIL makes sure no two Python threads are running at the same time,
and from that you can conclude that Python virtual machine instructions
are atomic, but you cannot conclude that Python statements/expression
are atomic. Any Python statement or expression that compiles into more
than one OP code can be interrupted between those OP codes, and this can
cause devastating results to the interrupted statement.

Gary Herron
 
B

Bjoern Schliessmann

Gary said:
No NO NO! The only way to increment a variable in memory is
through a three step process:

Load a register from a memory location
Increment the register
Store the value back into memory.

I suggest you read "Intel 64 and IA-32 Architectures Software
Developer's Manual" (2A, Instruction Set Reference A-M). INC and
DEC can have register *or* memory location as operand, and can be
executed atomically even in multiprocessor environments (LOCK
prefix).
Ages ago, there were architectures that would do an increment on a
memory location in an atomic way, but with the current (RISC)
architectures these are three separate operations.

That may be true for RISC architectures. Unfortunately, all 8086
based CPUs are CISC.

Regards,


Björn
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,226
Members
46,815
Latest member
treekmostly22

Latest Threads

Top