Benefits of asyncio

Chris Angelico

Write me a purely nonblocking
web site concept that can handle a million concurrent connections,
where each one requires one query against the database, and one in a
hundred of them require five queries which happen atomically.


I don't see why that can't be done. Twisted has everything I can think of
except database bits (adb runs on threads), and I've got txpostgres[1]
running in production; it seems quite robust so far. What else are we
missing?

[1]: https://pypi.python.org/pypi/txpostgres

I never said it can't be done. My objection was to Marko's reiterated
statement that asynchronous coding is somehow massively cleaner than
threading; my argument is that threading is often significantly
cleaner than async, and that at worst, they're about the same (because
they're dealing with exactly the same problems).

ChrisA
 
Chris Angelico

So why not keep a 'connection pool', and for every potentially blocking
request, grab a connection, set up a callback or a 'yield from' to wait for
the response, and unblock.

Compare against a thread pool, where each thread simply does blocking
requests. With threads, you use blocking database, blocking logging,
blocking I/O, etc, and everything *just happens*; with a connection
pool, like this, you need to do every single one of them separately.
(How many of you have ever written non-blocking error logging? Or have
you written a non-blocking system with blocking calls to write to your
error log? The latter is far FAR more common, but all files, even
stdout/stderr, can block.) I don't see the basis for Marko's assertion
that event-driven asynchronous programming is a breath of fresh air
compared with multithreading. The only way multithreading can possibly
be more complicated is that preemption can occur anywhere - and that's
exactly one of the big flaws in async work if you don't do your job
properly.
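The thread-pool model described here can be sketched in a handful of lines; `fetch()` is a hypothetical stand-in for a blocking database query, and the workers just call it directly:

```python
# Each worker thread simply makes blocking calls; no callbacks,
# no event loop.  fetch() stands in for a blocking database query.
import queue
import threading

def fetch(request):
    return request * 2          # placeholder for a blocking call

def worker(jobs, results):
    while True:
        request = jobs.get()
        if request is None:     # sentinel: time to shut down
            break
        results.put(fetch(request))

jobs, results = queue.Queue(), queue.Queue()
threads = [threading.Thread(target=worker, args=(jobs, results))
           for _ in range(4)]
for t in threads:
    t.start()
for n in range(10):
    jobs.put(n)
for _ in threads:
    jobs.put(None)              # one sentinel per worker
for t in threads:
    t.join()
print(sorted(results.queue))    # → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Any logging, file I/O, or error handling inside fetch() would be ordinary blocking code too, which is the point being made.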

ChrisA
 
Marko Rauhamaa

Chris Angelico said:
I don't see the basis for Marko's assertion that event-driven
asynchronous programming is a breath of fresh air compared with
multithreading. The only way multithreading can possibly be more
complicated is that preemption can occur anywhere - and that's exactly
one of the big flaws in async work if you don't do your job properly.

Say you have a thread blocking on socket.accept(). Another thread
receives the management command to shut the server down. How do you tell
the socket.accept() thread to abort and exit?

The classic hack is to close the socket, which causes the blocking
thread to raise an exception.

The blocking thread might also be stuck in socket.recv(). Closing the
socket from the outside is dangerous now because of race conditions, so
you will have to carefully add locking to prevent an unwanted closing
of the connection.

But what do you do if the blocking thread is stuck in the middle of a
black box API that doesn't expose a file you could close?

So you hope all blocking APIs have a timeout parameter. You then replace
all blocking calls with polling loops. You make the timeout value long
enough not to burden the CPU too much and short enough not to annoy the
human operator too much.
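That workaround can be sketched as follows (the 0.1-second timeout is an arbitrary illustrative value):

```python
# Replace the blocking accept() with a polling loop: a short timeout
# lets the thread re-check a shutdown flag between attempts.
import socket
import threading

shutdown = threading.Event()

def serve(server):
    server.settimeout(0.1)              # poll interval, not a real timeout
    while not shutdown.is_set():
        try:
            conn, _addr = server.accept()
        except socket.timeout:
            continue                    # no client yet; re-check the flag
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
t = threading.Thread(target=serve, args=(server,))
t.start()
shutdown.set()                          # the management command arrives
t.join()
server.close()
```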

Well, ok,

os.kill(os.getpid(), signal.SIGKILL)

is always an option.


Marko
 
Chris Angelico

Marko Rauhamaa said:
Say you have a thread blocking on socket.accept(). Another thread
receives the management command to shut the server down. How do you tell
the socket.accept() thread to abort and exit?

The classic hack is to close the socket, which causes the blocking
thread to raise an exception.

How's that a hack? If you're shutting the server down, you need to
close the listening socket anyway, because otherwise clients will
think they can get in. Yes, I would close the socket. Or just send the
process a signal like SIGINT, which will break the accept() call. (I
don't know about Python specifically here; the underlying Linux API
works this way, returning EINTR, as does OS/2 which is where I
learned. Generally I'd have the accept() loop as the process's main
loop, and spin off threads for clients.) In fact, the most likely case
I'd have would be that the receipt of that signal *is* the management
command to shut the server down; it might be SIGINT or SIGQUIT or
SIGTERM, or maybe some other signal, but one of the easiest ways to
notify a Unix process to shut down is to send it a signal. Coping with
broken proprietary platforms is an exercise for the reader, but I know
it's possible to terminate a console-based socket accept loop in
Windows with Ctrl-C, so there ought to be an equivalent API method.
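A Unix-only sketch of that signal-as-shutdown-command pattern (the `Shutdown` exception and the SIGTERM-after-0.2-seconds timer are illustrative; since PEP 475, Python retries interrupted syscalls unless the handler raises):

```python
# The accept() loop is the process's main loop; a SIGTERM handler
# raises an exception that breaks out of the blocking accept().
import os
import signal
import socket
import threading

class Shutdown(Exception):
    pass

def on_term(signum, frame):
    raise Shutdown                      # propagates out of accept()

signal.signal(signal.SIGTERM, on_term)

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

# Stand-in for an operator sending the shutdown signal.
threading.Timer(0.2, os.kill, (os.getpid(), signal.SIGTERM)).start()

try:
    while True:
        conn, _addr = server.accept()   # blocks until a client or a signal
        conn.close()
except Shutdown:
    server.close()
    print("server shut down cleanly")
```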
The blocking thread might also be stuck in socket.recv(). Closing the
socket from the outside is dangerous now because of race conditions, so
you will have to carefully add locking to prevent an unwanted closing
of the connection.

Maybe. More likely, the same situation applies - you're shutting down,
so you need to close the socket anyway. I've generally found -
although this may not work on all platforms - that it's perfectly safe
for one thread to be blocked in recv() while another thread calls
send() on the same socket, and then closes that socket. On the other
hand, if your notion of shutting down does NOT include closing the
socket, then you have to deal with things some other way - maybe
handing the connection on to some other process, or something - so a
generic approach isn't appropriate here.
But what do you do if the blocking thread is stuck in the middle of a
black box API that doesn't expose a file you could close?

So you hope all blocking APIs have a timeout parameter.

No! I never put timeouts on blocking calls to solve shutdown problems.
That is a hack, and a bad one. Timeouts should be used only when the
timeout is itself significant (eg if you decide that your socket
connections should time out if there's no activity in X minutes, so
you put a timeout on socket reads of X*60000 and close the connection
cleanly if it times out).
Well, ok,

os.kill(os.getpid(), signal.SIGKILL)

is always an option.

Yeah, that's one way. More likely, you'll find that a lesser signal
also aborts the blocking API call. And even if you have to hope for an
alternate API to solve this problem, how is that different from hoping
that all blocking APIs have corresponding non-blocking APIs? I
reiterate the example I've used a few times already:

https://docs.python.org/3.4/library/logging.html#logging.Logger.debug

What happens if that blocks? How can you make sure it won't?
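For what it's worth, the stdlib's own answer to that question is logging.handlers.QueueHandler / QueueListener (added in 3.2): the caller's debug() only enqueues a record, and a background thread does the potentially blocking write. A minimal sketch, with a StringIO standing in for a slow log file:

```python
# debug() on this logger only puts a record on the queue; the
# listener's background thread performs the actual (blocking) write.
import io
import logging
import logging.handlers
import queue

log_queue = queue.Queue()
stream = io.StringIO()                  # stand-in for a slow log file

listener = logging.handlers.QueueListener(
    log_queue, logging.StreamHandler(stream))
listener.start()

logger = logging.getLogger("demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))

logger.debug("this call only enqueues; the writer thread blocks")
listener.stop()                         # joins the thread, flushing the queue
print(stream.getvalue().strip())
```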

ChrisA
 
Chris Angelico

Marko Rauhamaa said:
I haven't used that class. Generally, Python standard libraries are not
readily usable for nonblocking I/O.

For myself, I have solved that particular problem my own way.

Okay. How do you do basic logging? (Also - rolling your own logging
facilities, instead of using what Python provides, is the simpler
solution? This does not aid your case.)

ChrisA
 
Roy Smith

Marko Rauhamaa said:
Say you have a thread blocking on socket.accept(). Another thread
receives the management command to shut the server down. How do you tell
the socket.accept() thread to abort and exit?

You do the accept() in a daemon thread?
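Roy's suggestion in sketch form: a daemon thread needs no abort protocol at all, because the interpreter kills it when the main thread exits.

```python
# The accept() thread is marked as a daemon, so process exit
# simply takes it down -- no explicit shutdown handshake.
import socket
import threading

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

t = threading.Thread(target=server.accept, daemon=True)
t.start()
# When the main thread returns, the interpreter exits and the
# daemon thread still blocked in accept() dies with it.
```

The cost is that the thread gets no chance to clean up, which matters if it holds locks or open transactions.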
 
Marko Rauhamaa

Chris Angelico said:
Okay. How do you do basic logging? (Also - rolling your own logging
facilities, instead of using what Python provides, is the simpler
solution? This does not aid your case.)

Asyncio is fresh out of the oven. It's going to take years before the
standard libraries catch up with it.


Marko
 
Burak Arslan

Write me a purely nonblocking
web site concept that can handle a million concurrent connections,
where each one requires one query against the database, and one in a
hundred of them require five queries which happen atomically.

I don't see why that can't be done. Twisted has everything I can think of
except database bits (adb runs on threads), and I've got txpostgres[1]
running in production; it seems quite robust so far. What else are we
missing?

[1]: https://pypi.python.org/pypi/txpostgres
I never said it can't be done. My objection was to Marko's reiterated
statement that asynchronous coding is somehow massively cleaner than
threading; my argument is that threading is often significantly
cleaner than async, and that at worst, they're about the same (because
they're dealing with exactly the same problems).

Ah ok. Well, after a couple of years of writing async code, my
not-so-objective opinion about it is that it forces you to split your
code into functions, just like Python forces you to indent your code
properly. This in turn generally helps the quality of the codebase.

If you manage to keep yourself out of closure hell by not nesting
more and more functions inside one another, I'd say async code and
(non-sloppy) blocking code look almost the same. (Which means, I guess,
that we mostly agree. :))

Burak
 
Chris Angelico

Burak Arslan said:
Ah ok. Well, after a couple of years of writing async code, my
not-so-objective opinion about it is that it forces you to split your
code into functions, just like Python forces you to indent your code
properly. This in turn generally helps the quality of the codebase.

That's entirely possible, but it depends hugely on your
library/framework, then - see earlier comments in this thread about
Node.js and the nightmare of callbacks.

One thing I'm seeing, though, the more different styles of programming
I work with, is that since it's possible to write good code in pretty
much anything (even PHP, and my last boss used that as a
counter-argument to "PHP sucks"), and since a good programmer will
write good code in anything, neither of these is really a good
argument in favour of (or against) a feature/library/framework/style.
Python forces you to indent your code. Fine! But a good programmer
will already indent, and a sloppy programmer isn't forced to be
consistent. (At worst, you just add "if True:" every time you
unexpectedly indent.) To judge the quality of a framework based on
code style, you need to look at a *bad* programmer and what s/he
produces. A bad programmer, with just GOTO and line numbers, will
often produce convoluted code that's completely unreadable; a bad
programmer with a good suite of structured control flow will more
often stumble into something that's at least mostly clear.

So how does async vs threaded stack up there? A competent programmer
won't have a problem with either model. A mediocre programmer probably
will think about one thing at a time, and will then run into problems.
Threading produces these problems in one set of ways, asyncio produces
problems in another set of ways. Which one would you, as an expert,
prefer to deal with in a junior programmer's code?

ChrisA
 
Paul Rubin

Marko Rauhamaa said:
That's a good reason to avoid threads. Once you realize you would have
been better off with an async approach, you'll have to start over.

That just hasn't happened to me yet, at least in terms of program
organization. Python threads get too slow once there are too many
tasks, but that's just an implementation artifact of Python threads, and
goes along with Python being slow in general. Write threaded code in
GHC or Erlang or maybe Go, and you can handle millions of connections,
as the threads are in userspace and are very lightweight and fast.

http://haskell.cs.yale.edu/wp-content/uploads/2013/08/hask035-voellmy.pdf
 
Paul Rubin

Marko Rauhamaa said:
Mostly asyncio is a way to deal with anything you throw at it. What do
you do if you need to exit the application immediately and your threads
are stuck in a 2-minute timeout?

Eh? When the main thread exits, all the child threads go with it.
Sometimes there is some crap on stderr because of resource cleanups
happening in unexpected order as the various threads exit, but it
all shuts down.

The new Tulip I/O stuff, based on "yield" coroutines, should combine
the advantages of async and threads.
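A minimal sketch of that coroutine style in today's spelling (the thread predates async/await; Tulip-era code wrote `yield from` where modern Python writes `await`). The handler reads like blocking code but yields to the event loop at each await point:

```python
# An echo-and-uppercase server plus a client, all on one event loop.
# Each await looks like a blocking call but suspends the coroutine.
import asyncio

async def handle(reader, writer):
    data = await reader.readline()      # "blocking" read, really a yield
    writer.write(data.upper())
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    reply = await reader.readline()
    writer.close()
    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
print(reply)   # → b'HELLO\n'
```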
 
