Nonblocking IO read

ara.t.howard

Windows will be a problem. Admittedly, I haven't tried ruby 1.8.5 yet,
which has new nonblock_* methods. However, my expectation is that you'll
only get nonblocking behavior on windows from sockets, not from pipes.
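For reference, a minimal sketch of what those new nonblock_* methods look like in use, here on a pipe under unix; read_nonblock raises Errno::EAGAIN instead of blocking when no data is available:

```ruby
r, w = IO.pipe

# no data yet: read_nonblock raises rather than blocking
begin
  r.read_nonblock(16)
  state = :data
rescue Errno::EAGAIN
  state = :would_block
end

w.write "x"
data = r.read_nonblock(16)   # => "x"
```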

On Windows, calling select() on a pipe always returns immediately with
"data ready to read", regardless of whether there's any data there or not.

This has been the bane of my existence on Windows ruby for 5 or 6 years. I
do IPC on Windows ruby using TCP over loopback, instead of pipes, in order
to get nonblocking semantics. (That still doesn't help for reading from the
console, though... (search the archives for 'kbhit' for a partial solution
there...))
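A minimal sketch of that TCP-over-loopback trick (the names here are my own; port 0 just asks the OS for a free port). Unlike a pipe, a loopback socket gives select() real readiness semantics on Windows:

```ruby
require 'socket'

# IPC over TCP loopback instead of a pipe: select() reports readiness
# correctly for sockets on windows, where it lies for pipes.
server = TCPServer.new('127.0.0.1', 0)   # port 0 => any free port
port   = server.addr[1]

writer = TCPSocket.new('127.0.0.1', port)
reader = server.accept

writer.write "hello"
writer.close_write                        # send EOF to the reader

ready, = IO.select([reader], nil, nil, 1)
data = ready ? reader.read : nil
```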

One of these years, I'd like to chat with a Windows guru and ask how he/she
would recommend making a select() that works on both sockets and pipes on
Windows. Ruby could *really* use one.

maybe

http://apr.apache.org/docs/apr/group__apr__poll.html

-a
 
ara.t.howard

I thought Ruby internally uses non-blocking I/O to prevent a green thread
that is reading from blocking every other thread: am I wrong? Or is this
true just under unix?


try this on windows

harp:~ > cat a.rb
t = Thread.new{ loop{ STDERR.puts Time.now.to_f } }
STDIN.gets


-a
 
Tom Pollard

Actually this is not correct: if a lot is written to stderr
then you need to read it concurrently. If you do not, the
process will block on some stderr write operation once
the pipe fills up, and you get a deadlock because your code waits
for process termination.

I guess I can see that, though I can't think of a program that I'd
expect to be able to generate enough stderr output to clog a pipe.
In any case, my response would be to merge stdout and stderr, rather
than use non-blocking IO. If you're just reading one stream while
the command is executing, you don't need to worry about blocking.
I'm certainly with Ara in recommending that if you can avoid non-
blocking IO, you should.
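A minimal sketch of the merge-the-streams approach, using Open3.popen2e from the (newer) stdlib; the echo command is just a stand-in for the real child:

```ruby
require 'open3'

# popen2e merges the child's stdout and stderr into a single stream,
# so one blocking read loop drains everything and no pipe can fill up.
output = nil
Open3.popen2e('echo out; echo err 1>&2') do |stdin, out_and_err, wait_thr|
  stdin.close
  output = out_and_err.read   # reads both streams, interleaved
  wait_thr.value              # reap the child
end
```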

At the risk of starting an unrelated discussion ("stderr considered
harmful"), my feeling has long been that stderr is misused by most
people, and that the only context in which it makes any sense is for
small commandline tools that you expect to use in a pipeline. For
apps like that, it's helpful to keep error messages out of your
stdout stream. For most apps, however, I don't think it makes any
sense to write error messages to a separate file. Error messages
should be written to the app's main log file or output file, where
the user will be looking for their results. That way, nonfatal error
messages also appear naturally in the proper sequence with other
output. I work with a lot of scientist programmers who don't think
much about issues like this and (typically) write their error
messages to stderr just because it's there. I'm not sure that's
relevant to the OP's situation or not. (Probably not.)

Tom
 
ara.t.howard

I guess I can see that, though I can't think of a program that I'd expect to
be able to generate enough stderr output to clog a pipe.

did you check out my recent post (switched subjects) - it takes a surprisingly
small amount (4242 lines of output does it easily)!
In any case, my response would be to merge stdout and stderr, rather than
use non-blocking IO. If you're just reading one stream while the command is
executing, you don't need to worry about blocking. I'm certainly with Ara
in recommending that if you can avoid non-blocking IO, you should.

no argument there... but
At the risk of starting an unrelated discussion ("stderr considered
harmful"), my feeling has long been that stderr is misused by most people,
and that the only context in which it makes any sense is for small
commandline tools that you expect to use in a pipeline. For apps like that,
it's helpful to keep error messages out of your stdout stream. For most
apps, however, I don't think it makes any sense to write error messages to a
separate file. Error messages should be written to the app's main log file
or output file, where the user will be looking for their results. That way,
nonfatal error messages also appear naturally in the proper sequence with
other output. I work with a lot of scientist programmers who don't think
much about issues like this and (typically) write their error messages to
stderr just because it's there. I'm not sure that's relevant to the OP's
situation or not. (Probably not.)

i'm in the same boat as you (writing for scientists) and have found it's
nearly __always__ the case that a program can produce something useful on
stdout, and therefore i always log to stderr so that programs can be used in
pipes.

mostly i agree though.

cheers.

-a
 
Robert Klemme

Tom said:
I guess I can see that, though I can't think of a program that I'd
expect to be able to generate enough stderr output to clog a pipe.

A typical pipe buffer size is 4k, which can get filled pretty fast.
In
any case, my response would be to merge stdout and stderr, rather than
use non-blocking IO. If you're just reading one stream while the
command is executing, you don't need to worry about blocking.

Merging both from outside the subprocess is certainly possible from Ruby
(via a shell) but I am not sure whether there is a portable solution
(one of the popenN methods?).
I'm
certainly with Ara in recommending that if you can avoid non-blocking
IO, you should.

I second that.
At the risk of starting an unrelated discussion ("stderr considered
harmful"), my feeling has long been that stderr is misused by most
people, and that the only context in which it makes any sense is for
small commandline tools that you expect to use in a pipeline. For apps
like that, it's helpful to keep error messages out of your stdout
stream. For most apps, however, I don't think it makes any sense to
write error messages to a separate file. Error messages should be
written to the app's main log file or output file, where the user will
be looking for their results. That way, nonfatal error messages also
appear naturally in the proper sequence with other output. I work with
a lot of scientist programmers who don't think much about issues like
this and (typically) write their error messages to stderr just because
it's there. I'm not sure that's relevant to the OP's situation or not.
(Probably not.)

All true. But if you do not know what the program does or how it is
implemented you better deal with potential output to stderr (either by
merging, see above, or by making sure that stderr and stdout are read)
because otherwise the consequences might be somewhat catastrophic. And
this could also mean that your program at some point in the future
simply stops working because another piece of software has changed.
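A minimal sketch of the make-sure-both-are-read approach (Open3.popen3 with one reader thread per stream; the shell command is just an example):

```ruby
require 'open3'

# One reader thread per stream guarantees neither pipe fills up
# while we wait for the child to finish.
out = err = nil
Open3.popen3('echo data; echo oops 1>&2') do |stdin, stdout, stderr, wait_thr|
  stdin.close
  t_out = Thread.new { stdout.read }
  t_err = Thread.new { stderr.read }
  out = t_out.value
  err = t_err.value
  wait_thr.value   # reap the child
end
```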

Kind regards

robert
 
Bjorn Borud

[[email protected]]
| >
| > Java for example got proper nonblocking socket IO in 1.4 (see
| > JSR-000051 "New I/O APIs for the JavaTM Platform").
|
| but it's archaic compared to event driven frameworks like
|
| http://www.monkey.org/~provos/libevent/
| http://liboop.ofb.net/

you are comparing apples and oranges here. you have to look at where
these things fit into the big picture. the above mentioned libraries
provide abstractions that are used to provide a more convenient
programming model, and at least in the case of libevent, provide an
abstraction that insulates code from having to deal with supporting
the various readiness selection APIs that exist. they do not replace
the underlying OS interfaces upon which they are built.

JSR-000051 does more or less the same thing, but with a slightly
different goal in mind. the primary goal is to provide the programmer
with access to asynchronous network IO in a platform-agnostic
manner. the emphasis is on providing a sensible abstraction that works
well with the various operating system APIs that exist -- exposing a
reasonable common subset of functionality that one can expect to be
able to support on a variety of platforms. JSR-000051 provides the
primitives on top of which you would implement IO frameworks that
provide convenient programming models.

for a language, providing sensible APIs for primitives is more
important than imposing a particular programming model. given a set
of sensible primitives that can be widely supported, whatever higher
level frameworks one wishes to create can be built atop that.

| in addition it's a hack in the c lib of many oses.

I am not sure I understand exactly what you mean. the way I see it
there are three distinct levels to networking APIs:

1. OS API, that is, the system calls

2. standardized system libraries, like libc, java.io, java.util.nio
etc, which provide access to IO primitives and possibly abstract
away the underlying OS APIs

3. higher level IO frameworks that provide more convenient programming
models

libevent would overlap with both 2 and 3 in this case since its
mission is *both* to abstract away the underlying OS interfaces *and*
provide a convenient programming model. Java's NIO is what you'd find
in 2 and a typical Reactor pattern implementation would be in 3.

(if you use low-level socket APIs (i.e. the types of functions
documented in section 2 of the UNIX manual pages) on UNIX you will
find yourself using a mix of 1 and 2 if you use C/C++, since you use
wrappers in libc to perform system calls, but this is just a very,
very thin convenience layer on top of the system calls. if this is
confusing then just forget I mentioned it :).)

| > for examples of use you might want to check out the Reactor pattern
| > and other patterns for concurrent programming.
|
| afaik the reactor pattern is a synchronous pattern

no, the Reactor pattern is mainly used to implement asynchronous IO
and is not really anything new. I've both written and seen variations
of this pattern in a multitude of languages since I started writing
networking code in the early 90s and the only thing that has really
changed is that we've gotten better at classifying these types of
patterns, giving them better definitions, and giving them names. (when I
started writing networking software in the early 90s, "patterns"
wasn't commonly part of the programmer's vocabulary).
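for the curious, the core of a Reactor is tiny: register a handler per descriptor, then dispatch readiness events from a single select() loop. a toy sketch (class and method names are mine, not from any particular library):

```ruby
# Toy Reactor: handlers are registered per IO object and invoked
# whenever select() reports that IO as readable.
class Reactor
  def initialize
    @handlers = {}
  end

  def register(io, &on_readable)
    @handlers[io] = on_readable
  end

  def deregister(io)
    @handlers.delete(io)
  end

  # one pass of the event loop: wait for readiness, dispatch callbacks
  def run_once(timeout = 0.1)
    readable, = IO.select(@handlers.keys, nil, nil, timeout)
    (readable || []).each { |io| @handlers[io].call(io) }
  end
end
```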

| http://www.artima.com/articles/io_design_patterns2.html
|
| not unlike the model of libevent and liboop - which are both synchronous...
| am i missing something?

you are missing an "a" in front of "synchronous" :)

-Bjørn
 
S. Robert James

Yep, that's how I came across this problem initially.

Doing that (replace 'cat -' with my command) hung indefinitely.
Commenting out the stderr line fixed it. I assume that it was waiting
for something to write to stderr before progressing.

Now, you are correct that the external process had terminated. Why
that didn't close stderr and move on I do not know. More importantly -
is there a way to do what Tom is suggesting - that is, have Ruby move
on the second the external process terminates - that will work on
Windows as well?

(As an aside, kudos to the developers of popen4 - it's really great.)
 
ara.t.howard

Yep, that's how I came across this problem initially.

Doing that (replace 'cat -' with my command) hung indefinitely.
Commenting out the stderr line fixed it. I assume that it was waiting
for something to write to stderr before progressing.

Now, you are correct that the external process had terminated. Why
that didn't close stderr and move on I do not know. More importantly -
is there a way to do what Tom is suggesting - that is, have Ruby move
on the second the external process terminates - that will work on
Windows as well?

check out systemu
(As an aside, kudos to the developers of popen4 - it's really great.)

i am 99% positive that the implementation of popen4 does not play well with
windows and may be impossible to make it do so. the systemu package i just
released is my attempt at an alternate implementation. give it a whirl and
let me know how it goes.

regards.

-a
 
Vidar Hokstad

it's exactly things like eventmachine that make me say using nbio is archaic -
i don't need to handle the complexities of nbio when powerful abstractions
like it exist!

The problem is that all of these "powerful abstractions" are
ridiculously slow compared to a well written nbio approach for many
types of applications. Particularly as long as Ruby's threading is so
abysmal.

Try writing a network server that needs to handle a high number of
concurrent connections, and you'll quickly find "select()" taking most
of your CPU if you use a model that makes use of threading and blocking
IO - your only real choice to get decent performance out of Ruby for
that kind of app is multiplexing the processing manually using nbio
(which is what Ruby is trying to do behind the scenes, but fails
miserably at doing effectively once the number of threads gets high
enough), or fork instead, which has its own problems if you need to
share significant state.

This is from personal experience - I currently have a guy on my team
rewriting an important backend process because we started running into
those exact issues.

Even when Ruby's threading is sorted out so we won't run into these
problems, nbio will be vital for high performance network programming -
well done nbio reduces the number of syscalls, and thereby context
switches enormously.

Vidar
 
Vidar Hokstad

it can't be done. search the archives, this sort of thing almost always
indicates a design flaw. for instance - what will your program do if there is
no input?

I have a messaging middleware server running right now that is
processing millions of messages a day. It's written in Ruby, and does
_all_ its work in a single thread using IO multiplexing with select().


Not only can it be done - it is fairly easy (~700 lines for the entire
app, including db persistence support etc.). Doing the same thing with
threads and blocking IO, on the other hand, would fall apart horribly
due to the way Ruby does threading. Forking also wouldn't be an option
as the processes would still need to actually exchange those messages.

In fact, internally, Ruby does all its IO using non-blocking IO
exactly because the threading model would cause everything to block
otherwise. Incidentally, that's also one of the reasons why using
threads + blocking IO performs extremely badly in Ruby once the number
of threads gets beyond a certain level: it causes Ruby to call
select() far too often.
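A minimal sketch of that single-threaded multiplexing style (an echo server; all the names are mine, and a real server would also watch for writability and handle partial writes):

```ruby
require 'socket'

# One thread, one select() loop: the listening socket and every client
# are polled together, so no blocking read can stall the whole process.
server  = TCPServer.new('127.0.0.1', 0)
clients = []

tick = lambda do
  readable, = IO.select([server] + clients, nil, nil, 0.1)
  (readable || []).each do |io|
    if io == server
      clients << server.accept            # new connection
    else
      begin
        io.write io.read_nonblock(4096)   # echo whatever arrived
      rescue EOFError
        clients.delete(io)                # client hung up
        io.close
      end
    end
  end
end
```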

Vidar
 
Bill Kelly

Vidar Hokstad said:
The problem is that all of these "powerful abstractions" are
ridiculously slow compared to a well written nbio approach for many
types of applications. Particularly as long as Ruby's threading is so
abysmal.

Sounds like you might want to actually take a look at eventmachine,
then. :)


Regards,

Bill
 
