How to prevent hanging when writing lots of text to a pipe?

jl_post

Hi,

I'm trying to write text to a piped writeHandle (created with the
pipe() function) so that I can later read the text by extracting it
from a readHandle. However, I discovered that if I write a lot of
text to the pipe, my script just hangs. Here is an example program:

#!/usr/bin/perl

use strict;
use warnings;

print "Enter a number: ";
my $number = <STDIN>;
chomp($number);

my @lines = do
{
    pipe(my $readHandle, my $writeHandle);

    # Autoflush $writeHandle:
    my $oldHandle = select($writeHandle);
    $| = 1;
    select($oldHandle);

    print $writeHandle "$_\n" foreach 1 .. $number;
    close($writeHandle);

    # Extract the output, line-by-line:
    <$readHandle>
};

print "Extracted output lines:\n @lines";

__END__

When I run this program, I notice that it runs perfectly for small
values of $number (like 10). But on high values (like ten thousand),
the program hangs.

From testing, I discovered that the limit on the Windows platform
I'm using is 155, and the limit on the Linux platform I'm using is
1040. Any higher number causes the program to hang.

As for why this is happening, my best guess is that a program can
only stuff so much output into a piped writeHandle before it gets
full. Therefore, deadlock occurs, as the reading won't happen until
the writing is finished.

However, I'm not fully convinced this is the case, because I
replaced the lines:

print $writeHandle "$_\n" foreach 1 .. $number;
close($writeHandle);

with:

if (fork() == 0)
{
    # Only the child process gets here:
    print $writeHandle "$_\n" foreach 1 .. $number;
    close($writeHandle);
    exit(0);
}

and now the Perl script hangs on both Windows and Linux platforms,
even with low values of $number (such as 5). My intent was to make
the child process solely responsible for stuffing the output into the
pipe, while the parent process read from the $readHandle as data
became available. That way we would avoid the pipe getting stuffed to
capacity.

But as I've said, that fork()ing change doesn't work for any value
of $number, so I must be doing something wrong somewhere.

So my question is: How do I prevent my script from hanging when I
have a lot of text to send through the pipe?

Thanks in advance for any help.

-- Jean-Luc
 
ilovelinux

   I'm trying to write text to a piped writeHandle (created with the
pipe() function) so that I can later read the text by extracting it
from a readHandle.  However, I discovered that if I write a lot of
text to the pipe, my script just hangs.  Here is an example program: [snip]
   As for why this is happening, my best guess is that a program can
only stuff so much output into a piped writeHandle before it gets
full.  Therefore, deadlock occurs, as the reading won't happen until
the writing is finished.

That's right. Pipes have a limited capacity. See http://linux.die.net/man/7/pipe.
POSIX prescribes a minimum capacity of 512 bytes, which is exactly
what your Windows machine implements:
$ perl -we 'print "$_\n" for 1..155'| wc -c
512
$

Linux has 4096 on your machine:
$ perl -we 'print "$_\n" for 1..1040'| wc -c
4093
$

but in modern kernels it is 2^16 (65,536) bytes.

As for your fork()ing test program: you should close both the write
descriptor of the pipe in the reading process and the read descriptor
in the writing process.
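
For example, here is a minimal sketch of the fork()ing version with
those extra close() calls added (same variable names as your script,
and only a sketch, but this is the usual pattern):

#!/usr/bin/perl

use strict;
use warnings;

pipe(my $readHandle, my $writeHandle) or die "pipe: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0)
{
    # Child: only writes, so close the read end it inherited.
    close($readHandle);
    print $writeHandle "$_\n" foreach 1 .. 10_000;
    close($writeHandle);
    exit(0);
}

# Parent: only reads, so close its copy of the write end.
# Otherwise the pipe never reports EOF and <$readHandle> blocks forever.
close($writeHandle);

my @lines = <$readHandle>;
close($readHandle);
waitpid($pid, 0);

print scalar(@lines), " lines read\n";

__END__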
 
Jürgen Exner

I'm trying to write text to a piped writeHandle (created with the
pipe() function) so that I can later read the text by extracting it
from a readHandle. However, I discovered that if I write a lot of
text to the pipe, my script just hangs. Here is an example program:
[...]
When I run this program, I notice that it runs perfectly for small
values of $number (like 10). But on high values (like ten thousand),
the program hangs.
[...]
Therefore, deadlock occurs, as the reading won't happen until
the writing is finished.

And that is your problem. Pipes are an IPC mechanism; they are not
meant or designed to store data. You are abusing the pipe buffer for
long-term storage, and of course that is not going to work.

You need a different design, e.g. using a file as the storage medium.
There are other options, too, like the Storable module, which may or
may not be of help.
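
For instance, a rough sketch of the file-based approach (File::Temp
is a core module; the details here are only illustrative):

#!/usr/bin/perl

use strict;
use warnings;
use File::Temp qw(tempfile);

my $number = 10_000;

# Write the lines to a temporary file instead of a pipe.
my ($writeHandle, $filename) = tempfile(UNLINK => 1);
print $writeHandle "$_\n" foreach 1 .. $number;
close($writeHandle);

# Read them back whenever you are ready.
open(my $readHandle, '<', $filename) or die "Cannot read $filename: $!";
my @lines = <$readHandle>;
close($readHandle);

print "Extracted output lines:\n @lines";

__END__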

jue
 
jl_post

Thank you all for your excellent (and rapid) responses. They were
all helpful and clear.

The solution is to add

close($writeHandle);

as the first statement executed in the parent after the child is forked.

That worked! Thanks! Although I'm a little puzzled: why should the
parent close the $writeHandle? I would think it's enough for the
child to close it. After all, the child process (and not the parent)
is the one that's doing the writing.

Note: I have no idea if this will work on Win32. Win32 perl's fork
emulation can sometimes be a little peculiar.

I agree with your statement (as I've encountered Windows' fork()ing
peculiarities myself). However, I tested the code with your
modifications on Windows (on Vista running Strawberry Perl) and it
works just fine.

So here is my final script:

#!/usr/bin/perl

use strict;
use warnings;

print "Enter a number: ";
my $number = <STDIN>;
chomp($number);

my @lines = do
{
    pipe(my $readHandle, my $writeHandle);

    if (fork() == 0)
    {
        # Only the child process gets here:
        print $writeHandle "$_\n" foreach 1 .. $number;
        exit(0);
    }

    close($writeHandle);

    # Extract the output, line-by-line:
    local $/ = "\n";
    <$readHandle>
};

print "Extracted output lines:\n @lines";

__END__

Note that no fileHandles are explicitly closed except for
$writeHandle in the parent process. Perhaps I should close
$writeHandle in the child process and $readHandle in the parent
process, but I figured that since I declared both fileHandles
lexically, they'll automatically be closed at the end of their own
scopes.

(Of course, that won't happen automatically if the fileHandles
aren't lexically declared, so that's something to keep in mind when
not using lexical fileHandles.)

However, ilovelinux's and Jürgen Exner's responses got me thinking
about a non-pipe() way of doing what I wanted, so I looked at "perldoc
-f open" and read up on writing to a scalar. I was able to re-write
my code so that it still wrote output to a fileHandle, and that output
ended up in a scalar (and eventually into an array).

Here is that code:

#!/usr/bin/perl

use strict;
use warnings;

print "Enter a number: ";
my $number = <STDIN>;
chomp($number);

my @lines = do
{
    use 5.008; # for writing-to-scalar support
    my $output;
    open(my $writeHandle, '>', \$output)
        or die "Cannot write to \$output scalar: $!\n";

    print $writeHandle "$_\n" foreach 1 .. $number;

    close($writeHandle);

    # Split after all newlines, except for the
    # one at the very end of the $output string:
    split m/(?<=\n)(?!\z)/, $output;
};

print "Extracted output lines:\n @lines";

__END__

This works great, and without the need to spawn a child process.
However, according to the documentation this capability (writing to a
scalar) is only available in Perl 5.8 or later.

I've checked, and the target machine I'm writing code for has Perl
5.8.8 (even though I personally use Perl 5.10). However, I want to
write my code so that it runs on most machines. Defining "most
machines" is rather tricky, but most machines I've run "perl -v" on
(that might need to run my script) turn out to be running Perl 5.8,
and I haven't found one yet running an older version.

So I have a question for all of you: which of the two above
approaches should I use? The fork()ing approach in the first script
(that runs on Unix and on at least some Windows platforms), or the
open() approach that only runs on Perl 5.8 or later?

I suppose I can combine the two. I can check if the Perl version
is 5.8 or later, and if it is, use the approach that uses open(). If
not, fall back to fork()ing and hope for the best. And unless someone
suggests a simpler way, I can do it like so:

eval { require 5.008 };
if ($@)
{
    # use the fork() approach
}
else
{
    # use the open() approach
}
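
Alternatively, I suppose I could skip the eval and just compare the
$] variable numerically; this is only another spelling of the same
check:

if ($] >= 5.008)
{
    # use the open() approach
}
else
{
    # use the fork() approach
}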

So which should I go for? Should I go with the fork(), open(), or
both? Any thoughts are appreciated.

Thanks again for all the help already given.

-- Jean-Luc
 
Ted Zlatanov

jpc> I'm trying to write text to a piped writeHandle (created with the
jpc> pipe() function) so that I can later read the text by extracting it
jpc> from a readHandle. However, I discovered that if I write a lot of
jpc> text to the pipe, my script just hangs. Here is an example program:

One way to do this is to write a file name to the pipe. Then the
client just looks at that file for the actual data when the file
name arrives over the pipe.

Depending on the size and frequency of the data updates, you could
also use a SQLite database file. It supports access from multiple
clients, so your client can just SELECT the new data if you have some
way of detecting it (maybe pass the row IDs over the pipe), and the
server just writes (INSERT/UPDATE) the data whenever it wants.
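
Roughly, as a sketch of the file-name-over-the-pipe idea (using
File::Temp for the data file; the names and sizes here are just
placeholders):

#!/usr/bin/perl

use strict;
use warnings;
use File::Temp qw(tempfile);

pipe(my $readHandle, my $writeHandle) or die "pipe: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0)
{
    # Server/child: dump the bulky data into a temp file,
    # then send only its (short) name down the pipe.
    close($readHandle);
    my ($fh, $filename) = tempfile(UNLINK => 0);
    print $fh "$_\n" foreach 1 .. 50_000;
    close($fh);
    print $writeHandle "$filename\n";
    close($writeHandle);
    exit(0);
}

# Client/parent: read the file name from the pipe, then slurp the file.
close($writeHandle);
chomp(my $filename = <$readHandle>);
close($readHandle);
waitpid($pid, 0);

open(my $fh, '<', $filename) or die "Cannot read $filename: $!";
my @lines = <$fh>;
close($fh);
unlink $filename;

print scalar(@lines), " lines received via $filename\n";

__END__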

Ted
 
C.DeRykus

...
   However, ilovelinux's and Jürgen Exner's responses got me thinking
about a non-pipe() way of doing what I wanted, so I looked at "perldoc
-f open" and read up on writing to a scalar.  I was able to re-write
my code so that it still wrote output to a fileHandle, and that output
ended up in a scalar (and eventually into an array).

   Here is that code:

#!/usr/bin/perl

use strict;
use warnings;

print "Enter a number: ";
my $number = <STDIN>;
chomp($number);

my @lines = do
{
    use 5.008; # for writing-to-scalar support
    my $output;
    open(my $writeHandle, '>', \$output)
        or die "Cannot write to \$output scalar: $!\n";

    print $writeHandle "$_\n" foreach 1 .. $number;

    close($writeHandle);

    # Split after all newlines, except for the
    # one at the very end of the $output string:
    split m/(?<=\n)(?!\z)/, $output;
};

print "Extracted output lines:\n @lines";

__END__

   This works great, and without the need to spawn a child process.
However, according to the documentation this capability (writing to a
scalar) is only available in Perl 5.8 or later.
...

Hm, if there's no IPC involved, can't you simply populate
an array directly...eliminating filehandles, Perl version
worries, and the 'do' statement completely. Did I miss
something else?


my @lines;
push @lines, "$_\n" for 1 .. $number;
print "Extracted output lines:\n @lines";
 
jl_post

A pipe doesn't report EOF until there are no handles on it opened for
writing. The parent still has its write handle open, and for all the OS
knows it might be wanting to write to the pipe too.

Makes sense.

You can use IO::Scalar under 5.6. (Indeed, you could simply use
IO::Scalar under all versions of perl: it's a little less efficient and
a little less pretty than the PerlIO-based solution in 5.8, but it will
work just fine.)

That's a great suggestion! To use IO::Scalar in my program, I had
to create two new IO::Scalars: one to write to, and one to read from.
I edited my sample program to be:


#!/usr/bin/perl

use strict;
use warnings;

print "Enter a number: ";
my $number = <STDIN>;
chomp($number);

my @lines = do
{
    use IO::Scalar;

    my $output;
    my $writeHandle = new IO::Scalar(\$output);

    # Populate $output:
    print $writeHandle "$_\n" foreach 1 .. $number;
    close($writeHandle);

    # Populate @lines with the lines in $output:
    my $readHandle = new IO::Scalar(\$output);
    <$readHandle>
};

print "Extracted output lines:\n @lines";

__END__


For some reason my program wouldn't work with just one IO::Scalar.
Regardless, it works perfectly now, and without the need to fork a new
process.

Thanks again for your excellent response, Ben. Your advice was
very helpful.

-- Jean-Luc
 
jl_post

Um, there's no need for this. Just use

split /\n/, $output;

That doesn't do the same thing: splitting on /\n/ removes the
newlines from the entries, which I want to keep.

I could have used this instead:

split m/(?<=\n)(?!\z)/, $output;

That way the $output is split after each newline, but only if that
newline is not the last character of $output. (All newlines would be
retained with their lines.)

I'm not sure which is faster or more efficient, but I figured I'd
avoid the look-behind and negative look-ahead, and instead use the
(more familiar) diamond operator on a file handle to split out each
line.

Probably you have forgotten that you need to rewind the filehandle after
writing and before reading.

Ah, you're right again. Now I can avoid the second IO::Scalar and
use a seek() call instead:


#!/usr/bin/perl

use strict;
use warnings;

print "Enter a number: ";
my $number = <STDIN>;
chomp($number);

my @lines = do
{
    use IO::Scalar;
    my $output;
    my $handle = new IO::Scalar(\$output);

    # Print the lines into the $handle:
    print $handle "$_\n" foreach 1 .. $number;

    # Now rewind the handle and put its lines into @lines:
    seek($handle, 0, 0);
    <$handle>
};

print "Extracted output lines:\n @lines";

__END__


Thanks once again, Ben.

-- Jean-Luc
 
jl_post

Hm, if there's no IPC involved, can't you simply populate
an array directly...eliminating filehandles, Perl version
worries, and the 'do' statement completely. Did I miss
something else?

I left out a few details, such as the fact that the routine I'm
calling writes to a filehandle and contains over a thousand lines of
code. (The routine is much larger than the original "foreach" loop I
used as an example.) I could go through all the code and change it so
that it pushes its lines onto an array, but then I'd have to change
all the code that calls that routine as well.

Or I could make a copy of that routine and change only that copy,
but then any changes (major and minor) made to the original routine
would have to be made a second time in the new routine. (I'd rather
not maintain two almost identical large routines, if it can be
avoided.)

Of course, I could just hand the routine a write-filehandle to a
temporary file on disk, but since I'd just have to read the file
contents back in, I'd rather skip that step and avoid the disk I/O
altogether. (Plus, there's no guarantee that the user has permission
to write to a temporary file outside of /tmp.)

Ideally, I would like to be able to write to a filehandle that
doesn't require disk I/O. Creating a pipe() accomplishes that, but as
I mentioned before, it requires fork()ing a child process to avoid
hanging the program.

The other solutions are to use open() to write to a scalar (which
works, but only on Perl 5.8 and later) or to use IO::Scalar (which
should work on Perl 5.6 and later). So that's why I'm currently
sticking with IO::Scalar.

If you know of a better way, let me know. (There may be an obvious
way I'm just not seeing.)

-- Jean-Luc
 
John W. Krahn

That doesn't do the same thing: splitting on /\n/ removes the
newlines from the entries, which I want to keep.

I could have used this instead:

split m/(?<=\n)(?!\z)/, $output;

Or this:

$output =~ /.*\n/g;



John
 
jl_post

Pipes have a limited capacity. See http://linux.die.net/man/7/pipe.
Posix prescribes a minimum capacity of 512, which is exactly
implemented on your windows machine:
$ perl -we 'print "$_\n" for 1..155'| wc -c
512


Thanks for the info. Now that I know this, I have a quick
question:

Say I'm writing to a pipe in a child process, while the parent is
reading from the pipe. If the child writes an extremely long line of
text to the pipe (like 50,000 characters ending in a newline), will
that cause deadlock?

I ask this because the parent generally reads one (newline-terminated)
line at a time when using Perl's diamond operator. If the pipe's
capacity is 512 (or 4096) bytes, then the child will have filled the
pipe well before the parent can read that line (and therefore before
the pipe is drained).

The good news is that I'm testing this by piping a lot of text with
the line:

print $writeHandle $_ x 10000, "\n" foreach 1 .. $number;

and there doesn't seem to be a problem, despite the fact that the
newline comes well past the 512/4096-byte pipe buffer limit you
informed me of.

The only explanation I can think of is that Perl itself has to read
the pipe (and therefore clear its buffer) in order to see if a newline
character is in the "pipeline". Maybe in doing so Perl transfers the
text from the pipe's buffer to a Perl internal buffer, effectively
clearing the pipe and preventing deadlock from happening.

But that's my guess. I'd like to know what you (or anybody else)
have to say about it. (Hopefully I made myself clear enough to
understand.)

Thanks for any advice.

-- Jean-Luc
 
jl_post

Or this:

         $output =~ /.*\n/g;


Hey, that's clever! I like it!

However, there's a tiny difference: Your way will discard the last
line if $output does not end in a newline character, whereas the first
way will keep the line.

(Of course, this won't be an issue if $output is guaranteed to end
in a newline.)
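
For example, just to illustrate the difference (a throwaway snippet,
not taken from my actual script):

my $output = "one\ntwo\nthree";            # note: no trailing newline

my @a = split m/(?<=\n)(?!\z)/, $output;   # ("one\n", "two\n", "three")
my @b = ($output =~ /.*\n/g);              # ("one\n", "two\n") -- "three" is dropped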

-- Jean-Luc
 
jl_post

When you read from a filehandle using the <> operator,
perl actually reads large chunks and buffers the result, then goes
hunting through the buffer for a newline. If it doesn't find one, it
will keep reading chunks (and extending the buffer) until it does, so
reading really long lines needs lots of buffer space in the perl
process, not lots of room in the pipe.


That makes sense. Thank you.

-- Jean-Luc
 
