Evading the limit of a pipe's standard input

Ashley Moran

Hi

I'm trying to write a tool that generates a really long SQL script
and passes it to psql, but I've hit a problem because my script is
too long to be sent to the stdin of psql. This is the first time
I've run into the limit so it had me scratching my head for a while.
I've tried a load of tricks, even putting the lines in an array, e.g.:

MAX_SUCCESSFUL_TIMES = 3047

query = ["BEGIN WORK;"]
(MAX_SUCCESSFUL_TIMES + 1).times do
  query << "INSERT INTO people (name) VALUES ('Fred');"
end
query << "COMMIT;"

IO.popen("psql -U test test", "r+") do |pipe|
  query.each { |statement| pipe.puts statement }
end

but it still fails when the total length of commands exceeds the
limit (which by experiment I've found to be 128K on Mac OS X, hence
the specific number of times above).

What's the best solution to this? I would like to stick to
inter-process communication, and avoid temporary files and rewriting it
to use DBD, if possible. Or are they my only options?

Thanks
Ashley
 
Gennady Bystritsky


Make sure the command you spawn in IO.popen is actually reading out
stuff from its stdin in parallel. The limit a pipe has on the system
level is the amount of unread data it can hold. When the data is
read out, it makes room for more data. However, if your reader is stuck
for some reason, the writer will wait too.

If you say that psql accepts only limited amount of data from its stdin,
it is more of its problem, rather than pipe's. To check it, try to feed
the big file to psql via redirection, like in:

psql -U test test < "your big file"

Hope I am not too off from what you meant,
Gennady.
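[Editor's note: the pipe-capacity behaviour described above can be probed directly. A minimal sketch, assuming a POSIX system and Ruby's IO#write_nonblock; the exact capacity is system-dependent (the thread measured about 128K on Mac OS X, Linux commonly reports 64K):]

```ruby
# Probe roughly how much unread data a pipe can hold before a write
# would block, using write_nonblock so we never actually block.
r, w = IO.pipe
bytes = 0
chunk = "x" * 4096
begin
  loop { bytes += w.write_nonblock(chunk) }
rescue IO::WaitWritable
  # The kernel buffer is full; a plain write would now block
  # until the reader drains some data.
end
puts "pipe held about #{bytes} bytes unread"
r.close
w.close
```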
 
snacktime

If you say that psql accepts only limited amount of data from its stdin,
it is more of its problem, rather than pipe's. To check it, try to feed
the big file to psql via redirection, like in:

It's not; psql is commonly used to restore whole databases by feeding
it data from stdin.
 
Ken Bloom


Rewriting to DBI is not very difficult, so unless you have a reason other
than not wanting DBI calls to clutter your code or not wanting to
massively restructure your code, the following should work:

class DBI::DatabaseHandle
  # Takes a whole SQL script in a string
  # and executes it on the database.
  def batch(sql)
    statements = sql.split(";").map(&:strip).reject(&:empty?)
    statements.each { |statement| self.do(statement) }
  end
end

In the sample code you gave above,
query.each {|statement| dbh.do(statement)}
should also work just fine.
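[Editor's note: either way, the split-on-";" step can be checked on its own, without a database. A minimal sketch in plain Ruby:]

```ruby
# Split a small SQL script on ";" and drop empty fragments; each
# resulting statement would then be passed to dbh.do(statement).
script = "BEGIN WORK;\nINSERT INTO people (name) VALUES ('Fred');\nCOMMIT;"
statements = script.split(";").map(&:strip).reject(&:empty?)
p statements
```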

Is it possible with the pipe solution that you need to read back the
output data so that psql doesn't back up its pipe buffer, as Gennady
Bystritsky has suggested?
 
Ashley Moran


Hi Gennady

I was not very clear in my e-mail; I didn't think psql was limited in the size
of file it can read. You were right, it was just the pipe buffer filling up. I
had actually tried reading from the pipe, but not in parallel. This seems
to work:

MAX_SUCCESSFUL_TIMES = 3047

query = ["BEGIN WORK;"]
(MAX_SUCCESSFUL_TIMES + 1).times do
  query << "INSERT INTO people (name) VALUES ('Fred');"
end
query << "COMMIT;"

IO.popen("psql -U test test", "r+") do |pipe|
  Thread.new { loop { pipe.read } }  # fixes it
  query.each { |statement| pipe.puts statement }
end

I assume pipe.read blocks, so the thread doesn't turn the CPU into a heating
element.
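[Editor's note: the reader-thread pattern can be exercised end-to-end without psql. A self-contained sketch, using "cat" as a stand-in for psql (an assumption; it needs a Unix-like system) since cat also echoes output that must be drained; closing the write side signals EOF, and joining the reader guarantees all output has been read before the block exits:]

```ruby
# Write far more than one pipe buffer's worth of lines while a
# background thread drains the child's output in parallel.
lines = []
IO.popen("cat", "r+") do |pipe|
  reader = Thread.new { pipe.each_line { |l| lines << l.chomp } }
  50_000.times { |i| pipe.puts i }
  pipe.close_write   # send EOF so cat flushes and exits
  reader.join        # wait until all output has been drained
end
puts lines.size
```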

Cheers
Ashley
 
Ashley Moran

IO.popen("psql -U test test", "r+") do |pipe|
  Thread.new { loop { pipe.read } }  # fixes it
  query.each { |statement| pipe.puts statement }
end


Actually I lied... I need a "sleep 2" at the end of the IO.popen block for
this to work. Is there a less magical way to sync it? (I took a stab at
Process.wait but it doesn't work.)

Ashley

--
"If you do it the stupid way, you will have to do it again"
- Gregory Chudnovsky
 
Ken Bloom

Actually I lied... I need a "sleep 2" at the end of the IO.popen block for
this to work. Is there a less magical way to sync it? (I took a stab at
Process.wait but it doesn't work.)

Ashley

Since you're throwing away the output, instead of using a thread, why not
try one of the following:
* popen with the mode "w"
  IO.popen("psql -U test test", "w")
* redirect the output from psql to /dev/null inside the popen call
  IO.popen("psql -U test test > /dev/null", "r+")
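[Editor's note: the second option can be sketched with a harmless stand-in for psql ("cat" here, an assumption; it needs a Unix shell). With the child's output discarded by the shell, the writer can never be blocked by an unread output buffer, so no reader thread or sleep is needed:]

```ruby
# "cat > /dev/null" stands in for psql: the shell discards the
# child's output, so only the stdin pipe matters and it is drained
# continuously by cat.
IO.popen("cat > /dev/null", "w") do |pipe|
  50_000.times { pipe.puts "INSERT INTO people (name) VALUES ('Fred');" }
end
puts "wrote everything without blocking"
```

The block form of IO.popen closes the pipe and waits for the child on exit, so the child's exit status is available in $? afterwards.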

--Ken
 
Ashley Moran


Thanks Ken, the last one was what I was looking for. I forgot to say
that 'IO.popen("psql -U test test","w")' was actually one of the first things
I tried, but it dumps the (several thousand lines of) psql results back into
the shell, which renders the output of the script useless.

Ashley
 
