Evading the limit of a pipe's standard input

Ashley Moran

Hi

I'm trying to write a tool that generates a really long SQL script
and passes it to psql, but I've hit a problem because my script is
too long to be sent to the stdin of psql. This is the first time
I've run into the limit so it had me scratching my head for a while.
I've tried a load of tricks, even putting the lines in an array, e.g.:

MAX_SUCCESSFUL_TIMES = 3047

query = ["BEGIN WORK;"]
(MAX_SUCCESSFUL_TIMES + 1).times do
  query << "INSERT INTO people (name) VALUES ('Fred');"
end
query << "COMMIT;"

IO.popen("psql -U test test", "r+") do |pipe|
  query.each { |statement| pipe.puts statement }
end

but it still fails when the total length of commands exceeds the
limit (which by experiment I've found to be 128K on Mac OS X, hence
the specific number of times above).

What's the best solution to this? I would like to stick to
inter-process communication, and avoid temporary files and rewriting it
to use DBD, if possible. Or are they my only options?

Thanks
Ashley
 
Gennady Bystritsky


Make sure the command you spawn in IO.popen is actually reading out
stuff from its stdin in parallel. The limit a pipe has on the system
level is the amount of unread data it can hold. When the data is
read out, it makes room for more data. However, if your reader is stuck
for some reason, the writer will wait too.

If you say that psql accepts only limited amount of data from its stdin,
it is more of its problem, rather than pipe's. To check it, try to feed
the big file to psql via redirection, like in:

psql -U test test < "your big file"

Hope I am not too off from what you meant,
Gennady.
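[Editor's note: the pipe-capacity behaviour described above can be probed directly. A minimal sketch, assuming a POSIX system and Ruby's IO#write_nonblock; the exact capacity is system-dependent (the thread measured about 128K on Mac OS X, Linux commonly reports 64K):]

```ruby
# Probe roughly how much unread data a pipe can hold before a write
# would block, using write_nonblock so we never actually block.
r, w = IO.pipe
bytes = 0
chunk = "x" * 4096
begin
  loop { bytes += w.write_nonblock(chunk) }
rescue IO::WaitWritable
  # The kernel buffer is full; a plain write would now block
  # until the reader drains some data.
end
puts "pipe held about #{bytes} bytes unread"
r.close
w.close
```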
 
snacktime

If you say that psql accepts only limited amount of data from its stdin,
it is more of its problem, rather than pipe's. To check it, try to feed
the big file to psql via redirection, like in:

It's not; psql is commonly used to restore whole databases by feeding
it data from stdin.
 
Ken Bloom


Rewriting to DBI is not very difficult, so unless you have a reason other
than not wanting DBI calls to clutter your code or not wanting to
massively restructure your code, the following should work:

class DBI::DatabaseHandle
  # Takes a whole SQL script in a string
  # and executes it on the database.
  def batch(sql)
    statements = sql.split(";").map(&:strip).reject(&:empty?)
    statements.each { |statement| self.do(statement) }
  end
end

In the sample code you gave above,
query.each {|statement| dbh.do(statement)}
should also work just fine.
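[Editor's note: either way, the split-on-";" step can be checked on its own, without a database. A minimal sketch in plain Ruby:]

```ruby
# Split a small SQL script on ";" and drop empty fragments; each
# resulting statement would then be passed to dbh.do(statement).
script = "BEGIN WORK;\nINSERT INTO people (name) VALUES ('Fred');\nCOMMIT;"
statements = script.split(";").map(&:strip).reject(&:empty?)
p statements
```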

Is it possible with the pipe solution that you need to read back the
output data so that psql doesn't back up its pipe buffer, as Gennady
Bystritsky has suggested?
 
Ashley Moran


Hi Gennady

I was not very clear in my e-mail; I didn't think psql was limited in the size
of file it can read. You were right, it was just the pipe buffer filling up. I
had actually tried reading from the pipe, but not in parallel. This seems
to work:

MAX_SUCCESSFUL_TIMES = 3047

query = ["BEGIN WORK;"]
(MAX_SUCCESSFUL_TIMES + 1).times do
  query << "INSERT INTO people (name) VALUES ('Fred');"
end
query << "COMMIT;"

IO.popen("psql -U test test", "r+") do |pipe|
  Thread.new { loop { pipe.read } }  # fixes it
  query.each { |statement| pipe.puts statement }
end

I assume pipe.read blocks, so the thread doesn't turn the CPU into a heating
element.
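[Editor's note: the reader-thread pattern can be exercised end-to-end without psql. A self-contained sketch, using "cat" as a stand-in for psql (an assumption; it needs a Unix-like system) since cat also echoes output that must be drained; closing the write side signals EOF, and joining the reader guarantees all output has been read before the block exits:]

```ruby
# Write far more than one pipe buffer's worth of lines while a
# background thread drains the child's output in parallel.
lines = []
IO.popen("cat", "r+") do |pipe|
  reader = Thread.new { pipe.each_line { |l| lines << l.chomp } }
  50_000.times { |i| pipe.puts i }
  pipe.close_write   # send EOF so cat flushes and exits
  reader.join        # wait until all output has been drained
end
puts lines.size
```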

Cheers
Ashley
 
Ashley Moran

IO.popen("psql -U test test", "r+") do |pipe|
  Thread.new { loop { pipe.read } }  # fixes it
  query.each { |statement| pipe.puts statement }
end


Actually I lied... I need a "sleep 2" at the end of the IO.popen block for
this to work. Is there a less magical way to sync it? (I took a stab at
Process.wait but it doesn't work.)

Ashley

--
"If you do it the stupid way, you will have to do it again"
- Gregory Chudnovsky
 
Ken Bloom

Actually I lied... I need a "sleep 2" at the end of the IO.popen block for
this to work. Is there a less magical way to sync it? (I took a stab at
Process.wait but it doesn't work.)

Ashley

Since you're throwing away the output, instead of using a thread, why not
try one of the following:
* popen with the mode "w"
  IO.popen("psql -U test test", "w")
* redirect the output from psql to /dev/null inside the popen call
  IO.popen("psql -U test test > /dev/null", "r+")
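[Editor's note: the second option can be sketched with a harmless stand-in for psql ("cat" here, an assumption; it needs a Unix shell). With the child's output discarded by the shell, the writer can never be blocked by an unread output buffer, so no reader thread or sleep is needed:]

```ruby
# "cat > /dev/null" stands in for psql: the shell discards the
# child's output, so only the stdin pipe matters and it is drained
# continuously by cat.
IO.popen("cat > /dev/null", "w") do |pipe|
  50_000.times { pipe.puts "INSERT INTO people (name) VALUES ('Fred');" }
end
puts "wrote everything without blocking"
```

The block form of IO.popen closes the pipe and waits for the child on exit, so the child's exit status is available in $? afterwards.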

--Ken
 
Ashley Moran


Thanks Ken, the last one was what I was looking for. I forgot to say
that 'IO.popen("psql -U test test","w")' was actually one of the first things
I tried, but it dumps the (several thousand lines of) psql results back into
the shell, which renders the output of the script useless.

Ashley
 
