Helge Elvik
Hi,
I'm currently implementing a database application in Ruby, and I use
PostgreSQL for the data store. At first I tried using DBI to make the
data store as platform/database independent as possible, but when I came
back to work this morning I found that my computer was starved for
virtual memory, and that Ruby was using about 700 MB of RAM and rising.
What I found out is that DBI slurps the whole result set into memory
before it hands anything back from the library. I poked around in the
code, but found no way to instead return results one at a time as the
server sends them back.
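For reference, this is roughly the pattern I was using (a sketch; the
connection string, credentials, and table are made up):

require 'dbi'

dbh = DBI.connect("DBI:Pg:mydb:localhost", "user", "pass")
sth = dbh.execute("select * from huge_table")
# Even fetching row by row like this, the Pg driver seems to have
# already read the entire result set into memory by this point.
sth.fetch do |row|
  # process row here
end
sth.finish
dbh.disconnect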
So I poked around in the documentation for the postgres library, but
sadly with no luck there either. No method there accepts a block so that
you can just receive the results serially. I also had a quick poke at
postgres-pr, but the situation seems to be the same there.
Am I missing something, or is there really no way of receiving the
results serially without the libraries slurping everything up in RAM
first? The table I'm having trouble with currently has about 2 million
rows, so reading everything into RAM is absolutely impossible. I'm
looking for something like this:
db.execute("select * from huge_table").each_row do |row|
  # process row here
end
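The only workaround I can think of so far is to do the batching myself
with a server-side cursor. A rough sketch of what I mean, assuming the
postgres driver's PGconn/PGresult API (method names like num_tuples and
result are from memory, so treat them as assumptions):

require 'postgres'

conn = PGconn.connect("localhost", 5432, "", "", "mydb", "user", "pass")

conn.exec("BEGIN")  # the cursor only lives inside a transaction
conn.exec("DECLARE huge_cur CURSOR FOR SELECT * FROM huge_table")

loop do
  res = conn.exec("FETCH 1000 FROM huge_cur")  # pull one batch of rows
  break if res.num_tuples == 0                 # server has no rows left
  res.result.each do |row|                     # row is an array of values
    # process row here
  end
end

conn.exec("CLOSE huge_cur")
conn.exec("COMMIT")
conn.close

That at least bounds memory to one batch at a time, but it feels like
something the driver should be able to do for me.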
PS: Reading these 2 million rows is not a normal day-to-day operation,
but I do this when I normalize some data from a CSV file. Basically I
restructure the data into several tables, and for now I've mostly done
this with "CREATE TABLE t AS SELECT". I've found that I need to compute
some of the columns with a Ruby function though, so I basically need to
loop through them all.
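Concretely, the write side of that loop would look something like this
(a sketch using DBI placeholders; the table, the columns, compute_score,
and the each_row_of_huge_table iterator are all made-up names standing
in for the streaming read above):

require 'dbi'

dbh = DBI.connect("DBI:Pg:mydb:localhost", "user", "pass")

# Hypothetical Ruby computation that "CREATE TABLE t AS SELECT"
# can't express.
def compute_score(row)
  row[1].to_f / (row[2].to_i + 1)
end

insert = dbh.prepare("INSERT INTO normalized (id, score) VALUES (?, ?)")
each_row_of_huge_table do |row|   # stand-in for the streaming read
  insert.execute(row[0], compute_score(row))
end
insert.finish
dbh.disconnect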
Regards,
Helge Elvik