William James
Michael said: M. Edward (Ed) Borasky said: Michael Linfield wrote:
### this sadly only returned an output of => []
any ideas?
Thanks!
OK ... first of all, define "huge" and what are your restrictions? Let
me assume the worst case just to get started -- more than 256 columns
and more than 65536 rows and you're on Windows.
Seriously, though, if this is a *recurring* use case rather than a
one-shot "somebody gave me this *$&%^# file and wants an answer by 5 PM
tonight!" use case, I'd load it into a database (assuming your database
doesn't have a column count limit smaller than the column count in
your file, that is) and then hook up to it with DBI. But if it's a
one-shot deal and you've got a command line handy (Linux, MacOS, BSD or
Cygwin) just do "grep blah1 huge-file.csv > temp-file.csv". Bonus points
for being able to write that in Ruby and get it debugged before someone
who's been doing command-line for years types that one-liner in.
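For what it's worth, the grep one-liner could be sketched in Ruby roughly like this. It's a minimal sketch, not anyone's actual script: `grep_file` is a made-up helper name and the file names in the usage comment are just the ones from the one-liner. It reads one line at a time with `File.foreach`, so the whole file never has to sit in memory.

```ruby
# Hypothetical helper (name is an assumption): copy every line of src_path
# that matches pattern into dst_path, reading one line at a time.
def grep_file(src_path, dst_path, pattern)
  matched = 0
  File.open(dst_path, "w") do |out|
    File.foreach(src_path) do |line|
      next unless line.match?(pattern)  # skip lines that don't match
      out.write(line)                   # line still carries its newline
      matched += 1
    end
  end
  matched                               # number of lines written
end

# Usage (placeholder file names):
# grep_file("huge-file.csv", "temp-file.csv", /blah1/)
```

Like grep, this works on physical lines, so it knows nothing about CSV quoting.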
lol, alright, let's say the scenario will be in the range of 20k - 70k
lines of data, no more than 20 columns.
and i wanna avoid using command line to do this, because yes in fact
this will be used to process more than one datafile which i hope to
set up in optparse to have a command line arg that directs the prog to
the file. also i wanted to for the meantime not have to throw it on any
database...avoiding DBI for the meanwhile. But an idea flew through my
head a few minutes ago....what if i did this --
res = []
res << File.readlines('filename.csv').grep(/Blah1/) #thanks chris
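One thing to watch with that snippet: `grep` already returns an array, so `res << ...` pushes that whole array in as a single element and `res` ends up nested. A minimal sketch of a flat version, assuming a made-up helper name `matching_lines` and using `File.foreach` so the file is read line by line:

```ruby
# Hypothetical helper (name is an assumption): collect the lines of path
# that match pattern into one flat array.
def matching_lines(path, pattern)
  res = []
  # concat appends each matching line; res << would push the whole
  # array in as one nested element instead
  res.concat(File.foreach(path).grep(pattern))
  res
end

# Usage (placeholder file name):
# matching_lines("filename.csv", /Blah1/)
```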
There's a problem with using File.readlines that I don't think anyone's
mentioned yet. I don't know if it's relevant to your dataset, but CSV
fields are allowed to contain newlines if the field is quoted. For
example, this single CSV row will break your process:
1,2,"foo
Blah1",bar
I think that this can be handled easily by this approach:
to extract a record from the CSV file, keep reading lines
until the number of double quotes in the record is even.
Something like:
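record = ""
begin
  line = gets or break    # stop at end of input
  record << line          # keep the newline: it may belong to a quoted field
end until record.count( '"' ) % 2 == 0
record.chomp!             # drop only the record's trailing newline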
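The quote-counting idea above could be fleshed out roughly as follows. This is only a sketch under assumptions: `csv_records` is a made-up name, it takes any IO-like object, and it keeps the newline inside a quoted field rather than chomping it away. Escaped quotes written as `""` add two to the count, so they don't upset the even/odd test.

```ruby
require "stringio"

# Hypothetical helper (name is an assumption): split io into logical CSV
# records, treating a physical line as unfinished while the record so far
# contains an odd number of double quotes.
def csv_records(io)
  records = []
  record = +""
  io.each_line do |line|
    record << line
    next if record.count('"').odd?    # still inside a quoted field
    records << record.chomp           # strip only the final newline
    record = +""
  end
  records << record.chomp unless record.empty?  # unterminated quote at EOF
  records
end

# The two-line row from earlier in the thread, followed by a plain row.
data = StringIO.new(%Q{1,2,"foo\nBlah1",bar\n3,4,5\n})
hits = csv_records(data).grep(/Blah1/)
```

Here `hits` holds the whole first record, embedded newline included, instead of the stray half-line that `File.readlines` plus `grep` would return.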