psaffrey
I'm using the CSV library to process a large amount of data - 28
files, each of 130MB. Just reading in the data from one file and
filing it into very simple data structures (numpy arrays and a
cStringIO) takes around 10 seconds. If I just slurp one file into a
string, it only takes about a second, so I/O is not the bottleneck. Is
it really taking 9 seconds just to split the lines and set the
variables?
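
Roughly what I'm doing now looks like this - the column layout is
simplified to a hypothetical one text column plus one numeric column,
and I'm using io.StringIO here where my real code uses cStringIO:

    import csv
    import numpy as np
    from io import StringIO  # cStringIO.StringIO in Python 2

    def load_file(path, n_rows):
        # Pre-allocate a numpy array for the numeric column and a
        # string buffer for the text column (hypothetical layout).
        values = np.empty(n_rows, dtype=float)
        names = StringIO()
        with open(path, newline='') as handle:
            for i, row in enumerate(csv.reader(handle)):
                names.write(row[0])        # text column
                values[i] = float(row[1])  # numeric column
        return names, values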
Is there some way I can improve the CSV performance? Is there a way I
can slurp the file into memory and read it like a file from there?
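
By "read it like a file", I mean something along these lines - a rough
sketch, with the filename and the per-row handling left hypothetical:

    import csv
    from io import StringIO  # cStringIO.StringIO in Python 2

    def rows_from_memory(path):
        # Slurp the whole file into a string first (this part is fast),
        # then let csv parse it out of the in-memory buffer.
        with open(path, newline='') as handle:
            data = handle.read()
        return csv.reader(StringIO(data))

    for row in rows_from_memory("data_01.csv"):  # hypothetical filename
        pass  # fill numpy arrays / string buffer here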
Peter