Rich_Elswick
Hi all,
I am parsing large data sets (62 gigs in one file). I can parse them out
into smaller files fine with Perl, which is what we have to do anyway
(i.e. hex data becomes ASCII .csv files of the different decoded
variables). I am working with CAN data, for those who know about
Controller Area Networks, collected by Vector CANalyzer.
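For reference, the parse step itself is just a line-by-line pass over the raw log, so it never needs much memory. A minimal sketch is below; decode_line() is a stand-in for our actual hex-to-signal decoding, not the real code:

#!/usr/bin/perl
use strict;
use warnings;

# Stream the raw CANalyzer log line by line so memory stays flat
# no matter how big the input file gets.
my ($in, $out) = @ARGV;
open my $raw, '<', $in  or die "Can't read $in: $!";
open my $csv, '>', $out or die "Can't write $out: $!";
print {$csv} "time,signal,value\n";

while (my $line = <$raw>) {
    chomp $line;
    # decode_line() is hypothetical: it would return one or more
    # [time, signal, value] records decoded from the raw hex frame.
    my @records = decode_line($line);
    print {$csv} join(',', @$_), "\n" for @records;
}
close $raw;
close $csv;

sub decode_line {
    my ($line) = @_;
    # ... real hex-to-engineering-units conversion goes here ...
    return ();
}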
After they are parsed out, the largest data file (one file becomes ~100
smaller files) is about 2 gigs right now, but who knows how large it
could become in the future. I then use GDGraph to run through the data
files and rapidly generate some .png files for review (I have issues
with this as well and will post those questions some other time). I run
this on the whole batch of ~100 files, going through each file one at a
time, using a batch program to call a separate Perl program for each
GDGraph run, because GDGraph loads the entire data set into memory
before graphing it. This limits the method to data files smaller than
~20 megs, based on system memory. I suppose I could add memory to the
machine, but that 1. costs money, 2. means requesting it from IT (not
easy), and 3. still doesn't work with a 2 gig file. A sketch of what
the per-file graphing step looks like follows.
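To make the memory problem concrete, here is roughly what each per-file run does (a minimal sketch of typical GD::Graph::lines usage; the two-column CSV layout is an assumption, not our real format). Both @x and @y must be fully built before plot() is called, which is why memory scales with file size:

#!/usr/bin/perl
use strict;
use warnings;
use GD::Graph::lines;

# Read the whole CSV into arrays, then hand them to GD::Graph.
# Every sample lives in memory before the graph is drawn.
my ($csv_file, $png_file) = @ARGV;
my (@x, @y);
open my $fh, '<', $csv_file or die "Can't read $csv_file: $!";
<$fh>;                                        # skip header line
while (<$fh>) {
    chomp;
    my ($time, $value) = (split /,/)[0, 1];   # assumed column layout
    push @x, $time;
    push @y, $value;
}
close $fh;

my $graph = GD::Graph::lines->new(1024, 768);
my $gd    = $graph->plot([ \@x, \@y ]) or die $graph->error;

open my $png, '>', $png_file or die "Can't write $png_file: $!";
binmode $png;
print {$png} $gd->png;
close $png;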
I was wondering two things:
1. Is there a better way of graphing this data, which uses less memory?
2. What is everyone else out there using?
Please, no comments about just sampling the data (one line out of every
five or so) and graphing the sampled data; we have already considered
this, and it may end up being how we resolve the issue.
Thanks,
Rich Elswick
Test Engineer
Cobasys LLC
http://www.cobasys.com