I have some large data in pieces, e.g.
asia.gz.tar 300M
or
roads1.gz.tar 100M
roads2.gz.tar 100M
roads3.gz.tar 100M
roads4.gz.tar 100M
I wonder whether I should concatenate them all into a single, very large
file and then parse it into a large table (I don't know whether Perl can
handle that...).
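For concreteness, the streaming approach I am considering looks roughly like
this (only a sketch; I am assuming the archives are gzip-compressed tarballs
despite the .gz.tar names, so that tar's -O flag can write every member to
stdout):

    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Stream each archive line by line instead of concatenating first,
    # so Perl only ever holds one line in memory at a time.
    my @archives = qw(roads1.gz.tar roads2.gz.tar roads3.gz.tar roads4.gz.tar);

    for my $archive (@archives) {
        open my $fh, '-|', 'tar', '-xzOf', $archive
            or die "cannot read $archive: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my ($key, $value) = split /\s+/, $line, 2;
            # ... accumulate one record per KEY/VALUE block here ...
        }
        close $fh or warn "tar failed on $archive: $!";
    }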
The final table should look like this:
ID1 ID2 INFO
X1 Y9 San Diego; California; West Coast; America; North America; Earth
X2.3 H9 Beijing; China; Asia
.....
Each row may come from a big file of >100M (as mentioned above):
CITY Beijing
NOTE Capital
RACE Chinese
....
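The per-file parsing I have in mind would collapse those KEY VALUE lines into
one semicolon-joined INFO string, something like this (again only a sketch;
which keys to keep and the join order are my guesses):

    # Collapse one stream of "KEY VALUE" lines into a single INFO string.
    sub record_to_info {
        my ($fh) = @_;               # filehandle for one record's lines
        my @parts;
        while (my $line = <$fh>) {
            chomp $line;
            my ($key, $value) = split /\s+/, $line, 2;
            push @parts, $value if defined $value;
        }
        return join '; ', @parts;    # e.g. "Beijing; Capital; Chinese"
    }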
And then I have another, much smaller table which contains all the IDs
(either ID1 or ID2, maybe 100,000 records, <20M), and I just need to
annotate this 20M file with the INFO. Hashing seems not to be a solution
on my 32G, 8-core machine...
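To be concrete, the annotation step I picture would hash the small ID table
rather than the big data; whether that really fits my memory is exactly my
question. A sketch (the file names are placeholders, and I am assuming
tab-separated columns):

    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Load the ~100,000 IDs from the small table into a hash.
    my %wanted;
    open my $ids, '<', 'ids.txt' or die "ids.txt: $!";
    while (my $id = <$ids>) { chomp $id; $wanted{$id} = ''; }
    close $ids;

    # Stream the big table once; keep INFO only for wanted IDs.
    open my $big, '<', 'big_table.txt' or die "big_table.txt: $!";
    while (my $row = <$big>) {
        chomp $row;
        my ($id1, $id2, $info) = split /\t/, $row, 3;
        $wanted{$id1} = $info if exists $wanted{$id1};
        $wanted{$id2} = $info if exists $wanted{$id2};
    }
    close $big;

    # Write out the annotated 20M table.
    print "$_\t$wanted{$_}\n" for sort keys %wanted;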
Any advice? Or should I resort to some other language?