Mag Gam
Hello All,
I am very new to Python and I am in the process of loading a very
large compressed CSV file and converting it into another format. I was
wondering if I can do this with a multi-threaded approach.
Here is the pseudo code I was thinking about:
Let T = total number of lines in the file, for example 1,000,000 (1 million lines)
Let B = total number of lines in a buffer, for example 10,000 lines
Create a thread to read the first buffer's worth of lines.
Create another thread to read the next buffer's worth (so we have 2
threads now). But since the file is compressed, the second thread has
to wait until the first one is done, unless someone knows of a clever
technique.
Write the content of thread 1 into a numpy array.
Write the content of thread 2 into a numpy array.
But I don't think we are capable of running these tasks in parallel....
Any ideas? Has anyone ever tackled a problem like this before?
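To make the idea concrete, here is a rough sketch of the approach I have in mind, assuming the file is gzip-compressed and all columns are numeric. Since decompression has to be sequential, a single reader thread streams blocks of lines and hands them to worker threads that do the CSV-to-numpy parsing (the function names here are just placeholders):

```python
import gzip
import queue
import threading

import numpy as np


def parse_chunk(lines):
    # Parse a block of CSV lines into a numpy array
    # (assumes every field is numeric).
    return np.array([row.split(",") for row in lines], dtype=float)


def load_csv_threaded(path, buffer_lines=10000, workers=2):
    chunks = queue.Queue(maxsize=workers)  # back-pressure for the reader
    results = {}  # chunk index -> parsed array

    def worker():
        while True:
            item = chunks.get()
            if item is None:  # sentinel: no more chunks
                break
            idx, lines = item
            results[idx] = parse_chunk(lines)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    # Decompression is inherently sequential, so one reader thread
    # feeds fixed-size blocks of lines to the parser threads.
    with gzip.open(path, "rt") as f:
        buf, idx = [], 0
        for line in f:
            buf.append(line.rstrip("\n"))
            if len(buf) == buffer_lines:
                chunks.put((idx, buf))
                buf, idx = [], idx + 1
        if buf:
            chunks.put((idx, buf))

    for _ in threads:  # one sentinel per worker
        chunks.put(None)
    for t in threads:
        t.join()

    # Reassemble the chunks in their original order.
    return np.concatenate([results[i] for i in sorted(results)])
```

I realize the parsing threads still share the GIL, so this may only help if the per-chunk work releases it; is that roughly the right structure, or is multiprocessing the better tool here?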