Question about struct.unpack

OhKyu Yoon · Apr 30, 2007

Hi!
I have a really long binary file that I want to read.
The way I am doing it now is:

for i in xrange(N): # N is about 10,000,000
    time = struct.unpack('=HHHH', infile.read(8))
    # do something
    tdc = struct.unpack('=LiLiLiLi',self.lmf.read(32))
    # do something

Each loop takes about 0.2 ms in my computer, which means the whole for loop
takes 2000 seconds.
I would like it to run faster.
Do you have any suggestions?
Thank you very much.

OhKyu

Steven D'Aprano · Apr 30, 2007

> Hi!
> I have a really long binary file that I want to read.
> The way I am doing it now is:
>
> for i in xrange(N): # N is about 10,000,000
>     time = struct.unpack('=HHHH', infile.read(8))
>     # do something
>     tdc = struct.unpack('=LiLiLiLi',self.lmf.read(32))

I assume that is supposed to be infile.read()

>     # do something
>
> Each loop takes about 0.2 ms in my computer, which means the whole for loop
> takes 2000 seconds.

You're reading 400 million bytes, or 400MB, in about half an hour. Whether
that's fast or slow depends on what the "do something" lines are doing.

> I would like it to run faster.
> Do you have any suggestions?

Disk I/O is slow, so don't read from files in tiny little chunks. Read a
bunch of records into memory, then process them.

# UNTESTED!
rsize = 8 + 32 # record size
for i in xrange(N//1000):
    buffer = infile.read(rsize*1000) # read 1000 records at once
    for j in xrange(1000): # process each record
        offset = j*rsize
        time = struct.unpack('=HHHH', buffer[offset:offset+8])
        # do something
        tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
        # do something

(Now I'm just waiting for somebody to tell me that file.read() already
buffers reads...)
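
For readers on Python 3 (an assumption; the thread itself uses Python 2), the same chunked approach might be sketched as below. It treats each 40-byte record as the two formats concatenated, precompiles that layout once with struct.Struct, and unpacks a whole chunk with iter_unpack. The names RECORD and read_records, the chunk size, and the choice to ignore a truncated trailing record are all illustrative, not part of the original code.

```python
import io
import struct

# One 40-byte record: four unsigned shorts, then four (unsigned long, int) pairs.
RECORD = struct.Struct('=HHHHLiLiLiLi')

def read_records(infile, records_per_chunk=1000):
    """Yield (time, tdc) tuples, reading many records per infile.read() call."""
    chunk_size = RECORD.size * records_per_chunk
    while True:
        chunk = infile.read(chunk_size)
        # Assumption: a truncated record at the end of the file is ignored.
        usable = len(chunk) - len(chunk) % RECORD.size
        if usable == 0:
            break
        for fields in RECORD.iter_unpack(chunk[:usable]):
            yield fields[:4], fields[4:]
        if len(chunk) < chunk_size:
            break  # short read means end of file

# Usage with an in-memory stand-in for the real file (2500 identical records).
data = RECORD.pack(1, 2, 3, 4, 10, -1, 20, -2, 30, -3, 40, -4) * 2500
records = list(read_records(io.BytesIO(data)))
```

Writing it as a generator keeps the "do something" steps outside the reading loop, so the per-record work can be changed without touching the I/O code.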

eC · May 1, 2007

I think the file.read() already buffers reads...
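
A quick way to check this in Python 3 (an assumption; the thread uses Python 2, whose file objects were buffered differently) is to look at what open() returns. The sketch below uses a throwaway temporary file; the variable names are only for illustration.

```python
import io
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 4096)
os.close(fd)

with open(path, "rb") as buffered:
    buffered_type = type(buffered)   # io.BufferedReader: reads go through a buffer
    first = buffered.read(8)         # small read served from that buffer

with open(path, "rb", buffering=0) as raw:
    raw_type = type(raw)             # io.FileIO: unbuffered, read() goes to the OS

os.remove(path)
print(buffered_type.__name__, raw_type.__name__, io.DEFAULT_BUFFER_SIZE)
```

If the file object is buffered like this, most 8-byte and 32-byte reads never reach the disk, so whatever cost remains in a loop of small reads is likely per-call overhead in Python rather than disk access.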

Gabriel Genellina · May 1, 2007

Now we need someone to actually measure it, to confirm the expected
behavior... Done.

--- begin code ---
import struct,timeit,os

fn = r"c:\temp\delete.me"
fsize = 1000000
if not os.path.isfile(fn):
    f = open(fn, "wb")
    f.write("\0" * fsize)
    f.close()
    os.system("sync")

def smallreads(fn):
    rsize = 40
    N = fsize // rsize
    f = open(fn, "rb")
    for i in xrange(N): # N is 25,000 here
        time = struct.unpack('=HHHH', f.read(8))
        tdc = struct.unpack('=LiLiLiLi', f.read(32))
    f.close()

def bigreads(fn):
    rsize = 40
    N = fsize // rsize
    f = open(fn, "rb")
    for i in xrange(N//1000):
        buffer = f.read(rsize*1000) # read 1000 records at once
        for j in xrange(1000): # process each record
            offset = j*rsize
            time = struct.unpack('=HHHH', buffer[offset:offset+8])
            tdc = struct.unpack('=LiLiLiLi', buffer[offset+8:offset+rsize])
    f.close()

print "smallreads", timeit.Timer("smallreads(fn)","from __main__ import fn,smallreads,fsize").repeat(3,1)
print "bigreads", timeit.Timer("bigreads(fn)", "from __main__ import fn,bigreads,fsize").repeat(3,1)
--- end code ---

Output:
smallreads [4.2534193777646663, 4.126013885559789, 4.2389176672125458]
bigreads [1.2897319939456011, 1.3076018578892405, 1.2703250635695138]

So in this sample case, reading in big chunks is about 3 times faster than
reading many tiny pieces.
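
To repeat the comparison on a current interpreter, a rough Python 3 port might look like the following. This is a sketch under assumptions, not the code that produced the numbers above: it writes its test file to a temporary location instead of c:\temp, returns record counts so the two functions can be checked against each other, and uses struct.unpack_from in the chunked version to avoid slicing. No timings are claimed here; they will depend on the machine and Python version.

```python
import os
import struct
import tempfile
import timeit

RSIZE = 40          # 8-byte header + 32-byte payload, as in the thread
FSIZE = 1000000     # same file size as the measurement above

def make_file():
    """Create a zero-filled temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * FSIZE)
    return path

def smallreads(path):
    count = 0
    with open(path, "rb") as f:
        for _ in range(FSIZE // RSIZE):
            time = struct.unpack('=HHHH', f.read(8))
            tdc = struct.unpack('=LiLiLiLi', f.read(32))
            count += 1
    return count

def bigreads(path, per_chunk=1000):
    count = 0
    with open(path, "rb") as f:
        for _ in range(FSIZE // RSIZE // per_chunk):
            buf = f.read(RSIZE * per_chunk)   # read per_chunk records at once
            for offset in range(0, len(buf), RSIZE):
                # unpack_from reads at an offset, so no slice copy is needed
                time = struct.unpack_from('=HHHH', buf, offset)
                tdc = struct.unpack_from('=LiLiLiLi', buf, offset + 8)
                count += 1
    return count

path = make_file()
try:
    counts = (smallreads(path), bigreads(path))
    timings = {
        "smallreads": timeit.repeat(lambda: smallreads(path), repeat=3, number=1),
        "bigreads": timeit.repeat(lambda: bigreads(path), repeat=3, number=1),
    }
finally:
    os.remove(path)
print(counts, timings)
```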

OhKyu Yoon · May 1, 2007

Wow, thank you all!
