Reading from file up to arbitrary byte.

David M. Wilson

Hello!

Is there a simple way of reading from a file object up to a specific
byte value? I would like to do this without reading one character at a
time, or reading in chunks and holding a remainder over.

Failing that, as a lightheartedly proposed extension, what do people
think of:

my_file.readline(line_terminator = "\x7f")
or
my_file.line_terminator = "\x7f"


Does anyone know of a publicly available subclass of file that would
allow such, or similar behaviour?

Thanks,


David.
 
Peter Hansen

David M. Wilson said:
Is there a simple way of reading from a file object up to a specific
byte value? I would like to do this without reading one character at a
time, or reading in chunks and holding a remainder over.

I'd say the simple way to do it would be to read in chunks and hold a
remainder over. It's not complicated: why don't you like that way?
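
For reference, the chunk-and-remainder approach can be sketched roughly as
below. This is only an illustration, not code from the thread: the name
read_records, the 64 KB default chunk size and the "\x7f" terminator are
all assumptions, and the file is assumed to be opened in binary mode.

```python
import io


def read_records(fileobj, terminator=b"\x7f", chunk_size=64 * 1024):
    """Yield records from a binary file object, split on `terminator`.

    Hypothetical helper: reads fixed-size chunks and carries the
    unfinished tail of each chunk over to the next read.
    """
    remainder = b""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        # Prepend the held-over tail so records spanning two chunks are joined.
        pieces = (remainder + chunk).split(terminator)
        # The last piece may be incomplete, so hold it back for the next chunk.
        remainder = pieces.pop()
        for piece in pieces:
            yield piece
    # Anything left after EOF is a final record with no trailing terminator.
    if remainder:
        yield remainder


# A tiny chunk size forces records to straddle chunk boundaries.
sample = io.BytesIO(b"first\x7fsecond\x7fthird")
records = list(read_records(sample, chunk_size=4))
```

Only one chunk plus one partial record is held in memory at a time, which
is the point of doing it this way on a large file.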

Or even simpler: just use read()[:chunksize] and not worry about the
fact that you're reading all the data and throwing some away.
Performance-wise, this probably beats the pants off most alternatives,
if performance is what concerns you, and unless your file is really
big and chunksize is small, who cares about the memory that is wasted
for a few microseconds?
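
Adapted to a terminator byte rather than a fixed length, the
read-everything idea might look like the sketch below. The helper name
read_until and the "\x7f" terminator are assumptions for illustration,
and it only makes sense when the whole file fits comfortably in memory.

```python
import io


def read_until(fileobj, terminator=b"\x7f"):
    """Read the whole file and return (data before the terminator, found).

    Hypothetical helper: everything after the first terminator is
    simply thrown away.
    """
    data = fileobj.read()
    head, separator, _rest = data.partition(terminator)
    # partition() returns an empty separator when the terminator is absent.
    return head, bool(separator)


sample = io.BytesIO(b"header\x7fbody\x7ftrailer")
head, found = read_until(sample)
```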

It might also help respondents if you describe the reason for wanting
to read the first part of the file like that. Maybe there's a more
suitable approach.

-Peter
 
David M. Wilson

Peter said:
I'd say the simple way to do it would be to read in chunks and hold a
remainder over. It's not complicated: why don't you like that way?
Or even simpler: just use read()[:chunksize] and not worry about the
fact that you're reading all the data and throwing some away.
Performance-wise, this probably beats the pants off most alternatives,
if performance is what concerns you, and unless your file is really
big and chunksize is small, who cares about the memory that is wasted
for a few microseconds?
It might also help respondents if you describe the reason for wanting
to read the first part of the file like that. Maybe there's a more
suitable approach.

Hi Peter, thanks for your reply.

I wanted to avoid keeping a remainder as I would have thought the
underlying implementation would have to do this anyway when doing
readline(). The tool I am working on reads the UK Postal Address File
(1.5 GB of data), to be deployed on a small 800 MHz VIA C3 server.

I have created a minimalist module for reading in the tabular data, in a
way that is as close to 'wire speed' as possible. Previously I have used
the Python 2.3 CSV module, and a C implementation of a CSV reader I
found on the web. However, the data set I am dealing with has a very
basic structure, and I found the two CSV modules overly complicated for
the task.

I failed to produce something that is clean, but it does exactly what it
says on the tin and that's all I need. If you care for a nosey:

http://botanicus.net/dw/IDTDR.py.txt


David.
 
