How to do this in Python?

Armin

Josh Holland said:
That looks more like C than pseudocode to me...
Someone's been spending far too much time on C-like languages, if that's
what your idea of simply readable code looks like. Thank heavens you
found Python before it was too late!

I agree, that looks too much like C (except that there are no ; at the end
of the first two lines). And I'm sure you will enjoy your adventure as a
Pythonista (Pythanista?) just as much as I have since migrating from C++.
 
Jim Garrison

I'm an experienced C/Java/Perl developer learning Python.

What's the canonical Python way of implementing this pseudocode?

    String buf
    File f
    while ((buf=f.read(10000)).length() > 0)
    {
        do something....
    }

In other words, I want to read a potentially large file in 10000 byte
chunks (or some other suitably large chunk size). Since the Python
'file' object implements __next__() only in terms of lines (even,
it seems, for files opened in binary mode) I can't see how to use
the Python for statement in this context.

Am I missing something basic, or is this the canonical way:

    with open(filename, "rb") as f:
        buf = f.read(10000)
        while len(buf) > 0:
            # do something....
            buf = f.read(10000)
 
Josh Holland

Jim Garrison said:
What's the canonical Python way of implementing this pseudocode?

    String buf
    File f
    while ((buf=f.read(10000)).length() > 0)
    {
        do something....
    }

That looks more like C than pseudocode to me...
Someone's been spending far too much time on C-like languages, if that's
what your idea of simply readable code looks like. Thank heavens you
found Python before it was too late!
 
Matthew Woodcraft

Jim Garrison said:
    buf = f.read(10000)
    while len(buf) > 0:
        # do something....
        buf = f.read(10000)

I think it's more usual to use a 'break' rather than duplicate the read.

That is, something more like

    while True:
        buf = f.read(10000)
        if len(buf) == 0:
            break
        # do something

-M-
 
Tim Chase

Jim Garrison said:
Am I missing something basic, or is this the canonical way:

    with open(filename, "rb") as f:
        buf = f.read(10000)
        while len(buf) > 0:
            # do something....
            buf = f.read(10000)

That will certainly do. Since read() should simply return a
0-length string when you're sucking air, you can just use the
test "while buf" instead of "while len(buf) > 0".

However, if you use it multiple places, you might consider
writing an iterator/generator you can reuse:

    def chunk_file(fp, chunksize=10000):
        s = fp.read(chunksize)
        while s:
            yield s
            s = fp.read(chunksize)

    with open(filename1, 'rb') as f:
        for portion in chunk_file(f):
            do_something_with(portion)

    with open(filename2, 'rb') as f:
        for portion in chunk_file(f, 1024):
            do_something_with(portion)

-tkc
 
Luis Zarrabeitia

Jim Garrison said:
Am I missing something basic, or is this the canonical way:

    with open(filename, "rb") as f:
        buf = f.read(10000)
        while len(buf) > 0:
            # do something....
            buf = f.read(10000)

Well, a bit more canonical would be:

    ...
    while buf:
        # do something
        ...

instead of comparing len(buf) with 0. But that's a minor detail.

One could use this:

    with open(filename, "rb") as f:
        for buf in iter(lambda: f.read(1000), ''):
            do_something(buf)

but I don't really like a lambda in there. I guess one could use
functools.partial instead, but it still looks ugly to me. Oh, well, I guess I
also want to see the canonical way of doing it.
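
For reference, here is a minimal sketch of the functools.partial variant mentioned above. It assumes Python 3, where a file opened in binary mode returns bytes, so the sentinel is written as b"" rather than ''. The name read_chunks is a hypothetical helper, and io.BytesIO stands in for a real file opened with "rb".

```python
import io
from functools import partial

def read_chunks(f, chunk_size=1000):
    """Return an iterator over successive chunks of a binary file object."""
    # partial(f.read, chunk_size) is a zero-argument callable, which is what
    # the two-argument form of iter() expects as its first argument.
    # b"" is what read() returns at end of file in binary mode.
    return iter(partial(f.read, chunk_size), b"")

# io.BytesIO stands in for a real file here (an assumption for the demo).
chunks = list(read_chunks(io.BytesIO(b"x" * 2500)))
print([len(c) for c in chunks])
```

Whether partial reads better than the lambda is a matter of taste; both produce the same callable behaviour.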
 
Jim Garrison

Tim said:
That will certainly do. Since read() should simply return a 0-length
string when you're sucking air, you can just use the test "while buf"
instead of "while len(buf) > 0".

However, if you use it multiple places, you might consider writing an
iterator/generator you can reuse:

    def chunk_file(fp, chunksize=10000):
        s = fp.read(chunksize)
        while s:
            yield s
            s = fp.read(chunksize)

    with open(filename1, 'rb') as f:
        for portion in chunk_file(f):
            do_something_with(portion)

    with open(filename2, 'rb') as f:
        for portion in chunk_file(f, 1024):
            do_something_with(portion)

-tkc

Ah. That's the Pythonesque way I was looking for. I knew
it would be a generator/iterator but haven't got the Python
mindset down yet and haven't played with writing my own
generator. I'm still trying to think in purely object-
oriented terms where I would override __next__() to
return a chunk of the appropriate size.

Give a man some code and you solve his immediate problem.
Show him a pattern and you've empowered him to solve
his own problems. Thanks!
 
Jim Garrison

andrew said:
Jim said:
I'm an experienced C/Java/Perl developer learning Python.

What's the canonical Python way of implementing this pseudocode?
[snip]


embarrassed by the other reply i have read,

There's always some "trollish" behavior in any comp.lang.*
group. Too many people treat languages as religions instead
of tools. They all have strengths and weaknesses :)

andrew said:
but not doing much binary i/o myself, i suggest:

    with open(...) as f:
        while (True):
            buf = f.read(10000)
            if not buf: break
            ...

but are you sure you don't want:

    with open(...) as f:
        for line in f:
            ...

andrew

For a one-off, your first example would work fine. See the
other reply from Tim Chase for a much more Pythonesque
pattern. I don't want "for line in f:" because binary
files don't necessarily have lines and I'm bulk processing
files potentially 100MB and larger. Reading them one line
at a time would be highly inefficient.

Thanks
 
Terry Reedy

Jim said:
Ah. That's the Pythonesque way I was looking for. I knew
it would be a generator/iterator but haven't got the Python
mindset down yet and haven't played with writing my own
generator. I'm still trying to think in purely object-
oriented terms where I would override __next__() to
return a chunk of the appropriate size.

Give a man some code and you solve his immediate problem.
Show him a pattern and you've empowered him to solve
his own problems. Thanks!

Python's iterator-fed for-loops are its primary motor for calculation.
Anytime one thinks of processing successive items with a while-loop, one
could consider factoring out the production of the successive items with
an iterator. While loops are really only needed for repeated processing
of a single object.

tjr
 
bieffe62

Jim Garrison said:
I don't want "for line in f:" because binary
files don't necessarily have lines and I'm bulk processing
files potentially 100MB and larger. Reading them one line
at a time would be highly inefficient.

Thanks

As far as I know, there are at least two levels of cache between your
application and the actual file: the Python interpreter buffers its reads,
and the operating system does that too. So if you are worried about reading
the file efficiently, I think you can stop worrying. If, instead, you are
processing files which might not have line terminators at all, then reading
in blocks is the right thing to do.

Ciao
 
Hrvoje Niksic

Luis Zarrabeitia said:
One could use this:

    with open(filename, "rb") as f:
        for buf in iter(lambda: f.read(1000), ''):
            do_something(buf)

This is by far the most pythonic solution, it uses the standard 'for'
loop, and clearly marks the sentinel value. lambda may look strange
at first, but this kind of thing is exactly what lambdas are for.
Judging by the other responses, it would seem that few people are
aware of the two-argument 'iter'.

Luis Zarrabeitia said:
but I don't really like a lambda in there. I guess one could use
functools.partial instead, but it still looks ugly to me. Oh, well,
I guess I also want to see the canonical way of doing it.

I believe you've just written it.
 
Tim Chase

def chunk_file(fp, chunksize=10000):
That's not pythonic unless you really do need to use
chunk_file() in a lot of places (IMO, more than 3 or 4). If it's
only going to be used once, then just do the usual thing:

Different strokes for different folks -- my reuse threshold tends
towards "more than once". So even a mere 2 copies of the same
pattern would warrant refactoring out this idiom.

Thanks also to those in the thread that have modeled the
two-argument iter() sentinel syntax -- nifty.

-tkc
 
Ulrich Eckhardt

Grant said:
    with open(filename, "rb") as f:
        while True:
            buf = f.read(10000)
            if not buf: break
            # do something

The pattern

    with foo() as bar:
        # do something with bar

is equivalent to

    bar = foo()
    if bar:
        # do something with bar

except for the calls to __enter__ and __exit__, right? What I was wondering
was whether a similar construct was considered for a while loop or even an
if clause, because then the above could be written like this:

    if open(filename, 'rb') as f:
        while f.read(1000) as buf:
            # do something with 'buf'

Uli
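
For readers on newer interpreters: Python 3.8 added assignment expressions (PEP 572), which allow a loop close in spirit to the "while ... as buf" form wished for above. The sketch below is an illustration only; process is a hypothetical function name, and io.BytesIO stands in for a real binary file.

```python
import io

def process(f, chunk_size=1000):
    """Collect the length of each chunk; a stand-in for real processing."""
    sizes = []
    # The := operator binds buf and tests its truthiness in one step,
    # so the read call is written only once and no break is needed.
    while buf := f.read(chunk_size):
        sizes.append(len(buf))
    return sizes

sizes = process(io.BytesIO(b"x" * 2500))
print(sizes)
```

This requires Python 3.8 or later; on older versions the "while True ... break" form quoted above remains the equivalent.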
 
Jim Garrison

Luis said:
    for buf in iter(lambda: f.read(1000), ''):
        do_something(buf)

This is the most pythonic solution yet.

Thanks to all the responders who took time to ponder this seemingly
trivial question. I learned a lot about the Python mind-set.
 
Mel

Jim said:
andrew said:
Jim said:
I'm an experienced C/Java/Perl developer learning Python.
What's the canonical Python way of implementing this pseudocode?
[ ... ]
but not doing much binary i/o
myself, i suggest:

    with open(...) as f:
        while (True):
            buf = f.read(10000)
            if not buf: break
            ...
[ ... ]
For a one-off, your first example would work fine. See the
other reply from Tim Chase for a much more Pythonesque
pattern. I don't want "for line in f:" because binary
files don't necessarily have lines and I'm bulk processing
files potentially 100MB and larger. Reading them one line
at a time would be highly inefficient.

It would be more work, but subclassing the file class with a next method
that yields the binary record you want would be fairly clean.

Mel.
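
A rough sketch of that idea follows. It uses a small wrapper class rather than a true subclass of the file type, which is an assumption made to keep the example short and independent of the Python version. ChunkReader is a hypothetical name, and io.BytesIO stands in for a real binary file.

```python
import io

class ChunkReader:
    """Wrap a binary file object and iterate over fixed-size chunks."""

    def __init__(self, fileobj, chunk_size=10000):
        self._fileobj = fileobj
        self._chunk_size = chunk_size

    def __iter__(self):
        return self

    def __next__(self):
        chunk = self._fileobj.read(self._chunk_size)
        if not chunk:
            # An empty read means end of file, so stop the iteration.
            raise StopIteration
        return chunk

lengths = [len(c) for c in ChunkReader(io.BytesIO(b"x" * 25000))]
print(lengths)
```

Compared with the generator approach elsewhere in the thread, this is more code for the same behaviour, which matches the "more work" caveat above.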
 
Jim Garrison

Jim said:
This is the most pythonic solution yet.

Thanks to all the responders who took time to ponder this seemingly
trivial question. I learned a lot about the Python mind-set.

I just tried the code as given above and it results in an infinite loop.

Since f.read() returns a byte string when in binary mode, the sentinel
has to be b''. Is there a value that will compare equal to both '' and b''?

It's a shame the iter(o,sentinel) builtin does the
comparison itself, instead of being defined as iter(callable,callable)
where the second argument implements the termination test and returns a
boolean. This would seem to add much more generality... is
it worthy of a PEP?
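
The mismatch can be reproduced without a real file. This is a small sketch assuming Python 3, with io.BytesIO standing in for a file opened in "rb" mode.

```python
import io

# In Python 3, bytes and str never compare equal, so a '' sentinel
# can never match the b'' that read() returns at end of file.
same = (b"" == "")
print(same)

# With the sentinel written as b"", the loop ends at end of file.
f = io.BytesIO(b"x" * 2500)
chunks = list(iter(lambda: f.read(1000), b""))
print([len(c) for c in chunks])
```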
 
Andrii V. Mishkovskyi

Jim Garrison said:
I just tried the code as given above and it results in an infinite loop.

Since f.read() returns a byte string when in binary mode, the sentinel
has to be b''. Is there a value that will compare equal to both '' and b''?

It's a shame the iter(o,sentinel) builtin does the
comparison itself, instead of being defined as iter(callable,callable)
where the second argument implements the termination test and returns a
boolean. This would seem to add much more generality... is
it worthy of a PEP?

Just before you start writing a PEP, take a look at the `takewhile'
function in the `itertools' module. ;)
 
S Arrowsmith

Jim Garrison said:
It's a shame the iter(o,sentinel) builtin does the
comparison itself, instead of being defined as iter(callable,callable)
where the second argument implements the termination test and returns a
boolean. This would seem to add much more generality... is
it worthy of a PEP?

    class sentinel:
        def __eq__(self, other):
            return termination_test()

    for x in iter(callable, sentinel()):
        ...

Writing a sensible sentinel.__init__ is left as an exercise....
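
One possible answer to that exercise, offered as a sketch rather than the intended solution: let __init__ take the termination test as a predicate and have __eq__ apply it to the value being compared. The class name Sentinel and the predicate argument are assumptions for illustration, and io.BytesIO stands in for a real file.

```python
import io

class Sentinel:
    """Compare equal to any value for which the predicate returns True."""

    def __init__(self, predicate):
        self._predicate = predicate

    def __eq__(self, other):
        # iter(callable, sentinel) compares each result with the sentinel,
        # so this method ends up acting as the termination test.
        return self._predicate(other)

f = io.BytesIO(b"x" * 2500)
stop_when_empty = Sentinel(lambda chunk: not chunk)
chunks = list(iter(lambda: f.read(1000), stop_when_empty))
print([len(c) for c in chunks])
```

Because the predicate only checks truthiness, the same sentinel would stop on either '' or b''.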
 
Jim Garrison

Andrii said:
Just before you start writing a PEP, take a look at `takewhile'
function in `itertools' module. ;)

OK, after reading the itertools docs I'm not sure how to use it
in this context. takewhile() requires a sequence, and turning
f.read(bufsize) into an iterable requires iter() (no?) which
wants to do its own termination testing. The following kludge
would subvert iter()'s termination testing but this is starting
to look Perlishly byzantine.

    with open(filename, "rb") as f:
        for buf in itertools.takewhile(
                lambda b: b,
                iter(lambda: f.read(1000), None)):
            do_something(buf)

As opposed to

    with open(filename, "rb") as f:
        for buf in iter(lambda: f.read(1000), lambda b: b):
            do_something(buf)

where iter(callable,callable) is defined to

1) call the first argument
2) pass the returned value to the second argument
3) yield the first result and continue if the return value
from the second call is True, or terminate if False
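
The three steps above can be sketched as an ordinary generator, with no change to the builtin. The name iter_while is hypothetical (it is not part of the standard library), and io.BytesIO stands in for a real binary file.

```python
import io

def iter_while(func, predicate):
    """Call func() repeatedly, yielding each result while predicate(result) is true."""
    while True:
        value = func()              # 1) call the first argument
        if not predicate(value):    # 2) pass the result to the second argument
            return                  # 3) terminate when the test is false
        yield value                 # 3) otherwise yield and continue

f = io.BytesIO(b"x" * 2500)
lengths = [len(buf) for buf in iter_while(lambda: f.read(1000), bool)]
print(lengths)
```

Passing bool as the predicate plays the same role as lambda b: b in the examples above.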
 
