looping over a big file

martian · Jul 3, 2005

Hi,

I've a couple of questions regarding the processing of a big text file
(16MB).

1) how does python handle:

for line in big_file:

is big_file all read into memory or one line is read at a time or a buffer
is used or ...?

2) is it possible to advance lines within the loop? The following doesn't
work:

for line in big_file:
    line_after = big_file.readline()

The function readline (file pointer) is "out of sync" with the loop (and
this suggests big_file is not read one line at a time in the loop).

Thanks,
Fernando Martins

Roy Smith · Jul 3, 2005

martian said:
1) how does python handle:

is big_file all read into memory or one line is read at a time or a buffer
is used or ...?

The "right" way to do this is:

for line in file ("filename"):
    whatever

The file object returned by file() acts as an iterator. Each time through
the loop, another line is read and returned (I'm sure there is some
block-level buffering going on at a low level).
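
A minimal sketch of the loop described above, written for current Python 3, where open() replaces the file() builtin used in this thread. The helper name count_lines and the throwaway sample file are assumptions made only for illustration.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating over the file object one line at a time."""
    count = 0
    with open(path) as big_file:
        for line in big_file:  # the file object yields one line per iteration
            count += 1
    return count


# Build a small throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    sample_path = tmp.name

line_total = count_lines(sample_path)
os.remove(sample_path)
print(line_total)
```

Only the current line is held by the loop variable, so the same code should behave the same on a 16MB file as on this three-line one.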

2) is it possible to advance lines within the loop? The following doesn't
work:

line_after = big_file.readline()

You probably want something like:

for line in file ("filename"):
    if skipThisLine:
        continue
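
If the goal is to pull the following line inside the loop rather than skip one, one possible approach is to call next() on the same iterator the loop is using. This is a sketch for current Python 3; the function name pair_headers_with_values and the "#" header convention are hypothetical, chosen only to give the loop something to do.

```python
import io


def pair_headers_with_values(lines):
    """Pair each line starting with '#' with the line that follows it."""
    pairs = []
    line_iter = iter(lines)  # one iterator shared by the loop and next()
    for line in line_iter:
        if line.startswith("#"):
            # Pull the following line from the same iterator; the loop
            # resumes after it. The default "" covers a header on the
            # very last line.
            line_after = next(line_iter, "")
            pairs.append((line.strip(), line_after.strip()))
    return pairs


# io.StringIO stands in for a real file here.
sample = io.StringIO("# name\nalice\nnoise\n# city\nlisbon\n")
result = pair_headers_with_values(sample)
print(result)
```

Because both the for statement and next() draw from one iterator, they cannot drift apart the way the loop and readline() did in the original question.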

Mike Meyer · Jul 3, 2005

Roy Smith said:
The "right" way to do this is:

for line in file ("filename"):
    whatever

The file object returned by file() acts as an iterator. Each time through
the loop, another line is read and returned (I'm sure there is some
block-level buffering going on at a low level).

I disagree. That's the *convenient* way to do it, and perfectly
acceptable in many situations. But not all Python interpreters will
close the file when for loop ends. Likewise, if you get an exception
during the processing, the file may not get closed properly. Those
things may matter to you, in which case the "right" way is:

data = open("filename")
try:
    for line in data:
        whatever
finally:
    data.close()
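
Later Python versions added the with statement, which gives the same guarantee as the try/finally form above. The sketch below assumes current Python 3; the sample file and the simulated ValueError are illustrative only.

```python
import os
import tempfile

# Build a small throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")
    sample_path = tmp.name

# Normal completion: the file is closed when the with block exits.
with open(sample_path) as data:
    for line in data:
        pass
closed_after_loop = data.closed

# Exception during processing: the file is still closed on the way out.
try:
    with open(sample_path) as data2:
        for line in data2:
            raise ValueError("simulated failure while processing")
except ValueError:
    pass
closed_after_error = data2.closed

os.remove(sample_path)
print(closed_after_loop, closed_after_error)
```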

Guido has made a pronouncement on open vs. file. I think he prefers
open for opening files, and file for type testing, but may well be
wrong. I don't think it's critical.

<mike

Michael Hoffman · Jul 4, 2005

Mike said:
I disagree. That's the *convenient* way to do it, and perfectly
acceptable in many situations. But not all Python interpreters will
close the file when for loop ends. Likewise, if you get an exception
during the processing, the file may not get closed properly. Those
things may matter to you, in which case the "right" way is:

data = open("filename")
try:
    for line in data:
        whatever
finally:
    data.close()

Guido has made a pronouncement on open vs. file. I think he prefers
open for opening files, and file for type testing, but may well be
wrong. I don't think it's critical.

He has said that open() may be used for things other than files in the
future. So if you want to be sure you're opening a file, use file().

<wink>

Peter Hansen · Jul 4, 2005

Michael said:
He has said that open() may be used for things other than files in the
future. So if you want to be sure you're opening a file, use file().

Probably this is the same sort of thing as "if you want to be sure your
function is working with an integer, you have to test whether it is an
integer" (or use a statically typed language).

Which is advice that is generally rebutted around here with comments
about "duck typing" (as in, if it acts like an integer, then stop
worrying about what it actually is).

If open() can ever return things other than files, it seems likely it
will do so only under conditions that make it pretty much safe to assume
that existing code will continue to operate "as expected" (note: not
"always with a file").

I'm not going to try to picture just how this might happen, but I could
imagine, for example, some kind of support for protocol prefixes (à la
"http:" or "ftp:"), or perhaps some sort of support for encrypted or
compressed data. Or maybe it would require a prior call to some
function to enable the magic that lets open() return non-files.

If any of that is reasonable, then using open() is actually the better
approach to ensuring your code "does the right thing" in the future, and
"file" should still be used in the rare case where you actually want to
test whether something is a particular type of thing.

-Peter

Terry Hancock · Jul 4, 2005

If open() can ever return things other than files, it seems likely it
will do so only under conditions that make it pretty much safe to assume
that existing code will continue to operate "as expected" (note: not
"always with a file").

WHEN it returns things other than files. Like a StringIO object,
which can be quite handy. True, it won't be a "big file", but it'd
be nice if the same code would tolerate it. I've used this with
e.g. PIL quite a bit when working with Zope, because it isn't
really desirable to have to write the file out to disk and read
it back when you've already got it in memory.
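
A small sketch of that idea in current Python 3: a function that only iterates over its argument works with a real file, an in-memory StringIO object, or a plain list of strings. The function name count_nonblank is hypothetical.

```python
import io


def count_nonblank(lines):
    """Count non-blank lines from any iterable of strings."""
    return sum(1 for line in lines if line.strip())


# The same function accepts an in-memory buffer or a list of lines.
in_memory = io.StringIO("alpha\n\nbeta\n")
from_stringio = count_nonblank(in_memory)
from_list = count_nonblank(["alpha\n", "\n", "beta\n"])
print(from_stringio, from_list)
```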

Quack! ;-)
Terry

Asun Friere · Jul 6, 2005

Jp said:
fileIter = iter(big_file)
for line in fileIter:
    line_after = fileIter.next()

Don't mix iterating with any other file methods, since it will confuse the buffering scheme.

Isn't a file an iterable already?

[GCC 3.3.3 20040412 (Red Hat Linux 3.3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
True
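
The point can be checked in current Python 3, where the check is written with the next() builtin rather than the .next() method. This is a sketch only; it does not reconstruct the input lines missing from the session above, and the sample file is an assumption.

```python
import os
import tempfile

# Build a small throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\n")
    sample_path = tmp.name

with open(sample_path) as big_file:
    same_object = iter(big_file) is big_file  # iter() hands back the file itself
    first_line = next(big_file)               # so next() works on it directly

os.remove(sample_path)
print(same_object, repr(first_line))
```

Since iter() returns the file object itself, wrapping it in iter() is harmless but not required.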

Asun Friere · Jul 6, 2005

sorry lost the first line in pasting:
Python 2.4.1 (#1, Jun 21 2005, 12:38:55)
:/
