A bug with file.tell()?

Nick Jacobson

Is this a bug?

This code involves the file methods "seek" and "tell". Even though
the file pointer is in the middle of the file, "tell" returns a
position at the end of the file!

fp = file("test.txt")
#read in some of the file:
for line in fp:
    if line == "blah3\n":
        break
fpos = fp.tell() #save the current position...

for line in fp:
    print line #prints "asdf", so it wasn't at the end of the file

fp.seek(fpos) #rewind?
for line in fp:
    print line #prints nothing, because it's at the end of the file!


The test.txt file is:
blah1
blah2
blah3
asdf
 
Andrew Dalke

Nick said:
Is this a bug?

[example of mixing iterator-style line reads from a
file and tell]

No, it's a documented behaviour.

http://docs.python.org/lib/bltin-file-objects.html

] In order to make a for loop the most efficient way
] of looping over the lines of a file (a very common
] operation), the next() method uses a hidden read-ahead
] buffer. As a consequence of using a read-ahead
] buffer, combining next() with other file methods (like
] readline()) does not work right. However, using seek()
] to reposition the file to an absolute position will
] flush the read-ahead buffer.

Nick said:
This code involves the file methods "seek" and "tell". Even though
the file pointer is in the middle of the file, "tell" returns a
position at the end of the file!

That means the underlying file handle is at
the end and everything remaining is in memory.



Andrew
 
Nick Craig-Wood

Nick Jacobson said:
Is this a bug?

This code involves the file methods "seek" and "tell". Even though
the file pointer is in the middle of the file, "tell" returns a
position at the end of the file!

It's something to do with buffering when using the file() as an
iterator.

Nick Jacobson said:
fp = file("test.txt")
#read in some of the file:
for line in fp:
    if line == "blah3\n":
        break
fpos = fp.tell() #save the current position...

for line in fp:
    print line #prints "asdf", so it wasn't at the end of the file

fp.seek(fpos) #rewind?
for line in fp:
    print line #prints nothing, because it's at the end of the file!


The test.txt file is:
blah1
blah2
blah3
asdf

If you rewrite your loop using this equivalent code:

#...
while 1:
    line = fp.readline()
    if line == "": # EOF
        break
    if line == "blah3\n":
        break
#...

You'll find it works fine.
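
A minimal, self-contained sketch of the same idea, written for Python 3 (the thread itself uses Python 2 syntax). It uses iter(fp.readline, b""), which calls readline() until it returns an empty bytes object, so the for-loop syntax is kept without going through the file's own line iterator. The sample data, the temporary file, and the names lines_after_marker and demo are assumptions made for illustration, not part of the original posts.

```python
import os
import tempfile

# Hypothetical sample data mirroring the test.txt shown in the thread.
SAMPLE = b"blah1\nblah2\nblah3\nasdf\n"


def lines_after_marker(path, marker):
    """Read up to `marker`, save the position, then read the rest twice."""
    with open(path, "rb") as fp:
        # iter(callable, sentinel) keeps calling fp.readline() until it
        # returns b"" (end of file).
        for line in iter(fp.readline, b""):
            if line == marker:
                break
        fpos = fp.tell()  # position just past the marker line

        first_pass = list(iter(fp.readline, b""))
        fp.seek(fpos)  # rewind to the saved position
        second_pass = list(iter(fp.readline, b""))
    return fpos, first_pass, second_pass


def demo():
    # Write the sample to a temporary file so the sketch is self-contained.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(SAMPLE)
        return lines_after_marker(path, b"blah3\n")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Here the saved position points just past "blah3", so both passes read the "asdf" line.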

IMHO this is a bug: even with buffering, tell() should report the
position of the data returned to the user, not how far the underlying
file has been read.
 
Nick Jacobson

It gets weirder:

I added the second fp.read(3) statement below. And that statement
doesn't print anything out! It (fp.read(3)) thinks it's at the end of
the file, while the next statement "for line in fp" thinks it's not
(and reads "asdf")!

Now I'm definitely confused.



fp = file("testcrlf.txt")
#read in some of the file:

print fp.read(3)
for line in fp:
    if line == "blah3\n":
        break
fpos = fp.tell() #save the current position...

print fp.read(3)
for line in fp:
    print line #prints "asdf", so it wasn't at the end of the file

fp.seek(fpos) #rewind?
for line in fp:
    print line #prints nothing, because it's at the end of the file!
 
Paul Watson

The first "for line in fp" created an iterator for the file that has a
read-ahead buffer. While your loop operated on one line at a time, under
the covers the iterator read more of the file to improve speed.
With a file of this size, it is likely that the entire file was read at
once. Because of that, the file pointer was already at the end of the file.

The fact that your loop had only processed three of the lines is not known to
file.tell().

Is this a bug? While it is documented, it means that tell() and seek()
cannot be used if a file is processed by an iterator. The documentation for
seek() says that "only offsets returned by tell() are legal." One person's
documented behavior is another person's bug. The documentation for tell()
should probably mention this fact.
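
The mechanism described above can be illustrated with a toy model. This is a sketch in Python 3, not the interpreter's actual implementation: the class ReadAheadLines, the 8192-byte chunk size, and the sample data are all assumptions chosen for illustration. The file is opened unbuffered so that tell() on it shows where the underlying file really is.

```python
import os
import tempfile

# Hypothetical stand-in for the test.txt from the thread.
SAMPLE = b"blah1\nblah2\nblah3\nasdf\n"
CHUNK_SIZE = 8192  # assumed size; the real buffer size is an implementation detail


class ReadAheadLines:
    """Toy line iterator that fills its buffer one chunk at a time."""

    def __init__(self, raw, chunk_size=CHUNK_SIZE):
        self.raw = raw
        self.chunk_size = chunk_size
        self.buffer = b""

    def __iter__(self):
        return self

    def __next__(self):
        # Keep reading chunks until the buffer holds a full line or we hit EOF.
        while b"\n" not in self.buffer:
            chunk = self.raw.read(self.chunk_size)
            if not chunk:
                break
            self.buffer += chunk
        if not self.buffer:
            raise StopIteration
        line, sep, rest = self.buffer.partition(b"\n")
        self.buffer = rest  # everything after the line stays in memory
        return line + sep


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(SAMPLE)
        # buffering=0 gives an unbuffered file object.
        with open(path, "rb", buffering=0) as raw:
            reader = ReadAheadLines(raw)
            first_line = next(reader)
            position = raw.tell()     # where the underlying file is
            buffered = reader.buffer  # read ahead, not yet handed out
        return first_line, position, buffered
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

After a single line has been handed out, the underlying file is already at its end and the remaining three lines sit in the buffer, which matches the behaviour reported in this thread.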
 
Nick Jacobson

Andrew Dalke said:
That means the underlying file handle is at
the end and everything remaining is in memory.

Thanks a lot for the reply, it explains the problem very well.

But not allowing "for line in file" and read(), tell(), etc. together
still seems like a bug to me. And it's true that it's documented, but
the problem is buried in the documentation under next(), which is not
directly called!

Here's one idea: Have functions read(), tell(), etc. use the
read-ahead buffer for reference, instead of the file handle.

Here's another: raise an exception if calling read(), tell(), etc.
when the file handle != the read ahead buffer.

There has got to be a way to make these things consistent! Otherwise,
if it were up to me, I would avoid the read-ahead entirely, reasoning
that trading speed for buggy behavior is a bad practice...

What do you think?
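
The first idea can be approximated in user code without changing the file object. The sketch below (Python 3) keeps its own logical position by counting the bytes of every line handed out, so it does not depend on where the underlying file pointer is. The generator lines_with_offsets, the demo function, and the sample data are hypothetical names introduced for illustration, and the approach assumes the file is opened in binary mode so that len(line) equals the number of bytes consumed.

```python
import os
import tempfile

# Hypothetical sample data mirroring the test.txt from the thread.
SAMPLE = b"blah1\nblah2\nblah3\nasdf\n"


def lines_with_offsets(fp):
    """Yield (offset, line) pairs, where offset is where `line` starts."""
    offset = fp.tell()  # taken before iteration starts
    for line in fp:
        yield offset, line
        offset += len(line)  # bytes handed to the caller so far


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(SAMPLE)
        with open(path, "rb") as fp:
            resume_at = None
            for offset, line in lines_with_offsets(fp):
                if line == b"blah3\n":
                    # Logical position just past the marker line.
                    resume_at = offset + len(line)
                    break
            fp.seek(resume_at)  # an absolute seek discards any read-ahead
            rest = fp.read()
        return resume_at, rest
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Because the position is computed from the data actually returned, it stays correct however much the iterator reads ahead.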
 
Elbert Lev

Here is the script. In the first part I open the file and read it line
by line with readline().
In the second part I open the same file and read the lines with "for s in
f:". The difference is that the second method first reads the whole
file into memory and creates a list of strings. You are actually taking
strings from this list.

f = file("test")
fo = file("out", "w")

while 1:
    off = f.tell()
    s = f.readline()
    if not s:
        break
    print >>fo, "%5d:%s" % (off, s)

f = file("test")
print >>fo, "==========================="
for s in f:
    off = f.tell()
    print >>fo, "%5d:%s" % (off, s)


Output:
0:blah1
7:blah2
14:blah3
21:asdf
===========================
25:blah1
25:blah2
25:blah3
25:asdf

By the way, because "for s in f:" reads the whole file into memory, such a
construct is not recommended for reading large files (but is very
convenient).

I do not think this is a bug.
 
Alex Martelli

Elbert Lev said:
By the way, because "for s in f:" reads the whole file into memory such

Nope! It buffers a few KB at a time.

Elbert Lev said:
construct is not recommended for reading large files (but is very
convenient).

It's perfectly suitable for reading files as huge as you wish.


Alex
 
