File position *tell()* works differently

Peter Abel

Hi all,
I'm working under W2k with
Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32

I have a file *test_data.txt* with the following content:
0123456789
0123456789
abcdefghi
ABCDEFGHIJKLMNOPQ

and I work on it with the following python script:

# Open NOT in binary mode
fp=file('test_data.txt','r')
a='xx'
while a:
    print 'Filepointer: %3d' % fp.tell()
    a=fp.readline()
fp.close()

print

# Open IN binary mode
fp=file('test_data.txt','r+b')
a='xx'
while a:
    print 'Filepointer: %3d' % fp.tell()
    a=fp.readline()
fp.close()

Now, when test_data.txt is saved in PC mode with 0xD, 0xA as newline,
it works correctly.
But when I save the file in Unix mode with 0xA as newline,
my script gives me the following output, where the run with
the file not opened in binary mode is wrong:
Filepointer: 0
Filepointer: 7
Filepointer: 19
Filepointer: 30
Filepointer: 49
Filepointer: 51

Filepointer: 0
Filepointer: 11
Filepointer: 22
Filepointer: 32
Filepointer: 50
Filepointer: 51

When I try this under HP-UX it works fine in both cases.
So I wonder if the function *tell()* is not correctly implemented under win32.

Regards
Peter
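
For anyone who wants to reproduce the binary-mode column today, here is a minimal sketch. It assumes Python 3 rather than the Python 2.2 used above (so open() instead of file()), and the helper names binary_offsets and demo are made up for illustration. It writes the four lines with Unix newlines and records tell() before every readline(), the same way the script above does.

```python
import os
import tempfile

# Same four lines as test_data.txt, saved with Unix (0xA) newlines.
DATA = b"0123456789\n0123456789\nabcdefghi\nABCDEFGHIJKLMNOPQ\n"

def binary_offsets(path):
    """Return fp.tell() before each readline(), file opened in binary mode."""
    offsets = []
    with open(path, "rb") as fp:
        while True:
            offsets.append(fp.tell())
            if not fp.readline():   # empty bytes object means end of file
                break
    return offsets

def demo():
    """Write DATA to a temporary file and collect the offsets (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(DATA)
        return binary_offsets(path)
    finally:
        os.remove(path)

if __name__ == "__main__":
    for pos in demo():
        print("Filepointer: %3d" % pos)
```

In binary mode the values are plain byte counts, so they should not depend on the platform.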
 
M-a-S

I'm not sure if that's the reason, but the binary mode for reading is 'rb'.
Actually, the order of 'r' and 'b' shouldn't matter. But the '+' has a different
meaning: the file should allow "opposite" access as well, e.g. 'r+', 'rb+'
means that you can write to the file too, while 'w+' means: open it for
writing but permit reading too. You can try to say 'rt' for the read/text mode.
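
A small sketch of the difference between the '+' modes described above. It assumes Python 3 (open() instead of file()); the function name mode_demo is hypothetical. It shows that 'r+b' opens an existing file for reading and writing without touching its contents, while 'w+b' also allows reading but truncates the file as soon as it is opened.

```python
import os
import tempfile

def mode_demo():
    """Show that 'r+b' keeps existing data while 'w+b' truncates it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"0123456789\n")

        # 'r+b': read/write, file must exist, contents are preserved.
        with open(path, "r+b") as fp:
            before = fp.read()
            fp.seek(0)
            fp.write(b"X")          # overwrite the first byte in place
        with open(path, "rb") as fp:
            after_rplus = fp.read()

        # 'w+b': read/write as well, but the file is truncated on open.
        with open(path, "w+b") as fp:
            after_wplus = fp.read()
        return before, after_rplus, after_wplus
    finally:
        os.remove(path)

if __name__ == "__main__":
    print(mode_demo())
```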

Anyway, your program works under Windows XP/Python 2.3 as expected:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Home\Programming\Python\2>py
Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
C:\Home\Programming\Python\2>test.py
Filepointer: 0
Filepointer: 12
Filepointer: 24
Filepointer: 35
Filepointer: 54

Filepointer: 0
Filepointer: 12
Filepointer: 24
Filepointer: 35
Filepointer: 54

I'm sorry if it doesn't help. The bug must be somewhere else then.

M-a-S


 
Richie Hindle

[Peter]
I wonder if the function *tell()* is not correctly implemented under win32.
[M-a-S]
Anyway, your program works under Windows XP/Python 2.3 as expected:

M-a-S, are you sure you saved test_data.txt with Unix line endings? I
tested Peter's script under WinXP/Python2.3 as well, and it failed as
expected (though with slightly different results):
pythonw -u peter.py
Filepointer: 0
Filepointer: 8
Filepointer: 20
Filepointer: 31
Filepointer: 50

Filepointer: 0
Filepointer: 11
Filepointer: 22
Filepointer: 32
Filepointer: 50
 
Eric Brunel

M-a-S said:
I'm not sure if that's the reason, but the binary mode for reading is 'rb'.
Actually, the order of 'r' and 'b' shouldn't matter. But the '+' has a different
meaning: the file should allow "opposite" access as well, e.g. 'r+', 'rb+'
means that you can write to the file too, while 'w+' means: open it for
writing but permit reading too. You can try to say 'rt' for the read/text mode.

Sorry, but no you can't: the default is to open the file in text mode, and you
can change it with a 'b', but 't' has no meaning at all. BTW, 'b' also has no
meaning at all on all Unices: the so-called "binary" or "text" mode are the
same, i.e. what is read is what is in the file. Windows needs it only because of
its superfluous \r's at the end of each line.

Peter Abel said:
Hi all,
I'm working under W2k with
Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32

I do confirm the methods tell and/or seek are broken on Win2K when you open the
file in text mode: doing myFile.seek(myFile.tell()) is not a no-op. The "ghost"
\r's at the end of lines seem to be taken into account by one of the methods and
not by the other one. The problem also happens on Win98.

I don't know if it's a Python bug or a bug in the underlying C API. Knowing
Windows, and considering the Python wrapper must be quite trivial, I'd bet on
the C API...

The only workaround I found was to always open the files in binary mode, and
explicitly ignore the \r's.

HTH
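
The workaround above could look roughly like this. It is only a sketch, assuming Python 3 and ASCII data; index_lines, read_line_at and demo are names invented for the example. The file is always opened in binary mode, the line endings are dropped by hand, and the stored offsets are real byte positions that seek() can use later.

```python
import os
import tempfile

def index_lines(path):
    """Return (offset, text) pairs; offsets are real byte positions."""
    index = []
    with open(path, "rb") as fp:
        while True:
            pos = fp.tell()
            raw = fp.readline()
            if not raw:
                break
            # Drop the line ending by hand: '\n' first, then an optional '\r'.
            if raw.endswith(b"\n"):
                raw = raw[:-1]
            if raw.endswith(b"\r"):
                raw = raw[:-1]
            index.append((pos, raw.decode("ascii")))
    return index

def read_line_at(path, offset):
    """Seek to a stored offset and return that line without its ending."""
    with open(path, "rb") as fp:
        fp.seek(offset)
        return fp.readline().rstrip(b"\r\n").decode("ascii")

def demo(newline):
    """Build a two-line file with the given newline bytes and index it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            for text in (b"0123456789", b"abcdefghi"):
                out.write(text + newline)
        index = index_lines(path)
        return index, [read_line_at(path, pos) for pos, _ in index]
    finally:
        os.remove(path)

if __name__ == "__main__":
    print(demo(b"\n"))
    print(demo(b"\r\n"))
```

The offsets differ between the two files (the second line starts at 11 or at 12), but in both cases seeking to a stored offset gives back the right line.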
 
Richie Hindle

[Peter]
I wonder if the function *tell()* is not correctly implemented under win32.

[Tim, quoting the standard]
For a text stream, its file position indicator contains
unspecified information, usable by the fseek function for returning
the file position indicator for the stream to its position at the
time of the ftell call

It still doesn't seem to work as specified:

------------------------------ peter.py ------------------------------

# Open the file in text mode, read a line, and store the position.
fp = file('test_data.txt', 'rt')
line = fp.readline()
storedPosition = fp.tell()
print 'Line: %r, file pointer after read: %d' % (line, storedPosition)

# Read some more and print it.
print 'Read another line from this position: %r' % fp.readline()

# Now seek back and read the same line again.
fp.seek(storedPosition)
print 'Another read from the same position: %r' % fp.readline()

----------------------------------------------------------------------

This prints:

Line: '0123456789\n', file pointer after read: 8
Read another line from this position: '0123456789\n'
Another read from the same position: '89\n'

I'd expect doing readline/tell/readline/seek/readline to read the same
line the second two times. And however you implement tell and seek, a
tell value of 8 after reading 11 bytes looks pretty weird.

I'd write the same code in C if I had the time, so at least we could be
*sure* we can blame Microsoft.
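
For comparison, the same readline/tell/readline/seek/readline sequence as a self-checking sketch. This assumes Python 3, where text-mode tell() returns an opaque number that is only meant to be handed back to seek(); it does not reproduce the Python 2.x behaviour shown above. The names roundtrip and demo are hypothetical.

```python
import os
import tempfile

def roundtrip(path):
    """readline / tell / readline / seek / readline on a text-mode file."""
    with open(path, "r", encoding="ascii") as fp:
        fp.readline()
        stored = fp.tell()        # opaque value, only meant for seek()
        second = fp.readline()
        fp.seek(stored)
        again = fp.readline()
    return second, again

def demo():
    """Run the round trip on a small file with Unix line endings."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"0123456789\nabcdefghi\nABCDEFGHIJKLMNOPQ\n")
        return roundtrip(path)
    finally:
        os.remove(path)

if __name__ == "__main__":
    second, again = demo()
    print("Read another line from this position: %r" % second)
    print("Another read from the same position: %r" % again)
```

If the two reads return the same line, seek(tell()) is behaving as the quoted standard text describes.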
 
Peter Abel

Richie Hindle said:
[Peter]
I wonder if the function *tell()* is not correctly implemented under win32.
[M-a-S]
Anyway, your program works under Windows XP/Python 2.3 as expected:

M-a-S, are you sure you saved test_data.txt with Unix line endings?

This is exactly the point.
It works under win32 with PC endings, but not with Unix line endings.
The workaround is to open it in binary mode. I know there are differences
between the open modes 'rb', 'r+b' ... but that's not the problem. Both
work fine.

I tested Peter's script under WinXP/Python2.3 as well, and it failed as
expected (though with slightly different results):

Filepointer: 0
Filepointer: 8
It doesn't make any sense to me that the file position comes out as *8* here.
The line has 10 chars: 0123456789 plus one newline, which makes 11
if newline is only a 0xA and 12 if newline is 0xD, 0xA. So let's suppose
*file(file_name).readline()* reads until it detects a 0xA and then subtracts
an OS-dependent number of bytes, namely one for a Unix newline and two for a PC newline.
Then it would result in 10 or 9 but never in 8. It doesn't make any sense to
me. I think it must be a bug.
Filepointer: 20
Filepointer: 31
Filepointer: 50

Filepointer: 0
Filepointer: 11
Filepointer: 22
Filepointer: 32
Filepointer: 50

When the file is opened in binary mode, the above output shows that Python
does the right thing.

Peter
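
The byte arithmetic above can be written down as a short sketch. Nothing is read from disk here; expected_offsets is a hypothetical helper that just adds up line lengths plus the size of the newline, to show which offsets one would expect for each kind of line ending.

```python
LINES = ["0123456789", "0123456789", "abcdefghi", "ABCDEFGHIJKLMNOPQ"]

def expected_offsets(lines, newline_bytes):
    """Byte offset before each line, plus the offset at end of file."""
    offsets = [0]
    for text in lines:
        offsets.append(offsets[-1] + len(text) + newline_bytes)
    return offsets

unix_offsets = expected_offsets(LINES, 1)   # newline is 0xA
pc_offsets = expected_offsets(LINES, 2)     # newline is 0xD, 0xA

if __name__ == "__main__":
    print("Unix:", unix_offsets)
    print("PC:  ", pc_offsets)
```

The Unix list matches the binary-mode output quoted in this thread, and the PC list matches the values M-a-S got with PC line endings.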
 
M-a-S

Richie Hindle said:
M-a-S, are you sure you saved test_data.txt with Unix line endings? I
tested Peter's script under WinXP/Python2.3 as well, and it failed as
expected (though with slightly different results):
<......>

Oops! Stupid me! With '\n' it behaves really weird. Same values as yours.

Filepointer: 0
Filepointer: 8 (-3 off the right value - M-a-S)
Filepointer: 20 (-2)
Filepointer: 31 (-1)
Filepointer: 50 (-0)

Filepointer: 0
Filepointer: 11
Filepointer: 22
Filepointer: 32
Filepointer: 50

For the file '0123456789\n'*12 it prints:

Filepointer: 0
Filepointer: 0 (-11 off the right value)
Filepointer: 12 (-10)
Filepointer: 24 (-9)
Filepointer: 36 (-8)
Filepointer: 48 (-7)
Filepointer: 60 (-6)
Filepointer: 72 (-5)
Filepointer: 84 (-4)
Filepointer: 96 (-3)
Filepointer: 108 (-2)
Filepointer: 120 (-1)
Filepointer: 132 (-0)

Filepointer: 0
Filepointer: 11
Filepointer: 22
Filepointer: 33
Filepointer: 44
Filepointer: 55
Filepointer: 66
Filepointer: 77
Filepointer: 88
Filepointer: 99
Filepointer: 110
Filepointer: 121
Filepointer: 132

If I add another line, it breaks:

Filepointer: 0
Traceback (most recent call last):
  File "C:\Home\Programming\Python\t\t.py", line 5, in ?
    print 'Filepointer:%4d' % fp.tell()
IOError: (0, 'Error')

M-a-S
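
The "off by" annotations above follow a simple pattern: each text-mode value is the true byte offset minus the number of newlines that have not been read yet. The sketch below only models that arithmetic. Whether the C runtime really does this (for instance by treating every newline still in its read buffer as if it had been a two-byte PC newline) is an assumption and is not confirmed anywhere in this thread. The function name modelled_text_tell is made up, and the model assumes the whole file is small and every line ends in a single '\n'.

```python
def modelled_text_tell(lines):
    """Model: true byte offset minus the number of newlines not yet read."""
    values = [0]                      # value before anything has been read
    true_pos = 0
    for i, line in enumerate(lines):
        true_pos += len(line)
        remaining_newlines = len(lines) - (i + 1)
        values.append(true_pos - remaining_newlines)
    return values

four_lines = ["0123456789\n", "0123456789\n", "abcdefghi\n", "ABCDEFGHIJKLMNOPQ\n"]
twelve_lines = ["0123456789\n"] * 12
thirteen_lines = ["0123456789\n"] * 13

if __name__ == "__main__":
    print(modelled_text_tell(four_lines))
    print(modelled_text_tell(twelve_lines))
    print(modelled_text_tell(thirteen_lines)[1])
```

With thirteen lines the modelled value after the first readline() is negative, which is at least consistent with tell() raising IOError at that point, though the model cannot prove that this is the cause.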
 
M-a-S

Eric Brunel said:
Sorry, but no you can't: the default is to open the file in text mode, and you
can change it with a 'b', but 't' has no meaning at all. BTW, 'b' also has no
meaning at all on all Unices: the so-called "binary" or "text" mode are the
same, i.e. what is read is what is in the file. Windows needs it only because of
its superfluous \r's at the end of each line.
<...>
HTH


The idea was to tell the humans that it's text. It won't hurt either Unix or Windows.
I know that nobody cares, though.

M-a-S
 
Eric Brunel

M-a-S said:
The idea was to tell the humans that it's text. It won't hurt either Unix or Windows.
I know that nobody cares, though.

Not only to tell humans: Windows automatically removes the '\r' at the end of
each line when a file is opened in text mode. It won't happen in binary mode.

And I wish I could stop caring, but I occasionally run into problems just
because of this behaviour, and I know I'm not the only one.
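
A quick way to see the '\r' removal is to read the same bytes in both modes. This sketch assumes Python 3, where the translation is done by Python's own text layer (default newline handling) on every platform, so it illustrates the effect rather than the Windows C runtime behaviour discussed in this thread. The name read_both_ways is hypothetical.

```python
import os
import tempfile

def read_both_ways(raw):
    """Write raw bytes, then read them back in text mode and in binary mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(raw)
        with open(path, "r", encoding="ascii") as fp:   # text mode
            as_text = fp.read()
        with open(path, "rb") as fp:                    # binary mode
            as_bytes = fp.read()
        return as_text, as_bytes
    finally:
        os.remove(path)

if __name__ == "__main__":
    text, data = read_both_ways(b"0123456789\r\nabcdefghi\r\n")
    print(repr(text))
    print(repr(data))
```

The text-mode result is shorter than the file on disk, which is why byte offsets and text-mode positions cannot be assumed to line up.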
 
