Reading a file and then writing something back


Kevin T. Ryan

Hi All -

I'm not sure, but I'm wondering if this is a bug, or maybe (more
likely) I'm misunderstanding something...see below:

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IOError: (0, 'Error')

I've figured out that I can do an open('testfile', 'r+') and then seek
and write something (without an error), but it just seems odd that I
would get an IOError for what I was trying to do. Oh, and I also
tried to do "f.flush()" before the write operation with no luck.
I've searched Google, but can't seem to find much. Any thoughts?
TIA,

Kevin
 

Remy Blank

Kevin said:
I'm not sure, but I'm wondering if this is a bug, or maybe (more
likely) I'm misunderstanding something...see below:



Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IOError: (0, 'Error')

This is just a guess, I don't know the inner workings of files in
Python, but here we go:

I think that readline() doesn't read one character at a time from the
file, until it finds a newline, but reads a whole block of characters,
looks for the first newline and returns that string (for efficiency
reasons). Due to this buffering, the file pointer position is undefined
after a readline(), and so a write() afterwards doesn't make sense.
Python tries to help you not to fall into this trap.

Kevin said:
I've figured out that I can do an open('testfile', 'r+') and then seek
and write something (without an error), but it just seems odd that I
would get an IOError for what I was trying to do.

When you do a seek(), the file pointer position is clearly defined, so a
write() makes sense.

A tentative solution could be:

pos = f.tell()
s = f.readline() # Reads 'dan\n'
f.seek(pos + len(s))
f.write('chris\n')

However, I'm not sure you want to do that, as the string written will
just overwrite the previous content, and will probably not be aligned
with the next newline in the file. Except if you don't care about the
data following your write.
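
To make the overwrite caveat concrete, here is a minimal sketch. It assumes Python 3 and binary mode, so that offsets are plain byte counts (the session in this thread used an older Python in text mode). The helper name `overwrite_after_first_line` and the temporary file location are made up for illustration.

```python
import os
import tempfile


def overwrite_after_first_line(path, new_line):
    """Read the first line, then write new_line at that position."""
    with open(path, "r+b") as f:
        first = f.readline()
        # Re-establish the position explicitly before switching to writing.
        f.seek(len(first))
        f.write(new_line)
    with open(path, "rb") as f:
        return f.read()


# Build the three-line test file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "testfile")
with open(path, "wb") as f:
    f.write(b"kevin\ndan\npat\n")

result = overwrite_after_first_line(path, b"chris\n")
print(result)  # b'kevin\nchris\nt\n'
```

The six bytes of 'chris\n' replace 'dan\n' plus the first two bytes of 'pat\n', which leaves a stray 't' line behind. That is the misalignment described above.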

Hope this helps.
-- Remy


Remove underscore and anti-spam suffix in reply address for a timely
response.
 

Jeff Epler

Python's file object is based on ISO C's file I/O primitives
(fopen, fread, etc) and inherits both the requirements of the standard
and any quirks of your OS's C implementation.

According to this document
http://www.lysator.liu.se/c/rat/d9.html#4-9-5-3
a direction change is only permitted after a "flushing" operation
(fsetpos, fseek, rewind, fflush). file.flush calls C's fflush.

I believe that this C program is equivalent to your Python program:

#include <stdio.h>

int main(void) {
    char line[21];

    /* Create the three-line test file. */
    FILE *f = fopen("testfile", "w");
    fputs("kevin\n", f);
    fputs("dan\n", f);
    fputs("pat\n", f);
    fclose(f);

    /* Reopen for update and read the first two lines. */
    f = fopen("testfile", "r+");
    fgets(line, 20, f); printf("%s", line);
    fgets(line, 20, f); printf("%s", line);

    /* Flush before switching from reading to writing. */
    fflush(f);

    if (fputs("chris\n", f) == EOF) { perror("fputs"); }
    fclose(f);

    return 0;
}

On my Linux machine, it prints
kevin
dan
and testfile's third and final line is "chris".

On a windows machine nearby (compiled with mingw, but using msvcrt.dll)
it prints
kevin
dan
fputs: No error
and testfile's third and final line is "pat".

If I add fseek(f, 0, SEEK_CUR) after fflush(f), I don't get a failure
but I do get the curious contents
kevin
dan
pat
chris

If I use just fseek(f, 0, SEEK_CUR) I get no error and correct
contents in testfile.
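
For comparison, a rough Python counterpart of the fseek(f, 0, SEEK_CUR) variant might look like the sketch below. It assumes Python 3, whose io layer does its own buffering rather than using C stdio, so the zero-offset seek is shown as a portable habit and is probably not strictly required there. The function name `read_two_then_write` and the temporary path are hypothetical.

```python
import os
import tempfile


def read_two_then_write(path, new_line):
    """Read two lines, do a no-op seek, then write at the current position."""
    with open(path, "r+b") as f:
        lines = [f.readline(), f.readline()]
        # Zero-offset seek relative to the current position: the file
        # position does not move, but the direction change is made explicit.
        f.seek(0, os.SEEK_CUR)
        f.write(new_line)
    return lines


# Build the same three-line file as the C program, in a temp directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "testfile")
with open(path, "wb") as f:
    f.write(b"kevin\ndan\npat\n")

lines_read = read_two_then_write(path, b"chris\n")
with open(path, "rb") as f:
    contents = f.read()
print(contents)  # b'kevin\ndan\nchris\n'
```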

I don't have a copy of the actual C standard, but even Microsoft's
documentation says
When the "r+", "w+", or "a+" access type is specified, both reading
and writing are allowed (the file is said to be open for "update").
However, when you switch between reading and writing, there must be
an intervening fflush, fsetpos, fseek, or rewind operation. The
current position can be specified for the fsetpos or fseek
operation, if desired.
http://msdn.microsoft.com/library/d...n-us/vccore98/HTML/_crt_fopen.2c_._wfopen.asp
so it smells like a bug to me. Do you happen to be using Windows? I
guess you didn't actually say.

Jeff

 

Jeff Epler

IEEE Std 1003.1 says that a "file positioning function" (fseek, fsetpos,
rewind) must be called when a stream's direction changes from input
to output.

http://www.opengroup.org/onlinepubs/009695399/functions/fopen.html
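
One way to avoid forgetting that rule is to let a small wrapper make the positioning call whenever the direction changes. The sketch below is only an illustration in Python 3. The class name `UpdateStream` is made up, and it wraps an in-memory `io.BytesIO` purely so that the example is self-contained.

```python
import io
import os


class UpdateStream:
    """Hypothetical wrapper that repositions on every read/write switch."""

    def __init__(self, raw):
        self._raw = raw
        self._last_op = None  # "read", "write", or None before any I/O

    def _switch(self, op):
        # Call a positioning function whenever the direction changes.
        if self._last_op is not None and self._last_op != op:
            self._raw.seek(0, os.SEEK_CUR)
        self._last_op = op

    def readline(self):
        self._switch("read")
        return self._raw.readline()

    def write(self, data):
        self._switch("write")
        return self._raw.write(data)


buf = io.BytesIO(b"kevin\ndan\npat\n")
stream = UpdateStream(buf)
first = stream.readline()
second = stream.readline()
stream.write(b"chris\n")  # triggers the zero-offset seek first
final = buf.getvalue()
print(final)  # b'kevin\ndan\nchris\n'
```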

Jeff

 

Byron

Hi Kevin,

Even though I am fairly new to Python, it appears that you might have
found a bug with 'r+' writing / reading mode.

Here are a couple of suggestions which you might find helpful:

1) To make your programs faster and less 'error'-prone, you might want
to read the text file into memory (a list) first, like this:

f = open("c:/test.txt", "r")
names = f.readlines()  # Read all lines in the file and store them in a list.
for name in names:     # Display a listing of all lines in the file.
    print(name)
f.close()              # Close the text file (we'll change it later).


2) When you want to add new names to the "text file" (in memory), you
can easily do so by doing this:

names = names + ["William\n"]  # Adds William to the list (text file in memory).
names = names + ["Steven\n"]   # Adds Steven to the list.
names = names + ["Tony\n"]     # Adds Tony to the list also.


3) If you wish to sort the list in memory, you can do this:

names.sort()  # Places the names in the list in ascending order (A - Z).


4) Finally, to re-write the text file on the disk, you can do this:

f = open("c:/test.txt", "w")  # Re-write the file from scratch with revised info.
for name in names:            # For each name that is in the list (names)
    f.write(name)             # Write it to the file.
f.close()                     # The file has now been fully rewritten, so close it.
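
Steps 1 through 4 can also be rolled into one function. This is a minimal sketch assuming Python 3. The function name `add_names_sorted` and the temporary file are invented for the example, and `with` blocks take the place of the explicit close() calls.

```python
import os
import tempfile


def add_names_sorted(path, new_names):
    """Read all lines, append new_names, sort, and rewrite the file."""
    with open(path, "r") as f:
        names = f.readlines()
    names.extend(name + "\n" for name in new_names)
    names.sort()
    with open(path, "w") as f:
        f.writelines(names)
    return names


# Start from a small file holding three names.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test.txt")
with open(path, "w") as f:
    f.write("Kevin\nDan\nPat\n")

names = add_names_sorted(path, ["William", "Steven", "Tony"])
print("".join(names), end="")
```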


--------------

Why does this have advantages? Several reasons, which are:

1) It does the processing in the memory, which is much quicker. Faster
programs are always a nice feature!
2) It allows for additional processes to occur, such as sorting, etc.
3) It reduces the chances of "having a disk problem." One simple read &
one simple write.


Hope this helps,

Byron
---
 

Byron

Oops, forgot to add one extra thing:
--------------------------------------------

If you would like to see all of the names from your "names" list, you
can do the following:

for name in names:
    print(name)


This provides you with the results:

Dan
Kevin
Pat
Steven
Tony
William


---

If you would like to see the first and fourth items in the list, you can
do the following:

print(names[0])  # Display the first item in the list. Indexing starts at zero.
print(names[3])  # Display the fourth item in the list.

Result is:

Dan
Steven

---

Finally, if you would like to remove an item from the list:

del names[3]

---

Hope this helps!

Byron
----------------------------
 

Kevin T. Ryan

Thanks all for the suggestions. I'm guessing that what Remy stated is
about right. Jeff: I AM using Windows (or at least, I was today while I
was writing that script). If Remy was wrong, then it still might be a bug,
though: I tried f.flush(), but the error still occurred.

Byron, thanks for the advice. For my simple example you're totally
correct, but I was thinking along the lines of a much bigger file with tons
of records, and therefore didn't want to slurp everything into memory in
case the file got to be too big. Maybe I'm wrong though; I don't know how
much a "normal" computer can hold in memory (maybe hundreds of thousands
of lines?).
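
For a file too big to slurp, one common alternative is to stream it line by line into a temporary file and then swap that file into place. The sketch below assumes Python 3. The function name `replace_line` and the test data are hypothetical, and this is just one possible approach, not something proposed elsewhere in this thread.

```python
import os
import tempfile


def replace_line(path, old, new):
    """Copy path line by line to a temp file, swapping old for new."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as dst, open(path, "r") as src:
        for line in src:  # only one line is held in memory at a time
            dst.write(new if line == old else line)
    os.replace(tmp_path, path)  # swap the rewritten copy into place


tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "testfile")
with open(path, "w") as f:
    f.write("kevin\ndan\npat\n")

replace_line(path, "pat\n", "chris\n")
with open(path) as f:
    updated = f.read()
print(updated, end="")
```

Because the replacement line goes into a fresh copy, it does not have to be the same length as the line it replaces.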

Oh well, thanks again :)
 
