Kenneth Brody
I recently ran into an "issue" related to text files and ftell/fseek,
and I'd like to know if it's a bug, or simply an annoying, but still
conforming, implementation.
The platform is Windows, where text files use CR+LF (0x0d, 0x0a) to
mark end-of-line. The file in question, however, was in Unix format,
with only LF (0x0a) at the end of each line.
First, does the above situation already invoke "implementation defined"
or "undefined" behavior? Or is it still "defined"?
The problem comes in how ftell() reports the current position. (And,
consequently, a later fseek() back to that position lands in the wrong
place.)
Suppose that you have fread() the following 12 characters, starting at
the beginning of the file:
'1' '2' '3' '4' '5' 0x0a '1' '2' '3' '4' '5' 0x0a
(Remember, this file is in Unix format, with a single 0x0a for end-of-
line.)
While you are now at offset 12 within the file, ftell() will return 14,
because it assumes that those '\n' newlines are really CR+LF, and that
the CR was stripped off when read. (Had this file been in Windows format,
you really would be at offset 14 after reading those 12 characters.) For
each 0x0a returned by fread(), ftell() will assume you have advanced two
characters in the file.
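To make the numbers concrete, here is a minimal sketch of the experiment. The helper names write_sample() and tell_after_read() are made up for illustration and are not from my real code. The sample is written in binary mode so that it stays in Unix format, and the read helper takes the fopen() mode as a parameter so that text mode ("r") and binary mode ("rb") can be compared.

```c
#include <stdio.h>

/* Hypothetical helper: write the 12-byte Unix-format sample described above.
   "wb" keeps the runtime from expanding '\n' into CR+LF on Windows.
   Returns 0 on success, -1 on failure. */
static int write_sample(const char *path)
{
    static const char data[] = "12345\n12345\n";
    size_t len = sizeof data - 1;   /* exclude the terminating '\0' */
    size_t written;
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;
    written = fwrite(data, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return -1;
    return 0;
}

/* Hypothetical helper: open `path` with `mode`, read `count` characters
   from the start, and return what ftell() then reports (-1L on failure). */
static long tell_after_read(const char *path, const char *mode, size_t count)
{
    char buf[64];
    long pos = -1L;
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        return -1L;
    if (count <= sizeof buf && fread(buf, 1, count, fp) == count)
        pos = ftell(fp);
    fclose(fp);
    return pos;
}
```

With the stream opened as "rb", no newline translation takes place and tell_after_read(path, "rb", 12) gives 12. With "r" on the runtime in question, the same call is what gives me 14.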
The net result here is that a subsequent fseek() to the same position
will be wrong.
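The round trip I mean can be sketched roughly as follows. Again, seek_round_trip_ok() is a hypothetical name used only for this illustration, and the fixed 64-byte buffers assume a small test file like the one above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical check: read `skip` characters, remember ftell(), read the
   rest of the file, fseek() back to the remembered position, and read the
   rest again. Returns 1 if both reads match, 0 if they differ, and -1 on
   an I/O failure. */
static int seek_round_trip_ok(const char *path, const char *mode, size_t skip)
{
    char head[64], first[64], second[64];
    size_t n1, n2;
    long pos = -1L;
    int result = -1;
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        return -1;

    if (skip <= sizeof head && fread(head, 1, skip, fp) == skip)
        pos = ftell(fp);

    if (pos != -1L) {
        n1 = fread(first, 1, sizeof first, fp);
        /* fseek() also clears the end-of-file indicator set by the read. */
        if (fseek(fp, pos, SEEK_SET) == 0) {
            n2 = fread(second, 1, sizeof second, fp);
            result = (n1 == n2 && memcmp(first, second, n1) == 0);
        }
    }

    fclose(fp);
    return result;
}
```

On the Unix-format sample, seek_round_trip_ok(path, "rb", 6) returns 1, since a binary stream's ftell() value is simply the byte offset. The case I am asking about is the same call with mode "r".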
So, have I invoked undefined behavior by reading a Unix text file in a
Windows environment? Or is the compiler allowed to return the "wrong"
value as part of an "implementation defined" restriction? Or is this
a bug in the compiler's runtime library?
--
+-------------------------+--------------------+-----------------------------+
| Kenneth J. Brody | www.hvcomputer.com | |
| kenbrody/at\spamcop.net | www.fptech.com | #include <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------------+
Don't e-mail me at: <mailto:[email protected]>