Reading from files and range of char and friends


Eric Sosman

The standard lists the operations that can *generate* a negative zero.
One could argue that operations like cast and assignment simply preserve
an existing negative zero rather than generating a new one.

Even if a negative zero arises without being "generated," the
Standard does not assure us that negativity is preserved. 6.2.6.2p3:

"It is unspecified [...] whether a negative zero becomes
a normal zero when stored in an object."
 

Phil Carmody

Eric Sosman said:
[...]
Ok , I guess it could happen. But then I have a different objection. Eric said

(The situation is particularly bad for systems with
signed-magnitude or ones' complement notations, where the
sign of zero is obliterated on conversion to unsigned char
and thus cannot be recovered again after getc().)

It seems to me that an implementation can easily ensure that the sign
of zero does not get obliterated. If by using fgetc() an unsigned char
gets the bit pattern which corresponds to negative zero then the
implementation can assign the negative zero when converting to int .
The standard allows this.

Could you indicate where? I'm looking at 6.2.6.2p3, which lists
the operations that can generate a minus zero, and does not list
"conversion" among them.

That prevents ``signed char s = -0;'' from making s a negative zero?
Was that really intended?

Phil
 

Eric Sosman

Eric Sosman said:
[...]
Ok , I guess it could happen. But then I have a different objection. Eric said

(The situation is particularly bad for systems with
signed-magnitude or ones' complement notations, where the
sign of zero is obliterated on conversion to unsigned char
and thus cannot be recovered again after getc().)

It seems to me that an implementation can easily ensure that the sign
of zero does not get obliterated. If by using fgetc() an unsigned char
gets the bit pattern which corresponds to negative zero then the
implementation can assign the negative zero when converting to int .
The standard allows this.

Could you indicate where? I'm looking at 6.2.6.2p3, which lists
the operations that can generate a minus zero, and does not list
"conversion" among them.

That prevents ``signed char s = -0;'' from making s a negative zero?
Yes.

Was that really intended?

It certainly *looks* intentional, but I wasn't one of the
authors and can't speak for them.

Note that if you acquire a negative zero somehow, then

int minus_zero = ...whatever...;
int normal_zero = - minus_zero;

.... might yield either zero, not necessarily a normal zero.
(That's my reading, anyhow.)
 

Eric Sosman

[...]
I got into this computer stuff in 1963 at Philco. The other major
players of the time were IBM and Univac. None of us used
ones-complement. I learned it in school but I've never seen it in
production. Minus zero is a concept but not an actuality in my experience.

Are there any current instances of signed-magnitude or ones-complement
systems we might encounter in the 'real world'?

Marginal topicality ...

Dunno about "current," but I've seen both conventions on old
systems. The first machine I wrote programs for was an IBM 1620
with signed magnitude decimal arithmetic, and a little later I had
a very brief encounter with a ones' complement Univac system. No
C implementation on either system, though.

If you're interested in finding current examples, I think the
places to look would be among the embedded and special-purpose
processors. The really small ones (hand-held calculators and so
on) probably can't support C, but there might be C-programmable
"exotic" CPU's operating traffic lights and gathering telemetry
and controlling your car's fuel injectors. And burning your toast,
of course.
 

Tim Rentsch

Phil Carmody said:
Eric Sosman said:
[...]
Ok , I guess it could happen. But then I have a different objection. Eric said

(The situation is particularly bad for systems with
signed-magnitude or ones' complement notations, where the
sign of zero is obliterated on conversion to unsigned char
and thus cannot be recovered again after getc().)

It seems to me that an implementation can easily ensure that the sign
of zero does not get obliterated. If by using fgetc() an unsigned char
gets the bit pattern which corresponds to negative zero then the
implementation can assign the negative zero when converting to int .
The standard allows this.

Could you indicate where? I'm looking at 6.2.6.2p3, which lists
the operations that can generate a minus zero, and does not list
"conversion" among them.

That prevents ``signed char s = -0;'' from making s a negative zero?

Yes. Surprising but true.
Was that really intended?

Apparently it was.
 

Tim Rentsch

pete said:
(-0) cannot be a negative zero.

A bitwise operation has to be involved somewhere
in the generation of a negative zero.
That seems to be the intention of the standard to me.

Or some kinds of conversions, although one might argue
those could be thought of as bitwise operations also.
 

Tim Rentsch

Eric Sosman said:
The standard lists the operations that can *generate* a negative zero.
One could argue that operations like cast and assignment simply preserve
an existing negative zero rather than generating a new one.

Even if a negative zero arises without being "generated," the
Standard does not assure us that negativity is preserved. 6.2.6.2p3:

"It is unspecified [...] whether a negative zero becomes
a normal zero when stored in an object."

The Standard doesn't, but an implementation can. We are
after all talking about implementation-specific behavior
here.
 

Keith Thompson

Tim Rentsch said:
Phil Carmody said:
Eric Sosman said:
On 3/11/2011 4:55 PM, Spiros Bousbouras wrote:
[...]
Ok , I guess it could happen. But then I have a different objection. Eric said

(The situation is particularly bad for systems with
signed-magnitude or ones' complement notations, where the
sign of zero is obliterated on conversion to unsigned char
and thus cannot be recovered again after getc().)

It seems to me that an implementation can easily ensure that the sign
of zero does not get obliterated. If by using fgetc() an unsigned char
gets the bit pattern which corresponds to negative zero then the
implementation can assign the negative zero when converting to int .
The standard allows this.

Could you indicate where? I'm looking at 6.2.6.2p3, which lists
the operations that can generate a minus zero, and does not list
"conversion" among them.

That prevents ``signed char s = -0;'' from making s a negative zero?

Yes. Surprising but true.
Was that really intended?

Apparently it was.

And I think it makes a certain amount of sense. It means that this:

int i = /* something */;
int j = -i;

won't store a negative zero in j, even if the value of i is 0.

Of course, one could argue about whether that's desirable. In any case,
having *some* rules that let you avoid spurious negative zeros in
general calculations seems like a good idea.

(Using two's-complement seems like an even better idea.)
 

Barry Schwarz

Dunno about "current," but I've seen both conventions on old
systems. The first machine I wrote programs for was an IBM 1620
with signed magnitude decimal arithmetic, and a little later I had

Still the finest machine ever invented for introducing students to the
fundamentals of computer programming.
 

Spiros Bousbouras

One strange text file, yes. But not so strange for a binary
file, where any bit pattern at all might appear. If a char that looks
like minus zero appears somewhere in the middle of a double, and you
fwrite() that double to a binary stream, the underlying fputc() calls
(a direct requirement; not even an "as if") convert each byte in turn
from unsigned char to int. I think the conversion allows the bits to
be diddled irreversibly -- although on reconsideration it may happen
only when sizeof(int)==1 as well.
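
In other words -- a sketch only, with a made-up helper name, not the
library's actual mechanism -- the fwrite() of a double behaves as if
each of its bytes were taken as an unsigned char and handed to fputc():

#include <stdio.h>

/* Sketch: write one double "as if" by fputc(), byte by byte.
   Each p[i] is an unsigned char and is converted to int when it
   is passed to fputc().  Returns the number of bytes written. */
size_t write_double(double d, FILE *fp)
{
    const unsigned char *p = (const unsigned char *)&d;
    size_t i;
    for (i = 0; i < sizeof d; i++)
        if (fputc(p[i], fp) == EOF)
            break;
    return i;
}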

I see now that a negative zero in a file is a realistic possibility.
But if the bits were diddled in a way that would turn a negative zero
to a regular zero wouldn't that violate 7.19.2p3 ? I guess it depends
on what the "shall compare equal" part exactly means. IMO it should
mean equal as bit patterns in which case no bit diddling is allowed.

Another thing : I don't see why the "as if" rule wouldn't apply
to the specification of fwrite() just as much as it applies to
everything else. Why would the standard force the implementations to
actually repeatedly call fputc() ? Perhaps there is some operating
system specific way to achieve the same result faster without calling
fputc(). Why would the standard forbid that?
When sizeof(int)==1, there will exist a perfectly valid unsigned
char value whose conversion to int yields EOF. (Or else there will
exist two or more distinct unsigned char values that convert to the
same int value, which is even worse and violates 7.19.2p3.) So
checking the value of getc() against EOF isn't quite enough: Having
found EOF, you also need to call feof() and ferror() before concluding
that it's "condition" rather than "data." More information is being
forced through the return-value channel than the unaided channel
can accommodate.

So then this means that the common idiom
int a;
....
while ( (a = fgetc(f)) != EOF ) ...

is actually wrong ! (Unless someone has already checked that
sizeof(int) != 1 but I don't imagine you'll see a lot of code which
does that.) Yikes , I can't believe I've been doing it wrong all
those years. Has anyone seen a book which mentions the issue ?
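
For what it's worth, here is a sketch of the fully defensive loop (the
helper function and its purpose are made up purely for illustration):

#include <stdio.h>

/* Sketch: count the bytes in a stream without assuming that a
   return of EOF can only mean end-of-file or error.  On EOF,
   feof()/ferror() decide whether it was "condition" or "data". */
long count_bytes(FILE *f)
{
    long n = 0;
    for (;;) {
        int c = fgetc(f);
        if (c == EOF && (feof(f) || ferror(f)))
            break;      /* genuine end-of-file or read error */
        n++;            /* c was a data byte, even if it equals EOF */
    }
    return n;
}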
 

Spiros Bousbouras

For example, suppose ptrdiff_t is
compatible with long. Then converting (say, via an explicit cast)
a negative zero of type long to ptrdiff_t would yield a negative
zero of type ptrdiff_t -- but a cast is not one of the operations
that can yield a negative zero.

The standard lists the operations that can *generate* a negative zero.
One could argue that operations like cast and assignment simply preserve
an existing negative zero rather than generating a new one.

That's my reading too but the problem is what happens when you read
from a file ? I think the standard would be more clear if it said that
reading from a binary stream can also generate a negative zero.
 

Spiros Bousbouras

Spiros Bousbouras said:
If you are reading from a file by successively calling fgetc() is there
any point in storing what you read in anything other than unsigned
char ?

Sure. To see one reason in action, try

unsigned char uchar_password[SIZE];
...
if (strcmp(uchar_password, "SuperSecret") == 0) ...

Just to be clear , the only thing that can go wrong with this example
is that strcmp() may try to convert the elements of uchar_password to
char, thereby causing implementation-defined behavior. The same
issue could arise with any other str* function. Or is there something
specific about your example that I'm missing ?

The call to strcmp() violates a constraint. strcmp() expects const
char* (a non-const char* is also ok), but uchar_password, after
the implicit conversion is of type unsigned char*. Types char*
and unsigned char* are not compatible, and there is no implicit
conversion from one to the other.
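
For illustration, the usual way around it is either to declare the
buffer as plain char in the first place or to cast at the call site. A
sketch of the latter, reusing SIZE and the password from the example
above:

#include <string.h>

#define SIZE 64                 /* illustrative size only */

unsigned char uchar_password[SIZE];

/* The explicit cast removes the constraint violation; a char pointer
   may be used to access the unsigned char objects, and strcmp()
   compares the characters as unsigned char in any case. */
int password_matches(void)
{
    return strcmp((const char *)uchar_password, "SuperSecret") == 0;
}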

I see. I assumed that the implicit conversion would be ok because
paragraph 27 of 6.2.5 says "A pointer to void shall have the same
representation and alignment requirements as a pointer to a character
type.39)" and footnote 39 says "The same representation and alignment
requirements are meant to imply interchangeability as arguments to
functions, return values from functions, and members of unions." I
assumed that the relation "same representation and alignment
requirements" is transitive.

On the other hand footnote 35 of paragraph 15 says that char is not
compatible with signed or unsigned char and in 6.7.5.1 we read that
pointers to types are compatible only if the types are compatible. We
must conclude then that the relation "same representation and alignment
requirements" is not transitive. That's a damn poor choice of
terminology then.

Actually if the relation "same representation and alignment
requirements" were transitive *and* symmetric then we could conclude
that the implicit conversion would be ok. The word "same" suggests to
me a relation which is transitive and symmetric so I still think it's a
poor choice of terminology.
 

Keith Thompson

Spiros Bousbouras said:
I see now that a negative zero in a file is a realistic possibility.
But if the bits were diddled in a way that would turn a negative zero
to a regular zero wouldn't that violate 7.19.2p3 ? I guess it depends
on what the "shall compare equal" part exactly means. IMO it should
mean equal as bit patterns in which case no bit diddling is allowed.

I think "shall compare equal" refers to the "==" operator.
It doesn't make sense for it to refer to bit-level representations --
nor is it necessary.

In the hypothetical C-like language you're describing, an input
file some of whose bytes contain negative zeros would indeed cause
problems; it wouldn't necessarily be possible to read data from a
binary file and write it out again without losing information.

Which is why the standard actually specifies that fgetc() reads
unsigned char values, which have a one-to-one mapping to bit-level
representations. There are no two representations that have the
same value, so the problem you're worried about doesn't arise.
Another thing : I don't see why the "as if" rule wouldn't apply
to the specification of fwrite() just as much as it applies to
everything else. Why would the standard force the implementations to
actually repeatedly call fputc() ? Perhaps there is some operating
system specific way to achieve the same result faster without calling
fputc(). Why would the standard forbid that?


So then this means that the common idiom
int a;
...
while ( (a = fgetc(f)) != EOF ) ...

is actually wrong ! (Unless someone has already checked that
sizeof(int) != 1 but I don't imagine you'll see a lot of code which
does that.) Yikes , I can't believe I've been doing it wrong all
those years. Has anyone seen a book which mentions the issue ?

Well, I wouldn't say it's wrong; rather, I'd say it's only 99+% portable
rather than 100% portable. It works just fine *unless* sizeof(int) == 1,
which implies CHAR_BIT >= 16.

As far as I know, all existing hosted C implementations have
CHAR_BIT == 8 and sizeof(int) >= 2 (and non-hosted implementations
aren't even required to support stdio).

If I were worried about the possibility, rather than adding calls
to feof() and ferror(), I'd probably add
#if CHAR_BIT != 8
#error "CHAR_BIT != 8"
#endif
And if I ever see that error message, it almost certainly means
that I forgot to add the "#include <limits.h>".

(Actually, checking that sizeof(int) > 1 would be better, since
the usual EOF check works just fine on a system with 16-bit char
and 32-bit int, but that's a little harder to check at compile time.)
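
One compile-time approximation is to compare the limits directly -- a
sketch, and note that unlike the CHAR_BIT test it passes silently if
<limits.h> was forgotten, since undefined identifiers evaluate to 0 in
#if:

#include <limits.h>

/* Sketch: if every unsigned char value fits in an int, the usual
   (c == EOF) test cannot be fooled by an ordinary data byte. */
#if UCHAR_MAX > INT_MAX
#error "int cannot hold all unsigned char values; the plain EOF idiom is unsafe"
#endif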
 

Spiros Bousbouras

I'm afraid I'm not following you here.

I initially assumed you meant getc and fgetc would be reading
int-sized chunks from the file, rather than (as C currently
specifies) reading bytes, interpreting them as unsigned char,
and converting that to int.

Without the intermediate step, how is the int value determined?

Actually , all this digression about an alternative getc() was unneeded
and it was a very poor way to ask whether a file can contain negative
zeros. But what I had in mind was simply that the implementation reads
a bit pattern which can fit into an int and puts it in an int.
Perhaps you mean getc and fgetc read a byte from the file, interpret
it as *plain* char, and then convert the result to int.

No , no conversions at all.
If so, and if plain char is signed and has a distinct representation
for negative zero (this excludes 2's-complement systems), then
could getc() return a negative zero?

I'd say no. Converting a negative zero from char to int does not
yield a negative zero int; 6.2.6.2p3 specifies the operations that
might generate a negative zero, and conversions aren't in the list.

I don't think the problem is the conversion. I'm with
<[email protected]>
http://groups.google.com/group/comp.lang.c/msg/bdfed4e3a92d711c?dmode=source
on this one. But as to whether the actual reading itself can generate a
negative zero, I feel the standard could be clearer.
Which means that getc() and fgetc() would be unable to distinguish
between a positive and negative zero in a byte read from a file.
Which is probably part of the reason why the standard specifies
that the value is treated as an unsigned char.

Yes , unsigned char is the type which allows one to deal with arbitrary
bit patterns so it's the appropriate type to use for reading from a
stream which might be binary.
Or the standard could have said specifically that getc and fgetc do
return a negative zero in these cases, but dealing with that in code
would be nasty (and, since most current systems don't have negative
zeros, most programmers wouldn't bother).

I don't see why any special dealing would be needed. In an
implementation which is one's complement or sign-and-magnitude but does
not support negative zeros, the programmer needs to be careful not to
create such a pattern accidentally; but if negative zeros are supported,
then it's completely transparent to the programmer.

--
Of course this doesn't mean that the sciences haven't also been tools
of capitalism, imperialism, and all the rest: but the reason chemistry
is a much better tool of imperialist domination than alchemy is that
it's much more true.
Cosma Shalizi
 

Spiros Bousbouras

Ahh, I didn't understand that. I don't know what would
happen in alternative C; I don't have any kind of reference
manual or standards document for that language.

As I just said in a different post , on this occasion invoking an
alternative C was pointless. But the way it works in general is that
you take the usual C , change some bits and pieces in the standard and
you have your new C.
 

Spiros Bousbouras

Eric Sosman said:
[...]
Ok , I guess it could happen. But then I have a different objection. Eric said

(The situation is particularly bad for systems with
signed-magnitude or ones' complement notations, where the
sign of zero is obliterated on conversion to unsigned char
and thus cannot be recovered again after getc().)

It seems to me that an implementation can easily ensure that the sign
of zero does not get obliterated. If by using fgetc() an unsigned char
gets the bit pattern which corresponds to negative zero then the
implementation can assign the negative zero when converting to int .
The standard allows this.

Could you indicate where? I'm looking at 6.2.6.2p3, which lists
the operations that can generate a minus zero, and does not list
"conversion" among them.

That prevents ``signed char s = -0;'' from making s a negative zero?
Was that really intended?

Why wouldn't it be ? I don't think there's any reason a programmer
would want to create a negative zero even where it's supported so the
standard doesn't give you a way to create one.
 

Pushkar Prasad

Thanks everybody.

I will explore further based on your inputs. So far I have tried inline
assembly to LOCK and increment / decrement the global, and used the
InterlockedIncrement() API in the Windows SDK without any success. For a
small number of threads the modifications to the global seem to be
serialized, but when I spawn thousands of threads then things get messy
due to thread rescheduling.

As I mentioned earlier, I can put a Critical Section around the global and
ensure that the functions acquire the Critical Section for updating the
global, but that will create too much of a contention point in my code. I
was looking for a leaner way of doing it; InterlockedIncrement() and
InterlockedDecrement() looked to be suitable initially but failed when
I spawned thousands of threads.


Thanks & Regards
Pushkar Prasad

Eric Sosman said:
On 3/11/2011 4:55 PM, Spiros Bousbouras wrote:
[...]
Ok , I guess it could happen. But then I have a different objection. Eric said

(The situation is particularly bad for systems with
signed-magnitude or ones' complement notations, where the
sign of zero is obliterated on conversion to unsigned char
and thus cannot be recovered again after getc().)

It seems to me that an implementation can easily ensure that the sign
of zero does not get obliterated. If by using fgetc() an unsigned char
gets the bit pattern which corresponds to negative zero then the
implementation can assign the negative zero when converting to int .
The standard allows this.

Could you indicate where? I'm looking at 6.2.6.2p3, which lists
the operations that can generate a minus zero, and does not list
"conversion" among them.

That prevents ``signed char s = -0;'' from making s a negative zero?
Was that really intended?

Why wouldn't it be ? I don't think there's any reason a programmer
would want to create a negative zero even where it's supported so the
standard doesn't give you a way to create one.
 

Pushkar Prasad

My apologies to everybody. The response below was meant for another
thread. Please ignore my response on this thread.

Thanks & Regards
Pushkar Prasad

Thanks everybody.

I will explore further based on your inputs. So far I have tried inline
assembly to LOCK and increment / decrement the global, and used the
InterlockedIncrement() API in the Windows SDK without any success. For a
small number of threads the modifications to the global seem to be
serialized, but when I spawn thousands of threads then things get messy
due to thread rescheduling.

As I mentioned earlier, I can put a Critical Section around the global and
ensure that the functions acquire the Critical Section for updating the
global, but that will create too much of a contention point in my code. I
was looking for a leaner way of doing it; InterlockedIncrement() and
InterlockedDecrement() looked to be suitable initially but failed when
I spawned thousands of threads.


Thanks & Regards
Pushkar Prasad

On 3/11/2011 4:55 PM, Spiros Bousbouras wrote:
[...]
Ok , I guess it could happen. But then I have a different objection.
Eric said

(The situation is particularly bad for systems with
signed-magnitude or ones' complement notations, where the
sign of zero is obliterated on conversion to unsigned char
and thus cannot be recovered again after getc().)

It seems to me that an implementation can easily ensure that the sign
of zero does not get obliterated. If by using fgetc() an unsigned char
gets the bit pattern which corresponds to negative zero then the
implementation can assign the negative zero when converting to int .
The standard allows this.

Could you indicate where? I'm looking at 6.2.6.2p3, which lists
the operations that can generate a minus zero, and does not list
"conversion" among them.

That prevents ``signed char s = -0;'' from making s a negative zero?
Was that really intended?

Why wouldn't it be ? I don't think there's any reason a programmer
would want to create a negative zero even where it's supported so the
standard doesn't give you a way to create one.
 

Spiros Bousbouras

Well, I wouldn't say it's wrong; rather, I'd say it's only 99+% portable
rather than 100% portable.

It can unexpectedly give wrong results i.e. you'd think that the file
terminated before it actually did. I call that wrong.
It works just fine *unless* sizeof(int) == 1,
which implies CHAR_BIT >= 16.

As far as I know, all existing hosted C implementations have
CHAR_BIT == 8 and sizeof(int) >= 2 (and non-hosted implementations
aren't even required to support stdio).

If I were worried about the possibility, rather than adding calls
to feof() and ferror(),

You do need to call one of the two in order to distinguish between
error and end of file.
I'd probably add
#if CHAR_BIT != 8
#error "CHAR_BIT != 8"
#endif
And if I ever see that error message, it almost certainly means
that I forgot to add the "#include <limits.h>".
(Actually, checking that sizeof(int) > 1 would be better, since
the usual EOF check works just fine on a system with 16-bit char
and 32-bit int, but that's a little harder to check at compile time.)

Personally I think it's more straightforward to use feof() and ferror()
which is what I'll be doing from now on.
 

Keith Thompson

Spiros Bousbouras said:
It can unexpectedly give wrong results i.e. you'd think that the file
terminated before it actually did. I call that wrong.

But that wrongness can only occur on a vanishingly small number of
platforms -- quite possibly nonexistent in real life.

fgetc(f) can return EOF under only three circumstances:

1. You've reached the end of the file (very common).

2. You've encountered an error condition (rarer, but certainly worth
worrying about).

3. You're on a system where int cannot represent all values of type
char plus the value EOF without loss of information, *and* you've
just read a byte from the file whose value happens to match EOF
(typically -1). This can only happen if int has no more sign and
value bits than char, which is possible only if CHAR_BIT >= 16.
You do need to call one of the two in order to distinguish between
error and end of file.
True.



Personally I think it's more straightforward to use feof() and ferror()
which is what I'll be doing from now on.

Suppose your code is running on a system with 16-bit char and 16-bit
int, and it reads a byte with the value 0xffff, which yields -1 when
converted to int (note that even this is implementation-defined), which
happens to be the value of EOF. Since you've added extra checks, you
notice that feof() and ferror() both returned 0, meaning the -1 is a
value read from the file. How sure are you that you can handle that
value correctly? I'm not necessarily saying you can't, but I can
imagine that there might be some subtle problems.

More to the point, how are you going to be able to test your code?
Unless you have access to an exotic system, or unless you replace
fgetc() with your own version, the code that handles this case will
never be executed.
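
A sketch of the kind of test stub that would exercise that path on an
ordinary system (the name and the injected value are made up for
illustration):

#include <stdio.h>

/* Sketch of a test shim: behaves like fgetc(), except that it first
   returns EOF once as an ordinary "data byte", with no end-of-file
   or error condition set, so the caller's feof()/ferror() handling
   actually runs on a normal two's-complement machine. */
static int test_fgetc(FILE *f)
{
    static int injected = 0;
    if (!injected) {
        injected = 1;
        return EOF;     /* pretend a byte in the stream had this value */
    }
    return fgetc(f);
}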
 
