"dandelion" said:
> You can screw up the loop invariant easily by adding i++ somewhere in
> the loop, but that's something I'd consider (ipso facto) bad coding
> practice, although it's sadly far from uncommon.
Suppose the function converts UTF-8 into Unicode code points. Most of the
time it reads just one byte, sees that the high-order bit is zero, and so
knows it's a single-byte ASCII character. Once in a while it sees the
high-order bit set to one, so the byte is the start of a multi-byte
representation of a non-ASCII Unicode character, and the loop must do i++
one or more times to gobble the additional byte(s) before finishing this
trip through the loop.
[first idea] Upon exiting the loop, you need to check whether i is
exactly what you expected it to be, rather than past that point, which
would indicate an input-buffer overrun.
[second idea] I would probably rewrite the loop as an infinite loop
with explicit tests at the bottom: first a test for overrun, signalling
an error, and if that didn't happen, then the normal end-of-loop test
with a break statement. That way somebody who starts reading the loop at
the top and then skips the details to look at the very end wouldn't be
confused by the apparent loop invariant being tested for no apparent
reason.
[third idea] Or, even better, I'd write a subroutine to gobble as many
bytes as needed, updating the loop pointer/index as a side effect
(clearly commented in the calling code) and not updating the
pointer/index at all in the main loop. The subroutine would signal an
error *immediately* upon overrunning the buffer, instead of gladly
reading additional bytes and detecting the overrun only later, after the
loop exits (my [first idea] above) or near the bottom of the loop (my
[second idea] above). And of course I'd write the program in CL (Common
Lisp) or Java, which have nice exception handling, instead of in C.
By the way, I found this thread by scanning comp.programming in Google
Groups, but when I tried to follow up I discovered the followup was going
only to comp.lang.c, a group I don't ordinarily read. Since my remarks
are more general than just the C language, I put comp.programming back
in the newsgroups list; I hope nobody minds.