Randy Howard
Sorry, your logic is too foolish for me to understand.
Can the two of you go off privately somewhere and beat each other to
a pulp? Watching it here doesn't seem very productive.
In said:Indeed you are, as am I.
My opinion is that your opinion is downright broken. ;-)
There were very good reasons for the restriction in C89.
Dan Pop said:This statement is worth zilch without an enumeration of the "very good
reasons". Unlike JW, I'm completely immune to the "magister dixit" style
of argumentation.
Dan Pop said:The work on Unicode started in 1986, which is a good three years before
the adoption of C89.
Bully for you. This isn't my area of expertise, thus the appeal to
authority. P. J. Plauger alludes to the kinds of problems it was
intended to address in his discussion of the _Printf function in "The
Standard C Library".
The fundamental issue is how to recognize a "%" in the format string.
As you've said, it is necessary to convert the format string to a
sequence of wide characters and look for one corresponding to a percent
sign. But what is the wide character code for a percent sign? It's
tempting to say that it's L'%', but remember that the wide character
encoding is allowed to be locale-specific, and the user is allowed to
change the current locale at any time, so that doesn't work without
something like the restriction under discussion. (With the restriction,
of course, you don't even need a wide character constant; '%' is
sufficient.)
Without it, you'd be forced to call mbtowc on "%" every time to get the
current encoding, but the implementation must behave as if no library
function calls mbtowc, so you'd also have to save and restore its state
around the call. That was considered to be unacceptable overhead to
require, thus the restriction. (Which, as I've said before, was
innocuous at the time since no one was even contemplating an
implementation where it did not hold.)
Kevin Easton said:
Why can't the implementation provide, for its own use, a lookup table
of what_percent_looks_like_in_this_locale[]? After all, mbtowc clearly
has this information available.
Jun Woong said:
One reason I can think of is portability. An easier (but non-portable)
way than the one you suggest is to take advantage of internal access to
the conversion state.
Dan Pop said:UCS did exist when C99 was drafted, yet the broken text is still there.
What *exactly* did it buy C90?
This doesn't explain anything at all about the necessity of having
'a' == L'a', does it?
char str[] = "\x70\x70\x01\x02";
char bar[MB_LEN_MAX];
wchar_t wc;
Assuming that str[] contains a valid multibyte character sequence,
'\x70' is a shift character and redundant shift characters are
allowed,
mbtowc(&wc, str, sizeof(str)-1);
wctomb(bar, wc);
the sequence in bar[] can be "\x70\x01\x02". Is this wrong?
I can't see anything wrong with that. Where is the problem?
And what the hell is wrong with
if (wc == L'%') /* conversion specifier */
which does NOT depend on that guarantee and is what I have suggested as
the portable solution to your problem?
Then, why did you invoke *portability* arguments for the usefulness of
the guarantee under discussion?
Nope, the code was equally easy to write in pure C89, without relying on
the guarantee, as demonstrated above.
Kevin Easton said:There are plenty of library functions that have unacceptable overheads
when implemented in a portable manner, but can usually be efficiently
implemented in a non-portable way. In particular, strcmp() comes to
mind - so I don't think the possibility of a portable implementation
suffering unacceptable overhead when a non-portable implementation
wouldn't is sufficient reason to add the restriction.
Jun Woong said:The story could be different if the committee considered the
possibility that users would want to write similar code portably.
Without such a guarantee, the only way you, as a user of an
implementation who doesn't know its internal details, can
write similar code is to use a technique that's somewhat complicated
and has overhead.
In said:The fundamental issue is how to recognize a "%" in the format string.
Dan Pop said:And the trivial solution is btowc(), rather than imposing even *more*
conditions on the encoding of the character sets used by a conforming
implementation.
It doesn't look like the design of btowc() was beyond the capabilities of
the X3J11 committee, and its necessity is obvious, given the restrictions
on the use of mbtowc().
But even mbtowc() could be safely used by printf for this purpose, right
before calling it on the first character of the format string, which
already assumes the initial shift state: converting % is not going to
cause any change of shift state.
How could it be safe without saving and restoring the state
information, if a user interleaves a call to printf() between
two calls to mbtowc(), the latter of which depends on the state
changed by the former?
In said:btowc() didn't exist in C90 (it was added in AM1), so it hardly
qualifies as a "trivial solution".
(And I'm not sure what you mean by
"imposing even *more* conditions on the encoding", C imposes very few
conditions.)
No, its necessity was *not* obvious -- the restriction served the
purpose just as well.
Dan Pop said:I know that it didn't exist in C90, but this doesn't make it a less
trivial solution, as explained below. Once the problem was identified,
there were two solutions: the wrong one, which the committee chose, and
the correct one: provide the required conversion function.
1. The encoding of any member of the base character set, when stored
in a char, has a non-negative value.
2. The digit characters have contiguous encodings.
3. The members of the base character set have the same value when encoded
as character constants, wide character constants and multibyte
characters in the initial shift state.
Dan Pop said:We have already agreed that a portable implementation of printf *must*
use mbtowc to parse the format string, haven't we?
In said:It must be nice to see everything in black and white and not have to
worry about those annoying shades of gray.