C++
Wide characters and streams
[QUOTE="P.J. Plauger, post: 2580739"]
Indeed that is pretty stupid. I don't mind stupid defaults so long as they are described in the documentation, but the documentation of std::wofstream or std::wcout makes no mention of this. I notice though that std::wstringstream doesn't seem to suffer this problem. As far as std::wcout goes, though, there must be something else going on as well, or the AE ligature would not have been mangled to a Greek mu. This would seem to imply that using a codecvt that passed through UTF-16 would not work. Or is it the existing codecvt that is performing the mistransliteration?

[pjp] The whole problem is the stupid default conversion. Our C++ library has always used the fgetwc/fputwc machinery from the C library for the default wchar_t codecvt facet. Thus, we more or less inherit whatever decision a compiler vendor has chosen for C. (Unless, of course, that vendor has also licensed our C library, in which case you get UTF-16/UTF-8 by default.) But remember that what you see is also determined by the display software, which is outside the purview of C and C++. Sometimes that's not what you expect, so extended character sets get curdled in surprising ways on their way to your eyeballs.

---

I can't help but think that a lot of the frustration could be very simply resolved by just properly documenting what the libraries do and putting that documentation where people will see it.

[pjp] I agree that these decisions could be better highlighted.

---

As a practical matter I don't understand how wchar_t streams can be seen as anything but broken (in the 'not working' sense) on this platform if I have to write my own codecvt implementation or buy one in so that I can write UTF-16 files.

[pjp] If they don't do what you want, then they are broken to you.

---

It seems bizarre that an assertion that the streams aren't broken is compatible with the fact that they cannot be used in what must be a very common (if not the most common) use case. An inability to write UTF-16 to the console sure seems broken to me, and an implementation that writes UTF-16 streams as you describe surely can't be described as 'working' for any practical purpose.

[pjp] The common use case of today is not the one that was common a decade or more ago, when some of these decisions were made. The default conversion is doubtless overdue for revision.

---

Actually, if the default codecvt were simply a null, do-nothing UTF-16 to UTF-16 conversion, that would be fine too.

[pjp] For some people.

---

We did notice that writing a codecvt implementation is no trivial task. We tried to write a UTF-16 to UTF-8 codecvt, but haven't managed to get it to work.

[pjp] It's the hardest codecvt facet of all to write. In fact, it's officially impossible, since codecvt was "designed" to do 1-N code conversions, and UTF-16/UTF-8 is M-N. No Standard C++ library except ours will even give you a fighting chance, and it's a fiendishly difficult coding problem even then.

---

Looking at the comments in our source, it seems that there was some confusion about what do_length should return. I think the standard says it should return the number of bytes, but the documentation we were using at the time seemed to imply that it should return the number of wchar_t. The documentation we're now using looks to have been changed, but I'm not sure I can work out from the wording what it is saying should be returned.

[pjp] The description of codecvt in the C++ Standard is murky, to put it politely.

---

This is something that we may revisit. On your web site, is "compleat" some joke that I'm not getting?

[pjp] "Compleat" is an older spelling of "complete". See, for example, the noted 17th century book, "The Compleat Angler or the Contemplative man's Recreation."

---

And thanks for taking the time to answer. It's certainly cleared up a lot about what is going on.

[pjp] Welcome.
[/QUOTE]
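For anyone following along, here is a rough sketch of why UTF-16 to UTF-8 is an M-N conversion rather than 1-N: a character outside the BMP arrives as two UTF-16 code units (a surrogate pair) and leaves as four UTF-8 bytes, and the first unit of the pair cannot produce any output on its own. This is a plain whole-string function, not a codecvt facet, and it is not how any library mentioned in this thread does it. The name utf16_to_utf8 is made up for illustration, and replacing unpaired surrogates with U+FFFD is just one possible policy.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only; not part of any library
// discussed in the thread. Converts UTF-16 code units to UTF-8 bytes.
// Assumption: unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: no output is possible until the next unit is
            // seen. This is the "M" in M-N.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;  // consume the low surrogate as well
            } else {
                cp = 0xFFFD;  // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate with no preceding high surrogate
        }
        // Emit one to four bytes depending on the code point. This is the "N".
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this, the AE ligature (U+00C6) comes out as the two bytes C3 86. A codecvt facet has to do the same job incrementally, one buffer at a time, which means remembering a pending high surrogate across calls. That state-keeping is where the difficulty described above comes from.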