gamehack
Hi all,
I was doing a bit of research into writing yet another build tool, but
that's not the point of this mail. I'd like to ask a few questions about
resolving some internationalization problems. I'm sorry if this is not
the right mailing list; I couldn't find another that was better suited,
since my goal is to solve these problems in a platform-independent way.
The goal is to deal with different encodings on different platforms in a
portable fashion. After reading a few articles on the net, I realized
that everything boils down to the character size. The problem splits
into how you manage chars/strings internally and externally.
Internally (how strings are written in source files and what types
they are stored in):
Using wchar_t:
This means using wchar_t as the fundamental character type (AFAIK it is
2-4 bytes depending on the platform) along with all the corresponding w
functions and streams. The problem is what to do when an OS function
does not accept wchar_t. Then I would need to write my own library to
handle the conversions properly (I'm not sure simple type casts would do
the job). Also, wchar_t is not tied to any particular encoding, so I'm a
bit confused about that. If I write wchar_t* st = "something"; in a
source file, what encoding would it be stored in? And what about
wchar_t* st = L"something"; would that be UTF-8?
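To make the size question concrete, here is a minimal sketch of how I
would measure the storage of a narrow and a wide literal. The helper
names narrow_bytes and wide_bytes are made up for illustration. It only
shows the storage size, not which encoding the compiler picked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Bytes occupied by a narrow string, excluding the terminator.
   A literal like "something" has type array of char. */
size_t narrow_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s) * sizeof(char);
}

/* Bytes occupied by a wide string, excluding the terminator.
   A literal like L"something" has type array of wchar_t, and
   sizeof(wchar_t) is implementation-defined (commonly 2 or 4). */
size_t wide_bytes(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s) * sizeof(wchar_t);
}
```

Calling narrow_bytes("something") and wide_bytes(L"something") on each
target platform would at least show how the storage differs between
them.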
Using UTF-8:
I've not seen any articles on how to do this (except suggestions to use
unsigned long to store the chars, but what about conversions and passing
strings to OS functions?).
Externally (OS interfaces):
I have no idea how to handle this. When you write e.g.
main(int argc, char** argv), what happens if the arguments are passed as
UTF-8 strings? How do you handle that? How do you convert to and from
the internal representation (by writing your own library?) Is there
actually a portable way of doing it?
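The closest thing I can picture is the rough sketch below. It assumes
the arguments arrive in the current locale's multibyte encoding and
converts them with the standard mbsrtowcs function. The name to_wide is
a hypothetical helper of mine, and I don't know whether this assumption
holds on every platform.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Convert a multibyte string (in the current locale's encoding) to a
   newly allocated wide string. Returns NULL on an invalid sequence or
   on allocation failure. The caller frees the result. */
wchar_t *to_wide(const char *mb)
{
    mbstate_t state;
    const char *src = mb;
    size_t n;
    wchar_t *w;

    assert(mb != NULL);

    /* First pass: a null destination only counts the wide chars. */
    memset(&state, 0, sizeof state);
    n = mbsrtowcs(NULL, &src, 0, &state);
    if (n == (size_t)-1)
        return NULL;

    w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    /* Second pass: convert, including the terminating null. */
    src = mb;
    memset(&state, 0, sizeof state);
    mbsrtowcs(w, &src, n + 1, &state);
    return w;
}
```

In main I would presumably call setlocale(LC_ALL, "") first so that the
user's encoding is used, and then to_wide(argv[i]) for each argument.
Without the setlocale call the program stays in the "C" locale.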
I'm sorry if this is not the right place to ask these questions, but I'm
completely puzzled and thought you guys would be able to point me in
the right direction. As I said, the only thing I need is to communicate
with the OS transparently, without worrying about the encoding, and to
be able to use the future program in complete UTF-8 environments, so
that any valid UTF-8 could be passed, etc. Any comments, directions, or
remarks are greatly appreciated.
Regards,
gamehack