Suppose you are faced with a java.io.InputStream
and it is supposed to carry either HTML or XML.
Ultimately you want to read with a Reader and the
correct encoding, of course.
Is the following a correct strategy:
1) Wrap the InputStream into a BufferedInputStream
to make sure mark() and reset() work.
2) Read single bytes from it up to some reasonable limit
and convert them to characters by simple casting:
char ch = (char)the_byte_I_read;
3) Check for the declared encoding, e.g. with a regexp
4) Call reset() on the BufferedInputStream
5) wrap the BufferedInputStream into a Reader
with the determined encoding
6) Start reading.
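
To make the six steps concrete, here is a minimal sketch of what I have in mind. The class name EncodingSniffer, the 1024-byte limit, the regexp and the fallback parameter are all assumptions made up for illustration. It only handles ASCII-compatible encodings and ignores byte order marks and UTF-16.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of any library.
class EncodingSniffer {
    // Assumed "reasonable limit" of bytes to inspect.
    private static final int LIMIT = 1024;

    // Assumed pattern: encoding="..." (XML declaration) or charset=... (HTML meta).
    private static final Pattern ENCODING = Pattern.compile(
            "(?:encoding|charset)\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    static Reader open(InputStream in, Charset fallback) throws IOException {
        // 1) BufferedInputStream guarantees mark() and reset().
        BufferedInputStream buf = new BufferedInputStream(in, LIMIT);
        buf.mark(LIMIT);

        // 2) Read single bytes up to the limit and cast them to chars.
        //    read() returns 0..255 (or -1), so the cast cannot sign-extend.
        StringBuilder head = new StringBuilder();
        for (int i = 0; i < LIMIT; i++) {
            int b = buf.read();
            if (b < 0) {
                break;
            }
            head.append((char) b);
        }

        // 3) Look for a declared encoding; keep the fallback if none is usable.
        Charset charset = fallback;
        Matcher m = ENCODING.matcher(head);
        if (m.find()) {
            try {
                charset = Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // Unknown or illegal charset name: stay with the fallback.
            }
        }

        // 4) Rewind, 5) wrap into a Reader with the determined encoding.
        buf.reset();
        return new InputStreamReader(buf, charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u00e9</a>"
                .getBytes(StandardCharsets.ISO_8859_1);
        // 6) Start reading.
        try (Reader r = open(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                System.out.print((char) c);
            }
        }
        System.out.println();
    }
}
```

The mark limit and the number of bytes read are kept equal on purpose, since reset() is only guaranteed to work while no more than the marked limit has been consumed.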
What bothers me a bit is the additional
BufferedInputStream in between when the
Reader later has another buffer. I am also
not sure if the cast is the right way to
convert bytes to chars before you know the
encoding.
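
To show what worries me about the cast, here is a small sketch comparing a direct cast of a Java byte with a masked one. The class and method names are made up for illustration. As far as I can tell, the masked form gives the same result as decoding the byte as ISO-8859-1, which should be good enough to find an ASCII-only encoding declaration.

```java
// Hypothetical demo class, only to illustrate the byte-to-char cast.
class ByteCastDemo {
    // Direct cast: byte is signed, so 0x80..0xFF are sign-extended first.
    static char castDirect(byte b) {
        return (char) b;
    }

    // Masking first keeps the value in 0..255.
    static char castMasked(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9; // e-acute in ISO-8859-1
        System.out.println(Integer.toHexString(castDirect(b))); // prints ffe9
        System.out.println(Integer.toHexString(castMasked(b))); // prints e9
    }
}
```

When the value comes from InputStream.read() without arguments, it is already an int in the range 0 to 255, so the problem only shows up when casting from a byte variable or a byte[] element.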
Comments?
Harald.