Character encoding (2)

pietdejong · Oct 25, 2004

Hello again,

Sorry for posting this again, but my thread from last Saturday reached
a dead end, so I decided to start a new one. See also:
http://groups-beta.google.com/group...8&mode=thread&noheader=1#doc_b7b2b45a08c061be

The problem I'm having is essentially only on the server side.
I'm working on a server that should receive HTTP requests. However, the
request that arrives at the server might not be HTTP; I check for this
on the first byte of data.
(in other words:
  if the first byte equals 0x01,
  then it is not HTTP,
  else ... )
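The first-byte check can be done without consuming that byte, so the handler that runs next (HTTP or not) still sees the complete original stream. A minimal sketch, assuming the raw socket `InputStream` is available; `ProtocolSniffer` is a hypothetical helper name, and treating an empty stream as "not the marker" is an arbitrary choice:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper: peeks at the first byte of a connection and pushes it
// back, so whichever handler runs next still sees the complete byte stream.
final class ProtocolSniffer {
    // Marker from the post: a first byte of 0x01 means "not HTTP".
    static final int NON_HTTP_MARKER = 0x01;

    private ProtocolSniffer() {}

    // Wrap the raw socket stream once; a pushback buffer of 1 byte is enough.
    static PushbackInputStream wrap(InputStream raw) {
        return new PushbackInputStream(raw, 1);
    }

    // Returns true if the first byte is the marker. The byte is unread, so
    // the next read() returns it again. An empty stream returns false.
    static boolean isNonHttp(PushbackInputStream in) throws IOException {
        int first = in.read();          // 0..255, or -1 at end of stream
        if (first == -1) {
            return false;
        }
        in.unread(first);               // put it back for the real handler
        return first == NON_HTTP_MARKER;
    }
}
```

Typical use would be `PushbackInputStream in = ProtocolSniffer.wrap(socket.getInputStream());` followed by branching on `ProtocolSniffer.isNonHttp(in)`; both branches then read from `in`, not from the raw socket stream.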

Assuming the data is posted via HTTP, this is what I'm trying to solve:
I don't know in advance which encoding the data stream uses. The
following encoding rules apply:

If the regex <\?xml [^>]+encoding="([^"]+)" matches, the captured group
$1 is used as the charset for decoding; otherwise a default charset is
used. My goal is to use both the characters (i.e. the server's
'interpretation' of the received bytes) and the original byte stream:
I want to write the original byte stream to a file, while using the
derived character stream for processing (beans, XSL transformations,
etc.).
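One way to get both the original bytes and the characters is to read the whole body into a byte array once, apply the encoding rule to that array, and decode it into a String; the array is then written to the file unchanged. A minimal sketch, assuming `in` is positioned at the start of the request body, the body fits in memory, and the declaration sits in the first 1 KB. `XmlPayload` and `SNIFF_LIMIT` are hypothetical names. Reading the declaration through ISO-8859-1 only works for ASCII-compatible encodings (UTF-8, ISO-8859-x, etc.), not for UTF-16:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for one request body: the untouched bytes plus the
// decoded text.
final class XmlPayload {
    // The rule from the post as a Java regex. The '?' after '<' is escaped;
    // unescaped it would make the '<' optional.
    private static final Pattern ENCODING_DECL =
            Pattern.compile("<\\?xml [^>]+encoding=\"([^\"]+)\"");

    // Assumption: the declaration appears within the first 1 KB.
    private static final int SNIFF_LIMIT = 1024;

    final byte[] raw;      // original bytes, to be written to the file as-is
    final Charset charset; // charset chosen by the rule above
    final String text;     // decoded characters, for beans / XSLT

    private XmlPayload(byte[] raw, Charset charset, String text) {
        this.raw = raw;
        this.charset = charset;
        this.text = text;
    }

    // Reads the whole body into memory, then decodes it once.
    static XmlPayload read(InputStream in, Charset fallback) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        byte[] raw = buf.toByteArray();
        Charset cs = detect(raw, fallback);
        return new XmlPayload(raw, cs, new String(raw, cs));
    }

    // ISO-8859-1 maps every byte to exactly one char, so an ASCII-compatible
    // declaration can be read before the real encoding is known.
    static Charset detect(byte[] raw, Charset fallback) {
        int len = Math.min(raw.length, SNIFF_LIMIT);
        String head = new String(raw, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING_DECL.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // illegal or unsupported charset name: use the default
            }
        }
        return fallback;
    }
}
```

`payload.raw` can go straight to a `FileOutputStream`, while `payload.text` (or a `StringReader` over it) feeds the beans and the XSL transformation.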

I tried simulating the client with a basic HTML page containing a FORM
whose action is my server's URL. In HTML I can specify the Content-Type
meta element and set it to "text/xml; charset=utf-8" or whatever I like.
I recall that by default HTML forms are submitted with content type
application/x-www-form-urlencoded, encoded in the page's charset.
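With application/x-www-form-urlencoded, the server does not receive the XML itself but name=value pairs in which non-ASCII and reserved characters are percent-escaped; the charset only determines how the %XX escapes turn back into characters. A minimal sketch, assuming Java 10+ (for `URLDecoder.decode(String, Charset)`) and that the body has already been read into a String; `FormBody` is a hypothetical name, and repeated field names simply overwrite earlier ones:

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for an application/x-www-form-urlencoded body,
// e.g. "xml=%3Ca%3E...&note=hello+world".
final class FormBody {
    private FormBody() {}

    static Map<String, String> parse(String body, Charset charset) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;               // tolerate "a=1&&b=2" and empty bodies
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            // Decodes %XX bytes with the given charset and '+' as a space.
            fields.put(URLDecoder.decode(name, charset),
                       URLDecoder.decode(value, charset));
        }
        return fields;
    }
}
```

If the page was served as UTF-8, the same escapes must be decoded as UTF-8; decoding them with another charset produces wrong characters even though the request itself was transmitted correctly.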

I also tried simulating the client with a Java application that uses
java.net.HttpURLConnection. There I set the request property
"Content-Type" to "text/xml; charset=utf-8".
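On the server side, the charset a client declares this way can be read from the Content-Type header and used as the default before (or instead of) the XML-declaration rule. A minimal sketch, assuming the header value has already been extracted as a String; `ContentTypes` is a hypothetical name, and the split on ';' ignores the rare case of a semicolon inside a quoted parameter:

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper: extracts the charset parameter from a Content-Type
// header value such as "text/xml; charset=utf-8".
final class ContentTypes {
    private ContentTypes() {}

    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;            // no header at all
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the media type
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Parameter values may be quoted: charset="utf-8"
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalArgumentException e) {
                return fallback;        // illegal or unsupported name
            }
        }
        return fallback;                // no charset parameter
    }
}
```

The header only declares the charset; the client still has to encode the body with it, for example by writing `xml.getBytes(StandardCharsets.UTF_8)` to the connection's output stream.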

Now I'm not sure whether the stream is MIME-encoded in either or both
cases...?

Someone in the previous thread suggested that I use HttpURLConnection
on the server side as well, but since I'm also expecting non-HTTP
requests, I'm not sure I can. I probably cannot use a BufferedReader
either, because it works on a character stream, so I would lose the
original byte stream...
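A BufferedReader does not necessarily mean losing the bytes: if the reader sits on top of a stream that copies every byte it passes along, the original bytes can be written to a file while the characters are processed. A minimal sketch, assuming the charset is already known when the reader is created (e.g. from the Content-Type header); when it has to come from the XML declaration, buffering the whole body first is simpler. `TeeInputStream` and `TeeDemo` are hypothetical names. Because InputStreamReader reads ahead in chunks, the copy can be ahead of the characters consumed so far, but it holds the complete stream once the reader reaches the end:

```java
import java.io.BufferedReader;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.Charset;

// Hypothetical "tee": every byte read through this stream is also written,
// unchanged, to a second OutputStream (e.g. a FileOutputStream).
final class TeeInputStream extends FilterInputStream {
    private final OutputStream copy;

    TeeInputStream(InputStream in, OutputStream copy) {
        super(in);
        this.copy = copy;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            copy.write(b);
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so both paths are copied.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            copy.write(buf, off, n);
        }
        return n;
    }

    // Skip by reading, so skipped bytes still reach the copy.
    @Override
    public long skip(long n) throws IOException {
        byte[] scratch = new byte[512];
        long skipped = 0;
        while (skipped < n) {
            int r = read(scratch, 0, (int) Math.min(scratch.length, n - skipped));
            if (r == -1) {
                break;
            }
            skipped += r;
        }
        return skipped;
    }

    // mark/reset would re-read (and re-copy) bytes, so it is reported as unsupported.
    @Override
    public boolean markSupported() {
        return false;
    }
}

final class TeeDemo {
    // Reads text line by line (line ends normalized to '\n') while the raw
    // bytes are copied to rawCopy.
    static String readText(InputStream raw, Charset cs, OutputStream rawCopy) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new TeeInputStream(raw, rawCopy), cs))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```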

Thx

pietdejong · Oct 25, 2004

Actually, rereading my post, I'd like to add: after processing the
characters, I want to write them to a file in their original byte
representation (i.e. the original encoding). The bean object(s) I'm
using have their own write() method, which writes to an OutputStream.

I guess the solution here is to remember the original encoding, use the
bean's write() method to write to a ByteArrayOutputStream, decode that
into a String using the platform default encoding, and then re-encode
it with the original encoding into the file output stream...
right?
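The round trip described above can be sketched as follows. `Transcode` and `Writable` are hypothetical names; `Writable` stands in for the bean's write(OutputStream) method. The charset used to decode the buffer is a parameter because the round trip is only lossless if it matches what the bean actually writes: `Charset.defaultCharset()` is correct only under the assumption that the bean uses the platform default. Two caveats: `String.getBytes(Charset)` silently replaces characters the target charset cannot represent (typically with '?'), and if the bean's output contains an XML declaration, its encoding attribute still names the old charset after re-encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;

final class Transcode {
    // Stand-in for the bean's own write(OutputStream) method (hypothetical).
    interface Writable {
        void write(OutputStream out) throws IOException;
    }

    private Transcode() {}

    // beanCharset: the charset the bean writes in (assumed, e.g. the
    //              platform default); original: the charset of the request.
    static void writeInOriginalEncoding(Writable bean, Charset beanCharset,
                                        Charset original, OutputStream file)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        bean.write(buf);                                        // bean's bytes
        String text = new String(buf.toByteArray(), beanCharset); // decode
        file.write(text.getBytes(original));                    // re-encode
    }
}
```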

Similar Threads

Character operations in C++	2	Jan 28, 2024
How to convert CSV to parquet file without RLE_DICTIONARY encoding?	0	Sep 2, 2022
Uploading images - binary or unsupported text encoding	2	Dec 24, 2022
Outputting signal values to terminal Within Character Array	0	Dec 10, 2021
HTTP request with trailer	0	Mar 22, 2024
Character encoding	14	Feb 15, 2008
Encoding of character literals	4	Nov 3, 2011
xml.dom.minidom character encoding	6	Apr 21, 2010
