Character encoding (2)

pietdejong

Hello again,

Sorry for posting this again, but since my thread from last Saturday more
or less hit a dead end, I decided to start a new one. See also:
http://groups-beta.google.com/group...8&mode=thread&noheader=1#doc_b7b2b45a08c061be


The problem I'm having is basically only on the server side.
I'm working on a server that should receive HTTP requests. However, a
request that arrives at the server may not be HTTP at all. I check for
this by looking at the first byte of data.
(In other words:
if the first byte is equal to 0x01,
then not HTTP
else ... )
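
A minimal sketch of that first-byte check, assuming the connection is read through a java.io.PushbackInputStream so the byte can be put back and the full stream stays readable afterwards. The class and method names are made up for illustration, and treating an empty stream as "not HTTP" is an assumption as well.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class FirstByteCheck {
    // Marker value from the post: a first byte of 0x01 means "not HTTP".
    static final int NON_HTTP_MARKER = 0x01;

    /**
     * Reads the first byte and pushes it back, so the caller can still
     * read the complete stream from the start afterwards.
     * Returns false for an empty stream (an assumption, not from the post).
     */
    static boolean looksLikeHttp(PushbackInputStream in) throws IOException {
        int first = in.read();
        if (first == -1) {
            return false;
        }
        in.unread(first);
        return first != NON_HTTP_MARKER;
    }

    public static void main(String[] args) throws IOException {
        byte[] request = "POST / HTTP/1.1\r\n".getBytes("US-ASCII");
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(request));
        System.out.println(looksLikeHttp(in)); // true
        System.out.println((char) in.read());  // P (the byte is still there)
    }
}
```

On a real server the PushbackInputStream would wrap socket.getInputStream() instead of a byte array.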

Assuming the information is posted via HTTP, I'm trying to
resolve the following: I don't know a priori which encoding is used for
the data stream. The following rule applies:

If the pattern (regex) <?xml [^>]+encoding="([^"]+)" is
found, $1 is used for decoding; otherwise a default char set is
used. My goal is to use both the characters (i.e. the server's
'interpretation' of the bytes received) and the original byte stream. I
want to write the original byte stream to a file, while using the
derived character stream for processing (using beans, XSL
transformation etc.)
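
One possible way to get both views, sketched under a few assumptions: the request body (after the HTTP headers) is buffered completely into a byte array, the start of it is decoded as ISO-8859-1 purely to run the regex, and the declared charset is then used to decode the same bytes. XmlEncodingSniffer, DEFAULT_CHARSET and SNIFF_LIMIT are hypothetical names and values, and the sketch only handles ASCII-compatible encodings (UTF-16 would need a byte order mark check first).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlEncodingSniffer {
    // The pattern quoted in the post; group 1 is the declared encoding.
    static final Pattern DECL =
            Pattern.compile("<\\?xml [^>]+encoding=\"([^\"]+)\"");

    // Hypothetical default; the post does not say which char set is the default.
    static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;

    // Number of leading bytes scanned for the declaration (arbitrary choice).
    static final int SNIFF_LIMIT = 1024;

    /** Buffers the whole stream so the original bytes stay available. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /**
     * Decodes the leading bytes as ISO-8859-1 (one char per byte, nothing is
     * rejected) and searches them for the XML declaration. Falls back to the
     * default when no declaration is found or the name is unknown.
     */
    static Charset detect(byte[] raw) {
        int len = Math.min(raw.length, SNIFF_LIMIT);
        String head = new String(raw, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // illegal or unsupported charset name: use the default below
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) throws IOException {
        byte[] sent = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u00e9</a>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] raw = readAll(new ByteArrayInputStream(sent)); // original bytes, for the file
        Charset cs = detect(raw);
        String text = new String(raw, cs);                    // characters, for processing
        System.out.println(cs + " -> " + text);
    }
}
```

The byte array can be written to the file unchanged, while the String goes to the beans or the XSL transformation. Buffering everything in memory is only reasonable if requests are small.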

I tried simulating the client using a basic HTML page with a FORM
whose action points to my server's URL. In HTML I can specify the meta
element Content-type and set it to "text/xml; charset=utf-8" or whatever I
like. As far as I recall, HTML forms by default encode using the platform
default charset and content type application/x-www-form-urlencoded.

I also tried to simulate the client with a Java application that
uses java.net.HttpURLConnection. There I set the
request property "Content-type" to "text/xml; charset=utf-8".
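
A rough sketch of such a test client, for reference. The class name XmlPostClient and the URL passed on the command line are placeholders. The header only labels the body; the actual encoding happens in getBytes(), and as far as I know HttpURLConnection sends the bytes written to its output stream as they are, without base64 or quoted-printable encoding on top.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class XmlPostClient {
    /** Builds the header value, e.g. "text/xml; charset=utf-8". */
    static String contentType(Charset cs) {
        return "text/xml; charset=" + cs.name().toLowerCase(Locale.ROOT);
    }

    /** Posts the XML in the given charset and returns the HTTP status code. */
    static int post(URL url, String xml, Charset cs) throws IOException {
        byte[] body = xml.getBytes(cs); // the encoding is applied here
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", contentType(cs));
        con.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = con.getOutputStream()) {
            out.write(body);            // written to the connection unchanged
        }
        int status = con.getResponseCode();
        con.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(contentType(StandardCharsets.UTF_8));
        if (args.length == 1) {         // pass the server URL to really send a request
            String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><msg>hi</msg>";
            System.out.println(post(new URL(args[0]), xml, StandardCharsets.UTF_8));
        }
    }
}
```

The charset passed to post() should match the encoding named in the XML declaration, otherwise the server decodes the body with the wrong char set.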

Now I'm not sure whether in either or both cases the stream is MIME
encoded...?

Someone in the previous thread suggested that I use HttpURLConnection
on the server side as well, but since I'm also expecting non-HTTP requests,
I'm not sure I can. Most likely I cannot use a BufferedReader,
because it works on a character stream, so I would lose the original byte
stream...

Thx
 
pietdejong

Actually, rereading my post, I would like to add: I want to write the
original byte representation of the characters to a file after having
processed them. The bean object(s) I'm using have their own write()
method, which writes to an OutputStream.

I guess the solution here is to remember the original encoding, use
the bean's write method to write to a ByteArrayOutputStream,
decode that into a String using the platform default encoding, and then
write that String to the file output stream using the original encoding...
right?
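
The steps above might look roughly like the sketch below. It assumes the charset the bean writes in is known and passes it explicitly, because decoding with the platform default only works if the bean happens to write in exactly that charset. Transcoder and its parameter names are hypothetical, and the byte arrays stand in for the bean's write() call and the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcoder {
    /**
     * Decodes the bytes produced by the bean with the charset the bean used,
     * then re-encodes the text with the original request charset.
     */
    static void transcode(byte[] beanBytes, Charset beanCharset,
                          Charset original, OutputStream target) throws IOException {
        String text = new String(beanBytes, beanCharset);
        Writer writer = new OutputStreamWriter(target, original);
        writer.write(text);
        writer.flush(); // the caller keeps ownership of target and closes it
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for bean.write(beanOut); assumes the bean writes UTF-8.
        ByteArrayOutputStream beanOut = new ByteArrayOutputStream();
        beanOut.write("caf\u00e9".getBytes(StandardCharsets.UTF_8));

        // Stand-in for the FileOutputStream.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        transcode(beanOut.toByteArray(), StandardCharsets.UTF_8,
                  StandardCharsets.ISO_8859_1, file);

        System.out.println(beanOut.size() + " bytes in, " + file.size() + " bytes out");
        // prints "5 bytes in, 4 bytes out"
    }
}
```

Characters that the original charset cannot represent are replaced by OutputStreamWriter (typically with '?'), so the round trip is only lossless if processing does not introduce such characters.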
 
