Why does JDBC need explicit character conversion?

howachen

Hi,

I use Java to connect to MySQL (5.x), using the latest connector.

I have declared the character encoding in the connection properties:
//-----------------------------------------------------------------------------
Properties props = new Properties();
props.put("characterEncoding", "UTF-8");
props.put("useUnicode", "true");
props.put("user", "root");
props.put("password", "password");
Class.forName("com.mysql.jdbc.Driver");

String dbUrl = "jdbc:mysql://" + "127.0.0.1" + "/"
+ "test_db";

this.setConnection(DriverManager.getConnection(dbUrl, props));
//-----------------------------------------------------------------------------

In the query part:
//-----------------------------------------------------------------------------

String sqlstr = "SELECT * FROM test_table WHERE id = 8";
try {
    PreparedStatement s = this.getConnection().prepareStatement(sqlstr);
    ResultSet rs = s.executeQuery();

    while (rs.next()) {

        String text1Val = "";
        String text2Val = "";

        text1Val = rs.getString("text1"); // THIS DOES NOT WORK

        try {
            // Re-encodes with the platform default charset, then decodes as UTF-8
            text2Val = new String(rs.getString("text1").getBytes(),
                    "UTF-8"); // THIS WORKS!
        } catch (UnsupportedEncodingException e2) {
            e2.printStackTrace();
        }
    }
.....


//----------------------------------------

The value stored in the "text1" field is a UTF-8 character.

Why is this kind of character-conversion overhead needed when using Java?


Thanks...
 
Chris Smith

I use java to connect to MySQL (5.x), using the latest connector.

As an unrelated side note, have you considered PostgreSQL instead? It
is free, but also cares about your data integrity, has a better design
and better standards compliance, and most importantly no one from the
PostgreSQL project has ever made any statements to me of the form: "We
think you (and everyone else who has written database-independent JDBC
code that may be usable with MySQL) may owe us money despite possibly
having never used our product, ever; but we can't give you legal advice
on interpreting our own license, so please contact an attorney if you
need advice on this matter." That last one is important to me. :)

On to your question...
I have declared the character setting in the connection:
//-----------------------------------------------------------------------------
Properties props = new Properties();
props.put("characterEncoding", "UTF-8");
props.put("useUnicode", "true");

The reference manual at:

http://dev.mysql.com/doc/refman/5.0/en/cj-character-sets.html

suggests that you should be specifying "utf8" instead of "UTF-8". Does
this make any difference?

....
text1Val = rs.getString("text1"); // THIS DOES NOT WORK

try {
text2Val = new String(rs.getString("text1").getBytes(),
"UTF-8"); // THIS WORK!

Hmm. That's really bad! The database appears to be sending the data
back in the system default encoding. If that data is representable in
the system default encoding, then the code above will work; but
otherwise, it will fail. Hence, your application will fail non-
deterministically based on things like the operating system, version,
locale settings, environment variables (in UNIX), etc.
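
To make the fragility concrete, here is a minimal sketch of the round trip. It assumes the driver handed back UTF-8 bytes decoded as ISO-8859-1, which nothing in this thread confirms; the class and method names are made up for illustration. The workaround only recovers the text when the charset used by getBytes() happens to match the one the bytes were wrongly decoded with.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Simulates the assumed driver behaviour: UTF-8 bytes decoded with the wrong charset.
    public static String misdecode(String original, Charset wrong) {
        return new String(original.getBytes(StandardCharsets.UTF_8), wrong);
    }

    // The workaround from the question, with the "platform default" charset made explicit.
    public static String workaround(String garbled, Charset platformDefault) {
        return new String(garbled.getBytes(platformDefault), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two CJK characters, six bytes in UTF-8
        String garbled = misdecode(original, StandardCharsets.ISO_8859_1);

        // Default charset matches the wrong decoding: the original text comes back.
        System.out.println(
            workaround(garbled, StandardCharsets.ISO_8859_1).equals(original)); // true

        // Default charset differs (here US-ASCII): the bytes become '?' and the text is lost.
        System.out.println(
            workaround(garbled, StandardCharsets.US_ASCII).equals(original)); // false
    }
}
```

The second call stands in for running the same program on a machine with a different default locale, which is why the workaround cannot be relied on.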

If the above suggestion doesn't work, I'll poke around further.
Why need this kind of overhead in character conversion when using java?

This doesn't have anything to do with Java; it's a JDBC driver problem.
 
howachen

Chris Smith wrote:
As an unrelated side note, have you considered PostgreSQL instead? It
is free, but also cares about your data integrity, has a better design
and better standards compliance, and most importantly no one from the
PostgreSQL project has ever made any statements to me of the form: "We
think you (and everyone else who has written database-independent JDBC
code that may be usable with MySQL) may owe us money despite possibly
having never used our product, ever; but we can't give you legal advice
on interpreting our own license, so please contact an attorney if you
need advice on this matter." That last one is important to me. :)

On to your question...


The reference manual at:

http://dev.mysql.com/doc/refman/5.0/en/cj-character-sets.html

suggests that you should be specifying "utf8" instead of "UTF-8". Does
this make any difference?

This does not work.

In fact, the documentation you quoted above says we should use "UTF-8":

When specifying character encodings on the client side, Java-style
names should be used. The following table lists Java-style names for
MySQL character sets...

....or by configuring the JDBC driver to use "UTF-8" through the
characterEncoding property.
 
Chris Smith

this does not work.

in fact, the documentation you quoted above said we should use "UTF-8"
:

i.e.
When specifying character encodings on the client side, Java-style
names should be used. The following table lists Java-style names for
MySQL character sets...

...or by configuring the JDBC driver to use "UTF-8" through the
characterEncoding property.

Oops. I misread the page. Sorry, I don't know why the driver is using
the wrong encoding.
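
One way to narrow this down, offered only as a sketch: print the code points of the string that rs.getString() returns. If a single expected character shows up as several code points in the U+0080 to U+00FF range, the UTF-8 bytes were probably decoded with a single-byte charset somewhere between the table and the driver. The helper name below is hypothetical and the sample strings are stand-ins for real query results.

```java
public class CodePointDump {

    // Formats each code point of the string as U+XXXX, separated by spaces.
    public static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded CJK character is one code point.
        System.out.println(dump("\u4e2d")); // U+4E2D

        // The same character's UTF-8 bytes (E4 B8 AD) decoded as ISO-8859-1
        // appear as three separate code points.
        System.out.println(dump("\u00e4\u00b8\u00ad")); // U+00E4 U+00B8 U+00AD
    }
}
```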
 
Dražen Gemić

I have declared the character setting in the connection:
As a matter of fact, I never had that kind of problem with Postgres,
MS SQL, or Hypersonic SQL.

Check your JDBC driver documentation; there probably is some.

DG
 
