Data entry in foreign languages

Roedy Green

To what extent can you ignore the foreign language problem for
entering data into Java?

Do the OS keyboard drivers and Unicode handle everything?

What about Hebrew, right to left. Do the Strings read right to left
too?

What about Arabic, which has 2D placement and all kinds of special
forms for the glyphs? Do the fonts contain enough information that you
can just string the Unicode chars together and it renders plausibly?

Has anyone any experience with languages that don't use the Roman
alphabet?

If something strange is required, how the heck do you write code
without being able to tell if the results are correct? Are there some
test strings?
 
opalpa

I have experience writing programs that accept Chinese variants
and Japanese characters.

There are ample example strings and web tools for the variants of
Unicode. One way to handle input is to copy and paste examples from
web pages into web form inputs and have those feed into Java servlets.
Another way is to save examples to a file, read the raw bytes, decode
them into Java Strings, and then render them with Graphics.drawString.

This has worked very smoothly for me.
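
To make the "read binary input into Java Strings" step concrete, here is a minimal sketch. It assumes the saved sample is UTF-8; the class name DecodeSample and the hard-coded bytes are made up for illustration. The point is only that the charset is named explicitly instead of being left to the platform default.

```java
import java.nio.charset.StandardCharsets;

class DecodeSample {
    /** Decodes raw bytes as UTF-8, naming the charset explicitly. */
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 bytes for U+65E5 U+672C (two kanji), standing in for a saved sample file.
        byte[] raw = {
            (byte) 0xE6, (byte) 0x97, (byte) 0xA5,
            (byte) 0xE6, (byte) 0x9C, (byte) 0xAC
        };
        String text = decodeUtf8(raw);
        // Six bytes decode to two 16-bit chars.
        System.out.println(text.length());
    }
}
```

The resulting String can then be handed to Graphics.drawString or a Swing component as usual.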

The only gotcha I ran into was when trying to pass parameters into
Java's main from the environment -- this is specified as non-standard, but
I like using it. The problem is that main(String a[]) does not specify
how the bytes passed in from the environment are interpreted. I suppose
the assumption is that ASCII was read in from a terminal.

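Strings can be concatenated without trouble: once bytes have been
decoded, every Java String holds the same internal form, 16-bit chars
(UTF-16 code units), whatever encoding the bytes arrived in. The
encoding only matters when converting between bytes and Strings. There
are several Unicode encodings (100% sure of this). UTF-8 appears to be
the best for files and web pages. UTF-8 predates Java 1.0 (it dates
from 1992-93; Java 1.0 shipped in 1996), and Java has used 16-bit
Unicode chars internally from the start.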
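
A small sketch of the encoding point, with made-up names (EncodingSample, joinDecoded): the same character takes a different number of bytes in different encodings, but once each byte array is decoded with its own charset the Strings concatenate without any further thought.

```java
import java.nio.charset.StandardCharsets;

class EncodingSample {
    /** Decodes one UTF-8 part and one UTF-16BE part, then concatenates the Strings. */
    static String joinDecoded(byte[] utf8Part, byte[] utf16bePart) {
        String a = new String(utf8Part, StandardCharsets.UTF_8);
        String b = new String(utf16bePart, StandardCharsets.UTF_16BE);
        return a + b;
    }

    public static void main(String[] args) {
        String kanji = "\u65e5";
        // The byte counts differ between the two encodings of the same character.
        System.out.println(kanji.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(kanji.getBytes(StandardCharsets.UTF_16BE).length);

        // Aleph (U+05D0) supplied once as UTF-8 and once as UTF-16BE.
        byte[] alephUtf8 = {(byte) 0xD7, (byte) 0x90};
        byte[] alephUtf16 = {(byte) 0x05, (byte) 0xD0};
        System.out.println(joinDecoded(alephUtf8, alephUtf16).length());
    }
}
```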

I am not sure about the Hebrew thing being right to left. I suspect
Java does take account of that stuff because I remember seeing
LayoutManagers take account of local conventions. I have not tried
displaying a right to left language.

Opalinski
(e-mail address removed)
http://www.geocities.com/opalpaweb/
 
Oliver Wong

Roedy Green said:
To what extent can you ignore the foreign language problem for
entering data into Java?

Do the OS keyboard drivers and Unicode handle everything?

What about Hebrew, right to left. Do the Strings read right to left
too?

What about Arabic, which has 2D placement and all kinds of special
forms for the glyphs? Do the fonts contain enough information that you
can just string the Unicode chars together and it renders plausibly?

Has anyone any experience with languages that don't use the Roman
alphabet?

If something strange is required, how the heck do you write code
without being able to tell if the results are correct? Are there some
test strings?

I'll warn you right now that the matter is complicated by the fact that
there may be an additional layer of indirection between the keyboard and
Java. Microsoft uses something called an IME (Input Method Editor), such
that when I set my locale to Japanese Hiragana and type 'k', Java will
notice that a key-down event has occurred, and that the 'k' key was
pressed, but the character 'k' has not yet been sent to the input.

Then I press 'a', and Java will notice that a key-down event has
occurred, and that the 'a' key was pressed, and still no character has been
sent to the input.

I then press 'enter', and Java will notice that yet another key-down event
has occurred, and that the 'enter' key was pressed, and now, finally, a
single character 'hiragana ka' (\u304b) has been sent to the input.

So when you ask "Does Java handle everything?" it depends on what you're
doing. As long as you don't associate key presses with character input, you
should be okay. The IME (or its equivalent on Linux and MacOS) will take
care of translating key events into Unicode strings. But if you start mixing
key handling and string reading, you may run into problems.
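
One way to stay on the "string reading" side is to listen for text arriving in the document rather than for key codes. The sketch below is only an illustration: the class TextChangeWatcher and the helper recordInsert are made-up names, and the insert is done by hand rather than by a real IME, so it shows the listening technique but not how any particular IME delivers its composed text.

```java
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

class TextChangeWatcher implements DocumentListener {
    private final StringBuilder inserted = new StringBuilder();

    @Override
    public void insertUpdate(DocumentEvent e) {
        try {
            // Read the text that actually arrived, whatever keys produced it.
            inserted.append(e.getDocument().getText(e.getOffset(), e.getLength()));
        } catch (BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    @Override
    public void removeUpdate(DocumentEvent e) { }

    @Override
    public void changedUpdate(DocumentEvent e) { }

    String inserted() {
        return inserted.toString();
    }

    /** Inserts text into a fresh document and returns what the listener saw. */
    static String recordInsert(String text) {
        // In a GUI this would be textField.getDocument().
        Document doc = new PlainDocument();
        TextChangeWatcher watcher = new TextChangeWatcher();
        doc.addDocumentListener(watcher);
        try {
            doc.insertString(0, text, null);
        } catch (BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
        return watcher.inserted();
    }

    public static void main(String[] args) {
        // Simulates 'hiragana ka' arriving as text; no 'k' or 'a' key codes are involved.
        System.out.println(recordInsert("\u304b").length());
    }
}
```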

- Oliver
 
Roedy Green

To what extent can you ignore the foreign language problem for
entering data into Java?

I am doing some experiments after coming up somewhat dry on Google,
getting mostly information IN Hebrew rather than about how to render
it.

The first thing I discovered is that Hebrew is not strictly right to
left. Numbers go left to right, AND you still enter the digits
high-order first.

When you key into a Java JTextField the cursor flips back and forth
between the two modes. It is quite insane.
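
For anyone who cannot read Hebrew but wants to see what the bidirectional algorithm makes of mixed letters and digits, java.text.Bidi reports the direction of each run. This is a minimal sketch; the class name BidiRuns, its helper methods and the sample string are invented for illustration.

```java
import java.text.Bidi;

class BidiRuns {
    /** Returns the bidi embedding level at an index; odd means right to left. */
    static int levelAt(String text, int index) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.getLevelAt(index);
    }

    /** Lists each directional run as "start-limit LTR" or "start-limit RTL". */
    static String describe(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            sb.append(bidi.getRunStart(i)).append('-').append(bidi.getRunLimit(i));
            sb.append(bidi.getRunLevel(i) % 2 == 0 ? " LTR" : " RTL").append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three Hebrew letters (aleph, bet, gimel), a space, then digits.
        String mixed = "\u05d0\u05d1\u05d2 123";
        System.out.print(describe(mixed));
    }
}
```

The digits form their own left-to-right run inside the right-to-left text, which is presumably why the cursor jumps around while typing.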

Further just setting the input locale and keyboard driver to Hebrew
was sufficient to trigger this mixed right to left behaviour. However,
it was not sufficient to trigger an automatic right justify on a
JTextField.
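
If right alignment is wanted, it apparently has to be asked for. A hedged sketch: Component.setComponentOrientation is a real AWT call, and a JTextField left at its default leading alignment should then hug the right edge, but the class RtlField and the helper containsRtl are made up here, and I have not verified the result on every look and feel.

```java
import java.awt.ComponentOrientation;
import javax.swing.JTextField;

class RtlField {
    /** True if the text contains at least one strongly right-to-left character. */
    static boolean containsRtl(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            byte dir = Character.getDirectionality(cp);
            if (dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                    || dir == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC) {
                return true;
            }
            i += Character.charCount(cp);
        }
        return false;
    }

    /** Sets the field's orientation from a sample of the text it is expected to hold. */
    static void orient(JTextField field, String sample) {
        field.setComponentOrientation(containsRtl(sample)
                ? ComponentOrientation.RIGHT_TO_LEFT
                : ComponentOrientation.LEFT_TO_RIGHT);
    }
}
```

Calling RtlField.orient(field, "\u05d0") before showing the field would be the typical use.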

In a CaretListener, getText sees something quite different from what is
on the screen. I have yet to sort out what the internal forms are.

This is all complicated by the fact that the only thing I know about
Hebrew is what an Aleph character looks like.

Then on top of this, I have to figure out what drawString does,
which probably has no notion of locale.

I have started a web page on my findings. See
http://mindprod.com/jgloss/hebrew.html
 
Roedy Green

I have started a web page on my findings. See
http://mindprod.com/jgloss/hebrew.html

At least for output, Java, it turns out, is quite automatic. It
ignores the locale. It simply notices Hebrew characters in your
JLabel, JTextArea, JTextField or drawString and renders them right to
left. The first char you pronounce is at position 0 of the string, and
at the far right on screen.
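
This logical ordering also gives a way to build test strings without reading the language: write them as \u escapes in reading order and ask Java for the character names. A small sketch with made-up names (LogicalOrder, firstLetterName); the sample is the word usually transliterated "shalom".

```java
class LogicalOrder {
    // Shin, lamed, vav, final mem, stored in the order they are read.
    static final String SHALOM = "\u05e9\u05dc\u05d5\u05dd";

    /** Returns the Unicode name of the character at index 0. */
    static String firstLetterName(String text) {
        return Character.getName(text.codePointAt(0));
    }

    public static void main(String[] args) {
        // Index 0 is the first letter read, even though it is drawn at the far right.
        for (int i = 0; i < SHALOM.length(); i++) {
            System.out.println(i + ": " + Character.getName(SHALOM.charAt(i)));
        }
    }
}
```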

So all works fine except that everything is left instead of right
aligned. I gather that Israelis are used to seeing inept computer
printout all left aligned.

I have posted a little test program to demonstrate this.

Console IO is hopeless.
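
For what it is worth, the Java side of console output can at least be pinned down by wrapping System.out in a PrintStream with an explicit encoding. This is a sketch under the assumption that the terminal expects UTF-8; the class Utf8Console and its helpers are invented names, and whether the terminal then draws Hebrew correctly (or right to left at all) is a separate matter outside Java's control.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    /** Wraps a stream so that printed text is encoded as UTF-8. */
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the bytes that would be sent to the console for the given text. */
    static byte[] toConsoleBytes(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = utf8Stream(buffer);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.println("\u05e9\u05dc\u05d5\u05dd");
    }
}
```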
 
