Approaches to localization?

Phil Tomson

I'm developing a GUI app using Ruby and FLTK. One of the requirements
that's come up lately is that this app should be 'localizable'. I'm trying
to figure out approaches for doing this as I design the app.

The approach I'm thinking of is to have some separate file, maybe in XML,
that would contain all of the text which is used for things like button
labels, output messages, etc. Then to support different languages all the
customer would have to do is include a different language-specific file.

These files might look something like:

<ButtonLabels>
  <Next>Next-&gt;</Next>
  <Cancel>Cancel</Cancel>
  <Back>&lt;-Back</Back>
  <Login>Login</Login>
  <Register>Register</Register>
</ButtonLabels>
<ErrorMessages>
  <LoginFailed>Could not log into the server! Please check your proxy configuration</LoginFailed>
</ErrorMessages>

So, if they wanted to make a Japanese version of the app, they would
replace each of the strings above with their Japanese counterparts.
While it seems a bit heavy, I'm thinking an XML approach would be good for
a couple of reasons:
1) this localization will probably be taking place long after I'm gone
(this is a contract job) and XML is widely known and understood.
2) it's easy to extract data from it using REXML
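
To make point 2 concrete, here is a minimal sketch of reading such a file with REXML. The <Strings> root element and the dotted key names are assumptions added for illustration (XML needs a single root element, and a literal "<" in text has to be written as &lt;); they are not part of the format proposed above.

```ruby
require "rexml/document"

# Hypothetical language file; a real app would read this from disk.
xml = <<~XML
  <Strings>
    <ButtonLabels>
      <Next>Next-&gt;</Next>
      <Cancel>Cancel</Cancel>
      <Back>&lt;-Back</Back>
    </ButtonLabels>
    <ErrorMessages>
      <LoginFailed>Could not log into the server!</LoginFailed>
    </ErrorMessages>
  </Strings>
XML

doc = REXML::Document.new(xml)

# Flatten the two-level structure into "Group.Name" => text.
strings = {}
doc.root.elements.each do |group|
  group.elements.each do |entry|
    strings["#{group.name}.#{entry.name}"] = entry.text.strip
  end
end

puts strings["ButtonLabels.Back"]
puts strings["ErrorMessages.LoginFailed"]
```

A translator would only touch the text between the tags; the element names act as stable keys for the program.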


Does this seem like a reasonable approach?

Has anyone used alternate approaches?

Phil
 
Peter C. Verhage

Phil said:
Does this seem like a reasonable approach?
Has anyone used alternate approaches?

It does sound very heavy to me. In the Java world, the widely used method
for localization is to store strings in simple properties files (or in
simple classes). I think they are even easier to understand than XML
and far more lightweight. An example could be the following:

buttonlabels.next=Next->
buttonlabels.cancel=Cancel
etc.

As you can see, you can create the same kind of hierarchy (as you did
using XML) using the above dot notation.

I'm sure a properties file parser for Ruby exists. And if not, one can be
created with maybe only a single line of code.
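
As a rough illustration of how small such a parser can be, here is a sketch in Ruby. The method name parse_properties is made up, and it deliberately ignores parts of the real Java format (":" separators, line continuations, \uXXXX escapes), so treat it as a starting point rather than a compatible implementation.

```ruby
# Parse "key=value" lines into a Hash, skipping blanks and comments.
def parse_properties(text)
  text.each_line.with_object({}) do |line, props|
    line = line.strip
    next if line.empty? || line.start_with?("#", "!")

    key, value = line.split("=", 2)
    next if value.nil? # not a key=value line

    props[key.strip] = value.strip
  end
end

sample = <<~PROPS
  # button labels
  buttonlabels.next=Next->
  buttonlabels.cancel=Cancel
PROPS

labels = parse_properties(sample)
puts labels["buttonlabels.next"]
puts labels["buttonlabels.cancel"]
```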

Regards,

Peter
 
Phil Tomson

Peter said:
It does sound very heavy to me. In the Java world the widely used method
for localization is storing strings in simple properties files (or in
simple classes). I think they are even more easy to understand than XML
and far more lightweight. An example could be the following:

buttonlabels.next=Next->
buttonlabels.cancel=Cancel
etc.

As you can see, you can create the same kind of hierarchy (as you did
using XML) using the above dot notation.

This is an interesting idea, however, I can't assume that the customer
will have people who know Ruby in the future when I'm gone. Currently,
I'm the only one in the group who does Ruby and I'm developing this
particular app on my own. While I am supposed to do some training to
bring others up to speed, it may not come before the budget axe falls
(something rumored to be hanging over our heads). So if they've got
this XML file that can be edited that's probably a big advantage over
having them go through the Ruby code. Also, with the XML file approach
all of these strings are in one location and they're easy to find when it
comes time for them to edit.

Phil
 
Mark Hubbart

Phil said:
[...]

Does this seem like a reasonable approach?

Has anyone used alternate approaches?

You might look at Apple/NextStep's approach to localization (others
probably use it too). Basically, there's a file that maps English text to
the translated text. Then, anywhere in the program where a particular
English string is used, it checks for a translation into the appropriate
language.

So to do this in Ruby, you might have a pig-latin translation file,
using YAML:

---
Enter text here: Enterway exttay erehay
Type "yes" to continue: Ypetay "yes" otay ontinuecay

Then write a small class:

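require "yaml"

class Localization
  def initialize(filename)
    # load the YAML data from the file into a hash
    @translations = YAML.load_file(filename) || {}
  end

  # return the translation, or the original text if there is none
  def [](text)
    @translations[text] || text
  end
end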

The English localization files are generated automatically and include
all the string literals from the source code; each string simply maps to
itself. You can then easily hand that file over for translation, and add
the translated one in.
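
Here is a self-contained sketch of the whole round trip, under the assumption that the translation file is loaded with YAML.load_file. The temporary file only stands in for a real pig-latin file shipped with the app.

```ruby
require "yaml"
require "tempfile"

class Localization
  def initialize(filename)
    # An empty file parses to a falsy value, so fall back to an empty hash.
    @translations = YAML.load_file(filename) || {}
  end

  # Return the translation, or the original text when none exists.
  def [](text)
    @translations[text] || text
  end
end

loc = nil
Tempfile.create(["piglatin", ".yaml"]) do |file|
  file.write(<<~YAML)
    ---
    Enter text here: Enterway exttay erehay
    Type "yes" to continue: Ypetay "yes" otay ontinuecay
  YAML
  file.flush
  loc = Localization.new(file.path) # read while the file still exists
end

puts loc["Enter text here"] # has an entry, so the translation is printed
puts loc["Cancel"]          # no entry, so the English text comes back
```

The fallback in [] is what lets an untranslated or partially translated file still produce a usable (English) interface.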

HTH,
--Mark
 
Peter C. Verhage

Phil said:
This is an interesting idea, however, I can't assume that the customer
will have people who know Ruby in the future when I'm gone. Currently,
I'm the only one in the group who does Ruby and I'm developing this
particular app on my own. While I am supposed to do some training to
bring others up to speed, it may not come before the budget axe falls
(something rumored to be hanging over our heads). So if they've got
this XML file that can be edited that's probably a big advantage over
having them go through the Ruby code. Also, with the XML file approach
all of these strings are in one location and they're easy to find when it
comes time for them to edit.

The sample I gave you is not Ruby code. It's just a plain text file. It's
easy to parse with Ruby, Java has native support for it, and it's
actually easy to parse in any language. They can edit the file
using Notepad, vi, or whatever editor they want. Once they've seen a single
example, I think they will understand right away what they need to do.

An article about Java internationalization can be found here:
http://java.sun.com/developer/technicalArticles/Intl/ResourceBundles/,
especially the part about the PropertyResourceBundle should interest you.

Regards,

Peter
 
Phil Tomson

Mark said:
localization files are automatically generated for english, that
include all the string literals from the source code.

I like this idea. However, I'm not clear on how the localization files
are automatically generated. Can you provide more details?

Mark said:
For the english
files, they just map each string to itself. But then you can easily
hand the file over for translation, and add the new one in.

Yes, this seems like a good route to take, even if I can't automatically
generate the localization files. Knowledge of YAML isn't really an
issue because the file format is rather self-explanatory.

Phil
 
Joel VanderWerf

Mark said:
You might look at Apple/NextStep's approach to localization. (probably
others use it too) Basically, there's a file that maps english text to
the translated text. Then, anywhere in the program where a particular
english string is used, it checks for a translation to the appropriate
language.

How do they handle ambiguity, when one English word or phrase should be
translated differently in different contexts? For example, "File" is
both a verb and a noun in English, but in some languages there are two
words.
 
Phil Tomson

Mark said:
You might look at Apple/NextStep's approach to localization. (probably
others use it too) Basically, there's a file that maps english text to
the translated text. Then, anywhere in the program where a particular
english string is used, it checks for a translation to the appropriate
language.

[...]

I thought a bit more about this approach, and I think I see a flaw:
if you change the text in your program, you also need to change the text
in your translation file. For example:

# your translation.yaml file for English:
"Login Failed. Please check your proxy settings.": "Login Failed. Please check your proxy settings."

I'm assuming you use the Localization class like so:

# your Ruby code:
loc = Localization.new("translation.yaml")
#...
puts loc["Login Failed. Please check your proxy settings."]

Now let's say you want to make the message more detailed, so you change
your Ruby code like so:

puts loc["Login Failed. You could be behind a firewall. If so, please check your proxy settings."]

For English, it's not a problem: you'll see the new message.
However, the translation file you give to your translator still contains
the old message, so that is what gets translated. When your program
runs, it displays the English text instead of the translated text
because the hash doesn't contain the key for the new message. To prevent
this you must meticulously keep your translation.yaml file in sync with
your Ruby code. This seems to me to be a violation of the Don't Repeat
Yourself (DRY) principle.

Phil
 
nobu.nokada

Hi,

At Tue, 27 Apr 2004 12:48:21 +0900,
Austin Ziegler wrote in [ruby-talk:98474]:
When I want to display a message, I do:

puts @config.message[:no_webmaster_defined]
raise @config.message[:backend_unknown] % ["madaleine"]

The latter illustrates a rather large problem with this approach, though. I
believe that the only proper answer is tagged-templated translation, but
this is a very heavyweight response. Most messages are NOT fixed messages,
but include information from the program. printf-formats (like I use) are
positional, but that won't work for all languages in all contexts. Ideally,
a fast format would be available that allows me to do:

:message_1 => "The <:subject> is <:object>."

Then, it could be translated as (I'm going from a very rusty memory here, so
please forgive me):

:message_1 => "<:object> ga <:subject> desu."

Use %n$ format.

$ ruby -e 'printf "%2$s ga %1$s desu.", "object", "subject"'
subject ga object desu.

# Also in Japanese, `subject' almost comes first in general.
# E: subject verb object
# J: subject object verb

BTW, once I'd proposed the format indexed by name, similar to
yours, borrowed from Python:

$ ruby -e 'printf "The %(subject)s is %(object)s.", {subject:"system", object:"down"}'
The system is down.

It was rejected; however, I believe this would be useful for
L10N.
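
For readers on more recent Ruby versions: format strings there accept named references written as %{name} (and %<name>s), which covers much the same ground as the proposal above. A small sketch, with the hash keys and templates chosen purely for illustration:

```ruby
values = { subject: "system", object: "down" }

# Each translation can place the named values in whatever order it needs.
english   = format("The %{subject} is %{object}.", values)
reordered = format("%{object} ga %{subject} desu.", values)

# %<name>s combines a name with an ordinary conversion and width.
padded = format("[%<subject>-8s]", values)

puts english
puts reordered
puts padded
```

Because the placeholders are named, a translator can reorder them without knowing the argument positions used in the program.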
 
Mark Hubbart

Phil said:
I like this idea. However, I'm not clear on how the localization files
are automatically generated, can you provide more details?

Based on how they look, I'm guessing it strips all of the string
literals out of the source code.

Phil said:
yes, this seems like a good route to take, even if I can't
automatically generate the localization files

Perhaps a regexp could get all the "quoted" strings, or maybe a lexer
would be needed, I'm not sure.
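
A rough sketch of the regexp route, assuming only double-quoted literals matter. The sample source lines are invented, and a regexp like this will also pick up strings that should never be translated (hash keys, file names), which is where a real lexer would do better.

```ruby
require "yaml"

# Hypothetical application source; in practice this would be read from files.
source = <<~'SRC'
  button.label = "Next->"
  status.text  = "Enter text here"
  log(:debug)  # no string literal on this line
  retry_label  = "Next->"
SRC

# Match double-quoted literals, allowing backslash escapes inside them.
double_quoted = /"((?:[^"\\]|\\.)*)"/

literals = source.scan(double_quoted).flatten.uniq

# The English file maps every string to itself.
english = literals.to_h { |text| [text, text] }
puts english.to_yaml
```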

Phil said:
and knowledge of YAML isn't really an
issue because the file format is rather self-explanatory.

I was somewhat struck by how simple and readable it was in YAML...
definitely a tribute to its flexibility.

cheers,
--Mark
 
gabriele renzi

On Tue, 27 Apr 2004 14:31:45 +0900, (e-mail address removed) wrote:

BTW, once I'd proposed the format indexed by name, similar to
yours, borrowed from Python:

$ ruby -e 'printf "The %(subject)s is %(object)s.", {subject:"system", object:"down"}'
The system is down.

It was rejected; however, I believe this would be useful for
L10N.

Incredible, I was thinking about this just this morning.
Now that the battlefield has changed (i.e. we have key:val hashes),
could we insist on this feature a little more? :)
 
Mark Hubbart

Phil said:
I thought a bit more about this approach, and I think I see a flaw:
If you change the text in your program, you need to also change the
text in your translation file.

[...]

In order to prevent this you must meticulously
keep your translation.yaml file in sync with your Ruby code. This
seems to me to be a violation of the Do Not Repeat Yourself rule.

If you can automatically generate the English version from the source
code, then merge it with the old files, you should be able to update the
translation files pretty easily. So I guess this method would require
that you be able to generate the English files from the source.
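
A sketch of what that merge step might look like once both sides are plain hashes and arrays. The variable names and sample strings are illustrative; the idea is just to carry existing translations forward and flag what is new or no longer used.

```ruby
# Strings currently found in the source (hypothetical extraction result).
current_strings = [
  "Enter text here",
  "Login Failed. You could be behind a firewall."
]

# Existing translation file contents, as loaded from YAML (pig latin here).
old_translations = {
  "Enter text here" => "Enterway exttay erehay",
  "Login Failed."   => "Oginlay Ailedfay."
}

# Keep existing translations; new strings start out untranslated (nil).
merged = current_strings.to_h { |text| [text, old_translations[text]] }

missing  = merged.select { |_, translated| translated.nil? }.keys
obsolete = old_translations.keys - current_strings

puts "Needs translation: #{missing.inspect}"
puts "No longer used:    #{obsolete.inspect}"
```

The "missing" list is what would go back to the translator after each change to the program text.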
 
Jean-Hugues ROBERT

gabriele said:
Incredible, I was thinking of this this morning.
Now that the battle field is changed (i.e. we have key:val hashes)
could we insist on this feature a little more? :)

We have key: val? I tried it, and it does not work in 1.8 (is it in 1.9?).
However, try this:
p {a:"1"}
It apparently does nothing. Weird?

Jean-Hugues
 
Mark Hubbart

Joel said:
How do they handle ambiguity, when one English word or phrase should
be translated differently in different contexts? For example, "File"
is both a verb and a noun in English, but in some languages there are
two words.

It seems that they also allow the defining of constants in localization
files. Looking at various files, I've seen some parts that look like
this:

NO_MORE_FOO_BUTTON_USER_CANCEL = "Cancel Foo"

...and parts that look like this:

"We could not find any more Foo." = "We could not find any more Foo."

... this being, of course, an English file. So I guess it's designed to
flex around those types of situations.
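
In the YAML-based scheme discussed earlier, the same trick might look like the sketch below: constant-style keys where context matters, plain English keys elsewhere. The key names and the German entries are made up for illustration only.

```ruby
require "yaml"

# Illustrative German entries; the constant-style key names are made up.
german = YAML.load(<<~YAML)
  FILE_MENU_TITLE: Datei
  FILE_DOCUMENT_ACTION: Ablegen
  Cancel: Abbrechen
YAML

# Fall back to the key itself when no entry exists.
translate = ->(key) { german.fetch(key, key) }

puts translate.call("FILE_MENU_TITLE")      # "File" as a noun (menu title)
puts translate.call("FILE_DOCUMENT_ACTION") # "File" as a verb
puts translate.call("Cancel")               # plain English key
puts translate.call("Help")                 # no entry, falls back to the key
```

The cost is that constant-style keys need an entry in the English file too, since the key itself is no longer displayable text.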

cheers,
--Mark
 
Masao Mutoh

Hi,

On Tue, 27 Apr 2004 03:09:03 +0900, Phil Tomson wrote:
I'm developing a GUI app using Ruby and FLTK. One of the requirements
that's come up lately is that this app should be 'localizable'. I'm trying
to figure out approaches for doing this as I design the app.
[...]
Does this seem like a reasonable approach?

If you intend it to be a general library, you need to consider some issues.
For example:
- File encoding.
  Using UTF-8 only may be no problem, though.
- How to include values from the program.
  As Austin says, you may need to support something similar to printf style.
- Plural forms.
  * Sometimes we want to distinguish "Find a file" from "Find files"
    as a dialog message. And some languages need 3 or 4 variations
    to express this kind of message.
- Maintainability.
  * Developers may need to create the XML file automatically.
  * Translators need to update their files when the program is updated.
    The developer needs to provide a "diff" program for that.
- How to select the language when you execute the program.
  You may need to use environment variables (LC_ALL, LC_CTYPE, ...).
  But some environments (like Windows) don't support them,
  so you may need to write a wrapper for it.
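
For the last item, a wrapper could start out as small as the sketch below. The method name detect_language, the "en" default, and the choice to consult LC_ALL, LC_MESSAGES and LANG (the usual POSIX order for message locale) are assumptions; on a platform without these variables, the env argument is where a platform-specific lookup would be plugged in.

```ruby
# Pick a language code such as "ja" from POSIX-style locale variables.
# `env` is any Hash-like object, so tests (or a platform-specific
# wrapper) can pass their own values instead of ENV.
def detect_language(env = ENV, default = "en")
  raw = %w[LC_ALL LC_MESSAGES LANG]
        .map { |name| env[name] }
        .find { |value| value && !value.empty? }
  return default if raw.nil?

  # "ja_JP.UTF-8" yields "ja"; "C" and "POSIX" do not match and use the default.
  raw[/\A[a-z]{2,3}/] || default
end

puts detect_language({ "LANG" => "ja_JP.UTF-8" })
puts detect_language({ "LC_ALL" => "C", "LANG" => "ja_JP.UTF-8" })
puts detect_language({})
```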

Anyway, I recommend reading the GNU gettext online manual[1] once.

Phil said:
Has anyone used alternate approaches?

IMO, Ruby-GetText-Package[2] is suitable for GUI applications.
You can use many tools for maintaining the locale files,
and the above issues have already been solved.

Actually, some projects already use it.
For example, rbbr[3] already supports 11 languages
with Ruby-GetText-Package.

[1] http://www.gnu.org/software/gettext/manual/html_chapter/gettext_toc.html
[2] http://ponx.s5.xrea.com/hiki/ruby-gettext.html
[3] http://ruby-gnome2.sourceforge.jp/hiki.cgi?rbbr
 
Mark Hubbart

Jean-Hugues said:
We have key: val ? I tried, that does not work in 1.8 (is it 1.9 ?).
However, try this:
p {a:"1"}
It apparently does nothing, weird ?

well, it will do nothing anyway, since it thinks you are trying to pass
a block to Kernel#p...

but:

mark@imac% ruby -v -e'p({a:1,b:2})'
ruby 1.9.0 (2004-04-11) [powerpc-darwin]
{:b=>2, :a=>1}

I'm not sure when this was implemented; in fact, I didn't know it was
there already until now :) very nice, thanks matz!

cheers,
--Mark
 
Jean-Hugues ROBERT

Mark said:
well, it will do nothing anyway, since it thinks you are trying to pass a
block to Kernel#p...

Thanks. It apparently is equivalent to p { a(:"1") }.
I got confused because I was not aware of the :"some string" syntax for
creating a Symbol!

Mark said:
mark@imac% ruby -v -e'p({a:1,b:2})'
ruby 1.9.0 (2004-04-11) [powerpc-darwin]
{:b=>2, :a=>1}
I'm not sure when this was implemented; in fact, I didn't know it was
there already until now :) very nice, thanks matz!

I agree that it is somewhat difficult to track news. Apparently 1.9 will
implement the 2.0 syntax before work actually starts on the VM for 2.0.
See http://www.rubygarden.org/ruby?Rite

Jean-Hugues
 
Gawnsoft

Phil said:
I thought a bit more about this approach, and I think I see a flaw:
If you change the text in your program, you need to also change the text
in your translation file.

[...]

In order to prevent this you must meticulously
keep your translation.yaml file in sync with your Ruby code. This seems
to me to be a violation of the Do Not Repeat Yourself rule.

So you need a Ruby applet that searches the source files for all loc
calls and checks if there's an entry for its argument - as part of
your test suite?
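
Something along these lines, perhaps. This is only a sketch: it assumes every lookup is written literally as loc["..."] with a double-quoted string, and the sample source and translation table are invented.

```ruby
# Hypothetical source text; a real check would read the app's .rb files.
source = <<~'SRC'
  puts loc["Enter text here"]
  puts loc["Login Failed. You could be behind a firewall."]
SRC

# Translation table as it would be loaded from the YAML file.
translations = {
  "Enter text here" => "Enterway exttay erehay"
}

# Capture the literal argument of each loc["..."] call.
loc_call = /\bloc\["((?:[^"\\]|\\.)*)"\]/

used_keys    = source.scan(loc_call).flatten.uniq
untranslated = used_keys.reject { |key| translations.key?(key) }

if untranslated.empty?
  puts "All loc[] strings have translations."
else
  puts "Missing translations:"
  untranslated.each { |key| puts "  #{key}" }
end
```

In a test suite, the final check would become an assertion that the untranslated list is empty for each shipped language file.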


Cheers,
Euan
Gawnsoft: http://www.gawnsoft.co.sr
Symbian/Epoc wiki: http://html.dnsalias.net:1122
Smalltalk links (harvested from comp.lang.smalltalk) http://html.dnsalias.net/gawnsoft/smalltalk
 
Mark Hubbart

Gawnsoft said:
So you need a Ruby applet that searches the source files for all loc
calls and checks if there's an entry for its argument - as part of
your test suite?

Good point. I was trying to think of how to strip all the strings out
of the source code, but all that would need to be found are the translation
calls. So if the translation calls are unique, it shouldn't be too
difficult.

--Mark
 
