best perl l10n solution that can support online translation and change management

  • Thread starter danielmcbrearty

danielmcbrearty

Hi perl people

I have a site which is multilingual in nature (www.engoi.com). It is
written in perl. I did not have too much experience in l10n and
accompanying issues when I wrote it, so I made a home cooked system,
which until now is working tolerably well.

Since then, of course, I've discovered some of the riches available in
CPAN, and read some writings on the subjects by people like Audrey Tang
and Sean Burke, which has given me much food for thought.

I must say that the issues they have touched on haven't given me too
many problems so far (for instance, plurals) - at least not as far
as my users or translators have told me. This might be because my site
is relatively simple, and I tried to keep the texts simple, and fixed
as much as possible.

The issue that DOES trouble me more is one that is not mentioned in any
writing or module doc I have come across: change management. As I am on
the verge of a major site redesign, I am looking at how to make this
happen seamlessly. Please bear with me.

Let's say I define a text string as part of a new feature: "Please
click the button", which is duly translated by my willing volunteers,
and works its way into the production database at some point (ALL
translatable text is in the database used by the site).

Let's say that at some future date, due to feedback from my users, I
want to change it to "Please click the button (please do not click
twice)".

Now, what I want to happen is that the existing translations in each
language continue to be used until the translator gets around to
updating. Furthermore, I want the translators to be automatically
notified (by a weekly report or such) that the old translation is
out of date.

I can easily achieve this by some restructuring of the database to
include timestamps. I have a clear mental picture of how to do it.

Let's also consider that the original text was misspelt. I now want to
correct it without triggering an "out of date" condition for the text.

I can do all of this by making actions called "update" and "correct"
available to the site author (me). I think it would be a really neat
system.
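
A minimal sketch of the "update" and "correct" actions described above. It
assumes every source text and every translation carries a timestamp; the hash
layout and all sub names are hypothetical, with plain hashes standing in for
the site's database.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the text table: each source string and
# each translation records when it was last meaningfully changed.
my %source;         # key => { text => ..., changed => epoch }
my %translation;    # lang => key => { text => ..., changed => epoch }

# "update": the meaning changed, so existing translations become stale.
sub update_text {
    my ($key, $text, $now) = @_;
    $source{$key} = { text => $text, changed => $now // time };
}

# "correct": a spelling fix only; the timestamp is deliberately left alone.
sub correct_text {
    my ($key, $text) = @_;
    $source{$key}{text} = $text;
}

sub set_translation {
    my ($lang, $key, $text, $now) = @_;
    $translation{$lang}{$key} = { text => $text, changed => $now // time };
}

# A translation is out of date when the source changed after it was written.
sub is_stale {
    my ($lang, $key) = @_;
    my $t = $translation{$lang}{$key} or return 1;    # missing counts as stale
    return $t->{changed} < $source{$key}{changed} ? 1 : 0;
}

update_text('click', 'Please clik the button', 100);
set_translation('de', 'click', 'Bitte klicken Sie auf die Schaltflaeche', 200);

correct_text('click', 'Please click the button');     # typo fix, no new timestamp
my $after_correct = is_stale('de', 'click');

update_text('click', 'Please click the button (please do not click twice)', 300);
my $after_update = is_stale('de', 'click');

print "after correct: $after_correct, after update: $after_update\n";
```

The old translation stays in %translation and keeps being served either way;
is_stale() only decides whether it shows up in the translators' report.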

Please bear in mind that we are talking about a system where
translation happens entirely online (this is currently the case, the
timestamping is a future add-on). Translators can immediately see a
test version of the site containing their words as they work. This has
the benefit of letting them see their work in context early in the
process.

Much as I admire the intellect and writings of the experts in the
field, and would like to make use of the great modules they offer, as
most of what is there seems to be based on po/mo files (a system
originally designed for offline use, and requiring recompilation steps
before translated content is actually available for use), this just
doesn't fit.

Maybe it's possible to get around this by using ties to a database, but
it seems likely to add complexity to a system which right now has the
virtue of simplicity.

So, I have to weigh up the benefits of using existing specialist
modules : the main one seems to be software support for difficult cases
like the famed "5 files in 14 folders were found" in Russian. It's
great to have that, but it hasn't actually been a major problem. The
best approach I've found has been to use alternative wordings - maybe
"Folders searched : 14 Files found : 5". Not as elegant or grammatical,
but clear, readable and it does the job - with simplicity.

The other nice feature mentioned is inheritance of languages - my
Mexican Spanish translation could inherit from Castilian Spanish. This
is great and really useful - but I also suspect I can find simple ways
of doing it (it's essentially making the "gettranslation" method know
how to get defaults in a sensible order).
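
A minimal sketch of such a lookup, assuming a fallback table that names the
next language to try. The language codes, the sample texts and the choice of
English as the final default are illustrative assumptions, not the site's
actual data.

```perl
use strict;
use warnings;

# Hypothetical fallback order: each language names where to look next.
my %fallback = (
    'es-MX' => 'es-ES',
    'es-ES' => 'en',
);

# Hypothetical sample data; 'en' is assumed to be the hub language.
my %text = (
    'en'    => { greeting => 'Hello', button => 'Please click the button' },
    'es-ES' => { greeting => 'Hola' },
    'es-MX' => {},
);

# Walk the chain until some language has the key; returns undef if none does.
sub gettranslation {
    my ($lang, $key) = @_;
    my %seen;
    while (defined $lang && !$seen{$lang}++) {    # %seen guards against loops
        return $text{$lang}{$key} if exists $text{$lang}{$key};
        $lang = $fallback{$lang};
    }
    return;
}

print gettranslation('es-MX', 'greeting'), "\n";    # found in es-ES
print gettranslation('es-MX', 'button'),   "\n";    # found in en
```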

I feel that I might be missing major parts of the argument, but having
spent quite some time reading, I don't see what. I also don't see
existing solutions addressing my practical concerns in any clear way.

Apologies for the long post, but if anyone can shine a light in the
dark, I'd be very grateful.

thanks

Daniel
 

Guest

(e-mail address removed) wrote:

: Now, what I want to happen is that the existing translations in each
: language continue to be used until the translator gets around to
: updating. Furthermore, I want the translators to be automatically
: notified (by a weekly report or such) that the old translation is
: out of date.

: Let's also consider that the original text was misspelt. I now want to
: correct it without triggering an "out of date" condition for the text.

The problem (or the challenge) in Perl is: TIMTOWTDI. There is more than
one way to do it, and you haven't told us which way you've chosen so far.

The total data volume of your site (a handful or two of languages, a
handful or two of categories, and a handful or two of words in each
category) is so manageable by today's machines in terms of memory usage
that you can easily keep the whole thing in memory, perhaps in a hash of
hashes. Look at perlreftut for a simple introduction, or perlref for the
full monty, or perldata. You can easily construct stuff like

$dict{'german'}{'color'}{'red'} = "rot";

where you could replace 'german' by a variable $language, which could even
be an element of an array, like $language[3]; ditto for your other
categories.
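
A short sketch of that nested hash with the keys taken from variables. The
name %dict, the list in @language and the sample entries are made up for
illustration.

```perl
use strict;
use warnings;

my %dict;    # language => category => English word => translation
$dict{german}{color}{red}   = 'rot';
$dict{japanese}{color}{red} = 'akai';

my @language = qw(english german mongolian japanese);
my $category = 'color';

# The keys can come from variables or array elements just as well.
my $word = $dict{ $language[3] }{$category}{red};
print "$word\n";

# List every language that has a translation for "red".
for my $lang (sort keys %dict) {
    print "$lang: $dict{$lang}{color}{red}\n"
        if exists $dict{$lang}{color}{red};
}
```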

You can always store a copy of a hash's contents on disk using the Storable
module (it's so blastingly convenient and simple to use that I wonder why
it hasn't been around from the very beginning of Perl). Or you could
(given the volume and structure of your data) encapsulate both contents and
status information in a simple XML file, using XML::Simple (which comes
with excellent documentation, btw.). At any given time you can use
XMLout() to write a modified version of your data to disk. That bridges
the gap between having your data accessible and modifiable online and
being able to store it offline, with all administration information
kept e.g. as attributes (change date, time stamp, alternatives). All this
comes without the overhead of tying yourself to a more conventional
database management system (which is simply not justified by the small
volume of your data).
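
A minimal sketch of the Storable round trip, using the module's store() and
retrieve() functions. The structure of %dict is an assumed example, and a
temporary file stands in for whatever path the site would really use.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Assumed example structure: contents plus status information per entry.
my %dict = (
    german => {
        color => {
            red => { text => 'rot', status => 'ok', timestamp => '200604032317' },
        },
    },
);

# A temporary file keeps the sketch self-contained; the real path is up to you.
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

store(\%dict, $file);          # write the whole structure to disk
my $copy = retrieve($file);    # read it back as a hash reference

print $copy->{german}{color}{red}{text}, "\n";
```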

With XML::Simple, you also have access to nested data structures in a
notation very similar to the one given two paragraphs above, and you can
define the nesting structure by whatever hierarchy seems most convenient
to you. A made-up entry could look like this:

<engoi_Dictionary>
  <!-- assuming your "hub" is english -->
  <word category="colours" english="red">
    <translation language="japanese" status="ok"
                 timestamp="200604032317">akai</translation>
  </word>
  ...
  ...
</engoi_Dictionary>

All steps necessary to link this data structure to a nested hash, and
how to access and modify this hash, are given in the documentation
of XML::Simple. I also consider your application so simple that the
ordering of entries in the resulting XML file doesn't really matter,
as long as you can sort your hash keys as you like or keep them in the
desired order in an array. You can even include choices in your XML
structure and make them accessible by a status variable like
<word choice="3" of="5">
<entry>red</entry>
<entry>rot</entry>
<entry>ulaan</entry>
<entry>akai</entry>
<entry>mig</entry>
</word>

and tell XML::Simple to store the entries below <word> in an array; with
the parsed structure in $word, $word->{'entry'}->[3] would say 'akai' if
printed.

Unfortunately, since you've given us little insight into how you originally
solved the issues of storing data etc. the things mentioned above cannot
be more than suggestions. Helpful or not, it's up to you.

Again, TIMTOWTDI. Imo, the critical task in redesigning your site is the
definition of the data model; coding it in Perl is not the challenge.

Oliver.
 

robic0

I can shine some light on your subject. Get your head out of ur ass!
Be pragmatic. Break it down into chunks of logic. Yes, even a diagram -
a flow chart. You write like you're on a blog, a location for abstract
insanity and conceptual errors.

There is no clear core in your post, just a bunch of blog ramble.
Doesn't seem appropriate in a logic setting. Who the **** are you
trying to impress?
 

danielmcbrearty

Your reply, robic0, tells more about you than it does about me.

Thanks a lot for yours, Oliver. What I wanted to discuss, though, was
not really about the implementation details. I'll try to be a bit
clearer - my first post was a bit long.

There is quite a handful of existing solutions, and no one wants to be a
wheel-reinventor. What I was trying to find out was whether any of the
existing ones handle the particular problem of managing small text
changes on the site that I describe above. I know how to do it - it's
more a question of doing it in the context of existing best solutions -
and is that even necessary? Am I the first person to want to solve this
problem? Seems hard to believe.

It doesn't look like Locale::Maketext::Lexicon deals with this
problem - actually it's not really its job to. But it supports the
Tie::Hash interface, so that means it's possible to put something
underneath that should make it all play nicely together.
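
A rough sketch of what "something underneath" could look like: a tied hash
whose reads go to a backing store. The class name My::LiveLexicon is
hypothetical, and a plain hash stands in for the database. Only the core
Tie::Hash base class is used; whether this plugs into
Locale::Maketext::Lexicon unchanged has not been checked here.

```perl
use strict;
use warnings;

# Hypothetical class: a read-only lexicon whose lookups are answered by a
# backing store (a plain hash here, standing in for the site's database).
package My::LiveLexicon;
require Tie::Hash;
our @ISA = ('Tie::Hash');

sub TIEHASH {
    my ($class, $store) = @_;
    return bless { store => $store }, $class;
}

# Every read goes to the store, so an edit is visible on the next lookup.
sub FETCH  { my ($self, $key) = @_; return $self->{store}{$key} }
sub EXISTS { my ($self, $key) = @_; return exists $self->{store}{$key} }

package main;

my %db = ('Please click the button' => 'Bitte klicken');
tie my %lexicon, 'My::LiveLexicon', \%db;

print $lexicon{'Please click the button'}, "\n";

# A translator edits the text online; no recompilation step is involved.
$db{'Please click the button'} = 'Bitte einmal klicken';
print $lexicon{'Please click the button'}, "\n";
```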

Cheers

D
 

Guest

(e-mail address removed) wrote:

: Thanks a lot for yours, Oliver. What I wanted to discuss, though, was
: not really about the implementation details. I'll try to be a bit
: clearer - my first post was a bit long.

: There is quite a handful of existing solutions, and no one wants to be a
: wheel-reinventor. What I was trying to find out was whether any of the
: existing ones handle the particular problem of managing small text
: changes on the site that I describe above. I know how to do it - it's
: more a question of doing it in the context of existing best solutions -
: and is that even necessary? Am I the first person to want to solve this
: problem? Seems hard to believe.

The central point of my posting still holds: TIMTOWTDI. Since you tell us
neither what you do nor where you felt immediate deficiencies,
nobody here can tell you how to improve your code base. That aside, I'd
rather not call a 1-line example of a nested hash an 'implementation detail';
it just shows a possible way to do it, and is as far from an implementation
as is a toy aircraft combustion engine from a full-fledged flying and
functional scaled model aircraft. It provides core functionality, or rather,
insight into the principle, but it doesn't fly. If all the (functional)
examples of code in the complete perl documentation were considered
implementation details, nobody would have to write any code any more
(or less, ymmv).

Please do not mistake me for being sarcastic, but your question sounds a
bit like: "I've built a car and crafted my own engine, and it works, actually
quite nicely. Now I want to upgrade my car. Do I take more pistons, or
which is better: four-stroke or two-stroke engine? I've also seen cars
running on fuel cells and batteries. Please advise as I do not want to
re-invent the wheel".

You leave us pretty much guessing, and nobody wants to do that. And, as
I tried to mention in my last posting, the choice of a particular solution
is intrinsically linked to your architecture, concept, data model, domain
etc. Try to straighten out how you structure your data (by language,
by semantic group, etc.) and you'll find an amazing number of ways to
define a hash (or an XML description, which I consider a very good case
of merging data and data description into one structure, in your case).
Get these basics sorted out first, then write your code, and then see
which problems arise, and once aired here, everybody can tell you whether
the problem goes with a different approach or is a conceptual problem.
This is a newsgroup about Perl, and not about universal data model
considerations.

Last, no matter what structure you use to hold your data in memory right
now, you can always make your data persistent with the Storable module;
practically no change is necessary to your code. Have you had a look at
that?

Oliver.
 

danielmcbrearty

the way it works now is that the texts are stored in a single table of
a database, one col per language. The problem is that there are no
timestamps on the text entries (I believe that the same is true of
po/mo based solutions, apart from the timestamp on the file - I could
be wrong though).

One way to fix that is to use a second table with columns of text,
timestamp and whatever else, and reference it from the first. But in
essence it doesn't matter much whether it's an SQL db or a Perl data
structure made persistent by tying to some disk structure. I was
really trying to find out

a) am I reinventing the wheel here
b) if not, how do I make this new wheel work well with the existing
horse and cart

I still find it hard to believe I am the first person to want this.

again, I apologise for the lack of conciseness of my existing offering.

cheers

D
 

Guest

(e-mail address removed) wrote:

: again, I apologise for the lack of conciseness of my existing offering.

You're still not showing any code. What kind of input do you expect from
those who still bother to read your postings?

Your data model is so uncomplicated (didn't want to say: simple) and the
sheer volume of data fits well into a nutshell, hence there is no need
for a huge database operating in the background.

Just HOW you store your modification date is sooooo much a matter of your
personal taste and coding preferences. You can either have hash elements
like $lastmodified{$language}{$entry[5]}="2006/04/04" (you won't even
notice a difference in performance); you can also fold everything into
an XML description like <word lastmod="20060404">orange</word>. Again:
it's up to your data model how you arrange, nest and administer your
data, including change management. There is no question of "re-inventing
the wheel" here; this is a basic household chore of writing a piece of software.
This task is mundane but has to be done, and no module etc. will ever
relieve you of that.
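
A small sketch of how such modification dates could drive a report of
translations that need attention. The hash %source_changed, the sub
stale_entries and all sample dates are assumptions for illustration; dates
are kept as YYYYMMDD strings so that string comparison orders them correctly.

```perl
use strict;
use warnings;

# Hypothetical sample data: last change of each source text and of each
# translation, as sortable YYYYMMDD strings.
my %source_changed = (button => '20060404', greeting => '20060101');
my %lastmodified   = (
    german   => { button => '20060301', greeting => '20060102' },
    japanese => { button => '20060405' },
);

# Entries that are missing for a language or older than the source text.
sub stale_entries {
    my ($lang) = @_;
    return grep {
        my $done = $lastmodified{$lang}{$_};
        !defined $done || $done lt $source_changed{$_};
    } sort keys %source_changed;
}

for my $lang (sort keys %lastmodified) {
    my @todo = stale_entries($lang);
    print "$lang: ", (@todo ? join(', ', @todo) : 'up to date'), "\n";
}
```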

Sort out your requirements and your structure (in your first posting
you already mentioned you have a mental picture how to do things), and
certainly there will be a perfect fit among the many convenient data
structures Perl can offer. There is no need for more than that. Using
any ready-made module _still_ requires you to have a clear understanding
of how you want to manage your data.

Oliver.
 

Guest

(e-mail address removed) wrote:
: the way it works now is that the texts are stored in a single table of
: a database, one col per language. The problem is that there are no
: timestamps on the text entries (I believe that the same is true of
: po/mo based solutions, apart from the timestamp on the file - I could
: be wrong though).

So you have a two-dimensional data structure already. Make it three-
dimensional, storing administration information in the third dimension.
Look at perlreftut and perlref and search the perl documentation for
how to create arrays of arrays and you can extend your existing data
structure more or less on the fly, leaving everything else intact.

There is no need for any module here since you do not require new
functions or methods, you do not introduce new objects, etc. You
simply extend your existing data structure, and if you run into
problems, then, please, show us the code which caused problems,
otherwise I'm going to make myself a complete fool that I am still
the only one to answer your postings.

Oliver.
 

robic0

Looking for a security friend in Oliver?
I don't believe in abstract design software therefore I hold the
position stated. It's possible to take an entirely abstract idea
and throw it away because of a defect, a conceptual error.
If it's an abstraction based on your definitions and no one else's
(unless you're Einstein who needs to explain) then it can never be
analyzed. The diagnosis is of a mental disorder.

Furthermore, you can't mix abstract with logic unless you equate
them. Otherwise it's all in your pea-brain head.

For someone who knows every single detail in M$hit's OS, to think
of the entire thing is instantaneous. An abstraction in your mind
should be a thought experiment. Some people that have designed
and analyzed a multitude of concepts brought to fruition, do it
in their minds ahead of time, the whole thing. If you're serious,
the hardest taskmaster in the world is yourself. Why if it doesn't
work for you will it work for someone else? I mean you expressed
the sentiment that you know it will work.

Here's my conclusion I stated a couple of days ago, with some added
criticism.

Get the **** back on your goddmed schizophrenic medication, with
some additional thorazine before you fuckin post here man!!

And btw, KISS MY MOTHERFUCKIN ASS, jackoff!!!
 
