Unicode challenges for an I18N Instiki

  • Thread starter David Heinemeier Hansson

David Heinemeier Hansson

Hi there,

I'm preparing an I18N version of Instiki, but I'm running into a few
issues that are holding me back. First of all, even though constructs
like \w will correctly recognize Unicode characters, [:upper:] does not
include capital Unicode letters.

That's something of a problem if you want to allow wiki words like
ÆbletærtenVarMåskeØensBedste ("the apple pie was perhaps the island's
best" in Danish).

I've currently implemented a hack where I just keep a long list of
the capital Unicode letters that I know for Danish ("ÅØÆ"). Such a list
could probably be found on some Unicode site (links would be
great!), but isn't there a cleaner way?
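For what it's worth, later Ruby versions (1.9 onwards, not the 1.8 shown in the trace below) support Unicode property classes in regexps, so an uppercase letter in any script can be matched with `\p{Lu}` instead of a hand-kept list. A minimal sketch, assuming a modern Ruby and UTF-8 strings; the `WIKI_WORD` pattern and the `wiki_words` helper are hypothetical and not Instiki's actual WikiWord rule:

```ruby
# Assumes Ruby 1.9+ and UTF-8 strings. \p{Lu} matches any uppercase letter
# and \p{Ll} any lowercase letter, whatever the script.
# Hypothetical WikiWord rule: two or more capitalised parts run together,
# not preceded or followed by another letter or digit.
WIKI_WORD = /(?<![\p{L}\p{N}])\p{Lu}\p{Ll}+(?:\p{Lu}\p{Ll}+)+(?![\p{L}\p{N}])/

# Returns every WikiWord found in the given text, in order of appearance.
def wiki_words(text)
  text.scan(WIKI_WORD)
end
```

Because the pattern has no capturing groups, `scan` returns the whole matches, e.g. both ÆbletærtenVarMåskeØensBedste and DuÆlskerLegetøj from a Danish sentence containing them.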

Also, the URI parser in WEBrick seems to break down on URL-encoded
Unicode characters. "DuÆlskerLegetøj" ("you love toys" in Danish)
breaks down like this:

- -> /wiki/show/Du%C3%86lskerLeget%C3%B8j
[2004-04-24 21:45:15] ERROR URI::InvalidURIError: bad URI(is not URI?):
/wiki/new/DuÆlskerJoLegetøj
/usr/local/lib/ruby/1.8/uri/common.rb:345:in `split'
/usr/local/lib/ruby/1.8/uri/common.rb:368:in `parse'
/usr/local/lib/ruby/1.8/uri/generic.rb:840:in `merge0'
/usr/local/lib/ruby/1.8/uri/generic.rb:799:in `merge'
/usr/local/lib/ruby/1.8/webrick/httpresponse.rb:146:in `setup_header'
/usr/local/lib/ruby/1.8/webrick/httpresponse.rb:84:in `send_response'
/usr/local/lib/ruby/1.8/webrick/httpserver.rb:67:in `run'

I'd be very grateful for any tips on how to handle this. The faster I
get it solved, the sooner a new Instiki release will see the light of
day :)

P.S.: As a treat, I can tell you that the new release has a new in-wiki
configuration page that allows you to:
* switch between markup languages (try them to find the one you like
  best without stopping and restarting Instiki)
* make additions to the stylesheet (easily tweak the entire look of
  Instiki)
* rename/move the entire web
* add/remove password protection
 
D

David Heinemeier Hansson

> Hi there,
> I'm preparing an I18N version of Instiki, but I'm running into a few
> issues that are holding me back. First of all, even though constructs
> like \w will correctly recognize Unicode characters, [:upper:] does
> not include capital Unicode letters.

It turned out that just keeping a list of capital letters in Latin,
Greek, and Cyrillic worked out great. It would of course be nice if
[:upper:] could do the same, but it's not that big of a problem.
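The list-based workaround can be sketched as explicit character ranges. The ranges below are an assumption for illustration (basic Latin and Latin-1 letters, the Greek alphabet with its accented forms, and the basic Cyrillic block), and `UPPER`, `LOWER`, `LETTER` and `WIKI_WORD` are hypothetical names, not Instiki's code. The sketch targets a modern Ruby, where regexps on UTF-8 strings work on characters rather than bytes:

```ruby
# Hand-kept lists of capital and lowercase letters (illustrative ranges):
#   Latin:    A-Z, À-Ö, Ø-Þ          /  a-z, ß-ö, ø-ÿ
#   Greek:    Ά, Έ-Ί, Ό, Ύ, Ώ, Α-Ω   /  ά-ώ
#   Cyrillic: Ѐ-Я                    /  а-џ
UPPER = "A-Z\u00C0-\u00D6\u00D8-\u00DE" \
        "\u0386\u0388-\u038A\u038C\u038E\u038F\u0391-\u03A9" \
        "\u0400-\u042F"
LOWER = "a-z\u00DF-\u00F6\u00F8-\u00FF\u03AC-\u03CE\u0430-\u045F"
LETTER = "#{UPPER}#{LOWER}"

# Hypothetical WikiWord rule: two or more capitalised parts run together,
# with no listed letter directly before or after the word.
WIKI_WORD = /(?<![#{LETTER}])[#{UPPER}][#{LOWER}]+(?:[#{UPPER}][#{LOWER}]+)+(?![#{LETTER}])/
```

Compared with `\p{Lu}`, the explicit list only covers the scripts someone remembered to add, but it behaves the same on any Ruby that handles UTF-8 character classes.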

> Also, the URI parser in WEBrick seems to break down on URL-encoded
> Unicode characters. "DuÆlskerLegetøj" ("you love toys" in Danish)
> breaks down like this:

I was a fool and didn't escape the URL properly.
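Escaping the page name before it goes into a URL could look like the following on a modern Ruby. This is a minimal sketch, not Instiki's code: the `page_path` helper is hypothetical, the `/wiki/<action>/` layout is taken from the log in the first post, and `ERB::Util.url_encode` from the standard library does the percent-encoding of the UTF-8 bytes:

```ruby
require "erb"
require "uri"

# Hypothetical helper: builds a wiki path with the page name percent-encoded.
# ERB::Util.url_encode escapes every byte outside [A-Za-z0-9_.~-], so the
# UTF-8 bytes of Æ (C3 86) become %C3%86 and those of ø (C3 B8) become %C3%B8.
def page_path(action, page_name)
  "/wiki/#{action}/#{ERB::Util.url_encode(page_name)}"
end

# The escaped path is plain ASCII, so URI.parse accepts it.
escaped = URI.parse(page_path("new", "DuÆlskerLegetøj"))
puts escaped.path

# The unescaped path contains raw non-ASCII characters and is rejected.
begin
  URI.parse("/wiki/new/DuÆlskerLegetøj")
rescue URI::InvalidURIError => e
  puts "rejected: #{e.class}"
end
```

For the "show" action this produces the same Du%C3%86lskerLeget%C3%B8j form that appears in the access log line above.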

So yes, Instiki with I18N (Latin, Greek, and Cyrillic) wiki words is
forthcoming. Oh yeah, [[wiki link]] and [[c]] work now too.
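Bracketed links of the [[wiki link]] kind can be picked out with a simple pattern. This is a hypothetical sketch: the `BRACKET_LINK` name, the `bracket_links` helper and the rule that link text may not contain `]` are assumptions, not Instiki's actual parser:

```ruby
# Matches [[...]] and captures the text between the double brackets.
# Assumption: link text may contain spaces and any Unicode letters,
# but no closing bracket.
BRACKET_LINK = /\[\[([^\]]+)\]\]/

# Returns the link texts in order of appearance.
def bracket_links(text)
  # scan yields one single-element array per match because of the
  # capture group, so flatten turns it into a plain list of strings.
  text.scan(BRACKET_LINK).flatten
end
```

Unlike the WikiWord patterns, this needs no notion of capital letters at all, which is why free-form and single-letter links such as [[c]] can work alongside I18N wiki words.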