David Heinemeier Hansson
Hi there,
I'm preparing an I18N version of Instiki, but I'm running into a few
issues that are holding me back. First of all, even though constructs
like \w correctly recognize Unicode characters, [:upper:] does not
include capital Unicode letters.
That's somewhat of a problem if you want to allow wiki words like
ÆbletærtenVarMåskeØensBedste ("the apple pie was perhaps the island's
best" in Danish).
I've currently implemented a hack where I just keep a list of the
capital Unicode letters that I know for Danish ("ÅØÆ"). A complete list
could probably be found at some Unicode site (links for that would be
great!), but I was wondering whether there is a cleaner way.
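For illustration only, here is a minimal sketch of what a cleaner way could look like. It assumes a later Ruby (1.9 or newer) with UTF-8 strings, where the regexp engine supports the Unicode properties \p{Lu} (uppercase letter) and \p{Ll} (lowercase letter); it does not apply to the Ruby 1.8 described in this post. The WIKI_WORD pattern and the wiki_word? helper are hypothetical names, not Instiki's actual code.

```ruby
# encoding: utf-8

# Hypothetical wiki-word pattern (an assumption, not Instiki's real rule):
# two or more segments, each one uppercase letter followed by lowercase letters.
# \p{Lu} and \p{Ll} cover all Unicode letters, so no hand-kept "ÅØÆ" list is needed.
WIKI_WORD = /\A(?:\p{Lu}\p{Ll}+){2,}\z/

# Returns true when the text looks like a CamelCase wiki word.
def wiki_word?(text)
  WIKI_WORD.match?(text)
end

wiki_word?("ÆbletærtenVarMåskeØensBedste") # => true
wiki_word?("æbletærte")                    # => false
```

The property classes replace the hand-maintained letter list, so the same pattern should also cover other languages' capitals without changes.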
Also, the URI parser in WEBrick seems to break down on URL-encoded
Unicode characters. "DuÆlskerLegetøj" ("you love toys" in Danish)
breaks down like this:
-> /wiki/show/Du%C3%86lskerLeget%C3%B8j
[2004-04-24 21:45:15] ERROR URI::InvalidURIError: bad URI(is not URI?):
/wiki/new/DuÆlskerJoLegetøj
/usr/local/lib/ruby/1.8/uri/common.rb:345:in `split'
/usr/local/lib/ruby/1.8/uri/common.rb:368:in `parse'
/usr/local/lib/ruby/1.8/uri/generic.rb:840:in `merge0'
/usr/local/lib/ruby/1.8/uri/generic.rb:799:in `merge'
/usr/local/lib/ruby/1.8/webrick/httpresponse.rb:146:in `setup_header'
/usr/local/lib/ruby/1.8/webrick/httpresponse.rb:84:in `send_response'
/usr/local/lib/ruby/1.8/webrick/httpserver.rb:67:in `run'
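The trace suggests the failure happens when the response header is built from a path that still contains raw non-ASCII characters. One possible workaround, sketched here under that assumption, is to percent-encode the page name before it goes into a redirect. The escape_page_name helper is a hypothetical name, and the sketch targets a later Ruby (2.0 or newer, for String#b) rather than the 1.8 shown in the trace.

```ruby
# encoding: utf-8
require "uri"

# Hypothetical helper (not part of WEBrick or Instiki): percent-encode every
# byte of the page name that is not an RFC 3986 unreserved character.
def escape_page_name(name)
  # Work on raw bytes so each byte of a multi-byte UTF-8 character
  # is encoded separately, e.g. "Æ" becomes "%C3%86".
  escaped = name.b.gsub(/[^A-Za-z0-9\-._~]/) { |byte| "%%%02X" % byte.ord }
  escaped.force_encoding("US-ASCII")
end

path = "/wiki/show/" + escape_page_name("DuÆlskerLegetøj")
# path is now "/wiki/show/Du%C3%86lskerLeget%C3%B8j"

# The escaped path is plain ASCII, so the URI parser accepts it.
URI.parse(path)
```

Whether this fits depends on where the unescaped path enters the response; the sketch only shows that an escaped path parses.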
I'd be very grateful for any tips on how to handle this. The faster I
get it solved, the faster a new Instiki release will see the light of
day.
P.S.: As a treat, I can tell you that the new release has a new in-wiki
configuration page that allows you to:
* Switch between markup languages (test to find what you like best
without starting/stopping Instiki)
* Make additions to the stylesheet (easily tweak the entire look of
Instiki)
* Rename/move the entire web
* Add/remove password protection