[ANN] ICU4R 0.1.0 - initial release

Lugovoi Nikolai

== ICU4R v.0.1.0 - initial release ==

= Abstract

ICU4R is an attempt to provide better Unicode support for Ruby, based
on the ICU library.

Project Site: http://rubyforge.org/projects/icu4r/

Download: http://rubyforge.org/frs/download.php/8116/icu4r-0.1.0.tar.gz

RDoc: http://icu4r.rubyforge.org/

= Install Notes

To build ICU4R you'll need GCC and ICU v3.4 libraries, which can be
downloaded from
http://ibm.com/software/globalization/icu/downloads.jsp

Build and install:
ruby extconf.rb && make && make check && make install

= Features

ICU4R is a Ruby C-extension binding for the ICU library.
It does NOT mirror the full ICU object hierarchy; rather, it is a set of simple
interfaces for some practically useful functionality, and provides:

- UString : String-like class with internal UTF-16 storage;
- UCA rules for UString comparisons (<=>, casecmp);
- Unicode regular expressions;
- encoding (codepage) conversion;
- Unicode normalization;
- access to resource bundles, including ICU locale data;
- transliteration, also rule-based;

A bunch of locale-sensitive functions:
- upcase/downcase;
- string collation;
- string search;
- iterators over text line/word/char/sentence breaks;
- message formatting (number/currency/string/time);
- date and number parsing.
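
To illustrate why the normalization item in the list above matters, here is a minimal sketch. It deliberately does not use ICU4R's API, which this announcement does not show. It assumes a modern Ruby (2.2 or later), whose built-in String#unicode_normalize performs the same kind of operation, and the helper names are made up for the example.

```ruby
# Assumption: Ruby 2.2+; this is plain Ruby, not the ICU4R API.
# Helper names below are hypothetical.

COMPOSED   = "\u00E9"    # "é" as a single code point (U+00E9)
DECOMPOSED = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)

# Plain comparison: looks at the code points as stored.
def same_raw?(a, b)
  a == b
end

# Comparison after bringing both strings to normalization form NFC.
def same_after_nfc?(a, b)
  a.unicode_normalize(:nfc) == b.unicode_normalize(:nfc)
end

puts "raw comparison:        #{same_raw?(COMPOSED, DECOMPOSED)}"
puts "normalized comparison: #{same_after_nfc?(COMPOSED, DECOMPOSED)}"
```

The two strings render identically but only compare equal once both are normalized, which is why a Unicode-aware string class typically offers normalization alongside comparison.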

== DISCLAIMER ==

The code is still slow and inefficient, and may have many security issues, memory leaks,
bugs, inconsistent documentation, and an incomplete test suite. Use it at
your own risk.

Criticism, bug reports, and feature requests are welcome :)

WBR, Nikolai Lugovoi <[email protected]>
 
Alex Fenton

Lugovoi said:
==ICU4R v.0.1.0 - initial release ==

ICU4R is an attempt to provide better Unicode support for Ruby, based
on ICU library.

Thanks, this is really interesting - not heard of the ICU library before.

There have been a few threads on Ruby + Unicode recently. The answer 'it's not broken' is true in the sense that Ruby won't mess with your low-level UTF-8/16 bytes, but the lack of support for character semantics is a big hindrance when writing multilingual text-handling apps. I mean things like character classes such as [:alpha:] and methods like String#upcase that actually work on non-ASCII text. Looks like ICU4R could address this.
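
As a small illustration of the difference between byte-level and character-aware behaviour described above, the following sketch contrasts ASCII-only and Unicode-aware operations. It assumes Ruby 2.4 or later, where the built-in String#upcase accepts an :ascii option. It does not use ICU4R, and the helper names are invented for this example.

```ruby
# Assumption: Ruby 2.4+ (not the Ruby 1.8 of this thread) and UTF-8 strings.
# Hypothetical helper names; none of this is ICU4R's API.

# Case mapping restricted to a-z, similar to what byte-oriented code does.
def ascii_upcase(str)
  str.upcase(:ascii)
end

# Full Unicode case mapping.
def unicode_upcase(str)
  str.upcase
end

# True if every character is an ASCII letter.
def ascii_alpha?(str)
  !!(str =~ /\A[a-zA-Z]+\z/)
end

# True if every character is alphabetic under Unicode rules.
def unicode_alpha?(str)
  !!(str =~ /\A[[:alpha:]]+\z/)
end

word = "\u00E4rger"          # "ärger"
puts ascii_upcase(word)      # the "ä" is left untouched
puts unicode_upcase(word)    # the "ä" becomes "Ä"
puts ascii_alpha?(word)
puts unicode_alpha?(word)
```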

But I couldn't try it, as the build failed on OS X 10.3. ICU installed to /usr/local without a hitch, and extconf.rb ran without problems, but make died with:

SCIPIUS:~/installers/ruby/icu4r alex$ make
gcc -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c ustring.c
ustring.c: In function `icu_ustr_new_set':
ustring.c:169: warning: assignment discards qualifiers from pointer target type
ustring.c: In function `icu_reg_get_replacement':
ustring.c:1854: warning: passing arg 4 of `ustr_splice_units' discards qualifiers from pointer target type
ustring.c:1864: warning: passing arg 4 of `ustr_splice_units' discards qualifiers from pointer target type
ustring.c: In function `icu_ustr_substr':
ustring.c:2296: warning: unused variable `n'
g++ -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c fmt.cpp
cc -dynamic -bundle -undefined suppress -flat_namespace -licuuc -licui18n -licudata -L"/usr/local/lib" -o ustring.bundle ustring.o fmt.o -ldl -lobjc
ld: multiple definitions of symbol _rb_cUString
ustring.o definition of _rb_cUString in section (__DATA,__common)
fmt.o definition of _rb_cUString in section (__DATA,__common)
make: *** [ustring.bundle] Error 1

SCIPIUS:~/installers/ruby/icu4r alex$ gcc -v
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1666)

HTH
alex
 
Gyoung-Yoon Noh

Great work. I'll check it out next week.
 
Lugovoi Nikolai

Alex, thank you for pointing out this bug.
I had no compile problems with GCC 3.4.2, GCC 4.0, or MSVC++ 7.1, so
I didn't catch it; it looks like GCC 3.3 has different default linking
options.

Could you try the 0.1.1 release?
http://rubyforge.org/frs/download.php/8168/icu4r-0.1.1.tar.gz

(Sorry for the late response)

 
Alex Fenton

Lugovoi said:
Could you try 0.1.1 release ?
http://rubyforge.org/frs/download.php/8168/icu4r-0.1.1.tar.gz

Thanks for this. It compiles fine on OS X 10.3 (see below), but segfaults when I run the Ruby test, with:

dyld: ruby Undefined symbols:
___gxx_personality_v0
Trace/BPT trap

Let's take it off-list, unless this rings any bells for anyone.

alex

SCIPIUS:~/icu4r alex$ make clean; make; ruby test/test_ustring.rb
gcc -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c ustring.c
g++ -fno-common -Wall -I. -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I/usr/local/lib/ruby/1.8/powerpc-darwin7.9.0 -I. -c fmt.cpp
cc -dynamic -bundle -undefined suppress -flat_namespace -licuuc -licui18n -licudata -L"/usr/local/lib" -o ustring.bundle ustring.o fmt.o -ldl -lobjc
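
One possible explanation for the dyld error above, which the thread does not confirm: ___gxx_personality_v0 is a symbol provided by the GNU C++ runtime (libstdc++), and the link line above combines a C++ object (fmt.o) using cc without naming that library. Because -undefined suppress defers unresolved symbols to load time, such a gap would only surface when Ruby loads the bundle. A minimal extconf.rb sketch that asks mkmf to link the C++ runtime explicitly might look like this. It is not the project's actual extconf.rb, and the library checks are assumptions based on the link line in the log.

```ruby
# extconf.rb (hypothetical sketch, not ICU4R's real build script)
require "mkmf"

# ICU libraries named on the link line in the log above.
%w[icuuc icui18n icudata].each do |lib|
  abort "missing ICU library: #{lib}" unless have_library(lib)
end

# Link the C++ runtime explicitly, since fmt.cpp is compiled with g++
# but the final link step is driven by the C compiler.
have_library("stdc++")

# "ustring" matches the ustring.bundle target seen in the log.
create_makefile("ustring")
```

Running this requires the Ruby development headers and ICU to be installed, so treat it as a build-configuration fragment to adapt rather than a drop-in fix.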

 
Michal Suchanek

On 1/24/06, Alex Fenton wrote:
> Lugovoi Nikolai wrote:
>
> > Could you try 0.1.1 release ?
> > http://rubyforge.org/frs/download.php/8168/icu4r-0.1.1.tar.gz
>
>  thanks for this, it compiles fine on OS X 10.3 (see below), but segfaults when I run the ruby test with
>
> dyld: ruby Undefined symbols:
> ___gxx_personality_v0
> Trace/BPT trap
>

Usually C++ code compiled by different versions of gcc linked together.

Check that all the stuff and the libraries it links with are compiled
with the same gcc.

Thanks

Michal
 
