upcase in 1.9.2-preview1

Brian Candler · Jul 29, 2009

Can someone confirm whether this is intentional or not?

RUBY_DESCRIPTION => "ruby 1.9.2dev (2009-07-18 trunk 24186) [i686-linux]"
s = "über" => "über"
s.upcase => "üBER"

That is, a lower-case "ü" is not uppercased to "Ü". And yet, "Ü" is
detected as an upper-case letter:

"ÜBER" =~ /[[:upper:]]/ => 0
"ÜBER" =~ /[[:lower:]]/ => nil

Thanks,

Brian.

Bertram Scharpf · Jul 29, 2009

Hi,

On Wednesday, 29 Jul 2009, 16:43:36 +0900, Brian Candler wrote:

Can someone confirm whether this is intentional or not?

RUBY_DESCRIPTION => "ruby 1.9.2dev (2009-07-18 trunk 24186) [i686-linux]"
s = "über" => "über"
s.upcase => "üBER"

That is, a lower-case "ü" is not uppercased to "Ü". And yet, "Ü" is
detected as an upper-case letter:

"ÜBER" =~ /[[:upper:]]/ => 0
"ÜBER" =~ /[[:lower:]]/ => nil

By the way: I noticed that there are some Unicode characters for which
it is not clear whether they are upper or lower case. For example, the
DZ digraph has a form Dz that is the downcased version of DZ
and the upcased version of dz.

U+01F1 Ǳ
U+01F2 ǲ
U+01F3 ǳ

http://en.wikipedia.org/wiki/Latin_Extended-B_unicode_block

Vim's ~ operator cycles through the three values. How will or
should Ruby treat them?

Bertram

--
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de

Brian Candler · Jul 29, 2009

Bertram said:
Vim's ~ operator cycles through the three values. How will or
should Ruby treat them?

I only have access to a slightly older 1.9.2 here, but:

RUBY_DESCRIPTION => "ruby 1.9.2dev (2009-04-08 trunk 23158) [i686-linux]"
["\u01f1", "\u01f2", "\u01f3"].each { |c| puts c =~ /[[:lower:]]/ }


0
=> ["Ǳ", "ǲ", "ǳ"]

["\u01f1", "\u01f2", "\u01f3"].each { |c| puts c =~ /[[:upper:]]/ }
0


=> ["Ǳ", "ǲ", "ǳ"]

So the first is upper, the third is lower, and the second is neither.

upcase/downcase does not affect any of them, but I'm not sure if the
current behaviour is correct, which is why I started this thread.
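
For what it's worth, Unicode gives U+01F2 its own general category, titlecase letter (Lt), which would explain why it matches neither class above. Here is a rough sketch that checks this with the \p{...} property classes in Ruby regexps. The CATEGORIES hash and the category_of helper are names I made up for the example, and it assumes UTF-8 strings:

```ruby
# Hypothetical helper: report the Unicode general category of a single
# letter, using the \p{...} property classes of Ruby regexps.
CATEGORIES = {
  'Lu (uppercase)' => /\A\p{Lu}\z/,
  'Lt (titlecase)' => /\A\p{Lt}\z/,
  'Ll (lowercase)' => /\A\p{Ll}\z/
}

def category_of(char)
  # Return the name of the first category whose pattern matches.
  found = CATEGORIES.find { |_name, pattern| char =~ pattern }
  found ? found.first : 'other'
end

["\u01f1", "\u01f2", "\u01f3"].each do |c|
  puts format('U+%04X %s', c.ord, category_of(c))
end
```

This only classifies the three code points; it says nothing about what upcase or downcase should do with them.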

Brian Candler · Jul 29, 2009

Brian said:
Can someone confirm whether this is intentional or not?

=> "üBER"

To answer my own question: looking at the source code, it looks like
this *is* intentional. From encoding.c:

int
rb_enc_toupper(int c, rb_encoding *enc)
{
    return (ONIGENC_IS_ASCII_CODE(c)?ONIGENC_ASCII_CODE_TO_UPPER_CASE(c):(c));
}

int
rb_enc_tolower(int c, rb_encoding *enc)
{
    return (ONIGENC_IS_ASCII_CODE(c)?ONIGENC_ASCII_CODE_TO_LOWER_CASE(c):(c));
}

That is: only ASCII characters (potentially encoded as UTF16 or
whatever) are eligible for case conversion.
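
To make the effect concrete, here is a minimal Ruby sketch of the same rule. It is not how the interpreter does it internally, just an illustration: only a-z are mapped and everything else passes through untouched. The method name ascii_upcase is invented for the example.

```ruby
# Hypothetical helper: mimics the ASCII-only rule of rb_enc_toupper.
def ascii_upcase(str)
  # String#tr maps only the listed ASCII range; any other character
  # (including non-ASCII letters such as "ü") is left as it is.
  str.tr('a-z', 'A-Z')
end

s = "\u00fcber"        # "über"
puts ascii_upcase(s)   # the leading "ü" stays lower-case
```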

Iñaki Baz Castillo · Jul 29, 2009

2009/7/29 Brian Candler said:
That is: only ASCII characters (potentially encoded as UTF16 or
whatever) are eligible for case conversion.

Is it the correct approach?
For me it's very clear that the upcase version of á is Á.

--
Iñaki Baz Castillo

Brian Candler · Jul 29, 2009

Iñaki Baz Castillo said:
Is it the correct approach?
For me it's very clear that the upcase version of á is Á.

There are perfectly clear Unicode rules for case conversion, but they
are not simple. In some cases you need to replace one character by two
(e.g. ß to SS).

There is a useful discussion about this from Python's point of view
here:
http://bytes.com/groups/python/22883-convert-unicode-lower-uppercase
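
To illustrate why a per-character lookup is not enough: later Ruby releases (2.4 and up, as far as I know, so not the 1.9.2 discussed in this thread) apply the full Unicode mappings in String#upcase, and the result can be longer than the input. A small sketch, assuming such a Ruby:

```ruby
# Assumes Ruby 2.4 or later, where String#upcase applies full Unicode
# case mapping. This is NOT the behaviour of the 1.9.2 build in this thread.
word  = "stra\u00dfe"    # "straße"
upper = word.upcase

puts upper               # the single "ß" becomes the two letters "SS"
puts "#{word.length} -> #{upper.length} characters"

# The :ascii option restricts conversion to a-z, like the 1.9 result above.
puts "\u00fcber".upcase(:ascii)
```

A one-to-two mapping like this cannot be expressed by a function that takes one code point and returns one code point, which is the shape of rb_enc_toupper.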
