upcase in 1.9.2-preview1

Brian Candler

Can someone confirm whether this is intentional or not?
RUBY_DESCRIPTION
=> "ruby 1.9.2dev (2009-07-18 trunk 24186) [i686-linux]"
s = "über"
=> "über"
s.upcase
=> "üBER"

That is, a lower-case "ü" is not uppercased to "Ü". And yet, "Ü" is
detected as an upper-case letter:
"ÜBER" =~ /[[:upper:]]/
=> 0
"ÜBER" =~ /[[:lower:]]/
=> nil

Thanks,

Brian.
 
Bertram Scharpf

Hi,

On Wednesday, 29 Jul 2009, 16:43:36 +0900, Brian Candler wrote:
Can someone confirm whether this is intentional or not?

RUBY_DESCRIPTION
=> "ruby 1.9.2dev (2009-07-18 trunk 24186) [i686-linux]"
s = "über"
=> "über"
s.upcase
=> "üBER"

That is, a lower-case "ü" is not uppercased to "Ü". And yet, "Ü" is
detected as an upper-case letter:

"ÜBER" =~ /[[:upper:]]/
=> 0
"ÜBER" =~ /[[:lower:]]/
=> nil

By the way: I noticed that for some Unicode characters it is not
clear whether they are upper or lower case. For example, the
DZ digraph has a form Dz that is the downcased version of DZ
and the upcased version of dz.

U+01F1 Ǳ
U+01F2 ǲ
U+01F3 ǳ

http://en.wikipedia.org/wiki/Latin_Extended-B_unicode_block

Vim's ~ operator cycles through the three values. How will or
should Ruby treat them?

Bertram


--
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de
 
Brian Candler

Bertram said:
Vim's ~ operator cycles through the three values. How will or
should Ruby treat them?

I only have access to a slightly older 1.9.2 here, but:
RUBY_DESCRIPTION => "ruby 1.9.2dev (2009-04-08 trunk 23158) [i686-linux]"
["\u01f1", "\u01f2", "\u01f3"].each { |c| puts c =~ /[[:lower:]]/ }


0
=> ["DZ", "Dz", "dz"]
["\u01f1", "\u01f2", "\u01f3"].each { |c| puts c =~ /[[:upper:]]/ }
0


=> ["DZ", "Dz", "dz"]

So the first is upper, the third is lower, and the second is neither :)

upcase/downcase does not affect any of them - but I'm not sure if the
current behaviour is correct, which is why I started this thread.
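
A small sketch of the classification above, using Unicode general-category properties instead of the POSIX bracket classes. It assumes a Ruby whose regexps support \p{Lu}, \p{Lt} and \p{Ll}; the helper name letter_case is made up for illustration. The middle form, which is "neither" in the test above, falls into the titlecase category (Lt).

```ruby
# Hypothetical helper: classify one character by its Unicode general category.
def letter_case(ch)
  case ch
  when /\p{Lu}/ then :upper   # Uppercase_Letter
  when /\p{Lt}/ then :title   # Titlecase_Letter
  when /\p{Ll}/ then :lower   # Lowercase_Letter
  else :other
  end
end

# The three DZ digraph forms discussed in this thread.
["\u01f1", "\u01f2", "\u01f3"].each do |c|
  puts format("U+%04X %s", c.ord, letter_case(c))
end
```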
 
Brian Candler

Brian said:
Can someone confirm whether this is intentional or not?

=> "üBER"

To answer my own question: looking at the source code, it looks like
this *is* intentional. From encoding.c:

int
rb_enc_toupper(int c, rb_encoding *enc)
{
    return (ONIGENC_IS_ASCII_CODE(c)?ONIGENC_ASCII_CODE_TO_UPPER_CASE(c):(c));
}

int
rb_enc_tolower(int c, rb_encoding *enc)
{
    return (ONIGENC_IS_ASCII_CODE(c)?ONIGENC_ASCII_CODE_TO_LOWER_CASE(c):(c));
}

That is: only ASCII characters (potentially encoded as UTF16 or
whatever) are eligible for case conversion.
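
To make the effect concrete, here is a rough Ruby transliteration of that logic. It is only a sketch: the method names ascii_toupper and ascii_upcase are invented here, the input is assumed to be a UTF-8 string, and the ASCII mapping is assumed to be the plain a-z to A-Z shift.

```ruby
# Hypothetical per-codepoint mapping modelled on the C function quoted above.
def ascii_toupper(codepoint)
  # Only a-z (0x61..0x7A) are mapped; every other codepoint is returned unchanged.
  (0x61..0x7A).cover?(codepoint) ? codepoint - 0x20 : codepoint
end

# Apply the mapping to every codepoint of a UTF-8 string.
def ascii_upcase(str)
  str.codepoints.map { |cp| ascii_toupper(cp) }.pack("U*")
end

puts ascii_upcase("über")
```

With this mapping the "ü" (U+00FC) is outside the ASCII range and passes through untouched, which matches the "üBER" result at the start of the thread.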
 
Iñaki Baz Castillo

2009/7/29 Brian Candler said:
That is: only ASCII characters (potentially encoded as UTF16 or
whatever) are eligible for case conversion.

Is it the correct approach?
For me it's very clear that the upcase version of á is Á.


--
Iñaki Baz Castillo
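
For comparison, a short sketch of the two behaviours side by side. It assumes Ruby 2.4 or later, where String#upcase maps non-ASCII letters by default and accepts an :ascii option that restricts the mapping to A-Z; on the 1.9.2 builds discussed in this thread that option does not exist and only the ASCII-only result is available.

```ruby
# Assumes Ruby 2.4+; the sample string is arbitrary.
s = "áé über"

full  = s.upcase          # full Unicode case mapping
ascii = s.upcase(:ascii)  # ASCII-only mapping, like the 1.9 behaviour in this thread

puts full
puts ascii
```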
 
