downcase part of a string

ilhamik

hi,
I want to downcase a string but without specific parts.
for example:
msg = "THIS is a Text and (NO Change HERE) HELP"

After downcasing, it should look like "this is a text and (NO Change HERE) help".

I don't want to downcase the letters in parentheses.
How can I do that? I tried it with regular expressions but couldn't get it
to work.

Thanks for any help
 
Kalman Noel

ilhamik:
hi,
I want to downcase a string but without specific parts.
for example:
msg = "THIS is a Text and (NO Change HERE) HELP"

If the parentheses occur only once:

if msg =~ /\(.*?\)/
  $~.pre_match.downcase + $~[0] + $~.post_match.downcase
end

Kalman
 
Peter Szinek

ilhamik said:
hi,
I want to downcase a string but without specific parts.
for example:
msg = "THIS is a Text and (NO Change HERE) HELP"

Hi,

This is kind of old school and I am sure there are nicer rubyish
solutions for it, but at least it works for multiple parentheses as well:

original = msg.scan(/\(.+?\)/)
msg.downcase!
altered = msg.scan(/\(.+?\)/)
original.each_with_index { |stuff, i| msg.sub!(altered[i], stuff) }
 
ilhamik

No, they can occur more than once.

Kalman said:
ilhamik:
hi,
I want to downcase a string but without specific parts.
for example:
msg = "THIS is a Text and (NO Change HERE) HELP"

If the parentheses occur only once:

if msg =~ /\(.*?\)/
  $~.pre_match.downcase + $~[0] + $~.post_match.downcase
end

Kalman
 
ilhamik

Thanks Peter, it works fine.


Peter said:
ilhamik said:
hi,
I want to downcase a string but without specific parts.
for example:
msg = "THIS is a Text and (NO Change HERE) HELP"

Hi,

This is kind of old school and I am sure there are nicer rubyish
solutions for it, but at least it works for multiple parentheses as well:

original = msg.scan(/\(.+?\)/)
msg.downcase!
altered = msg.scan(/\(.+?\)/)
original.each_with_index { |stuff, i| msg.sub!(altered[i], stuff) }
 
James Edward Gray II

Thanks Peter, it works fine.

You missed Tim Bray's RubyConf talk. According to him, we should
never be using the case changing methods. "Just don't do it!" ;)

James Edward Gray II
 
Scott

Certainly not pretty with that funky regex, but it works:

msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"

msg.gsub!(/([^\(]*(?!\())|(\(.*?\))|(\)[^\)]*\))/) do |m|
  m[0] == 40 ? m : m.downcase
end
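
For readers on Ruby 1.9 or later, here is a minimal sketch of the same gsub idea. There `m[0]` returns a one-character string rather than the integer 40, so the comparison above is never true and everything gets downcased. The sketch tests the first character with `start_with?` instead and uses a simpler alternation. The method name `downcase_outside_parens` is made up for illustration, and the sketch assumes the parentheses are not nested.

```ruby
# Hypothetical helper (name chosen for illustration): downcase everything
# except text inside non-nested parentheses.
def downcase_outside_parens(str)
  # First alternative: a complete "(...)" group, returned untouched.
  # Second alternative: a run of characters containing no "(", downcased.
  str.gsub(/\([^)]*\)|[^(]+/) do |m|
    m.start_with?("(") ? m : m.downcase
  end
end

msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"
puts downcase_outside_parens(msg)
# prints: this is a text and (NO Change HERE) help (Not here Either)
```

An opening parenthesis with no closing partner matches neither alternative, so it is copied through unchanged and the text after it is downcased.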

- Scott
 
Mike Durham

James said:
You missed Tim Bray's RubyConf talk. According to him, we should never
be using the case changing methods. "Just don't do it!" ;)

James Edward Gray II

Why not? What reason did he give?
Cheers
 
Wilson Bilkovich

Why not? What reason did he give?

The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France. Also, in Turkish,
there are four different cases of 'i', not just two, and which is
correct depends on the jurisdiction.
Determining the locale in a correct way is really, really hard. Tim
Bray says it's basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.

He shared a story about the original version of XML. At the time, it
was case-insensitive. The very first XML library was running horribly
slow. After profiling, they found that it was spending 90% of its
time in the Java downcase routine. After that, XML was made
case-sensitive.
 
Mike Durham

Wilson said:
The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France. Also, in Turkish,
there are four different cases of 'i', not just two, and which is
correct depends on the jurisdiction.
Determining the locale in a correct way is really, really hard. Tim
Bray says it's basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.

He shared a story about the original version of XML. At the time, it
was case-insensitive. The very first XML library was running horribly
slow. After profiling, they found that it was spending 90% of its
time in the Java downcase routine. After that, XML was made
case-sensitive.

Thanks Wilson, that explains everything. I'd never thought about
problems like that.
Cheers, Mike
 
James Edward Gray II

The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

Yes, this is basically it.

Tim Bray feels that case changing is more or less impossible in the
practical sense. When you get around to downcasing that string a
user entered into your web form a month back, are you going to know
if that string was encoded in a Turkish locale (critical info if it
contains an "i")?

Even if it were possible, Tim suggests that it's a performance
killer. See Java, which tries to address as many rules as it
possibly can, for proof.

James Edward Gray II
 
F. Senault

On 23 October 2006 at 03:16, Wilson Bilkovich wrote:
As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.

This is way off topic, but I'd like to know where he heard that. It's
the first time for me, and I'm a native French speaker...

Fred
 
ara.t.howard

Yes, this is basically it.

Tim Bray feels that case changing is more or less impossible in the
practical sense. When you get around to downcasing that string a user
entered into your web form a month back, are you going to know if that
string was encoded in a Turkish locale (critical info if it contains an "i")?

Even if it were possible, Tim suggests that it's a performance killer. See
Java, which tries to address as many rules as it possibly can, for proof.

James Edward Gray II

one caveat that tim did not mention, and which is quite applicable to many
small sites, is that you simply don't always have to care. for instance, if
your site is in english only you don't have to care. now, i'm not saying that
is a good idea - but a whole ton of successful business models work that way:
many successful newspapers, for example, publish in english only. the trick
is knowing if that's what you want up front. if that's unacceptable then it
does seem like you're screwed.

-a
 
Hal Fulton

F. Senault said:
On 23 October 2006 at 03:16, Wilson Bilkovich wrote:

This is way off topic, but I'd like to know where he heard that. It's
the first time for me, and I'm a native French speaker...

That's very interesting. So Tim is mistaken?


Hal
 
x1

"no".capitalize, Tim is right, but ruby is a "logical" language for
me. Trying to accommodate for 6800 languages with various character
types is not logical. Unfortunately, not manufacturing bikinis
because some people can't wear them doesn't seem like the optimal
solution in my opinion. If someone wants to upcase Latin, write a ruby
library and share it.
 
Hal Fulton

x1 said:
"no".capitalize, Tim is right, but ruby is a "logical" language for
me. Trying to accommodate for 6800 languages with various character
types is not logical. Unfortunately, not manufacturing bikinis
because some people can't wear them doesn't seem like the optimal
solution in my opinion. If someone wants to upcase Latin, write a ruby
library and share it.

I don't think that addresses what I was asking about, i.e., whether
French accents on capital letters differ across France.


Hal
 
Wilson Bilkovich

I don't think that addresses what I was asking about, i.e., whether
French accents on capital letters differ across France.

It's entirely possible I'm mis-remembering that part of Tim's talk.
Anyone else remember exactly what he said? I know he had a slide that
had an accented 'e' character on it.
 
Hal Fulton

Wilson said:
It's entirely possible I'm mis-remembering that part of Tim's talk.
Anyone else remember exactly what he said? I know he had a slide that
had an accented 'e' character on it.

That's the way I remember it -- he said that a lowercase accented
character was sometimes uppercased differently, and it varied
"from district to district."

Earlier tonight I think he mentioned Quebec (but with a proper accent
that I don't know how to type).

I wouldn't be surprised if the French sometimes sneered a little at
the French spoken in Quebec, the way (sometimes) Brits make fun of
Americans, or Spanish (or Colombians) make fun of Mexicans.

But heck: Even if he was totally mistaken, his point still stands --
that capitalization is an unholy mess and is to be avoided. (Actually
he might have stated it more strongly.) Mistaken or not on that one
point, I thought the talk was excellent and informative.

Tim: Read my ch 4 when you can and give me your opinion. ;)


Hal
 
Austin Ziegler

The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.

No, not depending on jurisdiction in France. In French French, one
would capitalize être as Etre. In Canadian French, one would
capitalize it as Être.

Also, in Turkish, there are four different cases of 'i', not just two, and
which is correct depends on the jurisdiction.

Not quite. There are two different 'i' letters: one with a dot, one
without. One is capitalized with a dot and one is capitalized without
the dot.
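
As a concrete illustration of the dotted and dotless i, here is a small sketch assuming Ruby 2.4 or later (much newer than this thread), where `String#downcase` and `String#upcase` accept a `:turkic` option. The caller still has to know that the text is Turkish, which is exactly the hard part being discussed.

```ruby
# Assumes Ruby 2.4+, where the case methods take a :turkic option.
dotless_lower = "\u0131" # ı, LATIN SMALL LETTER DOTLESS I
dotted_upper  = "\u0130" # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE

# The default mapping pairs I with i.
puts "I".downcase == "i"                     # true
# The Turkic mapping pairs I with ı, and i with İ.
puts "I".downcase(:turkic) == dotless_lower  # true
puts "i".upcase(:turkic) == dotted_upper     # true
puts dotted_upper.downcase(:turkic) == "i"   # true
```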

Also, the German eszett (ß, as in Schloß) would be capitalized as
SCHLOSS, but downcasing that would be schloss, not necessarily schloß.
(Actually, and the Germans here will correct me on this I'm sure, I
think it would always be Schloss or Schloß because the leading S would
not be lowercased in proper German. Looking at some German webpages
suggests so.)
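
The lossy round trip can be sketched as follows, again assuming Ruby 2.4 or later with its full Unicode case mapping (older versions only changed the case of ASCII letters). The variable names are just for illustration.

```ruby
# Assumes Ruby 2.4+ with full Unicode case mapping.
word  = "Schlo\u00DF"   # "Schloß"
upper = word.upcase     # the default mapping turns ß into "SS"

puts upper              # SCHLOSS
puts upper.downcase     # schloss
# The ß does not come back after the round trip.
puts upper.downcase == word.downcase  # false
```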
Determining the locale in a correct way is really, really hard. Tim
Bray says it's basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.

Not impossible, just fraught with errors and performance issues. One
would not only have to have the locale lookup stuff, but one would
have to do statistical analysis to get better than mostly wrong with
anything but English. ;)

-austin
--
Austin Ziegler * http://www.halostatue.ca/
               * http://www.halostatue.ca/feed/
 
