downcase part of a string

ilhamik · Oct 22, 2006

hi,
I want to downcase a string but without specific parts.
for example:
msg = "THIS is a Text and (NO Change HERE) HELP"

after downcase it should look like "this is a text and (NO Change HERE)
help"

I don't want to downcase the letters in parentheses.
How can I do that? I tried it with regular expressions but couldn't get
it to work.

Thanks for any help

Kalman Noel · Oct 22, 2006

ilhamik:

hi,
I want to downcase a string but without specific parts.
for example:
msg = "THIS is a Text and (NO Change HERE) HELP"

If the parentheses occur only once:

if msg =~ /\(.*?\)/
  $~.pre_match.downcase + $~[0] + $~.post_match.downcase
end

Kalman

Peter Szinek · Oct 22, 2006

ilhamik said:
hi,
I want to downcase a string but without specific parts.
for example:
msg = "THIS is a Text and (NO Change HERE) HELP"

Hi,

This is kind of old school and I am sure there are nicer rubyish
solutions for it, but at least it works for multiple parentheses as well:

original = msg.scan(/\(.+?\)/)   # parenthesized parts, original case
msg.downcase!
altered = msg.scan(/\(.+?\)/)    # the same parts after downcasing
original.each_with_index { |stuff, i| msg.sub!(altered[i], stuff) }

ilhamik · Oct 22, 2006

No, they can occur more than once.

Kalman said:
ilhamik:

hi,
I want to downcase a string but without specific parts.
for example:
msg = "THIS is a Text and (NO Change HERE) HELP"

If the parentheses occur only once:

if msg =~ /\(.*?\)/
  $~.pre_match.downcase + $~[0] + $~.post_match.downcase
end

Kalman

ilhamik · Oct 22, 2006

Thanks Peter, it works fine.

Peter said:
ilhamik said:

hi,
I want to downcase a string but without specific parts.
for example:
msg = "THIS is a Text and (NO Change HERE) HELP"

Hi,

This is kind of old school and I am sure there are nicer rubyish
solutions for it, but at least it works for multiple parentheses as well:

original = msg.scan(/\(.+?\)/)   # parenthesized parts, original case
msg.downcase!
altered = msg.scan(/\(.+?\)/)    # the same parts after downcasing
original.each_with_index { |stuff, i| msg.sub!(altered[i], stuff) }

James Edward Gray II · Oct 22, 2006

Thanks Peter, it works fine.

You missed Tim Bray's RubyConf talk. According to him, we should
never be using the case-changing methods. "Just don't do it!"

James Edward Gray II

Scott · Oct 22, 2006

Certainly not pretty with that funky regex, but it works:

msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"

msg.gsub!(/([^\(]*(?!\())|(\(.*?\))|(\)[^\)]*\))/) do |m|
  # keep parenthesized chunks as-is (m[0] == 40 only worked on Ruby 1.8)
  m.start_with?("(") ? m : m.downcase
end
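
A shorter pattern may be easier to read. The sketch below is not from the original posts: `downcase_outside_parens` is a hypothetical helper name, and it assumes the parentheses are balanced and not nested.

```ruby
# Hypothetical helper (not from the thread): lowercases everything except
# text inside parentheses. Assumes balanced, non-nested parentheses.
def downcase_outside_parens(text)
  # Alternative 1 matches a whole "(...)" group; alternative 2 matches a
  # run of characters that contains no "(".
  text.gsub(/\([^)]*\)|[^(]+/) do |chunk|
    chunk.start_with?("(") ? chunk : chunk.downcase
  end
end

msg = "THIS is a Text and (NO Change HERE) HELP (Not here Either)"
puts downcase_outside_parens(msg)
# this is a text and (NO Change HERE) help (Not here Either)
```

Because each chunk is handled in a single pass, nothing has to be downcased first and patched back afterwards.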

- Scott

Mike Durham · Oct 22, 2006

James said:
You missed Tim Bray's RubyConf talk. According to him, we should never
be using the case-changing methods. "Just don't do it!"

James Edward Gray II

Why not? What reason did he give?
Cheers

Wilson Bilkovich · Oct 23, 2006

Why not? What reason did he give?

The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France. Also, in Turkish,
there are four different cases of 'i', not just two, and which is
correct depends on the jurisdiction.
Determining the locale in a correct way is really, really hard. Tim
Bray says it's basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.

He shared a story about the original version of XML. At the time, it
was case-insensitive. The very first XML library was running horribly
slow. After profiling, they found that it was spending 90% of its
time in the Java downcase routine. After that, XML was made
case-sensitive.

Mike Durham · Oct 23, 2006

Wilson said:
The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France. Also, in Turkish,
there are four different cases of 'i', not just two, and which is
correct depends on the jurisdiction.
Determining the locale in a correct way is really, really hard. Tim
Bray says it's basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.

He shared a story about the original version of XML. At the time, it
was case-insensitive. The very first XML library was running horribly
slow. After profiling, they found that it was spending 90% of its
time in the Java downcase routine. After that, XML was made
case-sensitive.

Thanks Wilson, that explains everything. I'd never thought about
problems like that.
Cheers, Mike

James Edward Gray II · Oct 24, 2006

The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

Yes, this is basically it.

Tim Bray feels that case changing is more or less impossible in the
practical sense. When you get around to downcasing that string a
user entered into your web form a month back, are you going to know
if that string was encoded in a Turkish locale (critical info if it
contains an "i")?

Even if it were possible, Tim suggests that it's a performance
killer. See Java, which tries to address as many rules as it
possibly can, for proof.

James Edward Gray II

F. Senault · Oct 24, 2006

On October 23, 2006 at 03:16, Wilson Bilkovich wrote:

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.

This is way off topic, but I'd like to know where he heard that. It's
the first time I've heard it, and I'm a native French speaker...

Fred

ara.t.howard · Oct 24, 2006

Yes, this is basically it.

Tim Bray feels that case changing is more or less impossible in the
practical sense. When you get around to downcasing that string a user
entered into your web form a month back, are you going to know if that
string was encoded in a Turkish locale (critical info if it contains an "i")?

Even if it were possible, Tim suggests that it's a performance killer. See
Java, which tries to address as many rules as it possibly can, for proof.

James Edward Gray II

one caveat that tim did not mention, and which is quite applicable to many
small sites, is that you simply don't always have to care. for instance, if
your site is in english only you don't have to care. now, i'm not saying that
is a good idea - but a whole ton of successful business models work that way:
many successful newspapers, for example, publish in english only. the trick
is knowing if that's what you want up front. if that's unacceptable then it
does seem like you're screwed.

-a

Hal Fulton · Oct 25, 2006

F. Senault said:
On October 23, 2006 at 03:16, Wilson Bilkovich wrote:

This is way off topic, but I'd like to know where he heard that. It's
the first time I've heard it, and I'm a native French speaker...

That's very interesting. So Tim is mistaken?

Hal

x1 · Oct 25, 2006

"no".capitalize, Tim is right, but ruby is a "logical" language for
me. Trying to accommodate 6800 languages with various character
types is not logical. Unfortunately, not manufacturing bikinis
because some people can't wear them doesn't seem like the optimal
solution in my opinion. If someone wants to upcase Latin, write a ruby
library and share it.

Tim Bray · Oct 25, 2006

That's very interesting. So Tim is mistaken?

I've been told that common usage differs in Québec. -Tim

Hal Fulton · Oct 25, 2006

x1 said:
"no".capitalize, Tim is right, but ruby is a "logical" language for
me. Trying to accommodate 6800 languages with various character
types is not logical. Unfortunately, not manufacturing bikinis
because some people can't wear them doesn't seem like the optimal
solution in my opinion. If someone wants to upcase Latin, write a ruby
library and share it.

I don't think that addresses what I was asking about, i.e., whether
French accents on capital letters differ across France.

Hal

Wilson Bilkovich · Oct 25, 2006

I don't think that addresses what I was asking about, i.e., whether
French accents on capital letters differ across France.

It's entirely possible I'm mis-remembering that part of Tim's talk.
Anyone else remember exactly what he said? I know he had a slide that
had an accented 'e' character on it.

Hal Fulton · Oct 25, 2006

Wilson said:
It's entirely possible I'm mis-remembering that part of Tim's talk.
Anyone else remember exactly what he said? I know he had a slide that
had an accented 'e' character on it.

That's the way I remember it -- he said that a lowercase accented
character was sometimes uppercased differently, and it varied
"from district to district."

Earlier tonight I think he mentioned Quebec (but with a proper accent
that I don't know how to type).

I wouldn't be surprised if the French sometimes sneered a little at
the French spoken in Quebec, the way (sometimes) Brits make fun of
Americans, or Spanish (or Colombians) make fun of Mexicans.

But heck: Even if he was totally mistaken, his point still stands --
that capitalization is an unholy mess and is to be avoided. (Actually
he might have stated it more strongly.) Mistaken or not on that one
point, I thought the talk was excellent and informative.

Tim: Read my ch 4 when you can and give me your opinion.

Hal

Austin Ziegler · Oct 25, 2006

The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.

As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.

No, not depending on jurisdiction in France. In French French, one
would capitalize être as Etre. In Canadian French, one would
capitalize it as Être.

Also, in Turkish, there are four different cases of 'i', not just two, and
which is correct depends on the jurisdiction.

Not quite. There are two different 'i' letters: one with a dot, one
without. One is capitalized with a dot and one is capitalized without
the dot.
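
For what it's worth, the dotted/dotless pairing can be seen in Ruby 2.4 and later, where `String#downcase` and `String#upcase` accept a `:turkic` option. This is only a small sketch: `turkic_downcase` and `turkic_upcase` are hypothetical wrapper names, not anything from this thread, and the caller still has to know that the text is Turkish in the first place.

```ruby
# Requires Ruby 2.4 or later. The wrapper names are hypothetical.
def turkic_downcase(text)
  text.downcase(:turkic)   # "I" maps to dotless "ı" (U+0131)
end

def turkic_upcase(text)
  text.upcase(:turkic)     # "i" maps to dotted "İ" (U+0130)
end

puts "I".downcase          # i  (default mapping)
puts turkic_downcase("I")  # ı
puts turkic_upcase("i")    # İ
```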

Also, the German Eszett (ß, as in Schloß) would be capitalized as
SCHLOSS, but downcasing that would be schloss, not necessarily schloß.
(Actually, and the Germans here will correct me on this I'm sure, I
think it would always be Schloss or Schloß because the leading S would
not be lowercased in proper German. Looking at some German webpages
suggests so.)
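
The lossy round trip is easy to reproduce on Ruby 2.4 and later, which apply full Unicode case mapping by default. A minimal sketch, with `round_trip` as a hypothetical helper name:

```ruby
# Requires Ruby 2.4 or later. round_trip is a hypothetical helper that
# upcases a word and then downcases the result.
def round_trip(word)
  word.upcase.downcase
end

puts "Schloß".upcase       # SCHLOSS
puts round_trip("Schloß")  # schloss (the ß is not restored)
```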

Determining the locale in a correct way is really, really hard. Tim
Bray says it's basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.

Not impossible, just fraught with errors and performance issues. One
would not only have to have the locale lookup stuff, but one would
have to do statistical analysis to get better than mostly wrong with
anything but English.

-austin
-- 
Austin Ziegler * http://www.halostatue.ca/
* http://www.halostatue.ca/feed/
