Changing case in a sentence to Capitalize Case.

jackson.rayne · Sep 22, 2008

Hello,

I am a JavaScript newbie and I'm stuck at one place.

I have a requirement where I will get a sentence in a variable

example

var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

So for above example the output should be

This Is A Sentence

I have found some scripts that can change case to uppercase or
lowercase but I'm not able to come up with a solution for this.

One more thing, there is no limit on the number of words that I'll get
in the sentence. I may get one word or even ten words.

I'm looking for a solution that will work for all scenarios,

Regards,
Rayne

Thomas 'PointedEars' Lahn · Sep 22, 2008

I have a requirement where I will get a sentence in a variable

example

var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

You mean _letter_; *not* alphabet, which is a set of letters.

So for above example the output should be

This Is A Sentence

v1 = v1.replace(/(^|\s)([a-z])/g,
  function(m, p1, p2) {
    return p1 + p2.toUpperCase();
  });

You may adapt the character class to fit your needs.
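
A minimal sketch of the replace() call above wrapped in a reusable function. The name capitalizeWords is hypothetical, not from the thread; the pattern is the one posted, so only ASCII lowercase letters that follow the start of the string or white space are affected.

```javascript
// Hypothetical helper (name not from the thread): capitalize the first
// ASCII letter of every whitespace-separated word.
function capitalizeWords(text) {
  // (^|\s) captures the start of the string or one whitespace character,
  // ([a-z]) captures the lowercase letter that directly follows it.
  return text.replace(/(^|\s)([a-z])/g, function (match, before, letter) {
    return before + letter.toUpperCase();
  });
}

console.log(capitalizeWords("This is a sentence")); // "This Is A Sentence"
```

Because the number of words never enters the pattern, the same call should handle one word or ten.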

PointedEars

Tom de Neef · Sep 22, 2008

Hello,

I am a JavaScript newbie and I'm stuck at one place.

I have a requirement where I will get a sentence in a variable

example

var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

So for above example the output should be

This Is A Sentence

a) Regular Expressions, but I don't know how to use them.

b) 1:Split string into words; 2:capitalize first letters; 3:concatenate
words into string.
1: look up what var wordarray = []; wordarray = v1.split(' ') will do
2: for all k: wordarray[k] = wordarray[k].charAt(0).toUpperCase() +
wordarray[k].substr(1);
3: check out the join function: output = wordarray.join(' ');
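
The three steps above can be sketched roughly as follows. The function name capitalizeBySplitting is hypothetical, and slice(1) is used here in place of substr(1); both return the rest of the word after its first character.

```javascript
// Hypothetical helper following the three steps: split, capitalize, join.
function capitalizeBySplitting(sentence) {
  var words = sentence.split(" ");          // 1: split the string into words
  for (var k = 0; k < words.length; k++) {  // 2: capitalize each first letter
    words[k] = words[k].charAt(0).toUpperCase() + words[k].slice(1);
  }
  return words.join(" ");                   // 3: concatenate the words again
}

console.log(capitalizeBySplitting("This is a sentence")); // "This Is A Sentence"
```

Consecutive spaces produce empty array entries, which pass through unchanged, so the spacing of the input is preserved.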

Tom

Thomas 'PointedEars' Lahn · Sep 22, 2008

Conrad said:
v1 = v1.replace(/\b(\w)/g, function (s, c) {
  return c.toUpperCase();
});

\b matches a word boundary; it does not work with non-ASCII letters.
\w matches ASCII letters, decimal digits and `_'.
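
A small sketch of what that limitation means in practice, using the quoted pattern. The function name is hypothetical, and the accented sample word is written with Unicode escapes so that the result does not depend on the file encoding.

```javascript
// Hypothetical wrapper around the quoted /\b(\w)/g replacement.
function capitalizeAtWordBoundaries(text) {
  return text.replace(/\b(\w)/g, function (s, c) {
    return c.toUpperCase();
  });
}

// Plain ASCII words behave as expected.
var plain = capitalizeAtWordBoundaries("this is a sentence");

// "\u00e9norm\u00e9ment" is the French word "enormement" with both accented
// e's. The accented letter is not matched by \w, so \b reports a word
// boundary on either side of it and the letters after it are uppercased.
var accented = capitalizeAtWordBoundaries("\u00e9norm\u00e9ment");

console.log(plain);                                // "This Is A Sentence"
console.log(accented === "\u00e9Norm\u00e9Ment");  // true
```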

BTW, your question looks like a typical homework assignment. If that's
the case: letting other people solve your beginner assignments is not a
clever idea, if you want to learn the language or have to pass
exams later. If this wasn't homework, please disregard.

Full ACK.

PointedEars

SAM · Sep 22, 2008

(e-mail address removed) wrote:

var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

function capitalize( t ) {
  t = t.split(' ');   // array of words
  for(var i=0; i<t.length; i++) {
    // uppercase the first character of each word, keep the rest
    t[i] = t[i].charAt(0).toUpperCase()+t[i].substring(1);
  }
  return t.join(' ');
}

alert(capitalize(v1));

I'm looking for a solution that will work for all scenarios,

alert(capitalize('ask google for charAt, join, '+
'substring and split in javaScript'));

HTML :
======
<a href="javascript:void(document.getElementById('here').innerHTML = v1)">
capitalize the variable v1</a>

<p id="here" style="text-transform: capitalize"></p>

Thomas 'PointedEars' Lahn · Sep 22, 2008

Conrad said:
Yes, I was assuming simple English sentences, where \b will usually work
(and it doesn't matter when toUpperCase is applied to digits or the
underscore).

It matters because it would be needlessly inefficient.

In this case, my earlier example could even be simplified to:

v1 = v1.replace(/\b\w/g, function (c) {
  return c.toUpperCase();
});

Correct, \b would match the empty string before the \w then.

Your character class approach (in your other post) would work if the
character set is known and rather small. Latin1, for example, could use
[a-zàáâãäåæçèéêëìíîïðñòóôõöøßùúûüý]. But if we're assuming a random
international setting, this is going to be a lot harder.

Harder, granted.

Creating a character class that would work on the complete Unicode set
would be almost impossible, and also error prone.

I do not think any of the above would apply, though. ISTM you are
unaware of the fact that, while the Unicode Standard (4.0) already defines a
finite character set of which ECMAScript implementations only support the
Basic Multilingual Plane (U+0000 to U+FFFF), the number of characters that
can be subject to case switching is even more limited, and that character
ranges can be used in regular expressions, whereas their boundaries can also
be written as Unicode escape sequences.

All it takes is a bit of research on the defined Unicode character ranges
and the scripts (as in writing) they provide support for. Take some Latin
character ranges for example:

/[a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f]/i

(This can be optimized, of course, but it helps [you] to get the picture.)
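
As a sketch of how such a range-based class might be plugged into the replace() call from earlier in the thread. The function name is hypothetical. The ranges are the Latin ones listed above, except that U+00D7 (the multiplication sign, a symbol rather than a letter) is left out. Upper- and lowercase code points are listed alike, so this sketch does not rely on the i flag.

```javascript
// Hypothetical helper: capitalize words that start with a Latin letter,
// including accented letters from Latin-1 Supplement up to Latin Extended-B.
function capitalizeLatinWords(text) {
  var pattern = /(^|\s)([a-z\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f])/g;
  return text.replace(pattern, function (match, before, letter) {
    // toUpperCase() leaves characters without an uppercase form unchanged.
    return before + letter.toUpperCase();
  });
}

// "\u00e7a" is "ca" with a cedilla; the result starts with U+00C7.
console.log(capitalizeLatinWords("\u00e7a va") === "\u00c7a Va"); // true
```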

Conrad said:
It would be simpler to define custom "word boundary" characters, and just
let JavaScript uppercase everything following them:

Would it? ISTM the punctuation of languages is a lot more complicated than
their letters; take Spanish, for example. But then ISTM capitalizing titles
is not something that is common in other languages than English, and some
even consider it deprecated there already. However, for uniformity one
might be inclined to apply this formatting to non-English (song) titles as
well; I have seen that before.

var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
  return g1 + g2.toUpperCase();
});

That does not make much sense, though, since with the exception of white
space, and single and double quote, none of those (punctuation) characters
is likely to occur directly before something that can be considered a word
character. In fact, it is customary to have (white) space between those
characters and the word character to be uppercased, so there would never be
a match then.

wBound would still have to be adjusted as required to include, for
example, different types of quotes, or the Japanese/Chinese full stop
character (。).

I am afraid it would have to be rewritten entirely anyway.

PointedEars

SAM · Sep 23, 2008

Thomas 'PointedEars' Lahn wrote:

Conrad said:

var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
  return g1 + g2.toUpperCase();
});

That does not make much sense, though, since with the exception of white
space, and single and double quote, none of those (punctuation) characters
is likely to occur directly before something that can be considered a word
character.

Huh? At least ' and " can be found there
(the others too, in case of a typo)

In fact, it is customary to have (white) space between those

I see no white space after ' or " in the following:
l'éléphant ça "trompe" énormément

characters and the word character to be uppercased, so there would never be
a match then.

In any case, we never need to capitalize all the words of a sentence. In
French that is a spelling mistake, if not a grammatical one.

I am afraid it would have to be rewritten entirely anyway.

and your solution doesn't work for me

'l\'éléphant ça "trompe" énormément'.replace(/(^|\s)([a-z])/g,
  function(m, p1, p2) {
    return p1 + p2.toUpperCase();
  });

result:
L'éléphant ça "trompe" énormément

While Conrad's code gives:
L'Éléphant Ça "Trompe" Énormément
if the charset is e.g. Latin 1 (and not utf-8)

Thomas 'PointedEars' Lahn · Sep 23, 2008

Conrad said:
On 2008-09-23 00:20, Thomas 'PointedEars' Lahn wrote:
[...]

All it takes is a bit of research on the defined Unicode character ranges
and the scripts (as in writing) they provide support for.

... this is still quite an undertaking, and I wouldn't presume to
understand enough about, say, Mongolian or Burmese to decide which of
the characters could/should be converted to uppercase. There are over a
hundred scripts in the BMP, not including the symbol collections.

ISTM few writing systems have a concept of letter case, see below. (CMIIW)

Take some Latin character ranges for example:

/[a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f]/i

(This can be optimized, of course, but it helps [you] to get the picture.)

Again, I think I've already got a pretty good picture, but thanks for
the effort. Just to illustrate the pitfalls of your approach - out of
only 591 characters (basic latin to latin extended-b), you have

- included all the uppercase characters like À (U+00C0)

That was done on purpose, though, because although it should,
case-insensitive matching might not recognize the proper uppercase character
for a non-ASCII lowercase letter and vice-versa.

- included the × character (U+00D7) which is a symbol

ACK, I overlooked that one.

- used the "i" modifier, which is redundant because you have already
listed the exact code points that you want included

It is *not* redundant because it would definitely be supported for /[a-z]/.

That's for a group of characters that we're largely familiar with. Now,
to find out which of the characters in the more exotic groups are
lowercase letters, that would take more than just "a bit of research".

Perhaps somebody else has already collected all the interesting
character ranges, and we could use that information in our character
class,

but why should we, if JavaScript's toUpperCase() already does the
right thing with all types of characters?

Iff it does. And that would still not mean anything for other implementations.

That's beside the point. For one thing, to a lesser extent, capitalising
the first letters in titles is also common in some Germanic languages,
in Italian, etc. More importantly, deciding which languages do or do not
use capitalisation, or have deprecated it, or are only using it for
certain words, is beyond what we can do in a simple script function;
that's up to the person requesting the functionality.

My point was that ISTM punctuation is more difficult to handle than letters.

var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
  return g1 + g2.toUpperCase();
});

That does not make much sense, though, since with the exception of white
space, and single and double quote, none of those (punctuation) characters
is likely to occur directly before something that can be considered a word
character. In fact, it is customary to have (white) space between those
characters and the word character to be uppercased, so there would never be
a match then.

"is likely"? "it is customary"? That's wishful thinking, just look at
some of the postings in this group (SCNR). You often see people omitting
the space after full stops or commas, for example:

"this is a sentence!and so is this,see?"

It may not be pretty, but there is no doubt that "and" and "see" are
both separate words, and thus should be capitalised.
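
To illustrate the difference on that sentence, here is a small sketch that runs both patterns from this thread side by side. The function names are hypothetical; the patterns themselves are the ones quoted above.

```javascript
// Whitespace-only pattern: a letter starts a word only after ^ or \s.
function capitalizeAfterWhitespace(text) {
  return text.replace(/(^|\s)([a-z])/g, function (m, p1, p2) {
    return p1 + p2.toUpperCase();
  });
}

// Custom word-boundary pattern built from a list of separator characters.
var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

function capitalizeAfterBoundary(text) {
  return text.replace(rex, function (s, g1, g2) {
    return g1 + g2.toUpperCase();
  });
}

var sample = "this is a sentence!and so is this,see?";
console.log(capitalizeAfterWhitespace(sample)); // "This Is A Sentence!and So Is This,see?"
console.log(capitalizeAfterBoundary(sample));   // "This Is A Sentence!And So Is This,See?"
```

Which of the two results is wanted depends on whether text with missing spaces should be treated as properly separated words.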

Non sequitur; one should only capitalize properly written text. YMMV.

PointedEars

SAM · Sep 23, 2008

Neither my example nor Thomas's were meant as complete implementations.

Yes. It was just a manner of speaking.

All in all, a perfect solution isn't going to be posted here
(at least not by me).

and your solution doesn't work for me

'l\'éléphant ça "trompe" énormément'.replace(/(^|\s)([a-z])/g,
  function(m, p1, p2) {
    return p1 + p2.toUpperCase();
  });

result:
L'éléphant ça "trompe" énormément

To be fair, he did mention that the character class should be adapted.

And that should not be necessary with your solution
(except with charset UTF-8 in my Fx in quirks mode, and I do not really
understand why)

If you use [a-z\u00DF-\u00FF] instead of [a-z], "ça" and "énormément"
will be capitalised as well.

a little better:
L'éléphant Ça "trompe" Énormément
--^------------^

Yes, but those \u... or \x... escapes don't look clean.
Can't JS find the correct corresponding Unicode characters by itself?
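
On that last question: engines that implement Unicode property escapes (standardized in ECMAScript 2018, so not available when this thread was written) can match "any lowercase letter" without listing code points. A minimal sketch under that assumption; the function name is hypothetical, and the sample text uses Unicode escapes so the result does not depend on the file encoding.

```javascript
// Hypothetical helper, assuming an engine with ES2018 property escapes.
function capitalizeUnicodeWords(text) {
  // \p{Ll} matches any lowercase letter; the u flag enables property escapes.
  return text.replace(/(^|\s)(\p{Ll})/gu, function (match, before, letter) {
    return before + letter.toUpperCase();
  });
}

// "\u00e7a" is "ca" with a cedilla; no code point ranges are needed.
console.log(capitalizeUnicodeWords("\u00e7a va") === "\u00c7a Va"); // true
```

Letters that follow an apostrophe or a quote are still left alone here, because the pattern only looks at the start of the string and white space.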
