Changing case in a sentence to Capitalize Case.


jackson.rayne

Hello,

I am a JavaScript newbie and I'm stuck at one place.

I have a requirement where I will get a sentence in a variable

example

var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

So for above example the output should be

This Is A Sentence

I have found some scripts that can change case to uppercase or
lowercase but I'm not able to come up with a solution for this.

One more thing, there is no limit on the number of words that I'll get
in the sentence. I may get one word or even ten words.

I'm looking for a solution that will work for all scenarios,

Regards,
Rayne
 

Thomas 'PointedEars' Lahn

I have a requirement where I will get a sentence in a variable

example

var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

You mean _letter_; *not* alphabet, which is a set of letters.

So for above example the output should be

This Is A Sentence

v1 = v1.replace(/(^|\s)([a-z])/g,
  function(m, p1, p2) {
    return p1 + p2.toUpperCase();
  });

You may adapt the character class to fit your needs.


PointedEars
 

Tom de Neef

Hello,

I am a JavaScript newbie and I'm stuck at one place.

I have a requirement where I will get a sentence in a variable

example

var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

So for above example the output should be

This Is A Sentence

a) Regular Expressions, but I don't know how to use them.

b) 1:Split string into words; 2:capitalize first letters; 3:concatenate
words into string.
1: look up what var wordarray = []; wordarray = v1.split(' ') will do
2: for all k: wordarray[k] = wordarray[k].charAt(0).toUpperCase() +
wordarray[k].substr(1);
3: check out the join function: output = wordarray.join(' ');
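The three steps above can be sketched as one function. This is only a minimal illustration: the name capitalizeWords is hypothetical (not from the thread), slice(1) is swapped in for substr(1), and words are assumed to be separated by single space characters.

```javascript
// Hypothetical helper; the name is an assumption, not part of the thread.
function capitalizeWords(sentence) {
  // 1: split the string into words at each space
  var words = sentence.split(' ');
  for (var k = 0; k < words.length; k++) {
    // 2: upper-case the first character, keep the rest unchanged
    words[k] = words[k].charAt(0).toUpperCase() + words[k].slice(1);
  }
  // 3: concatenate the words back into one string
  return words.join(' ');
}

var v1 = "This is a sentence";
console.log(capitalizeWords(v1)); // This Is A Sentence
```

Because empty strings pass through unchanged, runs of several spaces are preserved as they are.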

Tom
 

Thomas 'PointedEars' Lahn

Conrad said:
v1 = v1.replace(/\b(\w)/g, function (s, c) {
  return c.toUpperCase();
});

\b matches a word boundary; it does not work with non-ASCII letters.
\w matches ASCII letters, decimal digits and `_'.
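A small sketch of what that limitation means in practice. The function name capitalizeAscii and the sample strings are made up for illustration; the regular expression is the one quoted above.

```javascript
// Hypothetical wrapper around the quoted \b(\w) expression.
function capitalizeAscii(text) {
  return text.replace(/\b(\w)/g, function (s, c) {
    return c.toUpperCase();
  });
}

// Plain ASCII words behave as expected.
console.log(capitalizeAscii("this is a sentence")); // This Is A Sentence

// "é" is not matched by \w, so the word boundary falls between "é" and "c".
console.log(capitalizeAscii("école")); // éCole

// "_" and digits count as word characters; upper-casing them changes nothing.
console.log(capitalizeAscii("snake_case 2nd")); // Snake_case 2nd
```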
BTW, your question looks like a typical homework assignment. If that's
the case: letting other people solve your beginner assignments is not a
clever idea, if you want to learn the language or have to pass
exams later. If this wasn't homework, please disregard.

Full ACK.


PointedEars
 

SAM

(e-mail address removed) wrote:
var v1 ="This is a sentence"

Now I have to change the sentence to Capitalize case where the first
alphabet of every word will be in caps

function capitalize( t ) {
  t = t.split(' ');
  for (var i = 0; i < t.length; i++) {
    // upper-case the first letter of each word
    t[i] = t[i].charAt(0).toUpperCase() + t[i].substring(1);
  }
  return t.join(' ');
}

alert(capitalize(v1));
I'm looking for a solution that will work for all scenarios,

alert(capitalize('ask google for charAt, join, '+
'substring and split in javaScript'));




HTML :
======
<a href="javascript:void(document.getElementById('here').innerHTML = v1)">
capitalize the variable v1</a>

<p id="here" style="text-transform: capitalize"></p>
 

Thomas 'PointedEars' Lahn

Conrad said:
Yes, I was assuming simple English sentences, where \b will usually work
(and it doesn't matter when toUpperCase is applied to digits or the
underscore).

It matters because it would be needlessly inefficient.
In this case, my earlier example could even be simplified to:

v1 = v1.replace(/\b\w/g, function (c) {
  return c.toUpperCase();
});

Correct, \b would match the empty string before the \w then.
Your character class approach (in your other post) would work if the
character set is known and rather small. Latin1, for example, could use
[a-zàáâãäåæçèéêëìíîïðñòóôõöøßùúûüý]. But if we're assuming a random
international setting, this is going to be a lot harder.

Harder, granted.
Creating a character class that would work on the complete Unicode set
would be almost impossible, and also error prone.

I do not think any of the above would apply, though. ISTM you are
unaware of the fact that, while the Unicode Standard (4.0) already defines a
finite character set of which ECMAScript implementations only support the
Basic Multilingual Plane (U+0000 to U+FFFF), the number of characters that
can be subject to case switching is even more limited, and that character
ranges can be used in regular expressions, whereas their boundaries can also
be written as Unicode escape sequences.

All it takes is a bit of research on the defined Unicode character ranges
and the scripts (as in writing) they provide support for. Take some Latin
character ranges for example:

/[a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f]/i

(This can be optimized, of course, but it helps [you] to get the picture.)
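As a rough sketch, the character class above could be dropped into the earlier (^|\s) pattern like this. The names latinWordStart and capitalizeLatin are assumptions made for the illustration, and the class is used exactly as posted, without any optimization.

```javascript
// The Latin ranges from the post, placed behind a "start or white space" prefix.
var latinWordStart =
  /(^|\s)([a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f])/gi;

// Hypothetical helper: upper-case the first Latin letter of each word.
function capitalizeLatin(text) {
  return text.replace(latinWordStart, function (m, p1, p2) {
    return p1 + p2.toUpperCase();
  });
}

console.log(capitalizeLatin("élan vital über alles")); // Élan Vital Über Alles
```

Words that begin with a quote or other punctuation are still left alone, because only white space counts as a separator here.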

Conrad said:
It would be simpler to define custom "word boundary" characters, and just
let JavaScript uppercase everything following them:

Would it? ISTM the punctuation of languages is a lot more complicated than
their letters; take Spanish, for example. But then ISTM capitalizing titles
is not something that is common in other languages than English, and some
even consider it deprecated there already. However, for uniformity one
might be inclined to apply this formatting to non-English (song) titles as
well; I have seen that before.
var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
return g1 + g2.toUpperCase();
});

That does not make much sense, though, since with the exception of white
space, and single and double quote, none of those (punctuation) characters
is likely to occur directly before something that can be considered a word
character. In fact, it is customary to have (white) space between those
characters and the word character to be uppercased, so there would never be
a match then.
wBound would still have to be adjusted as required to include, for
example, different types of quotes, or the Japanese/Chinese full stop
character (。).

I am afraid it would have to be rewritten entirely anyway.


PointedEars
 

SAM

Thomas 'PointedEars' Lahn wrote:
Conrad said:
var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
return g1 + g2.toUpperCase();
});

That does not make much sense, though, since with the exception of white
space, and single and double quote, none of those (punctuation) characters
is likely to occur directly before something that can be considered a word
character.

Huh? At least ' and " can be found there
(and the others too, in case of a typo).
In fact, it is customary to have (white) space between those

I see no white space after ' or " in following :
l'éléphant ça "trompe" énormément
characters and the word character to be uppercased, so there would never be
a match then.

In any case, we never need to capitalize every word of a sentence. In
French that is a spelling mistake, if not a grammatical one.
I am afraid it would have to be rewritten entirely anyway.

and your solution doesn't work for me

'l\'éléphant ça "trompe" énormément'.replace(/(^|\s)([a-z])/g,
function(m, p1, p2) {
return p1 + p2.toUpperCase();
});

result :
L'éléphant ça "trompe" énormément

While Conrad's code gives :
L'Éléphant Ça "Trompe" Énormément
if the charset is e.g. Latin 1 (and not utf-8)
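The two results can be reproduced side by side with the following sketch. It assumes the script is read in the encoding it was saved in, and the variable names sentence, upperSecond, asciiOnly and withBounds are invented for the comparison.

```javascript
var sentence = 'l\'éléphant ça "trompe" énormément';

// Shared callback: keep the separator, upper-case the character after it.
function upperSecond(m, p1, p2) {
  return p1 + p2.toUpperCase();
}

// Thomas's pattern: only ASCII a-z after start of string or white space.
var asciiOnly = sentence.replace(/(^|\s)([a-z])/g, upperSecond);

// Conrad's pattern: any non-boundary character after a boundary character.
var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');
var withBounds = sentence.replace(rex, upperSecond);

console.log(asciiOnly);  // L'éléphant ça "trompe" énormément
console.log(withBounds); // L'Éléphant Ça "Trompe" Énormément
```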
 

Thomas 'PointedEars' Lahn

Conrad said:
On 2008-09-23 00:20, Thomas 'PointedEars' Lahn wrote:
[...]
All it takes is a bit of research on the defined Unicode character ranges
and the scripts (as in writing) they provide support for.

... this is still quite an undertaking, and I wouldn't presume to
understand enough about, say, Mongolian or Burmese to decide which of
the characters could/should be converted to uppercase. There are over a
hundred scripts in the BMP, not including the symbol collections.

ISTM few writing systems have a concept of letter case, see below. (CMIIW)
Take some Latin character ranges for example:

/[a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f]/i

(This can be optimized, of course, but it helps [you] to get the picture.)

Again, I think I've already got a pretty good picture, but thanks for
the effort. Just to illustrate the pitfalls of your approach - out of
only 591 characters (basic latin to latin extended-b), you have

- included all the uppercase characters like À (U+00C0)

That was done on purpose, though, because although it should,
case-insensitive matching might not recognize the proper uppercase character
for a non-ASCII lowercase letter and vice-versa.
- included the × character (U+00D7) which is a symbol

ACK, I overlooked that one.
- used the "i" modifier, which is redundant because you have already
listed the exact code points that you want included

It is *not* redundant because it would definitely be supported for /[a-z]/.
That's for a group of characters that we're largely familiar with. Now,
to find out which of the characters in the more exotic groups are
lowercase letters, that would take more than just "a bit of research".

Perhaps somebody else has already collected all the interesting
character ranges, and we could use that information in our character
class,

but why should we, if JavaScript's toUpperCase() already does the
right thing with all types of characters?

Iff it does. And that would still not mean anything for other implementations.
That's beside the point. For one thing, to a lesser extent, capitalising
the first letters in titles is also common in some Germanic languages,
in Italian, etc. More importantly, deciding which languages do or do not
use capitalisation, or have deprecated it, or are only using it for
certain words, is beyond what we can do in a simple script function;
that's up to the person requesting the functionality.

My point was that ISTM punctuation is more difficult to handle than letters.
var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

v1 = v1.replace(rex, function (s, g1, g2) {
return g1 + g2.toUpperCase();
});
That does not make much sense, though, since with the exception of white
space, and single and double quote, none of those (punctuation) characters
is likely to occur directly before something that can be considered a word
character. In fact, it is customary to have (white) space between those
characters and the word character to be uppercased, so there would never be
a match then.

"is likely"? "it is customary"? That's wishful thinking, just look at
some of the postings in this group (SCNR). You often see people omitting
the space after full stops or commas, for example:

"this is a sentence!and so is this,see?"

It may not be pretty, but there is no doubt that "and" and "see" are
both separate words, and thus should be capitalised.

Non sequitur; one should only capitalize properly written text. YMMV.


PointedEars
 

SAM

Conrad Lender wrote:
Neither my example nor Thomas's were meant as complete implementations.

Yes. It was just a manner of speaking.
All in all, a perfect solution isn't going to be posted here
(at least not by me).
and your solution doesn't work for me

'l\'éléphant ça "trompe" énormément'.replace(/(^|\s)([a-z])/g,
function(m, p1, p2) {
return p1 + p2.toUpperCase();
});

result :
L'éléphant ça "trompe" énormément

To be fair, he did mention that the character class should be adapted.

And that should not have to be with your solution.
(except with charset utf-8 in my Fx in quirksmode, and I do not really
understand why)

If you use [a-z\u00DF-\u00FF] instead of [a-z], "ça" and "énormément"
will be capitalised as well.

a little better :
L'éléphant Ça "trompe" Énormément
-----^------------^

Yes, but those \u... or \x... escapes do not look clean.
Can't JS find the corresponding Unicode characters by itself?
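On that last question: engines that implement ES2018 Unicode property escapes, which are much newer than this thread, can match "any lowercase letter" without hand-written \u ranges. The sketch below assumes such an engine; the name capitalizeUnicode is hypothetical, and white space is still the only word separator.

```javascript
// Assumes an ES2018+ engine: \p{Ll} requires the "u" flag.
function capitalizeUnicode(text) {
  // \p{Ll} matches any lowercase letter known to the engine's Unicode tables.
  return text.replace(/(^|\s)(\p{Ll})/gu, function (m, p1, p2) {
    return p1 + p2.toUpperCase();
  });
}

console.log(capitalizeUnicode('l\'éléphant ça "trompe" énormément'));
// L'éléphant Ça "trompe" Énormément

console.log(capitalizeUnicode('привет мир'));
// Привет Мир
```

Whether letters after quotes or apostrophes should also be capitalized remains a separate decision, as discussed above.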
 
