Diacritical marks in array don't translate

jiverbean

Dear group members,

I came across a glitch in JavaScript and I don't know how to solve it
elegantly.

I have an array with strings of German words:

profile[1] = "Fröhliches Fräulein";

Because HTML doesn't or didn't allow some of these characters, I wrote:

profile[1] = "Fr&ouml;hliches Fr&auml;ulein";

but when I use alert(profile[1]); the dialog displays the escape
codes instead of the diacritical marks. I then figured the unescape()
function would solve the problem, but it did not. I don't want to write:

profile[1] = "Fr%190hliches Fr%191ulein";
alert(unescape(profile[1]));

The numbers in the above example only serve to illustrate the idea. I
don't know where to look up the exact numbers, unless they are the
ASCII codes. I haven't tried the last technique yet, but I'm pondering
the issue.

Any suggestions,
Jean Biver

________________________________________________
Check out my home page at http://homepage.internet.lu/aibiver
Please recommend my seti@home profile at
http://setiathome2.ssl.berkeley.edu/fcgi-bin/fcgi?cmd=view_feedback&id=26539
 
Safalra

I came across a glitch in JavaScript and I don't know how to solve it
elegantly.

I have an array with strings of German words:

profile[1] = "Fröhliches Fräulein";

Because HTML doesn't or didn't allow some of these characters

Then you need to use a different character set for your document. Try
ISO-8859-1, which allows the standard European accented characters.
 
Thomas 'PointedEars' Lahn

jiverbean said:
I have an array with strings of German words:

profile[1] = "Fröhliches Fräulein";

Because HTML doesn't or didn't allow some of these characters,

That's an urban legend that will probably never die. HTML allows these
characters, HTTP is and has been 8-bit-safe. You just need to declare
that with the Content-Type header and, for offline use,

<head>
<meta http-equiv="Content-Type" content="text/html; charset=...">
...
</head>

A good reason for escaping 8-bit characters _in HTML_ is editing on
different platforms without having the knowledge or facility (due to
keyboard layout) to type them there.

I wrote:

profile[1] = "Fr&ouml;hliches Fr&auml;ulein";

JS (programming language) is not HTML (markup language). This source code
has to be interpreted by the JS engine, and it does not and is not supposed
to "know" how to handle SGML character entity references like "&ouml;".

There is no problem you have to work around.
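
To illustrate the point about entity references: inside a JavaScript string literal, "&ouml;" is just six ordinary characters. If entity-encoded text really must be shown in a dialog, it has to be decoded first. The following is only a minimal sketch; `decodeEntities` and `ENTITY_TABLE` are hypothetical names, the table covers only a few German characters, and `console.log` stands in for `alert` so the snippet also runs outside a browser.

```javascript
// Hypothetical lookup table: a few named HTML entities and their characters.
// Deliberately tiny; a real page would need a fuller list.
var ENTITY_TABLE = {
  auml: "\u00E4", ouml: "\u00F6", uuml: "\u00FC",
  Auml: "\u00C4", Ouml: "\u00D6", Uuml: "\u00DC",
  szlig: "\u00DF"
};

// Hypothetical helper: replaces &name; with the mapped character and
// leaves unknown entity names untouched.
function decodeEntities(text) {
  return text.replace(/&([A-Za-z]+);/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(ENTITY_TABLE, name)
      ? ENTITY_TABLE[name]
      : match;
  });
}

var raw = "Fr&ouml;hliches Fr&auml;ulein";
console.log(raw.length);                  // 29: the entities are plain text
console.log(decodeEntities(raw).length);  // 19: one character per umlaut
console.log(decodeEntities(raw));
```

Writing the characters directly (with a correctly declared encoding) avoids the need for any such decoding step.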

jiverbean said:
but when I use alert(profile[1]); the dialog displays the escape
codes instead of the diacritical marks. I then figured the unescape()
function would solve the problem, but it did not. I don't want to write:

profile[1] = "Fr%190hliches Fr%191ulein";
alert(unescape(profile[1]));

It is not supposed to work anyway. unescape(), which is proprietary,
accepts only 8-bit escape sequences (in contrast to standardized
decodeURI*()). The above results in

Fr<EM>0hliches Fr<EM>1ulein

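where <EM> is the control character U+0019 (End of Medium): "%19" is
decoded as one escape and the digit that follows is left as it is.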
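
The parsing described above can be checked with a short sketch. It assumes an engine that still provides the legacy global `unescape()` (browsers and Node.js do); the `%F6`/`%E4` values are the ISO-8859-1 code points for the two umlauts, and the `%C3%B6`/`%C3%A4` sequences are their UTF-8 encodings as expected by `decodeURIComponent()`.

```javascript
// "%19" is consumed as one two-digit escape (U+0019); the "0" stays literal.
var broken = unescape("Fr%190hliches");
console.log(broken.charCodeAt(2).toString(16)); // "19"
console.log(broken.charAt(3));                  // "0"

// With the ISO-8859-1 code points (F6 = o-umlaut, E4 = a-umlaut),
// unescape() yields the intended text.
var latin1Style = unescape("Fr%F6hliches Fr%E4ulein");

// decodeURIComponent() expects percent-encoded UTF-8 byte sequences instead.
var utf8Style = decodeURIComponent("Fr%C3%B6hliches Fr%C3%A4ulein");

console.log(latin1Style === utf8Style); // true
```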
________________________________________________
[...]

Signatures are to be delimited by a line containing only "--<SP><CR><LF>".


HTH

PointedEars (a German)
 
Robert

jiverbean said:
Dear group members,

I came across a glitch in JavaScript and I don't know how to solve it
elegantly.

I have an array with strings of German words:

profile[1] = "Fröhliches Fräulein";

The fact that these words are in an array doesn't matter.
The problem that you are probably having is that your HTML and/or
JavaScript file is saved in a different encoding than the one you
specified in your HTML. Or maybe you forgot to specify the encoding
and it is wrongly auto-detected.
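
A minimal sketch of what such a mismatch looks like, assuming Node.js (its global `Buffer` is used here only to simulate saving and reading a file; "latin1" is Node's name for ISO-8859-1). The string is written with `\u` escapes so the snippet itself does not depend on how it is saved.

```javascript
// Assumes Node.js: Buffer can encode and decode "utf8" and "latin1".
var word = "Fr\u00F6hliches";

// Saved as UTF-8: the umlaut becomes two bytes (0xC3 0xB6).
var utf8Bytes = Buffer.from(word, "utf8");
console.log(utf8Bytes.length); // 11 bytes for 10 characters

// Read back as ISO-8859-1: each byte is shown as its own character.
var misread = utf8Bytes.toString("latin1");
console.log(misread); // FrÃ¶hliches

// Saved and read with the same encoding: the text survives.
var roundTrip = Buffer.from(word, "latin1").toString("latin1");
console.log(roundTrip === word); // true
```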

The most useful encoding in your case is probably UTF-8.
So make sure you have this in your header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

Next make sure that your HTML/JavaScript file is in UTF-8 format.
I myself use EmEditor as a text editor, because it uses the correct
Unicode terminology for saving files.

Because HTML doesn't or didn't allow some of these characters, I wrote:

profile[1] = "Fr&ouml;hliches Fr&auml;ulein";

but when I use an alert(profile[1]); the dialog displays the escape
codes instead of the diacritical marks.

That's because JavaScript is not HTML. JavaScript has other mechanisms
for escaping characters, such as \u for any Unicode character.
So you can write
profile[1] = "Fr\u00F6hliches Fr\u00E4ulein";
if you want to save your file in a different encoding than your output.
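
The four hex digits after \u are the character's code point, so the escapes can be generated rather than looked up. A minimal sketch follows; `toUnicodeEscapes` is a hypothetical helper name, not a built-in function.

```javascript
// Hypothetical helper: rewrites every non-ASCII UTF-16 code unit as \uXXXX,
// producing text that can be pasted into a pure-ASCII source file.
function toUnicodeEscapes(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code > 0x7F) {
      var hex = code.toString(16).toUpperCase();
      // Left-pad the hex value to exactly four digits.
      out += "\\u" + "0000".slice(hex.length) + hex;
    } else {
      out += text.charAt(i);
    }
  }
  return out;
}

console.log(toUnicodeEscapes("Fr\u00F6hliches Fr\u00E4ulein"));
// Fr\u00F6hliches Fr\u00E4ulein
```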

Robert.
 
Thomas 'PointedEars' Lahn

Robert said:
The problem that you are probably having is that your HTML and/or
JavaScript file is saved in a different encoding than the one you
specified in your HTML. Or maybe you forgot to specify the encoding
and it is wrongly auto-detected.

The most useful encoding in your case is probably UTF-8.
So make sure you have this in your header:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

Utter nonsense.

First, the above (which cannot be part of the [HTTP] header, but of
the `head' element) will not suffice; the HTTP Content-Type header is
important. Second, UTF-8, especially the German umlauts in it, is
not compatible with ISO-8859-* (the encoding is different), and you do not
know that he used a Unicode editor for this file.

Robert said:
Next make sure that your html/javascript file is in UTF-8 format.

He does not need to and should not want to if not necessary.
ISO-8859-1(5) will suffice and will be more widely supported.

Robert said:
So you can write
profile[1] = "Fr\u00F6hliches Fr\u00E4ulein";
if you want to save your file in a different encoding than your output.

Provided that the used script engine supports Unicode escape sequences.


PointedEars
 
Thomas 'PointedEars' Lahn

Robert said:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

I forgot to mention that the above is not Valid HTML. It is subject to
error-correction if SGML NET is ignored; if not, it is equivalent to

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">&gt;

XHTML != HTML.


PointedEars
 
Robert

Thomas said:
Utter nonsense.

What part is utter nonsense?

Thomas said:
First, the above (which cannot be part of the [HTTP] header, but of
the `head' element) will not suffice, the HTTP Content-Type header is
important.

As can clearly be seen from the syntax, I was talking about HTML and not the
header transmitted by a server.
To my knowledge the content-type of the HTML file overrides the one
given by the web server. However, I may be wrong about this and therefore
made no comment about it before. It does not change the fact that as a
good author you must provide the content-type in your web page.

Second, UTF-8, especially the German umlauts in it, is
not compatible to ISO-8859-* (encoding is different),

Where exactly did you see me write that these are compatible?

Thomas said:
and you do not
know that he used a Unicode editor for this file.

Where exactly did you see me write this?
I actually made a suggestion for a good editor for him if he needed it.

Thomas said:
He does not need to and should not want to if not necessary.
ISO-8859-1(5) will suffice and will be more widely supported.

There is huge support for Unicode.

I cannot see his full needs in one word. Maybe he will need characters
that are not in ISO-8859-1 soon. In any case ISO-8859-1 may suffice, but
UTF-8 will suffice for sure and it's just as easy to use.

Robert said:
So you can write
profile[1] = "Fr\u00F6hliches Fr\u00E4ulein";
if you want to save your file in a different encoding than your output.


Provided that the used script engine supports Unicode escape sequences.

Which it should in 2005.

Don't make yourself look ridiculous by saying something is utter nonsense.
Unicode is very important.
 
Robert

Thomas said:
I forgot to mention that the above is not Valid HTML.

The original poster did not specify if he wanted HTML or XHTML.

Thomas said:
It is subject to
error-correction if SGML NET is ignored; if not, it is equivalent to

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">&gt;

XHTML != HTML.

Most browsers do not use real SGML parsers and will not see the
difference between those two.
 
Thomas 'PointedEars' Lahn

Robert said:
The original poster did not specify if he wanted HTML or XHTML.

However, he specified that he is using HTML right now. Why
do you try to force XHTML (or HTML to be error-corrected,
for that matter) on him, with all its ramifications?

Robert said:
Most browsers do not use real SGML parsers and will not see the
difference between those two.

Relying on error-correction is error-prone.


PointedEars
 
Thomas 'PointedEars' Lahn

Robert said:
Thomas said:
Utter nonsense.

What part is utter nonsense?
First, the above (which cannot be part of the [HTTP] header, but of
the `head' element) will not suffice, the HTTP Content-Type header is
important.

As can clearly be seen from the syntax, I was talking about HTML and not the
header transmitted by a server.

There is no such thing as an HTML header. There is the HTML `head'
element, which is a completely different thing. To call the latter
a "header" is inappropriate.

Robert said:
To my knowledge the content-type of the HTML file overrides the one
given by the webserver. However I may be wrong about this and therefore
made no comment about it before.

It MAY override the default (before serving), there is no MUST.
HTML 4.01, section 7.4.4, clearly states that:

,-<http://www.w3.org/TR/html4/struct/global.html#h-7.4.4.2>
|
| The http-equiv attribute can be used in place of the name attribute and
| has a special significance when documents are retrieved via the Hypertext
| Transfer Protocol (HTTP). HTTP servers may use the property name specified
| by the http-equiv attribute to create an [RFC822]-style header in the HTTP
| response.

Most notably, the HTML 4.01 Specification does _not_ state that user agents
MUST or MAY allow the Content-Type header to be overridden by the `meta'
element _after_ the document was served with a different header value.
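
As a minimal sketch of keeping the served header and the document bytes in agreement: the helper below builds a response whose Content-Type header names the same encoding the body is actually encoded in. `buildResponse` is a hypothetical name, and the sketch assumes Node.js, where "latin1" is the Buffer name for ISO-8859-1.

```javascript
// Hypothetical helper: returns headers and body bytes that agree on the
// character encoding. Assumes Node.js (global Buffer).
function buildResponse(html) {
  var body = Buffer.from(html, "latin1"); // encode the text as ISO-8859-1
  return {
    headers: {
      "Content-Type": "text/html; charset=ISO-8859-1",
      "Content-Length": String(body.length)
    },
    body: body
  };
}

var page = buildResponse("<title>Fr\u00F6hliches Fr\u00E4ulein</title>");
console.log(page.headers["Content-Type"]);
console.log(page.body.length); // 34: one byte per character in ISO-8859-1
```

In a Node.js `http` server, the result could be sent with `res.writeHead(200, page.headers)` followed by `res.end(page.body)`; other servers have their own way of setting the same header.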

Robert said:
It does not change the fact that as a good author you must provide the
content-type in your webpage.

For possible future non-HTTP use. Yes, indeed.

Robert said:
Where exactly did you see me write that these are compatible?

Your statement is written in a way that makes it look as if the
OP does not have a choice. You have been proposing the more
complicated way when there is a simpler and still compliant one,
which I consider a Bad Thing, especially when addressing a newbie.

Robert said:
There is huge support for Unicode.

Especially on the Web, one has to consider backwards compatibility.
There are UAs in use out there which do not support Unicode, so it is
unwise to use or recommend it if not needed. And it is certainly
not needed here.

Robert said:
Don't make yourself look ridiculous by saying something is utter
nonsense.

I may have been a bit harsh, but proposing to declare UTF-8 and use
a Unicode-compatible editor where ISO-8859-* and any text editor
would suffice seemed rather ridiculous to me.

Robert said:
Unicode is very important.

Unicode is very important, I did not and do not doubt that. However,
using and recommending it without thinking of the ramifications of its
use only makes matters worse.


Regards,
PointedEars
 
Robert

Thomas said:
There is no such thing as an HTML header. There is the HTML `head'
element, which is a completely different thing. To call the latter
a "header" is inappropriate.

Maybe not 100% accurate, but not utter nonsense.

Thomas said:
For possible future non-HTTP use. Yes, indeed.

So not utter nonsense too.

Thomas said:
Your statement is written in a way that makes it look as if the
OP does not have a choice.

I do not see it in that way. I clearly stated his problem and said UTF-8
is probably best for him and how he could fix it.

Thomas said:
You have been proposing the more
complicated way when there is a simpler and still compliant one,
which I consider a Bad Thing, especially when addressing a newbie.

I do not think it is the more complicated way. Even for a newbie, Unicode
awareness cannot come soon enough. Actually, I do not know if this person
is a newbie, because I have seen developers with years of experience
but no knowledge of character sets and encodings who have the same
problems that he is having.

Thomas said:
Especially on the Web, one has to consider backwards compatibility.
There are UAs in use out there which do not support Unicode, so it is
unwise to use or recommend it if not needed. And it is certainly
not needed here.

The sooner everyone adopts the Unicode standard, the faster these
outdated user agents will be updated.

Thomas said:
I may have been a bit harsh, but proposing to declare UTF-8 and use
a Unicode-compatible editor where ISO-8859-* and any text editor
would suffice seemed rather ridiculous to me.

Even notepad (windows xp) can save in UTF-8!

Look, it is obvious that we have different views on using Unicode,
and there is room for discussion. But to just dismiss it as utter
nonsense is insulting.
I just wish someone had made me aware of Unicode and encodings in my newbie
days. Now that the original poster is aware, he can inform himself further
and make a conscious decision. And if he decides to use ISO-8859-1
instead, I am sure he is capable of changing my suggestion to
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
 
Robert

Thomas said:
However, he specified that he is using HTML right now.

Ok did not see that.
Just copied it and did not think about removing the last slash.
 
Thomas 'PointedEars' Lahn

Robert said:
The sooner everyone adopts the Unicode standard, the faster these
outdated user agents will be updated.

Though I am all in for standards compliance (you will find me advocating
Valid markup, W3C DOM compliant scripting and the like here the n-th time),
especially but not exclusively people trying to make money on and from the
Web can seldom afford this extreme attitude, sometimes in the literal
sense.

It has always been my opinion that a Web developer should try to get as
much audience as possible if the odds for achieving this are acceptable.
Since Unicode is not needed here and the alternative is easy to implement
while having the advantage of broader support, I find them acceptable here.

BTW, talking about adhering to standards, your From header is a violation
of RFC2822, section 3.4.

Robert said:
Even notepad (windows xp) can save in UTF-8!

Interesting, I did not know that. My work platforms are GNU/Linux
and (seldom) Win2k (where I even more seldom use Notepad).


PointedEars
 
Dag Sunde

Thomas 'PointedEars' Lahn said:
Robert wrote:
Even notepad (windows xp) can save in UTF-8!

Interesting, I did not know that. My work platforms are GNU/Linux
and (seldom) Win2k (where I even more seldom use Notepad).

Yup...
* ANSI
* Unicode
* Unicode Big Endian
* UTF-8

You can select it from a combobox in the save dialog.
(ANSI is default).
 
Robert

Dag said:
Yup...
* ANSI
* Unicode
* Unicode Big Endian
* UTF-8

You can select it from a combobox in the save dialog.
(ANSI is default).

Just wanted to comment that of course the "Unicode" selection is kinda
ridiculous. What they meant was UTF-16 (Little Endian).

Actually ANSI is kinda ridiculous too, because it has nothing to do with
the American National Standards Institute.
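
For anyone wondering which of these formats a given file is in: the Unicode variants can usually be told apart by a byte order mark (BOM) at the start of the file. A minimal sketch, assuming the editor wrote a BOM at all (as far as I know Notepad does for its three Unicode options, but treat that as an assumption); `detectBom` is a hypothetical helper and ignores UTF-32.

```javascript
// Hypothetical helper: guesses the encoding from the first bytes of a file.
// "bytes" is any array-like of numbers 0..255 (e.g. a Uint8Array).
// UTF-32 BOMs are not handled in this sketch.
function detectBom(bytes) {
  if (bytes.length >= 3 &&
      bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return "UTF-16LE"; // what the save dialog calls "Unicode"
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return "UTF-16BE"; // "Unicode Big Endian"
  }
  return "unknown (no BOM)"; // e.g. "ANSI" or BOM-less UTF-8
}

console.log(detectBom([0xFF, 0xFE, 0x46, 0x00])); // UTF-16LE
console.log(detectBom([0x46, 0x72, 0xF6]));       // unknown (no BOM)
```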
 
Dag Sunde

Robert said:
Just wanted to comment that of course the "Unicode" selection is kinda
ridiculous. What they meant was UTF-16 (Little Endian)

Actually ANSI is kinda ridiculous too, because it has nothing to do with
the American National Standards Institute.

Of course it is ridiculous...
It's MS NotePad!

;-)
 
