cgi and escapeHTML but not ampersand

Marek

Hello all!

If this is not the appropriate group, please point me to the right
one.

I have been trying for a good while to find out how to convince the CGI
module not to replace the ampersand with &amp;

I have an array like the following, with two entity-encoded *Umlauts*:

my @element_liste =
(
    ...
    { type => "text", name => "email",
      bez  => "Email (zur Auftragsbest&auml;tigung):", size => 36 },
    { type => "text", name => "fahrgast",
      bez  => "Fahrg&auml;ste:", size => 36, muss => 1 },
    ...
);

and later the CGI script produces a form with:

foreach my $f (@{$element_liste_ref})
{
    print escapeHTML ($f->{bez}), " ",
          textfield (-name => $f->{name},
                     -size => $f->{size}),
          br (), "\n";
}

How can I prevent the entity-encoded &auml; from coming back as
&amp;auml;?
In my @element_liste I tried every imaginable trick, like:

bez => "Fahrgäste:",
bez => "Fahrg\&auml;ste:",
bez => "Fahrg\\&auml;ste:",
bez => "Fahrg&amp;auml;ste:",
or removing the escapeHTML

The header of the CGI script is:

print header (),
      start_html (-dtd     => '-//W3C//DTD XHTML 1.0 Transitional//EN',
                  -title   => "Title",
                  -lang    => 'de',
                  -style   => {'src'  => '/style/style.css',
                               -type  => 'text/css',
                               -media => 'screen'},
                  -charset => 'utf-8'
                 ),

which produces the invalid <body charset="utf-8">. Unfortunately, the
server runs an outdated version of CGI.pm: 2.752

Thank you for your help.


marek
 
Jens Thoms Toerring

Marek said:
I have been trying for a good while to find out how to convince the CGI
module not to replace the ampersand with &amp;
I have an array like the following, with two entity-encoded *Umlauts*:
my @element_liste =
(
    ...
    { type => "text", name => "email",
      bez  => "Email (zur Auftragsbest&auml;tigung):", size => 36 },
    { type => "text", name => "fahrgast",
      bez  => "Fahrg&auml;ste:", size => 36, muss => 1 },
    ...
);
and later the CGI script produces a form with:
foreach my $f (@{$element_liste_ref})
{
    print escapeHTML ($f->{bez}), " ",
          textfield (-name => $f->{name},
                     -size => $f->{size}),
          br (), "\n";
}
How can I prevent the entity-encoded &auml; from coming back as
&amp;auml;?

The only way to prevent replacement of characters that have a
special meaning in HTML is not to call a function that's meant
to do just that. And I don't see the need to call escapeHTML()
here since what you output seems to be fully written by you
and not derived from user input, so you can manually "escape"
everything that needs escaping.

Marek said:
In my @element_liste I tried every imaginable trick, like:
bez => "Fahrgäste:",

That, of course, works since there's nothing in that string that
would need conversion. However, the encoding of the page must then
be set correctly (probably either iso-8859-1 or utf-8) for it to
be displayed correctly on the client side.
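
To illustrate why the declared encoding matters, here is a minimal Perl sketch (not part of the original posts) that encodes the same label both ways. It assumes the core Encode module is available; the variable names are made up for the example, and the umlaut is written as \x{E4} so the sketch does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "ä" is U+00E4; written as an escape so the file's own encoding is irrelevant.
my $label = "Fahrg\x{E4}ste:";

# The same ten characters become different byte sequences per encoding.
my $latin1_bytes = encode('ISO-8859-1', $label);    # umlaut is the single byte E4
my $utf8_bytes   = encode('UTF-8',      $label);    # umlaut is the two bytes C3 A4

printf "%-10s %2d bytes: %s\n", 'ISO-8859-1', length($latin1_bytes), unpack('H*', $latin1_bytes);
printf "%-10s %2d bytes: %s\n", 'UTF-8',      length($utf8_bytes),   unpack('H*', $utf8_bytes);
```

If the page is declared as utf-8 but the script sends the single byte E4 (or the other way round), the browser has no way to show the umlaut correctly.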

Marek said:
bez => "Fahrg\&auml;ste:",
bez => "Fahrg\\&auml;ste:",

The backslash isn't an "escape character" recognized by that
function.

Marek said:
bez => "Fahrg&amp;auml;ste:",

That can only make things worse, you will end up with
"Fahrg&amp;amp;auml;ste:";-)

Marek said:
or removing the escapeHTML

Looks like the way to go if the text to be output is written
by you and doesn't incorporate elements coming from the outside.
If you need to use text coming from the outside, run escapeHTML()
on it before you use it in the text you want to output.
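
The points above can be sketched in a few lines of Perl. This is only an illustration: escape_html() is a simplified, hypothetical stand-in for CGI.pm's escapeHTML() (the real function may escape more characters), and $from_user is a made-up value standing for form input.

```perl
use strict;
use warnings;

# Simplified, hypothetical stand-in for CGI.pm's escapeHTML(): it replaces
# the characters that are special in HTML and knows nothing about entities.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # ampersand first, so the "&lt;" etc. added below stay intact
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $label = "Fahrg&auml;ste:";      # written by the author, already HTML
my $once  = escape_html($label);    # the "&" of the entity gets escaped
my $twice = escape_html($once);     # escaping again makes it worse

# Hypothetical value arriving from a form field.
my $from_user = 'Tom & <Jerry>';

# Own markup goes out unchanged; only the outside text is escaped.
my $line = $label . " " . escape_html($from_user);

print "$once\n$twice\n$line\n";
```

The first two printed lines show the entity being damaged once and then twice; the third shows the label left alone while only the outside text is escaped.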

Marek said:
The header of the CGI script is:
print header (),
      start_html (-dtd     => '-//W3C//DTD XHTML 1.0 Transitional//EN',
                  -title   => "Title",
                  -lang    => 'de',
                  -style   => {'src'  => '/style/style.css',
                               -type  => 'text/css',
                               -media => 'screen'},
                  -charset => 'utf-8'
                 ),
which produces the invalid <body charset="utf-8">. Unfortunately, the
server runs an outdated version of CGI.pm: 2.752

If you have to, you could simply forgo using start_html() and
output the text for the page header directly. Just take what
the call to start_html() outputs, correct it as necessary, and
then output it with a simple print.

Regards, Jens
 
Marek

Jens! Thank you very much!

I appreciate your help! You were right! I thought that I had tried
really everything, but your hints helped me out of an impasse! Here are
my steps:


1. I put in a blank start_html()
2. I removed all escapeHTML calls
3. I tried literal Umlauts ("ä" etc.): not working
4. So I tried entity encoding: working!!! Phew!!
5. I reinserted the desired doctype and style sheet
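
As a variation on step 4, the entity encoding can also be done by the script itself. The following is a minimal sketch, not something from the posts above: to_numeric_entities() is a hypothetical helper that turns every non-ASCII character into a numeric character reference, which the browser can resolve whatever encoding the page declares. It assumes the input is a decoded character string, not raw UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every non-ASCII character with a numeric
# character reference, e.g. U+00E4 becomes "&#228;".
# Assumes $text holds characters, not undecoded UTF-8 bytes.
sub to_numeric_entities {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $text;
}

# The umlaut is written as \x{E4} so the script file's encoding is irrelevant.
my $label = to_numeric_entities("Fahrg\x{E4}ste:");
print "$label\n";    # prints Fahrg&#228;ste:
```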

Regarding 5.:

My server returns an invalid doctype:

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">

Strange: my HTML validator tells me that no such beast exists. This is
probably due to the old CGI.pm version: 2.752

One last question: how do I set the encoding to utf-8 correctly?


Thank you again Jens
 
Jens Thoms Toerring

Marek said:
My server returns an invalid doctype:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd">
Strange: my HTML validator tells me that no such beast exists. This is
probably due to the old CGI.pm version: 2.752
One last question: how do I set the encoding to utf-8 correctly?

I guess you will need to output something like the following
instead of calling start_html() (if you write it all manually, I
do not think you should call it at all, even without arguments):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Title</title>
<link rel="stylesheet" type="text/css" href="style.css" media="screen" />
</head>
<body>

At least that seems to get accepted by the HTML validator;-)
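
In the script, printing this by hand could look roughly like the sketch below. It is only an illustration under assumptions: the variable names are made up, and the HTTP header line is written out manually so that its charset matches the <meta> tag.

```perl
use strict;
use warnings;

# HTTP header, written by hand; its charset should match the <meta> tag below.
my $http_header = "Content-Type: text/html; charset=utf-8\r\n\r\n";

# Page head as given above, used in place of start_html().
my $page_head = <<'END_HEAD';
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Title</title>
<link rel="stylesheet" type="text/css" href="style.css" media="screen" />
</head>
<body>
END_HEAD

print $http_header, $page_head;
```

Current CGI.pm documentation describes a -charset argument for header() and says that parameters start_html() does not recognize are added to the <body> tag, which would explain the <body charset="utf-8"> seen earlier. Whether version 2.752 honours header(-charset => 'utf-8') is not confirmed in this thread.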

Regards, Jens
 
Marek

Thank you Jens!


Of course, I can still mix hand-written code with CGI-generated HTML.
By the way, this produces valid XHTML:

print start_html ({-dtd   => '-//W3C//DTD XHTML 1.0 Transitional//EN',
                   -title => 'Title',
                   -style => {'src' => 'style/style.css'}}
                 ),

But I still don't know how to produce a valid charset=utf-8
declaration with CGI.pm. I will probably stick to inserting the header
by hand nevertheless, as you recommended.


Greetings from Munich



marek
 
