CGI.pm and lost carriage returns

Joseph Czapski

Hi, Perl practitioners. I'm having a problem with CGI.pm. If I have an
HTML form with a textarea input box, I would like my Perl program to see the
carriage returns that the user typed in so I can format his text
appropriately.

Using

$value = $q->param($name);

gives me the text with all the carriage returns deleted. Some words are
just stuck together where they were separated by only one or more carriage
returns.

I like to use CGI.pm for the neatness and the file uploading capability.

Thanks for your help!

Joe Czapski
Boston, Mass.
 
xhoster

Joseph Czapski said:
Hi, Perl practitioners. I'm having a problem with CGI.pm. If I have an
HTML form with a textarea input box, I would like my Perl program to see
the carriage returns that the user typed in so I can format his text
appropriately.

Using

$value = $q->param($name);

gives me the text with all the carriage returns deleted.

Most likely, either your web browser isn't sending what you think it is
sending, or the data you are seeing is not what you think you are seeing.

Can you provide an example script that produces the form and evaluates the
response in a way to demonstrate what you are saying?

Xho
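
For anyone following along, here is a minimal sketch of one way to check what the script really receives: make the line-ending bytes visible before printing. The helper name show_line_endings and the sample value are made up for this illustration; in a real script the argument would be whatever $q->param($name) returned.

```perl
use strict;
use warnings;

# Hypothetical helper: turn CR and LF bytes into visible \r and \n markers
# so they cannot be swallowed by a browser or a terminal.
sub show_line_endings {
    my ($text) = @_;
    $text =~ s/\x0D/\\r/g;    # carriage return -> literal backslash-r
    $text =~ s/\x0A/\\n/g;    # line feed       -> literal backslash-n
    return $text;
}

# Stand-in for a submitted textarea value (assumed, not real form data).
my $value = "first line\x0D\x0Asecond line";
print show_line_endings($value), "\n";
```

If the markers show up here but the line breaks vanish on the page, the data is intact and the loss happens at display time.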
 
David Squire

Joseph said:
Hi, Perl practitioners. I'm having a problem with CGI.pm. If I have an
HTML form with a textarea input box, I would like my Perl program to see the
carriage returns that the user typed in so I can format his text
appropriately.

Using

$value = $q->param($name);

gives me the text with all the carriage returns deleted. Some words are
just stuck together where they were separated by only one or more carriage
returns.

How and where are you displaying $value to make this judgment? In a web
browser? If so, note that HTML does not recognize carriage returns - it
uses <BR> (or <BR/> for XHTML :) ) to indicate line breaks.

Still, it should at least treat them as white space...

Can you give us some more details, e.g. as the posting guidelines for
this group say, a small but complete script demonstrating your problem
(including data)?


DS
 
usenet

Joseph said:
Hi, Perl practitioners. I'm having a problem with CGI.pm. If I have an
HTML form with a textarea input box, I would like my Perl program to see the
carriage returns that the user typed in so I can format his text appropriately.

I think CGI is a great module, but the one fault that I would find is
the sloppy and incomplete perldocs. I cannot think of a Perl builtin
module that has worse documentation.

IMHO, if someone wants to do serious CGI programming, s/he really needs
to get a book that fills in the gaping holes in the perldocs.
Unfortunately, the selection is neither wide nor particularly good. I
have the "Official Guide to Programming with CGI.pm" by Lincoln Stein
(the author of the module), which is kinda like an annotated version of
the perldocs. But, at least, it has fairly complete information.

Pages 261-262 of the "Official Guide" describe the behavior of the
textarea's wrapping properties, which is controlled by a "-wrap"
argument (which is not mentioned in any way in the perldocs).

<quote>
-wrap: Sets the WRAP attribute. It can be one of "off," "physical," or
"virtual." If "off," word wrapping only occurs in the field when the
user presses the Enter key. The contents of the field are transmitted
to your script with line breaks inserted exactly as they were displayed
to the user. If "physical," word wrapping occurs automatically when the
text exceeds the width of the field, and the text is transmitted to
your script as if the user had actually transmitted it that way. If
"virtual," word wrapping occurs automatically when the text exceeds the
width of the field, but the contents of the field are transmitted to
your script as a single unbroken line of text (unless the user inserts
a blank line manually).
</quote>

Of course, someone who was rather familiar with HTML itself could
probably guess how WRAP (which is an HTML property) is implemented in
CGI.pm. Personally, though, I use CGI precisely because I DON'T want
to fool with the oddities of HTML.
 
Joseph Czapski

David Filmer wrote:
....
-wrap: Sets the WRAP attribute. It can be one of "off," "physical," or
"virtual." If "off," word wrapping only occurs in the field when the
user presses the Enter key. The contents of the field are transmitted
to your script with line breaks inserted exactly as they were displayed
to the user. If "physical," word wrapping occurs automatically when the
text exceeds the width of the field, and the text is transmitted to
your script as if the user had actually transmitted it that way. If
"virtual," word wrapping occurs automatically when the text exceeds the
width of the field, but the contents of the field are transmitted to
your script as a single unbroken line of text (unless the user inserts
a blank line manually).
....

Holy smoke, I think the WRAP attribute may be the issue. I have it set to
'virtual' on all forms. I'm going to test that and then reply back.

Thank you very much!

Joe Czapski
Boston, Mass.
 
Todd

Joseph said:
Hi, Perl practitioners. I'm having a problem with CGI.pm. If I have an
HTML form with a textarea input box, I would like my Perl program to see the
carriage returns that the user typed in so I can format his text
appropriately.

Using

$value = $q->param($name);

gives me the text with all the carriage returns deleted. Some words are
just stuck together where they were separated by only one or more carriage
returns.

I like to use CGI.pm for the neatness and the file uploading capability.

Thanks for your help!

Joe Czapski
Boston, Mass.

Did you try printing the result out as:

print "<hr /><pre>$value</pre><hr />";

Todd
 
xhoster

I think CGI is a great module, but the one fault that I would find is
the sloppy and incomplete perldocs. I cannot think of a Perl builtin
module that has worse documentation.

I don't think it is a module's documentation's job to document stuff
outside of the module. It is great if it happens to point out some outside
gotchas, but that is not what it is primarily there for.

IMHO, if someone wants to do serious CGI programming, s/he really needs
to get a book that fills in the gaping holes in the perldocs.

Yes, especially the gaping holes that have nothing to do with Perl.
There are plenty of resources on the web for this.

Unfortunately, the selection is neither wide nor particularly good. I
have the "Official Guide to Programming with CGI.pm" by Lincoln Stein
(the author of the module), which is kinda like an annotated version of
the perldocs. But, at least, it has fairly complete information.

Page 261-262 of the "Official Guide" describes the behavior of the
textarea's wrapping properties, which is controlled by a "-wrap"
argument (which is not mentioned in any way in the perldocs).

And, in fact, not mentioned in any relevant way in the CGI.pm source code
either. -wrap is merely passed on to the html directly without any
specific interpretation on the part of CGI.pm.

<quote>
-wrap: Sets the WRAP attribute. It can be one of "off," "physical," or
"virtual." If "off," word wrapping only occurs in the field when the
user presses the Enter key. The contents of the field are transmitted
to your script with line breaks inserted exactly as they were displayed
to the user. If "physical," word wrapping occurs automatically when the
text exceeds the width of the field, and the text is transmitted to
your script as if the user had actually transmitted it that way. If
"virtual," word wrapping occurs automatically when the text exceeds the
width of the field, but the contents of the field are transmitted to
your script as a single unbroken line of text (unless the user inserts
a blank line manually).
</quote>

That description seems to be quite inaccurate. The behavior of the wrap
attribute depends on what browser you are using, but as far as I can tell
no modern browser behaves the way that description says. I think that is
an excellent argument for not including it in the perldoc. Also, it does
not cover "hard" or "soft".

Of course, someone who was rather familiar with HTML itself could
probably guess how WRAP (which is an HTML property) is implemented in
CGI.pm. Personally, though, I use CGI precisely because I DON'T want
to fool with the oddities of HTML.

Unfortunately, the oddities of HTML cannot be entirely abstracted away, no
matter how much we wish they could be.

Xho
 
Joseph Czapski

I said:
Holy smoke, I think the WRAP attribute may be the issue. I have it set to
'virtual' on all forms. I'm going to test that and then reply back.

Yup. Setting WRAP to 'physical' solved the problem. I had to add some
additional code, too:

$value =~ s/(\S)\s*?\x0A\s*\x0A\s*?(\S)/$1<br><br>$2/g;
$value =~ s/(\S)\s*?\x0D\s*\x0D\s*?(\S)/$1<br><br>$2/g;
$value =~ s/\s*\x0A\s*/ /g;
$value =~ s/\s*\x0D\s*/ /g;

The 'physical' wrap isn't so great, either. It sends a line break at every
wrapping point. But that's OK, because I can get the formatting I desire by
preserving just the double (or greater) line breaks as <br><br>, and
replacing the single line breaks with a space. I tried to do this in a
platform independent way in the above code. It tests out well.

Thanks again to all who replied!

Joe Czapski
Boston, Mass.
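
As an illustration of the approach described above (keep runs of two or more line breaks as <br><br>, turn single breaks into a space), here is a sketch that normalizes the line endings first so CRLF, bare CR and bare LF all go through the same substitutions. It is not Joe's exact code; the sub name breaks_to_html is an assumption for this example, and it does no HTML escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: paragraph breaks survive as <br><br>,
# single line breaks are folded into one space.
sub breaks_to_html {
    my ($text) = @_;
    $text =~ s/\x0D\x0A?/\x0A/g;                  # CRLF or bare CR -> LF
    $text =~ s/[^\S\x0A]*\x0A[^\S\x0A]*/\x0A/g;   # drop spaces/tabs around each LF
    $text =~ s/\x0A{2,}/<br><br>/g;               # two or more LFs -> paragraph break
    $text =~ s/\x0A/ /g;                          # remaining single LF -> space
    return $text;
}

# Stand-in for a submitted textarea value.
print breaks_to_html("one\x0D\x0Atwo\x0D\x0A\x0D\x0Athree"), "\n";
```

Normalizing first means each rule only has to be written once, at the cost of one extra pass over the string.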
 
Joseph Czapski

Xho wrote:
....
That description seems to be quite inaccurate. The behavior of the wrap
attribute depends on what browser you are using, but as far as I can tell
no modern browser behaves the way that description says. I think that is
an excellent argument for not including it in the perldoc. Also, it does
not cover "hard" or "soft".
....

You're right! Further testing shows me that the 'physical' wrap of the
textarea box does NOT behave as described in the HTML spec. when using
Internet Explorer. The form does not return newlines at each wrap point,
but returns only newlines typed by the user. I was taking the spec. as
truth.

This is good news actually. Now I can know exactly where the user typed
newlines, and format appropriately.

Joe Czapski
Boston, Mass.
 
Gunnar Hjalmarsson

Joseph said:
I think the WRAP attribute may be the issue. I have it set to
'virtual' on all forms.

Really? The 'wrap' attribute is not mentioned in the HTML 4.01
Specification, not even as deprecated, and various browsers may (and do)
ignore some of the wrap variants.

I think you should consider not letting the textarea width determine the
text formatting, but instead use a module, e.g. Text::Format, for the
purpose.
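
A rough sketch of that idea: re-wrap the submitted text on the server at a width you choose, regardless of how wide the textarea was. To stay within modules shipped with Perl, this uses Text::Wrap rather than the Text::Format module suggested above, so it is a substitute technique, not the same API. The sub name rewrap and the width of 20 are assumptions for the example.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: reflow each paragraph to at most $width characters.
sub rewrap {
    my ($text, $width) = @_;
    # wrap() keeps every line shorter than $columns, hence the + 1.
    local $Text::Wrap::columns = $width + 1;
    $text =~ s/\x0D\x0A?/\x0A/g;                     # CRLF or bare CR -> LF
    my @paragraphs = grep { length } split /\x0A\s*\x0A/, $text;
    for my $p (@paragraphs) {
        $p =~ s/\s+/ /g;                             # fold all whitespace runs
        $p =~ s/^ | $//g;                            # trim the ends
        $p = wrap('', '', $p);                       # no indent on any line
    }
    return join "\n\n", @paragraphs;
}

print rewrap("The quick brown fox\x0D\x0Ajumps over the lazy dog.", 20), "\n";
```

Blank lines typed by the user are kept as paragraph separators; everything else is reflowed.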
 
Justin C

How and where are you displaying $value to make this judgment? In a web
browser? If so, note that HTML does not recognize carriage returns - it
uses <BR> (or <BR/> for XHTML :) ) to indicate line breaks.

I don't like to get pedantic but I like even less incorrect information
being passed on.

XHTML is lower case only, at least from 1.0 onwards. So that'd be <br/>.


Justin.
 
David Squire

Justin said:
I don't like to get pedantic but I like even less incorrect information
being passed on.

XHTML is lower case only, at least from 1.0 onwards. So that'd be <br/>.

Yikes. I had no idea. Thanks for that. No browser that I know of yet cares.

I had (wrongly) assumed that it continued the (perhaps de facto) case
insensitivity of HTML.

I must admit that I can't see any advantage to case-sensitivity for
XHTML tokens, particularly given the history of HTML.


DS
 
Justin C

Yikes. I had no idea. Thanks for that. No browser that I know of yet cares.

I had (wrongly) assumed that it continued the (perhaps de facto) case
insensitivity of HTML.

I must admit that I can't see any advantage to case-sensitivity for
XHTML tokens, particularly given the history of HTML.

It's got something to do with XML being case sensitive.

http://www.w3.org/TR/xhtml1/#h-4.2

Actually, the above link doesn't say any more than I have above... but
it's from the horse's mouth. I'm sure, if you want the gory details,
they're around on Google.


Justin.
 
Ben Morrow

Quoth David Squire said:
Yikes. I had no idea. Thanks for that. No browser that I know of yet cares.

I'd be very surprised if Mozilla-based browsers didn't object *if* you
serve the XHTML as XHTML (i.e. with an XML content-type). If you serve
it as HTML (which is wrong anyway, AppC of the XHTML spec
notwithstanding) then the browser will parse it by HTML's rules. In this
case you'd be much better off using HTML instead.

I had (wrongly) assumed that it continued the (perhaps de facto) case
insensitivity of HTML.

I must admit that I can't see any advantage to case-sensitivity for
XHTML tokens, particularly given the history of HTML.

All XML element names are case-sensitive.

Ben
 
Alan J. Flavell

How and where are you displaying $value to make this judgment? In a
web browser? If so, note that HTML does not recognize carriage
returns - it uses <BR> (or <BR/> for XHTML :) ) to indicate line
breaks.

Apart from being wrong in detail, as already pointed out, this seems
to me to be bizarrely wrong at the level of principles too. Even
though they are, strictly speaking, off-topic for this group, I feel
bound to make a comment.

The format of a submitted textarea is reasonably well specified in the
real HTML specification,
http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.7
(in conjunction with
http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13.3 ).

(This is not confuddled by proprietary "wrap=" attributes, which are
implemented in diverse and confusing ways. Obviously I've noted the
subsequent discussion about browsers inserting newlines for local
display purposes only, and not sending them as part of the submitted
data).

Anyhow, my point is that the submitted data (once the form submission
encoding layer has been unwrapped at the server side) is in principle
*plain text*.

Sure, that plain text *could* be HTML "source", or equally it could be
C++ source or a Perl script or... just plain *plain text*.

The idea of simply stuffing-in <br> tags wherever a newline is seen in
the source is quite bizarre to me. If you want to produce proper HTML
from what was meant to be plain text then you need a properly defined
procedure for doing so (you see such functionality in the
editing features of various Wikis, for example).

On the other hand if your users are expecting to be inputting HTML
"source code", you sure don't want to go inserting unsolicited tags.
You might very well want to analyze the input for potentially
compromising markup, though (scripting attacks and such).

Still, it should at least treat them as white space...

What's "it" meant to be in this sentence? Have we even understood
what it is that the O.P is intending to achieve? Whatever it is, I'm
highly sceptical of the server-side processing merely sprinkling the
input with <br> tags instead of newlines, and nothing more: it does
not seem to be a solution to any variant of this problem that I can
think of. BICBW, of course.

regards
 
David Squire

Alan said:
I'm
highly sceptical of the server-side processing merely sprinkling the
input with <br> tags instead of newlines, and nothing more: it does
not seem to be a solution to any variant of this problem that I can
think of. BICBW, of course.

Hmmm. I see it so often that I would almost call it a FAQ. People ask
"where did my linebreaks go?" when displaying text in a browser. This is
due to not realizing that HTML does not use CR, LF etc. for this purpose.

A common situation where this might arise is a simple comment field
where the comment typed is to be displayed on an HTML page, and the
designer wants user newlines to be retained in formatting. Often <BR>
tags are all that is needed to get the desired effect... and indeed the
OP has already indicated that doing just that solved his problem.

DS
 
Alan J. Flavell

Hmmm. I see it so often that I would almost call it a FAQ. People
ask "where did my linebreaks go?" when displaying text in a browser.
This is due to not realizing that HTML does not use CR, LF etc. for
this purpose.

But the input was *NOT* meant to be HTML in the first place, so
attempting to display it as such is completely illogical. If it's
plain text, then send it as text/plain. Even MSIE has finally caught
up with that concept.

A common situation where this might arise is a simple comment field
where the comment typed is to be displayed on an HTML page, and the
designer wants user newlines to be retained in formatting.

Yeah, and then the mischievous user inserts some naughty javascript,
or includes a link to some dangerous web page, and soon the damage is
done.

Often <BR> tags are all that is needed

*Absolutely not*. Have you *no* sense of network security?

to get the desired effect...

The "desired effect" is not half of what you're liable to get, if you
allow arbitrary web users to type their choice of HTML and you calmly
insert it into your web page.

and indeed the OP has already indicated that doing just that solved
his problem.

It might have "solved" what the O.P perceived to be the problem. After
all, the (in)famous Matt would have had no idea when he launched his
Script Archive just what kinds of network abuse he would be
responsible for.
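
To make the security point concrete, here is a minimal sketch of one possible order of operations: escape the submitted text first, and only then add the <br> tags, so that anything the user typed is shown as text and never interpreted as markup. The sub names escape_html and comment_to_html are hypothetical, and this is only the escaping step, not a complete defence.

```perl
use strict;
use warnings;

# Hypothetical helper: neutralize the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',  '<' => '&lt;',  '>' => '&gt;',
        '"' => '&quot;', "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: escape first, then mark line breaks.
sub comment_to_html {
    my ($text) = @_;
    my $html = escape_html($text);
    $html =~ s/\x0D\x0A?|\x0A/<br>\n/g;    # CRLF, bare CR or bare LF
    return $html;
}

# Stand-in for a hostile textarea submission.
print comment_to_html("<script>alert(1)</script>\x0D\x0Ahello"), "\n";
```

Doing it the other way round (tags first, escaping second) would escape the <br> tags as well.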

--
 
David Squire

Alan said:
But the input was *NOT* meant to be HTML in the first place, so
attempting to display it as such is completely illogical.

I don't agree with this. You could see it as a terribly simple Wiki
code: only newlines are significant as extra mark-up. There are all
sorts of Wikis around now that take non-HTML mark-up entered as plain
text in forms and convert it to HTML.

*Absolutely not*. Have you *no* sense of network security?

Fair enough. Point taken. There would have to be other sanity checks too.


DS
 
Joseph Czapski

I said:
Yup. Setting WRAP to 'physical' solved the problem.
....

Sorry for the further confusion. Now I think that *eliminating* the WRAP
attribute entirely is the best thing to do. And my code snippet after
getting the $value back from CGI.pm is:

$value =~ s/(\S)\s*?\x0A\s*\x0A\s*?(\S)/$1<br><br>$2/g;
$value =~ s/(\S)\s*?\x0D\s*\x0D\s*?(\S)/$1<br><br>$2/g;
$value =~ s/\s*\x0A\s*/<br>/g;
$value =~ s/\s*\x0D\s*/<br>/g;


Joe Czapski
Boston, Mass.
 
Dr.Ruud

Joseph Czapski schreef:
$value =~ s/(\S)\s*?\x0A\s*\x0A\s*?(\S)/$1<br><br>$2/g;
$value =~ s/(\S)\s*?\x0D\s*\x0D\s*?(\S)/$1<br><br>$2/g;

$value =~ s/(\S)\s*?(\x0A|\x0D)\s*\2\s*?(\S)/$1<br><br>$3/g ;

or maybe

$value =~ s/(\S)\s*?(\x0A|\x0D)\s*?\2\s*?(\S)/$1<br><br>$3/g ;

Joseph Czapski said:
$value =~ s/\s*\x0A\s*/<br>/g;
$value =~ s/\s*\x0D\s*/<br>/g;

$value =~ s/\s*(?:\x0A|\x0D)\s*/<br>/g ;

I would also do a s/(<br>)(.)/$1\n$2/g ;
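
For comparison, a sketch of a different way to reach a similar result without chained substitutions: split on any line ending, trim each line, and join with "<br>" followed by a newline so the generated source stays readable. Unlike the single-regex version above, where the trailing \s* also swallows any following line breaks, this keeps every blank line as its own <br>. The sub name newlines_to_br is an assumption, and no HTML escaping is done here.

```perl
use strict;
use warnings;

# Hypothetical helper: one <br> per line break, whatever the platform.
sub newlines_to_br {
    my ($text) = @_;
    # CRLF is tried first so it counts as one break; -1 keeps trailing empty lines.
    my @lines = split /\x0D\x0A|\x0D|\x0A/, $text, -1;
    s/^\s+|\s+$//g for @lines;            # trim whitespace on each line
    return join "<br>\n", @lines;
}

# Stand-in for a submitted textarea value.
print newlines_to_br("line one \x0D\x0Aline two\x0A\x0Aline four"), "\n";
```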
 
