Converting Word Control characters to an ASCII format

Murali

Hi,

I support a web application that customers use to update information
about various products. They often copy and paste from a Word document
into the text box in the provided form.

The CGI script behind it reads this data and stores it in an ASCII text
file. When the product information is updated and the new information is
displayed, all the Word special characters (apostrophes, hyphens and
so on) show up as ?.

I tried the dos2unix program to convert this, but it's not working. Is
there a Perl regular expression that can do this job, so I can clean up
the text before storing it in a text file?

Thanks,

-Murali
 
Gregory Toomey

First use a Unix program like od (octal dump) to get a hex dump of the
characters. This gives you an idea of what you're dealing with.

Next, use tr within Perl to remove what you don't want. Or just remove
non-acceptable ASCII characters (i.e. anything other than
a-z, A-Z, 0-9, !@#$%^&*() etc.).
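
The two steps above could be sketched roughly as follows. This is an illustration in Python rather than Perl, and the same idea carries over to Perl's tr/// or s///. The helper names and the sample bytes are made up for the example; 0x92 and 0x96 are the Windows-1252 codes for the curly apostrophe and the en dash, which is what pasted Word text often contains.

```python
import re

def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated hex pairs, roughly what od -An -tx1 prints."""
    return " ".join(f"{b:02x}" for b in data)

def strip_non_ascii(data: bytes) -> bytes:
    """Keep tab, newline, carriage return and printable ASCII (0x20-0x7e); drop the rest."""
    return re.sub(rb"[^\t\n\r\x20-\x7e]", b"", data)

# Hypothetical sample: "It's a test - ok" as pasted from Word (assumed Windows-1252 bytes).
sample = b"It\x92s a test \x96 ok"
print(hex_dump(sample))         # shows which bytes fall outside plain ASCII
print(strip_non_ascii(sample))  # the offending bytes are simply deleted
```

Note that plain deletion loses the apostrophe and the dash entirely, so it is only suitable when dropping those characters is acceptable.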

gtoomey
 
A. Sinan Unur

http://brent.epicserve.com/blog.php?id=20

might help.

I also found

http://www.indwes.edu/Faculty/bcupp/things/Characters/www/windows-chars.html

informative.
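
As a rough illustration of the kind of cleanup those pages describe, the Windows-specific characters can be replaced with ASCII look-alikes instead of being dropped. The sketch below is Python, not Perl, and it assumes the form data arrives as Windows-1252 bytes (if the page is served as UTF-8, the bytes will differ). The table name, the function name and the choice of replacements are hypothetical, and the table covers only a few common characters.

```python
# Assumed mapping from a few Windows-1252 bytes to ASCII stand-ins.
CP1252_TO_ASCII = {
    0x85: "...",  # horizontal ellipsis
    0x91: "'",    # left single quotation mark
    0x92: "'",    # right single quotation mark (Word's apostrophe)
    0x93: '"',    # left double quotation mark
    0x94: '"',    # right double quotation mark
    0x95: "*",    # bullet
    0x96: "-",    # en dash
    0x97: "-",    # em dash
}

def word_to_ascii(data: bytes) -> str:
    """Replace known Windows-1252 bytes, keep ASCII as-is, mark anything else with '?'."""
    out = []
    for b in data:
        if b in CP1252_TO_ASCII:
            out.append(CP1252_TO_ASCII[b])
        elif b < 0x80:
            out.append(chr(b))
        else:
            out.append("?")  # unmapped high byte; extend the table as needed
    return "".join(out)

print(word_to_ascii(b"It\x92s \x93fine\x94 \x96 done\x85"))
```

Running a hex dump on real submissions first should show which bytes actually occur, and the table can then be extended to match.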

On the other hand, please note that this is not particularly on-topic here.
The problem descriptions and solutions would be the same regardless of the
language you use in the CGI script.

Sinan
 
