Best practice for translating web page character data so that page will be scrapable/e-mailable

Guest

I have web pages that I periodically want to a) programmatically "scrape",
and b) programmatically send in e-mail. These web pages are built with a
content management system and occasionally have Word "curly quotation
marks" and other odd characters embedded in them.

If you fail to handle these characters properly, you get the familiar problem
of some of them turning into question marks when the page is e-mailed or
scraped. You see this all the time in web-based newsletters and the like.

When I was working in classic ASP, I wrote "translate" functions that would
render weird characters into their safe equivalents using a simple string
"replace". This was a limited solution because it was premised on my ability
to identify all of the problematic characters myself and translate them.
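To make the approach concrete, here is a minimal sketch of that manual "translate" technique (in Python for illustration, since the original classic ASP code isn't shown). The mapping table is my own assumption and covers only the common Word/Windows-1252 "smart" punctuation, which is exactly the limitation described: anything not listed slips through untranslated.

```python
# Hypothetical mapping of common Word "smart" punctuation to safe ASCII.
# This is the hand-maintained-list approach: it only works for characters
# you thought to include.
SMART_TO_ASCII = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201C": '"',    # left double quotation mark
    "\u201D": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

def translate(text: str) -> str:
    """Replace known 'weird' characters with safe ASCII equivalents."""
    for smart, plain in SMART_TO_ASCII.items():
        text = text.replace(smart, plain)
    return text
```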

I am wondering if there is an all-in-one solution to this problem inside or
outside of the .NET Framework. I have read a bit about the character
encoding classes and I'm hoping that one of them represents a complete
solution to my problem.
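For what it's worth, the encoding classes generally attack the problem from a different angle than per-character replacement: decode the scraped bytes using the page's actual charset (Word punctuation usually means Windows-1252), then encode the e-mail body in a charset that can represent everything, such as UTF-8. Characters only become question marks when you encode into a charset that cannot hold them. A sketch of that idea, in Python with made-up sample bytes:

```python
# Sample bytes as a CMS page might serve them: 0x93/0x94 are curly double
# quotes and 0x97 is an em dash in Windows-1252.
raw = b"\x93curly quotes\x94 and an em dash \x97 here"

# Decoding with the page's real charset is lossless.
text = raw.decode("windows-1252")

# Re-encoding as UTF-8 for e-mail keeps every character intact...
safe = text.encode("utf-8")

# ...whereas forcing the text into plain ASCII is what produces the "?"s.
lossy = text.encode("ascii", errors="replace")
```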

Can anyone offer any guidance?

Thanks,
-KF