Smart Quotes

Martin DeMello

Trying to save myself a bit of tedium - has anyone already written code
to replace smart quotes and other such extensions with their normal
ascii equivalents?

martin
 
Robert Klemme

Martin DeMello said:
Trying to save myself a bit of tedium - has anyone already written code
to replace smart quotes and other such extensions with their normal
ascii equivalents?

What exactly are "smart quotes"?

robert
 
Aredridel

Trying to save myself a bit of tedium - has anyone already written code
to replace smart quotes and other such extensions with their normal
ascii equivalents?


In what character set? While normal quotes fall in the ASCII set, smart
quotes don't, so replacing them will depend on the character set.

For UTF-8, try gsub(/“(.*?)”/, '"\1"') -- though that'll only get one
possible set of quotes. There are also the German-style low-high quotes,
like „this”, and the French like «this» -- I don't know what cases
you're trying to solve.
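
To cover several quote styles at once, one option is a lookup table instead of a single regex. This is only a minimal sketch, assuming the input is already a UTF-8 string and Ruby 1.9 or later; the names QUOTE_MAP, QUOTE_PATTERN and straighten_quotes are hypothetical, and which characters to fold is an assumption you would adjust for your own text.

```ruby
# Hypothetical table of curly/angle quotes and the ASCII quote that
# replaces each one. Extend or trim it to match the text you receive.
QUOTE_MAP = {
  "\u201C" => '"',  # left double quotation mark
  "\u201D" => '"',  # right double quotation mark
  "\u201E" => '"',  # double low-9 quotation mark (German opening)
  "\u00AB" => '"',  # left-pointing double angle quotation mark
  "\u00BB" => '"',  # right-pointing double angle quotation mark
  "\u2018" => "'",  # left single quotation mark
  "\u2019" => "'",  # right single quotation mark
  "\u201A" => "'"   # single low-9 quotation mark
}.freeze

# One pattern that matches any key of the table.
QUOTE_PATTERN = Regexp.union(QUOTE_MAP.keys)

# Replace every listed quote character with its ASCII stand-in.
# Characters not in the table are left untouched.
def straighten_quotes(text)
  text.gsub(QUOTE_PATTERN, QUOTE_MAP)
end

puts straighten_quotes("\u201Cthis\u201D, \u201Ethis\u201D and \u00ABthis\u00BB")
```

Unlike the paired regex, this does not care whether opening and closing quotes match up, so it also handles a stray quote on its own.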

Ari
 
Aredridel

"Smart quotes are a feature found in many popular word processing
programs. They're smart because they automatically insert open
quotation marks at the beginning of a word and closed quotation marks
at the end. Unfortunately, HTML is not smart enough for smart quotes
since they aren't plain ASCII, so if you have smart quotes in your
code, you'll end up with some strange characters on your Web page. Be
sure to have smart quotes turned off whenever writing HTML code. "

Actually, just put the appropriate character set declaration in your
code, and it works nicely:

<meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1' />

If you speak English and the smart quotes are one byte, then iso-8859-1
is for you. If they're two bytes, then UTF-8 is the character set that's
being used.

If you're not in a primarily English-speaking country, it'll be
iso-8859-something-else (-2 for Poland, I know -- there's a list if you
look.)
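
The byte counts are easy to check directly rather than guess. A minimal sketch, assuming Ruby 1.9 or later with its built-in encoding support; LEFT_QUOTE and encoded_size are hypothetical names, and the left double quotation mark (U+201C) stands in for the other curly quotes.

```ruby
# The left double quotation mark, written as a Unicode escape so the
# example does not depend on the encoding of this file.
LEFT_QUOTE = "\u201C"

# Return how many bytes the character takes in the given character set,
# or nil if that character set has no such character.
def encoded_size(char, charset)
  char.encode(charset).bytesize
rescue Encoding::UndefinedConversionError
  nil
end

%w[UTF-8 Windows-1252 ISO-8859-1].each do |charset|
  size = encoded_size(LEFT_QUOTE, charset)
  label = size ? "#{size} byte(s)" : "not representable"
  puts "#{charset}: #{label}"
end
```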
 
Stephan Kämper

Martin said:
Trying to save myself a bit of tedium - has anyone already written code
to replace smart quotes and other such extensions with their normal
ascii equivalents?


I think _why did, at least RedCloth handles quotes nicely.

irb(main):001:0> require "RedCloth"
=> true
irb(main):002:0> a = RedCloth.new( "\"Quotes\" in a RedCloth string")
=> "\"Quotes\" in a RedCloth string"
irb(main):004:0> a.to_html
=> "<p>“Quotes” in a RedCloth string</p>"


Now, the other posts to this thread made me think about character sets...

Happy rubying

Stephan
 
Aaron Schrab

At 02:19 +0900 11 Jun 2004 said:
If you speak English and the smart quotes are one byte, then iso-8859-1
is for you. If they're two bytes, then UTF-8 is the character set that's

This is wrong. ISO-8859-1 doesn't include smart quotes. You're likely
using the windows-1252 character set (aka cp1252), a Microsoft extension
of ISO-8859-1 that much of their software likes to claim is the actual
standard character set. Please accurately label the character set that
is used following the standards.

A list of characters that actually are in ISO-8859-1, along with the
extensions present in windows-1252 is available at:

<http://www.psacake.com/web/0302-b.asp>
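
The difference shows up as soon as the same bytes are decoded under each label. A rough sketch, assuming Ruby 2.0 or later (for String#b); SAMPLE_BYTES, decode_as and codepoints_of are hypothetical names, and the sample bytes are made up for illustration.

```ruby
# Made-up sample: "Hi" wrapped in bytes 0x93 and 0x94, held as raw bytes.
SAMPLE_BYTES = "\x93Hi\x94".b

# Interpret the raw bytes as the given character set and convert to UTF-8.
def decode_as(bytes, charset)
  bytes.dup.force_encoding(charset).encode("UTF-8")
end

# Format each character of a string as a U+XXXX code point.
def codepoints_of(text)
  text.codepoints.map { |cp| format("U+%04X", cp) }
end

# Under windows-1252 the outer bytes become the curly quotes
# U+201C and U+201D.
p codepoints_of(decode_as(SAMPLE_BYTES, "Windows-1252"))

# Under ISO-8859-1 the same bytes become the C1 control characters
# U+0093 and U+0094, not quotation marks.
p codepoints_of(decode_as(SAMPLE_BYTES, "ISO-8859-1"))
```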
 
Robert Klemme

Aaron Schrab said:
This is wrong. ISO-8859-1 doesn't include smart quotes. You're likely

Can we please stop writing "smart quotes" when we in fact mean "matching
quotes" or "opening and closing quotes"? IMHO smart quotes do not denote
certain characters but a feature of software, namely that the software
inserts matching opening and closing quotes of the user's language
convention whenever the user enters double / single quotes.

From this it's immediately clear that there is no single pair of "smart
quotes" but a whole bunch of different character pairs that are inserted
whenever certain pieces of software think they should replace other quotes
entered by the user.

Thanks!

Kind regards

robert
 
Martin DeMello

Robert Klemme said:
Can we please stop writing "smart quotes" when we in fact mean "matching
quotes" or "opening and closing quotes"? IMHO smart quotes do not denote
certain characters but a feature of software, namely that the software
inserts matching opening and closing quotes of the user's language
convention whenever the user enters double / single quotes.

From this it's immediately clear that there is no single pair of "smart
quotes" but a whole bunch of different character pairs that are inserted
whenever certain pieces of software think they should replace other quotes
entered by the user.

Hm - I'm referring specifically to the characters that MSWord inserts,
then (since I keep getting them in my email, and have to convert them to
pure ascii). I blame Outlook's use of Word(!) as a mail editor.

martin
 
Martin DeMello

Stephan Kämper said:
I think _why did, at least RedCloth handles quotes nicely.

Not what I meant - I want to go through an 'extended ascii' document,
and replace every extended quote character with its ascii equivalent.

FWIW, pasting into vim and typing 'show ascii' gives Hex93 for the open "
and Hex94 for the close one - I was hoping someone had already written a
tr string to do the lot (there's an ellipsis, an en- and em-dash, and a
few other punctuation marks too).
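
A plain tr can only swap one character for one character, so the ellipsis (one byte in, three dots out) needs gsub instead. Here is a minimal sketch of that, assuming the input really is windows-1252 and Ruby 2.0 or later; CP1252_TO_ASCII, CP1252_PUNCT and asciify_cp1252 are hypothetical names, and the choice of ASCII replacements (for example "--" for the em-dash) is an assumption.

```ruby
# Hypothetical table: windows-1252 byte value => ASCII replacement.
CP1252_TO_ASCII = {
  0x85 => "...",  # horizontal ellipsis
  0x91 => "'",    # left single quote
  0x92 => "'",    # right single quote
  0x93 => '"',    # left double quote (the Hex93 mentioned above)
  0x94 => '"',    # right double quote (Hex94)
  0x96 => "-",    # en-dash
  0x97 => "--"    # em-dash
}.freeze

# Byte-oriented pattern (the /n flag) matching exactly the bytes above.
CP1252_PUNCT = /[\x85\x91-\x94\x96\x97]/n

# Work on a binary copy of the input and replace each listed byte.
# Other bytes, including accented letters, are left as they are, and
# the result is returned as a binary (ASCII-8BIT) string.
def asciify_cp1252(text)
  text.b.gsub(CP1252_PUNCT) { |byte| CP1252_TO_ASCII[byte.ord] }
end

puts asciify_cp1252("\x93Hi\x94 \x96 wait\x85")
```

Keying the table by byte value avoids comparing strings that carry different encoding labels, which Ruby treats as unequal when they contain non-ASCII bytes.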

martin
 
Aredridel

This is wrong. ISO-8859-1 doesn't include smart quotes. You're likely
using the windows-1252 character set (aka cp1252), a Microsoft extension
of ISO-8859-1 that much of their software likes to claim is the actual
standard character set. Please accurately label the character set that
is used following the standards.

Whoops. My apologies there. I've been using Unicode for so long now that
I forgot the curlies weren't in 8859-1. Windows 1252 is so close that
for a while, I assumed they were the same.

Aren't character sets lovely?
 
