how to remove strange characters

Li Chen

Hi all,

I grab some info from a webpage. Sometimes I get some strange
characters, as follows (by p):
To depart in a hurry; abscond: \342\200\234Your horse
has\nabsquatulated!\342\200\235 (Robert M. Bird) To die.

or (by print):
To depart in a hurry; abscond: “Your horse has absquatulated!â€Â
(Robert M. Bird) To die.

Any idea how to get rid of them?


Thanks,

Li
 
Li Chen

Stephen said:
Those are multi-byte characters (curly quotes, in this case). You
probably don't want to get rid of them, but you can use the iconv
library to transliterate them back to their ASCII almost-equivalents:

=> "To depart in a hurry; abscond: \342\200\234Your horse
has\nabsquatulated!\342\200\235 (Robert M. Bird) To die."
To depart in a hurry; abscond: "Your horse has
absquatulated!" (Robert M. Bird) To die.
=> nil

Stephen

Thank you,

Li
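For readers on a current Ruby: the transliteration idea Stephen describes can be sketched without iconv. This is not the call from his message (that part of the quote is missing); it is a minimal sketch that assumes Ruby 1.9 or later, UTF-8 input, and a hand-written mapping table. The names ASCII_EQUIVALENTS and to_ascii are hypothetical.

```ruby
# Hypothetical mapping table (an assumption, not from the thread);
# extend it with whatever characters actually show up in your data.
ASCII_EQUIVALENTS = {
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark
  "\u2026" => "..."  # horizontal ellipsis
}.freeze

# Convert a valid UTF-8 string to US-ASCII. Characters with no ASCII
# form are looked up in the table; anything unknown becomes "?".
def to_ascii(text)
  text.encode("US-ASCII",
              fallback: ->(ch) { ASCII_EQUIVALENTS.fetch(ch, "?") })
end

puts to_ascii("To depart in a hurry; abscond: \u201CYour horse has absquatulated!\u201D")
```

The input must already be valid UTF-8; bytes that are not valid UTF-8 make encode raise instead of reaching the fallback.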
 
Li Chen

Hi Stephen and others,

Iconv only works for some characters. It fails in the following irb
session.

Any idea?

Thanks,

Li


C:\Users\Alex>irb
irb(main):001:0> require 'iconv'
=> true
irb(main):002:0> string1="Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
=> "Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
irb(main):003:0> puts Iconv.iconv('ASCII//TRANSLIT','utf-8',string1).to_s
Iconv::IllegalSequence: "\223Hath some fond "...
        from (irb):3:in `iconv'
        from (irb):3
irb(main):004:0>
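A likely reason for the IllegalSequence error: \223 and \224 (0x93 and 0x94) are not valid UTF-8 on their own, while \342\200\246 is a valid UTF-8 ellipsis, so the string mixes two encodings. In Windows-1252 those two bytes are the curly double quotes, but that the page used Windows-1252 is an assumption. A minimal sketch for a current Ruby (2.1 or later, for String#scrub), with the hypothetical helper name repair_mixed_encoding:

```ruby
# Hypothetical helper: keep valid UTF-8 sequences as they are and
# reinterpret each invalid byte run as Windows-1252 (an assumption
# about where the stray bytes came from).
def repair_mixed_encoding(raw)
  utf8 = raw.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  utf8.scrub do |bad_bytes|
    # Bytes that Windows-1252 does not define become "?".
    bad_bytes.encode("UTF-8", "Windows-1252",
                     invalid: :replace, undef: :replace, replace: "?")
  end
end

raw = "Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
fixed = repair_mixed_encoding(raw)
puts fixed.valid_encoding?
puts fixed
```

Once the string is valid UTF-8, it can be transliterated or stripped like any other.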
 
Pablo Q.

[Note: parts of this message were removed to make it a legal post.]

What do you think about doing something like this?

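class String
  # Replaces every byte outside the kept ASCII set with +replacement+.
  # Written for Ruby 1.8, where b[0] is the byte's integer value.
  # On Ruby 1.9+ b[0] is a one-character String, so the checks below
  # need adapting (for example by iterating with each_byte).
  def remove_nonascii(replacement)
    n = self.split("")
    self.slice!(0..self.size)   # empty the receiver in place
    n.each { |b|
      if (b[0].to_i < 32 || b[0].to_i > 124) then
        # control characters and everything above '|'
        self.concat(replacement)
      elsif [34,35,37,42,43,44,45,47,60,61,62,63,91,92,93,94,96,123].include?(b[0].to_i)
        # selected ASCII punctuation is replaced as well
        self.concat(replacement)
      else
        self.concat(b)
      end
    }
    self.to_s
  end
end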

"Fatal injury or ruin:\223Hath some fond lover tic'd thee to thybane?\224\342\200\246".remove_nonascii('+')

=> "Fatal injury or ruin:+Hath some fond lover tic'd thee to thybane+++++"

As you can see, each of those characters was replaced with '+'.
 
Nit Khair

Li said:
Hi all,

I grab some info from a webpage. Sometimes I get some strange
characters, as follows (by p):
To depart in a hurry; abscond: \342\200\234Your horse
has\nabsquatulated!\342\200\235 (Robert M. Bird) To die.

Here's a quick hack I used recently. It was messing my display on
ncurses, and I did not need the characters.

dataitem.gsub!(/[^[:space:][:print:]]/,'')

I got this while googling; IIRC, it's used somewhere in Rails.
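A note for current Ruby versions: on Ruby 1.9 and later the POSIX bracket classes are Unicode-aware for UTF-8 strings, so [:print:] may keep characters such as curly quotes, and gsub raises on strings that contain invalid bytes. A minimal sketch of an ASCII-only variant, assuming Ruby 2.1 or later; the helper name strip_non_ascii is hypothetical:

```ruby
# Hypothetical helper: drop invalid bytes first, then remove everything
# except tab, newline, carriage return and printable ASCII (0x20-0x7E).
def strip_non_ascii(text)
  text.scrub("").gsub(/[^\t\n\r\x20-\x7E]/, "")
end

raw = "Fatal injury or ruin:\223Hath some fond lover tic'd thee to thy bane?\224\342\200\246"
puts strip_non_ascii(raw)
```

Unlike transliteration, this simply discards the characters, which is fine when they are not needed (as in the ncurses case above).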
 
Li Chen

Nit said:
Here's a quick hack I used recently. It was messing my display on
ncurses, and I did not need the characters.

dataitem.gsub!(/[^[:space:][:print:]]/,'')

I got this while googling; IIRC, it's used somewhere in Rails.

It works in the scenario where iconv doesn't work. Good job!

Li
 
Bilyk, Alex

There is no one-click installer for 1.9 on Windows as far as I can tell.
Downloading and unpacking the zipped binaries didn't get me very far, as
both ruby and irb complain that something is missing. Does the binary
distribution require me to install anything else, such as libraries? If so,
what additional software do I need to make 1.9 work, and where can I get it?

Thanks,
Alex
 
