Unknown character printed in irb or command prompt

Priyank Shah

Hi,

I read an HTML file using Nokogiri, and that works fine.

But when I print the result, it shows an unknown character like

" " in place of <somestarting>hello&nbsp;</somecomplete>

so it looks like "hello ".

The problem is caused by the &nbsp; and the ending tag.

If anyone knows a solution, please help.

Thanks,
Priyank Shah
 
Brian Candler

Try using
p str
or
puts str.inspect
or
puts str.bytes.to_a.inspect

to get a better look at what character codes are in there.
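For example, here is a minimal sketch of that kind of inspection. The sample string is made up (plain "hello" followed by one character that looks blank), and byte_codes is just an illustrative helper name:

```ruby
# Return the numeric value of every byte in the string, so that
# characters which look identical on screen can be told apart.
def byte_codes(str)
  str.bytes.to_a
end

# Assumed sample: "hello" followed by U+00A0, a character that
# renders like a space but is not byte 32.
sample = "hello\u00A0"

p sample               # how this is displayed depends on the Ruby version and terminal
p byte_codes(sample)   # the last two numbers are not 32, so this is no ordinary space
p byte_codes("hello ") # a real trailing space shows up as 32
```

An ordinary space is always the single byte 32, so anything else at the end of the list points to a different character.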
 
Priyank Shah

Brian said:
Try using
p str
or
puts str.inspect
or
puts str.bytes.to_a.inspect

to get a better look at what character codes are in there.

Hi,

Thanks for the reply.

But that is not useful for me. If I use inspect, it converts the string to "hello\302\240".

I want a simple space.

Thanks,
Priyank Shah
 
Brian Candler

Priyank said:
But that is not useful for me. If I use inspect, it converts the string to "hello\302\240".

That is useful.

It shows that the &nbsp; has been converted into the byte sequence \302\240 (octal), or \xc2\xa0 (hex).

That happens to be the UTF-8 encoding of a non-breaking space, codepoint 160:

$ irb19
160.chr("UTF-8") => " "
160.chr("UTF-8").bytes.to_a => [194, 160]
160.chr("UTF-8").force_encoding("ASCII-8BIT")
=> "\xC2\xA0"

So the terminal you are trying to print it to is non-UTF-8. Perhaps a
Windows box? You didn't say what your platform was.

In that case, you need to re-encode it to the appropriate character set.
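As a rough sketch of both options (re-encoding for the terminal, or swapping the non-breaking space for a plain one), assuming the string you get back is tagged as UTF-8. The helper names are made up for illustration, and "Windows-1252" is only a guess at the target; substitute whatever your console actually uses:

```ruby
# The non-breaking space, written as a Unicode escape (codepoint 160).
NBSP = "\u00A0"

# Option 1: if all you want is an ordinary space, replace the
# non-breaking space before printing. Assumes str is a UTF-8 string.
def to_plain_spaces(str)
  str.gsub(NBSP, " ")
end

# Option 2: re-encode the whole string for a non-UTF-8 terminal.
# The default target is an assumption, not something known about
# the original poster's machine. Characters the target cannot
# represent are replaced with "?" instead of raising an error.
def for_console(str, target = "Windows-1252")
  str.encode(target, invalid: :replace, undef: :replace, replace: "?")
end

text = "hello" + NBSP
p to_plain_spaces(text).bytes.to_a  # ends in 32, an ordinary space
p for_console(text).bytes.to_a      # ends in the single byte 160
```

The first option changes the text itself, so it only fits if you do not care about the difference between the two kinds of space. The second keeps the character and only changes how it is represented in bytes.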
 
