Get rid of extra, blank lines via html parsing?

David Ainley

So I am trying to get some information from a snippet of HTML
(http://pastebin.com/iTXyxQ0j), and I'm using doc.inner_text to get the
important parts, but when I do so I get an odd amount of spacing
(http://pastebin.com/6HWDs5dm). Is there a way I can get rid of
all that extra spacing so I can just print the output and it looks
clean? Possibly something like

pino
0.2.11-ubuntu0~lucid
troorl
(2010-07-04)

pino
0.2.10-ubuntu0~karmic
troorl
(2010-05-27)

that? Or can I get each piece of text and add it to an array? If I do
that while it's got all that odd spacing, is that spacing a piece of the
variable? Or is it just the text?

thanks guys!
 
Jesús Gabriel y Galán

You can remove 2 or more consecutive "\n" like this:

irb(main):001:0> s = <<EOS
irb(main):002:0" test
irb(main):003:0"
irb(main):004:0" test2
irb(main):005:0" sdfsdf
irb(main):006:0" werwer
irb(main):007:0"
irb(main):008:0"
irb(main):009:0"
irb(main):010:0"
irb(main):011:0" sdfsdfsd
irb(main):012:0" sdfer234
irb(main):013:0" EOS
=> "test\n\ntest2\nsdfsdf\nwerwer\n\n\n\n\nsdfsdfsd\nsdfer234\n"
irb(main):019:0> s.gsub(/\n\n+/, "\n")
=> "test\ntest2\nsdfsdf\nwerwer\nsdfsdfsd\nsdfer234\n"

or

irb(main):020:0> s.gsub(/\n{2,}/, "\n")
=> "test\ntest2\nsdfsdf\nwerwer\nsdfsdfsd\nsdfer234\n"
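Both patterns only match newlines that sit directly next to each other. If the "blank" lines in the inner_text output actually contain spaces or tabs (an assumption, since the pastebin output is not reproduced here), neither will match. A minimal sketch that trims every line and drops the empty ones, which also gives you the pieces as an array; the sample string and the clean_lines helper are hypothetical:

```ruby
# Hypothetical sample standing in for the inner_text output; the blank
# lines deliberately contain spaces and tabs.
raw = "\n   pino\n \t \n  0.2.11-ubuntu0~lucid\n\n\n   troorl\n   (2010-07-04)\n  \n"

# Split the text into lines, trim the whitespace around each one,
# and throw away whatever is left empty.
def clean_lines(text)
  text.lines.map(&:strip).reject(&:empty?)
end

pieces = clean_lines(raw)  # an Array of the non-blank, trimmed lines
puts pieces                # prints one piece per line
```

Since pieces is an ordinary Array, each value can be picked out by index, and the surrounding spacing is no longer part of the strings.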

Hope this helps,

Jesus.
 
David Ainley

Hey guys, thanks for the responses. Jesus, the gsubs don't do anything
:/, the output still looks the same.

And Gianfranco, every time I try to use readline, it gives me an error
"private method `readline' called for #<String:0xb71c3fd8>
(NoMethodError)"
 
Jesús Gabriel y Galán

Can you show your code?
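
In the meantime, two things worth checking; both are guesses until the code is posted. First, gsub returns a new string and leaves the original untouched, so the result has to be assigned (or gsub! used instead). Second, String has no public readline; the one being found is the private Kernel#readline, which is why the error says "private method". each_line is the usual way to walk a string line by line. A small sketch with a made-up string:

```ruby
# Hypothetical stand-in for the text returned by doc.inner_text.
text = "pino\n\n\ntroorl\n"

# The return value is thrown away here, so text itself is not changed.
text.gsub(/\n{2,}/, "\n")
unchanged = text.dup

# Keep the return value (or call text.gsub! to modify text in place).
cleaned = text.gsub(/\n{2,}/, "\n")

# Walk the string line by line with each_line rather than readline.
collected = []
cleaned.each_line { |line| collected << line.chomp }

puts collected
```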

Jesus.
 
