How to remove leading &nbsp; from string


Lucky Nl

Hi
I need a little help: how do I remove leading &nbsp; tags?
My text is:
str = "<p>Welcome to ruby &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>"
I want the result to be:
str = "<p>Welcome to ruby</p>"

Can anybody help?
 

Brian Candler

You mean trailing, rather than leading?

You probably want String#gsub or String#gsub!. For example:

$ irb --simple-prompt
Removing empty paragraphs is left as an exercise. For more information
on String and Regexp see http://www.ruby-doc.org/docs/ProgrammingRuby/
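A rough sketch of the gsub approach for the first example, assuming the goal is to drop paragraphs that contain only &nbsp; and whitespace, then strip any &nbsp; or spaces that sit just before a closing </p>. The helper name strip_trailing_nbsp is hypothetical, and like any regexp approach it only suits simple, flat <p> markup with no attributes.

```ruby
# Hypothetical helper: tidy the simple editor output from the question.
# Assumes flat <p>...</p> markup (no attributes, no nesting).
def strip_trailing_nbsp(html)
  # Drop paragraphs that hold only &nbsp; entities and/or whitespace.
  result = html.gsub(%r{<p>(?:&nbsp;|\s)*</p>}, '')
  # Drop &nbsp; entities and spaces that sit right before a closing </p>.
  result = result.gsub(%r{(?:&nbsp;|\s)+</p>}, '</p>')
  # Trim whitespace left over between the removed paragraphs.
  result.strip
end

str = "<p>Welcome to ruby &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>"
puts strip_trailing_nbsp(str)
```

Spaces inside the text (such as the one between "Welcome" and "to") are untouched, because the second pattern only matches when the run of spaces or &nbsp; is immediately followed by </p>.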

However, for anything other than the most basic transformations, you are
almost certainly better off with an HTML parser like Nokogiri than
chomping HTML with regexps.
 

Lucky Nl

Hi,
I am entering multiple paragraphs in an editor, and the text is saved into
the str variable.

Ex:
str is
<p>test one &nbsp;&nbsp;</p><p>&nbsp;&nbsp;</p><p>test one test
onetest onetest one</p> <p>test two test
two test two test two &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>

That is how we entered it.
I want the result to be:

<p>test one &nbsp;&nbsp;</p><p>&nbsp;&nbsp;</p><p>test one test
onetest onetest one</p> <p>test two test
two test two test two</p>


Here, the &nbsp; tags at the end, between the paragraphs, have been
removed, and the &nbsp;'s have been removed in "<p>test two test
two test two test two</p>"
 

Brian Candler

Your requirement is unclear. Are you saying you want to remove the
&nbsp;'s within the fourth paragraph only, and remove the fifth
paragraph entirely?

I've shown you how to use gsub, and where to find more documentation on
it. String#scan might be useful too.

I suggest you use them in whatever way you need, since only you
understand what you're trying to achieve.
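One possible way to use String#scan here, sketched under the assumption that every paragraph is a flat <p>...</p> pair with no nesting or attributes: split the sample text into its paragraphs so the fourth and fifth ones can be inspected (or dropped) individually.

```ruby
str = "<p>test one &nbsp;&nbsp;</p><p>&nbsp;&nbsp;</p><p>test one test
onetest onetest one</p> <p>test two test
two test two test two &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>"

# Non-greedy .*? stops at the first </p>; the /m flag lets . match newlines.
paragraphs = str.scan(%r{<p>.*?</p>}m)

paragraphs.each_with_index do |para, i|
  # "Empty" here means the paragraph holds only &nbsp; and/or whitespace.
  empty = para.match?(%r{\A<p>(?:&nbsp;|\s)*</p>\z})
  puts "#{i + 1}: #{empty ? '(empty) ' : ''}#{para.inspect}"
end
```

Once the paragraphs are in an array, deciding which ones to keep or modify becomes ordinary array handling rather than one ever-larger regexp.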
 

MrZombie

<p>test one &nbsp;&nbsp;</p><p>&nbsp;&nbsp;</p><p>test one test
onetest onetest one</p> <p>test two test
two test two test two &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>

str = str.gsub(/&nbsp;/,"").gsub(/<p>\s*<\/p>/,"")

This will remove every &nbsp; from your HTML and then remove any
<p> element that contains only whitespace characters.

It's less than optimal, as you could probably combine it into one pass,
but I don't want to spend time on stuff you should be able to do on
your own.
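For comparison, a sketch of the "one go" version written as a single gsub with an alternation: the first branch removes a whole paragraph made only of &nbsp; and whitespace, the second removes any remaining &nbsp;. It is only checked here against the sample text, where it gives the same result as the two-pass version; treat it as an illustration rather than a drop-in replacement for arbitrary HTML.

```ruby
str = "<p>test one &nbsp;&nbsp;</p><p>&nbsp;&nbsp;</p><p>test one test
onetest onetest one</p> <p>test two test
two test two test two &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>"

# Two passes: remove every &nbsp;, then every whitespace-only <p>.
two_pass = str.gsub(/&nbsp;/, "").gsub(/<p>\s*<\/p>/, "")

# One pass: at each position the regexp tries the empty-paragraph branch
# first, and only falls back to removing a lone &nbsp; if that fails.
one_pass = str.gsub(/<p>(?:&nbsp;|\s)*<\/p>|&nbsp;/, "")

puts one_pass
puts one_pass == two_pass
```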
 

BruceL

simply a.delete!("&nbsp;")
 

Rob Biedenharn

simply a.delete!("&nbsp;")


Well, that's almost certainly not what the OP wants!

irb> str = "<p>Welcome to ruby &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>"
=> "<p>Welcome to ruby &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>"
irb> str.delete('&nbsp;')
=> "<>Welcome to ruy </> <></>"

Look at the documentation for String#delete.

Then take a look at String#gsub.

I suspect that you want to do two passes: one for '&nbsp;' and one for
empty paragraphs.
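A small illustration of the difference: String#delete works on a set of characters, while String#gsub with a string argument removes that exact substring, so the two passes can be written as follows.

```ruby
str = "<p>Welcome to ruby &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>"

# String#delete treats its argument as a *set of characters*, so every
# '&', 'n', 'b', 's', 'p' and ';' anywhere in the string is removed.
p str.delete('&nbsp;')

# Pass 1: String#gsub with a string pattern removes the exact substring.
pass1 = str.gsub('&nbsp;', '')
# Pass 2: remove paragraphs that are now empty or whitespace-only.
pass2 = pass1.gsub(%r{<p>\s*</p>}, '')
p pass2
```

The result still has a space before </p> and a trailing space, so producing exactly the output asked for in the original question needs a further tweak, such as also stripping whitespace before </p>.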

-Rob

Rob Biedenharn
http://AgileConsultingLLC.com/
http://GaslightSoftware.com/
 

Lucky Nl

Hi,
Let me explain my requirement clearly.
I am using FCKeditor in Ruby on Rails,
so I can enter data as multiple paragraphs or a single paragraph. But
after the last paragraph, if there are any spaces, I want to remove them.
 

Brian Candler

Lucky said:
So I entered data like this:
---------------------------------------------------------------
<p> pargraph1 pargraph1 &nbsp;&nbsp; </p>
<p>pargraph2 pargraph2 pargraph2 &nbsp;&nbsp; </p>
<p> pargraph3 hello3 hell13 &nbsp;&nbsp; </p>
<p> pargraph4 hello3 hell14 &nbsp;&nbsp; </p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;</p>
--------------------------------------------------------

In the above text, the 4th paragraph is the last paragraph I
entered. After that I pressed the Enter key, so the editor converted this
into "<p>&nbsp;&nbsp;&nbsp;&nbsp;</p>"

I want to remove the &nbsp;'s after the text in the last paragraph, so the
result looks like:
-------------------------------------------
<p> pargraph1 pargraph1 &nbsp;&nbsp; </p>
<p>pargraph2 pargraph2 pargraph2 &nbsp;&nbsp; </p>
<p> pargraph3 hello3 hell13 &nbsp;&nbsp; </p>
<p> pargraph4 hello3 hell14</p>
----------------------------------------------------------------

So the approach is:
(1) Write a regular expression which matches just the thing you want to
delete;
(2) Invoke it with gsub to replace that text with the empty string.

For example, to delete *all* empty paragraphs, then you want to match
<p> followed by any mixture of &nbsp; and space followed by </p>. So you
could write:

str.gsub!(/<p>(&nbsp;|\s)*<\/p>/, '')

(x|y) means match x or y, \s means match any whitespace character, and *
means match it 0 or more times.
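To see the pattern in action, here is a sketch on a made-up sample (not the text from the thread) containing an empty paragraph in the middle and one at the end. Both are removed, while the &nbsp; inside the non-empty paragraph is left alone. It uses a non-capturing group (?:...), which matches the same text as (&nbsp;|\s) but does not save a capture.

```ruby
# Made-up sample: empty paragraphs in the middle and at the end.
str = "<p>first</p>\n<p>&nbsp;</p>\n<p>second &nbsp;&nbsp; </p>\n<p>&nbsp;&nbsp;&nbsp;&nbsp;</p>\n"

# <p>, then any mix of &nbsp; and whitespace, then </p>: an "empty" paragraph.
cleaned = str.gsub(/<p>(?:&nbsp;|\s)*<\/p>/, '')
p cleaned
```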

To delete only the *last* paragraph if it is empty, then you can tweak
it to:

str.gsub!(/<p>(&nbsp;|\s)*<\/p>\s*\z/, '')

where \z matches the end of the string, and \s* allows 0 or more space
characters, including newlines, to precede that.
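Applied to a similar made-up sample, the \z version keeps the empty paragraph in the middle and removes only the one at the end, together with the newline that follows it. A sketch:

```ruby
# Made-up sample: empty paragraphs in the middle and at the end.
str = "<p>first</p>\n<p>&nbsp;</p>\n<p>second &nbsp;&nbsp; </p>\n<p>&nbsp;&nbsp;&nbsp;&nbsp;</p>\n"

# \s*\z: the empty paragraph must be followed only by optional whitespace
# (including newlines) and then the end of the string.
cleaned = str.gsub(/<p>(?:&nbsp;|\s)*<\/p>\s*\z/, '')
p cleaned
```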

Once you're happy with that, then you can do another match and replace
to change the final instance of "&nbsp;&nbsp; </p>" into just "</p>"
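Put together, the two steps might look like this on the text from the question. This is a sketch that assumes only the last paragraph is followed by end-of-string; it uses the non-bang String#sub, which returns a new string whether or not anything matched.

```ruby
str = "<p> pargraph1 pargraph1 &nbsp;&nbsp; </p>
<p>pargraph2 pargraph2 pargraph2 &nbsp;&nbsp; </p>
<p> pargraph3 hello3 hell13 &nbsp;&nbsp; </p>
<p> pargraph4 hello3 hell14 &nbsp;&nbsp; </p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;</p>
"

# Step 1: delete the trailing empty paragraph (if there is one).
step1 = str.sub(/<p>(?:&nbsp;|\s)*<\/p>\s*\z/, '')
# Step 2: in what is now the last paragraph, turn "...&nbsp;&nbsp; </p>"
# into "...</p>". Earlier paragraphs keep their &nbsp;'s.
step2 = step1.sub(/(?:&nbsp;|\s)+<\/p>\s*\z/, '</p>')
puts step2
```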

But you might want to be sure this is what you really want. How did the
previous &nbsp; entries get there? Do you really want to keep them? It
would be much simpler just to replace all sequences of &nbsp; or space
with a single space.

str.gsub!(/(&nbsp;|\s)+/, ' ')
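For the simpler alternative, a quick sketch of what the collapsing substitution does to a short made-up sample (non-bang gsub, so the result can be printed directly). It also turns the newlines between paragraphs into spaces, and an empty paragraph becomes "<p> </p>", which the empty-paragraph regexp above would still match.

```ruby
# Made-up sample with &nbsp; runs and an empty paragraph.
str = "<p> pargraph1 &nbsp;&nbsp; </p>\n<p>&nbsp;&nbsp;</p>"

# Collapse every run of &nbsp; entities and/or whitespace (spaces,
# newlines) into one ordinary space.
normalized = str.gsub(/(?:&nbsp;|\s)+/, ' ')
p normalized
```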
 

Lucky Nl

Hi, when I used the logic below:
str.gsub! /<p>(&nbsp;|\s)*<\/p>\s*\z/, ''

-------------------------------------------------
str = "<p> pargraph1 pargraph1 &nbsp;&nbsp; </p> <p>pargraph2 pargraph2
pargraph2 &nbsp;&nbsp; </p>
<p> pargraph3 hello3 hell13 &nbsp;&nbsp; </p>
<p> pargraph4 hello3 hell14 &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>"
str = str.gsub! /<p>(&nbsp;|\s)*<\/p>\s*\z/, ''
puts str
 

Lucky Nl

Oh OK, so basically if the result is not modified, it returns nil, right?

Your regular expression is very helpful,
but I need one more piece of help.
str = "<p> pargraph1 pargraph1 &nbsp;&nbsp; </p> <p>pargraph2 pargraph2
pargraph2 &nbsp;&nbsp; </p>
<p> pargraph3 hello3 hell13 &nbsp;&nbsp; </p>
<p> pargraph4 hello3 hell14 &nbsp;&nbsp;</p> <p>&nbsp;&nbsp;</p>"


In the above str, <p>&nbsp;&nbsp;</p> is an empty line from my point of
view, so the text entered in the editor goes up to:

str = "<p> pargraph1 pargraph1 &nbsp;&nbsp; </p> <p>pargraph2 pargraph2
pargraph2 &nbsp;&nbsp; </p>
<p> pargraph3 hello3 hell13 &nbsp;&nbsp; </p>
<p> pargraph4 hello4 hell14 &nbsp;&nbsp;</p>"

I also want to remove the spaces at the end of the last paragraph, but
not in the 1st, 2nd and 3rd paragraphs.
The result I want is:
str = "<p> pargraph1 pargraph1 &nbsp;&nbsp; </p> <p>pargraph2 pargraph2
pargraph2 &nbsp;&nbsp; </p>
<p> pargraph3 hello3 hell13 &nbsp;&nbsp; </p>
<p> pargraph4 hello4 hell14</p>"
 

Brian Candler

Lucky said:
But if I make a little modification, so that the end of the line is
entered with characters, like <p>dada&nbsp;&nbsp;ddsa</p>,

str gives the result nil.

Yes, the result of gsub! is nil if no change is made; but the string
remains as it was.

irb(main):001:0> str = "abc"
=> "abc"
irb(main):002:0> str.gsub!(/d/,"")
=> nil
irb(main):003:0> str
=> "abc"

It's intended so you can say

if str.gsub! ...
# it changed
else
# it didn't
end

If you use gsub instead of gsub!, then it always returns the resulting
string.

irb(main):004:0> str2 = str.gsub(/d/,"")
=> "abc"
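A sketch of the pitfall in the earlier snippet (str = str.gsub! ...) alongside the two safe alternatives, using the <p>dada&nbsp;&nbsp;ddsa</p> string from the question, which the empty-last-paragraph regexp does not match:

```ruby
str = "<p>dada&nbsp;&nbsp;ddsa</p>"
# Removes a trailing empty paragraph; it does not match str at all.
pattern = /<p>(?:&nbsp;|\s)*<\/p>\s*\z/

# Pitfall: gsub! returns nil when nothing changed, so assigning its
# result back to the variable loses the string.
broken = str.dup
broken = broken.gsub!(pattern, '')
p broken # prints nil

# Safe option 1: call gsub! for its side effect and test the return value.
if str.gsub!(pattern, '')
  puts "changed: #{str}"
else
  puts "unchanged: #{str}"
end

# Safe option 2: non-bang gsub always returns a string.
result = str.gsub(pattern, '')
p result
```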
 

Brian Candler

Lucky said:
I also want to remove the spaces at the end of the last paragraph, but
not in the 1st, 2nd and 3rd paragraphs.
The result I want is:
str = "<p> pargraph1 pargraph1 &nbsp;&nbsp; </p> <p>pargraph2 pargraph2
pargraph2 &nbsp;&nbsp; </p>
<p> pargraph3 hello3 hell13 &nbsp;&nbsp; </p>
<p> pargraph4 hello4 hell14</p>"

So, write a regexp which matches any number of &nbsp; or space, followed
by </p>, followed by end of string. I've given you the tools to do that
already.
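One way to write the regexp described above, sketched against the text from the question (the variable name trailing is just for illustration): String#sub replaces the single match at the end, and earlier paragraphs keep their &nbsp;'s because their </p> is not followed by the end of the string.

```ruby
str = "<p> pargraph1 pargraph1 &nbsp;&nbsp; </p> <p>pargraph2 pargraph2
pargraph2 &nbsp;&nbsp; </p>
<p> pargraph3 hello3 hell13 &nbsp;&nbsp; </p>
<p> pargraph4 hello4 hell14 &nbsp;&nbsp;</p>"

# Any number of &nbsp; or whitespace, then </p>, then end of string
# (\s* allows a trailing newline before the end).
trailing = /(?:&nbsp;|\s)+<\/p>\s*\z/

result = str.sub(trailing, '</p>')
puts result
```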

If you can't make it work then show what you tried, and we can explain
what needs changing.

You can test your regexps using irb, or you can use this web site:
http://rubular.com/
 
