Making my Regex less greedy?

luke

Hi,

I'm working on a regular expression that will chop a posted message in half,
but chop it on a new paragraph break. I've decided it should look for the
new paragraph break after 100 characters. I'd like the regular expression to
choose an earlier paragraph break rather than a later one, but at the
moment, if there is a message with a number of paragraphs, it chooses the
last possible one it can in order to make a match. I remember reading in the
Pickaxe about how regular expressions are 'greedy', and wonder if this is a
case of regex gluttony perhaps and what I can do to recommend to it a
lighter diet.

# final act is to chop message in half
if message =~ /\A(.{100,#{message.length}})<\/p>\s*<p>(.*)/m then
  first_half = $1
  second_half = "</p>\n<p>" + $2
else
  first_half = message
end

The logic I'd like the above regex to operate with is: "Starting 100
characters into the message, chop the message at the next paragraph break".

Thanks
Luke
 
Gavin Kistner

luke said:
The logic I'd like the above regex to operate with is: "Starting 100
characters into the message, chop the message at the next paragraph
break".

A question mark after a quantifier makes it non-greedy: it matches as
little as possible instead of as much as possible.
.+ versus .+?
.* versus .*?
.{a,b} versus .{a,b}?

For example:
txt = <<END
Twas brillig
and the slithy toves
did gyre and gimble
in the wabe
END

def truncate_after( str, length )
  str =~ /\A(.{#{length},}?)\n(.+)/m
  return [ $1, $2 ]
end

p truncate_after( txt, 0 )
#=> ["Twas brillig", "and the slithy toves\ndid gyre and gimble\nin the wabe\n"]

p truncate_after( txt, 20 )
#=> ["Twas brillig\nand the slithy toves", "did gyre and gimble\nin the wabe\n"]

p truncate_after( txt, 40 )
#=> ["Twas brillig\nand the slithy toves\ndid gyre and gimble", "in the wabe\n"]
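
Applied to the original HTML case, the same lazy quantifier might look like the sketch below. The method name split_message and the min_length parameter are made up for illustration; the return convention (second half prefixed with "</p>\n<p>", or nil when there is nothing to split) simply mirrors the snippet in the question.

```ruby
# Hypothetical helper: split an HTML message at the first paragraph
# break that occurs at least min_length characters in.
def split_message(message, min_length = 100)
  # .{n,}? is lazy: it consumes at least n characters, then as few more
  # as possible before the rest of the pattern can match.
  if message =~ /\A(.{#{min_length},}?)<\/p>\s*<p>(.*)/m
    [$1, "</p>\n<p>" + $2]
  else
    # No paragraph break after min_length characters: keep it whole.
    [message, nil]
  end
end

msg = "<p>" + "a" * 60 + "</p><p>" + "b" * 60 + "</p>\n<p>" +
      "c" * 10 + "</p><p>d</p>"
first_half, second_half = split_message(msg)
```

Here the first break comes only 63 characters in, so it is skipped, and the split lands on the second break rather than the last one.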
 
luke

Thanks very much, that works a treat. Always nice to have something
demonstrated in Lewis Carroll.

So, {100,#{m.length}}? is now effectively finding the first match, if any.
Out of curiosity, is it easy to express "find the 3rd match" (rather
than saying "find 3 matches")?

Luke


 
William James

luke said:
... Out of curiosity, is it easy to express "find the 3rd match"?. (Rather
than saying "find 3 matches").

Third integer (counting starts at 0):

"1. All 27 bells were rung 3 times.".scan(/\d+/)[2]   #=> "3"
 
Gavin Kistner

luke said:
So, {100,#{m.length}}? effectively is now finding the first match, if any
.... Out of curiosity, is it easy to express "find the 3rd match"?. (Rather
than saying "find 3 matches").

a) As noted in my example, you can leave the second 'argument' to the
range quantifier empty, in which case it is unbounded.
a{3,5} <== find 3-5 'a' chars
a{3,}  <== find at least 3 'a' chars, up to ... well, as many as you can

b) String#scan will take a regexp and return an array of all matches
in the document. (Not as useful if you need the saved
sub-expressions, however.)
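
If the saved sub-expressions of the nth match are needed, one option is to step through the string with String#match and a start position, keeping the MatchData object. This is only a sketch: nth_match is a made-up helper name, and n is assumed to be 1-based.

```ruby
# Hypothetical helper: return the MatchData for the nth (1-based)
# non-overlapping match of pattern in str, or nil if there are fewer.
def nth_match(str, pattern, n)
  pos = 0
  match = nil
  n.times do
    match = str.match(pattern, pos)
    return nil unless match
    # Continue after this match; step one extra character past an
    # empty match so the loop always makes progress.
    pos = match.end(0) > match.begin(0) ? match.end(0) : match.end(0) + 1
  end
  match
end

m = nth_match("a=1, b=22, c=333", /(\w)=(\d+)/, 3)
m[1]  #=> "c"
m[2]  #=> "333"
```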
 
Dave Burt

luke asked:
... Out of curiosity, is it easy to express "find the 3rd match"?. (Rather
than saying "find 3 matches").

scan will find all matches, so you can do:

a = "first second third".scan(/\w+/)
a[2] #=> "third"

Cheers,
Dave
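
For completeness, "the 3rd match" can also be expressed inside a single regex by skipping the first two occurrences with a counted group. The sketch below does this for integers only; nth_integer is a hypothetical name, n is assumed to be 1 or greater, and scan is usually the simpler choice.

```ruby
# Hypothetical helper: return the nth (1-based) integer in str, or nil.
def nth_integer(str, n)
  # (?:\D*(?>\d+)){n-1} skips the first n-1 integers. The atomic group
  # (?>\d+) never gives digits back, so a number is not split in two
  # when the string holds fewer than n integers.
  str[/\A(?:\D*(?>\d+)){#{n - 1}}\D*(\d+)/, 1]
end

nth_integer("1. All 27 bells were rung 3 times.", 3)  #=> "3"
nth_integer("12 34", 3)                               #=> nil
```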
 
