A regex problem

gga · Aug 21, 2005

I am usually pretty good at regexes but this one has me stumped.
I want to basically match any line that has a period in it, but only if
that period is not part of a salutation. Ideally I want to do this
with a single regex.

Thus:
'end of line. And we continue' # should match
'The incredible Mrs. Robner' # should not match
'Sammy Davis Jr. is an okay guy.' # should match, due to last .

I tried doing something logical, like:

/(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

but, alas, this does not work. Any ideas?

Thorsten Haude · Aug 21, 2005

Hi,

* gga wrote (2005-08-21 09:16):

Thus:
'end of line. And we continue' # should match
'The incredible Mrs. Robner' # should not match
'Sammy Davis Jr. is an okay guy.' # should match, due to last .

I tried doing something logical, like:

/(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

but, alas, this does not work. Any ideas?

Just by looking at it, this only seems to not-find 'Mrs..'.

I also wonder why you use a look-ahead; I would rather use a
look-behind. As it is, your regex would find any dot, because the
look-ahead is tested at the dot itself and no dot matches
(Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.). So in the regex dialect I know
best (NEdit):
(?<!(Jr|Sr|Miss|Mr|Mrs))\.
(i.e. find a dot not preceded by Jr, Sr, etc.)
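
To see the point about the look-ahead concretely, here is a minimal Ruby sketch that runs the original pattern over the three sample lines. The names LOOKAHEAD and SAMPLES are made up for the illustration. Because the assertion is checked at the position of the dot, where none of the alternatives can start, every line with a dot matches, including the one that should not:

```ruby
# The pattern from the original post, unchanged.
LOOKAHEAD = /(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

SAMPLES = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

SAMPLES.each do |s|
  # =~ returns the index of the first match, or nil when nothing matches.
  index = s =~ LOOKAHEAD
  puts "#{s.inspect} => #{index ? "match at #{index}" : 'no match'}"
end
```

The second line reports a match at index 18, which is the dot right after "Mrs".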

Thorsten

William James · Aug 21, 2005

a = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

a.each {|s|
  # Strip the salutations first, then look for any period that is left.
  puts s if s.gsub(/(?:Jr\.|Sr\.|Mr\.|Mrs\.)/, "") =~ /\./
}

William James · Aug 21, 2005

This would be a lot easier if Ruby had look-behind.

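[
  '.start',
  '-. HERE .-',
  'Jr. is rotten',
  'Mr. Smith is here',
  'Mr-. Smith is here',
  'Mr. Smith is here.',
  'Mrs. Jones left',
  'Meet Mr. Elihu Snark, Jr.',
  'A good line.',
  'A mystery guest, introduced by his father, Mr. Bob Eck, Sr.'
].each {|s|
  # First branch: a dot within the first three characters of a line that
  # does not open with Jr, Sr or Mr.
  # Second branch: any other dot, as long as the three characters before it
  # are not Mrs and do not end in Jr, Sr or Mr.
  if s =~ %r{ (?:
                (?!Jr|Sr|Mr) ^ .{0,2} |
                (?!.Jr|.Sr|.Mr|Mrs) ...
              )
              \.
            }x
    puts s
  end
}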

Gavin Kistner · Aug 21, 2005

Thorsten Haude said:
I also wonder why you use a look-ahead, I would rather use a
look-behind.

A negative look-behind would be the perfect, simple approach to this
regex problem. Unfortunately, Ruby's current regexp handler does not
have such a feature. Fortunately, the regexp handler of the next
version of Ruby does. Even more fortunately, this future handler
(Oniguruma) is available now.

So, you can write a more complex regexp/logic to detect your current
case, or you can get Oniguruma working and use a negative look-behind.
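
For readers on a later Ruby, here is a minimal sketch of the negative look-behind approach. It assumes Ruby 1.9 or newer, where Oniguruma (later Onigmo) is the built-in engine. The names NOT_SALUTATION_DOT and lines are made up for the illustration. Oniguruma accepts alternatives of different lengths at the top level of a look-behind, which is why the sketch lists them directly instead of wrapping them in an inner group:

```ruby
# A dot that is not immediately preceded by one of the salutations.
NOT_SALUTATION_DOT = /(?<!Jr|Sr|Miss|Mr|Mrs)\./

lines = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

# Keep only the lines that contain at least one non-salutation dot.
matching = lines.select { |s| s =~ NOT_SALUTATION_DOT }
puts matching
```

This prints the first and third lines only. The third line passes because of its final dot, not the one after "Jr".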

Robert Klemme · Aug 21, 2005

I'd probably use something like /(\w+)\./ and do a programmatic check (or
use a second RX) that the word before the dot is not one of those
no-match words.
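
A rough sketch of that two-step idea, which also works without look-behind support. The constant SALUTATIONS and the method name sentence_period? are made up for the illustration, and the word list is only the one from the original post:

```ruby
# Words whose trailing dot should not count as a sentence period.
SALUTATIONS = %w[Jr Sr Mr Mrs Miss].freeze

# True when some dot in the line directly follows a word that is not
# in SALUTATIONS.
def sentence_period?(line)
  # scan returns one single-element array per match because the pattern
  # has one capture group; flatten turns that into a plain list of words.
  line.scan(/(\w+)\./).flatten.any? { |word| !SALUTATIONS.include?(word) }
end

[
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
].each { |s| puts s if sentence_period?(s) }
```

One caveat: /(\w+)\./ only sees dots that directly follow a word character, so a line such as '-. HERE .-' is not reported by this version.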

Kind regards

robert

Thorsten Haude · Aug 21, 2005

Hi,

* Gavin Kistner wrote (2005-08-21 16:41):

A negative look-behind would be the perfect, simple approach to this
regex problem. Unfortunately, Ruby's current regexp handler does not
have such a feature.

Sorry if I added to the confusion, I'm pretty new to Ruby and wasn't
aware of that limitation.

Thorsten
