A regex problem

gga

I am usually pretty good at regexes but this one has me stumped.
I want to basically match any line that has a period in it, but only if
that period is not part of a salutation. Ideally I want to do this
with a single regex.

Thus:
'end of line. And we continue' # should match
'The incredible Mrs. Robner' # should not match
'Sammy Davis Jr. is an okay guy.' # should match, due to last .

I tried doing something logical, like:

/(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

but, alas, this does not work. Any ideas?
 
Thorsten Haude

Hi,

* gga wrote (2005-08-21 09:16):
Thus:
'end of line. And we continue' # should match
'The incredible Mrs. Robner' # should not match
'Sammy Davis Jr. is an okay guy.' # should match, due to last .

I tried doing something logical, like:

/(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

but, alas, this does not work. Any ideas?

Just by looking at it, this only seems to not-find 'Mrs..'.

I also wonder why you use a look-ahead; I would rather use a
look-behind. As it is, your regex finds any dot: the look-ahead is
tested at the position of the dot itself, and no dot matches
(Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.). So in the regex dialect I know
best (NEdit):
(?<!(Jr|Sr|Miss|Mr|Mrs))\.
(i.e. find a dot not preceded by Jr, Sr, etc.)
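
To make the point about the look-ahead concrete, here is a small sketch in Ruby. The method name lookahead_matches? is made up for illustration; the pattern is the one from the original post. It only shows that the pattern accepts every line containing a dot, including the one that should be rejected.

```ruby
# The negative look-ahead is checked at the position where the dot is
# about to be matched. The text at that position starts with ".", never
# with "Jr.", "Mr." and so on, so the look-ahead never rejects anything.
def lookahead_matches?(line)
  !!(line =~ /(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./)
end

lines = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

# Compare against a plain "contains a dot" test: the results are identical.
lines.each do |s|
  puts "#{lookahead_matches?(s)} #{!!(s =~ /\./)} #{s}"
end
```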


Thorsten
 
William James

gga said:
I am usually pretty good at regexes but this one has me stumped.
I want to basically match any line that has a period in it, but only if
that period is not part of a salutation. Ideally I want to do this
with a single regex.

Thus:
'end of line. And we continue' # should match
'The incredible Mrs. Robner' # should not match
'Sammy Davis Jr. is an okay guy.' # should match, due to last .

I tried doing something logical, like:

/(?!Jr\.|Sr\.|Miss\.|Mr\.|Mrs\.)\./

but, alas, this does not work. Any ideas?

a = [
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
]

# Strip the known abbreviations first, then look for any remaining period.
a.each { |s|
  puts s if s.gsub(/(?:Jr\.|Sr\.|Mr\.|Mrs\.)/, "") =~ /\./
}
 
William James

This would be a lot easier if Ruby had look-behind.

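[
  '.start',
  '-. HERE .-',
  'Jr. is rotten',
  'Mr. Smith is here',
  'Mr-. Smith is here',
  'Mr. Smith is here.',
  'Mrs. Jones left',
  'Meet Mr. Elihu Snark, Jr.',
  'A good line.',
  'A mystery guest, introduced by his father, Mr. Bob Eck, Sr.'
].each { |s|
  # Emulate a negative look-behind with look-aheads placed before the
  # characters that precede the period. The first branch handles a period
  # within the first three characters of the line; the second checks that
  # the three characters before the period do not end in Jr, Sr or Mr and
  # are not Mrs.
  if s =~ %r{ (?:
        (?!Jr|Sr|Mr) ^ .{0,2} |
        (?!.Jr|.Sr|.Mr|Mrs) ...
      )
      \.
    }x
    puts s
  end
}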
 
Gavin Kistner

I also wonder why you use a look-ahead, I would rather use a
look-behind.

A negative look-behind would be the perfect, simple approach to this
regex problem. Unfortunately, Ruby's current regexp handler does not
have such a feature. Fortunately, the regexp handler of the next
version of Ruby does. Even more fortunately, this future handler
(Oniguruma) is available now.

So, you can write a more complex regexp/logic to detect your current
case, or you can get Oniguruma working and use a negative look-behind.
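
For reference, a minimal sketch of the look-behind version, assuming a Ruby whose regexp engine supports look-behind (Oniguruma, which ships with Ruby 1.9 and later). The method name period_outside_salutation? is made up for illustration. The alternatives are written at the top level of the look-behind without a capturing group, since Oniguruma's syntax notes say that capturing groups are not allowed inside a negative look-behind, while top-level alternatives of different lengths are.

```ruby
# Assumption: the regexp engine supports (?<!...), i.e. Oniguruma or later.
# Matches a period that is not directly preceded by one of the listed
# abbreviations.
def period_outside_salutation?(line)
  !!(line =~ /(?<!Jr|Sr|Miss|Mr|Mrs)\./)
end

[
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
].each do |s|
  puts s if period_outside_salutation?(s)
end
```

The check is purely textual: any word that happens to end in one of the listed abbreviations, with the same capitalization, is skipped as well.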
 
Robert Klemme

Gavin Kistner said:
A negative look-behind would be the perfect, simple approach to this
regex problem. Unfortunately, Ruby's current regexp handler does not
have such a feature. Fortunately, the regexp handler of the next
version of Ruby does. Even more fortunately, this future handler
(Oniguruma) is available now.

So, you can write a more complex regexp/logic to detect your current
case, or you can get Oniguruma working and use a negative look-behind.

I'd probably use something like /(\w+)\./ and do a programmatic check (or
use a second regexp) that the word before the dot is not one of the
excluded words.
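
A rough sketch of that idea, which works without look-behind. The method name non_salutation_period? and the default word list are assumptions for illustration, taken from the abbreviations in the original post.

```ruby
# Collect every word that is directly followed by a period, then check
# whether at least one of them is not an excluded abbreviation.
def non_salutation_period?(line, excluded = %w[Jr Sr Miss Mr Mrs])
  line.scan(/(\w+)\./).flatten.any? { |word| !excluded.include?(word) }
end

[
  'end of line. And we continue',
  'The incredible Mrs. Robner',
  'Sammy Davis Jr. is an okay guy.'
].each do |s|
  puts s if non_salutation_period?(s)
end
```

Because the pattern requires at least one word character before the dot, a period at the very start of a line or after punctuation is not counted; use (\w*) instead if those should match too.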

Kind regards

robert
 
Thorsten Haude

Hi,

* Gavin Kistner wrote (2005-08-21 16:41):
A negative look-behind would be the perfect, simple approach to this
regex problem. Unfortunately, Ruby's current regexp handler does not
have such a feature.

Sorry if I added to the confusion; I'm pretty new to Ruby and wasn't
aware of that limitation.


Thorsten
 
