Concatenating Regex Smartly

Shak

Is there a way of quickly concatenating two full string patterns in a
way that takes into account the boundaries? So for example:

\A\d+\Z and \A[a-z]+\Z

would give:

\A\d+[a-z]+\Z

?

Or is this a context sensitive situation where I'd have to parse and
join it myself? If so, what is the best way to "tokenise" a pattern?

Shak
 
Robert Klemme

> Is there a way of quickly concatenating two full string patterns in a
> way that takes into account the boundaries? So for example:
>
> \A\d+\Z and \A[a-z]+\Z

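IIRC the "Z" must be lower case: \z matches only at the very end of the string, whereas \Z also matches just before a trailing newline.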
> would give:
>
> \A\d+[a-z]+\Z
>
> ?
>
> Or is this a context sensitive situation where I'd have to parse and
> join it myself? If so, what is the best way to "tokenise" a pattern?

Why do you have to parse them? There is a bit of context missing, but
without further facts I would recommend keeping the individual patterns
without the start and end anchors and applying the anchors only after
constructing the full regexp that you want to use. My 0.02 EUR...
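A minimal sketch of that recommendation, assuming Ruby 2.4 or later (for `Regexp#match?`). The fragment constants `DIGITS` and `LETTERS` and the helper `anchored` are hypothetical names, not part of any library. It relies on `Array#join` calling `Regexp#to_s`, which wraps each fragment in its own `(?flags-flags:...)` group:

```ruby
# Hypothetical fragments: none of them carries \A or \z.
DIGITS  = /\d+/
LETTERS = /[a-z]+/i   # case-insensitive on purpose

# Hypothetical helper: join the fragments and anchor the result once.
# Array#join uses Regexp#to_s, e.g. "(?-mix:\\d+)" and "(?i-mx:[a-z]+)",
# so each fragment keeps its own grouping and flags.
def anchored(*parts)
  /\A(?:#{parts.join})\z/
end

number      = anchored(DIGITS)
number_word = anchored(DIGITS, LETTERS)

p number.match?("42")          # => true
p number_word.match?("42abc")  # => true
p number_word.match?("42ABC")  # => true  (the i flag survived)
p number_word.match?("42")     # => false (letters are required)
```

The same unanchored fragments can then be reused in any combination without ever having to strip anchors back out of a finished pattern.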

Kind regards

robert
 
Brian Candler

Shak said:
> \A\d+\Z and \A[a-z]+\Z

These are two regular expressions both anchored to the start and end of
the string.

If you want to match one or the other:

re1 = /\A\d+\z/
re2 = /\A[a-z]+\z/

re3 = /#{re1}|#{re2}/
=> /(?-mix:\A\d+\z)|(?-mix:\A[a-z]+\z)/
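To check the alternation behaves as described, here is a small sketch (assuming Ruby 2.4+ for `Regexp#match?`). It also shows `Regexp.union`, which builds the same kind of `|`-joined pattern from a list of regexps:

```ruby
re1 = /\A\d+\z/
re2 = /\A[a-z]+\z/

# Alternation built by interpolation: each branch keeps its own anchors
# because it is wrapped in its own (?-mix:...) group.
either = /#{re1}|#{re2}/

# Regexp.union builds an equivalent alternation from a list of patterns.
union = Regexp.union(re1, re2)

%w[123 abc 123abc].each do |s|
  # Both regexps should always agree.
  puts "#{s}: #{either.match?(s)} #{union.match?(s)}"
end
# 123: true true
# abc: true true
# 123abc: false false
```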

But to "concatenate" in the sense of making a regexp which matches
digits followed by letters, you need to remove the anchors.

re1 = /\d+/
re2 = /[a-z]+/

re3 = /\A#{re1}#{re2}\z/
=> /\A(?-mix:\d+)(?-mix:[a-z]+)\z/
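A quick sanity check of the concatenated pattern (assuming Ruby 2.4+ for `Regexp#match?`):

```ruby
re1 = /\d+/
re2 = /[a-z]+/
re3 = /\A#{re1}#{re2}\z/

p re3.match?("123abc")    # => true
p re3.match?("123")       # => false (at least one letter is required)
p re3.match?("abc123")    # => false (digits must come first)
p re3.match?("123abc\n")  # => false (\z does not allow a trailing newline)
```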

Note that #{re1} and #{re2} are each wrapped in a non-capturing group
(?-mix:...) when they are interpolated into re3; that group also preserves
each pattern's own flags. So it should also work properly for more
complex REs, e.g.

re1 = /a|b/
re2 = /c|d/
re3 = /\A#{re1}#{re2}\z/
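Why the wrapping matters: a sketch comparing interpolation with gluing the raw `Regexp#source` strings together (the `naive` variable exists only for illustration; assumes Ruby 2.4+):

```ruby
re1 = /a|b/
re2 = /c|d/

# Interpolation wraps each part in (?-mix:...), so each alternation stays local.
grouped = /\A#{re1}#{re2}\z/

# Pasting the raw sources loses that grouping: the result is \Aa|bc|d\z,
# i.e. "\Aa" OR "bc" OR "d\z".
naive = Regexp.new("\\A" + re1.source + re2.source + "\\z")

p grouped.match?("ac")    # => true
p grouped.match?("ax")    # => false
p naive.match?("ax")      # => true  ("\Aa" alone is enough)
p naive.match?("xbcx")    # => true  ("bc" anywhere is enough)
```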

But if you want to be extra-certain that it's done correctly, you can
always add your own additional layer of grouping:

re3 = /\A(?:#{re1}#{re2})\z/
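One case where that outer group is more than insurance: applying a quantifier to the whole concatenation. A sketch, assuming Ruby 2.4+ (`pairs` is just an illustrative name):

```ruby
re1 = /\d+/
re2 = /[a-z]+/

# The outer (?:...) makes the concatenation a single unit,
# so the + repeats "digits followed by letters" as a whole.
pairs = /\A(?:#{re1}#{re2})+\z/

p pairs.match?("1a22bb333ccc")  # => true
p pairs.match?("1a2")           # => false (the trailing "2" has no letters)
```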
 
Robert Dober

> Is there a way of quickly concatenating two full string patterns in a
> way that takes into account the boundaries? So for example:
>
> \A\d+\Z and \A[a-z]+\Z
Not that I am aware of. Note, however, that \Z and \z have slightly different semantics:

irb(main):001:0> "abc\n" =~ /.\Z/ # \Z matches just before the trailing \n
=> 2
irb(main):002:0> "abc\n" =~ /.\z/ # \z does not match before the \n, and . does not match the \n
=> nil
irb(main):003:0> "abc\n" =~ /.\z/m # Now, in multiline mode (/m), the . matches the \n
=> 3

Now, this is for 1.9; maybe it does not hold for 1.8.
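Applied to the pattern from the original question, the difference shows up with input that still has its newline attached, for example a line read with `gets` (the `input` string below is only an illustration):

```ruby
input = "123\n"   # e.g. a line read with gets, newline still attached

p input =~ /\A\d+\Z/        # => 0   (\Z may match just before a final "\n")
p input =~ /\A\d+\z/        # => nil (\z only matches at the very end)
p input.chomp =~ /\A\d+\z/  # => 0   (strip the newline first)
```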
Cheers
R.

-- 
If you want to build a ship, don't herd people together to collect
wood and don't assign them tasks and work, but rather teach them to
long for the endless immensity of the sea.
-- Antoine de Saint-Exupery
 
