regex: get the first match

  • Thread starter Trochalakis Christos

Trochalakis Christos

Hello!

I want to parse a tagged string like this: "<i>this is</i><i>my
string</i>"

I am doing:

"<i>this is</i><i>my string</i>".scan(/<i>(.*)<\/i>/)
=> [["this is</i><i>my string"]]

What I want is a regex that will return the *first* segment that
matches.
In the above case -> [["this is", "my string"]]

Is there any way to do this?

Thanks!
 

Robert Dober

This is a FAQ, and yes, I will give the solution ;)
Regexps are greedy by default: they consume as many chars as
possible. There are some possibilities (not tested):

(1) use non-greedy matches
"<i>this is</i><i>my string</i>".scan(/<i>(.*?)<\/i>/)
(2) use less general expressions
"<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*)<\/i>/)
(3) Combine both ;)
"<i>this is</i><i>my string</i>".scan(/<i>(.[^<]*?)<\/i>/)

HTH
Robert

P.S.
This *really* is a FAQ though
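
Since the three suggestions above were posted as untested, here is a small sketch that runs them next to the original greedy pattern. The variable names are only for illustration; the patterns are the ones from the post.

```ruby
str = "<i>this is</i><i>my string</i>"

# Original greedy pattern: .* runs to the end and backtracks to the LAST </i>
greedy   = str.scan(/<i>(.*)<\/i>/)

# (1) non-greedy: .*? stops at the first </i> it can reach
lazy     = str.scan(/<i>(.*?)<\/i>/)

# (2) less general: one arbitrary char, then anything that is not "<"
negated  = str.scan(/<i>(.[^<]*)<\/i>/)

# (3) both combined
combined = str.scan(/<i>(.[^<]*?)<\/i>/)

p greedy    # [["this is</i><i>my string"]]
p lazy      # [["this is"], ["my string"]]
p negated   # [["this is"], ["my string"]]
p combined  # [["this is"], ["my string"]]
```

For this input, all three alternatives split the string into the two tagged segments; they only start to differ on edge cases such as empty tags.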
 

Robert Dober

Unless you want to match strings like <i><foo</i>, it would be simpler to
just use [^<]* rather than .[^<]*. Also, .[^<]* will not match <i></i>. If the
intent was to make the regexp not match that, a better regexp would be [^<]+
Thanks for correcting my typos.
Robert
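
The difference between the three character-class variants only shows up on edge cases, so here is a quick sketch using the two inputs mentioned in the correction. The constant names are made up for this example.

```ruby
DOT_THEN_CLASS = /<i>(.[^<]*)<\/i>/   # as originally posted
CLASS_STAR     = /<i>([^<]*)<\/i>/    # zero or more non-"<" chars
CLASS_PLUS     = /<i>([^<]+)<\/i>/    # one or more non-"<" chars

empty_tag = "<i></i>"
stray_lt  = "<i><foo</i>"

# Empty tag: only [^<]* matches, with an empty capture
p empty_tag.scan(DOT_THEN_CLASS)  # []
p empty_tag.scan(CLASS_STAR)      # [[""]]
p empty_tag.scan(CLASS_PLUS)      # []

# Stray "<" in the content: only the leading "." lets it through
p stray_lt.scan(DOT_THEN_CLASS)   # [["<foo"]]
p stray_lt.scan(CLASS_STAR)       # []
p stray_lt.scan(CLASS_PLUS)       # []
```

So the choice comes down to whether empty tags should count as a match ([^<]*) or be skipped ([^<]+).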
 

Trochalakis Christos

I want to parse a tagged string like this: "<i>this is</i><i>my
string</i>"
i am doing:
"<i>this is</i><i>my string</i>".scan(/<i>(.*)<\/i>/)
=> [["this is</i><i>my string"]]
What i want is a regex that will return the *first* segment that
matches.
in the above case -> [["this is", "my string"]]

The solution is :

"<i>this is</i><i>my string</i>".scan(/<i>(.*?)<\/i>/)
=> [["this is"], ["my string"]]

By default, a regexp matches as much as it possibly can.
Adding the '?' character minimizes the match:
with (.*?) instead of (.*), the </i><i> part of the string is no longer
included in a single result.

Regards,
Grzegorz Golebiowski

Thanks Grzegorz, nice trick!
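
If only the first segment is needed, as the thread title suggests, scan may be more than is required. A minimal sketch, assuming the same input string, using String#[] and String#match with the non-greedy pattern (the variable names are illustrative):

```ruby
str = "<i>this is</i><i>my string</i>"
tag = /<i>(.*?)<\/i>/

# String#[] with a regexp and a capture index returns just that group
first = str[tag, 1]

# String#match returns a MatchData (or nil when nothing matches)
m = str.match(tag)
first_via_match = m && m[1]

# scan returns one array per match; flatten gives a flat list of segments
all = str.scan(tag).flatten

p first            # "this is"
p first_via_match  # "this is"
p all              # ["this is", "my string"]
```

Note that flatten yields ["this is", "my string"] rather than the nested [["this is", "my string"]] written in the question; wrap it in another array if that exact shape matters.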
 

Robert Dober

You are welcome ;)
Robert
 
