a regex

Alexandru Popescu · Oct 27, 2006

Hi!

I have a string in the following form: 2006%2F10%2Fasdfasdf (or more
generic: any_characters+%2Fany_characters+%2Frest_of_it). I am
wondering if I can retrieve the groups 2006, 10, asdfasdf thru a regex
(I confess I couldn't figure it out so far :-( ).

many thanks,

/alex

dblack · Oct 27, 2006

Hi --

Alexandru said:
Hi!

I have a string in the following form: 2006%2F10%2Fasdfasdf (or more
generic: any_characters+%2Fany_characters+%2Frest_of_it). I am
wondering if I can retrieve the groups 2006, 10, asdfasdf thru a regex
(I confess I couldn't figure it out so far :-( ).

Try this:

str.split("%2F")

to get all the other stuff in an array.

David

--
David A. Black | (e-mail address removed)
Author of "Ruby for Rails" [1] | Ruby/Rails training & consultancy [3]
DABlog (DAB's Weblog) [2] | Co-director, Ruby Central, Inc. [4]
[1] http://www.manning.com/black | [3] http://www.rubypowerandlight.com
[2] http://dablog.rubypal.com | [4] http://www.rubycentral.org

Thomas Adam · Oct 27, 2006

Alexandru said:
Hi!

I have a string in the following form: 2006%2F10%2Fasdfasdf (or more
generic: any_characters+%2Fany_characters+%2Frest_of_it). I am
wondering if I can retrieve the groups 2006, 10, asdfasdf thru a regex
(I confess I couldn't figure it out so far :-( ).

many thanks,

./alex

[thomas@debian ~]% irb
irb(main):001:0> a="2006%2F10%2Fasdfasdf".split('%2F')
=> ["2006", "10", "asdfasdf"]

-- Thomas Adam

Chris Gernon · Oct 27, 2006

Alexandru said:
I have a string in the following form: 2006%2F10%2Fasdfasdf (or more
generic: any_characters+%2Fany_characters+%2Frest_of_it). I am
wondering if I can retrieve the groups 2006, 10, asdfasdf thru a regex

String#split would be easier than using a regex in this case:

irb(main):001:0> '2006%2F10%2Fasdfasdf'.split('%2F')
=> ["2006", "10", "asdfasdf"]

Alexandru Popescu · Oct 27, 2006

Chris said:
String#split would be easier than using a regex in this case:

irb(main):001:0> '2006%2F10%2Fasdfasdf'.split('%2F')
=> ["2006", "10", "asdfasdf"]

Thanks for all suggestions, but the requirement is to be done thru
regex only. I knew how to do it with split, but I need to do it
with regexps only.

/alex

Chris Gernon · Oct 27, 2006

Alexandru said:
Thanks for all suggestions, but the requirement is to be done thru
regex only. I knew how to do it with split, but I need to do it
with regexps only.

How about just:

irb(main):001:0> match = '2006%2F10%2Fasdfasdf'.match(/^(.*)%2F(.*)%2F(.*)$/)
=> #<MatchData:0x1cf404>
irb(main):002:0> match[1]
=> "2006"
irb(main):003:0> match[2]
=> "10"
irb(main):004:0> match[3]
=> "asdfasdf"

Jon Lim · Oct 27, 2006

require 'uri'
s = "2006%2F10%2Fasdfasdf"
# URI.decode_www_form_component turns each "%2F" back into "/"
year, month, other = URI.decode_www_form_component(s).split('/')

dblack · Oct 27, 2006

Hi --

Alexandru said:
Thanks for all suggestions, but the requirement is to be done thru
regex only. I knew how to do it with split, but I need to do it
with regexps only.

Regexes alone don't do anything other than specify a pattern. You
need to *use* a regular expression in some operation (like split) to
get a result.
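
To illustrate that point, here is a small sketch (not from the original posts; the variable names are only for illustration) that hands one and the same pattern to a few different String operations:

```ruby
s   = "2006%2F10%2Fasdfasdf"
sep = /%2F/                    # the pattern on its own does nothing

index = s =~ sep               # =~ returns the index of the first match
parts = s.split(sep)           # split accepts a Regexp as well as a String
first = s[/\A[^%]*/]           # String#[] with a Regexp returns the matched text
count = s.scan(sep).length     # scan collects every match

p index  # => 4
p parts  # => ["2006", "10", "asdfasdf"]
p first  # => "2006"
p count  # => 2
```

Which operation fits depends on whether you want a position, the pieces, or the matched text.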

David


Robert Klemme · Oct 27, 2006

dblack said:
Regexes alone don't do anything other than specify a pattern. You
need to *use* a regular expression in some operation (like split) to
get a result.

Actually the code you presented did not even use a RX. You used the
string form of split, didn't you?

Kind regards

robert

Alexandru Popescu · Oct 27, 2006

dblack said:
Regexes alone don't do anything other than specify a pattern. You
need to *use* a regular expression in some operation (like split) to
get a result.

Yes, with groupings. But what I want is not done through
string.split or similar; it is string =~ /pattern/ and then,
as in Perl, access to the groups through $1, $2, etc.
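
For reference, a minimal sketch of that Perl-style approach in Ruby. The pattern and the variable names are only one possible choice, used here for illustration:

```ruby
s = "2006%2F10%2Fasdfasdf"

# =~ returns the match index (or nil); on success it also sets $1, $2, ...
if s =~ /\A(.*?)%2F(.*?)%2F(.*)\z/
  year, month, rest = $1, $2, $3
end

p [year, month, rest]  # => ["2006", "10", "asdfasdf"]
```

Checking the result of =~ before reading $1 avoids picking up nil values when the string does not have the expected shape.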

/alex
--
w( the_mindstorm )p.


Robert Klemme · Oct 27, 2006

Chris said:
How about just:

irb(main):001:0> match = '2006%2F10%2Fasdfasdf'.match(/^(.*)%2F(.*)%2F(.*)$/)
=> #<MatchData:0x1cf404>
irb(main):002:0> match[1]
=> "2006"
irb(main):003:0> match[2]
=> "10"
irb(main):004:0> match[3]
=> "asdfasdf"

This has a potential for disastrous backtracking with large strings.
This one is better - if you can guarantee that there is no "%" besides
the one preceding the "2F":

s = "2006%2F10%2Fasdfasdf"
=> "2006%2F10%2Fasdfasdf"

s.match(/^([^%]*)%2F([^%]*)%2F(.*)$/).to_a
=> ["2006%2F10%2Fasdfasdf", "2006", "10", "asdfasdf"]

Or maybe even

s.match(/^((?>[^%]*))%2F((?>[^%]*))%2F((?>.*))$/).to_a
=> ["2006%2F10%2Fasdfasdf", "2006", "10", "asdfasdf"]

Kind regards

robert

Alexandru Popescu · Oct 27, 2006

Robert said:
This has a potential for disastrous backtracking with large strings.
This one is better - if you can guarantee that there is no "%" besides
the one preceding the "2F":

s.match(/^([^%]*)%2F([^%]*)%2F(.*)$/).to_a
=> ["2006%2F10%2Fasdfasdf", "2006", "10", "asdfasdf"]

Or maybe even

s.match(/^((?>[^%]*))%2F((?>[^%]*))%2F((?>.*))$/).to_a
=> ["2006%2F10%2Fasdfasdf", "2006", "10", "asdfasdf"]

Yep... this is the closest I got too.

/alex

Chris Gernon · Oct 27, 2006

Robert said:
This has a potential for disastrous backtracking with large strings.
This one is better - if you can guarantee that there is no "%" besides
the one preceding the "2F":

s.match(/^([^%]*)%2F([^%]*)%2F(.*)$/).to_a
=> ["2006%2F10%2Fasdfasdf", "2006", "10", "asdfasdf"]

Or maybe even

s.match(/^((?>[^%]*))%2F((?>[^%]*))%2F((?>.*))$/).to_a
=> ["2006%2F10%2Fasdfasdf", "2006", "10", "asdfasdf"]

I have a couple of questions about this; I'm always trying to further my
(currently basic) understanding of regular expressions.

1. Why does my first regex have a potential for disastrous backtracking?
(By disastrous I assume you mean inefficient and CPU-time-consuming,
right?)

2. What does the "?>" do in your second regex? I haven't seen that
before.

Thanks!

Phrogz · Oct 27, 2006

Alexandru said:
Thanks for all suggestions, but the requirement is to be done thru
regex only. I knew how to do it with split, but I need to do it
with regexps only.

I'm unclear on what your real requirements are, but here are some
possible alternatives:

s = "2006%2F10%2Fasdfasdf"
p s.scan( /.+?(?=%2F|$)/ ).map{ |v| v.gsub( '%2F', '' ) }
#=> ["2006", "10", "asdfasdf"]

p s.gsub( '%2F', "\n" ).scan( /[^\n]+/ )
#=> ["2006", "10", "asdfasdf"]

p s.match( /^(.+?)%2F(.+?)%2F(.+?)$/ ).to_a
#=> ["2006%2F10%2Fasdfasdf", "2006", "10", "asdfasdf"]

Patrick Hurley · Oct 27, 2006

Alexandru said:
Hi!

I have a string in the following form: 2006%2F10%2Fasdfasdf (or more
generic: any_characters+%2Fany_characters+%2Frest_of_it). I am
wondering if I can retrieve the groups 2006, 10, asdfasdf thru a regex
(I confess I couldn't figure it out so far :-( ).

many thanks,

./alex

I am pretty sure I am missing something from your requirements...

"some character%2Fmore stuff%2Fthe rest...".match(/^(.*?)%2F(.*?)%2F(.*)$/)
puts $1
puts $2
puts $3

pth

Robert Klemme · Oct 27, 2006

Chris said:

I have a couple of questions about this; I'm always trying to further my
(currently basic) understanding of regular expressions.

If you are really interested in the matter I can recommend "Mastering
Regular Expressions". Even I got valuable insights from it, although I
would have regarded myself as "senior" with regard to RX.

1. Why does my first regex have a potential for disastrous backtracking?
(By disastrous I assume you mean inefficient and CPU-time-consuming,
right?)

Correct. The first ".*" will match greedily as far as it can which
means: to the end of the sequence. Then the RX engine (it is an NFA in
the case of Ruby) detects that it cannot get an overall match with that
because there is no "%2F" following. So it starts backing up by
stepping back one character and trying the "%2F" again etc. This will
go until the first group matches "2006%2F10". Ah, now we can match the
first "%2F" in the pattern. Then comes the next greedy ".*" and the
game starts over again with that. Match to the end, then try to back
up. Eventually the engine will find out that with the first group
eating up the first "%2F" as well there is no overall match since in the
remaining portion there is no more "%2F". Then backing up the first
group starts again until the first group's match is reduced to "2006".
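
A visible side effect of this greedy matching shows up once the input has more than two separators. The following is only a sketch with a made-up input string (not something from the original posts), comparing the greedy pattern with a lazy one and with the negated character class:

```ruby
s = "a%2Fb%2Fc%2Fd"   # hypothetical input with three separators

greedy  = s.match(/^(.*)%2F(.*)%2F(.*)$/).captures
lazy    = s.match(/^(.*?)%2F(.*?)%2F(.*)$/).captures
negated = s.match(/^([^%]*)%2F([^%]*)%2F(.*)$/).captures

p greedy   # => ["a%2Fb", "c", "d"]
p lazy     # => ["a", "b", "c%2Fd"]
p negated  # => ["a", "b", "c%2Fd"]
```

The greedy first group gives back only as much as it has to, so it ends up holding the first separator as well, while the other two variants stop at the first "%2F".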

2. What does the "?>" do in your second regex? I haven't seen that
before.

That's an atomic sub RX. Basically it will not give back any characters
that it has consumed. Using that in this example with ".*" will make
the overall match fail (match returns nil, and nil.to_a is an empty
array):
=> []

Actually I believe atomic grouping is not needed in this case as the
[^%] cannot match past a "%" and so there is probably no potential for
backtracking. Benchmarking probably shows the whole picture. It is
definitely harmful with ".*" because then the backtracking (see above)
cannot start and there will be no overall match.
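
A short sketch of that difference. The exact expression behind the "=> []" above is not preserved, so the first pattern here is an assumption about what it looked like:

```ruby
s = "2006%2F10%2Fasdfasdf"

# Atomic group around ".*": it swallows the whole string and never gives
# characters back, so the following "%2F" can never match.
atomic_dot = s.match(/^((?>.*))%2F((?>.*))%2F((?>.*))$/)

# Atomic group around "[^%]*": it stops at the first "%" by itself,
# so nothing ever needs to be given back.
atomic_neg = s.match(/^((?>[^%]*))%2F((?>[^%]*))%2F((?>.*))$/)

p atomic_dot.to_a  # => []  (match returned nil, and nil.to_a is [])
p atomic_neg.to_a  # => ["2006%2F10%2Fasdfasdf", "2006", "10", "asdfasdf"]
```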

You can easily see the backtracking with a tool like "Regex Coach" with
which you can step graphically through the match.

Kind regards

robert
