Regex for "not matching" an unneeded prefix substring?

Jet Koten

Hi all,

I'm new to Ruby and even newer to regex. I'm trying to write my first
[useful] Ruby program and need a way to cut out an unneeded prefix
substring and retain the substring that comes after it.

Here are the actual details from my code:

result.each do |item|
  price = item.search(".price").text.match(/\d+[.]\d+/)
  condition = item.search(".condition").text.match(/Used - ([^,]+)/)
  rating = item.search(".rating a").text.to_i
  seller = item.search(".seller b").text
  puts "#{price} - #{condition} - #{rating} - #{seller}"
end

The one from condition [in the code above] is the one that is giving me
a challenge. The string that is sent to condition will always be exactly
one of the following and nothing else at all:

"Used - Like New"
"Used - Very Good"
"Used - Good"
"Used - Acceptable"

I'm trying to get them to display as the following in the puts at the
end of my code:

"Like New"
"Very Good"
"Good"
"Acceptable"

The regex that I've got there in the condition line works in Rubular,
but not in my code. I'm running 1.8.7 if that matters...

One last thing that I don't understand is that in Rubular my regex
for price shows the match in the "Match result:" line, but the regex for
condition shows the whole string as a match in the "Match result:" line
and shows the correctly matching substring in the "Match captures:"
line.

I'm grateful for this great resource (the list/forum) and would be very
happy to hear from anyone who can help me sort this out!

Thanks in advance,
J
 
Alexander Jesner

The one from condition [in the code above] is the one that is giving me
a challenge. The string that is sent to condition will always be exactly
one of the following and nothing else at all:

"Used - Like New"
"Used - Very Good"
"Used - Good"
"Used - Acceptable"

I'm trying to get them to display as the following in the puts at the
end of my code:

"Like New"
"Very Good"
"Good"
"Acceptable"

If you just want to get rid of the word "Used", you could use something
like this:

text = "Used - Like New"
text[7, text.length]
=> "Like New"

Regards
 
Robert Klemme

2010/2/26 Jet Koten said:
Hi all,

I'm new to Ruby and even newer to regex. I'm trying to write my first
[useful] Ruby program and need a way to cut out an unneeded prefix
substring and retain the substring that comes after it.

Here are the actual details from my code:

result.each do |item|
  price = item.search(".price").text.match(/\d+[.]\d+/)
  condition = item.search(".condition").text.match(/Used - ([^,]+)/)
  rating = item.search(".rating a").text.to_i
  seller = item.search(".seller b").text
  puts "#{price} - #{condition} - #{rating} - #{seller}"
end

The one from condition [in the code above] is the one that is giving me
a challenge. The string that is sent to condition will always be exactly
one of the following and nothing else at all:

"Used - Like New"
"Used - Very Good"
"Used - Good"
"Used - Acceptable"

I'm trying to get them to display as the following in the puts at the
end of my code:

"Like New"
"Very Good"
"Good"
"Acceptable"

The regex that I've got there in the condition line works in Rubular,
but not in my code. I'm running 1.8.7 if that matters...

I am not sure which regexp you are referring to specifically.
However, you can do this

irb(main):001:0> s = "Used - Like New"
=> "Used - Like New"
irb(main):002:0> s[/\AUsed\s+-\s+(.*)\z/, 1]
=> "Like New"
irb(main):003:0> s[7..-1]
=> "Like New"

String#[] with regular expression is a very powerful tool - especially
when used with grouping as in this case.
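
As a small extension of the irb session above, here is a sketch that applies the same pattern to all four condition strings plus one input that does not match. The method name condition_label is made up for this illustration. The point it tries to show is that String#[] with a regexp and a group index returns nil on a failed match instead of raising.

```ruby
# Pattern taken from the irb session above; the method name is illustrative.
def condition_label(text)
  # String#[] with a regexp and an index returns that capture group,
  # or nil when the pattern does not match.
  text[/\AUsed\s+-\s+(.*)\z/, 1]
end

inputs = ["Used - Like New", "Used - Very Good", "Used - Good",
          "Used - Acceptable", "garble"]

inputs.each do |s|
  puts condition_label(s).inspect
end
```

Because a failed match gives nil, the caller can decide what to print for unexpected input rather than hitting an error in the middle of the loop.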

One last thing that I don't understand is that in Rubular my regex
for price shows the match in the "Match result:" line, but the regex for
condition shows the whole string as a match in the "Match result:" line
and shows the correctly matching substring in the "Match captures:"
line.

I am having difficulty following you here since I don't know what
"item" is in your case. It's probably easier if you provide a simple
test case that demonstrates your point. Using IRB often also helps.
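
A minimal test case along those lines might look like the sketch below. It uses a plain string in place of the scraped text, which is an assumption since "item" is not shown, and the method name inspect_match is made up. String#match returns a MatchData object: element 0 is the whole match and element 1 is the first capture group. Interpolating the MatchData into a string calls to_s, which gives the whole match. That would be consistent with Rubular showing the full string under "Match result:" and the shorter substring under "Match captures:", and with the price regex, which has no group, printing as expected.

```ruby
# Returns [whole match, first capture, interpolated MatchData],
# or nil when the pattern does not match.
def inspect_match(text)
  m = text.match(/Used - ([^,]+)/)   # MatchData, or nil if nothing matched
  return nil if m.nil?
  [m[0], m[1], "#{m}"]
end

# Stand-in for item.search(".condition").text (assumed value).
whole, capture, interpolated = inspect_match("Used - Like New")

puts whole          # Used - Like New
puts capture        # Like New
puts interpolated   # Used - Like New (MatchData#to_s is the whole match)
```

So in the original loop, asking for the first group, for example with [1] on the match result, is what yields the shortened text.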

I'm grateful for this great resource (the list/forum) and would be very
happy to hear from anyone who can help me sort this out!

We'll try to help but please provide a bit more information.

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Jet Koten

Alexander said:
end of my code:

"Like New"
"Very Good"
"Good"
"Acceptable"

If you just want to get rid of the word "Used", you could use something
like this:

text = "Used - Like New"
text[7, text.length]
=> "Like New"

Regards

It works! :) I had to change 7 to 8 to get rid of an extra space, but
that did it in a far less complex way than using regex! Thanks.
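
The prefix "Used - " is 7 characters long, so needing 8 suggests the scraped text carries one extra leading character, possibly whitespace from the HTML. That is a guess, since the raw text is not shown. A sketch that strips surrounding whitespace first, so the offset stays at 7 either way (the method name and the sample values are made up for illustration):

```ruby
# Hypothetical helper: remove surrounding whitespace, then drop the prefix.
def drop_used_prefix(raw)
  text = raw.strip   # removes leading/trailing spaces and newlines
  text[7..-1]        # "Used - " is 7 characters long
end

# Assumed raw value with one leading space and a trailing newline.
puts drop_used_prefix(" Used - Like New\n")
# A value with no extra whitespace gives the same kind of result.
puts drop_used_prefix("Used - Very Good")
```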
 
Jet Koten

It works!

Hmmm, well, actually, it kind of works. I did this:

result.each do |item|
  price = item.search(".price").text.match(/\d+[.]\d+/)
  condition = item.search(".condition").text
  rating = item.search(".rating a").text.to_i
  seller = item.search(".seller b").text
  puts "#{price} - #{condition.chomp[8, condition.length]} - #{rating} - #{seller}"
end

and then realized I actually need to be able to just put #{condition} by
itself in the puts and not use #{condition.chomp[8, condition.length]}

but, I tried and found that I don't know how to adjust the code in the
block above. Can someone help again?
 
Alexander Jesner

and then realized I actually need to be able to just put #{condition} by
itself in the puts and not use #{condition.chomp[8, condition.length]}

Insert

condition = condition.chomp[8, condition.length]

after

condition = item.search(".condition").text


and you can use #{condition} in the string.

Regards
 
Jet Koten

Hi Robert,

Thanks a lot. I've discovered that there are many ways of achieving this
goal, whether it's through regex, ranges, or even split (as a friend
offline just advised me of).
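
For reference, the split variant mentioned here might look like the following sketch. The " - " separator and the limit of 2 are assumptions based on the four strings in the original question, and the method name is made up.

```ruby
# Hypothetical helper: keep whatever follows the first " - ".
def after_separator(text)
  # The limit of 2 splits on the first separator only, so a later
  # " - " inside the label would be kept intact.
  text.split(" - ", 2).last
end

puts after_separator("Used - Like New")
puts after_separator("Used - Acceptable")
```

If the separator is missing, this returns the input unchanged, which may or may not be what is wanted.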

I've gotten it working for now, but I'll likely be back eventually when
the next question arises. :)
 
Siep Korteling

This is another option, avoiding regular expressions. It's kind of old
school, but it's fast, flexible, and handles garbage.

sanitize_condition = Hash.new("Unknown")
sanitize_condition["Used - Like New"] = "Like New"
sanitize_condition["Used - Very Good"] = "Very Good"
sanitize_condition["Used - Good"] = "Good"
sanitize_condition["Used - Acceptable"] = "Acceptable"
sanitize_condition["Used - Broken"] = "Kaput"

demo_conditions = ["Used - Like New","",nil,"Used - Broken","Used - Acceptable","garble"]
demo_conditions.each{|cond| puts sanitize_condition[cond] }

hth,

Siep
 
Jet Koten

Hi Siep,

Thanks! My offline friend actually suggested that I refactor everything
into a hash, because the condition info is just one criterion of
many that I am pulling into my app...

but it is making my head spin because I am so new to Ruby, so I'm going
to take a break and then look at it again and also look at the
documentation for hash and see what I can come up with.
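
As a rough starting point, one way a hash per item could look is sketched below. The keys mirror the variable names in the original loop; the sample values and the method name format_item are invented for illustration and do not come from real data.

```ruby
# Hypothetical helper: build the output line from one item hash.
def format_item(item)
  "#{item[:price]} - #{item[:condition]} - #{item[:rating]} - #{item[:seller]}"
end

# Made-up sample item; in the real program the values would come
# from the scraped page.
item = {
  :price     => "12.50",
  :condition => "Like New",
  :rating    => 98,
  :seller    => "example_seller"
}

puts format_item(item)
```

The hash-rocket syntax is used so the sketch also works on Ruby 1.8.7.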

My friend also suggested that I write pseudocode for all my desired
functionality and that that could help a lot. I have a prioritized list
for now, but it is making my head hurt to try and do so much that I
don't know how to do! :)

I can't say enough how helpful the list/forum is, and that I'm very
grateful for everyone using their free time to help me along.
 
Robert Klemme

Thanks a lot. I've discovered that there are many ways of achieving this
goal, whether it's through regex, ranges, or even split (as a friend
offline just advised me of).

That's often the case with Ruby - and many of those ways are also elegant.

I've gotten it working for now, but I'll likely be back eventually when
the next question arises. :)

"I'll be back." - oooh... ;-)

Cheers

robert
 
