regular expressions question

Neil Stevens · Dec 15, 2005

Neil said:
My mistake, heh. I wonder how many who use them know that, though, and
how many just do it without checking because it's popular in perl or
popular on here.

Hold on, this wasn't really my mistake, I think. How is one supposed to
know a dollar-sign variable isn't always global?

This sounds to me like some special-case hackery done to keep careless
coders from shooting themselves in the foot.

Jeff Wood · Dec 15, 2005

Yes, I recognize that you are probably speaking at least in part to
me, since I did that in this very thread. You can call me by name if
you like. I'm a big boy and I can take it.

Hang on there, Mr. Code Police. Let's not lay down the law too
heavily before we get into this...

I seriously doubt those variables were invented in Perl. They are a
common feature of many Regular Expression implementations and I'm not
sure they are even that ugly. $1 holds what was grabbed by the first
set of parentheses. Fairly logical.

I also showed a MatchData example.

I've used them a time or two, but honestly, they just don't feel
right to me. I've stopped using the default variable, I'm using a
two-space tab, etc. I'm Ruby assimilated, but I just like the Regexp-
linked variables.

I see a lot of code running the Ruby Quiz and I feel quite confident
saying that the Regexp variables are far more common than MatchData.
I don't think that says anything bad about the latter, but it does
tell me that you are in the minority.

We won't yell at you for using MatchData, if you'll provide the same
consideration...

James Edward Gray II

Quite simply, Ruby is *supposed* to be about consistency ... Having the
"everything is an object, principle of least surprise" mantra, then
using these which act like a global ( $ ) but aren't actually ( local
scope ) is just vile.

That's why I have a problem with them. If the community uses them,
well, that's their option, I'm just one that's all for consistency,
always, as much as possible... It tends to make things more generic and
able to handle change better.

I can't speak as to whether Perl was the first language to do the ${x}
variables ... but, it ( so far of the languages I've learned ) uses it
heavily, and it contributes to all of the punctuation soup that we all
left Perl to get away from... ( again I'm speaking generally, but I
could also again be wrong ... that's one of the uglies I left because of ).

... Not trying to be language police, I just really love the MatchData,
and find it MUCH easier to deal with. Then you can keep your datasets
from multiple matches around ... to me it *is* easier to read ...
instead of ... $1 ... where'd that come from ... I didn't assign a
glob... oh that's right ...

Anyways, I'm sorry to have caused this thread to go on this long ... I
just really thought more of the people on the list would step up and
say, yeah, those are some very ugly warts and we don't use them ... but
apparently, I was wrong.

I'll shut up now.

j.
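
For readers following along, the "keep your datasets from multiple matches around" point can be sketched in a few lines of Ruby. This is only an illustration; the sample strings and the method names (year_then_month_via_dollar_vars, both_matches) are made up for the example and do not come from anyone's post.

```ruby
# $1-style variables describe only the most recent match in the current scope.
def year_then_month_via_dollar_vars
  "2005-12-15" =~ /(\d+)-(\d+)-(\d+)/
  year = $1                      # "2005"
  "Dec 16" =~ /(\w+) (\d+)/
  [year, $1]                     # $1 is now "Dec"; it no longer holds "2005"
end

# MatchData objects are ordinary values, so several can be kept side by side.
def both_matches
  date  = /(\d+)-(\d+)-(\d+)/.match("2005-12-15")
  stamp = /(\w+) (\d+)/.match("Dec 16")
  [date, stamp]
end

p year_then_month_via_dollar_vars   # ["2005", "Dec"]
date, stamp = both_matches
p [date[1], stamp[1]]               # ["2005", "Dec"]
```

Both styles read the same capture groups; the difference is only whether the result lives in an implicit variable or in an object you name yourself.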

Joel VanderWerf · Dec 15, 2005

Neil said:
Hold on, this wasn't really my mistake, I think. How is one supposed to
know a dollar-sign variable isn't always global?

This sounds to me like some special-case hackery done to keep careless
coders from shooting themselves in the foot.

Usually if you think about what the variable represents, it's obvious
whether it should be thread-local or global, and ruby does it that way.
(Results of something the thread did, like call an external process, or
match a regex--those are thread local. Environment that was given when
the program started--those are global.) The local/global distinction is
not hackery, but the notation (inherited from perl) is not great.

I do kinda wish there was a consistent visual cue of some kind, like
$$foo for global and $foo for local, or $foo for global and $_foo for
local. It would also be nice to have a faster way to access user-defined
thread vars: $_foo versus Thread.current[:foo].
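
A small sketch of the scoping being discussed, assuming a reasonably recent Ruby. The method names (inner_match, outer_match, thread_vars_demo) are hypothetical and exist only for this illustration. It shows that a match made inside a called method does not disturb the caller's $1, and that Thread.current[:foo] holds a separate value per thread.

```ruby
def inner_match
  "inner value" =~ /(\w+)/
  $1                          # "inner" inside this method
end

def outer_match
  "outer value" =~ /(\w+)/
  inner = inner_match         # runs its own match in its own method frame
  [inner, $1]                 # $1 here still refers to this method's match
end

def thread_vars_demo
  Thread.current[:foo] = "main"
  seen_in_worker = Thread.new do
    before = Thread.current[:foo]   # nil: the new thread starts with no :foo
    Thread.current[:foo] = "worker" # only visible inside the worker thread
    before
  end.value
  [seen_in_worker, Thread.current[:foo]]
end

p outer_match        # ["inner", "outer"]
p thread_vars_demo   # [nil, "main"]
```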

Bill Kelly · Dec 15, 2005

Neil Stevens said:
Hold on, this wasn't really my mistake, I think. How is one supposed to
know a dollar-sign variable isn't always global?

This sounds to me like some special-case hackery done to keep careless
coders from shooting themselves in the foot.

I think it's more like when the intrepid Ruby nuby first notices
a method not suffixed with ! that modifies the receiver--and
posts to the list: This is inconsistent! This can't be right!
This violates POLS! etc.! "All methods that modify the receiver
should end in !, right???"

And Matz points out that the rationale is somewhat different. . . .

Similarly, it doesn't seem reasonable to condemn method-local $1..$n
as special-case hackery designed to benefit careless coders, so much
as Ruby behaving in the most naturally useful way possible.

Huzzah! &c.

Regards,

Bill

Robert Klemme · Dec 15, 2005

ako... said:
thank you. yes, it seems to be the only way. just that it is a shame
that we have to match the same expression again! the information was
available already, it was just discarded during the first match in
your sample.

I still didn't get what exactly you want. Does this help?
=> ["a", "b", "c"]

Kind regards

robert

tony summerfelt · Dec 15, 2005

Ross Bamford wrote on 12/14/2005 4:32 PM:

You could try:

a regex tool i'm finding invaluable is "redet" (on freshmeat)

works with a number of languages including ruby...

Garance A Drosehn · Dec 15, 2005

ako... said:
thank you. yes, it seems to be the only way. just that it is a shame
that we have to match the same expression again! the information was
available already, it was just discarded during the first match in
your sample.

I still didn't get what exactly you want. Does this help?
=> ["a", "b", "c"]

Now that I've read the responses in this thread a few times, I think
I understand what he wants to do. And I don't think it can be done
via scan.

First: He wants a single regex which will verify the syntax of an
entire line. So, first he wants a true/false value, saying "The line
is valid, or it is not valid". Never mind any values in the line, just
"is the line *completely valid*?".

Then, if the line is valid, he wants to break out individual pieces
of what was scanned, and he wants to do that without re-doing
any of the scans he did in the first regex. The trick is that some
of those pieces are a repeating group, such as /(\s\w)*/.

What is confusing us is that he describes this using a simple
example, and when we solve the simple example he then says
"you don't get the bigger picture!". Ugh.

Let me give an example, and see if someone can solve it. My
example might still be something other than what he's thinking
of, but maybe it will help.

Let's say I'm expecting command lines of the form:
first word is either 'copy' or 'duplicate'
followed by one or more words
followed by the word 'before' or 'after'
followed by one or more words

So I could do the first step with the regexp:

/^(copy|duplicate) \s+ (\w+\s+)+ (before|after) \s+ (\w+\s*)+ $/x

(hopefully I've done that right!). *IF* that matches, then I know
the entire line is valid. Then, after I know the line is valid, I want
the array of source-words, and the array of destination-words
which were matched. I want to do that by picking out information
in MatchData, not by doing a new scan. The thing is, I don't think
I have a way of knowing how many times the first '(\w+\s+)+' was
matched. So I can't just do a slice of $~.captures because I don't
know what the starting and ending indexes of that slice would be.
I could put another set of parentheses around the two repeating
groups:

/^(copy|duplicate) \s+ ((\w+\s+)+) (before|after) \s+ ((\w+\s*)+) $/x

But that doesn't really give me two separate arrays of the
individual values that made up each group. It just matches
each group as a whole.

Given two data lines of:
copy apple pear plum peach after bill bob
duplicate tomato before joe alice alfred tommy jane

in the first case I want a way to set two arrays:
srcfood = ["apple ", "pear ", "plum ", "peach "]
destword = ["bill ", "bob"]
from the first line, and
srcfood = ["tomato "]
destword = ["joe ", "alice ", "alfred ", "tommy ", "jane"]
from the second line.

I'll agree this is a weird example, but I think it shows the issue.
If I apply the above pattern to the first line, I'll see a MatchData
result where:

$~.captures ==
["copy", "apple pear plum peach ", "peach ", "after", "bill bob", "bob"]

Notice: There isn't *any* element which contains a value of just "apple ",
or just "pear ", or just "plum ", even though the regex obviously had to
match each one of those.
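
To make the example above concrete, here is a minimal Ruby sketch using the second pattern from this post. The constant LINE_RE and the method split_command are names invented for the illustration. Note that it does exactly what the original poster wanted to avoid: the individual words are recovered with a second pass (String#scan) over the outer groups, because the MatchData keeps only the last repetition of each inner group. The second pass also drops the trailing spaces shown in the wished-for arrays.

```ruby
LINE_RE = /^(copy|duplicate) \s+ ((\w+\s+)+) (before|after) \s+ ((\w+\s*)+) $/x

# Returns nil for an invalid line, otherwise the pieces of the command.
def split_command(line)
  m = LINE_RE.match(line)
  return nil unless m
  # m[2] and m[5] hold each repeated group as one string; the individual
  # repetitions were not retained, so they are recovered by scanning again.
  { verb: m[1], src: m[2].scan(/\w+/), where: m[4], dest: m[5].scan(/\w+/) }
end

p LINE_RE.match("copy apple pear plum peach after bill bob").captures
p split_command("copy apple pear plum peach after bill bob")
p split_command("duplicate tomato before joe alice alfred tommy jane")
p split_command("move apple after bob")   # nil, the verb is not allowed
```

As noted in the follow-up, this kind of second pass stops being safe once the words may be quoted strings containing spaces or keywords.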

ako... · Dec 15, 2005

yes, thank you. this is a better description of the problem. i am not a
native english speaker, so maybe this is one of the reasons why my
question is not clear.

i saw a solution to this problem that uses split at the end. it of
course won't work if you change your example and allow quoted strings
in source-words and destination-words. a quoted string can contain
anything, spaces too and your keywords too, so the subsequent split
won't work.

well, i did not realise that the term "group's captures" is that rare.
i thought it was a standard term. but maybe i am brainwashed by
microsoft. so i have this code in .net which might help to clarify what
i am talking about:

string text = "One car red car blue car";
string pat = @"^(?:(\w+)\s+)*(\w+)$";
Regex r = new Regex(pat, RegexOptions.IgnoreCase);

// Match the regular expression pattern against a text string.
Match m = r.Match(text);
if (m.Success)
{
    Console.WriteLine("match: [{0}]", m);
    foreach (Group g in m.Groups)
    {
        Console.WriteLine("group: [{0}]", g);
        foreach (Capture c in g.Captures)
        {
            Console.WriteLine("\tcapture: [{0}]", c);
        }
    }
}

the output is:

match: [One car red car blue car]
group: [One car red car blue car]
        capture: [One car red car blue car]
group: [blue]
        capture: [One]
        capture: [car]
        capture: [red]
        capture: [car]
        capture: [blue]
group: [car]
        capture: [car]
as you see, the first group is $0, the second group is $1, and the
third is $2. but $1 and $2 contain captures too. it is as if $1 and
$2 were arrays in Ruby.

in my opinion this is a big limitation of ruby's regular expressions.
it just must be as powerful as .net ;-)

konstantin
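
One way to get per-repetition captures in Ruby without a single big regexp is to walk the string with StringScanner from the standard library. The sketch below is a hand-written walk for a pattern of the shape ^(?:(\w+)\s+)*(\w+)$ only, not a general replacement for .NET's Group.Captures, and the method name group_captures is made up for the illustration.

```ruby
require "strscan"

# Records every repetition of the "(\w+)\s+" part as it is consumed, then
# the final word. Returns nil if the text does not fit the expected shape.
def group_captures(text)
  ss = StringScanner.new(text)
  repeated = []
  # Each successful scan consumes one repetition; ss[1] is that pass's word.
  repeated << ss[1] while ss.scan(/(\w+)\s+/)
  last = ss.scan(/\w+/)
  return nil unless last && ss.eos?
  [repeated, last]
end

p group_captures("One car red car blue car")
# [["One", "car", "red", "car", "blue"], "car"]
```

The string is traversed once, so nothing is matched twice, but the grammar now lives in code rather than in one expression. The same approach could be extended with a quoted-string alternative, which is the case where splitting afterwards breaks down.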

James Edward Gray II · Dec 15, 2005

You are the one who first suggested that more people are using these
special-case magic variables so wantonly in their code, by bringing up
what you said in the Ruby Quiz.

Since you are educating me about the joys of MatchData, allow me to
educate you on the differences of what we both said. I explained
what I have seen and made statements about how common a given
practice is. You were, and still are, insulting people you know
nothing about.

Yeah, it's clear that regardless of how nice the Ruby language is,
nothing can bring up the quality of the average Internet programmer,
unfortunately. Better tools don't make a better carpenter.

You are rude for absolutely no reason and you have ceased to add
anything to this conversation. I'm done trying to reason with you.

James Edward Gray II

Neil Stevens · Dec 15, 2005

James said:
You are rude for absolutely no reason and you have ceased to add
anything to this conversation. I'm done trying to reason with you.

It's unfortunate that you think it's rude to point out facts that are
uncomfortable to you. It's unfortunate that you're apparently seeing
personal attacks that don't exist.

Gregory Brown · Dec 15, 2005

It's unfortunate that you think it's rude to point out facts that are
uncomfortable to you. It's unfortunate that you're apparently seeing
personal attacks that don't exist.

And what "facts" have you mentioned?

To me, the magic variables fit fine with regex.

I find it really hard to understand why people are complaining about
using a little $1 after some string of absolutely cryptic pattern
matching sequences.

People tend to like Ruby's regex system because it's perl-like, no?

But in the interest of preserving TIMTOWTDI, there is an object
oriented solution which many posters have already mentioned.

James saying that it was common practice, and providing evidence of it
is much more reasonable than you simply insulting people you don't
know.

A quote from Arnold in Kindergarten Cop, "Stop Whining!" applies very well
here.
You're free to never use $1 .. $n, so I don't see what the issue is.

Ezra Zygmuntowicz · Dec 15, 2005

It's unfortunate that you think it's rude to point out facts that are
uncomfortable to you. It's unfortunate that you're apparently seeing
personal attacks that don't exist.

kill-filed

-Ezra Zygmuntowicz
Yakima Herald-Republic
WebMaster
http://yakimaherald.com
509-577-7732

jeff.darklight · Dec 15, 2005

I know I said I'd shut up, and I am, but I did feel that after some of
the messages that have popped since mine, that I should have a final
comment...

I do apologize to you all for causing such grief. I was simply trying
to add my $0.02 as what I thought was best practice ... and made the
assumption that most people were doing things that way ( I don't use
the ${x} vars at all, but I bet you knew that already )...

I'm not saying that people are bad coders for using those, I didn't
think I was being insulting in my OP. I was simply suggesting that I
thought it a better practice to show the most oop way of doing things
in code sent to newer programmers ...

Truly, if anybody took offense, it was unintentional. If you knew me
better, you would understand that I tend to be passionate about things,
especially when it comes to teaching the next gen of programmers and/or
users of Ruby...

So, peace, love, and joyful programming to you all, I will try to be
more cautious with my posts in the future.

j.

Neil Stevens · Dec 16, 2005

Ezra said:
Can we please try to keep it civil in here? You go code how you want
and I will go code how I want and we can both be happy.

Actually, no, you're wrong. String's gsub, for one, has been hard-wired
to force you to use magic punctuation in order to access the match data.

This forces *everyone* to depend on the pseudo-global variable hacks
unless they run their Regexp multiple times:

str = 'root:*:0:0:System Administrator:/var/root:/bin/sh'
re = /([^:]+)(:|$)/
str.gsub!(re) do |m|
  matchData = re.match(m)
  'x' * matchData[1].length + matchData[2]
end
puts str
=> 'xxxx:x:x:x:xxxxxxxxxxxxxxxxxxxx:xxxxxxxxx:xxxxxxx'

So the language is hard-coded to force everyone to use pseudo-global
magically-scoped variables, that you have to cross your fingers and hope
work the way you need at the moment.

dblack · Dec 16, 2005

Hi --

Actually, no, you're wrong. String's gsub, for one, has been hard-wired
to force you to use magic punctuation in order to access the match data.

This forces *everyone* to depend on the pseudo-global variable hacks
unless they run their Regexp multiple times:

You can minimize it by using $~.

David

--
David A. Black

"Ruby for Rails: Ruby techniques for Rails developers",
coming April 2006 (http://www.manning.com/books/black)
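
A sketch of the $~ suggestion, for anyone who wants to see it spelled out. Inside a gsub block, $~ (also available as Regexp.last_match) is the MatchData for the current substitution, so the regexp does not need to be run a second time. The method name mask_fields is hypothetical; the pattern /([^:]+)(:|$)/ is assumed from the passwd-style example being discussed.

```ruby
def mask_fields(str)
  str.gsub(/([^:]+)(:|$)/) do
    m = Regexp.last_match            # the same object as $~ for this match
    "x" * m[1].length + m[2]         # mask the field, keep the separator
  end
end

puts mask_fields("root:*:0:0:System Administrator:/var/root:/bin/sh")
# xxxx:x:x:x:xxxxxxxxxxxxxxxxxxxx:xxxxxxxxx:xxxxxxx
```

This still goes through one of the special variables, but only once, and everything after that is an ordinary MatchData object.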

James Edward Gray II · Dec 16, 2005

I do apologize to you all for causing such grief.

I too apologize if I came back at you too harsh. It was unintentional.

Group hug everyone!

James Edward Gray II

Ryan Leavengood · Dec 16, 2005

I do apologize to you all for causing such grief.

You have been fairly civil. What I find most amusing is how Neil
started the flame-fest by agreeing to your self-imposed shutting up,
then ended up having the same opinion as you (that the $1 variables
are bad.) Oh the irony.

Ryan

Neil Stevens · Dec 16, 2005

Ryan said:
You have been fairly civil. What I find most amusing is how Neil
started the flame-fest by agreeing to your self-imposed shutting up,
then ended up having the same opinion as you (that the $1 variables
are bad.) Oh the irony.

Actually, I think you should read more carefully what I wrote. The
irony here isn't what you think.

Ryan Leavengood · Dec 16, 2005

Actually, I think you should read more carefully what I wrote. The
irony here isn't what you think.

Yes I see. Actually your statement can be interpreted either way,
based on context. Since I hadn't read the thread for several hours I
forgot which side of the debate you were on. So, when I read this:

"I think that's exactly what some people want you to do. People don't
want to be told that they're lousy coders, and their poor practices have
only been made to work through a special case in the language interpreter."

It sounded like you wanted Jeff to shut up, and that you were in the
group being called lousy coders and you didn't like it. But now I see
you were talking about other people in the nasty way shown all over
this thread.

I don't know if you will fit in the Ruby community because we are
generally a nice bunch of people and your behavior on this thread
hasn't been very nice. In case the life lesson hasn't been taught to
you yet, you attract more flies with honey than with vinegar, so to
speak.

Regards,
Ryan

Neil Stevens · Dec 16, 2005

Ryan said:
I don't know if you will fit in the Ruby community because we are
generally a nice bunch of people and your behavior on this thread
hasn't been very nice. In case the life lesson hasn't been taught to
you yet, you attract more flies with honey than with vinegar, so to
speak.

I never called out anyone in particular, didn't have anyone in mind in
fact, but it's funny how some people are insecure enough that they're
getting offended by what I wrote, and getting so defensive that they
feel the need to lash out in return.

It reminds me of a saying: "If you throw a rock into a pack of dogs, the
dog that yelps loudest is the one that got hit."
