About Regular Expressions

James Britt

James said:
My experience has been the opposite.

At my wife's company, they spend a significant portion of every day
importing raw text reports from their database into Excel for various
uses. Some of the reports do not go into Excel well at all. A large
chunk of most employees' days is spent cleaning up these reports, by
hand. (I'm considering one of these reports for a future Ruby Quiz, if
that gives you any idea how wonky they can be.)
...

My experience is that I use REs on a sporadic basis, and tend to learn
and remember just enough to get a task done. But by the time I need to
write another RE, if it isn't something trivial, I have to go looking
for docs and such. And the docs often do not explain how to do certain
things, or tell me if it is even possible.

(An example: Can I write an RE that tells me if a given string contains
all substrings in a given set of substrings, irrespective of the order
of the substrings in either the target string or the set of substrings? )

But, to be fair, some of these sorts of requests may be beyond a basic
cookbook or newbie intro Web page. So, it's either get the O'Reilly
book, post a question, or hack away.

In general, I don't really care if people are too lazy to look things
up, if they don't make it a regular habit and overrun the list. I
scan message headers, do a sort of triage for attention, and ignore
many, many things. Pretty painless. So, stupid questions are welcome.
(I'd hate to feel reluctant to ask a stupid question myself; I have too
many of them.)

On a side note, I wonder what it would take to write a domain language,
parsable by Ruby, that let one write REs in plain-ish English? Maybe
take some of the mystery out of regular expressions for most of the
common cases. (Or would that just get people dependent on too much
hand-holding?)


James
 
David A. Black

Hi --

On a side note, I wonder what it would take to write a domain language,
parsable by Ruby, that let one write REs in plain-ish English? Maybe
take some of the mystery out of regular expressions for most of the
common cases. (Or would that just get people dependent on too much
hand-holding?)

Florian Gross has written such a thing. Paging flgr....


David
 
Mauricio Fernández

On a side note, I wonder what it would take to write a domain language,
parsable by Ruby, that let one write REs in plain-ish English? Maybe
take some of the mystery out of regular expressions for most of the
common cases. (Or would that just get people dependent on too much
hand-holding?)

Take a look at Florian Groß' Regexp::English.
Short example: http://www.rubycookbook.org/cookbook/view/ReadableRegularExpressions

Also Simon Strandgaard's re might interest you; you can do things like
+-Alternation
+-Sequence
| +-Inside set="a"
| +-Inside set="b"
+-Sequence
| +-Inside set="c"
| +-Inside set="d"
| +-Lookahead positive
| +-Inside set="x"
+-Sequence
+-Inside set="x"
+-Inside set="x"
+-Inside set="x"
=> nil
 
Simon Strandgaard

On Thursday 18 November 2004 21:19, Mauricio Fernández wrote:
[snip english (sorry)]
Also Simon Strandgaard's re might interest you; you can do things like

+-Alternation
+-Sequence
| +-Inside set="a"
| +-Inside set="b"
+-Sequence
| +-Inside set="c"
| +-Inside set="d"
| +-Lookahead positive
| +-Inside set="x"
+-Sequence
+-Inside set="x"
+-Inside set="x"
+-Inside set="x"
=> nil

Thanks Mauricio for mentioning this.

It can be obtained through RAA (or via RPA):

http://raa.ruby-lang.org/project/regexp/



btw: does anyone use this package?
(got ideas for improvement?)
 
Brian Schröder

[snip]
Mathematicians prefer terseness and so do many programmers, but there is
no need for terseness in something like regular expressions used for
search and replace. That's why EMACS and Vim use extended versions of
BRE not ERE for example (which is a point though as \ is a bitch to
type).

That's the only thing I hate in emacs. I always have to look up the regular expression syntax, or do some trial and error until I get the emacs re escapings right. Why don't they simply use the syntax everyone else uses?

I do not see where the emacs syntax is simpler to read/write than ruby/perl/egrep syntax.

regards,

Brian
 
Brian Schröder

As the person who posted that question, let me just say that while
there are lots of Ruby details I forget, you couldn't reasonably call
me a newbie. One reason, I think, is that the char class I was dealing
with had other characters that distracted me:

/^[0123456789.-*]+/

... which certainly isn't a sign of programming genius, but I like to
think is the sort of odd blind spot that happens to most programmers
from time to time. I'd also like to think that most of what I post here
isn't nearly that newbie-ish, but then I'm obviously not the best judge
of that.

F.

I hope that this does not scare anyone away from posting. It must be allowed to post a "dumb" question from time to time. As Nikolai himself said, anyone who is not interested in answering "dumb" questions can just skip them.

Doing a bit of research on your own is a courtesy I expect of everyone. As I said before, I think that every mere mortal has some blind spots sometimes. And Francis's question was also answered as expected, with a correction and a pointer for further reading. I'm sure every poster who gets this kind of treatment will learn something new.
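
The correction itself is not quoted in this thread, so the following is only a sketch of what the blind spot in the character class above probably was: inside [...], the sequence .-* reads as a range from "." to "*" rather than three literal characters. Current Ruby versions reject that as an empty range. The helper name compile is made up for this illustration, and the pattern is built from a string so the failure can be rescued.

```ruby
# Hypothetical helper: compile a pattern from a string so that a bad
# character class shows up as a rescuable RegexpError.
def compile(source)
  Regexp.new(source)
rescue RegexpError => e
  warn "rejected #{source.inspect}: #{e.message}"
  nil
end

# ".-*" is parsed as a range whose start sorts after its end.
broken  = compile('^[0123456789.-*]+')

# Two common ways to get a literal hyphen inside a character class:
at_end  = compile('^[0123456789.*-]+')   # hyphen placed last
escaped = compile('^[0123456789.\-*]+')  # hyphen escaped

puts "broken pattern compiled? #{!broken.nil?}"
puts at_end.match("3.14-2*5 rest")[0]
puts escaped.match("3.14-2*5 rest")[0]
```

Putting the hyphen last (or first) in the class is the form most regexp flavours agree on, which makes it the safer habit.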

Maybe we should strip the harsh tone from Nikolai's post and consider whether it is a sensible idea to put up more documentation, or a link to some good existing documentation, on ruby-doc or in "ri Regexp".

Regards,

Brian
 
Nikolai Weibull

* Brian Schröder said:
That's the only thing I hate in emacs. I always have to look up the
regular expression syntax, or do some trial and error until I get the
emacs re escapings right. Why don't they simply use the syntax
everyone else uses?

Good question. See below.

I do not see where the emacs syntax is simpler to read/write than
ruby/perl/egrep syntax.

It isn't simpler. It is less mathematical if you will. The idea is
that most often you want to search for a fixed string and thus /(/ should
match '(' and not be a metacharacter for grouping (and capturing if
that's required). This makes much more sense in Vi-based editors, as
there is only one search command (by default), namely '/'. In EMACS you
have a choice of searching for a regular expression and a fixed string,
by using different keybindings. Thus, for EMACS, using BRE over ERE is
just plain silly (I'm trying to use kind words from now on). Anyway,
the main problem with the current state of affairs is that there are so
many incompatible regular expression implementations, all with their own
quirks and special syntax. The biggest evil-doer in my opinion is
Perl5. There's too much going on. They have crammed context-free
language matching into regular expressions, while at the same time
making them NP-complete. Sure, look-around has its uses, and there's a
lot of nice stuff you can do. Is it necessary, though? Hardly. I
won't argue this point any further, but the bottom line is that regular
expressions have been abused and need some time to recuperate.
nikolai
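
To make the fixed-string point concrete on the Ruby side: Ruby's syntax is of the ERE/Perl kind, so "(" is a metacharacter and has to be escaped to match literally. A minimal sketch, in which the helper name literal_regexp and the sample text are made up for illustration:

```ruby
# Hypothetical helper: turn a fixed string into a regexp that matches it
# literally, whatever metacharacters it contains.
def literal_regexp(fixed)
  Regexp.new(Regexp.escape(fixed))
end

text = "call foo(bar) now"

# Plain fixed-string search, no regexp involved.
puts text.include?("foo(bar)")

# In a regexp literal the parentheses must be escaped by hand...
puts text =~ /foo\(bar\)/ ? "matched by hand-escaped regexp" : "no match"

# ...or escaped programmatically when the string comes from elsewhere.
puts literal_regexp("foo(bar)").match?(text)
```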
 
Robert Klemme

(An example: Can I write an RE that tells me if a given string contains
all substrings in a given set of substrings, irrespective of the order
of the substrings in either the target string or the set of
substrings? )

Yes you can, but it's going to be uuuuugly and slooooow if you have more
than two substrings. A set of one regexp per substring is likely to be
more efficient:

substrings = %w{foo bar baz}
rxs = substrings.map {|s| Regexp.new(Regexp.escape(s))}

string = "klajsd askdjkahs bar asjdajsd asdbazagdjhagsdh f0dfdfuufoosds"
puts "Got it: #{string}" if rxs.all? {|rx| rx =~ string}
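
For comparison, here is one way the single-regexp version could look. This is a sketch using one lookahead per substring, not necessarily the construction Robert had in mind, and the method name all_substrings_regexp is made up for the example.

```ruby
# Hypothetical helper: build one regexp that matches only if every
# substring occurs somewhere in the target, in any order.
def all_substrings_regexp(substrings)
  # One zero-width lookahead per substring, all tried from the start of
  # the string; MULTILINE lets "." cross newlines.
  lookaheads = substrings.map { |s| "(?=.*#{Regexp.escape(s)})" }.join
  Regexp.new("\\A#{lookaheads}", Regexp::MULTILINE)
end

substrings = %w{foo bar baz}
string = "klajsd askdjkahs bar asjdajsd asdbazagdjhagsdh f0dfdfuufoosds"

puts "Got it: #{string}" if all_substrings_regexp(substrings).match?(string)

# For plain fixed substrings, no regexp is needed at all.
puts "Got it again" if substrings.all? { |s| string.include?(s) }
```

Each lookahead rescans the string from the start, so this does not save work over the one-regexp-per-substring version; it only packs the test into a single pattern.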

On a side note, I wonder what it would take to write a domain language,
parsable by Ruby, that let one write REs in plain-ish English? Maybe
take some of the mystery out of regular expressions for most of the
common cases. (Or would that just get people dependent on too much
hand-holding?)

Such a language with the same expressiveness as typical regexps would be
too verbose for me. But I think I remember someone did something similar
with Ruby. Can't quite remember who it was or a mail thread subject but I
do believe someone has attempted to do something in that direction.

Kind regards

robert
 
James Britt

Robert said:
substrings? )

Yes you can, but it's going to be uuuuugly and slooooow if you have more
than two substrings. A set of one regexp per substring is likely to be
more efficient:

substrings = %w{foo bar baz}
rxs = substrings.map {|s| Regexp.new(Regexp.escape(s))}

string = "klajsd askdjkahs bar asjdajsd asdbazagdjhagsdh f0dfdfuufoosds"
puts "Got it: #{string}" if rxs.all? {|rx| rx =~ string}


Thank you. I would be searching on possibly any number (but figure 5 or
6 as an average) of such substrings. The all? syntax is nice and clear.

Such a language with the same expressiveness as typical regexps would be
too verbose for me. But I think I remember someone did something similar
with Ruby. Can't quite remember who it was or a mail thread subject but I
do believe someone has attempted to do something in that direction.

Yes, the details were posted here the other day. I thought it would be
handy for folks who have a hard time remembering certain syntax, or want
the intent of the regexp to be more clear.
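
Short of a full English-like language, Ruby's own extended mode (the /x flag) already helps with making intent clear: whitespace in the pattern is ignored and # starts a comment. A small sketch, in which the constant name and the date format are chosen only for illustration:

```ruby
# An ISO-style date, written with /x so each piece can be labelled.
ISO_DATE = /
  \A
  (?<year>  \d{4} )   # four-digit year
  -
  (?<month> \d{2} )   # two-digit month
  -
  (?<day>   \d{2} )   # two-digit day
  \z
/x

if (m = ISO_DATE.match("2004-11-18"))
  puts "year=#{m[:year]} month=#{m[:month]} day=#{m[:day]}"
end
```

The named groups do a second job as documentation, since the match data is read back by name rather than by position.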


Thanks,

James
 
