More questions about =~

GGarramuno · Jan 1, 2004

irb(main):006:1* class String
irb(main):006:1* alias oldRegex =~
irb(main):006:1* def =~(o)
irb(main):007:2> p "string regex"
irb(main):008:2> oldRegex(o)
irb(main):009:2> end
irb(main):010:1> end

irb(main):017:0> class Regexp
irb(main):018:1> alias oldRegex =~
irb(main):019:1* def =~(o)
irb(main):020:2> p "regexp regex"
irb(main):021:2> oldRegex(o)
irb(main):022:2> end
irb(main):023:1> end

"abc" =~ /a/
=> 0

Eh? Where is the print statement?
Which =~ is being called?

If I do:

irb(main):040:0> "abc".=~ /a/
"string regex"
=> 0

irb(main):041:0> /a/.=~("abc")
"regexp regex"
=> 0

all seems logical, though.

I am obviously missing some other method that =~ uses, right?

ts · Jan 1, 2004

G> I am obviously missing some other method that =~ uses, right?

Yes, internally ruby is optimized to call directly the C function rather
than calling the ruby method.

Guy Decoux

GGarramuno · Jan 1, 2004

ts said:
G> I am obviously missing some other method that =~ uses, right?

Yes, internally ruby is optimized to call directly the C function rather
than calling the ruby method.

Hmm... so, is there a way to overload/redefine it from ruby?
Also, how do you find out about the existence of this optimization?
By browsing the ruby C code? Or are there some docs available
somewhere?

Dave Wilson · Jan 1, 2004

GGarramuno said:
Hmm... so, is there a way to overload/redefine it from ruby?

no way of overloading or redefining from ruby that I know of, but you do
have some other options. this optimization is used when the regexp is a
literal, which means you can get around it by not using a regexp
literal, for example:

pattern = /a/
# or
pattern = Regexp.new('a')

"abc" =~ pattern

you could also prepend . to the =~ operator, which still looks
operator-like but is interpreted as a method call:

"abc" .=~ /a/

it would be best if ruby were to first check to see if =~ has been
redefined before applying this optimization.
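
A minimal sketch of the two workarounds above. TracedString and its calls counter are hypothetical names used only for illustration, and a String subclass is used so that the core String#=~ is left untouched. Whether an in-place literal match bypasses the Ruby-level method depends on the interpreter version, so only the variable form and the explicit method-call form are exercised here.

```ruby
# TracedString is a hypothetical subclass used only for illustration;
# it counts how often its own =~ is invoked.
class TracedString < String
  attr_reader :calls

  def initialize(str = "")
    super(str)
    @calls = 0
  end

  def =~(other)
    @calls += 1
    super          # delegate to String#=~
  end
end

s = TracedString.new("abc")

pattern = /a/                  # regexp held in a variable, not matched in-place
via_variable = (s =~ pattern)

via_method = s.=~(/a/)         # explicit method-call syntax

puts "results: #{via_variable}, #{via_method}; =~ called #{s.calls} times"
```

Both forms go through ordinary method dispatch, so the overriding =~ gets a chance to run before delegating to the built-in match.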

Also, how do you find out about the existence of this optimization?
By browsing the ruby C code? Or are there some docs available
somewhere?

yep, the ruby source code is the primary source of documentation for
this kind of stuff.

Dave

Shashank Date · Jan 1, 2004

Dave Wilson said:
yep, the ruby source code is the primary source of documentation for
this kind of stuff.

Moreover, you have to keep in mind that it is Guy who is answering
your questions. I strongly suspect, and I am sure I am not the only one
on this ML, that he knows the source code by heart!

Dave Wilson · Jan 1, 2004

Dave said:
this optimization is used when the regexp is a
literal, which means you can get around it by not using a regexp
literal, for example:

pattern = /a/
# or
pattern = Regexp.new('a')

"abc" =~ pattern

I should clarify: you can use a literal regexp, but not "in-place". the
optimization only happens when you match against the literal regexp
"in-place", but if you match against a variable that has been assigned a
literal regexp, this optimization won't be used.

Dave

Simon Strandgaard · Jan 2, 2004

Hmm... so, is there a way to overload/redefine it from ruby?
Also, how do you find out about the existence of this optimization?
By browsing the ruby C code? Or are there some docs available
somewhere?

Look here if you want to overwrite $1..$9, $', $`, $+
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/86167

BTW: Why are you interested in overloading the regexp methods?
Is it because you are writing your own regexp engine?

GGarramuno · Jan 2, 2004

Simon Strandgaard said:
On Thu, 01 Jan 2004 12:46:54 -0800, GGarramuno wrote:

BTW: Why are you interested in overloading the regexp methods?
Is it because you are writing your own regexp engine?

Kind of.

I was just trying to have an extended string class that stored the
position of the last match position, to allow supporting something
similar to Perl's pos() command (which allows continuing matches from
the last match done or setting where to begin from like index() ---
only it works for subs and all regex methods, too). I am porting some
nasty perl code that makes heavy use of that feature. Ruby's regex
engine already has the building blocks needed to support this feature,
but the interface to it is not as easy to use as perl's.
I had thought I could use ruby's excellent OO to add this seamlessly
into the string class (and also make it work for any other string that
used my module, too).
But the fact that =~ with regex literals works in a non-OO manner has
kind of put a damper on that idea.

Simon Strandgaard · Jan 2, 2004

Kind of.

I was just trying to have an extended string class that stored the
position of the last match position, to allow supporting something
similar to Perl's pos() command (which allows continuing matches from
the last match done or setting where to begin from like index() ---
only it works for subs and all regex methods, too). I am porting some
nasty perl code that makes heavy use of that feature.

#begin/#end can tell you the offset. Is that what you want?

irb(main):001:0> txt = "hello world, hello world"
=> "hello world, hello world"
irb(main):002:0> m = /orl/.match(txt)
=> #<MatchData:0x81c8754>
irb(main):003:0> m.begin(0)
=> 7
irb(main):004:0> m.end(0)
=> 10
irb(main):005:0>

Ruby's regex
engine already has the building blocks needed to support this feature,
but the interface to it is not as easy to use as perl's.
I had thought I could use ruby's excellent OO to add this seamlessly
into the string class (and also make it work for any other string that
used my module, too).
But the fact that =~ with regex literals works in a non-OO manner has
kind of put a damper on that idea.

It sounds interesting; I like challenges (but I don't do perl).
Can you show us some of the perl code you are porting?

GGarramuno · Jan 2, 2004

Simon Strandgaard said:
#begin/#end can tell you the offset. Is that what you want?

Kind of. Those are what I call the building blocks.

The differences are that:
a) That's an index that gets stored with the string variable, as if
it were an attribute of it.
b) It can also be set, like String#pos(number), so that any
further matches or substitutions begin from that position on (this is
somewhat akin to ruby's index())
c) It interacts with the \G flag of regular expressions.
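
A rough sketch of what points a) to c) could look like in Ruby. PosString and match_next are hypothetical names, not part of Ruby's core API, and the sketch assumes an interpreter where Regexp#match accepts a start offset as its second argument. It only covers plain matching, not substitutions.

```ruby
# PosString is a hypothetical subclass; it keeps a Perl-like pos
# alongside the string data.
class PosString < String
  attr_accessor :pos            # readable and settable, like Perl's pos()

  def initialize(str = "")
    super(str)
    @pos = 0
  end

  # Match starting at the stored position and, on success, move the
  # position to the end of the match.
  def match_next(regexp)
    m = regexp.match(self, @pos)
    @pos = m.end(0) if m
    m
  end
end

s = PosString.new("hello world, hello world")

first = s.match_next(/orl/)     # searches from position 0
after_first = s.pos
second = s.match_next(/orl/)    # continues after the first match

s.pos = 13                      # set the position by hand
anchored = s.match_next(/\Ghello/)   # \G anchors at the search start

s.pos = 5
miss = s.match_next(/\Ghello/)  # no "hello" exactly at position 5
```

On a failed match the stored position is left alone here; Perl resets pos() in that case unless /c is given, so a faithful port would need to decide which behavior it wants.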

It sounds interesting; I like challenges (but I don't do perl).
Can you show us some of the perl code you are porting?

Well, you really don't want to look at it. It is a library written by
Damian Conway (one of Perl's top gurus and designers) and as such it
is brilliant.
But unless you've been coding perl for some years, it will more likely
look like part of an obfuscated code contest

Anyway, here it is:
http://www.cpan.org/modules/01modules.index.html
Module is: Getopt-Declare.

Simon Strandgaard · Jan 3, 2004

Kind of. Those are what I call the building blocks.

The differences are that:
a) That's an index that gets stored with the string variable, as if
it were an attribute of it.
b) It can also be set, like String#pos(number), so that any
further matches or substitutions begin from that position on (this is
somewhat akin to ruby's index())

How about #scan ?

c) It interacts with the \G flag of regular expressions.

never seen \G before. is that global?

Well, you really don't want to look at it. It is a library written by
Damian Conway (one of Perl's top gurus and designers) and as such it
is brilliant.
But unless you've been coding perl for some years, it will more likely
look like part of an obfuscated code contest

Agree on that, it looks obfuscated... but less than other perl modules I
have seen.

Anyway, here it is:
http://www.cpan.org/modules/01modules.index.html
Module is: Getopt-Declare.

I am curious how it differs from GetoptLong?

GGarramuno · Jan 3, 2004

Simon Strandgaard said:
How about #scan ?

No, different thing.

never seen \G before. is that global?

From the perl manual...
\G Match only at pos() (e.g. at the end-of-match position
of prior m//g)

Ruby supposedly supports it, but not as a setter, from what I can see
so far (ie. it is missing pos()).
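
For comparison, the standard library's StringScanner (strscan) keeps a readable and settable position next to the string, which is close in spirit to pos(). A small sketch, assuming strscan is available; it works on a separate scanner object rather than on the string itself, so it is not a drop-in replacement for =~:

```ruby
require 'strscan'

scanner = StringScanner.new("hello world, hello world")

scanner.scan_until(/orl/)        # advance through the first "orl"
pos_after_first = scanner.pos

scanner.pos = 13                 # reposition by hand
word = scanner.scan(/hello/)     # scan only matches at the current position
pos_after_word = scanner.pos

miss = scanner.scan(/hello/)     # nothing matches at position 18
```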

Agree on that, it looks obfuscated... but less than other perl modules I
have seen.

Don't be fooled. It is one of the most obfuscated modules once you
realize all it does with so little code.

I am curious how it differs from GetoptLong?

You can read the full docs at the end of the module (if you have perl,
perldoc is better, though). The docs are 1500 lines long.

Overall, it is vastly superior and makes all other option parsing
modules obsolete and primitive, imho.

Among the not so standard features:
- Allows also using a config file for options and reading parameters
from other places other than commandline (files, for example).
- It keeps the flags and docs as a single string (ie. you basically
type the help string message ONLY and the module extracts the flags
from that). It makes for extremely clean code while still allowing
you to format the help line as you wish. Help line is provided
automatically, too, removing special characters or blocks.
- It supports arbitrary user created types for matching, not just
string, numerics, etc.
- For numbers it supports matching positive, negative w or w/o 0.
- Allows arrays parsing and ranges parsing/expanding.
- Allows matching parameters with a specific manual regex.
- It supports all sorts of user shortcuts for flags (not just two).
- Supports aliases for flags easily.
- It creates regex code that can be spit out for matching if needed.
- It allows code blocks to be imbedded (ie. when flags are seen full
blocks can be parsed with perl, MUCH more powerful ways than other
similar getopts)
- Allows case to be ignored on a parameter or globally.
- Allows options to be exclusive, inclusive, strict, etc.
- Allows clustering of flags in a couple of forms
- Allows parameters to be put on a queue, so that they only get
interpreted after all others have.
- Can check file parameters to verify their existence.

Nathaniel Talbott · Jan 3, 2004

Overall, it is vastly superior and makes all other option parsing
modules obsolete and primitive, imho.

Among the not so standard features:
- Allows also using a config file for options and reading parameters
from other places other than commandline (files, for example).
- It keeps the flags and docs as a single string (ie. you basically
type the help string message ONLY and the module extracts the flags
from that). It makes for extremely clean code while still allowing
you to format the help line as you wish. Help line is provided
automatically, too, removing special characters or blocks.
- It supports arbitrary user created types for matching, not just
string, numerics, etc.
- For numbers it supports matching positive, negative w or w/o 0.
- Allows arrays parsing and ranges parsing/expanding.
- Allows matching parameters with a specific manual regex.
- It supports all sorts of user shortcuts for flags (not just two).
- Supports aliases for flags easily.
- It creates regex code that can be spit out for matching if needed.
- It allows code blocks to be imbedded (ie. when flags are seen full
blocks can be parsed with perl, MUCH more powerful ways than other
similar getopts)
- Allows case to be ignored on a parameter or globally.
- Allows options to be exclusive, inclusive, strict, etc.
- Allows clustering of flags in a couple of forms
- Allows parameters to be put on a queue, so that they only get
interpreted after all others have.
- Can check file parameters to verify their existence.

A lot of those things are provided by the Ruby package optparse (which
I've used with great effect), and I was wondering if you could compare
optparse with Getopt-Declare; perhaps Nobu will add the missing
features

You can find optparse documentation here:

http://www.ruby-doc.org/stdlib/libdoc/optparse/rdoc/index.html

Thanks,

Nathaniel


GGarramuno · Jan 4, 2004

Nathaniel Talbott said:
On Jan 3, 2004, at 06:51, GGarramuno wrote:

A lot of those things are provided by the Ruby package optparse (which
I've used with great effect), and I was wondering if you could compare
optparse with Getopt-Declare; perhaps Nobu will add the missing
features

I can compare the features, most likely. But the first thing that
quickly turns me off against it is how parameters are passed,
inefficiently one at a time.
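
To make "one at a time" concrete, here is roughly how a small set of switches (input file, two mutually exclusive modes, optional output file) might be declared with optparse. This is only a sketch: parse_args is a hypothetical helper, the long option names are my own choice, and the required and mutex checks are done by hand after parsing.

```ruby
require 'optparse'

# parse_args is a hypothetical helper; each switch is registered with
# its own opts.on call.
def parse_args(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: example [options]"
    opts.on("--in FILE", "Input file (required)") { |v| options[:in] = v }
    opts.on("-r", "--random", "Output in random order") { options[:random] = true }
    opts.on("-p", "--permute", "Output all permutations") { options[:permute] = true }
    opts.on("--out FILE", "Optional output file") { |v| options[:out] = v }
  end
  parser.parse!(argv)

  # Checks done by hand in this sketch.
  raise ArgumentError, "--in is required" unless options[:in]
  if options[:random] && options[:permute]
    raise ArgumentError, "-r and -p are mutually exclusive"
  end
  options
end

opts = parse_args(%w[--in data.txt -r --out result.txt])
puts opts.inspect
```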

Compare that to the simplicity and elegance of perl's
Getopt::Declare.

For a simple example:

$args = new Getopt::Declare (<<'EOPARAM');

============================================================
Required parameter:

-in <infile> Input file [required]

------------------------------------------------------------

Optional parameters:

(The first two are mutually exclusive) [mutex: -r -p]

-r[and[om]] Output in random order
-p[erm[ute]] Output all permutations

---------------------------------------------------

-out <outfile> Optional output file

------------------------------------------------------------
Note: this program is known to run very slowly of files with
long individual lines.
============================================================
EOPARAM

The beauty of the system is that the syntax definition can almost
look like the help itself (from which a default -h flag printout is
extracted), so it is very easy to understand, even for newbies who
never read the docs to the module.
You just need to recall anything within [] is optional or a special
command to the engine, while {} is code, etc.

For a more complex case (involving complex switches, embedded code,
multiple file parsing, ranges, arrays, etc.), look at this one:

$args = new Getopt::Declare <<'EOARGS';
($0 version $VERSION)
General options:

-e <f:i>..<t:i> Set expansion factor to specified range
[requires: <file>]
{ print "k = [$f..$t]\n"; }

-e [<k:n>...] Set expansion factor to <k> (or 2 by default)
[required]
{ @k = (2) unless @k;
print "k = [", join(',', @k), "]\n";
}

-b <blen:i> Use byte length of <blen>
[excludes: -a +c]
{ print "byte len: $blen\n"; }

<file>... Process files [required] [implies: -a]
{ print "files: \@file\n"; }

-a [<N:n>] Process all data [except item <N>]
{ print "proc all\n"; print "except $N\n" if $N; }

-fab The fabulous option (is always required

[required]
{ defer { print "fabulous!\n" } }

File creation options:

+c <file> Create file [mutex: +c -a]
{ print "create: $file\n"; }

+d <file> Duplicate file [implies: -a and -b 8]
This is a second line
{ print "dup (+d) $file\n"; }
--dup <file> [ditto] (long form)
# { print "dup (--dup) $file\n"; }

-how <N:i> Set height to <N> [repeatable]

Garbling options:

-g [<seed:i>] Garble output with optional seed [requires: +c]
{ print "garbling with $seed\n"; }
-i Case insensitive garbling [required]
{ print "insensitive\n"; }
-s Case sensitive garbling
-w WaReZ m0De 6aRBL1N6

[mutex: -i -s -w]
EOARGS

I'd get frustrated by the time it would take me to write something
like that with other getopt parsers. It's probably not an issue if you
write command-line tools once in a while, but if you write many of them
every now and then (or expect some program to keep adding complex
switches), it makes sense to use something like Getopt::Declare.

All flags eventually end up stored in a public hash of the
object, so they can be extracted later, of course.
