emulating @+ and @-


Tassilo v. Parseval

Hi,

I am currently working on something that should be backward-compatible
at least down to 5.00503. Unfortunately, it relies on @+ and @-, which
did not exist in that release. So I have to find a way to create and
populate these two arrays under 5.00503. Consider the original code, which
assumes that @+ and @- exist:

sub bla {
    my ($string, $pat, $code) = @_;
    while ($string =~ /$pat/g) {
        # with capture groups: pass $1, $2, ...; without: pass the whole match
        $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
                map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
    }
}

This is essentially what Ruby's scan() does: Scan a string for a pattern
and call the code via the reference. If the pattern contains
subpatterns, call it like

$code->($1, $2, ...);

otherwise (that means, no captured subpatterns) do

$code->($&);

This means I need the whole of @- and @+, and not just the first
element of each. My question is specifically about generating
elements 1 to $#-. My current solution:

while ($string =~ /($pat)/g) {
    @- = @+ = ();    # clear previous match offsets
    # populate $-[0] and $+[0]
    # (note: index() finds the first occurrence of $1 in $string,
    # which is not necessarily the current match)
    push @-, index($string, $1);
    push @+, pos($string);

    # fill @-[1..$#-] and @+[1..$#+]
    my $digit = 2;    # $1 is the whole match
    while () {
        no strict 'refs';
        if (defined $$digit) {
            # extract offsets of $$digit
            ...
            $digit++;
        } else {
            last;
        }
    }

    $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
            map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
}

The part I'm uneasy about is

if (defined $$digit) {
...
}

Specifically, can $2 be undefined but $3 still contain a submatch? I
vaguely remember that I had such cases but I can't reproduce them now.
If they exist, I can't use the above code and need something better. If
so, what would be a correct solution?

Thanks in advance for any pointers,
Tassilo
 

Ilya Zakharevich

[A complimentary Cc of this posting was sent to Tassilo v. Parseval]

> push @-, index($string, $1);
> push @+, pos($string);
> $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
> map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);

??? What is the point of finding indices, then calling substr? Why
not use $$_ directly?

> Specifically, can $2 be undefined but $3 still contain a submatch?

Of course:

(a)?(b)

Hope this helps,
Ilya
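
To make the `(a)?(b)` case concrete, here is a minimal sketch (not from the thread) that matches the string "b" against that pattern. The optional first group does not participate, so `$1` stays undefined while `$2` captures. The variable names `$first_defined`, `$second` and `$naive` are purely illustrative. The last step mimics the "count until the first undefined `$N`" loop from the question and shows why it stops too early.

```perl
use strict;
use warnings;

my ($first_defined, $second, $naive) = (undef, undef, 0);

if ("b" =~ /(a)?(b)/) {
    # The optional group did not participate, so $1 is undef while $2 is set.
    $first_defined = defined $1 ? 1 : 0;
    $second        = $2;

    # Counting upward until the first undefined variable stops too early:
    # it never reaches $2 because $1 is already undefined.
    no strict 'refs';
    $naive++ while defined ${ $naive + 1 };
}

print "\$1 defined: $first_defined, \$2: $second, naive count: $naive\n";
```

The pattern has two capture groups, yet the naive loop reports none, so a defined-ness test alone cannot tell how many groups a pattern has.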
 

Anno Siegel

Tassilo v. Parseval said:

> Hi,
>
> I am currently working on something that should be backward-compatible
> at least up to 5.00503. Unfortunately, this relies on @+ and @- which
> weren't there by that time. So I have to find a way to create and
> populate these two arrays with 5.00503. Consider the original code that

[$string =~ /($pat)/g]

> Specifically, can $2 be undefined but $3 still contain a submatch? I
> vaguely remember that I had such cases but I can't reproduce them now.
> If they exist, I can't use the above code and need something better. If
> so, what would be a correct solution?

I have vaguely asked myself that too.

It is my impression that after a successful match $1 ... $n are
always defined where n is the number of capturing parentheses in $pat.
Even if a submatch doesn't apply (as in an alternation), the corresponding
$i is empty, but defined.

The hard part is finding where this is documented. I can't.

Anno
 

Tassilo v. Parseval

Thus spoke Ilya Zakharevich:

> > push @-, index($string, $1);
> > push @+, pos($string);
> > $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
> > map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
>
> ??? What is the point of finding indices, then calling substr? Why
> not use $$_ directly?

Copy and paste. I have the ordinary function scan() and the
backward-compatible one scan500503() and I do a

if ($] < 5.006) {
*scan = \&scan500503;
}

to replace the version not suitable for this perl. So the
backward-compatible ones are just a copy of the ordinary functions plus
the population of @- and @+.
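
The version switch described above can be sketched as follows. This is only an illustration: `scan_modern` and `scan_compat` are hypothetical stand-ins for the real implementations, and the sketch picks one of them at load time by assigning a code reference to the `*scan` glob.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two implementations discussed above.
sub scan_modern { return 'modern' }    # would rely on @- and @+
sub scan_compat { return 'compat' }    # would avoid @- and @+

# $] holds the interpreter version as a number (5.006 for Perl 5.6.0).
# Assigning a code reference to the glob installs it under the name scan().
*scan = $] < 5.006 ? \&scan_compat : \&scan_modern;

my $which = scan();
print "using the $which implementation\n";
```

Callers only ever see `scan()`, so the choice of implementation stays in one place.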

Maybe I am going to reimplement it later by directly using $$digit, but
that's quite a bit of work since I have around five or six of these
duplicate functions. But I am not yet sure whether I should do that
because then a user cannot use @- and @+ in his code references (as he
could now).
> Of course:
>
> (a)?(b)
>
> Hope this helps,

It does, thank you. It means I have to find a different way to
figure out how many submatches exist. :-(
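
One pure-Perl way to count the groups, offered here as a hedged sketch rather than as what the thread settled on: match the empty string against the pattern with an empty alternative in front and one extra group appended. The match always succeeds, and a list-context match returns one element per group in the pattern. The helper names `count_groups` and `scan_compat` are hypothetical, the sketch uses `$&` for the whole match instead of wrapping the pattern in an extra group (so backreferences inside the pattern keep their numbers), and it has not been tested on 5.00503.

```perl
use strict;
use warnings;

# Count the capture groups in $pat without @- or @+.
sub count_groups {
    my ($pat) = @_;
    # The empty alternative matches "", so the match cannot fail; the list
    # returned has one (undefined) element per group, including the extra ().
    my $n = () = "" =~ /|(?:$pat)()/;
    return $n - 1;    # discount the extra group appended above
}

# Hypothetical scan() replacement that reads $1 .. $n directly.
sub scan_compat {
    my ($string, $pat, $code) = @_;
    my $n = count_groups($pat);    # done once, before the matching loop
    while ($string =~ /$pat/g) {
        no strict 'refs';
        # Copy the values first so the callback cannot disturb them.
        my @args = $n ? map(${$_}, 1 .. $n) : ($&);
        $code->(@args);
    }
}

my @seen;
scan_compat("ab b", '(a)?(b)', sub { push @seen, [@_] });
```

With this input the callback runs twice, and on the second call the first argument is undefined because `(a)?` did not participate. Note that using `$&` anywhere in a program slows down all matches on older perls.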

What about perl_get_sv? My code is 90% XS anyway, so I can just as
easily add more of it. Would this be reliable?

int
num_submatches ()
    PREINIT:
        int i = 0;
        char *digit;
        int len = 1;
    CODE:
        New(0, digit, 2, char);
        strcpy(digit, "2");    /* start at $2; $1 is the whole match */
        while (1) {
            if (!perl_get_sv(digit, FALSE))
                break;
            /* next() already exists elsewhere and
             * increments/grows the string accordingly */
            next(&digit, &len);
            i++;
        }
        Safefree(digit);
        RETVAL = i;
    OUTPUT:
        RETVAL

It assumes that $4 does not exist when $3 was the last successful submatch.
But I don't know whether perl really destroys all the digit variables
that are larger than the highest submatch.

Tassilo
 

Ilya Zakharevich

[A complimentary Cc of this posting was sent to Tassilo v. Parseval]

> What about perl_get_sv? My code is 90% XS anyway, so I can just as
> easily add more of it. Would this be reliable?
>
> int
> num_submatches ()

Just copy the code used for access to @-/@+. Lemme see... mg.c:

    case '+':
        if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
            paren = rx->lastparen;
            if (paren)
                goto getparen;
        }
        sv_setsv(sv,&PL_sv_undef);
        break;

Check with older Perl sources, but I think that PM_GETRE should be
something like identity macro on older perls...

Yours,
Ilya
 

Tassilo v. Parseval

Thus spoke Ilya Zakharevich:

> Just copy the code used for access to @-/@+. Lemme see... mg.c:
>
>     case '+':
>         if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
>             paren = rx->lastparen;
>             if (paren)
>                 goto getparen;
>         }
>         sv_setsv(sv,&PL_sv_undef);
>         break;
>
> Check with older Perl sources, but I think that PM_GETRE should be
> something like identity macro on older perls...

That's smart. I think that this will make it easy to solve my problem.
'struct regexp' (among others) holds

U32 nparens; /* number of parentheses */
U32 lastparen; /* last paren matched */
U32 lastcloseparen; /* last paren matched */

One of those (or even all of them) is probably what I am looking for.

Thanks for your help!
Tassilo
 

Ilya Zakharevich

[A complimentary Cc of this posting was sent to Tassilo v. Parseval]

> That's smart. I think that this will make it easy to solve my problem.
> 'struct regexp' (among others) holds
>
> U32 nparens; /* number of parentheses */
> U32 lastparen; /* last paren matched */
> U32 lastcloseparen; /* last paren matched */
>
> One of those (or even all of them) is probably what I am looking for.

Keep in mind that struct regexp is used for *two* purposes (I did not
have time to clean this when working with REx engine): it keeps a part
of the state during the match process, and it keeps the info used for
$<digit> etc *after* the match. Be sure to use only the entries
mentioned in mg.c.

Yours,
Ilya
 

Tassilo v. Parseval

Thus spoke Ilya Zakharevich:

> Keep in mind that struct regexp is used for *two* purposes (I did not
> have time to clean this when working with REx engine): it keeps a part
> of the state during the match process, and it keeps the info used for
> $<digit> etc *after* the match. Be sure to use only the entries
> mentioned in mg.c.

I eventually went for lastparen which seems to work very well. PM_GETRE
had to be added, and that was all. My little XSUB now reads as

int
num_submatches ()
    CODE:
#ifndef PM_GETRE
# define PM_GETRE(o) ((o)->op_pmregexp)
#endif
        /* assumes a match has already succeeded, so PL_curpm is non-NULL */
        RETVAL = PM_GETRE(PL_curpm)->lastparen;
    OUTPUT:
        RETVAL

After a wild and ambitious attempt at adding the whole of @+ and @- to
5.00503 failed (naturally, it didn't yet know about 'D' magic), I can
still implement them as tied arrays. That way I wouldn't need to change
my Perl code.

Ilya, thanks for the invaluable pointers to the relevant bits of the
Perl source. Normally, regexes on the source level really scare me, but
here it was surprisingly not hard at all.

Tassilo
 
