emulating @+ and @-

Tassilo v. Parseval · Sep 8, 2003

Hi,

I am currently working on something that should be backward-compatible
at least down to 5.00503. Unfortunately, it relies on @+ and @-, which
did not exist back then. So I have to find a way to create and
populate these two arrays under 5.00503. Consider the original code, which
assumes that @+ and @- exist:

sub bla {
    my ($string, $pat, $code) = @_;
    while ($string =~ /$pat/g) {
        $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
                map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
    }
}

This is essentially what Ruby's scan() does: Scan a string for a pattern
and call the code via the reference. If the pattern contains
subpatterns, call it like

$code->($1, $2, ...);

otherwise (that means, no captured subpatterns) do

$code->($&);

This means I need the whole of @- and @+, and not just the first
element of each. My question is specifically about generating
elements 1 to $#-. My current solution:

while ($string =~ /($pat)/g) {
    @- = @+ = (); # clear previous match offsets
    # populates $-[0] and $+[0]
    push @-, index($string, $1);
    push @+, pos($string);

    # fill @-[1..$#-] and @+[1..$#+]
    my $digit = 2; # $1 is the whole match
    while () {
        no strict 'refs';
        if (defined $$digit) {
            # extract offsets of $$digit
            ...
            $digit++;
        } else {
            last;
        }
    }

    $code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
            map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);
}
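
One caveat about the snippet above: index($string, $1) returns the position of the first occurrence of the matched text, which need not be the current match when the same text also appears earlier in the string. A minimal sketch of an alternative that derives the start offset from pos() and the length of the match; the helper name match_offsets is made up for illustration and is not part of the original code:

```perl
use strict;

# Hypothetical helper: collect [start, end] offsets of every match of $pat,
# i.e. what $-[0] and $+[0] would hold on a perl that has @- and @+.
sub match_offsets {
    my ($string, $pat) = @_;
    my @offsets;
    while ($string =~ /($pat)/g) {
        my $end   = pos($string);        # offset just past the current match
        my $start = $end - length($1);   # $1 is the whole match here
        push @offsets, [$start, $end];
    }
    return @offsets;
}

# "ab" occurs twice; index() would report offset 0 for both matches.
print join(' ', map { "$_->[0]-$_->[1]" } match_offsets('ab ab', 'ab')), "\n";
# prints "0-2 3-5"
```

This only covers element 0 of each array. It does not help with the offsets of the submatches, which is the open question here.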

The part I'm uneasy about is

    if (defined $$digit) {
        ...
    }

Specifically, can $2 be undefined but $3 still contain a submatch? I
vaguely remember that I had such cases but I can't reproduce them now.
If they exist, I can't use the above code and need something better. If
so, what would be a correct solution?

Thanks in advance for any pointers,
Tassilo

Ilya Zakharevich · Sep 8, 2003

push @-, index($string, $1);
push @+, pos($string);

$code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);

??? What is the point of finding indices, then calling substr? Why
not use $$_ directly?

Specifically, can $2 be undefined but $3 still contain a submatch?

Of course:

(a)?(b)
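
A minimal check of this pattern, added for illustration (the sub name captures is hypothetical): matching it against "b" leaves the first group undefined while the second one is set.

```perl
use strict;

# Return the capture groups of (a)?(b) as a list;
# groups that did not take part in the match come back as undef.
sub captures {
    my ($string) = @_;
    return $string =~ /(a)?(b)/;
}

my @c = captures('b');
print((defined($c[0]) ? "\$1 defined" : "\$1 undef"), ", ",
      (defined($c[1]) ? "\$2 = $c[1]" : "\$2 undef"), "\n");
# prints "$1 undef, $2 = b"
```

So a loop that stops at the first undefined $<digit> would miss $2 here.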

Hope this helps,
Ilya

Anno Siegel · Sep 8, 2003

Tassilo v. Parseval said:
Hi,

I am currently working on something that should be backward-compatible
at least up to 5.00503. Unfortunately, this relies on @+ and @- which
weren't there by that time. So I have to find a way to create and
populate these two arrays with 5.00503. Consider the original code that

[$string =~ /($pat)/g]

Specifically, can $2 be undefined but $3 still contain a submatch? I
vaguely remember that I had such cases but I can't reproduce them now.
If they exist, I can't use the above code and need something better. If
so, what would be a correct solution?

I have vaguely asked myself that too.

It is my impression that after a successful match $1 ... $n are
always defined where n is the number of capturing parentheses in $pat.
Even if a submatch doesn't apply (as in an alternation), the corresponding
$i is empty, but defined.

The hard part is finding where this is documented. I can't.

Anno

Tassilo v. Parseval · Sep 8, 2003

Also sprach Ilya Zakharevich:

push @-, index($string, $1);
push @+, pos($string);

$code->(@- > 1 ? () : substr($string, $-[0], $+[0] - $-[0]),
map substr($string, $-[$_], $+[$_] - $-[$_]), 1 .. $#-);

??? What is the point of finding indices, then calling substr? Why
not use $$_ directly?

Copy and paste. I have the ordinary function scan() and the backward
compatible one scan500503() and I do a

if ($] < 5.006) {
    *scan = \&scan500503;
}

to replace the version not suitable for this perl. So the
backward-compatible ones are just a copy of the ordinary functions plus
the population of @- and @+.

Maybe I am going to reimplement it later by directly using $$digit, but
that's quite a bit of work since I have around five or six of these
duplicate functions. But I am not yet sure whether I should do that
because then a user cannot use @- and @+ in his code references (as he
could now).

Of course:

(a)?(b)

Hope this helps,

It does, thank you. It means I have to find a different way to
figure out how many submatches exist. :-(
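
If a Perl-level answer is enough, one commonly used trick (not from this thread, and not verified on 5.00503) is to match the pattern against the empty string in list context, with an extra empty group appended as an alternative. The alternation always matches, and the returned list has one element per capturing group regardless of which groups took part. The name count_groups is hypothetical:

```perl
use strict;

# Hypothetical helper: number of capturing groups in $pat.
# The alternation with an extra empty group () always matches '',
# and a list-context match returns one element per group in the pattern.
sub count_groups {
    my ($pat) = @_;
    my @groups = ('' =~ /(?:$pat)|()/);
    return @groups - 1;              # discount the extra group
}

print count_groups('(a)?(b)'), "\n";   # prints 2
print count_groups('abc'), "\n";       # prints 0
```

Note that this counts the groups in the pattern, not the highest group that actually matched, so it answers a slightly different question than inspecting $<digit> after a match.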

What about perl_get_sv? My code is 90% XS anyway, so I can just as
easily add more of it. Would this be reliable?

int
num_submatches ()
    PREINIT:
        int i = 0;
        char *digit;
        int len = 1;
    CODE:
        New(0, digit, 2, char);
        /* copy into the buffer instead of re-pointing digit at a literal */
        strcpy(digit, "2");
        while (1) {
            if (!perl_get_sv(digit, FALSE))
                break;
            /* next() already exists elsewhere and
             * increments/grows the string accordingly */
            next(&digit, &len);
            i++;
        }
        Safefree(digit);
        RETVAL = i;
    OUTPUT:
        RETVAL

It assumes that $4 does not exist when $3 was the last successful submatch.
But I don't know whether perl really destroys all the digit variables
that are larger than the highest submatch.

Tassilo

Ilya Zakharevich · Sep 8, 2003

What about perl_get_sv? My code is 90% XS anyway, so I can just as
easily add more of it. Would this be reliable?

int
num_submatches ()

Just copy the code used for access to @-/@+. Lemme see... mg.c:

case '+':
    if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
        paren = rx->lastparen;
        if (paren)
            goto getparen;
    }
    sv_setsv(sv,&PL_sv_undef);
    break;

Check with older Perl sources, but I think that PM_GETRE should be
something like identity macro on older perls...

Yours,
Ilya

Tassilo v. Parseval · Sep 8, 2003

Also sprach Ilya Zakharevich:

Just copy the code used for access to @-/@+. Lemme see... mg.c:

case '+':
    if (PL_curpm && (rx = PM_GETRE(PL_curpm))) {
        paren = rx->lastparen;
        if (paren)
            goto getparen;
    }
    sv_setsv(sv,&PL_sv_undef);
    break;

Check with older Perl sources, but I think that PM_GETRE should be
something like identity macro on older perls...

That's smart. I think that this will make it easy to solve my problem.
'struct regexp' (among others) holds

U32 nparens; /* number of parentheses */
U32 lastparen; /* last paren matched */
U32 lastcloseparen; /* last paren matched */

One of those (or even all of them) is probably what I am looking for.

Thanks for your help!
Tassilo

Ilya Zakharevich · Sep 8, 2003

That's smart. I think that this will make it easy to solve my problem.
'struct regexp' (among others) holds

U32 nparens; /* number of parentheses */
U32 lastparen; /* last paren matched */
U32 lastcloseparen; /* last paren matched */

One of those (or even all of them) is probably what I am looking for.

Keep in mind that struct regexp is used for *two* purposes (I did not
have time to clean this when working with REx engine): it keeps a part
of the state during the match process, and it keeps the info used for
$<digit> etc *after* the match. Be sure to use only the entries
mentioned in mg.c.

Yours,
Ilya

Tassilo v. Parseval · Sep 8, 2003

Also sprach Ilya Zakharevich:

Keep in mind that struct regexp is used for *two* purposes (I did not
have time to clean this when working with REx engine): it keeps a part
of the state during the match process, and it keeps the info used for
$<digit> etc *after* the match. Be sure to use only the entries
mentioned in mg.c.

I eventually went for lastparen which seems to work very well. PM_GETRE
had to be added, and that was all. My little XSUB now reads as

int
num_submatches ()
    CODE:
#ifndef PM_GETRE
# define PM_GETRE(o) ((o)->op_pmregexp)
#endif
        RETVAL = PM_GETRE(PL_curpm)->lastparen;
    OUTPUT:
        RETVAL
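
For comparison, on perls that do have the arrays (5.6 and later), perlvar documents $#- as the last group that took part in the match and $#+ as the number of groups in the pattern. As far as I can tell from mg.c, these are backed by lastparen and nparens respectively, so lastparen is the counterpart of the 1 .. $#- range used in the original code. A small sketch that runs only on 5.6 or later; last_and_total is a made-up name:

```perl
use strict;

# Returns ($#-, $#+) after matching $string against $pat,
# or an empty list if the match fails. Needs perl 5.6 or later.
sub last_and_total {
    my ($string, $pat) = @_;
    return unless $string =~ /$pat/;
    return ($#-, $#+);
}

my ($last, $total) = last_and_total('b', '(a)?(b)(c)?');
print "last matched group: $last, groups in pattern: $total\n";
# prints "last matched group: 2, groups in pattern: 3"
```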

After a wild and ambitious attempt at adding the whole of @+ and @- to
5.00503 failed (naturally, it didn't yet know about 'D' magic), I can
still implement them as tied arrays. That way I wouldn't need to change
my Perl code.

Ilya, thanks for the invaluable pointers to the relevant bits of the
Perl source. Normally, regexes on the source level really scare me, but
here it was surprisingly not hard at all.

Tassilo
