Outer scope of a sub inside a sub

Koszalek Opalek

Could you elaborate on the subtlety related to scoping demonstrated
by the listing below?

Routine run_test_1 contains a definition of sub filter().
Filter is then passed in a call to add_elem() as a code reference.

run_test_2 uses a different syntax:
my $filter = sub { ...
and then passes variable $filter.

They work identically in the first run. In the second run, however,
the variable @elems accessed by sub filter is not the same as
the variable defined in the body of run_test_1.

Full disclosure: perl emits the following warning:
Variable "@elems" will not stay shared at /tmp/test.pl line 15


#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw/refaddr/;


sub add_elem {
    $_[0]->();
}

sub run_test_1 {
    my @elems = ();

    sub filter {
        printf "%d\n", refaddr \@elems;
        push @elems, $_[0];
    }

    add_elem( \&filter, "abc" );
    add_elem( \&filter, "abc" );
    printf "Total number of elements %d\n", scalar @elems;
};

sub run_test_2 {
    my @elems = ();

    my $filter = sub {
        printf "%d\n", refaddr \@elems;
        push @elems, $_[0];
    };

    add_elem( $filter, "abc" );
    add_elem( $filter, "abc" );
    printf "Total number of elements %d\n", scalar @elems;
};

print "--- Test 1 ---\n";
run_test_1();
run_test_1();

print "--- Test 2 ---\n";
run_test_2();
run_test_2();
 
C.DeRykus

You'll need to use an anonymous sub instead of
a named one in both.

Add this pragma: 'use diagnostics -verbose;',
to see a full explanation of the warning.
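
A minimal sketch of that advice, with the two forms side by side. The names count_with_named, count_with_anon and collect are made up for this illustration and are not part of the original listing. The named version still produces the "will not stay shared" warning at compile time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Named inner sub: compiled once, bound to the first @elems only.
# perl warns here: Variable "@elems" will not stay shared.
sub count_with_named {
    my @elems = ();
    sub collect { push @elems, $_[0] }
    collect("abc") for 1 .. 2;
    return scalar @elems;
}

# Anonymous inner sub: a new closure over the current @elems on every call.
sub count_with_anon {
    my @elems = ();
    my $collect = sub { push @elems, $_[0] };
    $collect->("abc") for 1 .. 2;
    return scalar @elems;
}

printf "named: %d %d\n", count_with_named(), count_with_named();
printf "anon:  %d %d\n", count_with_anon(),  count_with_anon();
```

The named variant should report 2 and then 0, matching the behaviour described in the question, while the anonymous variant reports 2 both times.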
 
jl_post


Your filter subs print the refaddr of @elems right before pushing a
new element onto it, but nothing prints it right before you query the
array for its length. Consider adding the line:

printf "%d\n", refaddr \@elems;

just before the final printf in both run_test_1() and run_test_2().
With this change, I see this output:

--- Test 1 ---
4639308
4639308
4639308
Total number of elements 2
4639308
4639308
10075924
Total number of elements 0
--- Test 2 ---
4848924
4848924
4848924
Total number of elements 2
4848924
4848924
4848924
Total number of elements 2


You'll see that, although filter() always adds to the same
@elems array, run_test_1() seems to be calling scalar() on a different one.
That's why run_test_1() shows @elems as having a different number of
elements on subsequent runs: except for that first time, @elems in
filter() does not match @elems in run_test_1().

In other words, @elems in the filter() subroutine is constructed
only once, whereas @elems in run_test_1() is constructed every time
run_test_1() is called. They only match the first time run_test_1()
is called.

As for run_test_2(), the $filter subroutine is constructed every
time run_test_2() is called, and it uses the @elems array declared in
the previous line.

It's a bit confusing because my output shows that the @elems array
in run_test_2() always has the same refaddr (which is probably what
you're seeing as well). But that's only happening because it gets
cleaned up (as there are no more references to it), and further calls
to run_test_2() are using the same memory space for @elems, now that
it has been freed.

To prevent the same memory space from being re-used, declare a
global @references array at the top of the script, and insert the
line:

push @references, \@elems;

after each declaration of @elems. That (and my earlier suggestion of
printing refaddr(\@elems) just before the final printf) should make it
clearer as to which @elems array is being used when.
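
A small sketch of the @references idea on its own. The helper name make_elems is hypothetical; @references is the array suggested above. Because every @elems is kept alive, no two of them can occupy the same address.

```perl
use strict;
use warnings;
use Scalar::Util qw/refaddr/;

my @references;    # keeps every @elems alive so its memory is never recycled

sub make_elems {
    my @elems = ();
    push @references, \@elems;    # extra reference outlives this call
    return refaddr \@elems;
}

my @addrs = map { make_elems() } 1 .. 3;

# Count how many different addresses were handed out.
my %seen;
$seen{$_}++ for @addrs;
printf "%d calls, %d distinct addresses\n", scalar @addrs, scalar keys %seen;
```

This should print "3 calls, 3 distinct addresses". Without the push onto @references, perl is free to reuse the freed array's memory, which is why the addresses in the output above can look identical.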

I hope this helps, Koszalek.

-- Jean-Luc
 
sln

The easiest way to see it is to put some debug statements in.

filter() stays bound to the scope of the first call to run_test_1,
the first time it's called and every time after that.
When you push, the element goes into that first @elems.

$filter is a new instance of sub {} every time
run_test_2 is called, bound to that call's new @elems.
Every push goes into the current @elems.

Btw, you are not pushing anything but undef's.

-sln

use strict;
use warnings;
use Scalar::Util qw/refaddr/;


sub add_elem {
    $_[0]->( $_[0] );
}

sub run_test_1 {
    my @elems = ();
    printf "\n\ntest_1 : addr \@elems = %d\n", refaddr \@elems;

    sub filter {
        push @elems, $_[0];
        printf " filter() : addr \@elems = %d, scalar \@elems = %d\n", refaddr \@elems, scalar @elems;
        printf " my sub addr = %d\n", refaddr $_[0];
    }

    add_elem( \&filter, "abc" );
    add_elem( \&filter, "abc" );
    printf "Total number of elements %d, addr elems = %d\n", scalar @elems, refaddr \@elems;
};

sub run_test_2 {
    my @elems = ();
    printf "\n\ntest_2 : addr \@elems = %d\n", refaddr \@elems;

    my $filter = sub {
        push @elems, $_[0];
        printf " \$filter->() : addr \@elems = %d, scalar \@elems = %d\n", refaddr \@elems, scalar @elems;
        printf " my sub addr = %d\n", refaddr $_[0];
    };

    add_elem( $filter, "abc" );
    add_elem( $filter, "abc" );
    printf "Total number of elements %d\n", scalar @elems;
};

print "\n\n\n--- Test 1 ---\n";
run_test_1();
run_test_1();

print "\n\n\n--- Test 2 ---\n";
run_test_2();
run_test_2();

__END__

Variable "@elems" will not stay shared at aa.pl line 15.



--- Test 1 ---


test_1 : addr @elems = 25453388
filter() : addr @elems = 25453388, scalar @elems = 1
my sub addr = 25631444
filter() : addr @elems = 25453388, scalar @elems = 2
my sub addr = 25631444
Total number of elements 2, addr elems = 25453388


test_1 : addr @elems = 2272444
filter() : addr @elems = 25453388, scalar @elems = 3
my sub addr = 25631444
filter() : addr @elems = 25453388, scalar @elems = 4
my sub addr = 25631444
Total number of elements 0, addr elems = 2272444



--- Test 2 ---


test_2 : addr @elems = 25650684
$filter->() : addr @elems = 25650684, scalar @elems = 1
my sub addr = 2272124
$filter->() : addr @elems = 25650684, scalar @elems = 2
my sub addr = 2272124
Total number of elements 2


test_2 : addr @elems = 25651212
$filter->() : addr @elems = 25651212, scalar @elems = 1
my sub addr = 25651196
$filter->() : addr @elems = 25651212, scalar @elems = 2
my sub addr = 25651196
Total number of elements 2
 
Xho Jingleheimerschmidt

Koszalek said:
Full disclosure: perl emits the following warning:
Variable "@elems" will not stay shared at /tmp/test.pl line 15
....


sub run_test_1 {
my @elems = ();

At compile time, a memory location is arranged for @elems. The first
time run_test_1 is called, it uses this location set up at compile time.
But on subsequent calls, it sets a new location.

sub filter {
printf "%d\n", refaddr \@elems;
push @elems, $_[0];
}

At compile time, this glombs onto the same memory location as was used
for @elems when the outer scope was compiled. It is not subsequently
recompiled, and so does not re-glomb onto the new addresses when new
addresses come into existence.

sub run_test_2 {
my @elems = ();

my $filter = sub {
printf "%d\n", refaddr \@elems;
push @elems, $_[0];
};

This is recompiled for each invocation of run_test_2, and so re-glombs
onto the underlying address of the structure behind @elems which is in
use each time it is recompiled. (I don't think it is recompiled in full;
they use shortcuts that allow it to re-glomb without full recompilation.
But it behaves as if it were recompiled.)

Xho
 
Bart Lateur

Koszalek said:
Could you elaborate on the subtlety related to scoping demonstrated
by the listing below?

Routine run_test_1 contains a definition of sub filter().
Filter is then passed in a call to add_elem() as a code reference.

run_test_2 uses a different syntax:
my $filter = sub { ...
and then passes variable $filter.

They work identical in the first run. In the second run, however,
the variable @elems accessed by sub filter is not the same as
the variable defined in the body of run_test_1.

That is one of the biggest warts in Perl, in my not so humble opinion.
The second is behaving in the way the first should have behaved, and as
it does behave in other more sane languages that have nested subs, and
as it behaves in Javascript.

The first is... well... an abomination.

How it behaves is that the nested sub, even though it is defined inside
a sub, is still a global definition. Test it out: in the first example,
the sub filter is still accessible *outside* the scope of run_test_1.
And it is defined once. It's a closure bound to the variables in the
first invocation of the outer sub. So every time you run
filter, it will access the same, now anonymous, variables, whether you
run it outside of the outer sub or inside it.
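
A quick way to check the "global definition" part. The names outer and inner are hypothetical; inner deliberately captures no lexicals, so this sketch compiles without the warning.

```perl
use strict;
use warnings;

sub outer {
    # Named sub nested inside outer(). It is installed in the package
    # symbol table at compile time, not when outer() runs.
    sub inner { return "inner called with: @_" }
    return inner("from outer");
}

# inner() can be called from file scope, before outer() has ever run.
my $direct = inner("from top level");
print "$direct\n";

my $nested = outer();
print "$nested\n";

print defined &main::inner ? "main::inner is defined\n" : "main::inner is not defined\n";
```

All three lines print, the last one reporting that main::inner is defined, which shows the nesting has no effect on where the name lives.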

Why it behaves like that? Well, I'm not sure, but I think it was related
to how BEGIN and END (and INIT, and CHECK...) blocks are actually subs,
and you can have sub definitions inside them. Those subs have to behave
globally, as if they were not inside the BEGIN block. Something like
that.

I once chatted to a few of the top P5P people about it, and they said it
was *never* going to change.

Like I said, I think it's an abomination. I think BEGIN, INIT, CHECK
etc. should be special cased, not just subs, to behave like this, and
normal subs should behave differently: the inner sub should really be
only visible inside the scope of the outer sub, be it with local (so
it'll also be visible in subs that you call from within the outer sub),
or, more sane, truly lexical (the way your second example behaves). But
lexical subs don't exist, unfortunately.
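
For readers on newer perls: lexical subs were added after this thread was written, as an experimental feature in Perl 5.18 and as a stable one from 5.26. A minimal sketch, assuming perl 5.26 or later; the name run_test_4 is made up for this illustration.

```perl
use v5.26;    # lexical subs are a stable feature from this release on
use warnings;

sub run_test_4 {
    my @elems = ();

    # "my sub" is scoped to this block and is created anew each time
    # the block is entered, so it closes over the current @elems.
    my sub filter { push @elems, $_[0] }

    filter("abc");
    filter("abc");
    return scalar @elems;
}

say "Total number of elements ", run_test_4() for 1 .. 2;
```

Both calls should report 2 elements, and filter is not visible outside run_test_4.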

BTW here's an example using local:

sub run_test_3 {
    my @elems = ();

    local *filter = sub {
        printf "%d\n", refaddr \@elems;
        push @elems, $_[0];
    };
    ...
}

That local assignment should happen first when the sub is called, even
if the "sub filter" definition in the source happens to be in the end of
the outer sub.

I'm not sure it'll behave properly if you depend on \&filter, though...
But plain `filter(...)` invocations should behave well.
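
A self-contained version of the local approach, as a sketch only. It keeps the run_test_3 and filter names from the snippet above, drops the refaddr printing, and fills the elided body with two plain filter(...) calls as an assumption.

```perl
use strict;
use warnings;

sub run_test_3 {
    my @elems = ();

    # Install a fresh closure under the name "filter" for the duration
    # of this call; the previous state of *filter is restored on exit.
    local *filter = sub { push @elems, $_[0] };

    filter("abc");    # parentheses are required: no sub is predeclared
    filter("abc");
    return scalar @elems;
}

printf "Total number of elements %d\n", run_test_3() for 1 .. 2;

print defined &filter
    ? "filter() is still defined outside run_test_3()\n"
    : "filter() is not defined outside run_test_3()\n";
```

Each call should report 2 elements, since the closure is rebuilt around the current @elems every time. The name is dynamically scoped, so it is also visible to any sub called from within run_test_3().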
 
Koszalek Opalek

That is one of the biggest warts in Perl, in my not so humble opinion.
The second is behaving in the way the first should have behaved, and as
it does behave in other more sane languages that have nested subs, and
as it behaves in Javascript.

Thanks to everyone who posted a reply.
A very interesting thread IMO :)

A.
 
Ilya Zakharevich

At compile time, a memory location is arranged for @elems. The first
time run_test_1 is called, it uses this location set up at compile time.
But on subsequent calls, it sets a new location.
^^^^^^^^^^^^^^^^^^^^^^

I think this is misleading. The whole point of pre-allocating "the
location" is an optimization - "the location" (the container
referenced by the lexical) is USUALLY reused in subsequent calls.
Only if the old container is used by something else (has a refcount of
2 or more - e.g., a reference to it is stored somewhere) a new
container is allocated for the lexical.

[E.g., if I'm not mistaken, having @elems referred to below, in &filter,
stores a reference in the compile tree of &filter.]

sub filter {
printf "%d\n", refaddr \@elems;
push @elems, $_[0];
}

At compile time, this glombs onto the same memory location as was used
for @elems when the outer scope was compiled. It is not subsequently
recompiled, and so does not re-glomb onto the new addresses when new
addresses come into existence.

There is no such thing as "recompilation" (the whole point of closures
is that they are cheap; what you think of is IMO a
poor-man-implementation of closures via eval "sub {...}"). The
"re-SOMETHING" which may happen is a reallocation of lexicals. THIS
is what happens when "sub" is executed at run time, as here:

sub run_test_2 {
my @elems = ();

my $filter = sub {
printf "%d\n", refaddr \@elems;
push @elems, $_[0];
};

Yours,
Ilya
 
C.DeRykus

Koszalek said:
Could you elaborate on the subtlety related to scoping demonstrated
by the listing below?
[snip]
...
They work identical in the first run. In the second run, however,
the variable @elems accessed by sub filter is not the same as
the variable defined in the body of run_test_1.

That is one of the biggest warts in Perl, in my not so humble opinion.
The second is behaving in the way the first should have behaved, and as
it does behave in other more sane languages that have nested subs, and
as it behaves in Javascript.

The first is... well... an abomination.

How it behaves is that the nested sub, even though it is defined inside
a sub, is still a global definition. Test it out: in the first example,
the sub filter is still accessible *outside* the scope of run_test_1.
And it is defined once. It's a closure bound to the variables in the
first invocation of the run of the outer sub. So every time you run
filter, it will access the same, now anonymous, variables, be it if you
run it outside of the outer sub, or inside it.

At least there's a warning:

(W closure) An inner (nested) named subroutine
is referencing a lexical variable defined in
an outer named subroutine.

I think the issue's subtle enough that the above
warning could/should add a couple of pointers:

See: 'use diagnostics -verbose'.
Also perlsub/perlmod.

'use diagnostics -verbose' gives a much more complete explanation:

When the inner subroutine is called, it will
see the value of the outer subroutine's variable
as it was before and during the *first* call to
the outer subroutine; in this case, after the
first call to the outer subroutine is complete,
the inner and outer subroutines will no longer
share a common value for the variable. In other
words, the variable will no longer be shared.

This problem can usually be solved by making the
inner subroutine anonymous, using the sub {}
syntax. When inner anonymous subs that reference
variables in outer subroutines are created, they
are automatically rebound to the current values
of such variables.

Why it behaves like that? Well, I'm not sure, but I think it was related
to how BEGIN and END (and INIT, and CHECK...) blocks are actually subs,
and you can have sub definitions inside them. Those subs have to behave
globally, as if they were not inside the BEGIN block. Something like
that.

I once chatted to a few of the top P5P people about it, and they said it
was *never* going to change.

Like I said, I think it's an abomination. I think BEGIN, INIT, CHECK
etc. should be special cased, not just subs, to behave like this, and
normal subs should behave differently: the inner sub should really be
only visible inside the scope of the outer sub, be it with local (so
it'll also be visible in subs that you call from within the outer sub),
or, more sane, truly lexical (the way your second example behaves). But
lexical subs don't exist, unfortunately.

And perlsub/perlmod would help clarify that as well:

The "BEGIN", "UNITCHECK", "CHECK", "INIT"
and "END" subroutines are not so much subroutines
as named special code blocks, of which you can have
more than one in a package, and which you can not
call explicitly. See "BEGIN, UNITCHECK, CHECK, INIT
and END" in perlmod

These code blocks can be prefixed with "sub" to
give the appearance of a subroutine (although this
is not considered good style). One should note
that these code blocks don't really exist as named
subroutines (despite their appearance). The thing
that gives this away is the fact that you can have
more than one of these code blocks in a program,
and they will get all executed at the appropriate
moment. So you can't execute any of these code
blocks by name.
 
Xho Jingleheimerschmidt

Ilya said:
There is no such thing as "recompilation" (the whole point of closures
is that they are cheap; what you think of is IMO a
poor-man-implementation of closures via eval "sub {...}").

Sure there is such a thing as recompilation. Just put it in a string
eval. It isn't recompiled in *this* context, of course, which is what I
said.



Xho
 
Ilya Zakharevich

Sure there is such a thing as recompilation. Just put it in a string
eval. It isn't recompiled in *this* context, of course, which is what I
said.

Mea culpa, I quoted a wrong paragraph. I should have stated it "like
this":

sub run_test_2 {
my @elems = ();

my $filter = sub {
printf "%d\n", refaddr \@elems;
push @elems, $_[0];
};

This is recompiled for each invocation of run_test_2, and so re-glombs
onto the underlying address of the structure behind @elems which is in
use each time it is recompiled. (I don't think it is recompiled in full;
they use shortcuts that allow it to re-glomb without full
recompilation. But it behaves as if it were recompiled.)

There is no such thing as "recompilation" (the whole point of closures
is that they are cheap; what you think of is IMO a
poor-man-implementation of closures via eval "sub {...}").

Sorry,
Ilya
 
