Nested function scope problem

Josiah Manson

I found that I was repeating the same couple of lines over and over in
a function and decided to split those lines into a nested function
after copying one too many minor changes all over. The only problem is
that my little helper function doesn't work! It claims that a variable
doesn't exist. If I move the variable declaration, it finds the
variable, but can't change it. Declaring the variable global in the
nested function doesn't work either.

But, changing the variable in the containing scope is the whole purpose
of this helper function.

I'm new to python, so there is probably some solution I haven't
encountered yet. Could you please suggest a nice clean solution? The
offending code is below. Thanks.

def breakLine(s):
    """Break a string into a list of words and symbols.
    """
    def addTok():
        if len(tok) > 0:
            ls.append(tok)
            tok = ''

    ls = []
    tok = ''
    splitters = '?()&|:~,'
    whitespace = ' \t\n\r'

    for c in s:
        if c in splitters:
            addTok()
            ls.append(c)
        elif c in whitespace:
            addTok()
        else:
            tok = tok + c

    addTok()

    return ls

#some tests to make sure it works
print breakLine('carolina(Prada):cat(X,Y)')
print breakLine('trouble :bird (X ) &cat ( Y )')
print breakLine('?trouble')
 
Gerhard Fiedler

I'm no Python specialist, so here's just some guesses... I don't know how
to make variables known inside the inner function. It seems just using the
names there overrides the outside names. It also seems that local variables
are in some kind of dictionary; so maybe you can access them through that
somehow.

One other solution (a bit ugly) would be to make this a class with two
static methods (breakLine and _addTok) and two static attributes (_ls and
_tok).

Still another (also ugly) would be to pass both tok and ls to addTok() and
pass tok back out again. (I think ls doesn't have to be passed back,
because it is a list and so its data gets modified. tok's data doesn't get
modified, so local changes don't propagate to the outside.)

Gerhard
 
Simon Forman

Gerhard said:
I'm no Python specialist, so here's just some guesses... I don't know how
to make variables known inside the inner function. It seems just using the
names there overrides the outside names. It also seems that local variables
are in some kind of dictionary; so maybe you can access them through that
somehow.

One other solution (a bit ugly) would be to make this a class with two
static methods (breakLine and _addTok) and two static attributes (_ls and
_tok).

Still another (also ugly) would be to pass both tok and ls to addTok() and
pass tok back out again. (I think ls doesn't have to be passed back,
because it is a list and so its data gets modified. tok's data doesn't get
modified, so local changes don't propagate to the outside.)

Gerhard

That third option seems to work fine.

def breakLine(s):
    """Break a string into a list of words and symbols.
    """

    def addTok(tok, ls):
        if len(tok) > 0:
            ls.append(tok)
            tok = ''
        return tok

    ls = []
    tok = ''
    splitters = '?()&|:~,'
    whitespace = ' \t\n\r'

    for c in s:
        if c in splitters:
            tok = addTok(tok, ls)
            ls.append(c)
        elif c in whitespace:
            tok = addTok(tok, ls)
        else:
            tok = tok + c

    tok = addTok(tok, ls)

    return ls

#some tests to make sure it works
print breakLine('carolina(Prada):cat(X,Y)')
print breakLine('trouble :bird (X ) &cat ( Y )')
print breakLine('?trouble')

# Prints:
# ['carolina', '(', 'Prada', ')', ':', 'cat', '(', 'X', ',', 'Y', ')']
# ['trouble', ':', 'bird', '(', 'X', ')', '&', 'cat', '(', 'Y', ')']
# ['?', 'trouble']
 
Justin Azoff

Simon said:
That third option seems to work fine.

Well, it does, but there are still many things wrong with it.

    if len(tok) > 0:
should be written as
    if(tok):

    tok = ''
    tok = tok + c
should be written as
    tok = []
    tok.append(c)
and later
    ''.join(tok)

Anyway, the entire thing should be replaced with something like this:
import re
def breakLine(s):
    splitters = '?()&|:~,'
    chars = '^ \t\n\r\f\v%s' % splitters
    regex = '''(
        (?:[%s])
        |
        (?:[%s]+))''' % (splitters, chars)
    return re.findall(regex, s, re.VERBOSE)

That should be able to be simplified even more if one were to use the
character lists built into the regex standard.
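
One possible reading of that simplification, as a sketch only: the built-in class \s can stand in for the explicit whitespace list. The names SPLITTERS, TOKEN_RE and break_line_re are made up for this illustration, and \s matches a somewhat wider set of whitespace characters than the list in the original code.

```python
import re

# Hypothetical names; the splitter set is copied from the thread.
SPLITTERS = '?()&|:~,'
_ESCAPED = re.escape(SPLITTERS)

# Match either one splitter character, or a run of characters that are
# neither whitespace (\s) nor splitters.
TOKEN_RE = re.compile(r'[%s]|[^\s%s]+' % (_ESCAPED, _ESCAPED))

def break_line_re(s):
    """Break a string into a list of words and symbols."""
    # No capturing groups, so findall returns the full matches in order.
    return TOKEN_RE.findall(s)
```

Because whitespace matches neither alternative, it is skipped between tokens, which is the same effect the loop version gets from calling addTok on whitespace.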
 
Lawrence D'Oliveiro

Justin said:
Well it does, but there are still many things wrong with it

if len(tok) > 0:
should be written as
if(tok):

I prefer the first way. Besides, your way is sub-optimal.
 
Josiah Manson

Thank you for your corrections to the previous code. Your regex
solution is definitely much cleaner. Referring to your other
suggestions, is the advantage of using a list of chars instead of
adding to a string just a bow to big-O complexity, or are there other
considerations? First I had tried appending to the string, but it seems
they are immutable. It seems that using a list for a string isn't a
very clear way to represent a mutable string.

Although I gladly accept that using a regex is the best solution to
this problem, I am still interested in knowing how to access the
variables in a containing function. It seems that there should be some
keyword akin to global that would expose them, or some other method. I
have read that python uses nested scopes (or at least was planning to
in 2.2), so I wonder what I am missing.
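
For readers on later interpreters: Python 3 added a nonlocal statement (PEP 3104), which is the keyword akin to global asked about here. It does not exist in the Python 2.x releases discussed in this thread, so the sketch below assumes Python 3 and is not something that would run on the posters' setup.

```python
def breakLine(s):
    """Break a string into a list of words and symbols."""
    def addTok():
        # nonlocal (Python 3 only) makes the assignment below rebind
        # breakLine's tok instead of creating a new local name.
        nonlocal tok
        if tok:
            ls.append(tok)
        tok = ''

    ls = []
    tok = ''
    splitters = '?()&|:~,'
    whitespace = ' \t\n\r'

    for c in s:
        if c in splitters:
            addTok()
            ls.append(c)
        elif c in whitespace:
            addTok()
        else:
            tok = tok + c

    addTok()
    return ls
```

Note that ls needs no declaration: addTok only calls a method on the list and never assigns to the name.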
 
Josiah Manson

I just did some timings, and found that using a list instead of a
string for tok is significantly slower (it takes 1.5x longer). Using a
regex is slightly faster for long strings, and slightly slower for
short ones. So, regex wins in both brevity and speed!
 
Justin Azoff

Josiah said:
I just did some timings, and found that using a list instead of a
string for tok is significantly slower (it takes 1.5x longer). Using a
regex is slightly faster for long strings, and slightly slower for
short ones. So, regex wins in both brevity and speed!

I think the list.append method of building strings may only give you
speed improvements when you are adding bigger chunks of strings
together instead of 1 character at a time. also:

http://docs.python.org/whatsnew/node12.html#SECTION0001210000000000000000

"""String concatenations in statements of the form s = s + "abc" and s
+= "abc" are now performed more efficiently in certain circumstances.
This optimization won't be present in other Python implementations such
as Jython, so you shouldn't rely on it; using the join() method of
strings is still recommended when you want to efficiently glue a large
number of strings together. (Contributed by Armin Rigo.)"""

I tested both, and these are my results for fairly large strings:

justin@latitude:/tmp$ python /usr/lib/python2.4/timeit.py -s'import foo' 'foo.test(foo.breakLine)'
10 loops, best of 3: 914 msec per loop

justin@latitude:/tmp$ python /usr/lib/python2.4/timeit.py -s'import foo' 'foo.test(foo.breakLineRE)'
10 loops, best of 3: 289 msec per loop
 
Bruno Desthuilliers

Josiah Manson wrote:
I found that I was repeating the same couple of lines over and over in
a function and decided to split those lines into a nested function
after copying one too many minor changes all over. The only problem is
that my little helper function doesn't work! It claims that a variable
doesn't exist. If I move the variable declaration, it finds the
variable, but can't change it. Declaring the variable global in the
nested function doesn't work either.

But, changing the variable in the containing scope is the whole purpose
of this helper function.

I'm new to python, so there is probably some solution I haven't
encountered yet. Could you please suggest a nice clean solution? The
offending code is below. Thanks.

def breakLine(s):
    """Break a string into a list of words and symbols.
    """
    def addTok():
        if len(tok) > 0:

This test can be written as:

        if tok:

An empty sequence evaluates to False in a boolean context.

            ls.append(tok)
            tok = ''

First point: the nested function only has access to names that exist
in the enclosing namespace at the time it's defined.

Second point: a nested function cannot rebind names from the enclosing
namespace. Note that in Python, rebinding a name and modifying the
object bound to a name are very distinct operations.

Third point : functions modifying their environment this way are usually
considered bad form.
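
The distinction drawn in the second point can be shown with a small sketch. The names outer, add_item and reset_count are hypothetical. Modifying the list bound to items is visible in the enclosing function, while the assignment to count only creates a new local name inside the nested function.

```python
def outer():
    items = []
    count = 0

    def add_item(x):
        # Modifies the object bound to items; no name is rebound.
        items.append(x)

    def reset_count():
        # Rebinds count, so this count is local to reset_count and
        # the count in outer() is left untouched.
        count = 99
        return count

    add_item('a')
    add_item('b')
    reset_count()
    return items, count

print(outer())  # (['a', 'b'], 0)
```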

Here's a possible solution - but note that there are probably much
better ways to get the same result...

def breakline(line):
    """Break a string into a list of words and symbols."""
    class TokenList(list):
        def append(self, token):
            if token:
                list.append(self, token)
            return ''

    tokens = TokenList()
    token = ''
    splitters = '?()&|:~,'
    whitespace = ' \t\n\r'
    specials = splitters + whitespace

    for char in line:
        if char in specials:
            token = tokens.append(token)
            if char in splitters:
                tokens.append(char)
        else:
            token += char

    tokens.append(token)
    return list(tokens)

(snip)
 
Bruno Desthuilliers

Justin Azoff wrote:
Simon said:
That third option seems to work fine.

Well it does, but there are still many things wrong with it
(snip)

    tok = ''
    tok = tok + c
should be written as
    tok = []
    tok.append(c)
and later
    ''.join(tok)

IIRC, the slowness of string concatenation was fixed a few versions ago,
at least in CPython, so there's no longer any reason to use this idiom.

(snip)
 
Bruno Desthuilliers

Justin Azoff wrote:
Well it does, but there are still many things wrong with it

    if len(tok) > 0:
should be written as
    if(tok):

Actually, the parentheses are unnecessary.
 
Justin Azoff

Bruno said:
Actually, the parentheses are unnecessary.

Yes, that's what happens when you edit something instead of typing it
over from scratch :)
 
danielx

Bruno said:
Josiah Manson wrote:

        if tok:

An empty sequence evaluates to False in a boolean context.

I can't figure out why Josiah's breakLine function won't work either. I
know Josiah has had his problem resolved, but I'd still like to know
why his func won't work. I'd like to redirect this discussion in that
direction, if I may.

First point: the nested function only has access to names that exist
in the enclosing namespace at the time it's defined.

Coming from lisp, that doesn't make very much sense, and I'm not sure
that's true. If you move the def for addTok below the lines that
initialize the locals of breakLine, you still get the same problem.

Second point: a nested function cannot rebind names from the enclosing
namespace. Note that in Python, rebinding a name and modifying the
object bound to a name are very distinct operations.

I'm not sure that's the problem, because when I ran the debugger, the
problem is with the line that says if len(tok), not the one below it
which says tok = "". Regardless, my above issue seems to be overriding
this one.

Third point: functions modifying their environment this way are usually
considered bad form.

Again, this is coming from lisp, but I don't see anything wrong with
that :p.

***

After some experimentation, I am completely baffled as to why
breakLine won't work. Here is an example of one of the things I did,
which I believe exactly mimics what breakLine does:

...     def inner():
...         if outerLocal:
...             return "I hear you, 'hello world'."
...         else:
...             return "Come again?"
...     outerLocal = "hello world"
...     return inner()
...
"I hear you, 'hello world'."

As I said, I believe the line which sets tok should break (quietly),
but not the line which tests tok. My experiment seems to confirm
this...

One thing I can understand is why the line tok = "" in addTok won't
work. This is because when Python sees that line, it should create a
new local variable in the scope of addTok. Once addTok returns, that
variable is lost. That's pretty deep, now that I've thought about it...
 
Gerhard Fiedler

I can't figure out why Josiah's breakLine function won't work either. I
know Josiah has had his problem resolved, but I'd still like to know
why his func won't work. I'd like to redirect this discussion in that
direction, if I may.

I think what happens is this (and this may not be expressed in the proper
terms for Python): It is possible to read variables from the outer function
in the inner function. But when trying to write to them, this causes that
same name to be re-defined in the inner function's scope, making it a
different variable. Now, in the OP's code, that caused that new variable
(with scope of the inner function) to be accessed before anything was
assigned to it.

One obvious way is to not write to the variables from the outer scope, but
rather return a value from the inner function and assign it in the outer
function. But it seems there should be a way to be able to write in the
inner function to variables that are defined in the outer function.

Coming from lisp, that doesn't make very much sense, and I'm not sure
that's true. If you move the def for addTok below the lines that
initialize the locals of breakLine, you still get the same problem.

The problem there is only secondarily a scope problem. At first it is
reading a variable (the inner function scope variable tok) before anything
has been assigned to it. Of course, the real problem is the secondary one:
that this variable tok is a variable of scope addTok and not of scope
breakLine.

I'm not sure that's the problem, because when I ran the debugger, the
problem is with the line that says if len(tok), not the one below it
which says tok = "". Regardless, my above issue seems to be overriding
this one.

Yes, but it is the line "tok = ''" that seems to cause tok to be now a
variable of the inner function's scope (rather than the variable tok of
breakLine).

After some experimentation, I am completely baffled as to why
breakLine won't work. Here is an example of one of the things I did,
which I believe exactly mimics what breakLine does:

...     def inner():
...         if outerLocal:
...             return "I hear you, 'hello world'."
...         else:
...             return "Come again?"
...     outerLocal = "hello world"
...     return inner()
...
"I hear you, 'hello world'."

As I said, I believe the line which sets tok should break (quietly),
but not the line which tests tok. My experiment seems to confirm
this...

The line that sets tok causes tok to be a different tok from the outer tok
-- in the whole scope of the assignment.

Gerhard
 
Bruno Desthuilliers

Bruno Desthuilliers wrote:
(snip)
First point: the nested function only has access to names that exist
in the enclosing namespace at the time it's defined.

Duh.

Sometimes I'd do better to go to bed instead of answering posts here; I'd
say fewer stupid things. Re-reading this, I can't believe I actually wrote
such an absurdity.
 
Bruno Desthuilliers

danielx said:
I can't figure out why Josiah's breakLine function won't work either. I
know Josiah has had his problem resolved, but I'd still like to know
why his func won't work. I'd like to redirect this discussion in that
direction, if I may.

Oops, sorry: I said something obviously stupid here (I was very tired and
should not have answered at all...).

Coming from lisp, that doesn't make very much sense, and I'm not sure
that's true. If you move the def for addTok below the lines that
initialize the locals of breakLine, you still get the same problem.

Of course.

I'm not sure that's the problem, because when I ran the debugger, the
problem is with the line that says if len(tok), not the one below it
which says tok = "".

That's a side effect of rebinding tok: it makes the name local, even if
the rebinding is done *after* the first use of the name...
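
That side effect can be reproduced in a few lines. This is a sketch with hypothetical names (outer, inner); the exception Python raises in this situation is UnboundLocalError, a subclass of NameError.

```python
def outer():
    tok = 'abc'

    def inner():
        # tok is assigned later in this body, so it is local here and
        # this read fails even though outer() has its own tok.
        n = len(tok)
        tok = ''
        return n

    try:
        return inner()
    except UnboundLocalError as exc:
        return 'failed: %s' % type(exc).__name__

print(outer())  # failed: UnboundLocalError
```

Deleting the tok = '' line from inner makes the read succeed, which matches the outerLocal experiment quoted in this thread.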

Again, this is coming from lisp, but I don't see anything wrong with
that :p.


***

After some experimentation, I am completely baffled as to why
breakLine won't work. Here is an example of one of the things I did,
which I believe exactly mimics what breakLine does:

...     def inner():
...         if outerLocal:
...             return "I hear you, 'hello world'."
...         else:
...             return "Come again?"
...     outerLocal = "hello world"
...     return inner()
...
"I hear you, 'hello world'."

As I said, I believe the line which sets tok should break (quietly),
but not the line which tests tok. My experiment seems to confirm
this...

You did not rebind 'outerLocal' in your above code.

One thing I can understand is why the line tok = "" in addTok won't
work. This is because when Python sees that line, it should create a
new local variable in the scope of addTok.

Yes. But this local name is referenced before assignment.

Once addTok returns, that
variable is lost. That's pretty deep, now that I've thought about it...

(snip)

Sorry once again for the obvious stupidity I wrote as first point. Next
time I'll go to bed instead, I promise :(
 
danielx

Gerhard said:
I think what happens is this (and this may not be expressed in the proper
terms for Python): It is possible to read variables from the outer function
in the inner function. But when trying to write to them, this causes that
same name to be re-defined in the inner function's scope, making it a
different variable. Now, in the OP's code, that caused that new variable
(with scope of the inner function) to be accessed before anything was
assigned to it.

One obvious way is to not write to the variables from the outer scope, but
rather return a value from the inner function and assign it in the outer
function. But it seems there should be a way to be able to write in the
inner function to variables that are defined in the outer function.



The problem there is only secondarily a scope problem. At first it is
reading a variable (the inner function scope variable tok) before anything
has been assigned to it. Of course, the real problem is the secondary one:
that this variable tok is a variable of scope addTok and not of scope
breakLine.


Yes, but it is the line "tok = ''" that seems to cause tok to be now a
variable of the inner function's scope (rather than the variable tok of
breakLine).

OHH! Yes, that sounds like it could be it. Wow, to me, that behavior is
eXtremely unexpected (there's lisp popping up its ugly head again :p).
So you're saying that because an assignment to tok appears later within
the def for addTok, that the line if len(tok) won't try to look in
enclosing local scopes? (If such things even exist...)

Gerhard's reply sounded not so confident. Can we have someone who
"really" knows weigh in on this? Thanks!
 
Steve Holden

danielx said:
Gerhard Fiedler wrote: [...]
Yes, but it is the line "tok = ''" that seems to cause tok to be now a
variable of the inner function's scope (rather than the variable tok of
breakLine).


OHH! Yes, that sounds like it could be it. Wow, to me, that behavior is
eXtremely unexpected (there's lisp popping up its ugly head again :p).
So you're saying that because an assignment to tok appears later within
the def for addTok, that the line if len(tok) won't try to look in
enclosing local scopes? (If such things even exist...)

Gerhard's reply sounded not so confident. Can we have someone who
"really" knows weigh in on this? Thanks!

Would I do? If there's a binding to a name *anywhere* in the function's
body then that name is treated as local to the function. This is a
matter of static analysis, and is irrespective of where in the body the
assignment is found.

Of course, you could always test this yourself in interactive mode ...
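
One way to run that test, sketched with hypothetical function names: the compiler records how it classified each name on the function's code object, so the result of the static analysis can be inspected without calling the function. The attribute is spelled __code__ in Python 2.6 and later; older releases call it func_code.

```python
def outer():
    tok = 'abc'

    def reads_only():
        # No binding of tok in this body: tok is a free variable
        # looked up in outer().
        return len(tok)

    def reads_then_binds():
        # The assignment makes tok local for the whole body, so the
        # len(tok) call would fail if this function were called.
        n = len(tok)
        tok = ''
        return n

    return reads_only, reads_then_binds

reads_only, reads_then_binds = outer()
print(reads_only.__code__.co_freevars)                  # ('tok',)
print('tok' in reads_then_binds.__code__.co_varnames)   # True
```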

regards
Steve
 
Gerhard Fiedler

Gerhard's reply sounded not so confident.

Yes, it is not. It's just the conclusion I drew from my experiments. (I'm
still all wet behind the ears WRT Python...)

As long as there was no write access to the variable, the inner function
could read the value just fine. When there was a write access, the first
read (if it was before the write access) bombed with that error message
that there was a read without a previous write. (Actually, it wasn't a
simple read, it was that "len(var)" access. I'm not sure a normal read
would bomb the same way. But the error message was that it was a read
access without a previous write.)

So that was just my conclusion. It is also consistent with the observation
that variables seem to be known in their scope, even before the location of
their first appearance.

Can we have someone who "really" knows weigh in on this?

That would be nice :)

Gerhard
 
