Is crawling the stack "bad"? Why?

Russell Warren

I've got a case where I would like to know exactly what IP address a
client made an RPC request from. This info needs to be known inside
the RPC function. I also want to make sure that the IP address
obtained is definitely the correct one for the client being served by
the immediate function call. That is kind of dumb/obvious to say, but
I do just to highlight that it could be a problem for an RPC server
allowing multiple simultaneous connections on multiple threads. ie: I
can't set some simple "current_peer_info" variable when the connection
is made and let the RPC function grab that value later since by the
time it does it could easily be wrong.
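
A minimal sketch (hypothetical names, not code from the post) of why a
single shared variable breaks down once requests overlap on different
threads:

current_peer_info = None   #one shared slot for every connection

def on_new_connection(peer_address):
    global current_peer_info
    current_peer_info = peer_address   #thread B can run this line...

def rpc_function():
    #...after thread A's connection arrived but before A gets here,
    #so A ends up reporting B's address
    return current_peer_info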

In order to solve this I toyed with a few schemes, but have (so far)
settled on crawling up the stack from within the RPC call to a point
where the precise connection info that triggered the RPC call to run
could be determined. This makes sure (I think!) that I get the exact
connection info in the event of a lot of simultaneous executions on
different threads. It seems hackish, though. I frequently find that
I take the long way around to do something only to find out later that
there is a nice and tight pythonic way to get it done. This seems
like it might be one of those cases, and the back of my mind keeps
trying to relegate this to the realm of cheat code that will cause
me major pain later. I can't stop thinking of the old days of slapping
gotos all over code to fix something "quickly" rather than
restructuring properly. Crawling around the stack in non-debugger
code always seems nasty to me, but it sure seems to work nicely in
this case...

To illustrate this scheme I've got a short program using
SimpleXMLRPCServer to do it. The code is below. If you run it you
should get an output something like:

RPC call came in on: ('127.0.0.1', 42264)

Does anyone have a better way of doing this? Anyone want to warn me
off of crawling the stack to get this type of info? The docstring for
sys._getframe already warns me off by saying "This function should be
used for internal and specialized purposes only", but without
providing any convincing argument why that is the case. I'd love to
hear a reasonable argument... the only thing I can think of is that it
starts dipping into lower-level language behavior and might cause
problems if you aren't careful. Which is almost as vague as "for
internal and specialized purposes only".

I'm very curious to hear what you python wizards have to say.

----

import SimpleXMLRPCServer, xmlrpclib, threading, sys

def GetCallerNameAndArgs(StackDepth = 1):
    """This function returns a tuple (a,b) where:
         a = The name of the calling function
         b = A dictionary with the arg values in order
    """
    f = sys._getframe(StackDepth + 1) #+1 to account for this call
    callerName = f.f_code.co_name
    #get the arg count for the frame...
    argCount = f.f_code.co_argcount
    #get a tuple with the local vars in the frame (puts the args first)...
    localVars = f.f_code.co_varnames
    #now get the tuple of just the args...
    argNames = localVars[:argCount]
    #now to make a dictionary of args and values...
    argDict = {}
    for key in argNames:
        argDict[key] = f.f_locals[key]
    return (callerName, argDict)

def GetRpcClientConnectionInfo():
    #Move up the stack to the right point to figure out client info...
    requestHandler = GetCallerNameAndArgs(4)[1]["self"]
    usedSocket = requestHandler.connection
    return str(usedSocket.getpeername())

def StartSession():
    return "RPC call came in on: %s" % GetRpcClientConnectionInfo()

class DaemonicServerLaunchThread(threading.Thread):
    def __init__(self, RpcServer, **kwargs):
        threading.Thread.__init__(self, **kwargs)
        self.setDaemon(1)
        self.server = RpcServer
    def run(self):
        self.server.serve_forever()

rpcServer = SimpleXMLRPCServer.SimpleXMLRPCServer(("", 12390), \
                                                  logRequests = False)
rpcServer.register_function(StartSession)
slt = DaemonicServerLaunchThread(rpcServer)
slt.start()

sp = xmlrpclib.ServerProxy("http://localhost:12390")
print sp.StartSession()
 
Paul Rubin

That is just madness. The incoming ip address is available to the
request handler, see the SocketServer docs. Write a request handler
that stashes that info somewhere that rpc responders can access it in
a sane way.
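
What Paul is pointing at, roughly (hypothetical names, and deliberately
naive -- as the next reply explains, a plain module-level stash like this
is not enough once the server is threaded): the handler already carries
the peer address as self.client_address, set for it by SocketServer.

import SimpleXMLRPCServer

last_client = {}   #hypothetical stash, only safe for a single-threaded server

class StashingRequestHandler(SimpleXMLRPCServer.SimpleXMLRPCRequestHandler):
    def do_POST(self):
        #client_address is set on every handler instance by SocketServer
        last_client["address"] = self.client_address
        SimpleXMLRPCServer.SimpleXMLRPCRequestHandler.do_POST(self)

The handler class would then be handed to SimpleXMLRPCServer through its
requestHandler argument, exactly as the later examples in this thread do.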
 
Russell Warren

> That is just madness.

What specifically makes it madness? Is it because sys._getframe is "for
internal and specialized purposes only"? :)

> The incoming ip address is available to the request handler, see the
> SocketServer docs.

I know... that is exactly where I get the address, just in a mad way.

> Write a request handler that stashes that info somewhere that rpc
> responders can access it in a sane way.

That is exactly where I started (creating my own request handler,
snagging the IP address and stashing it), but I couldn't come up with
a stash location that would work for a threaded server. This is the
problem I was talking about with the "current_peer_info" scheme. How
is the RPC responder function supposed to know what is the right
stash, given that when threaded there could be multiple stashes at a
time? The IP needs to get down to the exact function execution that
is responding to the client... how do I do that?

I had my options as:

1) stash the IP address somewhere where the RPC function could get it
2) pass the IP down the dispatch chain to be sure it gets to the
target

I couldn't come up with a way to get 1) to work. Then, trying to
accomplish 2) I reluctantly started messing with different schemes
involving my own versions of do_POST, _marshaled_dispatch, and
_dispatch in order to pass the IP directly down the stack. After some
pain at this (those dispatches are weird) I decided it was waaaay too
much of a hack. Then I thought "why not go up the stack to fetch it
rather than trying to mess with the nice/weird dispatch chain to send
it down". I now had a third option...

3) Go up the stack to fetch the exact IP for the thread

After realizing this I had my working stack crawl code only a few
minutes later (I had GetCallerNameAndArgs already). Up the stack has
a clear path. Down was murky and involved trampling on code I didn't
want to override. The result is much cleaner than what I was doing
and it worked, albeit with the as yet unfounded "crawling the stack is
bad" fear still there.

I should also point out that I'm not tied to SimpleXMLRPCServer, it is
just a convenient example. I think any RPC protocol and dispatcher
scheme would have the same problem.

I'd be happy to hear about a clean stashing scheme (or any other
alternative) that works for a threaded server.

My biggest specific fear at the moment is that sys._getframe will do
funky things with multiple threads, but given that my toy example is
executing in a server on its own thread and it traces perfectly I'm
less worried. Come to think of it, I wonder what happens when you
crawl up to and past thread creation? Hmm.
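
A quick way to see the answer (CPython-specific behaviour): each thread
gets its own frame stack, so walking f_back from inside a thread bottoms
out at threading's bootstrap frames and never reaches the code that
called start().

import sys, threading

def show_stack():
    f = sys._getframe()
    while f is not None:
        print f.f_code.co_name
        f = f.f_back

def worker():
    #prints show_stack, worker, and threading's own bootstrap frames --
    #nothing from the main thread that created this one
    show_stack()

t = threading.Thread(target=worker)
t.start()
t.join()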
 
Paul Rubin

Russell Warren said:
> That is exactly where I started (creating my own request handler,
> snagging the IP address and stashing it), but I couldn't come up with
> a stash location that would work for a threaded server.

How about a dictionary indexed by the thread name. It's pretty
lame, though, that the rpc server module itself doesn't make the
request available to the rpc responder. Maybe you should submit a
patch.

> My biggest specific fear at the moment is that sys._getframe will do
> funky things with multiple threads,

You should not rely on anything that implementation-specific at all.
What happens if you want to switch to PyPy?
 
Boris Borcic

Paul said:
> How about a dictionary indexed by the thread name.

the threading.local class seems defined for that purpose, not that I've ever
used it ;)

BB
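
For anyone who, like Boris, hasn't used it: a tiny demonstration (toy
example, not code from the thread) of what threading.local gives you --
one shared object whose attributes are private to the thread that set them.

import threading

store = threading.local()
store.peer = "set in the main thread"

def worker():
    #this assignment is only visible inside this thread...
    store.peer = "set in the worker thread"
    print "worker sees:", store.peer

t = threading.Thread(target=worker)
t.start()
t.join()
#...so the main thread's value is untouched:
print "main sees:  ", store.peer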
 
Russell Warren

> How about a dictionary indexed by the thread name.

Ok... a functional implementation doing precisely that is at the
bottom of this (using thread.get_ident), but making it possible to
hand around this info cleanly seems a bit convoluted. Have I made it
more complicated than I need to? There must be a better way? It sure
is a heck of a lot less straightforward than having a reasonably tight
CrawlUpStackToGetClientIP function call. But then nothing is more
straightforward than a simple goto, either...

So I ask again, what is wrong with crawling the stack?
> What happens if you want to switch to PyPy?

If it doesn't work when I decide to switch implementations for some
reason, I just fix it when my unit tests tell me it is busted. No?
Aren't there also Python implementations that don't have threading in
them, and that would fail using thread.get_ident? Seems hard to satisfy
all implementations.

> the threading.local class seems defined for that purpose, not that I've
> ever used it ;)

I hadn't heard of that... it seems very useful, but in this case I
think it just saves me the trouble of making a stash dictionary...
unless successive calls to threading.local return the same instance?
I'll have to try that, too.

---

import xmlrpclib, threading, sys, thread
from SimpleXMLRPCServer import SimpleXMLRPCServer, \
                               SimpleXMLRPCRequestHandler

class RpcContainer(object):
    def __init__(self):
        self._Handlers = {} #keys = thread IDs, values = requestHandlers
    def _GetRpcClientIP(self):
        connection = self._Handlers[thread.get_ident()].connection
        ip = connection.getpeername()[0]
        return ip
    def WhatIsMyIP(self):
        return "Your IP is: %s" % self._GetRpcClientIP()

class ThreadCapableRequestHandler(SimpleXMLRPCRequestHandler):
    def do_POST(self, *args, **kwargs):
        #make the handler available to the RPCs, indexed by threadID...
        self.server.RpcContainer._Handlers[thread.get_ident()] = self
        SimpleXMLRPCRequestHandler.do_POST(self, *args, **kwargs)

class MyXMLRPCServer(SimpleXMLRPCServer):
    def __init__(self, RpcContainer, *args, **kwargs):
        self.RpcContainer = RpcContainer
        SimpleXMLRPCServer.__init__(self, *args, **kwargs)

class DaemonicServerLaunchThread(threading.Thread):
    def __init__(self, RpcServer, **kwargs):
        threading.Thread.__init__(self, **kwargs)
        self.setDaemon(1)
        self.server = RpcServer
    def run(self):
        self.server.serve_forever()

container = RpcContainer()
rpcServer = MyXMLRPCServer( \
    RpcContainer = container,
    addr = ("", 12390),
    requestHandler = ThreadCapableRequestHandler,
    logRequests = False)
rpcServer.register_function(container.WhatIsMyIP)
slt = DaemonicServerLaunchThread(rpcServer)
slt.start()

sp = xmlrpclib.ServerProxy("http://localhost:12390")
print sp.WhatIsMyIP()
 
Ian Clark

> I hadn't heard of that... it seems very useful, but in this case I
> think it just saves me the trouble of making a stash dictionary...
> unless successive calls to threading.local return the same instance?
> I'll have to try that, too.

No, successive calls to threading.local() will return different objects.
So, you call it once to get your 'data store' and then use that one
object from all your threads. It takes care of making sure each thread
gets its own data.

Here is your example, but using threading.local instead of your own
version of it. :)

Ian

import xmlrpclib, threading, sys, thread
from SimpleXMLRPCServer import SimpleXMLRPCServer, SimpleXMLRPCRequestHandler

thread_data = threading.local()

class RpcContainer(object):
    def __init__(self):
        self._Handlers = {} #keys = thread IDs, values = requestHandlers
    def _GetRpcClientIP(self):
        #connection = self._Handlers[thread.get_ident()].connection
        connection = thread_data.request.connection
        ip = connection.getpeername()[0]
        return ip
    def WhatIsMyIP(self):
        return "Your IP is: %s" % self._GetRpcClientIP()

class ThreadCapableRequestHandler(SimpleXMLRPCRequestHandler):
    def do_POST(self, *args, **kwargs):
        #make the handler available to the RPCs via thread-local storage...
        thread_data.request = self
        SimpleXMLRPCRequestHandler.do_POST(self, *args, **kwargs)

class MyXMLRPCServer(SimpleXMLRPCServer):
    def __init__(self, RpcContainer, *args, **kwargs):
        self.RpcContainer = RpcContainer
        SimpleXMLRPCServer.__init__(self, *args, **kwargs)

class DaemonicServerLaunchThread(threading.Thread):
    def __init__(self, RpcServer, **kwargs):
        threading.Thread.__init__(self, **kwargs)
        self.setDaemon(1)
        self.server = RpcServer
    def run(self):
        self.server.serve_forever()

container = RpcContainer()
rpcServer = MyXMLRPCServer(
    RpcContainer = container,
    addr = ("", 12390),
    requestHandler = ThreadCapableRequestHandler,
    logRequests = False)
rpcServer.register_function(container.WhatIsMyIP)
slt = DaemonicServerLaunchThread(rpcServer)
slt.start()

sp = xmlrpclib.ServerProxy("http://localhost:12390")
print sp.WhatIsMyIP()
 
Russell Warren

Thanks Ian... I didn't know about threading.local before but have been
experimenting and it will likely come in quite handy in the future.
For this particular case it does basically seem like a replacement for
the threadID indexed dictionary, though. ie: I'll still need to set
up the RpcContainer, custom request handler, and custom server in
order to get the info handed around properly. I will likely go with
this approach since it lets me customize other aspects at the same
time, but for client IP determination alone I still half think that
the stack crawler is cleaner.

No convincing argument yet on why crawling the stack is considered
bad? I kind of hoped to come out of this with a convincing argument
that would stick with me...

 
Steve Holden

Russell said:
> Thanks Ian... I didn't know about threading.local before but have been
> experimenting and it will likely come in quite handy in the future.
> For this particular case it does basically seem like a replacement for
> the threadID indexed dictionary, though. ie: I'll still need to set
> up the RpcContainer, custom request handler, and custom server in
> order to get the info handed around properly. I will likely go with
> this approach since it lets me customize other aspects at the same
> time, but for client IP determination alone I still half think that
> the stack crawler is cleaner.
>
> No convincing argument yet on why crawling the stack is considered
> bad? I kind of hoped to come out of this with a convincing argument
> that would stick with me...

OK, if you crawl the stack I will seek you out and hit you with a big
stick. Does that affect your decision-making?

Seriously, crawling the stack introduces the potential for disaster in
your program, since there is no guarantee that the calling code will
provide the same environment in future releases. So at best you tie your
solution to a particular version of a particular implementation of Python.

You might as well not bother passing arguments to functions at all,
since the functions could always crawl the stack for the arguments they
need. A major problem with this is that it constrains the caller to use
particular names for particular function arguments.

What happens if two different functions need arguments of the same name?
Seriously, forget this craziness.

regards
Steve
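
To make Steve's name-coupling point concrete, here is a small made-up
example (hypothetical names, not from the thread): the callee silently
depends on its caller using one particular variable name, so a harmless
rename in the caller breaks it.

import sys

def get_peer_ip_from_caller():
    #silently requires the caller to have a local named exactly 'peer_ip'
    return sys._getframe(1).f_locals["peer_ip"]

def handler_a():
    peer_ip = "10.1.2.3"
    return get_peer_ip_from_caller()   #happens to work

def handler_b():
    client_ip = "10.1.2.4"             #same information, different name...
    return get_peer_ip_from_caller()   #...and this raises KeyError

print handler_a()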
 
Carl Friedrich Bolz

Paul said:
> How about a dictionary indexed by the thread name. It's pretty
> lame, though, that the rpc server module itself doesn't make the
> request available to the rpc responder. Maybe you should submit a
> patch.
>
> You should not rely on anything that implementation-specific at all.
> What happens if you want to switch to PyPy?

Apart from the fact that the idea of walking the stack to get info is
indeed rather crazy, PyPy supports sys._getframe and friends perfectly
fine (I think even Jython does, but I am not quite sure). In general
PyPy tries to implement all these "internals" of CPython as closely as
it is sane to do so. Stuff like inspecting code, function, frame, method
objects is very closely mirrored but of course small differences exist:


cfbolz@gauss:~$ python
Python 2.5.1 (r251:54863, Oct  5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> dir(dir)
['__call__', '__class__', '__cmp__', '__delattr__', '__doc__',
'__getattribute__', '__hash__', '__init__', '__module__', '__name__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__self__',
'__setattr__', '__str__']

cfbolz@gauss:~$ pypy-c-46371-faassen
Python 2.4.1 (pypy 1.0.0 build 46371) on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>> dir(dir)
['__call__', '__class__', '__delattr__', '__dict__', '__doc__',
'__getattribute__', '__hash__', '__init__', '__module__', '__name__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__self__',
'__setattr__', '__setstate__', '__str__', '__weakref__', 'func_closure',
'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals',
'func_name']
>>>> dir.func_name
'dir'


Cheers,

Carl Friedrich Bolz
 
Russell Warren

> OK, if you crawl the stack I will seek you out and hit you with a big
> stick. Does that affect your decision-making?

How big a stick? :)

> Seriously, crawling the stack introduces the potential for disaster in
> your program, since there is no guarantee that the calling code will
> provide the same environment in future releases. So at best you tie your
> solution to a particular version of a particular implementation of Python.

I'm gathering that the general argument is entirely centered around
portability and future-proofing of code. This certainly makes sense.
I could try and argue that that doesn't matter for write-once-change-
never code, but anything I'd say there might as well be applied to an
argument saying that writing crappy code is actually ok. And then I
would need to be committed for thinking that write-once-change-never
code actually exists. I'm making myself sick as I type this.
> You might as well not bother passing arguments to functions at all,
> since the functions could always crawl the stack for the arguments they
> need. A major problem with this is that it constrains the caller to use
> particular names for particular function arguments.

You know, you're right! Arguments are overrated. All future code I
write will be argument free. Just have to get rid of that pesky
"self" now.

I can't shake "'But I came here for an argument!' 'Oh... this is
abuse'" from my mind.
> What happens if two different functions need arguments of the same name?
> Seriously, forget this craziness.

I will (mostly)... I knew it was bad code and a total hack, I just was
looking for a concise reason as to why.

I appreciate the comments, guys... thanks!
 
Diez B. Roggisch

> I will (mostly)... I knew it was bad code and a total hack, I just was
> looking for a concise reason as to why.
>
> I appreciate the comments, guys... thanks!

There is another one: crawling the stack is O(n), whilst using
thread-local storage is O(1).


Diez
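
A rough sketch of the difference Diez means (illustrative only, with a
made-up "found it" test, not a benchmark): the crawl has to visit every
frame between the RPC function and the handler, while a thread-local
read costs the same however deep the call stack is.

import sys, threading

thread_data = threading.local()

def find_handler_by_crawling():
    #walks every frame between here and wherever the handler lives: O(n)
    f = sys._getframe(1)
    while f is not None:
        if "self" in f.f_locals:        #stand-in test for "found the handler"
            return f.f_locals["self"]
        f = f.f_back
    return None

def find_handler_in_tls():
    #a single attribute lookup, independent of call depth: O(1)
    return getattr(thread_data, "handler", None)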
 
