network programming: how does s.accept() work?

7stud

I have the following two identical clients

#test1.py:-----------
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

host = 'localhost'
port = 5052  #server port

s.connect((host, port))
print s.getsockname()

response = []
while 1:
    piece = s.recv(1024)
    if piece == '':
        break

    response.append(piece)


#test3.py:----------------
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

host = 'localhost'
port = 5052  #server port

s.connect((host, port))
print s.getsockname()

response = []
while 1:
    piece = s.recv(1024)
    if piece == '':
        break

    response.append(piece)


and this basic server:

#test2.py:--------------
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

host = ''
port = 5052

s.bind((host, port))
s.listen(5)


while 1:
    newsock, client_addr = s.accept()
    print "original socket:", s.getsockname()

    print "new socket, self:", newsock.getsockname()
    print "new socket, peer:", newsock.getpeername()
    print


I started the server, and then I started the clients one by one. I
expected both clients to hang since they don't get notified that the
server is done sending data, and I expected the server output to show
that accept() created two new sockets. But this is the output I got
from the server:

original socket: ('0.0.0.0', 5052)
new socket, self: ('127.0.0.1', 5052)
new socket, peer: ('127.0.0.1', 50816)

original socket: ('0.0.0.0', 5052)
new socket, self: ('127.0.0.1', 5052)
new socket, peer: ('127.0.0.1', 50818)

The first client I started generated this output:

('127.0.0.1', 50816)

And when I ran the second client, the first client disconnected, and
the second client produced this output:

('127.0.0.1', 50818)

and then the second client hung. I expected the server output to be
something like this:

original socket: ('127.0.0.1', 5052)
new socket, self: ('127.0.0.1', 5053)
new socket, peer: ('127.0.0.1', 50816)

original socket: ('0.0.0.0', 5052)
new socket, self: ('127.0.0.1', 5054)
new socket, peer: ('127.0.0.1', 50818)

And I expected both clients to hang. Can someone explain how accept()
works?
 
bockman

I guess (but I did not try it) that the problem is not accept(), which
should work as you expect, but the fact that at the second connection
your code actually throws away the first connection by reusing the
same variables without storing the previous values. This could make
the Python garbage collector free the socket object created with the
first connection, thereby closing the connection.

If I'm right, your program should work as you expect if, for instance,
you collect the sockets returned by accept() in a list.

Ciao
 
7stud

The question I'm really trying to answer is: if a client connects to a
host at a specific port, but the server changes the port when it
creates a new socket with accept(), how does data sent by the client
arrive at the correct port? Won't the client be sending data to the
original port e.g. port 5052 in the client code above?
 
7stud

> by reusing the same variables without storing the previous values.
> This could make the Python
> garbage collector to attempt freeing the socket object created with
> the first connection, therefore
> closing the connection.
>
> If I'm right, your program should work as you expect if you for
> instance collect in a list the sockets
> returned by accept.

Yes, you are right about that. This code prevents the first client
from disconnecting:

newsocks = []
client_addys = []

while 1:
    newsock, client_addr = s.accept()
    newsocks.append(newsock)
    client_addys.append(client_addr)

    print "original socket:", s.getsockname()

    print "new socket, self:", newsock.getsockname()
    print "new socket, peer:", newsock.getpeername()
    print
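
bockman's explanation can be checked without juggling terminal windows. The sketch below is written for Python 3 (unlike the Python 2 code above) and assumes CPython, where an object is freed as soon as its last reference disappears; the helper name `first_client_was_closed` and the use of port 0 (let the OS pick a free port) are illustrative choices, not part of the original code. It accepts two clients the way the server loop does, optionally keeping the accepted sockets in a list, and reports whether the first client sees its connection closed:

```python
import socket


def first_client_was_closed(keep_references):
    """Accept two clients like the original server loop and report whether
    the first client's connection was closed on the server side."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))  # port 0: let the OS choose a free port
    listener.listen(5)
    port = listener.getsockname()[1]

    clients = []
    kept = []  # plays the role of the newsocks list
    newsock = None
    for _ in range(2):
        clients.append(socket.create_connection(('127.0.0.1', port)))
        # Rebinding newsock drops the only reference to the previously
        # accepted socket unless it was also stored in `kept`.
        newsock, client_addr = listener.accept()
        if keep_references:
            kept.append(newsock)

    first = clients[0]
    first.settimeout(0.5)
    try:
        closed = first.recv(1024) == b''  # b'' means the server side closed
    except socket.timeout:
        closed = False  # no data and no EOF: the connection is still open
    for s in clients + kept + [newsock, listener]:
        s.close()
    return closed


print(first_client_was_closed(keep_references=False))  # True on CPython
print(first_client_was_closed(keep_references=True))   # False
```

Relying on garbage collection to close connections is fragile; in real code it is clearer to keep the accepted sockets somewhere and call close() on each one explicitly when you are done with it, as the sketch does at the end.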
 
7stud

If I change the clients to this:


import socket
import time
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

host = 'localhost'
port = 5052  #server port

print s.getsockname()  #<------------NEW LINE
s.connect((host, port))
print s.getsockname()

response = []
while 1:
    piece = s.recv(1024)
    if piece == '':
        break

    response.append(piece)


Then I get output from the clients like this:

('0.0.0.0', 0)
('127.0.0.1', 51439)

('0.0.0.0', 0)
('127.0.0.1', 51440)


The server port 5052 (i.e. the one used in connect()) is not listed
there. That output indicates that the client socket is initially
created with some placeholder values, i.e. 0.0.0.0, 0. Then accept()
apparently sends a message back to the client that in effect says,
"Hey, in the future send me data on port 51439." Then the client
fills in that port along with the ip address in its socket object.
Thereafter, any data sent using that socket is sent to that port and
that ip address.
 
bockman

> The question I'm really trying to answer is: if a client connects to a
> host at a specific port, but the server changes the port when it
> creates a new socket with accept(), how does data sent by the client
> arrive at the correct port?  Won't the client be sending data to the
> original port e.g. port 5052 in the client code above?

I'm not an expert, and I've never used TCP/IP below the socket abstraction
level, but I imagine that after accept, the client side of the connection
is somehow 'rewired' with the new socket created on the server side.

Anyhow, this is not python-related, since the socket C library behaves
exactly in the same way.

Ciao
 
7stud

Implicit in that description is that the client must get assigned a
port number before the call to connect(), or maybe the call to
connect() assigns a port number to the client. In any case, the
server has to receive both the client's ip address and port number as
part of the client's request for a connection in order for the server
to know where to send the response. The server then uses that ip
address and port number to send back a message to the client that
tells the client what port number future communications need to be
sent to.
 
Thomas Bellman

7stud said:
> The question I'm really trying to answer is: if a client connects to a
> host at a specific port, but the server changes the port when it
> creates a new socket with accept(), how does data sent by the client
> arrive at the correct port? Won't the client be sending data to the
> original port e.g. port 5052 in the client code above?

The answer is that the server *doesn't* change its port. As you
could see in the output of your server, the socket that accept()
returned also had local port 5052. Each *client* will however
get a unique local port at *its* end.

A TCP connection is identified by a four-tuple:

( localaddr, localport, remoteaddr, remoteport )

Note that what is local and what is remote is relative to which
process you are looking from. If the four-tuple for a specific
TCP connection is ( 127.0.0.1, 5052, 127.0.0.1, 50816 ) in your
server, it will be ( 127.0.0.1, 50816, 127.0.0.1, 5052 ) in the
client for the very same TCP connection.
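
A small Python 3 sketch that makes the mirrored four-tuples visible by connecting a client to a listener over the loopback interface. Port 0 asks the OS for any free port instead of 5052, and `connection_views` is a hypothetical helper name:

```python
import socket


def connection_views():
    """Return (server_port, server_view, client_view), where each view is a
    (localaddr, localport, remoteaddr, remoteport) tuple for one connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))  # port 0: any free port
    listener.listen(5)
    server_port = listener.getsockname()[1]

    client = socket.create_connection(('127.0.0.1', server_port))
    conn, _ = listener.accept()

    # getsockname()/getpeername() each return an (addr, port) pair.
    server_view = conn.getsockname() + conn.getpeername()
    client_view = client.getsockname() + client.getpeername()
    for s in (client, conn, listener):
        s.close()
    return server_port, server_view, client_view


port, server_view, client_view = connection_views()
print(server_view)  # local port equals the listening port
print(client_view)  # the same four values with local and remote swapped
```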

Since your client hasn't bound its socket to a specific port, the
kernel will choose a local port for you when you do a connect().
The chosen port will be more or less random, but the kernel will make
sure that the four-tuple identifying the TCP connection is unique.
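
The same point can be observed from the client side. This Python 3 sketch (helper name hypothetical, server port chosen by the OS) records the client socket's local address before and after connect(). The ('0.0.0.0', 0) result for an unbound socket matches the output posted earlier in the thread and is what Linux and macOS report; some platforms, Windows among them, may raise an error for getsockname() on an unbound socket instead:

```python
import socket


def local_address_around_connect():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(5)
    server_port = listener.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    before = client.getsockname()  # not bound yet: wildcard address, port 0
    client.connect(('127.0.0.1', server_port))  # the kernel binds a local port here
    after = client.getsockname()

    client.close()
    listener.close()
    return server_port, before, after


server_port, before, after = local_address_around_connect()
print(before)  # ('0.0.0.0', 0) on Linux and macOS
print(after)   # ('127.0.0.1', <ephemeral port picked by the kernel>)
```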
 
7stud

You seem to be describing what I see:

----server output-----
original socket: ('0.0.0.0', 5053)
new socket, self: ('127.0.0.1', 5053)
new socket, peer: ('127.0.0.1', 49302)

original socket: ('0.0.0.0', 5053)
new socket, self: ('127.0.0.1', 5053)
new socket, peer: ('127.0.0.1', 49303)

---client1 output-----
('0.0.0.0', 0)
('127.0.0.1', 49302)

---client2 output-----
('0.0.0.0', 0)
('127.0.0.1', 49303)


But your claim that the server doesn't change its port flies in the
face of every description I've read about TCP connections and
accept(). The articles and books I've read all claim that the server
port 5053 is a 'listening' port only. Thereafter, when a client sends
a request for a connection to the listening port, the accept() call on
the server creates a new socket for communication between the client
and server, and then the server goes back to listening on the original
socket. Here are two sources for that claim:

Socket Programming How To:
http://www.amk.ca/python/howto/sockets/

Tutorial on Network Programming with Python:
http://heather.cs.ucdavis.edu/~matloff/Python/PyNet.pdf

In either case, there are still some things about the output that
don't make sense to me. Why does the server initially report that its
ip address is 0.0.0.0:

original socket: ('0.0.0.0', 5053)

I would expect the reported ip address to be '127.0.0.1'. Also, since
a socket is uniquely identified by an ip address and port number, then
the ('0.0.0.0', 5053) socket is not the same as this socket:

new socket, self: ('127.0.0.1', 5053)
 
Grant Edwards

> But your claim that the server doesn't change its port flies in the
> face of every description I've read about TCP connections and
> accept().

Then the descriptions are wrong.

> The articles and books I've read all claim that the server
> port 5053 is a 'listening' port only.

Not true.

> Thereafter, when a client sends a request for a connection to
> the listening port, the accept() call on the server creates a
> new socket for communication between the client and server,

True. But it doesn't change the local port number.

Both the listening socket and the connected socket are using
local port number 5053.

> and then the server goes back to listening on the original
> socket.

That's true.

> I would expect the reported ip address to be '127.0.0.1'.
> Also, since a socket is uniquely identified by an ip address
> and port number,

It isn't.

1) You seem to be conflating sockets and TCP connections. A
socket is a kernel-space data structure used to provide a
user-space API to the network stack. In user-space it's
identified by an integer index into a per-process table of
file-like objects. That socket may or may not have a TCP
connection associated with it. It may or may not be bound
to an IP address and/or port. It is not uniquely identified
by an IP address and port number.

2) A TCP connection is a _different_ thing (though it also
corresponds to a kernel-space data structure), and as Thomas
said, it is uniquely identified by a four-tuple:

    (localaddr, localport, remoteaddr, remoteport)

[Technically, it's probably a 5-tuple with the above
elements along with a 'connection type' element, but since
we're only discussing TCP in this thread, we can ignore the
connection type axis and only consider the 4-axis space of
TCP connections.]

When a second client connects to the server on port 5053,
the first two elements in the tuple will be the same. One
or both of the last two elements will be different.

> then the ('0.0.0.0', 5053) socket is not the same as this
> socket:
>
> new socket, self: ('127.0.0.1', 5053)

Referring to sockets using that notation doesn't really make
sense. There can be more than one socket associated with the
local address ('127.0.0.1', 5053) or with any other ip/port tuple
you'd like to pick.
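
To see that several sockets can share one local address and port while being distinct kernel objects, this Python 3 sketch (hypothetical helper name, port chosen by the OS rather than 5053) lists the file descriptor, local address, and peer of a listening socket and two accepted sockets:

```python
import socket


def sockets_on_one_port():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(5)
    port = listener.getsockname()[1]

    clients = [socket.create_connection(('127.0.0.1', port)) for _ in range(2)]
    conns = [listener.accept()[0] for _ in clients]

    rows = []
    for s in [listener] + conns:
        try:
            peer = s.getpeername()
        except OSError:
            peer = None  # the listening socket is not connected to anyone
        rows.append((s.fileno(), s.getsockname(), peer))

    for s in clients + conns + [listener]:
        s.close()
    return port, rows


port, rows = sockets_on_one_port()
for fd, local, peer in rows:
    print(fd, local, peer)
```

All three report the same local address and port; what tells them apart is the descriptor returned by fileno() and, for the connected sockets, the remote end of the four-tuple.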
 
Gabriel Genellina

> In either case, there are still some things about the output that
> don't make sense to me. Why does the server initially report that its
> ip address is 0.0.0.0:
>
> original socket: ('0.0.0.0', 5053)

Because you called "bind" with None (or '' ?) as its first argument; that
means: "listen on any available interface".

> I would expect the reported ip address to be '127.0.0.1'. Also, since
> a socket is uniquely identified by an ip address and port number, then
> the ('0.0.0.0', 5053) socket is not the same as this socket:
>
> new socket, self: ('127.0.0.1', 5053)

You got this *after* a connection was made, coming from your own PC.
127.0.0.1 is your "local" IP; the name "localhost" should resolve to that
number. If you have a LAN, try running the client on another PC, or
connect to the Internet and run the "netstat" command to see the connected
pairs.
 
Roy Smith

7stud said:
> But your claim that the server doesn't change its port flies in the
> face of every description I've read about TCP connections and
> accept(). The articles and books I've read all claim that the server
> port 5053 is a 'listening' port only. Thereafter, when a client sends
> a request for a connection to the listening port, the accept() call on
> the server creates a new socket for communication between the client
> and server, and then the server goes back to listening on the original
> socket.

You're confusing "port" and "socket".

A port is an external thing. It exists in the minds and hearts of packets
on the network, and in the RFCs which define the TCP protocol (UDP too, but
let's keep this simple).

A socket is an internal thing. It is a programming abstraction. Sockets
can exist that aren't bound to ports, and several different sockets can be
bound to the same port. Just like there can be multiple file descriptors
which are connected to a given file on disk.

The TCP protocol defines a state machine which determines what packets
should be sent in response when certain kinds of packets get received. The
protocol doesn't say how this state machine should be implemented (or even
demand that it be implemented at all). It only requires that a TCP host
behave in the way the state machine defines.

In reality, whatever operating system you're running on almost certainly
implements in the kernel a state machine as described by TCP. That state
machine has two sides. On the outside is the network interface, which
receives and transmits packets. On the inside is the socket interface to
user-mode applications. The socket is just the API by which a user program
interacts with the kernel to get it to do the desired things on the network
interface(s).

Now, what the articles and books say is that there is a listening SOCKET.
And when you accept a connection on that socket (i.e. a TCP three-way
handshake is consummated on the network), the way the socket API deals with
that is to generate a NEW socket (via the accept system call). There
really isn't any physical object that either socket represents. They're
both just programming abstractions.

Does that help?
 
Roy Smith

Gabriel Genellina said:
> Because you called "bind" with None (or '' ?) as its first argument; that
> means: "listen on any available interface"

It really means, "Listen on ALL available interfaces".
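
A quick Python 3 check of both points (helper name hypothetical, port chosen by the OS): a listener bound to '' reports the wildcard address 0.0.0.0, while the socket returned by accept() reports the concrete local address the client actually connected to.

```python
import socket


def wildcard_vs_accepted():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('', 0))  # '' means INADDR_ANY: all local IPv4 addresses
    listener.listen(5)
    port = listener.getsockname()[1]

    client = socket.create_connection(('127.0.0.1', port))
    conn, _ = listener.accept()
    result = (listener.getsockname()[0], conn.getsockname()[0])
    for s in (client, conn, listener):
        s.close()
    return result


print(wildcard_vs_accepted())  # ('0.0.0.0', '127.0.0.1')
```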
 
Steve Holden

7stud said:
> You seem to be describing what I see:
>
> ----server output-----
> original socket: ('0.0.0.0', 5053)
> new socket, self: ('127.0.0.1', 5053)
> new socket, peer: ('127.0.0.1', 49302)
>
> original socket: ('0.0.0.0', 5053)
> new socket, self: ('127.0.0.1', 5053)
> new socket, peer: ('127.0.0.1', 49303)
>
> ---client1 output-----
> ('0.0.0.0', 0)
> ('127.0.0.1', 49302)
>
> ---client2 output-----
> ('0.0.0.0', 0)
> ('127.0.0.1', 49303)
>
> But your claim that the server doesn't change its port flies in the
> face of every description I've read about TCP connections and
> accept(). The articles and books I've read all claim that the server
> port 5053 is a 'listening' port only. Thereafter, when a client sends
> a request for a connection to the listening port, the accept() call on
> the server creates a new socket for communication between the client
> and server, and then the server goes back to listening on the original
> socket. Here are two sources for that claim:

There can be many TCP connections to a server all using the same
endpoint. Take a look at the traffic coming out of any busy web server:
everything that comes out of the same server comes from port 80. That
doesn't stop it listening for more connections on port 80.

The server disambiguates the packets when it demultiplexes the
connection packet streams by using the remote endpoint to differentiate
between packets that are part of different connections. TCP guarantees
that no two ephemeral port connections from the same client will use the
same port.
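
The demultiplexing described above can be modelled with two lookup tables. The following is a deliberately simplified toy in plain Python, not how any real kernel is implemented (the class and method names are hypothetical, and real stacks also track handshake state, wildcard addresses, and much more), but it shows why one local port can serve many connections: an incoming segment is matched on the full four-tuple first, and only a segment that matches no established connection is considered for the listening socket.

```python
class ToyDemux:
    """Hypothetical, simplified model of how a TCP stack chooses which
    socket an incoming segment belongs to."""

    def __init__(self):
        self.listeners = {}    # (local_addr, local_port) -> listening socket
        self.connections = {}  # (local_addr, local_port, remote_addr, remote_port) -> accepted socket

    def listen(self, local, name):
        self.listeners[local] = name

    def establish(self, local, remote, name):
        self.connections[local + remote] = name

    def route(self, local, remote):
        """Return the socket for a segment sent from `remote` to `local`,
        where both are (addr, port) pairs."""
        key = local + remote
        if key in self.connections:
            return self.connections[key]  # exact four-tuple match
        # No established connection: in a real stack only a connection
        # request (SYN) would be handed to the listener at this point.
        return self.listeners.get(local)


demux = ToyDemux()
server = ('127.0.0.1', 5053)
demux.listen(server, 'listening socket')
demux.establish(server, ('127.0.0.1', 49302), 'accepted socket #1')
demux.establish(server, ('127.0.0.1', 49303), 'accepted socket #2')

print(demux.route(server, ('127.0.0.1', 49302)))  # accepted socket #1
print(demux.route(server, ('127.0.0.1', 49303)))  # accepted socket #2
print(demux.route(server, ('127.0.0.1', 49304)))  # listening socket
```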

regards
Steve
 
Roy Smith

Steve Holden said:
> TCP guarantees
> that no two ephemeral port connections from the same client will use the
> same port.

Where "client" is defined as "IP Address". You could certainly have a
remote machine that has multiple IP addresses using the same remote port
number on different IP addresses for simultaneous connections to the same
local port.
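
This case can be reproduced on one machine, assuming Linux, where the whole 127.0.0.0/8 range is routed to the loopback device (other systems may only answer on 127.0.0.1). This Python 3 sketch (helper name hypothetical, ports chosen by the OS) binds two clients to different loopback addresses but the same port number and connects both to the same server port:

```python
import socket


def same_port_from_two_addresses():
    """Connect from 127.0.0.2 and 127.0.0.3 using one port number and
    return the peer addresses the server sees. Assumes Linux loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(5)
    server = listener.getsockname()

    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    a.bind(('127.0.0.2', 0))  # let the OS pick a port for the first client
    client_port = a.getsockname()[1]
    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    b.bind(('127.0.0.3', client_port))  # same port, different client address
    a.connect(server)
    b.connect(server)

    conns = [listener.accept()[0] for _ in range(2)]
    peers = sorted(c.getpeername() for c in conns)
    for s in conns + [a, b, listener]:
        s.close()
    return peers


print(same_port_from_two_addresses())
# On Linux: [('127.0.0.2', N), ('127.0.0.3', N)] for some port N
```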
 
Steve Holden

Roy said:
> Where "client" is defined as "IP Address". You could certainly have a
> remote machine that has multiple IP addresses using the same remote port
> number on different IP addresses for simultaneous connections to the same
> local port.

Correct.
 
7stud

> There can be many TCP connections to a server all using the same
> endpoint. Take a look at the traffic coming out of any busy web server:
> everything that comes out of the same server comes from port 80. That
> doesn't stop it listening for more connections on port 80.

---
When you surf the Web, say to http://www.google.com, your Web browser
is a client. The program you contact at Google is a server. When a
server is run, it sets up business at a certain port, say 80 in the
Web case. It then waits for clients to contact it. When a client does
so, the server will usually assign a new port, say 56399, specifically
for communication with that client, and then resume watching port 80
for new requests.
 
7stud

If two sockets are bound to the same host and port on the server, how
does data sent by the client get routed? Can both sockets recv() the
data?
 
Hrvoje Niksic

7stud said:
> When you surf the Web, say to http://www.google.com, your Web browser
> is a client. The program you contact at Google is a server. When a
> server is run, it sets up business at a certain port, say 80 in the
> Web case. It then waits for clients to contact it. When a client does
> so, the server will usually assign a new port, say 56399, specifically
> for communication with that client, and then resume watching port 80
> for new requests.

Actually the client is the one that allocates a new port. All
connections to a server remain on the same port, the one it listens
on:
# now, connect to port 10000 from elsewhere
('127.0.0.1', 10000) # note the same port, not a different one
 
Frank Millman

7stud said:
> If two sockets are bound to the same host and port on the server, how
> does data sent by the client get routed? Can both sockets recv() the
> data?

I have learned a lot of stuff I did not know before from this thread,
so I think I can answer that.

There must be a layer of software that listens at the port. When it
receives a packet, it can tell which client sent the packet (host and
port number). It uses that to look up which socket is handling that
particular connection, and passes it as input to that socket.

Therefore each socket only receives its own input, and is not aware of
anything received for any other connection.
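
This description can be checked empirically. In this Python 3 sketch (helper names hypothetical, port chosen by the OS) two clients connect to the same listening port, and each accepted socket is matched to its client by the peer address returned from accept(). Each client sends a different message and then shuts down its sending side, so the server can read until EOF with the same kind of recv() loop the clients in this thread use:

```python
import socket


def read_until_eof(sock):
    """Collect data until recv() returns b'', i.e. the peer stopped sending."""
    chunks = []
    while True:
        piece = sock.recv(1024)
        if not piece:
            return b''.join(chunks)
        chunks.append(piece)


def each_socket_gets_its_own_data():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(5)
    port = listener.getsockname()[1]

    c1 = socket.create_connection(('127.0.0.1', port))
    c2 = socket.create_connection(('127.0.0.1', port))

    # Key each accepted socket by its peer address, which is the
    # client's own local address.
    accepted = {}
    for _ in range(2):
        conn, addr = listener.accept()
        accepted[addr] = conn
    s1 = accepted[c1.getsockname()]
    s2 = accepted[c2.getsockname()]

    c2.sendall(b'from client 2')
    c1.sendall(b'from client 1')
    c1.shutdown(socket.SHUT_WR)  # tell the server this client is done sending
    c2.shutdown(socket.SHUT_WR)

    # s1 and s2 share local port `port`, yet each sees only its own client's bytes.
    result = (read_until_eof(s1), read_until_eof(s2))
    for s in (c1, c2, s1, s2, listener):
        s.close()
    return result


print(each_socket_gets_its_own_data())  # (b'from client 1', b'from client 2')
```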

Not a technical explanation, but I think it describes what happens.

Frank Millman
 
