Text-oriented network protocols

Mike Mimic

Hi!

I have just read in Programming Perl that network protocols
should be text based (ASCII, UTF-8, ...). Why is this
better than binary data? Wouldn't binary data be more
compact, and so require less bandwidth?


Mike
 
Gregory Toomey

Mike said:
Hi!

I have just read in Programming Perl that network protocols
should be text based (ASCII, UTF-8, ...). Why is this
better than binary data? Wouldn't binary data be more
compact, and so require less bandwidth?


Mike

Well, it's easier to debug. For example, I run my own mail server and just telnet to port
25 to test whether it's working properly.

gtoomey
 
Walter Roberson

:I have just read in Programming Perl that network protocols
:should be text based (ASCII, UTF-8, ...). Why is this
:better than binary data? Wouldn't binary data be more
:compact, and so require less bandwidth?

What is correct binary on one machine might not be correct binary on
another machine. Therefore, unless one is deliberately building an
application with the intent to restrict its use to certain kinds of machines
(and you don't intend to upgrade even within your own company!),
then one runs into the possibility that binary data sent from one place
will be misunderstood at the other. There are well-established protocols
to alleviate this problem, such as using XDR transformations on both ends,
but those add noticeable ugliness to the code itself, and add noticeable
overhead to the size of the bytestream being transferred.
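The fix Walter alludes to can be sketched in a few lines of Perl: pick one wire format (here, 32-bit big-endian "network order", which is what pack's "N" template produces) and convert at both ends. The subroutine names are illustrative, not part of any standard:

```perl
use strict;
use warnings;

# Agree on one wire format: 32-bit unsigned, big-endian ("network order").
# pack 'N' produces the same four bytes on every host, regardless of the
# machine's native byte order or integer size.
sub encode_u32 { pack 'N', $_[0] }
sub decode_u32 { unpack 'N', $_[0] }

my $wire = encode_u32(258);
printf "on the wire: %s\n", unpack 'H*', $wire;   # "00000102" everywhere
printf "decoded:     %d\n", decode_u32($wire);    # 258 everywhere
```

This only covers integers, of course; floats are exactly the can of worms described above, which is one more argument for just sending "258" as text.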

Keep in mind that not all machines even use 2's-complement arithmetic,
that not all machines have the same size of 'int', that not all machines
agree on the byte order within an int, and that there are four different
possible byte orders within a long. And then you start getting into
floating point numbers and have to start worrying about the several
different incompatible IEEE floating point formats, together with the
legacy formats such as FORTRAN G's numbers being different from FORTRAN H's,
and then there are noticeable numbers of floating point packages out there
that use formats that are only compatible with other packages from the
same manufacturer.

Have I talked yet about the different ways of handling denormalized numbers?
Or about the fact that it's not unusual for CPUs to have 80 bit floating
point numbers in microcode but compact that into 64 bits as soon as you
do an explicit store into memory, but every once in a while you run
into a program that stores all 80 bits? Then there are differences
in how long doubles are implemented (the IEEE standards don't firmly
specify how big they are). And sometimes CPUs represent the same number
differently depending on the IEEE rounding mode, or depending on which
floating point exception masks are enabled...

Have you been seduced yet by the Dark Side? The temptation that if one
is going binary anyhow, that one should send *compressed* binary,
compressed with the algorithm you made up 5 minutes ago, or maybe compressed
with a standard enough algorithm except with the headers suppressed because
the other end "will know" which compression algorithm is being used?


Have you heard what the latest state of the art is for creating data
archives that are intended to last hundreds or a thousand years and
still be retrievable by whatever the technology of that future day,
whether it be another dark age or an era that likes to store data by
selective modification of the fundamental quantum properties of vacuum
energy? The current state of the art for really *long* term data storage
is.... output the data at a human scale, using selected inks, printed
on good quality paper.

Keep that in mind when you write network protocols: a good simple text
based protocol (like smtp) has a longevity far exceeding complicated
"efficient" binary protocols.

Of course, no-one is saying you can't write a text-based protocol that
negotiates an efficient binary transfer if it so happens that the two
ends have lots in common, but treat that as a lucky optimization rather
than as the primary transport mode.
 
perl coder

Walter Roberson said:
The current state of the art for really *long* term data storage
is.... output the data at a human scale, using selected inks, printed
on good quality paper.

I'd go with clay tablets. Don't need to worry about fires then. ;-)

Good points though. And I like text because it lets you send raw
commands directly with netcat, telnet or some other commonplace tool.
Although that does become more difficult if your protocol uses some
fancy authentication mechanism (for example, one where you have to calculate some
response value in a timely manner based on received data and a shared
secret).
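A sketch of such a challenge-response exchange, assuming HMAC-SHA-256 over a server-supplied nonce (the nonce, secret, and choice of digest are illustrative; Digest::SHA ships with Perl):

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Server sends a random challenge line in plain text; the client must
# answer with HMAC(challenge, shared_secret) before a timeout expires.
my $secret    = 'shared-secret';   # known to both ends, never on the wire
my $challenge = 'nonce-12345';     # fresh per connection, sent by the server

# Client side: compute the response from the received challenge.
my $response = hmac_sha256_hex($challenge, $secret);

# Server side: recompute independently and compare.
my $expected = hmac_sha256_hex($challenge, $secret);
print $response eq $expected ? "AUTH OK\n" : "AUTH FAIL\n";
```

Note that the protocol itself is still plain text -- the challenge and the hex response travel as ordinary lines -- it's just that you can no longer improvise the response by hand at a telnet prompt.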
 
Walter Roberson

:> The current state of the art for really *long* term data storage
:> is.... output the data at a human scale, using selected inks, printed
:> on good quality paper.

:I'd go with clay tablets. Don't need to worry about fires then. ;-)

Hah, you still have to worry about fires with clay -- if you *don't*
put the clay in a fire, then the clay is going to melt the first time
it gets wet. ;-)

I've heard some arguments in favour of etched titanium, or chiselled
granite. I seem to recall, though, that granite doesn't last as well
as paper if you have freeze/thaw cycles. I do not recall the argument
against etched titanium... other than the cost, that is.
 
Mike Mimic

Hi!

Walter said:
What is correct binary on one machine might not be correct binary on
another machine. Therefore, unless one is deliberately building an
application with the intent to restrict its use to certain kinds of machines
(and you don't intend to upgrade even within your own company!),
then one runs into the possibility that binary data sent from one place
will be misunderstood at the other. There are well-established protocols
to alleviate this problem, such as using XDR transformations on both ends,
but those add noticeable ugliness to the code itself, and add noticeable
overhead to the size of the bytestream being transferred.

Thanks. Really. I think that I understand now.

I was tempted because it would be a protocol for a game, and it would be
nicer if it could not be "cracked" so easily. But anyway, I think that
portability is more important for me.


Mike
 
Joe Smith

Mike said:
I have just read in Programming Perl that network protocols
should be text based (ASCII, UTF-8, ...). Why is this
better than binary data? Wouldn't binary data be more
compact, and so require less bandwidth?

Consider the FTP protocol. Once logged in, a typical dialog looks like:
==> cd pub
<== 250 CWD command successful.
==> bin
<== 200 Type set to I (image).
==> get movie.mpg
<== 226 BINARY Transfer complete.
=== 600,000,000 bytes

Here, the commands and responses are in ASCII; the data is transferred
in binary. If you were to use a binary dialog (such as single-byte
command codes instead of "cd " and "get ", and single-byte responses),
the total transfer would be 600,000,019 bytes instead of 600,000,117.
A savings of 0.0000163%, but making it very difficult to debug manually.
With commands and responses in ASCII, debugging can be done by using
TELNET to port 21 and parsing the results by eyeball.

Another problem with binary network protocols has to do with sending
multibyte integers. Should the number 258 be sent as 0x0102 (16-bit
big endian), 0x00000102 (32-bit big endian), 0x0201 (16-bit little
endian), 0x02010000 (32-bit little endian), or as "2"+"5"+"8"+CR+LF
(byte-order independent)?
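Joe's four binary encodings of 258 can be reproduced with Perl's pack byte-order templates ("n"/"v" are 16-bit big/little-endian, "N"/"V" the 32-bit equivalents):

```perl
use strict;
use warnings;

# The four binary encodings of 258, via pack's byte-order templates.
my @encodings = (
    ['16-bit big endian',    unpack('H*', pack 'n', 258)],   # 0102
    ['32-bit big endian',    unpack('H*', pack 'N', 258)],   # 00000102
    ['16-bit little endian', unpack('H*', pack 'v', 258)],   # 0201
    ['32-bit little endian', unpack('H*', pack 'V', 258)],   # 02010000
);
printf "%-22s 0x%s\n", @$_ for @encodings;
print qq{text form:             "258" . CRLF (byte-order independent)\n};
```

Four wire formats for one small number -- and a fifth, the text form, that every end can parse no matter what hardware it runs on.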

Summary: network protocols are more flexible and easier to debug when
the commands and responses are plain text, and only the payload (raw
data) is transferred as a binary stream of single bytes.
-Joe
 
Mike Mimic

Hi!

Joe said:
Summary: network protocols are more flexible and easier to debug when
the commands and responses are plain text, and only the payload (raw
data) is transfered as a binary stream of single bytes.

This is not a Perl question anymore but anyway:

How then do you know when the data stream (the binary stream) has ended?
Do you specify the length in advance? But what if the length is not known
yet? How does FTP solve this? Is there some good page about
this?


Mike
 
James Willmore

This is not a Perl question anymore but anyway:

In a way, it is :)
How then do you know when the data stream (the binary stream) has ended? Do you
specify the length in advance? But what if the length is not known yet? How
does FTP solve this? Is there some good page about this?

In *most* cases, there is a module out there to accomplish whatever
networking task you have (for example, there is
Net::FTP to interact with an FTP server in Perl :) ). In short - someone
has already solved the problem for you and made code available ... so you
too can solve your problems ... in Perl :)

Lincoln Stein wrote a book entitled "Network Programming with Perl". The
source code for the examples is available at
http://modperl.com:9000/perl_networking/. You could download the examples
and look them over.

I'd also suggest, at least, looking over some of the RFCs on the aspects
of networking that pertain to what you're trying to accomplish
(http://www.rfc.net).

And .... you could use Google to search this newsgroup to find code
examples and discussions.

All of this in pursuit of knowing how to do network programming in Perl
:)

HTH

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
"Why is it that we rejoice at a birth and grieve at a funeral?
It is because we are not the person involved" -- Mark Twain
 
Sherm Pendley

Mike said:
How then do you know when the data stream (the binary stream) has ended?
Do you specify the length in advance? But what if the length is not known
yet? How does FTP solve this?

I haven't done raw FTP in years - there are modules for that now. If memory
from writing a 16-bit VBX for Windows 3.1 serves, however...

FTP uses two sockets, a command socket and a data socket. The client sends
the transfer request over the command socket. The data socket is then
created (how that happens depends on whether passive mode is used), and the
data sent.

After it finishes sending data, the server sends the "transfer complete" and
"x bytes sent" responses on the command socket. The client can then take
the appropriate action, depending on whether it has received that many
bytes on the data socket.

I don't know if there are any FTP servers that are capable of serving
dynamic content. If so, I would assume they'd handle it differently than an
HTTP server would. HTTP sends the Content-length header before the content,
so it needs to buffer a dynamic response until its final length can be
determined. A dynamic FTP server wouldn't need to do that; it could simply
count the bytes sent and report the total when it's done sending.
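Another common answer to Mike's question, distinct from FTP's close-the-data-socket approach, is length-prefix framing: send the payload's size first, then exactly that many bytes. A minimal Perl sketch (the 4-byte big-endian prefix is an assumption for illustration, not part of any particular protocol):

```perl
use strict;
use warnings;

# Frame layout: 4-byte big-endian length, then exactly that many bytes.
sub frame { pack('N', length $_[0]) . $_[0] }

# Returns the payload once the buffer holds a complete frame, else undef
# (meaning: keep reading from the socket and try again).
sub unframe {
    my ($buf) = @_;
    return undef if length($buf) < 4;
    my $len = unpack 'N', $buf;
    return undef if length($buf) < 4 + $len;
    return substr $buf, 4, $len;
}

my $wire = frame('any bytes at all, including NUL and 0xFF');
print unframe($wire), "\n";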

As I said though - if memory serves. It's been ten years, so take it with a
whole shaker of salt. ;-)
Is there some good page about this?

Dunno. You could always download the source for the Net::FTP module and have
a look if you're curious about how it might be done in Perl.

sherm--
 
Walter Roberson

:I don't know if there are any FTP servers that are capable of serving
:dynamic content. If so, I would assume they'd handle it differently than an
:HTTP server would. HTTP sends the Content-length header before the content,
:so it needs to buffer a dynamic response until its final length can be
:determined.

HTTP servers have other options. They can send a Multipart/related header,
and send a text/html header (with Content-length) for the part that they
already know. The second "part" can then be another text/html or
image/jpeg or whatever header if it is the last part, but it can
be another Multipart header if there are more parts yet to come.

You can do "animation" of an indefinite length using this technique:
just keep sending Multipart headers as prefixes as long as you think
there might be more to send later, and then send the next fixed-length
content as soon as it is available.
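A rough sketch of the technique in Perl (the boundary string is arbitrary, and servers usually label such streams multipart/x-mixed-replace; treat the details as illustrative):

```perl
use strict;
use warnings;

my $boundary = 'frame-boundary';   # arbitrary, just must not occur in a body

# Emit one part of an open-ended multipart stream: each part carries its
# own Content-type and Content-length, so the client can render it without
# knowing whether more parts will follow.
sub push_part {
    my ($type, $body) = @_;
    return "--$boundary\r\n"
         . "Content-type: $type\r\n"
         . "Content-length: " . length($body) . "\r\n\r\n"
         . $body . "\r\n";
}

print "Content-type: multipart/x-mixed-replace; boundary=$boundary\r\n\r\n";
print push_part('text/html', '<p>frame 1</p>');
print push_part('text/html', '<p>frame 2</p>');
# ...keep emitting parts for as long as there is more to show...
```

Each part is fixed-length once emitted, but the stream as a whole never had to declare its total size up front.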
 
Mike Mimic

Hi!

Walter said:
:I don't know if there are any FTP servers that are capable of serving
:dynamic content. If so, I would assume they'd handle it differently than an
:HTTP server would. HTTP sends the Content-length header before the content,
:so it needs to buffer a dynamic response until its final length can be
:determined.

HTTP servers have other options. They can send a Multipart/related header,
and send a text/html header (with Content-length) for the part that they
already know. The second "part" can then be another text/html or
image/jpeg or whatever header if it is the last part, but it can
be another Multipart header if there are more parts yet to come.

You can do "animation" of an indefinite length using this technique:
just keep sending Multipart headers as prefixes as long as you think
there might be more to send later, and then send the next fixed-length
content as soon as it is available.

Thanks to all of you.


Mike
 
Juha Laiho

Mike Mimic said:
I have just read in Programming Perl that network protocols
should be text based (ASCII, UTF-8, ...). Why is this
better than binary data? Wouldn't binary data be more
compact, and so require less bandwidth?

In case you're concerned about transport speeds, remember that there are
two factors determining speed limits: bandwidth and latency. And as you
mentioned in the later message, what you're planning is a protocol for
a game. Most often these are much more sensitive to latency than to
bandwidth (so, the amount of data transferred is not that great).

As an example, I'm located in Europe, and just tried to ping a host in
the US. With default-sized packets (84 bytes), the average round-trip
time was just under 200 milliseconds. With larger packets (1052 bytes),
the average round-trip time was around 230 milliseconds. My last link
is a residential DSL that in itself seems to be making most of the
difference; pinging the router at my ISP gives me ~21 msecs for default-
sized packets, and ~55 msecs for large packets.

So, the effect of packet size depends a lot on distance -- if I were communicating
with someone close by, the latency increase would be huge (multiplying the
round-trip time), but if I were communicating across the Atlantic, the
RTT difference would be about 15% (and this for a data amount that is more
than 10 times the original). So, the data throughput for the larger
packets seems to be something like 10 times the throughput with
the small packets.

Of course, if your payload has such huge volume that the volume becomes
a bottleneck, by all means do everything you can to reduce that volume.
But as long as the data volume is not an issue, keep the protocol easy
to test and monitor (which most often means text-based).
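Juha's "something like 10 times" estimate can be checked from his own numbers (bytes per round trip divided by RTT):

```perl
use strict;
use warnings;

# Effective one-packet-per-round-trip throughput, from the measurements above.
my $small = 84   / 0.200;   # small packets, ~200 ms transatlantic RTT
my $large = 1052 / 0.230;   # large packets, ~230 ms transatlantic RTT

printf "small packets: %4.0f bytes/s\n", $small;   # ~420
printf "large packets: %4.0f bytes/s\n", $large;   # ~4574
printf "ratio: %.1fx\n", $large / $small;          # ~10.9
```

So a 12x bigger packet cost only 15% more latency -- which is exactly why, for latency-bound traffic like a game protocol, shaving bytes off the wire format buys very little.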
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,148
Messages
2,570,838
Members
47,385
Latest member
Joneswilliam01

Latest Threads

Top