Text-oriented network protocols

Mike Mimic

Hi!

I have just read in Programming Perl that network protocols
should be text based (ASCII, UTF-8, ...). Why is this
better than binary data? Wouldn't binary data be more
compact, and so require less bandwidth?


Mike
 
Gregory Toomey

Mike said:
Hi!

I have just read in Programming Perl that network protocols
should be text based (ASCII, UTF-8, ...). Why is this
better than binary data? Wouldn't binary data be more
compact, and so require less bandwidth?


Mike

Well, it's easier to debug. For example, I run my own mail server and just telnet to port
25 to test whether it's working properly.

gtoomey
 
Walter Roberson

:I have just read in Programming Perl that network protocols
:should be text based (ASCII, UTF-8, ...). Why is this
:better than binary data? Wouldn't binary data be more
:compact, and so require less bandwidth?

What is correct binary on one machine might not be correct binary on
another machine. Therefore, unless one is deliberately building an
application with the intent to restrict its use to certain kinds of machines
(and you don't intend to upgrade even within your own company!),
then one runs into the possibility that binary data sent from one place
will be misunderstood at the other. There are well-established protocols
to alleviate this problem, such as using XDR transformations on both ends,
but those add noticeable ugliness to the code itself, and add noticeable
overhead to the size of the bytestream being transferred.
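The fix Walter alludes to can be sketched in a few lines of Perl: pick one wire format (here, 32-bit big-endian "network order", which is what pack's "N" template produces) and convert at both ends. The subroutine names are illustrative, not part of any standard:

```perl
use strict;
use warnings;

# Agree on one wire format: 32-bit unsigned, big-endian ("network order").
# pack 'N' produces the same four bytes on every host, regardless of the
# machine's native byte order or integer size.
sub encode_u32 { pack 'N', $_[0] }
sub decode_u32 { unpack 'N', $_[0] }

my $wire = encode_u32(258);
printf "on the wire: %s\n", unpack 'H*', $wire;   # "00000102" everywhere
printf "decoded:     %d\n", decode_u32($wire);    # 258 everywhere
```

This only covers integers, of course; floats are exactly the can of worms described above, which is one more argument for just sending "258" as text.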

Keep in mind that not all machines even use 2's-complement arithmetic,
that not all machines have the same size of 'int', that not all machines
agree on the byte order within an int, and that there are four different
possible byte orders within a long. And then you start getting into
floating point numbers and have to start worrying about the several
different incompatible IEEE floating point formats, together with the
legacy formats such as FORTRAN G's numbers being different from FORTRAN H's,
and then there are noticeable numbers of floating point packages out there
that use formats that are only compatible with other packages from the
same manufacturer.

Have I talked yet about the different ways of handling denormalized numbers?
Or about the fact that it's not unusual for CPUs to have 80 bit floating
point numbers in microcode but compact that into 64 bits as soon as you
do an explicit store into memory, but every once in a while you run
into a program that stores all 80 bits? Then there are differences
in how long doubles are implemented (the IEEE standards don't firmly
specify how big they are). And sometimes CPUs represent the same number
differently depending on the IEEE rounding mode, or depending on which
floating point exception masks are enabled...

Have you been seduced yet by the Dark Side? The temptation that if one
is going binary anyhow, that one should send *compressed* binary,
compressed with the algorithm you made up 5 minutes ago, or maybe compressed
with a standard enough algorithm except with the headers suppressed because
the other end "will know" which compression algorithm is being used?


Have you heard what the latest state of the art is for creating data
archives that are intended to last hundreds or a thousand years and
still be retrievable by whatever the technology of that future day,
whether it be another dark age or an era that likes to store data by
selective modification of the fundamental quantum properties of vacuum
energy? The current state of the art for really *long* term data storage
is.... output the data at a human scale, using selected inks, printed
on good quality paper.

Keep that in mind when you write network protocols: a good simple text
based protocol (like smtp) has a longevity far exceeding complicated
"efficient" binary protocols.

Of course, no-one is saying you can't write a text-based protocol that
negotiates an efficient binary transfer if it so happens that the two
ends have lots in common, but treat that as a lucky optimization rather
than as the primary transport mode.
 
perl coder

Walter Roberson said:
The current state of the art for really *long* term data storage
is.... output the data at a human scale, using selected inks, printed
on good quality paper.

I'd go with clay tablets. Don't need to worry about fires then. ;-)

Good points though. And I like text because it lets you send raw
commands directly with netcat, telnet or some other commonplace tool.
Although that does become more difficult if your protocol uses some
fancy authentication mechanism (for example, one where you have to calculate some
response value in a timely manner based on received data and a shared
secret).
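A sketch of such a challenge-response exchange, assuming HMAC-SHA-256 over a server-supplied nonce (the nonce, secret, and choice of digest are illustrative; Digest::SHA ships with Perl):

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Server sends a random challenge line in plain text; the client must
# answer with HMAC(challenge, shared_secret) before a timeout expires.
my $secret    = 'shared-secret';   # known to both ends, never on the wire
my $challenge = 'nonce-12345';     # fresh per connection, sent by the server

# Client side: compute the response from the received challenge.
my $response = hmac_sha256_hex($challenge, $secret);

# Server side: recompute independently and compare.
my $expected = hmac_sha256_hex($challenge, $secret);
print $response eq $expected ? "AUTH OK\n" : "AUTH FAIL\n";
```

Note that the protocol itself is still plain text -- the challenge and the hex response travel as ordinary lines -- it's just that you can no longer improvise the response by hand at a telnet prompt.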
 
Walter Roberson

:> The current state of the art for really *long* term data storage
:> is.... output the data at a human scale, using selected inks, printed
:> on good quality paper.

:I'd go with clay tablets. Don't need to worry about fires then. ;-)

Hah, you still have to worry about fires with clay -- if you *don't*
put the clay in a fire, then the clay is going to melt the first time
it gets wet. ;-)

I've heard some arguments in favour of etched titanium, or chiselled
granite. I seem to recall, though, that granite doesn't last as well
as paper if you have freeze/thaw cycles. I do not recall the argument
against etched titanium... other than the cost, that is.
 
Mike Mimic

Hi!

Walter said:
What is correct binary on one machine might not be correct binary on
another machine. Therefore, unless one is deliberately building an
application with the intent to restrict its use to certain kinds of machines
(and you don't intend to upgrade even within your own company!),
then one runs into the possibility that binary data sent from one place
will be misunderstood at the other. There are well-established protocols
to alleviate this problem, such as using XDR transformations on both ends,
but those add noticeable ugliness to the code itself, and add noticeable
overhead to the size of the bytestream being transferred.

Thanks. Really. I think that I understand now.

I was tempted because it would be a protocol for a game, and it would be
nicer if it could not be "cracked" so easily. But anyway, I think that
portability is more important for me.


Mike
 
Joe Smith

Mike said:
I have just read in Programming Perl that network protocols
should be text based (ASCII, UTF-8, ...). Why is this
better than binary data? Wouldn't binary data be more
compact, and so require less bandwidth?

Consider the FTP protocol. Once logged in, a typical dialog looks like:
==> cd pub
<== 250 CWD command successful.
==> bin
<== 200 Type set to I (image).
==> get movie.mpg
<== 226 BINARY Transfer complete.
=== 600,000,000 bytes

Here, the commands and responses are in ASCII; the data is transferred
in binary. If you were to use a binary dialog (such as single-byte
command codes instead of "cd " and "get ", and single-byte responses),
the total transfer would be 600,000,019 bytes instead of 600,000,117.
A savings of 0.0000163%, but making it very difficult to debug manually.
With commands and responses in ASCII, debugging can be done by using
TELNET to port 21 and parsing the results by eyeball.

Another problem with binary network protocols has to do with sending
multibyte integers. Should the number 258 be sent as 0x0102 (16-bit
big endian), 0x00000102 (32-bit big endian), 0x0201 (16-bit little
endian), 0x02010000 (32-bit little endian), or as "2"+"5"+"8"+CR+LF
(byte-order independent)?
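Joe's four binary encodings of 258 can be reproduced with Perl's pack byte-order templates ("n"/"v" are 16-bit big/little-endian, "N"/"V" the 32-bit equivalents):

```perl
use strict;
use warnings;

# The four binary encodings of 258, via pack's byte-order templates.
my @encodings = (
    ['16-bit big endian',    unpack('H*', pack 'n', 258)],   # 0102
    ['32-bit big endian',    unpack('H*', pack 'N', 258)],   # 00000102
    ['16-bit little endian', unpack('H*', pack 'v', 258)],   # 0201
    ['32-bit little endian', unpack('H*', pack 'V', 258)],   # 02010000
);
printf "%-22s 0x%s\n", @$_ for @encodings;
print qq{text form:             "258" . CRLF (byte-order independent)\n};
```

Four wire formats for one small number -- and a fifth, the text form, that every end can parse no matter what hardware it runs on.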

Summary: network protocols are more flexible and easier to debug when
the commands and responses are plain text, and only the payload (raw
data) is transferred as a binary stream of single bytes.
-Joe
 
Mike Mimic

Hi!

Joe said:
Summary: network protocols are more flexible and easier to debug when
the commands and responses are plain text, and only the payload (raw
data) is transfered as a binary stream of single bytes.

This is not a Perl question anymore but anyway:

How then do you know when the data stream (the binary stream) has ended?
Do you specify the length in advance? But what if the length is not known
yet? How does FTP solve this? Is there some good page about
this?


Mike
 
James Willmore

This is not a Perl question anymore but anyway:

In a way, it is :)
How then do you know when the data stream (the binary stream) has ended? Do you
specify the length in advance? But what if the length is not known yet? How
does FTP solve this? Is there some good page about this?

In *most* cases, there is a module out there to accomplish whatever
networking task you have (for example, there is
Net::FTP to interact with an FTP server in Perl :) ). In short - someone
has already solved the problem for you and made code available ... so you
too can solve your problems ... in Perl :)

Lincoln Stein wrote a book entitled "Network Programming with Perl". The
source code for the examples is available at
http://modperl.com:9000/perl_networking/. You could download the examples
and look them over.

I'd also suggest, at least, looking over some of the RFCs on the aspects
of networking that pertain to what you're trying to accomplish
(http://www.rfc.net).

And .... you could use Google to search this newsgroup to find code
examples and discussions.

All of this in pursuit of knowing how to do network programming in Perl
:)

HTH

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
"Why is it that we rejoice at a birth and grieve at a funeral?
It is because we are not the person involved" -- Mark Twain
 
Sherm Pendley

Mike said:
How then do you know when the data stream (the binary stream) has ended?
Do you specify the length in advance? But what if the length is not known
yet? How does FTP solve this?

I haven't done raw FTP in years - there are modules for that now. If memory
from writing a 16-bit VBX for Windows 3.1 serves, however...

FTP uses two sockets, a command socket and a data socket. The client sends
the transfer request over the command socket. The data socket is then
created (how that happens depends on whether passive mode is used), and the
data sent.

After it finishes sending data, the server sends the "transfer complete" and
"x bytes sent" responses on the command socket. The client can then take
the appropriate action, depending on whether it has received that many
bytes on the data socket.

I don't know if there are any FTP servers that are capable of serving
dynamic content. If so, I would assume they'd handle it differently than an
HTTP server would. HTTP sends the Content-length header before the content,
so it needs to buffer a dynamic response until its final length can be
determined. A dynamic FTP server wouldn't need to do that; it could simply
count the bytes sent and report the total when it's done sending.
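Another common answer to Mike's question, distinct from FTP's close-the-data-socket approach, is length-prefix framing: send the payload's size first, then exactly that many bytes. A minimal Perl sketch (the 4-byte big-endian prefix is an assumption for illustration, not part of any particular protocol):

```perl
use strict;
use warnings;

# Frame layout: 4-byte big-endian length, then exactly that many bytes.
sub frame { pack('N', length $_[0]) . $_[0] }

# Returns the payload once the buffer holds a complete frame, else undef
# (meaning: keep reading from the socket and try again).
sub unframe {
    my ($buf) = @_;
    return undef if length($buf) < 4;
    my $len = unpack 'N', $buf;
    return undef if length($buf) < 4 + $len;
    return substr $buf, 4, $len;
}

my $wire = frame('any bytes at all, including NUL and 0xFF');
print unframe($wire), "\n";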

As I said though - if memory serves. It's been ten years, so take it with a
whole shaker of salt. ;-)
Is there some good page about this?

Dunno. You could always download the source for the Net::FTP module and have
a look if you're curious about how it might be done in Perl.

sherm--
 
Walter Roberson

:I don't know if there are any FTP servers that are capable of serving
:dynamic content. If so, I would assume they'd handle it differently than an
:HTTP server would. HTTP sends the Content-length header before the content,
:so it needs to buffer a dynamic response until its final length can be
:determined.

HTTP servers have other options. They can send a Multipart/related header,
and send a text/html header (with Content-length) for the part that they
already know. The second "part" can then be another text/html or
image/jpeg or whatever header if it is the last part, but it can
be another Multipart header if there are more parts yet to come.

You can do "animation" of an indefinite length using this technique:
just keep sending Multipart headers as prefixes as long as you think
there might be more to send later, and then send the next fixed-length
content as soon as it is available.
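A rough sketch of the technique in Perl (the boundary string is arbitrary, and servers usually label such streams multipart/x-mixed-replace; treat the details as illustrative):

```perl
use strict;
use warnings;

my $boundary = 'frame-boundary';   # arbitrary, just must not occur in a body

# Emit one part of an open-ended multipart stream: each part carries its
# own Content-type and Content-length, so the client can render it without
# knowing whether more parts will follow.
sub push_part {
    my ($type, $body) = @_;
    return "--$boundary\r\n"
         . "Content-type: $type\r\n"
         . "Content-length: " . length($body) . "\r\n\r\n"
         . $body . "\r\n";
}

print "Content-type: multipart/x-mixed-replace; boundary=$boundary\r\n\r\n";
print push_part('text/html', '<p>frame 1</p>');
print push_part('text/html', '<p>frame 2</p>');
# ...keep emitting parts for as long as there is more to show...
```

Each part is fixed-length once emitted, but the stream as a whole never had to declare its total size up front.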
 
Mike Mimic

Hi!

Walter said:
:I don't know if there are any FTP servers that are capable of serving
:dynamic content. If so, I would assume they'd handle it differently than an
:HTTP server would. HTTP sends the Content-length header before the content,
:so it needs to buffer a dynamic response until its final length can be
:determined.

HTTP servers have other options. They can send a Multipart/related header,
and send a text/html header (with Content-length) for the part that they
already know. The second "part" can then be another text/html or
image/jpeg or whatever header if it is the last part, but it can
be another Multipart header if there are more parts yet to come.

You can do "animation" of an indefinite length using this technique:
just keep sending Multipart headers as prefixes as long as you think
there might be more to send later, and then send the next fixed-length
content as soon as it is available.

Thanks to all of you.


Mike
 
Juha Laiho

Mike Mimic said:
I have just read in Programming Perl that network protocols
should be text based (ASCII, UTF-8, ...). Why is this
better than binary data? Wouldn't binary data be more
compact, and so require less bandwidth?

In case you're concerned about transport speeds, remember that there are
two factors determining speed limits: bandwidth and latency. And as you
mentioned in the later message, what you're planning is a protocol for
a game. Most often these are much more sensitive to latency than to
bandwidth (so, the amount of data transferred is not that great).

As an example, I'm located in Europe, and just tried to ping a host in
the US. With default-sized packets (84 bytes), the average round-trip
time was just under 200 milliseconds. With larger packets (1052 bytes),
the average round-trip time was around 230 milliseconds. My last link
is a residential DSL that in itself seems to be making most of the
difference; pinging the router at my ISP gives me ~21 msecs for default-
sized packets, and ~55 msecs for large packets.

So, the effect of packet size depends a lot on distance -- if I were communicating
with someone close by, the latency increase would be huge (multiplying the
round-trip time), but if I were communicating across the Atlantic, the
RTT difference would be about 15% (and this for a data amount that is more
than 10 times the original). So, the data throughput for the larger
packets seems to be something like 10 times the throughput with
the small packets.

Of course, if your payload has such huge volume that the volume becomes
a bottleneck, by all means do everything you can to reduce that volume.
But as long as the data volume is not an issue, keep the protocol easy
to test and monitor (which most often means text-based).
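Juha's "something like 10 times" estimate can be checked from his own numbers (bytes per round trip divided by RTT):

```perl
use strict;
use warnings;

# Effective one-packet-per-round-trip throughput, from the measurements above.
my $small = 84   / 0.200;   # small packets, ~200 ms transatlantic RTT
my $large = 1052 / 0.230;   # large packets, ~230 ms transatlantic RTT

printf "small packets: %4.0f bytes/s\n", $small;   # ~420
printf "large packets: %4.0f bytes/s\n", $large;   # ~4574
printf "ratio: %.1fx\n", $large / $small;          # ~10.9
```

So a 12x bigger packet cost only 15% more latency -- which is exactly why, for latency-bound traffic like a game protocol, shaving bytes off the wire format buys very little.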
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,148
Messages
2,570,838
Members
47,385
Latest member
Joneswilliam01

Latest Threads

Top