I agree with Yannick. I've been involved in debugging an ASN.1
encoding error since back in June, and we're still not done. It's easy
to see that the encoding is incorrect, but very hard to see why and
where it goes wrong.
Look at some of the popular protocols: HTTP, SMTP, NNTP. Very easy to
parse; even easier to read manually during debugging (which I assume
was what Yannick was thinking of).
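Just to illustrate how little code that takes, here's a toy sketch in Python (my own example, not a complete or robust HTTP parser) that splits an HTTP-style request line and headers; the bytes you parse are the same bytes you can eyeball in a capture:

    # Toy parser for an HTTP-style request head -- only a sketch of why
    # text protocols are easy to parse and to read, not real HTTP.
    def parse_request_head(raw: bytes):
        lines = raw.decode("ascii").split("\r\n")
        method, path, version = lines[0].split(" ", 2)
        headers = {}
        for line in lines[1:]:
            if not line:                      # blank line ends the headers
                break
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        return method, path, version, headers

    example = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
    print(parse_request_head(example))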
Then you'd have to maintain two protocols: the binary one you use live
and a text one, plus a debugging utility that converts to/from the
binary protocol.
Plus, that tool would be useless when you're looking at captured
network traffic, whereas tcpdump, tcpflow, Wireshark and similar tools
handle text-based protocols quite well.
I *really* recommend text-based protocols for all normal uses. If
it's too much data, add an optional zlib compression layer.
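As a rough sketch of what I mean (Python, zlib from the standard
library; the 4-byte length prefix is just an assumption for the
example, any framing scheme would do), such a layer is only a few
lines:

    import struct, zlib

    # Optional compression layer under a text protocol (sketch only).
    def pack(message: str) -> bytes:
        payload = zlib.compress(message.encode("utf-8"))
        return struct.pack(">I", len(payload)) + payload   # length prefix + payload

    def unpack(frame: bytes) -> str:
        (length,) = struct.unpack(">I", frame[:4])
        return zlib.decompress(frame[4:4 + length]).decode("utf-8")

    msg = "STORE user=alice balance=1024.50\r\n" * 100
    frame = pack(msg)
    print(len(msg.encode()), "bytes of text ->", len(frame), "bytes on the wire")
    assert unpack(frame) == msg

And you only turn it on when the data volume actually warrants it.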
This is probably typical bait for one of those very old,
argued-to-death discussions again?
I have seen lots of strong suggestions and recommendations to use text
protocols. OTOH, most real protocols that are not meant to transport
human-readable/editable text/script/config files (like SMTP, NNTP and
HTTP are) use binary encodings as a rule, and I think they are right
to do so.
I have worked with both types of protocols, and I have noticed that
text causes more problems. In cases where products made by different
organizations use the same protocol (or data files), the integration
effort was greater with text. So now I agree with a text protocol only
when it is meant to be read/edited by a human, or when it is XML with
a well-defined schema (like WXS or XSD). How complex text can be is
easy to realize by reviewing the Apache Xerces code base.
1) Size. Binary formats usually compress to even smaller results, and
the size of the unpacked binary is easier to predict.
2) Navigability. Text has to be read and parsed just to navigate
within it, for example to skip over irrelevant (sometimes huge) parts.
3) Comparison. There are lots of text variants that can be considered
equal to each other. Text (if it is ever meant to be read or edited by
a human) is often formatted for better readability: whitespace indents
are added at will, or trailing zeroes / thousands separators for
numbers (see the sketch after this list).
4) Binary compatibility. Sometimes there is a need for some sort of
digital processing, for example digitally signing parts of the data.
That requires an algorithm to turn the data (or its sub-parts) into a
"canonical form". Handling endianness, byte sizes and the like for a
binary format is child's play compared with an algorithm that converts
text into some canonical form.
5) Usability in a human interface. Computer-generated text (especially
in canonical form) is not really readable for a human, so there is a
tendency to add beautifier and localization algorithms anyway, or to
convert it into some graphical form.
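To illustrate points 3 and 4 with a toy example of my own (Python, not
taken from any real protocol): several "equal" text renderings of one
number differ byte-for-byte, so comparing or signing them needs a
canonicalization step first, whereas a fixed binary encoding is
already canonical:

    import struct

    # Three text renderings of the same value -- not byte-equal.
    text_variants = [b"1024.50", b"1024.5", b" 1,024.500 "]
    print(len(set(text_variants)), "distinct byte strings")        # -> 3

    def canonical_text(raw: bytes) -> bytes:
        # Strip whitespace and thousands separators, normalize the number.
        return repr(float(raw.replace(b",", b"").strip())).encode("ascii")

    print(len({canonical_text(v) for v in text_variants}))         # -> 1

    # One fixed-width, big-endian binary encoding: already canonical,
    # comparable with a plain byte comparison.
    print(struct.pack(">d", 1024.5) == struct.pack(">d", 1024.50))  # -> True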
Finally, I have realized that it is far more complex to mess with text
than to use binary and convert from binary to text (or a tree, graph,
or table) where needed. This is not the case with XML, since there are
relatively good libraries supporting it, but other text I prefer to
avoid wherever I can. I like XML only if I can use XML-processing
libraries in the product.