Packing a simple dictionary into a string - extending struct?

Jonathan Fine

Hello

I want to serialise a dictionary, whose keys and values are ordinary strings
(i.e. a sequence of bytes).

I can of course use pickle, but it has two big faults for me.
1. It should not be used with untrusted data.
2. I want non-Python programs to be able to read and write these
dictionaries.

I don't want to use XML because:
1. It is verbose.
2. It forces other applications to load an XML parser.

I've written, in about 80 lines, Python code that will pack and unpack (to
use the language of the struct module) such a dictionary. And then I
thought I might be reinventing the wheel. But so far I've not found
anything much like this out there. (The closest is work related to 'binary
XML' - http://en.wikipedia.org/wiki/Binary_XML.)

So, what I'm looking for is something like an extension of struct that
allows dictionaries to be stored. Does anyone know of any related work?
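
For readers who want something concrete, here is a minimal sketch of one way to pack a dictionary of byte strings with struct. The wire format (two 4-byte big-endian lengths, then the key and value bytes) and the names pack_dict and unpack_dict are assumptions made for illustration. They are not taken from the poster's 80-line module, and the sketch is written for Python 3.

```python
import struct

# Hypothetical wire format (an assumption, not the poster's actual code):
# each entry is <key length><value length><key bytes><value bytes>,
# with both lengths stored as 4-byte big-endian unsigned integers.
_HEADER = struct.Struct(">II")


def pack_dict(d):
    """Pack a dict mapping bytes to bytes into a single bytes object."""
    parts = []
    for key, val in d.items():
        parts.append(_HEADER.pack(len(key), len(val)))
        parts.append(key)
        parts.append(val)
    return b"".join(parts)


def unpack_dict(data):
    """Inverse of pack_dict; raises ValueError on truncated input."""
    result = {}
    ptr = 0
    while ptr < len(data):
        if ptr + _HEADER.size > len(data):
            raise ValueError("truncated header")
        klen, vlen = _HEADER.unpack_from(data, ptr)
        ptr += _HEADER.size
        end = ptr + klen + vlen
        if end > len(data):
            raise ValueError("truncated entry")
        result[data[ptr:ptr + klen]] = data[ptr + klen:end]
        ptr = end
    return result
```

With this layout an entry costs 8 bytes of overhead regardless of content, so a 2-byte key with a 3-byte value packs to 13 bytes, and a reader in another language only needs to parse two integers per entry.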
 
Marc 'BlackJack' Rintsch

I want to serialise a dictionary, whose keys and values are ordinary strings
(i.e. a sequence of bytes).

Maybe you can use ConfigObj_ or JSON_ to store that data. Another format
mentioned in the binary XML article you've linked in your post is
`ASN.1`_. And there's a secure alternative to `pickle` called cerealizer_.

.. _`ASN.1`: http://pyasn1.sourceforge.net/
.. _cerealizer: http://home.gna.org/oomadness/en/cerealizer/
.. _ConfigObj: http://www.voidspace.org.uk/python/configobj.html
.. _JSON: http://www.json.org/

Ciao,
Marc 'BlackJack' Rintsch
 
Sridhar Ratna

What about JSON? You can serialize your dictionary, for example, in
JSON format and then unserialize it in any language that has a JSON
parser (unless it is Javascript).
 
Diez B. Roggisch

What about JSON? You can serialize your dictionary, for example, in
JSON format and then unserialize it in any language that has a JSON
parser (unless it is Javascript).

There is an implementation available for python called simplejson, available
through easy_install.
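
As a rough illustration of the round trip being suggested, here is a sketch that uses the standard-library json module, which offers the same dumps/loads calls used with simplejson elsewhere in this thread. It is written for Python 3, and the sample dictionary is made up.

```python
import json

# A made-up dictionary whose keys and values are plain text strings.
settings = {"k1": "v1", "k2": "v2"}

# Serialise to a JSON string; sort_keys makes the output deterministic.
encoded = json.dumps(settings, sort_keys=True)

# Any JSON parser, in any language, can read the string back.
decoded = json.loads(encoded)
```

This covers text strings only; arbitrary byte values need extra handling before they can go through JSON.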

Diez
 
John Machin

C:\junk>copy con adict.csv
k1,v1
k2,v2
k3,v3
^Z
1 file(s) copied.

C:\junk>\python25\python
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
C:\junk>type bdict.csv
k3,v3
k2,v2
k1,v1

C:\junk>

Easy enough?
HTH,
John
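
The Python statements from the session above did not survive in this copy of the thread; only the shell commands and file listings remain. The following is a guess at the idea, assuming the csv module was used to read the two-column file into a dictionary and write it back out. It is written for Python 3 and uses in-memory buffers in place of adict.csv and bdict.csv.

```python
import csv
import io

# Stand-in for the contents of adict.csv shown in the session.
source = "k1,v1\nk2,v2\nk3,v3\n"

# Each row is a [key, value] pair, so dict() can consume the reader directly.
adict = dict(csv.reader(io.StringIO(source)))

# Write the dictionary back out, one key,value pair per line
# (stand-in for bdict.csv).
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(adict.items())
```

The rows in bdict.csv above come out in a different order because Python 2.5 dictionaries did not preserve insertion order; from Python 3.7 on they do.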
 
Jonathan Fine

Sridhar Ratna said:
What about JSON? You can serialize your dictionary, for example, in
JSON format and then unserialize it in any language that has a JSON
parser (unless it is Javascript).

Thank you for this suggestion. The growing adoption of JSON in Ajax
programming is a strong argument for my using it in my application, although
I think I'd prefer something a little more binary.

So it looks like I'll be using JSON.

Thanks.


Jonathan
 
Paddy

You could use YAML or JSON, then compress the output if size is an
issue.
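
A minimal sketch of that idea, assuming JSON as the text format and zlib as the compressor (the thread names neither a compressor nor this sample data). It is written for Python 3.

```python
import json
import zlib

# Made-up sample data with plenty of repetition, which compresses well.
record = {"key%d" % i: "value%d" % i for i in range(100)}

# Serialise to JSON text, then compress the UTF-8 bytes.
text = json.dumps(record).encode("utf-8")
packed = zlib.compress(text)

# Reverse the two steps to get the dictionary back.
restored = json.loads(zlib.decompress(packed).decode("utf-8"))
```

The saving depends heavily on the data: repetitive text shrinks a lot, while short or already-random byte strings may not shrink at all.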

- Paddy.
 
Jonathan Fine

Jonathan said:
Thank you for this suggestion. The growing adoption of JSON in Ajax
programming is a strong argument for my using it in my application, although
I think I'd prefer something a little more binary.

So it looks like I'll be using JSON.

Well, I tried. But I came across two problems (see below).

First, there's bloat. For binary byte data, on average one
character becomes just over 4.

Second, there's the inconvenience. I can't simply take a
sequence of bytes and encode them using JSON. I have to
turn them into Unicode first. And I guess there's a similar
problem at the other end.

So I'm going with my own solution:
http://mathtran.cvs.sourceforge.net/mathtran/py/bytedict.py?revision=1.1&view=markup

It seems to be related to cerealizer:
http://home.gna.org/oomadness/en/cerealizer/index.html

It seems to me that JSON works well for Unicode text, but not
with binary data. Indeed, Unicode hides the binary form of
the stored data, presenting only the code points. But I don't
have Unicode strings!

Here's my test script, which is why I'm not using JSON:
===
import simplejson

x = u''
for i in range(256):
    x += unichr(i)

print len(simplejson.dumps(x)), '\n'

simplejson.dumps(chr(128))
===

Here's the output
===
1046 # 256 bytes => 256 * 4 + 22 bytes

Traceback (most recent call last):
<snip>
  File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
===
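
For anyone repeating the experiment on Python 3, here is a sketch of an equivalent test using the standard-library json module. The Latin-1 step is a workaround assumed for this example, not something proposed in the thread: it maps each byte to the code point with the same number so that the data can pass through JSON as text.

```python
import json

# All 256 possible byte values.
raw = bytes(range(256))

# json.dumps refuses bytes objects outright in Python 3.
try:
    json.dumps(raw)
    bytes_accepted = True
except TypeError:
    bytes_accepted = False

# Workaround (an assumption): Latin-1 maps byte N to code point N.
as_text = raw.decode("latin-1")
encoded = json.dumps(as_text)

# Reverse the mapping on the way back to recover the original bytes.
round_tripped = json.loads(encoded).encode("latin-1")

# Size of the JSON text relative to the raw bytes.
expansion = len(encoded) / len(raw)
```

The round trip is lossless, but the expansion stays above a factor of 4 for this input, because every byte outside printable ASCII is written as a six-character \uXXXX escape. That matches the bloat complaint above.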
 
John Machin

Well, I tried. But I came across two problems (see below).

First, there's bloat. For binary byte data, on average one
character becomes just over 4.

Second, there's the inconvenience. I can't simply take a
sequence of bytes and encode them using JSON. I have to
turn them into Unicode first. And I guess there's a similar
problem at the other end.

So I'm going with my own solution: http://mathtran.cvs.sourceforge.net/mathtran/py/bytedict.py?revision=...

def unpack(bytes, unpack_entry=unpack_entry):
    '''Return dictionary gotten by unpacking supplied bytes.
    Both keys and values in the returned dictionary are byte-strings.
    '''
    bytedict = {}
    ptr = 0
    while 1:
        key, val, ptr = unpack_entry(bytes, ptr)
        bytedict[key] = val
        if ptr == len(bytes):
            break
    # That's beautiful code -- as pretty as a cane-toad.
    # Well-behaved too, a very elegant response to unpack(pack({}))
    # Try this:
    blen = len(bytes)
    while ptr < blen:
        key, val, ptr = unpack_entry(bytes, ptr)
        bytedict[key] = val

    return bytedict

HTH,
John
 
Jonathan Fine

John said:
def unpack(bytes, unpack_entry=unpack_entry):
    '''Return dictionary gotten by unpacking supplied bytes.
    Both keys and values in the returned dictionary are byte-strings.
    '''
    bytedict = {}
    ptr = 0
    while 1:
        key, val, ptr = unpack_entry(bytes, ptr)
        bytedict[key] = val
        if ptr == len(bytes):
            break
    # That's beautiful code -- as pretty as a cane-toad.

Well, it's nearly right. It has a transposition error.

    # Well-behaved too, a very elegant response to unpack(pack({}))

Yes, you're right. An attempt to read bytes that aren't there.

    # Try this:
    blen = len(bytes)
    while ptr < blen:
        key, val, ptr = unpack_entry(bytes, ptr)
        bytedict[key] = val

    return bytedict

I've committed such a change. Thank you.
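
To make the difference between the two loop shapes concrete, here is a self-contained sketch for Python 3. The unpack_entry below is a hypothetical stand-in (two 1-byte lengths followed by the key and value bytes); the real one in bytedict.py may use a different layout.

```python
import struct


def unpack_entry(data, ptr):
    """Hypothetical entry reader: two 1-byte lengths, then key and value."""
    klen, vlen = struct.unpack_from("BB", data, ptr)
    ptr += 2
    key = data[ptr:ptr + klen]
    val = data[ptr + klen:ptr + klen + vlen]
    return key, val, ptr + klen + vlen


def unpack_original(data):
    """Loop shape from the first version: reads before checking for the end."""
    bytedict = {}
    ptr = 0
    while 1:
        key, val, ptr = unpack_entry(data, ptr)
        bytedict[key] = val
        if ptr == len(data):
            break
    return bytedict


def unpack_fixed(data):
    """Loop shape suggested in the reply: checks for the end before reading."""
    bytedict = {}
    ptr = 0
    blen = len(data)
    while ptr < blen:
        key, val, ptr = unpack_entry(data, ptr)
        bytedict[key] = val
    return bytedict
```

On empty input unpack_fixed returns an empty dictionary, while unpack_original tries to read an entry that is not there and, with this stand-in reader, raises struct.error. On well-formed non-empty input the two behave the same.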
 
