A neat trick to serialize arrays and hashes

J. Romano

Dear Perl community,

Today I invented a neat new trick that I thought I'd share with
everyone here.

But before I continue, I'd like to say something to anyone out there
who thinks my trick is "obvious to everyone but inexperienced
programmers" or "not worth knowing because better approaches exist":
some people enjoy learning a simple new trick, even if they never get
a chance to apply it. Besides, sharing a newly discovered trick (even
one most programmers already know) has the benefit of educating any
programmer who, for whatever reason, happens not to be aware of that
particular technique. So if you really must reply to say that you
already knew this trick, then instead of telling me how it didn't help
you at all, how about sharing something else that might be useful to
the Perl community? That would be much appreciated.

Anyway, now that I'm off my soapbox, here is what I discovered
this morning:

The pack string "(w/a*)*" is useful for serializing arrays and
hashes -- that is, it can pack and unpack arrays and hashes to and
from a string. Let me explain in more detail:

I have an array, which holds the names of some animals:

@a = ("dog", "cat", "bird", "camel", "giraffe");

I might want to serialize @a into a string so I can store it in a
file and retrieve it later. I could use the Data::Dumper module to
create a string (and later use eval to recover a reference that I can
then assign back to the array), but that can get complicated if I
don't have much experience with Data::Dumper.
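
For comparison, the Data::Dumper route might look roughly like the
sketch below. It assumes $Data::Dumper::Terse is set so the dump is a
bare expression rather than a "$VAR1 = ..." assignment, and it relies
on string eval, which should only ever be run on data you trust.

```perl
use strict;
use warnings;
use Data::Dumper;

my @a = ("dog", "cat", "bird", "camel", "giraffe");

# Terse output drops the "$VAR1 = " prefix, so the dump is a bare
# expression that evaluates to an array reference.
local $Data::Dumper::Terse = 1;
my $dumped = Dumper(\@a);

# String eval rebuilds the structure; only do this on trusted input.
my $ref = eval $dumped;
die "eval failed: $@" unless defined $ref;
my @b = @$ref;

print "@b\n";    # dog cat bird camel giraffe
```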

Well, using the pack string "(w/a*)*" I can easily serialize the
array into a string like so:

$string = pack("(w/a*)*", @a);

Now $string contains all the encoded information needed to reconstruct
the @a array. So if I want to use $string to create a @b array that
is identical to the @a array, I can use unpack() with the same pack
string:

@b = unpack("(w/a*)*", $string);
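
To see why this works, here is a small self-contained script (the
variable names are just for illustration). Each element is written as
a BER-compressed length ("w") followed by that many bytes ("a*"), and
the "( ... )*" group repeats once per element in the list.

```perl
use strict;
use warnings;

my @a = ("dog", "cat", "bird", "camel", "giraffe");

# Each element becomes: BER length byte(s) + the raw bytes of the element.
my $string = pack("(w/a*)*", @a);

# "dog" is 3 bytes long, and 3 fits in a single BER byte (0x03),
# so the packed string starts with "\x03dog".
printf "%vd\n", substr($string, 0, 4);   # 3.100.111.103

# unpack with the same template walks the string until it is exhausted.
my @b = unpack("(w/a*)*", $string);
print join(",", @b), "\n";              # dog,cat,bird,camel,giraffe
```

Five one-byte length prefixes plus 22 bytes of animal names give a
27-byte string here.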

Neat, doncha think? This same technique also works with hashes:

$string = pack("(w/a*)*", %ENV);
%wow = unpack("(w/a*)*", $string);
# The %wow hash is now an exact copy of %ENV
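
The hash case works because a hash in list context flattens to
key/value pairs, and assigning an even-length list back to a hash
rebuilds it. The pair order in the packed string follows Perl's
internal hash order, so two packings of the same hash need not be
byte-for-byte identical, but the unpacked hash compares equal. A sketch
with a small, made-up hash:

```perl
use strict;
use warnings;

my %color = (sky => "blue", grass => "green", ketchup => "red");

# %color flattens to (key, value, key, value, ...) in list context.
my $string = pack("(w/a*)*", %color);

# An even-length list assigned to a hash becomes key/value pairs again.
my %copy = unpack("(w/a*)*", $string);

for my $k (sort keys %copy) {
    print "$k => $copy{$k}\n";
}
```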

Now that we have a string representation of an array or hash, we
can save the string to a file, send it over a socket, or even encrypt
it using some encryption algorithm.
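
Saving to a file might look like the sketch below. It uses the core
File::Temp module only to get a scratch file for the example; the
important part is calling binmode() on both the writing and the reading
handle (see item 1 below), and slurping the whole file back by
localizing $/ to undef.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @a = ("dog", "cat", "bird", "camel", "giraffe");
my $string = pack("(w/a*)*", @a);

# Scratch file for the example; any path would do.
my ($fh, $filename) = tempfile(UNLINK => 1);
binmode($fh);                 # write the bytes exactly as they are
print {$fh} $string;
close($fh) or die "close: $!";

open(my $in, "<", $filename) or die "open $filename: $!";
binmode($in);                 # read them back without any translation
my $data = do { local $/; <$in> };   # slurp the whole file
close($in);

my @b = unpack("(w/a*)*", $data);
print "@b\n";                 # dog cat bird camel giraffe
```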

This approach can even handle arrays (and hashes) whose elements
contain newlines, null bytes, and other unprintable characters!
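
A quick way to convince yourself is to round-trip some awkward values,
including an empty string, and compare them element by element. The
sample values below are arbitrary.

```perl
use strict;
use warnings;

# Elements with a newline, a null byte, high bytes and an empty string.
my @tricky = ("line one\nline two", "a\0b", "\xff\xfe", "", "plain");

my $string = pack("(w/a*)*", @tricky);
my @back   = unpack("(w/a*)*", $string);

# Compare element by element; the lengths also survive unchanged.
for my $i (0 .. $#tricky) {
    my $same = $back[$i] eq $tricky[$i] ? "ok" : "DIFFERENT";
    printf "%d: length %d, %s\n", $i, length $back[$i], $same;
}
```

The empty string is packed as a single zero length byte, so it comes
back as an empty element rather than disappearing.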

There are a few important items to point out:

1. The serialized string will most likely contain
non-printable characters, which may include some
newline characters, even if no scalar in the
original array/hash contains a "\n" character.
Because of this, you should use the binmode()
function on any filehandle you plan to print the
string out to.

2. If the array or hash contains any numbers, they
will be converted to their string representation.

3. This technique only handles simple arrays and hashes.
In other words, multi-dimensional arrays and hashes,
lists of lists, and references are not handled
correctly. If you really want to serialize a
complex structure such as one of these, I recommend
using another approach, like taking advantage of
the Data::Dumper module. You CAN, however, create
an array of these serialized arrays, and serialize
that array!

4. The "w" in the pack string "(w/a*)*" allows strings
of any length to be encoded, even ones longer
than 0xffffffff bytes (4,294,967,295 bytes).
But since "w" can only encode non-negative
integers, the "(w/a*)*" pack string cannot be
used to encode arrays or hashes containing
negative-length strings. Fortunately, that's
never been a problem for me. :)

5. I do not know if this trick can handle arrays
and hashes containing Unicode strings. My guess
is that it can, but I haven't tested it so I can't
say for sure.
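
Items 2 to 4 are easy to check directly. The sketch below (all names
are made up for the example) shows a number coming back as a string
(item 2), an array of already-serialized strings being serialized again
and then unpacked in two stages (item 3), and the BER-encoded length
prefix growing by a byte as the length grows (item 4).

```perl
use strict;
use warnings;

my $T = "(w/a*)*";

# Item 2: numbers are stringified; 3.0 comes back as the string "3".
my ($n) = unpack($T, pack($T, 3.0));
print "number: '$n'\n";                         # number: '3'

# Item 3: serialize two arrays, then serialize the list of results.
my @animals = ("dog", "cat");
my @colors  = ("red", "green", "blue");
my $outer   = pack($T, pack($T, @animals), pack($T, @colors));

# Unpacking is done in two stages: first the outer list of strings,
# then each inner string back into its own array.
my @inner = map { [ unpack($T, $_) ] } unpack($T, $outer);
print join(" | ", map { join(" ", @$_) } @inner), "\n";
# dog cat | red green blue

# Item 4: BER uses 7 bits per byte, so longer lengths need more bytes.
for my $len (127, 128, 16383, 16384) {
    printf "%5d -> %d length byte(s)\n", $len, length pack("w", $len);
}
```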

Anyway, that's my trick that I thought I would share with the rest
of you. Have fun with it!

-- Jean-Luc Romano
 
Matt Garrish

> There are a few important items to point out:
> [...]

Sorry to rain on your parade, but with all the caveats don't you think it
would be better just to use the Storable module, especially since it's part
of the core distribution? Better techniques are worth noting for the simple
reason that they're better...

Matt
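
For reference, the Storable approach Matt suggests might look like the
sketch below. nfreeze() serializes a reference and everything it points
to (using network byte order for numbers, so the string can be read on
another architecture), and thaw() rebuilds an independent copy, so
nested data is no problem. The sample hash is made up.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

my @a = ("dog", "cat", "bird", "camel", "giraffe");
my %h = (animals => \@a, count => 5, nested => { deep => [1, 2, 3] });

# nfreeze serializes a reference (and everything it points to) into a string.
my $string = nfreeze(\%h);

# thaw rebuilds an equivalent, independent structure from that string.
my $copy = thaw($string);

print "@{ $copy->{animals} }\n";          # dog cat bird camel giraffe
print "@{ $copy->{nested}{deep} }\n";     # 1 2 3
```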
 
Ben Morrow

Quoth J. Romano:
> 3. This technique only handles simple arrays and hashes.
> In other words, multi-dimensional arrays and hashes,
> lists of lists, and references are not handled
> correctly. If you really want to serialize a
> complex structure such as one of these, I recommend
> using another approach, like taking advantage of
> the Data::Dumper module. You CAN, however, create
> an array of these serialized arrays, and serialize
> that array!

...however, you can't then unserialize it, as the references have been
stringified and can't be converted back to refs. Yet another reason to
use Storable, which does this right...

Ben
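
Ben's point is easy to demonstrate: packing a reference stores only its
printed form, such as ARRAY(0x...), and what comes back is an ordinary
string rather than a reference. A minimal sketch:

```perl
use strict;
use warnings;

my @inner = ("dog", "cat");
my @outer = (\@inner, "bird");

my @back = unpack("(w/a*)*", pack("(w/a*)*", @outer));

# The first element is now just text that looks like a reference.
print "ref() says: '", ref($back[0]), "'\n";   # ref() says: ''
print "value: $back[0]\n";   # something like ARRAY(0x...); address varies
```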
 
Tassilo v. Parseval

Also sprach J. Romano:

[...]
> Well, using the pack string "(w/a*)*" I can easily serialize the
> array into a string like so:
>
> $string = pack("(w/a*)*", @a);
>
> Now $string contains all the encoded information needed to reconstruct
> the @a array. So if I wanted to use $string to create a @b array that
> was identical to the @a array, I can use unpack() with the same pack
> string:
>
> @b = unpack("(w/a*)*", $string);
>
> Neat, doncha think? This same technique also works with hashes:
>
> $string = pack("(w/a*)*", %ENV);
> %wow = unpack("(w/a*)*", $string);
> # The %wow hash is now an exact copy of %ENV
>
> Now that we have a string representation of an array or hash, we
> can save the string to a file, send it over a socket, or even encrypt
> it using some encryption algorithm.
[...]

> 5. I do not know if this trick can handle arrays
> and hashes containing Unicode strings. My guess
> is that it can, but I haven't tested it so I can't
> say for sure.

It can, but there's a slight drawback: You'll lose the UTF-8 flag when
unpacking the string:

$ perl -MDevel::Peek -Mcharnames=:full
Dump((unpack "(w/a*)*", pack "(w/a*)*", "\N{EURO-CURRENCY SIGN}123")[0]);
^D
SV = PV(0x8139efc) at 0x8144cc4
REFCNT = 1
FLAGS = (TEMP,POK,pPOK)
PV = 0x8140020 "\342\202\240123"\0
CUR = 6
LEN = 7
$ perl -MDevel::Peek -Mcharnames=:full
Dump("\N{EURO-CURRENCY SIGN}123");
^D
SV = PV(0x8174280) at 0x814891c
REFCNT = 1
FLAGS = (POK,READONLY,pPOK,UTF8)
PV = 0x81494a8 "\342\202\240123"\0 [UTF8 "\x{20a0}123"]
CUR = 6
LEN = 7

That's a bit of a problem because you can't tell whether
"\342\202\240123" is just a sequence of bytes or whether it happens to
be a Unicode string.
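
One way to sidestep this ambiguity, assuming every element is text, is
to encode each element to UTF-8 bytes yourself before packing and
decode each one after unpacking, using the core Encode module. The
helper names pack_text and unpack_text are made up for this sketch.
Because it decodes every element, it is not suitable for lists that mix
text with binary data.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Hypothetical helpers: encode characters to UTF-8 bytes before packing,
# and decode the bytes back to characters after unpacking.
sub pack_text   { return pack("(w/a*)*", map { encode_utf8($_) } @_) }
sub unpack_text { return map { decode_utf8($_) } unpack("(w/a*)*", $_[0]) }

my @words = ("\x{20a0}123", "plain");
my $bytes = pack_text(@words);
my @back  = unpack_text($bytes);

# The packed string holds only bytes, and the element comes back as
# one character followed by "123" rather than six separate bytes.
print length($back[0]), "\n";    # 4
```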
> Anyway, that's my trick that I thought I would share with the rest
> of you. Have fun with it!

Very nice, thank you. I'm quite a fan of pack/unpack and so I love every
trick involving those.

Tassilo
 
J. Romano

Matt Garrish said:
> Sorry to rain on your parade, but with all the caveats don't you think it
> would be better just to use the Storable module, especially since it's part
> of the core distribution? Better techniques are worth noting for the simple
> reason that they're better...

I remember learning about Storable a while back, but I didn't think
it was part of the core distribution. When I run:

perl -e "use Storable"

I get: Can't locate Storable.pm in @INC (...)

In case you're wondering, the first line of my "perl -v" output is:

This is perl, v5.6.1 built for i386-linux

It could be that I'm using an old version of Perl. So those who are
using a version too old to have the Storable module (and, for some
reason, are unable to install it) can still use the "(w/a*)*"
serialization trick in a pinch.

-- Jean-Luc
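
If you want to prefer Storable when it is installed and fall back to
the pack trick otherwise, something like the sketch below could work.
The helper names serialize_list and deserialize_list, and the one-byte
"S"/"P" tag at the front of the string, are invented for this example
so the reader knows which format it is looking at. The pack branch
still only handles a flat list of strings.

```perl
use strict;
use warnings;

# True if Storable can be loaded on this perl.
my $HAVE_STORABLE = eval { require Storable; 1 } ? 1 : 0;

# Hypothetical helpers: tag the string with "S" (Storable) or "P" (pack).
sub serialize_list {
    return "S" . Storable::nfreeze([@_]) if $HAVE_STORABLE;
    return "P" . pack("(w/a*)*", @_);
}

sub deserialize_list {
    my ($string) = @_;
    my ($tag, $body) = (substr($string, 0, 1), substr($string, 1));
    if ($tag eq "S") {
        require Storable;    # dies here if Storable is unavailable
        return @{ Storable::thaw($body) };
    }
    return unpack("(w/a*)*", $body) if $tag eq "P";
    die "unknown serialization tag '$tag'\n";
}

my @a = ("dog", "cat", "bird");
my @b = deserialize_list(serialize_list(@a));
print "@b\n";    # dog cat bird
```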
 
