J. Romano
Dear Perl community,
Today I invented a neat new trick that I thought I'd share with
everyone here.
But before I continue, I'd like to point out to anyone out there
who thinks that my trick is "obvious to everyone but inexperienced
programmers" or that "it's not worth knowing because better approaches
exist" that some people enjoy learning a new simple trick, even if
they never get a chance to apply it. Besides, sharing a trick that
was just discovered (even if most programmers already know about it)
has the benefit of educating any programmer who, for some reason or
another, happens to not be aware of that particular technique. So if
you really must reply saying that you already knew this trick, instead
of saying how it didn't help you at all, how about sharing something
else that might be useful to someone in the Perl community? That
would be much appreciated.
Anyway, now that I'm off my soapbox, here is what I discovered
this morning:
The pack string "(w/a*)*" is useful for serializing arrays and
hashes -- that is, it can pack and unpack arrays and hashes to and
from a string. Let me explain in more detail:
I have an array, which holds the names of some animals:
@a = ("dog", "cat", "bird", "camel", "giraffe");
I might want to serialize @a into a string for the purpose of storing
it off into a file so I can retrieve it later. Well, I could use the
Data::Dumper module to create a string (and later use eval to get
back a reference whose contents I can assign to an array), but
that can get complicated if I don't have much experience using
Data::Dumper.
Well, using the pack string "(w/a*)*" I can easily serialize the
array into a string like so:
$string = pack("(w/a*)*", @a);
Now $string contains all the encoded information needed to reconstruct
the @a array. So if I wanted to use $string to create a @b array that
was identical to the @a array, I can use unpack() with the same pack
string:
@b = unpack("(w/a*)*", $string);
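For anyone who wants to try this at home, here is a minimal sketch of the round trip under "use strict", with a peek at what the packed bytes look like. The variable names $prefix and $long are mine, purely for illustration.

```perl
use strict;
use warnings;

my @a = ("dog", "cat", "bird", "camel", "giraffe");

# Each element is stored as a BER-compressed length ("w")
# followed by that many bytes of data ("a*").
my $string = pack("(w/a*)*", @a);
my @b      = unpack("(w/a*)*", $string);

# Short strings get a one-byte length prefix, so the packed
# string begins with "\x03dog\x03cat".
my $prefix = substr($string, 0, 8);

# Lengths of 128 or more need more than one prefix byte.
my $long = pack("w/a*", "x" x 200);

print "round trip ok\n" if "@a" eq "@b";
printf "%d elements packed into %d bytes\n", scalar(@b), length($string);
```

The length prefix is what lets unpack() know where each element ends, so no separator character is needed.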
Neat, doncha think? This same technique also works with hashes:
$string = pack("(w/a*)*", %ENV);
%wow = unpack("(w/a*)*", $string);
# The %wow hash is now an exact copy of %ENV
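Since %ENV is different on every machine, here is the same idea sketched with a small made-up hash (%color and %copy are names I picked for the example). The hash is simply flattened to a key/value list before packing, so the order of the pairs inside the string is whatever order Perl happens to hand them out in.

```perl
use strict;
use warnings;

# A small illustrative hash; note that an empty value is fine.
my %color = (apple => "red", banana => "yellow", plum => "");

# In list context the hash flattens to (key, value, key, value, ...).
my $string = pack("(w/a*)*", %color);

# Unpacking gives back the same flat list, which rebuilds the hash.
my %copy = unpack("(w/a*)*", $string);

for my $key (sort keys %copy) {
    print "$key => '$copy{$key}'\n";
}
```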
Now that we have a string representation of an array or hash, we
can save the string to a file, send it over a socket, or even encrypt
it using some encryption algorithm.
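Here is a rough sketch of the save-to-a-file case. It assumes a temporary file from the core File::Temp module stands in for whatever real path you would use, and the variable names are mine.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @animals = ("dog", "cat", "bird", "camel", "giraffe");
my $string  = pack("(w/a*)*", @animals);

# Write the packed string out in binary mode.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} $string;
close($out) or die "Cannot close $path: $!";

# Read the whole file back in, again in binary mode.
open(my $in, '<', $path) or die "Cannot open $path: $!";
binmode($in);
my $data = do { local $/; <$in> };
close($in);

my @restored = unpack("(w/a*)*", $data);
print join(", ", @restored), "\n";
```

Slurping the whole file with "local $/" matters here, because the packed data may contain bytes that look like line endings.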
This approach can even handle arrays (and hashes) that contain
scalars consisting of newlines, null-bytes, and other unprintable
characters!
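A quick way to convince yourself of this is a sketch like the following, where @tricky is a made-up list holding a newline, a null byte, an empty string, and some high bytes:

```perl
use strict;
use warnings;

# Illustrative data: embedded newline, null byte, empty string, raw bytes.
my @tricky = ("line1\nline2", "nul\0byte", "", "\x01\xff");

my @copy = unpack("(w/a*)*", pack("(w/a*)*", @tricky));

# Compare element by element rather than joining, since the
# elements themselves may contain any separator we might pick.
my $same = @copy == @tricky;
for my $i (0 .. $#tricky) {
    $same &&= $copy[$i] eq $tricky[$i];
}

print $same ? "all elements survived\n" : "something changed\n";
```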
There are a few important items to point out:
1. The serialized string will most likely contain
non-printable characters, which may include some
newline characters, even if no scalar in the
original array/hash contains a "\n" character.
Because of this, you should use the binmode()
function on any filehandle you plan to write the
string to or read it back from.
2. If the array or hash contains any numbers, they
will be converted to their string representation.
3. This technique only handles simple arrays and hashes.
In other words, multi-dimensional arrays and hashes,
lists of lists, and references are not handled
correctly (a reference just gets packed as its
string form, something like "ARRAY(0x...)"). If you
really want to serialize a complex structure such as
one of these, I recommend using another approach,
like taking advantage of the Data::Dumper module.
You CAN, however, create an array of these
serialized arrays, and serialize that array!
4. The "w" in the pack string "(w/a*)*" allows for the
encoding of any arbitrary-length string, even if it
is longer than 0xffffffff bytes (4,294,967,295
bytes). But since "w" is only used for encoding
non-negative integers, the "(w/a*)*" pack string
cannot be used to encode arrays or hashes
containing negative-length strings. Fortunately,
that's never been a problem for me.
5. I do not know if this trick can handle arrays
and hashes containing Unicode strings. My guess
is that it can, but I haven't tested it so I can't
say for sure.
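Items 1 through 3 above can be sketched in a few lines. This is only an illustration under my own assumptions: freeze_list and thaw_list are hypothetical helper names I made up, not functions from any module.

```perl
use strict;
use warnings;

# Item 1: a 10-character string gets the length prefix 0x0A, which
# is "\n", even though the string itself contains no newline.
my $ten = pack("(w/a*)*", "abcdefghij");
print "prefix is a newline\n" if substr($ten, 0, 1) eq "\n";

# Item 2: numbers come back as their string representation.
my @nums = unpack("(w/a*)*", pack("(w/a*)*", 42, 3.5));

# Item 3: serialize each inner array first, then serialize the
# list of resulting strings. (Hypothetical helper names.)
sub freeze_list { return pack("(w/a*)*", @_) }
sub thaw_list   { return unpack("(w/a*)*", $_[0]) }

my @rows  = (["dog", "cat"], ["bird"], []);
my $outer = freeze_list(map { freeze_list(@$_) } @rows);
my @back  = map { [ thaw_list($_) ] } thaw_list($outer);

print scalar(@back), " inner arrays restored\n";
```

The nesting works because each inner serialized string is just another scalar as far as the outer pack() is concerned.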
Anyway, that's my trick that I thought I would share with the rest
of you. Have fun with it!
-- Jean-Luc Romano