jl_post
Hi,
The unpack() function is very, very useful for me, as I regularly
do a lot of unpacking of non-Perl-created data strings to see what
information they hold. If I didn't have the use of the unpack()
function, certain tasks would be much more difficult.
However, there's something I want to do with unpack() that I
haven't figured out how to do: I'd like to unpack part of a string,
but keep track of where the unpacking ended, so I can resume unpacking
the string (at a later time) where I left off.
Here's a trivial example:
Let's say I have a data string that holds lists of strings, like
this:
" 2 5hello 5world 2 2hi 5there"
The first number (" 2") signifies that the first list holds two
strings. The next number (" 5") signifies that the first encoded
string is 5 characters long. The next number (also a " 5") signifies
the same for the next encoded string.
So I could write the unpack() format string as: "a2/(a2/a)"
So the lines of code:
my $dataString = ' 2 5hello 5world extra data';
my @a = unpack 'a2/(a2/a)', $dataString;
print "$_\n" foreach @a;
would output:
hello
world
My question becomes: What if I want to parse out the extra data
later with a different pack string? It would be nice if there were a
way to get unpack() to return the current offset, so that I
could unpack again with something like this:
my @b = unpack "\@$offset $newPackString", $dataString;
Now, I could calculate this offset myself by examining what was
placed in @a, but this gets tricky fast with packstrings that use "Z",
"A", and 'a' (and combinations).
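To illustrate why adding up the lengths of the results is unreliable, here is a minimal sketch. The record layout and the variable names are made up for this example, and the "true" offset is known only because the field widths are fixed:

```perl
use strict;
use warnings;

# Hypothetical record: a 5-byte space-padded field, a 5-byte
# NUL-padded field, then unrelated trailing data.
my $record = "ab   cd\0\0\0rest";

my @fields = unpack 'A5 Z5', $record;

# 'A' strips trailing spaces and 'Z' stops at the first NUL, so the
# returned strings are shorter than the bytes unpack() consumed.
my $naiveOffset = 0;
$naiveOffset += length($_) for @fields;

# 5 + 5 bytes were consumed; known here only from the fixed widths.
my $trueOffset = 10;

print "fields: @fields\n";
print "naive offset: $naiveOffset, true offset: $trueOffset\n";
```

Here the summed lengths come to 4 while unpack() actually consumed 10 bytes, so resuming at the summed offset would land in the middle of the first field's padding.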
(Incidentally, C's sscanf() function has a little-known "%n"
conversion that stores the number of characters consumed so far. I'm
hoping that unpack() has a similar feature.)
I posted a similar question back in 2004, and Anno Siegel responded
with the suggestion of adding "a*" to my first packstring, and then
using the length() of the last element to calculate the offset, like
this:
my $dataString = ' 2 5hello 5world extra data';
my @a = unpack 'a2/(a2/a) a*', $dataString;
my $offset = length($dataString) - length( pop(@a) );
print "$_\n" foreach @a;
my @b = unpack "\@$offset $newPackString", $dataString;
While this approach technically works, every "a*" copies the entire
unparsed remainder of the string, so using it at the end of a
packstring in a continual loop creates an O(n^2) algorithm.
This isn't a problem for short $dataStrings, but is a significant
problem when $dataStrings are long and/or have no limit in length.
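For concreteness, here is roughly what that loop looks like when applied to the two-list example string from above. This is only a sketch of the "a*" workaround, not a recommendation, and the names @lists and $rest are mine:

```perl
use strict;
use warnings;

my $dataString = ' 2 5hello 5world 2 2hi 5there';
my $offset     = 0;
my @lists;

while ($offset < length $dataString) {
    # Jump to the saved offset, unpack one list, then grab the rest.
    my @items = unpack "\@$offset a2/(a2/a) a*", $dataString;

    # The trailing "a*" element is a copy of everything not yet
    # parsed; building that copy on every pass is the O(n^2) cost.
    my $rest = pop @items;
    $offset = length($dataString) - length($rest);

    push @lists, [@items];
}

print join(',', @$_), "\n" foreach @lists;
```

Each pass parses only one short list, yet still copies all of the remaining data just to measure it.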
I've noticed that Perl 5.10 added lots of convenient new features
to pack() and unpack() (such as the ability to pack floats and doubles
in an endianness different from your own), so I'm hoping that
unpack() now has a way to return the $dataString offset. However,
I've read both "perldoc -f unpack" and "perldoc -f pack" but I can't
seem to find this behavior documented, if it exists at all.
So does anyone know if I can get unpack() to return an offset?
Thanks!
-- Jean-Luc