How do I parse certain "bits" out of bytes?

Adam Schneider · May 4, 2005

Here's a completely hypothetical example:

I have a binary file containing one byte: hex 65, or "e" in text. I
can get perl to read it and tell me that this is "01100101" in binary.
But hidden in here are two pieces of information: the first 5 bytes are
a number ranging from 0 to 32 and the last 3 bytes are another number
between 0 and 8. I need those two decimal numbers (in this case, 12
and 5) put into two variables.

I have read all the documentation on "pack," "unpack," "vec," bitwise
operators, and everything else, and I still have no idea how to extract
a few bits out of a longer string of bytes. The help files all seem to
have been written with the assumption that I majored in computer
science and learned how to program years ago in some pre-Perl language
-- which I did not.

Can someone please pass along a simple solution that will make sense to
someone who doesn't know the hardcore geeky stuff? I know it's out
there, I just can't find it. (The closest I came was "vec," but it
doesn't let you ask for chunks that aren't powers of 2.)

Thanks in advance,

Adam Schneider
Portland, OR

Anno Siegel · May 4, 2005

Adam Schneider said:
Here's a completely hypothetical example:

I have a binary file containing one byte: hex 65, or "e" in text. I
can get perl to read it and tell me that this is "01100101" in binary.
But hidden in here are two pieces of information: the first 5 bytes are
a number ranging from 0 to 32 and the last 3 bytes are another number
between 0 and 8. I need those two decimal numbers (in this case, 12
and 5) put into two variables.

Bytes or bits? Please keep them apart.

I have read all the documentation on "pack," "unpack," "vec," bitwise
operators, and everything else, and I still have no idea how to extract
a few bits out of a longer string of bytes. The help files all seem to
have been written with the assumption that I majored in computer
science and learned how to program years ago in some pre-Perl language
-- which I did not.
Whatever...

Can someone please pass along a simple solution that will make sense to
someone who doesn't know the hardcore geeky stuff? I know it's out
there, I just can't find it. (The closest I came was "vec," but it
doesn't let you ask for chunks that aren't powers of 2.)

Use the bitwise operators after turning the character into a number:

my $str = 'e';
my $bottom_three = ord( $str) & oct '0b111';
my $top_five = ord( $str) >> 3;

Anno

Josef Moellers · May 4, 2005

Adam said:
Here's a completely hypothetical example:

I have a binary file containing one byte: hex 65, or "e" in text. I
can get perl to read it and tell me that this is "01100101" in binary.
But hidden in here are two pieces of information: the first 5 bytes are
a number ranging from 0 to 32 and the last 3 bytes are another number
between 0 and 8. I need those two decimal numbers (in this case, 12
and 5) put into two variables.

I have read all the documentation on "pack," "unpack," "vec," bitwise
operators, and everything else, and I still have no idea how to extract
a few bits out of a longer string of bytes. The help files all seem to
have been written with the assumption that I majored in computer
science and learned how to program years ago in some pre-Perl language
-- which I did not.

Can someone please pass along a simple solution that will make sense to
someone who doesn't know the hardcore geeky stuff? I know it's out
there, I just can't find it. (The closest I came was "vec," but it
doesn't let you ask for chunks that aren't powers of 2.)

Perl knows shift and logical operators:

$n1 = ($n >> 3) & 0x1f;
$n2 = $n & 7;

Anno Siegel · May 4, 2005

Adam Schneider said:
Here's a completely hypothetical example:

I have a binary file containing one byte: hex 65, or "e" in text. I
can get perl to read it and tell me that this is "01100101" in binary.
But hidden in here are two pieces of information: the first 5 bytes are
a number ranging from 0 to 32 and the last 3 bytes are another number
between 0 and 8. I need those two decimal numbers (in this case, 12
and 5) put into two variables.

Bytes or bits? Please keep them apart.

I have read all the documentation on "pack," "unpack," "vec," bitwise
operators, and everything else, and I still have no idea how to extract
a few bits out of a longer string of bytes. The help files all seem to
have been written with the assumption that I majored in computer
science and learned how to program years ago in some pre-Perl language
-- which I did not.
Whatever...

Can someone please pass along a simple solution that will make sense to
someone who doesn't know the hardcore geeky stuff? I know it's out
there, I just can't find it. (The closest I came was "vec," but it
doesn't let you ask for chunks that aren't powers of 2.)

Use the bitwise operators after turning the character into a number:

my $char = 'e';
my $bottom_three = ord( $char) & oct '0b111';
my $top_five = ord( $char) >> 3;

Anno

Sisyphus · May 4, 2005

Adam Schneider said:
Here's a completely hypothetical example:

I have a binary file containing one byte: hex 65, or "e" in text. I
can get perl to read it and tell me that this is "01100101" in binary.
But hidden in here are two pieces of information: the first 5 bytes are
a number ranging from 0 to 32 and the last 3 bytes are another number
between 0 and 8. I need those two decimal numbers (in this case, 12
and 5) put into two variables.

I have read all the documentation on "pack," "unpack," "vec," bitwise
operators, and everything else, and I still have no idea how to extract
a few bits out of a longer string of bytes. The help files all seem to
have been written with the assumption that I majored in computer
science and learned how to program years ago in some pre-Perl language
-- which I did not.

Can someone please pass along a simple solution that will make sense to
someone who doesn't know the hardcore geeky stuff? I know it's out
there, I just can't find it. (The closest I came was "vec," but it
doesn't let you ask for chunks that aren't powers of 2.)

There's lots of ways. When dealing with bits I like to use the bitwise
operators - here's one simple, none-too-cute way of doing the job using
them:

use warnings;
use strict;

my $n = 0x65;
my $last_three_bits = $n & 7; # 7 = 2**3 - 1
print $last_three_bits, "\n";

#Finished with the last 3 bits, so remove them:
$n >>= 3;

my $first_five_bits = $n;
print $first_five_bits, "\n";

__END__

Cheers,
Rob

Adam Schneider · May 4, 2005

Josef said:
Perl knows shift and logical operators:

$n1 = ($n >> 3) & 0x1f;
$n2 = $n & 7;

But what's actually going on here? To a layperson, the meanings of the
terms "shift" and "logical operators" are opaque in this context and do
not obviously correlate with the ">>" and "&" symbols in those
statements. (What do the operators do? Why have you used 0x1f there?)

If someone can point me to a page somewhere that will explain how to
use these sorts of things, starting from square one, I'd appreciate it.
As I said, the documentation on these subjects is useless for those of
us who don't know the old-school programming terminology.

Adam Schneider
Portland, OR

Adam Schneider · May 4, 2005

Anno said:
Adam Schneider wrote in comp.lang.perl.misc:

Bytes or bits? Please keep them apart.

Oops. I meant 5 bits and 3 bits, of course.

Josef Moellers · May 4, 2005

Adam said:
But what's actually going on here? To a layperson, the meanings of the
terms "shift" and "logical operators" are opaque in this context and do
not obviously correlate with the ">>" and "&" symbols in those
statements. (What do the operators do? Why have you used 0x1f there?)

Sorry, I thought if someone programmed in Perl, (s)he isn't a "layperson".

If someone can point me to a page somewhere that will explain how to
use these sorts of things, starting from square one, I'd appreciate it.
As I said, the documentation on these subjects is useless for those of
us who don't know the old-school programming terminology.

You'll definitely need a good book on the basics of computers and
programming.

You posed a question that you needed to dissect a given byte into two
bitfields. I gave the answer. If you do not understand the answer, the
problem may be beyond your knowledge of computing and how computers
work. Sorry if I sound too rude, but you do not seem to be very willing
to pick up hints and suggestions (and even a solution) and do some work
on your own.

As for the perl-specifics: "perldoc perlop" is a good starter, as would
be the camel book.

Sisyphus · May 4, 2005

Adam Schneider said:
But what's actually going on here?

Just play around with some code .... you'll soon work out what's happening.

use warnings;
use strict;

my $z1 = int(rand(256));
my $z2 = int(rand(256));

printf("AND\n%08b\n%08b\n%08b\n\nOR\n%08b\n%08b\n%08b\n\nXOR\n%08b\n%08b\n%08b\n\n",
$z1, $z2, $z1&$z2, $z1, $z2, $z1|$z2, $z1, $z2, $z1^$z2);

for(1..8) {
printf("%08b\n", $z1);
$z1 = $z1 >> 1;
}

print "\n\n";

$z1 = 3;

for(1..7) {
printf("%08b\n", $z1);
$z1 = $z1 << 1;
}

__END__

Cheers,
Rob

Arndt Jonasson · May 4, 2005

Adam Schneider said:
But what's actually going on here? To a layperson, the meanings of the
terms "shift" and "logical operators" are opaque in this context and do
not obviously correlate with the ">>" and "&" symbols in those
statements. (What do the operators do? Why have you used 0x1f there?)

If someone can point me to a page somewhere that will explain how to
use these sorts of things, starting from square one, I'd appreciate it.
As I said, the documentation on these subjects is useless for those of
us who don't know the old-school programming terminology.

What would a new-school programming terminology be?

An imaginary language might write it this way:

n1 = (n RIGHT_SHIFT 3) AND BINARY(11111);
n2 = n AND BINARY(111);

Does that help? Perl does have a way of specifying numbers in binary,
by the way: 0b11111.

Fabian Pilkowski · May 4, 2005

* Adam Schneider said:
But what's actually going on here? To a layperson, the meanings of the
terms "shift" and "logical operators" are opaque in this context and do
not obviously correlate with the ">>" and "&" symbols in those
statements. (What do the operators do? Why have you used 0x1f there?)

Let's assume $n contains one byte, e.g. 0x65. Its decimal representation
is 101, and its binary one is 0b01100101. First, have a look at this
shift operator -- just shift all the bits by 3 positions to the right.
Since shifting is null-padded you'll get

0b01100101 >> 3 = 0b00001100 = 12

Simplified, this just cuts off the last three bits. The trailing "&0x1f"
(or "&0b00011111") part ensures that only the last 5 bits are used. This
could be only a problem if there are more than 8 bits stored in $n.
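
To see when the mask matters, here is a minimal sketch assuming a hypothetical 16-bit value, 0x1265, whose low byte is the same 0x65. Without the mask, the bits above that byte leak into the result; with it, only the five bits of interest remain. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical 16-bit value; its low byte is 0x65, as in the example above.
my $n = 0x1265;

my $unmasked = $n >> 3;            # the higher bits are still present
my $masked   = ($n >> 3) & 0x1f;   # keep only the lowest 5 bits after the shift

printf "unmasked: %d (%016b)\n", $unmasked, $unmasked;   # 588 (0000001001001100)
printf "masked:   %d (%05b)\n",  $masked,   $masked;     # 12 (01100)
```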

The second statement uses the bitwise "and" operator. You could write it
as "$n & 0b111" instead -- to better see what it does. Bitwise "&" means
to keep only those bits which are set in both operands. Here, you'll
get the last 3 bits, by cutting off the first 5 bits.

0b01100101
& 0b00000111
------------
0b00000101 = 5

If someone can point me to a page somewhere that will explain how to
use these sorts of things, starting from square one, I'd appreciate it.
As I said, the documentation on these subjects is useless for those of
us who don't know the old-school programming terminology.

Sure. I could refer to `perldoc perlop` where all these bit operators
are described, but for shifting I just read this:

Binary ``>>'' returns the value of its left argument shifted right
by the number of bits specified by the right argument. Arguments
should be integers. (See also Integer Arithmetic.)
Note that both ``<<'' and ``>>'' in Perl are implemented directly
using ``<<'' and ``>>'' in C.

Well, it is hard to understand if someone knows nearly nothing about
binary digits, but it describes what it does. Certainly, someone who
wants to deal with binary data has to learn something about these low
level basics. Of course, these basics aren't Perl specific ... ;-)

regards,
fabian

xhoster · May 4, 2005

Adam Schneider said:
But what's actually going on here? To a layperson, the meanings of the
terms "shift" and "logical operators" are opaque in this context and do
not obviously correlate with the ">>" and "&" symbols in those
statements.

They do obviously correlate with those symbols if you look up those
operators in the docs. That is what I, who also has no formal compsci
edumacation, did.

(What do the operators do?

">>" shifts the bits of $n over (to less significance) 3 places, so that
the 5 leftmost bits are now the 5 rightmost bits. The "variable &
constant" is used to mask out bits in the variable you don't want.

Why have you used 0x1f there?)

Because 0x1f is the hex code for the 5 rightmost bits, which is what you
want to keep (after the shifting).
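
A rough sketch of how such a mask can be built from the field width instead of being written out as a hex constant. The helper name extract_bits and its arguments are made up for illustration, and it assumes the width is smaller than the number of bits in a Perl integer.

```perl
use strict;
use warnings;

# Hypothetical helper: return $width bits of $value, starting $shift bits
# up from the least significant (rightmost) bit.
sub extract_bits {
    my ($value, $shift, $width) = @_;
    my $mask = (1 << $width) - 1;    # e.g. width 5 gives 0b11111 = 0x1f
    return ($value >> $shift) & $mask;
}

my $n = 0x65;                               # 0b01100101
my $top_five     = extract_bits($n, 3, 5);  # 12
my $bottom_three = extract_bits($n, 0, 3);  # 5

print "$top_five $bottom_three\n";
```

The shift moves the wanted bits to the right-hand end, and the mask then discards everything to their left.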

If someone can point me to a page somewhere that will explain how to
use these sorts of things, starting from square one, I'd appreciate it.

You already know that 'e' is 01100101, so it seems like you are already
beyond square one. I don't know how you got that far, but maybe continuing
whatever it was you did to get that far would be a good idea. Beyond
that, just googling on "Bitwise operators" leads to tons of useful
hits. (Don't just stick to a Perl-centric view, take a look at the ones
for Java and especially C)

As I said, the documentation on these subjects is useless for those of
us who don't know the old-school programming terminology.

Well, Perl is mostly a "new school" language, so it is to be expected
that the docs don't go out of their way to explain old-school concepts.

Xho

Adam Schneider · May 4, 2005

Fabian said:
Let's assume $n contains one byte, e.g. 0x65. Its decimal representation
is 101, and its binary one is 0b01100101. First, have a look at this
shift operator -- just shift all the bits by 3 positions to the right.
Since shifting is null-padded you'll get

0b01100101 >> 3 = 0b00001100 = 12

Simplified, this just cuts off the last three bits. The trailing "&0x1f"
(or "&0b00011111") part ensures that only the last 5 bits are used. This
could be only a problem if there are more than 8 bits stored in $n.

Thanks, Fabian. Your explanation is very helpful -- and being able to
write the numbers in binary does make it more clear. Now, what do you
mean when you say "this could be only a problem if there are more than
8 bits"? Because in my real-world example, I need to get data from 32
bits. Can I not use these same techniques?

Adam

Adam Schneider · May 4, 2005

Josef said:
Sorry, I thought if someone programmed in Perl, (s)he isn't a "layperson".

The wonderful thing about Perl is that it's forgiving enough and
(usually) intuitive enough that anyone can use it, even with no formal
programming background.

I've been using Perl to manipulate text and hierarchical data for
almost 10 years now, with no problems understanding the terminology,
but as soon as I venture into binary data, it's like the documentation
is written in another language. (In addition to the bitwise operators,
the "pack" and "unpack" commands are confusing too; there are not
nearly enough examples in the manual.)

Adam

Tad McClellan · May 4, 2005

Adam Schneider said:
Here's a completely hypothetical example:

Why?

It could easily and unambiguously be an example in Real Perl Code.

Have you seen the Posting Guidelines that are posted here frequently?

I have a binary file containing one byte: hex 65, or "e" in text. I
can get perl to read it and tell me that this is "01100101" in binary.

If you can read it, then we don't need to know where it came from.

You can replace that whole sentence with a bit of Real Code:

my $n = 0b01100101;

But hidden in here are two pieces of information: the first 5 bytes are
a number ranging from 0 to 32 and the last 3 bytes are another number
between 0 and 8. I need those two decimal numbers (in this case, 12
and 5) put into two variables.

my $first = ($n >> 3) # shift all bits 3 positions to the right
& 0b00011111; # mask off all but last 5 bits

my $second = $n & 0b00000111; # mask off all but last 3 bits

I have read all the documentation on "pack," "unpack," "vec," bitwise
operators, and everything else, and I still have no idea how to extract
a few bits out of a longer string of bytes. The help files all seem to
have been written with the assumption that I majored in computer
science and learned how to program years ago in some pre-Perl language
-- which I did not.

References for a programming language are not where you learn
how to program, they are where you learn how to write a program
in the language being discussed.

You need to learn what operations can be performed on your data
before you can look for those operations in Perl's fine manual.
Specifically, you can use "shift" and "and" operations to get
bit-ranges.

">>" is how you get a "shift" (right) in Perl.

"&" is how you do a bitwise "and" in Perl.

Can someone please pass along a simple solution that will make sense to
someone who doesn't know the hardcore geeky stuff?

No.

You will need to understand your data in order to write a program
that manipulates your data. That is the nature of the beast.

I know it's out
there, I just can't find it.

Googling for:

binary tutorial

finds lots of hits.

Once you know what you want to do, *then* you turn your attention
to how to do that in Perl.

(The closest I came was "vec,"

vec() treats its argument as a string, while you want to treat
it as a number, so vec() is not that close.

but it
doesn't let you ask for chunks that aren't powers of 2.)

If you want to use base-2 (binary), then all you *have* are powers of 2,
just as when you are working in base-10 all you have are powers of 10
(e.g. ones place, tens place, hundreds place...).

A. Sinan Unur · May 4, 2005

The wonderful thing about Perl is that it's forgiving enough and
(usually) intuitive enough that anyone can use it, even with no formal
programming background.

I've been using Perl to manipulate text and hierarchical data for
almost 10 years now, with no problems understanding the terminology,
but as soon as I venture into binary data, it's like the documentation
is written in another language. (In addition to the bitwise
operators, the "pack" and "unpack" commands are confusing too; there
are not nearly enough examples in the manual.)

I have been navigating the streets of Ankara, Turkey successfully for a
long time now, but I am sure, as soon as I venture to Amsterdam, it's,
like, the maps are written in a different language. Why should that be?
After all, it is just another city.

I find it disturbing that in all these years of not being able to
understand the documentation due to a lack of your knowledge, you never
had the desire or motivation to pick a simple tutorial on boolean logic,
or even to try and convert numbers among the commonly used bases such as
2, 8, 10, and 16.

I find your response to Josef Moellers extremely disturbing. He helped
you, and you paid him back by being obnoxious.

Sinan

jl_post · May 4, 2005

Adam said:
Here's a completely hypothetical example:

I have a binary file containing one byte:
hex 65, or "e" in text. I can get perl to read it
and tell me that this is "01100101" in binary.
But hidden in here are two pieces of information:
the first 5 bytes are a number ranging from 0 to 32
and the last 3 bytes are another number between 0
and 8. I need those two decimal numbers (in this
case, 12 and 5) put into two variables.

I have read all the documentation on "pack,"
"unpack," "vec," bitwise operators, and everything
else, and I still have no idea how to extract a
few bits out of a longer string of bytes.

Dear Adam,

Yeah, extracting bits can be tricky at times. I sometimes use:

my $bitString = unpack("B*", $string);

to convert a byte string to a bit string of 1s and 0s. Then I can use:

my $value = oct("0b$bitString");

to convert a bit string (of 1s and 0s) to their corresponding value.

This program does exactly what you want:

#!/usr/bin/perl
use warnings;
use strict;

my $string = "e";
print "String: \"$string\"\n";

# Extract the bit string:
my $bitString = unpack("B*", $string);
print "The bits are: $bitString\n";

# Extract the first five bits:
my $firstFiveBits = substr($bitString, 0, 5);
print "The first five bits are: $firstFiveBits\n";

# Extract the last three bits:
my $lastThreeBits = substr($bitString, -3);
print "The last three bits are: $lastThreeBits\n";

# Extract the value from the first five bits:
my $fiveBitValue = oct("0b$firstFiveBits");
print "The five bit value is: $fiveBitValue\n";

# Extract the value from the last three bits:
my $threeBitValue = oct("0b$lastThreeBits");
print "The three bit value is: $threeBitValue\n";

__END__

When you run it, you should get the following output:

String: "e"
The bits are: 01100101
The first five bits are: 01100
The last three bits are: 101
The five bit value is: 12
The three bit value is: 5

Hopefully this sample program will help you understand one way to
extract out bit values from byte strings.
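
The same approach should carry over to longer byte strings. A minimal sketch, assuming a hypothetical 4-byte string and made-up field widths of 5, 3, 12 and 12 bits (neither comes from the original question):

```perl
use strict;
use warnings;

# Hypothetical 4-byte input; the first byte is "e" (0x65) as before.
my $string = "e\x12\x34\x56";

# 32 characters of 1s and 0s, most significant bit of each byte first.
my $bitString = unpack("B*", $string);

# Made-up layout: field widths in bits, summing to 32.
my @widths = (5, 3, 12, 12);

my @values;
my $offset = 0;
for my $width (@widths) {
    my $field = substr($bitString, $offset, $width);
    push @values, oct("0b$field");   # convert the 1s and 0s to a number
    $offset += $width;
}

print join(", ", @values), "\n";     # prints: 12, 5, 291, 1110
```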

-- Jean-Luc Romano

Tad McClellan · May 4, 2005

Adam Schneider said:
If someone can point me to a page somewhere that will explain how to
use these sorts of things, starting from square one, I'd appreciate it.

Google for

binary tutorial

As I said, the documentation on these subjects is useless for those of
us who don't know the old-school programming terminology.

Or those of us who don't know the modern-school programming terminology.

Nor the future-school terminology.

So it ends up being pretty much:

who don't have schooling in the data type that needs processing

You must resign yourself to getting (just enough) schooling on
binary data if you need to process binary data.

You only need to learn two operations, then look in perlop.pod
to find out how to get those operations using Perl.

Adam Schneider · May 4, 2005

A. Sinan Unur said:
I find it disturbing that in all these years... you never had the
desire or motivation to pick a simple tutorial on boolean logic

No, I didn't, for the same reason I've never picked up a tutorial on
speaking Swahili; I didn't need it.

I find your response to Josef Moellers extremely disturbing. He helped
you, and you paid him back by being obnoxious.

Actually, Josef's message was somewhat helpful, but (to me) confusing;
my response only explained why it didn't make sense to me, and I asked
for clarification. (And I wouldn't go accusing other people of
obnoxiousness after snidely commenting on the "lack of [my]
knowledge.")

The people in this newsgroup are acting as if I were attacking them
personally; my minor complaint is with the documentation, which makes
the assumption that the reader is already familiar with a lot of
technical programming terms, and furthermore has learned other
languages before perl. (There is indeed good "beginner's"
documentation for perl, but it never really gets into binary data;
neither "Learning Perl" nor the "Perl Cookbook" were able to help.)

Thank you very much to Fabian Pilkowski and Jean-Luc Romano, who posted
very helpful responses without being condescending or attempting to
subtly (or overtly) insult me. To the rest of you, my apologies for
trespassing in your little clubhouse.

Adam Schneider
Portland, OR

Tad McClellan · May 4, 2005

Adam Schneider said:
To the rest of you, my apologies for
trespassing in your little clubhouse.

So long then!
