Shift Confusion

Kamilche

I'm trying to pack two characters into a single byte, and the shifting
in Python has me confused.

Essentially, it should be possible to use a 'packed string' format in
Python, where as long as the characters you're sending are in the ASCII
range 0 to 127, two will fit in a byte.

Here's the code. Can you tell what I'm doing wrong?

|import types
|
|def PackString(s):
|    if type(s) != types.StringType:
|        raise Exception("This routine only packs strings!")
|    l = len(s)
|    if l % 2 != 0:
|        s = s + '\0'
|        l += 1
|    chars = []
|    for i in range(0, l, 2):
|        x = ord(s[i])
|        y = ord(s[i+1])
|        chars.append(chr((y << 1) | x))
|    return ''.join(chars)
|
|def UnpackString(s):
|    if type(s) != types.StringType:
|        raise Exception("This routine only unpacks strings!")
|    l = len(s)
|    chars = []
|    for i in range(l):
|        temp = ord(s[i])
|        c = 0xf0 & temp
|        chars.append(chr(c))
|        c = 0x0f & temp
|        chars.append(chr(c))
|    return ''.join(chars)
|
|
|def main():
|    s = "Test string"
|    print s
|    packed = PackString(s)
|    print "This is the string packed:"
|    print packed
|    print "This is the string unpacked:"
|    print UnpackString(packed)
|
|main()
 
John Machin

Kamilche said:
I'm trying to pack two characters into a single byte, and the shifting
in Python has me confused.

Essentially, it should be possible to use a 'packed string' format in
Python, where as long as the characters you're sending are in the ASCII
range 0 to 127, two will fit in a byte.

It should be possible, but only in a realm where phlogiston and
perpetual motion machines exist.

To hold one ASCII character 0 <= ord(c) < 128, you need log2(128) == 7
bits. There are 8 bits in a byte. Therefore you can hold only 8/7.0 ==
1.14... ASCII characters in a standard 8-bit byte. If there is such a
thing as a 14-bit byte, that's news to me.
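
A minimal sketch of what that arithmetic does allow, written for Python 3 (the original post uses Python 2): at 7 bits per character, eight ASCII characters fit in seven bytes. The names pack7 and unpack7 are hypothetical, and the character count is returned separately on the assumption that the receiver needs it, since the padding bits alone do not reveal it.

```python
def pack7(text):
    """Pack ASCII text at 7 bits per character; returns (count, bytes)."""
    data = text.encode("ascii")  # raises UnicodeEncodeError outside 0..127
    bits = 0
    for code in data:
        bits = (bits << 7) | code          # append 7 bits per character
    nbytes = (7 * len(data) + 7) // 8      # round up to whole bytes
    return len(data), bits.to_bytes(nbytes, "big")


def unpack7(count, packed):
    """Inverse of pack7; needs the original character count."""
    bits = int.from_bytes(packed, "big")
    codes = [(bits >> (7 * i)) & 0x7F for i in reversed(range(count))]
    return bytes(codes).decode("ascii")


count, packed = pack7("Test string")
print(count, len(packed))      # 11 characters stored in 10 bytes
print(unpack7(count, packed))  # Test string
```

The saving is one byte in eight, nowhere near the one in two the question hoped for.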

Other things you are doing wrong:

1. Using "L".lower() as a variable name. Some fonts make it extremely
hard to work out what is what in "l1l1l1l1l1l1ll1l1l" -- can you read
that???

2. Hmmm: 'chr((y << 1) | x)'

OK so that's a one, not the length of your string, as augmented.
Either would be wrong. Shifting a character left ONE bit (or,
changing the emphasis, one BIT) and then ORing in another character
would be vaguely reasonable only if you were writing a hash function.

3. Supposing this modern alchemy had worked: when unpacking, how would
you know whether the string was originally (say) seven characters long
or 8 characters long?

4. Your unpacking routine appears to be trying to unpack two 4-bit
items (nibbles) out of a byte, but is not doing (temp & 0xf0) >> 4 for
the top nibble as one might expect ..... aaahhh!!?? are you trying to
emulate packed decimal???

5. Not writing down a clear statement of what you are trying to do,
followed by examples of input and expected output. This latter goes by
fancy names like "test-driven development"; when I started programming
it was known as "common sense".

6. Not using Python interactively to explore what's going on:
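
A sketch of the sort of exploration meant here, written as a Python 3 script rather than an interactive session; it is an illustration, not John's original. It evaluates the expression from PackString for one pair of characters, shows that a different pair produces the same byte, and extracts two nibbles using the shift mentioned in item 4.

```python
x, y = ord("T"), ord("e")         # 84 and 101, both below 128
print(bin(x), bin(y))             # each can need up to 7 bits

mixed = (y << 1) | x              # the expression used in PackString
print(mixed, bin(mixed))          # 222 0b11011110

# A different pair collapses to the same byte, so the step cannot be undone.
other = (ord("e") << 1) | ord("V")
print(other == mixed)             # True

# Splitting one byte into two 4-bit nibbles needs a shift for the top half.
byte = 0xAB
high = (byte & 0xF0) >> 4         # 10 (0xA)
low = byte & 0x0F                 # 11 (0xB)
unshifted = byte & 0xF0           # 160 (0xA0), what UnpackString computes
print(high, low, unshifted)
```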

HTH,
John
 
qwweeeit

At the programming level it seems correct (apart from a closing "return"
needed for the "main" function).

But the error is IMHO conceptual:
for a char you need 7 bits (from 0 to 127 or in hex from x00 to x7F)
and you can't accommodate the other char in only one bit!
The other 128 symbols (from 128 to 255 or in hex from x80 to xFF) are
only possible because you use the same 7 bits again, but with the 8th
bit set to 1!

What you are trying to do I did in C (some years ago...), but using
bytes and words, packing 2 bytes into a single word. You can't pack
2 chars (each one being nearly a byte) into a byte!
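
A minimal Python 3 sketch of the bytes-into-a-word packing described above, assuming a 16-bit word and 8-bit bytes. The function names pack_word and unpack_word are hypothetical.

```python
def pack_word(hi, lo):
    """Combine two 8-bit values into one 16-bit word."""
    if not (0 <= hi <= 0xFF and 0 <= lo <= 0xFF):
        raise ValueError("both values must fit in 8 bits")
    return (hi << 8) | lo          # shift by a full byte, not by one bit


def unpack_word(word):
    """Split a 16-bit word back into its two bytes."""
    return (word >> 8) & 0xFF, word & 0xFF


word = pack_word(ord("T"), ord("e"))
print(hex(word))                   # 0x5465
print(unpack_word(word))           # (84, 101)
```

This works because the container is twice as wide as each item; a single byte offers no such room for two 7-bit characters.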
 
Dennis Lee Bieber

Kamilche said:
Essentially, it should be possible to use a 'packed string' format in
Python, where as long as the characters you're sending are in the ASCII
range 0 to 127, two will fit in a byte.

Pardon? You are going to fit TWO 7-bit values into one 8-bit?


 
Dennis Lee Bieber

alt.sys.pdp10 ?
Closest thing I know of to what is being attempted is DEC's
RAD-50; but that was essentially just uppercase A..Z, 0..9, and a few
punctuation marks, and packed three of them into two bytes.
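
A rough Python 3 sketch of the idea: with a 40-symbol alphabet, three symbols give 40**3 == 64000 combinations, which fits in the 65536 values of a 16-bit word. The alphabet ordering below is an assumption based on the PDP-11 variant of RADIX-50 (other DEC systems ordered it differently), and the function names are hypothetical.

```python
# Assumed PDP-11 style ordering: space, A-Z, $, ., %, 0-9 (40 symbols).
RAD50_CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"


def rad50_pack(triple):
    """Encode exactly three symbols as one number below 40**3."""
    if len(triple) != 3:
        raise ValueError("need exactly three characters")
    value = 0
    for ch in triple:
        value = value * 40 + RAD50_CHARS.index(ch)  # ValueError if not in set
    return value


def rad50_unpack(value):
    """Decode a number produced by rad50_pack back into three symbols."""
    chars = []
    for _ in range(3):
        value, digit = divmod(value, 40)
        chars.append(RAD50_CHARS[digit])
    return "".join(reversed(chars))


word = rad50_pack("ABC")
print(word, word < 2 ** 16)   # 1683 True
print(rad50_unpack(word))     # ABC
```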

 
James Kew

Dennis Lee Bieber said:
Pardon? You are going to fit TWO 7-bit values into one 8-bit?

Quite. Although you can sort of see how one might naively arrive at this
conclusion: one 7-bit char takes 0...127, which when you put it into an
8-bit byte leaves 128...255 unused for a second char....

James
 
Steve Holden

Dennis said:
Closest thing I know of to what is being attempted is DEC's
RAD-50; but that was essentially just uppercase A..Z, 0..9, and a few
punctuation marks, and packed three of them into two bytes.

Another code it used was known as SIXBIT, allowing 64 different
characters. IIRC it could cope with letters, digits and a bunch of
punctuation - see

http://nemesis.lonestar.org/reference/telecom/codes/sixbit.html

The DECSystem-10 used a 36-bit word, so you could get six sixbit
characters to a word. In ASCII you could only get four (or, if you threw
the parity bit away, five) characters to a word.
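
A small Python 3 sketch of six 6-bit characters in one 36-bit word. It assumes the DEC SIXBIT mapping in which each code is the ASCII value minus 32 (covering ASCII 32 to 95, so no lowercase); the function names are hypothetical and a plain Python int stands in for the machine word.

```python
def sixbit_pack(text):
    """Pack up to six characters (ASCII 32..95) into a 36-bit integer."""
    if len(text) > 6:
        raise ValueError("at most six characters per word")
    word = 0
    for ch in text.ljust(6):             # pad short input with spaces (code 0)
        code = ord(ch) - 32              # assumed SIXBIT mapping
        if not 0 <= code < 64:
            raise ValueError("character outside the SIXBIT range: %r" % ch)
        word = (word << 6) | code
    return word


def sixbit_unpack(word):
    """Recover the six characters stored by sixbit_pack."""
    return "".join(chr(((word >> shift) & 0x3F) + 32)
                   for shift in range(30, -1, -6))


word = sixbit_pack("PYTHON")
print(word < 2 ** 36)                    # True
print(sixbit_unpack(word))               # PYTHON
print(36 // 6, 36 // 8, 36 // 7)         # 6 4 5 characters per word
```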

While its character-handling instructions weren't, as I recall, unique
in the industry, the DECSystem-10 remains the only hardware I ever got
to use that had instructions to handle variable byte sizes.

regards
Steve
 
Kamilche

James Kew said:
Quite. Although you can sort of see how one might naively arrive at this
conclusion: one 7-bit char takes 0...127, which when you put it into an
8-bit byte leaves 128...255 unused for a second char....

James

Yep, that's what I was doing. Guess I was too tired to program usefully
last night.

Thanks for clearing that up, guys!
 
Dennis Lee Bieber

James Kew said:
Quite. Although you can sort of see how one might naively arrive at this
conclusion: one 7-bit char takes 0...127, which when you put it into an
8-bit byte leaves 128...255 unused for a second char....

Heh... Which can be proven fairly false by just looking at the
old Wendy's advertising (I mean OLD -- like 1979-81). To counter the
BK "Have it your way", Wendy's used to advertise 256 different ways to
make a burger -- the big feature was that /they/ didn't make the 256
versions. They had a condiment table with 8 toppings, and the buyer had
to put the toppings on... 8 toppings, on/off each, 256 different
combinations...
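
The counting can be checked with a few lines of Python 3. This is only an illustration: the toppings are numbered 0 to 7 because the post does not name them, and the sample order value is made up.

```python
from itertools import product

TOPPING_COUNT = 8  # figure from the post; the toppings themselves are not named

# Every topping is either off (0) or on (1), so there are 2**8 combinations.
combos = list(product((0, 1), repeat=TOPPING_COUNT))
print(len(combos), 2 ** TOPPING_COUNT)   # 256 256

# One byte therefore describes exactly one burger: each bit is one topping.
order = 0b10100101                       # hypothetical order
chosen = [bit for bit in range(TOPPING_COUNT) if order & (1 << bit)]
print(chosen)                            # [0, 2, 5, 7]
```

All 256 byte values are already used up by the eight on/off choices, so the upper half of the range is not spare capacity for a second 7-bit character.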

 
Lars

Hi Kamilche,

Aside from the 7-bit confusion, you should take a look at the 'struct'
module. I bet it will simplify your life considerably.

# two chars
'AB'

# unsigned short + two chars
'\xff\xffab'
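
A sketch of struct calls that give results like the two above, written for Python 3, where the 'c' format takes and returns length-1 bytes objects. The format strings "cc" and "Hcc" are an assumption about what was meant, inferred from the two comments.

```python
import struct

# Two single characters, one byte each.
two_chars = struct.pack("cc", b"A", b"B")
print(two_chars)                          # b'AB'

# An unsigned short (2 bytes) followed by two characters.
short_and_chars = struct.pack("Hcc", 0xFFFF, b"a", b"b")
print(short_and_chars)                    # b'\xff\xffab'

# unpack reverses the operation using the same format string.
print(struct.unpack("Hcc", short_and_chars))   # (65535, b'a', b'b')
print(struct.calcsize("Hcc"))                  # 4
```

Note that struct lays out whole bytes and wider fields; it does not squeeze 7-bit characters into fewer bits.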


Cheers
Lars
 
