writing bits to file

M

Matt Kowalczyk

Hello,

I am working on a compression project and I want to write ASCII characters using
the minimum amount of bits. Since I will be writing ASCII characters from 0-127
I only need 7 bits to represent a character. Therefore, if I write each
character at a time, I will end up writing 8 bits.

One method would be to somehow concatinate all the 7 bit words I am trying to
write and just pad the last byte.

e.g. Read the first 7-bit word, concatinate a bit from the next word and write
the byte. Concat two bits from the next word and write the byte, e.g. until I'm
finished.

The method seems overly complicated and I was wondering if there are any bit
libraries out there that I could take advantage of? Maybe you know of another
way for me to achieve this.

Thanks,
Matt
 
M

Malcolm

Matt Kowalczyk said:
Hello,

I am working on a compression project and I want to write ASCII characters
using
the minimum amount of bits. Since I will be writing ASCII characters from
0-127
I only need 7 bits to represent a character. Therefore, if I write each
character at a time, I will end up writing 8 bits.

One method would be to somehow concatinate all the 7 bit words I am trying
to
write and just pad the last byte.

e.g. Read the first 7-bit word, concatinate a bit from the next word and
write
the byte. Concat two bits from the next word and write the byte, e.g.
until I'm
finished.

The method seems overly complicated and I was wondering if there are any
bit
libraries out there that I could take advantage of? Maybe you know of
another
way for me to achieve this.

Write a function

int squashASCII(unsigned char *out, char *in)

to squash a string. Return the number of bytes, which will be 1/8th less
than the input, -1 on error (if someone passes a non-ASCII character).

It should not be at all difficult to do.
 
M

Matt Kowalczyk

Malcolm said:
Write a function

int squashASCII(unsigned char *out, char *in)

to squash a string. Return the number of bytes, which will be 1/8th less
than the input, -1 on error (if someone passes a non-ASCII character).

It should not be at all difficult to do.

Looks like that's what I will have to do. What I really want is a bitbuffer! I
was browsing around on the internet and I found one used by tooLAME (an mpeg
encoder) which hopefully won't be too complicated to integrate into my project.
 
R

Richard Heathfield

Matt Kowalczyk said:
Hello,

I am working on a compression project and I want to write ASCII characters
using
the minimum amount of bits. Since I will be writing ASCII characters from
0-127
I only need 7 bits to represent a character. Therefore, if I write each
character at a time, I will end up writing 8 bits.

One method would be to somehow concatinate all the 7 bit words I am trying
to write and just pad the last byte.

e.g. Read the first 7-bit word, concatinate a bit from the next word and
write
the byte. Concat two bits from the next word and write the byte, e.g.
until I'm finished.

A few years ago, a guy on sci.crypt defended his use of gcc extensions on
the grounds that his desire to use 19-bit integers could not be met in ISO
C. Naturally enough, then, I spent a few minutes cutting some ISO C code
that would let him do just that. (It's a bit of a hack, as you'll see in a
moment, but it should work fine on a C8S16IL32 system, which is almost
certainly what you're using.)

It occurs to me that you could easily adapt it to your purposes. Set up a
buffer - an array of unsigned char - that is at least (N + 6) / 7 bytes in
size where N is the number of ASCII characters you want to crunch down, and
write to it using Put_nBit_Int. Once you've finished, the array of unsigned
char holds your "packed" data (and possibly some spare space, so you should
beware of that - maybe retain (binary) 0000000 as a terminator). You can
store or send that. At the receiving end, use Get_nBit_Int to retrieve the
crunched bytes.

The main() function at the bottom should give you a rough idea of how to use
the nbit functions.

You are welcome to use what follows, without payment, provided you credit me
in the source code.

#include <stdio.h>
#include <limits.h>

#define SET_BIT(a, n) (a)[(n) / CHAR_BIT] |= \
(unsigned char)(1U << ((n) % CHAR_BIT))

#define CLEAR_BIT(a, n) (a)[(n) / CHAR_BIT] &= \
(unsigned char)(~(1U << ((n) % CHAR_BIT)))

#define TEST_BIT(a, n) (((a)[(n) / CHAR_BIT] & \
(unsigned char)(1U << ((n) % CHAR_BIT))) ? 1 : 0)

/* Debugging function, used for printing len * CHAR_BIT
* bits from s.
*/
int print_bits(unsigned char *s, int len)
{
int i, j;
for(i = 0; i < len; i++)
{
for(j = 0; j < CHAR_BIT; j++)
{
printf("%d", TEST_BIT(s, i * CHAR_BIT + j) ? 1 : 0);
}
printf(" ");
}
printf("\n");
return 0;

}

unsigned int BitsInUnsignedInt(void)
{
static unsigned int answer = 0;
unsigned int testval = UINT_MAX;
if(answer == 0)
{
while(testval > 0)
{
++answer;
testval >>= 1;
}
}

return answer;

}

/* This function gets the Indexth n-bit unsigned int field from the bit
* array. To do this, it builds the unsigned int value bit by bit.
*
* Example call:
*
* unsigned int val;
* val = Get_nBit_Int(MyBitArray, this is the base address
* 19, get a 19-bit number
* 13, get the 14th number (0 to max - 1)
* 7); skip 7 leading bits at the start of
the array
*
*/
unsigned int Get_nBit_Int(unsigned char *BitArray,
unsigned int n,
unsigned int Index,
unsigned int BaseBit)
{
unsigned int Value = 0;
unsigned int j;
unsigned int i = Index * n;

if(n <= BitsInUnsignedInt())
{
i += BaseBit;
BitArray += i / CHAR_BIT;
i %= CHAR_BIT;

for(j = 0; j < n; j++)
{
/* Move the populated bits out of the way.
* Yes, this means that the first iteration
* of the loop does a useless shift. I think
* I can live with that. :)
*/
Value <<= 1;

/* Populate the low bit */
Value |= TEST_BIT(BitArray, i + j);
}
}

return Value;

}

void Put_nBit_Int(unsigned char *BitArray,
unsigned int n,
unsigned int Index,
unsigned int BaseBit,
unsigned int Value)
{
unsigned int j;
unsigned int i = Index * n;

if(n <= 32)
{
i += BaseBit;

BitArray += i / CHAR_BIT;
i %= CHAR_BIT;

j = n;
while(j--)
{
/* Use the rightmost bit */
if(Value & 1)
{
SET_BIT(BitArray, i + j);
}
else
{
CLEAR_BIT(BitArray, i + j);
}
/* Throw the rightmost bit away, moving the next bit into
* position. On the last iteration of the loop, this
* instruction is pointless. <shrug>
*/
Value >>= 1;
}
}

}

int main(void)
{
unsigned char test_array[9] = {0};

print_bits(test_array, 9);
printf("Storing the 19-bit value 0x7FFFF starting at bit 3.\n");
Put_nBit_Int(test_array, 19, 0, 3, 0x7FFFF);
print_bits(test_array, 9);
printf("Retrieving the 19-bit value starting at bit 3: %X\n",
Get_nBit_Int(test_array, 19, 0, 3));
printf("Storing the 19-bit value 0x7EDCB starting at bit 3 + (2 *
19).\n");
Put_nBit_Int(test_array, 19, 2, 3, 0x7EDCB);
print_bits(test_array, 9);
printf("Retrieving the 19-bit value starting at bit 3 + (2 * 19): %X\n",
Get_nBit_Int(test_array, 19, 2, 3));

return 0;

}
 
K

Keith Thompson

Richard Heathfield said:
A few years ago, a guy on sci.crypt defended his use of gcc extensions on
the grounds that his desire to use 19-bit integers could not be met in ISO
C. Naturally enough, then, I spent a few minutes cutting some ISO C code
that would let him do just that. (It's a bit of a hack, as you'll see in a
moment, but it should work fine on a C8S16IL32 system, which is almost
certainly what you're using.)
[...]

I haven't studied the code in any depth, but I'll bet you could make
it portable, at least to systems with CHAR_BIT==8, by judicious use of
<stdint.h> or equivalent.

[...]
You are welcome to use what follows, without payment, provided you credit me
in the source code.

#include <stdio.h>
#include <limits.h>

#define SET_BIT(a, n) (a)[(n) / CHAR_BIT] |= \
(unsigned char)(1U << ((n) % CHAR_BIT))

#define CLEAR_BIT(a, n) (a)[(n) / CHAR_BIT] &= \
(unsigned char)(~(1U << ((n) % CHAR_BIT)))

#define TEST_BIT(a, n) (((a)[(n) / CHAR_BIT] & \
(unsigned char)(1U << ((n) % CHAR_BIT))) ? 1 : 0)
[snip]

unsigned int BitsInUnsignedInt(void)
{
static unsigned int answer = 0;
unsigned int testval = UINT_MAX;
if(answer == 0)
{
while(testval > 0)
{
++answer;
testval >>= 1;
}
}

return answer;

}
[snip]

unsigned int Get_nBit_Int(unsigned char *BitArray,
unsigned int n,
unsigned int Index,
unsigned int BaseBit)
{
unsigned int Value = 0;
unsigned int j;
unsigned int i = Index * n;

if(n <= BitsInUnsignedInt()) [snip]
return Value;

}

BitsInUnsignedInt() always returns the same value. I'd call it once,
store the value, and use the stored value.

Note that it differs from CHAR_BIT*sizeof(unsigned int) only if
unsigned int has padding bits.

[snip]

In both Get_nBit_Int() and Put_nBit_Int(), you process one bit at a
time. It occurs to me that you could probably work on a byte at a
time, using shifts and masks to grab any partial bytes at the
beginning and end of the bit sequence; the resulting code might be
significantly faster. Alas, I'm too lazy to write it.
 
R

Richard Heathfield

Keith Thompson said:
It occurs to me that you could probably work on a byte at a
time, using shifts and masks to grab any partial bytes at the
beginning and end of the bit sequence; the resulting code might be
significantly faster. Alas, I'm too lazy to write it.

Likewise. Your comments are noted and appreciated, but since this is only
the second time in N years that the code could conceivably have been the
slightest use to anyone, I prefer to focus my programming efforts
elsewhere. In other words, I'm too lazy to fix it. :)
 
M

Malcolm

Matt Kowalczyk said:
Looks like that's what I will have to do. What I really want is a
bitbuffer! I
was browsing around on the internet and I found one used by tooLAME (an
mpeg
encoder) which hopefully won't be too complicated to integrate into my
project.
You really shouldn't have to raid an MPEG library to write this simple
function.

int squashASCII(unsigned char *out, char *in)
{
int rack = 0; /* buffer to hold bits */
int racklen = 0; /* number of bits in buffer */
int answer = 0; /* number of bytes written */
while(*in)
{
if(!isascii(*in))
return -1;
rack += *in;
rack <<= 7;
racklen += 7;
if(racklen > 8)
{
out++ = (rack & 0xFF) >> 8;
racklen -= 8;
answer++;
}
in++;
}
/* write the last few to output */
if(racklen > 0)
{
rack <<= (16 - racklen);
*out++ = (rack & 0xFF) >> 8;
answer++;
if(racklen > 8)
{
*out++ = (rack & 0xFF);
answer++;
}
}

return answer;
}

It hasn't been tested.
 
M

Malcolm

Malcolm said:
You really shouldn't have to raid an MPEG library to write this simple
function.

int squashASCII(unsigned char *out, char *in)
{
int rack = 0; /* buffer to hold bits */
int racklen = 0; /* number of bits in buffer */
int answer = 0; /* number of bytes written */
while(*in)
{
if(!isascii(*in))
return -1;
rack += *in;
rack <<= 7;
racklen += 7;
if(racklen > 8)
{
out++ = (rack & 0xFF) >> 8;
racklen -= 8;
answer++;
}
in++;
}
/* write the last few to output */
if(racklen > 0)
{
rack <<= (16 - racklen);
*out++ = (rack & 0xFF) >> 8;
answer++;
if(racklen > 8)
{
*out++ = (rack & 0xFF);
answer++;
}
}

return answer;
}

It hasn't been tested.
In fact it's a sleepy Sunday night function.
It won't work properly and you'll have to fiddle with it to get it to work
(too tired now to fix it).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,997
Messages
2,570,241
Members
46,831
Latest member
RusselWill

Latest Threads

Top