James Aguilar
Hey all,
I am trying to write a preliminary compression routine that counts the number
of distinct characters it sees in an input stream, encodes each character in
2, 4, or 8 bits depending on the size of that set, and then writes the
compressed data to the output stream. My target alphabet is the gene coding
alphabet {A, C, G, T}, so, understandably, I'd like to encode each of those
four characters in two bits. At one byte per character in the input, that
would reduce a ten megabyte file to 2.5 megabytes, which is a big saving,
especially since I will then immediately move to a block compression model,
which will save even more space.
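A minimal sketch of the width selection and the 2-bit packing step, assuming the 2-bit case uses the fixed mapping A=0, C=1, G=2, T=3 and that the original symbol count is stored separately (the function names `bitsPerSymbol`, `baseToCode`, and `packBases` are hypothetical). It also shows why AAAA packs to the byte 0x00:

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: choose 2, 4, or 8 bits per symbol from the number
// of distinct characters that appear in the input.
int bitsPerSymbol(const std::string& input) {
    std::set<char> distinct(input.begin(), input.end());
    if (distinct.size() <= 4) return 2;
    if (distinct.size() <= 16) return 4;
    return 8;
}

// Assumed fixed mapping for the gene alphabet: A=0, C=1, G=2, T=3.
std::uint8_t baseToCode(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: throw std::invalid_argument("not one of A, C, G, T");
    }
}

// Pack four bases per byte, the first base in the two high bits.
// A trailing partial group is padded with code 0 ('A'), so the decoder
// needs the original length to know where the real data ends.
std::vector<std::uint8_t> packBases(const std::string& bases) {
    std::vector<std::uint8_t> out((bases.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < bases.size(); ++i) {
        int shift = 6 - 2 * static_cast<int>(i % 4);
        out[i / 4] |= static_cast<std::uint8_t>(baseToCode(bases[i]) << shift);
    }
    return out;
}
```

With this mapping, `packBases("ACGT")` yields the single byte 0x1B, while `packBases("AAAA")` yields 0x00, so the output file will routinely contain zero bytes that the reader must treat as ordinary data.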
My problem is that if I map A to 0, the string AAAA yields 0x00 as the byte to
be output on the stream. When I open the file to decompress it, the istream
stops on that character. How can I prevent the istream from stopping until the
file is -really- over?
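A hedged guess at the cause, since the reading code isn't shown: a 0x00 byte is not an end-of-file marker for an `istream`. Input usually stops early because the buffer is treated as a C string (which ends at the first 0x00), because formatted `operator>>` skips whitespace bytes, or because the file is opened in text mode, where line endings may be translated and some Windows runtimes treat 0x1A as end of file. A minimal sketch that sidesteps all of these by opening both streams with `std::ios::binary` and reading with `std::istreambuf_iterator` (`writeBytes` and `readBytes` are hypothetical names):

```cpp
#include <cstdint>
#include <fstream>
#include <ios>
#include <iterator>
#include <string>
#include <vector>

// Write raw bytes; std::ios::binary disables newline translation.
bool writeBytes(const std::string& path, const std::vector<std::uint8_t>& data) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}

// Read every byte up to the real end of the file. istreambuf_iterator
// copies bytes unformatted, so 0x00 and whitespace bytes are kept as data.
std::vector<std::uint8_t> readBytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}
```

An unformatted `in.read(buffer, size)` loop that checks `in.gcount()` would work equally well; the key points are binary mode and never using string functions or `operator>>` on the compressed bytes.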
- JFA1