Files containing null characters

James Aguilar

Hey all,

I am trying to write a preliminary compression routine which counts the number
of different characters it sees in an input stream, encodes each character in
either 2, 4, or 8 bits depending on that count, then writes the compressed data
to the output stream. My target alphabet is the gene coding alphabet {A, C, G,
T}, so, understandably, I'd like to encode each of those four characters in two
bits (this would reduce a ten megabyte file to 2.5 megabytes, which is a huge
saving, considering I will then immediately move to a block compression model,
which will save even more space).

My problem is that if I map A to 0, the string AAAA yields 0x00 as the byte to
be output on the stream. When I open the file to decompress it, the istream
stops on that character. How can I prevent the istream from stopping until the
file is -really- over?
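A minimal sketch of the 2-bit packing described above, assuming the mapping A=0, C=1, G=2, T=3 and that the first base goes into the two high bits (the post only fixes A=0; `base_code`, `pack4` and `unpack4` are illustrative names, not from the thread). It shows why AAAA packs to 0x00, and that every byte value from 0x00 to 0xFF can appear in the output, so the file has to be handled as raw binary rather than text.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical mapping for illustration: A=0, C=1, G=2, T=3.
std::uint8_t base_code(char b) {
    switch (b) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: throw std::invalid_argument("not one of A, C, G, T");
    }
}

// Packs exactly four bases into one byte, first base in the two high bits.
std::uint8_t pack4(const std::string& bases) {
    if (bases.size() != 4) throw std::invalid_argument("need exactly 4 bases");
    std::uint8_t byte = 0;
    for (char b : bases) {
        byte = static_cast<std::uint8_t>((byte << 2) | base_code(b));
    }
    return byte;
}

// Inverse of pack4: reads the 2-bit codes back from high bits to low bits.
std::string unpack4(std::uint8_t byte) {
    const char letters[] = {'A', 'C', 'G', 'T'};
    std::string out;
    for (int shift = 6; shift >= 0; shift -= 2) {
        out.push_back(letters[(byte >> shift) & 0x3]);
    }
    return out;
}
```

With this layout, pack4("AAAA") is 0x00, pack4("ACGT") is 0x1B and pack4("TTTT") is 0xFF.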

- JFA1
 
rmilh

You are probably using operator >>, which expects ASCII text. Try the
member function get() instead.
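For reference, a small sketch of the difference between the two, using std::istringstream as a stand-in for a file (the function names are made up for illustration). With the default skipws flag, operator>> silently skips whitespace bytes such as 0x20 and 0x0A, which packed data can easily contain, while get() returns every byte.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts characters extracted with the formatted operator>>.
// With std::skipws (the default) whitespace bytes are skipped.
std::size_t count_with_extract(std::istream& in) {
    std::size_t n = 0;
    char c;
    while (in >> c) ++n;
    return n;
}

// Counts characters read with the unformatted get(): every byte counts.
std::size_t count_with_get(std::istream& in) {
    std::size_t n = 0;
    char c;
    while (in.get(c)) ++n;
    return n;
}
```

On the five bytes "A C\nG", get() sees all five, while operator>> sees only A, C and G.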
 
James Aguilar

rmilh said:
You are probably using operator >>, which expects ASCII text. Try the
member function get() instead.

The problem is not that I cannot get the character (I can't, but that's another
issue). The problem is that fail() evaluates to true if the stream encounters a
0. This means that, even were I to ignore the fail and continue, the behavior
would be undefined.

Anyone else? For now, I'm just mapping the first char to 0x01 instead of 0x00,
which solves the problem, but necessarily cuts my memory efficiency in half.

- JFA1
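The claim that fail() becomes true on a 0 byte can be checked with a quick sketch (std::istringstream stands in for a binary ifstream; `ReadResult` and `read_all` are hypothetical names). Reading with get(), embedded zeros come back like any other byte, and the stream only fails once it runs out of data, at which point eof() is set as well.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// What read_all() collected and why it stopped.
struct ReadResult {
    std::string bytes;     // every byte read, embedded '\0' included
    bool hit_eof = false;  // true if reading stopped because the data ran out
};

// Reads bytes with get() until it fails.
ReadResult read_all(std::istream& in) {
    ReadResult r;
    char c;
    while (in.get(c)) r.bytes.push_back(c);
    r.hit_eof = in.eof();
    return r;
}
```

Fed the four bytes 0x00 0x00 'A' 0x00, it returns all four and reports end of file, not an early stop at the first zero.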
 
rmilh

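#include <cstdio>     // EOF
#include <fstream>
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
    if (argc < 2) return 1;
    const char* filename = argv[1];

    char c;   // get(char&) takes a plain char; unsigned char will not bind
    ifstream in(filename, ios::in | ios::binary);
    while (in.get(c))
    {
        cout << c;
    }

    // or: peek() and get() return an int, so EOF is distinct from any byte

    ifstream in2(filename, ios::in | ios::binary);
    while (in2.peek() != EOF)
    {
        cout << (char)(in2.get());
    }

    // or: work directly on the stream buffer

    filebuf fb;
    if (fb.open(filename, ios::in | ios::binary))
    {
        while (fb.sgetc() != EOF)
        {
            cout << (char)(fb.sbumpc());
        }
    }
    return 0;
}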

Works for me! Or maybe I'm missing something obvious?
 
James Aguilar

rmilh said:

Actually, I lied. No, that still doesn't work. Try this test program:

#include <iostream>
#include <fstream>

using namespace std;

int main()
{
ofstream o("test", ios::out | ios::binary);

o << "ASD\0\0ASD";
o.close();

ifstream i("test", ios::in | ios::binary);

char c;
while (i) {
i.get(c);
}

return 0;
}

On my computer, "while (i)" evaluates to false where the first 0 should be. As
we all know, EOF is guaranteed to be equal to (char) 0, so changing the test to
"i.peek() != EOF" or any such thing will not help.

- JFA1
 
James Aguilar

James Aguilar said:
On my computer, "while (i)" evaluates to false where the first 0 should be.
As we all know, EOF is guaranteed to be equal to (char) 0, so changing the test
to "i.peek() != EOF" or any such thing will not help.

And as we all know, I'm an idiot and don't know what I'm talking about. EOF
is a negative int (-1 on typical implementations), not (char) 0.

Sorry.

- JFA1
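Following that correction, a small sketch of why the peek()/get() loops compare an int against EOF (the helper `peek_first` is made up for illustration). peek() returns the next byte converted to a non-negative int, so 0x00 comes back as 0 and 0xFF as 255, while EOF is a separate negative value. Casting the result to char before comparing would turn 0xFF into -1 on the common setups where char is signed and EOF is -1.

```cpp
#include <cassert>
#include <cstdio>    // EOF
#include <sstream>
#include <string>

// Returns whatever peek() reports for the first byte of `data`.
// peek() yields an int: a byte as 0..255, or EOF if there is no data.
int peek_first(const std::string& data) {
    std::istringstream in(data);
    return in.peek();
}
```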
 
Julián Albo

James said:
Actually, I lied. No, that still doesn't work. Try this test program:

#include <iostream>
#include <fstream>

using namespace std;

int main()
{
ofstream o("test", ios::out | ios::binary);

o << "ASD\0\0ASD";

You are writing a C-style string here, so the first \0 marks the end of the string and nothing after it is written.
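A quick way to see this without a hex editor, sketched with std::ostringstream instead of a file (`via_insert` and `via_write` are illustrative names): operator<< on a const char* stops at the first '\0', while write() copies exactly the number of bytes it is given.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// operator<< treats the literal as a NUL-terminated C string.
std::string via_insert() {
    std::ostringstream o;
    o << "ASD\0\0ASD";
    return o.str();
}

// write() takes an explicit length, so embedded zeros are kept.
std::string via_write() {
    static const char data[] = "ASD\0\0ASD";  // sizeof(data) == 9, incl. final '\0'
    std::ostringstream o;
    o.write(data, sizeof(data) - 1);          // the 8 bytes of the literal
    return o.str();
}
```

via_insert() yields just "ASD" (3 bytes); via_write() yields all 8 bytes.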
 
Karl Heinz Buchegger

James said:
On my computer, "while (i)" evaluates to false where the first 0 should be. As
we all know, EOF is guaranteed to be equal to (char) 0, so changing the test to
"i.peek() != EOF" or any such thing will not help.

EOF has nothing to do with it.
Grab your favorite hex editor and take a look at the written
file. You will notice that it contains only
ASD
since the << operation stopped writing at the first '\0' character.

Take a look at the stream functions get(), put(), read() and write();
they are the ones for working with binary files.
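A sketch of the test program reworked along those lines: write() for output, get() for input, both streams opened with ios::binary. The function name `roundtrip` and the file name used below are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Writes `data` to `path` in binary mode with write(), then reads it back
// byte by byte with get(). Embedded '\0' bytes survive the round trip.
std::string roundtrip(const std::string& path, const std::string& data) {
    {
        std::ofstream out(path, std::ios::out | std::ios::binary);
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
    }   // out is flushed and closed here

    std::ifstream in(path, std::ios::in | std::ios::binary);
    std::string result;
    char c;
    while (in.get(c)) {
        result.push_back(c);
    }
    return result;
}
```

Called as roundtrip("test", std::string("ASD\0\0ASD", 8)), it should return all 8 bytes, zeros included. Note the std::string(const char*, size) constructor: building the string from the bare literal would stop at the first '\0' for the same reason operator<< does.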
 
