How to treat unknown data...

Pablo

I have a dilemma.

Currently, I may be passing standard text (strings of char) or binary data of
1 to 'x' bytes to a program for comparison with data previously written to a
file. I'm writing some routines that compare the data on disk with data
passed in by the programmer. The streams being passed may be shorter than the
allowable length limit (defined at runtime) but never longer.

The problem is that the program I've written so far compares the programmer's
data (less than or equal to the maximum possible length) with the data on
disk over the total possible size of the byte string. So if I pass a standard
string of char, say "abcde" (with null terminator), my program compares
"abcde\0[some garbage]" to what ended up on disk, and of course the
comparison fails, because what was written to the disk may be
"abcde\0[some different garbage]".

What I'm trying to get my mind around is whether there is any non-brute-force
method of guaranteeing the contents of the data, or of scrubbing it before
writing to disk AND before recomparison, so that what the programmer INTENDS
to compare is what is actually being compared. The program as it is MUST
write all bytes of the allowable length. But it won't know WHAT it's writing
at runtime, only some fixed number of bytes of type 'void'. What I've
considered so far:

o Creating a meta table so the program knows at startup whether it's
  comparing strings, binary, or both. (Don't like this; I don't want the
  overhead.)
o Simply demanding that the programmer clear the array of bytes so that it
  contains NULs for the entire length before writing to disk or comparing.
  (This makes more work for the programmer using the class/functions, and
  leads to possible errors if the programmer is sloppy and doesn't clear the
  memory.)
o Having the programmer pass a second argument telling the class/function
  to compare only 'x' bytes of the string being passed, i.e.
  foo(bytestring,strlen(bytestring)); where 'foo' performs:

xxx foo(void* bytes, int nlength)
{
    return Compare(bytes, bytesreadfromdisk, nlength);
}
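
A minimal sketch of this third option, assuming the result is a plain bool and that the comparison is nothing more than memcmp() (the post states neither). The name PrefixMatches is hypothetical. One detail that matters here: if the caller passes strlen() alone as the length, "abc" also matches a record that begins with "abcde", so for text the terminator generally has to be counted too.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: compare only the first `length` bytes of the caller's
// data against the record read from disk. Bytes past `length` are ignored,
// so trailing garbage in either buffer cannot affect the result.
bool PrefixMatches(const void* bytes, const void* bytes_read_from_disk,
                   std::size_t length) {
    return std::memcmp(bytes, bytes_read_from_disk, length) == 0;
}
```

For text fields, a call such as PrefixMatches(s, record, strlen(s) + 1) includes the terminator and so avoids matching on a mere prefix. For binary fields the caller has to know the real length some other way.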

I'm not pleased with any of these methods, and am looking for something more
elegant. I've investigated templates (with which I'm very inexperienced), but
after reading a tutorial, they didn't seem to be aimed at what I really
want.

Again, a string of bytes can be up to a given length (defined beforehand; the
program learns it at runtime), and the string can contain readable text,
binary, or both intermixed. The general FORMAT of the data will not change
at runtime. I.e., a 25-byte string may contain 10 bytes of string, 4 of
binary, 7 of string, and 3 of binary, much like an old flat-file database
system would. Any ideas appreciated. And I'm happy to provide clarifications.

paul
 
Stephan Brönnimann

Why not require that every element of the buffer be initialized to 0?
Then you can always compare byte by byte.

Stephan Brönnimann
(e-mail address removed)
http://www.osb-systems.com
Open source rating and billing engine for communication networks.
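
The zero-initialization suggested above could look roughly like the following. This is only a sketch: MakeZeroPaddedRecord and RecordsEqual are hypothetical names, and it assumes the record size is known when the buffer is built and that oversized input should be rejected rather than truncated.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: build a record of exactly `record_size` bytes that
// starts with the caller's `length` bytes and is zero everywhere else.
std::vector<unsigned char> MakeZeroPaddedRecord(const void* data,
                                                std::size_t length,
                                                std::size_t record_size) {
    if (length > record_size) {
        throw std::length_error("data longer than record size");
    }
    std::vector<unsigned char> record(record_size, 0);  // every byte starts as 0
    if (length > 0) {
        std::memcpy(record.data(), data, length);
    }
    return record;
}

// With the padding fixed at zero, a full-length byte-by-byte compare is safe.
bool RecordsEqual(const std::vector<unsigned char>& a,
                  const std::vector<unsigned char>& b) {
    return a.size() == b.size() &&
           (a.empty() || std::memcmp(a.data(), b.data(), a.size()) == 0);
}
```

Because both the written record and the search key go through the same padding step, whatever followed the terminator in the caller's original buffer never reaches the disk or the comparison.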
 
Pablo

In essence, I am requiring that they be initialized to 0, but I don't want
to create unnecessary overhead for the programmer using my class, or a
burdensome process where steps might be forgotten. My initial vision was to
provide member functions that add data in one step and compare/find it in
one step, like this:

AddData(data);
[...]
FindData(data);

The 'FindData(...)' member function reads data from the disk and basically
does a memcmp() on it. But because it's a generic class working against a
predefined file (fixed-length elements), it compares the entire length of
the data element, even if the data passed by the programmer contains only
"paul\0".

I can certainly create a 'scrubber function' which might work like this:

BuildData(data); // clears the buffer to NULs, then copies in the programmer's data
AddData(data);   // adds the data
[...]
BuildData(data); // ditto
FindData(data);

While this approach will work, if the programmer forgets the 'BuildData'
function before any call to write or compare data, the compare will fail, or
the 'Add' will write 'unexpected' trailing garbage at the end of the data
buffer (which may or may not be null terminated). This introduces the
possibility of more bugs.

Paul
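
One way around the "forgotten BuildData" risk described above is to do the scrubbing inside AddData and FindData themselves, so there is no separate step to forget. The sketch below is only an illustration under several assumptions: the class name FixedRecordStore is hypothetical, an in-memory vector stands in for the fixed-length file, and both member functions take an explicit length, which the one-argument calls in the post do not.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical class; an in-memory vector stands in for the file on disk.
class FixedRecordStore {
public:
    explicit FixedRecordStore(std::size_t record_size)
        : record_size_(record_size) {}

    // Pads the caller's bytes with zeros to the full record size, then stores them.
    void AddData(const void* data, std::size_t length) {
        records_.push_back(Scrub(data, length));
    }

    // Pads the search key the same way, then compares whole records.
    bool FindData(const void* data, std::size_t length) const {
        const std::vector<unsigned char> key = Scrub(data, length);
        for (const auto& record : records_) {
            if (record == key) return true;
        }
        return false;
    }

private:
    // The scrub step lives here, so callers cannot skip it.
    std::vector<unsigned char> Scrub(const void* data, std::size_t length) const {
        if (length > record_size_) {
            throw std::length_error("data longer than record size");
        }
        std::vector<unsigned char> record(record_size_, 0);
        if (length > 0) std::memcpy(record.data(), data, length);
        return record;
    }

    std::size_t record_size_;
    std::vector<std::vector<unsigned char>> records_;
};
```

The cost is one extra copy per call and the explicit length argument. In exchange, garbage after the terminator in the caller's buffer is never written or compared.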


 
