Michael said:
I'm writing an application that requires an "intelligent merge" of 2
files. That is, equal data has a "preferred source" that I want to
write out. What I have works, I believe, but it seems horribly
cumbersome (having to set the input variables to ""...). Is there a
better way? TIA
while ((!feof(wf3)) || (!feof(wf1)))
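This is the wrong way to check for end-of-file: feof() does
not become true until a read has already been attempted and has
failed, so testing it before reading does not tell you whether
the next read will succeed. Please see Question 12.2 in the
comp.lang.c Frequently Asked Questions (FAQ) list at
http://www.eskimo.com/~scs/C-faq/top.html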
{
if (!feof(wf1))
{
Ditto.
strcpy(WDBRec, "");
This looks pointless. You're about to overwrite the contents
of WDBRec by using fgets() on it, so why do you care what's in it
beforehand? Perhaps this is an attempt to rescue the situation
after the unreliable end-of-file test -- if so, once you fix the
test you won't need this any more.
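To illustrate the point about the end-of-file test, here is a minimal sketch of a read loop driven by the return value of fgets() rather than by feof(). The function name count_records and the 256-character buffer are made up for the example; they are not taken from the original program.

```c
#include <stdio.h>

/* Hypothetical helper: counts the records (lines) in a stream.
 * The loop is controlled by the return value of fgets(), so there is
 * no need to clear the buffer first or to call feof() before reading. */
long count_records(FILE *fp)
{
    char rec[256];   /* assumed maximum record length; a longer line
                        would arrive in more than one piece */
    long n = 0;

    while (fgets(rec, sizeof rec, fp) != NULL)
        n++;         /* rec holds a freshly read record here */

    /* Only after a read has failed do feof() and ferror() say why. */
    return ferror(fp) ? -1 : n;
}
```

Once the read itself decides whether there is a record, the strcpy(..., "") calls have nothing left to protect against.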
By the way, you didn't show us what WDBRec is. From the way
you're using it, it should be an array of char; a pointer to a
malloc'ed area would not work here, because sizeof(WDBRec) would
then be the size of the pointer rather than the size of the buffer.
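A small sketch of why the array-versus-pointer distinction matters for the sizeof in the fgets() call. REC_SIZE, array_capacity and read_record are names invented for this illustration, since the real declaration of WDBRec was not shown.

```c
#include <stdio.h>
#include <stdlib.h>

#define REC_SIZE 128   /* assumed record buffer size */

/* With an array, sizeof yields the size of the whole buffer. */
size_t array_capacity(void)
{
    char rec[REC_SIZE];
    return sizeof rec;            /* REC_SIZE */
}

/* With a pointer, sizeof would yield only the size of the pointer
 * itself, so the allocated size has to be passed along explicitly. */
char *read_record(FILE *fp, char *buf, size_t bufsize)
{
    return fgets(buf, (int)bufsize, fp);
}
```

If the buffer does come from malloc(), calling read_record(fp, p, REC_SIZE) with the size that was actually allocated avoids the problem.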
if (fgets(WDBRec, sizeof(WDBRec), wf1) != NULL)
nic++;
}
if (!feof(wf3))
{
strcpy(DBERec, "");
if (fgets(DBERec, sizeof(DBERec), wf3) != NULL)
bic++;
}
dbeBib = atoi(copy(DBERec, 1, 5));
You haven't shown us what copy() is. I'm going to assume
that it copies the second through sixth characters (that is,
array elements [1] through [5]) into a six-char array somewhere
and appends a '\0'. Whether this works depends a lot on the
location and nature of that intermediate six-char array; see
Question 7.5 for a description of one all-too-frequent error.
(By the way, if fgets() didn't read anything, the second through
sixth characters will be the leftovers from the record prior to
the current one, if any.)
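Since copy() wasn't shown, here is one way such a helper could be written so that it avoids the Question 7.5 trap: the caller supplies the destination array. The name copy_field and its signature are hypothetical, and the zero-based indexing follows the assumption made above rather than anything known about the real copy().

```c
#include <string.h>

/* Hypothetical stand-in for copy(): copies `len` characters starting at
 * index `start` of `src` into `dst` and terminates the result.
 * `dst` must have room for len + 1 characters and belongs to the caller,
 * so no pointer to a local array escapes from the function.
 * Returns dst, or NULL if src is too short to contain the whole field. */
char *copy_field(char *dst, const char *src, size_t start, size_t len)
{
    if (strlen(src) < start + len)
        return NULL;
    memcpy(dst, src + start, len);
    dst[len] = '\0';
    return dst;
}
```

The NULL return also gives the caller a way to notice a short or empty record instead of silently converting leftovers.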
Despite its suggestive name, atoi() is not a very good way
to convert decimal strings to integers, not unless you're very
trusting of the source. The problem is that it will happily
convert "123x5" to 123 and give no indication that the input
is in any way strange. It won't even detect "xyzzy" as in any
way peculiar (it typically just returns zero, which you can't
tell apart from a genuine "0", and its behavior is undefined if
the value is out of range). So unless you are very, very sure that the
input is valid, atoi() is a poor way to convert it. There are
at least three superior ways to proceed:
- Use strtol(), because it will do the conversion *and*
report any oddities it finds, in a predictable way.
- Use sscanf(). It's a little bit trickier than it looks,
but allows you to do without the copy() stuff:
    if (sscanf(DBERec+1, "%5d%n", &dbeBib, &len) == 1
        && len == 5) { /* all's well */ } else { /* bad input */ }
The "%5d" converts no more than five characters (in case
additional digits follow the field of interest). The
"%n" stores in len, which must be an int, how many characters
were consumed, including any leading white space that "%d" skips
(it will set len to 3 if the input was "123xy"). And
if the "%5d" finds no digits at all ("xyzzy") sscanf()
will stop and return zero.
- If you're really sure the respective fields contain digits,
you can compare them as characters without converting at
all by using memcmp(DBERec+1, WDBRec+1, 5). However, this
may cause some surprises with non-digits: for example,
"01234" and " 1234" will be treated as unequal, and "-1234"
will be treated as less than "-9999". You'll have to decide
whether this is appropriate for your application.
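The first two alternatives above might look roughly like this. It is only a sketch: the function names, KEY_OFFSET and KEY_WIDTH are invented for the example, and the field position is the one assumed earlier (elements [1] through [5]).

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define KEY_OFFSET 1   /* assumed: field starts at element [1] */
#define KEY_WIDTH  5   /* assumed: field is five characters wide */

/* Converts the key field with strtol(). Returns 1 and stores the value
 * on success, 0 if the record is too short or the field is not entirely
 * numeric. strtol() skips leading white space and accepts a sign, so
 * " 1234" and "-1234" are both accepted. */
int key_from_strtol(const char *rec, int *value)
{
    char field[KEY_WIDTH + 1];
    char *end;
    long n;

    if (strlen(rec) < KEY_OFFSET + KEY_WIDTH)
        return 0;
    memcpy(field, rec + KEY_OFFSET, KEY_WIDTH);
    field[KEY_WIDTH] = '\0';

    n = strtol(field, &end, 10);
    if (end != field + KEY_WIDTH)      /* stopped early: stray characters */
        return 0;
    if (n < INT_MIN || n > INT_MAX)    /* does not fit in an int */
        return 0;
    *value = (int)n;
    return 1;
}

/* Same check with sscanf(): "%5d" reads at most five characters and
 * "%n" records how many characters were consumed. */
int key_from_sscanf(const char *rec, int *value)
{
    int n, len = 0;

    if (strlen(rec) < KEY_OFFSET + KEY_WIDTH)
        return 0;
    if (sscanf(rec + KEY_OFFSET, "%5d%n", &n, &len) != 1 || len != KEY_WIDTH)
        return 0;
    *value = n;
    return 1;
}
```

Both functions leave *value untouched on failure, so the caller can tell a rejected field from a genuine zero.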
wdbBib = atoi(copy(WDBRec, 1, 5));
if (wdbBib == dbeBib) // records match - defer to old data
{
if (dbeBib > 0)
I'm not sure what this test is for, unless perhaps it's
part of the rescue attempt for the incorrect end-of-file test.
If there really are actual non-positive numbers in the input,
it looks like this will eliminate them from the output. But
if you've got purely digit fields that can't be negative (though
"00000" would, of course, be zero), I think this test and the
others like it can simply go away once you fix the EOF handling.
{
writeToDBE(DBERec); buc++;
}
}
if (wdbBib < dbeBib) // work file data is new - write it
{
if (wdbBib > 0)
{
writeToDBE(WDBRec); nuc++;
}
else
{
if (dbeBib > 0)
{
writeToDBE(DBERec); buc++;
}
}
}
if (wdbBib > dbeBib)// prevailing old data - write it out
{
if (dbeBib > 0)
{
writeToDBE(DBERec); buc++;
}
}
} // while
You say you believe this works, but one thing that strikes
me as strange is that you read new input from *both* files every
time through the loop. (Until the botched EOF detection kicks
in, of course.) That doesn't seem right at all: If you get the
sequence "11111" "33333" "55555" from WDB while DBE provides
"22222" "44444", I'd expect you'd want to see all five of these
in the output -- but that's not what you're doing, and I'm not
sure whether it's accidental or intentional. Take another look.
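For comparison, here is a sketch of the conventional two-way merge, which advances only the file whose record was just written. It assumes both inputs are sorted by key, that keys are fixed-width digit fields at elements [1] through [5], and that on equal keys the old (DBE) record wins, as your comment says. The names merge_records, next_record, REC_SIZE, KEY_OFFSET and KEY_WIDTH are invented for the example, and fputs() stands in for writeToDBE().

```c
#include <stdio.h>
#include <string.h>

#define REC_SIZE   256  /* assumed maximum record length */
#define KEY_OFFSET 1    /* assumed key position, as in copy(rec, 1, 5) */
#define KEY_WIDTH  5

/* Reads the next record; returns 1 on success, 0 at end of file or error. */
static int next_record(FILE *fp, char *rec)
{
    return fgets(rec, REC_SIZE, fp) != NULL;
}

/* Merges two inputs sorted by key into `out`. When keys are equal the
 * record from `dbe` (the old data) is written and both inputs advance.
 * Keys are compared as characters, which assumes fixed-width digit
 * fields. Returns the number of records written. */
long merge_records(FILE *wdb, FILE *dbe, FILE *out)
{
    char wrec[REC_SIZE] = "", drec[REC_SIZE] = "";
    int have_w = next_record(wdb, wrec);
    int have_d = next_record(dbe, drec);
    long written = 0;

    while (have_w || have_d) {
        int cmp;

        if (!have_d)
            cmp = -1;                 /* only new data remains */
        else if (!have_w)
            cmp = 1;                  /* only old data remains */
        else
            cmp = memcmp(wrec + KEY_OFFSET, drec + KEY_OFFSET, KEY_WIDTH);

        if (cmp < 0) {                /* new record has the smaller key */
            fputs(wrec, out);
            have_w = next_record(wdb, wrec);
        } else {                      /* old record is smaller or equal */
            fputs(drec, out);
            if (cmp == 0)             /* same key: discard the new one */
                have_w = next_record(wdb, wrec);
            have_d = next_record(dbe, drec);
        }
        written++;
    }
    return written;
}
```

With the sequences mentioned above, "11111" "33333" "55555" from one file and "22222" "44444" from the other, this writes all five records in key order. The have_w and have_d flags also take over the job the "> 0" tests seemed to be doing.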