snprint rationale?

C

Charlie Gordon

Michael Mair said:
Keith said:
Charlie Gordon said:
[...]

No point in calling snprintf again, sprintf would do just fine:

result = sprintf(mem, fmt, arg1, arg2);

And, unless you're really paranoid, you don't even need its return value,
because it *must* be equal to ``needed'' (if the original snprintf call
succeeded, a sprintf call with the same arguments and in the same locale
must return the same value).

Bad advice : if the code for assessing the length is duplicated for
the actual formatting, chances are these copies will get out of
sync! how is it better to call sprintf() instead of snprintf() ?
What is there to gain ?

I suspect any actual implementation is going to use much of the same
underlying code for all the *printf functions. And even if the code
is duplicated, any discrepancy between the length determined by
snprintf() and the length determined by sprintf() would be a bug in
the implementation. If you're unwilling to assume that the
implementation gets things right, you probably shouldn't be using it
in the first place. If you happen to know of a specific bug in some
implementation, it can make sense to guard against it in your code
(*and* submit a bug report to the implementer), but guarding against
implementation bugs in general is a waste of time. Or did I miss your
point?

Charlies point, AFAICS, is that you easily can make a slip of the kind
"Oh, I will always copy over the format string" and then forget it
once. This may lead to exactly the buffer overrun which was to be
avoided in the first place.
As Dan referred to the const char * fmt in both cases, this point is
moot here; but in general, I agree that it certainly does not hurt to
always use snprintf...

Exactly. The bold programmer is his own worst enemy.
In Chris Torek's example snprintf() is called with a variable format string.
This is *bad* practice. It prevents casual and compiler assisted verification
of parameter consistency. The unknown format string fmt points to may well
refer to more than 2 parameters and not necessarily of the correct type.
Calling snprintf() twice with the same arguments may well produce different
output in this case (or much worse UB of course).
What I was refering to in my reply was not snprintf() and sprintf()
implementations getting out of sync: they most likely share a common
implementation. But the programmer's code for those 2 calls runs the risk of
differentiating of time, via ill-advised maintenance. Call it the law of
evolution applied to C code. Even from day one, the 2 calls may have been
created different by mistake.
It would be much less error-prone to wrap those in some kind of loop executed
twice with a single snprintf() statement.
GNU defines asprintf() for this purpose. The implementation is straightforward
and can be made an easy addition to projects targetted at non gcc platforms.

int asprintf(char **strp, const char *fmt, ...);
int vasprintf(char **strp, const char *fmt, va_list ap);

strp will receive a pointer to storage allocated via malloc(). This is exactly
what Chris Torek does in his example, except more general and usable.

Chqrlie.
 
D

Dan Pop

In said:
Call it paranoia if you like, but I recommend the two-snprintf()
method anyway. For instance, suppose we have:

char *some_subroutine(const char *str, int n) {
...
needed = snprintf(NULL, 0, "%s%d", str, n);
...
mem = malloc(needed + 1);
...
return mem;
}

and suppose the caller passes, as the "str" argument, a pointer to
freed space (or similarly invalid pointer) that gets overwritten
with some new value by the malloc() call. In particular, if
strlen(str) increases, the second snprintf() will "want to" write
more characters than we just allocated.

Your first example was, at least, realistic: it presented the code as
a compact block, exactly as it is going to be used in real code. There
is no point in mixing any other code between the lines, as your highly
artificial second does.

My point is that each function call serves a well defined purpose:
the snprintf call is used to determine the length of a buffer needed
by sprintf to generate the desired output. Once you have achieved this
purpose, you can *safely* use sprintf to generate the output, and I
don't buy excuses along the lines of "but I am an idiot who cannot rely
on his ability to call both functions correctly".

Checking the return value of the second s[n]printf call merely complicates
the code for no redeeming benefits. That is, unless you get paid
proportionally with the number of lines of source code you write ;-)
Obviously such a call is faulty -- but using snprintf() twice will
limit any additional damage, and the fact that the number of
characters printed has changed may help find the bug. The *cost*
of using snprintf() twice (instead of snprintf() followed by plain
sprintf()) is likely negligible; it may even save CPU time and/or
(code) space.

Please enlighten me about the way to implement snprintf in such a way
that it uses less (or even the same number of) CPU cycles as sprintf (for
the same arguments). Unless I'm missing something, snprintf has to do
everything sprintf does *and* perform some extra work.

Dan
 
C

Chris Torek

Please enlighten me about the way to implement snprintf in such a way
that it uses less (or even the same number of) CPU cycles as sprintf (for
the same arguments). Unless I'm missing something, snprintf has to do
everything sprintf does *and* perform some extra work.

Well, as it turns out, in my own implementation, it uses pretty
much the same number of instructions -- sprintf and snprintf both
call __svfprintf, which does all the real work, and __svfprintf
uses internal write routines to copy into the output buffer(s) for
stdio FILE objects. The "buffer" for a string is simply the memory
area being written to, and if it fills up, extra data are quietly
ignored.

Since the usual (and only correct!) case is, as you pointed out, that
the second snprintf() does not overflow, the "ignore overflow" code
is not used -- so sprintf and snprintf take the same code path. The
only divergence is at the top of the first call, where sprintf says:

fake_file_struct.len = 32767; /* XXX */

(I think I used 32767, but it might be some other constant), while
snprintf has an extra argument and says:

fake_file_struct.len = buffer_size;

Hnece one has a "move constant" instruction while the other has a
"copy argument" instruction; and of course, in the calls to sprintf
or snprintf, we get either "no parameter" or "buffer size parameter",
which may or may not require some instruction(s) depending on
calling sequence. A typical example might be:

mov %l1,%o0 ! buffer
mov %l5,%o1 ! fmt
mov %l6,%o2 ! arg
call sprintf

vs:

mov %l1,%o0 ! buffer
mov 80,%o1 ! sizeof buffer
mov %l5,%o2 ! fmt
mov %l6,%o3 ! arg
call snprintf

The extra instruction in the call to snprintf here is made up for
by the fact that "fake_file_struct.len = 32767" requires three
instructions inside sprintf, while "fake_file_struct.len = buffer_size"
requires only one (on the SPARC -- other architectures will differ).
(On the SPARC, we actually get a one-instruction advantage for
snprintf, but that is only because I used a smaller-than-4096-byte
buffer.)

Despite using (in effect) the same number of instructions, I claim
the snprintf() call potentially runs (slightly) faster. Why?
Because we just called it, so it is probably already in the
instruction cache. (Of course, this depends on how the I-cache
interacts with the actual printf engine -- __svfprintf, in my
stdio.)

Again, the advantages of using snprintf twice (instead of snprintf
once, then sprintf once) are anywhere from tiny to nonexistent,
but the disadvantages appear always to be nonexistent. If one
balances "up to one microgram" against "zero micrograms", the way
to bet is that the up-to-one-microgram version is heavier. :)
 
A

Alan Balmer

Well, as it turns out, in my own implementation, it uses pretty
much the same number of instructions

Which is not only irrelevant to standard C, but does not at all
invalidate Dan's observation.

Personally, I would use sprintf for the second call, else next year I
would be scratching my head and wondering why I used an unneeded
constraint. (Gee, did I lose a line of code somewhere?)
 
D

Dan Pop

In said:
Again, the advantages of using snprintf twice (instead of snprintf
once, then sprintf once) are anywhere from tiny to nonexistent,
but the disadvantages appear always to be nonexistent.

You're forgetting the real comparison: calling snprintf to do the real
work and checking its return value, as suggested in your post versus
calling sprintf to do the real work and ignoring its return value, as
suggested in my post.

I don't really care about the extra cycles, but I do care about the
source code bloat.

The reason I object to using snprintf for doing the real work is that it
is a conceptually more complex function and the added complexity doesn't
buy you anything. Furthermore, this added complexity is a minor potential
source of additional bugs (you have one more function argument that you
have to get right).

Dan
 
D

Dan Pop

In said:
Since the usual (and only correct!) case is, as you pointed out, that
the second snprintf() does not overflow, the "ignore overflow" code
is not used -- so sprintf and snprintf take the same code path. The
only divergence is at the top of the first call, where sprintf says:

fake_file_struct.len = 32767; /* XXX */

(I think I used 32767, but it might be some other constant), while
snprintf has an extra argument and says:

fake_file_struct.len = buffer_size;

Does this mean that your sprintf implementation is broken? If I got you
right, it truncates its output to some arbitrary length.

Dan
 
C

Chris Torek

Does this mean that your sprintf implementation is broken? If I got you
right, it truncates its output to some arbitrary length.

Well, yes; but also no. The Standard says only that I have to
handle a small fixed length, and I handle a larger fixed length. :)

If I may digress a bit...

I really do not remember what I set the size to -- possibly INT_MAX,
as the read-count field in the stdio structure is an "int" (signed
for various reasons, although actually the write count is the only
one that *has* to be signed). I do, however, remember where I
first saw this magic constant of 32767, which also might help
explain how snprintf came about in the first place.

Some time in the 1980s, I was working on a problem with a program
that used sprintf to print into an internal buffer, but was
overrunning the buffer. I investigated the stdio implementation
at the time, and saw that it did pretty much just what I expected:
set the output pointer to point to the buffer you handed it, and
set the length to "a large number" (32767, in this case -- the code
had obviously been taken from the 16-bit PDP-11 environment, where
it was the maximum possible value; though I doubt signedness was
really required, so that it could have been unsigned and hence
65535).

In any case, it was immediately obvious that I could clone the code
but set the length smaller. There was a special flag associated
with the stdio data structure as well. The only question I had
was: what happens if you exceed the supplied count?

The answer, as it turns out, was that the stdio internal code would
call write() on whatever file descriptor number was lying around
in the stdio structure. The special flag was ignored entirely!
Trying to write snprintf() using this particular printf engine
(named "_doprnt" and written in VAX assembly language) turned out
to be hopeless. I considered modifying the assembly language; but
this was unappealing for multiple reasons.

When I discovered that we needed to rewrite the engine in C (both
because of the in-progress ANSI C standard, and because the folks
at UCBerkeley were porting the system to the Tahoe, which did not
have the VAX's "editpc" instruction), I took advantage of the
rewrite to fix a bunch of things, including adding snprintf(). (I
also wanted a "malloc()ing sprintf", a la GNU's asprintf(), but
that got vetoed. I did get funopen() put in, but it was not
adopted for C99.)

The only trick I missed in that early snprintf() implementation
was allowing the buffer to be NULL if the size was given as 0.
Given my experience with programs with buffer overruns and programs
that wanted an "allocating sprintf", I did know that snprintf()
had to limit its actual output, but return the number of characters
needed for the complete output. (Of course, we also needed va_copy()
so that vsnprintf() could be used to build an allocating vasprintf(),
but again that had to wait for C99. The two-pass method also has
those "atomicity issues" that can be solved using funopen() to
build asprintf()/vasprintf(), which would have been my preferred
solution.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,155
Messages
2,570,871
Members
47,401
Latest member
CliffGrime

Latest Threads

Top