sprintf possibly broken with unions?

B

bushman

Hello,

I wrote a simple program that reads a formatted string into union
members and then does an sprintf to rewrite those values back into
another formatted string. Below is the code, if you run it, you will
see that you get a valid number printed for the first %d, however, all
values after that are invalid pointers. Please see the code below and
I will explain what I have found so far:

#include <stdio.h>
#include <windows.h>

union localizeValue
{
int intValue;
char charValue[260];
};

void main()
{
char formatted[128] = {0};
char unFormatted[128] = {0};

localizeValue values1, values2, values3;

char tempStr[128] = {0};

strcpy(formatted, "Using the help system %d times, every %d days,
every %d weeks");
strcpy(unFormatted, "Using the help system 27 times, every 5 days,
every 1 weeks");

sscanf(unFormatted, formatted, &values1, &values2, &values3);

sprintf(tempStr, formatted, values1, values2, values3);
}

so in this case I get the value of tempStr = "Using the help system 27
times, every -858993460 days, every -858993460 weeks"

if I specify the values as value1.intValue, value2.intValue, etc. in
the sprintf statement, everything prints out fine. So it seems that
the first value (values1) is has sprintf pointing at the beginning of
my union so it displays the correct number. It also seems as though
from there on out it's not pointing at the beggining of the next
union, so it has not value. If I change the value in the union of
charValue to be an array of size 4 (size of int), everything works
correctly also.

Can anyone explain why I am seeing this behavior... is there a fix for
this that anyone knows of?

Thanks!
 
R

robertwessel2

Hello,

I wrote a simple program that reads a formatted string into union
members and then does an sprintf to rewrite those values back into
another formatted string.  Below is the code, if you run it, you will
see that you get a valid number printed for the first %d, however, all
values after that are invalid pointers.  Please see the code below and
I will explain what I have found so far:

#include <stdio.h>
#include <windows.h>

union localizeValue
{
        int intValue;
        char charValue[260];

};

void main()
{
        char formatted[128] = {0};
        char unFormatted[128] = {0};

        localizeValue values1, values2, values3;

        char tempStr[128] = {0};

        strcpy(formatted, "Using the help system %d times, every %d days,
every %d weeks");
        strcpy(unFormatted, "Using the help system 27 times, every 5 days,
every 1 weeks");

        sscanf(unFormatted, formatted, &values1, &values2, &values3);

        sprintf(tempStr, formatted, values1, values2, values3);

}

so in this case I get the value of tempStr = "Using the help system 27
times, every -858993460 days, every -858993460 weeks"

if I specify the values as value1.intValue, value2.intValue, etc. in
the sprintf statement, everything prints out fine.   So it seems that
the first value (values1) is has sprintf pointing at the beginning of
my union so it displays the correct number.  It also seems as though
from there on out it's not pointing at the beggining of the next
union, so it has not value.  If I change the value in the union of
charValue to be an array of size 4 (size of int), everything works
correctly also.

Can anyone explain why I am seeing this behavior... is there a fix for
this that anyone knows of?


There is no standard format specifier in *printf that accepts a
structure or union argument, and I've never seen a an extension for
that either (nor can I imagine how it might be done, except in a very
clumsy way).

When you pass a structure or union in C or C++, it's passed by
reference, and on most implementations that means that you'll get
three localizeValue unions pushed on to the parameter stack for your
call to sprintf, each at about 260 bytes. Since you've told sprintf
that you have three ints on the stack, the behavior is clearly
undefined, although it would not be uncommon that the first int
happens to line up with the first bytes of the first passed union in
such a way as to produce the result you appear to be expecting.
Although that's most certainly not guaranteed.

The fix is to not lie to sprintf about what you’re passing to it.
 
B

Beej Jorgensen

bushman said:
sscanf(unFormatted, formatted, &values1, &values2, &values3);
sprintf(tempStr, formatted, values1, values2, values3);

You probably want something like: values1.intValue. Imagine
localizeValue were a struct or class--you'd definitely add ".intValue"
then, right? It's the same with a union.

But I'd be curious where you were going with this union, because I'm not
sure it's where you think.

-Beej
 
J

Jonathan Lee

Use the actual members of the union for sscanf and sprintf:

sscanf(unFormatted, formatted, &values1.intValue,
&values2.intValue, &values3.intValue);
sprintf(tempStr, formatted, values1.intValue,
values2.intValue, values3.intValue);

--Jonathan
 
B

bushman

You probably want something like:  values1.intValue.  Imagine
localizeValue were a struct or class--you'd definitely add ".intValue"
then, right?  It's the same with a union.

But I'd be curious where you were going with this union, because I'm not
sure it's where you think.

-Beej

The reason I am using a union is because, (even though I'm using all
ints in this example) I am translating strings manually in my
program. so someone can pass in any type of string format which would
include %p, %s, %f, %d, etc. and I was going to use a union so I
could avoid having to parse a string and determine where each type
specifier is needed. Hope that makes sense. A union sounded like a
perfect solution.
 
A

Alf P. Steinbach

* Beej Jorgensen:
Maybe someone with more C++ knowledge can verify if the first byte of
data in a struct necessarily corresponds with the first byte of data in
all the fields

It doe so correspond for a POD (even though the relevant text, a clarification
of the intention of the rule for reinterpret_cast, is a non-normative note).

, but it's looking like the answer is "no" based on the
wonky results we've seen so far.

'blaarg' has already answered the OP's question else-thread.


Cheers & hth.,

- Alf
 
H

highegg

The reason I am using a union is because, (even though I'm using all
ints in this example) I am translating strings manually in my
program.  so someone can pass in any type of string format which would
include %p, %s, %f, %d, etc.  and I was going to use a union so I
could avoid having to parse a string and determine where each type
specifier is needed.  Hope that makes sense.  A union sounded like a
perfect solution.

The problem is that printf wants arguments passed in by value and for
that you need to know their byte size, but your scanf call does not
store the size of the argument (int/double/pointer/string) anywhere.
Your approach will work as long as all allowed fields are of the same
bytesize.

The GCC compiler provides the "parse_printf_format" function that
would allow you do what you need - parse the string for you to get the
field bytesizes.

As you can see, printf/scanf are not suited very well to work with
user-specified strings.
 
J

James Kanze

I wrote a simple program that reads a formatted string into
union members and then does an sprintf to rewrite those values
back into another formatted string. Below is the code, if you
run it, you will see that you get a valid number printed for
the first %d, however, all values after that are invalid
pointers. Please see the code below and I will explain what I
have found so far:

Note that the C style IO functions are seriously broken. Don't
use them. (Note that sprintf is even more broken than the
others. It's almost impossible to use in a correct program.)
You should rewrite the code to use iostream.
#include <stdio.h>
#include <windows.h>
union localizeValue
{
int intValue;
char charValue[260];
};
void main()
{
char formatted[128] = {0};
char unFormatted[128] = {0};
localizeValue values1, values2, values3;
char tempStr[128] = {0};
strcpy(formatted, "Using the help system %d times, every %d days,
every %d weeks");
strcpy(unFormatted, "Using the help system 27 times, every 5 days,
every 1 weeks");
sscanf(unFormatted, formatted, &values1, &values2, &values3);

Undefined behavior. %d requires a pointer to an int, and you
pass a pointer to a localizeValue. Practically, this will work
on every implementation I've ever seen or heard of, but
formally, it's undefined beahvior. If you'd been using an
istream, like you should be, it wouldn't compile, since there is
no >> operator for localizeValue.
sprintf(tempStr, formatted, values1, values2, values3);

Undefined behavior. %d requires an int, not a localizeValue.
Practically, this will work if and only if localizeValue has the
same size as int. (In other words, if sizeof(int)==260. I'm
not aware of any machines off hand where that is the case.) If
you'd been using an ostream, like you should be, it wouldn't
have compiled, since there is no << operator for localizeValue.
so in this case I get the value of tempStr = "Using the help
system 27 times, every -858993460 days, every -858993460
weeks"

Or you get a core dump. Or just about anything else.
if I specify the values as value1.intValue, value2.intValue,
etc. in the sprintf statement, everything prints out fine.

Normal. The type of value1.intValue is int, and that's what is
required.
So it seems that the first value (values1) is has sprintf
pointing at the beginning of my union so it displays the
correct number. It also seems as though from there on out
it's not pointing at the beggining of the next union, so it
has not value. If I change the value in the union of
charValue to be an array of size 4 (size of int), everything
works correctly also.

By accident.
Can anyone explain why I am seeing this behavior... is there a
fix for this that anyone knows of?

You've already found the immediate fix: use value.intValue
everywhere you use value. The real fix, of course, is to learn
iostream, and use it.
 
J

James Kanze

On Apr 20, 4:18 pm, bushman <[email protected]> wrote:

[...]
There is no standard format specifier in *printf that accepts
a structure or union argument, and I've never seen a an
extension for that either (nor can I imagine how it might be
done, except in a very clumsy way).

Back when I implemented a version of the C standard library, we
extended it to support user defined types. Basically, we
allowed '<...>' as a format specifier, e.g. "%<abc> %#5.3<xyz>".
The user could then register a mapping name to function, and
we'd call the function to do the actual formatting.

With iostream, of course, this is standard---you just define an
appropriate overload of operator said:
When you pass a structure or union in C or C++, it's passed by
reference,

Since when? If there is a parameter, whatever you pass is used
to initialize a variable with the type of the parameter---if the
parameter is a reference, then a reference is initialized, and
if it isn't, then an object with the given type is initialized.
When there isn't a corresponding parameter (as is the case
here), the rules are a bit more complicated, but in this case,
the union will be passed by value (in a way such that it can be
accessed by va_arg).
and on most implementations that means that you'll get three
localizeValue unions pushed on to the parameter stack for your
call to sprintf, each at about 260 bytes.

Exactly. No reference.
Since you've told sprintf that you have three ints on the
stack, the behavior is clearly undefined, although it would
not be uncommon that the first int happens to line up with the
first bytes of the first passed union in such a way as to
produce the result you appear to be expecting. Although
that's most certainly not guaranteed.
The fix is to not lie to sprintf about what you're passing to
it.

The real fix is not to use sprintf.
 
J

James Kanze

bushman said:
union localizeValue
{
int intValue;
char charValue[260];
};
[snip obfuscated code]
(the code basically does this:
union localizeValue x, y;
x.intValue = 123;
y.intValue = 456;
printf( "%d, %d\n", x, y );
and he gets garbage for the second %d)
The problem with your code is that you're passing the wrong
argument for the %d specifier. It expects an int, and you're
passing it a union. It's irrelevant what members the union
has, simply that "union localizeValue" names a different type
than "int". Since there's no specifier for "union
localizeValue", and you want to print the intValue member,
pass that member:
printf( "%d, %d\n", x.intValue, y.intValue );
(I saw other people's replies, but none seemed to clearly
specify the cause of the problem, instead getting side-tracked
with implementation issues).

Yes. And of course, the same applies to his scanf, although
it's much less likely to cause a problem in practice.
 
I

Ian Collins

James said:
Note that the C style IO functions are seriously broken. Don't
use them. (Note that sprintf is even more broken than the
others. It's almost impossible to use in a correct program.)
You should rewrite the code to use iostream.

Why?

For a number of common use cases like having to build a fixed width
string with fixed width fields, sprintf is perfectly safe and less
verbose than using a stringstream.
 
R

robertwessel2

Since when?  If there is a parameter, whatever you pass is used
to initialize a variable with the type of the parameter---if the
parameter is a reference, then a reference is initialized, and
if it isn't, then an object with the given type is initialized.
When there isn't a corresponding parameter (as is the case
here), the rules are a bit more complicated, but in this case,
the union will be passed by value (in a way such that it can be
accessed by va_arg).


Exactly.  No reference.


Obvious brain fart on my part. I typed "by reference" when I meant
"by value." At least I described "by value..." *sigh*
 
J

James Kanze

James Kanze wrote:

Why are the C style IO functions seriously broken, or why is it
almost impossible to use sprintf in a correct program?
For a number of common use cases like having to build a fixed
width string with fixed width fields, sprintf is perfectly
safe and less verbose than using a stringstream.

Except that it doesn't have any support for fixed width fields.
(Nor does ostringstream, of course, but at least, it doesn't
overwrite memory when the values are too big. And I've
implemented a MemoryStreambuf which does ensure fixed width, at
least for the complete string.)
 
V

Vladimir Jovic

James said:
Why are the C style IO functions seriously broken, or why is it
almost impossible to use sprintf in a correct program?

I am interested in both questions :) So, why to both?

Except for a possible buffer overwrite. I would use snprintf() instead
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,817
Latest member
DicWeils

Latest Threads

Top