re-setting an array

Alf P. Steinbach

* Ioannis Vranos:
I did not understand what you are saying. For example, signed int and
unsigned int cannot have padding bits?

They can. In C more of the total bits can be padding bits in the signed
int. In C++ signed and unsigned are required to have the same number of
value representation bits, per the definition of "value representation".


In short it boils down to "is undefined behavior" versus "can be
undefined behavior". In the case of padding bits accessible to the
program it is undefined behavior. In the more general case of integer
types in C++ it isn't necessarily undefined behavior, and the
possibility of UB is only on antiquated machinery for which I'm not even
sure that C++ compilers exist, and furthermore that remote, purely
academic possibility can be avoided by a simple compile time assertion.

A compile time assertion is any statement that makes the program not
compile on a machine where some specified assumption does not hold.

So, wrt. integers in general, as opposed to the hypothetical situation
of having padding bits, it's not "is", it is "can hypothetically be".

What do you mean they are not accessible in a program.

What do you mean "they are not accessible in a program"?

All objects are accessible including their padding bits.

All C++ objects are accessible. You would however find it difficult to
access processor registers in pure standard C++ without any platform
specific library or language extensions. Similarly, you would find it
difficult to access any padding bits used internally by some processor
(said processor being just as hypothetical and far-fetched as one that
uses padding bits visible to C++). The context of my remark was
hypothetical _machines_ that use integer padding bits -- because it
seems you think such machines should be considered in deciding whether
to use memset or not. To understand that remark better, consider a real
machine, namely the ordinary PC, that uses extra bits in floating point
calculations (those are not padding bits, but). Whether the machine
itself uses such bits is one thing, and that is similar to the context
of the remark. Whether they're accessible to C++ is another; in old
versions of Visual C++ they were, in Visual C++ 7.1 they aren't.
 
JKop

Why does the following compile:

int BlahChar(char)
{
    char aaa = 5;
    return aaa;
}

int BlahChar(signed char)
{
    signed char bbb = 5;
    return bbb;
}

int BlahChar(unsigned char)
{
    unsigned char ccc = 5;
    return ccc;
}


int main()
{
    signed char jk;

    BlahChar(jk);
}



For instance:

short == short int == signed short == signed short int

int == signed int

long == long int == signed long == signed long int


Is the above correct in ALL circumstances?



I can only presume that:

char == signed char

char == unsigned char

is not necessarily true, and that it's implementation-defined. Would I be
right?


-JKop
 
John Harrison

JKop said:
Why does the following compile:

int BlahChar(char)
{
char aaa = 5;
return aaa;
}

int BlahChar(signed char)
{
signed char bbb = 5;
return bbb;
}

int BlahChar(unsigned char)
{
unsigned char ccc = 5;
return ccc;
}


int main()
{
signed char jk;

BlahChar(jk);
}



For instance:

short == short int == signed short == signed short int

int == signed int

long == long int == signed long == signed long int


Is the above correct in ALL circumstances?

AFAIK there is no such type as signed short, signed int or signed long. But
short == short int and long == long int always.
I can only presume that:

char == signed char

char == unsigned char

is not necessarily true, and that it's implementation-defined. Would I be
right?

Those are never true. Even though char is either signed or unsigned it is
not the same type as unsigned char, or signed char.

This mess is the history of C and C++. In the original C it was left undefined
whether char is signed or unsigned. I guess signed char was introduced to
overcome this; at least you can now explicitly say you want a signed char.

john
 
Alf P. Steinbach

* JKop:
I can only presume that:

char == signed char

char == unsigned char

is not necessarily true, and that it's implementation-defined. Would I be
right?

Yes, if you mean what I think. §3.9.1/1: "Plain 'char', 'signed char',
and 'unsigned char' are three distinct types."

'char' is either signed or unsigned in a given implementation, but not
the _same_ type as 'signed char' or 'unsigned char'.

The same situation holds for 'wchar_t', which is a distinct type but
is mapped to another integer type, its "underlying type" in the standard.
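
The distinction quoted from §3.9.1/1 can be checked mechanically. The following is only a sketch, assuming a C++11 compiler (`static_assert` and `<type_traits>` are newer than this thread); the helper name `plain_char_is_signed` is made up for illustration.

```cpp
#include <type_traits>

// Plain char is a distinct type from both signed char and unsigned char,
// on every implementation.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");

// All three have size 1 by definition.
static_assert(sizeof(char) == 1 && sizeof(signed char) == 1 &&
              sizeof(unsigned char) == 1, "sizeof is 1 for all char types");

// Whether plain char behaves as signed or unsigned is implementation-defined.
// Hypothetical helper: reports which choice this implementation made.
constexpr bool plain_char_is_signed() {
    return std::is_signed<char>::value;
}
```

The two `is_same` assertions hold everywhere, while the result of `plain_char_is_signed()` may differ from one compiler or target to the next.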
 
Richard Herring

JKop said:
Why does the following compile:

int BlahChar(char)
{
char aaa = 5;
return aaa;
}

int BlahChar(signed char)
{
signed char bbb = 5;
return bbb;
}

int BlahChar(unsigned char)
{
unsigned char ccc = 5;
return ccc;
}


int main()
{
signed char jk;

BlahChar(jk);
}

Why wouldn't it? You have three distinct function overloads there.

For instance:

short == short int == signed short == signed short int

int == signed int

long == long int == signed long == signed long int

Is the above correct in ALL circumstances?



I can only presume that:

char == signed char

char == unsigned char

is not necessarily true, and that it's implementation-defined. Would I be
right?

Depends what you mean by "==". Why not buy that copy of the Standard and
turn to 3.9.1?

Plain char, signed char and unsigned char are three distinct types, but
plain char can take exactly the same values as one of the other two
types, and this choice is implementation-defined.
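
Because the three types are distinct, the three functions are ordinary overloads, and a call selects the one whose parameter type matches the argument exactly. A small sketch of that; the function name `which_char` and the strings it returns are invented here for illustration and are not from the original post.

```cpp
#include <string>

// Three separate overloads: legal because char, signed char and
// unsigned char are three distinct types.
std::string which_char(char)          { return "char"; }
std::string which_char(signed char)   { return "signed char"; }
std::string which_char(unsigned char) { return "unsigned char"; }
```

Passing a `signed char` variable picks the second overload by exact match. A character literal such as 'a' has type `char` in C++, so it picks the first, regardless of whether plain `char` is signed on the platform.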
 
JKop

Richard Herring posted:
Why wouldn't it? You have three distinct function overloads there.


If I'd known that, do you think I would've written my original post? Given
that, do you think your reply was in any way stupid, ignorant or arrogant?

Depends what you mean by "==". Why not buy that copy of the Standard
and turn to 3.9.1?


You know exactly what I mean by "==", that's just you being ignorant again.

Because it's a rip-off.


-JKop
 
Ioannis Vranos

Alf said:
They can. In C more of the total bits can be padding bits in the signed
int. In C++ signed and unsigned are required to have the same number of
value representation bits, per the definition of "value representation".


You said it nicely. However both can have padding bits, and when we use
memset() to zero all bytes (including those with the padding bits), we
modify these parts of bytes too.



In short it boils down to "is undefined behavior" versus "can be
undefined behavior". In the case of padding bits accessible to the
program it is undefined behavior.


Ok then we agree.


In the more general case of integer
types in C++ it isn't necessarily undefined behavior, and the
possibility of UB is only on antiquated machinery for which I'm not even
sure that C++ compilers exist, and furthermore that remote, purely
academic possibility can be avoided by a simple compile time assertion.



It is not on antiquated machinery, only reading bytes containing padding
bits is guaranteed to be safe in the standard.




A compile time assertion is any statement that makes the program not
compile on a machine where some specified assumption does not hold.



Compile time assertions - if present - are system specific extensions
and their kind and use differ from system to system.

If you are saying that there are specific systems where we can
manipulate the padding bits without side effects, then yes, there
are. However, we are talking about portability here.


All C++ objects are accessible. You would however find it difficult to
access processor registers in pure standard C++ without any platform
specific library or language extensions.


Yes; however, with the register keyword it is up to the implementation to
decide whether it will put a variable in a register.


Similarly, you would find it
difficult to access any padding bits used internally by some processor
(said processor being just as hypothetical and far-fetched as one that
uses padding bits visible to C++). The context of my remark was
hypothetical _machines_ that use integer padding bits -- because it
seems you think such machines should be considered in deciding whether
to use memset or not. To understand that remark better, consider a real
machine, namely the ordinary PC, that uses extra bits in floating point
calculations (those are not padding bits, but). Whether the machine
itself uses such bits is one thing, and that is similar to the context
of the remark. Whether they're accessible to C++ is another; in old
versions of Visual C++ they were, in Visual C++ 7.1 they aren't.


From the standard:


5.3.3 Sizeof

"When applied to a reference or a reference type, the result is the size
of the referenced type. When applied to a class, the result is the
number of bytes in an object of that class including any padding
required for placing objects of that type in an array."



If the object's size returned by sizeof includes padding bits, then
these padding bits have to be accessible.






Regards,

Ioannis Vranos
 
Ioannis Vranos

JKop said:
Why does the following compile:

int BlahChar(char)
{
char aaa = 5;
return aaa;
}

int BlahChar(signed char)
{
signed char bbb = 5;
return bbb;
}

int BlahChar(unsigned char)
{
unsigned char ccc = 5;
return ccc;
}


int main()
{
signed char jk;

BlahChar(jk);
}



For instance:

short == short int == signed short == signed short int

int == signed int

long == long int == signed long == signed long int


Is the above correct in ALL circumstances?




Of course since they are the *same type* respectively!





I can only presume that:

char == signed char

char == unsigned char

is not necessarily true, and that it's implementation-defined. Would I be
right?



The three above are different types, but the standard requires
their size always to be 1 (one byte).






Regards,

Ioannis Vranos
 
Ioannis Vranos

John said:
AFAIK there is no such type as signed short, signed int or signed long. But
short == short int and long == long int always.




Of course there is a type signed short, it is your "short" and "short
int". Its complete name is "signed short int" and it is glad to meet
you. :)

unsigned long int is also an existing type.
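
The claim that these spellings all name the same types can be verified at compile time. A sketch, assuming a C++11 compiler (`static_assert` and `std::is_same` are newer than this thread):

```cpp
#include <type_traits>

// Each group names one single type in several spellings.
static_assert(std::is_same<short, short int>::value, "same type");
static_assert(std::is_same<short, signed short>::value, "same type");
static_assert(std::is_same<short, signed short int>::value, "same type");

static_assert(std::is_same<int, signed int>::value, "same type");
static_assert(std::is_same<int, signed>::value, "same type");

static_assert(std::is_same<long, long int>::value, "same type");
static_assert(std::is_same<long, signed long>::value, "same type");
static_assert(std::is_same<long, signed long int>::value, "same type");

// The unsigned counterparts exist too, and are separate types.
static_assert(std::is_same<unsigned long, unsigned long int>::value, "same type");
static_assert(!std::is_same<long, unsigned long int>::value, "different types");
```

Unlike the `char` family, there is no third "plain" type here: `short`, `int` and `long` are simply shorthand for the signed forms.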






Regards,

Ioannis Vranos
 
Alf P. Steinbach

* Ioannis Vranos:
You said it nicely. However both can have padding bits, and when we use
memset() to zero all bytes (including those with the padding bits), we
modify these parts of bytes too.

Well, then, you'll have to come up with at least one C++ compiler
where this is the case for some standard integer type.

But note that that just moves the potential "problem" into the realm of
possibility.

If (against expectation) such a compiler is found it doesn't mean that
zeroing arrays of integers via memset is UB in general; just that the
proposition that it _can hypothetically_ be can be amended to _can_.

It is not on antiquated machinery

No? Name one modern such computer, then. I.e. one in use with C++
compiler.

, only reading bytes containing padding
bits is guaranteed to be safe in the standard.
?



Compile time assertions - if present - are system specific extensions
and their kind and use differ from system to system.

No no no. What needs to be asserted is simply that the range of the
relevant integer type requires the full number of bits in the object.
That is very easy to assert in a system-independent way.
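
One way such an assertion might look is sketched below. It assumes a C++11 compiler (`static_assert` and `constexpr` are newer than this thread; at the time an equivalent trick would have been needed), and the name `has_no_padding_bits` is hypothetical.

```cpp
#include <climits>
#include <limits>

// True when every bit of T's object representation takes part in its value,
// i.e. T has no padding bits. numeric_limits<T>::digits counts value bits
// and excludes the sign bit, so the sign bit is added back for signed types.
template <typename T>
constexpr bool has_no_padding_bits() {
    return std::numeric_limits<T>::digits +
               (std::numeric_limits<T>::is_signed ? 1 : 0) ==
           static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Refuse to compile on an implementation where the assumption fails.
static_assert(has_no_padding_bits<int>(), "int has padding bits");
static_assert(has_no_padding_bits<unsigned int>(), "unsigned int has padding bits");
```

The check is meant for the ordinary integer types only; `bool`, for instance, reports a single value bit and would fail it even though that says nothing useful about memset.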
 
Ioannis Vranos

Alf said:
Well, then, you'll have to come up with at least one C++ compiler
where this is the case for some standard integer type.

But note that that just moves the potential "problem" into the realm of
possibility.

If (against expectation) such a compiler is found it doesn't mean that
zeroing arrays of integers via memset is UB in general; just that the
proposition that it _can hypothetically_ be can be amended to _can_.


Everything not defined in the standard is undefined behaviour. The
hypothetical stuff doesn't fit in this discussion, because some day for
example, I


No no no. What needs to be asserted is simply that the range of the
relevant integer type requires the full number of bits in the object.
That is very easy to assert in a system-independent way.



With sizeof(), maximum/minimum values and numeric_limits<unsigned
char>::digits it can be done; however, "legally" speaking (we are
language lawyers after all) there is no guarantee that integers have
no padding bits. If you use the above-mentioned check
and find that there are no padding bits, then you can zero
everything on integer types. However, such a check implies that
an alternative will be used when the check "returns false", so why
not use the fill() family in the first place?






Regards,

Ioannis Vranos
 
Ioannis Vranos

Ioannis Vranos wrote:


Fixed to be more comprehensible:



Everything not defined in the standard is undefined behaviour. The
hypothetical stuff doesn't fit in this discussion.




With sizeof(), maximum/minimum values and numeric_limits<unsigned
char>::digits it can be done portably; however, "legally" speaking (we
are language lawyers after all) there is no guarantee that integers have
no padding bits on some system. If you use the above-mentioned
check and find that there are no padding bits, then you
can zero everything on integer types, and this scheme is portable.
However, such a check implies that an alternative will be used
when the check "returns false", so why not use the fill() family in
the first place?






Regards,

Ioannis Vranos
 
Alf P. Steinbach

* Ioannis Vranos:
With sizeof(), maximum/minimum values and numeric_limits<unsigned
char>::digits it can be done portably; however, "legally" speaking (we
are language lawyers after all) there is no guarantee that integers have
no padding bits on some system. If you use the above-mentioned
check and find that there are no padding bits, then you
can zero everything on integer types, and this scheme is portable.
However, such a check implies that an alternative will be used
when the check "returns false"

It doesn't necessarily mean an alternative would be used; the simplest
way to guarantee the absence of UB is to have the program not compile when padding
bits are present; then from the requirements of the standard we have
value representation = object representation = all bits 0 for value 0.

However, templating can be used to select at compile time the most
efficient method, depending on whether pad bits exist or not.

On the third hand, I think the probability of that template selecting a
std::fill would be so near zero as to be practically equivalent to zero.
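
The compile-time selection mentioned above could be sketched with tag dispatch as follows. This is an illustration under assumptions, not anyone's actual code: it needs C++11, and the names `no_padding_bits`, `zero_array` and `zero_array_impl` are hypothetical.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <cstring>
#include <limits>
#include <type_traits>

// Hypothetical trait: true for integer types whose object representation
// contains no padding bits (value bits plus sign bit fill the whole object).
template <typename T>
constexpr bool no_padding_bits() {
    return std::is_integral<T>::value &&
           std::numeric_limits<T>::digits +
                   (std::numeric_limits<T>::is_signed ? 1 : 0) ==
               static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Selected when T has no padding bits: all-bits-zero is the value 0.
template <typename T>
void zero_array_impl(T* first, std::size_t n, std::true_type) {
    std::memset(first, 0, n * sizeof(T));
}

// Fallback for every other type: element-wise assignment.
template <typename T>
void zero_array_impl(T* first, std::size_t n, std::false_type) {
    std::fill(first, first + n, T());
}

// Zero n elements starting at first; the method is chosen at compile time.
template <typename T>
void zero_array(T* first, std::size_t n) {
    zero_array_impl(first, n,
                    std::integral_constant<bool, no_padding_bits<T>()>());
}
```

With this sketch, `zero_array` on an `int` array would take the memset path on common platforms, while a `double` array always goes through `std::fill`.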

so why not use fill() family in the first place?

Well, my position is that the added safety etc. of std::fill in general
far outweighs the possible (and in practice more than possible) higher
efficiency of a memset, so that std::fill would nearly always be my
choice; I think we're in agreement there.
 
Ioannis Vranos

Alf said:
Well, my position is that the added safety etc. of std::fill in general
far outweighs the possible (and in practice more than possible) higher
efficiency of a memset, so that std::fill would nearly always be my
choice; I think we're in agreement there.


memset() is not guaranteed to be more efficient than fill()
anyway, since it may also use a loop internally to assign values to bytes
as unsigned chars.

I had said, we should use fill() family unless we cannot do otherwise.
With that I meant that when the use of fill() raises performance
concerns while the use of memset() yields significant benefits, then we
should use memset().
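
For reference, resetting an array with the fill() family might look like the sketch below. The helper name `reset` is made up for illustration; the only library call it relies on is `std::fill`.

```cpp
#include <algorithm>
#include <cstddef>

// Reset every element of a built-in array to its zero value by assignment.
// T() is 0 for arithmetic types and a null pointer for pointer types.
template <typename T, std::size_t N>
void reset(T (&arr)[N]) {
    std::fill(arr, arr + N, T());
}
```

Because this assigns values rather than writing bytes, it also does the right thing for pointers and floating-point elements, whose zero value is not required to be all-bits-zero.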






Regards,

Ioannis Vranos
 
Ioannis Vranos

Ioannis Vranos wrote:

memset() is not guaranteed to be more efficient than fill()
anyway, since it may also use a loop internally to assign values to bytes
as unsigned chars.


which would be slower than assigning int types with 0 using fill(), by
the way.

I had said, we should use fill() family unless we cannot do otherwise.
With that I meant that when the use of fill() raises performance
concerns while the use of memset() yields significant benefits, then we
should use memset().






Regards,

Ioannis Vranos
 
JKop

Alf P. Steinbach posted:

In short it boils down to "is undefined behavior" versus "can be
undefined behavior". In the case of padding bits accessible to the
program it is undefined behavior. In the more general case of integer
types in C++ it isn't necessarily undefined behavior, and the
possibility of UB is only on antiquated machinery for which I'm not
even sure that C++ compilers exist, and furthermore that remote, purely
academic possibility can be avoided by a simple compile time assertion.

The only problem I myself can see with messing with padding bits is the
following:

struct Poo
{
    char a;
    char b;
    char c;
    long d;
};


If that long in there has to be on a 4-byte boundary or whatever, then in
memory it'll look like so:

 __________
|          |
|    a     |
|__________|
|          |
|    b     |
|__________|
|          |
|    c     |
|__________|
|          |
| padding  |
|__________|
|          |
|    d     |
|__________|


From looking at that, it may seem harmless to alter the padding bits. But
consider if you had the following:

int main()
{
    Poo poo;

    char kkar;
}


The system could very well stick kkar into the vacant space:

 __________
|          |
|    a     |
|__________|
|          |
|    b     |
|__________|
|          |
|    c     |
|__________|
|          |
|   kkar   |
|__________|
|          |
|    d     |
|__________|


Can that happen?

Other than that, I see no reason for not messing with padding bits.


-JKop
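
As far as the standard's object model goes, that should not happen: kkar is a separate complete object, so it occupies storage disjoint from poo, and sizeof(Poo) already counts the padding as part of poo. The sketch below is one way to inspect this on a given platform. The helpers `lies_within` and `padding_before_d` are hypothetical names, and the use of `std::uintptr_t` (an optional type, though available on mainstream compilers) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

struct Poo {
    char a;
    char b;
    char c;
    long d;
};

// Hypothetical helper: does address p fall inside the storage of obj?
template <typename T>
bool lies_within(const void* p, const T& obj) {
    const std::uintptr_t addr  = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(&obj);
    return addr >= begin && addr < begin + sizeof(T);
}

// Number of padding bytes between c and d (may be zero on some platforms).
inline std::size_t padding_before_d() {
    return offsetof(Poo, d) - (offsetof(Poo, c) + sizeof(char));
}
```

Given `Poo poo; char kkar;` in the same function, `lies_within(&kkar, poo)` is expected to be false, whereas `lies_within(&poo.d, poo)` is true.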
 
