size_t or int for malloc-type functions?

Richard Bos

jacob navia said:
Randy Howard wrote:

Not at all. Please just look at what the proposal really was
before answering fantasy proposals.

The proposal was discussing the idea of using a signed
type to avoid problems with small negative numbers
that get translated into huge unsigned ones.

And that's where it breaks down. You see, where are you going to _get_
these small negative numbers? Are you going to get a negative
multiplicand from sizeof? No, because that's defined as giving a
positive number under all circumstances. Is your programmer going to
specify a negative number of objects? Hardly likely. That would be a
blunder of the first order.
So whence the negative number? Probably, one supposes, from multiplying
two largeish positive numbers and getting a signed integer overflow. Ah,
but! But signed integer overflow causes undefined behaviour. So the
error is not trying to allocate a negative number of bytes, the error is
computing the negative in the first place, and it's an error that is
allowed to be fatal and cannot reliably be caught.
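To make that concrete, here is a sketch of the hazard, with made-up
numbers and a hypothetical function name, assuming 32-bit int; this is
exactly the code one should not write:

#include <stdlib.h>

/* bad_alloc is a hypothetical example of the mistake described above.
   Assuming 32-bit int, 70000 * 70000 is about 4.9e9, which does not
   fit; the multiplication is undefined behaviour, and the program is
   not guaranteed ever to reach the test below. */
void *bad_alloc(void)
{
    int rows = 70000;
    int cols = 70000;
    int bytes = rows * cols;   /* signed overflow: undefined behaviour */

    if (bytes < 0)             /* too late: the UB has already happened */
        return NULL;
    return malloc(bytes);
}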

Of course, there _is_ an easy way to stop the undefined behaviour. That
way is not to use signed integers for sizes in the first place.
Multiplying an unsigned integer by an (unsigned) size_t gives you
another unsigned integer. The multiplication cannot overflow, and cannot
cause UB. It _can_ wrap around, but that error is fairly easy to detect;
the way to do this is left as an exercise to the reader, but should not
elude any first-year student of C.
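For the record, one common form of that exercise (malloc_array is a
hypothetical wrapper, not a standard function):

#include <stdint.h>
#include <stdlib.h>

/* Multiply-then-allocate with a wrap-around check.  If nmemb * size
   would exceed SIZE_MAX, the unsigned product would wrap; we detect
   that case up front and refuse the request. */
void *malloc_array(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                    /* product would wrap around */
    return malloc(nmemb * size);
}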

So, by suggesting that instead of the unsigned size_t, we should use
signed int or ssize_t, you are effectively advocating replacing a safe
method of handling malloc() in which overly large sizes are easily
spotted, by an unsafe method in which overly large numbers cause
untrappable errors which can only be caught after the damage has already
been done, and in which the program may crash before you even get to
check whether the result is negative at all. Is that wise? Seems to me
that it's not.

Richard
 
kuyper

CBFalconer said:
IIRC a ring defines a set of objects that are members of the ring,
and a set of operations on those objects, such that <m1 operation
m2> yields a member of the ring. unsigned objects and the
operations +, -, and * meet this definition. The operation / does
not. Exponentiation does. Many bits have been dropped in my
memory, however.

The last time I studied mathematical rings was about 30 years ago; a
lot of the bits have dropped from my memory too. I don't remember how
the mathematicians decided to handle division for rings. I think that
they declared that rings are not closed under division. However, for
every case where m1/m2 gives a value that is also a member of a ring,
the C division operator gives that same member. Mathematics is
sufficiently flexible that I'm sure that every C type has a
corresponding mathematical construct which exactly matches its
behavior, but no C type is an exact match to any simple mathematical
construct. A ring is the mathematical concept that is the closest
simple match to the behavior of C unsigned types.
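To illustrate that correspondence: unsigned arithmetic in C behaves as
arithmetic modulo UINT_MAX + 1, so every result lands back in range.
A small demonstration (nothing here is implementation-specific):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned a = UINT_MAX;      /* the largest element of the "ring" */

    printf("%u\n", a + 1u);     /* wraps to 0 */
    printf("%u\n", 0u - 1u);    /* wraps to UINT_MAX */
    printf("%u\n", a * a);      /* (UINT_MAX)^2 mod (UINT_MAX+1) == 1 */
    return 0;
}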
 
kuyper

av wrote:
,,,,
your dear c standard is wrong in the definition on size_t

is there someone agree with me?

You use too little English to explain what it is you're talking about,
and the English you do provide has such sloppy grammar and punctuation
that I can't figure out what it is that you're trying to say.

In order to validly argue that a definition of a term is wrong, you
must reference a differing more authoritative definition. In this case,
the authoritative definition of size_t is the one provided by the C
standard - there is no higher authority you can refer to, to justify
calling the definition wrong. The standard's definition might be poorly
written, unreadable, useless, meaningless, internally inconsistent,
inconsistent with other parts of the standard, inconsistent with some
other standard, unimplementable, or it might possess any of a wide
variety of other negative characteristics. But since the C standard is
the relevant authority, its definition of size_t is inherently incapable
of being wrong.

So - which of those other negative characteristics describes the
problem you're complaining about?
 
kuyper

Mark said:
No, thats just how you typed it. As far as malloc is concerned, you
passed UINT_MAX.

The argument passed to malloc is, as he said, -1. The parameter value
received by malloc() is SIZE_MAX, which might or might not be the same
as UINT_MAX.

Still, the result is the same: you can pass a negative value to
malloc(), but malloc() can't receive it.
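For illustration, the conversion in question, as a two-line program
(assuming C99's %zu specifier):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    size_t s = (size_t)-1;   /* the conversion performed for malloc(-1) */
    printf("%zu\n", s);      /* prints SIZE_MAX */
    return 0;
}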
 
Ben Pfaff

CBFalconer said:
Ben said:
Keith Thompson said:
void *calloc(size_t n, size_t s)
{
    long long siz = (long long)n * (long long)s;
    void *result;

    if (siz>>32)
        return 0;
    result = malloc((size_t)siz);
    if (result)
        memset(result, 0, (size_t)siz);
    return result;
}
[...]

I wonder if "siz&~0xFFFFFFFF" might be marginally more efficient than
"siz>>32". [...]

I'd recommend "siz > SIZE_MAX" as being both clear and portable.

I don't think so. Ignoring the casts, maybe

if (!(SIZE_MAX - siz)) thingsarebad();
else carryon();

I seriously doubt that most machines can generate a value larger
than SIZE_MAX.

I don't understand your objection. You believe that the product
of one size_t and another size_t, both converted to long long,
cannot be larger than SIZE_MAX? (Jacob has postulated 32-bit
size_t, by the way.)

I don't understand why Jacob didn't use unsigned long long, by
the way. It would seem a more straightforward choice.
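For illustration, a sketch along those lines using unsigned long long,
under the thread's assumption of 32-bit size_t and 64-bit long long
(my_calloc is a stand-in name, since ordinary user code may not legally
redefine calloc):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

void *my_calloc(size_t n, size_t s)
{
    /* (2^32 - 1)^2 < 2^64, so this product cannot overflow, and
       unsigned arithmetic has no undefined behaviour in any case. */
    unsigned long long siz = (unsigned long long)n * s;
    void *result;

    if (siz > SIZE_MAX)
        return NULL;               /* request too big to represent */
    result = malloc((size_t)siz);
    if (result != NULL)
        memset(result, 0, (size_t)siz);
    return result;
}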
 
Stephen Sprunk

Old Wolf said:
Firstly, systems might exist where you can allocate more memory
than SIZE_MAX.

Not portably and in a single object. size_t is _defined_ to be able to
hold the size of the largest possible object. That, of course, doesn't
exclude systems where you can allocate SIZE_MAX bytes multiple times
(e.g. MS DOS) or where there is some other (i.e. non-portable)
allocator.
That aside, wouldn't it be more sensible behaviour for malloc to
return NULL or take some other action when you try to request an
object bigger than the system can provide, rather than returning
a smaller object than requested? I think this is Navia's point.

malloc() has no way of knowing what you _tried_ to request; it only
knows what value actually showed up in its argument. If that doesn't
match what you thought you were passing, there's no way for malloc() to
know.

If malloc() doesn't know that you tried to request SIZE_MAX+N bytes,
then how is it supposed to know to respond to that case?
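A short demonstration that the wrapping happens before malloc() is ever
entered (assuming the usual modular size_t arithmetic):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = SIZE_MAX / 2 + 2;   /* "I want n elements of 2 bytes" */
    size_t total = n * 2;          /* wraps around: total == 2 */

    /* malloc() sees only the wrapped value; the intended request of
       SIZE_MAX + 3 bytes is gone before the call happens. */
    char *p = malloc(total);
    printf("size malloc saw: %zu\n", total);
    free(p);
    return 0;
}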

S
 
Stephen Sprunk

Randy Howard said:
Actually it's usually 2GB, due to splitting of address space between
the kernel and user space. Some go as high as 3GB with special boot
options.

RedHat has a special Linux kernel that gives just under 4GB of user
address space; a bit of kernel space is still required to keep syscalls
working, but it's pretty small. It's mainly used by database folks, who
should be moving to AMD64 now anyway (with its 2^51 bytes of user
space, currently).
However, there are hacks (outside of malloc) that allow for
what Intel calls "Physical Address Extension" (PAE) to allow
systems with Intel 32-bit processors to see memory above 4GB, sort
of like the old extended/expanded memory hacks in the DOS days.
Again, proprietary, and different APIs to use the memory from what
standard C supports.

A single process will never see more than 32 bits of memory at a time,
since both the virtual and linear address spaces are limited to that
size. What PAE does is allow the OS to map those 32 bits of per-process
linear address space into 36 bits of physical address space. That means
you still can't have more than 4GB in a single app, but you could run
sixteen apps that each have their own 4GB without conflict.

Some OSes may provide a way for processes to "see" different parts of
the 36-bit space at various times, similar to how EMS allowed 16-bit
apps to "see" different 1MB chunks of a 16MB physical address space.
Obviously that requires a lot of non-portable trickery to ensure the
right memory chunk is in place when you dereference a pointer.

S
 
Kenny McCormack

av said:
who is the troll?

The trolls (code for "people who speak truth (not claptrap)") are:

1) You
2) Me
3) Jacob
4) Frederick
5) Old Wolf

And growing. Applications for membership are always accepted.
 
Keith Thompson

CBFalconer said:
Bad. long longs can overflow, leading to undefined behaviour. No
guarantee you ever get to testing the product. Casts are always
suspicious.

I've already commented on this code. *If* you happen to know that
LLONG_MAX >= SIZE_MAX*SIZE_MAX, then no overflow is possible. You
can't assume that in portable code, but C runtime code needn't be
portable; it's free to depend on any implementation-specific behavior.

I agree that the casts should be removed. In this case, though, they
happen to be harmless, as long as declarations for malloc() and
memset() are visible.
 
Cesar Rabak

jacob navia wrote:
Ben Pfaff wrote: [snipped]
Here is part of what the Rationale says about integer overflow.
I believe that it supports my position:

The keyword unsigned is something of a misnomer, suggesting
as it does in arithmetic that it is non-negative but
capable of overflow. The semantics of the C type unsigned
is that of modulus, or wrap-around, arithmetic for which
overflow has no meaning. The result of an unsigned
arithmetic operation is thus always defined, whereas the
result of a signed operation may be undefined.

Yes, I know that, and I agree that the semantics of unsigned are
wrap-around. What I am saying is that when "the result cannot be
represented" and this wrap-around semantics reduces the result,
the reduced result is mathematically WRONG in the sense of the USUAL
multiplication operation.

PHEW!!!!

Specifically, when I use the malloc(n * sizeof *p) "idiom",
even if the semantics are well defined, this is NOT what I
intended with that multiplication!!!!

There is no point in throwing standards texts at me, because I am not
questioning them. I am just saying that these "results that cannot be
represented" lead to a wrong result being passed to malloc!!!

I just can't understand why it is impossible to agree on something
so evident!!!

I think the root is the difference in perspective.

There is a Standard that documents this behaviour pretty well. So it is
a limitation the user of the library has to live with.

OTOH, we have an interpretation of this as problematic, since it may not
match the "expectation" of a user not completely aware of this (well
described) way the wrapping works.

So, jacob, the question is not the 'evidence' but the _interpretation_:
people used to the Standard see this as business as usual, and you see
it as an opportunity for improvement...

my 0.01999...
 
Cesar Rabak

Kenny McCormack wrote:
av said:
Besides which av is a troll.
who is the troll?

The trolls (code for "people who speak truth (not claptrap)") are:

1) You
2) Me
3) Jacob
4) Frederick
5) Old Wolf

And growing. Applications for membership are always accepted.

...and will this year's applicants be admitted free of fees? :-D
 
Keith Thompson

CBFalconer said:
Ben said:
Keith Thompson said:
void *calloc(size_t n, size_t s)
{
    long long siz = (long long)n * (long long)s;
    void *result;

    if (siz>>32)
        return 0;
    result = malloc((size_t)siz);
    if (result)
        memset(result, 0, (size_t)siz);
    return result;
}
[...]

I wonder if "siz&~0xFFFFFFFF" might be marginally more efficient than
"siz>>32". [...]

I'd recommend "siz > SIZE_MAX" as being both clear and portable.

I don't think so. Ignoring the casts, maybe

if (!(SIZE_MAX - siz)) thingsarebad();
else carryon();

I seriously doubt that most machines can generate a value larger
than SIZE_MAX.

The code assumes 32-bit size_t and 64-bit long long. "siz > SIZE_MAX"
should work correctly given those assumptions.
 
Keith Thompson

Stephen Sprunk said:
Not portably and in a single object. size_t is _defined_ to be able
to hold the size of the largest possible object. That, of course,
doesn't exclude systems where you can allocate SIZE_MAX bytes multiple
times (e.g. MS DOS) or where there is some other (i.e. non-portable)
allocator.
[...]

That's not quite correct. The standard merely defines size_t as "the
unsigned integer type of the result of the sizeof operator" (C99
7.17p2). There are objects to which you can't apply the sizeof
operator, for example objects created by the *alloc() functions.
calloc() in particular can be used to *request* the creation of an
object bigger than SIZE_MAX bytes (though an implementation is likely
to reject all such requests).
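For instance (whether NULL comes back is up to the implementation, but
rejection is the likely outcome):

#include <stdint.h>
#include <stdlib.h>

int main(void)
{
    /* A request for SIZE_MAX * 2 bytes: larger than any size_t can
       describe, yet expressible through calloc's two parameters. */
    void *p = calloc(SIZE_MAX, 2);

    /* p is almost certainly NULL; free(NULL) is harmless either way. */
    free(p);
    return 0;
}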
 
Mark McIntyre

No, the argument to malloc is the expression -1, which is clearly of type
int, and is equally clearly negative!

The compiler doesn't read "-1" though, does it? It's an expression of
type int, whose value is represented by some bits which, when regarded
as a signed int, equal -1, and when regarded as an unsigned long, make
up (say) 0xFFFF.
Imagine if you'd passed in 'a'.
As far as malloc is concerned, it
*receives* a parameter with the value (size_t)-1, which is a (very)
positive value.

No, it receives a set of bits in some memory address or register, and
interprets them as a size_t.

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
pete

Mark said:
No, it receives a set of bits in some memory address or register, and
interprets them as a size_t.

As far as malloc is concerned,
it *initializes* a parameter with the value (size_t)-1,
which is a (very) positive value.
 
Joe Wright

CBFalconer said:
jacob navia wrote:
... snip ...

Yes, if done with unsigned shorts, i.e. any unsigned 16 bit type.
That's what the standard prescribes. Have you ever bothered to
read the standard? Without doing so how can you dare to modify the
lcc compiler?
No, it's done with 32-bit ints. Although 65521 fits in 16 bits, 65552
requires 17 bits. The product of the two wants 33 bits.

     65521  int
     65552  int
     65296  int (truncated)
4295032592  long long

...and the corresponding binary:

00000000 00000000 11111111 11110001
00000000 00000001 00000000 00010000
00000000 00000000 11111111 00010000
00000000 00000000 00000000 00000001 00000000 00000000 11111111 00010000

(4295032592 is 2^32 + 65296, so truncating to 32 bits leaves 65296.)
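A small sketch reproducing those numbers (using unsigned operands so
the wrapped 32-bit product stays well defined; assumes 32-bit int and
unsigned, and 64-bit long long):

#include <stdio.h>

int main(void)
{
    unsigned a = 65521, b = 65552;

    printf("%u\n", a * b);               /* 65296: truncated to 32 bits */
    printf("%lld\n", (long long)a * b);  /* 4295032592: the full product */
    return 0;
}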
 
CBFalconer

av wrote:
,,,,

You use too little English to explain what it is you're talking
about, and the English you do provide has such sloppy grammar and
punctuation that I can't figure out what it is that you're trying
to say.

av is a troll. Ignore it.
 
Peter Nilsson

...
The last time I studied mathematical rings was about 30 years ago; a
lot of the bits have dropped from my memory too. I don't remember how
the mathematicians decided to handle division for rings.

They invented Fields. But note that the mathematical notion of
multiplicative inverses in Fields does not correspond to integer
division with rounding.
I think that they declared that rings are not closed under division.

Division by zero is usually excluded. So division is rarely closed
in any case.
However, for
every case where m1/m2 gives a value that is also a member of a ring,
the C division operator gives that same member.

That's too strong a statement. It is trivial to generate Rings whose
elements are normal integers, but whose addition and multiplication
functions are entirely different from the ordinary arithmetic of
integers. The elements are really just labels; it's the function
mappings that determine what is a ring. C's unsigned arithmetic uses
particular mappings that are just one example.
...A ring is the mathematical concept that is the closest
simple match to the behavior of C unsigned types.

Yes, but it's unnecessary to consider rings in general in order to
understand unsigned integers. In fact, most mathematical texts explain
modular arithmetic first, to give a grounding example in preparation
for later discussion of groups and rings in more formal terms.

The important part is the notion that in the abstract, the operators
+ and * are just function mappings, not 'computations'. In other
words, + and * are in essence just pure lookup tables.

For unsigned integers, the mapping/table is 'onto', that is, for every
pair (a,b) there exists a corresponding element. Signed integer
operators are not onto since there are pairs that do not have
mappings. Because of that, C's signed integer arithmetic is not
a ring. However, the typical two's complement implementation
completes the mapping and does form a ring.

The standard explains the completeness of the unsigned mapping
in terms of wrap-around. But that explanation is really just a
convenient definition. Hardware implementers certainly don't think
in terms of it, they don't need to. All that is important is what the
definition implies. Modulo arithmetic has some particularly useful
properties, and it all stems from that simple definition.

Unfortunately, C's definition directly introduces the concept of an
intermediary result that needs to be force-fitted back into the
appropriate range. In other parts of the standard, particularly
floating point, it's called a mathematical result. But an intermediary
result is the way most people actually think of and understand modular
arithmetic.

But as is often the case in clc, neither the original post nor the
spin-off debate is anything new... [may need unwrapping...]

http://groups.google.com/group/comp...5dec2cfa882f7962?rnum=11#doc_0bbe47fcef0f83f7
 
Peter Nilsson

That's an 'all the world is Intel' view. The standard is more general
but otherwise quite explicit.
As far as malloc is concerned,
it *initializes* a parameter with the value (size_t)-1,
which is a (very) positive value.

The effect is the same, but to nitpick, the standard says assignment,
not initialisation: 6.5.2.2p4

An argument may be an expression of any object type. In preparing for
the call to a function, the arguments are evaluated, and each parameter
is assigned the value of the corresponding argument.

So malloc doesn't do anything because its parameters already have
their values when it is actually called. This is indeed how most
calling conventions work. The code for the function definition assumes
that the values have already been assigned. The only way this can
happen (efficiently) in the case where conversions are necessary is
if the assignments occur prior to the call.

It is for this reason that even non-variadic functions can require a
prototype in scope before the call in order for the function to be
called correctly.

Functions taking size_t parameters are candidates; malloc is one of
them. Although -1 is unlikely in real code, I have certainly seen
dynamic character array allocations where the argument is not a
size_t. Such calls require a prototype to be in place in order that the
correct conversion be applied.
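A sketch of that scenario (make_buffer is a hypothetical example):

#include <stdlib.h>   /* brings the prototype for malloc into scope */

char *make_buffer(int len)
{
    /* Because the prototype is visible, 'len' is converted to size_t
       as if by assignment before malloc() receives it.  Without a
       prototype in scope, the int argument would not be converted,
       and the call could go wrong on any implementation where int
       and size_t differ in size or representation. */
    return malloc(len);
}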
 
