sorting the input

C

Chris Torek

so, <char** cp> can be stored in <void* vp>.
Yes.

I thought a ** (pointer to pointer) needs a ** type to store

Ideally, given a value of some type T, the best place to store
it is an object declared with that type. Given:

typedef /* insert some data type here */ T;
/*
eg,
typedef double T;
or typedef short T;
or typedef char **T;
or typedef struct foo *T;
or even
typedef void (*T)(void);
*/

you would then do:

T var;
...
var = value;

to store the value of type T in a variable of type T.

Sometimes, however, we have -- for whatever reason -- a variable
(or other object, if "variable" refers only to named objects)
with the "wrong" type. For instance:

double var;
...
var = some_integer_valued_function();

What is required is that the object into which we save this value
be "sufficiently capacious" to store any possible value. So
this question:
but here we are storing a ** into a single *. are they of same size ?

is not quite aimed correctly. If they were the same size, that
might be good enough, because at least the "place we are saving
the value" has enough *bits* to store the value. But "same size"
is not really good enough -- an int and a float are often the
same size:

#include <stdio.h>
int main(void) {
printf("%d %d\n", (int)sizeof(int), (int)sizeof(float));
return 0;
}

This will print "4 4" on many machines (I have one here that does
.... well, actually, I have quite a few, but will use just one of
them). But we cannot store just any float in an int, nor vice
versa:

#include <stdio.h>
#include <limits.h>
int main(void) {
float f;
int i;

f = 2109876543;
i = 3.5;
printf("%d (should be %d); %f (should be %f)\n",
(int)f, 2109876541, (double)i, 3.5);
return 0;
}

On my machine, I get:

% ./t
2109876480 (should be 2109876541); 3.000000 (should be 3.500000)

So we see that on my machine, even though int and float are the
same size, they cannot store all the same values. The "int" variable
loses the fraction, while the "float" variable sometimes loses the
last few digits of any "too-big" integer (certain powers of two
are saved correctly, but other values are rounded to the nearest
power of two).

If we were to store "int"s in "double"s, though, at least on this
particular machine, all "int" values would fit without rounding
problems. So a "double" suffices to store any "int" value: it has
enough bits *and* it never "corrupts" any of the original bits.
Even though assigning INT_MAX to a "double" changes the set of bits
stored (see <http://web.torek.net/torek/c/numbers.html> for a
discussion of representations), we can later convert the double
back to an int, and we always get the original number back.

What happened back in the 1980s was that the ANSI C committee --
the people involved in X3J11 -- were at some meeting(s), and had
some discussion(s), in which various members said they wanted to
come up with a new data type. This new type should be "big enough"
to store any valid data-pointer type.

On most machines, all data pointer types are the same, but on a
few -- especially back then -- there were some that had more than
one "kind" of pointer, having the bits moved around (byte vs word
pointers) on some machines, or even different sizes of pointers on
others. But even on these "oddball" machines, there was always
some way to store the "worst case" pointers. Even if the compiler
might have to have a special struct or union type internally, there
would be *some* way to gather up all the information from any
"specific" pointer type -- a "T *" for some data-type T -- into a
"generic" form.

What the X3J11 (ANSI C) committee folks needed to do, then, was
require implementations to provide a "generic data pointer" type,
along with pairs of chunks of implementation-specific code. One
chunk of each pair would take a *specific* pointer value, gather
up everything about it, and store it into this "generic pointer".
The other chunk would take one of these "generic pointers", having
been constructed earlier from some specific pointer, and convert
it back to the appropriate specfiic pointer.

For discussion, let us call the generic pointer type "genptr_t",
for the moment. The ANSI folks might even have considered using
a name like this (I was not at the meetings, I only read some
of the eventual paperwork, so I do not know if they did).

On machines with multiple "flavors" of pointers, these "chunks
of code" (as I am calling them here) were usually one pair of
instructions. For instance, a machine with both byte and word
pointers will usually be able to point to any individual byte
with a byte pointer, so it can use "byte pointer" for genptr_t.
Then, given a value of type "T *", the value itself is either
already a byte pointer, or is a word pointer. So:

T *orig;
genptr_t save_it;
... something that sets "orig" ...
save_it = (genptr_t)orig;

needs to use "convert word pointer to byte pointer" if "T *" is
a word pointer. If "T *" is already a byte pointer, it just
uses an ordinary "copy" instruction (often spelled "move", but
it copies, rather than emptying out the original :) ).

To restore the original value:

... code that modifies "orig" ...
orig = (T *)save_it;

we just need to reverse the process, either with another copy, or
with a "convert byte pointer to word pointer" instruction. Note
that the compiler did (and does) not have to know whether "save_it"
originally came from a byte or word pointer. You, the C programmer,
tell the compiler which kind to change it *to*. (In this particular
case, you supply the type with a cast.) It is *your* responsibility
to make sure that whatever value is in save_it "came from" a value
of the right type.

On machines with only one "flavor" of pointer, like the x86 for
instance, any pointer is already suitable for a genptr_t, so we
only need to use the machine's "copy value" ("mov") instruction.
So this makes writing a C compiler for these machines even
easier. All the "genptr_t" does is leave room for the oddball
machines: it costs nothing on the x86.

The weirdest part of this whole story, though, is the spelling that
the X3J11 committee folks came up with for "genptr_t". Instead of
having some <stdwhatever.h> file provide it, or having the compiler
pre-load it using that name, they decided to have the compiler
pre-load it, but using the spelling "void *".

So "void *" is really a genptr_t -- a generic pointer. This is
a very special pointer, in two ways:

- it is always big enough to hold *any* data pointer (not
necessarily a function pointer, though), and

- you can convert "void *" to and from "T *", where T is any
data type, without using a cast. (Other pointer conversions
require using casts.)

That second property is probably what convinced the committee
to spell the type "void *", making it built-in to the compilers.

Note, by the way, that "void *" is itself a data type, so one
can point to a "void *", giving a "void **". This is just a
regular old (non-generic) pointer type, though: it points
specifically to a "void *", never to any other type.
 
B

Barry Schwarz

Not by my reading of it. It seems entirely within the allowed
behaviour that the last line (i.e. the one that triggers the
end-of-file condition) might not leave a \n in the buffer. In fact,
both reading a newline and seeing the end of the file are listed as
reasons to stop input and null-terminate the string. Am I missing
something?

I think we are using the term buffer in two different senses. I used
buffer to mean the "array" pointed to by the first argument to fgets.
You appear to be using it to refer to the I/O buffer used to manage
the stream. In that case I agree with you. I make no claim as to the
contents of the stream buffer. What I do claim is that santosh's
assertion that two end of file sequences (whatever that is) can
prevent fgets from storing a '\n' in the array buffer would a
violation of the standard. Ignoring error conditions, if the number
of bytes read is at least two less than the second argument to fgets,
then fgets must store both '\n' and '\0' in sequence immediately
following the last byte read. Failure to do so is a violation of its
behavior mandate. The only time the '\n' is not stored is when the
number of bytes read is exactly one less than the quantity specified
in the second argument.


Remove del for email
 
B

Ben Bacarisse

Barry Schwarz said:
I think we are using the term buffer in two different senses. I used
buffer to mean the "array" pointed to by the first argument to fgets.
You appear to be using it to refer to the I/O buffer used to manage
the stream. In that case I agree with you. I make no claim as to the
contents of the stream buffer.

No, I am talking about the same buffer as you. I, too, make no claims
about any intermediate stream buffers.
What I do claim is that santosh's
assertion that two end of file sequences (whatever that is)

[It is just how some systems let you signal an end-of-file when the
preceding input is not a newline. Thus, on Linux, to give a program a
three-byte input stream consisting of 'a', 'b' and 'c' you would type
"abc^D^D" at it.]
can
prevent fgets from storing a '\n' in the array buffer would a
violation of the standard. Ignoring error conditions, if the number
of bytes read is at least two less than the second argument to fgets,
then fgets must store both '\n' and '\0' in sequence immediately
following the last byte read. Failure to do so is a violation of its
behavior mandate.

All I can say is that is not how I read that part of the standard.
The governing text seems to be section 7.19.7.2 para. 2:

"The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array."

which I can't read as mandating a newline being added to the buffer
unless one was there in the first place. The only place where there
may not be one is of course, by definition, the last line.

If you are right, and there are further restrictions that mandate a
newline character (space permitting) in the last successful return
from fgets, then gcc's fgets is not conforming (as are several other C
libraries I have used before but no longer have access to).

Of course, it is permissible for an implementation to require that the
last line of all text streams include a newline. So the behaviour you
say is required by the standard may be imposed by an implementation;
but this is "optional" and, in any case, would only apply to text
streams.
 
B

Ben Bacarisse

pete said:
arnuld wrote:

& address operator

&a[f()]? & suppresses the array-to-pointer conversion, and & and *
are specifically defined to "cancel out" without evaluation (so I
think &*NULL is allowed) but that is some way from not evaluating its
operand.
 
H

Harald van Dijk

I never expected this to be like this way. I am quite surprised at this
C feature. Are there any other C operators who do not evaluate their
operands ?

?: is designed specifically to not evaluate one of its operands (but does
evaluate one or two of the others).
 
B

Barry Schwarz

snip
What I do claim is that santosh's
assertion that two end of file sequences (whatever that is)

[It is just how some systems let you signal an end-of-file when the
preceding input is not a newline. Thus, on Linux, to give a program a
three-byte input stream consisting of 'a', 'b' and 'c' you would type
"abc^D^D" at it.]
can
prevent fgets from storing a '\n' in the array buffer would a
violation of the standard. Ignoring error conditions, if the number
of bytes read is at least two less than the second argument to fgets,
then fgets must store both '\n' and '\0' in sequence immediately
following the last byte read. Failure to do so is a violation of its
behavior mandate.

All I can say is that is not how I read that part of the standard.
The governing text seems to be section 7.19.7.2 para. 2:

"The fgets function reads at most one less than the number of
characters specified by n from the stream pointed to by stream into
the array pointed to by s. No additional characters are read after a
new-line character (which is retained) or after end-of-file. A null
character is written immediately after the last character read into
the array."

which I can't read as mandating a newline being added to the buffer
unless one was there in the first place. The only place where there
may not be one is of course, by definition, the last line.

You are correct. My apologies to both you and santosh. I have to
stop using the Cliff Notes which tend to leave out a few key words.
Thanks for the reference (though I should have found it myself before
I even started to reply). 30.


Remove del for email
 
B

Barry Schwarz

?: is designed specifically to not evaluate one of its operands (but does
evaluate one or two of the others).

OK, I'll bite. When can it evaluate only one operand?


Remove del for email
 
H

Harald van Dijk

OK, I'll bite. When can it evaluate only one operand?

When the evaluation of the first operand causes the program to terminate.
This is not specific to ?:.
 
A

arnuld

When you're sorting an array of strings, each element is either a
pointer to char or an array of char that effectively decays into a
pointer to char. The qsort function will pass to your comparison
function the address of that pointer - i.e. a pointer to pointer.


without confusing the idea I can simply say that an array of strings is
*always* an array of pointer to chars. ok ?

#include <string.h>
int p_strcmp(const void *pv1, const void *pv2) {
char * const * v1 = pv1;
char * const * v2 = pv2;
return strcmp(*v1, *v2);
}

that code is conformed from the FAQ too and this is what "Joe Wright"
writes:

int qcmp(const void *p1, const void *p2)
{
return strcmp(*(const char **)p1, *(const char **)p2);
}


I am really amazed why it is simply not:

*(char**)p1

rather than the confusing *(char* const*)p1 or *(const char**)p1 ?
 
A

arnuld

No, it's always an array of strings. It needn't be an array of pointers
to char, though:

char notanarrayofpointers[][6] =
{
"Apple",
"Busby",
"Carol",
"Dough",
"Earth",
"Flood",
"Grate",
"Hotel"
};

This is an array of strings, but not an array of pointers to char.


you mean it will not be represented as <arrays of pointers to chars>
like when I give it to a function as argument ?


That doesn't look right to me. const char ** means "pointer to a pointer
to const char", whereas what is needed is either a pointer to a const
pointer to char or a pointer to a const pointer to const char (either is
reasonable).


I checked K&R2, page 253, <strcmp> requires a <const char *> which is a
<pointer to a const char>. It does not require pointers to be a const
themselves.

In your case (or in FAQ's case), when we dereference a <pointer to a const
pointer to char> , we get a <const pointer to char> which is not required.


I got it or not ?


Note that my solution doesn't involve any confusing casts, for the
simple reason that it doesn't involve any casts.


yes :)
 
S

santosh

Richard said:
arnuld said:
without confusing the idea I can simply say that an array of strings
is
*always* an array of pointer to chars. ok ?

No, it's always an array of strings. It needn't be an array of
pointers to char, though:

char notanarrayofpointers[][6] =
{
"Apple",
"Busby",
"Carol",
"Dough",
"Earth",
"Flood",
"Grate",
"Hotel"
};

This is an array of strings, but not an array of pointers to char.

So is this an "array of string" or perhaps an array of type string?

char arr_of_str[] = "hello";

Or is the above, and your example, just arrays which happen to contain
strings. Since strings are not really types in C can we even say "...
of strings" at all?

<snip>
 
S

santosh

arnuld said:
On Mon, 28 Apr 2008 06:31:02 +0000, Richard Heathfield wrote:

No, it's always an array of strings. It needn't be an array of
pointers to char, though:

char notanarrayofpointers[][6] =
{
"Apple",
"Busby",
"Carol",
"Dough",
"Earth",
"Flood",
"Grate",
"Hotel"
};

This is an array of strings, but not an array of pointers to char.

you mean it will not be represented as <arrays of pointers to chars>
like when I give it to a function as argument ?

This declares a two-dimensional char array; an array N of array 6 of
char where N is determined by the compiler when is parses the
declaration. It notes that in this particular case eight initialisers
are used and hence the N is set as eight. Then each of the arrays from
0 to 7 are initialised by copying the respective string literals into
them.

<snip>
 
A

arnuld

Given the correct call foo(notanarrayofpointers), we can deduce that foo
takes as its parameter a char (*)[6], NOT a char **.

I did not know that passing a 2D array to a function results in <pointer
to whole array>. Thanks for telling me this C feature :)
 
D

Default User

arnuld said:
On Mon, 28 Apr 2008 07:25:53 +0000, Richard Heathfield wrote:

Given the correct call foo(notanarrayofpointers), we can deduce
that foo takes as its parameter a char (*)[6], NOT a char **.

I did not know that passing a 2D array to a function results in
<pointer to whole array>. Thanks for telling me this C feature :)

The implicit conversion for an array gives you a pointer to the first
element. The first element is an array of char of size 6, so the array
conversion is to a pointer to "array six of char". That's what Richard
showed you above.



Brian
 
K

Keith Thompson

arnuld said:
On Mon, 28 Apr 2008 07:25:53 +0000, Richard Heathfield wrote:
Given the correct call foo(notanarrayofpointers), we can deduce that foo
takes as its parameter a char (*)[6], NOT a char **.

I did not know that passing a 2D array to a function results in <pointer
to whole array>. Thanks for telling me this C feature :)

Not to the whole array, just to the first element.

This is nothing more than one case of the more general rule that an
expression of array type is, in most contexts, converted to a pointer
to its first element (the exceptions are when the array expression is
(a) the operand of a unary "&", (b) the operand of "sizeof", or (c) a
string literal in an initializer, used to initialize an array object).

A 2D array is nothing more than an array of arrays. Everything about
2D arrays (syntax, semantics, whatever) follows directly from the
rules for arrays in general.
 
D

David Thompson

When you're sorting an array of strings, each element is either a pointer
to char or an array of char that effectively decays into a pointer to
char. The qsort function will pass to your comparison function the address
of that pointer - i.e. a pointer to pointer.
If you have, as the OP did, array of pointer to string (which is
pointer to char, or often better const char) then the compare
arguments are actually pointer to pointer to char, disguised as
pointer to void, and this is indeed correct:
#include <string.h>
int p_strcmp(const void *pv1, const void *pv2)
{
char * const * v1 = pv1;
char * const * v2 = pv2;
return strcmp(*v1, *v2);
}

But if you have array of array of char, there is no decay. The compare
arguments are pointer to string disguised as void*, and CBF's compare
routine is right. In fact, with the 'same representation' requirement
for /*flavored?*/char* and void*, it should work to simply give strcmp
directly to qsort. (Although I would call this a bad habit to get
into.) Also if you have array of struct whose first field is array of
char, and is the sort key, which is a case I have encountered.

- formerly david.thompson1 || achar(64) || worldnet.att.net
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top