What is a type?

Tim Rentsch

jacob navia said:
Tim said:
Syntactic elements have [program] types. It's the syntactic element
'2' that has type int, not some runtime value 2, which as you point
out may not exist. Values and other runtime entities have a
representation type impressed on them by the code accessing the value
or entity at that particular time. For linguistic convenience we
sometimes say "this object is of type int" but what's meant is that
the memory is being interpreted as having the representation type that
corresponds to the program type int for that implementation.

If I understand you correctly, you make a difference between program
types that correspond to the abstract types defined in the C standard,
and representation types that are the product of applying those abstract
types to a specific machine architecture.

Yes, with two minor corrections. The term "abstract type" is generally
used to mean a different concept; any of the terms "program type", "C type",
or "standard C type" would be more appropriate. Also, representation types
exist independently; they aren't the "product of applying" the C types.
A more accurate way of saying it would be that, on a particular machine,
a compiler chooses a representation type that will correspond to each
C program type. There are also representation types that don't correspond
to any C program type (although most of these won't ever be used by
the compiler).

You agree that types are definitions of how to interpret a sequence of
bits in memory:

You left out "representation" in that sentence. Representation types
do determine how to interpret memory values, in the case of data
objects; it's a little dangerous to call memory values "a sequence
of bits", since part of the representation type would determine
in what order the memory units are processed. And, functions also
have a representation type, but in the case of functions the representation
type determines how they should be called, not how the memory storing
the function object code will be interpreted.

Using your terminology, my type definition would correspond to the
representation types.

I think that's right. That's a little bit dangerous, since it
encourages people to think in terms of the representations, which
will change from machine to machine, or from compiler to compiler.
Or even, in some cases, from compilation to compilation. That's why
it's also important to explain program types.

1) Abstract type: int, as defined by the C standard.

Again, the term "abstract type" shouldn't be used, since it's
standardly used in the literature to mean something very different.

2) Concrete representation type: int as a sequence of 32 bits as
implemented by the lcc-win32 compiler.

Similarly, the word "concrete" here should not be used, since it
means something different in the literature. The type 'int' is
a concrete type, as the term is standardly used. You might try
"architecture/compiler specific representation".

The standard constrains the possible representations of int (it should
have at least 16 bits for instance), and lcc-win32 implements that
abstract type by choosing a machine word length for the "int" concrete
representation.

Did I understand you correctly?

I believe so (reiterating my comments about not using "abstract" and
"concrete" for these purposes).

Thanks for your contribution, you make an interesting point here.

I think a similar wording as above could be very well within the reach
of a beginner.

You are most welcome. I think so too.
 
Keith Thompson

Tim Rentsch said:
I'm confident he understands the two ideas. I expect the names would
also make sense to him, perhaps without previous explanation, but
almost certainly after getting the descriptions posted recently in the
NG.

I admit to having chosen these particular names myself; the
concepts though have a much longer pedigree, appearing in the
literature starting in the early 1980's.

To be clear, the ideas are what's important (IMO) to understand. I
think the names are fairly good and reasonably evocative (if I do say
so myself), but the two distinct ideas are the important thing.

Personally, I find the term "representation type" confusing. It's
certainly important to distinguish between a "type" (as used in the
standard and in the abstract machine) and "representation" (as applied
to the hardware), but I don't find it useful to apply the term "type"
to the latter. FWIW, YMMV.
 
Tim Rentsch

Keith Thompson said:
Personally, I find the term "representation type" confusing. It's
certainly important to distinguish between a "type" (as used in the
standard and in the abstract machine) and "representation" (as applied
to the hardware), but I don't find it useful to apply the term "type"
to the latter. FWIW, YMMV.

I chose the term "representation type" because historically the word
"type" was used to mean both concepts, and I thought it might help
people who are used to thinking the two concepts are synonymous.
I agree though that a different term might be better. Anyone
have any suggestions?
 
Keith Thompson

Tim Rentsch said:
I chose the term "representation type" because historically the word
"type" was used to mean both concepts, and I thought it might help
people who are used to thinking the two concepts are synonymous.
I agree though that a different term might be better. Anyone
have any suggestions?

What about just "representation"?
 
jacob navia

Tim said:
So I see the "program type/representation type" distinction
as more fundamental than the distinction of qualified types
vs unqualified types.

You are right. I think the best example is enumeration types
that only exist at run time, and where the "underlying" type
is int. The types are different but the underlying representation
is the same.

The same holds for
int a;
and
struct { int a;} a;

Two types with exactly the same representation.

It will be a challenge to explain all these subtleties
in a tutorial, but somehow I find it necessary.

I think it can be done by proposing a rough approach
at the beginning, and refining it later.

But I am against writing yet another C for dummies. I
want to try to explain the complexity of C without
making people believe that they are learning yet another
version of BASIC or some other beginner's language.

C programmers have to know more details of the actual
hardware because C allows you to use that hardware
directly.

Hardware is represented in C as a sequential space of
addresses, where the data and the preprogrammed
(compiled) instruction sequences that act on that
data, and maybe on further inputs, are stored.

Building types and procedures is called "programming".

A type can be several things, starting with the simple
ones, passive data. This data is stored in integers,
the only thing that machines understand: bits.

A *type* for this kind of data means a coded algorithm
description for the usage of the data. Text is stored
using an alphabet, the most common alphabet being the
"ASCII" format. An alphabet is a common convention for
writing letters as integers. We say that 'A' will be 65
and be done with it. Other alphabets can be (and are) used
of course, C is not tied to ASCII.

The type char means then, that the data stored in
consecutive addresses is to be understood as a sentence
like:
"Please enter the amount"
that should be shown at default centered coordinates
with some button underneath and an edit field.

Integers can be used to store text, or colors if you
like: you make an alphabet and assign integers to the whole
color spectrum. The integers are interpreted by the screen
hardware as colors to be shown. Images can be stored as
integers that encode intensity levels and direct the
hardware to display the image.

Integers can store audio, as the many files floating
around will prove... Bach, Ravel, and many others can be
encoded in integers that represent a waveform at an
implicit frequency.

Integers can be used to encode other integers, so you
get formats like mp3 that take the voluminous sequence
of integers produced by the sampler and spit integers
again, but much less.

The type of the integers changes. It will be of no use
to the mp3 decoder to try to understand a photograph as
a song. Or maybe, we should hear what comes out of it
who knows...

In any case the type of the mp3 data is a song, not
a photograph, and if you display it as text (you can
do that one day to "see" a song) it will not be
meaningful either, the type is wrong.

All machines handle basically nothing else but bits.
To find our way we define types of data, i.e. we
ascribe a specific meaning to each bit of what we are
processing: a song, a photograph, some text, a number,
whatever.

Integers aren't everything: as everyone knows, 0.5 exists,
and it is not an integer. Well, that limitation doesn't really hold.

We can approximate real numbers by using two integers,
the mantissa and the exponent, and we can figure out
clever ways of adding those integer pairs (or floating
point numbers, see later in this tutorial).

Integers can be arbitrarily big, with today's hard disk
capacity storing an integer of 200 GB is possible.

Smaller integers can be handled with far fewer problems,
however, and for most applications, double precision is
already quite good.

Note that we "approximate" real numbers, never really
touching them. There is a quantization loss in the encoding,
and a whole series of problems implicit in the digital
nature of the encoding.

Some one-unit integers are called "chars" and they are
used to encode the alphabet, making the machine store text.

In C you can at any moment change your mind and start
interpreting the same bits in another way. You make a
cast operation, i.e. you apply to some address a new
type.

Several simple types can be combined into aggregates,
i.e. a related bunch of data like integers, character
strings, real numbers, etc. These aggregates can have
relationships between them, represented by pointers to
other aggregates. These are composite types (structures
or unions).

The data is handled in procedures, i.e. sequences of
instructions that receive some inputs, and produce some
output or modification of the program state. The type
of a procedure is strictly defined by the type of its
inputs and the type of the output (return value).

There is yet another kind of type, where you make
a distinction between two objects that have the same
representation. You can build, for instance, an "enumerated"
type, that is in fact an integer, but encodes special
meaning.

There are even types that you can't figure out at all:
opaque types. This type is, for instance:

struct unknown *bn;

and struct "unknown" is nowhere defined. Or even worse:

void *bn;

A pointer that points to an unknown object.
You can do only one thing with this pointer:
pass it around.

Usually you receive an opaque pointer from a library
that wants to hide the details of how they do their
stuff from you. This is good for you, since you are
using the library precisely because you do not want to
know a lot about it and just use it.

This allows the library writers too, to change all their
internal stuff without needing a change in all the
customer base. Opaque types are like firewalls. They limit the
growth of the interdependency between the several
parts that make a whole program.


Well, it will be something along these lines. Thanks for
the feedback.
 
jacob navia

jacob said:
You are right. I think the best example is enumeration types
that only exist at run time, and where the "underlying" type
is int. The types are different but the underlying representation
is the same.

That should obviously be compile time, not run time. What a blunder.
Enumerations exist only at compile time. At run time the
underlying type is used. The circuit doesn't have any
idea of enumerations.

I pressed the send button too soon.
 
Tim Rentsch

Keith Thompson said:
What about just "representation"?

If we use the term "representation" by itself, there's an ambiguity
about whether the notion under discussion is generic or specific.
Individual values have a 'representation'; storage that holds any of
a set of values has (or is accessed using) a 'representation type'.
("What's the representation for a NULL pointer?") Also, using
"representation" by itself doesn't work very well for functions; we
aren't interested in the particular bits that make up a function's
object code, but we are interested in what calling conventions are
necessary to call it. An unadorned "representation" tends to evoke
bit patterns more than it does calling sequences.

Certainly there is some precedent for using "representation" in these
kinds of discussions - "two's complement representation", for example.
For informal discussions it's probably fine. For more precise
descriptions, however, talking about the relationship between "types"
at compile-time and at run-time, "representation type" seems
both more accurate and more evocative.

[Still good to have gotten the suggestion - thank you.]

Other ideas? Surely there must be some...
 
Keith Thompson

jacob navia said:
You are right. I think the best example is enumeration types
that only exist at run time, and where the "underlying" type
is int. The types are different but the underlying representation
is the same.

(As you acknowledged later, that should be "only exist at compilation
time".)

This makes it sound like an enumeration type and type "int" are
fundamentally different things, that an enumeration type exists at
compilation time, but type "int" exists at run time. This is
misleading. The type "int", like an enumeration type, exists only at
compilation time. Rather than saying that the underlying type of an
enumeration type is int, it's more accurate to say that the
enumeration type and int have the same representation. (That
representation might be a machine word, for example.)

And, of course, the representation of an enumeration type may or may
not be the same as the representation of type int; it's up to the
implementation to choose an underlying representation that can hold
all the specified values.

As a rule of thumb, anything having to do with the C language exists
only in your source program or at compilation time, not at run time.
(That's not completely true, since a lot of the names overlap.)

In my opinion, it's best to use the term "type" only for things that
are types in C, not for entities like machine words that exist at run
time.

[...]
Hardware is represented in C as a sequential space of
addresses, where the data and the preprogrammed
(compiled) instruction sequences that act on that
data, and maybe on further inputs, are stored.

That assumes a particular runtime model, one not required by the C
standard. There isn't necessarily a single sequential address space.
Data and code could be in separate address spaces; for that matter,
each object (declared or created by malloc()) could be in a distinct
address space. The existence of pointer arithmetic implies a
sequential address space within a single object, but not across
objects. A function address could be anything that allows the
function to be called; it could easily be an index into a system table
rather than a machine-level address.

[...]
In C you can at any moment change your mind and start
interpreting the same bits in another way. You make a
cast operation, i.e. you apply to some address a new
type.

A cast operator specifies a type conversion. Not all such conversions
are simple reinterpretations of the bits. Conversions between integer
and floating-point types almost certainly do more than just copying
the bits; conversions between pointer types may or may not do so.
What you're talking about is type punning; converting addresses is one
of several ways to achieve that.

[...]
There are even types that you can't figure out at all:
opaque types. This type is, for instance:

struct unknown *bn;

and struct "unknown" is nowhere defined. Or even worse:

void *bn;

A pointer that points to an unknown object.
You can do only one thing with this pointer:
pass it around.

These are called incomplete types; it's best to keep your terminology
consistent with standard usage.

Incomplete types and opaque types are two different things, and an
opaque type needn't be an incomplete type. For example, the type FILE
in <stdio.h> is opaque as far as the programmer is concerned (the
standard says nothing about its contents), but I can see what's in it
(in many implementations) by viewing the appropriate header file.
 
Keith Thompson

Tim Rentsch said:
If we use the term "representation" by itself, there's an ambiguity
about whether the notion under discussion is generic or specific.
Individual values have a 'representation'; storage that holds any of
a set of values has (or is accessed using) a 'representation type'.
("What's the representation for a NULL pointer?")

In my opinion, what you're calling a "representation type" isn't a
type at all. You can talk about the representation of a type (32-bit
two's-complement), or the representation of a particular value
(hexadecimal DEADBEEF); as long as you're careful, I don't see much of
a problem using the same word for both. If you want to distinguish,
you might consider using a term like "type representation".

Also, using
"representation" by itself doesn't work very well for functions; we
aren't interested in the particular bits that make up a function's
object code, but we are interested in what calling conventions are
necessary to call it. An unadorned "representation" tends to evoke
bit patterns more than it does calling sequences.

I wouldn't talk about the "representation" of a function at all,
either of its object code or of its calling convention. Data items
have representations.
 
Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
Keith Thompson said:
[...]
Personally, I find the term "representation type" confusing. It's
certainly important to distinguish between a "type" (as used in the
standard and in the abstract machine) and "representation" (as applied
to the hardware), but I don't find it useful to apply the term "type"
to the latter. FWIW, YMMV.

I chose the term "representation type" because historically the word
"type" was used to mean both concepts, and I thought it might help
people who are used to thinking the two concepts are synonymous.
I agree though that a different term might be better. Anyone
have any suggestions?

What about just "representation"?

If we use the term "representation" by itself, there's an ambiguity
about whether the notion under discussion is generic or specific.
Individual values have a 'representation'; storage that holds any of
a set of values has (or is accessed using) a 'representation type'.
("What's the representation for a NULL pointer?")

In my opinion, what you're calling a "representation type" isn't a
type at all.

Wouldn't you say pointers that store just an address and pointers that
store an address and a length are different types of pointers?
Wouldn't you say that a 'cdecl' function and a 'stdcall' function are
different types of functions? Wouldn't you say a number stored in
host order and a number stored in network order are different types of
numbers (even if on the host in question values in the two orderings
always had the same representations)? It makes just as much sense to
say that there are different types of representations as it does to
say that there are different types of variables.

You can talk about the representation of a type (32-bit
two's-complement), or the representation of a particular value
(hexadecimal DEADBEEF); as long as you're careful, I don't see much of
a problem using the same word for both.

It's applicable to one but not the other. The word representation
means "likeness or image"; unless there is something stored somewhere
in the running program, such as a byte with a '4' in it and a rule like
"'4' means int", the phrase "the representation of a type" is a misuse
of language. That's a pretty good indicator that this path isn't
the right one to go down.

If you want to distinguish,
you might consider using a term like "type representation".

I don't mean "the representation of a type"; what I mean is "the type
of representation". It seems like "representation type" is a better
term for that.

Suggestions from other quarters have been "machine type",
"representational type", "implementation type", and "representation
schema". Are any of those less confusing or less misleading
than "representation type"?

I wouldn't talk about the "representation" of a function at all,
either of its object code or of its calling convention. Data items
have representations.

That you wouldn't use the same word for functions is a good indication
that it's not really the right term here. Both "data values" and
"function values" have differing patterns of implementation. What
we're trying to find is a term that captures and expresses the idea of
an "implementational pattern" - the same term should apply equally to
differences in function implementation as it does to differences in
data implementation.
 
Keith Thompson

Tim Rentsch said:
Keith Thompson said:
Tim Rentsch said:
Keith Thompson said:
[...]
In my opinion, what you're calling a "representation type" isn't a
type at all.

Wouldn't you say pointers that store just an address and pointers that
store an address and a length are different types of pointers?
Wouldn't you say that a 'cdecl' function and a 'stdcall' function are
different types of functions? Wouldn't you say a number stored in
host order and a number stored in network order are different types of
numbers (even if on the host in question values in the two orderings
always had the same representations)? It makes just as much sense to
say that there are different types of representations as it does to
say that there are different types of variables.

I wouldn't call those things "types" at all. A type is something that
exists in your program. Using the term "type", in a very similar
context, to refer to a different concept can only cause confusion.

On an implementation where int and long have the same representation
(say, both are 32-bit two's-complement), they are distinct types. A
pointer to int and a pointer to long are distinct types. I might call
pointers that store just an address and pointers that store an address
and a length different *kinds* of pointers.

It's applicable to one but not the other. The word representation
means "likeness or image"; unless there is something stored somewhere
in the running program, such as a byte with a '4' in it and a rule like
"'4' means int", the phrase "the representation of a type" is a misuse
of language. That's a pretty good indicator that this path isn't
the right one to go down.

I disagree; it's perfectly appropriate to refer to the representation
of a type. The standard does this.

[...]
I don't mean "the representation of a type"; what I mean is "the type
of representation". It seems like "representation type" is a better
term for that.

Suggestions from other quarters have been "machine type",
"representational type", "implementation type", and "representation
schema". Are any of those less confusing or less misleading
than "representation type"?

Just about anything that doesn't use the word "type" for something
that isn't a C type would be better than "representation type".

That you wouldn't use the same word for functions is a good indication
that it's not really the right term here. Both "data values" and
"function values" have differing patterns of implementation. What
we're trying to find is a term that captures and expresses the idea of
an "implementational pattern" - the same term should apply equally to
differences in function implementation as it does to differences in
data implementation.

That I wouldn't use the same word for functions indicates that
functions and data items are very different things. On the C level,
the word "type" applies to both. On the implementation level, there
is no term that applies to both (except perhaps something vague like
"entity").
 
jacob navia

Keith said:
(As you acknowledged later, that should be "only exist at compilation
time".)

This makes it sound like an enumeration type and type "int" are
fundamentally different things, that an enumeration type exists at
compilation time, but type "int" exists at run time. This is
misleading. The type "int", like an enumeration type, exists only at
compilation time.

Surely not. A glance at the instruction set of a common circuit
for instance (x86) will reveal that the circuit supports integer
operations in hardware. The type int is perfectly supported by
almost all CPUs around.

Types (and type associated concepts) exist in the hardware itself.

C allows for a clear mapping of those hardware types into language
types, but it is obvious that in all processors the type int is
supported...

Floating point can exist at run time too, obviously, and many
CPUs support the type double, float and long double.

These types exist in hardware.

Rather than saying that the underlying type of an
enumeration type is int, it's more accurate to say that the
enumeration type and int have the same representation. (That
representation might be a machine word, for example.)

Enumerated types can be used to encode sets for instance. But these
concepts are not in any instruction generated by the compiler.

The programmer writes:

if (current.flags & (digit|letter)) {
}

And the machine understands:

if (current.flags & 36)
And, of course, the representation of an enumeration type may or may
not be the same as the representation of type int; it's up to the
implementation to choose an underlying representation that can hold
all the specified values.

It would be surprising if it were floating point, however ...

As a rule of thumb, anything having to do with the C language exists
only in your source program or at compilation time, not at run time.
(That's not completely true, since a lot of the names overlap.)

The task of the compiler is just to translate the instructions written
by the programmer as faithfully as possible into machine instructions
that do exactly what was written.

Most types (fortunately) *exist* at run time in a quite real manner.
C exists at run time, sorry. For instance when you specify:

float d = (float) a;

and a was a double, the machine will shed precision by writing the
number into memory as a float and re-reading it.

The whole edifice of a programming language is the faithful copy
of program concepts into run-time objects.

In my opinion, it's best to use the term "type" only for things that
are types in C, not for entities like machine words that exist at run
time.

I have to disagree here. The machine should follow exactly the type
description specified in the source program.
[...]

Hardware is represented in C as a sequential space of
addresses, where the data and the preprogrammed
(compiled) instruction sequences that act on that
data, and maybe on further inputs, are stored.


That assumes a particular runtime model, one not required by the C
standard. There isn't necessarily a single sequential address space.

Sometimes. I remember the segmented model, and there are many CPUs that
have disjoint data/program areas, and even disjoint data areas of
different types (EPROM, RAM, disks, etc.). Within these segments of
memory addresses, a linear sequence exists. When I get a 120GB disk
I can write from byte zero to byte 120GB minus formatting overhead.
When I get a 128MB memory disk in RAM/USB the sequence is linear
again.

You yourself are written in a linear sequence of base-pairs, written
atom after atom in your DNA.

Data and code could be in separate address spaces; for that matter,
each object (declared or created by malloc()) could be in a distinct
address space. The existence of pointer arithmetic implies a
sequential address space within a single object, but not across
objects.

What is an object?
Is a character in a character string an object? Can we
imagine a character string where each character resides
in a different address space?

Weird isn't it? It wouldn't be handy.

A function address could be anything that allows the
function to be called; it could easily be an index into a system table
rather than a machine-level address.

In C
(FnTable[index])(arg1,arg2)
is different from
fnptr(arg1,arg2)

A function expression must resolve to a machine address. This way
it can be passed around as an integer very efficiently. Most of
the power of C comes from this facility, functions as simple integers.

An efficient way of passing a *lot* of context.
[...]

In C you can at any moment change your mind and start
interpreting the same bits in another way. You make a
cast operation, i.e. you apply to some address a new
type.


A cast operator specifies a type conversion. Not all such conversions
are simple reinterpretations of the bits. Conversions between integer
and floating-point types almost certainly do more than just copying
the bits; conversions between pointer types may or may not do so.
What you're talking about is type punning; converting addresses is one
of several ways to achieve that.

True, I was speaking about re-interpreting because I wanted to emphasize
that memory is interpreted by the program. With this I am introducing
the discussion about strongly typed/weakly typed languages, that I hope
to come later on. This facility of re-interpreting memory is absent or
much more difficult in several other languages. Like everything, this can
be handy if well used, or a nightmare if abused.
[...]

There are even types that you can't figure out at all:
opaque types. This type is, for instance:

struct unknown *bn;

and struct "unknown" is nowhere defined. Or even worse:

void *bn;

A pointer that points to an unknown object.
You can do only one thing with this pointer:
pass it around.


These are called incomplete types; it's best to keep your terminology
consistent with standard usage.

struct unknown * is incomplete, void * not. To a beginner, an
expression like void * must be utterly strange. I will explain
this more later on.

Incomplete types and opaque types are two different things, and an
opaque type needn't be an incomplete type. For example, the type FILE
in <stdio.h> is opaque as far as the programmer is concerned (the
standard says nothing about its contents), but I can see what's in it
(in many implementations) by viewing the appropriate header file.

If you can see the contents, then it's not opaque. Of course, any
non-opaque structure can be converted in an opaque one if you refuse
to look into it, but this would be playing with words. Normally, since
it is not specified in the standard it is better not to mess with it,
I agree with that, but with real opaque structures like void * that
is no longer possible. You can't use them, they enforce themselves
by definition.

Thanks for your feedback.
 
jacob navia

I think you make too sharp a separation between the language
and the concrete run time. I have a different opinion, but I
answered you two threads above this one.

In a few words again:

The crux of the matter is that the language is implemented at
run time by the compiler, that translates the types specified
in the program into run time types. The run time types exist
and they are the types specified in the program text.

C programs exist at run time with all their type machinery
active and running as specified.
 
Keith Thompson

jacob navia said:
Surely not. A glance at the instruction set of a common circuit
for instance (x86) will reveal that the circuit supports integer
operations in hardware. The type int is perfectly supported by
almost all CPUs around.

Types (and type associated concepts) exist in the hardware itself.

C allows for a clear mapping of those hardware types into language
types, but it is obvious that in all processors the type int is
supported...

You probably won't find the term "int" in a CPU reference manual.
"int" is a C type, not a machine-level concept. The corresponding
CPU-level concept is probably something like a "word".

Yes, of course the CPU supports integer operations (note the
relatively generic term "integer" rather than the C-specific term
"int"). It happens that C's type "int" is very likely to be mapped
almost directly to a machine "word", but they're still two different
concepts that exist in two different contexts. In C, "int" and "long"
are two distinct types, even if they're both the same size; in the
CPU, that distinction no longer exists.
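
A minimal sketch of that point, assuming a C11 compiler (for _Generic). TYPE_NAME and same_size_int_long are hypothetical names introduced only for this illustration. The compiler accepts separate "int" and "long" associations in one _Generic selection, which it could not do if the two were the same type, and it does so whether or not their sizes match:

```c
/* TYPE_NAME is a hypothetical helper macro: it names the C type that the
   compiler assigns to an expression. Listing both int and long is only
   legal because they are distinct, incompatible types. */
#define TYPE_NAME(x) _Generic((x), \
    int: "int",                    \
    long: "long",                  \
    default: "other")

/* Returns 1 when int and long happen to have the same size on this
   implementation, 0 otherwise. Either answer is allowed. */
int same_size_int_long(void)
{
    return sizeof(int) == sizeof(long);
}
```

TYPE_NAME(0) selects the "int" branch and TYPE_NAME(0L) the "long" branch, even on an implementation where same_size_int_long() returns 1.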
Floating point can exist at run time too, obviously, and many
CPUs support the types double, float and long double.

These types exist in hardware.

They exist *as types* only in a C program, either in source or during
compilation. They are mapped onto operations in the hardware. (And
some CPUs don't directly support floating point, but emulate it in
software.)

[...]
The task of the compiler is just to translate the instructions written
by the programmer as faithfully as possible into machine instructions
that do exactly what was written.

The task of the compiler is to *map* what the programmer wrote into
machine instructions. The nature of that mapping varies from one
compiler to another, and from one CPU to another. Types are a
high-level language concept. Pretending that the CPU-level concepts
are the "types" in the same sense as int and long is misleading.
Most types (fortunately) *exist* at run time in a quite real manner.
C exists at run time, sorry. For instance, when you specify:

float d = (float) a;
and a was a double, the machine will shed precision by writing the
number into memory as a float and re-reading it.

Unless float and double are the same size (which is perfectly legal).

Or the CPU manual might refer to single-precision and double-precision
reals. Yes, the C types are most likely mapped directly to certain
machine-level representations, but the *types* float and double exist
only on the C side of that mapping.
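
The float conversion discussed above can be sketched as follows. The function name cast_loses_precision is hypothetical. Whether any precision is lost depends on the implementation: if float and double share a representation, the conversion changes nothing.

```c
/* Returns 1 if converting a to float and back yields a different value,
   0 if the round trip is exact. The cast is a value conversion; how the
   machine carries it out is up to the implementation. */
int cast_loses_precision(double a)
{
    float d = (float) a;        /* may round to the nearest float */
    return (double) d != a;
}
```

On a typical implementation where float has fewer significand bits than double, this returns 1 for 0.1 and 0 for 0.5, since 0.5 is exactly representable in both.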

[...]
I have to disagree here. The machine should follow exactly the type
description specified in the source program.

If it followed it exactly, wouldn't there be distinct machine-level
"types" for int and long?
Sometimes.
[snip]

Yes, sometimes there is, sometimes there isn't. I'm not sure what
point you're trying to make; my point is that asserting that "Hardware
is represented in C as a sequential space of addresses" is misleading.

[...]
What is an object?

The term is defined in the standard.
Is a character in a character string an object?

Yes, and it happens to be a component of another object.
Can we
imagine a character string where each character resides
in a different address space?

No. Or rather, we can imagine it, but it wouldn't be legal in C.

To use a concrete example:

char s[6] = "hello";

s[1] and s[2] are both objects of type char. s is an object of type
char[6]. Since the object s has to be in a locally linear address
space (which it may or may not share with other objects), it follows
that s[1] and s[2] must be in the same address space. Thus it's legal,
for example, to compute (&s[2] - &s[1]).

Given:

char t[6] = "fubar";

s and t could be in distinct address spaces, and computing (&t[2] -
&s[1]) invokes undefined behavior.
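
The well-defined half of this can be written as a small sketch. The function names are hypothetical. The cross-array subtraction is left in a comment only, because executing it would be undefined behavior:

```c
#include <stddef.h>

/* Distance, in elements, between s[1] and s[2] of the same array. */
ptrdiff_t distance_within_s(void)
{
    char s[6] = "hello";
    return &s[2] - &s[1];       /* both pointers point into s: defined */
}

/* The array object s has type char[6]: five letters plus the '\0'. */
size_t size_of_s(void)
{
    char s[6] = "hello";
    return sizeof s;
}

/* Not shown as code: with a second array t, the expression
   (&t[2] - &s[1]) subtracts pointers into different objects,
   which is undefined behavior. */
```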
A function address could be anything that allows the
function to be called; it could easily be an index into a system table
rather than a machine-level address.

In C
(FnTable[index])(arg1,arg2)
is different from
fnptr(arg1,arg2)

A function expression must resolve to a machine address. This way
it can be passed around as an integer very efficiently. Most of
the power of C comes from this facility, functions as simple integers.

An efficient way of passing a *lot* of context.

I meant that a function pointer could be implemented as an index into
a system table. On many implementations, a function pointer is
implemented as a machine address, which looks like an integer index
into the entire address space (either of the machine or of the current
process). But an implementation that represented function pointers as
indices could be conforming.

The C implementation on the AS/400 represents function pointers as
some kind of large descriptor, not as a machine address.

All this is equally true for object pointers. Any pointer can be
represented in nearly any way the implementation chooses, as long as
the semantics are implemented consistently. Implementing pointers as
machine addresses happens to result in more efficient code on most
machines, but the C standard specifically doesn't require it.
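
A small sketch of the two call forms quoted above. The functions add and mul and the two wrapper names are hypothetical. In both forms the called expression has pointer-to-function type; nothing in the source says whether that pointer is a machine address, a table index, or a descriptor:

```c
/* add and mul are hypothetical example functions. */
static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* A program-level table of function pointers. */
static int (*const FnTable[])(int, int) = { add, mul };

/* Call through an element of the table. */
int call_via_table(unsigned index, int a, int b)
{
    return (FnTable[index])(a, b);
}

/* Call through a function pointer passed as an argument. */
int call_via_pointer(int (*fnptr)(int, int), int a, int b)
{
    return fnptr(a, b);
}
```

Both wrappers behave the same for the same target function; the representation of the pointer value never shows up in the program.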
True, I was speaking about re-interpreting because I wanted to emphasize
that memory is interpreted by the program. With this I am introducing
the discussion about strongly typed/weakly typed languages, which I hope
to come to later on. This facility of re-interpreting memory is absent or
much more difficult in several other languages. As with everything, this
can be handy if used well, or a nightmare if abused.

My objection was to the reference to a cast operator. Not all casts
just re-interpret their operands, and not all type punning is done via
cast operators.
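
Both halves of that objection can be illustrated with a short sketch. The function names are hypothetical. The first function uses a cast that converts a value rather than re-reading its bytes; the second inspects the bytes of two float objects without any cast operator, through memcmp:

```c
#include <string.h>

/* A cast that converts: the double 3.9 becomes the int 3. The bytes of
   x are not re-interpreted as an int. */
int cast_converts(double x)
{
    return (int) x;
}

/* Byte-level inspection with no cast operator: memcmp looks at the object
   representations of a and b as arrays of unsigned char. Returns 1 if the
   representations differ, 0 if they are identical. */
int representations_differ(float a, float b)
{
    return memcmp(&a, &b, sizeof a) != 0;
}
```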
struct unknown * is incomplete, void * not. To a beginner, an
expression like void * must be utterly strange. I will explain
this more later on.

Neither "struct unknown *" nor "void *" is an incomplete type; they're
pointers, possibly to incomplete types. "struct unknown" is an
incomplete type if there's no definition for the complete type.
"void" is an incomplete type that cannot be completed.
If you can see the contents, then it's not opaque. Of course, any
non-opaque structure can be converted into an opaque one if you refuse
to look into it, but this would be playing with words. Normally, since
it is not specified in the standard, it is better not to mess with it;
I agree with that. But with real opaque structures like void * that
is no longer possible. You can't use them; they enforce themselves
by definition.

The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.
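
To make the incomplete-type distinction concrete, here is a minimal sketch. The helper names are hypothetical; "struct unknown" is the example type from the discussion. The struct is declared but never defined, so it is incomplete, while pointers to it (and void *) are ordinary complete types:

```c
#include <stddef.h>

struct unknown;     /* declared, never defined here: an incomplete type */

/* Pointer types are complete object types, so sizeof applies to them. */
size_t size_of_struct_pointer(void) { return sizeof(struct unknown *); }
size_t size_of_void_pointer(void)   { return sizeof(void *); }

/* A pointer to an incomplete type can be stored, passed and compared. */
int is_null(struct unknown *p) { return p == NULL; }

/* These would not compile, because the operand type is incomplete:
       sizeof(struct unknown)
       struct unknown u;
   sizeof(void) is likewise a constraint violation in standard C,
   although some compilers accept it as an extension. */
```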
Thanks for your feedback.

You're welcome.
 
CBFalconer

Keith said:
.... snip ...

The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.

Unfortunately making it truly opaque would often prevent the very
useful implementation of getc and putc as macros. So the only
solution left is to yell stridently at programmers who read the
definition of FILE.
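
A rough sketch of why a macro version needs the structure to be visible. Everything here (my_file, my_getc, my_fill) is hypothetical and is not how any particular <stdio.h> is written. The macro expands in the caller's code and touches the members directly, so the caller's translation unit must see the complete struct definition:

```c
/* Hypothetical stream structure; a real FILE would hold much more. */
struct my_file {
    const unsigned char *next;  /* next unread byte in the buffer */
    int remaining;              /* number of unread bytes left */
};

/* Slow path. A real library would refill the buffer here; this sketch
   just reports end of input as -1. */
int my_fill(struct my_file *f)
{
    (void) f;
    return -1;
}

/* Fast path as a macro: no function call while the buffer has data.
   Like a macro getc, it evaluates its argument more than once. */
#define my_getc(f) \
    ((f)->remaining > 0 ? ((f)->remaining--, *(f)->next++) : my_fill(f))
```

If struct my_file were only declared, not defined, in the header, every use of my_getc would fail to compile.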
 
Keith Thompson

CBFalconer said:
Unfortunately making it truly opaque would often prevent the very
useful implementation of getc and putc as macros. So the only
solution left is to yell stridently at programmers who read the
definition of FILE.

Good point, I had forgotten about that.
 
Richard Bos

CBFalconer said:
Unfortunately making it truly opaque would often prevent the very
useful implementation of getc and putc as macros. So the only
solution left is to yell stridently at programmers who read the
definition of FILE.

Not necessarily. The standard headers needn't be available as files; and
even when an implementor wants to make most of his <stdio.h> legible to
the user, nothing need stop him from having it contain something like

#include <FILE_magic>
#define FILE __FILE_magic_FILE
#define getc __FILE_magic_getc

Richard
 
Dan Pop

Keith said:
The type FILE is an odd case. It's intended to act like an opaque
type, in the sense that the user isn't supposed to look inside it --
and a conforming implementation probably could make it a genuine
incomplete type, hiding the actual definition inside the library
implementation.

Nope, it cannot.

2 The types declared are size_t (described in 7.17);

FILE

which is an object type ...

An incomplete type is not an object type, so your implementation would be
non-conforming.

I see no good reason for this requirement in the standard (private copies
of FILE objects generated by the standard C library are useless), but the
requirement is there and cannot be ignored by conforming implementations.

Dan
 
CBFalconer

Richard said:
Not necessarily. The standard headers needn't be available as files;
and even when an implementor wants to make most of his <stdio.h>
legible to the user, nothing need stop him from having it contain
something like

#include <FILE_magic>
#define FILE __FILE_magic_FILE
#define getc __FILE_magic_getc

How does this prevent the snooper from reading FILE_magic? If you
wire those definitions into the compiler, and eliminate the
existence of FILE_magic, then you have given up the flexibility of
revising the actual FILE implementation, causing attendant future
pain.
 
Eric Sosman

Dan said:
Nope, it cannot.

2 The types declared are size_t (described in 7.17);

FILE

which is an object type ...

An incomplete type is not an object type, so your implementation would be
non-conforming.

A substantial amount of hiding is possible even so,
at the cost of one extra indirection level:

/* <stdio.h> */
typedef struct __file_magic *FILE;
...

`FILE' is now an object type (to wit, a pointer), and that
much is revealed as required. The nature of what it points
to, though, remains hidden.
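
The same technique, sketched with hypothetical names (HANDLE, handle_magic, handle_open and friends) so that it can be compiled as ordinary user code without touching reserved identifiers. The "header" part and the "library" part are shown in one file only for brevity:

```c
#include <stdlib.h>

/* "Header" part: what users would see. HANDLE plays the role of FILE. */
typedef struct handle_magic *HANDLE;    /* a complete pointer type */
HANDLE handle_open(int value);
int    handle_value(HANDLE h);
void   handle_close(HANDLE h);

/* "Library" part: only here is the structure completed. */
struct handle_magic {
    int value;
};

HANDLE handle_open(int value)
{
    HANDLE h = malloc(sizeof *h);
    if (h != NULL)
        h->value = value;
    return h;                   /* NULL if allocation failed */
}

int handle_value(HANDLE h)
{
    return h->value;
}

void handle_close(HANDLE h)
{
    free(h);
}
```

User code that sees only the "header" part can declare HANDLE objects and take sizeof(HANDLE), but cannot name any member of the structure behind the pointer.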
 
