substring assignment in Fortran, C, etc.

Keith Thompson

glen herrmannsfeldt said:
< Hmm... Not in case (3). Assignment means using = and arrays can't be
< assigned. Some people talk of strcpy(a, "123") as assigning a string
< to the array, but that is loose talk at best. It is just the wrong
< word.

<> But form (1) requires the assignment as a="123456", while form (3)
<> requires instead strcpy(a,"123456").

< Oh, you're one of them! Sorry. No, don't call it that. You are
< copying a string from one place to another.

Are you disqualifying it because of function notation, or because
it is a function call? Compilers may implement it inline, and
Fortran CHARACTER assignment may be implemented internally
as a function call. Yes it is different, but not that different.

Sure, but C defines the term "assignment", and it's the thing
specified by an assignment operator.

Does a="123456" in Fortran "copy" a string?

I don't know, does it? In C, it would copy the address of the
string's first character, not the string itself.

[...]
< a = malloc(strlen(b) + 1);
< if (a)
< strcpy(a, b);

<> Otherwise said I cannot declare a string of undefined length as char
<> a[] unless I also initialize it (like CHARACTER*(*) valid only for a
<> PARAMETER constant in a main).

As has been said, this does not generate an array of undefined
length, it has the appropriate length for its initial value.

C doesn't have SIZE, but you can use sizeof() to determine
the size, which is a compile time constant. (sizeof(a)/sizeof(*a))

That works if a is declared as an array, but in this case it's a
pointer object, and (sizeof(a)/sizeof(*a)) won't give you anything
meaningful.

C arrays are very commonly manipulated by passing pointers around,
e.g., as function arguments. When you do this, you have to have some
other way to determine the array's length. You can either pass the
length as a separate argument, or you can use strlen() to search for
the trailing '\0' if the array happens to contain a string (note that
the length of the string and the size of the array that contains it
are not the same thing).

I guess that Fortran treats arrays more like first-class objects, but
I'm not at all familiar with Fortran.
 
jameskuyper

glen said:
< Hmm... Not in case (3). Assignment means using = and arrays can't be
< assigned. Some people talk of strcpy(a, "123") as assigning a string
< to the array, but that is loose talk at best. It is just the wrong
< word.

<> But form (1) requires the assignment as a="123456", while form (3)
<> requires instead strcpy(a,"123456").

< Oh, you're one of them! Sorry. No, don't call it that. You are
< copying a string from one place to another.

Are you disqualifying it because of function notation, or because
it is a function call? Compilers may implement it inline, and
Fortran CHARACTER assignment may be implemented internally
as a function call. Yes it is different, but not that different.

The C standard clearly distinguishes between assignment expressions
and function call expressions. Neither term has any meaning in C that
does not involve such expressions, and it's never ambiguous whether a
given expression is an assignment expression or a function call
expression. The distinction might be blurred in some other contexts,
but not in C.

What happens internally might be that a function call expression
actually gets inlined, or that an assignment expression is converted
into a function call, but since that's all up to the implementation,
it's not really meaningful to even talk about it without first
specifying which particular implementation of C you're referring to.
It is important to know which type of expression you're talking about,
because what you can do with the expression is quite different in the
two cases.

Does a="123456" in Fortran "copy" a string?

I don't know. In C,

char word[] = "hello";
char pa = word;
strcpy(pa, "world", sizeof "world");
pa = "other";

The third line contains a strcpy() call that copies the array set
aside for the string literal "world", into the array named 'word'. The
last statement, on the other hand, contains an assignment expression
that changes pa so that it no longer points at 'word', it instead
points at the unnamed array set aside for the string literal "other".
Which of those two statements is most similar to what the Fortran
statement does? Or is it perhaps meaningless to even compare them?

....
C doesn't have SIZE, but you can use sizeof() to determine
the size, which is a compile time constant. (sizeof(a)/sizeof(*a)).

Not in C99. The following is legal C99 code, where the value of a
sizeof expression cannot, in general, be determined at compile time:

size_t func(int n)
{
char vla_array[n];
return sizeof vla_array;
}

Keep in mind that while sizeof gives the size of the array, it is
strlen() which gives the length of the string stored in that array.
That length must be shorter than the length of the array by at least 1
(for the terminating null character), unless the array doesn't contain
a string, in which case calling strlen() has undefined behavior.
 
glen herrmannsfeldt

(I wrote)

<> Does Fortran have a way to dimension
<> an array of the appropriate length for its initial value?

< character( len= *), parameter :: name = 'value'

< As of f08, array constants may be set to the correct size
< for their initial value.

Well, the question was for arrays, though in C strings are arrays.

C has:

int i[]={1,2,3,4,5};

which will dimension i to 5 and initialize it. I believe this
would be something like:

integer i(*)=(/ 1,2,3,4,5,/)

which I don't believe is legal as of Fortran 2003.

C also has:

int i[][3]={{1,2,3},{4,5,6}};

(Only the leftmost can be left out.)

< Allocatable arrays are set to the correct size automatically
< on assignment.

Yes, but that is a different question.

-- glen
 
glen herrmannsfeldt

(snip, I wrote)

<> Are you disqualifying it because of function notation, or because
<> it is a function call? Compilers may implement it inline, and
<> Fortran CHARACTER assignment may be implemented internally
<> as a function call. Yes it is different, but not that different.

< The C standard clearly distinguishes between assignment expressions
< and function call expressions. Neither term has any meaning in C that
< does not involve such expressions, and it's never ambiguous whether a
< given expression is an assignment expression or a function call
< expression. The distinction might be blurred in some other contexts,
< but not in C.

I completely agree. But if you ask how to do the equivalent of
the Fortran assignment:

c="hi there"

in C, and one answers with

strcpy(c,"hi there");

then it isn't fair to disqualify it as being a function call.
It performs the same operation but has a different name.

< What happens internally might be that a function call expression
< actually gets inlined, or that an assignment expression is converted
< into a function call, but since that's all up to the implementation,
< it's not really meaningful to even talk about it without first
< specifying which particular implementation of C you're referring to.
< It is important to know which type of expression you're talking about,
< because what you can do with the expression is quite different in the
< two cases.

<> Does a="123456" in Fortran "copy" a string?

< I don't know. In C,

(snip, then I wrote)

<> C doesn't have SIZE, but you can use sizeof() to determine
<> the size, which is a compile time constant. (sizeof(a)/sizeof(*a)).

< Not in C99. The following is legal C99 code, where the value of a
< sizeof expression cannot, in general, be determined at compile time:

< size_t func(int n)
< {
< char vla_array[n];
< return sizeof vla_array;
< }

Well, I meant specifically for the case of initialized arrays
that otherwise are of unknown length. If instead you do:

size_t func(int n)
{
int vla_array[n];
return sizeof vla_array/sizeof(int);
}

Would you expect a run time divide to be done? (Not that divide
is so slow on modern computers, but still one might want to know.)

< Keep in mind that while sizeof gives the size of the array, it is
< strlen() which gives the length of the string stored in that array.
< That length must be shorter than the length of the array by at least 1
< (for the terminating null character), unless the array doesn't contain
< a string, in which case calling strlen() has undefined behavior.

I meant it in the general array sense, where the terminating null
doesn't appear. There is no need to divide by sizeof(char) in
the character case. If one did

int i[]={1,2,3};

then using sizeof() is the only way to determine the actual size,
unless the length is hard coded, which removes the advantage of
this notation. (Well, one could code a special last value even
in the int case.)

-- glen
 
BartC

jameskuyper said:
glen said:
Does a="123456" in Fortran "copy" a string?

I don't know. In C,

char word[] = "hello";
char pa = word;
strcpy(pa, "world", sizeof "world");
pa = "other";

It makes a bit more sense like this:

char word[] = "hello";
char *pa = word;
memcpy(pa, "world", sizeof "world");
pa = "other";

Or perhaps with the copy line like this:

strcpy(pa, "world");

But either way it would go wrong if the "world" string was longer than the
"hello" string.
 
Keith Thompson

glen herrmannsfeldt said:
(snip, I wrote)

<> Are you disqualifying it because of function notation, or because
<> it is a function call? Compilers may implement it inline, and
<> Fortran CHARACTER assignment may be implemented internally
<> as a function call. Yes it is different, but not that different.

< The C standard clearly distinguishes between assignment expressions
< and function call expressions. Neither term has any meaning in C that
< does not involve such expressions, and it's never ambiguous whether a
< given expression is an assignment expression or a function call
< expression. The distinction might be blurred in some other contexts,
< but not in C.

I completely agree. But if you ask how to do the equivalent of
the Fortran assignment:

c="hi there"

in C, and one answers with

strcpy(c,"hi there");

then it isn't fair to disqualify it as being a function call.
It performs the same operation but has a different name.

I don't think anyone intended to "disqualify" it. If the Fortran
c="hi there"
means the same thing as C's
strcpy(c,"hi there")
then yes, it's the equivalent. The C function call just isn't an
assignment.

That's actually a fairly important point: C's equivalent of that
particular Fortran assignment statement is a function call, not an
assignment.

[...]
Well, I meant specifically for the case of initialized arrays
that otherwise are of unknown length. If instead you do:

size_t func(int n)
{
int vla_array[n];
return sizeof vla_array/sizeof(int);
}

Would you expect a run time divide to be done? (Not that divide
is so slow on modern computers, but still one might want to know.)

Probably. Actually sizeof(int) is likely to be a power of 2, so the
division is likely to be replaced by a shift.

It's also possible that a compiler might be clever enough to notice
that sizeof vla_array is the length of the array times sizeof(int),
and optimize it to the equivalent of "return n;". Note that the
length of the array is the value of n when the declaration is reached,
so the compiler also has to allow for the possibility that n has
changed. (It will probably just save the value somewhere.)
< Keep in mind that while sizeof gives the size of the array, it is
< strlen() which gives the length of the string stored in that array.
< That length must be shorter than the length of the array by at least 1
< (for the terminating null character), unless the array doesn't contain
< a string, in which case calling strlen() has undefined behavior.

I meant it in the general array sense, where the terminating null
doesn't appear. There is no need to divide by sizeof(char) in
the character case. If one did

int i[]={1,2,3};

then using sizeof() is the only way to determine the actual size,
unless the length is hard coded, which removes the advantage of
this notation. (Well, one could code a special last value even
in the int case.)

Right, but in C there's always the danger of confusing arrays and
pointers. A parameter declared to be of an array type is really a
pointer, and applying sizeof to it will just give you the size of the
pointer.
 
BartC

glen herrmannsfeldt said:
(snip, I wrote)

<> Are you disqualifying it because of function notation, or because
<> it is a function call? Compilers may implement it inline, and
<> Fortran CHARACTER assignment may be implemented internally
<> as a function call. Yes it is different, but not that different.

.....

I completely agree. But if you ask how to do the equivalent of
the Fortran assignment:

c="hi there"

in C, and one answers with

strcpy(c,"hi there");

then it isn't fair to disqualify it as being a function call.
It performs the same operation but has a different name.

It means it isn't really built-in to the language.

If you used a language that required function calls for everything:
assign(), add(), mul(), equal(), and so on, then you might come to a similar
conclusion. Even if a particularly clever compiler managed to inline these
function calls...

Apart from the anomaly of structs, the core language of C only likes pushing
around bytes and machine words. Anything more high level needs to be
explicitly built on top of that.
 
glen herrmannsfeldt

(snip, I wrote)

<> I meant it in the general array sense, where the terminating null
<> doesn't appear. There is no need to divide by sizeof(char) in
<> the character case. If one did

<> int i[]={1,2,3};

<> then using sizeof() is the only way to determine the actual size,
<> unless the length is hard coded, which removes the advantage of
<> this notation. (Well, one could code a special last value even
<> in the int case.)

< Right, but in C there's always the danger of confusing arrays and
< pointers. A parameter declared to be of an array type is really a
< pointer, and applying sizeof to it will just give you the size of the
< pointer.

Hopefully new C programmers learn this pretty fast, but yes.

Though I don't know that it is any worse than the tricks
used with assumed size arrays in Fortran. In the Fortran 66 days
it was common to dimension the dummy array (1), and to compute
the appropriate offset, even for a multidimension actual argument.

Later, the (*) notation was added. C code like:

int x[20];
sub(x+10,10);

can be done in Fortran with assumed size arrays as:

integer x(20)
call sub(x(11),10)

where, as with C, the appropriate length should be passed
unless it is otherwise known to the callee.

-- glen
 
glen herrmannsfeldt

(snip, I wrote)

<> I completely agree. But if you ask how to do the equivalent of
<> the Fortran assignment:

<> c="hi there"

<> in C, and one answers with

<> strcpy(c,"hi there");

<> then it isn't fair to disqualify it as being a function call.
<> It performs the same operation but has a different name.

< It means it isn't really built-in to the language.

The question about the C library being part of the language,
or "built in" comes up pretty often. So SQRT is not really
built into Fortran since it is a function call? Since the
addition of generic intrinsic functions it is hard to say that
it isn't. In the C case, strcpy() is "built in" enough that
you might get surprised coding your own function with that name.

< If you used a language that required function calls for everything:
< assign(), add(), mul(), equal(), and so on, then you might come
< to a similar conclusion. Even if a particularly clever compiler
< managed to inline these function calls...

In Mathematica everything is a function (if not a function call),
but there are shortcuts. A+B is a short way to write Add[A,B]
but internally it is the same.

< Apart from the anomaly of structs, the core language of C
< only likes pushing around bytes and machine words.
< Anything more high level needs to be explicitly built on top
< of that.

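#include <stdio.h>
#include <string.h>   /* for memset */

int main(void) {
    struct {
        int x[10];
    } a, b;
    memset(&b, 0, sizeof(b));   /* zero every element of b.x */
    a = b;                      /* struct assignment copies the embedded array */
    printf("%d\n", a.x[3]);
    return 0;
}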

Funny thing in C. You can't assign arrays, but you can assign
structures, even ones that contain arrays.

Somewhat similar to Fortran, with no arrays of pointers, but
arrays of structures containing pointers exist.

-- glen
 
jameskuyper

glen said:
(snip, I wrote) ....
< The C standard clearly distinguishes between assignment expressions
< and function call expressions. Neither term has any meaning in C that
< does not involve such expressions, and it's never ambiguous whether a
< given expression is an assignment expression or a function call
< expression. The distinction might be blurred in some other contexts,
< but not in C.

I completely agree. But if you ask how to do the equivalent of
the Fortran assignment:

c="hi there"

in C, and one answers with

strcpy(c,"hi there");

then it isn't fair to disqualify it as being a function call.
It performs the same operation but has a different name.

In C, c="hi there" and strcpy(c, "hi there") never perform the same
operation. If c has a pointer type, they perform two very different
operations, and only the first one is an assignment. If c has an array
type, c="hi there" is a constraint violation, and strcpy(c, "hi
there") does yet a third thing, not quite the same as either of the
operations involving the pointer. I'm not sure whether any of those
things is really equivalent to what Fortran does.

....
<> C doesn't have SIZE, but you can use sizeof() to determine
<> the size, which is a compile time constant. (sizeof(a)/sizeof(*a)).

< Not in C99. The following is legal C99 code, where the value of a
< sizeof expression cannot, in general, be determined at compile time:

< size_t func(int n)
< {
< char vla_array[n];
< return sizeof vla_array;
< }

Well, I meant specifically for the case of initialized arrays
that otherwise are of unknown length. ...

In context, I thought you were making a comment about sizeof in
general.

...If instead you do:

size_t func(int n)
{
int vla_array[n];
return sizeof vla_array/sizeof(int);
}

Would you expect a run time divide to be done? (Not that divide
is so slow on modern computers, but still one might want to know.)

I'd expect a decent compiler to optimize that to the equivalent of

size_t func(int n) { return n;}
 
Keith Thompson

glen herrmannsfeldt said:
< Right, but in C there's always the danger of confusing arrays and
< pointers. A parameter declared to be of an array type is really a
< pointer, and applying sizeof to it will just give you the size of the
< pointer.

Hopefully new C programmers learn this pretty fast, but yes.
[...]

In an ideal world they learn it pretty fast. Unfortunately ...

Section 6 of the comp.lang.c FAQ, <http://www.c-faq.com>, does a good
job of clearing up the confusion.
 
grocery_stocker

LC's No-Spam Newsreading account wrote:
...
I've even seen people using "typedef char * string;"

Such typedefs reflect and reinforce the misconception that C has a
string data type. It does not. It does have a string data format, but
you can use many different C constructs to store data in that format.
A char* can be used to point at the first element of a string, but it
is not itself a string.

I can use declarations like:
1) char *a
2) char a[]
3) char a[somenumber]
I am not at all scandalized by the third form (I'm used since ever to
the Fortran CHARACTER*somenumber A), but I thought that C could have
"variable-and-dynamic-length" null terminated strings, contrary to the
more rigid fixed-length strings of Fortran,

The difference between 2) and 3) is entirely in how the fixed length
of the array is determined. They both have a fixed length. They both
can contain strings of any length up to but not including the length
of the array. They are both capable of containing multiple strings,
which is an example of the fact that, in C, "string" is a data format,
not a data type.

Now the point comes to whether I add initialization (DATA statement in
my Fortran parlance) and assignment.
I could initialize string a adding e.g. ="123456" to the declaration of
form (1) and (2), but not (3).

You're incorrect about (3). The definition

char a[5] = "123456";

would be a constraint violation. However, the definitions

char b[6] = "123456";
char c[7] = "123456";
char d[8] = "123456";

are all perfectly fine. Note: b does not contain a string, since it
has no terminating null character.
... Also I am OBLIGED to do the
initialization with the undefined length array notation a[] ( I *must*
use (2D), while (2) gives compiler error 'array size missing' )

That's because it is not an "undefined length array", it's an
implicitly defined length, and without the initializer there's nothing
to implicitly define the length.

1D) char *a="123456" ;
2D) char a[]="123456" ;
If I do not initialize (forms (1) and (3)) I can later assign a value.

No, only the pointer can be assigned to. You can assign to the
elements of the arrays in case (2) and (3), and by doing so you can
create one or more strings in them. However, this is true whether or
not you initialize them.

But form (1) requires the assignment as a="123456", while form (3)
requires instead strcpy(a,"123456").

No, there are many different ways to assign a value to the pointer.
The key point to keep in mind is that declaring a pointer doesn't
initialize any memory for a character string. That has to be done
separately; for instance, using the string literal "123456" causes an
unnamed array to be created to contain the corresponding string, and
using that string literal to initialize a char* variable causes that
variable to be set to point at the first element of the array. But
that pointer could be set to point at any other char in that array, or
in any other char array, for that matter.

strcpy() is one way to copy a string from one array to another, but
there are many others. It works just as well for (2) as for (3).

What is more important, the first argument (destination) of strcpy
cannot be a dynamic length string (1)

Incorrect. If the pointer were set to point at writable memory (which
it currently is not - the arrays created by using string literals are
not safely writeable), strcpy() could also be used to copy the string
into whichever location in memory it is currently pointing at.
... i.e. char *a ! If it is one gets a
segmentation fault. ...

That is true only if it points at a memory segment that you don't
currently have permission to write to. Whether or not this is the case
for the arrays created to store string literals is up to the
implementation, which is why it's not safe to assume that you can write
to them.

... It must be a character array (2) or (3).
Otherwise said I cannot declare a string of undefined length as char a[]
unless I also initialize it (like CHARACTER*(*) valid only for a
PARAMETER constant in a main).
Is all this correct ?

Not really. You've confused the issue by using the same name for all
three cases. Let me distinguish them as follows:

char *pc = "123456";
char imp_length[] = "123456";
char exp_length[7] = "123456";

Any use of the string literal "123456" anywhere in your program causes
at least one unnamed array of char to be created, initialized with the
values '1', '2', '3', '4', '5', '6', '\0', in that order. It's
entirely up to the implementation whether or not all uses of "123456"
refer to the same array, or whether each such use refers to a
different array. In addition, it's entirely up to the implementation
whether or not the array created for "123456" occupies the same
location in memory as the last seven elements of the array created for
"0123456". The behavior of any program that attempts to write anything
into one of those blocks of memory is undefined.

The variable named pc is a pointer that is initialized to point at the
first character in one of those blocks of memory. It could, at any
later time, be re-set to point at some other piece of memory. The
following statement:

pc = &imp_length[3] ;

causes pc to point at the char within imp_length which has the value
'4'. Here's where the difference between a data type and a data format
comes into play: &imp_length[n] is itself a pointer to the first
character of a string with a length of 5-n, for any value of n from 0
to 5. All of those strings share the same terminating null character.
Five of them share the same '5' character, etc. Until you understand
that statement, you really don't understand what C strings are.

I don't see how &imp_len(n) have the same terminating null character
and how all 5 of them share the same '5' character.

imp_length is an array of 7 characters; the length is determined
implicitly by counting the characters in the string literal "123456",
and adding 1 for the terminating null character. That array is filled
in by copying from the array used to store the string literal. In this
case, there's no way for your program to even determine whether the
string literal's array actually exists; which means that in some cases
it won't actually exist; the only copy of those characters could be in
imp_length itself. Having been initialized with "123456", you're free
to change the contents of that array; in particular, the statement

imp_length[3] = '\0';

means that it no longer contains a string of length 6. It now starts
with a string of length 3; and contains another string of length 2
starting at &imp_length[4]. It also contains 5 other strings, but
they're just subsets of those two strings.

I also don't see how this contains another string of length two
starting at &imp_length[4].
 
glen herrmannsfeldt

In comp.lang.fortran Keith Thompson said:
glen herrmannsfeldt said:
< Right, but in C there's always the danger of confusing
< arrays and pointers. A parameter declared to be of an
< array type is really a pointer, and applying sizeof to it
< will just give you the size of the pointer.
Hopefully new C programmers learn this pretty fast, but yes.
[...]

In an ideal world they learn it pretty fast. Unfortunately ...

I was doing assembly programming years before I learned C,
which made it pretty easy to understand pointers. I suppose
it is harder if you don't understand addresses.

-- glen
 
nmm1

glen herrmannsfeldt said:
< Right, but in C there's always the danger of confusing arrays and
< pointers. A parameter declared to be of an array type is really a
< pointer, and applying sizeof to it will just give you the size of the
< pointer.

Hopefully new C programmers learn this pretty fast, but yes.
[...]

In an ideal world they learn it pretty fast. Unfortunately ...

Section 6 of the comp.lang.c FAQ, <http://www.c-faq.com>, does a good
job of clearing up the confusion.

Don't bet on it. You do know that it isn't actually specified by
the C standard, don't you? I am not denigrating that FAQ, so much
as pointing out the term "clearing up the confusion" is misleading!

Firstly, the conversion rules between arrays and pointers and back
again allow for indefinite and infinite implicit recursion - the
consensus is that this is a constraint on the implementation to
ensure that they are equivalent, but it's not even hinted at in any
wording. It isn't visible in C as such, but becomes so as soon as
you extend it (e.g. by adding safe pointers, garbage collection,
OpenMP-style parallelism etc.)

Secondly, the wording of 6.3.2.1#3 is seriously ambiguous about
WHEN the conversion takes place, and there are certain reasonable
interpretations that provide visible differences. During the early
years of C89 implementations, several of them varied in how they
implemented array arguments. This was one of the many ignored NB
comments on either C89 or C99. Please ask for examples, if you want.


Glen can be excused for thinking that the array/pointer mess is not
much worse than the Fortran assumed size and array element to array
one, because you need to have been deeply into the C standard to
know just how bad it is in ISO C. K&R C was much cleaner.


Regards,
Nick Maclaren.
 
BartC

glen herrmannsfeldt said:
(snip, I wrote)

<> I completely agree. But if you ask how to do the equivalent of
<> the Fortran assignment:

<> c="hi there"

<> in C, and one answers with

<> strcpy(c,"hi there");

<> then it isn't fair to disqualify it as being a function call.
<> It performs the same operation but has a different name.

< It means it isn't really built-in to the language.

The question about the C library being part of the language,
or "built in" comes up pretty often. So SQRT is not really
built into Fortran since it is a function call? Since the
addition of generic intrinsic functions it is hard to say that
it isn't. In the C case, strcpy() is "built in" enough that
you might get surprised coding your own function with that name.

I don't know how Fortran works it, but C specifically likes to say sqrt()
etc work like ordinary user functions, even though compilers secretly know
about them. To make sure sqrt() etc are generally available, they are
included in the standard environment.

Sometimes it's just a syntax choice: sqrt can look like "sqrt()" and be
actually built-in (in languages I create, sqrt is actually an operator).

Built-in functions (or operators that look like functions) can also have
some extra magic that allow them to work on several different types; in C
that usually isn't possible for user-functions.

< If you used a language that required function calls for everything:
< assign(), add(), mul(), equal(), and so on, then you might come
< to a similar conclusion.

In Mathematica everything is a function (if not a function call),
but there are shortcuts. A+B is a short way to write Add[A,B]
but internally it is the same.

That's another way of doing things. Perhaps Mathematica also allows all
sorts of special symbols (such as a proper square root sign for sqrt) so
needs alphanumeric alternatives, and Add[] is there for completeness.

< Apart from the anomaly of structs, the core language of C
< only likes pushing around bytes and machine words.
< Anything more high level needs to be explicitly built on top
< of that.

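#include <stdio.h>
#include <string.h>   /* for memset */

int main(void) {
    struct {
        int x[10];
    } a, b;
    memset(&b, 0, sizeof(b));   /* zero every element of b.x */
    a = b;                      /* struct assignment copies the embedded array */
    printf("%d\n", a.x[3]);
    return 0;
}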

Funny thing in C. You can't assign arrays, but you can assign
structures, even ones that contain arrays.

Add a hundred more funny things, and you have a pretty good picture of C...
 
nmm1

I don't know how Fortran works it, but C specifically likes to say sqrt()
etc work like ordinary user functions, even though compilers secretly know
about them. To make sure sqrt() etc are generally available, they are
included in the standard environment.

Not quite. The functions ARE ordinary functions, but may be wrapped
by macros, which need not call an ordinary function. The only special
thing about sqrt, as a function, is that it may not be replaced by
the user even if <math.h> is not used.

This was more-or-less true in Fortran 66, completely changed (rather
ambiguously) in Fortran 77, and specified properly in Fortran 90. You
can now use SOME intrinsic procedures in the context of user-defined
ones, but they aren't necessarily the same procedure as the one you
call.


Regards,
Nick Maclaren.
 
James Kuyper

grocery_stocker said:
Not really. You've confused the issue by using the same name for all
three cases. Let me distinguish them as follows:

char *pc = "123456";
char imp_length[] = "123456";
char exp_length[7] = "123456";

Any use of the string literal "123456" anywhere in your program causes
at least one unnamed array of char to be created, initialized with the
values '1', '2', '3', '4', '5', '6', '\0', in that order. It's
entirely up to the implementation whether or not all uses of "123456"
refer to the same array, or whether each such use refers to a
different array. In addition, it's entirely up to the implementation
whether or not the array created for "123456" occupies the same
location in memory as the last seven elements of the array created for
"0123456". The behavior of any program that attempts to write anything
into one of those blocks of memory is undefined.

The variable named pc is a pointer that is initialized to point at the
first character in one of those blocks of memory. It could, at any
later time, be re-set to point at some other piece of memory. The
following statement:

pc = &imp_length[3] ;

causes pc to point at the char within imp_length which has the value
'4'. Here's where the difference between a data type and a data format
comes into play: &imp_length[n] is itself a pointer to the first
character of a string with a length of 5-n, for any value of n from 0
to 5. All of those strings share the same terminating null character.
Five of them share the same '5' character, etc. Until you understand
that statement, you really don't understand what C strings are.

I don't see how the &imp_length[n] strings have the same terminating null
character and how all 5 of them share the same '5' character.

That should have been 6-n, not 5-n, 6 instead of 5, and '6', instead of
'5'. An early draft of my message had "12345" instead of "123456". When
I corrected it, I thought I had made all corresponding adjustments, but
I missed those.

The array imp_length contains the following elements:

{'1', '2', '3', '4', '5', '6', '\0'}

In C, "A string is a contiguous sequence of characters terminated by and
including the first null character." (7.1.1p1). Notice the complete lack
of any requirements on where those characters are located.

&imp_length[6] points at the '\0' character. That is a contiguous
sequence of 1 character, which ends with a null character; in itself it
qualifies as a string of length 0.

&imp_length[5] points at the '6' character, which is the start of a
contiguous sequence of 2 characters, ending with a null character, so
they constitute a string of length 1; the null character that terminates
this string is the same as the one that was the entire string of length
0 pointed at by &imp_length[6].

&imp_length[4] points at the '5' character, which is the start of a
contiguous sequence of 3 characters, terminated by a null character, so
they constitute a string of length 2. The '6' character in this string
is the same as the one in the string of length 1 mentioned above.

All of these overlapping strings are equally usable as such, as far as
the C standard library is concerned. Note that strings are entirely an
issue for the C standard library (that's why they're not defined until
section 7, which describes the library). The C language itself has
string literals, but strings mean nothing at the language level.

You can pass &imp_length[n], for n from 0 to 6, to any C standard
library function that handles strings, and that function will treat the
corresponding portion of imp_length as a string in itself.
imp_length is an array of 7 characters; the length is determined
implicitly by counting the characters in the string literal "123456",
and adding 1 for the terminating null character. That array is filled
in by copying from the array used to store the string literal. In this
case, there's no way for your program to even determine whether the
string literal's array actually exists; which means that in some cases
it won't actually exist; the only copy of those characters could be in
imp_length itself. Having been initialized with "123456", you're free
to change the contents of that array; in particular, the statement

imp_length[3] = '\0';

means that it no longer contains a string of length 6. It now starts
with a string of length 3; and contains another string of length 2
starting at &imp_length[4]. It also contains 5 other strings, but
they're just subsets of those two strings.

I also don't see how this contains another string of length two
starting at &imp_length[4].

I hope that's clear now; it's the same string that was pointed at by
&imp_length[4] before the assignment statement. Setting imp_length[3] to
'\0' changed the length of all the overlapping strings pointed at by
&imp_length[n] for n from 0 to 3, but didn't have any effect on the
other three overlapping strings contained in imp_length.
 
James Kuyper

BartC said:
char word[] = "hello";
char pa = word;
strcpy(pa, "world", sizeof "world");
pa = "other";

It makes a bit more sense like this:

char word[] = "hello";
char *pa = word;

Aagh! A single missing character can make a lot of difference. :-(
memcpy(pa, "world", sizeof "world");

I should have used strncpy(pa, "world", sizeof word).
pa = "other";

Or perhaps with the copy line like this:

strcpy(pa, "world");

But either way it would go wrong if the "world" string was longer than
the "hello" string.

That's why I should have used strncpy(..., sizeof word).
 
Guest

LC's No-Spam Newsreading account writes:

Not as bad as it seems from my comments.

Thanks to you (and everybody) for the interesting comments. Some of them
may look like learned tetrapiloctomy, but it is always good to know. In
particular, it is good to know how imprecise I am being when I talk
about "comparison of string assignment". That level of imprecision may
be OK for me or my audience, but I appreciate that being OK by chance is
different from being imprecise and aware of the imprecision.
So the shortest main program which demonstrates my case (where all
items are variables assigned explicitly a value, not just initialized)
is
char a[4] ; /* must use a maximum size */
strcpy(a,"abcd") ; /* value assigned later THUS */

BANG! This copies 5 bytes from the literal string to a. You have
space for 4. I'd have written:

char a[] = "abcd";

I stand corrected.

I'm aware of the terminating null when I pass a Fortran string to a C
jacket routine ( CALL CROUTINE(STRING//CHAR(0)) ), but tend to forget
about it the few times I have to write something in C.

But I also tend to forget the terminating semicolon :) Both in C and
Java ... but javac tells me "you have forgotten a semicolon" more
explicitly than cc does :)

Anyhow I'll rewrite my example as

int i,j ;
char a[5] ; /* must use a maximum size PLUS ONE */
char *b ; /* no size implied */
strcpy(a,"abcd") ; /* value assigned later THUS */
b = "AB" ; /* value assigned later THUS */
printf("a = %s\n", a);
i=1 ; j=2 ;
strncpy(a+i,b,j+1-i ) ; /* or memcpy */
printf("a = %s\n", a);

I did not want to illustrate *in particular* strcpy *as such*, but I
wanted to illustrate "giving a value" to a and b after the
initialization (is "giving a value" any better than "assigning" ?) ...
and find the "shortest or more legible equivalent" of

[ INTEGER I,J ] ! (optional with the I-N naming rule)
CHARACTER A*5,B*2 ! (must give a size for both)
A="abcd" ! #1
B="AB" ! #2
I=1
J=2
A(I:J)=B ! #3
WRITE(*,*)'A = ',A

in particular #3 (for me "assignment of a substring") is clearly
emulated by strncpy. #1 and #2 (for me "assignment of a string") are
emulated one by a strcpy and one by a pointer assignment, and HAVE to be
so because of the different way a and b are declared.
 
