LC's No-Spam Newsreading account wrote:
....
I've even seen people using "typedef char * string;"
Such typedefs reflect and reinforce the misconception that C has a
string data type. It does not. It does have a string data format, but
you can use many different C constructs to store data in that format.
A char* can be used to point at the first element of a string, but it
is not itself a string.
I can use declarations like :
1) char *a
2) char a[]
3) char a[somenumber]
I am not at all scandalized by the third form (I'm used since ever to
the Fortran CHARACTER*somenumber A), but I thought that C could have
"variable-and-dynamic-length" null terminated strings, contrary to the
more rigid fixed-length strings of Fortran,
The difference between 2) and 3) is entirely in how the fixed length
of the array is determined. They both have a fixed length. They both
can contain strings of any length up to but not including the length
of the array. They are both capable of containing multiple strings,
which is an example of the fact that, in C, "string" is a data format,
not a data type.
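To make the "format, not type" point concrete, here is a minimal sketch. The helper count_strings is a name I made up for illustration, not a standard function; it counts the null-terminated strings stored back to back in a char array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the back-to-back strings stored in
   buf[0..size-1]. Trailing chars with no terminating null are not
   counted, because they do not form a string. */
size_t count_strings(const char *buf, size_t size)
{
    size_t count = 0;
    size_t pos = 0;

    assert(buf != NULL);
    while (pos < size) {
        const char *nul = memchr(buf + pos, '\0', size - pos);
        if (nul == NULL)
            break;                      /* leftover chars, no terminator */
        count++;
        pos = (size_t)(nul - buf) + 1;  /* continue just past the '\0' */
    }
    return count;
}
```

For example, given char two[7] = "ab\0cde"; the call count_strings(two, sizeof two) should report two strings, "ab" and "cde", both living in one array.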
Now the point comes to whether I add initialization (DATA statement in
my Fortran parlance) and assignment.
I could initialize string a adding e.g. ="123456" to the declaration of
form (1) and (2), but not (3).
You're incorrect about (3). The definition
char a[5] = "123456";
would be a constraint violation. However, the definitions
char b[6] = "123456";
char c[7] = "123456";
char d[8] = "123456";
are all perfectly fine. Note: b does not contain a string, since it
has no terminating null character.
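A small sketch of how you could check that claim yourself. The helper holds_string is hypothetical (my own name); it simply looks for a null character anywhere within the array's bounds, which is safer than calling strlen() on an array that might not contain one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the size-element array at arr has a
   null character somewhere in it, i.e. it holds at least one string. */
int holds_string(const char *arr, size_t size)
{
    assert(arr != NULL);
    return memchr(arr, '\0', size) != NULL;
}
```

With the definitions above, holds_string(b, sizeof b) should be 0, while the same call on c or d should be 1. Calling strlen(b) would have undefined behavior, because strlen() would read past the end of b looking for a terminator.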
... Also I am OBLIGED to do the
initialization with the undefined length array notation a[] ( I *must*
use (2D), while (2) gives compiler error 'array size missing' )
That's because it is not an "undefined length array", it's an
implicitly defined length, and without the initializer there's nothing
to implicitly define the length.
1D) char *a="123456" ;
2D) char a[]="123456" ;
If I do not initialize (forms (1) and (3)) I can later assign a value.
No, only the pointer can be assigned to. You can assign to the
elements of the arrays in case (2) and (3), and by doing so you can
create one or more strings in them. However, this is true whether or
not you initialize them.
But form (1) requires the assignment as a="123456", while form (3)
requires instead strcpy(a,"123456").
No, there are many different ways to assign a value to the pointer.
The key point to keep in mind is that declaring a pointer doesn't
initialize any memory for a character string. That has to be done
separately; for instance, using the string literal "123456" causes an
unnamed array to be created to contain the corresponding string, and
using that string literal to initialize a char* variable causes that
variable to be set to point at the first element of the array. But
that pointer could be set to point at any other char in that array, or
in any other char array, for that matter.
strcpy() is one way to copy a string from one array to another, but
there are many others. It works just as well for (2) as for (3).
What is more important, the first argument (destination) of strcpy
cannot be a dynamic length string (1)
Incorrect. If the pointer were set to point at writable memory (which
it currently is not - the arrays created by using string literals are
not safely writable), strcpy() could also be used to copy the string
into whichever location in memory it is currently pointing at.
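As a sketch of what "point at writable memory" can look like in practice, assuming you are willing to use malloc(): the helper copy_to_heap below is hypothetical (my name, not a library function). It gives a char* somewhere writable to point at, after which strcpy() through that pointer is fine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a writable heap copy of src, or NULL if
   the allocation fails. The caller must free() the result. */
char *copy_to_heap(const char *src)
{
    char *dst;

    assert(src != NULL);
    dst = malloc(strlen(src) + 1);  /* +1 for the terminating null */
    if (dst != NULL)
        strcpy(dst, src);           /* dst points at writable memory */
    return dst;
}
```

After char *a = copy_to_heap("123456"); (and a check for NULL), writing a[0] = 'X'; is well defined, which it would not be if a pointed at the string literal's own array.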
... i.e. char *a ! If it is one gets a
segmentation fault. ...
That is true only if it points at a memory segment that you don't
currently have permission to write to. Whether or not this is the case
for the arrays created to store string literals is up to the
implementation, which is why it's not safe to assume that you can write
to them.
... It must be a character array (2) or (3).
Otherwise said I cannot declare a string of undefined length as char a[]
unless I also initialize it (like CHARACTER*(*) valid only for a
PARAMETER constant in a main).
Is all this correct ?
Not really. You've confused the issue by using the same name for all
three cases. Let me distinguish them as follows:
char *pc = "123456";
char imp_length[] = "123456";
char exp_length[7] = "123456";
Any use of the string literal "123456" anywhere in your program causes
at least one unnamed array of char to be created, initialized with the
values '1', '2', '3', '4', '5', '6', '\0', in that order. It's
entirely up to the implementation whether or not all uses of "123456"
refer to the same array, or whether each such use refers to a
different array. In addition, it's entirely up to the implementation
whether or not the array created for "123456" occupies the same
location in memory as the last seven elements of the array created for
"0123456". The behavior of any program that attempts to write anything
into one of those blocks of memory is undefined.
The variable named pc is a pointer that is initialized to point at the
first character in one of those blocks of memory. It could, at any
later time, be re-set to point at some other piece of memory. The
following statement:
pc = &imp_length[3] ;
causes pc to point at the char within imp_length which has the value
'4'. Here's where the difference between a data type and a data format
comes into play: &imp_length[n] is itself a pointer to the first
character of a string with a length of 6-n, for any value of n from 0
to 6. All of those strings share the same terminating null character.
Five of them share the same '5' character, etc. Until you understand
that statement, you really don't understand what C strings are.
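If it helps, here is a minimal sketch that lets you see the sharing directly. The helper terminator_of is hypothetical (my own name); it returns the address of the null character that ends the string starting at offset n.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the address of the null character that
   terminates the string starting at s + n.
   Precondition: n <= strlen(s), so that s + n is still inside the string. */
const char *terminator_of(const char *s, size_t n)
{
    assert(s != NULL);
    return s + n + strlen(s + n);
}
```

For imp_length as defined above, terminator_of(imp_length, n) should return the same address, &imp_length[6], for every valid n, and strlen(&imp_length[3]) should be 3. There is only one array; the different "strings" are just different starting points within it.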
imp_length is an array of 7 characters; the length is determined
implicitly by counting the characters in the string literal "123456",
and adding 1 for the terminating null character. That array is filled
in by copying from the array used to store the string literal. In this
case, there's no way for your program to even determine whether the
string literal's array actually exists; which means that in some cases
it won't actually exist; the only copy of those characters could be in
imp_length itself. Having been initialized with "123456", you're free
to change the contents of that array; in particular, the statement
imp_length[3] = '\0';
means that it no longer contains a string of length 6. It now starts
with a string of length 3; and contains another string of length 2
starting at &imp_length[4]. It also contains 5 other strings, but
they're just subsets of those two strings.
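A short sketch of the same idea as a reusable function. The name split_at is hypothetical; it writes a null character into a writable array and hands back a pointer to whatever follows it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: overwrites buf[i] with '\0', cutting the string
   in two, and returns a pointer to the char just after the cut.
   Precondition: buf is writable and i is less than the length of the
   string currently stored in buf. */
char *split_at(char *buf, size_t i)
{
    assert(buf != NULL);
    buf[i] = '\0';
    return buf + i + 1;
}
```

Calling split_at(imp_length, 3) on the array above should leave "123" at the start and return &imp_length[4], where the string "56" begins. The array itself is still 7 chars long; only the data stored in it has changed.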
exp_length is an array of 7 characters, just like imp_length. They
have different names and different locations, but once defined, they
have the same type and can be used in the same way. The only
difference between them is how the length of the array is determined,
and how it is initialized. If exp_length were initialized with
"12345", there would be two '\0' characters at the end, rather than
one. If the initializer were "1234567", the '7' would be copied into
the last element of the array, and the array would not contain a
string, because it would lack the terminating null character
required for strings. If the initializer were "12345678", it would be
a constraint violation.
So the shortest main program which demonstrates my case (where all items
are variables assigned explicitly a value, not just initialized) is
int i,j ;
char a[4] ; /* must use a maximum size */
char *b ; /* no size implied */
Also, no memory is allocated for a string, and no value has been assigned
to the pointer. It is therefore NOT safe to use 'b' in any way until
it has been initialized.
strcpy(a,"abcd") ; /* value assigned later THUS */
This copies the first four characters from the array created for the
string literal "abcd" into the array you've defined named 'a'. It then
tries to copy the terminating null character, but finds that there is
no room for it. The behavior of your program is therefore undefined.
In practice, that null character might get written somewhere where it
can cause a great deal of trouble, or it might get written somewhere
completely innocuous. It's also possible that it will not get written,
an event that might or might not cause your program to abort.
b = "AB" ; /* value assigned later THUS */
This sets b to point at the 'A' character in the array set aside for
the string literal "AB".
Because of the way typical compilers work, if you reach this point in
the code, there's a pretty good chance that this will accidentally
work as you expected it to, despite the erroneous strcpy() call, but
you shouldn't count on it.
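One way to avoid the strcpy() overflow discussed above is to check the size before copying. This is only a sketch under my own assumptions; copy_if_fits is a hypothetical helper, and simply declaring char a[5] would also fix this particular program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, including
   the terminating null. Returns 1 on success, 0 if dst (which has
   dst_size elements) is too small; dst is left untouched in that case. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t needed;

    assert(dst != NULL && src != NULL);
    needed = strlen(src) + 1;       /* chars plus the '\0' */
    if (needed > dst_size)
        return 0;
    memcpy(dst, src, needed);
    return 1;
}
```

With char a[4], the call copy_if_fits(a, sizeof a, "abcd") should return 0 rather than writing a fifth char past the end of the array. Note that sizeof a only gives the array size where a is actually declared as an array, not where it has been passed to a function as a pointer.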