copy a string into a 2d array of chars

Simon Schaap

Hello,
I have encountered a strange problem and I hope you can help me to
understand it. What I want to do is to pass an array of chars to a
function that will split it up (on every location where a * occurs in
the string). This split function should allocate a 2D array of chars
and put the split results in different rows. The listing below shows
how I started to work on this. To keep the program simple and focused,
the string is not actually split. The split function
in this case just allocates a 2D array of size 1 by the length of the
passed string and copies the entire input string into this newly
allocated array. A pointer to this array is then returned to the caller.
By the way, these so-called 2D arrays are allocated
by a function that I adapted from the book "C Unleashed", R Heath, L
Kirby et al.

Unfortunately, even this simple program escapes my comprehension. When
the caller function prints the chars in the 2D array it just received
from the split function, the first char turns out to be the '\0'
character! I am completely at a loss here: where does this '\0'
character come from? I hope someone will find the time to enlighten me.

Sincerely,
Simon

BEGIN OF LISTING:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>


char** allocate_2d_array_of_chars(size_t m, size_t n);
char** split_string(char *instring);

int main(void)
{
    char **a=NULL;
    char b[14]="*(10)*5*(1)*1";
    int i;

    a=split_string(b);

    if (a!=NULL) {
        for (i=0;i<14;i++) {
            printf("%d %c\n", i, a[0][i]);
        }
        /* to show the strange behavior : */
        if (a[0][0]=='\0') {
            printf("a[0][0] equals '\0' \n Strange..., not?\n");
        }
        free(a);
    }
    return 0;
}

char **split_string(char *instr)
{
    int instrlen;
    int i;
    char **retarr;

    instrlen = strlen(instr);

    retarr = allocate_2d_array_of_chars(1,instrlen);
    if (retarr==NULL) {
        printf("could not allocate retarr\n");
        return NULL;
    }

    /* copy the string */
    for (i=0;i<instrlen;i++) {
        retarr[0][i]=instr[i];
    }

    return retarr;
}

char** allocate_2d_array_of_chars(size_t m, size_t n)
{
    /* adapted from "C unleashed", R Heath, L Kirby et al.*/
    /* allocates a 2D array of one contiguous chunk of memory */

    typedef char T;

    T **a;
    T *p;
    size_t Row;

    a=malloc(m * n * sizeof **a + m * sizeof *a);
    if (a != NULL) {
        for (Row = 0, p = (T *)a + m; Row < m; Row++, p+=n) {
            a[Row] = p;
        }
    }
    return a;
}

END OF LISTING
 
Mark Henning

I have no idea if this is the 'correct' thing to do, but when dealing with
2D arrays, I normally allocate an array of pointers, each pointing to its
own separately allocated array.

Something like:

char **allocate_2d_array_of_chars(size_t m, size_t n)
{
    size_t i;
    char **a = malloc(m * sizeof(char *));

    for (i = 0; i < m; i++)   /* i < m: there are exactly m rows */
    {
        a[i] = malloc(n * sizeof(char));   /* one allocation per row */
    }

    return a;
}

Note that this is untested code and contains no error checking.

If you do this, you need to ensure that you iterate through the 'base' array
and free() each individual element before freeing the array as a whole.
 
Barry Schwarz

Hello,
I have encountered a strange problem and I hope you can help me to
understand it. What I want to do is to pass an array of chars to a
function that will split it up (on every location where a * occurs in
the string). This split function should allocate a 2D array of chars
and put the split results in different rows. The listing below shows
how I started to work on this. To keep the program simple and help
focus the program the string is not actually split. The split function
in this case just allocates a 2D array of size 1 by the length of the
passed string and copies the entire input string into this newly
allocated array. A pointer to this array is then passed to the caller

No it doesn't copy the entire string. Your looping and space
allocation is controlled by the value returned from strlen. strlen
does not count the terminating '\0' which is a part of the string.
What you would end up with (except for a problem to be discussed
later) is an array of char containing the original contents of the
string except for the '\0'. If you want the result to be strings you
should compute strlen()+1 and use that for loop control and
allocation.
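
To make the off-by-one concrete, here is a minimal sketch. The helper name copy_with_terminator is hypothetical and not part of the original listing; it only illustrates using strlen()+1 for both the allocation and the loop bound so that the '\0' is copied as well.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the original listing: returns a
   freshly allocated copy of src including its terminating '\0', or
   NULL if malloc fails. The caller must free() the result. */
char *copy_with_terminator(const char *src)
{
    size_t len = strlen(src);      /* does NOT count the '\0' */
    char *dst = malloc(len + 1);   /* +1 leaves room for the '\0' */
    size_t i;

    if (dst == NULL) {
        return NULL;
    }
    for (i = 0; i < len + 1; i++) {   /* the last pass copies the '\0' */
        dst[i] = src[i];
    }
    return dst;
}
```

With the string from the listing, strlen reports 13, so 14 bytes are allocated and dst[13] ends up holding the '\0'.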
function. By the way, allocating of these so called 2D arrays is done
by a funtion that I adapted from the book "C unleashed", R Heath, L
Kirby et al.

I don't have the book so I don't know if you copied it wrong or the
authors made the mistake described later.
Unfortunately, even this simple program escapes my comprehension. When

Before getting to the error, let's discuss the basic intent of the
function. You want an array of strings. Since a string is an array
of char, you want something that looks like an array of m strings
which really means an array of m arrays of n char.

If the original string has a total length (including the '\0') of n,
then n is also the maximum for each resulting string and m*n is
guaranteed to be large enough to hold all m strings.

But you don't want to have to do address arithmetic every time you
want to reference string i. The answer is to allocate space for m
pointers. The i-th pointer will contain the starting address of the
i-th string. Since everything about the strings is variable, the only
thing you know for sure is the starting address of the allocated
memory (a in your code). If the pointers are placed at the start of
this memory, they can be referred to with normal subscript notation
(a[i]).

So, you need to allocate space for m*n characters and m pointers. In
your code, m * sizeof *a computes the space needed for m pointers and
m *n * sizeof **a computes the space for the m strings of length n.
Since the pointers come first, the first string will follow the last
pointer. (While this is relatively safe for arrays of char, see
comments below about potential alignment problems for arrays of long
or double.)

In the for statement, the first clause initializes Row as the index of
the first pointer (a[0]) and attempts to initialize p as the address
of the first string. (This is where the error occurs which I will get
to later.) The second clause terminates the loop after processing m
pointers. The third clause increments Row to be the index of the next
pointer and increments p to point to the start of the next string.
And then of course, the address is stored in the pointer.

When it is all done, the allocated area of memory would look like
|first pointer|second pointer|...|last (m-th) pointer|
|space for first string|space for second string|...|space for last (m-th) string|
where the i-th pointer contains the starting address of the space for
the i-th string.
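
The sizes and positions in that layout can be written out as plain arithmetic. The two helper names below are hypothetical and exist only for illustration; they assume the pointers come first and the strings follow immediately, as described above.

```c
#include <stddef.h>

/* Hypothetical helpers illustrating the layout described above;
   they are not part of the original listing. */

/* Total bytes needed: m pointers followed by m strings of n chars. */
size_t block_size(size_t m, size_t n)
{
    return m * sizeof(char *) + m * n * sizeof(char);
}

/* Byte offset, from the start of the block, of the space for string i
   (0-based): skip all m pointers, then i strings of n chars each. */
size_t string_offset(size_t m, size_t n, size_t i)
{
    return m * sizeof(char *) + i * n * sizeof(char);
}
```

For m = 1 the first string should therefore start sizeof(char *) bytes into the block, not one byte into it.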
the caller function prints the chars in the 2D array it just received
from the split function, the first char turns out to be the '\0'
character! I am completely at loss here, where does this '\0'
character come from? I hope someone will find the time to enlight me.

Due to the error in the code explained below, the value stored in a[0]
is only one byte beyond the value in a. (On my system, a is set to
0x00780eb0 upon return from malloc and a[0] is set to 0x00780eb1.)

When your allocate function returns to your split function, the for
loop tries to copy the characters from where instr points to where
retarr[0] points. As noted above retarr[0] actually points to one of
the bytes in itself. (On a big-endian machine, it would point to the
0x78; on a little-endian one it would point to the 0x0e.) On the
first iteration through the for loop, this byte is replaced by the
first character in instr ('*'). This has the effect of changing
retarr[0] so that it points somewhere else. This invokes undefined
behavior and anything can happen.
Sincerely,
Simon

BEGIN OF LISTING:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>


char** allocate_2d_array_of_chars(size_t m, size_t n);
char** split_string(char *instring);

int main(void)
{
char **a=NULL;
char b[14]="*(10)*5*(1)*1";

Better to omit the dimension and let the compiler decide how big it needs
to be.
int i;

a=split_string(b);

if (a!=NULL) {
for (i=0;i<14;i++) {

Since you use strlen for allocation, when i is 13 the call to printf
will invoke undefined behavior.
printf("%d %c\n", i, a[0][i]);
}
/* to show the strange behavior : */
if (a[0][0]=='\0') {
printf("a[0][0] equals '\0' \n Strange..., not?\n");
}
free(a);
}
return 0;
}

char **split_string(char *instr)
{
int instrlen;
int i;
char **retarr;

instrlen = strlen(instr);


You need a +1 here to accommodate the '\0'.
retarr = allocate_2d_array_of_chars(1,instrlen);
if (retarr==NULL) {
printf("could not allocate retarr\n");
return NULL;
}

/* copy the string */
for (i=0;i<instrlen;i++) {
retarr[0][i]=instr[i];


Without the +1, this will not copy the '\0'. When you actually split
the string, this will be somewhat critical.
}

return retarr;
}

char** allocate_2d_array_of_chars(size_t m, size_t n)
{
/* adapted from "C unleashed", R Heath, L Kirby et al.*/
/* allocates a 2D array of one contiguous chunk of memory */

typedef char T;

T **a;
T *p;
size_t Row;

a=malloc(m * n * sizeof **a + m * sizeof *a);
if (a != NULL) {
for (Row = 0, p = (T *)a + m; Row < m; Row++, p+=n) {

Here is the error in the p= assignment. It stems from how C does
pointer arithmetic. If Q is a pointer to type R (R *Q;), then the
expression Q+i involves pointer arithmetic and is treated in our
normal everyday integer arithmetic as Q + i*sizeof(R). That is, the
expression Q+i points to the i-th object of type R past the one Q
currently points to.

By casting a as a T*, the expression (T*)a+m points to the m-th T
after the one a currently points to. Since T is char and m is 1, it
evaluates to the address of the first char after a, which is only one
byte into the allocated area. Without the cast, a is a T** or, for
this discussion, a pointer to T*. Then the expression evaluates to
the m-th T* after the one a points to. T is still char but a char* is
typically 4 bytes (the exact size doesn't matter). m is still 1 so
the a+1 points to the first char* after the one a currently points to,
which is typically 4 bytes beyond the address in a.

Then, when you set a[0] to p in the next statement, the address stored
will be that of the next byte beyond the pointer, which is where you
really want the string to start.

Now for the caution. If T is any type that has a more stringent
alignment than T* (8 byte doubles and longs with 4 byte pointers would
be an example), there is no guarantee that the value initially
computed for p is properly aligned for the type T. On most systems,
this is not a problem when T is char.
a[Row] = p;
}
}
return a;
}

END OF LISTING
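
Putting the explanation of the p= assignment into code: a minimal sketch of the allocator with the cast applied after the addition, so that a + m steps over m pointers instead of m chars. This is one reading of the fix described above, not necessarily what the book prints; the lowercase row and the dropped typedef are simplifications of mine.

```c
#include <stdlib.h>

/* Sketch of the corrected allocator: one contiguous block holding
   m row pointers followed by m rows of n chars each. */
char **allocate_2d_array_of_chars(size_t m, size_t n)
{
    char **a;
    char *p;
    size_t row;

    a = malloc(m * sizeof *a + m * n * sizeof **a);
    if (a != NULL) {
        /* a + m is computed as char ** arithmetic (skips m pointers);
           only the result is converted to char *. */
        for (row = 0, p = (char *)(a + m); row < m; row++, p += n) {
            a[row] = p;
        }
    }
    return a;
}
```

Because everything lives in one block, a single free(a) in the caller releases both the pointers and the strings. The alignment caution above still applies if char is replaced by a stricter type.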



<<Remove the del for email>>
 
Ravi Uday

char** allocate_2d_array_of_chars(size_t m, size_t n)
{
/* adapted from "C unleashed", R Heath, L Kirby et al.*/
/* allocates a 2D array of one contiguous chunk of memory */

typedef char T;

T **a;
T *p;
size_t Row;

a=malloc(m * n * sizeof **a + m * sizeof *a);
if (a != NULL) {
for (Row = 0, p = (T *)a + m; Row < m; Row++, p+=n) {
a[Row] = p;
}
}
return a;
}

That is way too complicated; you can replace it with this one.

char** allocate_2d_array_of_chars(size_t rows, size_t columns)
{
    size_t i;
    char **db_array;

    db_array = malloc ( rows * sizeof *db_array);

    if ( db_array == NULL )
    {
        puts ("Unable to allocate.. returning");
        return NULL;
    }

    for ( i = 0; i<rows; i++)
    {
        db_array[i] = malloc ( columns * sizeof *db_array[i]);
        if ( db_array[i] == NULL )/* Handle errors appropriately. */
            printf ("Unable to allocate db_array[%lu]\n", (unsigned long)i);
    }

    return db_array;
}

For freeing, you can use the one below.

void free_2d_array_of_chars( char **db_array, size_t rows)
{
    size_t i;

    for ( i = 0; i<rows; i++)
        free ( db_array[i] );   /* free each row first */

    free (db_array);            /* then the array of row pointers */
}

- Ravi
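
One possible way to "handle errors appropriately" in the row-by-row approach is to undo the partial allocation and return NULL, so the caller never receives an array with missing rows. The sketch below assumes that policy; the names allocate_rows_or_null and free_rows are hypothetical and not from the posts above.

```c
#include <stdlib.h>

/* Hypothetical variant of the row-by-row allocator: on any failure it
   frees what was already allocated and returns NULL. */
char **allocate_rows_or_null(size_t rows, size_t columns)
{
    size_t i;
    char **db_array = malloc(rows * sizeof *db_array);

    if (db_array == NULL) {
        return NULL;
    }
    for (i = 0; i < rows; i++) {
        db_array[i] = malloc(columns * sizeof *db_array[i]);
        if (db_array[i] == NULL) {
            while (i > 0) {            /* undo rows 0 .. i-1 */
                free(db_array[--i]);
            }
            free(db_array);
            return NULL;
        }
    }
    return db_array;
}

/* Frees every row, then the array of row pointers. */
void free_rows(char **db_array, size_t rows)
{
    size_t i;

    for (i = 0; i < rows; i++) {
        free(db_array[i]);
    }
    free(db_array);
}
```

The caller then only has to test the returned pointer once, and passes the same row count to free_rows that it passed to the allocator.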
 
