determining size of array of chars

Seebs


Because they are all the same thing. All pointers to the same type of
thing are the same kind of object.

It's like ints. In a given implementation, "int" always means the same
sized thing, whether the value it holds is 1 or 60000.
Possibly my understanding is not very clear, that's why I am asking
here.
Okay.

This example is trivial.
It is less trivial to understand why, if I declare
const char *example[] = {
"hello", "a"
};
then I find that sizeof(example[0]) is 5 and sizeof(example[1]) is
still 5.

I don't believe this.

I don't believe this because I have never heard of a machine with
a 5-byte pointer type.

Now, if you were getting *4* all the time, that would make sense, because
on many machines, pointers are 4 bytes.

(Also, I should point out: The array denoted by "hello" is actually *6*
bytes, because it's terminated.)

Again, it didn't actually happen. So try running ACTUAL code and
looking at the results.
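
Taking that advice literally, here is a minimal sketch that separates the three quantities being mixed up: the size of an array element (a pointer), the size of the array a string literal denotes, and the length of the string. The helper names pointer_size, string_length and report_sizes are made up for illustration, and the pointer size printed depends on the implementation.

```c
#include <stdio.h>
#include <string.h>

/* The same declaration as in the question. */
static const char *example[] = { "hello", "a" };

/* Size of element i: the size of a pointer, whatever i is. */
size_t pointer_size(size_t i)
{
    return sizeof example[i];
}

/* Length of string i, not counting the terminating '\0'. */
size_t string_length(size_t i)
{
    return strlen(example[i]);
}

/* Prints the quantities side by side. */
void report_sizes(void)
{
    printf("sizeof example[0]  = %zu\n", sizeof example[0]);
    printf("sizeof example[1]  = %zu\n", sizeof example[1]);
    printf("sizeof \"hello\"     = %zu\n", sizeof "hello");      /* 6: includes '\0' */
    printf("strlen(example[0]) = %zu\n", strlen(example[0]));  /* 5 */
}
```

Calling report_sizes() from main should show the same value for both elements (commonly 4 or 8), while the 6 and the 5 are fixed by the language.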
This result led me to think that in this case too the memory is
h e l l o \0 a \0 \0 \0 \0 \0
exactly the same as in your example.
No.

In other words, the compiler has to assign to the pointer the maximum
size, correct?

No.

POINTERS ARE OF A FIXED SIZE.

Okay, here's the deal. Imagine that your computer has a billion bytes
of memory. When your program is running, all those strings are loaded
in memory somewhere. So we just start counting from zero and numbering
all the memory.

So say your program has these strings starting at 0x100000.

0x100000 h
0x100001 e
0x100002 l
0x100003 l
0x100004 o
0x100005 \0
0x100006 a
0x100007 \0

Now, what's in the array?

char *str[2] = { "hello", "a" };

is exactly like
char *str[2] = { (char *)0x100000, (char *)0x100006 };

And because 0x100000 and 0x100006 are the same type of thing, and they are
probably both represented as 4 bytes, sizeof(str[0]) == 4, and sizeof(str[1])
== 4.

Because you're not getting the sizes of the things-pointed-to, but of the
pointers.
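
A small sketch of the same idea in code: the array elements hold addresses, and sizeof applied to an element reports the size of the address, not of the text it points at. The function names are hypothetical, and the addresses printed on a real system will differ from the 0x100000 used above.

```c
#include <stddef.h>
#include <stdio.h>

static const char *str[2] = { "hello", "a" };

/* Size of element i: the size of a pointer, regardless of what it points at. */
size_t element_size(size_t i)
{
    return sizeof str[i];
}

/* First character stored at the address held in element i. */
char first_char(size_t i)
{
    return str[i][0];
}

/* Prints each stored address, the element size, and the text found there. */
void show_pointers(void)
{
    for (size_t i = 0; i < sizeof str / sizeof str[0]; i++) {
        printf("str[%zu] = %p, sizeof = %zu, text = \"%s\"\n",
               i, (const void *)str[i], sizeof str[i], str[i]);
    }
}
```

Whether the two addresses printed by show_pointers() are adjacent is up to the implementation; the element sizes are equal either way.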

Disclaimer: The above is grossly oversimplified. The size of a pointer
could be 1, or 4, or 8, or 2, or 6, or 5, or whatever else; however, all
pointer-to-char will be the same size on a given implementation. In
most modern systems, the memory addresses you're using are "virtual",
meaning that two different programs can have different objects that they
see as being at 0x100000. There's a lot of oversimplification here.

The point is, when you have a string literal, the string contents are stored
somewhere, but the actual object created in the code is an *address*, which
is sort of like a special kind of number, and the *addresses* of things
of the same type are always the same size. (On some systems, pointers to
different types of objects could be different sizes.)

-s
 
James Dow Allen

It is less trivial to understand why, if I declare
         const char *example[] = {
                 "hello", "a"
         };
then I find that sizeof(example[0]) is 5 and sizeof(example[1]) is
still 5. ... Why?

(Don't you mean "Why not?" :) )

I see seebs answered this a minute before I was about to.
To rephrase his answer:

char *example[] = {
"hello", "a"
};
is more or less the same as
char anonymous1[] = "hello", anonymous2[] = "a";
char *example[] = {
/* following are pointers, not arrays */
&anonymous1[0], &anonymous2[0]
};
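
As a compilable sketch of that rewrite (assuming file scope, and with small accessor functions added purely for illustration), the difference between the arrays and the pointers to them shows up directly in sizeof:

```c
#include <stddef.h>

/* Named stand-ins for the anonymous arrays behind the two literals. */
static char anonymous1[] = "hello";
static char anonymous2[] = "a";

/* Each element is a pointer to the first char of one of the arrays. */
static char *example[] = { &anonymous1[0], &anonymous2[0] };

/* Sizes of the arrays themselves: characters plus the terminating '\0'. */
size_t array_bytes_hello(void) { return sizeof anonymous1; }  /* 6 */
size_t array_bytes_a(void)     { return sizeof anonymous2; }  /* 2 */

/* Size of one element of example: a pointer, not an array. */
size_t element_bytes(void)     { return sizeof example[0]; }

/* The first element really does point at the start of anonymous1. */
int points_at_hello(void)      { return example[0] == anonymous1; }
```

One difference from real string literals: these named arrays are modifiable, whereas modifying the array behind a literal is undefined behaviour.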

These things are less confusing, I think, if you
visualize a C compilation in terms of the actual data
and addresses that would result if you were writing the
same thing in machine language.

I try to send this useful message to beginners.
But watch the pedants now muddy the waters with talk
of The Standard(tm) and how one must never never NEVER
pretend C is implemented on real machines! :)

James Dow Allen
 
dehantonio

then I find that sizeof(example[0]) is 5 and sizeof(example[1]) is
still 5.

I don't believe this.

Sorry, my fault. I got confused with the original code: "cmd" is 4
bytes including the null termination, and incidentally the size of the
pointer was 4, that's why I was confusing pointer size with string
size!
I don't believe this because I have never heard of a machine with
a 5-byte pointer type.

Now, if you were getting *4* all the time, that would make sense, because
on many machines, pointers are 4 bytes.
[...]

Right, that was confusing me at first.

Now I understand that I was confusing the size of the pointer with the
size of the array. Very simple... but when you get confused nothing is
simple.

I wish to say thank you to you and to all the other people who
answered; this thread was very useful for me to clarify things.
 
Keith Thompson

James Dow Allen said:
These things are less confusing, I think, if you
visualize a C compilation in terms of the actual data
and addresses that would result if you were writing the
same thing in machine language.

I try to send this useful message to beginners.
But watch the pedants now muddy the waters with talk
of The Standard(tm) and how one must never never NEVER
pretend C is implemented on real machines! :)

Speaking as one of the pedants, I do recognize that people understand
C in different ways. I tend to do best with a fairly abstract
approach, but I understand I'm not exactly typical -- and there's
no shortage of posters to provide a more concrete perspective.

And I think it's good to understand that C, in all its theoretical
abstraction, is the way it is precisely *because* it's implemented
on real machines.
 
Keith Thompson

dehantonio said:
I understand. I also tried the code on a PC and it works.
However, question 6.23 is a bit different, since every element has the
same size (int). In this case, I am declaring an array of pointers to
elements that do not have the same length: "pwd" is longer than "cd".
If the trick works, it means that the memory is allocated like this:
Byte - content
01 - l
02 - s
03 - \0
04 - ?
05 - p
06 - w
07 - d
08 - \0
09 - c
0A - d
0B - \0
0C - ?

Now, the question: is this behaviour standard, or could any compiler
decide whether or not to skip bytes 04 and 0C?

I see that this has already been cleared up, but now that you understand
it perhaps I can muddy the waters a bit. 8-)}

The original declaration (with the typo fixed) is:

const char *LineMenu[] = {"ls", "pwd", "cd"};

The number of elements in the array can be computed by:

sizeof LineMenu / sizeof LineMenu[0]

(sizeof is a unary operator, not a function, so parentheses aren't
needed.)

It would work exactly the same way regardless of how LineMenu is
initialized. It could have been any of:

const char *LineMenu[] = {"", "", "");
const char *LineMenu[] = {NULL, NULL, NULL};
const char *LineMenu[] = {"some", "other", "values"};
or even:
const char *LineMenu[];

The initializer affects the value(s) stored in LineMenu.
sizeof is unaffected by stored values.
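
A short sketch of that claim, using three differently initialized arrays (the names MenuA, MenuB and MenuC and the count functions are invented for the example): the element count comes out the same in each case.

```c
#include <stddef.h>

/* Three arrays with the same number of elements but different contents. */
static const char *MenuA[] = { "ls", "pwd", "cd" };
static const char *MenuB[] = { "", "", "" };
static const char *MenuC[] = { NULL, NULL, NULL };

/* Element count = total array size / size of one element. */
size_t count_a(void) { return sizeof MenuA / sizeof MenuA[0]; }
size_t count_b(void) { return sizeof MenuB / sizeof MenuB[0]; }
size_t count_c(void) { return sizeof MenuC / sizeof MenuC[0]; }
```

The division only works where the array itself is in scope; applied to a pointer parameter inside a function it would divide the pointer size instead.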
 
John Bode

Please disregard my previous post, I pressed "submit" by mistake, I do
not know how it could happen that the focus was on the wrong window.

I start from the following situation:

const char *LineMenu[] = {"ls", "pwd", "cd");
#define MENUSIZE 3

int somefunction(char* inputbuffer) {
     int idx;
     for (idx = 0; idx < MENUSIZE; idx++) {
     if (strstr(inputbuffer, LineMenu[idx])) {
       return idx;
     }
     }
     //not found:
    return -1;

}

Now, I would like to get rid of the MENUSIZE define. For example, if I
add more commands to the LineMenu array, is there a way of dynamically
checking its size in the for loop?
I want to avoid adding a new command to LineMenu and forgetting to
update MENUSIZE; and if LineMenu is huge, counting the array's
cells is not practical.
Thank you.

The easiest solution is to add a sentinel to the end of the array,
such as

const char *LineMenu[] = {"ls", "pwd", "cd", NULL};
...
for (idx = 0; LineMenu[idx] != NULL; idx++) {...}
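
Filled out into a complete function, the sentinel approach might look like the sketch below. The name find_command is an assumption; the matching rule (strstr) is the one from the original code.

```c
#include <stddef.h>
#include <string.h>

/* The NULL at the end marks where the list stops. */
static const char *LineMenu[] = { "ls", "pwd", "cd", NULL };

/* Returns the index of the first menu entry that occurs anywhere
   inside inputbuffer, or -1 if none does. */
int find_command(const char *inputbuffer)
{
    for (int idx = 0; LineMenu[idx] != NULL; idx++) {
        if (strstr(inputbuffer, LineMenu[idx]) != NULL) {
            return idx;
        }
    }
    return -1;
}
```

Adding a command now means adding one string before the NULL; nothing else needs updating. The cost is that forgetting the sentinel makes the loop run off the end of the array.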

Having said that, another approach (that's admittedly a *lot* more
work, but I think it's worth it) is to create a smart menu module that
encapsulates how menus are defined and accessed; instead of exposing
the LineMenu array directly, hide it behind a well-defined
interface.

Here's something off the top of my head with no guarantees of
applicability or correctness:

/**
 * Menu.h -- defines the interface for a menu object. Menu items can
 * be added, searched for, etc.
 */
#ifndef MENU_H
#define MENU_H

#include <stddef.h> /* size_t */

/**
 * Details of the menu type implementation are hidden from the user
 */
typedef void *MenuType;

/**
 * Allocate a new, empty menu object
 */
MenuType createMenu(void);

/**
 * Deallocate a menu object
 */
void destroyMenu(MenuType *theMenu);

/**
 * Add an item to the end of the menu.
 */
void addMenuItem(MenuType theMenu, char *theItem);

/**
 * Find the position of an item in the menu;
 * returns -1 if item is not found
 */
int getItemIndex(MenuType theMenu, char *theItem);

/**
 * Return the item at the given index; returns NULL
 * if item is not found (i.e., index is outside
 * the bounds of the menu list)
 */
char *getItemAtIndex(MenuType theMenu, int index);

/**
 * Get the total number of items in the menu
 */
size_t getItemCount(MenuType theMenu);

#endif

This is all the menu's clients see. Your code would look more like
this:

#include <stdlib.h> /* exit, EXIT_FAILURE */
#include <string.h> /* strstr */
#include "Menu.h"

int somefunction(char *inputbuffer, MenuType theMenu);

int main(void)
{
    MenuType myMenu;
    int idx;
    char inputBuffer[SIZE]; // for some arbitrary SIZE
    ...
    myMenu = createMenu(); // attempt to create the menu object
    if (!myMenu)
    {
        // couldn't create the menu, fall over and die
        exit(EXIT_FAILURE);
    }

    // add items to the menu
    addMenuItem(myMenu, "ls");
    addMenuItem(myMenu, "pwd");
    addMenuItem(myMenu, "cd");
    ...
    // find the menu item that matches the pattern
    idx = somefunction(inputBuffer, myMenu);
    // do something with it
    do_something_with(getItemAtIndex(myMenu, idx));
    ...
    // we're done with the menu object, so deallocate it.
    destroyMenu(&myMenu);
    ...
}

int somefunction(char *inputbuffer, MenuType theMenu)
{
    size_t idx;
    size_t menuSize = getItemCount(theMenu);
    for (idx = 0; idx < menuSize; idx++)
        if (strstr(inputbuffer, getItemAtIndex(theMenu, (int)idx)))
            return (int)idx;
    return -1;
}

somefunction doesn't know *how* the menu is implemented (array, list,
hash, etc.), nor does it care.

Of course, now you have to write the code underlying the menu module,
and then you'll have to build and link it in the rest of your project,
debug it, add realistic error handling, and all that jazz. That may
be more work than you're willing to put in at this stage.
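
To give a feel for how much code sits behind that interface, here is one possible sketch of the module, not the implementation the post had in mind. Everything inside it is an assumption: the struct menu layout, the doubling growth policy, storing the caller's pointers instead of copying the strings, and silently dropping an item when allocation fails. The MenuType typedef is repeated so the fragment stands alone.

```c
#include <stdlib.h>
#include <string.h>

typedef void *MenuType;          /* same opaque handle as in Menu.h */

struct menu {                    /* hypothetical hidden representation */
    char **items;                /* growable array of item pointers */
    size_t count;                /* items in use */
    size_t capacity;             /* items allocated */
};

MenuType createMenu(void)
{
    struct menu *m = malloc(sizeof *m);
    if (m != NULL) {
        m->items = NULL;
        m->count = 0;
        m->capacity = 0;
    }
    return m;                    /* NULL if allocation failed */
}

void destroyMenu(MenuType *theMenu)
{
    if (theMenu != NULL && *theMenu != NULL) {
        struct menu *m = *theMenu;
        free(m->items);
        free(m);
        *theMenu = NULL;         /* caller's handle can't dangle */
    }
}

void addMenuItem(MenuType theMenu, char *theItem)
{
    struct menu *m = theMenu;
    if (m->count == m->capacity) {
        size_t newCap = m->capacity ? m->capacity * 2 : 4;
        char **grown = realloc(m->items, newCap * sizeof *grown);
        if (grown == NULL) {
            return;              /* sketch only: item is dropped on failure */
        }
        m->items = grown;
        m->capacity = newCap;
    }
    m->items[m->count++] = theItem;   /* stores the pointer, no copy */
}

size_t getItemCount(MenuType theMenu)
{
    struct menu *m = theMenu;
    return m->count;
}

char *getItemAtIndex(MenuType theMenu, int index)
{
    struct menu *m = theMenu;
    if (index < 0 || (size_t)index >= m->count) {
        return NULL;             /* outside the bounds of the menu */
    }
    return m->items[index];
}

int getItemIndex(MenuType theMenu, char *theItem)
{
    struct menu *m = theMenu;
    for (size_t i = 0; i < m->count; i++) {
        if (strcmp(m->items[i], theItem) == 0) {
            return (int)i;
        }
    }
    return -1;                   /* not found */
}
```

Because only pointers are stored, the strings passed to addMenuItem must outlive the menu; string literals do, a local buffer might not.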
 
BruceS

I understand. I also tried the code on a PC and it works.
However, question 6.23 is a bit different, since every element has the
same size (int). In this case, I am declaring an array of pointers to
elements that do not have the same length: "pwd" is longer than "cd".
If the trick works, it means that the memory is allocated like this:
Byte - content
01   - l
02   - s
03   - \0
04   - ?
05   - p
06   - w
07   - d
08   - \0
09   - c
0A   - d
0B   - \0
0C   - ?
Now, the question: is this behaviour standard, or could any compiler
decide whether or not to skip bytes 04 and 0C?

I see that this has already been cleared up, but now that you understand
it perhaps I can muddy the waters a bit.  8-)}

The original declaration (with the typo fixed) is:

    const char *LineMenu[] = {"ls", "pwd", "cd"};

Even after seeing comments about it existing, I didn't see the typo
until I copied & pasted your version and his version and compared
using a larger font. I guess I need a bigger monitor. Or better
eyes.
The number of elements in the array can be computed by:

    sizeof LineMenu / sizeof LineMenu[0]

(sizeof is a unary operator, not a function, so parentheses aren't
needed.)

It would work exactly the same way regardless of how LineMenu is
initialized.  It could have been any of:

    const char *LineMenu[] = {"", "", "");

But here you repeated the typo, kind of killing the point.
    const char *LineMenu[] = {NULL, NULL, NULL};
    const char *LineMenu[] = {"some", "other", "values"};
or even:
    const char *LineMenu[];

The initializer affects the value(s) stored in LineMenu.
sizeof is unaffected by stored values.

I think Seebs' explanation did very well for this, though he went a
bit far to please pedants. Note that the OP originally was using a
macro to define the number of elements---exactly what the usual code
produces. It doesn't look to me like anyone really cared about the
lengths of the strings, except in the sense of being concerned that
they would somehow interfere with the calculation of the number of
elements.
 
Ben Bacarisse

James Dow Allen said:
I see seebs answered this a minute before I was about to.
To rephrase his answer:

char *example[] = {
"hello", "a"
};
is more or less the same as
char anonymous1[] = "hello", anonymous2[] = "a";
char *example[] = {
/* following are pointers, not arrays */
&anonymous1[0], &anonymous2[0]
};

These things are less confusing, I think, if you
visualize a C compilation in terms of the actual data
and addresses that would result if you were writing the
same thing in machine language.

I try to send this useful message to beginners.

I think it very much depends on the beginner. It is probably quite rare
these days for someone starting out in C to know enough about assemblers
to benefit from that perspective. For that matter, the OP's original
confusion might come about precisely from having that perspective! I
would almost certainly put the bytes into consecutive locations and then
load pointers to them:

s1:      .bytes "hello\0"
s2:      .bytes "a\0"
example: .quad s1
         .quad s2

(in some hypothetical assembler) so the idea that the strings may be
non-contiguous requires...
But watch the pedants now muddy the waters with talk
of The Standard(tm) and how one must never never NEVER
pretend C is implemented on real machines! :)

...a more abstract view of the language. The OP seemed to have grasped
what might be happening at some level, but the original question was one
that needs some distance from the machine: are the bytes always
consecutive?
 
Nick Keighley

Subject: determining size of array of chars


<snip>

your question is really "how do I determine the size of an array?"
I start from the following situation:

#define ARRAY_SIZE(A) (sizeof(A)/sizeof(A[0]))
const char *LineMenu[] = {"ls", "pwd", "cd");

/* not needed
#define MENUSIZE 3 */

int somefunction(char* inputbuffer) {

int somefunction (char* inputbuffer, size_t menu_size)
{

I must admit I'd pass LineMenu as a parameter as well
     int idx;
     for (idx = 0; idx < MENUSIZE; idx++) {

for (idx = 0; idx < menu_size; idx++)
{
      if (strstr (inputbuffer, LineMenu[idx]))
        return idx;
     }

     //not found:
    return -1;
}

void call_me (void)
{
char buffer [1024];
somefunction (buffer, ARRAY_SIZE(LineMenu));
}


I fiddled with your layout a bit as yours confused me
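
Since the quoted and replacement lines are interleaved above, here is one way the pieces might fit together, as a sketch only. It also passes the menu as a parameter, as suggested in the post; the wrapper name lookup is made up, and the brace typo in the initializer is corrected.

```c
#include <stddef.h>
#include <string.h>

/* Element count of an array; only valid where A is a real array,
   not a pointer parameter. */
#define ARRAY_SIZE(A) (sizeof(A) / sizeof((A)[0]))

static const char *LineMenu[] = { "ls", "pwd", "cd" };

/* Returns the index of the first menu entry found inside inputbuffer,
   or -1 if there is no match. */
int somefunction(const char *inputbuffer,
                 const char *const menu[], size_t menu_size)
{
    for (size_t idx = 0; idx < menu_size; idx++) {
        if (strstr(inputbuffer, menu[idx]) != NULL) {
            return (int)idx;
        }
    }
    return -1;
}

/* The size is computed here, where LineMenu is visible as an array. */
int lookup(const char *inputbuffer)
{
    return somefunction(inputbuffer, LineMenu, ARRAY_SIZE(LineMenu));
}
```

Inside somefunction, menu is a pointer, so ARRAY_SIZE(menu) would give the wrong answer there; that is why the count travels as a separate argument.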
 
David Thompson

... perhaps I can muddy the waters a bit. 8-)}

The original declaration (with the typo fixed) is:

const char *LineMenu[] = {"ls", "pwd", "cd"};

The number of elements in the array can be computed by:

sizeof LineMenu / sizeof LineMenu[0]

(sizeof is a unary operator, not a function, so parentheses aren't
needed.)
Yep.

It would work exactly the same way regardless of how LineMenu is
initialized. It could have been any of:

const char *LineMenu[] = {"", "", "");
const char *LineMenu[] = {NULL, NULL, NULL};
const char *LineMenu[] = {"some", "other", "values"};

Yep, except for )-for-} already noted.
or even:
const char *LineMenu[];
Nope. That doesn't initialize, and (also) does not complete the type.

const char * LineMenu[3];
does have a complete type, and works correctly. And if this is at file
scope (as in the OP) and no other declaration with an initializer for
this var follows in the same t.u. there is an implicit definition that
initializes to 'appropriate' zeros, here null pointers, although it's
a pinhead-angel question whether you consider this declaration
responsible for that initialization.

If at block scope, which is also possible, there is no initialization,
but the type and size are still correct. Less useful, since its (only)
benefit is to allow you to access the array correctly and any read
from an uninitialized array is wrong, but the size is right.
The initializer affects the value(s) stored in LineMenu.
sizeof is unaffected by stored values.

Or even lack of any determinate value.
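
A sketch of both cases, with hypothetical function names: a file-scope declaration with a size but no initializer, and a block-scope one whose elements are never read.

```c
#include <stddef.h>

/* File scope, complete type, no initializer: a tentative definition.
   With no other definition in this translation unit, the elements
   start out as null pointers. */
const char *LineMenu[3];

/* The element count depends only on the type. */
size_t menu_count(void)
{
    return sizeof LineMenu / sizeof LineMenu[0];
}

/* Returns 1 if every element of the file-scope array is a null pointer. */
int all_null(void)
{
    for (size_t i = 0; i < menu_count(); i++) {
        if (LineMenu[i] != NULL) {
            return 0;
        }
    }
    return 1;
}

/* Block scope: the elements are indeterminate and must not be read,
   but sizeof does not read them, so the count is still right. */
size_t block_scope_count(void)
{
    const char *local[3];
    return sizeof local / sizeof local[0];
}
```

By contrast, the version with empty brackets and no initializer leaves the type incomplete at that point, so sizeof cannot be applied to it.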
 
