Array of Pointers

G

Gerald

I have a problem with an array of pointers.

In a program I'm writing, I have to read a file, containing thousands
of short lines. The content of another file will be compared against
each line later on. Before that happens there has been a problem that
causes a crash of the program.

This is a little program "test.exe" I wrote to test what happens.

compiler: mingw 3.1.01 for win32 command line
gcc -o test test.c

#include <stdio.h>
int main(){

long a;
const int size=520760;
char * arrayofpointers[size];
char ch[3]={"12"};

// setting each pointer to ch (just for this demonstration)
for (a=0; a<size; a++)
{ arrayofpointers[a]=ch;
}

printf("Checking\n");
printf("Last one: %s\n", arrayofpointers[size-1]);

return 0;

}

This doesn't work if the size is set to 700,000, it will result in a
crash. It does not crash if the size is lower, say 600,000. It will
only display the right content if the number is lower than 520,761.

Questions:
What am I doing wrong? (I'm an amateur programmer.)
What's the maximum number of pointers in an array?

Thanks!

Gerald
 
M

Mark McIntyre

I have a problem with an array of pointers.

In a program I'm writing, I have to read a file, containing thousands
of short lines. (snippage)
This doesn't work if the size is set to 700,000, it will result in a
crash. It does not crash if the size is lower, say 600,000. It will
only display the right content if the number is lower than 520,761.

Most implementations have upper limits on the amount of memory you can
grab. This is especialyl true when using automatic variables like your
array of pointers. Probably your system limits such memory to 512K or
something.
Questions:
What am I doing wrong? (I'm an amateur programmer.)

Use dynamic memory allocation. This might havae a higher limit. Or
change your algorithm.
What's the maximum number of pointers in an array?

Depends on your platform. ISTR that C requires only that the system
support objects up to 64KB in size.
 
K

Kevin Goodsell

Gerald said:
I have a problem with an array of pointers.

In a program I'm writing, I have to read a file, containing thousands
of short lines. The content of another file will be compared against
each line later on. Before that happens there has been a problem that
causes a crash of the program.

This is a little program "test.exe" I wrote to test what happens.

Be careful using "test" as the name of a program. Since you are using
Windows it shouldn't matter, but there's a standard Unix program called
"test", and people often get confused when they accidentally invoke that
instead of their own program.
compiler: mingw 3.1.01 for win32 command line
gcc -o test test.c

The first problem is that you are invoking this wrong (at least for this
group). Use these options: -W -Wall -ansi -pedantic

With my usual options (very similar to those above, with a few extra
warnings, a few warnings suppressed, and -pedantic-errors instead of
just -pedantic), gcc reports the following:

fun.c: In function `main':
fun.c:6: error: ISO C90 forbids variable-size array `arrayofpointers'
fun.c:9: error: syntax error before '/' token
fun.c:10: error: syntax error before ')' token
fun.c:6: warning: unused variable `arrayofpointers'
fun.c:7: warning: unused variable `ch'
fun.c:10: warning: statement with no effect
fun.c: At top level:
fun.c:14: error: syntax error before string constant
fun.c:14: warning: type defaults to `int' in declaration of `printf'
fun.c:14: warning: conflicting types for built-in function `printf'
fun.c:14: error: ISO C forbids data definition with no type or storage class
fun.c:15: error: syntax error before string constant
fun.c:15: warning: type defaults to `int' in declaration of `printf'
fun.c:15: error: ISO C forbids data definition with no type or storage class
fun.c:19:2: no newline at end of file
Terminated with exit code 1
#include <stdio.h>
int main(){

long a;
const int size=520760;

Use long here. That number is too big to portably store in an int.
char * arrayofpointers[size];

This is not legal (unless you are using C99). It should give you a
diagnostic when you invoke the compiler correctly. Array sizes must be
constant expressions, and 'const' objects, perhaps surprisingly, don't
fit the bill.
char ch[3]={"12"};

This doesn't look right to me, but gcc doesn't complain. I'd have to
look it up to figure out if it's right. I would expect the braces to
delimit a list of things with which to initialize the elements of the
array, which would mean that you'd be doing the equivalent of

char ch[3];
ch[0] = "12";
ch[1] = 0;
ch[2] = 0;

Which would obviously not be right. But this doesn't seem to be what is
happening, so I'm probably mistaken and need to review aggregate
initializers again. In any case, it would be less confusing if you
dropped the braces.

char ch[3] = "12";
// setting each pointer to ch (just for this demonstration)

This comment style is not supported in C versions prior to C99.
for (a=0; a<size; a++)
{ arrayofpointers[a]=ch;
}

printf("Checking\n");
printf("Last one: %s\n", arrayofpointers[size-1]);

return 0;

}

This doesn't work if the size is set to 700,000, it will result in a
crash. It does not crash if the size is lower, say 600,000. It will
only display the right content if the number is lower than 520,761.

Questions:
What am I doing wrong? (I'm an amateur programmer.)
What's the maximum number of pointers in an array?

You are probably encountering an implementation limit (I believe I
encountered such a limit while testing your code - I had to reduce the
size of arrayofpointers[] to avoid a crash in ntdll.dll before main()
even started executing). You need to consult the documentation to
determine if this is the case, and if there's a way to extend the limit.

-Kevin
 
K

Kevin Goodsell

Kevin said:
fun.c: In function `main':
fun.c:6: error: ISO C90 forbids variable-size array `arrayofpointers'
fun.c:9: error: syntax error before '/' token

I didn't look closely at these diagnostics before posting, but it
appears that nearly all of them after this point are cascading errors
due to the compiler being confused by earlier problems.
fun.c:10: error: syntax error before ')' token
fun.c:6: warning: unused variable `arrayofpointers'
fun.c:7: warning: unused variable `ch'
fun.c:10: warning: statement with no effect
fun.c: At top level:
fun.c:14: error: syntax error before string constant
fun.c:14: warning: type defaults to `int' in declaration of `printf'
fun.c:14: warning: conflicting types for built-in function `printf'
fun.c:14: error: ISO C forbids data definition with no type or storage
class
fun.c:15: error: syntax error before string constant
fun.c:15: warning: type defaults to `int' in declaration of `printf'
fun.c:15: error: ISO C forbids data definition with no type or storage
class
fun.c:19:2: no newline at end of file

....except this one, which was my fault.

-Kevin
 
B

Bruno Desthuilliers

Gerald said:
I have a problem with an array of pointers.

In a program I'm writing, I have to read a file, containing thousands
of short lines. The content of another file will be compared against
each line later on.

<OT>
You may (or may not) need a more appropriate (and probably a bit more
complex) data structure than a simple array to do this efficiently.
Before that happens there has been a problem that
causes a crash of the program.

This is a little program "test.exe" I wrote to test what happens.

compiler: mingw 3.1.01 for win32 command line
gcc -o test test.c

#include <stdio.h>
int main(){

int main(void) or int main(int argc, char **argv)
long a;
const int size=520760;

not portable, may overflow on some systems. And prefer size_t for sizes.
char * arrayofpointers[size];

Apart from the potential compatibility problem with C89 compilers, this
is far too big for an automatic variable. If you need that much memory,
use dynamic memory
char ch[3]={"12"};

No need for the {}.
// setting each pointer to ch (just for this demonstration)

Avoid C++ style comments, stick with C style (/* comment */)
for (a=0; a<size; a++)

You declared a as long, size as int. longs and ints can be of different
sizes.
{ arrayofpointers[a]=ch;
}

printf("Checking\n");
printf("Last one: %s\n", arrayofpointers[size-1]);

return 0;

}

This doesn't work if the size is set to 700,000, it will result in a
crash. It does not crash if the size is lower, say 600,000. It will
only display the right content if the number is lower than 520,761.

Questions:
What am I doing wrong? (I'm an amateur programmer.)

1/ You should raise the warning level a bit :

[laotseu@localhost fclc]$ gcc -Wall -ansi -pedantic -obigarray bigarray.c
bigarray.c: In function `main':
bigarray.c:6: warning: ISO C89 forbids variable-size array `arrayofpointers'
bigarray.c:9: parse error before '/' token
bigarray.c:10: parse error before ')' token
bigarray.c:6: warning: unused variable `arrayofpointers'
bigarray.c:7: warning: unused variable `ch'
bigarray.c:10: warning: statement with no effect
bigarray.c: At top level:
bigarray.c:14: parse error before string constant
bigarray.c:14: warning: type defaults to `int' in declaration of `printf'
bigarray.c:14: warning: conflicting types for built-in function `printf'
bigarray.c:14: ISO C forbids data definition with no type or storage class
bigarray.c:15: parse error before string constant
bigarray.c:15: warning: type defaults to `int' in declaration of `printf'
bigarray.c:15: ISO C forbids data definition with no type or storage class

2/ you should learn to use malloc(), realloc() and free().
What's the maximum number of pointers in an array?

None in theory. Practically, it depends on your implementation, your
system, etc... The problem here is not about the maximum numbers of
pointers in an array, but about the maximum available space for
automatic variables.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
int result = EXIT_FAILURE;

/* bad programming style : fixed size, magic number... */
size_t size = 1520760; /* now this works on my computer */

/*
* Try to allocate enough space for 'size' char*
* If malloc() fails, it'll return NULL.
*/
char **array = malloc(size * sizeof *array);
if (NULL == array) {
fprintf(stderr
, "%s : %d : failed to allocate mem for %d char*\n"
, __FILE__
, __LINE__
, size
);
}
else {
char *dummy="'anticonstitutionnellement' est un mot tres long";
size_t i; /* 'i' is the approved standard index for loops !-) */

/* copying dummy 'size' times (just for this demonstration) */
for (i = 0; i < size; i++) {
/*
* Try to allocate enough space to copy the content
* of dummy.
* Note that this will take far more space than what you did
* when just copying the address of the "ch" array
*
* (Note also that it's a stupid way of doing things in such
* a case since strlen(dummy) will be recomputed each
* time... Here we'd better move the call to strlen()
* outside the loop.)
*/
array = malloc((strlen(dummy) + 1) * sizeof *array);
if (NULL == array) {
fprintf(stderr
, "%s : %d : malloc failed at i = %d\n"
, __FILE__
, __LINE__
, i
);
break;
}
else {
strcpy(array, dummy);
}
}

if (i == size) {
/* if we're here, then all malloc() worked... */
printf("Checking\n");
printf("Last one: %s\n", array[size-1]);
result = EXIT_SUCCESS;
}
else {
/* we must not try to free() uninitialised pointers */
size = i;
}

/*
* dynamic memory (ie : obtained with malloc() or friends
* must be free()'d (ie: given back to the system)
*/
for (i = 0; i < size; i++) {
free(array);
/* not necessary here, but a good habit */
array = NULL;
}

free(array);
/* not necessary here, but a good habit */
array = NULL;
}

return result;
}

This may or may not work, depending on your computer and what's running
on it at the time...

And it also may take some time to execute !-)

Hope That Helps (and that I didn't left too much stupidities in my code...)
Bruno
 
G

Gerald

Bruno Desthuilliers said:
<OT>
You may (or may not) need a more appropriate (and probably a bit more
complex) data structure than a simple array to do this efficiently.
</OT>

I'm aiming for accuracy, speed and low memory use, in that order.

That's why I read the file into memory as a whole and then edit memory
to add string terminators (\0). Next thing I needed was an array of
pointers which can be used easily and swiftly for strcasestr (libc
2.3.2: strcasestr.c) compares.

char *__strcasestr (phaystack, pneedle)
const char *phaystack;
const char *pneedle;

This function is called for each line in the file. The haystack is
usually larger than the list of needles, but not always.

When I found the memory allocation problem I wasn't sure if this was a
good way, until you posted this:

char **array = malloc(size * sizeof *array);

That solved the problem. Thank you!

For some reason it didn't work when I tried this before I wrote to
this group. Probably due to a typo.

This may or may not work, depending on your computer and what's running
on it at the time...

It works on my computer.
And it also may take some time to execute !-)

Yes, it takes about a second on my Pentium 4.

My program does not create a new array, I just use the pointers to
point to places inside the file loaded into memory. Checks are done to
assure that there's nothing bad in the input file. This way, memory
usage is low and speed is high.
Hope That Helps (and that I didn't left too much stupidities in my code...)
Bruno

The example helped a lot! I also learned from Kevin and Mark's
replies.

Gerald
 
C

Chris Torek

I'm aiming for accuracy, speed and low memory use, in that order.

That's why I read the file into memory as a whole ...
[and apply] strcasestr [to] each line in the file.

If you are repeatedly searching for strings within a large file
(i.e., rewriting the venerable "grep" utility), and want it to be
as fast as possible as well as accurate, you almost certainly should
*not* be using any variant of strstr(), but rather some variant
of Boyer-Moore search.

Except for Boyer-Moore, the best string searching algorithms are
generally O(N) or slower (and often much slower), where N is the
amount of text to be searched -- in this case, the size of the
file. Boyer-Moore, however, is generally O(N/M), where M is the
length of the desired match. When searching for "abcdefghij", for
instance, you may be able to examine only about 10% of the file.

Also -- and somewhat incidentally -- reading the entire file into
memory first, then searching, could be one of the slower methods
to use, because it rules out parallelism. If you read the initial
part of the file, then instruct the I/O subsystem to read the next
part of the file while you work on the first part read so far, then
instruct the I/O subsystem to read the third part while you work
on the second, and so on, you may be able to finish the search in
only slightly longer than it takes to read the entire file, rather
than in "total time time to read file + total time to search".

Ideally (but not always actually), the best way to get this
parallelism in C is simply to go through ordinary stdio, using
fread() or fgets() or getchar() or whatever is most suitable to
the algorithm at hand.
 
D

Dave Thompson

On Sun, 21 Dec 2003 23:33:19 GMT, Kevin Goodsell

char ch[3]={"12"};

This doesn't look right to me, but gcc doesn't complain. I'd have to
look it up to figure out if it's right. <snip>

It is. 6.7.8p14:
An array of character type may be initialized by a character string
literal, optionally
^^^^^^^^^^
enclosed in braces. Successive characters of the character string
^^^^^^^^^^^^^^^^^^
literal (including the
terminating null character if there is room or if the array is of
unknown size) initialize the
elements of the array.

and similarly p15 for wide string, and similarly p11 for a scalar:
int a = 3;
int b = {4};
are both legal.

AFAICT the only benefit of this is to allow {0} to be a valid
initializer for an automatic variable, equivalent to the default
intialization of a static one, for any object type whatsoever.

- David.Thompson1 at worldnet.att.net
 
G

Gerald

Chris Torek said:
If you are repeatedly searching for strings within a large file
(i.e., rewriting the venerable "grep" utility), and want it to be
as fast as possible as well as accurate, you almost certainly should
*not* be using any variant of strstr(), but rather some variant
of Boyer-Moore search.

Yes, but not in this case. I tried Boyer-Moore, it works well in
certain cases, especially large files for dumb string searches. In the
program I was writing I narrow down the search area to a minimum
first, by looking for a "token". Because of that, the individual
searches are actually small. Examples of the lengths of the search
strings are:
1. 24 (needle) in 35 (haystack)
2. 28 (needle) in 14 (haystack)

A whole lot of needles (thousands) are to be searched in on average 10
small haystacks. Needles are usually smaller than haystacks.

In the first case, a maximum of 35-24=11 searches are needed (sliding
window). No match is expected more than 99.99% of all searches.

In the second case, no compare is necessary, but I'll let the
strcasestr from libc decide that. The length of needle is not known as
I've written it now, see below for code. May be this outweighs the
extra code needed to calculate string length. I could do some sorting
too, either by file length or an alphabetical sort before doing
compares. That would limit the amount of compares a bit.
Except for Boyer-Moore, the best string searching algorithms are
generally O(N) or slower (and often much slower), where N is the
amount of text to be searched -- in this case, the size of the
file. Boyer-Moore, however, is generally O(N/M), where M is the
length of the desired match. When searching for "abcdefghij", for
instance, you may be able to examine only about 10% of the file.

After checking 100% of the file for tokens, I estimate the examined
parts at about 5% in an average file, much less in larger than average
files. I have tried a few dozen case insensitive strstr variants for
speed and accuracy. The one from libc performed far better (>20%
faster) for this application than any other.
Also -- and somewhat incidentally -- reading the entire file into
memory first, then searching, could be one of the slower methods
to use, because it rules out parallelism. If you read the initial
part of the file, then instruct the I/O subsystem to read the next
part of the file while you work on the first part read so far, then
instruct the I/O subsystem to read the third part while you work
on the second, and so on, you may be able to finish the search in
only slightly longer than it takes to read the entire file, rather
than in "total time time to read file + total time to search".

Do you mean multi-threading? I've tried this without multi-threading.
It takes more time to read small parts and perform pointer assignments
when the file is read and processed in chunks.

It is known that the file is likely already in memory cache.
Ideally (but not always actually), the best way to get this
parallelism in C is simply to go through ordinary stdio, using
fread() or fgets() or getchar() or whatever is most suitable to
the algorithm at hand.

I used fread() to read the file at once. The source file is user
input, it cannot be trusted to be properly formatted.

These are parts of the code to process the file, terminate lines as
strings and set the pointers, which will be used later to perform the
searches. It skips empty lines.

char **filepointers;
long max_pointers=300000;
indicator=0;
filepointers = malloc(max_pointers * sizeof *filepointers);

sfile is a large list of needles already read into memory from file.
The length of this file is known as lenfile.

d=0;
// point to first line, if it exists at position zero
if (sfile[0]!='\x0D' && sfile[0]!='\x0A')
{ filepointers[0]=sfile; // set pointer
d++; // next pointer
}
// set pointers
for (c=0;c<lenfile; c++)
{ if (sfile[c]=='\x0D' || sfile[c]=='\x0A')
{ sfile[c]=0; // terminate string
indicator=1; // new string coming up (possibly)
}
else
{ if (indicator==1) // start new string
{ if (d==max_pointers-1) break; // don't overflow
filepointers[d]=sfile+c; // set pointer
indicator=0; // reset indicator
d++; // next pointer
}
}
}
filepointersnumber=d; // number of pointers counted

This code still takes nearly as much time as reading the file from
file cache memory. For an extremely large needle file, it's a bit less
than the time the searches take. May be this helps to gauge the
impact.

read file from memory cache : assing pointer : perform searches
1.1 : 1 : 1.2

There may be 0x0 chars inside the lines. In that case, the search is
limited to the first 0x0 encountered. To counter such problems,
another array can be created which holds line lengths. Terminating
lines won't be necessary. Case insensitive memory compares can be used
instead of strcasestr. Only if the needle length is equal or smaller
than the length of the haystack a compare will performed by calling a
function.

Gerald
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
474,125
Messages
2,570,748
Members
47,302
Latest member
MitziWragg

Latest Threads

Top