using realloc for a dynamically growing array

P

pereges

I'm trying to write a program that will create a dynamically growing
array. There is a parent array and from this I want to create a
seperate array with elements that are less than <= x (say some random
integer). Here's my program which compiles and exectures
successfully :

#include <stdio.h>
#include <stdlib.h>

int main(void)
{

int *parent;
int *child = NULL, *tempstore;
int child_size, i, parent_size;

printf("Enter size of parent array\n");
if(scanf("%d", &parent_size) != 1)
return (1);

parent = malloc(sizeof(int) * parent_size);
if(parent == NULL)
{
fprintf(stderr, "Memory allocation failed\n");
return (1);
}

for(i = 0; i < parent_size; i++)
{
printf("Enter element %d:", i);
if(scanf("%d", &parent) != 1)
return (1);
printf("\n");
}

child_size = 0;

for(i = 0; i < parent_size; i++)
{
if(parent <= 166) /* comparing against a random number 166 in
this eg. */
{
tempstore = realloc(child, (++child_size)*sizeof(int));
if(tempstore == NULL)
{
fprintf(stderr, "Memory reallocation failed\n");
return (1);
}
tempstore[child_size - 1] = parent;
child = tempstore;
}
}

for(i = 0; i < child_size; i++)
printf("%d\n", child);

return (0);
}

Another alternative is to walk the entire array and count the number
of child entries. Then allocate memory for the child array using the
number of child entries, again run through parent array , check for
the condition and put the element (which satisfies the condition) in
the child array. This requires two passes to be made on an array. What
I posted is just a sample code. In my actual project, there can be as
many as 10,000,000 entries in the parent array. Do you think going
through 10,000,000 entries twice is a better idea than using realloc ?
 
B

Bartc

pereges said:
I'm trying to write a program that will create a dynamically growing
array. There is a parent array and from this I want to create a
seperate array with elements that are less than <= x (say some random
integer). Here's my program which compiles and exectures
successfully :
for(i = 0; i < parent_size; i++)
{
printf("Enter element %d:", i);
if(scanf("%d", &parent) != 1)


This might take a while for 10million entries...
Another alternative is to walk the entire array and count the number
of child entries. Then allocate memory for the child array using the
number of child entries, again run through parent array , check for
the condition and put the element (which satisfies the condition) in
the child array. This requires two passes to be made on an array. What
I posted is just a sample code. In my actual project, there can be as
many as 10,000,000 entries in the parent array. Do you think going
through 10,000,000 entries twice is a better idea than using realloc ?

So, you have one parent array, and just the one child array?

In this case I don't think there's any question that a one-time scan of the
10-million element parent (taking milliseconds) will be faster than
potentially calling realloc millions of times.
 
P

pereges

So, you have one parent array, and just the one child array?

In this case I don't think there's any question that a one-time scan of the
10-million element parent (taking milliseconds) will be faster than
potentially calling realloc millions of times.


Well actually the array is to be split into two child arrays. The
array has to be split at the median value such that the left array
contains all elements <= median and right array contains elements >
median. For eg.

1 2 3 3 4 5 6

median is 3

the left array is

1 2 3 3

right array is

4 5 6

Because the parent array of size n is split at the median , it is
bound to happen that both child arrays size is >= n/2. So we can
malloc child arrays of size n/2 and from there on keep calling realloc
till all the elements are placed properly. The other much faster
option is double the allocated size each time the array reaches its
limit and more data is to be added. Ofcourse, the backdrop of this
method is that a lot of space will be wasted. I am not sure if you can
use realloc() to reduce the array to the actual size at the end. Most
sources that I've read say that its not possible in all
implementations.
 
P

pereges

So, you have one parent array, and just the one child array?
In this case I don't think there's any question that a one-time scan of the
10-million element parent (taking milliseconds) will be faster than
potentially calling realloc millions of times.

Well actually the array is to be split into two child arrays. The
child arrays will also be split. Sort of like binary tree .The array
has to be split at the median value such that the left array
contains all elements <= median and right array contains elements >
median. For eg.

1 2 3 3 4 5 6

median is 3

the left array is

1 2 3 3

right array is

4 5 6

Because the parent array of size n is split at the median , it is
bound to happen that both child arrays size is >= n/2. So we can
malloc child arrays of size n/2 and from there on keep calling realloc
till all the elements are placed properly. The other much faster
option is double the allocated size each time the array reaches its
limit and more data is to be added. Ofcourse, the backdrop of this
method is that a lot of space will be wasted. I am not sure if you can
use realloc() to reduce the array to the actual size at the end. Most
sources that I've read say that its not possible in all
implementations.
 
P

pereges

there are two scans over the parent array in the method i was talking
about in first post.

int left_count = 0, right_count = 0;

/* This is first pass */

for(i = 0; i <= parent_size; i++)
{
if(parent <= median;
left_count ++;
else
right_count ++;
}

/* Allocate memory for both children */

int * c0 = malloc(sizeof(int) * left_count);
int * c1 = malloc(sizeof(int) * right_count);

left_count = right_count = 0;

/* This is second pass */

for(i = 0; i <= parent_size; i++)
{
if(parent <= median);
c0[left_count ++] = parent;
else
c1[right_count ++ = parent;
}
 
P

pereges

there are two scans over the parent array in the method i was talking
about in first post. just for eg :

int left_count = 0, right_count = 0;

/* This is first pass */

for(i = 0; i < parent_size; i++)
{
if(parent <= median;
left_count ++;
else
right_count ++;

}

/* Allocate memory for both children */

int * c0 = malloc(sizeof(int) * left_count);
int * c1 = malloc(sizeof(int) * right_count);

left_count = right_count = 0;

/* This is second pass */

for(i = 0; i < parent_size; i++)
{
if(parent <= median);
c0[left_count ++] = parent;
else
c1[right_count ++] = parent;
}
 
B

Bartc

pereges said:
Well actually the array is to be split into two child arrays. The
child arrays will also be split. Sort of like binary tree .The array
has to be split at the median value such that the left array
contains all elements <= median and right array contains elements >
median. For eg.

1 2 3 3 4 5 6

median is 3

the left array is

1 2 3 3

right array is

4 5 6

Because the parent array of size n is split at the median , it is
bound to happen that both child arrays size is >= n/2. So we can
malloc child arrays of size n/2 and from there on keep calling realloc

Why not make both child arrays size n? Assuming you have memory for that.
You can then realloc them downwards. (If realloc does not do this, just copy
into a new array of the right size.)

But you are now saying the child arrays will be split too. Is this repeated
until we get to just one element? What are you trying to do here exactly,
it might be there's already some algorithm for it.

The splitting sounds familiar from a month or two back on c.l.c. In that
case it came down to a form of sorting.
 
B

Ben Bacarisse

pereges said:
there are two scans over the parent array in the method i was talking
about in first post.

int left_count = 0, right_count = 0;

/* This is first pass */

for(i = 0; i <= parent_size; i++)
{
if(parent <= median;
left_count ++;
else
right_count ++;
}


It won't make much difference, but you can calculate one of these
counts from the total and the other one. No need to ++ one or the
other.

Given that, it *may* be faster to write:

left_count += parent <= median;

rather than having an 'if' at all. If you have a median algorithm,
such things will never make a big difference but it can't hurt (if you
measure first).
 
B

Ben Bacarisse

pereges said:
Well actually the array is to be split into two child arrays. The
child arrays will also be split. Sort of like binary tree .The array
has to be split at the median value such that the left array
contains all elements <= median and right array contains elements >
median.
Because the parent array of size n is split at the median , it is
bound to happen that both child arrays size is >= n/2.

It is also bound to happen that the sum of the sizes == n so...
So we can
malloc child arrays of size n/2 and from there on keep calling realloc
till all the elements are placed properly.

No need unless you can show that the two arrays must be separately
free-able. Unless you do need do free them at different times, the
most efficient malloc is just one of size n. Put the <= elements in
the initial elements and the > ones after. A pointer to the start and
one into the later part then act like your to separate arrays -- the
only difference is that only one of these pointers is free-able and
doing so frees both arrays.

In fact, depending on the choice of median algorithm you may already
have this split (C++'s nth_element algorithm has this behaviour for
example). Do you need the original array order to be preserved? Do
you need to have separately free-able split arrays?
 
P

pereges

Why not make both child arrays size n? Assuming you have memory for that.
You can then realloc them downwards. (If realloc does not do this, just copy
into a new array of the right size.)

Yes, I could do that. Another solution could be create a link list and
keep a count for the elements and after that I can copy the data to an
array and free teh link list node by node as each node gets copied.
 
P

pereges

It won't make much difference, but you can calculate one of these
counts from the total and the other one. No need to ++ one or the
other.
Given that, it *may* be faster to write:

left_count += parent <= median;

rather than having an 'if' at all. If you have a median algorithm,
such things will never make a big difference but it can't hurt (if you
measure first).



Thanks, I did forget about that part.
 
P

pereges

No need unless you can show that the two arrays must be separately
free-able. Unless you do need do free them at different times, the
most efficient malloc is just one of size n. Put the <= elements in
the initial elements and the > ones after. A pointer to the start and
one into the later part then act like your to separate arrays -- the
only difference is that only one of these pointers is free-able and
doing so frees both arrays.
In fact, depending on the choice of median algorithm you may already
have this split (C++'s nth_element algorithm has this behaviour for
example). Do you need the original array order to be preserved? Do
you need to have separately free-able split arrays?


I do need seperate arrays and yes they can be seperately freed. Its
not a necessity for the original array to be preserved. I maintain an
array of pointers and instead sort (based on what they point to
ofcourse) that using the quick sort algorithm.
 
P

pereges

I'm smelling that someone is doing a sort of a list of numbers.

Let me explain what I mean. Many years ago, someone posted to a usenet
group a question "How do I sort a list of numbers?"

Many different people replied to this request with various sorting
algorithms listing the various tradeoffs, advantages, disadvantages, etc.

However, one person did something very importaint. He asked "Why do you
want to sort a list of numbers?"

The original poster saw that question and responded "Because I want to
find the smallest number in the list. So after I sort the list, I can
just get the first number from the sorted list."

Now it's human nature that when they have a problem, they'll break down
the solution into simple steps with the final step resulting in the
final solution they desire. And it's also human nature that if they
have any trouble with a specific step in their solution for them
to ask for help on that specific step and not to bother giving the big
picture on the problem they're really trying to solve.

Right now given the content of this thread, I have a very strong feeling
that this request is one step along a path to solve a larger problem. And
that you may have misstepped somewhere along the way.

So what is the actual problem you're attempting to solve?

ok i will post some code and explanation.
 
P

pereges

I'm trying to build a kdtree for a 3d object. An object contains
vertices(3d vectors) and triangles (triangular mesh structure). The
idea behind using a kdtree is to split the bounding box(a minimum
volume box which encompasses the entire object) at the root node (i.e.
the one which contains the object itself) recursively into smaller
boxes until some stopping condition is satisfied ( I'm using a
combination of depth and maximum permissible triangles inside a box).
The box at the root node will be split along the x axis, the two
resulting children along y axis and the grandchildren along z axis.
This is repeated in a cyclical fashion. I'm splitting the box at the
point which resembles the median point of all vertices inside a given
box. The triangles are split by comparing the three vertex coordinates
against the split point and they can be put in either left child ,
right child or both children(in case when the triangle intersects
splitting plane). I can also use some other criterion like mid point
but median is supposed to give a better split. I want to use this kd
tree for accelerating my ray tracing procedure. With a kdtree, all you
have to do is check against intersection between a ray and the
boxes(which is extremely fast) until you reach a leaf node (which
contains a minimum number of triangles). Now all you need to do is
test for intersection against a minimal number of triangles which is a
lot faster than checking for intersection of every ray with every
triangle in the mesh. Ultimately, almost every triangle can be traced
to a leaf node (which will not be further split btw).

Here's the data structure that I have written for kd tree node


typedef struct kdnode_s
{
/* bounding box */
bbox box;

/* 0 - split along x axis , 1 - split along y axis, 2 - split along
z axis */
int splitplane;
union
{
/* Used only for non-leaf nodes i.e. split plane != -1 */
struct
{
struct kdnode_s *child[2]; /* pointers to two children */
double split; /* splitting coordinate */

}nonleaf;

/* Used for leaf nodes i.e. split plane = -1 */
struct
{
int ntriangles; /* number of triangles in node */
triangle **triptrlist; /* pointer to an array of pointers to
triangles */

}leaf;

}u;

}kdnode;

Note that I do not store the vertex list in every kd tree node, I
don't think its necessary because
ultimately we are only using the vertex list for finding split point,
its the triangles that we are interested in.

Now the function(smaller version) which partitions the kd tree looks
as follows:

/
*******************************************************************************************************************************/
/* The arguments to this subidivide_kdtree
function:
*/
/* kd - pointer to kd tree node. Initially pointer of root node is
passed. */
/* m - pointer to the
mesh
*/
/* depth - depth of subidivision (0 when calling for first
time)
*/
/* maxdepth - maximum depth permissible (passed as argument when
calling for first time) */
/* maxtriangles - maximum number of triangles permissible
*/
/* pvertptrlist - parent vertex list */
/* ptriptrlist - parent triangle ptr
list
*/
/* pnvert - number of vertices in parent
node
*/
/* pntri - number of triangles in parent
node
*/
/
************************************************************************************************************************/

int subdivide_kdtree(kdnode *kd, mesh *m, int depth, int maxdepth, int
maxtriangles,
vector **pvertptrlist, triangle *ptriptrlist, int
pnvert, int pntri)
{
/* cntri - number of triangles in child, cnvert - number of
triangles in child */
int i, j, k, cntri, cnvert;
triangle **ctriptrlist; /* pointer to child triangle pointer array
*/
vector **cvertptrlist; /* pointer to child vertices pointer array */
int flag = 0; /* error flag */

kd->splitplane = -1; /* initialize current node as the leaf node */
kd->u.leaf.ntriangles = pntri;
kd->u.leaf.triptrlist = ptriptrlist;

/* If this condition is true, then its a non leaf node which will be
divided */
if(pntri > maxtriangles && (depth < maxdepth))
{
kd->splitplane = depth % 3;

/* sort the array of pointers */
quicksort(pvertptrlist, 0, pnvert - 1, kd->splitplane);

/* find the split point for parent node */
kd->u.nonleaf.split = find_split_point(pvertptrlist, pnvert - 1,
kd->splitplane);

/* process both children nodes */
for(i = 0; i < 2; i++)
{
/* Allocate memory and perform other initializations for both
children */


/* Now partition the parent vertex and triangle list */

cnvert = 0;
for(j = 0; j < pnvert; j++)
{
/* Count number of vertices inside current child box being
processed */

if(vertex_inside_box(pvertptrlist[j]->coord[kd->splitplane],
kd->u.nonleaf.split, i))
cnvert += 1;
}

/* Allocate memory for child vertex pointer list */

cvertptrlist = malloc(sizeof(vector *) * cnvert);

/* Put the parent vertices into child node */

cnvert = 0;
for(j = 0; j < pnvert; j++)
{
if(vertex_inside_box(pvertlist[j]->coord[kd->splitplane], kd-
u.nonleaf.split, i))
cvertlist[cnvert++] = pvertlist[j];
}

/* Do similar steps for partitioning the triangle */

flag = subdivide_kdtree(kd->u.nonleaf.child, m, depth + 1,
maxdepth,
maxtriangles, cvertptrlist,ctriptrlist,
cnvert, cntri);
}
}
return (flag);
}
 
E

ediebur

When I was a working programmer, I used to poke around in the source
for GNU emacs and that's how Stallman managed his text buffers ( if I
remember correctly)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,233
Members
46,820
Latest member
GilbertoA5

Latest Threads

Top