Proper way to input a dynamically-allocated string

Michel Rouzic

I know it must sound like a newbie question, but I never really had to
bother with that before, and I didn't even find an answer in the c.l.c
FAQ.

I'd like to know the really proper way to input a string into an
array of char that's dynamically allocated. I mean, I don't wish to see
any such thing as char mystring[100]; I don't want to see any number
(if possible). I just want to declare something like char *mystring; and
then I don't know how to allocate it with just as many chars (with the
space for the \0 of course) as you get from stdin.

I'd really like to know once and for all the smartest way of
inputting strings from stdin and storing them so that they take just
the needed space, and I don't want to see any number such as 100 or
10,000 or even 4,294,967,296 in my code. Is there any way it can be done?
 
Dag-Erling Smørgrav

Michel Rouzic said:
I'd like to know the really proper way to input a string into an
array of char that's dynamically allocated. I mean, I don't wish to see
any such thing as char mystring[100]; I don't want to see any number
(if possible). I just want to declare something like char *mystring; and
then I don't know how to allocate it with just as many chars (with the
space for the \0 of course) as you get from stdin.

One common idiom is the following:

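#include <stdio.h>
#include <stdlib.h>

/* realloc() wrapper that frees the original block on failure. */
void *
frealloc(void *ptr, size_t sz)
{
    void *tmp;

    if ((tmp = realloc(ptr, sz)) != NULL)
        return (tmp);
    free(ptr);
    return (NULL);
}

/*
 * Read one line from f, without the trailing newline.  Returns NULL at
 * end of file or if memory runs out.  Note that POSIX also declares a
 * getline() in <stdio.h>, so this one may need a different name there.
 */
char *
getline(FILE *f)
{
    char *str = NULL;
    size_t sz = 0;
    int ch;

    for (size_t len = 0; ; ++len) {
        ch = fgetc(f);
        if (ch == EOF && !len)
            return (NULL);
        if (len == sz) {
            /* grow geometrically: 1, 3, 7, 15, ... */
            str = frealloc(str, sz = sz * 2 + 1);
            if (str == NULL)
                return (NULL);
        }
        if (ch == EOF || ch == '\n') {
            str[len] = '\0';
            return (str);
        } else {
            str[len] = ch;
        }
    }
}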

However, on average, about 25% of the allocated memory will be wasted.
You can fix that by replacing

return (str);
with
return (frealloc(str, len + 1));

but you may still end up losing quite a bit to heap fragmentation,
depending on how good your system's malloc() implementation is.

DES
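One variation on the trimming step, as a minimal sketch: when realloc() is only asked to shrink a block and fails, the original block is still valid, so it can be kept instead of freed. The name trim_to_fit is made up for this illustration and is not part of the code above.

```c
#include <stdlib.h>

/* Hypothetical helper: shrink buf so that it holds exactly len characters
 * plus the terminating '\0'. If realloc() fails, the original block is
 * still valid, so it is returned unchanged rather than freed. */
char *trim_to_fit(char *buf, size_t len)
{
    char *tmp = realloc(buf, len + 1);
    return (tmp != NULL) ? tmp : buf;
}
```

With this, the caller never loses a line it has already read just because the final trim could not be done.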
 
Eric Sosman

Let's suppose you're reading complete '\n'-terminated lines,
the way fgets() does but with no explicit length limit. You
could do something like this (pseudocode, no error checking):

    buffer = <empty>
    do {
        expand buffer with realloc()
        append next input character
    } while (character wasn't '\n');
    expand buffer with realloc()
    append '\0'

For efficiency's sake you'd probably want to avoid quite
so many trips in and out of the memory allocator, so a refinement
would be to start with a roomier buffer and expand by more than
one character at a time if necessary. (My own function for doing
this -- everybody writes one eventually -- begins with 100 characters
and adds half the buffer's current size each time it needs to expand:
100, 150, 225, ...)

Once you've read the entire line you can, if you like, realloc()
the buffer one final time to trim it to the exact size. I find
that's seldom worth the bother: your program is probably going to
process the line and free() or re-use the buffer pretty soon.
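The pseudocode plus the "start roomy, add half each time" refinement could look roughly like the sketch below. It is only an illustration under stated assumptions: the name read_line is invented here, the newline is kept the way fgets() keeps it, and a read error is treated the same as end-of-file.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from f into a malloc'd buffer.
 * Returns NULL at end-of-file with nothing read, or on allocation failure.
 * The trailing '\n' is kept, as fgets() does. The caller frees the result. */
char *read_line(FILE *f)
{
    size_t cap = 0, len = 0;
    char *buf = NULL;
    int ch;

    do {
        ch = fgetc(f);
        if (ch == EOF && len == 0) {
            free(buf);              /* free(NULL) is harmless */
            return NULL;
        }
        /* Need room for this character (if any) plus the final '\0'. */
        if (len + 2 > cap) {
            /* 100, 150, 225, ... as described in the post above */
            size_t newcap = (cap == 0) ? 100 : cap + cap / 2;
            char *tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        if (ch != EOF)
            buf[len++] = (char)ch;
    } while (ch != EOF && ch != '\n');

    buf[len] = '\0';
    return buf;
}
```

A caller would typically loop with `while ((line = read_line(stdin)) != NULL)`, process the line, and free() it before the next iteration.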
 
slebetman

From stdin is a bit of a problem. The usual answer is to use a buffer
to temporarily store the string:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define BUFFER_SIZE 100

int main(void) {
    char buffer[BUFFER_SIZE];
    char *inputString = NULL;
    char *tmp;
    size_t inLen = 0;

    while (fgets(buffer, BUFFER_SIZE, stdin) != NULL) {
        /* grow by the size of this chunk, plus one for the '\0';
           realloc(NULL, n) behaves like malloc(n) */
        tmp = realloc(inputString, inLen + strlen(buffer) + 1);
        if (tmp == NULL) {
            /* realloc failed; the old block is still ours to free */
            free(inputString);
            exit(EXIT_FAILURE);
        }
        inputString = tmp;
        /* append the chunk at the current end of the string */
        strcpy(inputString + inLen, buffer);
        inLen += strlen(buffer);
        /* check for newline */
        if (inputString[inLen-1] == '\n') {

            /* process input here */

            /* then remember to free inputString */
            free(inputString);
            inputString = NULL;
            inLen = 0;
        }
    }
    /* a final line without a newline is still held here */
    free(inputString);
    return 0;
}
 
Michel Rouzic

Eric said:
Let's suppose you're reading complete '\n'-terminated lines,
the way fgets() does but with no explicit length limit. You
could do something like this (pseudocode, no error checking):

    buffer = <empty>
    do {
        expand buffer with realloc()
        append next input character
    } while (character wasn't '\n');
    expand buffer with realloc()
    append '\0'

For efficiency's sake you'd probably want to avoid quite
so many trips in and out of the memory allocator, so a refinement
would be to start with a roomier buffer and expand by more than
one character at a time if necessary. (My own function for doing
this -- everybody writes one eventually -- begins with 100 characters
and adds half the buffer's current size each time it needs to expand:
100, 150, 225, ...)

Once you've read the entire line you can, if you like, realloc()
the buffer one final time to trim it to the exact size. I find
that's seldom worth the bother: your program is probably going to
process the line and free() or re-use the buffer pretty soon.

That's the method I like the best. I wouldn't bother with making larger
buffers anyway; it would make things too complicated, and things don't
have to be so efficient when it comes to inputting strings. What
matters the most is the result. I think I made your idea work pretty
well. Tell me if you think you spotted anything wrong with it. I GDBed
it and the content of the memory looked fine.

int i;
char *mystring;

i=0;
mystring=NULL;
do
{
    mystring=realloc(mystring, i+1);
    mystring[i]=getchar();
    i++;
}
while (mystring[i-1]!='\n');
mystring[i-1]='\0';
 
Keith Thompson

Michel Rouzic said:
That's the method I like the best. I wouldn't bother with making larger
buffers anyway; it would make things too complicated, and things don't
have to be so efficient when it comes to inputting strings. What
matters the most is the result. I think I made your idea work pretty
well. Tell me if you think you spotted anything wrong with it. I GDBed
it and the content of the memory looked fine.

int i;
char *mystring;

i=0;
mystring=NULL;
do
{
    mystring=realloc(mystring, i+1);
    mystring[i]=getchar();
    i++;
}
while (mystring[i-1]!='\n');
mystring[i-1]='\0';


The speed of that code while reading input probably isn't going to be
much of an issue, but the multiple calls to realloc() might cause
excessive heap fragmentation, which could cause problems elsewhere in
your program. (The standard doesn't use the term "heap", but you get the
idea.)
 
Eric Sosman

Michel said:
That's the method I like the best. I wouldn't bother with making larger
buffers anyway; it would make things too complicated, and things don't
have to be so efficient when it comes to inputting strings. What
matters the most is the result. I think I made your idea work pretty
well. Tell me if you think you spotted anything wrong with it. I GDBed
it and the content of the memory looked fine.

int i;
char *mystring;

i=0;
mystring=NULL;
do
{
    mystring=realloc(mystring, i+1);
    mystring[i]=getchar();
    i++;
}
while (mystring[i-1]!='\n');
mystring[i-1]='\0';


That's the general idea. As written, though, it's not
very robust: it's oblivious to realloc() failures and to
end-of-file or errors on the standard input. Pay attention
to the Sixth Commandment at

http://www.lysator.liu.se/c/ten-commandments.html

Other observations: `int' should probably be `size_t',
and see Keith Thompson's response for one of the reasons a
character-at-a-time expansion may not work well. ("Others
will occur to your thought." -- Gandalf)
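For comparison, here is a sketch of the same character-at-a-time loop with the checks filled in. It is one possible shape rather than the only one, and it rests on a few assumptions: the function name read_line_simple is invented, the newline is dropped as in the code above, and a read error is treated like end-of-file.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical rewrite of the loop above: one realloc() per character,
 * but with the result of every call checked. Returns NULL on allocation
 * failure, or when end-of-file (or a read error) arrives before any
 * character. The newline is not stored. The caller frees the result. */
char *read_line_simple(FILE *in)
{
    char *s = NULL;
    size_t i = 0;
    int c;

    for (;;) {
        char *tmp = realloc(s, i + 1);   /* room for one more char */
        if (tmp == NULL) {
            free(s);                     /* old block is still valid */
            return NULL;
        }
        s = tmp;

        c = fgetc(in);                   /* int, so EOF is distinguishable */
        if (c == EOF && i == 0) {
            free(s);
            return NULL;
        }
        if (c == EOF || c == '\n')
            break;
        s[i++] = (char)c;
    }
    s[i] = '\0';
    return s;
}
```

The result of getchar() or fgetc() is kept in an int until it has been compared with EOF; storing it straight into a char would make EOF impossible to tell apart from a real character.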
 
Michel Rouzic

Keith said:
The speed of that code while reading input probably isn't going to be
much of an issue, but the multiple calls to realloc() might cause
excessive heap fragmentation, which could cause problems elsewhere in
your program. (The standard doesn't use the term "heap", but you get the
idea.)


Does it mean that the elements of my array won't be contiguous?
 
Michel Rouzic

Eric said:
That's the general idea. As written, though, it's not
very robust: it's oblivious to realloc() failures and to
end-of-file or errors on the standard input. Pay attention
to the Sixth Commandment



"If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the
checks triple the size of thy code..."

OK, I guess I should do that, but... well... as it says, it makes the
code much longer and harder to read, and I wouldn't know
what to do with an error anyway. I mean, if the return code ain't
right, I might display something like "the return code ain't right\n"
and not know what else I should do.

realloc() failures? Never heard of that (maybe because I'm quite new to all
that). What can happen with it? How can you get an EOF or some error
from stdin?

You know, I care to know why I should check for errors, and
what can cause them and what I should do about it, but so far (and you
can understand that) I like to keep my code as simple as possible,
mostly because I consider that my program should work only as long as it is
used correctly (like, if a program is supposed to have a .wav file as
input, and the user puts in an .mp3 or anything else instead, I don't want to
bother with doing stuff that will tell him "you need to input a .wav
file"; I'd rather let him have a segmentation fault).
 
Michel Rouzic

Eric said:
Other observations: `int' should probably be `size_t'

As for the size_t thing, well, I could cast it for realloc, like this:
mystring=realloc(mystring, (size_t) i+1); other than that, I think I
should leave it as int, unless it is OK to do iterations and refer to
some element of an array by a size_t, which I doubt.
 
Jordan Abel

Eric said:
That's the general idea. As written, though, it's not
very robust: it's oblivious to realloc() failures and to
end-of-file or errors on the standard input. Pay attention
to the Sixth Commandment



"If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the
checks triple the size of thy code..."


unless you _really_ don't care whether it succeeds or fails

what are you going to do if a printf fails? what if you just want to
continue?

Michel said:
realloc() failures? Never heard of that (maybe because I'm quite new to all
that). What can happen with it?

it returns a null pointer because it didn't find enough memory

Michel said:
How can you get an EOF or some error from stdin?

someone types the EOF character, or interrupts [sending a signal on read
can possibly cause a read to fail in addition to calling a signal
handler]

Michel said:
You know, I care to know why I should check for errors, and
what can cause them and what I should do about it, but so far (and you
can understand that) I like to keep my code as simple as possible,
mostly because I consider that my program should work only as long as it is
used correctly (like, if a program is supposed to have a .wav file as
input, and the user puts in an .mp3 or anything else instead, I don't want to
bother with doing stuff that will tell him "you need to input a .wav
file"; I'd rather let him have a segmentation fault).

if you do something that can cause a segmentation fault, it could very
well cause worse.
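Once fgetc() has returned EOF, the standard feof() and ferror() calls tell the two cases apart. A small sketch, with a made-up helper name (drain) used only for this illustration:

```c
#include <stdio.h>

/* Hypothetical helper: reads and discards everything left on f.
 * Returns 0 if the stream simply ended, -1 if a read error occurred.
 * *count receives the number of characters consumed. */
int drain(FILE *f, size_t *count)
{
    int c;

    *count = 0;
    while ((c = fgetc(f)) != EOF)
        (*count)++;

    /* EOF from fgetc() means either end-of-file or an error;
     * ferror() reports which one it was. */
    return ferror(f) ? -1 : 0;
}
```

On an interactive terminal, end-of-file is usually signalled with a key combination such as Ctrl-D or Ctrl-Z, depending on the system.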
 
Flash Gordon

Michel said:
As for the size_t thing, well, I could cast it for realloc, like this:
mystring=realloc(mystring, (size_t) i+1);

Why on earth would you think that? You *really* need to start working
through a decent textbook.

Michel said:
other than that, I think I
should leave it as int, unless it is OK to do iterations and refer to
some element of an array by a size_t, which I doubt

Again, what on earth makes you think that? Of course you can use a
size_t variable for indexing into an array.
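To make that concrete, a short sketch of a loop that iterates and indexes with a size_t. The function count_char is invented for the example.

```c
#include <stddef.h>

/* Counts occurrences of ch in the first n bytes of s,
 * using a size_t both as loop counter and as array index. */
size_t count_char(const char *s, size_t n, char ch)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (s[i] == ch)
            count++;
    }
    return count;
}
```

The one thing to watch with an unsigned index is counting downwards: a test like `i >= 0` is always true for a size_t, so such loops need a different exit condition.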
 
Flash Gordon

Michel said:
Does it mean that the elements of my array won't be contiguous?

No, it means the free space in your heap will be fragmented. Any memory
block returned by *alloc is always contiguous.
 
Flash Gordon

Michel said:
"If a function be advertised to return an error code in the event of
difficulties, thou shalt check for that code, yea, even though the
checks triple the size of thy code..."

OK, I guess I should do that, but... well... as it says, it makes the
code much longer and harder to read, and I wouldn't know
what to do with an error anyway. I mean, if the return code ain't
right, I might display something like "the return code ain't right\n"
and not know what else I should do.

If nothing else you can terminate the program with an error message
saying that it has failed.

Michel said:
realloc() failures? Never heard of that (maybe because I'm quite new to all
that). What can happen with it?

There might not be a large enough block of free memory because you have
fragmented it. Or you might just have run out of memory, or hit a limit
enforced by the OS. No resource is *ever* infinite.

Michel said:
How can you get an EOF or some error from stdin?

On most systems there is a way for the user to signal EOF on stdin. Or a
file might be being piped in to stdin.

Michel said:
You know, I care to know why I should check for errors, and
what can cause them and what I should do about it, but so far (and you
can understand that) I like to keep my code as simple as possible,
mostly because I consider that my program should work only as long as it is
used correctly (like, if a program is supposed to have a .wav file as
input, and the user puts in an .mp3 or anything else instead, I don't want to
bother with doing stuff that will tell him "you need to input a .wav
file"; I'd rather let him have a segmentation fault).

It might not cause a segmentation violation. It might overwrite critical
data instead.
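The "terminate with an error message" approach is often packaged as a small wrapper, so the check is written once and the calling code stays short. A sketch, assuming that exiting is acceptable for the program in question; the name xrealloc is a common convention but is not a standard function.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: like realloc(), but never returns NULL for a
 * nonzero size. On failure it reports the problem and ends the program. */
void *xrealloc(void *ptr, size_t size)
{
    void *tmp = realloc(ptr, size);

    if (tmp == NULL && size != 0) {
        fprintf(stderr, "out of memory (requested %lu bytes)\n",
                (unsigned long)size);
        exit(EXIT_FAILURE);
    }
    return tmp;
}
```

With such a wrapper, a line like `mystring = xrealloc(mystring, i + 1);` needs no further check at the call site.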
 
Jordan Abel

The way it was done in K&P's "The Practice of Programming", where they started
with a one-char-sized buffer and doubled the size every time there was
no space left. It's a good "golden middle" type idea, I think.

doubling it each time actually reduces the worst-case complexity [of the
entire operation] from O(n^2) to O(n). I'm not sure what complexity you
end up with adding less than the full previous size. probably something
like nlogn or n^1.5
 
Eric Sosman

Jordan said:
The way it was done in K&P's "The Practice of Programming", where they started
with a one-char-sized buffer and doubled the size every time there was
no space left. It's a good "golden middle" type idea, I think.


doubling it each time actually reduces the worst-case complexity [of the
entire operation] from O(n^2) to O(n). I'm not sure what complexity you
end up with adding less than the full previous size. probably something
like nlogn or n^1.5

Still O(n), just with a different multiplier. Assume you
start with a buffer of B characters and grow it by a factor
r > 1 whenever it fills up. Then (ignoring the rounding off
to integer sizes), you get successive buffers of size B, B*r,
B*r^2, ... until after k expansions you eventually get to
B*r^k >= n.

You've copied (potentially) the contents of all the
smaller buffers, hence you may have copied as many as

B + B*r + B*r^2 + ... + B*r^(k-1)
= B * (r^k - 1) / (r - 1)

characters. Noting that k is log_base_r(n/B) + x, 0 <= x < 1,
the total characters copied come to:

B * (r^(log_base_r(n/B) + x) - 1) / (r - 1)
= B * (n/B * r^x - 1) / (r - 1)
= (n * r^x - B) / (r - 1)
= O(n)

"Next time, we're gonna do ... FRACTIONS!" -- Tom Lehrer
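The bound can also be checked by simply counting. The sketch below uses invented function names and the same worst-case assumption as the derivation, namely that every expansion copies the whole old buffer.

```c
/* Worst-case number of characters copied while growing a buffer from
 * `start` characters until it holds at least n, assuming every
 * expansion copies the whole old buffer. Each step multiplies the size
 * by num/den (for example 3/2 or 2/1); num > den is required. */
unsigned long copies_geometric(unsigned long start, unsigned long n,
                               unsigned long num, unsigned long den)
{
    unsigned long size = start, copied = 0;

    while (size < n) {
        unsigned long next = size * num / den;
        if (next <= size)           /* guard against rounding to no growth */
            next = size + 1;
        copied += size;
        size = next;
    }
    return copied;
}

/* Same count when the buffer grows by one character at a time:
 * 1 + 2 + ... + (n - 1). */
unsigned long copies_by_one(unsigned long n)
{
    return n == 0 ? 0 : n * (n - 1) / 2;
}
```

For n = 1,000,000, doubling from a one-character buffer copies 2^20 - 1 = 1,048,575 characters in total, which is linear in n, while growing one character at a time copies on the order of n^2 / 2.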
 
grayhag

Eric said:
For efficiency's sake you'd probably want to avoid quite
so many trips in and out of the memory allocator, so a refinement
would be to start with a roomier buffer and expand by more than
one character at a time if necessary. (My own function for doing
this -- everybody writes one eventually -- begins with 100 characters
and adds half the buffer's current size each time it needs to expand:
100, 150, 225, ...)

The way it was done in K&P's "The Practice of Programming",
where they started with a one-char-sized buffer
and doubled the size every time there was no space left.
It's a good "golden middle" type idea, I think.
 
Inso Haggath

Jordan said:
The way it was done in K&P's "The Practice of Programming", where they started
with a one-char-sized buffer and doubled the size every time there was
no space left. It's a good "golden middle" type idea, I think.

doubling it each time actually reduces the worst-case complexity [of the
entire operation] from O(n^2) to O(n). I'm not sure what complexity you
end up with adding less than the full previous size. probably something
like nlogn or n^1.5

Maybe add a diminishing multiplier for the additive part.
It'll get lower and lower by a certain percentage
with each new allocation.
 
Michel Rouzic

Flash said:
No, it means the free space in your heap will be fragmented. Any memory
block returned by *alloc is always contiguous.

Oh, that's what I thought. But what are the consequences? I mean, I'll
have some memory occupied, some free space, and then my string. So, as
for the free space, does it mean it could only be used for something
small enough to fit in it, or otherwise it will just be wasted space?
 
Michel Rouzic

Flash said:
If nothing else you can terminate the program with an error message
saying that it has failed.


There might not be a large enough block of free memory because you have
fragmented it. Or you might just have run out of memory, or hit a limit
enforced by the OS. No resource is *ever* infinite.


On most systems there is a way for the user to signal EOF on stdin. Or a
file might be being piped in to stdin.


It might not cause a segmentation violation. It might overwrite critical
data instead.

Um... I don't think you know what I'm referring to. The example I took is
the one of my program that reads .wav files to deal with them, without
checking that it actually is a .wav file. Basically, it will just look
at a precise place in the file for a 32-bit integer telling how many
bytes are to be read from the file. If you try to input a non-.wav file
instead, the 32-bit integer read will be bogus, and is likely to have a
value much higher than the number of bytes left in the file, so the
program will try to keep reading even after it has read the whole file,
thus causing a segmentation fault.

So basically, as I said, if the user wants to input an mp3 file instead
of a .wav, it's at his own risk. And if you want to input an EOF
character at some point, well, it's at your own risk too. Maybe one day
I'll bother with making some stuff to check that kind of foolishness,
but so far I've got higher-priority things to do than this kind of
stuff (like making sure my program does what it's supposed to do).
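For the situation described here, where a length field claims more bytes than the file holds, the check can be as small as comparing fread()'s return value with the number of bytes requested. A sketch only: the name read_exact is hypothetical, and nothing in it is specific to the .wav format.

```c
#include <stdio.h>

/* Hypothetical helper: tries to read exactly `want` bytes from f into
 * buf (which must have room for them). Returns 0 on success, -1 if the
 * file ended early or a read error occurred. */
int read_exact(FILE *f, void *buf, size_t want)
{
    size_t got = fread(buf, 1, want, f);

    return (got == want) ? 0 : -1;
}
```

A caller that gets -1 back knows the declared length was bogus or the file was truncated, and can print a message and stop instead of carrying on with a partly filled buffer.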
 
