Proper way to input a dynamically-allocated string

Michel Rouzic

Flash said:
Why on earth would you think that? You *really* need to start working
through a decent text book.


Again, what on earth makes you think that? Of course you can use a
size_t variable for indexing into an array.

ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?
 
pete

Michel said:
ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?

The size parameter of realloc has type size_t, so explicitly casting
an int argument to size_t changes nothing: with the prototype in scope,
the argument is converted to size_t anyway.

void *realloc(void *ptr, size_t size);

Do you have some resources available to learn about size_t?
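
To make the implicit conversion concrete, here is a small sketch. The wrapper name resize_with_int is made up for this example and is not part of any library; it only shows that an int argument reaches realloc() as a size_t, and why a negative int is the case to watch for.

```c
#include <stdlib.h>

/* Hypothetical wrapper, only to illustrate the int -> size_t conversion. */
void *resize_with_int(void *ptr, int nbytes)
{
    /* A negative int converts to a huge size_t (for example, -1 becomes
       SIZE_MAX), so reject it before it reaches realloc(). Zero is
       rejected too, because realloc(ptr, 0) is not portable. */
    if (nbytes <= 0)
        return NULL;

    /* With <stdlib.h> included, nbytes is converted to size_t here
       exactly as if it had been written (size_t)nbytes. */
    return realloc(ptr, nbytes);
}
```

Declaring the count as size_t in the first place would make the range check on negative values unnecessary.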
 
websnarf

Michel said:
I know it must sound like a newbie question, but I never really had to
bother with that before,

It's not. Even experienced programmers seem not to know the proper
answer to this question (hint: fgets() is hardly adequate.)
[....] and I didn't even find an answer in the c.l.c FAQ

Not much of a surprise there.

I'd like to know what's the really proper way to input a string into an
array of char that's dynamically allocated. I mean, I don't want to see
any such thing as char mystring[100]; I don't want to see any number
(if possible). I just want to declare something like char *mystring; and
then, I don't know how, allocate it with just as many chars (with the
space for the \0 of course) as you get from stdin.

You have to understand, this is a foreign concept to many if not most
of the readers of this newsgroup. Every string container must have a
size, and "the C way" is to declare that size up front. You can search
the archives of this newsgroup for endless examples of this. The C
library is almost completely useless on this issue as well.

I'd really like to know once and for all what's the smartest way of
inputting strings from stdin and storing them so that they take just
the needed space, and I don't want to see any number such as 100 or
10,000 or even 4,294,967,296 in my code. Any way it can be done?

You can read my solution to this problem here:

http://www.pobox.com/~qed/userInput.html

The key point is that the C standard library provides no function for
reading a line into a dynamically sized string: 1) gets() cannot be
told the size of its buffer, so it is a deterministic overflow, and
2) fgets() is inadequate on its own. So no matter what, for a really
correct and useful solution you have to roll your own algorithm (but it
is doable, as the link above demonstrates.)
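
For illustration, here is a minimal sketch of one way to roll your own line reader. It is not the code behind the link above, and it is not ggets; it simply reads characters with getc() and grows the buffer with realloc() as needed. The name read_line and the starting capacity of 16 are assumptions made for this sketch; the 16 is only a growth seed, not a limit on line length.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not from the thread: reads one line from fp into
   a malloc'd, NUL-terminated string without the trailing '\n'.
   Returns NULL on allocation failure, or at end of input when no
   characters were read. The caller must free() the result. */
char *read_line(FILE *fp)
{
    char *buf = NULL;
    char *tmp;
    size_t cap = 0;   /* bytes currently allocated */
    size_t len = 0;   /* characters stored so far */
    int ch;

    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {               /* keep room for the '\0' */
            size_t newcap = cap ? cap * 2 : 16;
            tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        buf[len++] = (char)ch;
    }

    if (ch == EOF && len == 0)
        return NULL;                        /* nothing read; buf is still NULL */

    tmp = realloc(buf, len + 1);            /* trim to the exact size */
    if (tmp != NULL)
        buf = tmp;
    else if (buf == NULL)
        return NULL;                        /* empty line and no memory */
    buf[len] = '\0';
    return buf;
}
```

A caller would typically write something like char *s = read_line(stdin); check s against NULL, use it, and then free(s). Doubling the capacity keeps the number of realloc() calls small, and the final realloc() trims the string to just the space it needs.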
 
Keith Thompson

Michel Rouzic said:
ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?

Why *should* you use an int?

The second argument of realloc() is of type size_t. You can use an
int if you like (it will be implicitly converted if you have a proper
"#include <stdlib.h>"), but there's no good reason to do so.
 
Maineline News

Michel Rouzic wrote:
I know it must sound like a newbie question, but I never really
had to bother with that before, and I didn't even find an answer
in the c.l.c FAQ

I'd like to know what's the really proper way to input a string
into an array of char that's dynamically allocated. I mean, I don't
want to see any such thing as char mystring[100]; I don't want
to see any number (if possible). I just want to declare something
like char *mystring; and then, I don't know how, allocate it with
just as many chars (with the space for the \0 of course) as you
get from stdin.

I'd really like to know once and for all what's the smartest way of
inputting strings from stdin and storing them so that they
take just the needed space, and I don't want to see any number
such as 100 or 10,000 or even 4,294,967,296 in my code. Any way
it can be done?

Reading from stdin is a bit of a problem. The usual answer is to use a
buffer to temporarily store the string:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

.... snip code ...

Or you can get my ggets routine at:
<http://cbfalconer.home.att.net/download/ggets.zip>
 
Richard Heathfield

(e-mail address removed) said:
It's not. Even experienced programmers seem not to know the proper
answer to this question (hint: fgets() is hardly adequate.)

In my experience, you're wrong; how to do this is common knowledge amongst
experienced programmers.

I'd like to know what's the really proper way to input a string into an
array of char that's dynamically allocated. I mean, I don't want to see
any such thing as char mystring[100]; I don't want to see any number
(if possible). I just want to declare something like char *mystring; and
then, I don't know how, allocate it with just as many chars (with the
space for the \0 of course) as you get from stdin.

You have to understand, this is a foreign concept to many if not most
of the readers of this newsgroup. Every string container must have a
size, and "the C way" is to declare that size up front. You can search
the archives of this newsgroup for endless examples of this.

Most people who ask questions here are newbies, which is why they're asking
questions; that's why we tend to give them simple answers. Nevertheless,
the "how do I get an entire line of input" thing has been asked and
satisfactorily answered many times here.

The C library is almost completely useless on this issue as well.

That's like saying the toolkit you get with a new bicycle is useless. Well,
yes, it's not brilliant - but it's probably enough to get you up and
rolling on Christmas Day. Serious users will want better in due course, and
quite a few solutions to this problem have been presented in this newsgroup
in the past.
 
Mark McIntyre

It's not. Even experienced programmers seem not to know the proper
answer to this question

Don't be silly.

I'd like to know what's the really proper way to input a string into an
array of char that's dynamically allocated. I mean, I don't want to see
any such thing as char mystring[100];

You have to understand, this is a foreign concept to many if not most
of the readers of this newsgroup.

Absolute rubbish. Of course, I do realise you're trolling. But you
really are a chump.
 
slebetman

Michel said:
ok cool, so why shouldn't I use an int for the size in a realloc,

Apart from the obvious fact that realloc is defined to accept size_t
instead of int, you should consider size_t to be self-documenting. That
is, it is immediately obvious what the variable means:

int len;

Oh it's a number of how long something is.

size_t len;

Oh it is how long/large something is in memory.

or why again shouldn't I cast it to size_t?

Because, for very large arrays (I myself haven't seen arrays larger
than 2GB, but it is possible) on 64-bit (or 128-bit? someday...)
platforms, size_t may or may not be the same size as int. You don't
know, I don't know. But your compiler knows. Declaring the variable as
size_t in the first place lets your compiler treat it properly, whereas
casting an int at the call cannot bring back a value that never fit in
the int. On 32-bit platforms size_t may end up being the same as an
unsigned int anyway, so don't worry about it generating different code.
Leave it to your compiler to decide.
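
As a rough illustration of the point about size_t and int having different ranges, the sketch below checks whether a size_t value survives being stored in an int. The helper names fits_in_int and length_as_int are invented for this example, and the comparison assumes size_t is at least as wide as int, which holds on common platforms but is not promised by the standard.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if n can be stored in an int without
   changing its value, 0 otherwise. Assumes size_t is at least as wide
   as int. */
int fits_in_int(size_t n)
{
    return n <= (size_t)INT_MAX;
}

/* Hypothetical helper: length of s as an int, or -1 if it does not fit. */
int length_as_int(const char *s)
{
    size_t n = strlen(s);   /* strlen() returns size_t, not int */
    return fits_in_int(n) ? (int)n : -1;
}
```

Keeping the length in a size_t variable avoids the need for the check altogether.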
 
slebetman

Michel said:
oh, that's what I thought. But what are the consequences? I mean, I'll
have some memory occupied, some free space, and then my string. So, as
for the free space, does it mean it can only be used for something
small enough to fit in it, or otherwise it will just be wasted space?

The consequence is that sooner or later malloc/realloc will fail
because it can't find a contiguous area of memory as large as the one
you requested. Coupled with your refusal to handle realloc failures,
this will result in a program crash.
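
Handling a realloc failure takes only a few lines. The sketch below uses a made-up helper name, grow_buffer, and assumes the requested size is greater than zero; the point it shows is that the result of realloc() goes into a temporary first, so the old block is neither lost nor leaked when the call fails.

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *bufp to newsize bytes (newsize > 0).
   Returns 1 on success. Returns 0 on failure, in which case *bufp is
   unchanged and still points to the old, valid block. */
int grow_buffer(char **bufp, size_t newsize)
{
    char *tmp = realloc(*bufp, newsize);

    if (tmp == NULL)
        return 0;       /* old block is untouched; caller decides what to do */
    *bufp = tmp;
    return 1;
}
```

Writing buf = realloc(buf, n) directly would overwrite the only pointer to the old block with NULL on failure, which is how the leak and the later crash usually come about.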
 
Flash Gordon

Michel said:
um... I don't think you know what I'm referring to.

I do and you are WRONG.

The example I took is the one of my program that reads .wav files to
deal with them, without checking that it actually is a .wav file.
Basically, it will just look at a precise place in a file for a 32-bit
integer telling how many bytes are to be read in the file. If you try
to input a non-.wav file instead, the 32-bit integer read will be
bogus, and is likely to have a value much higher than the number of
bytes left in the file, so the program will try to read even after it
has read the whole file, thus causing a segmentation fault.

There is absolutely NO guarantee that it will cause a segmentation
fault. For a start, that is a term that is not defined in the C standard;
secondly, it is a term not applicable to all systems; thirdly, almost any
action that could cause a segmentation fault could *also* overwrite some
critical data used by your application, such as the FILE structures,
possibly causing corruption of files on disk.

so basically, as I said, if the user wants to input an mp3 file instead
of a .wav, it's at his own risk. And if you want to input an EOF
character at some point, well, it's at your own risk too. Maybe one day
I'll bother with making some stuff to check that kind of foolishness,
but so far I've got higher-priority things to do than this kind of
stuff (like making sure my program does what it's supposed to do)

One of the *first* things to worry about is making sure that your input
data is correct. There are the reasons I've already mentioned, i.e. the
risk of doing nasty things to your system, which are REAL risks, although
the most likely problem is corrupting either the output OR the input file
(yes, the input file CAN be corrupted). There is also the risk that the
format gets extended and a wav file contains things you don't handle
properly, causing your program to corrupt things despite being given a
real wav file.
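
As a sketch of what checking such a length field might look like: the function below reads a 32-bit little-endian length stored at a caller-supplied offset and compares it with the number of bytes that actually follow it in the file. The name read_checked_length and the offset parameter are assumptions for this example, and it says nothing about the real layout of a .wav header. It also relies on fseek() to SEEK_END giving a usable position, which the C standard does not require of binary streams, though common systems support it.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical check: reads a 32-bit little-endian length stored at
   `offset` in fp and confirms that at least that many bytes follow it.
   Returns 1 and stores the length in *out if so, otherwise 0.
   On success the stream is positioned just after the length field. */
int read_checked_length(FILE *fp, long offset, uint32_t *out)
{
    unsigned char b[4];
    long data_start, file_end;
    uint32_t len;

    if (fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return 0;                           /* file too short for the field */
    len = (uint32_t)b[0]
        | (uint32_t)b[1] << 8
        | (uint32_t)b[2] << 16
        | (uint32_t)b[3] << 24;

    data_start = ftell(fp);
    if (data_start < 0 || fseek(fp, 0L, SEEK_END) != 0)
        return 0;
    file_end = ftell(fp);
    if (file_end < data_start)
        return 0;
    if (fseek(fp, data_start, SEEK_SET) != 0)
        return 0;

    if (len > (unsigned long)(file_end - data_start))
        return 0;                           /* claims more data than exists */
    *out = len;
    return 1;
}
```

A bogus length from a non-.wav file is then rejected before any buffer is sized from it or any read is attempted.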
 
slebetman

Michel said:
<snip>

so basically, as I said, if the user wants to input an mp3 file instead
of a .wav, it's at his own risk. And if you want to input an EOF
character at some point, well, it's at your own risk too. Maybe one day
I'll bother with making some stuff to check that kind of foolishness,
but so far I've got higher-priority things to do than this kind of
stuff (like making sure my program does what it's supposed to do)

The program crashing due to memory allocation error is the programmer's
foolishness for not checking the return value of realloc. No need to
punish the user for your own faults.

Checking for errors is not foolishness. It is the responsibility of the
programmer, more so in fact than the other 'priority' things like
adding features. This is because unchecked errors will cause those
wonderful features you've developed to fail at the most unexpected
times - like when demoing your app to your client.

When you program in C, error handling, memory allocation, etc. are the
responsibility of the programmer. This is because C is really nothing
more than 'high level assembly'. If you find this uncomfortable, and if
you insist on not checking errors, then don't write in C. Languages
like Tcl or Perl are more suitable. All the low level errors are already
handled by the people who wrote the interpreters in C so you don't have
to. Errors in scripting languages don't have the serious consequences
they have in C.
 
Christian Bau

Flash Gordon said:
One of the *first* things to worry about is making sure that your input
data is correct. There are the reasons I've already mentioned, i.e. the
risk of doing nasty things to your system, which are REAL risks, although
the most likely problem is corrupting either the output OR the input file
(yes, the input file CAN be corrupted). There is also the risk that the
format gets extended and a wav file contains things you don't handle
properly, causing your program to corrupt things despite being given a
real wav file.

If the application that is programmed in such a careless way is
important and widespread, then some attacker will figure out how to
construct a file that will not only crash the computer, but will make it
do exactly what the attacker wants it to do.
 
slebetman

Christian said:
If the application that is programmed in such a careless way is
important and widespread, then some attacker will figure out how to
construct a file that will not only crash the computer, but will make it
do exactly what the attacker wants it to do.

Yes! The infamous "buffer overflow".
 
Malcolm

Michel Rouzic said:
ok cool, so why shouldn't I use an int for the size in a realloc, or
why again shouldn't I cast it to size_t?

size_t is an uglification that will run through all your code, wrecking its
readability and elegance, as every memory size, and hence every array index,
and hence every count, has to be a size_t.

There are many subtle problems with the use of unsigned integers. Java
eliminated them, for very good reasons.

The problem with using integers, on the other hand, is largely theoretical.
The maximum memory size allowed by a compiler may exceed the range of an
int.

It is perfectly plausible that a company may have more than 32767 employees.
It is also perfectly plausible that a C program may have to run on a machine
where int is 16 bits. It is not plausible that you will want to run the
payroll for a company with more than 30,000 employees on a machine with
16-bit integers. Hence we can happily use an int to hold the count of
employees, or a long if really paranoid.
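
One commonly cited example of the subtle problems with unsigned integers is counting down through an array. The sketch below uses an invented helper, last_index_of, to show the pitfall and an idiom that avoids it; it is an illustration only, not something posted in the thread.

```c
#include <stddef.h>

/* Hypothetical helper: index of the last occurrence of ch in s[0..n-1],
   or n if ch does not occur. */
size_t last_index_of(const char *s, size_t n, char ch)
{
    size_t i;

    /* The obvious loop "for (i = n - 1; i >= 0; i--)" never terminates:
       i >= 0 is always true for an unsigned type, and n - 1 wraps
       around to a huge value when n is 0. Testing before decrementing
       avoids both problems. */
    for (i = n; i-- > 0; ) {
        if (s[i] == ch)
            return i;
    }
    return n;
}
```

With a signed index the obvious loop would work, which is the kind of trade-off being argued about here.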
 
Ben Pfaff

Malcolm said:
size_t is an uglification that will run through all your code, wrecking its
readability and elegance, as every memory size, and hence every array index,
and hence every count, has to be a size_t.

I don't see why that is a problem. Much of my own code is
written that way. size_t is simply the natural type in C for the
size of something.
 
Malcolm

Ben Pfaff said:
I don't see why that is a problem. Much of my own code is
written that way. size_t is simply the natural type in C for the
size of something.

I'm going to do a thread on size_t sometime soon.

You have illustrated the problem however. Once you allow size_t, almost
every integer becomes a size_t, because most integers count something.
 
pete

Malcolm said:
I'm going to do a thread on size_t sometime soon.

You have illustrated the problem however.
Once you allow size_t, almost
every integer becomes a size_t,
because most integers count something.

Type int is good for returning error codes or status codes.
Functions that do comparisons return type int.
A lot of stdio functions return type int.

The number of nodes in a list isn't tied to size_t.
I use long unsigned for counting those.
 
Flash Gordon

Malcolm said:
I'm going to do a thread on size_t sometime soon.

You have illustrated the problem however. Once you allow size_t, almost
every integer becomes a size_t, because most integers count something.

What is the problem with that? In any case, a lot of integers in code I
write are not counting the size of C objects (they might be scaled costs
which can even be negative, for example).
 
Joe Wright

Ben said:
I don't see why that is a problem. Much of my own code is
written that way. size_t is simply the natural type in C for the
size of something.

I have seldom defined a variable of type size_t. On DJGPP..

typedef long unsigned int size_t;

...is its declaration. In limits.h I find..

#define SSIZE_MAX 2147483647
#define INT_MAX 2147483647
#define LONG_MAX 2147483647L

...and so see no compelling reason to type anything size_t rather than int.

It is interesting to have functions prototyped with size_t parameters to
indicate positive values. Otherwise, int works perfectly well for me.
 
Malcolm

Joe Wright said:
#define SSIZE_MAX 2147483647
#define INT_MAX 2147483647
#define LONG_MAX 2147483647L

..and so see no compelling reason to type anything size_t rather than int.

It is interesting to have functions prototyped with size_t parameters to
indicate positive values. Otherwise, int works perfectly well for me.

Take this function

/*
trivial function that counts number of occurrences of ch in str
*/
int mystrcount(const char *str, int ch);

Now basically this function is always going to return small integers.
However, technically, someone could pass it a massive string, all set to one
character. Then an int would overflow, if size_t were bigger than an int.

Thus the function must return a size_t.

That means that the higher-level logic which calls it must also be written
with size_t, and the ugliness propagates.
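
For reference, a size_t-returning version of the function might look like the sketch below. Only the name and parameters come from the post; the body is an assumption about what "counts number of occurrences" means, since no implementation was given.

```c
#include <stddef.h>

/* Counts the occurrences of ch in the NUL-terminated string str.
   The count is a size_t, so it cannot overflow however long the
   string is. */
size_t mystrcount(const char *str, int ch)
{
    size_t count = 0;

    for (; *str != '\0'; str++) {
        if (*str == (char)ch)
            count++;
    }
    return count;
}
```

A caller that really wants an int can still convert the result after checking that it fits, so the size_t need not spread any further than the caller chooses.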
 
