Reading a line from a file

  • Thread starter Joona I Palaste

Joona I Palaste

Colin JN Breame said:
Fairly new to C. What is the best way to read a line (\n terminated) from
a file? I've looked at fscanf but was not sure which format specifier to
use. (%s perhaps).

If you know how long the line will be, then fgets() is much better for
the job than fscanf().
 

Colin JN Breame

Hi,

Fairly new to C. What is the best way to read a line (\n terminated) from
a file? I've looked at fscanf but was not sure which format specifier to
use. (%s perhaps).

Thanks
Colin
 

Kevin Goodsell

Colin said:
The line is variable length.

Arrays in C are not variable length. (That is, they won't automatically
change size. There is a C99 feature called Variable Length Arrays, but
they never actually change length either.) Therefore, you have a few
options:

1) Create an array that's "large enough" for your longest line.
2) Create a dynamic array and read as much of the line as you can into
that array. If that's the whole line, you are done. If it's not the
whole line, realloc the array to make it larger and read more. Rinse and
repeat until the entire line is read (or you run out of memory,
whichever comes first).

Personally, I'd recommend using 2, and wrapping the functionality up in
a separate function:

char *line = my_getline(some_file);

/* do stuff with line here */

free(line);

-Kevin
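For readers who want to see option 2 spelled out, here is a minimal sketch of what a function like my_getline() might look like. Only the name comes from the post above; the body, the initial capacity of 64, and the doubling policy are assumptions made for illustration, not Kevin's code.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical implementation of the my_getline() named above.
 * Reads one line of any length from fp into a malloc'd buffer.
 * Returns the line without its trailing '\n', or NULL on end of
 * file (with nothing read), read error, or allocation failure.
 * The caller must free() the result. */
char *my_getline(FILE *fp)
{
    size_t cap = 64;              /* initial capacity: arbitrary choice */
    size_t len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    for (;;) {
        size_t room = cap - len;  /* space left, including the '\0' */
        if (room > INT_MAX)
            room = INT_MAX;       /* fgets() takes an int count */
        if (fgets(buf + len, (int)room, fp) == NULL)
            break;                /* end of file or read error */
        len += strlen(buf + len);

        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';  /* whole line read: strip newline */
            return buf;
        }
        if (len + 1 == cap) {     /* buffer full: double it, read more */
            char *tmp;
            if (cap > (size_t)-1 / 2) {
                free(buf);
                return NULL;
            }
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (len == 0 || ferror(fp)) { /* nothing read, or read error */
        free(buf);
        return NULL;
    }
    return buf;                   /* last line had no trailing '\n' */
}
```

Because the sketch relies on strlen(), a line containing an embedded '\0' character would be mishandled; a version built on getc() would avoid that at the cost of a little more code.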
 

Morris Dovey

Kevin said:
Arrays in C are not variable length. (That is, they won't
automatically change size. There is a C99 feature called
Variable Length Arrays, but they never actually change length
either.) Therefore, you have a few options:

1) Create an array that's "large enough" for your longest line.
2) Create a dynamic array and read as much of the line as you
can into that array. If that's the whole line, you are done.
If it's not the whole line, realloc the array to make it
larger and read more. Rinse and repeat until the entire line
is read (or you run out of memory, whichever comes first).

Personally, I'd recommend using 2, and wrapping the
functionality up in a separate function:

char *line = my_getline(some_file);

/* do stuff with line here */

free(line);

Colin...

You might try a Google groups comp.lang.c search. Richard
Heathfield and Chuck Falconer have both provided URLs to their
routines; and I have another at http://www.iedu.com/mrd/c/getsm.c
 

Richard Heathfield

Colin said:
The line is variable length.


I discuss this problem at some length(!) on my Web site:

http://users.powernet.co.uk/eton/c/fgetdata.html

where I discuss various standard library functions for capturing string
data, and then present functions for getting: (a) an entire word, and (b)
an entire line, from a stream, irrespective of length (up to obvious limits
of memory, of course).
 

Richard Heathfield

Morris said:
You might try a Google groups comp.lang.c search. Richard
Heathfield and Chuck Falconer have both provided URLs to their
routines; and I have another at http://www.iedu.com/mrd/c/getsm.c

I have taken the liberty of adding (to the page on my site dealing with this
issue) a link to your URL, with the intent of providing lots of choice to
the discerning input-issues-aware programmer.
 

Paul Hsieh

So you need a fully dynamic solution ...
Colin...

You might try a Google groups comp.lang.c search. Richard
Heathfield and Chuck Falconer have both provided URLs to their
routines; and I have another at http://www.iedu.com/mrd/c/getsm.c

This getsm.c, *in practice*, will overflow the stack of just about any
fixed-stack implementation of C for large enough input. So it's just a buffer
overflow of a different kind. Both Richard Heathfield's and Chuck Falconer's
solutions are O(n^2) in performance, where n is the length of the input --
i.e., for large enough input, your machine will simply slow to a crawl (not to
mention the fact that they will shred the heap of any substandard heap
implementation) and actually be unable to retain the input in reasonable time.

A somewhat more general solution that doesn't suffer either of these problems
can be found here:

http://www.pobox.com/~qed/userInput.html

The example modes of usage are O(n) with respect to the input, you can
*optionally* set an upper bound for the target buffer, or with your own
customization you can process the input incrementally without storing the whole
input buffer if that makes sense in your application. It comes with examples
of the most common cases. For example:

char * s;
getstralloc (&s);
if (s) {
printf ("<%s>\n", s);
free (s);
}

will perform what you would have hoped gets() did without the built-in
unavoidable undefined behaviour.
 

CBFalconer

Paul said:
This getsm.c, *in practice*, will overflow the stack of just about
any fixed-stack implementation of C for large enough input. So
it's just a buffer overflow of a different kind. Both Richard
Heathfield's and Chuck Falconer's solutions are O(n^2) in
performance, where n is the length of the input -- i.e., for large
enough input, your machine will simply slow to a crawl (not to
mention the fact that they will shred the heap of any substandard
heap implementation) and actually be unable to retain the input in
reasonable time.

No, they aren't O(n^2); they are O(n) on any system with a good
realloc policy. They are also arranged to provide best
performance on most likely input. Extremely long lines are not
the norm for interactive input or any text - the typer tends to
get tired. At any rate, the system is almost certainly going to be
I/O limited, not processor limited.

I don't know about the getsm version.

Please take more care with your attributions. I corrected them
above.
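To make the O(n) versus O(n^2) disagreement concrete, here is a rough sketch that counts how many bytes get copied under two buffer growth policies. It assumes the worst case, in which every realloc() moves the block and copies its whole contents. The function names and the sizes used are invented for illustration and say nothing about what any of the routines discussed in this thread actually do.

```c
/* Worst-case model: each growth step copies the entire old buffer.
 * Both functions return the total bytes copied while growing a
 * buffer until its capacity reaches at least n bytes.
 * step and start must be nonzero, or the loops never terminate. */

/* Additive growth: capacity goes step, 2*step, 3*step, ... */
unsigned long long cost_additive(unsigned long long n,
                                 unsigned long long step)
{
    unsigned long long cap = step, copied = 0;

    while (cap < n) {
        copied += cap;   /* old contents copied into the new block */
        cap += step;
    }
    return copied;       /* grows roughly as n*n / (2*step) */
}

/* Geometric growth: capacity goes start, 2*start, 4*start, ... */
unsigned long long cost_doubling(unsigned long long n,
                                 unsigned long long start)
{
    unsigned long long cap = start, copied = 0;

    while (cap < n) {
        copied += cap;   /* old contents copied into the new block */
        cap *= 2;
    }
    return copied;       /* stays below 2*n */
}
```

Under this model, growing to hold a 1,000,000-byte line in 100-byte steps copies 4,999,500,000 bytes, while doubling from 100 bytes copies 1,638,300. In practice realloc() can often extend a block in place, so the additive figure is an upper bound rather than a prediction.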
 

Morris Dovey

Paul said:
This getsm.c, *in practice*, will overflow the stack of just
about any fixed-stack implementation of C for large enough
input. So it's just a buffer overflow of a different kind.

Absolutely correct. It blows up fairly reliably on my test
platform not long after you type the 700,000th character in the
line; but stack size, type, and even existence are, of course,
not C issues.

These essays may stimulate Colin to think in new ways (for him)
and to produce a solution that improves on all of the examples.
This is also, of course, not a C issue - but (IMO) one of the
better possible outcomes.
 

Richard Heathfield

Paul Hsieh wrote:

Both Richard Heathfield's and Chuck
Falconer's solutions are O(n^2) in performance, where n is the length of
the input -- i.e., for large enough input, your machine will simply slow to
a crawl (not to mention the fact that they will shred the heap of any
substandard heap implementation) and actually be unable to retain the
input in reasonable time.

That was news to me, so I catted all my /usr/include stuff, recursively,
giving me a corpus of about 16 MB. I then wrote a little getc(stdin) loop
to hash the whole lot, and it crunched through the bytes at about 4.8
MB/sec on my machine. Then I wrote a loop using fgetline (the solution you
claim is O(n^2), above). I expected it to be a little slower, since it uses
dynamic memory allocation to allow line-by-line reading (which is the whole
point of the function, obviously). It turned out to crank through at about
3.8 MB/sec, which doesn't square with your claim of O(n^2) as far as I can
see.

A somewhat more general solution that doesn't suffer either of these
problems can be found here:

http://www.pobox.com/~qed/userInput.html

Ah, yes. I tried that, having first inserted the necessary headers and a
little test driver.

I got three rather serious compilation errors (one for a missing declaration
of n, and a couple where you meant . but used -> instead). After I'd fixed
those, I wrote a test program, which segfaulted (in your code, not mine) on
the first iteration.

My routine may not be the fastest line-reader in the world, but at least it
works. If we remove the requirement that the code must work correctly, I
can write a version that will take no memory and run in zero time.
 

nrk

Richard Heathfield wrote:

My routine may not be the fastest line-reader in the world, but at least
it works. If we remove the requirement that the code must work correctly,
I can write a version that will take no memory and run in zero time.

LOL!!! That's a great quote that I am framing for posterity when faced with
similar situations (of course, due attributions will be made :)

-nrk.
 

Kevin Goodsell

nrk said:
LOL!!! That's a great quote that I am framing for posterity when faced with
similar situations (of course, due attributions will be made :)

I think Richard is copying me. ;-) Several months ago in comp.lang.c++ I
said:

"Once the requirement for correctness is removed, the speed of the
program becomes irrelevant, because it can be arbitrarily fast or
slow."

-Kevin
 

Richard Heathfield

Kevin said:
I think Richard is copying me. ;-)

No, I don't think so. But the more I think about it, the more I think I
copied /somebody/. After posting that earlier this evening, I spent about
half an hour ploughing through some old programming books, trying to
discover whether I'd inadvertently plagiarised someone or other. No joy.
But if I got it from anywhere, it was probably from either McConnell or
Maguire.

Several months ago in comp.lang.c++ I
said:

"Once the requirement for correctness is removed, the speed of the
program becomes irrelevant, because it can be arbitrarily fast or
slow."

And what would I be doing reading clc++? :)
 

Richard Heathfield

Kevin said:
I think Richard is copying me. ;-)


Nailed it.

"...your program /doesn't work/. If mine doesn't have to work, I can make it
run instantly and take up no memory." Steve McConnell, "Code Complete",
p682, quoting Gerald Weinberg relating an incident happening to someone
else entirely.

So it's a complete re-phrasing of a half-remembered reference to a
second-hand anecdote. This is probably some kind of plagiaristic record.
 

Kevin Goodsell

Richard said:
And what would I be doing reading clc++? :)

Looking for things to plagiarize, of course. :p

OK, probably not. But aren't you also a C++ programmer? Or have I
confused you with someone else?

-Kevin
 

Kevin Goodsell

Richard said:
Nailed it.

"...your program /doesn't work/. If mine doesn't have to work, I can make it
run instantly and take up no memory." Steve McConnell, "Code Complete",
p682, quoting Gerald Weinberg relating an incident happening to someone
else entirely.

So it's a complete re-phrasing of a half-remembered reference to a
second-hand anecdote. This is probably some kind of plagiaristic record.

But I had never even read the reference. Does that mean I beat your
record? ;-)

(I do plan to read that book, though. I have a copy, but it's barely
been opened. I think I read the first 10 pages when I first got it, then
switched to something else as I have a tendency to do.)

-Kevin
 

Richard Heathfield

Kevin said:
Looking for things to plagiarize, of course. :p

OK, probably not. But aren't you also a C++ programmer?

Only when pressed by hunger. :)

Yes, I do program in C++ on occasion, mainly when I need a GUI in a hurry.
But I don't particularly /like/ the language, so I generally don't read
clc++ particularly often.

Or have I
confused you with someone else?

Oh, undoubtedly, my dear chap. Undoubtedly. But not for the reason you
think.
 

John Bode

Colin JN Breame said:
Hi,

Fairly new to C. What is the best way to read a line (\n terminated) from
a file? I've looked at fscanf but was not sure which format specifier to
use. (%s perhaps).

Thanks
Colin

If you're going to use fscanf() to read '\n'-terminated lines from a
file and store the whole line to a single buffer, use the %[
conversion specifier:

char buff[101];
FILE *infile;
...
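fscanf (infile, "%100[^\n]", buff); /* read up to 100 characters of the line */
fscanf (infile, "%*[^\n]");         /* discard any excess characters */
fscanf (infile, "%*c");             /* discard the newline */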

The conversion specifier "%100[^\n]" means "read characters until we
see EOF, a newline character ('\n'), or until we've read 100
characters, and assign them to buff." The conversion specifier
"%*[^\n]" means "read characters until we see EOF or a newline and
throw them away." This conversion specifier is there in case the
input line is longer than our expected maximum line length, and gives
us a way to remove those extra characters from the input buffer. The
"%*c" conversion specifier means "read the next character (which
should be the newline) and throw it away." This removes the newline
character from the input buffer. You never want to use the "%["
conversion specifier without specifying a maximum field width;
fscanf() has no way to tell how big your target buffer is unless you
explicitly tell it, so if your input buffer is sized for 100
characters and the input line is 132 characters and you haven't
specified a maximum field width, fscanf() will attempt to write those
extra 32 characters to memory outside your buffer, which will cause a
crash (if you're lucky) or otherwise weird behavior (if you're not).
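One caveat when the three directives share a single format string: fscanf() stops at the first directive that fails to match. If the line fits within 100 characters, "%*[^\n]" matches nothing, so "%*c" never runs and the newline stays unread; an empty line makes "%100[^\n]" itself fail. The sketch below works around both cases by splitting the work into separate steps and checking return values. The function name read_line_scanf is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line of up to 100 characters into
 * buff (which must hold at least 101 bytes), discarding any excess
 * characters and the newline.
 * Returns 1 if a line was read (possibly empty), 0 at end of file. */
int read_line_scanf(FILE *infile, char buff[101])
{
    int n = fscanf(infile, "%100[^\n]", buff);

    if (n == EOF)
        return 0;        /* end of file or error before any input */
    if (n == 0)
        buff[0] = '\0';  /* empty line: %[ matched no characters */

    /* Discard the rest of an over-long line. A matching failure
     * here just means there was nothing left to discard. */
    if (fscanf(infile, "%*[^\n]") == EOF)
        return 1;        /* last line had no trailing newline */

    getc(infile);        /* consume the newline, if present */
    return 1;
}
```

Calling it in a loop, as in while (read_line_scanf(infile, buff)) { ... }, processes the file one line at a time, with each line silently truncated to 100 characters.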

Alternatively, you can use fgets() to read an input line into a buffer.
Like the %[ conversion specifier above, you specify a maximum buffer
length:

char buff[101];
FILE *infile;
...
fgets (buff, sizeof buff, infile);

Like the %[ conversion specifier above, fgets() reads until it sees
EOF or a newline, or until it has read 100 characters, and stores
what it read in buff. Unlike the conversion specifier used above, the
newline character is stored as part of the buffer. Also, unlike
fscanf(), there's no provision to automatically consume and discard
any characters beyond the expected input line length; you'll have to
call fgets() (or other input routine) repeatedly to clear out the
input buffer. Note that fflush() should *not* be used to clear the
input buffer; you must use an actual input routine.
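A minimal sketch of that approach follows: read with fgets(), check whether the newline made it into the buffer, and if not, drain the rest of the line with getc(). The helper name read_line_fgets and its return convention are assumptions for this example.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (size bytes) with
 * fgets(), strips the newline, and discards whatever did not fit.
 * Returns 1 on success, 0 on end of file or read error. */
int read_line_fgets(char *buf, size_t size, FILE *infile)
{
    size_t len;
    int c;

    if (size == 0 || size > INT_MAX)
        return 0;                  /* fgets() takes an int count */
    if (fgets(buf, (int)size, infile) == NULL)
        return 0;                  /* end of file or read error */

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';       /* complete line: drop the newline */
    } else {
        /* The line was truncated (or ended at end of file): read and
         * discard up to and including the next newline. */
        while ((c = getc(infile)) != EOF && c != '\n')
            ;
    }
    return 1;
}
```

With char buff[101], a call such as read_line_fgets(buff, sizeof buff, infile) keeps the first 100 characters of each line and leaves the stream positioned at the start of the next one.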
 
