Any way to read a file line by line?

Abby

My .dat file will contain information like below.

///////////
First
0x04
0x05
0x06

Second
0x07
0x08
0x09

Third
0x0E
0x0F
0x0D
///////////

In my main program, it will open this dat file, read line by line.
When it hits the string I'm searching for (e.g. "First"), the program
will know that the next line will be the variables to be stored. Is
there any C function to read a file line by line? Please give me some
clue. Thanks a lot.

Abby.
 
Alan Balmer

My .dat file will contain information like below.

///////////
First
0x04
0x05
0x06

Second
0x07
0x08
0x09

Third
0x0E
0x0F
0x0D
///////////

In my main program, it will open this dat file, read line by line.
When it hits the string I'm searching for (e.g. "First"), the program
will know that the next line will be the variables to be stored. Is
there any C function to read a file line by line? Please give me some
clue. Thanks a lot.
Look at the fgets function.
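
For the file layout in the question, an fgets-based loop might look roughly like the sketch below. The helper name read_section, the 64-byte line limit, and the use of strtoul for the hex values are all assumptions made for illustration, not something from the thread.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: scan fp for a line equal to `name`, then parse the
 * following lines as hex values until a blank or non-numeric line, end of
 * file, or a full output array. Returns the number of values stored, or -1
 * if the section name was never found.
 * Assumption: every line fits in 63 characters. */
static int read_section(FILE *fp, const char *name,
                        unsigned long *out, int max_out)
{
    char line[64];
    int found = 0;
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *end;
        unsigned long value;

        line[strcspn(line, "\r\n")] = '\0';   /* strip the line ending */

        if (!found) {
            if (strcmp(line, name) == 0)
                found = 1;                    /* values start on the next line */
            continue;
        }

        value = strtoul(line, &end, 16);      /* base 16 accepts a 0x prefix */
        if (end == line || *end != '\0')
            break;                            /* blank or non-numeric: section over */
        if (count == max_out)
            break;                            /* output array is full */
        out[count++] = value;
    }
    return found ? count : -1;
}
```

Called as read_section(fp, "Second", values, 8) on the sample file, this should store 0x07, 0x08 and 0x09. Lines longer than the buffer are not handled here; that problem is what the rest of the thread argues about.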
 
Malcolm

Abby said:
Is there any C function to read a file line by line? Please give me some
clue. Thanks a lot.
fgets() reads a line, if you know the maximum length it can be.

It is maybe better to use fscanf(). A bit like printf(), it takes a format
string that is a little language in its own right.
 
Jeff

Abby said:
My .dat file will contain information like below.

///////////
First
0x04
0x05
0x06

Second
0x07
0x08
0x09

Third
0x0E
0x0F
0x0D
///////////

In my main program, it will open this dat file, read line by line.
When it hits the string I'm searching for (e.g. "First"), the program
will know that the next line will be the variables to be stored. Is
there any C function to read a file line by line? Please give me some
clue. Thanks a lot.

Abby.

You can use fgets( ) to read a line from the file.
If you need to extract the value of a hex code (like 0x0E or
0x0F), you can use sscanf( ) to help you.


fgets( stringbuffer, 20, fp);
sscanf( stringbuffer, "%X", &var1);
 
Dan Pop

In said:
It's not difficult to dynamically allocate memory for a line with fgets.

How can you predict the size needed by each fgets() call?
Then it can be any length.

To be able to read arbitrarily sized lines, fgets() is of little help.
The code actually implementing such a routine is even simpler if you
don't use fgets at all!

AFAICT, fgets is a solution in search of a problem.
The length is still fixed - or you have a buffer overflow.

You have the option to trivially discard the excess characters, thus
avoiding a fixed buffer overflow. Depending on the actual application,
this may or may not be an acceptable solution.
Best to use fgets.

Nope, it's *never* best to use fgets. Its usage is complicated and
error prone. When I ask for a fgets-based, bullet-proof solution, I
usually get broken code: it's far too easy to omit or ignore one of the
ways a fgets call can go wrong. The function is simply broken by
design (assuming that it was actually designed).
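
To make the "without fgets" claim concrete, here is a minimal sketch of a reader for lines of any length built on getc and realloc. The name read_line, the starting capacity of 32, and the doubling strategy are illustrative assumptions; nobody in the thread posted this code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of any length from fp using getc only.
 * Returns a malloc'd string without the trailing newline (caller frees).
 * Returns NULL at end of file when nothing was read, or when an allocation
 * fails. */
static char *read_line(FILE *fp)
{
    size_t cap = 0, len = 0;
    char *buf = NULL;
    int c;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= cap) {                 /* keep room for the '\0' */
            size_t new_cap = cap ? cap * 2 : 32;
            char *tmp = realloc(buf, new_cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0)
        return NULL;                          /* nothing stored, buf is still NULL */

    if (buf == NULL) {                        /* empty line: only "\n" was read */
        buf = malloc(1);
        if (buf == NULL)
            return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

The caller cannot tell end of file from an allocation failure with this interface; a production version would probably report the two separately.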

Dan
 
Dan Pop

You can use fgets( ) to read a line from the file.
If you need to extract the value of a hex code (like 0x0E or
0x0F), you can use sscanf( ) to help you.

fgets( stringbuffer, 20, fp);

You can't blindly assume that a fgets call succeeds or that it exhausts
an input line.
sscanf( stringbuffer, "%X", &var1);

You can't blindly assume that a sscanf call succeeds, either.

With relatively few exceptions, the return value of a library function
call should not be discarded *before* being checked.
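
A version of the two calls with both return values checked could look like the sketch below. The function name read_hex_line, the return convention, and the extra " %c" used to catch trailing junk are assumptions added for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from fp and parse it as a single hex
 * value. Returns 1 on success, 0 if the line is too long or is not exactly
 * one hex number, and EOF at end of file or on a read error. */
static int read_hex_line(FILE *fp, unsigned int *out)
{
    char buf[20];
    char extra;

    if (fgets(buf, sizeof buf, fp) == NULL)
        return EOF;                         /* nothing was read */

    if (strchr(buf, '\n') == NULL) {
        /* No newline stored: the line was too long, or it is the last,
         * unterminated line, or it fit exactly. Consume up to the newline
         * and count the characters that did not fit. */
        int c, excess = 0;
        while ((c = getc(fp)) != EOF && c != '\n')
            excess++;
        if (excess > 0)
            return 0;
    }

    /* Exactly one conversion must succeed; " %c" matches any trailing
     * non-space character, so "First" (which starts with a hex digit)
     * is rejected. */
    if (sscanf(buf, "%x %c", out, &extra) != 1)
        return 0;
    return 1;
}
```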

Dan
 
Emmanuel Delahaye

In said:
Nope, it's *never* best to use fgets. Its usage is complicated and
error prone. When I ask for a fgets-based, bullet-proof solution, I
usually get broken code: it's far too easy to omit or ignore one of the
ways a fgets call can go wrong. The function is simply broken by
design (assuming that it was actually designed).

May I submit this code?

#ifndef H_ED_IO_20030121102334
#define H_ED_IO_20030121102334

/* ed/inc/io.h */
#include <stdio.h>

enum
{
   IO_OK,
   IO_ERR_READ,
   IO_ERR_LENGTH,
   IO_ERR_BUFFER,
   IO_ERR_CONVERSION,
   IO_ERR_EMPTY,
   IO_ERR_OUTPUT_ADDRESS,
   IO_ERR_NB
};

int fget_s (char *s, size_t size, FILE * fp);

<...>

#endif /* H_ED_IO_20030121102334 */


#include "ed/inc/io.h"
#include <string.h>

static void clear_in (FILE * fp)
{
   int c;

   while ((c = fgetc (fp)) != '\n' && c != EOF)
   {
   }
}

int fget_s (char *s, size_t size, FILE * fp)
{
   int err = IO_OK;

   if (s != NULL)
   {
      if (fgets (s, size, fp) != NULL)
      {
         char *p = strchr (s, '\n');

         if (p)
         {
            *p = 0;
         }
         else
         {
            clear_in (fp);
            err = IO_ERR_LENGTH;
         }
      }
      else
      {
         err = IO_ERR_READ;
      }

   }
   else
   {
      err = IO_ERR_BUFFER;
   }

   return err;
}

<...>
 
Dan Pop

In said:
May I submit this code?

#ifndef H_ED_IO_20030121102334
#define H_ED_IO_20030121102334

/* ed/inc/io.h */
#include <stdio.h>

enum
{
IO_OK,
IO_ERR_READ,
IO_ERR_LENGTH,
IO_ERR_BUFFER,
IO_ERR_CONVERSION,
IO_ERR_EMPTY,
IO_ERR_OUTPUT_ADDRESS,
IO_ERR_NB
};

int fget_s (char *s, size_t size, FILE * fp);

<...>

#endif /* H_ED_IO_20030121102334 */


#include "ed/inc/io.h"
#include <string.h>

static void clear_in (FILE * fp)
{
int c;

while ((c = fgetc (fp)) != '\n' && c != EOF)
{
}
}

int fget_s (char *s, size_t size, FILE * fp)
{
int err = IO_OK;

if (s != NULL)

What's the point in checking s if you don't check fp too? Either both
or neither.
{
if (fgets (s, size, fp) != NULL)
{
char *p = strchr (s, '\n');

if (p)
{
*p = 0;
}
else
{
clear_in (fp);
^^^^^^^^^^^^^
What is that? An attempt to make the code look simpler than it actually
is? ;-)
err = IO_ERR_LENGTH;
}

There is a logical error here. Imagine that the *only* character left
unread on the input line was the newline itself. This shouldn't be
treated as an error condition; simply remove it from the stream and
return IO_OK: the user provided buffer contains the full line!
}
else
{
err = IO_ERR_READ;
}

}
else
{
err = IO_ERR_BUFFER;
}

return err;
}

Its sheer complexity brilliantly proves my point. Despite the fact that
you have not inlined the code of clear_in(), and of the bug mentioned
above. Fixing these issues will complicate the function significantly.

Try to implement fget_s without using fgets at all and see if the code is
not actually simpler.

The fscanf equivalent is so simple that it can be used inline whenever
needed:

char s[NN + 1] = "", c;

int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);
if (rc == 0) getc(fp);

The value of rc is telling the whole story:

EOF - end of file encountered before getting a single character
0 - an empty line was input
1 - the line was truncated
2 - the line was completely read
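
For anyone who wants to try the idiom, here is a compilable sketch of it with NN written out as 15. The wrapper name scan_line and the LINE_CHARS macro are illustrative assumptions. One deviation from the lines above: the target of %1[\n] is a two-byte array, because the %[ conversion stores a terminating null after the matched character.

```c
#include <stdio.h>

#define LINE_CHARS 15   /* illustrative; must agree with the width in the format */

/* Hypothetical wrapper around the fscanf idiom. s must have room for
 * LINE_CHARS + 1 bytes. Returns the fscanf result:
 * EOF, 0 (empty line), 1 (truncated or unterminated last line),
 * 2 (complete line). */
static int scan_line(FILE *fp, char *s)
{
    char nl[2];   /* %1[\n] stores one character plus a terminating '\0' */
    int rc;

    s[0] = '\0';
    rc = fscanf(fp, "%15[^\n]%1[\n]", s, nl);
    if (rc == 1)
        fscanf(fp, "%*[^\n]%*c");   /* discard the rest of an over-long line */
    if (rc == 0)
        getc(fp);                   /* consume the newline of an empty line */
    return rc;
}
```

A last line with no trailing newline also yields 1, so a caller that needs to tell that case from truncation would have to check feof(fp) as well.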

Dan
 
Programmer Dude

Dan said:
char s[NN + 1] = "", c;

int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);
if (rc == 0) getc(fp);

The value of rc is telling the whole story:

EOF - end of file encountered before getting a single character
0 - an empty line was input
1 - the line was truncated
2 - the line was completely read

Huh! Call me converted. Saved for future reference!!
(Thanks, Dan.)
 
Emmanuel Delahaye

In said:

What's the point in checking s if you don't check fp too? Either both
or neither.

You are correct. I will fix it.
^^^^^^^^^^^^^
What is that? An attempt to make the code look simpler than it actually
is? ;-)

This function was defined a few lines before.
There is a logical error here. Imagine that the *only* character left
unread on the input line was the newline itself. This shouldn't be
treated as an error condition;

You mean, if the size of the buffer was < 2? I should test that as a
precondition. To work correctly, the buffer size must be >= 2 char.
simply remove it from the stream and

It's done.
return IO_OK: the user provided buffer contains the full line!


Its sheer complexity brilliantly proves my point. Despite the fact that
you have not inlined the code of clear_in(),

It was.
and of the bug mentioned
above.
Fixing these issues will complicate the function significantly.

I guess you are kidding. Even once fixed, this code is simple and
clear. I see no complications or difficulties in it. It takes a significant
number of source lines because I'm used to writing code that is readable (at
least by me). I'm not fond of cryptic or ultra-compact coding (actually it's
forbidden in the company I work for).
Try to implement fget_s without using fgets at all and see if the code
is not actually simpler.

I see your point. Could be, yes.
The fscanf equivalent is so simple that it can be used inline whenever
needed:

char s[NN + 1] = "", c;

int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);
if (rc == 0) getc(fp);

I agree that it's short and compact, but I'd call it cryptic. The
choice of the 'complicated' qualifier or not is left to the reader!
The value of rc is telling the whole story:

EOF - end of file encountered before getting a single character
0 - an empty line was input
1 - the line was truncated
2 - the line was completely read

Interesting however. Thanks for having read me.
 
Paul Hsieh

My .dat file will contain information like below.

///////////
First
[...]

In my main program, it will open this dat file, read line by line.
When it hits the string I'm searching for (e.g. "First"), the program
will know that the next line will be the variables to be stored. Is
there any C function to read a file line by line? Please give me some
clue. Thanks a lot.

You can try using fgets() as suggested by others, but it requires you
to make choices about the target buffer size before you read each
line. I.e., either your solution is convoluted and has a built in
buffer growing algorithm, or it has a buffer overflow, or it just
behaves incorrectly for inputs that are too large. If you have a very
strict way in which your input files are constructed then you can get
away with a fixed buffer algorithm, of course.

The alternative is to use a string library that takes care of this for
you. There are many string libraries that can work for you, but let
me just suggest how it might be done with my own called the "better
string library" (http://bstring.sf.net/):

Alternative 1: A high performance streaming solution (but it will read
ahead, which may be inappropriate for your application)

FILE * fp = fopen (inputFileName, "r");

if (fp) {
    bStream s = bsopen ((bNread) fread, fp);
    bstring b = cstr2bstr ("");
    while (BSTR_OK == bsreadln (b, s, '\n')) {
        /* b->data is an "unsigned char *" pointing to a line */
    }
    bdestroy (b);
    bsclose (s);
    fclose (fp);
}

Alternative 2: A slow but simpler streaming solution (the file reading
is exact)

FILE * fp = fopen (inputFileName, "r");

if (fp) {
    bstring b;
    while ((b = bgets ((bNgetc) fgetc, fp, '\n')) != NULL) {
        /* b->data is an "unsigned char *" pointing to a line */
        bdestroy (b);
    }
    fclose (fp);
}

Alternative 3: Just read the whole thing and parse it afterward (if
you have to read and hold the whole file anyway, this is probably the
fastest solution and gives you indexed random access to the lines.)

FILE * fp = fopen (inputFileName, "r");

if (fp) {
    bstring b = bread ((bNread) fread, fp);
    struct bstrList * bl = bsplit (b, '\n');
    int i;
    if (bl) for (i=0; i < bl->qty; i++) {
        /* bl->entry[i]->data is pointing to a line */
    }
    bstrListDestroy (bl);
    bdestroy (b);
    fclose (fp);
}

None of the alternatives shown above have any risk of buffer overflow
(though apparently they might not work correctly on the Tandem NonStop
or Data General Eclipse computers, but if you are on such a machine
you practically already know this and you have my greatest sympathy
and there is a simple way to work around the problem as described in
the documentation) and as shown there will be no memory leaks. Even
attempts at out-of-memory and input stack smashing attacks cannot
happen.

As to parsing the lines, the "better string library" has functions
like biseq, bCaselessCmp, bsplit, binstr, binchr etc and because of
its direct interoperability with char * buffers, you can use C
standard library functions like atoi, sscanf, and so on with no issue.
 
goose

Kevin Easton said:
Dan Pop said:
The fscanf equivalent is so simple that it can be used inline whenever
needed:

char s[NN + 1] = "", c;

int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);
if (rc == 0) getc(fp);

The value of rc is telling the whole story:

EOF - end of file encountered before getting a single character
0 - an empty line was input
1 - the line was truncated
2 - the line was completely read

If you want to be able to specify the buffer size at runtime, then this
solution becomes a little more complex.

but not by very much.


#define FORMAT_STRING_SIZE 200

size_t runtime_size;
....
char *s = malloc (runtime_size), c;
if (!s) {
/* handle error */
} else {
int rc;
char temp[FORMAT_STRING_SIZE];

sprintf (temp, "%%%u[^\\n]%%1[\\n]");
rc = fscanf (fp, temp, s, &c);

switch (rc) {
/* handle all the possible return values here */
}

free (s);
}

goose,
 
Dan Pop

In said:
Dan Pop said:
The fscanf equivalent is so simple that it can be used inline whenever
needed:

char s[NN + 1] = "", c;

int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);
if (rc == 0) getc(fp);

The value of rc is telling the whole story:

EOF - end of file encountered before getting a single character
0 - an empty line was input
1 - the line was truncated
2 - the line was completely read

If you want to be able to specify the buffer size at runtime, then this
solution becomes a little more complex.

A little more complex meaning an extra sprintf call. However, given the
fact that the code is so simple that it can be inlined, you don't
normally need to specify the buffer size at run time.

As I said sometime ago, none of these solutions (either scanf or fgets
based) is robust enough for production code, because they don't check
for embedded null characters in the input stream.
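
As a rough illustration of what such a check could look like, here is a getc-only reader for a fixed buffer that reports both truncation and embedded null characters. The function name get_line, the status codes, and the choice to drop the null bytes are assumptions; this is a sketch, not the production-grade routine being asked for.

```c
#include <stdio.h>

enum { LINE_OK, LINE_EOF, LINE_TRUNCATED, LINE_EMBEDDED_NUL };

/* Hypothetical helper: read one line into buf (size >= 1) using getc only.
 * The whole line is always consumed from the stream, so the next call
 * starts on the next line. Characters that do not fit are dropped. */
static int get_line(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int status = LINE_OK;
    int c = getc(fp);

    if (c == EOF) {
        buf[0] = '\0';
        return LINE_EOF;
    }
    for (; c != EOF && c != '\n'; c = getc(fp)) {
        if (c == '\0')
            status = LINE_EMBEDDED_NUL;   /* storing it would cut the string short: flag and skip */
        else if (len + 1 < size)
            buf[len++] = (char)c;
        else if (status == LINE_OK)
            status = LINE_TRUNCATED;
    }
    buf[len] = '\0';
    return status;
}
```

When a line is both too long and contains a null byte, only LINE_EMBEDDED_NUL is reported; a bit mask would be needed to report both.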

Dan
 
Kevin Easton

goose said:
Kevin Easton said:
Dan Pop said:
The fscanf equivalent is so simple that it can be used inline whenever
needed:

char s[NN + 1] = "", c;

int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);
if (rc == 0) getc(fp);

The value of rc is telling the whole story:

EOF - end of file encountered before getting a single character
0 - an empty line was input
1 - the line was truncated
2 - the line was completely read

If you want to be able to specify the buffer size at runtime, then this
solution becomes a little more complex.

but not by very much.

Yes, "little" and "not very much" are pretty much synonyms, are they
not?
#define FORMAT_STRING_SIZE 200

size_t runtime_size;
...
char *s = malloc (runtime_size), c;
if (!s) {
/* handle error */
} else {
int rc;
char temp[FORMAT_STRING_SIZE];

sprintf (temp, "%%%u[^\\n]%%1[\\n]");

Where's the matching argument for that %u conversion specifier? :)
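
For reference, a sketch of the run-time variant with the width argument supplied. The name scan_line_n and the 40-byte format buffer are illustrative assumptions. Two details differ from the snippet above: the format contains real newline characters rather than "\\n", since scanf does not interpret backslash escapes itself, and the target of %1[\n] is a two-byte array because %[ appends a terminating null.

```c
#include <stdio.h>

/* Hypothetical helper: read one line into a buffer of `size` bytes chosen
 * at run time. Returns the fscanf result (EOF, 0, 1 or 2), or EOF when
 * size is too small to hold any character. */
static int scan_line_n(FILE *fp, char *s, size_t size)
{
    char fmt[40];   /* '%', at most 20 digits, "[^\n]%1[\n]" and '\0' fit */
    char nl[2];     /* %1[\n] stores one character plus '\0' */
    int rc;

    if (size < 2)
        return EOF;

    /* Builds e.g. "%49[^\n]%1[\n]" for size == 50. */
    sprintf(fmt, "%%%lu[^\n]%%1[\n]", (unsigned long)(size - 1));

    s[0] = '\0';
    rc = fscanf(fp, fmt, s, nl);
    if (rc == 1)
        fscanf(fp, "%*[^\n]%*c");   /* discard the rest of an over-long line */
    if (rc == 0)
        getc(fp);                   /* consume the newline of an empty line */
    return rc;
}
```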

- Kevin.
 
Emmanuel Delahaye

In said:
You're missing the point, although my explanation was crystal clear.
Imagine that size is 50 and the line has 49 characters plus the newline.
You have read the full line (except the newline), but you're returning
IO_ERR_LENGTH to the user, despite the fact that s contains the complete
line. Instead of doing that, you should remove the newline character
from the input stream and return IO_OK. But this is complicating your
function even further.
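
The exact-fit case is easy to reproduce. The sketch below is an illustration only (the helper name leftover_after_fgets is made up): it counts how much of the current line fgets left unread, so that a leftover of just the newline can be told apart from real truncation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read a line with fgets and report how many characters
 * of that line were left in the stream, counting the newline. 0 means the
 * newline was stored in buf; 1 means only the newline was left, so buf holds
 * the complete line; more than 1 means the line really was truncated.
 * Returns -1 if fgets read nothing. The leftover characters are consumed. */
static int leftover_after_fgets(FILE *fp, char *buf, int size)
{
    int c, n = 0;

    if (fgets(buf, size, fp) == NULL)
        return -1;
    if (strchr(buf, '\n') != NULL)
        return 0;
    do {
        c = getc(fp);
        if (c != EOF)
            n++;
    } while (c != EOF && c != '\n');
    return n;
}
```

With a 10-byte buffer and the input line "012345678", fgets stores all nine characters and the helper returns 1: the buffer holds the full line even though no newline was stored.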

You mean this case:

sizeof s = 10
012345678
input error #2: 'ERR_LENGTH'
IN: '012345678'

Understood. The fix should not be too difficult, but it would imply a
strlen(), which is a waste of time.
Was it *inlined* ? I don't think so.

Do you mean I should have coded this function as a function-like macro?

It is a static function. I let the compiler optimize it. Some of them do
inline in such conditions. Don't tell me that I should have used the C99
inline feature, or worse, some gcc extension. I attempt to write today's
/portable/ code. (At work, I use at least 4 different C compilers for various
targets like x86, 68k, PowerPC or Texas DSP).
Fix it first, and see what it looks like.

Here is the fix (you will like it!):

static int clear_in (FILE * fp)
{
   int n = 0;
   int c;

   do
   {
      c = fgetc (fp);

      if (c != EOF)
      {
         n++;
      }
   }
   while (c != '\n' && c != EOF);

   return n;
}


int fget_s (char *s, size_t size, FILE * fp)
{
   int err = IO_OK;

   if (s != NULL)
   {
      if (size > 1)
      {
         if (fp != NULL)
         {
            if (fgets (s, size, fp) != NULL)
            {
               char *p = strchr (s, '\n');

               if (p)
               {
                  *p = 0;
               }
               else
               {
                  int n = clear_in (fp);

                  if (n > 1)
                  {
                     err = IO_ERR_LENGTH;
                  }
               }
            }
            else
            {
               err = IO_ERR_READ;
            }
         }
         else
         {
            err = IO_ERR_FILE;
         }
      }
      else
      {
         err = IO_ERR_BUFFER_SIZE;
      }
   }
   else
   {
      err = IO_ERR_BUFFER;
   }

   return err;
}
Your coding style is very unreadable to me. I really had to make an
effort to match some if's and else's, because they were so far away, due
to your *waste* of vertical space.

The indentation is supposed to be crystal clear (My DOS port of GNUIndent
1.91). If you don't like it, just reindent the code with your own settings.
(K&R-style is more compact)
Then, what's the point in using fgets?

At a first glance, it seems quick and simple, but you are right, it is more
complicated than it sounded. I'm working on a pure fgetc() solution.
The fscanf equivalent is so simple that it can be used inline whenever
needed:

char s[NN + 1] = "", c;

int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);
if (rc == 0) getc(fp);

I agree that it's short and compact, but I'd call it cryptic.

What is the cryptic part? A simple fscanf call, followed by two simple
int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);

What is NN ? A macro? Are you sure a macro is replaced in a string?

I would have written it:

int rc = fscanf(fp, "%*[^\n]%1[\n]", NN, s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);

I think some parameters are missing for the two '*' and the two '%'.
Additionally, a '"' is missing too. Could you please fix that.

I feel free to call this code cryptic and hard to read and it seems that I'm
not alone.
tests, needed to decide what kind of cleanup is needed (if any).
This code also correctly handles the situation where your code fails
(the buffer has the exact size for the input line).


When three simple lines of code can do the job better than a couple
dozen, there is no question about which is the complicated solution.

We don't have the same definition for 'complicated'. You count the lines, I
try to read the code. That's the difference. Nobody wins. Just a different
approach.
 
Emmanuel Delahaye

In said:
As I said sometime ago, none of these solutions (either scanf or fgets
based) is robust enough for production code, because they don't check
for embedded null characters in the input stream.

I see. A pure fgetc() solution should be better, shouldn't it?
 
Emmanuel Delahaye

In said:
The fscanf equivalent is so simple that it can be used inline whenever
needed:

char s[NN + 1] = "", c;

int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);

Not sure that the NN macro will be replaced in a string...

int rc = fscanf(fp, "%*[^\n]%1[\n]", NN, s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);

This part is very broken.

if (rc == 1) fscanf(fp, "%*[^\n]%*c");
if (rc == 0) getc(fp);
 
Emmanuel Delahaye

In 'comp.lang.c' said:
Understood. The fix should not be too difficult, but it would imply a
strlen(), which is a waste of time.

Actually, I found a solution without strlen().
 
Kevin Easton

Emmanuel Delahaye said:
In 'comp.lang.c', (e-mail address removed) (Dan Pop) wrote: [...]
The fscanf equivalent is so simple that it can be used inline whenever
needed:

char s[NN + 1] = "", c;

int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);
if (rc == 0) getc(fp);

I agree that it's short and compact, but I'd call it cryptic.

What is the cryptic part? A simple fscanf call, followed by two simple
int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);

What is NN ? A macro? Are you sure a macro is replaced in a string?

I think it's clear from context that NN is supposed to be replaced by
the human writing the code...
I would have written it:

int rc = fscanf(fp, "%*[^\n]%1[\n]", NN, s, &c);

Then you would have written something that doesn't work.
if (rc == 1) fscanf("%*[^\n]%*c);

I think some parameters are missing for the two '*' and the two '%'.

You need to bone up on scanf - in particular how %* works in the scanf
family as opposed to how it works in the printf family.
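
A small sketch of the difference, using two made-up helper names: in the printf family '*' takes the field width from an int argument, while in the scanf family '*' suppresses assignment and consumes no argument.

```c
#include <stdio.h>

/* printf family: '*' takes the field width from an int argument.
 * With width 5 and value 42 this writes "   42" into out. */
static int format_padded(char *out, int width, int value)
{
    return sprintf(out, "%*d", width, value);
}

/* scanf family: '*' suppresses assignment. The first number is parsed and
 * thrown away, and no argument is consumed for it. Returns the number of
 * values actually stored (1 on success). */
static int second_number(const char *text, int *out)
{
    return sscanf(text, "%*d %d", out);
}
```

This is why "%*[^\n]%*c" needs no extra arguments: both conversions read and discard their input.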

- Kevin.
 
Mark Mynsted

I see much nit-picking in this thread but who is willing to put up the
complete, unbreakable, replacement for Emmanuel Delahaye's code?

--
-MM
 
