How to read a flat file line by line

Adi

Hello everyone,
I want to ask a very simple question here (it has been bothering
me for a long time).
My problem is reading a file line by line. I've tried the following
implementations but am still facing problems:

Assume that
FILE* filePointer;
unsigned char lineBuffer[256];

1) Using fscanf: fscanf ( filePointer, "%[^\n]", lineBuffer);
The problem with this code is that fscanf's behaviour is quite
unexpected. It sometimes advances the file position automatically and
sometimes goes on reading the same line forever.

2) Using fgets: fgets ( lineBuffer, 256, filePointer);
The problem here is that fgets also stores the '\n' in
lineBuffer, but what I need is just the complete line, terminated by '\0'
before the end-of-line. The reason is that the file I'm reading might be a
DOS file where end-of-line is marked by '\r\n'. So I want to skip
these end-of-line character(s) no matter which environment/OS the
file comes from.

Is there any other way to avoid this problem? Please help me out
ASAP.
I'm using gcc version 3.4.5 on Red Hat Linux 7.1 2.96-79

Thanks in anticipation,
Adi
 
Keith Thompson

Adi said:
> Assume that
> FILE* filePointer;
> unsigned char lineBuffer[256];
> [...]
> 2) Using fgets: fgets ( lineBuffer, 256, filePointer);
> The problem here is that fgets also stores the '\n' in the
> lineBuffer but what i need is just the complete line delimited by '\0'
> before the end-of-line. Reason is that the file i'm reading might be a
> DOS file where end-of-line is delimited by '\r\n'. So i want to skip
> these end-of-line character(s) no matter in which environment/OS the
> file lies.

If you've opened the file in text mode, the end-of-line will appear in
the string as a single '\n' character, regardless of how it's
represented in the external file.

If you've read a complete line with fgets(), it's easy enough to get
rid of the trailing '\n': just replace it with a '\0'. But don't
assume that the '\n' will be there; if the input line is longer than
your buffer, or if you reach end-of-file before the end of the line,
you can get a partial line with no trailing '\n'.
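
The advice above can be sketched as a small helper. This is only an illustration: the name chomp_newline is made up here, and the return value (whether a '\n' was actually found) is an added convenience so the caller can tell a complete line from a partial one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: remove the trailing '\n' left by fgets().
 * Returns 1 if a newline was found and replaced with '\0'
 * (a complete line), 0 if there was none (the line did not fit
 * in the buffer, or the last line of the file had no newline). */
static int chomp_newline(char *line)
{
    size_t len = strcspn(line, "\n");  /* index of '\n', or of the '\0' */

    if (line[len] == '\n') {
        line[len] = '\0';
        return 1;
    }
    return 0;
}
```

A caller would typically use it as `while (fgets(buf, sizeof buf, fp) != NULL) { int complete = chomp_newline(buf); ... }` and decide what to do when `complete` is 0.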
 
mdler

Hello Adi

Using fgets is the way I would do it.

If you need only the string and not the control chars, strip them like
this:

fgets ( lineBuffer, 256, filePointer);

while(iscntrl(lineBuffer[strlen( lineBuffer)-1])
{
lineBuffer[strlen( lineBuffer)-1] = '\0';
}

Greetings Olaf

Adi wrote:
 
CBFalconer

Adi said:
> .... snip ...
>
> 2) Using fgets: fgets ( lineBuffer, 256, filePointer);
> The problem here is that fgets also stores the '\n' in the
> lineBuffer but what i need is just the complete line delimited by
> '\0' before the end-of-line. Reason is that the file i'm reading
> might be a DOS file where end-of-line is delimited by '\r\n'. So
> i want to skip these end-of-line character(s) no matter in which
> environment/OS the file lies.

So do many others (want to skip the '\n's). You can simply replace
fgets with ggets, the source for which is available at:

<http://cbfalconer.home.att.net/download/>

and is written in portable standard C, so it can be used anywhere.

 
pete

Adi said:
> Hello everyone,
> I wanna ask a very simple question here (as it was quite disturbing
> me for a long time.)
> My problem is to read a file line by line. I've tried following
> implementations but still facing problems:
>
> Assume that
> FILE* filePointer;
> unsigned char lineBuffer[256];
>
> 1) Using fscanf: fscanf ( filePointer, "%[^\n]", lineBuffer);
> The problem with this code is that fscanf behaviour is quite
> unexpected. It sometimes increments the file pointer automatically and
> sometimes goes on reading the same line forever.

/* BEGIN pops_device.c */
/*
** If rc equals 0, then an empty line was entered
** and the array contains garbage values.
** If rc equals EOF, then the end of file was reached
** or there is some other problem.
** If rc equals 1, then there is a string in array.
** Up to LENGTH number of characters are read
** from a line of a text file or stream.
** If the line is longer than LENGTH,
** then the extra characters are discarded.
*/
#include <stdio.h>

#define LENGTH 12
#define str(x) # x
#define xstr(x) str(x)

int main(void)
{
    int rc;
    char array[LENGTH + 1];

    puts("The LENGTH macro is " xstr(LENGTH));
    fputs("Enter a string with spaces:", stdout);
    fflush(stdout);
    rc = fscanf(stdin, "%" xstr(LENGTH) "[^\n]%*[^\n]", array);
    if (!feof(stdin)) {
        getchar();
    }
    while (rc == 1) {
        printf("Your string is:%s\n\n"
               "Hit the Enter key to end,\nor enter "
               "another string to continue:", array);
        fflush(stdout);
        rc = fscanf(stdin, "%" xstr(LENGTH) "[^\n]%*[^\n]", array);
        if (!feof(stdin)) {
            getchar();
        }
        if (rc == 0) {
            *array = '\0';
        }
    }
    return 0;
}

/* END pops_device.c */
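
The getchar() calls in pops_device.c are what keep the scan moving. "%[^\n]" stops at the newline without consuming it, so a second call sees '\n' first, matches nothing, and returns 0 without advancing. That is a likely explanation for the "same line forever" symptom in the question. A minimal sketch of the same idea for an arbitrary FILE*, assuming a hypothetical helper named scan_line and the 256-byte buffer from the original post:

```c
#include <stdio.h>

/* Hypothetical helper: read one line of at most 255 characters
 * with fscanf. Returns 1 if a line (possibly empty) was read,
 * or EOF when there is no more input. */
static int scan_line(FILE *fp, char buf[256])
{
    int rc = fscanf(fp, "%255[^\n]", buf);
    int ch;

    if (rc == EOF) {
        return EOF;
    }
    if (rc == 0) {
        buf[0] = '\0';  /* empty line: fscanf stored nothing */
    }
    /* Discard the rest of an over-long line and the newline itself,
     * so the next call starts on the next line. */
    while ((ch = getc(fp)) != '\n' && ch != EOF) {
        /* nothing */
    }
    return 1;
}
```

Without the discard loop, every call after the first would return 0 at the same file position.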


> 2) Using fgets: fgets ( lineBuffer, 256, filePointer);
> The problem here is that fgets also stores the '\n' in the
> lineBuffer but what i need is just the complete line delimited by '\0'
> before the end-of-line. Reason is that the file i'm reading might be a
> DOS file where end-of-line is delimited by '\r\n'. So i want to skip
> these end-of-line character(s) no matter in which environment/OS the
> file lies.

/* BEGIN products.c */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <ctype.h>

#define ARRAYSIZE 12
#define TAX(S) (0.06 * (S))
#define LIST { \
    {"milk" , 2.59}, \
    {"candy" , 1.21}, \
    {"meat" , 1.69}, \
    {"juice" , 1.29}, \
    {"fruit" , 2.33} \
}
#define N_PRODUCTS (sizeof items / sizeof *items)

struct product {
    char *name;
    double price;
};

void Intro(void);
void Prompt(struct product *, size_t);
void Receipt(struct product *, size_t, unsigned *);
size_t OK_input(char *, size_t);
size_t OK_unsigned(long unsigned *);

int main(void)
{
    struct product items[] = LIST;
    unsigned quantity[N_PRODUCTS];
    long unsigned quantity_temp;
    size_t index = 0;

    Intro();
    do {
        Prompt(items, index);
        if (OK_unsigned(&quantity_temp)) {
            quantity[index++] = quantity_temp;
        }
    } while (index != N_PRODUCTS);
    Receipt(items, index, quantity);
    return 0;
}

void Intro(void)
{
    putchar('\n');
}

void Prompt(struct product *items, size_t index)
{
    printf("How much %s would you like to order: ",
        items[index].name);
    fflush(stdout);
}

void
Receipt(struct product *items,
    size_t n_products, unsigned *quantity)
{
    size_t index;
    double subtotal, increase, tax;

    puts("\n"
        "Item    Price    Quantity         T-Price\n"
        "-----------------------------------------");
    for (subtotal = index = 0; index != n_products; ++index) {
        increase = quantity[index] * items[index].price;
        subtotal += increase;
        printf("%-7s%6.2f%12u%16.2f\n",
            items[index].name,
            items[index].price, quantity[index], increase);
    }
    tax = TAX(subtotal);
    printf("\n"
        "%13s%-11s%17.2f\n"
        "%13s%-11s%17.2f\n"
        "%13s%-11s%17.2f\n",
        "", "Subtotal ", subtotal,
        "", "Sales Tax ", tax,
        "", "Total Sales", subtotal + tax);
}

size_t OK_unsigned(long unsigned *quantity_ptr)
{
    char array[ARRAYSIZE] = {'\0'};
    size_t length;

    length = OK_input(array, sizeof array);
    if (length) {
        if (strchr(array, '-')){
            fputs("\nDon't use - in the number.\n", stderr);
            length = 0;
        } else {
            char *endptr;

            errno = 0;
            *quantity_ptr = strtoul(array, &endptr, 10);
            if (array != endptr - length) {
                /* Look at endptr[-1] only if strtoul consumed something. */
                if (endptr != array
                    && isdigit((unsigned char)endptr[-1])) {
                    size_t spindex = 0;

                    while (isspace((unsigned char)endptr[spindex])) {
                        ++spindex;
                        if (endptr[spindex] == '\n') {
                            return 1;
                        }
                    }
                }
                fprintf(stderr, "\n"
                    "Don't use %-2s in the number.\n\n",
                    isspace((unsigned char)*endptr) ? "\bblank spaces"
                    : isprint((unsigned char)*endptr) ? endptr[1] = '\0', endptr
                    : "\bnonprinting characters");
                length = 0;
            } else {
                if (errno || *quantity_ptr > ~0u) {
                    fprintf(stderr,"\n"
                        "Enter a number less than or equal "
                        "to %u.\n\n", ~0u);
                    length = 0;
                }
            }
        }
    }
    return length;
}

size_t OK_input(char *array, size_t arraysize)
{
    size_t length;

    if (!fgets(array, arraysize, stdin) || feof(stdin)) {
        if (ferror(stdin)) {
            fputs("\n\n\nferror 1\n\n", stderr);
            exit(EXIT_FAILURE);
        }
        if (strlen(array)) {
            fputs("\n\n\nfeof 1\n", stderr);
            exit(EXIT_FAILURE);
        } else {
            puts("\n\n\nDon't do that!\n");
            clearerr(stdin);
            length = 0;
        }
    } else {
        length = strlen(array) - 1;
        if (length && (array[length] != '\n')) {
            do {
                if(!fgets(array, arraysize, stdin) || feof(stdin)){
                    if (ferror(stdin)) {
                        fputs("\n\n\nferror 2\n", stderr);
                        exit(EXIT_FAILURE);
                    } else {
                        fputs("\n\n\nfeof 2\n", stderr);
                        exit(EXIT_FAILURE);
                    }
                } else {
                    length = strlen(array) - 1;
                }
            } while (array[length] != '\n');
            printf("\n"
                "Don't type more than %lu character%s\n"
                "before hitting the Enter key.\n\n",
                arraysize - 2lu, arraysize == 3 ? "" : "s");
            length = 0;
        }
    }
    return length;
}

/* END products.c */

> Is there any other way to avoid this problem?

/* BEGIN line_to_string.c */

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <string.h>

struct list_node {
    struct list_node *next;
    void *data;
};

int line_to_string(FILE *fp, char **line, size_t *size);
void list_free(struct list_node *node, void (*free_data)(void *));
void list_fprint(FILE *stream, struct list_node *node);
struct list_node *string_node(struct list_node **head,
                              struct list_node *tail,
                              char *data);

int main(void)
{
    struct list_node *head, *tail;
    int rc;
    char *buff_ptr;
    size_t buff_size;
    long unsigned line_count;

    puts(
        "\nThis program makes and prints a list of all the lines\n"
        "of text entered from standard input.\n"
        "Just hit the Enter key to end,\n"
        "or enter any line of characters to continue."
    );
    tail = head = NULL;
    line_count = 0;
    buff_size = 0;
    buff_ptr = NULL;
    while ((rc = line_to_string(stdin, &buff_ptr, &buff_size)) > 1) {
        ++line_count;
        tail = string_node(&head, tail, buff_ptr);
        if (tail == NULL) {
            break;
        }
        puts(
            "\nJust hit the Enter key to end,\n"
            "or enter any other line of characters to continue."
        );
    }
    switch (rc) {
    case EOF:
        if (buff_ptr != NULL && strlen(buff_ptr) > 0) {
            puts("rc equals EOF\nThe string in buff_ptr is:");
            puts(buff_ptr);
            ++line_count;
            tail = string_node(&head, tail, buff_ptr);
        }
        break;
    case 0:
        puts("realloc returned a null pointer value");
        if (buff_size > 1) {
            puts("rc equals 0\nThe string in buff_ptr is:");
            puts(buff_ptr);
            ++line_count;
            tail = string_node(&head, tail, buff_ptr);
        }
        break;
    default:
        break;
    }
    if (line_count != 0 && tail == NULL) {
        puts("Node allocation failed.");
        puts("The last line entered didn't make it onto the list:");
        puts(buff_ptr);
    }
    free(buff_ptr);
    puts("\nThe line buffer has been freed.\n");
    printf("%lu lines of text were entered.\n", line_count);
    puts("They are:\n");
    list_fprint(stdout, head);
    list_free(head, free);
    puts("\nThe list has been freed.\n");
    return 0;
}

int line_to_string(FILE *fp, char **line, size_t *size)
{
    int rc;
    void *p;
    size_t count;

    count = 0;
    while ((rc = getc(fp)) != EOF) {
        ++count;
        if (count + 2 > *size) {
            p = realloc(*line, count + 2);
            if (p == NULL) {
                if (*size > count) {
                    (*line)[count] = '\0';
                    (*line)[count - 1] = (char)rc;
                } else {
                    ungetc(rc, fp);
                }
                count = 0;
                break;
            }
            *line = p;
            *size = count + 2;
        }
        if (rc == '\n') {
            (*line)[count - 1] = '\0';
            break;
        }
        (*line)[count - 1] = (char)rc;
    }
    if (rc != EOF) {
        rc = count > INT_MAX ? INT_MAX : count;
    } else {
        if (*size > count) {
            (*line)[count] = '\0';
        }
    }
    return rc;
}

void list_free(struct list_node *node, void (*free_data)(void *))
{
    struct list_node *next_node;

    while (node != NULL) {
        next_node = node -> next;
        free_data(node -> data);
        free(node);
        node = next_node;
    }
}

void list_fprint(FILE *stream, struct list_node *node)
{
    while (node != NULL) {
        fputs(node -> data, stream);
        putc('\n', stream);
        node = node -> next;
    }
}

struct list_node *string_node(struct list_node **head,
                              struct list_node *tail,
                              char *data)
{
    struct list_node *node;

    node = malloc(sizeof *node);
    if (node != NULL) {
        node -> next = NULL;
        node -> data = malloc(strlen(data) + 1);
        if (node -> data != NULL) {
            if (*head == NULL) {
                *head = node;
            } else {
                tail -> next = node;
            }
            strcpy(node -> data, data);
        } else {
            free(node);
            node = NULL;
        }
    }
    return node;
}

/* END line_to_string.c */


/* BEGIN type_1.c */

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <string.h>

#define ARGV_0 "type_1"

int line_to_string(FILE *fp, char **line, size_t *size);

int main(int argc, char *argv[])
{
    int rc;
    FILE *fd;
    char *buff_ptr;
    size_t buff_size;

    buff_size = 0;
    buff_ptr = NULL;
    if (argc > 1) {
        while (*++argv != NULL) {
            fd = fopen(*argv, "r");
            if (fd != NULL) {
                while ((rc = line_to_string
                    (fd, &buff_ptr, &buff_size)) > 0)
                {
                    puts(buff_ptr);
                }
                /*
                ** The loop above only runs while rc > 0, so the
                ** EOF and 0 cases are handled here, after it ends.
                ** This also prints a last line that has no newline.
                */
                switch (rc) {
                case EOF:
                    if (buff_ptr != NULL
                        && strlen(buff_ptr) > 0)
                    {
                        puts("rc equals EOF\n"
                            "The string in buff_ptr is:");
                        puts(buff_ptr);
                    }
                    break;
                case 0:
                    puts("realloc returned a null pointer "
                        "value in line_to_string.");
                    if (buff_size > 1) {
                        puts("rc equals 0\n"
                            "The string in buff_ptr is:");
                        puts(buff_ptr);
                    }
                    break;
                default:
                    break;
                }
                fclose(fd);
            } else {
                fprintf(stderr,
                    "\nfopen() problem with \"%s\"\n", *argv);
                break;
            }
        }
        free(buff_ptr);
    } else {
        puts(
            "Usage:\n>" ARGV_0
            " <FILE_0.txt> <FILE_1.txt> <FILE_2.txt> ...\n"
        );
    }
    return 0;
}

int line_to_string(FILE *fp, char **line, size_t *size)
{
    int rc;
    void *p;
    size_t count;

    count = 0;
    while ((rc = getc(fp)) != EOF) {
        ++count;
        if (count + 2 > *size) {
            p = realloc(*line, count + 2);
            if (p == NULL) {
                if (*size > count) {
                    (*line)[count] = '\0';
                    (*line)[count - 1] = (char)rc;
                } else {
                    ungetc(rc, fp);
                }
                count = 0;
                break;
            }
            *line = p;
            *size = count + 2;
        }
        if (rc == '\n') {
            (*line)[count - 1] = '\0';
            break;
        }
        (*line)[count - 1] = (char)rc;
    }
    if (rc != EOF) {
        rc = count > INT_MAX ? INT_MAX : count;
    } else {
        if (*size > count) {
            (*line)[count] = '\0';
        }
    }
    return rc;
}

/* END type_1.c */
 
Eric Sosman

Keith said:
> If you've opened the file in text mode, the end-of-line will appear in
> the string as a single '\n' character, regardless of how it's
> represented in the external file.
> [...]

Let's add a caveat, disclaimer, and weasel-word to that
nice-sounding guarantee: It's only true if the text file is
"well-formed" according to the system's conventions. If you
somehow get hold of a "foreign" file that hasn't been properly
translated to the local dialect, all bets are off.

The O.P. doesn't actually say that he needs to read "\r\n"-
terminated lines on an "\n"-only system, but there seems to be
a whiff of that possibility in the air. Maybe he's reading a
text file that was FTP'ed in binary mode, or a foreign-format
file residing on a shared disk, or ... He writes of the "\r\n"
in a way that makes me think he's actually found them in his
buffers. (Maybe he hasn't, but it sounds that way.)
> If you've read a complete line with fgets(), it's easy enough to get
> rid of the trailing '\n': just replace it with a '\0'. [...]

Right. And if the trailing '\n' is immediately preceded by
an '\r' he could obliterate that, too, trying to patch things up
for a file that was transferred from a DOS-ish environment to a
different one without proper translation. Ultimately, though,
that's a losing proposition: There are just too many different
line-ending conventions kicking around, and trying to burden your
program with understanding all of them (especially after they've
been garbled by unsuccessful translation) is attacking the wrong
end of the problem. If the O.P. faces trouble of this kind, it's
better to fix the file transfer procedures than to try to repair
the damage afterwards -- get the transfer/translation done right,
and you're back in the nice state of affairs Keith describes.
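
For the narrow case mentioned above (a DOS-style file read on a '\n'-only system), the patch-up could be sketched as follows. The helper name strip_eol is invented for this illustration, and it knows about only two conventions, "\n" and "\r\n", so it is not a cure for the general problem described in the post.

```c
#include <string.h>

/* Hypothetical helper: remove one trailing '\n' and, if present,
 * one '\r' just before it. A lone trailing '\r' is removed too.
 * Returns the new length of the string. */
static size_t strip_eol(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';
    }
    if (len > 0 && line[len - 1] == '\r') {
        line[--len] = '\0';
    }
    return len;
}
```

With this, an empty line yields length 0 whether it arrived as "\n" or as "\r\n".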
 
Adi

Thanks for replying, but I've also tried printing the
length of the buffer. Try the code below and you'll see that the length is 2
for an empty line in a DOS file and 1 for a Linux file.
I've written some code to work around that, but I'd prefer a library
function to my own manipulations :(
So if anyone has a very short answer to my question, I'd be very
grateful!
Also, can anyone tell me why fscanf ( filePointer, "%[^\n]",
lineBuffer); is not working, though it works for scanf?

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char str[100];
    FILE* fp;

    /* Need a file name, and the file has to open. */
    if (argc < 2 || (fp = fopen(argv[1], "r")) == NULL)
        return 1;
    while (fgets(str, sizeof str, fp) != NULL)
    {
        /* strlen() returns size_t, so cast it to match %lu. */
        printf("%lu)%s", (unsigned long)strlen(str), str);
    }
    fclose(fp);
    return 0;
}

/*--------------code to remove trailing end-of-line------*/
void strRemEOL(char* stringBuffer)
{
    size_t length = strlen(stringBuffer);

    /* Drop every trailing '\r' (13) or '\n' (10). */
    while (length > 0 && (stringBuffer[length - 1] == '\r'
                       || stringBuffer[length - 1] == '\n'))
        stringBuffer[--length] = '\0';
}


Keith Thompson wrote:
> [...]
>> 2) Using fgets: fgets ( lineBuffer, 256, filePointer);
>> The problem here is that fgets also stores the '\n' in the
>> lineBuffer but what i need is just the complete line delimited by '\0'
>> before the end-of-line. Reason is that the file i'm reading might be a
>> DOS file where end-of-line is delimited by '\r\n'. So i want to skip
>> these end-of-line character(s) no matter in which environment/OS the
>> file lies.
>
> If you've opened the file in text mode, the end-of-line will appear in
> the string as a single '\n' character, regardless of how it's
> represented in the external file.
>
> If you've read a complete line with fgets(), it's easy enough to get
> rid of the trailing '\n': just replace it with a '\0'. But don't
> assume that the '\n' will be there; if the input line is longer than
> your buffer, or if you reach end-of-file before the end of the line,
> you can get a partial line with no trailing '\n'.
 
Default User

mdler said:
> Hello Adi

Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or:

> Use fgets is the way I should do it
>
> If you need only the string and not the control chars strip them like
> this
>
> fgets ( lineBuffer, 256, filePointer);
>
> while(iscntrl(lineBuffer[strlen( lineBuffer)-1])
> {
> lineBuffer[strlen( lineBuffer)-1] = '\0';
> }

Computing strlen() twice in each iteration of the loop isn't the most
efficient way of doing things. Do it once, and decrement its value as
needed.

There shouldn't even be a need to check for more than '\n', unless a
text file was transferred from a different OS or something.
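
The "compute it once and decrement" suggestion could look roughly like this. The function name strip_trailing_cntrl is invented for the illustration; the len > 0 test and the cast to unsigned char are additions that the quoted loop does not have (the cast is a common precaution when passing a plain char to the <ctype.h> functions).

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: strip trailing control characters.
 * strlen() is called once; after that only len is updated.
 * Note that iscntrl() is also true for '\t', so trailing
 * tabs are removed as well. Returns the new length. */
static size_t strip_trailing_cntrl(char *line)
{
    size_t len = strlen(line);

    while (len > 0 && iscntrl((unsigned char)line[len - 1])) {
        line[--len] = '\0';
    }
    return len;
}
```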




Brian
 
Eric Sosman

mdler wrote on 09/08/06 03:29:
> Hello Adi
>
> Use fgets is the way I should do it
>
> If you need only the string and not the control chars strip them like
> this
>
> fgets ( lineBuffer, 256, filePointer);
>
> while(iscntrl(lineBuffer[strlen( lineBuffer)-1])
> {
> lineBuffer[strlen( lineBuffer)-1] = '\0';
> }

Unsafe loop: What if you read a line consisting
of exactly one '\n' plus the terminating '\0'?

On the first iteration, strlen() will return 1
and you'll set lineBuffer[1-1] = '\0'. Fine so far.

On the second iteration, strlen() will return 0.
Subtracting 1 gives a large positive value (recall that
strlen() returns a value of type size_t, which is an
unsigned integer type); this value will be at least
65535, but on most systems it will be 4294967295 --
much larger than 255, in any event. Attempting to
inspect lineBuffer[(size_t)-1] yields undefined behavior;
so does the attempt to store there, if one is made.

... and although the run-off-the-start-of-the-array
bug is probably more serious, there is yet another bug
in the code, a bug I've grown weary of ranting about and
will leave as an exercise for the Astute Reader.
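
The wrap-around described above is easy to confirm. In this small sketch both function names are invented for the illustration: the first computes the index exactly the way the quoted loop does, and the second shows one possible guard.

```c
#include <stdint.h>  /* SIZE_MAX */
#include <string.h>

/* Hypothetical helper: the index the quoted loop computes, unguarded.
 * For an empty string, 0 - 1 wraps around to SIZE_MAX. */
static size_t last_index_unchecked(const char *s)
{
    return strlen(s) - 1;
}

/* Hypothetical guarded variant: *ok is set to 0 when the string is
 * empty, in which case the returned index must not be used. */
static size_t last_index_checked(const char *s, int *ok)
{
    size_t len = strlen(s);

    *ok = (len > 0);
    return len > 0 ? len - 1 : 0;
}
```

Here last_index_unchecked("") equals SIZE_MAX, which is the "large positive value" the post refers to.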
 
Ben Pfaff

Eric Sosman said:
> mdler wrote On 09/08/06 03:29,:
>> fgets ( lineBuffer, 256, filePointer);
>>
>> while(iscntrl(lineBuffer[strlen( lineBuffer)-1])
>> {
>> lineBuffer[strlen( lineBuffer)-1] = '\0';
>> }
>
> Unsafe loop: What if you read a line consisting
> of exactly one '\n' plus the terminating '\0'?

Furthermore: What if you read a line that begins with '\0'?
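
That case is easy to overlook: fgets() happily stores a '\0' byte that appears in the input, and strlen() then stops at it. A small sketch, with a made-up helper name, showing that a successful fgets() can be followed by a strlen() of 0:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets() and return what
 * strlen() reports for it, or -1 if nothing could be read. */
static long fgets_strlen(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL) {
        return -1;
    }
    return (long)strlen(buf);
}
```

For a line whose first byte is '\0', this returns 0 even though fgets() succeeded, so any code that indexes with strlen() - 1 needs a length check first.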
 
