frustrated by fscanf and sscanf

a

After reading section 12 of the comp.lang.c FAQ and googling again, I still
cannot find any thread about reading a series of numbers. The input file,
which is loosely structured, is exemplified below:

<presence/absence of n space/tab on the first n lines>
12

<presence/absence of n space/tab here>0<presence/absence of n space/tab here>90 10 23 43 0 0 0 0 0 0 0
90 0 0 0 0 88 0 0 0 0 0 0
10 0 0 0 0 0 26 16 0 0 0 0
23 0 0 0 0 0 0 0 0 0 0 0
43 0 0 0 0 0 0 0 0 0 0 0
0 88 0 0 0 0 0 0 1 0 0 0
0 0 26 0 0 0 0 0 0 0 0 0
0 0 16 0 0 0 0 0 0 96 0 0
0 0 0 0 0 1 0 0 0 0 29 0
0 0 0 0 0 0 0 96 0 0 0 37
0 0 0 0 0 0 0 0 29 0 0 0
0 0 0 0 0 0 0 0 0 37 0 0
<presence/absence of n space/tab/newline here>
0 36 54 26 59 72 9 34 79 17 46 95
36 0 73 35 90 58 30 78 35 44 79 36
54 73 0 21 10 97 58 66 69 61 54 63
26 35 21 0 93 12 46 40 37 48 68 85
59 90 10 93 0 64 5 29 76 16 5 76
72 58 97 12 64 0 96 55 38 54 0 34
9 30 58 46 5 96 0 83 35 11 56 37
34 78 66 40 29 55 83 0 44 12 15 80
79 35 69 37 76 38 35 44 0 64 39 33
17 44 61 48 16 54 11 12 64 0 70 86
46 79 54 68 5 0 56 15 39 70 0 18
95 36 63 85 76 34 37 80 33 86 18 0
<presence/absence of n space/tab/newline here>
The unusual but deterministic behaviour is that some rows are read
successfully but others are not. In this case, the first matrix is read
successfully, but nothing in the second one can be read (I think; printf
shows all zeros) until the 5 96 0 83 (the 7th row of the 2nd matrix), and
after that the reading goes on and off again (Frustrated >.< )

My code, after trying feof, fgets, sscanf and so on, is now as follows:

while(!feof(file) && i< SIZE * SIZE + SIZE * SIZE) { //SIZE, an integer obtained from reading the 1st line
    fscanf(file, "%lf", &r[i]);
    i++;
}

I know they are integer matrices, but the double type is needed for
further development.
 
user923005

Suggestion:
The scanf() function is evil, as everyone knows.
I suggest using fgets() to read a line, and then parse the line into
individual numbers with something like this:
=======================================================================
#include <string.h>
#include <limits.h>
#include <stdlib.h>
#include <ctype.h>

/* The default delimiters are chosen as some ordinary white space characters: */
static const char default_delimiters[] = {' ', '\n', '\t', '\r', '\f', 0};

/*
 * The tokenize() function is similar to a reentrant version of strtok().
 * It parses tokens from 'string', where tokens are substrings separated by
 * characters from 'delimiter_list'.
 * To get the first token from 'string', tokenize() is called with 'string'
 * as its first parameter.
 * Remaining tokens from 'string' are obtained by calling tokenize() with
 * NULL for the first parameter.
 * The string of delimiters, identified by 'delimiter_list', can change from
 * call to call.
 * If the string of delimiters is NULL, then the standard list
 * 'default_delimiters' (see above) is used.
 * tokenize() modifies the memory pointed to by 'string', because it writes
 * null characters into the buffer.
 */
char *tokenize(char *string, const char *delimiter_list, char **placeholder)
{
    if (delimiter_list == NULL)
        delimiter_list = default_delimiters;

    if (delimiter_list[0] == 0)
        delimiter_list = default_delimiters;

    if (string == NULL)
        string = *placeholder;

    if (string == NULL)
        return NULL;
    /*
     * The strspn() function computes the length of the initial segment of
     * the first string that consists entirely of characters contained in
     * the second string.
     */
    string += strspn(string, delimiter_list);
    if (!string[0]) {
        *placeholder = string;
        return NULL;
    } else {
        char *token;
        token = string;
        /*
         * The strpbrk() function finds the first occurrence of any
         * character contained in the second string found in the first
         * string.
         */
        string = strpbrk(token, delimiter_list);
        if (string == NULL)
            *placeholder = token + strlen(token);
        else {
            *string++ = 0;
            *placeholder = string;
        }
        return token;
    }
}

#ifdef UNIT_TEST
char test_string0[] = "This is a test. This is only a test. "
    "If it were an actual emergency, you would be dead.";
char test_string1[] = "This is a also a test. This is only a test. "
    "If it were an actual emergency, you would be dead. 12345";
char test_string2[] = "The quick brown fox jumped over the "
    "lazy dog's back 1234567890 times.";
char test_string3[] = " \t\r\n\fThe quick brown fox jumped "
    "over the lazy dog's back 1234567890 times.";
char test_string4[] = "This is a test. This is only a test. "
    "If it were an actual emergency, you would be dead.";
char test_string5[] = "This is a also a test. This is only a test. "
    "If it were an actual emergency, you would be dead. 12345";
char test_string6[] = "The quick brown fox jumped over the "
    "lazy dog's back 1234567890 times.";
char test_string7[] = " \t\r\n\fThe quick brown fox jumped "
    "over the lazy dog's back 1234567890 times.";

#include <stdio.h>

char whitespace[UCHAR_MAX + 1];

/* This test will create token separators as any whitespace or any
   punctuation marks: */
void init_whitespace(void)
{
    int i;
    int index = 0;
    for (i = 0; i < UCHAR_MAX; i++) {
        if (isspace(i)) {
            whitespace[index++] = (char) i;
        }
        if (ispunct(i)) {
            whitespace[index++] = (char) i;
        }
    }
}

/*
   TNX Gerd.
 */
void spin_test(char *test_string, char *white)
{
    char *p = NULL;
    char *token;
    token = tokenize(test_string, white, &p);
    while (token) {
        puts(token);
        token = tokenize(NULL, white, &p);
    }
}

int main(void)
{
    init_whitespace();
    puts("Whitespace is whitespace+punctuation");
    spin_test(test_string0, whitespace);
    spin_test(test_string1, whitespace);
    spin_test(test_string2, whitespace);
    spin_test(test_string3, whitespace);
    puts("Whitespace is simple whitespace");
    spin_test(test_string4, NULL);
    spin_test(test_string5, NULL);
    spin_test(test_string6, NULL);
    spin_test(test_string7, NULL);
    return 0;
}
#endif
=======================================================================

And then read the numbers one at a time using sscanf() on each
fragment, checking the return value of sscanf() each time. The problem
with scanf() is that when a conversion fails you don't know where in the
input it went wrong. By splitting the input into pieces you can easily
find out where the trouble spots are and more quickly diagnose the
changes you will have to make.

There are (of course) many other alternatives.
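
As one hedged illustration of the split-then-convert idea described above, the sketch below converts the tokens of a single line one at a time and reports which token failed. The name parse_number_line() and its parameters are hypothetical, and the standard strtok() is used only as a compact stand-in for tokenize(); the line is assumed to have been read with fgets() already.

```c
#include <stdio.h>
#include <string.h>

/*
 * Convert the whitespace-separated tokens of one line into doubles.
 * Returns the number of values stored in 'out' (at most 'max'), or -1 if
 * a token is not a clean number; *bad_token then holds the 1-based index
 * of the offending token. Like tokenize(), strtok() writes null
 * characters into 'line'.
 */
int parse_number_line(char *line, double *out, int max, int *bad_token)
{
    static const char delims[] = " \t\r\n\f";
    int count = 0;
    char *token;

    *bad_token = 0;
    for (token = strtok(line, delims); token != NULL && count < max;
         token = strtok(NULL, delims)) {
        char extra;
        /* A clean number converts exactly one item; the trailing %c
           catches junk glued to a number, such as "12x". */
        if (sscanf(token, "%lf%c", &out[count], &extra) != 1) {
            *bad_token = count + 1;
            return -1;
        }
        count++;
    }
    return count;
}
```

A caller would pass each buffer filled by fgets() to parse_number_line() and, on a -1 result, print the line number together with *bad_token, which points at the trouble spot directly.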
 
santosh

a said:
After reading FAQ comp.lang.c section 12 and googling again, still
there is no threads talking about reading a series of numbers. The
input files, somehow structured, is exemplified below:

<snip description of input file structure>

Your description of the file format makes no sense at all. Do you mean
that the format of lines can change within the file?

Why not use a language like Perl which is expressly designed for such
purposes, at least to re-structure the file into a consistent format,
and then have the C program read the sanitised file?

a said:
The unusual but deterministic behaviour is that some rows can be read
successfully but not the others.

This indicates that your format specifiers are working for some rows,
but not for others. Reading data whose format varies on the fly with
the *scanf() family of functions is tricky. At the very least, why
don't you capture a line completely with fgets() and then try to pick
it apart with sscanf()?

You should error check _every_ call to _every_ library function. This
way you can easily narrow down the input failure to a particular line
(provided you use the fgets()/sscanf() method I described). Then we can
say more about your problem.

a said:
In this case, the first matrix is read successfully, but the second one,
nothing can be read (I think, printf shows all zeros) until the 5 96 0 83
(the 7th row on 2nd matrix) but then the reading is then on and off again
(Frustrated >.< )

Such vague information along with statements like "I think..." is not
(unfortunately) going to be enough to help you. You must provide us the
compilable source code of a minimal program that still exhibits your
problem, along with a sample of your input file or a clearer
description of its format.

a said:
My code, after trying feof, fgets, sscanf and so, now is as follows:

while(!feof(file) && i< SIZE * SIZE + SIZE * SIZE) {

feof() and ferror() make sense only _after_ a read operation has failed.
They are used to determine _why_ the read failed, because of
end-of-file or an error.
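
A minimal sketch of that ordering, under the assumption that the values are whitespace-separated numbers: the loop is driven by the return value of fscanf(), and feof() and ferror() are consulted only once a read has failed. The function read_doubles() and the enum read_status are hypothetical names introduced for illustration.

```c
#include <stdio.h>

enum read_status { READ_FULL, READ_EOF, READ_IO_ERROR, READ_BAD_INPUT };

/*
 * Read up to 'want' doubles from 'fp' into 'out'; *got receives the
 * number actually stored. The loop condition tests fscanf() itself, so
 * a failed conversion is never counted as an element.
 */
enum read_status read_doubles(FILE *fp, double *out, size_t want, size_t *got)
{
    size_t i = 0;

    while (i < want && fscanf(fp, "%lf", &out[i]) == 1)
        i++;
    *got = i;

    if (i == want)
        return READ_FULL;
    /* Only now, after a read has failed, ask why it failed. */
    if (ferror(fp))
        return READ_IO_ERROR;
    if (feof(fp))
        return READ_EOF;
    return READ_BAD_INPUT;      /* the next input is not a number */
}
```

With this shape, a short count comes with a reason attached: READ_EOF means the file simply ended early, while READ_BAD_INPUT means something other than a number is sitting in the stream at that point.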

In your loop control expression you are trying to do too many things at
once, and doing them incorrectly too.

a said:
    //SIZE, an integer obtained from reading the 1st line
    fscanf(file, "%lf", &r[i]);
    i++;
}


This is hopeless. Please read in every line with fgets() and try to
convert it with sscanf(). Please check all library calls for failure.
Something like this:

#include <stdio.h>
#include <stdlib.h>
#define MAX_LINE 128

int main(void) {
    char line[MAX_LINE];
    int retval;
    FILE *fp = fopen("input.file", "r");

    if (!fp) return EXIT_FAILURE;

    while (fgets(line, MAX_LINE, fp) != NULL) {
        retval = sscanf(line, "%WHATEVER_FORMAT", &MATRIX_ELEMENT);
        /* If retval does not contain the number of items you expect the
           call to have successfully read and converted then something
           went wrong with your format specifier and the concerned line.
        */
        /* Other processing */
    }
    /* NOW you can check feof() and ferror() to determine why fgets()
       returned NULL
    */
    return STATUS;
}
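
One possible way to fill in the sscanf() step of a skeleton like the one above, when a line holds an unknown number of values, is to let the %n directive report how far each conversion got and resume from there. This is only a sketch: scan_line() and its parameters are hypothetical names, and it assumes each line contains nothing but whitespace-separated numbers.

```c
#include <stdio.h>

/*
 * Convert every number on one line with repeated sscanf() calls.
 * %n stores how many characters the call consumed, so the next call can
 * start where the previous one stopped.
 * Returns the number of values stored (at most 'max'), or -1 if a field
 * is not a number; *bad_offset is then the offset in 'line' at which the
 * failing scan started.
 */
int scan_line(const char *line, double *out, int max, int *bad_offset)
{
    int count = 0;
    int pos = 0;

    *bad_offset = -1;
    while (count < max) {
        int used = 0;
        int rc = sscanf(line + pos, "%lf%n", &out[count], &used);

        if (rc == EOF)          /* nothing but white space is left */
            break;
        if (rc != 1) {          /* a field that is not a number */
            *bad_offset = pos;
            return -1;
        }
        pos += used;
        count++;
    }
    return count;
}
```

Inside the fgets() loop, the return value can be compared against the expected row width (SIZE in the original post); a mismatch or a -1 identifies the exact line, and the offset within it, where the input stops matching expectations.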

<snip>
 
a

santosh said:
<snip description of input file structure>

Your description of the file format makes no sense at all. Do you mean
that the format of lines can change within the file.

Why not use a language like Perl which is expressly designed for such
purposes, at least to re-structure the file into a consistent format,
and then have the C program read the sanitised file?


I agree that Perl handles this elegantly with regular expressions. However,
because the numbers are to be placed into a matrix that will be operated on
by C code, I need to write the whole program in C.
 
James Kuyper

a said:
After reading FAQ comp.lang.c section 12 and googling again, still there is
no threads talking about reading a series of numbers. The input files,
somehow structured, is exemplified below: ....
The unusual but deterministic behaviour is that some rows can be read
successfully but not the others. In this case, the first matrix is read
successfully, but the second one, nothing can be read (I think, printf shows
all zeros) until the 5 96 0 83 (the 7th row on 2nd matrix) but then the
reading is then on and off again (Frustrated >.< )

I took your example data, removing what I assumed was explanatory text
that's not in your actual data, and stored it in a file. I wrapped your
code fragment in a complete program, which set up everything
appropriately. You used feof() inappropriately, and SIZE violates the
usual conventions for naming what must be a variable in this program,
but I left those things uncorrected, since they shouldn't affect the
results.

// This code was written based upon a message posted by (e-mail address removed) on the
// usenet newsgroup comp.lang.c
// Message-ID: <[email protected]>
// Date: Wed, 28 Nov 2007 10:48:27 +0800
// The lines from that message are marked with //a. The rest of this program
// was written by James Kuyper to fill in a suitable context.
#include <stdint.h>   /* SIZE_MAX */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int SIZE;
    int retval = EXIT_SUCCESS;
    const char filename[] = "test.dat";
    FILE *file = fopen(filename, "r");
    double *r;

    if(file == NULL)
    {
        perror(filename);
        return EXIT_FAILURE;
    }
    if(fscanf(file, "%d ", &SIZE) != 1)
    {
        perror("SIZE");
        retval = EXIT_FAILURE;
    }
    else if(SIZE <1 || SIZE_MAX/2/SIZE/SIZE < 1)
    {
        fprintf(stderr, "Unacceptable value for SIZE:%d\n", SIZE);
        retval = EXIT_FAILURE;
    }
    else if((r=malloc(2*SIZE*SIZE*sizeof(*r)))==NULL)
    {
        fprintf(stderr, "Insufficient memory");
        retval = EXIT_FAILURE;
    }
    else
    {
        int i = 0; /* starts at 0: indexes r and counts elements read */

        printf("Reading 2 %dX%d arrays of double.\n", SIZE, SIZE);
        while(!feof(file) && i< SIZE * SIZE + SIZE * SIZE) { //a
            int n =
            fscanf(file, "%lf", &r[i]); //a
            if(n != 1)
            {
                fprintf(stderr, "fscanf() returned %d\n", n);
                break;
            }
            i++; //a
        } //a
        if(ferror(file))
        {
            perror(filename);
            retval = EXIT_FAILURE;
        }
        printf("Elements read:%d\n", i);

        free(r);
    }

    fclose(file);
    return retval;
}

I compiled and ran my version of your program, with the following results:

~/testprog(77) make scan_array
cc -std=c99 -pedantic -Wall -Wpointer-arith -Wcast-align -Wwrite-strings
-Wstrict-prototypes -Wmissing-prototypes -c -o scan_array.o scan_array.c
cc scan_array.o -o scan_array
~/testprog(78) scan_array
Reading 2 12X12 arrays of double.
Elements read:288

Whatever the problem with your actual program is, it comes from
something that's different from what I wrote. Therefore, what you should
do is simplify your code as much as possible, while still demonstrating
the problem. Then post your ENTIRE program, not just a fragment. Post
your actual input file, not one with textual explanations stuck in the
middle, and finally include the exact output from your posted program,
when run using your posted data. Only then will we be able to help you further.
 
