Newbie question on C I/O

analyst41

I have a file that looks like

Value Cost Special_Status
12    34          Yes
21    44          yes
32    43           no
.....................

I can read it into some arrays trivially using Fortran (or into a data
frame using R):

         subroutine getdat(values,costs,statuses,items)

        character statuses*3
        real values,costs
        dimension values(1000),Costs(1000),Statuses(1000)

         integer items, j

        open (unit=1,file="filename")

        read (1,*)

        j = 0
        do
        j = j + 1
       read (1,*,end=100) values(j),costs(j),statuses(j)
       enddo

100 continue

      items = j - 1

      close (1)
      return
      end

Seems much harder in C - can anyone post an intuitive, easy solution
to do this in C?

Thanks.

Now that my problem has been solved, I wish to point out that this is
a one-liner in R:

if data.txt looks like

value, cost, premium_status
22 , 43, "yes"
54 , 32 , "no"
65 , 43 , "yes"

In R you would do (a is an object called a data frame - like a table
in SQL terminology).
a = read.csv("data.txt",stringsAsFactors=FALSE)
a
value cost premium_status
1 22 43 yes
2 54 32 no
3 65 43 yes
mode(a$value)
[1] "numeric"
mode(a$cost)
[1] "numeric"
mode(a$premium_status)
[1] "character"


R would treat the header values as column names and even assign a type
to each column.
 
Malcolm McLean

On Nov 4, 5:07 pm, analyst41 wrote:

> Now that my problem has been solved, I wish to point out that this is
> a one-liner in R:
>
> a = read.csv("data.txt",stringsAsFactors=FALSE)

I've got a readcsv C subroutine on my website. It's many lines of C,
but, once written, it allows you to read a csv file in one call.

The main problem with the code is that generating NaNs and testing for
them, portably, is difficult.
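
For what it's worth, here is a minimal sketch of one way around part of that, assuming a C99 compiler: <math.h> provides the NAN macro (only where the implementation supports quiet NaNs) and the isnan() macro. The helper names missing_value() and is_missing() are made up for illustration and are not taken from csv.c.

```c
#include <math.h>

/* Hypothetical helpers, not from csv.c: mark and detect a missing cell. */

/* Returns a quiet NaN. C99 defines the NAN macro only if the
   implementation supports quiet NaNs, hence the fallback. */
static double missing_value(void)
{
#ifdef NAN
    return NAN;
#else
    /* Assumption: IEEE 754 arithmetic, where 0.0/0.0 yields NaN. */
    volatile double zero = 0.0;
    return zero / zero;
#endif
}

/* Nonzero if x is NaN. isnan() is a C99 macro; before C99 the usual
   trick was x != x, which relies on IEEE 754 comparison semantics and
   can be defeated by aggressive floating-point optimization flags. */
static int is_missing(double x)
{
#ifdef isnan
    return isnan(x) != 0;
#else
    return x != x;
#endif
}
```

The fallback branches are where the portability trouble lives: on a pre-C99 compiler or a non-IEEE platform neither of them is guaranteed to work.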
 
Kenny McCormack

> I've got a readcsv C subroutine on my website. It's many lines of C,
> but, once written, it allows you to read a csv file in one call.
>
> The main problem with the code is that generating NaNs and testing for
> them, portably, is difficult.

I assume you are referring to:

http://www.malcolmmclean.site11.com/www/CSVtoC/csvtoc.html

Be aware that that generates C code as output, which is not what one usually
expects to get from a CSV parser.

Unless I am mistaken...
 
Malcolm McLean

> I assume you are referring to:
>
>    http://www.malcolmmclean.site11.com/www/CSVtoC/csvtoc.html
>
> Be aware that that generates C code as output, which is not what one usually
> expects to get from a CSV parser.
>
> Unless I am mistaken...

If you take the files csv.c and csv.h you get a general-purpose csv
file loader, which obviously needs a surrounding C program or program
with a C interface to use. The utility is a standalone program built
on top of that.
 
Kenny McCormack

> If you take the file csv.c and csv.h you get a general-purpose csv
> file loader, which obviously needs a surrounding C program or program
> with a C interface to use. The utility is a stand alone program built
> on top of that.

Ok, so we *are* talking about the same thing, right? We agree on that?

My point is this: If I am reading it correctly, the output of running the
program is two pieces of C code: one that defines the struct and another
that contains the data itself (as an initializer). So, the point is that
the data is fixed at compile time.

Which could certainly be useful, in some contexts, but is certainly not what
I think an average person would expect from a "CSV loader".
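
To make "struct plus initializer" concrete, here is a hand-written sketch of what such output could look like for the three-row table at the top of the thread. It is not actual output of the CSVtoC utility; the names csv_row, csv_rows and csv_row_count are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical shape of generated code; the names are invented and this
   is not real output from the CSVtoC utility. */
struct csv_row {
    double value;
    double cost;
    const char *special_status;
};

/* The data lives in the program image: changing it means regenerating
   this file and recompiling. */
static const struct csv_row csv_rows[] = {
    { 12.0, 34.0, "Yes" },
    { 21.0, 44.0, "yes" },
    { 32.0, 43.0, "no"  },
};

static const size_t csv_row_count = sizeof csv_rows / sizeof csv_rows[0];
```

Nothing here reads a file at run time, which is the difference from what a "loader" normally implies.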

 
Malcolm McLean

> Ok, so we *are* talking about the same thing, right?  We agree on that?
>
> My point is this:  If I am reading it correctly, the output of running the
> program is two pieces of C code - 1 that defines the struct and the other
> that contains the data itself (as an initializer).  So, the point is that
> the data is fixed at compile time.
>
> Which could certainly be useful, in some contexts, but is certainly not what
> I think an average person would expect from a "CSV loader".

Yes, if you treat the source as a whole program, then it's a CSV to C
data structure utility. It's intended for when you've got a lot of
data in CSV files that you want to incorporate at compile time into
your C programs, without rekeying.

However the code to load a CSV file has been abstracted out and placed
in the file csv.c. So you can take this file and only this file, and
you have a general purpose CSV loader, very similar to R's read.csv()
function.
 
Kenny McCormack

Malcolm McLean said:

> However the code to load a CSV file has been abstracted out and placed
> in the file csv.c. So you can take this file and only this file, and
> you have a general purpose CSV loader, very similar to R's read.csv()
> function.

Got it. Yes, csv.c looks like what I expected. It looks like it may be
worth looking into...

 
David Thompson

> The '*' in that READ statement, where a FORMAT would ordinarily go,
> indicates implicit formatting. A key difference between C and Fortran is

Yes. Formally called 'list-directed' formatting.

> that I/O is a built-in part of the Fortran language, while in C I/O is
> handled by standard library functions. Given the way C functions work,
> that means that something equivalent to the implicit formatting used by
> the READ statement above isn't feasible in C. The simplest approach

Well, the syntax is a little different, but so is C stdio vs Fortran
classic explicit-format. Functionally, what Fortran list-directed
input does is a little different from what Standard C *scanf does, so
you would have to write maybe a page or two of code to fully handle
it, which you could put in a utility function and reuse as needed.

OTOH, if you want Fortran input, using Fortran is usually easier. On
practically all systems you can do I/O in Fortran and call C, or
nearly always vice versa. The syntax is standardized in Fortran 2003
and later; before that it had to be an extension and varied some.

> would be to use fscanf():
> while(fscanf(infile, "%f%f%3s\n",
> values[j], costs[j], statuses[j]) == 3)

Missing &s already noted.
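
For reference, a sketch of that loop with the address-of operators in place, wrapped in a function shaped roughly like the Fortran getdat. The assumptions are mine, not the original poster's: float arrays, char[4] status strings, a caller-supplied capacity, and a stream that is already open. It still has all the fscanf() weaknesses discussed in this thread.

```c
#include <stdio.h>

#define STATUS_LEN 4   /* 3 characters plus the terminating '\0' */

/* Sketch only: reads up to max_items records from an already opened
   stream and returns how many were read. The header line is skipped. */
static int getdat(FILE *infile, float values[], float costs[],
                  char statuses[][STATUS_LEN], int max_items)
{
    int j = 0;
    int c;

    /* Discard the header line, up to and including its newline. */
    while ((c = getc(infile)) != EOF && c != '\n')
        ;

    /* values[j] and costs[j] are floats, so %f needs their addresses.
       statuses[j] is an array and already decays to a char pointer. */
    while (j < max_items &&
           fscanf(infile, "%f%f%3s\n",
                  &values[j], &costs[j], statuses[j]) == 3)
        j++;

    return j;
}
```

A caller would open the file with fopen(), check the result, pass the stream in, and fclose() it afterwards.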

> I tried to keep this code simple, so it can be compared to your
> Fortran. I added only a little more error handling than is present in
> your code. However, by my standards, this code is still less than ideal,
> because it doesn't behave well in the face of format errors in the input
> file.
>
> If any line contains a number too big to be represented as a floating
> point value, the behavior of fscanf() is undefined. If one of the

In principle, although in practice this particular UB is usually not
bad. Especially for floating-point (as this case is).

> numbers is incorrectly formatted, the error detection capabilities of
> this approach are limited. If any line contains too many fields, or too

Format error detection is adequate; correction is lacking.

> few, the remaining lines in the file will be parsed incorrectly, because
> fscanf() doesn't attach any special significance to new lines; they're
> just whitespace, equivalent to tabs or spaces. I'm not sufficiently
> familiar with the Fortran READ statement to be sure, but I suspect that
> many of those issues would be equally problematic for your Fortran code.

The Fortran standard requires well-defined handling of whatever the
implementation classifies as an error condition (set iostat or goto
label cleanly or abort cleanly) but it doesn't say format error or
out-of-range is so classified. A *decent* implementation will do so
(and then, conformingly, handle them correctly).

The standard does define that too few values advance to the next line
(or you can use a terminator mark but the OP didn't) and too many
values (possibly after advancing) are ignored (unless you use
nonadvancing or stream, which the OP didn't).

> I would use fgets() to fill in a line buffer, so I can process the data
> one line at a time, checking for the possibility that the line being
> read in is longer than the buffer. Format errors on one line won't carry
> over to other lines.

Yep, for input that is actually line-oriented that's usually best.
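
A rough sketch of that line-at-a-time approach, under my own assumptions: a 256-byte line buffer, a header line that fits in it, and double rather than float. The numbers go through strtod() so that out-of-range input is reported via errno instead of being undefined. The names record, parse_record() and read_records() are invented for this example.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_MAX_LEN 256   /* assumed upper bound on a sane line */
#define STATUS_LEN 4       /* 3 characters plus '\0' */

struct record {
    double value;
    double cost;
    char status[STATUS_LEN];
};

/* Parse one number with strtod(), rejecting input that does not start
   with a number or that is out of range. Returns 1 on success. */
static int parse_number(const char *s, char **end, double *out)
{
    errno = 0;
    *out = strtod(s, end);
    return *end != s && errno != ERANGE;
}

/* Parse "value cost status" from one line. Returns 1 on success and 0
   if a field is missing, malformed, out of range, or if anything
   follows the first three characters of the status. */
static int parse_record(const char *line, struct record *r)
{
    char *p, extra;

    if (!parse_number(line, &p, &r->value)) return 0;
    if (!parse_number(p, &p, &r->cost)) return 0;
    /* %3s reads the status; " %c" matches only if something is left. */
    return sscanf(p, "%3s %c", r->status, &extra) == 1;
}

/* Read records line by line. Bad or overlong lines are skipped and
   counted in *bad_lines, so one bad line cannot disturb the next. */
static size_t read_records(FILE *in, struct record *recs, size_t max,
                           size_t *bad_lines)
{
    char line[LINE_MAX_LEN];
    size_t n = 0;
    int c;

    *bad_lines = 0;
    /* Header line, assumed to fit in the buffer. */
    if (fgets(line, sizeof line, in) == NULL)
        return 0;

    while (n < max && fgets(line, sizeof line, in) != NULL) {
        if (strchr(line, '\n') == NULL && !feof(in)) {
            /* Line longer than the buffer: discard the rest of it. */
            while ((c = getc(in)) != EOF && c != '\n')
                ;
            ++*bad_lines;
        } else if (line[strspn(line, " \t\r\n")] == '\0') {
            continue;                      /* blank line: ignore */
        } else if (parse_record(line, &recs[n])) {
            n++;
        } else {
            ++*bad_lines;
        }
    }
    return n;
}
```

Rejecting a bad line and carrying on is only one possible policy; stopping at the first bad line would be just as easy to express, since the damage is confined to that line either way.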
 
James Kuyper

> Yes. Formally called 'list-directed' formatting.
>
> Well, the syntax is a little different, but so is C stdio vs Fortran
> classic explicit-format. Functionally, what Fortran list-directed
> input does is a little different from what Standard C *scanf does, so
> you would have to write maybe a page or two of code to fully handle
> it, which you could put in a utility function and reuse as needed.

The key characteristic of Fortran list-directed input that I was talking
about is the fact that you don't have to identify the types of the
expressions you are printing; that information is implicitly determined
by looking at the expressions themselves. C++ function overloading makes
it feasible to do something similar in the <iostream> part of the C++
standard library (though you do have to provide your own overloads for
operator<<() and operator>>() for any user-defined type), but I don't
see any way to implement such a feature in "a page or two of code" in C.
Could you explain how that would work?
 
David Thompson

> The key characteristic of Fortran list-directed input that I was talking
> about is the fact that you don't have to identify the types of the
> expressions you are printing; that information is implicitly determined
> by looking at the expressions themselves. C++ function overloading makes
> it feasible to do something similar in the <iostream> part of the C++
> standard library (though you do have to provide your own overloads for
> operator<<() and operator>>() for any user-defined type), but I don't
> see any way to implement such a feature in "a page or two of code" in C.
> Could you explain how that would work?

Aha. You were distinguishing all Fortran I/O (with special statements
which are aware of types) from C (with function calls that aren't).

I thought you were focussing on the difference between implicit (star)
format and explicit format like 'I4, F9.2', and for that matter
namelist, which also does formatting/parsing but slightly differently.
That is what I meant a page or so of C can do.

Yes, using actual types to drive or check formatting/parsing is a
feature of Fortran (and some other languages) not available in C.

One approach I have used in a few cases involving formatting/parsing
of a fairly large number of structure types is a code generator that
produces C declarations and corresponding C I/O routines. I've also
seen some decent table-driven methods. Both of those are add-ons.
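
As an illustration of the table-driven idea only (this is not any particular method I have seen or used, and every name in it is invented): each table entry records a field's type and its offset within the struct, and one generic routine walks the table. The sketch assumes whitespace-separated fields and handles just two field types.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

#define STATUS_LEN 4   /* 3 characters plus '\0' */

enum field_type { FIELD_DOUBLE, FIELD_STRING };

struct field_desc {
    enum field_type type;
    size_t offset;   /* offsetof() the member within the struct */
    size_t size;     /* buffer size, used for FIELD_STRING only */
};

struct item {
    double value;
    double cost;
    char status[STATUS_LEN];
};

/* One row per member; the parser below never mentions struct item. */
static const struct field_desc item_fields[] = {
    { FIELD_DOUBLE, offsetof(struct item, value),  0 },
    { FIELD_DOUBLE, offsetof(struct item, cost),   0 },
    { FIELD_STRING, offsetof(struct item, status), STATUS_LEN },
};

/* Parse whitespace-separated fields from line into the struct at base,
   as directed by the table. Returns the number of fields parsed, so a
   result smaller than nfields signals a short or malformed line. */
static size_t parse_fields(const char *line, void *base,
                           const struct field_desc *fields, size_t nfields)
{
    size_t i;

    for (i = 0; i < nfields; i++) {
        char *dest = (char *)base + fields[i].offset;

        if (fields[i].type == FIELD_DOUBLE) {
            char *end;
            double d = strtod(line, &end);
            if (end == line)
                break;                     /* no number here */
            *(double *)dest = d;
            line = end;
        } else {
            size_t len = 0;
            while (isspace((unsigned char)*line))
                line++;
            if (*line == '\0')
                break;                     /* field missing */
            /* Copy one token, silently truncated to fit the buffer. */
            while (*line != '\0' && !isspace((unsigned char)*line)) {
                if (len + 1 < fields[i].size)
                    dest[len++] = *line;
                line++;
            }
            dest[len] = '\0';
        }
    }
    return i;
}
```

The type information that Fortran gets from the I/O list has to be written out by hand in the table here, which is where a code generator earns its keep once there are many structure types.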
 
