Reading Simple Flatfiles with a COBOL Mindset

KevinD

assumption: I am new to C and old to COBOL

I have been reading a lot (self teaching) but something is not sinking
in with respect to reading a simple file - one record at a time.
Using C, I am trying to read a flatfile. In COBOL, my simple file
layout and READ statement would look like below.

Question: what is the standard, simple coding convention for reading
in a flatfile - one record at a time?? SCANF does not work because of
spaces; I tried FGETS and STRUCT to emulate my COBOL perspective but
that does not work (though I may have coded this wrong). C likes to
deliver data in streams but FGETS is akin to reading a single record.

I know I am missing something that is very simple but the examples
that I have come across avoid this simple scenario. Please explain -
an example would be great.

thanks
kevin


.......

01 employee-record.
   03 emp-id pic 9(5).
   03 emp-dept pic x(5).
   03 emp-name.
      05 emp-name-last pic x(20).
      05 emp-name-first pic x(20).
   03 emp-hire-date.
      05 emp-hire-date-mm pic 9(2).
      05 emp-hire-date-dd pic 9(2).
      05 emp-hire-date-yy pic 9(4).

read employee-flatfile into employee-record.
 
Dave Vandervies

> assumption: I am new to C and old to COBOL

If you haven't already, get a copy of K&R2[1] and read it. If you've
done programming of almost any sort before, it's as good an introduction
to C as you'll find.

Also worth reading is Steve Summit's C FAQ at
http://www.eskimo.com/~scs/C-faq/top.html . The HTML version is somewhat
out of date, but the first link on the page points you at alternate
versions, including a more up-to-date text one.


Now on to your question...
> I have been reading a lot (self teaching) but something is not sinking
> in with respect to reading a simple file - one record at a time.
> Using C, I am trying to read a flatfile. In COBOL, my simple file
> layout and READ statement would look like below.
>
> Question: what is the standard, simple coding convention for reading
> in a flatfile - one record at a time?? SCANF does not work because of
> spaces; I tried FGETS and STRUCT to emulate my COBOL perspective but
> that does not work (though I may have coded this wrong). C likes to
> deliver data in streams but FGETS is akin to reading a single record.

struct[2] is, if I'm not grossly misunderstanding you, approximately
equivalent to what you're thinking about as a "record", so you'll
probably want to eventually stuff the data you read into a struct and
return that struct.

fgets reads a *line* at a time. It looks like your file format uses
multiple lines per record, so you'd end up calling fgets multiple times
and then looking at what's in each line.
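Here is a minimal sketch of the "call fgets several times" idea. It assumes, purely for illustration, that every record takes a fixed number of lines. LINES_PER_RECORD, MAX_LINE and read_record_lines are made-up names, not anything from a library:

```c
#include <stdio.h>
#include <string.h>

#define LINES_PER_RECORD 3   /* hypothetical: lines that make up one record */
#define MAX_LINE 128         /* longest expected line, plus '\n' and '\0' */

/* Read the lines of one record into lines[], stripping each newline.
   Returns 1 if a whole record was read, 0 if end of file came before
   the record started, and -1 if the file ended part-way through it. */
static int read_record_lines(FILE *in, char lines[LINES_PER_RECORD][MAX_LINE])
{
    int i;

    for (i = 0; i < LINES_PER_RECORD; i++) {
        if (fgets(lines[i], MAX_LINE, in) == NULL)
            return i == 0 ? 0 : -1;
        lines[i][strcspn(lines[i], "\n")] = '\0';  /* drop the newline, if any */
    }
    return 1;
}
```

Each entry in lines[] can then be picked apart with sscanf or the str* functions.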

If this is a fixed format (you know that the first line you read will be
the "01 employee-record." line, followed by 9 lines of data as described
below), things are a little easier. A fully general routine to read such
a file format would read a line (with fgets) and work out what that line
marks the beginning of. sscanf would be a good place to start for that,
though it may turn out not to be what you need. The routine would then
read the following lines (with fgets again) and extract the appropriate
data (with sscanf, the various str* functions, and/or your own parsing
code). At each step the extracted data would be put somewhere accessible,
probably into a struct that you end up returning.

(You obviously don't want to do all this inline every time you want
to read a record, so once you've got it working wrap it up nicely in a
function so that when you want to actually read a record it's a one-line
function call.)
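On the point that SCANF "does not work because of spaces": scanf-family conversions like %s stop at white space. %c with a field width (for example %20c) is different. It reads exactly that many characters, blanks included, and it neither skips leading white space nor adds a '\0'.

So if the file turns out to hold one 58-character line per record matching the pic clauses, a single sscanf call can split a line already read with fgets. This sketch assumes that single-line layout. It also assumes zero-filled numeric fields, because a blank-padded numeric would confuse %2d. struct emp and parse_emp_line are hypothetical names:

```c
#include <stdio.h>

struct emp {
    long id;
    char dept[5 + 1];            /* +1 for the terminating '\0' */
    char last[20 + 1];
    char first[20 + 1];
    int mm, dd, yy;
};

/* Parse one fixed-width line: 5+5+20+20+2+2+4 = 58 characters.
   Returns 1 on success, 0 if the line does not match the layout. */
static int parse_emp_line(const char *line, struct emp *e)
{
    /* %5c and %20c read exactly that many characters, blanks and all,
       but do not terminate the string, so we do that ourselves. */
    if (sscanf(line, "%5ld%5c%20c%20c%2d%2d%4d",
               &e->id, e->dept, e->last, e->first,
               &e->mm, &e->dd, &e->yy) != 7)
        return 0;
    e->dept[5] = '\0';
    e->last[20] = '\0';
    e->first[20] = '\0';
    return 1;
}
```

The name fields keep their trailing blanks, just as in COBOL. A line whose trailing blanks have been stripped (by an editor, say) would shift every later field, so it may be worth checking the line length first.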

> 01 employee-record.
>    03 emp-id pic 9(5).
>    03 emp-dept pic x(5).
>    03 emp-name.
>       05 emp-name-last pic x(20).
>       05 emp-name-first pic x(20).
>    03 emp-hire-date.
>       05 emp-hire-date-mm pic 9(2).
>       05 emp-hire-date-dd pic 9(2).
>       05 emp-hire-date-yy pic 9(4).

As a zeroth approximation to what you'd want:
--------
#include <stdio.h>

/* Placeholder types; pick whatever suits your data. */
typedef long EMP_ID_TYPE;   /* int? */
typedef long DEPT_TYPE;     /* int? or char[5+1]? */

struct employee_name
{
    char last[20+1];    /* arrays, not pointers, so the struct owns */
    char first[20+1];   /* its storage; +1 for the terminating '\0' */
};
struct employee_hire_date
{
    int month;
    int day;
    int year;
};
struct employee_record
{
    EMP_ID_TYPE emp_id;
    DEPT_TYPE emp_dept;
    struct employee_name emp_name;
    struct employee_hire_date emp_hire_date;
};

/* Reads the lines describing the name; still to be written. */
struct employee_name read_employee_name(FILE *in);

struct employee_record read_employee_record(FILE *in)
{
    char buf[1024];
    struct employee_record ret = {0};

    if (fgets(buf,sizeof buf,in) == NULL)
        return ret; /* end of file or error; a real version needs a way
                       to report this, e.g. an int return value */
    /*Check that this is the start-of-record line
      -OR-
      assume that we're called because something else read that line
      (which means this bit belongs in the caller)
    */

    if (fgets(buf,sizeof buf,in) == NULL)
        return ret;
    /*Check that this line is the emp-id line, and extract the
      value represented into ret.emp_id
    */

    /*Similar to above for emp_dept*/

    /*read_employee_name() will read the lines describing the name,
      extract the relevant information, and return it packed into
      a struct employee_name
    */
    ret.emp_name=read_employee_name(in);

    /*Similar to above for emp_hire_date*/

    return ret;
}

/*And when you want to read a record, do something like
    the_record=read_employee_record(my_input_file);
*/
--------


dave

[1] "The C Programming Language, 2nd edition", Brian W. Kernighan
and Dennis M. Ritchie, ISBN 0-13-110362-8 (paperback), 0-13-110370-9
(hardback).

[2] C is case-sensitive. Get used to using lower-case when you're talking
about parts of the language, since that makes it easier for C
programmers to understand.
 
Eric Sosman

KevinD said:
> assumption: I am new to C and old to COBOL
>
> I have been reading a lot (self teaching) but something is not sinking
> in with respect to reading a simple file - one record at a time.
> Using C, I am trying to read a flatfile. In COBOL, my simple file
> layout and READ statement would look like below.
>
> Question: what is the standard, simple coding convention for reading
> in a flatfile - one record at a time?? SCANF does not work because of
> spaces; I tried FGETS and STRUCT to emulate my COBOL perspective but
> that does not work (though I may have coded this wrong). C likes to
> deliver data in streams but FGETS is akin to reading a single record.
>
> I know I am missing something that is very simple but the examples
> that I have come across avoid this simple scenario. Please explain -
> an example would be great.

C's I/O streams have no notion of "record," aside from
the fairly weak notion of "line" as expressed in fgets() and
a few other, relatively obscure parts of the library.

But don't despair. Ask yourself "What does a record
look like, when thought of as an undifferentiated stream of
bytes?" Then read the appropriate number of bytes from the
stream and impose your interpretation on them: The first
five are alphabetic characters denoting a stock ticker
symbol, the next ten are decimal digits giving the last
trade price in millicents, the next twenty are alphabetics
giving the name of the latest executive to serve his
company from behind bars, and so on. Extract whatever's
needed from this layout, convert it to more convenient forms
if you like (e.g., the fields of decimal digits might become
`int' or `double' values), and away you go.
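Here is the read-then-interpret approach, sketched for the layout in the question. It assumes the records are stored back to back with no newline between them, which is common for files transferred from mainframes. If each record ends with a newline, read one extra byte or use fgets() instead.

The field offsets follow from the pic clauses: emp-id at 0, emp-dept at 5, emp-name-last at 10 and emp-name-first at 30. The mm, dd and yy fields follow at 50, 52 and 54. REC_LEN, read_fixed_record, get_field and field_to_long are made-up names:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REC_LEN 58   /* 5+5+20+20+2+2+4 bytes, from the COBOL layout */

/* Read exactly REC_LEN bytes. Assumes records are packed back to back
   with no line terminators. Returns 1 on success, 0 at end of file or
   on a short (truncated) record. */
static int read_fixed_record(FILE *in, char rec[REC_LEN])
{
    return fread(rec, 1, REC_LEN, in) == REC_LEN;
}

/* Copy the n-byte field starting at offset off into out (which must
   hold n+1 chars) and add the terminating '\0'. */
static void get_field(const char *rec, size_t off, size_t n, char *out)
{
    memcpy(out, rec + off, n);
    out[n] = '\0';
}

/* Interpret a field of decimal digits (PIC 9) as a number. */
static long field_to_long(const char *rec, size_t off, size_t n)
{
    char tmp[32];

    if (n >= sizeof tmp)      /* numeric fields here are at most 5 digits */
        n = sizeof tmp - 1;
    get_field(rec, off, n, tmp);
    return strtol(tmp, NULL, 10);
}
```

A caller would typically open the file with "rb" and loop with `while (read_fixed_record(fp, rec)) { ... }`, pulling out whichever fields it needs.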

C streams come in two principal flavors (three, really,
but I have a hunch you're not interested in wide characters
just yet): there are text streams and binary streams. If
your data file looks like a bunch of lines consisting of
textual characters, you should access it with a text stream:
use "r" as the second argument to fopen(). But if your
file contains "binary garbage" like numbers expressed in
binary or packed decimal format, use a binary stream: pass
"rb" as fopen()'s second argument.

In the binary case, it *may* be that you can use fread()
to plop a fixed number of bytes from the file straight into
a properly-arranged struct, and be on your way without any
need for further interpretation. However, there are lots of
pitfalls in this approach, and I couldn't recommend it without
knowing a lot more about your situation than I'm likely to
have time to discover.
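If you do try the fread()-into-a-struct route, a C11 compiler can check the layout at build time and so catch one of the pitfalls (compiler-inserted padding). This sketch assumes every field is plain character data. Binary or packed-decimal fields raise byte-order and representation issues that a size check cannot catch. struct raw_employee and read_raw_employee are made-up names:

```c
#include <stddef.h>
#include <stdio.h>

/* Mirrors the 58-byte record exactly: character data only, and no room
   for '\0' terminators, because the bytes are copied as-is from disk. */
struct raw_employee {
    char id[5];
    char dept[5];
    char last[20];
    char first[20];
    char mm[2];
    char dd[2];
    char yy[4];
};

/* C11: stop the build if the compiler added padding anywhere. */
_Static_assert(sizeof(struct raw_employee) == 58,
               "struct raw_employee is padded; fread() into it is unsafe");
_Static_assert(offsetof(struct raw_employee, yy) == 54,
               "unexpected offset for yy");

/* Read one record straight into the struct. Returns 1 on success. */
static int read_raw_employee(FILE *in, struct raw_employee *r)
{
    return fread(r, sizeof *r, 1, in) == 1;
}
```

The fields still have to be converted (digits to numbers, copies with terminators for strings) before most of the C library can work with them.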
 
Mike Wahler

KevinD said:
> assumption: I am new to C and old to COBOL
>
> I have been reading a lot (self teaching) but something is not sinking
> in with respect to reading a simple file - one record at a time.

I think the conceptual 'gap' is because you perhaps don't
realize that C's i/o system has no notion of a 'record'.
Everything is a 'stream of characters'. If you want to
impose some sort of 'structure' such as a fixed record
length, you do that yourself in your code.
> Using C, I am trying to read a flatfile. In COBOL, my simple file
> layout and READ statement would look like below.
>
> Question: what is the standard, simple coding convention for reading
> in a flatfile - one record at a time??

There isn't one, since there's no notion of 'record'.
> SCANF does not work because of
> spaces; I tried FGETS

'fgets()' reads up to and including a newline (or until end of file, or
until the buffer is full), so it effectively reads variable-length
'records' delimited by newline characters.
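One detail about fgets() matters here. If a line is longer than the buffer, you get only the first part, with no newline, and the rest of the line is returned by the next call. Below is a sketch of a wrapper that copes with that. get_line and its return codes are made up for this example:

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (size n) and strip the newline.
   Returns 1 for a complete line, 2 if the line was too long (the
   excess has been read and discarded), 0 at end of file. */
static int get_line(FILE *in, char *buf, size_t n)
{
    char *nl;
    int c;

    if (fgets(buf, (int)n, in) == NULL)
        return 0;
    nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';
        return 1;
    }
    /* No newline: the line filled the buffer, or the last line of the
       file had no newline. Peek at the next character to tell which. */
    c = getc(in);
    if (c == EOF || c == '\n')
        return 1;              /* line fit exactly, or final line */
    while ((c = getc(in)) != EOF && c != '\n')
        ;                      /* discard the rest of the over-long line */
    return 2;
}
```

Whether to discard, reject, or grow the buffer for an over-long line is a design choice. For fixed-length records, a too-long line usually means bad data, so reporting it (the return value 2 here) seems the safer default.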
> and STRUCT

'struct' (note the all lower-case; C is case-sensitive) is indeed
part of the solution.
> to emulate my COBOL perspective but
> that does not work (though I may have coded this wrong). C likes to
> deliver data in streams but FGETS is akin to reading a single record.

Sort of. :)

If you want a fixed 'flat' record size,

Open your file in binary mode
(see second argument to 'fopen()')

Create a record type using the 'struct' keyword, e.g.

struct record
{
    char name[30];
    char phone[16];
};

Read the file with 'fread()', and write to it with 'fwrite()' (these
are the 'unformatted' i/o functions). Move around in the file with
'fseek()'. 'ftell()' may also be of use.

Finally, note that this will render your data file platform-specific.
(Binary representations of data can and do vary among platforms).
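Mike's recipe can be sketched with his struct record. The helper names are hypothetical, and the file must be opened in binary update mode (e.g. "r+b" or "w+b"). tmpfile() is handy for trying it out because it opens a binary file for update. As noted above, the resulting file is only portable to systems that lay the struct out the same way:

```c
#include <stdio.h>
#include <string.h>

/* Mike's example record: fixed size, character data only. */
struct record
{
    char name[30];
    char phone[16];
};

/* Append one record at the end of the file. Returns 1 on success. */
static int append_record(FILE *fp, const struct record *r)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return 0;
    return fwrite(r, sizeof *r, 1, fp) == 1;
}

/* Read record number n (counting from 0). Every record is
   sizeof(struct record) bytes, so its position is n times that. */
static int read_record_at(FILE *fp, long n, struct record *r)
{
    if (fseek(fp, n * (long)sizeof *r, SEEK_SET) != 0)
        return 0;
    return fread(r, sizeof *r, 1, fp) == 1;
}

/* How many whole records the file holds, found with fseek() + ftell(). */
static long record_count(FILE *fp)
{
    long end;

    if (fseek(fp, 0L, SEEK_END) != 0 || (end = ftell(fp)) < 0)
        return -1;
    return end / (long)sizeof(struct record);
}
```

Each helper begins with an fseek(), which also satisfies C's rule that an update stream must be repositioned between a write and a following read.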

-Mike
 
CBFalconer

KevinD said:
> I have been reading a lot (self teaching) but something is not
> sinking in with respect to reading a simple file - one record at
> a time. Using C, I am trying to read a flatfile. In COBOL, my
> simple file layout and READ statement would look like below.

C doesn't have records, in your sense. Even if you create a
struct that mirrors your record, there is no guarantee that
writing it to or from a file mimics anything for any other
compiler or system.

What C does have is streams of bytes that can be read from or written
to a file. Sometimes these bytes are characters, with lines demarcated
by newline markers, and the file is then known as a text file. Some
systems apply special processing to text files; others do not.

Your task, should you deign to accept it, is to discover the exact
file format required, in terms of a sequence of byte values, and
design code to transfer suitable blocks to and from the files.

I suspect that all the fields shown in your Cobol example are
actually text fields, in that they hold representations of characters
in some code or other. If they are EBCDIC, they will usually need
translating before a C program on a typical (ASCII-based) system can
use them.
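For the EBCDIC case, a real program would normally use a full 256-entry table for the code page in question, or a conversion facility such as iconv() where available. The sketch below covers only space, digits and letters. It assumes code page 037 on the file side and an ASCII host. The function names are made up:

```c
#include <stddef.h>

/* Translate one EBCDIC (code page 037) byte to ASCII. Only space,
   digits and letters are handled; anything else becomes '?'.
   Assumes the C program itself runs on an ASCII system. */
static char ebcdic_to_ascii(unsigned char c)
{
    if (c == 0x40) return ' ';
    if (c >= 0xF0 && c <= 0xF9) return (char)('0' + (c - 0xF0));
    if (c >= 0xC1 && c <= 0xC9) return (char)('A' + (c - 0xC1));
    if (c >= 0xD1 && c <= 0xD9) return (char)('J' + (c - 0xD1));
    if (c >= 0xE2 && c <= 0xE9) return (char)('S' + (c - 0xE2));
    if (c >= 0x81 && c <= 0x89) return (char)('a' + (c - 0x81));
    if (c >= 0x91 && c <= 0x99) return (char)('j' + (c - 0x91));
    if (c >= 0xA2 && c <= 0xA9) return (char)('s' + (c - 0xA2));
    return '?';
}

/* Translate n bytes in place, e.g. a record just read with fread()
   from a file opened in binary mode ("rb"). */
static void ebcdic_record_to_ascii(unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        buf[i] = (unsigned char)ebcdic_to_ascii(buf[i]);
}
```

Note that the EBCDIC letters are not contiguous (there are gaps after I and R), which is why the letters are handled as three separate ranges.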
 
SM Ryan

(e-mail address removed) (KevinD) wrote:
# assumption: I am new to C and old to COBOL
#
# I have been reading a lot (self teaching) but something is not sinking
# in with respect to reading a simple file - one record at a time.
# Using C, I am trying to read a flatfile. In COBOL, my simple file
# layout and READ statement would look like below.
#
# Question: what is the standard, simple coding convention for reading
# in a flatfile - one record at a time?? SCANF does not work because of
# spaces; I tried FGETS and STRUCT to emulate my COBOL perspective but
# that does not work (though I may have coded this wrong). C likes to
# deliver data in streams but FGETS is akin to reading a single record.
#
# I know I am missing something that is very simple but the examples
# that I have come across avoid this simple scenario. Please explain -
# an example would be great.
#
# thanks
# kevin
#
#
# ......

# 01 employee-record.
# 03 emp-id pic 9(5).
# 03 emp-dept pic x(5).
# 03 emp-name.
# 05 emp-name-last pic x(20).
# 05 emp-name-first pic x(20).
# 03 emp-hire-date.
# 05 emp-hire-date-mm pic 9(2).
# 05 emp-hire-date-dd pic 9(2).
# 05 emp-hire-date-yy pic 9(4).

On most C implementations you can overlay a struct made only of char[]
fields on a char string without worrying about padding. However, to be
safer, and to do the type conversions (C needs decimal digits converted
to binary before it can do arithmetic on them, and its strings are
zero-byte terminated), I would be inclined to write a parser routine.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct employee_record {
    long emp_id; /*pic 9(5) can exceed 32767, the minimum range of int*/
    char emp_dept[5+1]; /*+1 for zero byte terminator*/
    struct {
        char emp_name_last[20+1];
        char emp_name_first[20+1];
    } emp_name;
    struct {
        int emp_hire_date_mm;
        int emp_hire_date_dd;
        int emp_hire_date_yy;
    } emp_hire_date;
};

/* Convert the n-digit field at line+*pos to a number (n < 32). */
static long pic9(int n,const char *line,int *pos) {
    char t[32]; /* fixed buffer instead of an unchecked malloc() */
    memcpy(t,line+*pos,n); t[n] = 0;
    *pos += n;
    return strtol(t,0,10);
}

/* Copy the n-character field at line+*pos into string, terminated. */
static void picx(char *string,int n,const char *line,int *pos) {
    memcpy(string,line+*pos,n); string[n] = 0;
    *pos += n;
}

/* Returns 0 on success, -1 at end of file or on a malformed line. */
static int read_employee_record(
    FILE *employee_flatfile,
    struct employee_record *employee_record
) {
    char line[5+5+20+20+2+2+4+2]; /* 58 data chars + '\n' + '\0' */
    if (fgets(line,sizeof line,employee_flatfile)) {
        char *nl = strchr(line,'\n');
        if (nl) *nl = 0;
        if (strlen(line)==sizeof line-2) {
            int pos = 0;
            employee_record->emp_id = pic9(5,line,&pos);
            picx(employee_record->emp_dept,5,line,&pos);
            picx(employee_record->emp_name.emp_name_last,20,line,&pos);
            picx(employee_record->emp_name.emp_name_first,20,line,&pos);
            employee_record->emp_hire_date.emp_hire_date_mm = pic9(2,line,&pos);
            employee_record->emp_hire_date.emp_hire_date_dd = pic9(2,line,&pos);
            employee_record->emp_hire_date.emp_hire_date_yy = pic9(4,line,&pos);
            return 0;
        }else
            return -1;
    }else
        return -1;
}

# read employee-flatfile into employee-record.

read_employee_record(employee_flatfile,&employee_record);
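One thing the parser above leaves alone is COBOL's blank padding. picx() copies all 20 characters of emp-name-last, trailing spaces included. If you want "SMITH" rather than "SMITH" followed by 15 blanks, a small helper can strip the padding after each picx() call. The name rtrim_blanks is hypothetical:

```c
#include <string.h>

/* COBOL PIC X fields are padded with trailing blanks; strip them in
   place so "SMITH               " becomes "SMITH". */
static void rtrim_blanks(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && s[len - 1] == ' ')
        s[--len] = '\0';
}
```

For example: `rtrim_blanks(employee_record.emp_name.emp_name_last);`. Whether to trim at all depends on whether the padding matters downstream, for instance when writing the record back out in the same fixed layout.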
 
Dominic Shields

> I tried FGETS and STRUCT to emulate my COBOL perspective but
> that does not work (though I may have coded this wrong). C likes to
> deliver data in streams but FGETS is akin to reading a single record.

I came to C from a Cobol background; after a bit of head-scratching I
used fgets, and have never had a reason to do anything else (with
character data that represents records).

I found it useful to stop worrying about recreating Picture clauses
with struct. The main reason for this is that in the ICL VME
environment I worked in, record lengths were mostly fixed (OK, there
were sometimes OCCURS DEPENDING ON bits on the end), but in the Unix
environment I have found variable-length delimited "records" to be the
norm.

I use the word "record" to mean that which is in between two newlines.
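Newline-delimited, variable-length "records" of this kind usually separate their fields with some delimiter character. Below is a sketch of splitting one line read with fgets. It assumes a '|' delimiter, which is an arbitrary choice here. It uses strchr() rather than strtok(), because strtok() treats adjacent delimiters as one and so loses empty fields. split_fields and MAX_FIELDS are hypothetical:

```c
#include <string.h>

#define MAX_FIELDS 16   /* arbitrary limit for this sketch */

/* Split line in place at each occurrence of delim, storing pointers to
   the fields in fields[]. Empty fields are kept ("a||b" gives three).
   Fields beyond max are ignored. Returns the number of fields stored. */
static int split_fields(char *line, char delim, char *fields[], int max)
{
    int n = 0;
    char *p = line;

    line[strcspn(line, "\n")] = '\0';   /* drop a trailing newline */
    while (n < max) {
        char *d = strchr(p, delim);

        fields[n++] = p;
        if (d == NULL)
            break;
        *d = '\0';
        p = d + 1;
    }
    return n;
}
```

The fields point into the caller's buffer, so they are only valid until the next fgets() into that buffer. Copy anything you need to keep.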
 
pete

> I use the word "record" to mean that which is in between two newlines.

I use the word "line" when discussing streams.

N869
7.19.2 Streams
[#2] A text stream is an ordered sequence of characters
composed into lines, each line consisting of zero or more
characters plus a terminating new-line character.
 
