Need Help with C program

terry

I am a programmer (COBOL, PeopleSoft, SQR, etc.), so I am familiar
with programming logic, but not very familiar with C. I need a C
program for a study I'm doing. The program is fairly simple, but since
I am not familiar with C it would take me some time to get it to work.
A good C programmer can probably give me the code in a few minutes.

Here's the program specs:

I'm doing a study on the italicized words in the King James Bible. The
italicized words are words not in the Greek and Hebrew text but were
added by the translators for sentence structure, etc.

I have a text file of the Old Testament and the New Testament. The
italicized words are surrounded by brackets []. If a verse contains
any italicized words I want to write that verse to a new flat-file
that I will use in Word to look at, get counts, and do some grammar
statistics, etc.

Here's a sample of the input flat-file with a few verses from Genesis.
Notice verses 2, 4, and 7 contain italicized words or brackets.

1 In the beginning God created the heaven and the earth.
2 And the earth was without form, and void; and darkness [was] upon
the face of the deep. And the Spirit of God moved upon the face of the
waters.
3 And God said, Let there be light: and there was light.
4 And God saw the light, that [it was] good: and God divided the light
from the darkness.
5 And God called the light Day, and the darkness he called Night. And
the evening and the morning were the first day.
6 And God said, Let there be a firmament in the midst of the waters,
and let it divide the waters from the waters.
7 And God made the firmament, and divided the waters which [were]
under the firmament from the waters which [were] above the firmament:
and it was so.
8 And God called the firmament Heaven. And the evening and the morning
were the second day.

So my output file would look like

2 And the earth was without form, and void; and darkness [was] upon
the face of the deep. And the Spirit of God moved upon the face of the
waters.
4 And God saw the light, that [it was] good: and God divided the light
from the darkness.
7 And God made the firmament, and divided the waters which [were]
under the firmament from the waters which [were] above the firmament:
and it was so.

The logic I came up with would be something like:

Read a character from the flat-file:
    Check for a number
    If number (indicates a new verse)
        If a "[" was found (found_flag) in the previous verse
            write the WORK AREA stored verse to the output file.
            Possibly need to write an eol character
            Clear out the work area
            Clear out the found_flag

        If a "[" was NOT found (found_flag) then
            clear the stored verse

    Move each character to a WORK AREA
    Check each character for a "["
    If found set the found_flag = y

Go get another character

Another variation I would like to do is create an output file of JUST
the italicized words or bracketed words.

A couple of questions or issues:

How large a file can C read?
The Old Testament file is 3,282,275 characters (size is 3,342,336
bytes). If needed, I could cut up the files.

Is there a better way?

I will probably use Borland C++ as the compiler.

If possible, please email any solutions to (e-mail address removed)

Thank you very much for your time and expertise.
 
Tom Zych

terry said:
I am a programmer (cobol...

My condolences :)

terry said:
.... I need a C
program in a study I'm doing. The program is fairly simple, but not
familiar with C code it would take me some time to get it to work. A
good C programmer can probably give me the code in a few minutes.

See below.

[examples and pseudocode snipped]

Your spec is well written and your logic looks like it would work,
bearing in mind you have to deal with the beginning and end of the
file.

terry said:
Another variation I would like to do is create an output file of JUST
the italicized words or bracketed words.

Much easier, of course.

terry said:
How large a file can C read?
The Old Testament file is 3,282,275 characters (size is 3,342,336
bytes) If needed I could cut up the files.

If you slurp the whole thing in one go you'll need an array of that
many chars. Probably not difficult on any modern general-purpose
computer. I'd break the reads into 32k or so at a time, though.

If you read and process one line at a time there's no limit to
overall file size. IIRC there is an implementation-defined limit to
the maximum line length, which might cause problems if your input
only has newlines at the end of paragraphs.

terry said:
I will probably use Borland C++ as the compiler.

We don't worry about that here. We worry about code that will work
on any platform.

terry said:
If possible, please email any solutions to (e-mail address removed)
Thank you very much for your time and expertise.

We don't generally hand people their work on a silver platter here,
either. Some people will quote you their consulting rates.
 
Kris Wempa

Do you know how big the biggest verse is? Assuming that numbers are verse
delimiters and CANNOT appear in the actual verse itself, this should be an
easy problem. Allocate a buffer big enough for the biggest verse. Read the
entire verse into the buffer. Use strchr to determine whether there is a
'[' in the verse. If there is, write the whole buffer to your output file.
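
For what it's worth, here is a minimal sketch of the buffer-plus-strchr idea above. It assumes, as in the sample input, that a line starting with a digit begins a new verse and that no input line is longer than MAX_LINE. The names copy_bracketed_verses and flush_verse, and the MAX_LINE and MAX_VERSE limits, are made up for illustration and would need checking against the real files.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE  512   /* assumed upper bound on one input line */
#define MAX_VERSE 4096  /* assumed upper bound on one whole verse */

/* Write the buffered verse to out if it contains a '['. */
static void flush_verse(const char *verse, FILE *out)
{
    if (strchr(verse, '[') != NULL)
        fputs(verse, out);
}

/* Copy every verse that contains bracketed text from in to out.
   Returns 0 on success, -1 if a verse did not fit in the buffer. */
int copy_bracketed_verses(FILE *in, FILE *out)
{
    char line[MAX_LINE];
    char verse[MAX_VERSE] = "";
    size_t used = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        size_t len = strlen(line);

        /* A leading digit marks the start of the next verse,
           so the previous one is complete and can be checked. */
        if (isdigit((unsigned char)line[0])) {
            flush_verse(verse, out);
            verse[0] = '\0';
            used = 0;
        }
        if (used + len >= sizeof verse)
            return -1;
        memcpy(verse + used, line, len + 1);  /* copies the '\0' too */
        used += len;
    }
    flush_verse(verse, out);  /* the last verse has no successor */
    return 0;
}
```

To try it, open the input and output files with fopen and pass the two streams in. A return value of -1 means MAX_VERSE was too small for some verse.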

 
Coos Haak

Do you know how big the biggest verse is ? Assuming that numbers are verse
delimiters and CANNOT appear in the actual verse itself, this should be an
easy problem. Allocate a buffer for the biggest verse. Read the entire
verse into the buffer. Determine if there is a '[' in the verse with
strchr. If there is, write the whole buffer to your output/file where you
want it.

The italic text may run across more than one verse; at least in Dutch
Bibles it does. So you have to look for ']' too.

Coos
 
Richard Heathfield

terry said:
I have a text file of the Old Testament and the New Testament. The
italicized words are surrounded by brackets []. If a verse contains
any italicized words I want to write that verse to a new flat-file
that I will use in Word to look at, get counts, and do some grammar
statistics, etc.

Make sure each verse is on a separate line. Then:

grep '\[' bible.txt > italics.txt

terry said:
Another variation I would like to do is create an output file of JUST
the italicized words or bracketed words.

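/* fpin and fpout are assumed to be FILE * streams already opened
   for reading and writing respectively */
int ch;
int inword = 0;   /* nonzero while inside [...] */
while((ch = getc(fpin)) != EOF)
{
    if(inword)
    {
        fputc(ch, fpout);
        if(ch == ']')
        {
            fputc('\n', fpout); /* one word per line */
            inword = 0;
        }
    }
    else
    {
        if(ch == '[')
        {
            fputc(ch, fpout);
            inword = 1;
        }
    }
}

terry said:
A couple of questions or issues: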

How large a file can C read?

Big as you like, if you're prepared to read it in chunks.
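
As a rough illustration of reading in chunks, the sketch below counts the '[' characters in a stream of any length using a fixed buffer. The function name count_open_brackets and the 32k CHUNK_SIZE are assumptions picked for the example, not anything the problem requires.

```c
#include <stdio.h>

#define CHUNK_SIZE 32768  /* assumed chunk size; any convenient value works */

/* Count the '[' characters in a stream, one chunk at a time.
   Memory use stays at CHUNK_SIZE bytes however large the file is. */
long count_open_brackets(FILE *in)
{
    static char chunk[CHUNK_SIZE];
    size_t got;
    size_t i;
    long count = 0;

    while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {
        for (i = 0; i < got; i++)
            if (chunk[i] == '[')
                count++;
    }
    return count;
}
```

The same loop shape works for any per-character processing, since the amount of memory used does not depend on the file size.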
 
Arthur J. O'Dwyer

Do you know how big the biggest verse is ? Assuming that numbers are verse
delimiters and CANNOT appear in the actual verse itself, this should be an
easy problem. Allocate a buffer for the biggest verse. Read the entire
verse into the buffer. Determine if there is a '[' in the verse with
strchr. If there is, write the whole buffer to your output/file where you
want it.

The italic text may spread over more verses, at least Dutch bibles do.
So you have to look for ']' too.

Ouch. You mean literally, like this:

42 And the people did rejoice and did feast upon the
lambs and toads and tree-sloths and fruit-bats and
orangutans [and breakfast cereals.
43 Now] did the Lord say, "First thou pullest the
Holy Pin. Then thou must count to three. Three shall
be the number of the counting and the number of the
counting shall be three.

Or just that some verses end with italics, and are followed by
verses beginning with italics, like *this*:

42 And the people did rejoice and did feast upon the
lambs and toads and tree-sloths and fruit-bats and
orangutans [and breakfast cereals].
43 [Now] did the Lord say, "First thou pullest the
Holy Pin. Then thou must count to three. Three shall
be the number of the counting and the number of the
counting shall be three.

The second way is the easier way to parse, obviously.
So it all depends on how the text that the OP is using
is arranged.
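
One way to find out which arrangement a given file uses is to scan it once and count the verses that begin while a bracket is still open. This is only a sketch: it assumes a digit at the start of a line marks a new verse (as in the OP's sample) and that brackets are never nested, and the name count_spanning_brackets is invented for the example.

```c
#include <ctype.h>
#include <stdio.h>

/* Return how many verses begin while a '[' is still unclosed.
   A result of 0 means every bracketed phrase stays inside one verse. */
long count_spanning_brackets(FILE *in)
{
    int ch;
    int at_line_start = 1;  /* true before the first char of each line */
    int open = 0;           /* 1 while inside [ ... ] */
    long spanning = 0;

    while ((ch = getc(in)) != EOF) {
        if (at_line_start && isdigit(ch) && open)
            spanning++;
        if (ch == '[')
            open = 1;
        else if (ch == ']')
            open = 0;
        at_line_start = (ch == '\n');
    }
    return spanning;
}
```

If this returns 0 for the OP's files, checking each verse for a '[' on its own is enough.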

-Arthur
 
David Rubin

Tom Zych wrote:

Much easier, of course.

Not any more or less easier...

Tom Zych wrote:
If you slurp the whole thing in one go you'll need an array of that
many chars. Probably not difficult on any modern general-purpose
computer. I'd break the reads into 32k or so at a time, though.

Fortunately for you, your input is fixed, so you can tweak your program
by pre-processing the data. For example, you *know* how many characters
are in the longest verse. If you are using a particular copy of the
text, you even know how many individual lines are in the longest verse.
So, you just read lines (fgets) into a circular buffer looking first
for verse numbers, and then for '[' characters. The rest should follow
easily.

/david
 
Arthur J. O'Dwyer

Not any more or less easier...

Then you must know a very easy way to do the first part (printing out
whole verses). And I doubt you do.


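int input_char;
int output_flag = 0;   /* nonzero while inside [...] */

while ((input_char = getchar()) != EOF)
{
    if (!output_flag && input_char == '[')
        output_flag = 1;
    else if (output_flag && input_char == ']') {
        output_flag = 0;
        putchar('\n');   /* one bracketed phrase per line */
    }
    else if (output_flag)
        putchar(input_char);   /* echo only the text inside brackets */
}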


See how the second part doesn't require memory allocation;
in fact, it doesn't even require more than three bytes of
state information. The first part requires an arbitrarily
large amount of state information -- which *may* not be
feasible on some implementations with small memories and
big disks.
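
To make the contrast concrete, here is a sketch of the first part (whole verses) that follows the OP's found-flag logic but grows its work area with realloc, so no maximum verse length has to be known in advance. It assumes a digit at the start of a line begins a new verse; the name filter_verses and the initial size of 256 are arbitrary choices for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Copy verses containing '[' from in to out, growing the work area
   as needed. Returns 0 on success, -1 if memory runs out. */
int filter_verses(FILE *in, FILE *out)
{
    char *work = NULL;       /* work area holding the current verse */
    size_t used = 0, cap = 0;
    int found = 0;           /* set once a '[' is seen in this verse */
    int at_line_start = 1;
    int ch;

    while ((ch = getc(in)) != EOF) {
        /* A digit at the start of a line begins a new verse,
           so decide what to do with the one just finished. */
        if (at_line_start && isdigit(ch)) {
            if (found)
                fwrite(work, 1, used, out);
            used = 0;
            found = 0;
        }
        if (used == cap) {   /* work area full: double it */
            size_t new_cap = cap ? cap * 2 : 256;
            char *tmp = realloc(work, new_cap);
            if (tmp == NULL) {
                free(work);
                return -1;
            }
            work = tmp;
            cap = new_cap;
        }
        work[used++] = (char)ch;
        if (ch == '[')
            found = 1;
        at_line_start = (ch == '\n');
    }
    if (found)               /* flush the final verse */
        fwrite(work, 1, used, out);
    free(work);
    return 0;
}
```

The state here is the whole current verse plus two flags, whereas the bracketed-words loop only needs the current character and one flag.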

-Arthur
 
David Rubin

Arthur J. O'Dwyer said:
Then you must know a very easy way to do the first part (printing out
whole verses). And I doubt you do.

[snip - code]
See how the second part doesn't require memory allocation;
in fact, it doesn't even require more than three bytes of
state information. The first part requires an arbitrarily
large amount of state information

Not true. As I pointed out, since the OP is working with a specific text
(and perhaps a specific version of that text), these variables are
bounded.

/david
 
Arthur J. O'Dwyer

Arthur J. O'Dwyer said:
Then you must know a very easy way to do the first part (printing out
whole verses). And I doubt you do.

[snip - code]
See how the second part doesn't require memory allocation;
in fact, it doesn't even require more than three bytes of
state information. The first part requires an arbitrarily
large amount of state information

Not true. As I pointed out, since the OP is working with a specific text
(and perhaps a specific version of that text), these variables are
bounded.

Well, then why on earth did you bother to give any such complicated
algorithm at all? A simple "printf" would have solved the OP's
problem *much* faster and simpler.


#include <stdio.h>

int main(void)
{
    printf("The requested output is:\n");
    printf("2 And the earth was without form, and void; and darkness "
           "[was] upon the face of the deep. And the Spirit of God "
           "moved upon the face of the waters.\n");
    printf("4 And God saw the light, that [it was] good; and God "
           "divided the light from the darkness.\n");
    printf("7 And God made the firmament, and divided the waters which "
           "[were] under the firmament from the waters which [were] "
           "above the firmament: and it was so.\n");
    ...
    printf("26 So Joseph died, [being] an hundred and ten years old: "
           "and they embalmed him, and he was put in a coffin in "
           "Egypt.\n");
    return 0;
}


Perhaps the OP wanted to find an algorithm that would be applicable
to *any* text he cared to examine, not merely a single transcription
of KJV.

-Arthur
 
