Trouble using string functions

Gary Morris

Hi all,

I tried posting this through a free news server, but it
still has not appeared in Google, so if it turns up again
I apologize.

I hope someone can help me with this, or at least help me
find some information that will help me. If I were not at my
wit's end already, I wouldn't even ask. I'm used to doing all
of my programming in Windows, but now I have a task to
accomplish in UNIX/Linux using good old gcc.

Basically, what I have to do is parse a JavaScript file that
will ALWAYS have the following format:

************************************************
<!--
document.writeln('<TABLE BORDER="0" CELLPADDING="2">');
document.writeln(' <TR>');
....(many more lines all starting with "document.writeln('" )
document.writeln(' </TR>');
document.writeln('</TABLE>');
// -->
************************************************
It is auto-generated by another web site, and my job is to
get it to plain HTML format so that it can be linked to in an
email. What we want is for the end result to be:

************************************************

<TABLE BORDER="0" CELLPADDING="2">
<TR>
....(many more lines no longer with the "document.writeln('" )
</TR>
</TABLE>

************************************************

I have a good bit of it done, but I am getting stuck. Here is
the source I have so far in main:
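{
    int i, nc;

    nc = 0;
    /* skip the first 4 characters of the file ("<!--") */
    for (int j = 1; j < 5; j++)
        i = getchar();
    while (i != EOF)
    {
        nc = nc + 1;              /* count characters read */
        i = getchar();
        if (i == '\n')
        {
            printf("%c", i);
            /* skip the next 18 characters, "document.writeln('" */
            for (int j = 1; j < 19; j++)
                i = getchar();
        }
        else
        {
            printf("%c", i);
        }
    }
}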

I am using redirection, i.e. "prog<infile>outfile", to do this.

This code gets me close to what I need, but I still have the
remaining " '); " on the end of each string, which will not
do for obvious reasons. I tried reading into a string using
strcat(), which might do if I could just get it to work right,
but for some reason my program will not run, no matter what I
try with strcat(). It compiles without any errors, though.
I'm just not accustomed to using a language that does not
have a built-in "string" type. Everything is done with the
"char" type, and with the way some functions only want
pointers, etc., well, it's got me a bit confused, and I've
spent WAY too much time on this already.

Can anyone help a fellow out?
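Since the strcat() code itself was not posted, this is only a guess at the cause: strcat() compiles fine but fails at run time when the destination is an uninitialized pointer (`char *s; strcat(s, ...)`), an array that was never set to an empty string, or an array too small for the result. A minimal sketch of building a line with strcat() safely, assuming a fixed-size buffer; the helper names `append_char` and `cut_at_suffix` are hypothetical:

```c
#include <string.h>

/* Append character c to the '\0'-terminated string in buf (capacity cap).
   strcat() joins two strings, so c is first wrapped in a 2-char string.
   Returns 0 on success, or -1 if there is no room left. */
int append_char(char *buf, size_t cap, int c)
{
    char tmp[2];

    if (strlen(buf) + 2 > cap)   /* need room for c and the '\0' */
        return -1;
    tmp[0] = (char) c;
    tmp[1] = '\0';
    strcat(buf, tmp);            /* buf must already hold a valid string */
    return 0;
}

/* Cut the string at the first "');", dropping the JavaScript tail. */
void cut_at_suffix(char *buf)
{
    char *tail = strstr(buf, "');");

    if (tail != NULL)
        *tail = '\0';
}
```

Declare the buffer as `char line[256] = "";` (the `= ""` is what makes it a valid empty string), call `append_char()` for each character up to the '\n', then `cut_at_suffix()` and print it; reset it with `line[0] = '\0';` before starting the next line.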
 
Mike Wahler

Gary Morris said:
Basically, what I have to do is parse a JavaScript file that
will ALWAYS

Never say "always" :)

have the following format:

Instead of using your approach of depending upon an exact
number and position of characters on each line, the code below
extracts all characters between the first occurring pair
of delimiter characters (') on each line. I.e. lines
with fewer than two delimiters will be skipped, and characters
(if any) past the second delimiter will be skipped.


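#include <stdio.h>
#include <string.h>

#define LINE_LEN 128 /* adjust to your needs */

int main(void)
{
    char line[LINE_LEN] = {0};
    char delim = '\'';
    char *p1 = 0;
    char *p2 = 0;

    /* read one line at a time (at most LINE_LEN - 1 characters) */
    while (fgets(line, sizeof line, stdin))
        if ((p1 = strchr(line, delim)))          /* first delimiter */
            if ((p2 = strchr(++p1, delim)))      /* second delimiter */
            {
                *p2 = 0;    /* end the string at the second ' */
                puts(p1);   /* print the text in between, plus '\n' */
            }

    return 0;
}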


Input:

<!--
document.writeln('<TABLE BORDER="0" CELLPADDING="2">');
document.writeln(' <TR>');
....(many more lines all starting with "document.writeln('" )
document.writeln(' </TR>');
document.writeln('</TABLE>');
// -->


Output:

<TABLE BORDER="0" CELLPADDING="2">
<TR>
</TR>
</TABLE>


This code has not been thoroughly tested. I'll let you do that. :)

HTH,
-Mike
 
Paul Hsieh

(e-mail address removed) says...
#define LINE_LEN 128 /* adjust to your needs */

This cannot be "adjusted" to the OP's needs. The OP did not say that the
automatically generated output had any line length limit, and given that it
*is* autogenerated, I rather doubt that it will obey any such trivially short
line length.
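The line-length limit can also be lifted in standard C without an extra library by growing the buffer as needed. A minimal sketch, assuming each whole line should be held in memory; `read_line` is a hypothetical helper, and on POSIX systems the library function getline() does the same job:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read one line of any length from fp into a malloc'd buffer.
   The trailing '\n' (if any) is kept, like fgets().
   Returns NULL at end of file or if memory runs out; the caller frees. */
char *read_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {
        if (len + 1 >= cap) {              /* keep room for the '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char) c;
        if (c == '\n')
            break;
    }
    if (len == 0) {                        /* nothing read: end of file */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Each line it returns can take the place of the fixed-size `line` array in the strchr() loop, and must be released with free() afterwards.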

Using the better string library (http://bstring.sf.net/), this problem, and the
brittleness of your solution (only searching for '), are trivially solved:

-------------------------------------------------------------------------------
#include <stdio.h>
#include "bstrlib.h"

int parseLines (bstring src) {
    struct tagbstring token0 = bsStatic ("document.writeln('");
    struct tagbstring token1 = bsStatic ("');");
    struct tagbstring t, u;
    int i, j;

    /* Reference to where 1st token might match in src string */
    blk2tbstr (t, src->data, token0.slen);
    for (i=0; i <= src->slen - token0.slen; i++) {

        /* Does the 1st token match exactly? */
        if (biseq (&t, &token0)) {

            /* Reference to where 2nd token might match */
            blk2tbstr (u, t.data, token1.slen);
            for (j = i; j <= src->slen - token1.slen; j++) {

                /* Does the 2nd token match exactly? */
                if (biseq (&u, &token1)) {

                    /* Construct middle string */
                    bstring b = blk2bstr (t.data + token0.slen,
                                          j - i - token0.slen);

                    /* Output the '\0' terminated buffer */
                    puts ((char *) b->data);
                    bdestroy (b);
                    break;
                }

                /* Shift 2nd token scan downward */
                u.data++;
            }
        }

        /* Shift 1st token scan downward */
        t.data++;
    }
    return 0;
}

int main (int argc, char * argv[]) {
    FILE * fp;

    if (argc < 2) {
        printf ("%s [inputfile]\n", argv[0]);
        return -__LINE__;
    }

    if (NULL != (fp = fopen (argv[1], "rb"))) {
        /* Just read the whole file into a bstring */
        bstring src = bread ((bNread) fread, fp);
        int ret = parseLines (src);
        fclose (fp);
        bdestroy (src);
        return ret;
    }
    return -__LINE__;
}
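For readers who would rather not add a library dependency, the same idea (match the literal tokens `document.writeln('` and `');` rather than a single ') can be sketched per line with strstr() from <string.h>. This is not Paul's code, only an equivalent under the assumption that each writeln() call sits on a single line; `extract_writeln` is a hypothetical name:

```c
#include <string.h>

/* If line contains "document.writeln('" followed later by "');",
   copy the text between them into out (capacity cap) and return 1;
   otherwise return 0. Text that does not fit is truncated. */
int extract_writeln(const char *line, char *out, size_t cap)
{
    static const char open_tok[] = "document.writeln('";
    static const char close_tok[] = "');";
    const char *start = strstr(line, open_tok);
    const char *end;
    size_t n;

    if (start == NULL || cap == 0)
        return 0;
    start += sizeof open_tok - 1;      /* skip past the opening token */
    end = strstr(start, close_tok);
    if (end == NULL)
        return 0;
    n = (size_t) (end - start);
    if (n >= cap)
        n = cap - 1;                   /* truncate to fit, keep the '\0' */
    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}
```

Because the closing token is the three characters `');`, an escaped quote such as `another\'s` in the middle of the text does not end the match early.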
 
Gary Morris

Mike Wahler said:
Never say "always" :)

I only say that because another computer generates the script. Why
they would ever change it I can't imagine!
This code has not been thoroughly tested. I'll let you do that. :)

The code has been thoroughly tested with several of these scripts
and it works perfectly every time so far! Thanks a bunch, Mike. I'll
bet this took you all of 5 minutes at the most to cook up, whereas
I had spent quite a few hours trying all manner of different things.
Now I wish I had actually USED that C++ compiler that I got in the
mid-1990s.
 
Gary Morris

Oops, I spoke too soon! I just tried running this on the latest
version, and wouldn't you know that one of the lines had a ' in
it. Since this is JavaScript, it is escaped with a backslash, like:

another\'s

Given this, it should be a fairly easy matter to check for the
escape character and ignore the next character. Fairly simple
for someone else, that is, but I am certainly going to try now
that I've got something that actually (almost) works like I
need it to.
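One way to do that check, sketched under two assumptions: the string starts at the first ' on the line, and an escaped quote should come out as a plain ' in the HTML (the backslash is dropped). Other JavaScript escapes such as \n are not translated; only their backslash is removed. `extract_quoted` is a hypothetical name:

```c
#include <string.h>

/* Copy the text between the first ' in line and the next unescaped '
   into out (capacity cap). A backslash escapes the following character,
   so "\'" is copied as a plain '. Returns 1 if a closing quote was
   found, 0 otherwise (out still holds whatever was copied). */
int extract_quoted(const char *line, char *out, size_t cap)
{
    const char *p = strchr(line, '\'');
    size_t n = 0;

    if (p == NULL || cap == 0)
        return 0;
    for (p++; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0')
            p++;                   /* drop the backslash, keep next char */
        else if (*p == '\'') {
            out[n] = '\0';         /* unescaped closing delimiter */
            return 1;
        }
        if (n + 1 < cap)
            out[n++] = *p;
    }
    out[n] = '\0';
    return 0;                      /* no closing quote on this line */
}
```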
 
Mike Wahler

Hint for a 'quick-n-dirty' fix:

The function 'strchr()' has a counterpart which starts
searching from the end of a string instead of from the
beginning: 'strrchr()'.
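A minimal sketch of that hint as a hypothetical helper, `between_outer_quotes`: strchr() finds the first ', strrchr() the last one, so an escaped \' in the middle no longer ends the string early. The backslash itself stays in the output, and a line with a ' after the closing delimiter (a trailing comment, say) would still be cut in the wrong place:

```c
#include <string.h>

/* Quick-n-dirty: first ' via strchr(), last ' via strrchr().
   Modifies line in place and returns a pointer to the text between
   them, or NULL if the line holds fewer than two ' characters. */
char *between_outer_quotes(char *line)
{
    char *first = strchr(line, '\'');
    char *last = strrchr(line, '\'');

    if (first == NULL || last == first)
        return NULL;
    *last = '\0';          /* terminate at the final delimiter */
    return first + 1;
}
```

In the fgets() loop this would replace the two strchr() calls, e.g. `if ((p1 = between_outer_quotes(line)) != NULL) puts(p1);`.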

-Mike
 
Richard Bos

I only say that because another computer generates the script. Why
they would ever change it I can't imagine!

Ho hum. Beware the snark. I've said the same thing before. There was this
other computer, which was supposed to generate data, and that was sent
to me to process. So I wrote a program to process it. Should be a simple
job - after all, it was all computer-generated data, and what reason
could they possibly have for changing the format?

No prizes for guessing what happened two months later.

Richard
 
