remove entry from text file [NOT A HOMEWORK]


Micah Cowan

Roman Mashak said:
Hello, Micah!
You wrote on Thu, 23 Feb 2006 23:07:14 GMT:

[skip]
??>> /* skip spaces inside of token */
??>> for (; *idx != '\0' && (*idx == ' ' || *idx == '\t'); idx++)
??>> ;

MC> This is the same thing as:

MC> idx += strspn(idx, " \t");
It's not exactly the same; the purpose of my code is to keep the original buffer and
work only with a second pointer.

I don't see how that changes with my line above.
And result of 'strspn()' comes into 'size_t' variable, while I need to work
with 'char *'

Which is why I did +=, and not =.

-Micah
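
For readers following along, here is a minimal sketch of the point about `+=`: strspn() only returns a count, and adding that count to the working pointer leaves the original buffer and its pointer untouched. The helper name `skip_blanks` is made up for illustration and does not appear in the original program.

```c
#include <string.h>

/* Hypothetical helper: return a pointer to the first character of s
 * that is neither a space nor a tab. The buffer itself is not modified. */
const char *skip_blanks(const char *s)
{
    /* strspn() returns a size_t count of leading characters drawn from
     * " \t"; adding that count to the pointer yields a pointer again. */
    return s + strspn(s, " \t");
}
```

Written as a statement on an existing pointer, this is the same `idx += strspn(idx, " \t");` shown above.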
 

Jordan Abel

I agree that it's annoying, but I think it was pretty much necessary.
Pre-ANSI C didn't have "const". Making string literals const would
break code like this:

int print_string(char *s)
{
    return printf("s = \"%s\"\n", s);
}

print_string("hello");

Implementations could still be free to continue to support such code.
The problem is that now the "non-const const" is enshrined in the
standard.
The problem would occur on passing the string literal to a function
expecting a (non-const) char*, not on any attempt to actually write to
it. There would have been no good way to write this program in a
manner compatible with both K&R and ANSI C, though I suppose something
like

If it's K&R code, compile it in a different mode - or with a different
compiler program [cc instead of c89 on unix]. they could have also done
that with implicit int from the start, and maybe with non-prototypes.
 

Roman Mashak

Hello, David!
You wrote on Fri, 24 Feb 2006 00:30:00 +0000 (UTC):

??>>> if (!found) {
??>>> fprintf(fn, buf);
DH> ^^^
DH> Don't do that. Ever.

DH> If the input file happens to contain a '%' character, undefined
DH> behavior ensues.
Hm, I didn't find this issue in the FAQ. What's the more robust and portable
method to put formatted output into a file?
??>>> /* extract token from commas */
??>>> if (*idx++ != '"')
??>>> continue;

DH> This branch loses the input line, which probably isn't what was
DH> intended.
Here it is supposed to set a pointer to the first occurrence of the ' " ' symbol
in the line.

With best regards, Roman Mashak. E-mail: (e-mail address removed)
 

Roman Mashak

Hello, Micah!
You wrote on Thu, 23 Feb 2006 23:07:14 GMT:

[skip]
??>> /* skip spaces inside of token */
??>> for (; *idx != '\0' && (*idx == ' ' || *idx == '\t'); idx++)
??>> ;

MC> This is the same thing as:

MC> idx += strspn(idx, " \t");
It's not exactly the same; the purpose of my code is to keep the original buffer and
work only with a second pointer.
And the result of 'strspn()' goes into a 'size_t' variable, while I need to work
with 'char *'.

With best regards, Roman Mashak. E-mail: (e-mail address removed)
 

Roman Mashak

Hello, Micah!
You wrote on Fri, 24 Feb 2006 02:45:43 GMT:

MC> It's /not/ formatted output: that's the point. If he must use
MC> fprintf(), he should use "%s", followed by buf. But fputs() or puts()
MC> seem like more reasonable facilities.
That's a good point, thanks!

??>>>>> /* extract token from commas */
??>>>>> if (*idx++ != '"')
??>>>>> continue;

MC> You've missed the point. If the line doesn't specify a " after the
MC> keyword "zone ", then it's "not the line you're looking for, move
MC> along." The trouble is, that if it's not the line he's looking for, he
I got you, but the problem is that if I can't locate the ' " ' sign in the line,
I can't look for what's inside the "zone " token.
Maybe printing the invalid string and moving on would be enough:

if (*idx++ != '"') {
    fputs(buf, fn);
    fflush(fn);
    continue;
}

With best regards, Roman Mashak. E-mail: (e-mail address removed)
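
On the `fprintf(fn, buf)` point raised earlier in the thread, a minimal sketch of writing a line verbatim might look like the following. The helper name `write_line` and its return convention are assumptions for illustration only.

```c
#include <stdio.h>

/* Hypothetical helper: write one line of input to fp verbatim.
 * Returns 0 on success, -1 on a write error. */
int write_line(FILE *fp, const char *line)
{
    /* fputs() copies the characters as they are, so a '%' in line is
     * not special. fprintf(fp, "%s", line) would be equivalent, whereas
     * fprintf(fp, line) would treat line itself as a format string. */
    return fputs(line, fp) == EOF ? -1 : 0;
}
```

Checking the return value also catches write errors such as a full disk, which the bare call would silently ignore.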
 

MrG{DRGN}

Micah Cowan said:
-snip-

BTW, strncasecmp() is not a Standard C function (it's POSIX, which is
not on-topic here). The residents on this newsgroup insist that you do
not post code containing functions that are (a) not Standard C
functions and (b) not defined in the provided code. For the sake of
example code on this newsgroup, the easiest way for you to avoid
difficulties is probably to use strncmp() instead. Or, define your own
implementation of strncasecmp()...

Roman, here is a definition for an implementation of strncasecmp I found in
some of my code. However, I didn't write it, and I can't verify whether it
functions exactly like the strncasecmp Micah is talking about, or whether it is
strictly portable.

int strncasecmp (const char *s1, const char *s2, size_t n)
{
    int c1;
    int c2;

    do
    {
        c1 = *s1++;
        c2 = *s2++;

        if (!n--)
            return 0; /* strings are equal until end point */

        if (c1 != c2)
        {
            if (c1 >= 'a' && c1 <= 'z')
                c1 -= ('a' - 'A');
            if (c2 >= 'a' && c2 <= 'z')
                c2 -= ('a' - 'A');
            if (c1 != c2)
                return -1; /* strings not equal */
        }
    } while (c1);

    return 0; /* strings are equal */
}


Hopefully those who know better will be willing to help out.
 

Roman Mashak

Hello, Micah!
You wrote on Fri, 24 Feb 2006 03:00:51 GMT:

??>> It's not exactly same, the purpose of my code it to keep original
??>> buffer and work only with second pointer.

MC> I don't see how that changes with my line above.

??>> And result of 'strspn()' comes into 'size_t' variable, while I need to
??>> work with 'char *'

MC> Which is why I did +=, and not =.
Sorry for my stupidity :) Now I understand what you meant.
Thanks a lot!

With best regards, Roman Mashak. E-mail: (e-mail address removed)
 

Roman Mashak

Hello, Micah!
You wrote on Fri, 24 Feb 2006 02:59:38 GMT:

MC> Well, personally, I'd probably have avoided the separate and
MC> unnecessary buffer sidx[] altogether, and just stored the start and
MC> end of the interesting text.
What's the benefit of this?

MC> char *scur;

MC> Then the loop could be:

MC> for (scur = sidx; *idx != '\0' && *idx != '"'; ++scur, ++idx)
MC> *scur = *idx;

MC> which would at least eliminate the need for i.
But it will introduce one more pointer, *scur.

With best regards, Roman Mashak. E-mail: (e-mail address removed)
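
As a rough illustration of the "store the start and end" idea, the sketch below locates the quoted text in place instead of copying it into a second buffer. The function name `quoted_span` and its interface are hypothetical, not taken from the program under discussion.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given p pointing at an opening '"', store the start
 * of the quoted text in *start and its length in *len.
 * Returns 1 on success, 0 if p is not at a quote or the quote is unclosed. */
int quoted_span(const char *p, const char **start, size_t *len)
{
    const char *end;

    if (*p != '"')
        return 0;
    end = strchr(p + 1, '"');   /* closing quote, if any */
    if (end == NULL)
        return 0;
    *start = p + 1;
    *len = (size_t)(end - (p + 1));
    return 1;
}
```

One possible benefit is that there is no fixed-size destination buffer to overflow; the caller compares `len` characters starting at `start` against the wanted name.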
 

CBFalconer

MrG{DRGN} said:
.... snip ...

Roman here is a definition for an implementation of strncasecmp I
found in some of my code. However, I didn't write it, and I can't
verify if it functions exactly like the strncasecmp Micah is
talking about, or is strictly portable.

int strncasecmp (const char *s1, const char *s2, size_t n)
{
    int c1;
    int c2;

    do
    {
        c1 = *s1++;
        c2 = *s2++;

        if (!n--)
            return 0; /* strings are equal until end point */

        if (c1 != c2)
        {
            if (c1 >= 'a' && c1 <= 'z')
                c1 -= ('a' - 'A');
            if (c2 >= 'a' && c2 <= 'z')
                c2 -= ('a' - 'A');
            if (c1 != c2)
                return -1; /* strings not equal */
        }
    } while (c1);

    return 0; /* strings are equal */
}

Here is what I would do with that to make it portable (but not to
any wide char sets). It may well run faster too.

#include <ctype.h>
int strncasecmp (const unsigned char *s1,
                 const unsigned char *s2,
                 size_t n)
{
    int c1, c2;

    do {
        c1 = *s1++; c2 = *s2++;
        if (!n--)
            return 0; /* strings are equal until end point */
        if (c1 != c2) {
            c1 = toupper(c1); c2 = toupper(c2);
            if (c1 != c2)
                return c1 - c2; /* strings not equal */
        }
    } while (c1);
    return 0; /* strings are equal */
}

This allows for use of non-ansi char sets and returns a value whose
sign describes the relationship for unequal strings (as does
strcmp).

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 

Micah Cowan

Roman Mashak said:
MC> You've missed the point. If the line doesn't specify a " after the
MC> keyword "zone ", then it's "not the line you're looking for, move
MC> along." The trouble is, that if it's not the line he's looking for, he
I got you, but the problem is that if I can't locate the ' " ' sign in the line,
I can't look for what's inside the "zone " token.
Maybe printing the invalid string and moving on would be enough:

if (*idx++ != '"') {
    fputs(buf, fn);
    fflush(fn);
    continue;
}

Well, it's your program, do what you want: just make sure to think
things through enough to know that it really /is/ what you want.

I would probably do something like the above; but then, I'd also
probably take 100-char lines into account, too.

I suppose that, since it's a quick program for your own uses, then in
general you're probably not going to encounter zone lines without
quoted strings, so you don't have to be /too/ careful. IMO, it'd still
be very worthwhile to at least detect and report such conditions, so
you are aware of them if you accidentally encounter one.
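
One way to "detect and report" such lines is to classify each line before acting on it. The sketch below is only an assumption about how that could look: the enum, the function name `classify_line`, and the case-sensitive match on "zone" are all made up for illustration.

```c
#include <string.h>

enum zone_status { NOT_ZONE, ZONE_OK, ZONE_MALFORMED };

/* Hypothetical classifier for one input line:
 *   NOT_ZONE        the line does not start with the keyword "zone"
 *   ZONE_OK         "zone" is followed by a complete "quoted" name
 *   ZONE_MALFORMED  "zone" is present but the quoted name is missing
 *                   or its closing quote is absent */
enum zone_status classify_line(const char *line)
{
    const char *p = line + strspn(line, " \t");   /* skip leading blanks */

    /* p[4] is safe to read: strncmp matched four characters, so p[4]
     * is at worst the terminating '\0'. */
    if (strncmp(p, "zone", 4) != 0 || (p[4] != ' ' && p[4] != '\t'))
        return NOT_ZONE;
    p += 4;
    p += strspn(p, " \t");
    if (*p != '"' || strchr(p + 1, '"') == NULL)
        return ZONE_MALFORMED;
    return ZONE_OK;
}
```

A caller could copy ZONE_MALFORMED lines through unchanged and also print a diagnostic to stderr, so that the unexpected input does not go unnoticed.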
 

Micah Cowan

CBFalconer said:
MrG{DRGN} wrote:
Here is what I would do with that to make it portable (but not to
any wide char sets). It may well run faster too.

#include <ctype.h>
int strncasecmp (const unsigned char *s1,
const unsigned char *s2,
size_t n)

Good idea, but since in the real world strncasecmp() is a standardized
API (just not by the C one...), you can't really change the type of
s1 and s2.
{
    int c1, c2;

    do {
        c1 = *s1++; c2 = *s2++;
        if (!n--)
            return 0; /* strings are equal until end point */
        if (c1 != c2) {
            c1 = toupper(c1); c2 = toupper(c2);
            if (c1 != c2)
                return c1 - c2; /* strings not equal */
        }
    } while (c1);
    return 0; /* strings are equal */
}

I wonder how much time that first "if (c1 != c2)" saves... it seems to
me that the most usual case will be true for that condition, in which
case, I'd probably just go immediately to the toupper() conditions and
then compare.

But I think it would vary greatly depending on the application...
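
A sketch that keeps the `const char *` parameter types while still passing only unsigned char values to the <ctype.h> functions could look like this. The name `my_strncasecmp` is hypothetical, chosen to avoid clashing with the POSIX function, and this version folds case unconditionally rather than testing `c1 != c2` first.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical case-insensitive comparison of at most n characters.
 * Returns <0, 0 or >0, with the sign describing the relationship,
 * as strcmp does. */
int my_strncasecmp(const char *s1, const char *s2, size_t n)
{
    /* Read through unsigned char pointers so that the values handed to
     * tolower() are never negative. */
    const unsigned char *u1 = (const unsigned char *)s1;
    const unsigned char *u2 = (const unsigned char *)s2;

    while (n-- > 0) {            /* check the count before reading */
        int c1 = tolower(*u1++);
        int c2 = tolower(*u2++);

        if (c1 != c2)
            return c1 - c2;
        if (c1 == '\0')          /* both strings ended together */
            break;
    }
    return 0;
}
```

Testing the count before dereferencing also means no character beyond the first n is ever read.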
 

Micah Cowan

MrG{DRGN} said:
Roman here is a definition for an implementation of strncasecmp I found in
some of my code. However, I didn't write it, and I can't verify if it
functions exactly like the strncasecmp Micah is talking about, or is
strictly portable.
int strncasecmp (const char *s1, const char *s2, size_t n)
{
    int c1;
    int c2;

    do
    {
        c1 = *s1++;
        c2 = *s2++;

        if (!n--)
            return 0; /* strings are equal until end point */

        if (c1 != c2)
        {
            if (c1 >= 'a' && c1 <= 'z')
                c1 -= ('a' - 'A');
            if (c2 >= 'a' && c2 <= 'z')
                c2 -= ('a' - 'A');
            if (c1 != c2)
                return -1; /* strings not equal */
        }
    } while (c1);

    return 0; /* strings are equal */
}

I'm sure this worked on the system for which it was written; however,
C does not assume ASCII (nor POSIX), nor that ('a' - 'A') will be the
difference between lowercase and uppercase for all letters in the
alphabet. Chuck's response includes a more portable version.
 

Flash Gordon

Micah said:
Pedro Graca said:
Hmmmm ???

What's the type of a literal string in code?
I thought it was `const char *'

I wish the const were there....

But no. A string literal has type char[].
Agreed.

Since you posted some very helpful test code, I'll explain what you're
seeing...
#include <stdio.h>
#include <string.h>

int main(void) {
int a, b, c, d, e, f;
char * z1 = "forty two";
char z2[] = "forty two";

These two don't involve objects generated, as they're initializers,
and not assignments.

Actually, the first one *does* involve an object. It initialises z1 to
point to an anonymous char array containing the string literal.
Not much need to comment here.


This "forty two" requires that the generated program have at least one
character array with a length of 10, and that is the result given by
the string literal, so sizeof will return 10.

This one does not require an object in the generated program. sizeof is
an operator that returns the size of its operand, not a function, so it
is simple (and I think normal practice) for the compiler to count the
length of the string literal and insert the size that the anonymous
array would be if it bothered to create it.
The size of all three of these will be implementation-dependent, but
will certainly be the same, as they all test the same type.

const char * is not the same type as char *, but I believe they are
required to have the same representation.
And, as you've discovered, this should also give 10, for the
explicitly declared array of 10 characters.
Agreed.


Hey, doubt away! I'm certainly human, and have certainly said my share
of stupid things on this list, both formerly and recently. That's why
posting to comp.lang.c is a great way to learn the ins and outs of the
C language: if you say something that reveals a misperception of
reality, you will quickly be corrected. :)

Yes, and I think I've spotted a couple this time ;-)
 

Micah Cowan

Flash Gordon said:
Micah said:
Since you posted some very helpful test code, I'll explain what you're
seeing...
#include <stdio.h>
#include <string.h>
int main(void) {
int a, b, c, d, e, f;
char * z1 = "forty two";
char z2[] = "forty two";
These two don't involve objects generated, as they're initializers,
and not assignments.

Actually, the first one *does* involve an object. It initialises z1 to
point to an anonymous char array containing the string literal.

Well, yes. What I meant was that the string literals themselves don't
generate objects, only the declarations.
This one does not require an object in the generated program. sizeof
is an operator that returns the size of its operand, not a function,
so it is simple (and I think normal practice) for the compiler to
count the length of the string literal and insert the size that the
anonymous array would be if it bothered to create it.

Well, yes. But from the standpoint of the Standard, the object /does/
exist; the implementation must simply behave "as if" the object were
created. While such optimizations are likely, nevertheless from my
preferred POV, and as far as any conforming program knows, the object
exists.
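
The sizes being discussed can be checked with a small sketch like the one below. The function names are made up for illustration; the string is the same "forty two" as in the test program quoted above.

```c
#include <string.h>

/* Same text as z2 in the test program quoted above. */
static const char z2[] = "forty two";

/* sizeof applied directly to the literal: 9 characters plus the '\0'. */
size_t literal_size(void)
{
    return sizeof "forty two";
}

/* sizeof applied to an array initialized from the literal. */
size_t array_size(void)
{
    return sizeof z2;
}

/* strlen() counts up to, but not including, the terminating '\0'. */
size_t string_length(void)
{
    return strlen(z2);
}

/* Nonzero if char * and const char * have the same size. */
int pointer_sizes_match(void)
{
    return sizeof(char *) == sizeof(const char *);
}
```

Whether the compiler actually emits an object for the literal inside sizeof cannot be observed from code like this, which is the "as if" point made above.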
 

Dave Thompson

If you decide to follow my advice, this would become:

idx += (sizeof zone_array) - 1;

Actually, my own preference has been to define a macro such as:

#define ARY_STRLEN(a) (sizeof(a)-1)

which leads to a slightly better expression, in terms of
self-documentation:

idx += ARY_STRLEN(zone_array) - 1;
s/- 1// since you did it within the macro.


- David.Thompson1 at worldnet.att.net
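
With the correction applied, the macro can be used as in this minimal sketch. The definition of `zone_array` and the helper name `skip_zone_keyword` are assumptions for illustration; the thread does not show how `zone_array` was declared.

```c
#include <stddef.h>

/* Length of the string held in a char array, excluding the '\0'.
 * Only meaningful for true arrays; applied to a pointer it would yield
 * the pointer's size minus one instead. */
#define ARY_STRLEN(a) (sizeof(a) - 1)

/* Assumed definition, for illustration only. */
static const char zone_array[] = "zone";

/* Hypothetical helper: step past the keyword. The macro already
 * subtracts 1, so there is no further "- 1" at the call site. */
const char *skip_zone_keyword(const char *idx)
{
    return idx + ARY_STRLEN(zone_array);
}
```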
 

Dave Thompson

<OT> If you just need the function implemented, on any system where
you would have bind, this could be a one-liner in sed, awk, or perl in
awk mode something like:
awk -vqzone=\"blah\" \
'$1=="zone"{d=$2==qzone} !d{print} $1=="}"{d=0}'
and if the efficiency of this actually matters you're in trouble
anyway. OTOH if this is an exercise or opportunity to work on C
No need for + (update).
As noted elsethread this could be strspn. If not, the check for
null/zero is unnecessary; any character that is either space or tab is
necessarily not zero/null. Again below, snipped.
strlen("zone"), not sizeof().
sizeof lit -1 is fine and arguably better, as noted elsethread.
Not commas, quotes.
/* put token into buffer to compare */
while (*idx != '\0' && *idx != '"')
sidx[i++] = *idx++;
Could use strchr for the scan, and ...
sidx[i] = '\0'; i = 0;
if (strcasecmp(sidx, argv[1]) != 0) {


Instead of copying, could (again) strncasecmp in place.

Unless bind allows escapes here (which I don't recall and can't be
arsed to check as it's offtopic) and you need to support that.
Although, you could just require that the user do the matching
escaping on the supplied argument. Or escape it once, at your program
startup, before doing the file scan -- but not in place; you _can_
modify argv[*] strings but not extend them. (And strictly you cannot
modify the argv[*] _pointers_, though you can argv itself.)


- David.Thompson1 at worldnet.att.net
 
