Alternative libraries for C that are efficient

BartC

John Reye said:
Hi,

recently I learned that various "standard C library" functions have
deficiencies in terms of high performance.
Examples include the return values of fgets(), strcpy(), strcat()
(thanks Eric Sosman for mentioning the last 2)

Example: char * strcat ( char * destination, const char *
source );
Return Value: destination is returned.

How useless is that! I already know the destination in the first place.
Why not return a pointer to the end of the concatenated string, or the
size of the string? This would cost the library no extra performance
whatsoever!


Are these deficiencies _only_ string-related?

It seems that way. But no-one is bothered about it, because writing
alternative versions (thin wrappers around standard functions) is trivial:

/* Optional-length wrappers: pass each string's length if you already know
   it, or -1 to have it computed. Both return the length of the result. */
#include <string.h>   /* strlen, memcpy */

int strcat_l(char *s, char *t, int slen, int tlen){
    if (slen < 0) slen = strlen(s);
    if (tlen < 0) tlen = strlen(t);
    memcpy(s + slen, t, tlen + 1);   /* +1 copies the terminating '\0' */
    return slen + tlen;
}

int strcpy_l(char *s, char *t, int tlen){
    if (tlen < 0) tlen = strlen(t);
    memcpy(s, t, tlen + 1);
    return tlen;
}

Etc.

(In this style that I sometimes use, you can optionally supply a string
length if you know it, otherwise -1 is passed. It's not hard to make them
faster than the standard functions. (Except when the lengths aren't known;
then it would probably be better for these to just call the standard
versions, instead of messing with strlen(). But I tend to use this form for
other kinds of string processing where a standard version doesn't exist.))
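For what it's worth, a minimal usage sketch of the two wrappers above (the
buffer size and strings are just made-up examples):

#include <stdio.h>
#include <string.h>

/* strcat_l() and strcpy_l() as defined above */

int main(void) {
    char buf[200];
    int len;

    len = strcpy_l(buf, "Hello ", -1);     /* length not known: pass -1 */
    len = strcat_l(buf, "world", len, 5);  /* lengths known: just memcpy */
    printf("%s (%d chars)\n", buf, len);   /* prints: Hello world (11 chars) */
    return 0;
}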
 
Malcolm McLean

On Thursday, 12 April 2012 19:24:30 UTC+1, John Reye wrote:
Well-implemented interfaces will "almost automatically" make you
write good code.
But badly designed interfaces will make you call strlen() and all kinds
of other functions that you wouldn't have to call if the
interfaces had been designed in a better manner.
There's often a tension between performance and ease of use, however.

For instance, my options parser allows the programmer to take in a string, typically a file name, from the command line. Command-line arguments are modifiable, so the OS has almost certainly copied the data from the console input into a temporary buffer. Then the options parser takes a local copy of the command-line arguments. Then it makes another copy when the caller requests the string. So we end up, probably in main, with a char pointer we could have had with a simple ptr = argv[index]; However, I made the judgement that it's easier that way. Then you don't have to document that argv must not be modified, or that if the programmer requests two copies of the filename (unlikely) they are aliases. And how much is all this shuffling about of data in memory likely to impact performance?
 
James Kuyper

It seems that way. But no-one is bothered about it, because writing
alternative versions (thin wrappers around standard functions) is trivial: ...
int strcpy_l(char *s,char *t,int tlen){
if (tlen<0) tlen=strlen(t);
memcpy(s,t,tlen+1);
return tlen;
}

Your wrapper version passes through the input array twice: once for
strlen() and once for memcpy(). That's precisely the deficiency he wants
to avoid. A minor change to the implementation of strcpy() would allow
it to return the desired pointer at no extra cost - the cost for your
wrapper is fairly high.
 
tom st denis

I'd rather use
strcat(foo, bar);
puts(foo);

and have a well-designed interface of strcat, to return something
useful.

Nothing stopping you from writing portable replacement functions. No
need to bitch on USENET about it.

Tom
 
Eric Sosman

Hi,

recently I learned that various "standard C library" functions have
deficiencies in terms of high performance.

Recently I learned that you have a bee in your bonnet.

No, hold it, calm down, rewind: I'm not saying it's your
fault the bee is there; it may have flown in of its own accord.
But your best bet is to shake it out of your hat and settle down
before your blood pressure gets any higher.

Despite its infelicities, I see little in the C library design
that forces implementations or callers to perform poorly. Certainly,
there are some interfaces that could, if revised, admit of higher-
performing implementations. For example, malloc() is a fairly "narrow"
interface, and one can imagine a super_malloc() to which one could
pass information about the expected lifetime of the allocation, its
affinity or disaffinity to other allocations, whether it's more likely
to be used by the GPU or the DMA chip, and so on. But one can also see
that such a super_malloc() could be rather messy to use:

char *ptr = malloc(42);

vs.

super_malloc_options opt = { 0 };
opt.request_size = 42;
opt.growth_allowance = 42 * 6;
opt.lifetime_hint = SUPER_MALLOC_LIFETIME_BRIEF;
opt.cache_choice = SUPER_MALLOC_CACHE_AVOID(&blivet);
#if __STDC_VERSION__ > 201306L
opt.address_randomization =
#ifdef __WINDOWS__
__MS_RANDOMIZE_IF_ABLE__
#else
SUPER_MALLOC_RANDOMIZE_IF_ABLE
#endif
;
#endif
char *ptr = super_malloc(&opt);

There is no doubt that an interface along the lines of the latter
could outperform the familiar malloc() in at least some situations.
But which interface would you, as a code writer, prefer to use?
Which would you expect to require more debugging time?

I reiterate my earlier question about your keyboard: Are you
still using nineteenth-century QWERTY? I'm not saying that you
are wrong (or right) to do so, just asking you to examine the
reasons for your choice, and to reflect on those same reasons as
they may apply to other standardized interfaces.
Examples include the return values of fgets(), strcpy(), strcat()
(thanks Eric Sosman for mentioning the last 2)

You're welcome, I am sure. But don't read so much into it: In
hindsight we may wish that some choices had been made differently,
but that's no reason to go ballistic.
Surely there must be a good library somewhere, or do all C programmers
really carry their own routines with them?

That's a complicated question I'm too weary to address just now.
Ponder it for yourself, and you may discover some useful answers, or
if not answers, insights.

And f'gawd'sake relax!
 
BartC

James Kuyper said:
Your wrapper version passes through the input array twice: once for
strlen() and once for memcpy().

Yes, I mentioned my versions work better when the caller knows the lengths.

That's precisely the deficiency he wants
to avoid. A minor change to the implementation of strcpy() would allow
it to return the desired pointer at no extra cost - the cost for your
wrapper is fairly high.

My point was that wrappers like this can be trivial to write: that
length-based strcpy_l() function took only a minute or two. And instantly it
can already be faster than the standard function.

Being able to deal also with cases where the lengths are not known, is an
extra feature provided for convenience; it can be a bit slower, but it also
*returns* the length of the string, so it might save the caller having to
call strlen!

The wrapper functions can have other interfaces too.
 
Guest

James Kuyper wrote:
) On 04/12/2012 02:32 PM, John Reye wrote:

)>> 1. Make it work.
)>> 2. Make it pretty. (well designed API, documentation)
)>> 3. Make it robust. (handle errors gracefully)
)>> 4. Make it fast. (if you really need to)
)
) "Make it work" is always first -

tho' there are degrees of "work". A fast but inaccurate answer may be better than a "perfectly correct" answer. Sometimes "in good time" is an essential part of the requirement. Your car's anti-lock brakes can't spend hours or minutes deciding the best time to put the brakes on (or take them off).
) if you don't need the program to work
) correctly, you don't need to write a new program; there's plenty of
) existing programs that already don't do whatever it is that the new
) program was supposed to do.

That's the kind of thinking that brought us 2-megabyte XML blobs
as "database entities". Some stuff you just can't refactor when
it turns out it's a performance killer, so you *have* to account
for that from the beginning. Not to mention 'if it works, ship'

"don't prematurely optimise" is not an excuse to "prematurely pessimize". Besides if it's been properly designed (see Rule 2) the XML blob will be hidden behind an abstraction and a real database be shoved in with little trouble (I'm aware swopping databases isn't quite as easy as this- but it should be!).

There are other things that can't be added on afterwards, e.g. security.
 
Guest

If you're designing a programming language, and working with an interpreted
implementation for the time being, is it a case of premature optimization to be
concerned with whether the language features will be suitable for compilation?

The Extreme Programmers would say so. Their form of Future Blindness precludes thinking of anything beyond passing the current test case. Doing so is Big Design Up Front (BDUF).
 
James Kuyper

Yes, I mentioned my versions work better when the caller knows the lengths.

Which I consider a very minor advantage; if the length is known, just
call memcpy() directly. If it's not known, your version is slower than
strcpy(), and no faster than strcpy() followed by strlen().
That's precisely the deficiency he wants

My point was that wrappers like this can be trivial to write: that
length-based strcpy_l() function took only a minute or two. And instantly it
can already be faster than the standard function.

With only a little extra work, you can write your own strcpy_l() that
doesn't call standard library functions, passes through the array only
once, and is likely (with a decent compiler) to compile into code with
speed comparable to that of strcpy() itself, but with the added
advantage of returning a value which contains more useful information
than that returned by strcpy(). I wouldn't bother writing such a
routine, because it's trivial to inline the code manually; but it would
have been slightly more convenient if strcpy() had been defined to work
that way in the first place. It's not a sufficiently big issue to
justify John Reye's overreaction, but it is a real one.
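The single-pass routine I mean would be something along these lines (just a
sketch; the name strcpy_len is mine, not anything standard):

#include <stddef.h>

/* Copy src into dst in a single pass, returning the length copied
   (excluding the terminating '\0'). */
size_t strcpy_len(char *dst, const char *src)
{
    size_t n = 0;
    while ((dst[n] = src[n]) != '\0')
        n++;
    return n;
}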
 
Willem

(e-mail address removed) wrote:
) On Thursday, April 12, 2012 7:51:36 PM UTC+1, Willem wrote:
)> That's the kind of thinking that brought us 2-megabyte XML blobs
)> as "database entities". Some stuff you just can't refactor when
)> it turns out it's a performance killer, so you *have* to account
)> for that from the beginning. Not to mention 'if it works, ship'
)
) "don't prematurely optimise" is not an excuse to "prematurely pessimize".

I understand 'pessimize' to be to deliberately make something pessimal.
What I guess you're trying to say is something like: "Do prematurely try
to get in the right ballpark for optimality." But that's just a guess.

) Besides if it's been properly designed (see Rule 2) the XML blob will be
) hidden behind an abstraction and a real database be shoved in with little
) trouble (I'm aware swopping databases isn't quite as easy as this- but it
) should be!).

Uh yeah, no. You see, the 'abstraction' is the source of the problem.
Especially if it's an object oriented abstraction. OO and databases
really do not mix.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Quentin Pope

Despite its infelicities, I see little in the C library design
that forces implementations or callers to perform poorly. Certainly,
there are some interfaces that could, if revised, admit of higher-
performing implementations. For example, malloc() is a fairly "narrow"
interface, and one can imagine a super_malloc() to which one could pass
information about the expected lifetime of the allocation, its affinity
or disaffinity to other allocations, whether it's more likely to be used
by the GPU or the DMA chip, and so on. But one can also see that such a
super_malloc() could be rather messy to use:

char *ptr = malloc(42);

vs.

super_malloc_options opt = { 0 };
opt.request_size = 42;
opt.growth_allowance = 42 * 6;
opt.lifetime_hint = SUPER_MALLOC_LIFETIME_BRIEF;
opt.cache_choice = SUPER_MALLOC_CACHE_AVOID(&blivet);
#if __STDC_VERSION__ > 201306L
opt.address_randomization =
#ifdef __WINDOWS__
__MS_RANDOMIZE_IF_ABLE__
#else
SUPER_MALLOC_RANDOMIZE_IF_ABLE
#endif
;
#endif
char *ptr = super_malloc(&opt);

There is no doubt that an interface along the lines of the latter could
outperform the familiar malloc() in at least some situations. But which
interface would you, as a code writer, prefer to use? Which would you
expect to require more debugging time?

Come on Eric, this is a straw man argument. Of course it is possible to
get the best of both worlds.

For example, in the Pthreads API, the pthread_create() function takes a
pthread_attr_t * argument, but you just need to provide NULL to get a
sensible set of defaults.
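For instance (error checking omitted, and worker is just a stand-in thread
function):

#include <pthread.h>
#include <stddef.h>

extern void *worker(void *arg);

void spawn(void)
{
    pthread_t tid;

    /* Simple case: NULL attributes give sensible defaults. */
    pthread_create(&tid, NULL, worker, NULL);

    /* Tuned case: fill in an attribute object only when you need one. */
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 1 << 20);   /* e.g. a 1 MiB stack */
    pthread_create(&tid, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
}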
 
BartC

James Kuyper said:
On 04/13/2012 06:38 AM, BartC wrote:

Which I consider a very minor advantage; if the length is known, just
call memcpy() directly. If it's not known, your version is slower than
strcpy(), and no faster than strcpy() followed by strlen().

Then you lose all the abstraction that strcpy() etc provide; you have to
remember to copy the extra char in the case of strcpy().

And my main example was strcat(); the strcpy() was needed to initialise the
destination in each iteration of the test loop. You don't really want to
start inlining functions such as strcat() because the code will obscure
whatever it is you're really trying to do.

Don't forget these _l functions also return the length of the resulting
string. I altered the two test loops (one using _l functions, one using
standard functions) so that the length of the resulting string was needed.
(In the _l loop, I also used the result of strcpy_l() as one of the length
parameters to strcat_l().)

That way I managed to outperform four x86 C compilers at their highest
optimisation settings, *even without knowing* the lengths of the two strings
at the start of each iteration! (OK, on a very specific test on two strings
of 64 and 80 chars.)

I did another test using a strcmp_l(); this can return an instant result
when lengths are unequal, instead of having to scan the actual strings. OK,
you will argue this could also be done inline, but you don't really want to
do that; it's the compiler's job to inline, not yours.
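It's roughly this shape (when the lengths differ, the result only says "not
equal"; it doesn't tell you which string sorts first):

#include <string.h>

int strcmp_l(const char *s, const char *t, int slen, int tlen)
{
    if (slen < 0) slen = strlen(s);
    if (tlen < 0) tlen = strlen(t);
    if (slen != tlen) return 1;      /* instant result: lengths differ */
    return memcmp(s, t, slen);       /* 0 when the strings are equal */
}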
 
James Kuyper

Well, "fairly high" is relative. What is the code going to do with the
copy? Perhaps that is orders of magnitude longer than the call to strlen().

Agreed - I don't like wasting anything, but the amount of waste involved
here is not sufficient to justify John Reye's description of these
routines as "useless".
Note, too, that it may be more efficient to call strlen() and then memcpy()
than it is to copy byte-by-byte as you look for the nul-terminator. (As I
recall, the x86 series of CPU has a single opcode which basically does
strlen, and memcpy can copy 4 [or even 8] bytes at a time for everything
other than the endpoints.)

That's too platform-specific an issue for my tastes; some other platform
might have a single opcode for copying while looking for a null
terminator. As long as I write clean simple code that does precisely
what needs to be done, and no more than what needs to be done, I feel
I've done my part of the job; choosing the right op-codes to implement
it is the compiler's part.

If a two-pass implementation turns out to be faster than the one-pass
solution I wrote, I can hope that a sufficiently sophisticated compiler
will convert my one-pass C code into machine code equivalent to a
two-pass implementation. I'm not going to lose any sleep over the
possibility that it won't. If I need more speed, and can afford to lose
portability, I'll use assembly language rather than trying to figure out
how to trick a particular C compiler into generating the assembly
language I want it to generate.

John would probably be more interested in such issues than I am.
 
Jens Gustedt

Hi,

On 04/12/2012 08:22 PM, John Reye wrote:
What's wrong with:

char * s;
if ((s = (char *)malloc(100)) == NULL) {
fprintf(stderr, "Out of memory\n");
exit(1);
}
strcpy(s, "Hello ");
strcat(s, "world");

Are you trying to tell me that yours will be more efficient?
Methinks not.

Mine is an expression, so it can easily be packed in a macro, in the
initializer of a `for`-loop variable, whatever.
or

char const* t = strcat((char[100]){ "Hello " }, s);
char *t = (char *)((char[100]){ "Hello " });
strcat(t, s);

In yours the compound literal doesn't make much sense, since you give away
the unprotected pointer to it, so it would be better written

char t[100] = { "Hello " };
strcat(t, s);

Mine has the const qualifier on the pointed-to object. How would you
achieve this?
Sorry, I don't follow. Could you explain it (perhaps with an example).

If you'd like to use such things in macros (I do) you only want to
evaluate each argument once, because of possible side effects.

#define STRCAT3(X, Y, Z) strcat(strcat((X), (Y)), (Z))

in P99 I have

#define P99_STRCATS(TARG, ...)

that lifts that idea to (almost) any fixed number of arguments.

(Well I am cheating a bit, that one uses stpcpy under the hood to be
more efficient. But the idea is the same.)
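Roughly, the stpcpy trick looks like this (the function name is just for
illustration; this is not the actual P99 code). Each stpcpy returns a
pointer to the terminating null it wrote, so the next copy starts where the
previous one ended instead of rescanning from the front:

#define _POSIX_C_SOURCE 200809L   /* for stpcpy (POSIX; also in C23) */
#include <string.h>

char *cat3(char *dst, const char *a, const char *b, const char *c)
{
    char *p = dst;
    p = stpcpy(p, a);
    p = stpcpy(p, b);
    p = stpcpy(p, c);
    return dst;
}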

Jens
 
Jens Gustedt

On 04/12/2012 06:01 PM, John Reye wrote:
Example: char * strcat ( char * destination, const char *
source );
Return Value: destination is returned.

How useless is that! I already know the destination in the first place.
Why not return a pointer to the end of the concatenated string, or the
size of the string? This would cost the library no extra performance
whatsoever!

ah, perhaps I found a library that would fit your needs better. How
about POSIX' memccpy?

(((char*)memccpy(destination, source, 0, SIZE_MAX))-1)

should do what you want, I think.

(You'd have to be sure that source is 0-terminated for this to work,
but that you'd have to do for strcat, too)
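Wrapped up as a strcat-flavoured helper it might look like this (the name
strcat_end is made up for the example; only memccpy itself is POSIX):

#define _POSIX_C_SOURCE 200809L   /* for memccpy */
#include <stdint.h>   /* SIZE_MAX */
#include <string.h>   /* strlen, memccpy */

/* Append source to destination, returning a pointer to the new
   terminating '\0' of the result. */
char *strcat_end(char *destination, const char *source)
{
    char *end = memccpy(destination + strlen(destination),
                        source, 0, SIZE_MAX);
    return end - 1;   /* memccpy returns one past the copied '\0' */
}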

Jens
 
Malcolm McLean

On Friday, 13 April 2012 19:48:30 UTC+1, James Kuyper wrote:
If a two-pass implementation turns out to be faster than the one-pass
solution I wrote, I can hope that a sufficiently sophisticated compiler
will convert my one-pass C code into machine code equivalent to a
two-pass implementation. I'm not going to lose any sleep over the
possibility that it won't. If I need more speed, and can afford to lose
portability, I'll use assembly language rather than trying to figure out
how to trick a particular C compiler into generating the assembly
language I want it to generate.
You look at the bottleneck, which is unlikely to be the string concatenation, but might be. You know that.
But it's better to remove the bottleneck in C rather than assembly if you can. That way it's more likely that a maintainer can read your code. And you can port it much more easily. On the machine you port to, the optimal two-pass solution might well be pessimal, but in ten years it might be a 20 GHz job available for five dollars, so no-one will care.
 
Phil Carmody

Kenneth Brody said:
On 4/12/2012 2:51 PM, Willem wrote:
[...]
That's the kind of thinking that brought us 2-megabyte XML blobs
as "database entities". Some stuff you just can't refactor when
it turns out it's a performance killer, so you *have* to account
for that from the beginning. Not to mention 'if it works, ship'

I thought it was "it compiles without errors... ship it." :)

You've obviously never met our suppliers...

Phil
--
I'd argue that there is much evidence for the existence of a God.
Pics or it didn't happen.
-- Tom (/. uid 822)
 
Phil Carmody

Jens Gustedt said:
Hi,

On 04/12/2012 08:22 PM, John Reye wrote:
gag.
What's wrong with:

char * s;
if ((s = (char *)malloc(100)) == NULL) {
fprintf(stderr, "Out of memory\n");
exit(1);
}
strcpy(s, "Hello ");
strcat(s, "world");

Are you trying to tell me that yours will be more efficient?
Methinks not.

Mine is an expression, so it can easily be packed in a macro, in the
initializer of a `for`-loop variable, whatever.
or

char const* t = strcat((char[100]){ "Hello " }, s);
char *t = (char *)((char[100]){ "Hello " });
strcat(t, s);

In yours the compound literal doesn't make much sense, since you give away
the unprotected pointer to it, so it would be better written

char t[100] = { "Hello " };
strcat(t, s);

Mine has the const qualifier on the pointed-to object. How would you
achieve this?
Sorry, I don't follow. Could you explain it (perhaps with an example).

If you'd like to use such things in macros (I do) you only want to
evaluate each argument once, because of possible side effects.

#define STRCAT3(X, Y, Z) strcat(strcat((X), (Y)), (Z))
retch.

in P99 I have

#define P99_STRCATS(TARG, ...)

that lifts that idea to (almost) any fixed number of arguments.

(Well I am cheating a bit, that one uses stpcpy under the hood to be
more efficient. But the idea is the same.)

How can you use the word "efficient" so close to code that,
from an efficiency viewpoint, is so obviously inefficient?
Do you have an efficient bubblesort too in your library?

If you want to concatenate 3 strings, using sprintf should be
way more efficient than the above nonsense.
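Something like this, that is (a sketch only), which also hands you the
length of the result for free via sprintf's return value:

#include <stdio.h>

int cat3_sprintf(char *dst, const char *a, const char *b, const char *c)
{
    /* sprintf returns the number of characters written, excluding '\0' */
    return sprintf(dst, "%s%s%s", a, b, c);
}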

Phil
--
I'd argue that there is much evidence for the existence of a God.
Pics or it didn't happen.
-- Tom (/. uid 822)
 
