An exercise in fread optimisation

K

Khookie

Hi everyone

BTW, thanks to everyone in this group, your collective advice has been
very helpful. I have to say, the C guys are definitely much nicer
than the Lisp guys ;-).

Tonight, I was thinking about freads, and how to get them faster. I
initially wrote a program like this:

-- START CODE --
#include <stdio.h>
#include <time.h>

#define BUFSIZE 32768
#define CHAR_SIZE 1

int main(int argc, char **argv) {
clock_t start = clock();

char buf[BUFSIZE + 1];
memset(buf, '\0', BUFSIZE + 1);

FILE *file = fopen("stuff.txt", "rb");

while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {
//printf("%s", buf);
memset(buf, '\0', BUFSIZE);
}
//printf("%s", buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);
}
-- END CODE --

Average CPU time: 140-156

BTW, stuff.txt is a 200MB binary file of randomness.

I looked at the memsets, and thought - this could maybe be faster.

-- START CODE --
#include <stdio.h>
#include <time.h>

#define BUFSIZE 32768
#define CHAR_SIZE 1

int main(int argc, char **argv) {
clock_t start = clock();

char buf[BUFSIZE + 1];
buf[BUFSIZE] = '\0';

FILE *file = fopen("stuff.txt", "rb");

// Prepare
fseek(file, 0 , SEEK_END);
int size = ftell(file);
int iterations = size / BUFSIZE;
int remaining = size % BUFSIZE;
rewind(file);

/*
printf("Size: %d\n", size);
printf("Iterations: %d\n", iterations);
printf("Remainder: %d\n", remaining);
*/

// Iterate
int i;
for (i = 0; i < iterations; i++) {
fread(buf, BUFSIZE, CHAR_SIZE, file);
//printf("%s", buf);
}

fread(buf, BUFSIZE, CHAR_SIZE, file);
buf[remaining] = '\0';
//printf("%s", buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);
}
-- END CODE --

Average CPU time: 125

What does everyone think? Could it be better?

Hope this was useful to someone.

Chris
 
M

Mark Bluemel

Khookie said:
Hi everyone

BTW, thanks to everyone in this group, your collective advice has been
very helpful. I have to say, the C guys are definitely much nicer
than the Lisp guys ;-).

Tonight, I was thinking about freads, and how to get them faster. I
initially wrote a program like this:

-- START CODE --
#include <stdio.h>
#include <time.h>

#define BUFSIZE 32768
#define CHAR_SIZE 1

int main(int argc, char **argv) {
clock_t start = clock();

char buf[BUFSIZE + 1];
memset(buf, '\0', BUFSIZE + 1);

FILE *file = fopen("stuff.txt", "rb");

while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {

That's a perverse way of expressing it, IMHO... You are specifying you
want one item (CHAR_SIZE is used where a count is expected) of size
BUFSIZE. I'm not sure whether fread is guaranteed to produce consistent
data when the file is a fractional multiple of BUFSIZE bytes long.
//printf("%s", buf);
memset(buf, '\0', BUFSIZE);
} [snip]
BTW, stuff.txt is a 200MB binary file of randomness.

I looked at the memsets, and thought - this could maybe be faster.

Why are you memset()ing at all?

If it's truly random, then it could contain a '\0' anywhere, surely?

So there's no point in trying to use '\0' as a terminator.

fread() will tell you how many items it read. If you used
fread(buf,CHAR_SIZE,BUFSIZE,file)
you would get a useful return code (the number of bytes read) from it.
 
U

user923005

#define BUFSIZE 32768
#define CHAR_SIZE 1

int main(int argc, char **argv) {
clock_t start = clock();

char buf[BUFSIZE + 1];

This is useless:
/*
memset(buf, '\0', BUFSIZE + 1); */


FILE *file = fopen("stuff.txt", "rb");

/* Create a 16K I/O buffer: */
setvbuf ( file , NULL , _IOFBF , 1024*16 );

/* We should check the return of setvbuf, as well as the file pointer
itself, of course. */
/* I will leave that to you. */
while (fread(buf, BUFSIZE, CHAR_SIZE, file)) {
//printf("%s", buf);
memset(buf, '\0', BUFSIZE);
}
//printf("%s", buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);}
[snip]
What does everyone think? Could it be better?

1. The memset() calls are totally pointless. You set the data buffer
to zero and then set it to the wanted value via reads. This is not
different than just setting the value via reads.
2. If you want to read faster, then enlarge the read buffer via
setvbuff(). It makes a bigger difference for writing rather than
reading, but it should reduce the total number of reads.
 
W

William Pursell

Hi everyone

BTW, thanks to everyone in this group, your collective advice has been
very helpful. I have to say, the C guys are definitely much nicer
than the Lisp guys ;-).

Tonight, I was thinking about freads, and how to get them faster. I
initially wrote a program like this:

-- START CODE --
#include <stdio.h>
#include <time.h>

#define BUFSIZE 32768
#define CHAR_SIZE 1

BUFSIZ is defined in stdio.h. It is selected (in part)
as the size that is most efficient for I/O on the
implementation. Unless you have a really good
reason not to, you should probably use it instead
of randomly selecting your own value.
 
K

Khookie

BUFSIZ is defined in stdio.h. It is selected (in part)
as the size that is most efficient for I/O on the
implementation. Unless you have a really good
reason not to, you should probably use it instead
of randomly selecting your own value.

whoops - thanks everyone for pointing mistakes out & giving me
suggestions... will definitely fix it

Chris
 
K

Khookie

whoops - thanks everyone for pointing mistakes out & giving me
suggestions... will definitely fix it

Chris

Here is a hopefully more sane version.

#include <stdio.h>
#include <time.h>

#define BUFSIZE BUFSIZ

int main() {
clock_t start = clock();

FILE* file = fopen("stuff.txt", "rb");

char buf[BUFSIZE + 1];
buf[BUFSIZE] = '\0';

int length;
while ((length = fread(buf, 1, BUFSIZE, file)) == BUFSIZE) {
printf(buf);
}
buf[length] = '\0';
printf(buf);

fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);
}
 
K

Keith Thompson

Khookie said:
Here is a hopefully more sane version.

#include <stdio.h>
#include <time.h>

#define BUFSIZE BUFSIZ

That's a bit silly. Why not use BUFSIZ directly?
int main() {

Better: int main(void)
clock_t start = clock();

FILE* file = fopen("stuff.txt", "rb");

You open the file (in binary mode), but you don't check whether the
open succeeded. (It seems odd to open a file named "stuff.txt" in
binary mode, but it's not necessarily wrong.)
char buf[BUFSIZE + 1];
buf[BUFSIZE] = '\0';

int length;
while ((length = fread(buf, 1, BUFSIZE, file)) == BUFSIZE) {

fread() returns a size_t result. Why not assign that result to a
size_t variable?
printf(buf);

This is dangerous. You said before that the input file contains
random data (though that's not very specific; it could be random
bytes, random printable characters, random letters, or random
Shakespeare quotations). You pass buf as the format string to printf.
If buf happens to contain a '%' character, it will likely be
interpreted as a format specifier. Kaboom.

And if "stuff.txt" contains random binary data, it likely contains
null bytes ('\0'), which will terminate the string early.

In other words, you're reading data as if it were binary, and writing
it as if it were text (with no '%' characters). Be consistent.
}
buf[length] = '\0';
printf(buf);

See above.
fclose(file);

clock_t finish = clock();
printf("Total CPU time: %d\n", finish - start);

"%d" expects an argument of type int. ``finish - start'' is of type
clock_t. And since the value returned by clock() is scaled by
CLOCKS_PER_SEC, the value printed may not mean much anyway.

I'd add a "return 0;" here.

You mix declarations and statements within a block. This is a
C99-specific feature, not supported by all compilers.
 
K

Keith Thompson

Richard Heathfield said:
[Yo, Keith. Just a little nit, here.]

Keith Thompson said:
That's a bit silly.

No, it isn't.
Why not use BUFSIZ directly?

Because he might want to change it later.

Yes, good point. But in that case, I'd really want to use a name
other than BUFSIZE, something easier to distinguish from BUFSIZ.
 
K

Khookie

Richard Heathfield said:
[Yo, Keith. Just a little nit, here.]
Keith Thompson said:
No, it isn't.
Because he might want to change it later.

Yes, good point. But in that case, I'd really want to use a name
other than BUFSIZE, something easier to distinguish from BUFSIZ.

--
Keith Thompson (The_Other_Keith) <[email protected]>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Apologies - the above code is meant for a sockets application, hence
"#define BUFSIZE BUFSIZ" might look odd, especially in the context of
printf. I'm still trying to determine what makes an optimal buffer
size for sending data via sockets.

Chris
 
U

user923005

Richard Heathfield said:
[Yo, Keith. Just a little nit, here.]
Keith Thompson said:
#define BUFSIZE BUFSIZ
That's a bit silly.
No, it isn't.
Why not use BUFSIZ directly?
Because he might want to change it later.
Yes, good point. But in that case, I'd really want to use a name
other than BUFSIZE, something easier to distinguish from BUFSIZ.
--
Keith Thompson (The_Other_Keith) <[email protected]>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Apologies - the above code is meant for a sockets application, hence
"#define BUFSIZE BUFSIZ" might look odd, especially in the context of
printf. I'm still trying to determine what makes an optimal buffer
size for sending data via sockets.

Ludicrously off-topic, but this is what you want for that:
http://dast.nlanr.net/Projects/Iperf/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,961
Messages
2,570,131
Members
46,689
Latest member
liammiller

Latest Threads

Top