lseek and write question

golden

Hello,

I am going to ask a question regarding
write and lseek. I will provide code at the end of this, but first
some background.


I am trying to identify the cause of some latency in writing to disk.
My user claims that performance is much slower on SAN than on local
disk. The developer provided me with a C++ program that performs a
write test, and it confirmed his suspicions. I modified the code to
better fit my needs.


What I found during the test is that fsync is an expensive operation
and will block waiting for a confirmation from the disk device. What
I am trying to understand is the lseek function.


From what I read, it simply moves the pointer in the file descriptor
as directed. When I use this lseek function, writes are faster.


My question is why? When I use the write command, does the pointer
get reset and on each write, it will search for EOF?


This is running on a Linux system.


Thanks in advance:


#include <sys/types.h>
#include <sys/time.h>

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    struct timeval start, end;
    int ch, fd;
    const char *fname = NULL;   // output file, set with -f
    int bytes = 0;              // size of each write, set with -b
    int ops = 0;                // number of writes, set with -o
    bool dosync = true, doSeek = false;

    // -b, -o and -f take an argument; -s and -l are plain flags.
    while ((ch = getopt(argc, argv, "b:o:f:sl")) != -1)
        switch (ch) {
        case 'b':
            bytes = atoi(optarg);
            break;
        case 'o':
            ops = atoi(optarg);
            break;
        case 'f':
            fname = optarg;
            break;
        case 's':               // disable the fsync after each write
            dosync = false;
            break;
        case 'l':               // seek back to offset 0 after each write
            doSeek = true;
            break;
        default:
            return 1;
        }

    // Fall back to the first non-option argument as the file name.
    if (fname == NULL && optind < argc)
        fname = argv[optind];
    if (fname == NULL || bytes <= 0 || ops <= 0) {
        fprintf(stderr, "usage: %s -b bytes -o ops -f file [-s] [-l]\n", argv[0]);
        return 1;
    }

    // Allocate the buffer only after -b has been parsed.
    char *buf = new char[bytes];
    memset(buf, 0, bytes);

    gettimeofday(&start, NULL);

    printf("Processing %d bytes with %d Operations of fsync :\t",
           bytes, dosync ? ops : 1);

    // unlink(fname);
    if ((fd = open(fname, O_RDWR | O_CREAT, 0666)) == -1) {
        printf("ERROR: failed to open %s: %s\n", fname, strerror(errno));
        return 1;
    }

    for (int idx = 0; idx < ops; idx++) {
        if (write(fd, buf, bytes) != (ssize_t)bytes) {
            printf("write: %s\n", strerror(errno));
            exit(1);
        }
        if (dosync) {
            if (fsync(fd) != 0) {
                printf("fsync: %s\n", strerror(errno));
                exit(1);
            }
        }
        if (doSeek) {
            if (lseek(fd, (off_t)0, SEEK_SET) == -1) {
                printf("lseek: %s\n", strerror(errno));
                exit(1);
            }
        }
    }

    // One last sync
    if (fsync(fd) != 0) {
        printf("fsync: %s\n", strerror(errno));
        exit(1);
    }
    gettimeofday(&end, NULL);

    // Borrow one second if the microsecond difference is negative.
    long totalSec = (long)(end.tv_sec - start.tv_sec);
    long totalUSec = (long)(end.tv_usec - start.tv_usec);
    if (totalUSec < 0) {
        totalUSec += 1000000;
        totalSec--;
    }

    long t = totalSec;
    printf("%ld Hours ", t / (60 * 60));
    t %= (60 * 60);
    printf("%ld Minutes ", t / 60);
    t %= 60;
    printf("%ld.%06ld Seconds ", t, totalUSec);
    printf("%ld.%06ld Seconds\n", totalSec, totalUSec);

    close(fd);
    delete[] buf;
    return 0;
}
 
Victor Bazarov

golden said:
I am going to ask a question regarding
write and lseek. I will provide code at the end of this, but first
some background.
[..]
What I found during the test is that fsync is an expensive operation
and will block waiting for a confirmation from the disk device. What
I am trying to understand is the lseek function.


From what I read, it simply moves the pointer in the file descriptor
as directed. When I use this lseek function, writes are faster.


My question is why? When I use the write command, does the pointer
get reset and on each write, it will search for EOF?


This is running on a Linux system.

[..]

First a nit pick: 'write' is not a command. It's a function. IIRC,
it's a POSIX function, which isn't really on topic here. Now that's
out of the way, second, in C++ we'd use the 'fwrite' function (from
the C Standard Library). Have you tried switching to using 'fwrite'
instead?

And the last point: you might want to consider asking in the Linux
newsgroup since I/O performance depends greatly on the platform, and
there is no real explanation from the language point of view why
'write' is so slow without 'lseek'.

V
 
golden

golden said:
I am going to ask a question regarding
write and lseek. I will provide code at the end of this, but first
some background.
[..]
What I found during the test is that fsync is an expensive operation
and will block waiting for a confirmation from the disk device. What
I am trying to understand is the lseek function.
From what I read, it simply moves the pointer in the file descriptor
as directed. When I use this lseek function, writes are faster.
My question is why? When I use the write command, does the pointer
get reset and on each write, it will search for EOF?
This is running on a Linux system.

First a nit pick: 'write' is not a command. It's a function. IIRC,
it's a POSIX function, which isn't really on topic here. Now that's
out of the way, second, in C++ we'd use the 'fwrite' function (from
the C Standard Library). Have you tried switching to using 'fwrite'
instead?

And the last point: you might want to consider asking in the Linux
newsgroup since I/O performance depends greatly on the platform, and
there is no real explanation from the language point of view why
'write' is so slow without 'lseek'.

V

Thanks... the nitpicking will make me better, so I welcome that. I am
so used to programming in Perl that "command" comes out automatically.
I will try fwrite and visit the Linux group. Thanks for the reply.
 
James Kanze

golden said:
I am going to ask a question regarding
write and lseek. I will provide code at the end of this, but first
some background.
[..]
What I found during the test is that fsync is an expensive operation
and will block waiting for a confirmation from the disk device. What
I am trying to understand is the lseek function.
From what I read, it simply moves the pointer in the file descriptor
as directed. When I use this lseek function, writes are faster.
My question is why? When I use the write command, does the pointer
get reset and on each write, it will search for EOF?
This is running on a Linux system.
[..]
First a nit pick: 'write' is not a command. It's a function. IIRC,
it's a POSIX function, which isn't really on topic here. Now that's
out of the way, second, in C++ we'd use the 'fwrite' function (from
the C Standard Library). Have you tried switching to using 'fwrite'
instead?

It won't work. He's using functionality (synchronized
writing) which isn't available in the standard library. The
most you can ever guarantee with the standard library (either
FILE* or iostream) is that the data has been transferred to the
OS; his call to fsync guarantees that it has been physically
written on the medium.
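
To illustrate the distinction (this sketch is not from the thread): with a FILE*, fflush only hands the data to the OS, and getting it onto the medium still needs the POSIX calls fileno and fsync, which are outside the C and C++ standard libraries. The helper name write_durably is hypothetical.

```cpp
#include <cassert>
#include <stdio.h>
#include <unistd.h>   // fsync (POSIX, not part of the C or C++ standard library)

// Hypothetical helper: writes len bytes to path and returns true only if
// every step, including the fsync, succeeded.
bool write_durably(const char *path, const char *data, size_t len)
{
    assert(path != nullptr && data != nullptr);

    FILE *fp = fopen(path, "wb");
    if (fp == nullptr)
        return false;

    bool ok = fwrite(data, 1, len, fp) == len;   // into the stdio buffer
    ok = ok && fflush(fp) == 0;                  // stdio buffer -> OS
    ok = ok && fsync(fileno(fp)) == 0;           // OS cache -> storage device
    ok = (fclose(fp) == 0) && ok;                // always close the stream
    return ok;
}
```

A call such as write_durably("out.dat", buf, n) returns true only when the fsync has reported success, so the synchronization still comes from the POSIX layer rather than from fwrite itself.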

And the last point: you might want to consider asking in the Linux
newsgroup since I/O performance depends greatly on the platform, and
there is no real explanation from the language point of view why
'write' is so slow without 'lseek'.

With regards to his particular question, the answer seems
obvious (and will probably be the same on any system, anytime he
doesn't use synchronized writes): because of the seek, he's
always writing the data at the same place on the disk, which
means that the system can always reuse the same sector cache,
and never has to go to disk. Without the seek, he's writing a
fairly large file, and the system probably won't keep all of the
cached data around, but will write to disk.

Is it really surprising that writing a file with one record is
significantly faster than writing one with ops records (where
ops is probably fairly large)?
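
A minimal sketch of the point above, under some assumptions: the helper name size_after_writes is hypothetical, and unlike the posted program it truncates the file on open and removes it afterwards so that each run starts clean. It performs the same write loop and reports the resulting file size, which shows how much data each variant leaves for the system to store.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: performs ops writes of bytes bytes each to path,
// optionally seeking back to offset 0 after every write, and returns the
// resulting file size in bytes (or -1 on error). The file is removed again.
long long size_after_writes(const char *path, int ops, int bytes, bool seek_back)
{
    assert(path != nullptr && ops >= 0 && bytes > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return -1;

    std::vector<char> buf(bytes, 0);
    bool ok = true;
    for (int i = 0; i < ops && ok; ++i) {
        // write() advances the file offset by the number of bytes written.
        ok = write(fd, buf.data(), buf.size()) == static_cast<ssize_t>(buf.size());
        // Seeking back means the next write overwrites the same region.
        if (ok && seek_back)
            ok = lseek(fd, 0, SEEK_SET) != static_cast<off_t>(-1);
    }

    long long result = -1;
    struct stat st;
    if (ok && fstat(fd, &st) == 0)
        result = st.st_size;

    close(fd);
    unlink(path);
    return result;
}
```

With ops = 100 and bytes = 4096, the function returns 4096 when seek_back is true and 409600 when it is false: the seeking variant only ever touches one record's worth of file data, while the other grows the file by one record per write.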
 
