How to speed up ftell()/fseek()

  • Thread starter Leslaw Bieniasz

Leslaw Bieniasz

Hello,

I am trying to read large binary files (on the order of 100-200 MB)
quickly using ftell() and fseek(). My class gets a pointer to the
data stored in the file, and then uses fseek() to access
and read the data. The problem is that when the file grows
in size, the access time also increases. I initially used
fseek() with option SEEK_SET, but later switched to SEEK_CUR
in the hope that this would speed up the access, but there
was no improvement. My question is: is there anything else
one can do to make the access time independent
of the file size?
Stream classes are not an option here, as they are even
slower.

L.B.


*-------------------------------------------------------------------*
| Dr. Leslaw Bieniasz, |
| Institute of Physical Chemistry of the Polish Academy of Sciences,|
| Department of Electrochemical Oxidation of Gaseous Fuels, |
| ul. Zagrody 13, 30-318 Cracow, Poland. |
| tel./fax: +48 (12) 266-03-41 |
| E-mail: (e-mail address removed) |
*-------------------------------------------------------------------*
| Interested in Computational Electrochemistry? |
| Visit my web site: http://www.cyf-kr.edu.pl/~nbbienia |
*-------------------------------------------------------------------*
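
For concreteness, the access pattern described in the post might look like the minimal C sketch below. The original code was not posted, so the helper name `read_at` and its parameters are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): read `count` bytes
   starting at absolute byte `offset` of an open binary stream.
   Returns the number of bytes actually read, or 0 if the seek fails. */
size_t read_at(FILE *f, long offset, void *buf, size_t count)
{
    assert(f != NULL && buf != NULL);
    if (fseek(f, offset, SEEK_SET) != 0)   /* absolute positioning */
        return 0;
    return fread(buf, 1, count, f);
}
```

The SEEK_CUR variant of the same read would be `fseek(f, offset - ftell(f), SEEK_CUR)`. It needs an extra ftell() call and still performs a seek, which would be consistent with seeing no improvement after the switch.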
 

David White

Let me guess: You are using a Microsoft compiler. I once wrote a language
interpreter that did all the necessary token recognition, parsing and
expression evaluation, and it turned out that an lseek I was doing just to
keep track of the current file position (and not to actually seek anywhere)
was taking 50% of the execution time! That was easy to fix because I only
had to use my own counter to keep track of the position myself. In your case
the fseek is really seeking, so I don't know what you can do. Are you sure
the delays are excessive? You would expect some degradation in performance
as the file size increases and the physical seek distances on the disk get
larger.

DW
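
The "own counter" idea mentioned above can be sketched in C as follows. This is only an illustration under assumptions: the type `tracked_file` and the functions `tracked_wrap`, `tracked_read` and `tracked_seek` are hypothetical names, not taken from the interpreter described in the post.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper that remembers the stream position itself, so
   callers never need ftell()/lseek() just to ask "where am I?". */
typedef struct {
    FILE *fp;
    long  pos;   /* our own copy of the current byte offset */
} tracked_file;

/* Wrap a stream that is assumed to be positioned at offset 0. */
tracked_file tracked_wrap(FILE *fp)
{
    assert(fp != NULL);
    tracked_file t = { fp, 0 };
    return t;
}

/* Read up to `count` bytes and advance the counter by what was read. */
size_t tracked_read(tracked_file *t, void *buf, size_t count)
{
    size_t n = fread(buf, 1, count, t->fp);
    t->pos += (long)n;
    return n;
}

/* Seek only when the target differs from the tracked position. */
int tracked_seek(tracked_file *t, long offset)
{
    if (offset == t->pos)
        return 0;                        /* already there: skip the call */
    if (fseek(t->fp, offset, SEEK_SET) != 0)
        return -1;
    t->pos = offset;
    return 0;
}
```

The current position is then simply `t.pos`. The counter matches the real file offset only for streams opened in binary mode, where the byte counts returned by fread() correspond directly to file offsets.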
 

Alex Vinokur

Perhaps the following links will give some tips:
http://groups-beta.google.com/group/perfo/msg/530fae8e5e065030?hl=en
http://groups-beta.google.com/group/perfo/msg/8a74465da4c4e9bb?hl=en
 

Thomas Maier-Komor

Have you considered mmap'ing the file instead? I don't know whether
it is available on your platform or whether it is faster than fseek(),
but it might be worth a try.

Tom
 

Leslaw Bieniasz

How can I do the mmapping of a file?
I heard about that but I don't know how to actually do this.
L.B.

 

Lionel B

Leslaw said:
How can I do the mmapping of a file?
I heard about that but I don't know how to actually do this.

Be aware that it's basically a Unix thing... see if you have a header file called sys/mman.h in your system include path.

http://www.gnu.org/software/libc/ma...mapped-I_002fO.html#Memory_002dmapped-I_002fO

I think you'll find some sample code if you follow the links in the post by Alex Vinokur earlier in this thread (it's a
bit tricky getting all the parameters right, I seem to recall).
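
As a starting point for the question above, here is a minimal sketch of mapping a whole file read-only, assuming a POSIX system where sys/mman.h is available. The helper names `map_file_readonly` and `unmap_file` are hypothetical; open(), fstat(), mmap(), munmap() and close() are the standard POSIX calls.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
   On success returns the base address and stores the file size in
   *size_out; on failure returns NULL. */
const unsigned char *map_file_readonly(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {  /* zero-length maps fail */
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                 /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_file_readonly(). */
int unmap_file(const unsigned char *base, size_t size)
{
    return munmap((void *)base, size);
}
```

Once the file is mapped, the data at byte offset `off` is simply `base[off]`, so no fseek()/ftell() calls are needed at all. Whether this is actually faster for a given access pattern has to be measured on the target system.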
 

Thomas Maier-Komor

When it comes to POSIX and UNIX, the best places to go are, IMHO:
- http://www.opengroup.org
  (in this case
  http://www.opengroup.org/onlinepubs/7990989775/xsh/mmap.html)
- Usenet: comp.unix.programmer
- the man pages on your system
- the docs of your system provider (e.g. http://docs.sun.com)

Tom
 
