getline buffering

T

toton

Hi,
I am reading some large text files and parsing it. typical file size
I am using is 3 MB. It takes around 20 sec just to use std::getline (I
need to treat newlines properly ) for whole file in debug , and 8 sec
while optimization on.
It is for Visual Studio 7.1 and its std library. While vim opens it
in a fraction of sec.
So, is it that getline is reading the file line by line, instead
reading a chunk at a time in its internal buffer? is there any
function to set how much to read from the stream internally ?
I am not very comfortable with read and readsome , to load a large
buffer, as it changes the file position. While I need the visible file
position to be the position I am actually, while "internally" it
should read some more , may be like 1MB chunk ... ?

abir
 
?

=?iso-8859-1?q?Erik_Wikstr=F6m?=

Hi,
I am reading some large text files and parsing it. typical file size
I am using is 3 MB. It takes around 20 sec just to use std::getline (I
need to treat newlines properly ) for whole file in debug , and 8 sec
while optimization on.
It is for Visual Studio 7.1 and its std library. While vim opens it
in a fraction of sec.
So, is it that getline is reading the file line by line, instead
reading a chunk at a time in its internal buffer? is there any
function to set how much to read from the stream internally ?
I am not very comfortable with read and readsome , to load a large
buffer, as it changes the file position. While I need the visible file
position to be the position I am actually, while "internally" it
should read some more , may be like 1MB chunk ... ?

I'm not sure, but I think it's the other way around, Vim does not read
the whole file at once so it's faster.

Each ifstream has a buffer associated with it, you can get a pointer
to it with the rdbuf()-method and you can specify an array to use as
buffer with the pubsetbuf()-method. See the following link for a short
example: http://www.cplusplus.com/reference/iostream/streambuf/pubsetbuf.html
 
T

toton

I'm not sure, but I think it's the other way around, Vim does not read
the whole file at once so it's faster.

Each ifstream has a buffer associated with it, you can get a pointer
to it with the rdbuf()-method and you can specify an array to use as
buffer with the pubsetbuf()-method. See the following link for a short
example:http://www.cplusplus.com/reference/iostream/streambuf/pubsetbuf.html

Hi,
I had checked it in a separate console project (multi threaded ) it
is running perfectly, and reads within .8 sec. However the same code
takes 12 sec when running inside my Qt app.
I fear Qt lib is interacting with c++ runtime is some way to cause the
problem ....
May be I need to build the Qt lib a fresh to check what is wrong.
Thanks for answering the question ....

abir
 
J

Jacek Dziedzic

toton said:
Hi,
I had checked it in a separate console project (multi threaded ) it
is running perfectly, and reads within .8 sec. However the same code
takes 12 sec when running inside my Qt app.
I fear Qt lib is interacting with c++ runtime is some way to cause the
problem ....
May be I need to build the Qt lib a fresh to check what is wrong.
Thanks for answering the question ....

Make sure you decouple stream I/O from stdio, i.e. do
std::ios::sync_with_stdio(false);

HTH,
- J/
 
T

toton

Normally good advice, but unnecessary with VC++.

P.J. Plauger
Dinkumware, Ltd.http://www.dinkumware.com

I got the problem. It has nothing to do with Qt or other
libraries ....
I was using a tellg() to get the current position. Now my question is
why tellg is such costly ? Won't it just return the current strem
position ?
To explain,
{
boost::progress_timer t;
std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
std::string line;
while(in){
///int pos = in.tellg();
std::getline(in,line);
}
}
This code takes 0.58 sec in my computer while if I uncomment the line
in.tellg(), it takes 120.8 sec !
 
I

Ismo Salonen

toton said:
I got the problem. It has nothing to do with Qt or other
libraries ....
I was using a tellg() to get the current position. Now my question is
why tellg is such costly ? Won't it just return the current strem
position ?
To explain,
{
boost::progress_timer t;
std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
std::string line;
while(in){
///int pos = in.tellg();
std::getline(in,line);
}
}
This code takes 0.58 sec in my computer while if I uncomment the line
in.tellg(), it takes 120.8 sec !

Could it be that you have opened the file in text mode and the tellg()
seeks to beginning always and rereads characters (counting cr+lf pairs
as one ). Try switching to binary mode and handle cr+lf yourself.

ismo
 
T

toton

Could it be that you have opened the file in text mode and the tellg()
seeks to beginning always and rereads characters (counting cr+lf pairs
as one ). Try switching to binary mode and handle cr+lf yourself.

ismo

The whole purpose of using getline is that only. I am not sure why
tellg have to behave like that in text mode , it is stored one !
Tested the same with gcc .The program in mingw is not giving any big
performance
difference.
here is the program
#include <fstream>
#include <iostream>
#include <ctime>
int main(){
{
//boost::progress_timer t;
time_t start,end;
time(&start);
std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
std::string line;
while(in){
int pos = in.tellg();
std::getline(in,line);
}
time(&end);
std::cout<<difftime(end,start);
}
}
With & without comment on the line , it takes 2 sec & 3 sec
respectively (without -o2 flag ) It looks fine to me ...
Even the visual studio std code looks quite simple one ....
anyone else has tested it with a big file (4-8 MB )and found a huge
difference ?

thanks
abir
 
?

=?iso-8859-1?q?Erik_Wikstr=F6m?=

The whole purpose of using getline is that only. I am not sure why
tellg have to behave like that in text mode , it is stored one !
Tested the same with gcc .The program in mingw is not giving any big
performance
difference.
here is the program
#include <fstream>
#include <iostream>
#include <ctime>
int main(){
{
//boost::progress_timer t;
time_t start,end;
time(&start);
std::ifstream in("Y:/Data/workspaces/tob4f/tob4f.dat");
std::string line;
while(in){
int pos = in.tellg();
std::getline(in,line);
}
time(&end);
std::cout<<difftime(end,start);
}}

With & without comment on the line , it takes 2 sec & 3 sec
respectively (without -o2 flag ) It looks fine to me ...
Even the visual studio std code looks quite simple one ....
anyone else has tested it with a big file (4-8 MB )and found a huge
difference ?

On a 22.5MB file I get one second running time without tellg, 4
seconds if the file is opened in text mode and 2 seconds if opened in
binary mode. Seems quite reasonable to me.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,962
Messages
2,570,134
Members
46,692
Latest member
JenniferTi

Latest Threads

Top