iostream and memory-mapped file

wakun

Hi there,
I am looking for the fastest way to load a BIG string and parse it in a
given format. I have an extern function which returns a very large
(char *) string. I currently parse it with an iterator, as follows:

char *str = return_a_big_size_str();
istringstream ss(string(str), istringstream::in);
istreambuf_iterator<char> bit(ss), eit;
parsing(bit, eit);

I found that the code shown above is very inefficient because of the
size of str.

BTW, I also saved the whole string to a file, say str.txt, and then
loaded the file with ifstream:

std::ifstream input("str.txt") ;
std::istreambuf_iterator<char> bit(input), eit;
parsing(bit, eit);

I can't believe that the latter program is faster than the former.
Anyway, I think memory-mapped IO may be a better choice. However, I
have no idea how to associate a memory-mapped file with an ifstream.
 
Cory Nelson

It's slow because you are making a lot of copies.

Is your parser templatized to accept any kind of char iterator? Then it
would be as easy as parsing(str, str + len), with no copying required.
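A minimal sketch of that suggestion, assuming the parser can be written as a function template. The thread never shows the real parsing(), so count_fields below is a hypothetical stand-in that just counts comma-separated fields; count_fields_in is likewise an illustrative helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for parsing(): counts comma-separated fields.
// Works with any input iterator over char, including plain pointers.
template <typename InputIt>
std::size_t count_fields(InputIt first, InputIt last)
{
    if (first == last) return 0;
    std::size_t fields = 1;
    for (; first != last; ++first)
        if (*first == ',') ++fields;
    return fields;
}

// Parses a NUL-terminated buffer in place: no std::string, no stream.
std::size_t count_fields_in(char const * str)
{
    assert(str != 0);
    return count_fields(str, str + std::strlen(str));
}
```

Because the pointers refer directly to the buffer returned by the extern function, the only pass over the data besides the parse itself is the strlen call.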
 
TB

(e-mail address removed) wrote:
> Hi there,
> I am seeking a fastest way to load a BIG string and parse it as a
> given format. I have a extern function which return a (char *)string in
> BIG size. Now, I am going to parse it with a iterator as following

IO is slow, accept it.

> char *str = return_a_big_size_str();
> istringstream ss(string(str), istringstream::in);
> istreambuf_iterator<char> bit(ss), eit;
> parsing(bit, eit);
>
> I found the code shown above is so inefficient because of the big size
> of str.

You could always write your own iterator:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <stdexcept>

// Input iterator over a NUL-terminated C string.
// A null pointer represents the end iterator.
class cstringiterator {

public:
    typedef std::input_iterator_tag iterator_category;
    typedef char value_type;
    typedef std::ptrdiff_t difference_type;
    typedef char const * pointer;
    typedef char reference;

    // A null or empty string yields the end iterator straight away.
    cstringiterator(char const * cstring = 0)
        : d_cstring(cstring && *cstring ? cstring : 0) { }

    value_type operator*() const {
        if (!d_cstring) throw std::runtime_error("Access Denied");
        return *d_cstring;
    }
    cstringiterator & operator++() {
        // Become the end iterator once the terminating NUL is reached.
        if (d_cstring && !*++d_cstring) {
            d_cstring = 0;
        }
        return *this;
    }
    cstringiterator operator++(int) {
        cstringiterator c(*this);
        ++*this;
        return c;
    }
    bool operator==(cstringiterator const & csi) const {
        return d_cstring == csi.d_cstring;
    }
    bool operator!=(cstringiterator const & csi) const {
        return d_cstring != csi.d_cstring;
    }

private:
    char const * d_cstring;
};

int main()
{
    char const * c = "apa";
    std::copy(cstringiterator(c), cstringiterator(),
              std::ostream_iterator<char>(std::cout));
    return 0;
}

> BTW, I also save the whole string to a file, says str.txt, and then
> load the file with ifstream
>
> std::ifstream input("str.txt") ;
> std::istreambuf_iterator bit(input), eit;
> parsing(bit, eit);

Use an iterator that utilizes internal buffers and only reads ahead
when called for, overwriting old buffers and allocating new ones when
needed, unless you actually must have access to the entire string at
all times.
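One way to read that advice, sketched under assumptions: a single fixed-size buffer is refilled from the stream and handed to a consumer one chunk at a time. The names for_each_chunk, comma_counter and count_commas are hypothetical, and the chunk size of 3 is deliberately tiny for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical consumer: counts commas across all chunks it is given.
struct comma_counter
{
    std::size_t commas;
    comma_counter() : commas(0) { }
    void operator()(char const * first, char const * last)
    {
        for (; first != last; ++first)
            if (*first == ',') ++commas;
    }
};

// Reads the stream through one fixed-size buffer that is overwritten on
// each pass, handing every chunk to consume(first, last).
// Returns the total number of bytes read.
template <typename Consumer>
std::size_t for_each_chunk(std::istream & in, std::size_t chunk_size,
                           Consumer & consume)
{
    assert(chunk_size > 0);
    std::vector<char> buffer(chunk_size);
    std::size_t total = 0;
    for (;;) {
        in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
        std::streamsize got = in.gcount();
        if (got <= 0) break;  // nothing more to read
        consume(&buffer[0], &buffer[0] + got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}

// Usage: count the commas in a string, three bytes at a time.
std::size_t count_commas(std::string const & text)
{
    std::istringstream in(text);
    comma_counter counter;
    for_each_chunk(in, 3, counter);
    return counter.commas;
}
```

A real parser would also have to cope with tokens that straddle a chunk boundary, which this sketch avoids by only counting single characters.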

> I can't believe that the later program is faster than the previous one.
> Anyway, I think memory-mapped IO maybe a better choice. However, I
> have no idea how memory-mapped file associated with ifstream

Memory mapping a file is rather platform-specific, with its own set of
native API calls on each system. Derive a class from std::basic_filebuf
that neatly handles it all.
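To make the platform-specific part concrete, here is a minimal sketch assuming a POSIX system (open, fstat, mmap, munmap); Windows would use CreateFileMapping and MapViewOfFile instead. The class name mapped_file is hypothetical, and for brevity it exposes raw pointers rather than deriving a stream buffer. Failure to open or map the file simply leaves the range empty.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only view of a whole file, mapped with POSIX mmap.
// On any failure (or for an empty file) the view is empty.
class mapped_file
{
public:
    explicit mapped_file(char const * path) : d_data(0), d_size(0)
    {
        assert(path != 0);
        int fd = ::open(path, O_RDONLY);
        if (fd == -1) return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            std::size_t len = static_cast<std::size_t>(st.st_size);
            void * p = ::mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                d_data = static_cast<char const *>(p);
                d_size = len;
            }
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~mapped_file()
    {
        if (d_data) ::munmap(const_cast<char *>(d_data), d_size);
    }

    char const * begin() const { return d_data; }
    char const * end() const { return d_data + d_size; }
    std::size_t size() const { return d_size; }

private:
    mapped_file(mapped_file const &);              // non-copyable
    mapped_file & operator=(mapped_file const &);  // non-assignable

    char const * d_data;
    std::size_t d_size;
};
```

Since begin() and end() are plain pointers, they can be passed to any parser that accepts a pair of char iterators, so the file contents are never copied into a string.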
 
Dietmar Kuehl

> char *str = return_a_big_size_str();
> istringstream ss(string(str), istringstream::in);

The above line creates at least two copies of the string, all of which
exist at the same time. This is likely to cause swapping on your
system (at least if the strings are really rather large), which is a
tremendous performance hit.

> istreambuf_iterator<char> bit(ss), eit;
> parsing(bit, eit);

Hold it! You are parsing your string using stream *buffer* iterators,
i.e. you are not taking advantage of the formatting facilities of
streams at all? Why don't you simply pass pointers as the iterators
to the 'parsing()' function (which, of course, should be a function
template)? Assuming, however, that 'parsing()' is not a function
template, you still have the option to create a suitable stream buffer
which is used just for the situation described:

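#include <cstring>
#include <iterator>
#include <streambuf>

// Read-only stream buffer over an existing, NUL-terminated character
// array. The get area points straight at the caller's data: no copy.
struct membuf : std::streambuf
{
    membuf(char* str) { this->setg(str, str, str + std::strlen(str)); }
};

membuf buffer(str);
std::istreambuf_iterator<char> bit(&buffer), eit;
// ...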
 
