How to derive a class from std::ifstream that counts newline characters

MSP

Hello everybody,

I am reading a text file (in a format created by myself), which
contains inline-data maintained by a third party library. Say, there
is a class, with some read-method as follows:

class CForeignClass
{
public:
    bool Read(std::istream& is);
};


To read the file, I open a std::ifstream and whenever I encounter the
foreign data, I simply hand the istream reference over to the foreign
class.

std::ifstream f(path);

// read data owned by me
...

if (bForeignDataEncountered)
{
    // read foreign data
    CForeignClass foreignClass;
    foreignClass.Read(f);
}

// read more data owned by me


While doing this, I would like to keep track of the number of lines
read by the foreign class. Is there a portable or recommended way to
do this by deriving my own class from std::ifstream and overriding
some virtual method (or, similarly, by replacing the standard input
buffer with a modified one)?

Thanks in advance,
Matthias
 
Francesco

Hi Matthias,
I didn't try any of the following, I'm just reasoning about the issue
you presented.

I suppose you cannot modify the foreignClass::Read() method -
otherwise you could make it return the number of lines read instead of
that bool.

But can you inspect its implementation to see how exactly it reads the
data from istream?

One solution could be to save the result of tellg() on that istream
before calling foreignClass.Read(), then call tellg() again, compute
the difference of those two positions to find out how many characters
have been read, and then scan that amount of data yourself to count
newline characters - you'd have to seek back, reopen the file, keep a
copy of it in memory or something like that.

Doing the above - assuming it works - you'd have no need to overload
istream.
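
A minimal sketch of the tellg() idea, under some assumptions: the stream is seekable, it was opened in binary mode (so that position differences equal character counts), and both tellg() calls succeeded. The helper name countNewlinesBetween and the demo function are hypothetical, and the two getline calls merely stand in for foreignClass.Read(f):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts '\n' in the range [begin, end) of a seekable
// stream and then puts the read position back at `end`.
std::size_t countNewlinesBetween(std::istream& is,
                                 std::streampos begin, std::streampos end)
{
    std::streamoff remaining = end - begin;
    assert(remaining >= 0);

    is.clear();        // drop eof/fail bits so that seeking works
    is.seekg(begin);

    std::size_t lines = 0;
    char chunk[4096];
    while (remaining > 0) {
        const std::streamsize want = static_cast<std::streamsize>(
            std::min<std::streamoff>(remaining,
                                     static_cast<std::streamoff>(sizeof chunk)));
        is.read(chunk, want);
        const std::streamsize got = is.gcount();
        if (got <= 0)
            break;     // the stream ended earlier than expected
        lines += static_cast<std::size_t>(std::count(chunk, chunk + got, '\n'));
        remaining -= got;
    }

    is.clear();
    is.seekg(end);     // continue reading where the foreign code stopped
    return lines;
}

// Demo with an in-memory stream standing in for the file.
std::size_t demoTellg()
{
    std::istringstream f("own line\nforeign 1\nforeign 2\nown again\n");
    std::string line;
    std::getline(f, line);                    // read data owned by me
    const std::streampos before = f.tellg();
    std::getline(f, line);                    // stand-in for foreignClass.Read(f)
    std::getline(f, line);
    const std::streampos after = f.tellg();
    return countNewlinesBetween(f, before, after);
}
```

Note that this scans the foreign data a second time, and that tellg() returns -1 if the stream is already in a failed state, which the caller would have to check first.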

In the other case, if you want to ensure that every read on that
istream keeps track of the lines read, you would, I think, have to
overload all read methods of istream.

I'm going to do some testing for my very own curiosity.

Hope the above 2 cents help anyway.

Best regards,
Francesco
 
MSP

Hi Francesco,

first of all, thanks for the quick reply.
> I suppose you cannot modify the foreignClass::Read() method -
> otherwise you could make it return the number of lines read instead of
> that bool.

To be precise: the foreign library is OpenCASCADE, an open source
modeller library (abbreviated OCC in what follows). So I have full
source code. Theoretically, it would be possible for me to modify the
OCC-source code, but practically, that's not an option.

> But can you inspect its implementation to see how exactly it reads the
> data from istream?

Reading is done in the typical C++ style, using a lot of >> operators
(both OCC-defined ones and those of the standard library).

> One solution could be to save the result of tellg() on that istream
> before calling foreignClass::Read(), then call tellg() again, compute
> the difference of those two pointers to understand how many characters
> have been read and then parse that amount of data by yourself to count
> newline characters - you'd have to reopen the file or create a copy of
> it while in memory or something else like this.
>
> Doing the above - assuming it works - you'd have no need to overload
> istream.

I already considered this option. But the inline data is a 3D model,
so it tends to be rather large. So I would prefer not to scan this
large chunk of data twice.

> In the other case, if you were to ensure that any read on that istream
> keeps track of the lines read you should, I think, overload all read
> methods of istream.

Overloading all read methods is rather tedious and maybe error-prone.
Instead, I was wondering whether there is some central low-level method
deep down in the implementation of the stream (or the stream buffer)
which I could override. For example, a callback which is triggered
whenever the stream buffer underflows and which fills the buffer with
the new data from the file.

I hope this clarifies what I intend to do.

Regards,
Matthias
 
MSP

Hi again,

I fixed two typos in the last paragraph of my previous post:

Overloading all read methods is rather tedious and maybe error prone.
Instead, I was wondering whether there is some central low-level
method deep down in the implementation of the stream (or the stream
buffer), which I could overwrite. For example, a callback which is
triggered whenever the stream buffer underflows and which fills the
buffer with the new data from the file.
 
Francesco

Yes, now it's clearer.

I think you'll have to dig into the calls of your std::istream
implementation and find the final function that actually does the
"dirty" work ;-)

The advantage of overriding all read methods of std::istream is that
your derived class will then be portable, whereas directly touching the
implementation of std::istream isn't portable at all, and is decidedly
a "don't do it".

Both the tellg() solution and overriding std::istream operators will
lead to parsing of data twice - as you said for the first case, and as
I'm realizing now for the second case.

If you're really going to modify some code which doesn't directly
belong to your application I think it's better to modify the OCC code.

As a side question: why do you want to know how many lines have been
parsed by OCC? Maybe you're using this datum for something that could
be achieved in some other way which doesn't oblige you to "interfere"
with the OCC parse.

Meanwhile, let's hope for someone dropping in and giving a working,
efficient and portable solution. There are a lot of wizards hanging
out there ;-)

Have good coding,
Francesco
 
Jerry Coffin

[ ... using a library that reads some data from an istream ]
> While doing this, I would like to keep track of the number of lines
> read by the foreign class. Is there a portable or recommended way to
> do this by deriving my own class from std::ifstream and overwriting
> some virtual method (or similar by replacing the standard input buffer
> by a modified one) ?

A filtering streambuf can do what you want. James Kanze has (or at
least used to have, and there's undoubtedly a copy still around) a
web page describing the basics of creating a filtering streambuf.
Jonathan Turkanis also wrote an iostreams library that's included in
Boost that makes it fairly simple to write filtering streambufs (as
long as you don't object to Boost, of course).
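
As a rough illustration of the filtering streambuf idea (a sketch under stated assumptions, not the code from the web page or from Boost mentioned above): an unbuffered std::streambuf can forward every peek and every read to the original buffer and count the '\n' characters that are actually consumed. The names LineCountingBuf, ScopedLineCounter and demoFilter are hypothetical, and the extractions in the demo merely stand in for foreignClass.Read(f):

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Forwards all input to `source` and counts the newlines consumed through it.
class LineCountingBuf : public std::streambuf {
public:
    explicit LineCountingBuf(std::streambuf* source) : source_(source), lines_(0)
    {
        assert(source_ != nullptr);
    }
    std::size_t lines() const { return lines_; }

protected:
    // Peek: no get area is ever set up, so each peek is forwarded.
    int_type underflow() override { return source_->sgetc(); }

    // Consume one character; count it if it is a newline.
    int_type uflow() override
    {
        const int_type c = source_->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::to_int_type('\n')))
            ++lines_;
        return c;
    }

    // Forward unget/putback and take back the count for a restored newline.
    int_type pbackfail(int_type c = traits_type::eof()) override
    {
        const int_type eof = traits_type::eof();
        const int_type r = traits_type::eq_int_type(c, eof)
            ? source_->sungetc()
            : source_->sputbackc(traits_type::to_char_type(c));
        if (!traits_type::eq_int_type(r, eof) && lines_ > 0 &&
            traits_type::eq_int_type(source_->sgetc(),
                                     traits_type::to_int_type('\n')))
            --lines_;
        return r;
    }

private:
    std::streambuf* source_;
    std::size_t lines_;
};

// Installs the counting buffer on a stream and restores the original buffer
// on scope exit. Takes std::istream& because std::ifstream hides the
// one-argument rdbuf(). Note that rdbuf(sb) also resets the stream's error
// state, so check the result of the foreign read before the scope ends.
class ScopedLineCounter {
public:
    explicit ScopedLineCounter(std::istream& is)
        : is_(is), original_(is.rdbuf()), buf_(original_)
    {
        is_.rdbuf(&buf_);
    }
    ~ScopedLineCounter() { is_.rdbuf(original_); }
    ScopedLineCounter(const ScopedLineCounter&) = delete;
    ScopedLineCounter& operator=(const ScopedLineCounter&) = delete;
    std::size_t lines() const { return buf_.lines(); }

private:
    std::istream& is_;
    std::streambuf* original_;
    LineCountingBuf buf_;
};

// Demo with an in-memory stream standing in for the file.
std::size_t demoFilter()
{
    std::istringstream f("own line\nf1 f2\nf3\nown again\n");
    std::string text;
    std::getline(f, text);           // read data owned by me
    ScopedLineCounter counter(f);
    f >> text >> text >> text;       // stand-in for foreignClass.Read(f)
    std::getline(f, text);           // consume the rest of the last line
    return counter.lines();
}
```

Because nothing is buffered in the filter, the original stream stays in sync and can be read normally once the original buffer is restored; the price is a virtual call per character. A buffered variant could count newlines block-wise when refilling its buffer, but it would then count characters that were fetched rather than consumed.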
 
MSP

Hi Francesco,

> Both the tellg() solution and overriding std::istream operators will
> lead to parsing of data twice - as you said for the first case, and as
> I'm realizing now for the second case.

I noticed this fact, too. But there is a slight difference between the
two cases, which theoretically could make a difference in performance:
In the second case the two scans of the same data byte occur in quick
succession, whereas in the first case (using tellg()), a lot of data
is scanned in between. As a result, it is possible that the same data
has to be mapped from disk into memory twice. (Note that the 3D data
tends to be very large.) However, I am not an expert in these matters
and also I have no profiling tools to measure performance exactly.
Maybe nowadays this is not such an issue anyway, since computers have
a lot of memory and cache.

> As a side question: why you want to know how many lines have been
> parsed by OCC? I think that maybe you're using this datum for
> something that could be achieved in some other way which doesn't
> oblige you to "interfere" with the OCC parse.

Very simple: the native (=non-foreign) part of the data file is very
human-readable, it is parsed by a yacc-parser and when it encounters
an error it issues a meaningful error message with a line number. The
OCC data is not at the end of the file, so errors following the OCC
data are currently reported with incorrect line numbers. (Of course,
there is a simple solution for this: place the OCC data at the end!
Probably, I will do this. But since I am a curious person, the problem
of intercepting the newline characters interests me nevertheless.)

For the moment, thanks a lot for your interest in my problem and the
good discussion. If I come up with an 'ingenious' solution, I will
let you know.

Regards,
Matthias
 
MSP

Hi Jerry,
thank you for pointing out Boost (www.boost.org) to me. I didn't know
it yet. It sounds like an interesting place to look at. Is there any
reason why someone could object to it?

Regards,
Matthias
 
Alf P. Steinbach

* MSP:
> thank you for pointing out Boost (www.boost.org) to me. I didn't know
> it yet. Sounds like an interesting location to look at. Is there any
> reason why someone could object it?

Yes.

First, Boost is large.

Second, there is a versioning problem with use of any external library.

Third, using Boost runs squarely against the NIH principle (Not Invented Here,
don't use it). Some companies have as policy to not use third-party libraries at
all. At least not free ones.

Fourth, Boost is based on exceptions for failure reporting. In a setting where
exceptions are not used that might be a problem. There's probably a technical
solution for that, but I don't know.

Fifth, although Boost's license lets you do just about anything you can do with
the C++ standard library, it might be difficult to convince someone that it's
really safe from a business perspective. Before you know it Dave Abrahams might
be knocking on the door with his gun-slinging Boost lawyer! :) Or at least, the
wrong kind of manager might think that that is a distinct possibility.


Cheers & hth.,

- Alf
 
Jerry Coffin

[ ... ]
> thank you for pointing out Boost (www.boost.org) to me. I didn't
> know it yet. Sounds like an interesting location to look at. Is
> there any reason why someone could object it?

Apparently a few, since some people object to it.

The first obvious objection is that it's big -- downright huge, truth be
known. Although it's not (exactly) commercial in itself, it has a lot
of the same kinds of things as a commercial offering, such as extra
code to work around bugs in a large number of compilers. This not
only increases bulk, but often hurts readability.

Second, Boost often takes on problems many people wouldn't even
consider -- and to do that, includes some code that's extremely
difficult to comprehend. In many cases, code that _uses_ the library
is pretty simple and straightforward, but reading the library code
itself can be a mind-bending experience (e.g. Spirit and Xpressive).

Third, since much of it uses templates heavily (and in ways compiler
authors probably didn't plan for), some boost code compiles quite
slowly. Anything more than a tiny parser written with Spirit can tax many
(especially older) compilers right to, and sometimes beyond, their
limits.

Finally, rather than being simply a collection of pre-written,
pre-tested (etc.) code about like what you'd probably write yourself if
you had time, much of Boost attempts to provide the highest level of
generality and abstraction possible. This often requires people to
sit back and re-think how they approach a problem in general.
Especially if you already have a large body of existing code, it can
be difficult to incorporate a library that requires a large,
fundamental change in how you approach a problem. Even without
existing code, existing notions of how to approach a problem can be
equally difficult to overcome. Along with requiring thought, this
places rather higher requirements on maintenance coders -- you often
need direct knowledge of a specific library and its conventions
before you can understand code at all.
 
