compressing text in std::vector <std::string>

Lynn McGuire

Is there a way to compress the text strings in an object of
type std::vector <std::string> ?

We use this object all over our software and the individual
strings can be up to 1 MB in size. A standard way of
compressing these strings would be nice rather than our own
home-grown approach.

Sincerely,
Lynn
 
Joshua Maurice

Is there a way to compress the text strings in a object of
type std::vector <std::string> ?

We use this object all over our software and the individual
strings can be up to 1 MB in size.  A standard way of
compressing these strings would be nice rather than our own
home-grown approach.

This is a pretty open ended question.

Presumably you know the pigeonhole principle, and that compression
algorithms thus must necessarily result in increased size of some
inputs, aka the "compressed" size of some strings is larger than the
original string size. Perhaps a bit theoretical, but thus compression
algorithms must be chosen based on the input data population aka
distribution.
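
To make the pigeonhole point concrete, here is a toy run-length encoder. It is not a codec anyone in this thread proposed, and rle_encode is a made-up name; it only illustrates that the same algorithm shrinks one kind of input and grows another.

```cpp
#include <cstddef>
#include <string>

// Hypothetical toy codec: encodes the input as (run length, byte) pairs.
// Runs are capped at 255 so the length always fits in one byte.
std::string rle_encode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) {
            ++run;
        }
        out.push_back(static_cast<char>(run));  // run length
        out.push_back(in[i]);                   // the repeated byte
        i += run;
    }
    return out;
}
```

A run of 200 identical characters encodes to 2 bytes, while "abcdef" encodes to 12 bytes, twice its original size. Which case dominates depends entirely on the data.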

Having said that, I could answer "Google the Linux zip and unzip
utilities and see if they have a library already. Otherwise, get the
source code and embed the functionality." Is that what you're looking
for?
 
Jorgen Grahn

Is there a way to compress the text strings in a object of
type std::vector <std::string> ?

We use this object all over our software and the individual
strings can be up to 1 MB in size. A standard way of
compressing these strings would be nice rather than our own
home-grown approach.

zlib compression is pretty standard, I suppose ... but do you really
want to do that every time you access the vector?

But I'm guessing the real solution lies in a redesign which replaces
that vector<string> with something else. But more information would be
needed (i.e. what are you actually using them for?)

/Jorgen
 
Lynn McGuire

Is there a way to compress the text strings in a object of

Presumably you know the pigeonhole principle, and that compression
algorithms thus must necessarily result in increased size of some
inputs, aka the "compressed" size of some strings is larger than the
original string size. Perhaps a bit theoretical, but thus compression
algorithms must be chosen based on the input data population aka
distribution.

I was wondering if there was a "standard" way to compress strings
in std::string and so keep our code standardized. Something like
std::compressed_string <g>.

If we want to roll our own then there is a nice open source solution
at: http://bcl.comli.eu/

Thanks,
Lynn
 
Lynn McGuire

zlib compression is pretty standard, I suppose ... but do you really
want to do that every time you access the vector?

I would not compress the strings unless they were 1000 characters or more.

But I'm guessing the real solution lies in a redesign which replaces
that vector<string> with something else. But more information would be
needed (i.e. what are you actually using them for?)

Storage of user's results data from our calculation engine. The strings
can be many pages of results for each piece of equipment. And, there
can be several versions of results data: latest, 5 minutes ago, 1 hour
ago, yesterday at 4:45 pm, etc. The amount is totally controlled by the
user.

Thanks,
Lynn
 
Joshua Maurice

I was wondering if there was a "standard" way to compress strings
in std::string and so keep our code standardized.  Something like
std::compressed_string <g>.

As in a compression function defined in the C++03 standard? No. There
is no such thing.

As for a portable compression function that is guaranteed to be
preinstalled? Maybe POSIX or the Single UNIX Specification has one, and
maybe there is a Win32 function. Across both, as a language or OS
standard, no.

That leaves third-party libraries.

If we want to roll our own then there is a nice open source solution
at: http://bcl.comli.eu/

I would also guess that zlib is a pretty good bet as Leigh mentioned.
I'm not terribly knowledgeable about such things.
 
AnonMail2005

Is there a way to compress the text strings in a object of
type std::vector <std::string> ?

It looks like the Boost.Iostreams library has some built-in compression
filters which support standard compression methods (bzip2, gzip,
zlib).

It also has the concept of an in-memory array stream, which you might be
able to use to store your compressed string.

I think once you are past the library's learning curve,
the actual code will probably be trivial.

HTH
 
Lynn McGuire

Is there a way to compress the text strings in a object of

It looks like boost iostreams library has some built in compression
filters which support standard compression methods (bzip2, gzip,
zlib).

It also has the concept of in-memory array stream which you might be
able to use to store your compressed string.

Yes, googling "std::string compression boost" got me
http://lists.boost.org/boost-users/att-34342/main.cpp
which relies on http://www.zlib.net/ .

The interesting concept here is that the strings can be compressed
in place using the std::string objects. One just needs to maintain
the compressed / not_compressed status of the strings.
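
A rough sketch of that bookkeeping, assuming C++11 and a pluggable codec. PackedString and Codec are hypothetical names, not taken from the linked main.cpp, and the default threshold of 1000 characters is the figure mentioned earlier in this thread. The string is stored packed only when it is long enough and packing actually makes it smaller.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical codec signature: any function mapping bytes to bytes,
// e.g. thin wrappers around zlib.
using Codec = std::function<std::string(const std::string&)>;

class PackedString {
public:
    PackedString(std::string text, const Codec& pack,
                 std::size_t threshold = 1000)
        : bytes_(std::move(text)), packed_(false) {
        if (bytes_.size() >= threshold) {
            std::string candidate = pack(bytes_);
            // Keep the packed form only if it is really smaller.
            if (candidate.size() < bytes_.size()) {
                bytes_.swap(candidate);
                packed_ = true;
            }
        }
    }

    bool packed() const { return packed_; }
    std::size_t stored_size() const { return bytes_.size(); }

    // Returns the original text, unpacking only when needed.
    std::string get(const Codec& unpack) const {
        return packed_ ? unpack(bytes_) : bytes_;
    }

private:
    std::string bytes_;  // either the raw text or the packed bytes
    bool packed_;        // the compressed / not_compressed status
};
```

A std::vector<PackedString> would then take the place of the std::vector<std::string>, with callers going through get() instead of reading the string directly.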

Thanks,
Lynn
 
Jorgen Grahn

Zlib, in turn, simply calls the deflate algorithm. There's
probably a C++ implementation floating around for free...

If you mean C++ *wrapper*, I have no objections.

If you mean a reimplementation of the same feature in C++ instead of C
(and assembly, for all I know) then I'd stick with zlib. That is a
proven implementation with a gigantic user base.

Avoiding libraries with C APIs is denying yourself one of the best
features of C++.

/Jorgen
 
Jorgen Grahn

I would not compress the strings unless they were 1000 characters or more.


Storage of user's results data from our calculation engine. The strings
can be many pages of results for each piece of equipment. And, there
can be several versions of results data: latest, 5 minutes ago, 1 hour
ago, yesterday at 445pm, etc... The amount is totally controlled by the
user.

Still not enough, I'm afraid. It sounds like something you'd store in
files and read/filter when needed, but that is only a wild guess.

/Jorgen
 
Jonathan Lee

If you mean C++ *wrapper*, I have no objections.

If you mean a reimplementation of the same feature in C++ instead of C
(and assembly, for all I know) then I'd stick with zlib. That is a
proven implementation with a gigantic user base.

Well, I *meant* reimplementation. I don't see the point of
dragging in all of zlib (I think it has file handling and such)
when all you really want is "deflate" in memory. It's a
pretty small algorithm.

Though, of course, your objections are perfectly reasonable.
In the end, it was just a heads up to the OP.

--Jonathan
 
Vaclav Haisman

Lynn McGuire wrote, On 9.7.2010 22:34:
Is there a way to compress the text strings in a object of
type std::vector <std::string> ?

We use this object all over our software and the individual
strings can be up to 1 MB in size. A standard way of
compressing these strings would be nice rather than our own
home-grown approach.

What are you doing with the strings? How many of them are there? How many
vectors do you have? Is vector of strings really the right data structure for
you?

You need to provide more context, I think.
 
Lynn McGuire

Well, I *meant* reimplementation. I don't see the point of
dragging in all of zlib (I think it has file handling and such)
when all you really want is "deflate" in memory. It's a
pretty small algorithm.

Though, of course, your objections are perfectly reasonable.
In the end, it was just a heads up to the OP.

Our main EXE (Win32) is 9.0 MB. Adding zlib to it will probably
be inconsequential.

But, thanks for the heads up,
Lynn
 
