Short string class

Brian

Shalom

Is there a short string class around? I've searched a little on the
net and didn't find anything. I'm interested in a class that limits
(possibly at compile time) the length to 255, so marshalling the
string's length only requires one byte. Thanks in advance.


Brian Wood
http://webEbenezer.net
(651) 251-9384
 
Marcel Müller

Brian said:
Is there a short string class around? I've searched a little on the
net and didn't find anything. I'm interested in a class that limits
(possibly at compile time) the length to 255, so marshalling the
string's length only requires one byte. Thanks in advance.

Feel free to write one. :)

All you have to do is to replace size_t by unsigned char. But I am in
doubt what the real benefit should be.


Marcel
 
Brian

Feel free to write one. :)

All you have to do is to replace size_t by unsigned char. But I am in
doubt what the real benefit should be.

The benefit is saving 3 or more bytes in bandwidth in the marshalling
process. Since strings are common it adds up. 255 chars is more
than enough for 95% of my uses of strings.


Brian Wood
 
Brian

This seems like a special marshalling code would be needed, no need for a
special string class.

Paavo


Possibly. However, I can only get 128 values in a byte
with my variable-length integer stuff, so there would be some
advantage to a unique type. If the limit were given at compile
time, the marshalling could use the minimum number of bytes and
not have to use the more complicated variable length encoding.
Maybe supporting both would be ideal.


Brian Wood
 
Marcel Müller

Paavo said:
This seems like a special marshalling code would be needed, no need for a
special string class.

Yes, indeed.

If only storage counts and the protocol is your choice, you may use a
dynamic 7 bit encoding:

0x00 .. 0x7f                          -> 0 .. 127 bytes length
0x80 0x00 .. 0xff 0x7f                -> 128 .. 16511
0x80 0x80 0x00 .. 0xff 0xff 0x7f      -> 16512 .. 2113663
0x80 0x80 0x80 0x00 ...               -> 2113664 .. 270549119
0x80 0x80 0x80 0x80 0x00 ...          -> 270549120 .. 3.46E+10
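
A minimal sketch of how the table above could be implemented, in case it helps. The names encodeLength and decodeLength are made up for illustration, and the byte order (least-significant 7-bit group first) is an assumption, since the table does not pin it down. The "- 1" step is what makes the ranges start at 128, 16512 and so on, so that every length has exactly one encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append the variable-length encoding of n to out.
// Every byte except the last has its high bit set; each byte carries 7 bits.
// Assumption: least-significant group first.
inline void encodeLength(std::uint64_t n, std::vector<std::uint8_t>& out)
{
    while (n >= 128) {
        out.push_back(static_cast<std::uint8_t>(0x80 | (n & 0x7f)));
        n = (n >> 7) - 1;  // offset so that longer forms never repeat shorter ones
    }
    out.push_back(static_cast<std::uint8_t>(n));
}

// Decode one length starting at in[pos] and advance pos past it.
// Assumes well-formed input; this sketch does no bounds or overflow checks.
inline std::uint64_t decodeLength(const std::vector<std::uint8_t>& in,
                                  std::size_t& pos)
{
    std::uint64_t n = 0;
    unsigned shift = 0;
    for (;;) {
        const std::uint8_t b = in[pos++];
        std::uint64_t group = b & 0x7f;
        if (shift != 0)
            group += 1;  // undo the "- 1" applied by the encoder
        n += group << shift;
        if ((b & 0x80) == 0)
            return n;
        shift += 7;
    }
}
```

Lengths below 128 still cost a single byte, so short strings pay nothing extra compared with a fixed one-byte length field.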

Since the string content has dynamic length anyway, it should not be too
complicated to deal with a dynamic length field too. If access
performance counts, you could store the length at negative offsets from
*this. (A C++ class with a custom allocator could wrap that.)

But if you only want to conserve bytes, storing headerless ZLIB
compressed strings might be significantly more effective.


Marcel
 
Brian

Yes, indeed.

I don't think it is so clear. I'm currently using this
function:

template <typename T>
void
stringGroupCount(Counter& cntr, T const& grp)
{
  cntr.MultiplyAndAdd(grp.size(), sizeof(uint32_t) );

  typename T::const_iterator It = grp.begin();
  typename T::const_iterator End = grp.end();
  for (; It != End; ++It) {
    cntr.Add((*It).length());
  }
}

to count how many bytes are in a collection of strings.
The line that multiplies the number of elements in the
collection by the sizeof value would have to be replaced
with a loop. So having a distinct short string class has
some advantages.

But if you only want to conserve bytes, storing headerless ZLIB
compressed strings might be significantly more effective.

I'm not sure what you mean by that.


Brian Wood
 
Brian

I don't think it is so clear.  I'm currently using this
function:

template <typename T>
void
stringGroupCount(Counter& cntr, T const& grp)
{
  cntr.MultiplyAndAdd(grp.size(), sizeof(uint32_t) );

  typename T::const_iterator It = grp.begin();
  typename T::const_iterator End = grp.end();
  for (; It != End; ++It) {
    cntr.Add((*It).length());
  }

}

to count how many bytes are in a collection of strings.
The line that multiplies the number of elements in the
collection by the sizeof value would have to be replaced
with a loop.

I guess that's not right. The number of bytes needed to
marshall the string length could be calculated in the
existing loop.
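
Roughly like this, as a sketch only. It returns a plain byte count instead of using the Counter class, and lengthPrefixSize and stringGroupBytes are hypothetical names. The thresholds assume the 7-bit length encoding from the table Marcel posted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bytes needed for a length prefix under the 7-bit scheme from the table
// earlier in the thread (thresholds copied from that table).
inline std::size_t lengthPrefixSize(std::size_t len)
{
    if (len < 128u) return 1;
    if (len < 16512u) return 2;
    if (len < 2113664u) return 3;
    if (len < 270549120u) return 4;
    return 5;
}

// Total marshalled size of a collection of strings: each element costs its
// length prefix plus its characters, accumulated in a single loop.
template <typename T>
std::size_t stringGroupBytes(T const& grp)
{
    std::size_t total = 0;
    for (typename T::const_iterator it = grp.begin(); it != grp.end(); ++it) {
        total += lengthPrefixSize(it->length()) + it->length();
    }
    return total;
}
```

The per-element prefix size replaces the single multiply by sizeof(uint32_t), at the cost of a few comparisons per string.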


Brian Wood
 
Ian Collins

The benefit is saving 3 or more bytes in bandwidth in the marshalling
process. Since strings are common it adds up. 255 chars is more
than enough for 95% of my uses of strings.

Um, would you put as much effort into removing 3 characters from a
string? This does look like a lot of effort for little gain.
 
tonydee

Is there a short string class around?  I've searched a little on the
net and didn't find anything.  I'm interested in a class that limits
(possibly at compile time) the length to 255, so marshalling the
string's length only requires one byte.  Thanks in advance.

I'm not aware of one. But, given your interest is in marshalling and
not memory use, why not extend std::string trivially...? (I know
that's a touchy subject around here, but hey ;-P). By reserving the
single value 255 as a "longer-string" sentinel, you can make it safer
too. An illustration of this, which I haven't even tried to compile, is below...

Cheers,
Tony

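#include <arpa/inet.h>  // htonl (POSIX)
#include <cstdint>
#include <ostream>
#include <string>

struct Short_String : std::string
{
    std::ostream& marshall_to(std::ostream& os) const
    {
        if (size() < 255)
            os << (uint8_t)size();
        else
        {
            // 255 is the sentinel: a 4-byte length in network byte order follows.
            os << (uint8_t)255;
            uint32_t nsize = htonl(static_cast<uint32_t>(size()));
            os.write(reinterpret_cast<const char*>(&nsize), 4);
        }
        return os << *this;
    }

    // marshall_from similarly...
};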
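
For the reading side, a rough sketch of the same sentinel idea as free functions, so nothing has to derive from std::string. marshallTo and marshallFrom are hypothetical names, and the 4-byte length is written big-endian by hand so the sketch does not depend on htonl. It trusts the length it reads, which real code taking data from the network should not do.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Write s with a one-byte length, or 255 followed by a 4-byte big-endian length.
inline std::ostream& marshallTo(std::ostream& os, const std::string& s)
{
    if (s.size() < 255) {
        os.put(static_cast<char>(static_cast<unsigned char>(s.size())));
    } else {
        const std::uint32_t n = static_cast<std::uint32_t>(s.size());
        os.put(static_cast<char>(static_cast<unsigned char>(255)));
        for (int shift = 24; shift >= 0; shift -= 8)
            os.put(static_cast<char>(static_cast<unsigned char>((n >> shift) & 0xff)));
    }
    return os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Read one string written by marshallTo; throws if the stream ends early.
inline std::string marshallFrom(std::istream& is)
{
    const int eof = std::char_traits<char>::eof();
    const int first = is.get();
    if (first == eof)
        throw std::runtime_error("missing length byte");
    std::uint32_t n = static_cast<std::uint32_t>(first);
    if (first == 255) {  // sentinel: the real length follows in 4 bytes
        n = 0;
        for (int i = 0; i < 4; ++i) {
            const int b = is.get();
            if (b == eof)
                throw std::runtime_error("truncated length");
            n = (n << 8) | static_cast<std::uint32_t>(b);
        }
    }
    std::string s(n, '\0');
    if (n != 0 && !is.read(&s[0], static_cast<std::streamsize>(n)))
        throw std::runtime_error("truncated string");
    return s;
}
```

A string shorter than 255 characters costs one length byte; anything longer costs five, which is one more than a plain uint32_t length.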
 
Brian

I'm not aware of one.  But, given your interest is in marshalling and
not memory use, why not extend std::string trivially...?  (I know
that's a touchy subject around here, but hey ;-P).  By reserving the
single value 255 as a "longer-string" sentinel, you can make it safer
too.  An illustration of this, which I haven't even tried to compile, is below...

Cheers,
Tony

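#include <arpa/inet.h>  // htonl (POSIX)
#include <cstdint>
#include <ostream>
#include <string>

struct Short_String : std::string
{
    std::ostream& marshall_to(std::ostream& os) const
    {
        if (size() < 255)
            os << (uint8_t)size();
        else
        {
            // 255 is the sentinel: a 4-byte length in network byte order follows.
            os << (uint8_t)255;
            uint32_t nsize = htonl(static_cast<uint32_t>(size()));
            os.write(reinterpret_cast<const char*>(&nsize), 4);
        }
        return os << *this;
    }

    // marshall_from similarly...

};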

I hacked up a "lil_string" now based on a string implementation
by Christian Stigen Larsen -- http://sublevel3.org.

http://webEbenezer.net/posts/lil_string.hh
http://webEbenezer.net/posts/lil_string.cc
http://webEbenezer.net/posts/lil_test.cc

If an operation would cause the size to exceed 255 bytes,
it throws an exception.
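
To show the idea without following the links, here is a minimal sketch of a length-limited string. It is not the lil_string code: BoundedString is a hypothetical wrapper around std::string, with the limit given as a template parameter so it is known at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: a string whose length may never exceed Max, so the
// length always fits the prefix width chosen for marshalling.
template <std::size_t Max = 255>
class BoundedString
{
public:
    BoundedString() = default;
    explicit BoundedString(const std::string& s) { assign(s); }

    void assign(const std::string& s)
    {
        if (s.size() > Max)
            throw std::length_error("BoundedString: too long");
        data_ = s;
    }

    BoundedString& append(const std::string& s)
    {
        // data_.size() <= Max always holds, so the subtraction cannot wrap.
        if (s.size() > Max - data_.size())
            throw std::length_error("BoundedString: too long");
        data_ += s;  // only reached when the result fits
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    const std::string& str() const { return data_; }

private:
    std::string data_;
};
```

Because the check happens before any modification, a failed append leaves the string unchanged.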

Brian Wood
 
Jorgen Grahn

On Apr 15, 3:38 pm, Marcel Müller wrote:

I'm not sure what you mean by that.

He means you can painfully invent and implement schemes for saving a
bit here and there, but you are probably going to be beaten by the guy
who ignores that and just slaps a standard compression algorithm on
top of the file format or networking protocol. ZLIB is the most
popular of these.

/Jorgen
 
