Please help with testing & improving a StringValue class

  • Thread starter Alf P. Steinbach

Chris Thomasson

[...]
Unfortunately there seems to be no way to implement a 'lightweight'
thread-safe assignment operator and/or copy constructor because
incrementing/decrementing the reference-counter and assignment of the
pointer are always two distinct operations.
[...]

You could use DWCAS.
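
DWCAS (double-width compare-and-swap) updates a pointer and its counter in one atomic step, which removes the "two distinct operations" problem. The following is only a scaled-down sketch of that idea: it packs a hypothetical 32-bit slot index (standing in for the pointer) and a 32-bit count into one 64-bit word, so an ordinary CAS covers both. The names pack, slotOf, countOf, acquire and replace are made up for illustration, and releasing references and retiring replaced slots is omitted.

```cpp
#include <atomic>
#include <cstdint>

// High 32 bits: hypothetical slot index (stand-in for the pointer).
// Low 32 bits: reference count.
inline std::uint64_t pack(std::uint32_t slot, std::uint32_t count) {
    return (static_cast<std::uint64_t>(slot) << 32) | count;
}
inline std::uint32_t slotOf(std::uint64_t word) {
    return static_cast<std::uint32_t>(word >> 32);
}
inline std::uint32_t countOf(std::uint64_t word) {
    return static_cast<std::uint32_t>(word);
}

// Takes a reference to whichever slot is current: the count is bumped only
// if the slot has not changed in the meantime, all in a single CAS.
std::uint32_t acquire(std::atomic<std::uint64_t>& word) {
    std::uint64_t seen = word.load();
    while (!word.compare_exchange_weak(
               seen, pack(slotOf(seen), countOf(seen) + 1))) {
        // compare_exchange_weak reloaded `seen`; just retry.
    }
    return slotOf(seen);
}

// Installs a new slot with an initial count of 1 and returns the replaced
// word, so the caller knows which slot to retire and how many users it had.
std::uint64_t replace(std::atomic<std::uint64_t>& word, std::uint32_t newSlot) {
    return word.exchange(pack(newSlot, 1));
}
```

With a true DWCAS the high half would be a full-width pointer instead of an index.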
 

Alf P. Steinbach

* Greg Herlihy:
(I'm wondering whether SharedArray should provide indexing, and/or
perhaps keep track of the length of the array: perplexingly and almost
paradoxically, it hasn't been needed. I'm also wondering whether there
is some better way to steer constructor selection (in StringValue and
StringValueOrNull) the Right Way, currently using boost::disable_if?)

One idea to help prevent StringValue's constructor from being passed a
const char array when a string literal is expected, would be to offer
a "StringLiteral" (or STRING_LITERAL) macro that clients could use to
designate the string literal initializer explicitly. (This suggestion
is based on a similar macro in Apple's CFString class.)

#define StringLiteral(a) StringValue(""a)

The double-quotes will cause a compile-time error - unless the
initializer "a" is a string literal (that is, it has double-quotes
surrounding it):

StringValue f()
{
    const char s[] = "some text";

    return StringLiteral(s); // Error: expected primary-expression before '('

    return StringLiteral("some text"); // OK
}

and alternately:

StringValue sv( StringLiteral("a string literal"));

Although I am not a big fan of macros, I will admit that they
occasionally have their uses.
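
A self-contained sketch of the macro idea follows. The StringValue here is a hypothetical stand-in that only stores the pointer, not the alfs class. One detail differs from the macro as posted: in C++11 and later, ""a without a space lexes as a string literal with a user-defined-literal suffix, so the sketch writes "" a.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for alfs::StringValue: it only stores the pointer.
struct StringValue {
    const char* text;
    explicit StringValue(const char* s) : text(s) {}
    std::size_t length() const { return std::strlen(text); }
};

// Adjacent string literals are concatenated, so "" "abc" is fine, while
// "" followed by a variable name is a syntax error.
#define StringLiteral(a) StringValue("" a)

StringValue fromLiteral() {
    return StringLiteral("some text");  // expands to StringValue("" "some text")
}

// const char s[] = "some text";
// StringValue bad = StringLiteral(s);  // does not compile
```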

Thanks, it's an ingenious idea.

But I don't think it helps with respect to technical detection of
non-literals, because if/when the macro is used one already knows that
the argument is a literal.

I mean, what's the chance of anyone writing STRING_LITERAL( b ) where b
is a local array, and the macro helping to catch this mistake?


Cheers, & thanks anyway (even though I'm disagreeing! ;-)),

- Alf
 

Alf P. Steinbach

* Alf P. Steinbach:
An 02 version of StringValue is now available at
<url: http://home.no.net/alfps/cpp/lib/alfs_v02.zip>.

An 03 version of StringValue is now available at
<url: http://home.no.net/alfps/cpp/lib/alfs_v03.zip>.

This version is mainly just a name change, reflecting that wchar_t is
now the /natural/ character code type:

Old name:             New name:

WStringValueOrNull    StringValueOrNull
WStringValue          StringValue

StringValueOrNull     BStringValueOrNull
StringValue           BStringValue

E.g., in modern Windows programming wchar_t is the default choice (at
least for me), and then it's just a hassle, and makes for unreadable
code, to have "W" and "w" prefixes all over the place, on every name.

I also added a file [acknowledgments.txt].

TESTING:
Could readers please try to sort a large vector and large list of e.g.
BStringValue versus std::string, and report the timings? I'm suspecting
that at least for a vector of strings, unoptimized BStringValue will be
significantly faster than a typical std::string that has been heavily
optimized by the best experts around. But this is just a hunch... ;-)
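
A minimal timing harness for such a comparison might look like the sketch below. It only exercises std::string; the function names randomStrings and sortSeconds are invented here, and the template parameter is where a BStringValue vector could be plugged in, assuming it is built from the same std::string data.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Builds n pseudo-random lowercase strings of the given length.
// A fixed seed makes runs repeatable.
std::vector<std::string> randomStrings(std::size_t n, std::size_t length,
                                       unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> letter('a', 'z');
    std::vector<std::string> result(n, std::string(length, ' '));
    for (std::string& s : result)
        for (char& c : s)
            c = static_cast<char>(letter(gen));
    return result;
}

// Sorts a private copy of the data (taken by value) and returns the
// elapsed wall-clock time in seconds.
template <class String>
double sortSeconds(std::vector<String> data) {
    const auto start = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Running sortSeconds several times on the same input and taking the minimum gives steadier numbers than a single run.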

Cheers, & hope this can be interesting[1],

- Alf


Notes:
[1] I'm considering adding support for "tied" string values (e.g. a
string value that is just a pointer from Windows' argument list, but
keeping a reference count updated for that list), and += concatenation,
the latter because there's so much O(n^2) string concatenation code
around, and although values are immutable, += can do the equivalent of s
= s + t, only much more efficiently, with O(t) amortized copying time. I'm
not sure yet whether these two features are mutually incompatible!
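
One way += can stay cheap on immutable values is to append in place whenever this value ends exactly where the shared buffer ends, and copy only otherwise. The sketch below illustrates that idea with a hypothetical AppendableValue built on std::shared_ptr and std::string; it is not the alfs implementation and it is not thread-safe.

```cpp
#include <cstddef>
#include <memory>
#include <string>

class AppendableValue {
    std::shared_ptr<std::string> buf_;  // shared storage, may be longer than len_
    std::size_t len_ = 0;               // this value is the first len_ characters
public:
    AppendableValue() : buf_(std::make_shared<std::string>()) {}
    explicit AppendableValue(const std::string& s)
        : buf_(std::make_shared<std::string>(s)), len_(s.size()) {}

    AppendableValue& operator+=(const AppendableValue& t) {
        const std::string tail = t.str();  // copy first: t may share our buffer
        if (buf_->size() != len_) {
            // Another value already grew the buffer past our end, so
            // continue in a private copy of our own prefix.
            buf_ = std::make_shared<std::string>(*buf_, 0, len_);
        }
        buf_->append(tail);                // amortized O(length of t)
        len_ = buf_->size();
        return *this;
    }

    std::string str() const { return std::string(*buf_, 0, len_); }
    std::size_t size() const { return len_; }
};
```

Other values sharing the buffer keep their own length, so they never see the appended characters.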
 

Barry

Alf said:
* Alf P. Steinbach:
TESTING:
Could readers please try to sort a large vector and large list of e.g.
BStringValue versus std::string, and report the timings? I'm suspecting
that at least for a vector of strings, unoptimized BStringValue will be
significantly faster than a typical std::string that has been heavily
optimized by the best experts around. But this is just a hunch... ;-)

You must have forgotten to test the 'W' streaming of StringValue;
change the operators to use basic_ostream.
 

Alf P. Steinbach

* Barry:
You must have forgotten to test the 'W' streaming of StringValue;
change the operators to use basic_ostream.

Not sure what you mean. Wide string streaming operators << and >> are
not provided because MingW g++ 3.4.4 doesn't support wide streams. That
is, MingW g++ 3.4.4 simply does not implement wide streams, e.g. there's
no wcout or wcin, although std::wstring is implemented.

However, narrow string streaming operators (for BStringValue) are
provided, and if the compiler supports wide streams, wide strings
(StringValue) can be streamed out simply by applying the pointer()
member function -- and streaming operators implemented simply by
copying and changing the types for the existing narrow operators.

More generally: when streaming wide character strings using the standard
library streams (no matter whether to/from std::wstring or what), one
should be aware that mostly the result is a not-very-well-defined
conversion to narrow characters. I.e. "wcout << L"hello" << endl" does
not by default result in UTF-16 or some other encoding at the output
end. And the standard library does not offer such encodings.
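
One way to sidestep the loosely defined conversion is to encode explicitly and write the bytes to a narrow stream. The sketch below is a hand-written UTF-8 encoder; it takes std::u32string rather than std::wstring so that it does not depend on the platform's wchar_t width, and the name toUtf8 is made up here.

```cpp
#include <string>

// Encodes Unicode code points as UTF-8 bytes. Minimal sketch: it does not
// reject surrogate values or values above U+10FFFF.
std::string toUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t c : text) {
        if (c < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```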

As far as I know, only the Dinkumware library has reasonable support for
wide character streaming, in the sense of offering the usual Unicode
encoding formats.

It would be nice if e.g. Boost provided this...


Cheers,

- Alf (hoping SomeOne(TM) will do the honors of measuring performance!)
 

Roland Pibinger

Wide string streaming operators << and >> are
not provided because MingW g++ 3.4.4 doesn't support wide streams. That
is, MingW g++ 3.4.4 simply does not implement wide streams, e.g. there's
no wcout or wcin, although std::wstring is implemented.

Try to define _GLIBCXX_USE_WCHAR_T
 

Alf P. Steinbach

* Roland Pibinger:
Try to define _GLIBCXX_USE_WCHAR_T

I think I tried that once.

Anyway, if it worked (with no harmful side-effects), then wide streams
would be supported by default. One doesn't remove a large portion of
the standard library for no reason.

So I doubt it works, at least not without harmful side-effects.

Cheers,

- Alf
 

Barry

Alf said:
* Barry:

Not sure what you mean. Wide string streaming operators << and >> are
not provided because MingW g++ 3.4.4 doesn't support wide streams. That
is, MingW g++ 3.4.4 simply does not implement wide streams, e.g. there's
no wcout or wcin, although std::wstring is implemented.

I didn't know this; I'm quite a novice with iostream.
So "have templated iostream or not" is equivalent to "wide stream support"?
 

Alf P. Steinbach

* Barry:
I didn't know this; I'm quite a novice with iostream.

Well, wide character streaming requires conversion to/from some
encoding, and presumably that's why it's missing in g++ 3.4.4.

So "have templated iostream or not" is equivalent to "wide stream support"?

I assume you mean, why didn't I templatize the operators?

Work... ;-)

But OK OK OK, did that, since if you react others are also likely to
react, but I think I'll wait posting a new version until I've
implemented some more complete functionality (like tying, great for
constant time substrings, and perhaps also efficient concatenation).
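
Templatizing a streaming operator on the character type can be sketched as below. ValueSketch is a hypothetical stand-in with a pointer() member like the one mentioned earlier in the thread; the real alfs operators may be written differently.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a string value templated on character type.
template <class Char>
class ValueSketch {
    std::basic_string<Char> text_;
public:
    explicit ValueSketch(const Char* s) : text_(s) {}
    const Char* pointer() const { return text_.c_str(); }
};

// One operator serves narrow and wide streams alike: Char is deduced from
// both the stream and the value, so mixing widths fails to compile.
template <class Char, class Traits>
std::basic_ostream<Char, Traits>& operator<<(
    std::basic_ostream<Char, Traits>& stream, const ValueSketch<Char>& value) {
    return stream << value.pointer();
}
```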


Cheers, & thanks,

- Alf (still hoping SomeOne(TM) can do the honors of testing speed!)
 

Alf P. Steinbach

* Alf P. Steinbach:

Reusing most of the text I posted for the 01 and 02 versions:

An 04 version of StringValue is now available at
<url: http://home.no.net/alfps/cpp/lib/alfs_v04.zip>.

Old features:

* A StringValue can be constructed from a literal in constant
time with no dynamic allocation.

* A StringValue can be constructed from a pointer and deletion
operation (functor or function) in constant time, with no O(n)
copying.

* A StringValue can be copied and assigned in constant time with
no dynamic allocation (great for e.g. standard containers).

* A StringValue can be safely constructed in other ways, including
from a 'char const*' and from a 'std::string', but then involving
O(n) copying and dynamic allocation.

* A StringValue can be freely copied and assigned, but the value
can not be modified.

* A license reference (Boost license) is included in every file,
resulting from comments by Roland Pibinger (thanks).

* operator==, operator< added,
resulting from comments by Barry (thanks).

* Implicit conversion to 'char const*' /removed/, because

* A class StringValueOrNull was added, which supports passing
null-values around. A StringValue is implicitly convertible to
StringValueOrNull. A StringValueOrNull value can only be explicitly
converted to pure StringValue, then with checking of nullvalue &
possible exception.

* Support for << and >> stream i/o added (because of removal of
implicit conversion to 'char const*').

* In order to be useful in Windows programming, wchar_t versions of
StringValue (WStringValue) and StringValueOrNull (ditto) have
been added, i.e. templatization on the character type.

* Free function swap implementations moved from namespace std to
namespace alfs, resulting from comments by Greg Herlihy (thanks).

* Two small example usage programs, one an abstraction of the 'main'
arguments (with almost no overhead), and one ditto showing a simple
abstraction of Windows Unicode command line arguments.
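
The construction and copy costs listed above can be illustrated with a small sketch. It is not the alfs code: ValueSketch and LiteralTag are hypothetical names, and std::shared_ptr stands in for the library's own reference counting.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical tag: the caller promises the pointer outlives every copy.
struct LiteralTag {};

class ValueSketch {
    std::shared_ptr<const std::string> owner_;  // empty when text_ is a literal
    const char* text_;
public:
    // Constant time, no dynamic allocation: only the pointer is stored.
    ValueSketch(LiteralTag, const char* literal) : text_(literal) {}

    // O(n) copying plus one dynamic allocation.
    explicit ValueSketch(const std::string& s)
        : owner_(std::make_shared<const std::string>(s)),
          text_(owner_->c_str()) {}

    // The compiler-generated copy operations are constant time: they copy
    // two pointers and adjust a reference count, with no allocation.

    const char* pointer() const { return text_; }
    bool ownsStorage() const { return static_cast<bool>(owner_); }
};
```

There are no mutating members, which matches the "freely copied and assigned, but not modified" rule.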

New features:

* Templatized streaming operators (i.e. wide character streaming),
resulting from comments by Barry (thanks).

* Basic tied string functionality.

* Indexing operator for SharedArray (yeah, finally).

A /tied string/ shares its reference counting with some specified
SharedArray, using that SharedArray as a lifetime manager. One useful
application is for constant time substring extraction. However,
constant time substring extraction requires some more machinery than
currently present (namely keeping track of string lengths).
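
The tying idea has a close analogue in standard C++11: the aliasing constructor of std::shared_ptr, which yields a pointer to a part of an object while sharing the whole object's reference count. The sketch below uses that analogue; Block and tiedElement are names invented for the illustration, and alfs does this with its own SharedArray rather than std::shared_ptr.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

using Block = std::vector<std::wstring>;

// Returns a pointer to element i that shares ownership of the whole block,
// so the block stays alive as long as any element pointer does.
std::shared_ptr<const std::wstring> tiedElement(
    const std::shared_ptr<Block>& owner, std::size_t i) {
    return std::shared_ptr<const std::wstring>(owner, &(*owner)[i]);
}
```

Creating the tied pointer costs a reference count increment and no allocation, which is what makes constant time substring extraction plausible once lengths are tracked.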

For now, example code abstracting Windows Unicode program arguments:


<code>
// "Error handling omitted for brevity & clarity".

#include <alfs/StringValueClass.hpp>
#include <cstddef> // std::size_t, std::ptrdiff_t
#include <vector> // std::vector
#include <string> // std::wstring

#define UNICODE
#define _UNICODE
#include <windows.h>

class ProgramArguments
{
// A ProgramArguments instance can be copied freely in constant time.
// And it can be destroyed while still having individual StringValue
// argument instances around (the CommandLineToArgvW result is then
// freed only when all argument StringValue instances are destroyed).
private:
    typedef alfs::SharedArray<wchar_t*> StringPtrArray;

    StringPtrArray myArgPointers;
    int myArgCount;

public:
    typedef std::ptrdiff_t Index;

    ProgramArguments()
    {
        wchar_t** const argPointers = CommandLineToArgvW(
            GetCommandLine(), &myArgCount
            );
        myArgPointers = StringPtrArray( argPointers, GlobalFree );
    }

    std::size_t size() const
    {
        return myArgCount;
    }

    alfs::StringValue operator[]( Index i ) const
    {
        // Constant time, no dynamic allocation.
        using namespace alfs;
        return StringValue(
            TiedPointer(), myArgPointers[i], myArgPointers
            );
    }
};

void cppMain(
    alfs::StringValue const& commandLine,
    ProgramArguments const& arguments
    )
{
    std::wstring s = commandLine.pointer();

    s += L'\n';
    for( std::size_t i = 0; i < arguments.size(); ++i )
    {
        s += L'\n';
        s += arguments[i].cStr();
    }

    wchar_t const title[] =
        L"Command line & std arguments interpretation:";
    MessageBox( 0, s.c_str(), title, MB_ICONINFORMATION );
}

alfs::StringValue commandLine()
{
    // Client code need not distinguish between this static pointer and
    // some string that needs deallocation.
    return alfs::StringValue( GetCommandLine(), alfs::NoDelete() );
}

int main()
{
    cppMain( commandLine(), ProgramArguments() );
    // Almost no per-string overhead.
}
</code>
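
The point made in commandLine(), that client code need not know whether a string must be freed, can be sketched in portable C++ with a deleter chosen at construction time. This is an illustration only: SharedText, NoDeleteSketch, wrapStatic and wrapCopy are made-up names, and unlike the alfs claim, std::shared_ptr does allocate a small control block.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

using SharedText = std::shared_ptr<const char>;

// For storage that must never be freed (static buffers, literals).
struct NoDeleteSketch {
    void operator()(const char*) const {}
};

SharedText wrapStatic(const char* text) {
    return SharedText(text, NoDeleteSketch());
}

// For storage the value owns: copied here, and released with delete[]
// when the last SharedText referring to it goes away.
SharedText wrapCopy(const char* text) {
    const std::size_t n = std::strlen(text) + 1;
    char* copy = new char[n];
    std::memcpy(copy, text, n);
    return SharedText(copy, [](const char* p) { delete[] p; });
}
```

Both functions return the same type, so callers handle the two cases identically.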


Comments, ideas, criticism etc. welcome! Note: this code is almost
untested. At least not formally tested!

Cheers, & hope this can be interesting,

- Alf

(still hoping SomeOne(TM) can do the honors of testing speed, e.g.
sorting a large vector of std::string versus BStringValue!)
 

Chris Thomasson

Alf P. Steinbach said:
I once suggested in [comp.std.c++] that SomeOne Else(TM) should propose a
string value class that accepted literals and char pointers and so on, with
possible custom deleter, and in case of literal strings just carrying the
original pointer.
[...]

You could use a lock-based and/or lock-free reader-writer pattern with your
code. For instance, here is a proxy garbage collector that can work well
within the concept: any object construction results in a subsequent
destruction. The proxy collector manages the lifetime that represents the
period in time between the ctor and dtor:

http://home.comcast.net/~vzoom/demos/pc_sample.c
(vc-6.0 code for an x86)

Look here for some further info:

http://groups.google.com/group/comp...&group=comp.programming.threads&q=pc_sample.c
 

Chris Thomasson

Chris Thomasson said:
Alf P. Steinbach said:
I once suggested in [comp.std.c++] that SomeOne Else(TM) should propose a
string value class that accepted literals and char pointers and so on,
with possible custom deleter, and in case of literal strings just carrying
the original pointer.
[...]

You could use a lock-based and/or lock-free reader-writer pattern with
your code. For instance, here is a proxy garbage collector that can work
well within the concept:

[...]

One other thing, the pattern that results from using a read-write lock is
compatible with the overall reader-writer pattern. One pattern can cover
lock-based algorithms and lock-free algorithms; very good. You can maintain
"portability" by using POSIX to create a failsafe implementation of a
"standard in-house" interface.
 

Chris Thomasson

Chris Thomasson said:
One other thing, the pattern that results from using a read-write lock is
compatible with the overall reader-writer pattern. One pattern can cover
lock-based algorithms and lock-free algorithms; very good. You can
maintain "portability" by using POSIX to create a failsafe implementation
of a "standard in-house" interface.
[...]
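
The lock-based half of such a reader-writer pattern might be sketched as follows. This is an assumption-laden illustration, not code from the linked samples: it uses C++17 std::shared_mutex where the post suggests POSIX, and SharedSlot is a hypothetical name.

```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

class SharedSlot {
    mutable std::shared_mutex guard_;
    std::shared_ptr<const std::string> current_ =
        std::make_shared<const std::string>();
public:
    // Readers hold the shared lock only while copying the pointer; the
    // string itself is immutable and can be used after the lock is gone.
    std::shared_ptr<const std::string> read() const {
        std::shared_lock<std::shared_mutex> lock(guard_);
        return current_;
    }

    // Writers build the new value outside the lock and swap it in under
    // the exclusive lock. The old value is dropped after the lock is
    // released, because `lock` is destroyed before `next`.
    void write(std::string text) {
        auto next = std::make_shared<const std::string>(std::move(text));
        std::unique_lock<std::shared_mutex> lock(guard_);
        current_.swap(next);
    }
};
```

A lock-free variant could keep the same read()/write() interface and replace only the internals.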

Read this for a simple example of a fairly efficient marriage between
lock-free and lock-based programming:

https://coolthreads.dev.java.net/servlets/ProjectForumMessageView?forumID=1797&messageID=11068

Notice the pattern with the memory barriers. The producer of a string most
likely will need to execute at least a #StoreStore memory barrier
instruction before it produces data into a shared location. The blending of
lock-free and lock-based occurs within the critical-section of the lock. The
code is effectively publishing data into a public place that has visitors
which choose to access the data without the help of the lock. The readers
are just loading pointers, executing an implied and/or explicit "dependent"
#LoadLoad style memory-barrier, dereferencing the pointers they previously
loaded and issuing "subsequent logic" on the resulting data.
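
In C++11 terms the publish/read pattern can be approximated as below. This is a sketch under assumptions: a release store is used where the post asks for at least #StoreStore (release is somewhat stronger), and an acquire load is the conservative stand-in for the data-dependent load ordering. The names g_published, publish and publishedLength are invented.

```cpp
#include <atomic>
#include <cstddef>
#include <string>

std::atomic<const std::string*> g_published{nullptr};

// Producer: finish building the string, then publish the pointer. The
// release store keeps the string's initialization from being reordered
// after the point where the pointer becomes visible.
void publish(const std::string* ready) {
    g_published.store(ready, std::memory_order_release);
}

// Reader: load the pointer, then dereference it. The acquire load pairs
// with the release store, so a non-null pointer implies the reader sees
// the fully constructed string.
std::size_t publishedLength() {
    const std::string* p = g_published.load(std::memory_order_acquire);
    return p ? p->size() : 0;
}
```

Reclaiming a string that has been replaced is a separate problem, which is where a proxy collector or reference counting comes in.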
 

Alf P. Steinbach

* Gianni Mariani:
Alf P. Steinbach wrote:
...

Is it thread safe?

In addition to my previous reply, in version 05 (I guess that will be
0.05!) I've added

void detach()
{
    if( refCount() > 1 ) { doDetach( capacity() ); }
    assert( refCount() <= 1 );
}

StringValue_& detached()
{
    detach();
    return *this;
}

so that a "detached" instance, not sharing data with any other instance,
can be easily created for passing to another thread. The cost is
possible dynamic allocation and O(n) copying, depending on the current
ref-count of the source. I haven't tested this except that doDetach is
used in operator+= and seems to work well.
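
For readers without the library at hand, the detach idea can be sketched with std::shared_ptr. CowSketch and its members are hypothetical names; the real doDetach also takes a capacity, which this sketch ignores.

```cpp
#include <memory>
#include <string>

class CowSketch {
    std::shared_ptr<std::string> data_;
public:
    explicit CowSketch(const std::string& s)
        : data_(std::make_shared<std::string>(s)) {}

    long refCount() const { return data_.use_count(); }

    // Ensures this instance shares its buffer with nobody: a possible
    // allocation plus an O(n) copy, but only when the buffer is shared.
    void detach() {
        if (refCount() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    CowSketch& detached() {
        detach();
        return *this;
    }

    const std::string& value() const { return *data_; }
    bool sharesWith(const CowSketch& other) const {
        return data_ == other.data_;
    }
};
```

The detach must happen before the instance is handed over; checking the count while other threads are still copying the same instance would be a race.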

None of the versions have used static variables, but of course the code
uses dynamic allocation, which depending on the run-time library may use
thread unsafe static variables: that aspect is the same as with any
other C++ code, and with the current standard very tool-specific.

Btw., I've posted version 05 (also supporting operator+= concatenation,
with time generally linear in the argument instead of linear in the
result) at <url: home.no.net/alfps/cpp/lib/alfs_v05.zip>, Boost license.

Cheers,

- Alf
 
