How to return a string of arbitrary length to caller?

John Smith

Hi,

I'm writing a library in C++ which is supposed to be used by people using C.
One function I have must return a string to users which is arbitrary length.
The user must be able to use this string as "char *" so I was wondering if
the following construct is safe:

char *Func()
{
    static string str;

    str = "foobar...";

    return (char *) str.c_str();
}

The problem is that before the function call the length of the string is
unknown.
An alternative is to use malloc or new and provide that pointer to the user.
The memory allocated could then be deallocated later on.

Thanks in advance.
-- John
 
John Harrison

John Smith said:
Hi,

I'm writing a library in C++ which is supposed to be used by people using C.
One function I have must return a string to users which is arbitrary length.
The user must be able to use this string as "char *" so I was wondering if
the following construct is safe:

char *Func()
{
    static string str;

    str = "foobar...";

    return (char *) str.c_str();
}

No, it's not safe, for two reasons. First, you are casting away const; if
your users ever use the non-const pointer to change the value of the string,
they invoke undefined behaviour. Second, the use of a static variable
might be problematic: what if Func is called twice? If the user still has
the pointer that was the result of the first call to Func, then the second
call would change what that pointer is pointing to (I'm assuming that each
call to Func produces a different string).
The problem is that before the function call the length of the string is
unknown.
An alternative is to use malloc or new and provide that pointer to the user.
The memory allocated could then be deallocated later on.

That's the way to do it. Since you are dealing with C, presumably malloc
would be a better bet.
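
A minimal sketch of the malloc() approach, assuming the result is built as a std::string first. The helper build_result() is hypothetical and only stands in for whatever produces the data; the try/catch is there because a C caller cannot handle a C++ exception.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: stands in for whatever computes the result.
static std::string build_result() { return "foobar..."; }

// Returns a NUL-terminated copy allocated with malloc(), or NULL on failure.
// The caller owns the buffer and must release it with free().
extern "C" char *Func(void)
{
    try {
        const std::string str = build_result();
        char *out = static_cast<char *>(std::malloc(str.size() + 1));
        if (out == NULL)
            return NULL;
        // Copy the characters plus the terminating NUL.
        std::memcpy(out, str.c_str(), str.size() + 1);
        return out;
    } catch (...) {
        // Never let a C++ exception escape into C code.
        return NULL;
    }
}
```

On the C side this would be used as: char *s = Func(); ... free(s);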
 
Claudio Jolowicz

That's the way to do it. Since you are dealing with C, presumably malloc
would be a better bet.

Definitely, don't use new or new[] if the string can only be deallocated
with free().

Another way would be to let the caller pass you an allocated char* and
its size. The advantage of this being that the string can be allocated
and deallocated in the same context. Obviously this requires that the
caller knows which size is required, so it may not be applicable.
 
Claudio Jolowicz

That's the way to do it. Since you are dealing with C, presumably malloc
would be a better bet.

Definitely, don't use new or new[] if the string can only be deallocated
with free().

Actually, you /could/ use new and provide a function to deallocate the
string in the proper way.
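
As a rough sketch of that idea: the library allocates with new[] and exports a matching release function, so the C caller never has to know which allocator was used. The name FreeString is made up for illustration, and the string literal is a placeholder for the real data.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <string>

// Allocates the result with new[]; returns NULL on failure.
// The result must be released with FreeString(), not free().
extern "C" char *Func(void)
{
    try {
        const std::string str = "foobar..."; // placeholder for the real data
        char *out = new char[str.size() + 1];
        std::memcpy(out, str.c_str(), str.size() + 1);
        return out;
    } catch (...) {
        // Covers std::bad_alloc; exceptions must not reach the C caller.
        return NULL;
    }
}

// Hypothetical companion function: releases a string returned by Func().
extern "C" void FreeString(char *s)
{
    delete[] s; // delete[] on a null pointer does nothing
}
```

The allocation and the deallocation then both live inside the library, which also helps when the library and the caller are linked against different runtime libraries.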
 
Philipp Bachmann

An alternative is to use malloc or new and provide that pointer to the user.
The memory allocated could then be deallocated later on.

As John Harrison already said, of your two ideas this is the way to go.
Your system might have the convenience function "strdup()", which merges
"std::string::size()", "malloc()" and "strcpy()" into one call. Be aware,
though, that "strdup()" might internally use "new []" instead of "malloc()".
Sun CC 5.3 seems to do this, as Purify told me, which I consider a bad
decision...

I'd go for a third way, however: adopt the way the Unix "*_r()" routines
work. Pass a pointer to a user-supplied buffer and the length of this buffer
as arguments. For convenience, you could return the pointer to the buffer
to the caller again. This style has three advantages: it's known to callers
from e.g. the "*_r()" routines; it's less prone to memory leaks, since
functions returning pointers to memory they've just allocated are easily
misused; and the caller can decide which memory allocation to use: local
variables, "malloc()", "mmap()"... Of course, there's also a disadvantage:
what to do if the buffer is too small? One of the easier ways out is
another function, which calculates the size of the buffer in advance and
returns it to the caller. Take a look e.g. at MS Win32 "GetSystemDirectory()",
which can act both as a function which returns the length of the buffer
required and as a function which actually fills the buffer.
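
A sketch of the "*_r()" style with a separate size function might look like the following. The names FuncSize, Func_r and build_result are invented for this example, and returning NULL for a too-small buffer is just one possible convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: stands in for whatever computes the result.
static std::string build_result() { return "foobar..."; }

// Number of bytes the caller must provide, including the terminating NUL.
// Returns 0 if the size could not be determined.
extern "C" std::size_t FuncSize(void)
{
    try {
        return build_result().size() + 1;
    } catch (...) {
        return 0;
    }
}

// Fills the caller's buffer and returns it, or returns NULL if the
// buffer is missing or too small (nothing is written in that case).
extern "C" char *Func_r(char *buf, std::size_t buflen)
{
    try {
        const std::string str = build_result();
        if (buf == NULL || buflen < str.size() + 1)
            return NULL;
        std::memcpy(buf, str.c_str(), str.size() + 1);
        return buf;
    } catch (...) {
        return NULL;
    }
}
```

The caller can then use a local array, malloc(FuncSize()), or any other storage it likes.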

Cheers,
Philipp.
 
Julie

Using your method is acceptable provided your callers can live w/ the side
effects of using a shared storage location.

One way would be to return the required length of the string and let the user
allocate the required space. The negative effect is that essentially the
function has to be called twice:

int Func(char * dest, int cchdest)
{
    // get string as 'std::string', store required length
    // if dest is NULL, don't store result
    return requiredlength;
}

Another method is to have the caller pass in a function address that is used by
you to allocate the required space, and then leave it up to the caller to free
the memory:

typedef void * (Allocator)(int size);
char * Func(Allocator alloc)
{
    char * dest = (char *)alloc(requiredsize);
    // ...
    return dest;
}

A note, if you define the Allocator signature the same as malloc, that would
allow your users to use it directly, minimizing the fuss:

// caller's code:
char * newstring = Func(malloc);
// use newstring
free(newstring); // all done!
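
To make the last note concrete, here is a small sketch in which the Allocator type really does match malloc(), i.e. it takes a size_t rather than an int. The string literal is a placeholder for the real data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

extern "C" {

// Same signature as malloc(), so callers can pass malloc directly.
typedef void *(*Allocator)(std::size_t size);

// Builds the string, asks the caller's allocator for space, and copies
// the result into it. Returns NULL if the allocation fails.
char *Func(Allocator alloc)
{
    try {
        const std::string str = "foobar..."; // placeholder for the real data
        char *dest = static_cast<char *>(alloc(str.size() + 1));
        if (dest == NULL)
            return NULL;
        std::memcpy(dest, str.c_str(), str.size() + 1);
        return dest;
    } catch (...) {
        return NULL; // keep exceptions out of the C caller
    }
}

} // extern "C"
```

Since the caller chose the allocator, the caller also knows how to free the result.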
 
Jakob Bieling

Another way would be to let the caller pass you an allocated char* and
its size. The advantage of this being that the string can be allocated
and deallocated in the same context. Obviously this requires that the
caller knows which size is required, so it may not be applicable.

Solution might be to have the function always return the number of
chars written to the buffer, and if the buffer size is 0, return the number
of chars needed.
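
Roughly, as a sketch: one function that either reports the size needed or fills the buffer. What to do when the buffer is non-empty but still too small is not specified above; returning 0 in that case is an assumption of this example, as are the counts including the terminating NUL and the helper build_result().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: stands in for whatever computes the result.
static std::string build_result() { return "foobar..."; }

// destSize == 0 (or dest == NULL): returns the number of chars needed,
// including the terminating NUL, and writes nothing.
// Otherwise: returns the number of chars written, or 0 if the buffer
// is too small or an error occurred.
extern "C" std::size_t Func(char *dest, std::size_t destSize)
{
    try {
        const std::string str = build_result();
        const std::size_t needed = str.size() + 1;
        if (dest == NULL || destSize == 0)
            return needed;              // query mode
        if (destSize < needed)
            return 0;                   // assumption: too small, write nothing
        std::memcpy(dest, str.c_str(), needed);
        return needed;
    } catch (...) {
        return 0;
    }
}
```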

hth
 
Dave Moore

Well .. I think passing a char * allocated from inside the C++ to an
external C routine is fundamentally unsafe .. it would be much better
to pass in an allocated char * and an int describing the length as
suggested by another user. However I understand you might not have
much choice if you are dealing with legacy code. Let's assume you can
guarantee that the C code will not try to access beyond the end of the
string .. otherwise you are sunk. Let's also assume that the C code
will not try to de-allocate the memory for the char *.

If both of the above are true, I think a factory approach like the
following might be useful:

#include <vector>
#include <string>

using std::string;
using std::vector;

class string_maker : vector<char *> {
public:
    char * new_string(string s) { // you could also use const char
        char *p = new char[s.length()+1];
        s.copy(p,string::npos);
        p[s.length()] = 0; // add C-style string terminator
        push_back(p);
        return p;
    }

    ~string_maker() {
        // free memory
        vector<char *>::iterator I=begin();
        for ( ; I!=end(); ++I)
            delete [] *I;
    }
};

// put this in the same translation unit as Func below.
namespace {
string_maker repository;
}

// now for your function

int Func(char **p) {
    string return_string;
    // dynamic generation of return_string contents
    *p = repository.new_string(return_string);
    return static_cast<int>(return_string.length() + 1);
}

Notice I have changed the declaration of Func so that it returns the
length of the string as an int, which is essential so that the C code
calling the function can make sure not to overrun the result. The
address of the caller's (unallocated) char * is passed to Func as an
argument, because C has no references. (Passing the char * itself by value
would mean only the local copy is modified.)

Now, while I would really not advise doing things this way, I guess
this might work for you, given the caveats above. At least the
allocation and deallocation of memory is handled in a well defined
way, and you can be fairly sure that the char *'s being passed between
C and C++ will not be invalidated before the program exit.

Of course, many improvements could be made to the string_maker class
above .. it is just a (hopefully) helpful idea.

HTH, Dave Moore
 
John Smith

Thank you all for the useful suggestions and comments.

I discovered the best thing for me would be to do something like the
following.
Since I provide a complete api with C-frontend I can just do whatever I want
inside e.g.:

class MemoryHolder
{
public:
    char *p;
};

char *Func(void *pHandle)
{
    string str = ... my data;
    // The void pointer is really a pointer to a MemoryHolder object.
    MemoryHolder *M = static_cast<MemoryHolder *>(pHandle);
    M->p = new char[str.size() + 1]; // room for the terminating NUL
    strcpy(M->p, str.c_str());
    return M->p;
}

void Cleanup(void *pHandle)
{
    ... cleanup used memory
}

This example is naturally incomplete, but that's just to show what I'm
thinking.
The user who only uses C has a void pointer handle which is an object
pointer. Inside I can cast the void pointer to my real type and hide the
real implementation (e.g. you don't need to put the classes in publicly
available header files).
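
A fuller sketch of this handle idea could look like the following. CreateHandle is a hypothetical name (how the handle gets created is not shown above), the string literal is a placeholder for the real data, and freeing the previous string on each call is an assumption about the intended ownership.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <string>

namespace {
// Hidden from C callers; they only ever see a void * handle.
struct MemoryHolder {
    char *p;
    MemoryHolder() : p(NULL) {}
    ~MemoryHolder() { delete[] p; }
};
}

// Hypothetical: creates the handle. Returns NULL on failure.
extern "C" void *CreateHandle(void)
{
    return new (std::nothrow) MemoryHolder;
}

// Stores a fresh copy of the result in the handle and returns it.
// The string stays valid until the next Func() or Cleanup() on this handle.
extern "C" char *Func(void *pHandle)
{
    if (pHandle == NULL)
        return NULL;
    MemoryHolder *holder = static_cast<MemoryHolder *>(pHandle);
    try {
        const std::string str = "foobar..."; // placeholder for the real data
        char *fresh = new char[str.size() + 1];
        std::memcpy(fresh, str.c_str(), str.size() + 1);
        delete[] holder->p; // release the string from the previous call
        holder->p = fresh;
        return holder->p;
    } catch (...) {
        return NULL; // keep exceptions out of the C caller
    }
}

// Releases the handle and the string it owns.
extern "C" void Cleanup(void *pHandle)
{
    delete static_cast<MemoryHolder *>(pHandle);
}
```

Because each handle owns its own string, two handles do not interfere with each other the way a single static string would.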
To those who asked about legacy code I don't have any legacy. All is
developed from scratch but it's a requirement that my code can be used from
various languages including C.
The idea that some people recommended about passing NULL and then returning
the length is sort of OK, but the function's computation might take a
while, so it's wasteful to do the same calculations twice.

-- John
 
