Cast object from long in safe manner?

B

Ben Hutchings

Peter Koch Larsen wrote:
a) use a hash_map. Not standard yet, but it very soon will be. Lookup
is O(1) amortized.
<snip>

Big-O notation is supposed to tell you the worst case. (Big-omega is
used for the best case.) The worst case is, unfortunately, that all
the entries fall into the same hash bucket and lookup is O(n).
Amortisation makes no difference.
 
B

Ben Hutchings

Stefan Rupp wrote:
However, there is the defined "network byte order", into and out of
which values can be translated for purposes of sending data across
networks in heterogenous environments. The keywords you need to search
for are: htonl and ntohl!

The design of htonl and ntohl is thoroughly wrong-headed. First, the
names suggest that they work on longs when actually they must work on
32-bit integers (similarly htons and ntohs work on 16-bit integers,
not short). Second, they can't be implemented on architectures that
have trap values for integers. Third, they require the caller to
read and write the network-ordered integers into and out of buffers
as 32-bit values, which is impossible if the value is not properly
aligned. (This last is not an issue in TCP/IP, which is what they
were designed for, since all TCP/IP header fields are aligned.)

Portable conversion functions that deal with binary formats should
read and write arrays of bytes. (Even that is problematic if you're
aiming for complete portability, since a byte may have more than 8
bits.)
 
E

Ed Avis

Ben Hutchings said:
Big-O notation is supposed to tell you the worst case. (Big-omega is
used for the best case.)

Not so. To say 'f is O(g)' means that for large enough n, f(n) <=
g(n). (And when giving the name of an algorithm we normally mean the
time taken by that algorithm.) You can say

In the best case, quicksort is O(n log n) in the num. of elements

In the best case, quicksort is O(n ** 2) (also true)

In the worst case, quicksort is O(n ** 2)

In the worst case, quicksort is O(n ** n ** n ** 99)

To say 'f is Omega(g)' means that g is O(f), in other words for large
enough n, f(n) >= g(n). So

Quicksort is Omega(1) (yes, it takes constant time or worse)

Quicksort is Omega(n log n) (also true)

In the worst case, quicksort is Omega(n ** 2).

However there is an informal convention that when quoting big-Os you
quote the best one you can, so saying

Algorithm X takes time O(g(n)) where n is the input size

is, by reading between the lines, saying that it is both O(g(n)) and
Omega(g(n)), because you wouldn't bother to give a worse big-O than is
possible. Otherwise people would go around saying 'quicksort is
O(n**99)' and it would all get a bit silly.
The worst case is, unfortunately, that all the entries fall into the
same hash bucket and lookup is O(n).

Indeed. For such a degenerate hash lookup is O(n) - so it's no worse
than linear in the number of elements - but it is also Omega(n) - so
no better.
 
B

Ben Hutchings

Ed said:

My wording was sloppy; I really meant "upper bound" and "lower bound"
which might not actually be reached. However, I realise that even that
is not quite right since they are asymptotic bounds.
To say 'f is O(g)' means that for large enough n, f(n) <= g(n).

That should be f(n) <= k.g(n) for some constant k.
(And when giving the name of an algorithm we normally mean the
time taken by that algorithm.) You can say

In the best case, quicksort is O(n log n) in the num. of elements

In the best case, quicksort is O(n ** 2) (also true)

In the worst case, quicksort is O(n ** 2)

In the worst case, quicksort is O(n ** n ** n ** 99)
<snip>

I see what you mean and I accept that you're right, but it seems
misleading to me to give an asymptotic upper bound that's based to the
best case. It doesn't seem to represent any kind of upper bound at
all.
 
D

Dhruv

However there is an informal convention that when quoting big-Os you
quote the best one you can, so saying

Algorithm X takes time O(g(n)) where n is the input size

is, by reading between the lines, saying that it is both O(g(n)) and
Omega(g(n)), because you wouldn't bother to give a worse big-O than is
possible. Otherwise people would go around saying 'quicksort is
O(n**99)' and it would all get a bit silly.

Why not say that the quoted complexity is asymptotically tight bound or
not?


Regards,
-Dhruv.
 
H

Hyman Rosen

Ed said:
To say 'f is O(g)' means that for large enough n, f(n) <= g(n).

No, this is wrong. To say 'f is O(g)' means that there exist
constants N and C such that for all n > N, |f(n)| <= C * |g(n)|.

You can see that your definition is trivially wrong because it
claims that (n + 1) is not O(n).
 
J

Jamie Burns

Errr, guy's, I appreciate the thoughts on this, but look, what does all that
math mean to mere mortals like myself?!

:eek:)

I realise that a hashmap is the more elegant solution, but compared to
simply casting a memory address it is still going to be really slow right
(is that what the math proved)? And I think I have solved the dangers of
passing pointers around by using the magic string lookup.

Interestingly, I just did a quick test to see which is faster (given I am
simply casting now, not using hashmaps):

for (xx=0;xx<100000;xx++) {
rDerived* derived = dynamic_cast<rDerived*>(remoteObject->object);
if (derived) derived->setValue(message.data.messageSetInt.value);
}

against:

for (xx=0;xx<100000;xx++) {
if (typeid(remoteObject->object) == typeid(rDerived*)) ((rDerived*)
remoteObject->object)->setValue(message.data.messageSetInt.value);
}

And the latter blew the former out of the wayer (typeid() with a C style
cast was 94 times faster in my test than dynamic_cast).

Whoa.

Jamie.
 
K

Karl Heinz Buchegger

Jamie said:
Interestingly, I just did a quick test to see which is faster (given I am
simply casting now, not using hashmaps):

for (xx=0;xx<100000;xx++) {
rDerived* derived = dynamic_cast<rDerived*>(remoteObject->object);
if (derived) derived->setValue(message.data.messageSetInt.value);
}

against:

for (xx=0;xx<100000;xx++) {
if (typeid(remoteObject->object) == typeid(rDerived*)) ((rDerived*)
remoteObject->object)->setValue(message.data.messageSetInt.value);
}

And the latter blew the former out of the wayer (typeid() with a C style
cast was 94 times faster in my test than dynamic_cast).

IMHO that number (94) is still too high. Are you sure you didn't measure
how well the optimizer can optimize version 1 against optimizing
version 2.
 
K

kanze

Ben Hutchings said:
I see what you mean and I accept that you're right, but it seems
misleading to me to give an asymptotic upper bound that's based to the
best case. It doesn't seem to represent any kind of upper bound at
all.

Yes and no. Normally, the two interesting values are typical or
average, and worse case. But what do you say about quicksort. The
worst case is O(n^2). The best case is O(n ln n). I don't think that
we know how to calculate the typical or average, but empirically, I
think it has been established that quicksort usually performs fairly
close to its best case. No guarantees, but useful information anyway.
 
B

Ben Hutchings

Jamie said:
Errr, guy's, I appreciate the thoughts on this, but look, what does all that
math mean to mere mortals like myself?!

:eek:)

I realise that a hashmap is the more elegant solution, but compared to
simply casting a memory address it is still going to be really slow right
(is that what the math proved)?

It didn't *prove* anything, but the point of all that was that a hash
table lookup can be slow, taking time proportional to the number of
entries.

However, provided that the hash table implementation increases the
number of buckets in proportion to the number of entries, which I think
all hash_map implementations do, and ignoring cache effects which may
come into play if the table grows very large, the *average* time taken
for a hash lookup will be about the same time no matter how many
entries are in the table, and not very long at all.
And I think I have solved the dangers of passing pointers around by
using the magic string lookup.

What does that mean?
Interestingly, I just did a quick test to see which is faster (given I am
simply casting now, not using hashmaps):

for (xx=0;xx<100000;xx++) {
rDerived* derived = dynamic_cast<rDerived*>(remoteObject->object);
if (derived) derived->setValue(message.data.messageSetInt.value);
}

against:

for (xx=0;xx<100000;xx++) {
if (typeid(remoteObject->object) == typeid(rDerived*)) ((rDerived*)
remoteObject->object)->setValue(message.data.messageSetInt.value);
}

And the latter blew the former out of the wayer (typeid() with a C style
cast was 94 times faster in my test than dynamic_cast).

The latter doesn't do what you think it does. The typeid() of a
pointer doesn't depend on what it's pointing to. The condition in the
if statement should be:

typeid(*remoteObject->object) == typeid(rDerived)
 
K

kanze

Errr, guy's, I appreciate the thoughts on this, but look, what does
all that math mean to mere mortals like myself?!

It means that you have to know what you are trying to do before you try
to do it.
I realise that a hashmap is the more elegant solution, but compared to
simply casting a memory address it is still going to be really slow
right (is that what the math proved)? And I think I have solved the
dangers of passing pointers around by using the magic string lookup.

The difference is that the hashmap will work. Passing a long around,
and casting it to a pointer whenever you feel like accessing memory,
won't. After that, of course, the quality of the hash function will
affect performance. As others have mentionned, however, if only one
process is doing all of the object creation (and id assignment), there
are fairly simple solutions which will allow using an std::vector (which
is nothing other than a very efficient map size_t->T, for a restricted
range of values on the size_t side).
 
K

Keith H Duggar

Hello Jamie,

This problem is common and you can find literature discussing various
solutions in the client server community.

All of the viable solutions so far have discussed creating a mapping
function :

f(g(object) -> object

where g(object) is often called a "handle". This handle body idiom is
well established and you should consider using one of the solutions
already outline. Yes, you do incur a performance penalty for the added
safety. That is a tradeoff you need evaluate for your application;
but, that is the bottom line.

I just wanted to point you to one interesting alternative to the
mapping functions discussed for far : encryption. The general idiom is
:

encrypt ( this-is-a-valid-pointer + pointer ) -> handle
decrypt ( handle ) -> this-is-a-valid-pointer + pointer

If you don't recover this-is-a-valid-pointer then you don't use the
pointer.

Using this idiom you adjust the safety-vs-performance tradeoff quite
easily by choosing an appropriate encryption algorithm. One advantage
of this approach is that it doesn't require linear space overhead as
do the other handle-body approaches discussed so far. One disadvantage
is that it doesn't allow you to easily correct for server-side memory
management of the objects ( moving them around, etc) since you can't
readily adjust the mapping of handles you have already distributed.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,159
Messages
2,570,888
Members
47,420
Latest member
ZitaVos505

Latest Threads

Top