Zero-size array as struct member

J

Juha Nieminen

Goran said:
Oh, come on! set is not a vector. That's a MAJOR flaw in your example
and example is therefore completely of the mark.

The example is demonstrating the speed of 'new' and 'delete' compared
to other relatively complex operations (in this case inserting an element
into a balanced binary tree). In this case 'new' is over 3 times slower
than all the rebalancing needed in the element insertion, which is quite
a lot. That's the point in the example.

Why are you nitpicking on the choice of data container when the whole
point was to demonstrate the speed of memory allocation?
What!? Reserve would cause exactly 0 memory fragmentation, at least
until code starts reallocating or freeing these vectors, at which
point, there would be, fragmentation-wise, little, if any, difference
between a vector and discussed struct hack.

True. reserve() in itself causes 0 memory fragmentation in this case.
However, I didn't say that reserve() causes memory fragmentation. I said
that reserve() does not alleviate the memory fragmentation caused by
having a std::vector as a member of the struct. You should really try to
read what I'm writing.

The std::vector is used simply as a subsitute of a raw array in this
case. Reallocation is not the issue here.
You're still to prove how memory is fragmented. Until you reach
reserved size in a vector, there's no fragmentation.

I'm still to prove how memory is fragmented? Hello? Do you even know
what memory fragmentation means, and how it happens during the execution
of a typical program?

Making two memory allocations worsens memory fragmentation with higher
probability than making only one. reserve() has nothing to do with that.
All you have is one allocation more with a vector. That's easily
swamped by the rest of the code, especially if you have millions of
elements in it.

The problem is not one vector having a million elements. The problem
is a million vectors, each having a few elements. That's what the struct
hack is optimizing (well, one of the things).
You need to have _a lot_ of vectors, all with a _small_ number of
elements in it for your complaint to be relevant.

And that's exactly what's happening if you have an array as the member
of a struct, and then you instantiate that struct millions of times. Which
was my original point.
And for that,
there's no need to use contortions until one can measurein running
code, that performance hit is indeed relevant. You are trying to do it
backwards, and especially because programmers are proven time and time
over to be poor judges of performance problems.

I'm trying to do it backwards? I'm not trying to do anything. I'm not
even advocating the use of the struct hack. All I'm saying is that one
of the possible reasons to use the struct hack is that it lessens the
amount of memory allocations (making the program potentially faster)
besides lessening memory fragmentation.

There are many reasons to avoid such a low-level hack, and I have never
denied that.
 
J

Juha Nieminen

Goran said:
I shall do no such thing.

Yeah, way to go. Don't even try to understand what the struct hack
technique is all about, and then scream loudly how you know these things
better than anybody else. That's the way to discuss and to learn things.
There's no need to be condescending.

Look who's talking.
 
J

Juha Nieminen

Öö Tiib said:
You may allocate sufficient memory for several things (some of what
may be arrays with run-time decided sice) at once in C++ as well. You
have to manage it and use placement new to put all things into same
storage.

Well, that is, basically, what the struct hack is all about. The only
difference is that in C you don't use placement new to initialize the
object (and, technically speaking, you don't need to use placement new
in C++ either, if the struct is fully a POD type, in which case it will
work just as in C).
 
Ö

Öö Tiib

  Well, that is, basically, what the struct hack is all about. The only
difference is that in C you don't use placement new to initialize the
object (and, technically speaking, you don't need to use placement new
in C++ either, if the struct is fully a POD type, in which case it will
work just as in C).

We discuss performance optimization here. Sure C++ works somewhere
under surface technically as close to metal as C. For platform-
specific performance optimizations one has sometimes to hack even
deeper. What i am objecting against is building and exposing such
structs in some C++ interface what OP was all about.
 
G

Goran Pusic

  Yeah, way to go. Don't even try to understand what the struct hack
technique is all about, and then scream loudly how you know these things
better than anybody else. That's the way to discuss and to learn things.

Please read my very first post in this thread and __then__ tell me I
don't understand what struct hack technique is all about.

But flames aside, I think we ultimately disagree only about one thing:
how often will one benefit from employing it.

My contention is: in practice, almost never. In practice, other
processing will swamp the cost of that one allocation. And if actual
number of elements needs to change (and in what I write, that's almost
always), vector beats struct hack, because struck hack has horrible
reallocation performance.

Goran.
 
J

Juha Nieminen

Goran Pusic said:
But flames aside, I think we ultimately disagree only about one thing:
how often will one benefit from employing it.

Who is "we"? At least I am not claiming that the struct hack should
be used frequently, if at all. In fact, if I had such a situation in some
project of mine where using the struct hack would potentially bring
efficiency benefits, I would nevertheless try to think if the program
could be redesigned in such way that the benefit is achievable without
the struct hack.
 
A

Andrey Tarasevich

thomas said:
Wait.. I don't think it's illegal in C++. At least I will definitely
object making it illegal by the standard community.

It is formally illegal in both C and C++, as is explicitly stated in
both standards.
Will you guys getting crazy if we do things like this

-------code-----
struct A{
int num;
int p[0];
};
A *pA = (A*)malloc(sizeof(A)+sizeof(int)*10);
printf("%d\n", &pA->p[10] - &pA->p[0]); //accessing out of
bounds.
------code---

Just don't do things like this. If you want to use the "struct hack",
either declare the array with size 1 (as shown in my post) or maybe even
declare it with some huge size. Use `offsetof` instead of `sizeof` to
calculate the size for `malloc`.

Once you think about it, you should realize that the habit of declaring
an array with size 0 in "struct hack" originates from one and only one
source: the desire to use `sizeof(A)` under `malloc` with no extra
corrections (referring to your example). This, in turn, based on the
simple fact that not too many programmers know about `offsetof` and its
applications.

Of course, in C99 you should use the size-less declaration, which was
introduced specifically for "struct hack" and specifically because
zero-size array declaration is illegal.
 
F

Fred Zwarts

Andrey Tarasevich said:
thomas said:
Wait.. I don't think it's illegal in C++. At least I will definitely
object making it illegal by the standard community.

It is formally illegal in both C and C++, as is explicitly stated in
both standards.
Will you guys getting crazy if we do things like this

-------code-----
struct A{
int num;
int p[0];
};
A *pA = (A*)malloc(sizeof(A)+sizeof(int)*10);
printf("%d\n", &pA->p[10] - &pA->p[0]); //accessing out of
bounds.
------code---

Just don't do things like this. If you want to use the "struct hack",
either declare the array with size 1 (as shown in my post) or maybe
even declare it with some huge size. Use `offsetof` instead of
`sizeof` to calculate the size for `malloc`.

The problem with 'offsetof' in C++ is that it must be a compile time constant.
In the example above this is not a problem, but in general one sometimes
needs a size calculated at run time.
 
V

Vladimir Jovic

joe said:
Vladimir said:
thomas said:
On Aug 20, 2:01 am, Andrey Tarasevich <[email protected]>
wrote:
Pete Becker wrote:
Hi, I need your help.
----------
struct SvrList{
unsigned int uNum;
GameSvr svr[0]; //line A
};
---------
Once I declared a struct like this to store server list info.
It's supposed to be used like this.
----------
SvrList* pList = (SvrList*)malloc(sizeof(
SvrList) + svrNum*sizeof(GameSvr));
pList->uNum, pList->svr[0], pList->svr[1].... blabla..
I wouldn't call this fine. Even
pList->svr[0]
is accessing the element that is out of array's bounds, and
that is UB. How come your program is not crashing, or at
least going crazy? Maybe you are just unlucky to have a bug
hidden.
It's an old C programmers hack. I've come across this idiom in
lot's old C code, particularly driver and os code. Microsoft
Win32 is rife with it.
Except that it's not legal C, either.
Which is why it's referred to above as a "hack". Quite a common
one, too.
Usually we use the term "hack" when the code relies on a specific
manifestation of undefined (or unspecified) behavior, but otherwise
is well-formed.

In this case the code is ill-formed, since 0-size array declaration
is illegal in C++ (as well as in C). In other words, arguing about
the "out of bounds" access here doesn't make much sense, since the
code is formally non-compilable.
Wait.. I don't think it's illegal in C++. At least I will definitely
object making it illegal by the standard community.
In the c++ standard, see 8.3.4.1 , this part :

... If the constant-expression (5.19) is present, it shall be an
integral constant expression and its value shall be greater than
zero...
therefore it is illegal c++ code.
It can be dangerous but it can also do good. It depends on whether we
are using it correctly.
Can't use it correctly. It is illegal, therefore undefined behaviour.

Undefined by the standard, but DEFINED by all implementations. The "it's
undefined behavior" ranting gets quite annoying. Try compiling something
with the standard instead of an implementation sometime and see how far
you get!

"not defined by the standard" means the implementation can do whatever
it wants. In my opinion, the code should be compiled without warnings.
Following the standard, it is much easier to change compiler and even
the OS and target platform.

I might be wrong, but if we take this definition of rant :
http://en.wikipedia.org/wiki/Rant
then what I wrote is not a rant :)
 
B

Bart van Ingen Schenau

Now, you made me curious. Could you present an example? I would try to
rewrite that without performance penalty in a "hack free" manner. The
reasong is: I have a hard time imagining a use for the struct hack in C++
such that doing the same thing in a more idiomatic way comes with a
performance hit. On the other hand, I know that sometimes my imagination is
just lacking.

Best

Kai-Uwe Bux

Here is an example where the struct-hack is extensively used in a real
production environment.

The system in question uses a message-passing mechanism to communicate
with both internal and external entities.
The basic message structure looks like this:

typedef struct
{
unsigned char media;
unsigned char receiver_dev;
unsigned char sender_dev;
unsigned char function;
unsigned char len[2];
unsigned char receiver_obj;
unsigned char sender_obj;
unsigned char data[1];
} MESSAGE_T;

The payload of the message, which is contained in the data array, is
typically between 4 and 16 bytes, but can be as much as 4000 bytes.
As said, these messages can be sent both to internal destinations
(sender_dev == received_dev) and to external devices using a variety
of (serial) communication mechanisms.

Bart v Ingen Schenau
 
Ö

Öö Tiib

Now, you made me curious. Could you present an example? I would try to
rewrite that without performance penalty in a "hack free" manner. The
reasong is: I have a hard time imagining a use for the struct hack in C++
such that doing the same thing in a more idiomatic way comes with a
performance hit. On the other hand, I know that sometimes my imagination is
just lacking.

Kai-Uwe Bux

Here is an example where the struct-hack is extensively used in a real
production environment.

The system in question uses a message-passing mechanism to communicate
with both internal and external entities.
The basic message structure looks like this:

typedef struct
{
    unsigned char  media;
    unsigned char  receiver_dev;
    unsigned char  sender_dev;
    unsigned char  function;
    unsigned char  len[2];
    unsigned char  receiver_obj;
    unsigned char  sender_obj;
    unsigned char  data[1];

} MESSAGE_T;

The payload of the message, which is contained in the data array, is
typically between 4 and 16 bytes, but can be as much as 4000 bytes.
As said, these messages can be sent both to internal destinations
(sender_dev == received_dev) and to external devices using a variety
of (serial) communication mechanisms.

Sent over serial? In C++ i would have "std::vector<uint8_t> message_"
buried into some Message class and done.
 
J

joe

Juha said:
Exactly how is it an "apples-to-oranges comparison"? The two
techniques are achieving the same thing: A struct instantiation which
contains an array whose size is decided at runtime (rather than at
compile time).
The difference between them is that the struct hack achieves this with
one single allocation per struct instance, while the "cleaner" method
requires two allocations per struct instance (one for the struct
itself and another for the vector).

Apples and oranges. One is contiguous and one is not. You're "turning it
inside out": performance comes as a side effect of the contiguity and the
contiguity is the important part of the technique, not the side effect.
Performance will not be an issue if you need to allocate just a few
instances of the struct. However, it can become an issue if you need
to allocate millions of instances.


That may be so, but you can't dismiss the efficiency benefit in cases
where huge amounts of instantiations need to be done.


Why are people here so obsessed about the specifics of std::vector
here? It's only being used here as a substitute for a raw array. In
terms of efficiency it doesn't make any difference compared to one.
It's just easier to use.


There seems to be some kind of failure at communication here.

I am not talking about the efficiency of allocating an array. I am
talking *precisely* about the efficiency of allocating a struct which
has an array as the last member, comparing the struct hack to a more
regular solution (which has a pointer or a std::vector as the last
struct element).

I was just pointing out that I don't think that performance is hardly the
reason that the struct hack came to be. As such, I find the discussion
about performance tangential to the OP. Performance is, of course, the
proverbial low-hanging-fruit of discussion in programming groups. (I
think that's what prompted me to ask in another post what is more
worthwhile to know, "Big O" science or Assembly language. Obviously, I
think the latter).
 
J

joe

Juha Nieminen wrote:

struct Apple
{
// Some other member variables here.
int size;
int array[0];
};


struct Orange
{
// Some other member variables here.
std::vector<int> array;

MyStruct(int size): array(size) {}
};

That the performance is different is a non-issue at best or a clouding of
the purpose of the struct hack worst. I'd like to see C++ support the
technique better than using a one-element array, which of course is
syntactically hackish.
 
J

joe

Juha said:
The struct hack is
about a struct which, besides any other members it may have, also has
an array as its
last element, the size of which is determined at runtime.

Which, of course is incorrect. At least it is lax or unknowing by only
giving one example usage (using dynamic mem with the struct hack) and
presenting it as a definition of the struct hack. Again though,
performance is hardly the point of the struct hack and std:vector is not
a substitute for the struct hack. C was right in standardizing it and C++
would do good to follow suit. While there are more issues to address in
developing an "alternative" to the struct hack in C++, they are not
unsurmountable I think.
 
J

joe

Kai-Uwe Bux said:
Now, you made me curious. Could you present an example? I would try to
rewrite that without performance penalty in a "hack free" manner. The
reasong is: I have a hard time imagining a use for the struct hack in
C++ such that doing the same thing in a more idiomatic way comes with
a performance hit. On the other hand, I know that sometimes my
imagination is just lacking.

There is no alternative to the struct hack in C++. One has to use the
hack if one needs/wants the characteristics of it because C++ has not
formalized or accepted any form of the technique. A hole in the C++ spec
(or worse) if you ask me. I'm fine with using the one-element array
version of the technique, but I do feel like that is unnecessarily (for
lack of formal C++ recognition and support of the technique) hackish (but
not in comparison to how I use/abuse the preprocessor!).
 
J

joe

tni said:
Now, you made me curious. Could you present an example? I would try
to rewrite that without performance penalty in a "hack free" manner.
The reasong is: I have a hard time imagining a use for the struct
hack in C++ such that doing the same thing in a more idiomatic way
comes with a performance hit. On the other hand, I know that
sometimes my imagination is just lacking.

How about this:

#include <boost/intrusive/bs_set_hook.hpp>
#include <boost/intrusive/treap.hpp>

struct Message : public boost::intrusive::bs_set_base_hook<> {
uint16_t priority;
uint16_t length;
uint8_t payload[];
};

// Message comparison for treap is based on payload
boost::intrusive::treap<Message> mtreap;

Let's assume, you have to use a treap-like data structure, but you are
free to distribute the data any way you like. "bs_set_base_hook<>" is
basically a set of 3 pointers (to other messages). Whenever you
access a message, you typically use priority, length, payload and a
random pointer from "bs_set_base_hook<>".

The messages are small, lets say on average 16 bytes. Let's assume on
a 32-bit platform, sizeof(Message) will be 16 without payload and
pointers are 4 bytes each, so messages will typically fit into a
single cache line.
Let's assume for the purposes of this example, access to the messages
is random and memory usage is more than 100x the CPU cache size.

Given these constraints, the only choice you have, is to locate all
the data for a message in one place. You could of course use a vector
with something like the small string optimization (instead of the
flexible array). That would result in higher memory usage and more
complex code but a much cleaner design/interface and re-usability.

\\

That being said, I'll happily admit that you can usually
hide/eliminate the extra indirection 'proper' C++ requires and that
the struct hack should be used very, very rarely.

Those selective placements though are very, very important. Perhaps
having come from a C originally I still have an appreciation for
programming close to the metal whereas that concept escapes those that
started out with the decidely higher-level C++ (or buy into it, lock,
stock and barrel).
 
J

joe

Goran said:
Please read my very first post in this thread and __then__ tell me I
don't understand what struct hack technique is all about.

But flames aside, I think we ultimately disagree only about one thing:
how often will one benefit from employing it.

My contention is: in practice, almost never. In practice, other
processing will swamp the cost of that one allocation. And if actual
number of elements needs to change (and in what I write, that's almost
always), vector beats struct hack, because struck hack has horrible
reallocation performance.

Performance never was the issue.
 
J

joe

Goran said:
I shall do no such thing. Did you take a look at hbvla class from my
first post in this thread? I can implement this thing, in C++, with
the best here. And I certainly did it better than a random C hack like
OP's.

And I still know it's a hack that has very limited usefulness in real-
world code, because performance aspect is a massive case of
prematureoptimizationitis.

Performance is not the issue. You are missing the whole point behind the
struct hack. You and Juha.
 
J

joe

Juha said:
Who is "we"? At least I am not claiming that the struct hack should
be used frequently, if at all. In fact, if I had such a situation in
some project of mine where using the struct hack would potentially
bring efficiency benefits, I would nevertheless try to think if the
program could be redesigned in such way that the benefit is
achievable without the struct hack.

Before you can use a tool, you must understand what its purpose is. The
struct hack is not a performance optimization. You have categorized it
incorrectly.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,145
Messages
2,570,826
Members
47,373
Latest member
Desiree036

Latest Threads

Top