The container abstraction and parallel programming

ec429 · Jan 8, 2012

You could deduce that c must be of integer type isn't it?
LET THE COMPILER DO THAT. Let's eliminate declarations for
integer and let the compiler deduce from the program text
the type of each variable.

Looks ridiculous isn't it?

No, actually. There are several programming languages which have
strong, static typing, and yet have a complete system of type inference.
I believe ML is one such language.
Such a thing is not appropriate for C, because in C's model types are
/promises/ one makes to the compiler. This is C's optimisation
paradigm: you promise the compiler you won't use a value as anything but
an integer, by declaring it 'int'. You promise the compiler you won't
write through a pointer, by qualifying it 'const'. You promise the
compiler the pointer you gave it has no aliases, by qualifying it
'restrict'.
If you want code that makes more use of hardware features like SIMD,
don't add primitives that map to hardware instructions - add new ways of
promising things to the compiler. Then suggest them in comp.std.c and
don't bash other compilers for not supporting them unless and until they
get incorporated into the Standard.
-e

jacob navia · Jan 8, 2012

Le 08/01/12 12:05, ec429 a écrit :

On 08/01/12 10:41, jacob navia wrote:
Actually, those programs aren't "exactly the same", due to the lack of
the aforementioned "restrict" keyword. For instance:
double array[3]={0,1,2};
doADD(array+1, array+1, array, 2);

Adding the "restrict" keyword doesn't change anything.

In general, the correct approach to parallel programming is giving the
compiler as few constraints as possible. This can sometimes be achieved
with 'restrict', but not much can be done in procedural languages unless
they have some explicit means of telling the compiler that normal order
constraints can be relaxed (which would be a /much/ more useful
extension to C than processor-specific vectorisation primitives, since
the latter would reduce the portable glory of C to the provincial
insularity of assembler (not that assembler isn't valuable in
appropriate circumstances, of course!)).

Well, thanks a lot. It is EXACTLY what I proposed in the message that
started this discussion. We should have a way to tell the compiler
to use SIMD operations in a special object that we declare as SIMD.

In C as it stands, the
canonical way to do this is by means of threads (at least in C11; '99
and its predecessors don't mention threaded execution, but many
implementations support it). SIMD has its uses, of course; but unless
your code is perversely written (and so long as you have used 'restrict'
where appropriate) the compiler knows better than you do. Besides, your
time as a programmer is worth enough that you generally shouldn't be
expending it on non-portable performance hacks unless you /know/ your
code will be run a very large number of times. YAGNI.

1) Using threads to make SIMD is not possible. The overhead of thread
creation and maintenance is impossible high.

2) Your remarks about "The compiler knows better" have no connection
with reality. Gcc, that is a highly advanced compiler, is unable
to do SIMD reliably, as I proved. It suffices a small change and
the whole thing COLLAPSES.

3) Your remarks about avoiding performance hacks are welcome since it is
exactly what I was saying: we need a portable way of using SIMD that
is supported by the language.

If you really want to write strongly parallelisable programs (rather
than merely hand-optimising with a few vector ops that probably don't
gain you much anyway) try a functional language, such as LISP - with a
properly designed data-flow model parallel processing becomes something
that the implementation can do transparently.

Well, I am using C, and I do not see how LISP would be useful in this
context since it would need the same extensions that you would need in
C. Lisp is NOT SIMD per se.

I think you just don't like gcc, because you know (and don't want to
admit to yourself) that it's better than yours.

I have never denied that gcc generates better code than lcc.
In my posting I spoke of two mainstream compilers like gcc and MSVC
That is all. And what I said was right: they aren't able to use SIMD
programming in a useful way.

But you do not like what I say (and you do not want to admit it
to yourself) because I am NOT using gcc and even have the audacity of
trying to build a different compiler!

You see?

If you get to personal attacks or suppositions we just can't discuss.
Everything becomes a stupid war of "gcc is the best and you are wrong
because you use another compiler"

Nick Keighley · Jan 8, 2012

If anything, that may be an indicator that maybe nobody cares that much about
Intel's SIMD instructions. It's not hard to see why.

SIMD instructions do not translate to a tangible benefit for the average
consumer of Intel chips. (Marketing 101: features need to connect with
benefits!)

but there's an awful lot of Intel chips sold. Even 0.1% of that market
is huge. Just because the average desk top user doesn't need SIMD
doesn't mean its market is negligable.

There was a time when games and video playback stretched the ability of the CPU
to the core, and hardware for accelerating them was next to nonexistent. The
processing-intensive tasks in those kinds of applications are handled by
dedicated hardware these days.

SIMD will not fix the major, user-visible performance issues in a Wintel PC,
like long boot times, or browsers becoming unresponsive for seconds at a time.

yes but there are people who need heavy number crunching. CAD for one,
I'd think.

ec429 · Jan 8, 2012

Le 08/01/12 12:05, ec429 a écrit :

Actually, those programs aren't "exactly the same", due to the lack of
the aforementioned "restrict" keyword. For instance:
double array[3]={0,1,2};
doADD(array+1, array+1, array, 2);

Click to expand...

Adding the "restrict" keyword doesn't change anything.

This is patently false. Adding the "restrict" keyword certainly changes
the program (from the perspective of the C standard) since it promises
to the compiler that the pointers will not be aliased, thereby making
optimisations (such as SIMD) legal (they otherwise would not be).

Well, thanks a lot. It is EXACTLY what I proposed in the message that
started this discussion. We should have a way to tell the compiler
to use SIMD operations in a special object that we declare as SIMD.

No, you have missed the point. You don't tell the compiler to "use SIMD
instructions"; you tell the compiler that it /may/ operate on these data
in any order since the operators are commutative, nothing is aliased,
&c.; if the target platform has SIMD instructions the compiler may
choose to use them. I cannot stress enough that this is the compiler's job.
Mixing container abstractions with compiler internals, as your OP seems
to propose, is a pathway to infinite pain; instead, your containers
library (if such a thing be even necessary) should be written in a
manner susceptible to optimisation by SIMD-aware compilers - namely, it
should make full use of the restrict keyword, and handle arrays simply
where possible; also, data dependencies should be carefully controlled.
(Data dependencies limit all reordering)
If, after doing this, you find that the compiler does not use SIMD
instructions, you patch the compiler (in as general a way as possible).
You do _not_ patch the language.

1) Using threads to make SIMD is not possible. The overhead of thread
creation and maintenance is impossible high.

I wasn't suggesting using threads for SIMD; I was merely discussing
parallelisation in general terms.

2) Your remarks about "The compiler knows better" have no connection
with reality. Gcc, that is a highly advanced compiler, is unable
to do SIMD reliably, as I proved. It suffices a small change and
the whole thing COLLAPSES.

Ok, I'll clarify. The compiler *in principle* knows better, and if it
gets it wrong, you patch the compiler; you don't patch the language.

3) Your remarks about avoiding performance hacks are welcome since it is
exactly what I was saying: we need a portable way of using SIMD that
is supported by the language.

"Using SIMD" in your code is a performance hack. /Making the use of
SIMD by the compiler possible/ in your code is portable, and avoids
performance hacks.

Well, I am using C, and I do not see how LISP would be useful in this
context since it would need the same extensions that you would need in
C. Lisp is NOT SIMD per se.

Lisp is inherently parallel; one entirely valid model of execution is to
consider every cons as being executed in parallel, blocking on its data
dependencies from its car and cdr. This is why functional languages
often incorporate monads, because I/O happens to need to be serial.
A sufficiently intelligent LISP compiler could compile (reduce),
(expand), (select), and (extend) to appropriate SIMD instructions if the
target platform has them. But this is c.l.c, not c.l.lisp.

I have never denied that gcc generates better code than lcc.
In my posting I spoke of two mainstream compilers like gcc and MSVC
That is all. And what I said was right: they aren't able to use SIMD
programming in a useful way.

I suspect your definition of "useful" is isomorphic to "anything
mainstream compilers don't do". As other posters have explained, gcc
can indeed use SIMD in a useful way, but only where sufficient promises
have been made to it (like "restrict") for such use not to be a
violation of the C standard. Since it's good practice to make those
promises wherever you can (even if you don't foresee an immediate
benefit - just the documentation effect of const and restrict justifies
their use), I don't see what you're complaining about.

But you do not like what I say (and you do not want to admit it
to yourself) because I am NOT using gcc and even have the audacity of
trying to build a different compiler!

I laud your efforts to build a compiler; I'm impressed that you are able
to do so. But that doesn't necessarily mean that ideas you have for the
extension of the language are good ones.

If you get to personal attacks or suppositions we just can't discuss.
Everything becomes a stupid war of "gcc is the best and you are wrong
because you use another compiler"

I'm not a High Priest of the One True Way (tm) of gcc. rather my gold
standard is The Standard. Its extension or emendation is justified only
when it can be shown to have failed. On SIMD it has not done so.
gcc may currently use less SIMD than it could. That is a problem to be
solved by work on gcc, or by creating a better compiler that matches
gcc's good qualities while avoiding its bad habits. It is not a problem
to be solved by adding non-portable incompatible extensions to the
language, particularly when the language already provides sufficient
recourse.
-e

Stephen Sprunk · Jan 8, 2012

There are no "dependencies" between array members. There is a possible
aliasing problem when dealing with pointers rather than arrays, but that
is easily enough solved with "restrict" qualifiers.

I don't understand what you think the problem is.

Click to expand...

He has to be referring to loop dependencies as in

for (i = 0; i < n; i++) a = a[i-3] + 2;

The main reason I'm writing is to point out that late version C
compilers including gcc, Intel, and I believe though am not sure,
Microsoft all will emit vector instructions for loops coded with
commonsense idioms. I have a signal processing code that speeds up by
a factor of 3 with gcc when this opimization is turned on.

Right. Unfortunately, GCC seems to only recognize "commonsense idioms"
and misses opportunities to vectorize code that doesn't fit those
idioms, even when doing so would be provably safe. So, you have to know
what idioms it understands and modify your code to comply; that is not
ideal, but it's hardly a new concept--and presumably the set of idioms
will expand over time as folks complain or contribute patches.

I have no idea if ICC (or MSVC?) has the same limitation.

S

jacob navia · Jan 8, 2012

Le 08/01/12 18:37, Stephen Sprunk a Ã©crit :

Right. Unfortunately, GCC seems to only recognize "commonsense idioms"
and misses opportunities to vectorize code that doesn't fit those
idioms, even when doing so would be provably safe. So, you have to know
what idioms it understands and modify your code to comply; that is not
ideal, but it's hardly a new concept--and presumably the set of idioms
will expand over time as folks complain or contribute patches.

I have no idea if ICC (or MSVC?) has the same limitation.

Not the same but SIMILAR limitations. Then, for each compiler:
do {
learnLimitations();
ModifyCode();
} while (ItsnotInStandardC());

That is the reason for my proposal. There is no standard way of doing
SIMD even if most modern CPUs have SIMD extensions.

Stephen Sprunk · Jan 8, 2012

Le 07/01/12 14:39, Stephen Sprunk a Ã©crit :

I don't understand what you think the problem is.

Click to expand...

The problem, Stephen, is as follows:

gcc will vectorize this:

void doADD(double *L, double *R, double *result,size_t n)
{
for(size_t i=0; i<n;i++) result=L+R;
}

but will NOT vectorize this:

void doADD(double *L, double *R, double *result,size_t n)
{
for(size_t i=0; i<n;i++) result[n-i]=L[n-i]+R[n-i];
}

Why?

GCC doesn't understand that loop because it's not idiomatic. When I
crank up the verbosity, it tells me that loop has a "complicated access
pattern".

All this huge machinery for automatic vectorizing has never worked
correctly for dozens of years since requires an incredible amount
of program analysis.

Click to expand...

It works correctly if you write idiomatic code, and the amount of work
required to recognize a handful of idioms is not "incredible".

My proposition is:

Why make complicated something that you can do in a simple way?

Let the programmer declare the data that should be handled in an SIMD
way.

Click to expand...

GCC has vector data types for that, eg. v2df, if you want them. IMHO,
it's simpler (and more portable) to just write code that can be
auto-vectorized according to whatever hardware is available on the
particular target of the day.

(I have a different perspective on this, perhaps, because my employer's
code base has to run on over a dozen different CPUs; portability trumps
maximum performance on one particular CPU.)

If I write:

c = (a+b)/2;
if (c&1) fn(1);

You could deduce that c must be of integer type isn't it?

Click to expand...

Um, no. It could be any arithmetic type.

Well, for SIMD the problem is the same.

Click to expand...

Not really. A vector is just an array of some other type--it's not a
new type itself.

S

Keith Thompson · Jan 8, 2012

Stephen Sprunk said:
On 08-Jan-12 04:48, jacob navia wrote: [...]

If I write:

c = (a+b)/2;
if (c&1) fn(1);

You could deduce that c must be of integer type isn't it?

Click to expand...

Um, no. It could be any arithmetic type.

"c&1" is valid only if c is of some integer type.

jacob navia · Jan 8, 2012

Le 08/01/12 18:52, Stephen Sprunk a Ã©crit :

Um, no. It could be any arithmetic type.

c&1 implies c is an integer since all other floating types do not
support that operation.

It could be char/short/int/long/long long/, and by reading more
code the compiler could even differentiate among them.

gwowen · Jan 9, 2012

Nonsense, gcc doesn't have a real vectorizer. For instance:

It doesn't have a *perfect* vectorizer.

But NO option (including -ftree-vectorize, as you suggest) will
convince gcc to use the parallel instructions in the first case.

Yes, you can confuse vectorizers. You can also confuse optimizers.

Do you think I can't obfuscate trivial code so badly that lcc can't
optimize it?
Does that mean lcc doesn't have a real optimizer?

Of course it doesn't mean that. By writing clear C I've got a
measurable out of GCC's imperfect vectorizer, and no semantic
arguments will ever prevent that from being true.

That was the main argument of my post and nobody has given a sensible answer to it.
Most of the answer are like yours:

No one has agreed with you, certainly. That means one of two things:
i) You're wrong
ii) Everyone else is wrong.

Nobody · Jan 9, 2012

gcc will vectorize this:

void doADD(double *L, double *R, double *result,size_t n)
{
for(size_t i=0; i<n;i++) result=L+R;
}

Doesn't it require "restrict"?

I would expect vectorisation to produce the wrong result if result
overlaps either L or R.

Ben Bacarisse · Jan 9, 2012

Nobody said:
gcc will vectorize this:

void doADD(double *L, double *R, double *result,size_t n)
{
for(size_t i=0; i<n;i++) result=L+R;
}

Click to expand...

Doesn't it require "restrict"?
Yes.

I would expect vectorisation to produce the wrong result if result
overlaps either L or R.

A simple test shows that different code is generated with and without
the restrict keyword. If I lie to the compiler (i.e. I use restrict but
pass pointers to overlapping data) the code does indeed give the "wrong"
result (at optimisation level -O3 but not at -O2).

Stephen Sprunk · Jan 9, 2012

Stephen Sprunk said:
Stephen Sprunk said:

On 08-Jan-12 04:48, jacob navia wrote: [...]

If I write:

c = (a+b)/2;
if (c&1) fn(1);

You could deduce that c must be of integer type isn't it?

Click to expand...

Um, no. It could be any arithmetic type.

Click to expand...

"c&1" is valid only if c is of some integer type.

I stand corrected. Still, there are obvious problems with expecting the
compiler to deduce the type of undeclared variables.

S

Kaz Kylheku · Jan 9, 2012

gcc will vectorize this:

void doADD(double *L, double *R, double *result,size_t n)
{
for(size_t i=0; i<n;i++) result=L+R;
}

Click to expand...

Doesn't it require "restrict"?

No; the compiler could spit out two versions of the code along with a
run-time overlap check.

Keith Thompson · Jan 9, 2012

Stephen Sprunk said:
Stephen Sprunk said:

On 08-Jan-12 04:48, jacob navia wrote: [...]
If I write:

c = (a+b)/2;
if (c&1) fn(1);

You could deduce that c must be of integer type isn't it?

Um, no. It could be any arithmetic type.

Click to expand...

"c&1" is valid only if c is of some integer type.

Click to expand...

I stand corrected. Still, there are obvious problems with expecting the
compiler to deduce the type of undeclared variables.

For example, that conforming C compilers don't do that. (For some
other languages, it probably makes perfect sense.)

Looking upthread, I see that it was intended as an example of the
kind of inference a C compiler *shouldn't* do, in the context of
whether compilers should infer that they can use SIMD. I'm not
going to get into that particular debate, other than to observe
that it's not a perfect analogy.

Gene · Jan 11, 2012

gcc will vectorize this:

Click to expand...

void doADD(double *L, double *R, double *result,size_t n)
{
for(size_t i=0; i<n;i++) result=L+R;
}

Click to expand...

Doesn't it require "restrict"?

I would expect vectorisation to produce the wrong result if result
overlaps either L or R.

Likely true if compiled as a function, but with gcc -O3 this will be
inlined then vectorized. At least that was my experience.

Tim Rentsch · Jan 26, 2012

jacob navia said:
Le 08/01/12 09:08, Gareth Owen a @C3{A9}crit :

Nonsense. As well as the builtins already mentioned, GCC has
-ftree-vectorize which uses these SIMD instructions automatically when
the compiler can deduce that the aliasing rules allow it. Dot product
and matrix multiplication get a sizeable speed up when you use it.
The 'restrict' keyword helps a lot, also

Click to expand...

Nonsense, gcc doesn't have a real vectorizer. For instance:

void doADD(double *L, double *R, double *result,size_t n)
{
for(size_t i=0; i<n;i++) result[n-i]=L[n-i]+R[n-i];
}

This is exactly the same program as

void doADD(double *L, double *R, double *result,size_t n)
{
for(size_t i=0; i<n;i++) result=L+R;
}

But NO option (including -ftree-vectorize, as you suggest) will
convince gcc to use the parallel instructions in the first case.

Only Intel has a more stable version of a vectorizer, since
only Intel can afford to pay dozens of computer scientists
dozens of years full time to build it.

Gcc has invested a considerable effort and the results are
not very good, as you can see.

But of course, why make simple when you can do complicated?

Why is not possible that the PROGRAMMER declares a special
data type that should be used in SIMD programming?

That was the main argument of my post and nobody has given a sensible
answer to it. [snip]

The problem is, no one other than yourself finds your argument
persuasive, let alone convincing. Does that give you any sort
of pause or prompt any further reflection?

jacob navia · Jan 26, 2012

Le 26/01/12 18:23, Tim Rentsch a écrit :

That was the main argument of my post and nobody has given a sensible
answer to it. [snip]

Click to expand...

The problem is, no one other than yourself finds your argument
persuasive, let alone convincing. Does that give you any sort
of pause or prompt any further reflection?

Yes. I am reading "SIMD programming manual" by Paul Cockshott and
Kenneth Renfrew (Springer 2004) where those authors propose exactly
what I am proposing here.

Ignorance is widespread in this group, and that, combined with a desire
to attack anything I say because I am not "GNU" makes for discussions
like the one we are having: People that like to piss around, making
short ironical sentences like you do, without ever trying to understand
what I am proposing, nor reflecting about the fact that gcc is unable to
parallelize automatically a simple construct like the one I presented.

Why bother with facts?

Just tell the guy:

Tim Rentsch · Mar 8, 2012

jacob navia said:
Le 26/01/12 18:23, Tim Rentsch a @C3{A9}crit :

That was the main argument of my post and nobody has given a sensible
answer to it. [snip]

Click to expand...

The problem is, no one other than yourself finds your argument
persuasive, let alone convincing. Does that give you any sort
of pause or prompt any further reflection?

Click to expand...

Yes. I am reading "SIMD programming manual" by Paul Cockshott and
Kenneth Renfrew (Springer 2004) where those authors propose exactly
what I am proposing here.

That isn't really an answer to the question.

Ignorance is widespread in this group, and that, combined with a desire
to attack anything I say because I am not "GNU" makes for discussions
like the one we are having: People that like to piss around, making
short ironical sentences like you do,

I wasn't being ironical, or pissing around. I was asking
a serious, non-rhetorical question that was intended to
further the conversation. Can you see the difference
between those two kinds of behaviors?

without ever trying to understand what I am proposing, nor
reflecting about the fact that gcc is unable to parallelize
automatically a simple construct like the one I presented. [snip]

I don't know how you are in person, but based on how you
act in the newsgroup I believe you would be described as
a narcissist. As long as you act like a narcissist it
will be difficult to get people to take your proposals
seriously, regardless of any technical merit they might have.

The C Containers library	17	Apr 23, 2011
C Programming functions	2	Dec 3, 2021
Observing a container	2	Dec 10, 2013
How to properly insert a landing page within same container beneath an image element?	1	Oct 7, 2024
Engineering a List container Part 2: Implementations	20	Dec 8, 2013
Can't decide which language to get back into programming with	1	Mar 28, 2023
Parallel sorting algorithms...	0	Sep 7, 2012
review of the "container library", part 1/?	18	Mar 1, 2011

The container abstraction and parallel programming

ec429

jacob navia

Nick Keighley

ec429

Stephen Sprunk

jacob navia

Stephen Sprunk

Keith Thompson

jacob navia

gwowen

Nobody

Ben Bacarisse

Stephen Sprunk

Kaz Kylheku

Keith Thompson

Gene

Tim Rentsch

jacob navia

Tim Rentsch

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads