Embedded languages based on early Ada (from "Re: Preferred OS, processor family for running embedded

Pascal Obry

Ray Blaak wrote:
I am somewhat rusty on my Ada tasking knowledge, but why can't Thing be a
protected object?

I don't think this is true. Thing can be a protected object and passed
to some procedures. No problem here I would say and probably the right
approach.
It seems to me that is precisely the kind of synchronization control mechanism
you want to be able to have here.

Agreed.
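
As a rough illustration of that approach (a sketch only: the thread never defines type X, so the Counter type, its Add and Value operations, and the bodies given to Foo, Bar and Baz are all made up here), Thing can be a protected object that three tasks pass to ordinary procedures:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Par_Protected is

   --  Hypothetical stand-in for type X: a counter with serialised updates.
   protected type Counter is
      procedure Add (N : Natural);
      function Value return Natural;
   private
      Total : Natural := 0;
   end Counter;

   protected body Counter is
      procedure Add (N : Natural) is
      begin
         Total := Total + N;
      end Add;

      function Value return Natural is
      begin
         return Total;
      end Value;
   end Counter;

   --  Illustrative bodies only: each procedure receives Thing as a parameter.
   procedure Foo (T : in out Counter) is
   begin
      T.Add (1);
   end Foo;

   procedure Bar (T : in out Counter) is
   begin
      T.Add (2);
   end Bar;

   procedure Baz (T : in out Counter) is
   begin
      T.Add (3);
   end Baz;

   Thing : Counter;

begin
   declare
      task Run_Foo;
      task Run_Bar;
      task Run_Baz;

      task body Run_Foo is
      begin
         Foo (Thing);
      end Run_Foo;

      task body Run_Bar is
      begin
         Bar (Thing);
      end Run_Bar;

      task body Run_Baz is
      begin
         Baz (Thing);
      end Run_Baz;
   begin
      null;  --  The block is not left until all three tasks have terminated.
   end;

   Put_Line ("Total =" & Natural'Image (Thing.Value));
end Par_Protected;
```

The three calls to Add are mutually exclusive, so the total is 6 whatever order the tasks run in. The work done inside Add is serialised, which is the cost discussed later in the thread.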

Pascal.

--

--|------------------------------------------------------
--| Pascal Obry Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--| http://www.obry.net
--| "The best way to travel is by means of imagination"
--|
--| gpg --keyserver wwwkeys.pgp.net --recv-key C1082595
 
Dmitry A. Kazakov

This is what I was thinking.

Syntax might be even simpler:
declare
   Thing : X;
begin par
   Foo (Thing);
   Bar (Thing);
   Baz (Thing);
end par;

Thing won't get corrupted if the programmer knows what they're doing!

Surely, but it becomes a pitfall for those who don't. The construct is
inherently unsafe, because it makes no sense without some mutable Thing
or equivalent. This mutable thing is accessed unsafely, or else concurrency
gets killed.
In the case of pure functions, there is "obviously" no problem:

declare
   Thing : X := InitThing;
begin par
   A1 := Foo (Thing);
   A2 := Bar (Thing);
   A3 := Baz (Thing);
end par;
return A1+A2+A3;

In the case of procedures, there are numerous reasonable uses.
Perhaps the three procedures read Thing, and output three separate files.
Or maybe they write different parts of Thing. Maybe they validate
different properties of Thing, and raise an exception if a fault is found.
Perhaps they update statistics stored in a protected object, not shown.

The most obvious case is if the procedures are called on different
objects. Next most likely is if they are pure functions.

The problem is that there always exists the final "A1+A2+A3", whose
semantics is in question. The alternatives resynchronize on "A1+A2+A3" and
I see no obvious way to express this. A PAR statement would not really help
to decompose it.

(What you have done is replacing mutable Thing with mutable set {A1,A2,A3}.
Let's rename {A1,A2,A3} to Thing, the problem is still there.)
Could Thing be composed of protected objects? That way updates
would be serialised but wouldn't necessarily block the other procedures.

That could be a "hierarchical" mutex. But mutexes are themselves very
low-level. The unsafety would still be there; it would just show itself as
deadlocks, rather than as corrupted data.
Maybe the procedures are very slow, but only touch Thing at the end?
Couldn't they run concurrently, and be serialised in an arbitrary order
at the end?

That is the key issue, IMO. The ability to split large chunks, where the
procedures run independently most of the time, into independent and
serialized parts is what decomposition is all about...
Nothing in this problem is different from the issues of doing it with
separate tasks. So why is this any more problematic?

Because tasks additionally have safe synchronization and data exchange
mechanisms, while PAR should rely on inherently unsafe memory sharing.
The semantics I want permit serial execution in any order. And permit
operation even with a very large number of parallel statements in
effect. Imagine a recursive call with each level having many parallel
statements. Creating a task for each directly would probably break.
Something like an FFT, for example. FFT the upper and lower halves
of Thing in parallel. Combine serially.

Yes, and the run-time could assign the worker tasks from some pool,
fully transparently to the program. That would be very cool.
Exception semantics would probably differ. Any statement raising an
exception would stop all other par statements(?)

But not by abort; rather, it should wait for the next synchronization point
and propagate an exception out of there, so that the alternatives can
clean up the temporary objects they create. (The synchronization points
could be explicit, for example when an alternative calls an entry or
procedure of a shared thing.)
The compiler should be able to generate code which generates a
reasonable number of threads, depending on the hardware being used.
Yes


Maybe you're right. But I can't see how to glue this in with
Ada (or VHDL) semantics.

That is the most difficult part! :)-))
 
Dmitry A. Kazakov

I am somewhat rusty on my Ada tasking knowledge, but why can't Thing be a
protected object?

I tried to explain it in my previous post.

When Thing is a protected object, its procedures and entries, when
called from the concurrent alternatives, are all mutually exclusive. This is
not the semantics expected from PAR. Probably it would be better to rewrite
as:

declare
   Thing : X;
begin
   par -- Though this appears concurrent, it is not
      Thing.Foo;
   and
      Thing.Bar;
   and
      Thing.Baz;
   end par;
end;
It seems to me that is precisely the kind of synchronization control mechanism
you want to be able to have here.

No. The implied semantics of PAR is such that Thing should be accessed from
alternatives without interlocking because one *suggests* that the updates
are mutually independent. When Thing is visible from outside it should be
blocked by PAR for everyone else. This is not the behaviour of a protected
object. It is rather a "hierarchical" mutex.
 
Jonathan Bromley

Because tasks additionally have safe synchronization and data exchange
mechanisms, while PAR should rely on inherently unsafe memory sharing.

The PAR that I'm familiar with (from CSP/occam) most certainly does
*not* have "inherently unsafe memory sharing". There seems to be
an absurd amount of wheel-reinvention going on in this thread.
Yes, and the run-time could assign the worker tasks from some pool,
fully transparently to the program. That would be very cool.

And easy to do, and done many times before.

For heaven's sake... You have a statically-determinable number of
processors. It's your (or your compiler's) choice whether each of
those processors runs a single thread, or somehow runs multiple
threads. If each processor is entitled to run multiple threads, then
there's no reason why the number and structure of cooperating
threads should not be dynamically variable. If you choose to run
one thread on each processor, your thread structure is similarly
static. Hardware people have been obliged to think about this
kind of thing for decades. Software people seem to have a
pretty good grip on it too, if the textbooks and papers I've read
are anything to go by. Why is it suddenly such a big deal?

In VHDL, a process represents a single statically-constructed
thread. It talks to its peers in an inherently safe way through
signals. With this mechanism, together with dynamic memory
allocation, you can easily fake-up whatever threading regime
takes your fancy. You probably wouldn't bother because
there are more convenient tools to do such things in software
land, but it can be done. In hardware you can do exactly the
same thing, but one (or more) of your processes must then
take responsibility for emulating the dynamic memory allocation,
carving up some real static physical memory according to
whatever strategy you choose to implement.
That is the most difficult part! :)-))

Maybe. But then again, maybe organising the structure of
the actual application is the most difficult part, and this
vapid rambling about things that are already well-understood
is actually rather straightforward.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
Simon Farnsworth

Jonathan said:
For heaven's sake... You have a statically-determinable number of
processors. It's your (or your compiler's) choice whether each of
those processors runs a single thread, or somehow runs multiple
threads. If each processor is entitled to run multiple threads, then
there's no reason why the number and structure of cooperating
threads should not be dynamically variable. If you choose to run
one thread on each processor, your thread structure is similarly
static. Hardware people have been obliged to think about this
kind of thing for decades. Software people seem to have a
pretty good grip on it too, if the textbooks and papers I've read
are anything to go by. Why is it suddenly such a big deal?
Not disagreeing with most of what you're saying, but I do feel the need to
point out the existence of systems with hotpluggable CPUs. Sun and IBM have
both sold systems for some years where CPUs can be added and removed at
runtime; software is expected to just cope with this.

Also in the software domain; there is a cost to switching between different
threads. Thus, in software, the aim is to limit the number of runnable
threads to the number of active CPUs. If there are more threads runnable
than CPUs available, some CPU time is wasted switching between threads,
which is normally undesirable behaviour.
 
Dr. Adrian Wrigley

The PAR that I'm familiar with (from CSP/occam) most certainly does
*not* have "inherently unsafe memory sharing". There seems to be
an absurd amount of wheel-reinvention going on in this thread.

I think reinvention is necessary. Whatever "par" semantics was
in Occam is not available in Ada (or C, C++, Perl or whatever).
It was considered useful then - bring it back!
And easy to do, and done many times before.

How do you do this in Ada? Or VHDL? It's been done many times
before, yes, but not delivered in any currently usable form for
the general programmer :( It's not in any mainstream language I know.
For heaven's sake... You have a statically-determinable number of
processors. It's your (or your compiler's) choice whether each of
those processors runs a single thread, or somehow runs multiple
threads. If each processor is entitled to run multiple threads, then
there's no reason why the number and structure of cooperating
threads should not be dynamically variable. If you choose to run
one thread on each processor, your thread structure is similarly

Of course. But how do I make this choice with the OSs and languages
of today? "nice" doesn't seem to be able to control
this when code is written in Ada or VHDL. Nor is it defined
anywhere in the source code.
static. Hardware people have been obliged to think about this
kind of thing for decades. Software people seem to have a
pretty good grip on it too, if the textbooks and papers I've read
are anything to go by. Why is it suddenly such a big deal?

It's been a big deal for a long time as far as I'm concerned.
It's not a matter of "invention" mostly, but one of availability
and standards. There is no means in Ada to say "run this in
a separate task, if appropriate". Only a few, academic
and experimental tools offer the flexibility. Papers /= practise.
In VHDL, a process represents a single statically-constructed
thread. It talks to its peers in an inherently safe way through
signals. With this mechanism, together with dynamic memory
allocation, you can easily fake-up whatever threading regime
takes your fancy. You probably wouldn't bother because
there are more convenient tools to do such things in software
land, but it can be done.

I'm not sure what you're talking about here. Do you mean like any/all of
Split-C, Cilk, C*, ZPL, HPF, F, data-parallel C, MPI-1, MPI-2, OpenMP,
ViVA, MOSIX, PVM, SVM, Paderborn BSP, Oxford BSP toolset and IBM's TSpaces?

Specifying and using fine-grain parallelism requires language,
compiler and hardware support, I think.

Consider:
begin par
   x := sin(theta);
   y := cos(theta);
end par;

you probably *do* want to create a new thread, if thread creation
and destruction is much faster than the function calls. You don't
know this at compile-time, because this depends on the library in use,
and the actual parameters. Maybe X, Y are of dynamically allocated
length (multi-precision).
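
For comparison, this is roughly what the same intent costs in standard Ada today. It is only a sketch under stated assumptions: the procedure name Par_Sin_Cos, the task names and the value of Theta are invented for illustration, and two explicit tasks are declared whether or not creating them is worthwhile, which is exactly the decision the proposed "par" would leave to the compiler and run-time.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Numerics.Elementary_Functions;
use Ada.Numerics.Elementary_Functions;

procedure Par_Sin_Cos is
   Theta : constant Float := 0.5;  --  arbitrary illustrative value
   X, Y  : Float := 0.0;
begin
   declare
      task Compute_X;
      task Compute_Y;

      --  Each task writes a different variable, so no locking is needed.
      task body Compute_X is
      begin
         X := Sin (Theta);
      end Compute_X;

      task body Compute_Y is
      begin
         Y := Cos (Theta);
      end Compute_Y;
   begin
      null;  --  Leaving the block waits for both tasks to terminate.
   end;

   --  X and Y are safe to read here because both tasks have finished.
   Put_Line ("sin**2 + cos**2 =" & Float'Image (X * X + Y * Y));
end Par_Sin_Cos;
```

Nine lines of task boilerplate replace the four-line "begin par ... end par", and the thread structure is fixed in the source rather than chosen at run time.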

You can't justify designing hardware with very short thread
creation/destruction times, unless the software can be written
to take advantage. But none of the mainstream languages
allow fine grain reordering and concurrency to be specified.
That's the Catch-22 that Inmos/Occam solved. Technically.

The need is emerging again, now that putting more threads on a chip
is easier than raising the sequential instruction rate.
In hardware you can do exactly the
same thing, but one (or more) of your processes must then
take responsibility for emulating the dynamic memory allocation,
carving up some real static physical memory according to
whatever strategy you choose to implement.


Maybe. But then again, maybe organising the structure of
the actual application is the most difficult part, and this

This is sometimes true.
vapid rambling about things that are already well-understood
is actually rather straightforward.

Somewhere our models don't mesh. What is "straightforward" to
you is "impossible" for me. What syntax do I use, and which
compiler, OS and processor do I need to specify and exploit
fine-grain concurrency?

In 1987, the answers were "par", Occam, Transputer. Twenty
years later, Ada (or VHDL, C++, C#), Linux (or Windows), Niagara
(or Tukwila, XinC, ClearSpeed, Cell) do not offer us anything
remotely similar. In fact, in twenty years, things have
got worse :(
 
Dr. Adrian Wrigley

Not disagreeing with most of what you're saying, but I do feel the need to
point out the existence of systems with hotpluggable CPUs. Sun and IBM have
both sold systems for some years where CPUs can be added and removed at
runtime; software is expected to just cope with this.

Also in the software domain; there is a cost to switching between different
threads. Thus, in software, the aim is to limit the number of runnable
threads to the number of active CPUs. If there are more threads runnable
than CPUs available, some CPU time is wasted switching between threads,
which is normally undesirable behaviour.

This is part of the problem. Parallelism has to be *inhibited* by
explicit serialisation, to limit the number of threads created.

So a construct like (in Ada):

for I in Truth'Range loop
   Z := Z xor Truth (I);
end loop;

deliberately forces a serial execution order, even though we
know that order does not matter at all in this case.

There is no effective construct to permit but not require concurrency.

The software compiler can't sensibly parallelise this because:

- The semantics of xor may be unknown (overloaded), and unsuitable
- The execution time of each iteration is much smaller than thread
  start/stop time
- Too many parallel threads would be created

So we're left with source code which implies a non-existent serialisation
constraint.

If the "for I in all..." construct were in the language, we'd be
able to say "I don't care about the order", permitting concurrency
even if the result weren't identical (e.g. when using floating point).
Numerous algorithms in simulation are "embarrassingly parallel",
but this fact is completely and deliberately obscured from compilers.
Compilers can't normally generate fine-scale threaded code because
the applications don't specify it, the languages don't support it,
and the processors don't need it. But the technical opportunity
is real. It won't happen until the deadlock between compilers, software
and processors is broken.
 
Jonathan Bromley

What syntax do I use, and which
compiler, OS and processor do I need to specify and exploit
fine-grain concurrency?

In 1987, the answers were "par", Occam, Transputer. Twenty
years later, Ada (or VHDL, C++, C#), Linux (or Windows), Niagara
(or Tukwila, XinC, ClearSpeed, Cell) do not offer us anything
remotely similar. In fact, in twenty years, things have
got worse :(

Absolutely right. And whose fault is that? Not the academics,
who have understood this for decades. Not the hardware people
like me, who of necessity must understand and exploit massive
fine-grained parallelism (albeit with a static structure). No,
it's the programmer weenies with their silly nonsense about
threads being inefficient.

Glad to have got that off my chest :) But it's pretty frustrating
to be told that parallel programming's time has come, when
I spent a decade and a half trying to persuade people that it
was worth even thinking about and being told that it was
irrelevant.

For the numerical-algorithms people, I suspect the problem of
inferring opportunities for parallelism is nearer to being solved
than some might imagine. There are tools around that
can convert DSP-type algorithms (such as the FFT that's
already been mentioned) into hardware that's inherently
parallel; there are behavioural synthesis tools that allow
you to explore the various possible parallel vs. serial
possibilities for scheduling a computation on heterogeneous
hardware. It's surely a small step from that to distributing
such a computation across multiple threads or CPUs. All
that's needed is the will.
 
Dr. Adrian Wrigley

Absolutely right. And whose fault is that? Not the academics,
who have understood this for decades. Not the hardware people
like me, who of necessity must understand and exploit massive
fine-grained parallelism (albeit with a static structure). No,
it's the programmer weenies with their silly nonsense about
threads being inefficient.

By the way... I am a satisfied customer of yours (from 1994).

If there is any blame to share, I place it upon the language
designers who don't include the basics of concurrency (and
I include Ada, which has no parallel loops, statements or function
calls. Nor decent pure functions).

I do hardware, processor and software design. But I'm not
keen on trying to fix-up programming languages, compilers
and processors so they mesh better. (Unless someone pays me!)
Glad to have got that off my chest :) But it's pretty frustrating
to be told that parallel programming's time has come, when

(I'm not saying this - so don't be frustrated! What I'm saying
is that multithreading has become "buzzword compliant" again,
so maybe there's an opportunity to exploit, to address longstanding
technical deficiencies and rebrand Ada and/or VHDL.)
I spent a decade and a half trying to persuade people that it
was worth even thinking about and being told that it was
irrelevant.

Parallel programming's time hasn't quite arrived :(
But it's only 3-5 years away! Still. (like flying cars,
fusion power and flat screens, which never seem to get
nearer. {Oh. tick off flat screens!})
For the numerical-algorithms people, I suspect the problem of
inferring opportunities for parallelism is nearer to being solved
than some might imagine. There are tools around that
can convert DSP-type algorithms (such as the FFT that's
already been mentioned) into hardware that's inherently

Again, this is ages old now. But it can't convert
C-type programs reliably and efficiently.
parallel; there are behavioural synthesis tools that allow
you to explore the various possible parallel vs. serial
possibilities for scheduling a computation on heterogeneous
hardware. It's surely a small step from that to distributing
such a computation across multiple threads or CPUs. All
that's needed is the will.

A small step. Like from Apollo 11.

Once the language/software/compiler/processor deadlock is broken,
things will move rapidly. Give it another 15 years, and we might
be half way there.

Glad to see that we're not so far apart as I thought!
 
Pascal Obry

Dr. Adrian Wrigley wrote:
Numerous algorithms in simulation are "embarrassingly parallel",
but this fact is completely and deliberately obscured from compilers.

Not a big problem. If the algorithms are "embarrassingly parallel" then
the jobs are fully independent. In this case that is quite simple:
create as many tasks as you have processors. No big deal. Each task
will compute a specific job. Ada has no problem with "embarrassingly
parallel" jobs.

What I have not yet understood is why people are trying to solve, in
all cases, the parallelism at the lowest level. Trying to parallelize an
algorithm in an "embarrassingly parallel" context is losing precious
time. Many real case simulations have billions of those algorithms to
compute on multiple data; just create a set of tasks to compute
several of those algorithms in parallel. Easier and as effective.

In other words, what I'm saying is that in some cases ("embarrassingly
parallel" computation is one of them) it is easier to do n computations
in n tasks than n x (1 parallel computation in n tasks), and the overall
performance is better.

Pascal.

Dmitry A. Kazakov

Dr. Adrian Wrigley wrote:

Not a big problem. If the algorithms are "embarrassingly parallel" then
the jobs are fully independent. In this case that is quite simple:
create as many tasks as you have processors. No big deal. Each task
will compute a specific job. Ada has no problem with "embarrassingly
parallel" jobs.

What I have not yet understood is why people are trying to solve, in
all cases, the parallelism at the lowest level. Trying to parallelize an
algorithm in an "embarrassingly parallel" context is losing precious
time. Many real case simulations have billions of those algorithms to
compute on multiple data; just create a set of tasks to compute
several of those algorithms in parallel. Easier and as effective.

In other words, what I'm saying is that in some cases ("embarrassingly
parallel" computation is one of them) it is easier to do n computations
in n tasks than n x (1 parallel computation in n tasks), and the overall
performance is better.

The idea (of PAR etc) is IMO quite opposite. It is about treating
parallelism rather as a compiler optimization problem, than as a part of
the domain. In the simplest possible form it can be illustrated on the
example of Ada's "or" and "or else." While the former is potentially
parallel, it has zero overhead compared to sequential "or else." (I don't
count the time required to evaluate the operands). If we compare it with
the overhead of creating tasks, we will see a huge difference both in terms
of CPU cycles and mental efforts.
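
The difference between the two operators can be seen in a small test. This is a sketch with invented names (Or_Demo, Check, Calls): the language leaves the order in which the operands of "or" are evaluated unspecified but evaluates both, whereas "or else" evaluates left to right and skips the right operand once the result is known.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Or_Demo is
   Calls : Natural := 0;

   --  Returns its argument and counts how often it was called.
   function Check (Result : Boolean) return Boolean is
   begin
      Calls := Calls + 1;
      return Result;
   end Check;

   B : Boolean;
begin
   --  Both operands are evaluated, in an order the compiler may choose.
   B := Check (True) or Check (False);
   Put_Line ("or      : " & Boolean'Image (B)
             & ", operands evaluated:" & Natural'Image (Calls));

   Calls := 0;

   --  Strictly left to right; the second operand is skipped here.
   B := Check (True) or else Check (False);
   Put_Line ("or else : " & Boolean'Image (B)
             & ", operands evaluated:" & Natural'Image (Calls));
end Or_Demo;
```

The first line reports two evaluations and the second reports one. Nothing in the source of the "or" form says which operand comes first, and that freedom costs the programmer nothing to express.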
 
Pascal Obry

Dmitry A. Kazakov wrote:
The idea (of PAR etc) is IMO quite opposite. It is about treating
parallelism rather as a compiler optimization problem, than as a part of
the domain. In the simplest possible form it can be illustrated on the
example of Ada's "or" and "or else." While the former is potentially
parallel, it has zero overhead compared to sequential "or else." (I don't
count the time required to evaluate the operands). If we compare it with
the overhead of creating tasks, we will see a huge difference both in terms
of CPU cycles and mental efforts.

I don't buy this :) You don't have to create tasks for every
computation. You put in place a writer/consumer model. A task prepares
the data and puts them into a list (protected object) and you have a set
of tasks to consume those jobs. This works in many cases and requires
creating the tasks only once (not as bad as OpenMP, which creates threads for
parallel computations).
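
A minimal sketch of that model follows. All names (Worker_Pool, Queue, Results, Worker) are invented, the "job" is just squaring a number, and the worker count is an assumed constant rather than the real processor count:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Worker_Pool is
   Job_Count    : constant := 10;
   Worker_Count : constant := 3;  --  assumed; ideally the number of CPUs

   type Job_Array is array (1 .. Job_Count) of Positive;

   --  The "list (protected object)": a simple queue of job numbers.
   protected Queue is
      procedure Put (Job : Positive);
      procedure Close;
      entry Get (Job : out Natural);  --  0 means "no more work"
   private
      Items  : Job_Array := (others => 1);
      Head   : Positive  := 1;
      Tail   : Natural   := 0;
      Closed : Boolean   := False;
   end Queue;

   protected body Queue is
      procedure Put (Job : Positive) is
      begin
         Tail := Tail + 1;
         Items (Tail) := Job;
      end Put;

      procedure Close is
      begin
         Closed := True;
      end Close;

      --  Callers block until a job is available or the queue is closed.
      entry Get (Job : out Natural) when Head <= Tail or Closed is
      begin
         if Head <= Tail then
            Job  := Items (Head);
            Head := Head + 1;
         else
            Job := 0;
         end if;
      end Get;
   end Queue;

   protected Results is
      procedure Add (Value : Natural);
      function Total return Natural;
   private
      Sum : Natural := 0;
   end Results;

   protected body Results is
      procedure Add (Value : Natural) is
      begin
         Sum := Sum + Value;
      end Add;

      function Total return Natural is
      begin
         return Sum;
      end Total;
   end Results;
begin
   declare
      task type Worker;

      task body Worker is
         Job : Natural;
      begin
         loop
            Queue.Get (Job);
            exit when Job = 0;
            Results.Add (Job * Job);  --  stand-in for the real computation
         end loop;
      end Worker;

      Pool : array (1 .. Worker_Count) of Worker;  --  created once
   begin
      --  The block body plays the writer: prepare the jobs, then close.
      for J in 1 .. Job_Count loop
         Queue.Put (J);
      end loop;
      Queue.Close;
   end;  --  Waits for every worker to drain the queue and terminate.

   Put_Line ("Sum of squares =" & Natural'Image (Results.Total));
end Worker_Pool;
```

Whichever worker picks up which job, the sum of the squares of 1 to 10 is 385. The tasks are created once and reused for every job, which is the property being contrasted with OpenMP above.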

Pascal.

Dmitry A. Kazakov

Dmitry A. Kazakov wrote:


I don't buy this :)

Well, maybe I don't buy it either... :)-)) Nevertheless, it is a very
challenging and intriguing idea.
You don't have to create tasks for every computation.

(On some futuristic hardware tasks could become cheaper than memory and
arithmetic computations.)
You put in place a writer/consumer model. A task prepares
the data and puts them into a list (protected object) and you have a set
of tasks to consume those jobs. This works in many cases and requires
creating the tasks only once (not as bad as OpenMP, which creates threads for
parallel computations).

Ah, but a publisher/subscriber framework is itself a solution to some
problem, which is not a domain problem. If you had distributed middleware
you would not care about publishers and subscribers. You would simply
assign/read a variable controlled by the middleware. Interlocking,
marshaling and so on would happen transparently.
 
Ray Blaak

Dmitry A. Kazakov said:
I tried to explain it in my previous post.

When Thing is a protected object, its procedures and entries, when
called from the concurrent alternatives, are all mutually exclusive. This is
not the semantics expected from PAR. Probably it would be better to rewrite
as:

PAR only says that all of its statements run in parallel, nothing more, nothing
less (e.g. equivalent to the task bodies you had around each statement before).

Those statements can themselves access synchronization and blocking controls
that affect their execution patterns.
No. The implied semantics of PAR is such that Thing should be accessed from
alternatives without interlocking because one *suggests* that the updates
are mutually independent.

The updates are independent only if their behaviour truly is independent. If
they access a shared synchronization control then by definition they are
mutually dependent.

It is not PAR that dictates this, but rather the statements themselves.

PAR would only be convenience shorthand for writing task bodies around each
statement.
When Thing is visible from outside it should be
blocked by PAR for everyone else. This is not the behaviour of a protected
object. It is rather a "hierarchical" mutex.

The behaviour of a protected object is defined by its entries and how it is
used.
 
Dr. Adrian Wrigley

Dr. Adrian Wrigley wrote:

Not a big problem. If the algorithms are "embarrassingly parallel" then
the jobs are fully independent. In this case that is quite simple,

They aren't independent in terms of cache use! They may also have
common subexpressions, which independent treatment re-evaluates.
create as many tasks as you have processors. No big deal. Each task
will compute a specific job. Ada has no problem with "embarrassingly
parallel" jobs.

A problem is that it breaks the memory bandwidth budget. This
approach is tricky with large numbers of processors. And even more
challenging with hardware synthesis.
What I have not yet understood is why people are trying to solve, in
all cases, the parallelism at the lowest level. Trying to parallelize an
algorithm in an "embarrassingly parallel" context is losing precious
time.

You need to parallelise at the lowest level to take advantage of
hardware synthesis. For normal threads a somewhat higher level
is desirable. For multiple systems on a network, a high level
is needed.

What I want in a language is the ability to specify when things
must be evaluated sequentially, and when it doesn't matter
(even if the result of changing the order may differ).
Many real case simulations have billions of those algorithms to
compute on multiple data; just create a set of tasks to compute
several of those algorithms in parallel. Easier and as effective.

Reasonable for compilers and processors as they are designed now.
Even so it can be challenging to take advantage of shared
calculations and memory capacity and bandwidth limitations.

But useless for hardware synthesis. Or automated partitioning
software. Or generating system diagrams from code.

Manual partitioning into tasks and sequential code segments is
something which is not part of the problem domain, but part
of the solution domain. It implies a multiplicity of sequentially
executing process threads.

Using concurrent statements in the source code is not the same thing
as "trying to parallelise an algorithm". It doesn't lose any
precious execution time. It simply informs the reader and the
compiler that the order of certain actions isn't considered relevant.
The compiler can take some parts of the source and convert them to
a netlist for an ASIC or FPGA. Other parts could be broken
down into threads. Or maybe parts could be passed to separate
computer systems on a network. Much of it could be ignored.
It is the compiler which tries to parallelise the execution.
Unlike tasks, where the programmer does try to parallelise.

Whose job is it to parallelise operations? Traditionally,
programmers try to specify exactly what sequence of operations is
to take place. And then the compiler does its best to shuffle
things around (limited). And the CPU tries to overlap data
fetch, calculation, address calculation by watching the
instruction sequence for concurrency opportunities.
Why do the work to force sequential operation if the
compiler and hardware are desperately trying to infer
concurrency?
In other words, what I'm saying is that in some cases ("embarrassingly
parallel" computation is one of them) it is easier to do n computations
in n tasks than n x (1 parallel computation in n tasks), and the overall
performance is better.

This is definitely the case. And it helps explain why parallelisation
is not a job for the programmer or the hardware designer, but for
the synthesis tool, OS, processor, compiler or run-time. Forcing
the programmer or hardware designer to hard-code a specific parallelism type
(threads), and a particular partitioning, while denying the expressiveness
of a concurrent language will result in inferior flexibility and
inability to map the problem onto certain types of solution.

If all the parallelism your hardware has is a few threads then all you
need to code for is tasks. If you want to be able to target FPGAs,
million-thread CPUs, ASICs and loosely coupled processor networks,
the Ada task model alone serves very poorly.

Perhaps mapping execution of a program onto threads or other
concurrent structure is like mapping execution onto memory.
It *is* possible to manage a processor with a small, fast memory,
mapped at a fixed address range. You use special calls to move
data to and from your main store, based on your own analysis of
how the memory access patterns will operate. But this approach
has given way to automated caches with dynamic mapping of
memory cells to addresses. And virtual memory. Trying to
manage tasks "manually", based on your hunches about task
coherence and work load will surely give way to automatic
thread inference, creation and management based on the interaction
of thread management hardware and OS support. Building in
hunches about tasking to achieve parallelism can only be
a short-term solution.
--
Adrian
 
Pascal Obry

Dr. Adrian Wrigley wrote:
If all the parallelism your hardware has is a few threads then all you
need to code for is tasks. If you want to be able to target FPGAs,
million-thread CPUs, ASICs and loosely coupled processor networks,
the Ada task model alone serves very poorly.

Granted. I was talking about traditional hardware where OpenMP is used
and I do not find this solution convincing in this context. It is true
that for massively parallel hardware things are different. But AFAIK
massively parallel machines (like IBM Blue Gene) all come with a
different flavor of parallelism; I don't know if it is possible to have
a model to fit them all... I'm no expert on those anyway.

Pascal.

 
C

Colin Paul Gloster

Marcus Harnisch (e-mail address removed) posted on Fri, 02 Mar 2007
14:22:00 +0100:

"To be fair, of the examples posted, only in C the behavior is
actually undefined."


In Ada it could be useful to read the value of a variable which has
not been assigned a value in the source code, e.g. if the VOLATILE
PRAGMA is used and the variable's memory location is directly
connected to output from a temperature sensor, for example. This is
similar to claiming that assigning a floating point number to a
variable of type INTEGER can be desired. It can be desired, but I
should be required to explicitly express my intent such as in
some_integer : integer := integer(floating_point_number);
instead of
some_integer : integer := floating_point_number;
which quite rightly is illegal.


"In VHDL [..] the variable *does* have an
initial value. It's not entirely obvious but at least you will
*always* get consistent results."


I was unaware of this. Could you please tell me more about this? I
have so far received no such response to
"Reading an undefined value",
HTTPS://Bugzilla.Mentor.com/show_bug.cgi?id=120
timestamped 2007-02-15 04:19.

Thanks in advance,
Colin Paul Gloster
 
J

Jonathan Bromley

"In VHDL [..] the variable *does* have an
initial value. It's not entirely obvious but at least you will
*always* get consistent results."


I was unaware of this. Could you please tell me more about this? I
have so far received no such response to
"Reading an undefined value",
HTTPS://Bugzilla.Mentor.com/show_bug.cgi?id=120

With respect, this is something that is trivially discovered
from reading the VHDL LRM or any half-decent text book;
there is no mystery about it. Any scalar in VHDL is
initialised to the left-hand value of its subtype's range;
aggregates have each of their components so initialised.
You can, of course, add an initialiser to a declaration to
override this behaviour. Every simulator I've ever used
correctly implements this language feature.

The problem arises not in the language, nor in simulation
(i.e. execution of a program written in VHDL on a
suitable platform), but in synthesis. The majority of
hardware platforms do not offer reliable power-up
initialisation of internal state. Consequently it is appropriate
to code explicitly some reset behaviour. For exactly this
reason, the hardware-oriented data types in VHDL (std_logic,
etc) have a specific undefined value as the leftmost value
of their value-set, so that initialisation oversights are
more likely to be detected.

Unfortunately for a purist such as you, there are many
occasions in hardware design where it is entirely
appropriate to read an uninitialised object. For
example, a pipeline or shift register probably does
not need the hardware overhead of reset; it will
automatically flush itself out over the first few clock
cycles - but only if you allow it to run, which of course
entails reading uninitialised (or default-initialised) values.
Consequently it is appropriate for synthesis tools to do
pretty much what they generally do: don't worry about
initialisations. For preference, they should issue warnings
about attempts to do explicit initialisation, since these cannot
be synthesised to hardware on most platforms. However,
even then it may be appropriate to let this pass, since the
explicit initialisation may be useful in order to limit the
excessive pessimism that usually occurs when a simulator
is confronted with a lot of 'U' or 'X' values. This issue is
one of those things that hardware designers are required
to be aware of, and failure to attend to it is usually a good
identifying mark of a beginner, or a dyed-in-the-wool
programmer assuming that hardware design is easy.
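
As a rough illustration of the pipeline point above, here is a minimal Python sketch of a shift register that has no reset. It is a toy model, not VHDL: the names UNINIT, clock and run are made up for this example, and "U" merely stands in for an uninitialised value. After as many clock edges as there are stages, every uninitialised value has been shifted out.

```python
UNINIT = "U"  # hypothetical stand-in for an uninitialised register value


def clock(stages, data_in):
    """Advance the shift register by one clock: new data enters stage 0."""
    return [data_in] + stages[:-1]


def run(depth, inputs):
    """Return the register contents after each clock edge, with no reset."""
    stages = [UNINIT] * depth
    history = []
    for value in inputs:
        stages = clock(stages, value)
        history.append(stages)
    return history


# A 4-stage register fed with five input values.
history = run(4, [1, 0, 1, 1, 0])
for cycle, stages in enumerate(history, start=1):
    print(cycle, stages)
```

During the first cycles the last stage still holds UNINIT, so anything reading the register output is reading an uninitialised value; that is harmless as long as the consumer ignores the output until the register has filled.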

Please don't assume that hardware design is naive, ignorant
or incompetent simply because it doesn't look exactly
like good software design.
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
(e-mail address removed)
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
C

Colin Paul Gloster

Hello again,

It seems prudent to highlight that my advice regarding discrepancies
between VHDL simulations and synthesized VHDL was intended for someone
(named Mike Silva) who I suspected would not have been aware of how
common this is, who said in
(which is a different subthread on comp.lang.ada and so does not
appear in References fields on comp.lang.vhdl):

"[..]

Well, I did pick up a VHDL book a while back. Maybe it's a sign. :)
But first I want to get Ada running on a SBC."

Sorry if this caused confusion. Having said that, I would prefer
simulations to reflect reality (but I appreciate that less accuracy
for higher speed can be acceptable if you know what you are
sacrificing and what you are doing).


In a message timestamped Fri, 02 Mar 2007 15:55:46 +0000,
Martin Thompson (e-mail address removed) posted:
Colin Paul Gloster said:
What do you mean by this? The VHDL I simulate behaves the same as the
FPGA, unless I do something bad like doing asynchronous design, or
miss a timing constraint."

Or if you use the enumeration encoding attribute and it is not supported
by both the simulation tool and the synthesis tool;

Well, attributes are all a bit tool specific, I'm not sure this is
important. The sim and the synth will *behave* the same, but the
numbers used to represent states might not be what you expect. Or am
I misunderstanding what you are getting at?"



You seem to have understood it, but not to be aware of what I have read
on the matter, which may be just scaremongering: I have never actually
checked whether a simulation tool and a synthesis tool differ on this.

Many attributes are tool-specific, but not all. The attribute
ENUM_ENCODING is located somewhere in between: it was introduced in
IEEE Std 1076.6-1999, "IEEE Standard for VHDL Register
Transfer Level (RTL) Synthesis", and is still present in IEEE Std
1076.6-2004,
HTTP://IEEEXplore.IEEE.org/search/s...estds&query=((1076.6-1999)<in>metadata)&pos=0
which contains in 7.1.8 Enumeration encoding attribute:
"[..]

NOTE-Use of this attribute may lead to simulation mismatches, e.g.,
with use of relational operators.

[..]"


E.g. from page 13 of Synopsys's pvhdl_2.pdf :
"[..]

You can override the automatic enumeration encodings and specify
your own enumeration encodings with the ENUM_ENCODING
attribute. This interpretation is specific to Presto VHDL, and
overriding might result in a simulation/synthesis mismatch. [..]

[..]"

You can see how "a simulation/synthesis mismatch" would result from
page 15 of pvhdl_7.pdf :
"[..]

Example 7-3 Using the ENUM_ENCODING Attribute

   attribute ENUM_ENCODING: STRING;
   -- Attribute definition
   type COLOR is (RED, GREEN, YELLOW, BLUE, VIOLET);
   attribute ENUM_ENCODING of
      COLOR: type is "010 000 011 100 001";
   -- Attribute declaration

The enumeration values are encoded as follows:

   RED    = "010"
   GREEN  = "000"
   YELLOW = "011"
   BLUE   = "100"
   VIOLET = "001"

The result is GREEN < VIOLET < RED < YELLOW < BLUE.

Note:
The interpretation of the ENUM_ENCODING attribute is specific to
Presto VHDL. Other VHDL tools, such as simulators, use the
standard encoding (ordering).

[..]"
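
To make the mismatch in the quoted Example 7-3 concrete, here is a small Python sketch. It assumes that a simulator orders enumeration literals by their position in the type declaration, and that the synthesised comparison orders them by the encoded bit patterns read as unsigned numbers, as the quoted ordering suggests. The function names are made up for this illustration.

```python
# Literals in declaration order, and the encodings from the quoted example.
COLORS = ["RED", "GREEN", "YELLOW", "BLUE", "VIOLET"]
ENCODING = dict(zip(COLORS, "010 000 011 100 001".split()))


def less_in_simulation(a, b):
    """Standard ordering: compare positions in the type declaration."""
    return COLORS.index(a) < COLORS.index(b)


def less_after_synthesis(a, b):
    """Assumed synthesis ordering: compare the encodings as unsigned numbers."""
    return int(ENCODING[a], 2) < int(ENCODING[b], 2)


def mismatches():
    """All ordered pairs for which the two interpretations of "<" disagree."""
    return [(a, b) for a in COLORS for b in COLORS
            if less_in_simulation(a, b) != less_after_synthesis(a, b)]


synthesis_order = sorted(COLORS, key=lambda c: int(ENCODING[c], 2))
print(synthesis_order)
print(mismatches())
```

For instance, RED < GREEN holds under the declaration ordering but not under the encoded ordering, so a design that relies on relational operators for this type could behave differently before and after synthesis.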



In a message timestamped Fri, 02 Mar 2007 15:55:46 +0000,
Martin Thompson (e-mail address removed) posted:
"[..]

Never used buffers, so I dunno about that!"


Apparently almost nobody used them. I never used them and I do not use
them.



"[..]
'Z' not being treated as high impedance by a synthesis tool;

It will be if the thing you are targetting has tristates. Either as
IOBs or internally."


Maybe Synplify does. Synopsys's pvhdl_5.pdf warns instead:
"5 Inferring Three-State Logic
Presto VHDL infers a three-state buffer when you assign the value of
Z to a signal or variable. The Z value represents the high-impedance
state. Presto VHDL infers one three-state buffer per process. You
can assign high-impedance values to single-bit or bused signals (or
variables). [..]

[..]

You cannot use the z value in an expression, except for
concatenation and comparison with z, such as in
if (IN_VAL = 'Z') then y<=0 endif;
This is an example of permissible use of the z value in an
expression, but it always evaluates to false. So it is also a
simulation/synthesis mismatch.

[..]

Be careful when using expressions that compare with the z value.
Design Compiler always evaluates these expressions to false, and
the pre-synthesis and post-synthesis simulation results might differ.
For this reason, Presto VHDL issues a warning when it synthesizes
such comparisons."
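
The warning quoted above can be modelled in a few lines of Python. This is only a sketch under the assumption stated in the quotation, namely that the synthesised logic treats any comparison with the z value as false, while a simulator compares the values as they are. The function names are hypothetical.

```python
def equals_in_simulation(a, b):
    """A simulator compares the two values directly, 'Z' included."""
    return a == b


def equals_after_synthesis(a, b):
    """Assumed synthesis behaviour: any comparison with 'Z' is false."""
    if "Z" in (a, b):
        return False
    return a == b


for in_val in ["0", "1", "Z"]:
    print(in_val,
          equals_in_simulation(in_val, "Z"),
          equals_after_synthesis(in_val, "Z"))
```

The two functions only disagree when the signal really is 'Z', which is exactly the case the comparison was written to detect.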



In a message timestamped Fri, 02 Mar 2007 15:55:46 +0000,
Martin Thompson posted:
values being ignored for synthesis;

Works in my tools."


In case someone tries to coerce you into using Synopsys: from
pvhdl_c.pdf :

"[..]

subprogram
Default values for parameters are unsupported. [..]

[..]"



In a message timestamped Fri, 02 Mar 2007 15:55:46 +0000,
Martin Thompson (e-mail address removed) posted:

"[..]
sensitivity lists being ignored for synthesis;

That depends on the tool.
or other
discrepancies.

Well, that covers a lot ;-)"


Needless to say ... :)



"[..]
This may be too much to expect for timing constraints, but I -- perhaps
naively -- do not see why an asynchronous design should be so dismissable.
How hard could it be to replace tools' warnings that a referenced signal
needs to be added to a sensitivity list with a rule in the language standard
which makes the omission from the sensitivity list illegal?

Because it might be handy for simulating something? I dunno to be
honest.

[..]"


I doubt it.




"[I am not an async expert but...] You can do async design in VHDL and
with synthesis, but proving correctness by simulation does not work
out as I understand it."


I do not have a clue.



"[..]
You may rightly deem that claim of mine to be unwarranted, but outside
of testbenches, I do not see what use the language is if it is not
transferrable to actual hardware.

What?! "Outside of testbenches, I do not see what use..." *Inside* of
testbenches is where I spend most of my coding time! The whole point
of having a rich language is to make running simulations easy.

The fact that we have a synthesisable subset is not a bad thing, just
how real life gets in the way. [..]"


It is possible to write testbenches for VHDL in a language other than
VHDL. I do not argue whether testbenches in VHDL or another language
are better. It is possible to synthesize code in a language other than
a dedicated hardware description language, and we differ on whether it
is the synthesizable or the unsynthesizable code that is a side effect
of the practicalities of reality. I am not trying to convince you on
this point; we simply think about it differently.



"I wish VHDL had *more* non synthesisable features
(like dynamic array sizing for example)."


I am aware of an initiative to add a feature, which may or may not be
synthesizable, to VHDL to aid verification, but I do not believe I had
heard a desire for VHDL to have "*more* non synthesisable features"
before.




" I'd like to write my
testbenches in Python :)"

So why don't you?
HTTP://MyHDL.JanDecaluwe.com/doku.php



"[..]
Martin J. Thompson wrote:

"Multi dimensional arrays have worked (even in synthesis) for years in
my experience.

[..]"

Not always, and not with all tools. E.g. last month, someone
mentioned in
: "Using 2D-Arrays as I/O signals _may_ be a problem for some synthesis
tools. [..]"

Well, that's a bit weak ("*may* be a problem") - what tools do they
currently not work in?"


Ask the author of (Ralf
Hildebrandt). I am not personally aware of any.



Martin J. Thompson wrote:

"> I admit my next example is historical, but Table 7.1-1 Supported and
Unsupported Synthesis Constructs of Ben Cohen's second (1998) edition of
"VHDL Answers to Frequently Asked Questions" contains:
"[..]
[..] multidimensional arrays are not allowed
[..]"

Cheers,
C. P. G.

Yes, in the past it has been a problem. [..]"


By coincidence I was checking something else in the book Luca Fanucci,
"Digital Sistems Design Using VHDL", SEU, 2002 last week and it was
mentioned therein that multidimensional arrays are not
synthesizable. I do not know whether or not they actually were
supported for synthesis at that time.

Regards,
C.P.G.
 
C

Colin Paul Gloster

In a message timestamped Mon, 05 Mar 2007 13:18:09 +0000,
Jonathan Bromley (e-mail address removed) posted:
"On 5 Mar 2007 12:20:55 GMT, Colin Paul Gloster
"In VHDL [..] the variable *does* have an
initial value. It's not entirely obvious but at least you will
*always* get consistent results."


I was unaware of this. Could you please tell me more about this? I
have so far received no such response to
"Reading an undefined value",
HTTPS://Bugzilla.Mentor.com/show_bug.cgi?id=120

With respect, this is something that is trivially discovered
from reading the VHDL LRM or any half-decent text book;
there is no mystery about it. Any scalar in VHDL is
initialised to the left-hand value of its subtype's range;
[..]

[..]"


Thank you. I clearly missed this. As a result of your post I have found:

"[..]

4.3.1.2 Signal declarations

[..]

In the absence of an explicit default expression, an implicit default
value is assumed for a signal of a scalar subtype or
for each scalar subelement of a composite signal, each of which is
itself a signal of a scalar subtype. The implicit
default value for a signal of a scalar subtype T is defined to be that
given by T'LEFT.

[..]

4.3.1.3 Variable declarations

[..]

If an initial value expression appears in the declaration of a
variable, then the initial value of the variable is determined
by that expression each time the variable declaration is
elaborated. In the absence of an initial value expression, a
default initial value applies. The default initial value for a
variable of a scalar subtype T is defined to be the value given
by T'LEFT. [..]

[..]"
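
The rule in the quoted clauses can be pictured with a short Python sketch. It is a toy model, not a VHDL implementation: each scalar subtype is represented as a tuple of its values listed from left to right, and the helper names are made up for this illustration. The default initial value is then simply the leftmost element, which is what T'LEFT denotes.

```python
# Toy model: a scalar subtype is a tuple of its values, leftmost first.
STD_ULOGIC = ("U", "X", "0", "1", "Z", "W", "L", "H", "-")
BIT = ("0", "1")
BOOLEAN = (False, True)


def ascending(low, high):
    """Model of 'integer range low to high'."""
    return tuple(range(low, high + 1))


def descending(high, low):
    """Model of 'integer range high downto low'."""
    return tuple(range(high, low - 1, -1))


def default_initial_value(subtype):
    """With no explicit initial value, an object starts at T'LEFT."""
    return subtype[0]


print(default_initial_value(STD_ULOGIC))
print(default_initial_value(ascending(0, 7)))
print(default_initial_value(descending(7, 0)))
```

Note that the direction of the range matters: in this model an object of a "7 downto 0" subtype starts at 7, not 0, and a std_ulogic object starts at 'U', which is why forgotten initialisations tend to show up in simulation.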



Jonathan Bromley wrote:
"Unfortunately for a purist such as you, there are many
occasions in hardware design where it is entirely
appropriate to read an uninitialised object. For
example, a pipeline or shift register probably does
not need the hardware overhead of reset; it will
automatically flush itself out over the first few clock
cycles - but only if you allow it to run, which of course
entails reading uninitialised (or default-initialised) values.
Consequently it is appropriate for synthesis tools to do
pretty much what they generally do: don't worry about
initialisations. [..]"

I had not thought of those. I did mention a situation in
in which reading an uninitialized item is acceptable.


" For preference, they should issue warnings
about attempts to do explicit initialisation, since these cannot
be synthesised to hardware on most platforms. However,
even then it may be appropriate to let this pass, since the
explicit initialisation may be useful in order to limit the
excessive pessimism that usually occurs when a simulator
is confronted with a lot of 'U' or 'X' values. This issue is
one of those things that hardware designers are required
to be aware of, and failure to attend to it is usually a good
identifying mark of a beginner, or a dyed-in-the-wool
programmer assuming that hardware design is easy."


I am certainly unaware of many important things related to
electronics.


"Please don't assume that hardware design is naive, ignorant
or incompetent simply because it doesn't look exactly
like good software design."


I do not. I am unhappy that electronic engineers are very eager to try
to transfer things which are unsuitable for software to hardware for
which they are also unsuitable, e.g. C++ and UML.
 
