things i would ban from clc

Don Y · Mar 15, 2012

Hi Tim,

Did you read the paper?

Yes, hence my comments. To take (roughly) the cases presented
in the paper:

4.1) You *know* when writing multithreaded code that
control to *any* shared object has to be arbitrated
(this is, IMO, one of the key points the article
tries to skirt... as if wanting NOT to have the
developer be concerned with these issues -- an
admirable goal but not one that rules out threading
libraries as a possible solution). The examples given
in this section show *no* explicit acknowledgement that
the objects/data are shared nor any mechanism to ensure
that sharing works (i.e., not even a library call is
present!)

4.2) is yet another example of the above. Except, in
addition to the lack of any formal access control
mechanism, there is now the acknowledgement that a
datum may span "objects". I.e., yet another case of
wanting to protect the programmer from things he might
have failed to notice (though still not anything that
precludes a library based solution)

4.3) Frankly, I don't see the problem here (too early
in the day?). Aside from the INEFFICIENCY that is
introduced... where is the code "broken"? OK, the
compiler's optimizations may have been unfortunate
but does the code perform as intended?

5.1) Makes the efficiency argument explicit -- and
goes to my comment regarding truly parallel implementations.

for (my_prime = start; my_prime < 10000; ++my_prime) {
    if (!get(my_prime)) {
        for (mult = my_prime; mult < 100000000; mult += my_prime) {
            if (!get(mult)) {
                set(mult);
            }
        }
    }
}

is written *assuming* multiple threads can access the array
(sieve) hidden behind get()/set() inexpensively and concurrently.
The author then complains that the implementation of set()/get()
can cause problems -- for exactly the same reasons in 4.x.

I.e., the authors are advocating freeing the developer from
any concerns associated with implementing concurrency.
"Let's let the compiler consider all of these possible
cases and craft some rules that are *different* from the
rules a multithreaded programmer already SHOULD know..."
Yet, complain when safeguarding against those problems
(e.g., by invoking a mutex per shared object) becomes
"expensive": "Where's *my* free lunch??"

Sure, it would be delightful to *have* such protection
(i.e., scissors that automatically cover their blades
when they detect that you are "running with them").
But, I don't see how that PREVENTS a library based solution
from working.

E.g., wrap the internals of get()/set() with a lock.
The concurrency problems go away -- but performance
plummets. BECAUSE YOUR THREADS ARE COMPETING FOR TOO
MUCH DATA.
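
A minimal sketch of the lock-wrapped get()/set() described above, assuming POSIX threads. The byte-per-entry array, the SIEVE_SIZE constant, and the single global sieve_lock are illustrative assumptions, not the paper's implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sieve storage: one byte per candidate, guarded by one lock.
   SIEVE_SIZE is deliberately tiny; the loop above runs to 100000000. */
#define SIEVE_SIZE 1000

static unsigned char sieve[SIEVE_SIZE];
static pthread_mutex_t sieve_lock = PTHREAD_MUTEX_INITIALIZER;

/* Report whether n has been marked composite. */
bool get(size_t n)
{
    assert(n < SIEVE_SIZE);
    pthread_mutex_lock(&sieve_lock);
    bool marked = sieve[n] != 0;
    pthread_mutex_unlock(&sieve_lock);
    return marked;
}

/* Mark n as composite. */
void set(size_t n)
{
    assert(n < SIEVE_SIZE);
    pthread_mutex_lock(&sieve_lock);
    sieve[n] = 1;
    pthread_mutex_unlock(&sieve_lock);
}
```

Every call now serializes on sieve_lock, which is where the contention cost comes from when several threads work on the same array.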

OTOH, having a thread doing this while another thread is
balancing your checkbook and still another is darning
your socks incurs NO performance penalty.

Yeah, it's unfortunate that I can't get 500 digits of
precision in all of my math "for free". But, I don't
*expect* it to be free!

boil down to
removing the need for the developer to "take precautions"
(i.e., manually ensure that the compiler doesn't "get ahead
of him") along with wanting to be able to use the language
to *efficiently* implement truly parallel threads of
execution. [snip elaboration]


The problem is the compiler is operating at the wrong level of
abstraction. Whether the compiler "gets ahead" of a developer
(a frightening concept in and of itself, but let's ignore
that) doesn't matter, because the operating environment that
(non-thread-aware) compilers generate code for doesn't make
strong enough guarantees about inter-thread or inter-processor
memory consistency. To get threads to work usably, at least
part of the compiler needs to be aware of a lower level of
abstraction, below the architectural level for a single
process (which is where essentially all pre-thread-aware
compilers operate).

I don't see that. I think that's only the case if you want
the compiler to be able to implement the safeguards *for* you.
And, I only see it pertaining to certain types of optimizations
(if the compiler doesn't make those optimizations, then your
code isn't at risk for them!)

I.e., I don't see how that PRECLUDES the use of a library.

Compilers are unaware of interrupts. An interrupt can occur
between any two instructions. Does that mean you can't write
code in C that will operate in the presence of interrupts?

What it *does* mean is that anything that your code could be
doing that an interrupt might want to ASYNCHRONOUSLY interfere
with needs to be protected against that interference. Likewise,
anything your code wants to be able to *convey* to your ISR
needs to take precautions that the ISR sees "the whole picture"
and not just *part* of it.
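
On a hosted system, a POSIX signal handler can stand in for an ISR to sketch this kind of precaution. The names used here (update_reading, on_signal, the reading_* and seen_* fields) are hypothetical, the process is assumed to be single-threaded, and the example leans on POSIX sigprocmask() rather than on anything ISO C itself provides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Two fields that must always be observed together. */
static volatile sig_atomic_t reading_lo, reading_hi;
/* What the handler saw, recorded for inspection afterwards. */
static volatile sig_atomic_t seen_lo, seen_hi;

/* Stands in for the ISR: takes a snapshot of both fields. */
static void on_signal(int signo)
{
    (void)signo;
    seen_lo = reading_lo;
    seen_hi = reading_hi;
}

/* Register on_signal for SIGUSR1; returns 0 on success. */
int install_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Update both fields with the "interrupt" held off, so the handler
   can never run between the two stores. */
void update_reading(int lo, int hi)
{
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    int rc = sigprocmask(SIG_BLOCK, &block, &old);  /* "disable interrupts" */
    assert(rc == 0);  /* fails only on invalid arguments */
    (void)rc;
    reading_lo = lo;
    reading_hi = hi;
    sigprocmask(SIG_SETMASK, &old, NULL);           /* "restore interrupts" */
}
```

A signal that arrives during the update stays pending until the old mask is restored, so the handler sees either the previous pair or the new pair, never a mixture.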

Am I missing something, here?

Don Y · Mar 15, 2012

Hi Jacob,

On 15/03/12 20:26, Don Y wrote:

I know, I have written printf for my compiler system lcc-win.
It is quite big mind you:

d:\repos\lcc-src\libc>pedump /summary xprintf.obj
xprintf.obj 41871 bytes, linked Tue Mar 13 19:22:37 2012

Section Name Size
01 .text 14780
02 .data 1112

almost 15K, + 12 K for character strings, tables, etc...

Exactly. You can write an *application* in that much
space! And all *it* does is "print stuff".

But it will print a denormalized number extracting all the
available precision from it.

Supports all the C99 formats, etc. Actually printf is a
run-time interpreter of the format string. My printf also
must support all the extensions: 450 significant digits
qfloats, printing the comma separator to separate the number
in thousands, and many other "goodies".

I would like to have the possibility of trimming it, something
like "slimPrintf()" but it would make programming more complex
than it needs to be.

If you have support for late/lazy binding, you can modularize
it and have only the portions that are actually *used*
dragged into the executable (e.g., in a hosted environment).

A more practical approach, IME, is to discard printf and
adopt hybrid functions that handle the formats of interest
to you.

E.g., I take great pains *not* to use floats/doubles in
most of my applications. The cost is usually too high
and the results they make available don't warrant the
expense (floating point emulation library, supporting
floating point context in the OS, etc.).

Instead, I put the binary point where I want it and
carefully analyze the range and domain of each operation
that I perform to ensure I get the precision I need
in my results. (yeah, more tedious but cuts the
resource requirement$ significantly)
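
As a rough illustration of "putting the binary point where I want it", here is a sketch of a Q16.16 format (16 integer bits, 16 fractional bits). The type name q16_16 and the helper names are assumptions made for this example; the range analysis the post describes is still the caller's job.

```c
#include <assert.h>
#include <stdint.h>

/* Q16.16: 16 integer bits, 16 fractional bits, stored in an int32_t. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)1 << 16)

/* Convert a whole number to Q16.16. */
static q16_16 q16_from_int(int32_t n)
{
    assert(n >= -32768 && n <= 32767);  /* must fit in the integer bits */
    return (q16_16)(n * Q16_ONE);
}

/* Multiply two Q16.16 values. The 64-bit intermediate holds the full
   Q32.32 product; dividing by 2^16 moves the binary point back.
   The caller must ensure the result fits in Q16.16. */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return (q16_16)(wide / Q16_ONE);
}
```

For example, q16_mul(q16_from_int(3), Q16_ONE / 2) computes 3.0 * 0.5 and yields the Q16.16 encoding of 1.5, with no floating point library involved.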

Don Y · Mar 15, 2012

// at end of progr someone has to free the list children

a pointer can point to each region of memory, that can be used
from functions for doing all is possible to do

Sure! And where do you synchronize access to that region with
other threads? What happens when the region is "full"? Do you
reset the pointer to the "beginning" and keep going? How does
the thread on the other end know *where* you are in this region
and where the "still undefined" portion of the region begins
(so it doesn't look for real data in those places)

Limbo lets you just create one (and "pass it around") just like
you would an int, etc.

So, while you can accumulate the stuff intended to be transferred
across that channel *in* a string (assuming the channel is


is not a string is a pointer to a unsigned char
[but it could be aligned to unsigned if the function initialize
it allow...]

Understood. But it still isn't a complete communication
system. See above

i have no problem with these char pointers...

But *you* are responsible for implementing the list and the
tuples within it. I have no problem multiplying integers
in a loop of successive shifts and conditional adds. But,
I would much rather type an '*' between the two variable
names! :>

yes that is supposed a real programmer should do

Sure. IF THE LANGUAGE DOESN'T DO IT FOR YOU!

I.e., when a UN*X process ends, the system doesn't complain
that you forgot to free() some memory that you had allocated
BEFORE the process terminated.

The designers of the process semantics opted to take care of
this little detail *for* you because it is something that
is common enough in actual use. "Sorry, you forgot to
free() your resources so we are going to mess with the
operation of other UNRELATED processes in the system just
to ensure those folks get REALLY mad at you and force you
to behave more hospitably" :>

i have not problem with memory leak

no "output" can point, for example to a C like FILE struct etc

Again, see above. How does "logging()" know that you have
added something to that region of memory? What happens when
logging() gets ahead of you? Does it just print whatever
gibberish happened to reside in that region of memory
BEFORE you got around to stuffing tmpStr into it?

[big snip]

can be upside down and down upside, at end i say it should be ok

i'm not agree

That's your prerogative! :> Some of us even use assembly language
(gasp! heresy!!)

leak are not my problems, but problem of my malloc()
function implementation ...

If you forget to free something that *you* caused to be allocated
(either directly or indirectly), then the leak is *your* problem.
You can't blame the memory subsystem for failing to read your
mind.

Ben Bacarisse · Mar 16, 2012

Don Y said:
Hi Ben,

Never! It needs semi-colons, and it needs the printf arguments to be in
the right order! Also, making the field width equal to the precision is
a little odd for the 'e' conversion.

I use it quite a lot, but that may just be that I like to parametrise my
code a lot. By the way, it's not a "flag". printf specifiers do have
flags (there are +, -, #, ' ' and 0) so using the wrong term might be
confusing.


Wow, I am *so* relieved! With all your criticisms, I was
afraid you might not have UNDERSTOOD WHAT I INTENDED! Whew!
Glad I won't have to worry about *that*!

[If you have some EXTRA free time, how about checking my past
posts for spelling and grammatical errors. It might help
others who are confused by a typo here or there... e.g., my
recent use of 'int' in place of 'age'... There might be
*other* folks who would also welcome your attention to this
level of detail!]

I'm sorry. I have no desire to annoy you with details that don't
interest you. Some people are interested in details and some are not
and I made the wrong call. I will try not to do it again.

<snip>

Don Y · Mar 16, 2012

Hi Ben,

Don Y said:
Don Y said:

Hi Ben,

<snip>
When was the last time you used:

digits = INT_MAX
decimals = INT_MAX
printf("The answer is %*.*e", value, digits, decimals)

Never! It needs semi-colons, and it needs the printf arguments to be in
the right order! Also, making the field width equal to the precision is
a little odd for the 'e' conversion.

In fact, when was the last time you used the '*' flag in printf??

I use it quite a lot, but that may just be that I like to parametrise my
code a lot. By the way, it's not a "flag". printf specifiers do have
flags (there are +, -, #, ' ' and 0) so using the wrong term might be
confusing.
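
For reference, a small sketch of the argument order being discussed: each '*' in the conversion specification consumes one int argument (field width first, then precision) ahead of the value being converted. The wrapper name format_answer is hypothetical, and snprintf is used only so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Format value in 'e' notation with run-time field width and precision.
   Each '*' consumes one int argument, in order, BEFORE the value itself. */
int format_answer(char *buf, size_t size, int width, int precision, double value)
{
    assert(buf != NULL);
    return snprintf(buf, size, "The answer is %*.*e", width, precision, value);
}
```

With width 12, precision 3 and value 1234.0, the converted field is "1.234e+03" right-justified in 12 columns.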


Wow, I am *so* relieved! With all your criticisms, I was
afraid you might not have UNDERSTOOD WHAT I INTENDED! Whew!
Glad I won't have to worry about *that*!

[If you have some EXTRA free time, how about checking my past
posts for spelling and grammatical errors. It might help
others who are confused by a typo here or there... e.g., my
recent use of 'int' in place of 'age'... There might be
*other* folks who would also welcome your attention to this
level of detail!]


I'm sorry. I have no desire to annoy you with details that don't
interest you. Some people are interested in details and some are not
and I made the wrong call. I will try not to do it again.

No, I apologize for "jumping down your throat". I understand
the point(s) you were trying to make.

OTOH, it is very annoying to have little "distractions" that
don't address the *substance* of the argument being made.
Too often, a thread wanders into totally arcane territory
at the expense of the original subject matter. Lots of
text flows back and forth but very little information
gets added to the actual discussion.

If someone posts a code fragment or makes a statement that
"doesn't look right" (to me), I *assume* they are intelligent
and wonder what *my* problem might be in grasping what they
are trying to convey. Am I not seeing some issue that *they*
have (and are trying to draw my attention to)? Has some
previous comment of mine been misinterpreted/unclear? Is
there a typographical error that could explain the confusion
on my part? Is natural *language* a problem? etc.

I'm not a teacher "grading papers". Nor do I have a desire to
be. And, unless the comments/code/etc. are *clearly* off the
mark, I suspect the other person isn't interested in my
corrections -- if they re-read their post, chances are, they
will find them independently. Nor do I want folks feeling they
have to quickly update their posts to correct piddly errors
to avoid having others correct them on their behalf.

E.g., in a discussion with "io_x", I have to consider if the
code he (?) presented glossed over a key feature of my
example -- or, if it addressed that feature in a different
manner. *Not* whether his code would compile without errors.
Or whether it would achieve the intended results. (a compiler
and debugger can comment more authoritatively on those issues
WITHOUT my involvement).

Instead, my focus is on "do you understand what this
*mechanism* is and what it does *for* you?" I.e., do you
understand what *I* am saying and do I understand what
*you* are saying...

Again, my apologies.

Tim Rentsch · Mar 19, 2012

Don Y said:
Hi Tim,

Did you read the paper?


Yes, hence my comments. To take (roughly) the cases presented
in the paper:

4.1) You *know* when writing multithreaded code that
control to *any* shared object has to be arbitrated
[snip elaboration]

You missed the point of the example, which was to illustrate a
problem in defining how the thread library will behave when the
language specification doesn't say anything about inter-thread
memory access semantics.

4.2) is yet another example of the above. Except, in
addition to the lack of any formal access control
mechanism, there is now the acknowledgement that a
datum may span "objects". I.e., yet another case of
wanting to protect the programmer from things he might
have failed to notice (though still not anything that
precludes a library based solution)

You are misreading the example. 4.1 and 4.2 bring up different
issues. Your characterization of the example is just silly,
because it presumes that the very problem being identified has
been solved already.

4.3) Frankly, I don't see the problem here (too early
in the day?). Aside from the INEFFICIENCY that is
introduced... where is the code "broken"? OK, the
compiler's optimizations may have been unfortunate
but does the code perform as intended?

No, that's the point - the optimizing process has introduced a
race condition.

5.1) Makes the efficiency argument explicit -- and
goes to my comment regarding truly parallel implementations.

for (my_prime = start; my_prime < 10000; ++my_prime) {
    if (!get(my_prime)) {
        for (mult = my_prime; mult < 100000000; mult += my_prime) {
            if (!get(mult)) {
                set(mult);
            }
        }
    }
}

is written *assuming* multiple threads can access the array
(sieve) hidden behind get()/set() inexpensively and concurrently.
The author then complains that the implementation of set()/get()
can cause problems -- for exactly the same reasons in 4.x.

You have misunderstood the point. The one case that behaves
incorrectly is included only for performance comparison. That
case is irrelevant to the issue being discussed in this section.

I.e., the authors are advocating freeing the developer from
any concerns associated with implementing concurrency.

No, they aren't.

"Let's let the compiler consider all of these possible
cases and craft some rules that are *different* from the
rules a multithreaded programmer already SHOULD know..."
Yet, complain when safeguarding against those problems
(e.g., by invoking a mutex per shared object) becomes
"expensive": "Where's *my* free lunch??" [snip elaboration]

It appears you have completely misunderstood the issue the paper
is trying to identify. Unless a language specification defines
inter-thread memory access semantics, it simply is not possible
for a developer to know what "rules" he should follow for
multithreaded programs.

boil down to
removing the need for the developer to "take precautions"
(i.e., manually ensure that the compiler doesn't "get ahead
of him") along with wanting to be able to use the language
to *efficiently* implement truly parallel threads of
execution. [snip elaboration]


The problem is the compiler is operating at the wrong level of
abstraction. Whether the compiler "gets ahead" of a developer
(a frightening concept in and of itself, but let's ignore
that) doesn't matter, because the operating environment that
(non-thread-aware) compilers generate code for doesn't make
strong enough guarantees about inter-thread or inter-processor
memory consistency. To get threads to work usably, at least
part of the compiler needs to be aware of a lower level of
abstraction, below the architectural level for a single
process (which is where essentially all pre-thread-aware
compilers operate).


I don't see that. I think that's only the case if you want
the compiler to be able to implement the safeguards *for* you.

No, it's true regardless of whether the burden for putting in the
safeguards rests on the compiler or the developer.

And, I only see it pertaining to certain types of optimizations
(if the compiler doesn't make those optimizations, then your
code isn't at risk for them!)

There may have been a time when that was true, but for most
modern processors these kinds of optimizations can take place in
hardware at run-time without the compiler ever being aware of
them.

I.e., I don't see how that PRECLUDES the use of a library.

It doesn't preclude the use of a library; it just means
a library by itself is not sufficient.

Compilers are unaware of interrupts. An interrupt can occur
between any two instructions. Does that mean you can't write
code in C that will operate in the presence of interrupts?

Yes, it does mean that, if by "C" what is meant is ISO standard C
with no dependencies on implementation-defined or undefined
behavior.

What it *does* mean is that anything that your code could be
doing that an interrupt might want to ASYNCHRONOUSLY interfere
with needs to be protected against that interference. Likewise,
anything your code wants to be able to *convey* to your ISR
needs to take precautions that the ISR sees "the whole picture"
and not just *part* of it.

There is no way to do this without involving the implementation
at some level, because standard C does not have expressive enough
semantics -- without making guarantees beyond what the language
definition itself provides -- to convey what needs conveying.

Am I missing something, here?

I think you are confusing (1) how an individual implementation
behaves, and (2) what the language definition, by itself, guarantees
for the behavior of all implementations. They aren't the same.

Don Y · Mar 23, 2012

Hi Tim,

[Apologies for the delay in replying -- I'm dealing with travel
& meetings for the next bit]

Don Y said:
Don Y said:

Hi Tim,

I assume most people in the group have seen this, but for those
who have not:

Threads Cannot be Implemented as a Library
http://www.hpl.hp.com/techreports/2004/HPL-2004-209.pdf

I think the problem being addressed, here,

Did you read the paper?


Yes, hence my comments. To take (roughly) the cases presented
in the paper:

4.1) You *know* when writing multithreaded code that
control to *any* shared object has to be arbitrated
[snip elaboration]


You missed the point of the example, which was to illustrate a
problem in defining how the thread library will behave when the
language specification doesn't say anything about inter-thread
memory access semantics.

I see no library functions in this example. How will:

printf("Hello\n");

printf("Goodbye\n");

behave (in different threads)? Will you see:
Hello
Goodbye
or
Goodbye
Hello
or
HGeololdobye

or
a flashing "screen" of noise?

Races are a fact of life in multithreaded applications.
TO THE EXTENT THAT THOSE THREADS SHARE ACCESS TO A RESOURCE(s).
That's why there are synchronization primitives, etc.

If you read each of the examples, the author wants the
language itself to make guarantees about how these races
are resolved.

In reality, you would bracket each such thread access with
a mutex or other mechanism to effectively guarantee atomic
operation. The author complains (just before section 4)
that:

"This approach clearly works most of the time.

Gee! :>

"Unfortunately, we will see that it is too imprecise
to allow the programmer to reason convincingly about
program correctness, or to provide clear guidance to
the compiler implementor.

OK, so writing multithreaded code isn't easy. Nor is writing
functionally correct code. Nor real-time code. Nor...

"As a result, apparently correct programs may fail
intermittently, or start to fail when a new compiler
or hardware version is used.

Sure! And:
hourly_wage = pay / hours_worked;
will ALSO fail intermittently -- when hours_worked is *0*!

"The resulting failures are triggered by specific thread
schedules, and are thus relatively hard to detect during
testing.

Yes. But, so far, I hear nothing that says why a *library*
based solution WILL NOT WORK. After all, the title of the
paper clearly makes that assertion:
_Threads Cannot Be Implemented As a Library_
I contend that the author would have been better served using:
_Threads SHOULD NOT Be Implemented As a Library_
to convey his *opinion* on their suitability.

"A secondary problem with this approach is that, in some
cases, it excludes the best performing algorithmic solutions.

Yes. But, that's a consequence of the language. FORTRAN
makes it hard to implement lists. You find another way.
Again, nothing that precludes a library for threading.

"As a result, many large systems, either intentionally, or
unintentionally, violate the above rules. The resulting
programs are then even more susceptible to the above problems."

So, are we to be paternalistic, now? Let's make the language
such that the developer *can't* make a mistake? Shall we get
rid of division out of fear that the programmer might fail to
test for a divisor of 0? Ban pointers because someone might
fiddle with something that should be faddled, instead?

Didn't *Java* try to solve all these problems??

You are misreading the example. 4.1 and 4.2 bring up different
issues. Your characterization of the example is just silly,
because it presumes that the very problem being identified has
been solved already.

Again, where has the author demonstrated the "problem" with
the library approach? He's just shown another race -- that's
a bit subtler to spot.

The same argument applies to:
    struct {
        int dozens;
        int individual_eggs;
    } x;
with:
    x.individual_eggs++;
    if (x.individual_eggs >= 12) {
        x.individual_eggs = 0;
        x.dozens++;
    }
in which a concurrent thread could see a MONOTONICALLY
INCREASING "egg count" progress as:
{4, 10}
{4, 11}
{4, 12}
{4, 0}
{5, 0}
Again, why hasn't the author demonstrated why a library
--->CANNOT<--- be used for threading??
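
For comparison, a sketch of the conventional library-based treatment of the egg counter, assuming POSIX threads. The names add_egg, read_eggs and x_lock are hypothetical. Both the update and the read take the same mutex, so a reader cannot observe the intermediate {4, 12} or {4, 0} states listed above.

```c
#include <assert.h>
#include <pthread.h>

struct egg_count {
    int dozens;
    int individual_eggs;
};

static struct egg_count x;
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add one egg; the carry into dozens happens while the lock is held. */
void add_egg(void)
{
    pthread_mutex_lock(&x_lock);
    x.individual_eggs++;
    if (x.individual_eggs >= 12) {
        x.individual_eggs = 0;
        x.dozens++;
    }
    assert(x.individual_eggs < 12);  /* invariant whenever the lock is released */
    pthread_mutex_unlock(&x_lock);
}

/* Copy both fields under the same lock so the pair is always consistent. */
struct egg_count read_eggs(void)
{
    pthread_mutex_lock(&x_lock);
    struct egg_count snapshot = x;
    pthread_mutex_unlock(&x_lock);
    return snapshot;
}
```

Whether the language guarantees that the compiler and hardware honor this locking discipline is the question the rest of the thread argues about; the sketch only shows the discipline itself.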

No, that's the point - the optimizing process has introduced a
race condition.

How does the code "not work" -- how does the library NOT
provide the guarantees sought?

You have misunderstood the point. The one case that behaves
incorrectly is included only for performance comparison. That
case is irrelevant to the issue being discussed in this section.

Section 5 focuses on "Performance". 5.1 highlights "Expensive
Synchronization". The sieve example tries to illustrate how
locking isn't "technically" needed (it assumes true and false
values can be obtained in an inherently atomic fashion). But,
that the developer can make no claims as to how effective the
speedup will actually be for this algorithm -- since events
can conspire to arrange updates to always be "just out of
step" with competing threads -- so that those threads end up NOT
being able to take advantage of the work of their peers
(because that work's progress has been disclosed to them
just a tiny bit too late).

Per the last paragraph in section 5.1:
"But even with 4 threads, the properly synchronized code
only barely exceeds the performance of a single
synchronization-free thread, and that only with the use
of spin-locks"
Sure sounds like the author is complaining about the *cost* of
these primitives!

No, they aren't.

I see nothing where the use of a library implementation has been
demonstrated as not working. If this is the point, then *litter*
every example with specific library invocations and point out
how they *can't* address the problem.

"Let's let the compiler consider all of these possible
cases and craft some rules that are *different* from the
rules a multithreaded programmer already SHOULD know..."
Yet, complain when safeguarding against those problems
(e.g., by invoking a mutex per shared object) becomes
"expensive": "Where's *my* free lunch??" [snip elaboration]


It appears you have completely misunderstood the issue the paper
is trying to identify. Unless a language specification defines
inter-thread memory access semantics, it simply is not possible
for a developer to know what "rules" he should follow for
multithreaded programs.

The rules are simple: if a resource is accessed in different
threads, then *you* have to ensure those accesses don't
"interfere" with each other. All of the author's examples
illustrated cases where the author appears to want the
*compiler* to be able to resolve these issues *for* you.

Should the compiler look at how I initialize a pointer and,
based on that knowledge, *limit* the range of operations
that I can perform on that pointer and the range of values
that it can take on? E.g., if I initialize the pointer
to the start of a const char array, should the compiler
PREVENT me from advancing it beyond the end of that array?
Surely, there is no reason for me to refer to a location
"much beyond" the end of the array (i.e., strlen+1)!

We're perfectly content to NOT have the language prescribe
what happens in this case. And, our code somehow manages
to work despite this "implementation (un)defined" behavior.

boil down to
removing the need for the developer to "take precautions"
(i.e., manually ensure that the compiler doesn't "get ahead
of him") along with wanting to be able to use the language
to *efficiently* implement truly parallel threads of
execution. [snip elaboration]

The problem is the compiler is operating at the wrong level of
abstraction. Whether the compiler "gets ahead" of a developer
(a frightening concept in and of itself, but let's ignore
that) doesn't matter, because the operating environment that
(non-thread-aware) compilers generate code for doesn't make
strong enough guarantees about inter-thread or inter-processor
memory consistency. To get threads to work usably, at least
part of the compiler needs to be aware of a lower level of
abstraction, below the architectural level for a single
process (which is where essentially all pre-thread-aware
compilers operate).


I don't see that. I think that's only the case if you want
the compiler to be able to implement the safeguards *for* you.


No, it's true regardless of whether the burden for putting in the
safeguards rests on the compiler or the developer.

Again, I don't see that. The compiler can't arbitrarily rewrite
code. Otherwise, the ultimate compiler would rewrite ALL
programs:
    main() {
        ...
        exit();
    }
as:
    main() {
        exit();
        ...
    }
I don't see how a library *can't* (author's words) provide
that.

There may have been a time when that was true, but for most
modern processors these kinds of optimizations can take place in
hardware at run-time without the compiler ever being aware of
them.

And that's why you design memory barriers, etc. in your
application. To *force* the hardware to get back into
lock-step with the application.
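
One way such a barrier can be spelled today is with C11's <stdatomic.h>, assuming an implementation that provides it. The sketch below uses a release store paired with an acquire load; the names payload, ready, publish_value and try_consume are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;         /* ordinary, non-atomic data */
static atomic_bool ready;   /* publication flag; zero-initialized, so false */

/* Writer: store the data, then raise the flag with release ordering so
   the payload store cannot be reordered after the flag store. */
void publish_value(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: if the flag is observed with acquire ordering, the payload
   written before the matching release store is visible as well.
   Returns false if nothing has been published yet. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Note that these orderings are defined by the language standard rather than by a library alone, which bears on the disagreement in this thread.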

It doesn't preclude the use of a library; it just means
a library by itself is not sufficient.

*Why*? What *can't* the library do?

Yes, it does mean that, if by "C" what is meant is ISO standard C
with no dependencies on implementation-defined or undefined
behavior.

But libraries can be implemented *outside* of the Standard -- as
long as the interface to the library conforms. Or, are you
claiming that the library *also* has to be "portable C"?

There is no way to do this without involving the implementation
at some level, because standard C does not have expressive enough
semantics -- without making guarantees beyond what the language
definition itself provides -- to convey what needs conveying.

You can interface to the language without being *part* of
the language. I.e., I can create a function that causes
the interrupts to be disabled in a processor. I can put that
function into a library. I can create different versions of
that function for different processor architectures. How
does that "not work"?

I think you are confusing (1) how an individual implementation
behaves, and (2) what the language definition, by itself, guarantees
for the behavior of all implementations. They aren't the same.

I understand that you can't do these things ENTIRELY within the
formally defined domain of the C language.

But, the title of the article was:
"Threads Cannot be Implemented as a Library"
it didn't say that the library had to be written in portable C.
It didn't say that the library had to be written in C at all!

I've got this great new Hardware Abstraction Language. The
primitives in this language are 'flush_cache', 'synchronize_caches',
'begin_atomic_operation', etc. The beauty of this language is
that a conforming HAL compiler will "do the right thing" for
the processor for which it is targeted. A command line
option to the HAL compiler lets you create a binding for
C, Pascal, Java, etc. The output can be in COFF, ELF, etc.

So, it's relatively straightforward to build a library with
C bindings that a C *application* can use to reliably provide
these mechanisms.

The unfortunate thing is that the HAL compiler exists "between
the ears" of a select few developers instead of as an executable.

Again, I see the author's failure being the choice of title;
"Threads SHOULD NOT be Implemented as a Library"
highlighting how hard it is to write multithreaded code (in
which case, why bother with yet another article stating the
obvious?) or:
"Threads Cannot be Implemented as a Library Written in C"
highlighting the guarantees that the library would have to
provide to the application developer and WHY THE LANGUAGE
(as it stands) CAN'T MAKE THOSE GUARANTEES, etc.

Kaz Kylheku · Mar 23, 2012

Don Y said:
Don Y said:

"Let's let the compiler consider all of these possible
cases and craft some rules that are *different* from the
rules a multithreaded programmer already SHOULD know..."
Yet, complain when safeguarding against those problems
(e.g., by invoking a mutex per shared object) becomes
"expensive": "Where's *my* free lunch??" [snip elaboration]


It appears you have completely misunderstood the issue the paper
is trying to identify. Unless a language specification defines
inter-thread memory access semantics, it simply is not possible
for a developer to know what "rules" he should follow for
multithreaded programs.

A multi-threading API in which a thread cannot pass a pointer to automatic
storage into some module, such that this module can then execute code on
another thread which accesses that memory, is about as useful as
tits on a bull.

It means that you cannot do anything which resembles client-server RPC
interaction between threads, without marshaling or copying the arguments going
back and forth, which is a nonstarter, because one of the aims of threads is

Rui Maciel · Mar 23, 2012

Kaz said:
because one of the aims of threads is

Hang on tight. The rest of the post will appear once the thread finishes
marshalling.

Rui Maciel

Jens Gustedt · Mar 23, 2012

On 03/23/2012 08:43 PM, Kaz Kylheku wrote:

A multi-threading API in which a thread cannot pass a pointer to automatic
storage into some module, such that this module can then execute code on
another thread which accesses that memory, is about as useful as
tits on a bull.

Yes, it is probably quite useless. On the other hand C11 says in 6.2.4
p 5

"The result of attempting to indirectly access an object with
automatic storage duration from a thread other than the one with which
the object is associated is implementation-defined."

So an implementation of C11 may not allow this, as long as it
documents its behavior properly.

Jens
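For concreteness, the pattern under discussion might look like the sketch below. It assumes POSIX threads rather than the C11 `<threads.h>` interface, and `struct request`, `square_worker` and `square_on_other_thread` are hypothetical names chosen for illustration. Under C11 6.2.4p5 the worker's access through `req` is implementation-defined; implementations where all threads share one address space generally allow it.

```c
#include <pthread.h>

/* Hypothetical request record shared between caller and worker. */
struct request {
    int input;
    int result;
};

/* Runs on the worker thread and writes through a pointer into the
   stack frame of the thread that created it. */
static void *square_worker(void *arg)
{
    struct request *req = arg;
    req->result = req->input * req->input;
    return NULL;
}

/* Returns n*n computed on another thread, or -1 if the thread
   could not be created or joined. */
int square_on_other_thread(int n)
{
    struct request req = { n, 0 };   /* automatic storage duration */
    pthread_t tid;

    if (pthread_create(&tid, NULL, square_worker, &req) != 0)
        return -1;

    /* Joining before req goes out of scope keeps the object alive for
       the worker and orders the worker's write before our read. */
    if (pthread_join(tid, NULL) != 0)
        return -1;

    return req.result;
}
```

The join matters as much as the pointer: if the creating function returned before the worker finished, the worker would be writing to an object whose lifetime had ended.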

Ian Collins · Mar 23, 2012

On 03/23/2012 08:43 PM, Kaz Kylheku wrote:

Yes, it is probably quite useless. On the other hand C11 says in 6.2.4
p 5

"The result of attempting to indirectly access an object with
automatic storage duration from a thread other than the one with which
the object is associated is implementation-defined."

So an implementation of C11 may not allow this, as long as it
documents its behavior properly.

Sometimes I think the standard allows implementations too much latitude!
I'm not aware of any threading implementation that doesn't use a
shared address space. The use of a shared address space is often used
as one of the conditions that differentiate between a threading and a
process model.

Don Y · Mar 23, 2012

Hi Robert,

No, you're missing the point. C, as defined by the standard (at least
up to C11 with threading support), does *not* define a language that
can be reliably combined with a threading library, and expected to
produce *any* defined, or even consistent, results.

Threading-with-a-library works *only* if the implementation makes a
number of additional promises about the behavior of the program,
promises that are *not* part of the standard (again excepting C11 with
__STDC_NO_THREADS__ undefined).

Now threading obviously works with C in many cases, but that's just
because the compiler writers have, either by choice or accident,
implemented the required semantics that will make threading work
(consider the word tearing issues mentioned in the paper). It's not
something you can generically expect of C implementations.

So, what *specific* language in the specification (pre-2011)
gives the compiler writer latitude to make the following
"misbehave" (pseudo-code):

...
x = y = 0
spawn(A)
spawn(B)
...
begin_atom()
printf("%d %d", x, y)
end_atom()
...

A()
{
    ...
    begin_atom()
    x=1
    y=2
    end_atom()
    ...
}

B()
{
    ...
    begin_atom()
    x=3
    y=4
    end_atom()
    ...
}

I.e., anything other than the output of "0 0", "1 2" or "3 4"
would be considered "misbehaving".

And, specifically, why *can't* {begin,end}_atom() provide
the (obvious) functionality?

[OK, language lawyers... strut your stuff! :> ]
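One way the pseudo-code above could be rendered in real C is sketched below, assuming POSIX threads and assuming that `begin_atom()`/`end_atom()` are simply a lock and unlock of a single global mutex. The helper `run_once` and its out-parameters are hypothetical and stand in for the `printf` so that the observed pair can be checked. Note that the sketch leans on POSIX's promise that the mutex functions synchronize memory, which is a guarantee made outside the pre-2011 C standard.

```c
#include <pthread.h>

static pthread_mutex_t atom_lock = PTHREAD_MUTEX_INITIALIZER;
static int x, y;

/* Assumed meaning of the pseudo-code primitives: one global lock. */
static void begin_atom(void) { pthread_mutex_lock(&atom_lock); }
static void end_atom(void)   { pthread_mutex_unlock(&atom_lock); }

static void *A(void *unused)
{
    (void)unused;
    begin_atom();
    x = 1;
    y = 2;
    end_atom();
    return NULL;
}

static void *B(void *unused)
{
    (void)unused;
    begin_atom();
    x = 3;
    y = 4;
    end_atom();
    return NULL;
}

/* Spawns A and B, takes one snapshot of (x, y) inside the atomic
   section, then joins both threads. Returns 0 on success, -1 if a
   thread could not be created. */
int run_once(int *seen_x, int *seen_y)
{
    pthread_t a, b;

    x = y = 0;   /* no other thread exists yet */
    if (pthread_create(&a, NULL, A, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, B, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }

    begin_atom();
    *seen_x = x;
    *seen_y = y;
    end_atom();

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

With the lock in place, the snapshot can only be one of the three pairs listed above. Which of the three appears depends on scheduling.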

Kaz Kylheku · Mar 24, 2012

On 03/23/2012 08:43 PM, Kaz Kylheku wrote:

Yes, it is probably quite useless. On the other hand C11 says in 6.2.4
p 5

"The result of attempting to indirectly access an object with
automatic storage duration from a thread other than the one with which
the object is associated is implementation-defined."

This kind of waffling is pointless word semantics. Regardless of whether you
require this, or allow implementations to duck out of it, there will be
pretty much exactly the same implementations in the world with the same
capabilities.

It's just a question of whether they are marked as nonconforming
on this account, or conforming but with a reduced functionality.

To someone coding in the proverbial trenches, it makes no difference.

Morally speaking, implementations that break stuff like this should be deemed
nonconforming, in a kind of "hall of shame".

Standards should be bold and just require stuff to work, and not waste words on
waffling.

Jens Gustedt · Mar 24, 2012

On 03/23/2012 11:54 PM, Ian Collins wrote:

Sometimes I think the standard allows implementations too much latitude!

Usually they have the simple reason that there is some implementation
that they want to cover.

I'm not aware of any threading implementation that doesn't use a shared
address space. The use of a shared address space is often used as one
of the conditions that differentiate between a threading and a process
model.

No, no, that is not what it says. Statically allocated and malloc'ed
objects *must* work. It is just that an implementation might forbid a
thread to peek into the "local variables" of another thread. I could
imagine that this can be used on special hardware with per cpu
memory. Think of implementing a thread model on a graphics card, for
example.

But I agree that this is nothing I'd like to program.

Jens
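A variant that stays within what the standard requires to work would hand the other thread a malloc'ed object instead of a pointer into the caller's stack. The sketch below again assumes POSIX threads, and `struct job`, `double_worker` and `double_on_other_thread` are hypothetical names used only for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical job record, allocated on the heap so that any thread
   may access it regardless of how 6.2.4p5 is resolved. */
struct job {
    int input;
    int result;
};

static void *double_worker(void *arg)
{
    struct job *j = arg;
    j->result = 2 * j->input;
    return NULL;
}

/* Stores 2*n in *out, computed on another thread.
   Returns 0 on success, -1 on allocation or thread failure. */
int double_on_other_thread(int n, int *out)
{
    struct job *j = malloc(sizeof *j);
    pthread_t tid;

    if (j == NULL)
        return -1;
    j->input = n;
    j->result = 0;

    if (pthread_create(&tid, NULL, double_worker, j) != 0) {
        free(j);
        return -1;
    }
    if (pthread_join(tid, NULL) != 0)
        return -1;   /* deliberately leak j: the worker may still use it */

    *out = j->result;
    free(j);
    return 0;
}
```

The price is an allocation per call, which is the kind of copying overhead the earlier posts complain about.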

Jens Gustedt · Mar 24, 2012

On 03/24/2012 06:45 AM, Kaz Kylheku wrote:

This kind of waffling is pointless word semantics. Regardless of whether you
require this, or allow implementations to duck out of it, there will be
pretty much exactly the same implementations in the world with the same
capabilities.

It's just a question of whether they are marked as nonconforming
on this account, or conforming but with a reduced functionality.

To someone coding in the proverbial trenches, it makes no difference.

Morally speaking, implementations that break stuff like this should be deemed
nonconforming, in a kind of "hall of shame".

Standards should be bold and just require stuff to work, and not waste words on
waffling.

You are ranting a bit quickly, I find.

Such a thing partially covers existing practice for programming, e.g., on
GPUs, where you have something like global memory that all PEs can
access, statically identified in some way or malloc'ed, and local
memory that is restricted to a PE and thus to the thread.

I am personally not a great fan of CUDA, but if one day they could map
the new thread model onto this, it would probably be big progress.

Jens

gwowen · Mar 26, 2012

Sometimes I think the standard allows implementations too much latitude!
I'm not aware of any threading implementation that doesn't use a
shared address space. The use of a shared address space is often used
as one of the conditions that differentiate between a threading and a
process model.

But one could imagine a distributed system where heap and static
allocations refer to some centralised datastore (possibly locally
cached, with some higher-level central constraints to maintain
consistency).[0] "Threads" are then local processes and sharable-
pointers are opaque references to the central store, and the "memory
space" spanned by these references is quite distinct from those used
by process-local variables.

It's a little far fetched, but I'd imagine the standard writers had
something in mind when they made it implementation defined (as opposed
to undefined).

[0] Please don't make me say "cloud"

Tim Rentsch · Mar 27, 2012

Don Y said:
[snip]

I echo the comments of Robert Wessel, whose response
matches my perception.

Beyond that, your reply seems largely nonresponsive and
simply repetitive of earlier comments. Usually I think
there is not much value in trying to respond to someone
who is not being responsive. May I suggest that you
put less effort into arguing, and more into reading and
listening carefully?
