Just out of curiosity: why should the graph be directed; don't the nodes
have two-way communication?
Yes, but bandwidth is also an issue. If you connect to more and more peers and
just push data out to all of them, then your upload bandwidth just gets split
more and more. d=1 leads to a trivial solution but with unacceptable lag. So
d=2 is the minimum divisor that gives rich enough possibilities for topologies
to try to solve this problem in interesting ways.
So although actually you will, on average, have 4 connections established (two
out, and two in), you are transmitting on two of them, and receiving on the
other two.
[...] And you're not considering a lag metric other than just number of
hops?
Solving variants of Travelling Salesman is next on my list of things that I'll
spend my most masochistic moments on.
Perhaps I will, I'll have some time on my hands round Xmas. Sounds like
a nice problem.
It's nice to think about, not so nice if you actually have to solve it.
Dijkstra's weighted shortest path algorithms were nice for client-server based
problems, but the future is peer-to-peer. I believe that this problem and
problems like it are going to be at the center of a lot of internet technology
going forward.
How about generating C source that you #include from a resident file in
your project, would that be an option?
I don't know that MSVC's project mechanisms can be finagled to compile and
execute one program just to build source code for another program that you
compile up. With makefiles, what you are saying is easy, and I've done it in
the past, but you can see how this kind of thing puts pressure on your
development process. A sufficiently useful preprocessor mechanism would allow
one to write such code generators in a completely portable way.
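For reference, the two-stage pattern being described usually boils down to a
small generator program along these lines (the file names and generated table
are purely illustrative); a makefile rule builds and runs it, and the real
program simply #includes its output:

/* gen_tables.c -- illustrative generator: a makefile rule compiles and
   runs this first, redirects its output to squares.h, and the real
   program then does #include "squares.h". */
#include <stdio.h>

int main(void)
{
    printf("/* generated file -- do not edit */\n");
    printf("static const unsigned squares[16] = {\n");
    for (int i = 0; i < 16; i++)
        printf("    %uu,\n", (unsigned)(i * i));
    printf("};\n");
    return 0;
}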
The C standard specifies what is UB and what's not. Grepping an
implementation is a fragile way of deducing semantics, I would say.
Yes, and what percentage of C programmers in the world do you think have ever
had the standard in their hands that they actually refer to?
Well, I'm glad that you extracted my intended meaning from the name of
the function I used. Good thing I didn't use an operator! ;-)
Yeah, well I noticed that you didn't catch *my* bug. (I missed one detail.)
I'm _still_ talking about 'the stack thing' yes, since you still seem to
think that this is a triviality. I think it's not.
Dude: malloc(65536) is not portable by the same reasoning.
If you think that, then you didn't follow my reasoning. I can find out
at compile time if malloc(65536) is allowed, and if it is, I can find
out at runtime if it has succeeded. I can do no such thing with stack
overflows. The program will exhibit all the symptoms of UB without a
proper cause that can be traced down to the standard.
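Something like the following sketch captures the asymmetry being argued
(illustrative only):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Heap exhaustion is detectable: the standard guarantees that a
       failed malloc() reports it by returning NULL. */
    void *p = malloc(65536);
    if (p == NULL) {
        fputs("allocation failed, handled gracefully\n", stderr);
        return EXIT_FAILURE;
    }
    free(p);

    /* There is no analogous check for the stack: no standard way exists
       to ask "can I safely make this call, or recurse N levels deeper?"
       before doing it, and no NULL-like result when it fails. */
    return EXIT_SUCCESS;
}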
What are you trying to say here? Help me out a bit.
I'm trying to say that *ALL* languages have this issue. The resources required
to retain a call stack are unknown and often difficult to determine from any
reasonable code analysis. Something like "counting the depth" is not very
useful since there are no language provisions for actually counting out to
such a depth. Some compilers implement tail recursion, though doing so in all
cases is equivalent to the halting problem -- so you can't even know for sure
when it's happening.
If you want a really good case, look up the Ackermann function. It looks very
simple, with the only concrete constant operation being a "+1". Knowing that
there is some kind of stack bound is totally useless for this function. Most
computations of the Ackermann function blow out any fixed-size stack, and even
a fully addressable 64-bit machine would not have the required memory to
implement a stack that could compute any of its non-trivial range. The point
being there is no kind of reasonable metric for measuring the stack
requirements that can be generally applied.
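For concreteness, the textbook definition is tiny (a sketch; the return value
overflows quickly too, but the point here is only the call depth):

/* Two-argument Ackermann function.  The recursion depth explodes so
   fast that even modest arguments such as (4, 2) blow out any realistic
   stack long before the arithmetic itself becomes a problem. */
unsigned long ackermann(unsigned long m, unsigned long n)
{
    if (m == 0) return n + 1;
    if (n == 0) return ackermann(m - 1, 1);
    return ackermann(m - 1, ackermann(m - 1, n - 1));
}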
My objection to your proposal can simply be restated as "all machines have
limited resources; deal with it." At best you could insist that upon reaching
some limited stack depth the program should immediately halt or something
(with stack checking turned on, most x86 compilers provide for this in the form
of a runtime error.)
Ok, I'll try to summarize my understanding of the point you try to make.
Let's consider it a priori desirable that we have a super-duper heap
manager, with all kinds of nice things like separate heaps. We agree
that this is often useful. Now let's also assume a certain platform
where multithreading/multitasking is so important that the
platform-specific library provides support for it. We can assume that it
also sports a thread-safe malloc (yes?). You'd still want the
super-duper heap manager. Your point is that mandating a better heap
manager by the standard (without even mentioning multithreading) would
force each vendor to implement these, and implement them thread-safe if
they think multithreading is important to have.
Basically, yes.
Now, as I see it, even if _just_ the standard malloc()/free()/etc are
thread-safe, you can build a more powerful heap manager on that alone,
using just standard C features.
This is completely and utterly untrue. *THAT* is my point. None of the
functions I proposed as extensions can be implemented (even in a prototypical
way) on top of just malloc()/free()/etc and retain their intended semantics.
(Of course, you could implement them with memory leaks out the wazoo -- i.e.,
just never free anything, but that's clearly not acceptable.)
[...] If the thing needs to be able to run in
a multithreaded environment, you could even do that, with some extra
effort. You need some form of mutex locking, but you can resort to
Lamport's bakery algorithm for that.
I had to look this one up -- that's one of the classic useless non-solutions
(especially if you want to call malloc more than 4 billion times, for example),
which explains why I've never committed it to memory. You still need to be
able to identify process instances, which *BY ITSELF* is not portable. Process
instance identifiers would count as something *BEYOND* just malloc/free/etc.
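For reference, a rough sketch of the bakery algorithm's shape makes the
objection concrete (the fixed thread count N and the per-thread index i are
assumptions the algorithm cannot do without):

#define N 8                      /* must know the number of threads up front */

volatile int entering[N];
volatile unsigned number[N];     /* ticket numbers grow without bound */

void bakery_lock(int i)          /* i: per-thread identity, 0..N-1 */
{
    int j;
    unsigned max = 0;

    entering[i] = 1;
    for (j = 0; j < N; j++)
        if (number[j] > max) max = number[j];
    number[i] = max + 1;         /* take the next ticket */
    entering[i] = 0;

    for (j = 0; j < N; j++) {
        while (entering[j])
            ;                    /* wait until j has picked its ticket */
        while (number[j] != 0 &&
               (number[j] < number[i] ||
                (number[j] == number[i] && j < i)))
            ;                    /* defer to lower tickets, ties broken by id */
    }
}

void bakery_unlock(int i)
{
    number[i] = 0;
}

Note that it leans on per-thread identifiers and on shared-memory ordering that
plain C volatile does not actually guarantee -- all of which is already beyond
what malloc()/free()/etc alone provide.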
[...] Surely it won't be pretty, but it is, in principle, possible. That's
my point.
No it's not. It's not portable, that's for sure.
If the malloc returns either NULL or a pointer (i.e., it conforms
minimally to the standard) then, even in a multithreaded environment, it
is quite possible to build a heap manager on top of that.
Why don't you just try to do it. I mean without race conditions, and without
hooking out or using any of the spawn/mutex/etc primitives that you'll find in
the multithreading API. That ought to be a good trick.
[...] If you need
high performance even in such demanding circumstances, then, yes, you'd
need some support from preferably hardware-assisted mutexes or similar.
High performance is just something I realized you could get as a nice side
effect from such extensions. But the functionality enhancements alone are
worth it. *BUT* whether or not this is properly implemented with respect to
the existence of multithreading has no relevance to its performance -- that's
just a question of correctness.
Ok. Let's try to get some closure here, this is eating into my free time
too much for my taste.
I would say that writing a more powerful heap manager is possible in
principle, even in a multithreaded environment, as long as you have a
well-implemented malloc() (and friends) at your disposal.
*BZZZT!!!* Wrong.
It would be a drag to do so and it would be slow, but it is possible.
Impossible. Slow, fast or otherwise. It's just impossible.
I think I have a clearer bearing on your statement why it would be good
to put this stuff in the library. You do need mutexes, and if the
implementation supports them, it will work better than a shared-memory
simulated mutex. So for performance reasons, I think your idea of
extending the basic heap manager, by extending its functionality as
mandated by the standard, is a good idea.
That is, as long as I can get my msize() function as well.
Well ok, msize() (implemented as _msize() in WATCOM C/C++) does not require
multithreading support. That would be a good addition as well. One has to be
very careful about what kind of implementation impact this has. For example,
if an implementation has an extension which has a heap-compatible
automatically growable allocation (for example touching the adjacent memory
page, which would be "paged out" by default, might automatically append it to a
"growable" allocation), then knowing its size might not make a lot of sense.
Yes, as close as gcc is, it does. And in other cases, I have the standard.
But you said you were just going to look in your man pages. With my proposal,
there would be no question -- the source would have to be in the project
somewhere. If someone closed their source, well, presumably you're going to
expect documentation anyway, if not, you've got the same problem that you
always expect from closed source libraries.
Ah, the "tit-for-tat" technique...!
So far, nobody has challenged me to come up with a specific phrasing for
this (which actually surprises me). Here you go. It's a bit rough, but
you get the idea.
---- Proposed addition to the standard to address stack overflows:
---- Draft
a. For an active function invocation, define an upper bound for the
amount of memory in use by this function as follows (in the
following, denote "Upper Limit on Memory Usage" by "ULMU"):
And what if the locals/parameters are not stored, or storable, on the stack?
What if an implementation uses multiple stacks depending on the *type* of the
parameter?
- For every parameter/local variable of basic type there is a
corresponding constant macro of type size_t:
You missed all the implicit temporaries required. Homework exercise: for any
fixed number n, create a function which requires at least n implicit
temporaries (not explicitly declared) to implement.
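For instance, a sketch along these lines: every parenthesized product below has
to be held somewhere while the enclosing expression is evaluated, so a
straightforward compilation needs several unnamed temporaries on top of the
four declared parameters -- and deeper nesting manufactures as many more as you
like:

int needs_temporaries(int a, int b, int c, int d)
{
    return ((a * b + 1) * (c * d + 2) + 3) *
           ((a * c + 4) * (b * d + 5) + 6);
}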
(unsigned) char: ULMU_CHAR
(unsigned) short: ULMU_SHORT
...
double: ULMU_DOUBLE
...
pointer types: ULMU_POINTER
- The ULMU for a struct shall be equal to
the sum of upper limits of its members.
This doesn't sound right with implementation defined struct alignment issues.
- The ULMU for a union shall be equal to
the maximum of upper limits of its members.
Doesn't that assume that the implementation implements overlapping union
members? (It makes sense not to overlap floating point with integer if, for
example, the memory units in the CPU internally tag the type of the contents of
the memory in order to allow the FPU and Integer units to execute totally
asynchronously even when both are accessing memory, if, say, that tag switching
had a high cost.)
- The ULMU for an array shall be the number
of elements, multiplied by the upper-limit
of its base type.
((the last one is certainly wasteful, but this could be refined))
The ULMU of a function invocation shall be calculated as the
sum of ULMU values of the types of its parameters,
local variables, and return type, PLUS the constant
ULMU_FUNC_OVERHEAD.
And what about the number of alloca() calls? alloca() is a common extension
which just eats extra space off the stack dynamically -- the big advantage
being that you don't have to call free or anything like it, for it to be
properly cleaned up. It will be cleaned up upon the function's return (the
simplest implementation of a "freeall".)
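A typical use looks something like this (alloca() is nonstandard, and the
header it lives in varies -- <alloca.h> on many UNIXes, <malloc.h> elsewhere):

#include <alloca.h>   /* nonstandard: location of the declaration varies */
#include <stdio.h>
#include <string.h>

void greet(const char *name)
{
    /* Size is only known at run time; the space comes straight off the
       stack and is reclaimed automatically when greet() returns. */
    char *buf = alloca(strlen(name) + 8);
    sprintf(buf, "Hello, %s", name);
    puts(buf);
}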
Would it not make more sense to remove all these details, and just say that
each function creates a macro named _STACK_USAGE_<functionname> which is not
available inside the scope of any function or prior to the definition of the
function? (minus alloca() calls) This would then allow you to track the stack
usage without (incorrectly, I might add) trying to figure it out from details
that are implementation-specific?
Regardless, none of this helps if you have multiple stacks, for example. In
such cases, if *any* stack runs out you are in trouble. But it's obvious that
your intention is to assume a *unified* stack implementation. The math won't
translate in a portable way.
b. The implementation shall define a macro ULMU_TOTAL,
which will evaluate to a size_t with value at least
4095. This value is guaranteed to remain constant during
execution of the program.
But in implementations that *I* am aware of, the stack size is set at *LINK*
time. So it can't *BE* a macro available at compile time without giving up on
this flexibility. This wouldn't work for me at all, since I *use* this feature
in my own projects. At the very least make this a global variable -- that *IS*
settable at link time.
c. A function invocation is guaranteed to succeed only if the
total summed ULMU values of all active function invocations
(including the to-be active function after invocation)
does not exceed ULMU_TOTAL.
You have to trust me on this one: I'm not trying to coerce you into a
laborious writing-down of a precise rule-set, or anything. It's just
that I found that making something concrete (which you intuitively feel
should be possible) gives a lot of insight into the kind of troubles you
would get into when actually doing it.
But *YOU* could have written up a possible coding standard, and explain why you
think such a thing could not work in practice. There are as many coding
standards as programming projects -- picking one is always controversial and
instantly makes you open to attack.
As noted before, I think it would not be a good development if different
organisations settled on a different set of operators, even if they
meticulously documented their convention. Even things as simple as a
wrong (in the sense of: other than you're used to) bracing style can
hinder your productivity for quite some time before you get used to
that. Having a different set of condoned operators at each programming
shop sounds like a small tower of Babel to me.
But I don't understand how this is any different from free form function names.
I'm not talking about abuse, I'm talking about the recognition capacity
of unknown operators (which is zilch, and which takes ages to get used
to). Now that's bad enough for languages, but I think it would be a
mistake of quite monumental proportions to allow some sort of C
"dialects" to emerge with different sets of operators, "operator wars",
and so on.
You mean like function-name wars? Like string library wars (actually that sort
of is happening ...)? The point is that there's no chance that anyone would
mistake an *extensible* operator as being anything other than it is. To
understand it you have to look up its definition, just like a function.
When reading the code that used (rather than defines) the operator, it
helps tremendously if the meaning jumps from the page, rather than
having to look it up in the Big Manual of Operators in use at your local
workplace.
You are completely ignoring that large population of people who happily use
operator overloading today. You don't understand that what you are saying is
nothing more than your very narrow opinion. And you're using your opinion to
try to trump a proposal whose merits lean very heavily on its technical
features.
Too much freedom is a dangerous thing, in programming at least.
Yeah we should all rename our functions as func0000, func0001, etc ...
It would make for an interesting experiment, sociologically
This "experiment" is already ongoing -- notice how BSD has been forked out the
wazoo, how there are about a million regex implementations, and how Microsoft
has "winsock" as an alternative to UNIX's sockets. The world hasn't ended
because of any of this.
A small hypothetical case study. Suppose engineer A, working on nuclear
reactor core controllers, works in a team where "?<" means "minimum" and
"#<" means "x is implied by y" for some data structure involved in logic
reasoning on the state of the system. Now he switches companies; he goes
to work on electronic toys and gadgets. Here the convention is
different, but the same operators are used.
Or suppose a programmer uses clock() all his life in DOS/Windows as a higher
resolution realtime, yet portable, timer. This programmer then moves to UNIX
and is shocked to find that all mutexes turn in a clock() measurement of 0
microseconds regardless of how long they block! Even Sleep(1000) measures 0
microseconds!
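A sketch of the surprise being described (POSIX sleep() used for illustration):

#include <stdio.h>
#include <time.h>
#include <unistd.h>      /* POSIX sleep(); the point is UNIX-specific */

int main(void)
{
    clock_t start = clock();
    sleep(1);            /* blocked for a second, consuming no CPU time */
    clock_t end = clock();

    /* On UNIX, clock() reports processor time, not wall-clock time, so
       this prints roughly 0 even though a full second went by. */
    printf("%f seconds of CPU time\n",
           (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}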
The example you've made up has to do with basic comprehension of the engineer --
since "?<" would clearly be one of the "extended operators", the programmer
is supposed to look up its definition. Just like they would have to look
up logMsg() or any other function name that might show up in more than one
project. Just like they are supposed to magically know how clock() behaves on
different OS'es.
You haven't shown me that it can be implemented in a compiler, even in
theory. That should count for something.
I don't think there's a credible assertion that my idea is impossible. You
want me to describe a whole compiler implementation just to support my idea?
As long as one can create a well-defined definition, why should you expect it
would be impossible? Have I created a halting problem or some kind of
NP-complete algorithm or something? -- I don't think so.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.
The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.
What's so bad about using third-party libraries?
From a usability point of view, GMP is crap, RSA's bignum library is crap (and
it's expensive).
How's that? How are you going to improve things by adding a carry operator?
In any way I choose -- that's the point. Right now the barrier to entry is
very high, because I would have to implement a competitive library in assembly
language. I'm not kidding when I tell you these libraries are pure excrement.
Not in terms of performance -- they both perform reasonably well -- I mean in
terms of how the heck do I use these things?
You see, with language support, an intrepid developer could sacrifice *some*
performance for portability and demonstrate a much nicer and more easily usable
interface. This *competition* might *force* these popular vendors to fix
their interfaces so that they could deliver both the performance and usability
benefits. But because both of these have wads of assembly language for various
architectures, it is very difficult and expensive to compete with them. Even if I
made a single x86 implementation, it would be difficult for me to beat their
performance by any measurable margin, and my benefit of having possibly a
better interface would have to be balanced against the fact that I have no
portability, while GMP does (by virtue of having been ported.)
Now there's an application where the economy of scale suggests doing a
hand-written assembly routine, if I ever saw one...
About a year ago, Schlumberger gave an amazing talk about implementing pico-java
bytecode into a smart card. It was incredible, because it was obvious that
they would be able to use all the Java tools/libraries, they could simulate it
almost perfectly in software -- mostly inheriting the pico-java architecture,
and because of Java's bignum class, it meant that exposing a widening multiply
and carry-propagating add functionality in their smart card would be enough to
automatically have all the software they need for RSA all ready to go.
Do you understand? Although C compiler porting frameworks exist, the lack of a
standard for widening multiply and carry-propagating add means they would have
to expose a mechanism themselves, then write all the software themselves just
to get to the point of having a bignum library (which Java basically comes with)
before proceeding to their RSA implementation.
The question remains: how much time is your typical e-commerce server
involved in an actual transaction? How many transactions per second can
a machine handle now? I would suppose that most of the time spent in a
transaction is actually a database lookup at a creditcard company, not a
powermod.
No. It's the powermod. One instruction -- 30% of all processor time. The rest
-- including hitting a disk or talking to some authentication mechanism over a
network fits into the 70% (though obviously you would expect some amount
of DMA overlapping.) Get it? Powermod is slower than disk swapping -- that's
what Intel was measuring, and from my own calculations, their claim is quite
credible. As I recall, you only need about a hundred simultaneous
transactions for this to show up. Remember, Intel's marketing point is that
they claim Itanium will save you money by using fewer systems per number of
maximum sustained transactions. In a very real sense, their claim is very
credible, based solely on the fact that they have dual fully pipelined widening
multipliers.
[...] The frontend webserver is just waiting for the bank I think.
But these are just half-educated guesses.
No -- servers obviously amortize this over multiple threads, and performing
queued up block transfers. While these things take a long time, they are all
queued, blocked, buffered etc. Serial solutions are not acceptable. Your
guess is way off the mark. This wouldn't take any appreciable webserver CPU
power at all.
I don't mean "client load" when I write "transaction load", I'm still
strictly speaking about the server. I could be convinced that the amount
of time spent in powermod for a 1-shot transaction could be as high as
30%, but I'd be hard-pressed to believe a front-end webserver is busy
for more than 10% doing actual transactions.
No. You misunderstood. 30% of *THE WHOLE THING* is in just the powermod. I'm
talking about serving static webpages while some consumer is staring at
advertising or whatever else before they decide to buy whatever. Just their
action of pressing the "checkout" button to buy the stuff in their shopping
cart just pegs the server's CPU.
I'm not at all surprised you have a hard time believing this, as your grasp of
multitasking seems tenuous at best.
[...] I think the latency of an
e-commerce transaction is mostly caused by frontend<->bank
communication, and intra-bank database lookup. But I cannot back this up.
The *LATENCY* is something you measure from the client. The latency is not
relevant to the real bottlenecks in the webserver. The client sees the latency
of waiting for the bank, but the server does *NOT* have code like:
while (HeyBankAreYouDoneYet()) {
    runSetiAtHome();
}
in it. At the very least there is a blocking operation like "Sleep (1)" in the
place of "runSetiAtHome()". (Hint: the blocking is when a thread comes in
where powermod takes over and spikes the CPU usage.)
But for signed numbers, carry is almost meaningless. So what remains of
the semantics in this case?
Why is the carry meaningless? Ok, you obviously don't know the details of how
bignum is implemented. Ok, a large 2's complement integer is filled with an
array of unsigned sub-integers, except for the one at the top. *That* one is
signed, and that's how you know if the bignum integer is negative or not. The
problem is that the act of addition can cause a roll around at the top word
which requires an expansion of the bignum by an extra integer in size. The logic
you use to determine this is just related to examining the carry flag.
It all just comes down to the carry flag. The other flags are not necessary
for bignum libraries. Maybe there's a way to transform the OF into something
that can be used in a bignum library, but I am not aware of it.
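For reference, the portable workaround that a carry operator would replace
looks roughly like this (the limb width is chosen for illustration):

typedef unsigned int limb;

/* One limb of a bignum addition: add two limbs plus an incoming carry,
   recovering the outgoing carry from the wrap-around comparisons.  A
   dedicated carry-capturing operator would express this in one step and
   let the compiler use the hardware carry flag directly. */
limb add_limb(limb a, limb b, unsigned carry_in, unsigned *carry_out)
{
    limb s = a + b;
    unsigned c = (s < a);        /* carry out of a + b */
    limb r = s + carry_in;
    *carry_out = c | (r < s);    /* carry out of the full sum */
    return r;
}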
But if you want the OF, then why not have another operator for that?
Of course, but you have to know whether your operands are signed or
unsigned to see whether the CF or the OF is the important flag.
OF is never the important flag. Its not relevant at all. I believe that
taking into account the signedness of the result and/or the operands and one of
these flags is redundant with the other flag (I'm not 100% sure, but the last
time I cared to look at OF, I believe I resolved never to look at it again for
precisely this reason). So while you could map between them, one suddenly
doesn't become more important in different contexts. I *KNOW* the solution for
implementing bignum with a CF alone (and I believe it's the standard way
everyone does it), so OF is just not relevant.
[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.
It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shaky.)
My x86 is not too strong, but I am quite well-versed in 6502, 68k and
PowerPC, which enables me to think about these things. They all share
carry and overflow bits in their designs, with little difference. After
performing an add-with-carry, both CF and OF are set (although the V
flag may not be reset, as in the 6502), but the CF generally does not
provide useful information when adding two numbers that are signed. So I
don't see how
c +< v = a + b
would provide useful information in c, when both a and b are signed, if
it is to copy the carry status.
In a bignum library, the signedness of each sub-word is *dynamic* (the top word
is signed, the rest are unsigned.) So regardless of the declared types of the
operands, you want the carry flag of an "as if the operands were unsigned" add.
Ok. That just leaves the case of two signed values, then.
See above. Carry is not a signedness sensitive semantic.
I'm still wondering about the high_order_value() and low_order_value()
semantics I mentioned in my previous post, actually the part which I was
most curious about.
I wasn't able to suss out a coherent question from your post, so I just deleted
this discussion. In this case, I am just copying the x86's MUL -> EDX:EAX or
AMD64's MUL -> RDX:RAX semantics. But other RISC CPUs usually have MULHI and
MULLO instructions -- they usually use some trickery by which if you issue
these instructions back to back with the same source parameters, then it's
actually computed in one widening ALU operation (so that it's comparable,
hardware-wise, to what the x86 does.)
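A sketch of those hi/lo semantics in portable terms, assuming a type twice the
operand width is available:

#include <stdint.h>

/* Widening multiply: the full 64-bit product of two 32-bit operands,
   split into the pieces x86 leaves in EDX:EAX (or that MULHI/MULLO
   return on typical RISC machines). */
void widening_mul(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint64_t product = (uint64_t)a * b;
    *hi = (uint32_t)(product >> 32);
    *lo = (uint32_t)product;
}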