find out the problem

Alexei A. Frounze · Aug 4, 2005

Krishanu Debnath said:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
No, my crystal ball is not on strike, actuall I don't have any. I can
only
infer from what you have posted. The whole objection came up with the
shift
expression where value of the RHS is same as the width of the LHS.

Okay, I just saw your reply on Old Wolf's post. You almost took 24 hr
to realize this.

That's because I was thought I wrote 31 (done that many times before) and
even didn't bother to look at the code to make sure it was indeed 31. And
what's worst I used old Borland compiler that ate the source and produced
executable that gave me what I was expecting.

Sorry, it was
unintentional.

Alex

Jasen Betts · Aug 4, 2005

That all is correct, no doubt, however, my personal opinion is that that's
wrong to take the shift count and blindly feed it to the asm instruction
with such a limitation. It's as illogical as the need to cast one of two
integer multipliers to long integer in order to get the correct long integer
product. That strikes the math guys in the back. Anyway, I was talking about
a slightly different problem of the shift -- whether or not it's SAR-like or
something else, especially something very odd, probably not any kind of
shift at all.

C is defined to compile to efficient assembler code on a wide range of
hardware, for this reason if you step outside of the standard you may
get unexpected behavior.

if you want a language that behaves predictably in all situations C isn't it.
maybe you could try java

Bye.
Jasen

Jasen Betts · Aug 4, 2005

I don't agree that the cast is illogical. And dealing with this
doesn't have to generate inefficient code in the cases where it's
NOT needed, unlike the shift case, which slows down all variable shifts.

it's not possible to determine what type the product should have in all
situations

#include <stdio.h>
int foo(const char * s, int a, int b,int c)
{
printf(%s,a*b,c);
}

So is it needed or not above, the type used for the expression a*b will
have a direct bearing on the behavior of the printf call...

Bye.
Jasen

Alexei A. Frounze · Aug 4, 2005

Jasen Betts said:
C is defined to compile to efficient assembler code on a wide range of
hardware,

These days the compilers do a good work. And it wasn't a question of
performance, rather of correctness.

for this reason if you step outside of the standard you may
get unexpected behavior.

To me unexpected is to get a wrong value, just the lower bits of the correct
one, even when "lvalue" is big enough to hold the correct one. But the
standard requires me to put the silly cast in or I'm in trouble of not
getting such simple thing as product of integers. IMO, in case when the
calculations may give more bits than the operands the compiler should
compile in such a way so as not to loose a single bit somewhere in the
middle. I consider the C requirement we're talking about here be broken.
It's an HLL and as such is supposed to help write better code quicker, to
implement the math (and other) algorithms in a form of machine code, but it
stumbles on integers. What a surprise to every new C programmer.

if you want a language that behaves predictably in all situations C isn't it.
maybe you could try java

No, thanks.

Somehow, I think java won't help me to get certain things
done, instead it'll make it impossible to do them at all.
Alex

Walter Roberson · Aug 4, 2005

To me unexpected is to get a wrong value, just the lower bits of the correct
one, even when "lvalue" is big enough to hold the correct one. But the
standard requires me to put the silly cast in or I'm in trouble of not
getting such simple thing as product of integers. IMO, in case when the
calculations may give more bits than the operands the compiler should
compile in such a way so as not to loose a single bit somewhere in the
middle. I consider the C requirement we're talking about here be broken.
It's an HLL and as such is supposed to help write better code quicker, to
implement the math (and other) algorithms in a form of machine code, but it
stumbles on integers. What a surprise to every new C programmer.

for (i=1, x=1; i <1000; i++) x *= i;

Now, what size is x afterwards?

Do you realize that you are proposing that the compiler must use
a "bignum" package for all integral arithmetic, so as to be able to handle
arbitrary precision arithmetic? Consider that long * long
has the potential for producing a value which is up to
2 * CHAR_BIT * sizeof long bits wide (e.g., something that would
take a 'long long' to represent, except C89 has no 'long long'.)

It is legal in C for
sizeof char == sizeof short == sizeof int == sizeof long
i.e., there might only -be- one size of integral value, which is
fine in C89 if that value is at least 32 bits wide.

If your proposal were to carry, then unless indefinite precision
were to be required, then addition, subtraction and multiplication
would all have to be prohibitted (unless the compiler were to
insert size tests around each operation... and what's it supposed
to do if one of those tests fails??)

Alexei A. Frounze · Aug 4, 2005

Walter Roberson said:
for (i=1, x=1; i <1000; i++) x *= i;

Now, what size is x afterwards?

The same as before.

Do you realize that you are proposing that the compiler must use
a "bignum" package for all integral arithmetic, so as to be able to handle
arbitrary precision arithmetic?

No, I am not suggesting *that*. What I want is to do this kind of stuff
safely:
int x=something, y=something;
long z = x * y;
w/o a cast of either x or y to long. Just that sort of thing.

Consider that long * long
has the potential for producing a value which is up to
2 * CHAR_BIT * sizeof long bits wide (e.g., something that would
take a 'long long' to represent, except C89 has no 'long long'.)

It is legal in C for
sizeof char == sizeof short == sizeof int == sizeof long
i.e., there might only -be- one size of integral value, which is
fine in C89 if that value is at least 32 bits wide.

That's fine to have them all identical if the CPU supports only one
type/size of integer, however awkward that CPU can be (e.g. having strictly
32(64) bit ALU).

If your proposal were to carry, then unless indefinite precision
were to be required, then addition, subtraction and multiplication
would all have to be prohibitted (unless the compiler were to
insert size tests around each operation... and what's it supposed
to do if one of those tests fails??)

The compiler *knows* the sizes of the types, so, there's nothing to check.
Likewise, if I were to
int x=something, y=something;
long z = x + y;
I'd like to get the result w/o overflow/underflow since the result can fit
in z (provided, of course, sizeof(long)>sizeof(int), but that's a different
story IMO).

Alex

Keith Thompson · Aug 5, 2005

Alexei A. Frounze said:
No, I am not suggesting *that*. What I want is to do this kind of stuff
safely:
int x=something, y=something;
long z = x * y;
w/o a cast of either x or y to long. Just that sort of thing.

So you want the evaluation of an expression, particularly the type of
the expression, to be influenced by its context. In this case, you
want x * y to be evaluated as type long (either by promoting the
operands or by using an int*int-->long operation) because the result
is assigned to a variable of type long.

That's not entirely unreasonable, and there are languages that do that
kind of thing.

The advantage of the C approach is that it's simpler once you
understand it. It can be counterintuitive if you approach expressions
mathematically, but it's relatively easy to describe (no special rules
for how the context affects expression evaluation) *and* easy to work
around when you need different behavior.

I suspect that whatever rules you set up, there are going to be cases
where the result is counterintuitive.

I also suspect there's no way to change the C evaluation rules without
breaking existing code, which means no such change will ever happen,
even if the result might be a better language.

Alexei A. Frounze · Aug 6, 2005

Keith Thompson said:
So you want the evaluation of an expression, particularly the type of
the expression, to be influenced by its context. In this case, you
want x * y to be evaluated as type long (either by promoting the
operands or by using an int*int-->long operation) because the result
is assigned to a variable of type long.
Right.

That's not entirely unreasonable, and there are languages that do that
kind of thing.

If I'm not mistaken, that's what Pascal does... Wait a minute, a test
program for TurboPascal shows this same problem and this same workaround
through casting. I thought it's only C's prob...

The advantage of the C approach is that it's simpler once you
understand it.

I understand it, but I'm not sure where exactly it's simpler.

It can be counterintuitive if you approach expressions
mathematically,

That's what I'd like to have. And btw, many ASMs (which aren't HLLs) have a
multiplication instruction that gives me what I want and my only job to save
the result in a big enough variable after this MUL. Of course, there are
others, like pairs of instructions that calculate the halves (LSB/W/* and
MSB/W/*) of the product (there are such in MMX subset). But it seems to me
that most of the times the full product is achieved through one simple
instruction.

but it's relatively easy to describe (no special rules
for how the context affects expression evaluation) *and* easy to work
around when you need different behavior.

I don't need different.

I suspect that whatever rules you set up, there are going to be cases
where the result is counterintuitive.
Example?

I also suspect there's no way to change the C evaluation rules without
breaking existing code, which means no such change will ever happen,
even if the result might be a better language.

No doubt.

Thanks,
Alex

Chris Torek · Aug 6, 2005

[Regarding a multiply of the form N-bits x N-bits => 2N-bits]

.... And btw, many ASMs (which aren't HLLs) have a multiplication
instruction that gives me what I want and my only job to save
the result in a big enough variable after this MUL. Of course, there are
others, like pairs of instructions that calculate the halves (LSB/W/* and
MSB/W/*) of the product (there are such in MMX subset). But it seems to me
that most of the times the full product is achieved through one simple
instruction.

I am reasonably conversant in a few machine instruction sets (x86,
SPARC, VAX, 680x0, MIPS), and can do some work in about a dozen
more. I would say that machines that deliver the full 2N-bit result
are only about half as common as those that produce only the N-bit
low-order result -- although some machines (e.g., VAX) have some
way (often a separate, slower instruction) to produce the full
2N-bit result.

MIPS (at least as of R2000/R3000; they may have changed this in the
R4000) is unusual in having all multiplies produce their results in
the special "multiply result" register-pair, making it easy to get
either the N-bit result with one instruction (read multiplier-lo), or
the 2N-bit result with two (read multiplier-lo and multiplier-hi).

SPARC originally had no multiply at all (only "multiply step", from
which one synthesized multiply, using the special %y register) but
acquired multiply instructions in V8 and V9. The multiply
instructions deliver only an N-bit result. Using %y you can do
a 32x32 => 64 multiply, but on V9, %y remains 32 bits, so you
cannot obtain the 64x64 => 128 result with a single instruction.

The "quad" library I wrote for 4.4BSD includes code to compute 32
x 32 => 64 bit results (well, "half-quad x half-quad => full-quad"
-- it will do 64x64 => 128 if the number of bits in a quad is 128).
The code is fairly short, so here it is. Note that the function
name is that used by GCC and is in the implementor's namespace.

(The quad routines also include Knuth's Algorithm D for division,
optimized again for 64-bit divisor / 64-bit dividend => 64-bit
quotient with 64-bit remainder. As with the multiply code, I
attempted to parameterize it so that "32 bits" and "64 bits" are
not assumed, but this is the only way it has been used.)

/*-
* Copyright (c) 1992, 1993
* The Regents of the University of California. All rights reserved.
*
* This software was developed by the Computer Systems Engineering group
* at Lawrence Berkeley Laboratory under DARPA contract BG 91-66 and
* contributed to Berkeley.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by the University of
* California, Berkeley and its contributors.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/

#if defined(LIBC_SCCS) && !defined(lint)
static char sccsid[] = "@(#)muldi3.c 8.1 (Berkeley) 6/4/93";
#endif /* LIBC_SCCS and not lint */

#include "quad.h"

/*
* Multiply two quads.
*
* Our algorithm is based on the following. Split incoming quad values
* u and v (where u,v >= 0) into
*
* u = 2^n u1 * u0 (n = number of bits in `u_long', usu. 32)
*
* and
*
* v = 2^n v1 * v0
*
* Then
*
* uv = 2^2n u1 v1 + 2^n u1 v0 + 2^n v1 u0 + u0 v0
* = 2^2n u1 v1 + 2^n (u1 v0 + v1 u0) + u0 v0
*
* Now add 2^n u1 v1 to the first term and subtract it from the middle,
* and add 2^n u0 v0 to the last term and subtract it from the middle.
* This gives:
*
* uv = (2^2n + 2^n) (u1 v1) +
* (2^n) (u1 v0 - u1 v1 + u0 v1 - u0 v0) +
* (2^n + 1) (u0 v0)
*
* Factoring the middle a bit gives us:
*
* uv = (2^2n + 2^n) (u1 v1) + [u1v1 = high]
* (2^n) (u1 - u0) (v0 - v1) + [(u1-u0)... = mid]
* (2^n + 1) (u0 v0) [u0v0 = low]
*
* The terms (u1 v1), (u1 - u0) (v0 - v1), and (u0 v0) can all be done
* in just half the precision of the original. (Note that either or both
* of (u1 - u0) or (v0 - v1) may be negative.)
*
* This algorithm is from Knuth vol. 2 (2nd ed), section 4.3.3, p. 278.
*
* Since C does not give us a `long * long = quad' operator, we split
* our input quads into two longs, then split the two longs into two
* shorts. We can then calculate `short * short = long' in native
* arithmetic.
*
* Our product should, strictly speaking, be a `long quad', with 128
* bits, but we are going to discard the upper 64. In other words,
* we are not interested in uv, but rather in (uv mod 2^2n). This
* makes some of the terms above vanish, and we get:
*
* (2^n)(high) + (2^n)(mid) + (2^n + 1)(low)
*
* or
*
* (2^n)(high + mid + low) + low
*
* Furthermore, `high' and `mid' can be computed mod 2^n, as any factor
* of 2^n in either one will also vanish. Only `low' need be computed
* mod 2^2n, and only because of the final term above.
*/
static quad_t __lmulq(u_long, u_long);

quad_t
__muldi3(a, b)
quad_t a, b;
{
union uu u, v, low, prod;
register u_long high, mid, udiff, vdiff;
register int negall, negmid;
#define u1 u.ul[H]
#define u0 u.ul[L]
#define v1 v.ul[H]
#define v0 v.ul[L]

/*
* Get u and v such that u, v >= 0. When this is finished,
* u1, u0, v1, and v0 will be directly accessible through the
* longword fields.
*/
if (a >= 0)
u.q = a, negall = 0;
else
u.q = -a, negall = 1;
if (b >= 0)
v.q = b;
else
v.q = -b, negall ^= 1;

if (u1 == 0 && v1 == 0) {
/*
* An (I hope) important optimization occurs when u1 and v1
* are both 0. This should be common since most numbers
* are small. Here the product is just u0*v0.
*/
prod.q = __lmulq(u0, v0);
} else {
/*
* Compute the three intermediate products, remembering
* whether the middle term is negative. We can discard
* any upper bits in high and mid, so we can use native
* u_long * u_long => u_long arithmetic.
*/
low.q = __lmulq(u0, v0);

if (u1 >= u0)
negmid = 0, udiff = u1 - u0;
else
negmid = 1, udiff = u0 - u1;
if (v0 >= v1)
vdiff = v0 - v1;
else
vdiff = v1 - v0, negmid ^= 1;
mid = udiff * vdiff;

high = u1 * v1;

/*
* Assemble the final product.
*/
prod.ul[H] = high + (negmid ? -mid : mid) + low.ul[L] +
low.ul[H];
prod.ul[L] = low.ul[L];
}
return (negall ? -prod.q : prod.q);
#undef u1
#undef u0
#undef v1
#undef v0
}

/*
* Multiply two 2N-bit longs to produce a 4N-bit quad, where N is half
* the number of bits in a long (whatever that is---the code below
* does not care as long as quad.h does its part of the bargain---but
* typically N==16).
*
* We use the same algorithm from Knuth, but this time the modulo refinement
* does not apply. On the other hand, since N is half the size of a long,
* we can get away with native multiplication---none of our input terms
* exceeds (ULONG_MAX >> 1).
*
* Note that, for u_long l, the quad-precision result
*
* l << N
*
* splits into high and low longs as HHALF(l) and LHUP(l) respectively.
*/
static quad_t
__lmulq(u_long u, u_long v)
{
u_long u1, u0, v1, v0, udiff, vdiff, high, mid, low;
u_long prodh, prodl, was;
union uu prod;
int neg;

u1 = HHALF(u);
u0 = LHALF(u);
v1 = HHALF(v);
v0 = LHALF(v);

low = u0 * v0;

/* This is the same small-number optimization as before. */
if (u1 == 0 && v1 == 0)
return (low);

if (u1 >= u0)
udiff = u1 - u0, neg = 0;
else
udiff = u0 - u1, neg = 1;
if (v0 >= v1)
vdiff = v0 - v1;
else
vdiff = v1 - v0, neg ^= 1;
mid = udiff * vdiff;

high = u1 * v1;

/* prod = (high << 2N) + (high << N); */
prodh = high + HHALF(high);
prodl = LHUP(high);

/* if (neg) prod -= mid << N; else prod += mid << N; */
if (neg) {
was = prodl;
prodl -= LHUP(mid);
prodh -= HHALF(mid) + (prodl > was);
} else {
was = prodl;
prodl += LHUP(mid);
prodh += HHALF(mid) + (prodl < was);
}

/* prod += low << N */
was = prodl;
prodl += LHUP(low);
prodh += HHALF(low) + (prodl < was);
/* ... + low; */
if ((prodl += low) < low)
prodh++;

/* return 4N-bit product */
prod.ul[H] = prodh;
prod.ul[L] = prodl;
return (prod.q);
}

cs · Aug 7, 2005

[Regarding a multiply of the form N-bits x N-bits => 2N-bits]

for me it could be <10 of assembly lines for each cpu

Alexei A. Frounze · Aug 17, 2005

Chris Torek said:
[Regarding a multiply of the form N-bits x N-bits => 2N-bits]

.... And btw, many ASMs (which aren't HLLs) have a multiplication
instruction that gives me what I want and my only job to save
the result in a big enough variable after this MUL. Of course, there are
others, like pairs of instructions that calculate the halves (LSB/W/* and
MSB/W/*) of the product (there are such in MMX subset). But it seems to me
that most of the times the full product is achieved through one simple
instruction.

Click to expand...

I am reasonably conversant in a few machine instruction sets (x86,
SPARC, VAX, 680x0, MIPS), and can do some work in about a dozen
more. I would say that machines that deliver the full 2N-bit result
are only about half as common as those that produce only the N-bit
low-order result -- although some machines (e.g., VAX) have some
way (often a separate, slower instruction) to produce the full
2N-bit result.

MIPS (at least as of R2000/R3000; they may have changed this in the
R4000) is unusual in having all multiplies produce their results in
the special "multiply result" register-pair, making it easy to get
either the N-bit result with one instruction (read multiplier-lo), or
the 2N-bit result with two (read multiplier-lo and multiplier-hi).

Maybe you're right about the probability/spread of these two types of
getting the fill 2N-bit-long product.

As for getting 2N-bit result with two instructions, I think 2nd instruction
shouldn't be just a read instruction, implying that 1st one does all the
work. 2nd one should calculate the rest of the product (high bits) based on
the subtotal contained in that inernal sum register. This appears the most
logical when we think of multiplying numbers of different signedness. I
mean, whatever the operands are (both unsigned, both signed or mixed) the
low N bits of the product will always be the same.

Thanks for quoting Knuth. I have his book. At times, it seems to be very
nice reading.

I only wish the text was a bit easier to follow, more
detailed. At the same time, I'm not sure inventing (M)MIX was a good idea to
explain how algorithms work with it. Simpler pseudocode would do better IMO.
Alex

I'm tempted to quit out of frustration	1	Aug 13, 2023
Logic Problem with BigInteger Method	2	Aug 26, 2023
Just starting out	1	Jan 13, 2023
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Trying to figure out http request POST phrasing	1	Mar 30, 2023
A real math problem.	5	Jan 2, 2024
Java matrix problem	3	Sep 10, 2023
Very basic C code cannot find the problem. vga project	0	Apr 15, 2022

find out the problem

Alexei A. Frounze

Jasen Betts

Jasen Betts

Alexei A. Frounze

Walter Roberson

Alexei A. Frounze

Keith Thompson

Alexei A. Frounze

Chris Torek

cs

Alexei A. Frounze

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads