Multithreading: Linux and Windows behave differently

kumarchi

Hello,

I wrote a simple program that runs a simple math loop, and I am testing
it on dual-core processors.

System 1:
Intel dual-core laptop, Windows XP.
When I spawn two threads on Windows (built with both Visual C and Cygwin
cc), the program behaves as expected.
In single-thread mode the run takes twice as long, and clearly only one
of the cores of the processor is being utilized.

System 2:
AMD 4200x2 desktop, Ubuntu Hardy 8.04.
Here the single-threaded run actually gets slightly better performance.
In the multithreaded run both CPUs are 100% utilized, but even in the
single-threaded run both CPUs are used alternately, at 50/100%!

Gurus:
Any idea why the Linux system behaves differently? (I am assuming the
difference is due to the OS.)

Here is the simple code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
#include <pthread.h>

typedef struct {
    pthread_t t;
    pthread_attr_t t_atr;
    void *(*func) (void *);
    void *arg;
} genpth_type;

long times = 200;

static void dest (void *pthv)
{
    genpth_type *pth = (genpth_type *)pthv;

    if (pth)
        free (pth);
}

static genpth_type *new (void)
{
    genpth_type *item = 0;

    item = calloc (1, sizeof (*item));

    return item;
}

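/* Thread entry point; pthread_create() expects void *(*)(void *). */
static void *testfunc (void *arg)
{
    double *val = arg;
    long cnt = 2e5;
    long k = 0;
    long i = 0;

    for (k = 0; k < times; k++)
    {
        for (i = 0; i < cnt; i++)
        {
            /* rand() uses hidden state shared by all threads */
            double d = (double)rand();
            double x = 0;

            x = pow(d, 0.55);
            x = exp(x);

            x *= 0.8;

            x += 3.5;

            x = log10(x);

            *val = x;
        }
    }

    return val;
}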

int main (int argc, char **argv)
{
    genpth_type *t1 = 0;
    genpth_type *t2 = 0;

    double t1d = 1;
    double t2d = 10;
    long status = 0;
    time_t tt1, tt2;
    double dt = 0;

    (void)argv;

    time (&tt1);

    if (argc > 1)
    {
        /* single-threaded run: do the work of both threads in one call */
        times *= 2;

        testfunc (&t1d);

        time (&tt2);

        dt = difftime (tt2, tt1);
        printf ("\n dt = %lg \n", dt);
    }
    else
    {
        t1 = new ();
        t2 = new ();
        if (!t1 || !t2)
            exit (EXIT_FAILURE);

        t1->func = testfunc;
        t1->arg = &t1d;

        /* the attribute object must be initialized before use */
        pthread_attr_init (&(t1->t_atr));
        status = pthread_create (&(t1->t), &(t1->t_atr), t1->func, t1->arg);
        if (status != 0)
            exit (EXIT_FAILURE);

        t2->func = testfunc;
        t2->arg = &t2d;

        pthread_attr_init (&(t2->t_atr));
        status = pthread_create (&(t2->t), &(t2->t_atr), t2->func, t2->arg);
        if (status != 0)
            exit (EXIT_FAILURE);

        pthread_join (t1->t, 0);
        pthread_join (t2->t, 0);

        printf ("\n t1d=%lg\n", t1d);
        printf ("\n t2d=%lg\n", t2d);

        time (&tt2);
        dt = difftime (tt2, tt1);
        printf ("\n dt = %lg \n", dt);
    }

    exit (0);
}
 
Szabolcs Borsanyi

hello:

I wrote a simple program which does simple math loop  and I am testing
under dual core processor

This question is definitely off topic here, but I managed to find one
topical aspect:
library functions (especially rand()) are not reentrant.
Do not use them without a synchronising mechanism. For better
performance, implement your own or use libc's reentrant
random number generators, where the internal information about
the next random number is stored in automatic variables, which
are on a thread local stack on your platforms.
Avoid calling the same library functions from the threads and
<off>
avoid the frequent access of the same memory (page) from
several threads.
</off>

Szabolcs
 
Keith Thompson

I wrote a simple program which does simple math loop and I am testing
under dual core processor
[...]

here is the simple code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
#include <pthread.h>
[...]

Try asking in comp.programming.threads.
 
Antoninus Twink

AMD 4200x2 desktop, Ubuntu Hardy 8.04.
Here the single-threaded run actually gets slightly better performance.
In the multithreaded run both CPUs are 100% utilized, but even in the
single-threaded run both CPUs are used alternately, at 50/100%!

I find dt = 4 running the two threads, versus dt = 7 for one thread
(Debian, 2.6.24 kernel), so I guess there's some problem with your
setup.

Are you running an SMP kernel? Does cat /proc/cpuinfo report both cores?
What was the system load when you ran the test?
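
As a programmatic counterpart to checking /proc/cpuinfo, a program can also ask how many processors are online. This is only a minimal sketch, assuming a system where sysconf() supports _SC_NPROCESSORS_ONLN (a widespread extension, available in glibc for example, rather than core POSIX). The function names online_cpus and report_cpus are made up for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/* Number of processors currently online, or -1 if it cannot be determined.
   Assumption: _SC_NPROCESSORS_ONLN is available (glibc and the BSDs have it). */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Print the count, in the same spirit as inspecting /proc/cpuinfo. */
void report_cpus(void)
{
    long n = online_cpus();

    if (n < 1)
        printf("could not determine the number of online CPUs\n");
    else
        printf("online CPUs: %ld\n", n);
}
```

If report_cpus() prints 1 on a dual-core machine, the second core is not available to the scheduler, and two threads cannot run in parallel.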
 
Antoninus Twink

library functions (especially rand()) are not reentrant.
Do not use them without a synchronising mechanism. For better
performance, implement your own or use libc's reentrant
random number generators, where the internal information about
the next random number is stored in automatic variables, which
are on a thread local stack on your platforms.

I'm not sure what function you have in mind. The reentrant version of
rand provided by POSIX is rand_r(), which takes a pointer to a seed as
its argument - it's very unlikely that TLS will be needed or desirable.
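
To make the rand_r() suggestion concrete, here is a rough sketch of two threads that each keep their generator state in an automatic variable, so nothing is shared and no locking is needed. The struct worker type and the functions worker_main and run_two_workers are hypothetical names, not taken from the original program. Depending on compiler flags, rand_r() may need a POSIX feature-test macro to be declared.

```c
#include <pthread.h>
#include <stdlib.h>

/* Per-thread work description; all names here are illustrative. */
struct worker {
    unsigned int seed;   /* initial seed, different for each thread */
    long count;          /* how many numbers to draw */
    double sum;          /* result written by the thread */
};

/* Thread entry point: the generator state lives in an automatic variable,
   so no other thread can touch it. */
void *worker_main(void *arg)
{
    struct worker *w = arg;
    unsigned int state = w->seed;
    double sum = 0.0;
    long i;

    for (i = 0; i < w->count; i++)
        sum += (double)rand_r(&state);

    w->sum = sum;
    return NULL;
}

/* Run two workers concurrently; returns 0 on success, -1 on failure. */
int run_two_workers(struct worker *a, struct worker *b)
{
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, worker_main, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, worker_main, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

Because each thread owns its state, the sequence a thread sees depends only on its seed, which also makes runs reproducible.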
 
Szabolcs Borsanyi

I'm not sure what function you have in mind. The reentrant version of
rand provided by POSIX is rand_r(), which takes a pointer to a seed as
its argument - it's very unlikely that TLS will be needed or desirable.

You are right, rand_r() is POSIX, indeed. I did not mean TLS
as thread-local storage for global or static variables, but the stack,
owned by the thread, that holds the object pointed to by the argument
of rand_r(). And since rand_r() takes just an unsigned *, its quality
as an RNG is limited by design; this often rules out this function.

Szabolcs
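
One way to picture the "implement your own" option is a small generator whose state is a caller-owned object wider than rand_r()'s single unsigned. The sketch below uses a 64-bit xorshift step. That choice is an assumption made for illustration, not something proposed in this thread, and the rng64 names are made up. It is not suitable for cryptographic use.

```c
#include <stdint.h>

/* Illustrative generator state: 64 bits instead of rand_r()'s unsigned int. */
typedef struct {
    uint64_t s;   /* must never be zero */
} rng64;

/* Seed the generator; a zero seed is replaced because zero is a fixed point. */
void rng64_seed(rng64 *g, uint64_t seed)
{
    g->s = seed ? seed : 0x9E3779B97F4A7C15ULL;
}

/* One xorshift step with the shift triple 13, 7, 17. */
uint64_t rng64_next(rng64 *g)
{
    uint64_t x = g->s;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    g->s = x;
    return x;
}

/* Uniform double in [0, 1), built from the top 53 bits. */
double rng64_double(rng64 *g)
{
    return (double)(rng64_next(g) >> 11) * (1.0 / 9007199254740992.0);
}
```

Each thread would declare its own rng64 as an automatic variable and seed it differently, so the state sits on that thread's stack just as with rand_r().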
 
kumarchi

I find  dt = 4 running the two threads, versus dt = 7 for one thread
(Debian, 2.6.24 kernel), so I guess there's some problem with your
setup.

Are you running an SMP kernel? Does cat /proc/cpuinfo report both cores?
What was the system load when you ran the test?

Yes, my kernel is 2.6.24-17, and procinfo reports both cores.
 
Keith Thompson

yes my kernel is 2.6.24-17 and the procinfo reports the 2 cores

Antoninus Twink is trying to disrupt this newsgroup by encouraging
off-topic posts. Please post any further questions to
comp.programming.threads or to a newsgroup that deals with your
operating system, where you will find experts who can actually answer
your questions. Thank you.
 
kumarchi

I find  dt = 4 running the two threads, versus dt = 7 for one thread
(Debian, 2.6.24 kernel), so I guess there's some problem with your
setup.

Are you running an SMP kernel? Does cat /proc/cpuinfo report both cores?
What was the system load when you ran the test?

I compiled with optimization. Now I am getting dt=21 (for 2 threads) vs
dt=27; some improvement. By the way, what hardware are you using?
It is way faster than mine, 5x; that is incredible.
 
Antoninus Twink

I compiled with optimization. Now I am getting dt=21 (for 2 threads) vs
dt=27; some improvement. By the way, what hardware are you using?
It is way faster than mine, 5x; that is incredible.

Probably not - I reduced the number of outer loops :)
 
