Time::HiRes < 1.91 and glibc 2.4 incompatibility

M

Mark Seger

A long time ago I found a very peculiar timing bug in my open source
performance monitoring tool 'collectl': I discovered that when glibc
went from version 2.3 to 2.4 it changed the time resolution from
microseconds to nanoseconds, going from 32 bits to 64 bits. It also
turned out that, at the time, the only distribution to move to that newer glibc
was SuSE. Anyhow, that change broke Time::HiRes for any timing greater
than 4.2 seconds!

I contacted the author of HiRes and he fixed it in the 1.91 release, and
things have been fine since. Then yesterday I got an email from a user
who reported inconsistent monitoring intervals, which stumped me
because collectl does very precise timing, down to microseconds. After a lot of
digging around I realized this was the same problem. I also noticed that even
RHEL 5.1 ships only HiRes 1.86, though it runs glibc 2.5. My first fear
was that this would break everywhere, but now I'm thinking it may have
been specific to glibc 2.4.

What I would like to do is check which version of HiRes someone is using,
along with which version of glibc they have, and warn them if there's a
problem. I know I can get the version of HiRes via
Time::HiRes->VERSION, but I don't know if there's any way to get a library
version. On Red Hat I can see a version in the library name, but
I don't know if that will always be the case on all distros. I also
don't want to put too much effort into this, because things do seem to work
OK with 2.5, so there may be very few systems affected.
However, I'm always looking to reduce support questions from users, and
as the popularity of collectl grows I want to head off as much of this
sort of thing as I can.
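
For the glibc half of that check, glibc does expose its version at run time: gnu_get_libc_version() in <gnu/libc-version.h> returns the version string of the library the program is actually running against (the __GLIBC__ and __GLIBC_MINOR__ macros only give the compile-time version). On glibc systems, `getconf GNU_LIBC_VERSION` prints the same information, which a Perl script could capture. Below is a minimal C sketch of the check, assuming the policy suggested in this thread (warn only on glibc 2.4). The helper names parse_libc_version(), glibc_is_suspect() and check_running_glibc() are hypothetical, and the HiRes version test would still happen on the Perl side.

```c
#include <gnu/libc-version.h>   /* gnu_get_libc_version(): glibc only */
#include <stdio.h>

/* Parse the leading "major.minor" of a version string such as "2.4" or
 * "2.35". Returns 0 on success, -1 if the string does not start that way. */
int parse_libc_version(const char *s, int *major, int *minor)
{
    return sscanf(s, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Assumed policy from this thread: only glibc 2.4 is treated as suspect,
 * since 2.5 appears to behave. Widen this test if that turns out wrong. */
int glibc_is_suspect(int major, int minor)
{
    return major == 2 && minor == 4;
}

/* Check the glibc the process is running against (not the one it was
 * compiled with). Returns 1 and prints a warning for a suspect version,
 * 0 otherwise, and -1 if the version string could not be parsed. */
int check_running_glibc(void)
{
    int major, minor;
    const char *v = gnu_get_libc_version();
    if (parse_libc_version(v, &major, &minor) != 0)
        return -1;
    if (glibc_is_suspect(major, minor)) {
        fprintf(stderr, "warning: glibc %s with Time::HiRes < 1.91 may "
                        "mis-time alarms longer than ~4.2 s\n", v);
        return 1;
    }
    return 0;
}
```

Parsing only the first two numbers keeps the check working for strings like "2.35" as well as "2.4".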

As a bonus question, does anyone have any additional experience with
HiRes and glibc version incompatibilities? And if so, am I right
that things are OK with 2.5?

-mark
 

smallpond

What call into glibc changed?
 

Mark Seger

smallpond said:
What call into glibc changed?
I honestly don't know the details. What I do know is that if you call ualarm
with a number greater than about 4.29M (roughly 2**32-1 nanoseconds expressed
in microseconds), it will NOT produce the desired wait if you're using
glibc 2.4. If you update HiRes to V1.91 or later, it will. This does not seem
to be a problem with glibc 2.5, but it would be nice to get more confirmation about 2.5.
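
Some quick arithmetic on where that ~4.2 second boundary sits. An unsigned 32-bit value tops out at 2**32 - 1 = 4,294,967,295; if that value held nanoseconds, it would overflow just past 4,294,967 microseconds, about 4.29 seconds, which matches the observed limit. This is only a plausibility check: nothing in the thread confirms that glibc 2.4 actually stored the time this way, and the helper names below are made up for illustration.

```c
#include <stdint.h>

/* Largest microsecond count whose nanosecond equivalent still fits in an
 * unsigned 32-bit value: floor((2**32 - 1) / 1000). */
uint64_t max_usec_fitting_u32_ns(void)
{
    return (uint64_t)UINT32_MAX / 1000u;   /* 4294967 us, about 4.29 s */
}

/* Hypothetical: what a microsecond request becomes if it is scaled to
 * nanoseconds and then truncated to 32 bits (it wraps modulo 2**32). */
uint32_t usec_as_u32_ns(uint64_t usec)
{
    return (uint32_t)(usec * 1000u);
}
```

With these numbers, a 4-second request (4,000,000 us, i.e. 4,000,000,000 ns) still fits, while a 10-second request would wrap to 1,410,065,408 ns, roughly 1.41 seconds.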

The following is from the change log for HiRes:

1.91 [2006-09-28]
- ualarm() in SuSE was overflowing after ~4.2 seconds,
probably due to a glibc bug, workaround by using the
setitimer() variant if either useconds or interval >= IV_1E6
(this case seems to vary between systems: are useconds
more than 999_999 for ualarm() defined or not)
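
The changelog entry describes the workaround: use ualarm() only when both values are below one second, and fall back to setitimer() otherwise. Here is a minimal C sketch of that dispatch, written from the changelog text rather than from the Time::HiRes XS source; arm_alarm() and usec_to_timeval() are hypothetical names.

```c
#define _DEFAULT_SOURCE          /* exposes ualarm() under strict -std= modes */
#include <stddef.h>              /* NULL */
#include <sys/types.h>
#include <sys/time.h>            /* setitimer(), struct itimerval, struct timeval */
#include <unistd.h>              /* ualarm(), useconds_t */

#define USEC_PER_SEC 1000000ULL

/* Split a microsecond count into the seconds/microseconds pair that
 * struct timeval expects. */
struct timeval usec_to_timeval(unsigned long long usec)
{
    struct timeval tv;
    tv.tv_sec  = (time_t)(usec / USEC_PER_SEC);
    tv.tv_usec = (suseconds_t)(usec % USEC_PER_SEC);
    return tv;
}

/* Arm SIGALRM after `usec` microseconds, repeating every `interval`
 * microseconds (0 = one-shot). ualarm() is used only when both values are
 * under one second, the range useconds_t is guaranteed to cover; anything
 * larger goes through setitimer(). Returns 0 on the ualarm() path (its
 * result, the time left on a previous alarm, is ignored) or setitimer()'s
 * return value otherwise. */
int arm_alarm(unsigned long long usec, unsigned long long interval)
{
    if (usec < USEC_PER_SEC && interval < USEC_PER_SEC) {
        ualarm((useconds_t)usec, (useconds_t)interval);
        return 0;
    }
    struct itimerval it;
    it.it_value    = usec_to_timeval(usec);
    it.it_interval = usec_to_timeval(interval);
    return setitimer(ITIMER_REAL, &it, NULL);
}
```

For example, arm_alarm(10000000, 10000000) requests a repeating 10-second SIGALRM through setitimer(), so the value never has to fit the useconds_t range. Install a SIGALRM handler first, since the default action terminates the process.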

Does this help?

-mark
 

smallpond

The useconds_t type is only defined to support values up to 1,000,000.
Depending on undefined behavior is a mistake on the caller's part.
The AIX C library also returns an error if values >1M are passed in,
so it's not a glibc bug.
ualarm has been superseded by setitimer, which takes seconds and microseconds separately.
--S
 

Mark Seger

When I first started using SIGALRM in my tool many years ago, I was
testing on some Red Hat 7.2 systems as well as Red Hat 9. I'm pretty
sure HiRes only specified time in usecs and didn't document an upper
limit, and it just worked fine from then until glibc 2.4. I'm not
sure what was changed internally, but it still works fine now as
long as you use a newer version of the module. There must be other
timer calls that let you exceed 4.2 seconds, because it wouldn't
make sense to have a timer this accurate that can't handle longer
durations. So maybe HiRes decides which call to make based on your
request? Or maybe it just uses a call that does allow times >4.2 seconds?

All that said, I am very impressed with the accuracy of this timer,
because I can literally get my code to within a clock tick of accuracy.
I think that's sorely lacking in most of the standard performance
monitoring tools, but since many of them don't provide long-term logging or
fine-grained timestamps, people just don't realize it.
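
One rough way to see how tight that timing is on a particular machine: arm a periodic setitimer() timer and compare when each tick is observed against its ideal schedule. This is an illustrative sketch, not collectl's own code; max_tick_error_usec() is a hypothetical helper, and the number it reports includes signal delivery and scheduling latency as well as timer error.

```c
#define _DEFAULT_SOURCE              /* setitimer()/sigaction() under strict -std= */
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t ticks;  /* incremented by the SIGALRM handler */

static void on_alarm(int sig)
{
    (void)sig;
    ticks++;
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Arm a periodic ITIMER_REAL timer with period `period_usec`, wait for `n`
 * ticks, and return the largest gap (in microseconds) between when tick k
 * was observed and its ideal time start + k * period. */
double max_tick_error_usec(int n, long period_usec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    /* Block SIGALRM so the check-then-wait loop below cannot miss a tick. */
    sigset_t block, old, waitmask;
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &old);
    waitmask = old;
    sigdelset(&waitmask, SIGALRM);

    struct timeval period = { period_usec / 1000000, period_usec % 1000000 };
    struct itimerval it = { .it_interval = period, .it_value = period };
    ticks = 0;
    double start = now_sec();
    setitimer(ITIMER_REAL, &it, NULL);

    double worst = 0.0;
    for (int k = 1; k <= n; k++) {
        while (ticks < k)
            sigsuspend(&waitmask);           /* unblock SIGALRM and sleep */
        double err = (now_sec() - (start + k * period_usec / 1e6)) * 1e6;
        if (err < 0)
            err = -err;
        if (err > worst)
            worst = err;
    }

    struct itimerval off = { { 0, 0 }, { 0, 0 } };
    setitimer(ITIMER_REAL, &off, NULL);      /* stop the timer */
    sigprocmask(SIG_SETMASK, &old, NULL);    /* restore the caller's mask */
    return worst;
}
```

Because ordinary signals do not queue, a tick that arrives while the previous one is still pending is merged with it; the loop then simply waits for the next tick, and the delay shows up in the reported error.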

-mark
 

Mark Seger

I've been thinking about this some more, and I guess the question in my
mind is how did this ever work? With HiRes versions before 1.91, a ualarm of
10 seconds works with glibc 2.3 and doesn't work with glibc 2.4.

Has nobody else tripped over this?

-mark
 

Martijn Lievaart

If something is defined only when <condition> holds, that does not mean it
is defined to fail when <condition> doesn't hold. It might even work when
the condition is not satisfied! But code that depends on that undefined
behaviour might break in the next release, as was most probably the case
here.

M4
 
