Threads, timing and HTTP


Alexander Lamb


Hello list,

I am implementing a very simple script to ping web servers or
services (to monitor how our environment is functioning).

Some production URLs run on more than one host. Therefore, I start a
thread for each separate URL.

The function run in my threads is:

require 'open-uri'   # for open() on URLs
require 'timeout'    # for timeout()

def doPing(uri_string, probe)
  s = uri_string
  loop do
    begin
      timeout(@seconds_before_timeout) do
        start = Time.new
        begin
          open(s) do |result|
            if result.status[0] != "200"
              probe.addToLogFile([s, 'ERR', 0, result.status[1]])
            else
              probe.addToLogFile([s, 'OK', Time.new - start, ''])
            end
          end
        rescue StandardError
          # rescuing Exception here would also swallow Timeout::Error
          probe.addToLogFile([s, 'ERR', 0, $!])
        end
      end
    rescue Timeout::Error
      probe.addToLogFile([s, 'ERR', 0, 'timeout'])
    end
    sleep(@seconds_between_ping)
  end
end

However, there is a problem: I also want to measure the round-trip
time of each ping (these are only simple pings for the time being).
As you can see, I take the local time before and after the call, but
this doesn't work well with threads: since the process is shared
among all threads, the measured time depends on how many threads I am
running rather than reflecting the actual time the ping takes.

I can't define the piece of code between the two timestamps as a
critical section for one thread only, because if open-uri blocks, it
would prevent another thread from pinging its URL in the meantime.

Any idea? Maybe use processes instead of threads?

Thanks for any hints!
--
Alexander Lamb
Service d'Informatique Médicale
Hôpitaux Universitaires de Genève
(e-mail address removed)
+41 22 372 88 62
+41 79 420 79 73






 

Robert Klemme

Alexander said:
[...]
Any idea? Maybe use processes instead of threads?

Maybe you can exploit one of the result headers. Chances are that
there is a timestamp somewhere. Then you *only* need to synchronize
the clocks on your machine and on the servers...
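
For what it's worth, a minimal sketch of the header idea, assuming the
server sets a Date header (open-uri exposes response headers through
#meta; the URL is a placeholder):

require 'open-uri'
require 'time'

# Compare the server's Date header against the local clock. Only
# meaningful if both clocks are synchronized (e.g. via NTP).
open('http://example.com/') do |res|
  server_time = Time.httpdate(res.meta['date'])  # meta keys are lowercased
  puts "local clock - server clock: #{Time.now - server_time}s"
end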

Or you switch to a single-threaded solution. I don't know your ping
interval, but if you don't need to ping too often and don't have too
many servers, that should be OK. You can create a simple scheduler
that always picks the URL whose next ping is due soonest...
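
Something like this, perhaps (a rough sketch: the URLs, the 30-second
interval and the 10-second timeout are all made up):

require 'open-uri'
require 'timeout'

# Single-threaded scheduler: track the next ping time per URL and
# always serve whichever is due soonest.
interval = 30
next_ping = {}
['http://example.com/a', 'http://example.com/b'].each { |u| next_ping[u] = Time.now }

loop do
  url = next_ping.keys.min_by { |u| next_ping[u] }
  pause = next_ping[url] - Time.now
  sleep(pause) if pause > 0
  start = Time.now
  begin
    Timeout.timeout(10) { open(url) { |res| res.status } }
    puts "#{url} OK #{Time.now - start}s"
  rescue StandardError, Timeout::Error
    puts "#{url} ERR #{$!}"
  end
  next_ping[url] = Time.now + interval
end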

Regards

robert
 

Alexander Lamb

Robert said:
[...]
Or you switch to a single-threaded solution. I don't know your ping
interval, but if you don't need to ping too often and don't have too
many servers, that should be OK. You can create a simple scheduler
that always picks the URL whose next ping is due soonest...
I could go single-threaded (since I am doing roughly one ping every
30 seconds). However, if the first URL I try hangs (until a timeout,
for example), I push back the time at which I ping the second URL.
Logically I would need to do something like "ping each URL one after
another, unless one of them seems to take longer, and then detach a
thread to wait for the answer".
For the time being I will test forking a process.
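
Here is roughly what I have in mind (an untested sketch; the URL is a
placeholder and the pipe-based reporting is just one way to do it):

require 'open-uri'

# Fork-per-probe: the child does the fetch, measures wall-clock time
# and reports back through a pipe.
def ping_in_child(url)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    start = Time.now
    begin
      open(url) { |res| res.status }
      writer.puts(Time.now - start)
    rescue StandardError
      writer.puts("ERR #{$!}")
    end
  end
  writer.close
  result = reader.gets.to_s.chomp
  Process.wait(pid)
  result
end

puts ping_in_child('http://example.com/')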

Alex
 

Robert Klemme

Alexander said:
[...]
I could go single-threaded (since I am doing roughly one ping every
30 seconds). However, if the first URL I try hangs (until a timeout,
for example), I push back the time at which I ping the second URL.
Logically I would need to do something like "ping each URL one after
another, unless one of them seems to take longer, and then detach a
thread to wait for the answer".
For the time being I will test forking a process.

You could also have a controller thread that watches your single
testing thread. If a test takes longer than n seconds (where n <<
timeout), it sets a flag for the current testing thread (via a
thread-local variable, for example) and starts a new tester thread.
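
Roughly like this (a sketch: n, the :started/:stale thread-locals and
run_one_test are all made up for illustration):

# If the current tester has been busy longer than n seconds, flag it
# as stale and start a fresh tester thread.
n = 5

def run_one_test
  Thread.current[:started] = Time.now
  sleep(rand * 10)  # stand-in for one real probe
end

tester = Thread.new { run_one_test }

loop do
  sleep(n)
  started = tester[:started]
  if tester.alive? && started && Time.now - started > n
    tester[:stale] = true                # its result should be discarded
    tester = Thread.new { run_one_test }
  end
end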

Cheers

robert
 

Robert Klemme

Robert said:
[...]
You could also have a controller thread that watches your single
testing thread. If a test takes longer than n seconds (where n <<
timeout), it sets a flag for the current testing thread (via a
thread-local variable, for example) and starts a new tester thread.

Yet another idea: make the testing semi-critical. When a thread
starts testing, it stores a timestamp somewhere. Every other thread
checks whether the timestamp is set and is at most n seconds old. If
it is older, it replaces the timestamp with its own and goes ahead.
If we're still within the n-second range, it goes on sleeping.
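
Sketched with a mutex-guarded shared timestamp (names and the n value
are illustrative):

require 'thread'

# A thread may start testing only if no other test started within the
# last N_SECONDS; otherwise it keeps sleeping and retries.
$last_test_started = nil
$stamp_lock = Mutex.new
N_SECONDS = 3

def wait_for_test_slot
  loop do
    claimed = $stamp_lock.synchronize do
      if $last_test_started.nil? || Time.now - $last_test_started > N_SECONDS
        $last_test_started = Time.now
        true
      else
        false
      end
    end
    return if claimed  # our turn: go ahead and test
    sleep(0.5)
  end
end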

robert
 

Yohanes Santoso

Alexander Lamb said:
However, there is a problem: I also want to measure the round-trip
time of each ping (these are only simple pings for the time being).
As you can see, I take the local time before and after the call, but
this doesn't work well with threads: since the process is shared
among all threads, the measured time depends on how many threads I am
running rather than reflecting the actual time the ping takes.

Using processes instead of threads would also have the same problem.
You still can't guarantee that your execution path is not suspended
between the start time and the end time; the CPU is still shared
among processes.

You'd have to use an OS that can give you this guarantee. The catch
is, there is no port of Ruby to such an OS yet.

But you probably don't need a guarantee; 'good enough' is good
enough. In that case there is little benefit in using processes, and
personally I wouldn't bother for a simple monitoring program.
I can't define the piece of code between the two timestamps as a critical section

Even if you define it, you can't count on the Ruby process not being
suspended by some external factors.

The fundamental problem is simple: limited resources. As long as
there are multiple executions desiring access to the same resources
(CPU time, network time), there is bound to be some contention.
Any idea? Maybe use processes instead of threads?

For critical monitoring services, one uses an OS that can guarantee a
process some amount of CPU time within some duration. Examples of
this are found in many places, like your car's ABS controller and
your local neighbourhood's nuclear power station.

YS.
 

Alexander Lamb


Well, as you said: "good enough" is OK, since I need to test a
real-life situation. For example, I have several apps calling some
web services on some given servers. Obviously if many apps call at
the same time the timing will be different, and the same is true for
my probe in Ruby. However, I had the feeling that by starting my two
or three threads at exactly the same time to ping some servers, the
timing result is not really correct. Using processes, several
processes will of course fight for resources, but I am closer to a
real-life situation.

However, reading your replies, I think a good approximation is to
slightly offset each ping by 3-4 seconds (i.e. not start all the
threads at the same time). Then I have a very high probability, even
after several hours of pings, of having only one ping thread running
at a time, thus giving me a good approximation of the time taken.
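
The offsetting itself is simple (a sketch: the URLs, the 4-second
offset and this stripped-down do_ping are illustrative):

require 'open-uri'

# Stagger thread start-ups so that, with high probability, only one
# probe is in flight at any moment.
def do_ping(url)
  start = Time.now
  open(url) { |res| res.status }
  puts "#{url} #{Time.now - start}s"
rescue StandardError
  puts "#{url} ERR #{$!}"
end

urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']
threads = []
urls.each_with_index do |url, i|
  threads << Thread.new(url, i) do |u, slot|
    sleep(slot * 4)  # initial per-thread offset
    loop do
      do_ping(u)
      sleep(30)      # identical intervals keep the offsets stable
    end
  end
end
threads.each { |t| t.join }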

Slightly off topic: what I am trying to do is monitor the way our
systems work. We are very distributed and need to set up alarms if
some service goes down. A little bit like products such as Big
Brother, but more application-oriented. I saw on the agenda of
EuRuKo 2005:

Using Ruby to monitor enterprise software, from Sven C. Koehler.

There are no slides or description, but it could be something similar
to what I am trying to do (actually something we already did in a
previous version in Java, but I wanted to simplify it and make it
more customizable). Can someone give me pointers, or maybe even Mr.
Koehler himself, if he is on this list?

Thanks,
--
Alexander Lamb
Service d'Informatique Médicale
Hôpitaux Universitaires de Genève
(e-mail address removed)
+41 22 372 88 62
+41 79 420 79 73





 

Bob Hutchison

Alexander Lamb wrote:

I am implementing a very simple script to ping web servers or
services (to monitor how our environment is functioning).

This thread seems to be thrashing around a bit.

By what criteria do you establish how your environment is
functioning? Once you know that, you can look into how to monitor
those criteria alone and no others.

A 'ping' is really testing the network and web server responsiveness
(assuming the ping task is simple). You can't separate those.
Establish a baseline and compare to that. I'd think that if the
network or server is stalling for any reason you'd like to know, and
when comparing to a baseline you have a chance of detecting that.
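
One way to do the comparison, sketched (window size, warm-up length
and the factor are arbitrary):

# Rolling baseline: remember recent response times and flag a sample
# that is far above the recent average.
class Baseline
  def initialize(window = 50, factor = 3.0)
    @window, @factor, @samples = window, factor, []
  end

  def record(seconds)
    @samples << seconds
    @samples.shift while @samples.size > @window
  end

  def anomalous?(seconds)
    return false if @samples.size < 10  # need some history first
    avg = @samples.inject(0.0) { |sum, s| sum + s } / @samples.size
    seconds > avg * @factor
  end
end

b = Baseline.new
20.times { b.record(0.2 + rand * 0.1) }  # pretend history
puts b.anomalous?(2.5)                   # => true: well above baseline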

The application responsiveness is probably best measured on the
server by the application itself, possibly by recording the time
between first touch on the app through to the close or flush of the
socket. You'd have to ask the application to report on this.

Cheers,
Bob
 

Alexander Lamb

Bob said:
[...]
The application responsiveness is probably best measured on the
server by the application itself, possibly by recording the time
between first touch on the app through to the close or flush of the
socket. You'd have to ask the application to report on this.
Yes, you are right. This ping is really only the first building block
(but you can't imagine how many times we had problems because of a
simple Apache server which didn't restart gracefully at midnight :)
while at the same time the apps were running fine... just nobody
could get to them, which is rather annoying in a hospital.

That's why we have a monitoring system which monitors not only the
Apache servers but some key web services as well. To monitor web
services we either do some kind of dummy search (e.g. give me the
list of patients in this unit) or implement a specific service which
tests a few things and replies with some information about the speed
of the transaction and other things.

What you saw in my post is the first building block (a simple HTTP
ping probe) for the new system I wish to develop in Ruby. I will then
have a sort of master process which will consolidate the states of
the various probes and display the situation (how is still to be
defined). If some situation seems critical, some kind of alert will
be escalated.
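
I imagine the probes reporting to the master through something like a
shared queue (a sketch; the report format and names are made up):

require 'thread'

# Each probe pushes its latest status onto a shared queue; the master
# keeps only the newest state per probe.
status_queue = Queue.new
states = {}

Thread.new do
  loop do
    probe, state, rtt = status_queue.pop
    states[probe] = { :state => state, :rtt => rtt, :at => Time.now }
    # display / escalation logic would inspect `states` here
  end
end

# a probe would report like this:
status_queue.push(['http-ping www.example.com', 'OK', 0.12])
sleep 1  # give the master a moment in this toy example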

This brings me to another question (sorry, I am a beginner in Ruby):
is there a rule engine available in Ruby where you could express
rules a bit like with Jess in Java?

Thanks

Alex
 

Bob Hutchison

Yes, you are right. This ping is really only the first building block
(but you can't imagine how many times we had problems because of a
simple Apache server which didn't restart gracefully at midnight :)
while at the same time the apps were running fine... just nobody
could get to them, which is rather annoying in a hospital.

I most certainly can imagine it... well, actually, I can rely on
memory :)
That's why we have a monitoring system which monitors not only the
Apache servers but some key web services as well. To monitor web
services we either do some kind of dummy search (e.g. give me the
list of patients in this unit) or implement a specific service which
tests a few things and replies with some information about the speed
of the transaction and other things.

What you saw in my post is the first building block (a simple HTTP
ping probe) for the new system I wish to develop in Ruby. I will then
have a sort of master process which will consolidate the states of
the various probes and display the situation (how is still to be
defined). If some situation seems critical, some kind of alert will
be escalated.

Okay, still, I'd *strongly* suggest the baseline thing (given my
experience), and establishing service levels too. I've written
several monitoring systems, some quite large, sometimes while on a
team that built the hardware to run the monitoring system. Comparison
to expectations or historical records, while tricky to get working at
first, works very well in the end.
This brings me to another question (sorry, I am a beginner in Ruby):
is there a rule engine available in Ruby where you could express
rules a bit like with Jess in Java?

I don't know personally. But your rules might be overly 'crisp' and
so not very stable (think chaos, and tipping points). I'd have a look
around for a simple fuzzy reasoning system (an unfortunate phrase :),
which would be *way* more stable.
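
Even a hand-rolled fuzzy check beats a crisp threshold; a sketch (all
the numbers are arbitrary):

# Map a response time to a degree of "slowness" in [0, 1] and escalate
# only when the recent average degree is high, i.e. the slowness is
# sustained rather than a single spike.
def slowness(seconds, ok = 0.5, bad = 5.0)
  return 0.0 if seconds <= ok
  return 1.0 if seconds >= bad
  (seconds - ok) / (bad - ok)
end

recent = [2.8, 4.2, 3.9, 4.5]  # last few response times in seconds
score = recent.map { |s| slowness(s) }.inject(0.0) { |a, b| a + b } / recent.size
puts 'escalate' if score > 0.6  # => escalate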

Cheers,
Bob
 
