ScientificPython - LeastSquareFit diverges

Harold Fellermann · Jul 18, 2006

Dear all,

I am trying to fit a power law to a small dataset using
leastSquaresFit from Scientific.Functions.LeastSquares.
Unfortunately, the algorithm seems to diverge and raises an
OverflowError.
Here is how I try it:

>>> from Scientific.Functions.LeastSquares import leastSquaresFit
>>> data = [
...     (2.5, 589.0, 0.10000000000000001),
...     (7.5, 442.0, 0.10000000000000001),
...     (12.5, 96.0, 0.10000000000000001),
...     (17.5, 36.0, 0.10000000000000001),
...     (22.5, 16.0, 0.10000000000000001),
...     (27.5, 7.0, 0.10000000000000001),
...     (32.5, 6.0, 0.10000000000000001),
...     (37.5, 3.0, 0.10000000000000001),
...     (42.5, 3.0, 0.10000000000000001),
...     (47.5, 1.0, 0.10000000000000001),
...     (52.5, 3.0, 0.10000000000000001),
...     (57.5, 1.0, 0.10000000000000001),
...     (67.5, 1.0, 0.10000000000000001),
...     (77.5, 2.0, 0.10000000000000001),
...     (82.5, 1.0, 0.10000000000000001),
...     (87.5, 2.0, 0.10000000000000001)
...     ]
>>> def powerlaw((a, b), x):
...     return a*x**b
...
>>> leastSquaresFit(powerlaw, (10, -3), data)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/Scientific/Functions/LeastSquares.py", line 72, in leastSquaresFit
    next_chi_sq, next_alpha = _chiSquare(model, next_p, data)
  File "/usr/lib/python2.4/site-packages/Scientific/Functions/LeastSquares.py", line 22, in _chiSquare
    f = model(parameters, point[0])
  File "<stdin>", line 2, in powerlaw
  File "/usr/lib/python2.4/site-packages/Scientific/Functions/FirstDerivatives.py", line 182, in __rpow__
    return pow(other, self)
  File "/usr/lib/python2.4/site-packages/Scientific/Functions/FirstDerivatives.py", line 171, in __pow__
    raise OverflowError, "Numerical result of pow(%s,%s) out of range." % (self.value,other.value-1)
OverflowError: Numerical result of pow(2.5,8376.79243687) out of range.

I added some debugging output to the function _chiSquare in
/usr/lib/python2.4/site-packages/Scientific/Functions/LeastSquares.py
that prints the fit parameters during the Levenberg-Marquardt iteration.
The procedure seems to diverge after the first step:

((10, [1]), (-3, [0, 1]))
[(-67402.311817579117, [1]), (8377.7924368716158, [0, 1])]
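The jump from b = -3 to b of about 8377.8 in a single step is fatal on its own, because the next model evaluation needs 2.5**b. A quick check in plain Python 3 (independent of ScientificPython) makes the magnitude concrete; note that Python's float exponentiation raises OverflowError on overflow instead of returning inf.

```python
import math
import sys

# Exponent from the last line of the traceback above; per the raise
# statement it is b - 1, with b of about 8377.79
exponent = 8376.79243687

# Decimal order of magnitude of 2.5**exponent, computed with logarithms
# so that nothing overflows
magnitude = exponent * math.log10(2.5)
float_limit = math.log10(sys.float_info.max)
print(f"2.5**exponent is about 10**{magnitude:.0f}")
print(f"largest double is about 10**{float_limit:.0f}")

# Evaluating the power directly fails, just as in the traceback
try:
    2.5 ** exponent
    overflowed = False
except OverflowError as exc:
    overflowed = True
    print("OverflowError:", exc)
```

So once the iteration proposes such a step, any code that evaluates the model at the trial point without guarding against overflow will stop with an exception.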

Note that I could easily fit the above data using gnuplot's internal
fitting procedure. Any idea what is going wrong here? Is it a known
problem? Are there any workarounds or other packages?

Any help is appreciated!

- harold -

Terry Reedy · Jul 18, 2006

Harold Fellermann said:
> I am trying to fit a power law to a small dataset using
> leastSquaresFit from Scientific.Functions.LeastSquares.

This is a bit off-topic here, and normally better for the scipy list, but I
have some experience with nonlinear least squares.

> Unfortunately, the algorithm seems to diverge and raises an
> OverflowError.

Assuming the program is okay, this means that either the function
mismatches the data or the initial values are too far off to converge.

> Here is how I try it:
>
> >>> from Scientific.Functions.LeastSquares import leastSquaresFit
> >>> data = [
> ...     (2.5, 589.0, 0.10000000000000001),
> ...     (7.5, 442.0, 0.10000000000000001),
> ...     (12.5, 96.0, 0.10000000000000001),

I presume that the tuples are (x, y, some_error_indicator). But the last
does not matter here.

> ...     return a*x**b

Did you try plotting log x versus log y to see if you get approximately a
straight line? If so, the intercept and slope are estimates of log a and b.
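The log-log idea can be sketched as an ordinary straight-line fit of log y against log x, where the intercept estimates log a and the slope estimates b. This is plain Python 3 and does not use ScientificPython; loglog_estimate is a hypothetical helper name, and the sigma column is ignored (an unweighted fit, which changes nothing here since all sigmas are equal).

```python
import math

# Harold's data as (x, y, sigma)
data = [
    (2.5, 589.0, 0.1), (7.5, 442.0, 0.1), (12.5, 96.0, 0.1),
    (17.5, 36.0, 0.1), (22.5, 16.0, 0.1), (27.5, 7.0, 0.1),
    (32.5, 6.0, 0.1), (37.5, 3.0, 0.1), (42.5, 3.0, 0.1),
    (47.5, 1.0, 0.1), (52.5, 3.0, 0.1), (57.5, 1.0, 0.1),
    (67.5, 1.0, 0.1), (77.5, 2.0, 0.1), (82.5, 1.0, 0.1),
    (87.5, 2.0, 0.1),
]

def loglog_estimate(points):
    """Least-squares line through (log x, log y); returns (a, b) for a*x**b.

    All x and y values must be positive.
    """
    lx = [math.log(x) for x, _, _ in points]
    ly = [math.log(y) for _, y, _ in points]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                   # slope of the line
    log_a = mean_y - b * mean_x     # intercept of the line
    return math.exp(log_a), b

a_all, b_all = loglog_estimate(data)        # all points
a_tail, b_tail = loglog_estimate(data[1:])  # first point dropped
print(a_all, b_all)
print(a_tail, b_tail)
```

For this data set the two calls should reproduce, up to rounding, the params1 values Konrad reports later in the thread for the full data and for the data without its first point, which is expected since his first fit is a least-squares fit of the logarithms.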

...

I presume (10, -3) is the starting (a, b). But, for instance, 10*7.5**-3 =
0.02, which has no relation to 442, whereas, for instance, 1000*7.5**-.75 =
221, which is in the ballpark, at least. So (a, b) = (1000, -.75) might have
a chance.
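The same comparison can be made for every data point at once by computing the sum of squared residuals for each candidate start. A minimal plain-Python 3 sketch; powerlaw mirrors Harold's model, while sum_sq_residuals is a hypothetical helper added for illustration:

```python
# Harold's data as (x, y, sigma)
data = [
    (2.5, 589.0, 0.1), (7.5, 442.0, 0.1), (12.5, 96.0, 0.1),
    (17.5, 36.0, 0.1), (22.5, 16.0, 0.1), (27.5, 7.0, 0.1),
    (32.5, 6.0, 0.1), (37.5, 3.0, 0.1), (42.5, 3.0, 0.1),
    (47.5, 1.0, 0.1), (52.5, 3.0, 0.1), (57.5, 1.0, 0.1),
    (67.5, 1.0, 0.1), (77.5, 2.0, 0.1), (82.5, 1.0, 0.1),
    (87.5, 2.0, 0.1),
]

def powerlaw(params, x):
    # Python 3 dropped tuple parameters, so unpack (a, b) in the body
    a, b = params
    return a * x ** b

def sum_sq_residuals(params, points):
    # Unweighted; all sigmas in this data set are equal anyway
    return sum((y - powerlaw(params, x)) ** 2 for x, y, _ in points)

for guess in [(10.0, -3.0), (1000.0, -0.75)]:
    print(guess,
          "f(7.5) =", round(powerlaw(guess, 7.5), 2),
          "sum of squares =", round(sum_sq_residuals(guess, data)))
```

The second guess gives a much smaller sum of squares, which is what makes it a more plausible starting point for an iterative fit.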

Terry Jan Reedy

konrad.hinsen · Jul 18, 2006

> ...     return a*x**b

Fitting power laws is a tricky business, you need a pretty good
initial guess to get convergence.

> Note that I could easily fit the above data using gnuplot's internal
> fitting procedure. Any idea what is going wrong here? Is it a known
> problem? Are there any workarounds or other packages?

My suggestion is to fit, at least as a first step, the logarithms of
your data points:

import Numeric as N

def powerlaw_log((a, b), x):
    return N.log(a) + b*N.log(x)

params1, chisq = leastSquaresFit(powerlaw_log, (10., -3.),
                                 [(x, N.log(y)) for x, y, sigma in data])

You can then use those parameters as starting values for fitting your
original problem:

params2, chisq = leastSquaresFit(powerlaw, params1, data)
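The two-step recipe can also be sketched without ScientificPython. The block below is a small hand-written Levenberg-Marquardt loop in plain Python 3 for the two-parameter model a*x**b; it is an illustration, not the ScientificPython implementation, and chi_square and levenberg_marquardt are hypothetical names. One deliberate difference, flagged as an assumption: in Harold's traceback the OverflowError escapes while the trial parameters (next_p) are evaluated, whereas here an overflow at a trial point is treated as an infinite chi-square, so the step is rejected and the damping increased.

```python
import math

# Harold's data as (x, y, sigma)
data = [
    (2.5, 589.0, 0.1), (7.5, 442.0, 0.1), (12.5, 96.0, 0.1),
    (17.5, 36.0, 0.1), (22.5, 16.0, 0.1), (27.5, 7.0, 0.1),
    (32.5, 6.0, 0.1), (37.5, 3.0, 0.1), (42.5, 3.0, 0.1),
    (47.5, 1.0, 0.1), (52.5, 3.0, 0.1), (57.5, 1.0, 0.1),
    (67.5, 1.0, 0.1), (77.5, 2.0, 0.1), (82.5, 1.0, 0.1),
    (87.5, 2.0, 0.1),
]

def powerlaw(params, x):
    a, b = params
    return a * x ** b

def chi_square(params, points):
    """Weighted sum of squared residuals; math.inf if anything overflows."""
    try:
        return sum(((y - powerlaw(params, x)) / s) ** 2 for x, y, s in points)
    except OverflowError:
        return math.inf

def levenberg_marquardt(params, points, lam=1e-3, max_iter=500, rtol=1e-12):
    """Minimise chi_square for a*x**b with a basic Levenberg-Marquardt loop."""
    a, b = params
    chi = chi_square((a, b), points)
    for _ in range(max_iter):
        # Accumulate J^T W J (h00, h01, h11) and J^T W r (g0, g1)
        h00 = h01 = h11 = g0 = g1 = 0.0
        for x, y, s in points:
            w = 1.0 / s ** 2
            xb = x ** b
            ja, jb = xb, a * xb * math.log(x)   # df/da and df/db
            r = y - a * xb
            h00 += w * ja * ja
            h01 += w * ja * jb
            h11 += w * jb * jb
            g0 += w * ja * r
            g1 += w * jb * r
        # Marquardt damping: scale the diagonal by (1 + lam), then solve 2x2
        m00, m11 = h00 * (1.0 + lam), h11 * (1.0 + lam)
        det = m00 * m11 - h01 * h01
        if not det > 0.0:
            break  # singular or non-finite system: stop at the current point
        step_a = (m11 * g0 - h01 * g1) / det
        step_b = (m00 * g1 - h01 * g0) / det
        trial = (a + step_a, b + step_b)
        trial_chi = chi_square(trial, points)   # inf if the trial overflows
        if trial_chi < chi:
            # Accept: less damping, closer to a Gauss-Newton step
            improvement = chi - trial_chi
            (a, b), chi = trial, trial_chi
            lam /= 10.0
            if improvement <= rtol * chi:
                break
        else:
            # Reject: more damping, a shorter step closer to gradient descent
            lam *= 10.0
    return (a, b), chi

# Starting values from the log fit (Konrad's params1 for the full data)
params1 = (9469.9675999067185, -2.0881423620750521)
params2, chi2 = levenberg_marquardt(params1, data)
print("from log-fit start:", params2, chi2)

# The naive start (10, -3): overflowing trial steps are rejected, not raised
naive, chi_naive = levenberg_marquardt((10.0, -3.0), data)
print("from (10, -3):", naive, chi_naive)
```

Because a step is only accepted when chi-square decreases, the result is never worse than the starting point; it need not match the params2 values below, and as Konrad points out, this particular fit is poorly conditioned, so different starting points can end in different places.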

Doing this for your data yields:

params1: [9469.9675999067185, -2.0881423620750521]

params2: [1591.4025775162165, -1.0112284948049179]

The big difference between the two fits is a further indicator for a
stability problem. I would trust the first set more than the second one.

As a general rule, the model to be fitted should be a smoothly
varying function of the parameters, and the same should be true for
the derivatives.
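Writing out the derivatives for the two forms of the model used in this thread makes the rule concrete (a worked illustration, not part of the original post). For the power law f both partial derivatives contain the factor x^b = e^{b ln x} and therefore change exponentially with b, while the logarithmic form g is linear in b and its derivative with respect to b does not depend on the parameters at all:

```latex
\begin{aligned}
f(x; a, b) &= a\,x^{b}, &
\frac{\partial f}{\partial a} &= x^{b}, &
\frac{\partial f}{\partial b} &= a\,x^{b}\ln x, \\
g(x; a, b) &= \ln a + b\ln x, &
\frac{\partial g}{\partial a} &= \frac{1}{a}, &
\frac{\partial g}{\partial b} &= \ln x .
\end{aligned}
```

In Harold's first iteration b moved from -3 to about 8377.8, so every factor x^b was multiplied by x^{\Delta b} with \Delta b of roughly 8381, which is why the very next evaluation overflowed.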

The second general rule is never to trust a non-linear fit algorithm
blindly. Look at your data first, see if the model can be a good fit,
and play with some parameter values to get a feeling for how they
influence the fit. Plotting your data set, it is immediately clear
that the first point ruins any nice power law behaviour. You might
thus prefer to do the fit without the first point, and you will get a
much better defined exponent:

params1: [31363.301954929859, -2.4047303053979046]
params2: [182522.2346197216, -2.9893640209815757]

Plotting the models corresponding to these two sets together with the
data, you will see that everything coincides well for large x values,
meaning that the first two points make all the difference - another
pointer towards a lack of stability in the fit.

Konrad.
--
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: (e-mail address removed)
---------------------------------------------------------------------

Harold Fellermann · Jul 19, 2006

Thanks for your advice, Terry and Konrad,

using the linear fit as the initial condition for the power-law fit works
pretty well for my data.
(I already had both calculations but performed them the other way round...)

Anyway, I had the impression that the leastSquaresFit in Scientific Python
is an implementation of the Levenberg-Marquardt algorithm as it is presented
in Numerical Recipes. According to reviews, this algorithm is not famous for
its stability (e.g. http://www.stanford.edu/class/cme302/wnnr/nr.html).
Better implementations are out there
(e.g. http://www.ics.forth.gr/~lourakis/levmar/). Are there any plans to
improve the SciPy algorithm? Would it be a welcome contribution to SciPy to
work this part out?

- harold -

Konrad Hinsen · Jul 19, 2006

> Anyway, I had the impression that the leastSquaresFit in Scientific Python
> is an implementation of the Levenberg-Marquardt algorithm as it is presented
> in Numerical Recipes.

True.

> According to reviews, this algorithm is not famous for its stability
> (e.g. http://www.stanford.edu/class/cme302/wnnr/nr.html).
> Better implementations are out there
> (e.g. http://www.ics.forth.gr/~lourakis/levmar/). Are there any plans to
> improve the SciPy algorithm? Would it be a welcome contribution to SciPy to
> work this part out?

Yes, definitely. And no, I have no plans to do it myself any time
soon. The current implementation has always been sufficient for my
needs, and time is scarce...

BTW, ScientificPython (http://dirac.cnrs-orleans.fr/ScientificPython/) is not
the same thing as SciPy (http://www.scipy.org/). Both are scientific libraries
for Python, but their focus is different: ScientificPython aims at providing
pythonic modules for scientific computing, whereas SciPy's objective is to
provide Python interfaces to the large pool of scientific libraries from the
Fortran/C/C++ world. Scientific users of Python should probably have both of
them installed.

Konrad.

How to fix ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2570)	0	Jul 28, 2023
an error about DJANGO_SETTINGS_MODULE	0	Jul 22, 2010
python+libxml2+scrapy AttributeError: 'module' object has noattribute 'HTML_PARSE_RECOVER'	3	Aug 15, 2012
python import sys.path	1	Dec 30, 2008
problems importing from /usr/lib/pyshared/	3	Jan 10, 2013
I'm getting an error when trying to reach my Instagram followers in Selenium	1	Jan 11, 2023
python import error	7	Dec 11, 2013
pip won't ignore system-installed files	1	Oct 22, 2013
