Numpy outlier removal

Thread starter: Joseph L. Casale

Maarten

> With the line constrained to go through 0,0, a line eyeballed with a
> clear ruler could easily be better than either regression line, as a
> human will tend to minimize the deviations *perpendicular to the line*,
> which is the proper thing to do (assuming both variables are measured in
> the same units).

In that case, use an appropriate algorithm to perform the fit. ODR (orthogonal distance regression) comes to mind: http://docs.scipy.org/doc/scipy/reference/odr.html
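
A minimal sketch of such a fit with scipy.odr, assuming the model y = b*x (line forced through 0,0) and equal uncertainty in x and y, so that the fit minimizes perpendicular distances. The helper names line_through_origin and odr_slope_through_origin and the sample data are hypothetical:

```python
import numpy as np
from scipy import odr


def line_through_origin(beta, x):
    # Model y = b * x; beta[0] is the slope b (no intercept term).
    return beta[0] * x


def odr_slope_through_origin(x, y, b0=1.0):
    """Fit y = b*x by orthogonal distance regression (scipy.odr)."""
    # No sx/sy given, so x and y errors are weighted equally.
    data = odr.RealData(np.asarray(x, dtype=float), np.asarray(y, dtype=float))
    model = odr.Model(line_through_origin)
    result = odr.ODR(data, model, beta0=[b0]).run()
    return result.beta[0]


# Hypothetical sample data.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
b = odr_slope_through_origin(x, y)
print(b)
```

For data like this, the ODR slope should fall between the slope of y regressed on x (1.1 here) and the slope implied by regressing x on y (about 1.18 here), i.e. between the "two regression lines" in the quoted text.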

Maarten
 

Chris Angelico

> Why on Earth do you think that the distance from nominal surface
> temperatures to freezing, much less absolute 0, is the right scale to compare
> global warming changes against? You need to compare against the size of
> global mean temperature changes that would cause large amounts of human
> suffering, and that scale is on the order of a *few* degrees, not hundreds.
> A change of half a degree over a few decades with no signs of slowing down
> *should* be alarming.

I didn't say what it should be; I gave three examples. And as I said,
this is not the forum to debate climate change; I was just using it as
an example of statistical reporting.

Three types of lies.

ChrisA
 

Robert Kern

> I didn't say what it should be;

Actually, you did. You stated that "a ~0.6 deg increase across ~30 years [is
h]ardly statistically significant". Ignoring the confusion between statistical
significance and practical significance (external criteria, such as the
difference between the nominal temp and absolute 0, or the right criteria that I
mentioned, have nothing to do with statistical significance), you made a positive
claim that it wasn't significant.
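
The statistical side of that distinction can be made concrete. Whether a linear trend is statistically significant is usually judged by comparing the fitted slope b with its standard error, via the t statistic b / SE(b), not by comparing the change with an external scale. A minimal sketch in plain Python, assuming ordinary least squares with an intercept and independent residuals; the function name slope_t_statistic and the numbers are hypothetical, not anyone's temperature data. Turning t into a p-value needs the t distribution with n - 2 degrees of freedom (e.g. from scipy.stats), which is left out here.

```python
import math


def slope_t_statistic(xs, ys):
    """OLS fit y = a + b*x; return (b, standard error of b, t = b / se)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual variance")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2) / sxx)
    t = b / se if se > 0 else math.inf
    return b, se, t


# Hypothetical example data (not real measurements).
years = [1, 2, 3, 4, 5]
values = [2, 4, 5, 4, 5]
b, se, t = slope_t_statistic(years, values)
```

For this made-up data the slope is 0.6 with a standard error of about 0.28, so t is about 2.1. Whether that counts as significant depends on the chosen threshold and on n; it says nothing about whether the change matters in practice.
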
> I gave three examples.

You gave negligently incorrect ones. Whether your comments were on topic or not,
you deserve to be called on them when they are wrong.
> And as I said,
> this is not the forum to debate climate change; I was just using it as
> an example of statistical reporting.
>
> Three types of lies.

FUD is a fourth.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

Steven D'Aprano

> With the line constrained to go through 0,0, a line eyeballed with a
> clear ruler could easily be better than either regression line, as a
> human will tend to minimize the deviations *perpendicular to the line*,
> which is the proper thing to do (assuming both variables are measured
> in the same units).

It is conventional to talk about "residuals" rather than deviations.

And it could even more easily be worse than a regression line. And since
eyeballing is entirely subjective and impossible to objectively verify,
the line that you claim minimizes the residuals might be very different
from the line that I claim minimizes the residuals, with no way to decide
between the two claims.
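
Once the criterion is written down, two claimed lines can be compared objectively. A minimal sketch, assuming a line through the origin (as in the quoted text) and the sum of squared perpendicular residuals as the score; the helper name perpendicular_sse, the data, and the two "eyeballed" slopes are hypothetical:

```python
def perpendicular_sse(slope, xs, ys):
    """Sum of squared perpendicular distances from the points to y = slope * x."""
    # Distance from (x, y) to the line slope*x - y = 0 is |slope*x - y| / sqrt(slope**2 + 1),
    # so the squared distances share the common denominator slope**2 + 1.
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / (slope ** 2 + 1)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0]
your_line, my_line = 1.15, 1.5  # hypothetical slopes two people picked by eye
better = min((your_line, my_line), key=lambda b: perpendicular_sse(b, xs, ys))
print(better)
```

The comparison is only as meaningful as the chosen criterion, but it is repeatable: anyone running it on the same data gets the same ranking.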

In any case, there is a technique for working out ordinary least squares
(OLS) linear regression using perpendicular offsets rather than vertical
offsets:

http://mathworld.wolfram.com/LeastSquaresFittingPerpendicularOffsets.html
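
Fitting with perpendicular offsets (often called orthogonal or total least squares) has a closed form. A minimal sketch in plain Python, assuming a general line y = a + b*x; the function name perpendicular_fit is hypothetical. It centres the data and takes the root of sxy*b**2 + (sxx - syy)*b - sxy = 0 that minimizes the sum of squared perpendicular offsets; as far as I can tell this is the same result the linked page writes in terms of uncentred sums.

```python
import math


def perpendicular_fit(xs, ys):
    """Fit y = a + b*x minimizing squared perpendicular (not vertical) offsets.

    Returns (a, b). Raises ValueError when the covariance is zero, where the
    best line may be vertical and has no finite slope.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centred sums of squares and cross-products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("zero covariance: slope is undetermined")
    # Root of sxy*b**2 + (sxx - syy)*b - sxy = 0 giving the minimum (not the maximum).
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    # The best line passes through the centroid.
    a = y_bar - b * x_bar
    return a, b
```

For the hypothetical points xs = [1, 2, 3, 4, 5], ys = [2, 4, 5, 4, 5], this gives a slope of about 0.72, compared with 0.6 for ordinary least squares with vertical offsets.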

but in general, if you have to care about errors in the independent
variable, you're better off using a more powerful technique than just OLS.

The point I keep making, that everybody seems to be ignoring, is that
eyeballing a line of best fit is subjective, unreliable and impossible to
verify. How could I check that the line you say is the "best fit"
actually *is* the *best fit* for the given data, given that you picked
that line by eye? Chances are good that if you came back to the data a
month later, you'd pick a different line!

As I have said, eyeballing a line is fine for rough back-of-the-envelope
calculations, where you only care that you have a line pointing more
or less in the right direction. But for anything where accuracy is
required, line fitting by eye is down in the pits of things not to do,
right next to "making up the answers you prefer".
 

Steven D'Aprano

> Three types of lies.

Oh, surely more than that.

White lies.

Regular or garden variety lies.

Malicious lies.

Accidental or innocent lies.

FUD -- "fear, uncertainty, doubt".

Half-truths.

Lying by omission.

Exaggeration and understatement.

Propaganda.

Misinformation.

Disinformation.

Deceit by emphasis.

And manufactured doubt.

E.g. the decades-long campaign by the tobacco companies to deny that
tobacco products caused cancer, when their own scientists were telling
them that they did. Having learnt how valuable falsehoods are, those same
manufacturers of doubt went on to sell their services to those who wanted
to deny that CFCs destroyed ozone, and that CO2 causes warming.


The old saw about "lies, damned lies and statistics" reminds me very much
of a quote from Homer Simpson:

"Pfff, facts, you can prove anything that's even remotely true with
facts!"
 
