# Numpy outlier removal

Maarten

 01-08-2013
On Tuesday, January 8, 2013 10:07:08 AM UTC+1, Terry Reedy wrote:

> With the line constrained to go through 0,0, a line eyeballed with a
> clear ruler could easily be better than either regression line, as a
> human will tend to minimize the deviations *perpendicular to the line*,
> which is the proper thing to do (assuming both variables are measured in
> the same units).

In that case use an appropriate algorithm to perform the fit. ODR comes to mind. http://docs.scipy.org/doc/scipy/reference/odr.html
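
As a rough illustration of that suggestion, here is a minimal sketch using scipy.odr to fit a line constrained through the origin. The data values and the helper name line_through_origin are made up for the example; they are not from this thread. With no per-point uncertainties given, the fit minimizes orthogonal distances with equal weight on both axes.

```python
import numpy as np
from scipy import odr


def line_through_origin(beta, x):
    # beta holds one parameter, the slope; the intercept is fixed at 0.
    return beta[0] * x


# Made-up illustrative data, roughly along y = x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

model = odr.Model(line_through_origin)
data = odr.RealData(x, y)  # no sx/sy given, so both axes are weighted equally
fit = odr.ODR(data, model, beta0=[1.0]).run()

slope = fit.beta[0]
print("ODR slope through the origin:", slope)
```

If the two variables have different measurement uncertainties, RealData accepts sx and sy arguments to weight them accordingly.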

Maarten

Chris Angelico

 01-08-2013
On Wed, Jan 9, 2013 at 2:55 AM, Robert Kern wrote:
> On 08/01/2013 06:35, Chris Angelico wrote:
>> ... it looks
>> quite significant to show a line going from the bottom of the graph to
>> the top, but sounds a lot less noteworthy when you see it as a
>> half-degree increase on about (I think?) 30 degrees, and even less
>> when you measure temperatures in absolute scale (Kelvin) and it's half
>> a degree in three hundred.

>
> Why on Earth do you think that the distance from nominal surface
> temperatures to freezing, much less absolute 0, is the right scale to compare
> global warming changes against? You need to compare against the size of
> global mean temperature changes that would cause large amounts of human
> suffering, and that scale is on the order of a *few* degrees, not hundreds.
> A change of half a degree over a few decades with no signs of slowing down
> *should* be alarming.

I didn't say what it should be; I gave three examples. And as I said,
this is not the forum to debate climate change; I was just using it as
an example of statistical reporting.

Three types of lies.

ChrisA

Robert Kern

 01-08-2013
On 08/01/2013 20:14, Chris Angelico wrote:
> On Wed, Jan 9, 2013 at 2:55 AM, Robert Kern wrote:
>> On 08/01/2013 06:35, Chris Angelico wrote:
>>> ... it looks
>>> quite significant to show a line going from the bottom of the graph to
>>> the top, but sounds a lot less noteworthy when you see it as a
>>> half-degree increase on about (I think?) 30 degrees, and even less
>>> when you measure temperatures in absolute scale (Kelvin) and it's half
>>> a degree in three hundred.

>>
>> Why on Earth do you think that the distance from nominal surface
>> temperatures to freezing, much less absolute 0, is the right scale to compare
>> global warming changes against? You need to compare against the size of
>> global mean temperature changes that would cause large amounts of human
>> suffering, and that scale is on the order of a *few* degrees, not hundreds.
>> A change of half a degree over a few decades with no signs of slowing down
>> *should* be alarming.

>
> I didn't say what it should be;

Actually, you did. You stated that "a ~0.6 deg increase across ~30 years [is
h]ardly statistically significant". Ignoring the confusion between statistical
significance and practical significance (as external criteria like the
difference between the nominal temp and absolute 0 or the right criteria that I
mentioned has nothing to do with statistical significance), you made a positive
claim that it wasn't significant.

> I gave three examples.

You gave negligently incorrect ones. Whether your comments were on topic or not,
you deserve to be called on them when they are wrong.

> And as I said,
> this is not the forum to debate climate change; I was just using it as
> an example of statistical reporting.
>
> Three types of lies.

FUD is a fourth.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Steven D'Aprano

 01-09-2013
On Tue, 08 Jan 2013 04:07:08 -0500, Terry Reedy wrote:

>> But that is not fitting a line by eye, which is what I am talking about.

>
> With the line constrained to go through 0,0 a line eyeballed with a
> clear ruler could easily be better than either regression line, as a
> human will tend to minimize the deviations *perpendicular to the line*,
> which is the proper thing to do (assuming both variables are measured
> in the same units).

It is conventional to talk about "residuals" rather than deviations.

And it could even more easily be worse than a regression line. And since
eyeballing is entirely subjective and impossible to objectively verify,
the line that you claim minimizes the residuals might be very different
from the line that I claim minimizes the residuals, with no way to decide
between the two claims.

In any case, there is a technique for working out ordinary least squares
(OLS) linear regression using perpendicular offsets rather than vertical
offsets:

http://mathworld.wolfram.com/LeastSq...arOffsets.html

but in general, if you have to care about errors in the independent
variable, you're better off using a more powerful technique than just OLS.
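
To make the vertical versus perpendicular distinction concrete, here is a minimal sketch for the special case discussed in this thread, a line y = b*x constrained through 0,0. The function names and data are made up for the example. It uses the standard result that the perpendicular-offset line through the origin lies along the principal axis of the (uncentered) scatter matrix, and it assumes both variables are in the same units.

```python
import math


def slope_vertical(xs, ys):
    """Slope of y = b*x that minimizes squared vertical offsets."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def slope_perpendicular(xs, ys):
    """Slope of y = b*x that minimizes squared perpendicular offsets."""
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Principal axis of [[sxx, sxy], [sxy, syy]]:
    # tan(2*theta) = 2*sxy / (sxx - syy).
    # A (near-)vertical best line gives a huge slope here.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.tan(theta)


def perpendicular_ss(b, xs, ys):
    """Sum of squared perpendicular distances from the points to y = b*x."""
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / (1.0 + b * b)


# Made-up illustrative data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.5, 1.5, 3.5, 3.5]

b_vert = slope_vertical(xs, ys)
b_perp = slope_perpendicular(xs, ys)
print("vertical-offset slope:     ", b_vert)
print("perpendicular-offset slope:", b_perp)
```

The two slopes differ whenever the points do not lie exactly on a line, and the gap grows with the scatter.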

The point I keep making, that everybody seems to be ignoring, is that
eyeballing a line of best fit is subjective, unreliable and impossible to
verify. How could I check that the line you say is the "best fit"
actually *is* the *best fit* for the given data, given that you picked
that line by eye? Chances are good that if you came back to the data a
month later, you'd pick a different line!

As I have said, eyeballing a line is fine for rough back-of-the-envelope
calculations, where you only care that you have a line pointing more
or less in the right direction. But for anything where accuracy is
required, line fitting by eye is down in the pits of things not to do,
right next to "making up the answers you prefer".
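
One way to settle a disagreement between two eyeballed lines is to score both against an explicit criterion. The sketch below does that with the sum of squared vertical residuals for lines through the origin. The data, the two candidate slopes and the function name are all hypothetical, chosen only for illustration.

```python
def vertical_ss(slope, xs, ys):
    """Sum of squared vertical residuals for the line y = slope * x."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data and two hypothetical eyeballed slopes.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.5, 1.5, 3.5, 3.5]
eyeballed = {"first guess": 0.9, "second guess": 1.1}

# Score each candidate with the same objective criterion.
scores = {name: vertical_ss(b, xs, ys) for name, b in eyeballed.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(name, "->", score)
print("smaller residual sum:", best)
```

The choice of criterion (vertical or perpendicular residuals) still has to be made and stated, but once it is, anyone can repeat the comparison and get the same answer.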

--
Steven

Jason Friedman

 01-09-2013
> Statistical analysis is a huge science. So is lying. And I'm not sure
> most people can pick one from the other.

Chris, your sentence causes me to think of Mr. Twain's sentence, or at
least the one he popularized:
http://www.twainquotes.com/Statistics.html.

Steven D'Aprano

 01-09-2013
On Wed, 09 Jan 2013 07:14:51 +1100, Chris Angelico wrote:

> Three types of lies.

Oh, surely more than that.

White lies.

Regular or garden variety lies.

Malicious lies.

Accidental or innocent lies.

FUD -- "fear, uncertainty, doubt".

Half-truths.

Lying by omission.

Exaggeration and understatement.

Propaganda.

Misinformation.

Disinformation.

Deceit by emphasis.

And manufactured doubt.

E.g. the decades-long campaign by the tobacco companies to deny that
tobacco products caused cancer, when their own scientists were telling
them that they did. Having learnt how valuable falsehoods are, those same
manufacturers of doubt went on to sell their services to those who wanted
to deny that CFCs destroyed ozone, and that CO2 causes warming.

The old saw about "lies, damned lies and statistics" reminds me very much
of a quote from Homer Simpson:

"Pfff, facts, you can prove anything that's even remotely true with
facts!"

--
Steven