# Riddle me this

Sharp Tool
Guest
Posts: n/a

 11-06-2005
Hi

Consider this list of numbers:

12.0
5.0
1.0
-0.1
-2.1
-124.0

What algorithm can I use to remove large negative values such as -124.0?
How can I determine a cutoff value that is statistically meaningful?

So far I have:

cutoff = smallest positive - smallest difference in negative pairs
= 1.0 - (2.1 - 0.1)
= 1.0 - 2.0
= -1.0

The problem is that this would also eliminate -2.1!
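
The arithmetic above can be reproduced with a short Java sketch. The names `HeuristicCutoff` and `cutoff` are made up for illustration, and the rule is only the heuristic described in this post, not an established statistical test.

```java
import java.util.Arrays;

final class HeuristicCutoff {
    /**
     * Smallest positive value minus the smallest gap between
     * neighbouring negative values (the heuristic from the post).
     * Needs at least one positive and two negative values; otherwise
     * the result is infinite or NaN.
     */
    static double cutoff(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted); // ascending order

        double smallestPositive = Double.POSITIVE_INFINITY;
        double smallestGap = Double.POSITIVE_INFINITY;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] > 0) {
                smallestPositive = Math.min(smallestPositive, sorted[i]);
            }
            // sorted[i] < 0 implies sorted[i - 1] < 0 as well (ascending order)
            if (i > 0 && sorted[i] < 0) {
                smallestGap = Math.min(smallestGap, sorted[i] - sorted[i - 1]);
            }
        }
        return smallestPositive - smallestGap;
    }
}
```

For the six numbers listed, `cutoff` returns -1.0, so both -2.1 and -124.0 fall below it.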

Help appreciated.
Sharp Tool

Roedy Green
Guest
Posts: n/a

 11-06-2005
On Sun, 06 Nov 2005 08:46:17 GMT, "Sharp Tool"
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone
who said :

>what algorithm to use to remove large negative values such as -124.0?
>how to determine a cutoff value that is statistically meaningful?

That is not usually a statistical question but a plausibility
question. If you are scanning data for temperatures of Honolulu you
would look at history, give yourself a safety factor, and chop below
and above a given range.

Readings for human temperatures would have a narrower range unless you
included corpses.
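
A range check of that kind might look roughly like this in Java. The bounds are placeholders that would have to come from knowledge of the data source, and the names `PlausibilityFilter` and `keepWithin` are invented for this sketch.

```java
import java.util.Arrays;

final class PlausibilityFilter {
    /** Keeps only the readings inside the closed range [min, max]. */
    static double[] keepWithin(double[] readings, double min, double max) {
        return Arrays.stream(readings)
                .filter(r -> r >= min && r <= max)
                .toArray();
    }
}
```

With hypothetical bounds of -10.0 and 100.0, the list from the original post would lose only -124.0.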

If your numbers fit a normal, bell-shaped curve, you can compute the
mean and standard deviation. Then you could throw out numbers more
than n standard deviations from the mean.
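
A minimal Java sketch of that rule follows. It assumes the sample (n - 1) standard deviation and a two-sided cut; the names `SigmaFilter`, `sampleStdDev` and `keepWithin` are hypothetical.

```java
import java.util.Arrays;

final class SigmaFilter {
    static double mean(double[] v) {
        return Arrays.stream(v).average().orElse(Double.NaN);
    }

    /** Sample standard deviation (divides by n - 1); needs at least two values. */
    static double sampleStdDev(double[] v) {
        double m = mean(v);
        double sumOfSquares = Arrays.stream(v).map(x -> (x - m) * (x - m)).sum();
        return Math.sqrt(sumOfSquares / (v.length - 1));
    }

    /** Keeps values that lie within n standard deviations of the mean. */
    static double[] keepWithin(double[] v, double n) {
        double m = mean(v);
        double s = sampleStdDev(v);
        return Arrays.stream(v)
                .filter(x -> Math.abs(x - m) <= n * s)
                .toArray();
    }
}
```

On the six numbers from the original post, `keepWithin(data, 2.0)` drops only -124.0, while `keepWithin(data, 3.0)` drops nothing. With the n - 1 standard deviation, no value among six can lie more than 5/sqrt(6), about 2.04, deviations from the mean, so small samples need a small n.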

--
http://mindprod.com Java custom programming, consulting and coaching.

Thomas Hawtin
Guest
Posts: n/a

 11-06-2005
Sharp Tool wrote:
>
> what algorithm to use to remove large negative values such as -124.0?
> how to determine a cutoff value that is statistically meaningful?

This newsgroup probably isn't the best place to find statisticians
(although I guess there are a few).

You could google for "outliers" or similar. "Grubbs' Test for Outliers"
seems like a step in the right direction.

Tom Hawtin
--
Unemployed English Java programmer
http://jroller.com/page/tackline/

SDB
Guest
Posts: n/a

 11-06-2005
"Sharp Tool" <(E-Mail Removed)> wrote in message
news:tpjbf.9940\$(E-Mail Removed)...

: Consider this list of numbers:
:
: 12.0
: 5.0
: 1.0
: -0.1
: -2.1
: -124.0

: what algorithm to use to remove large negative values such as -124.0?
: how to determine a cutoff value that is statistically meaningful?

: So far i have:

: cutoff = smallest positive - smallest difference in negative pairs
: = 1.0 - (2.1 - 0.1)
: = 1.0 - 2.0
: = -1.0

How sophisticated do you need to be? Consider using the absolute value so
you don't need to worry about positive or negative numbers.

If the numbers you gave are just an example and the problem you are trying
to solve is more generic, look at a statistics value called the 'Z-score', also
sometimes called the 'Z-value'. It is computed by subtracting the mean from
the number, then dividing by the standard deviation of the set. You can
throw out values outside a range of Z-scores.

From your set, the standard deviation is 52.15.

The z-score of the second one, 5.0, is about 0.44.
The z-score of the last one, -124.0, is about -2.03.

In stats, the z-Score is your friend.
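
As a rough check of the definition, z = (x - mean) / standard deviation, here is a small Java sketch. It assumes the sample (n - 1) standard deviation, which is the convention that reproduces the 52.15 figure for this set. The class name `ZScores` and its methods are hypothetical.

```java
final class ZScores {
    static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x;
        }
        return sum / v.length;
    }

    /** Sample standard deviation (divides by n - 1); needs at least two values. */
    static double sampleStdDev(double[] v) {
        double m = mean(v);
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += (x - m) * (x - m);
        }
        return Math.sqrt(sumOfSquares / (v.length - 1));
    }

    /** Returns one z-score per input value, in the same order. */
    static double[] of(double[] v) {
        double m = mean(v);
        double s = sampleStdDev(v);
        double[] z = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            z[i] = (v[i] - m) / s; // negative when the value is below the mean
        }
        return z;
    }
}
```

A negative z-score means the value lies below the mean, so a one-sided rule such as "discard when z < -2" would only ever remove low values.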

Sharp Tool
Guest
Posts: n/a

 11-07-2005

> Sharp Tool wrote:
> >
> > what algorithm to use to remove large negative values such as -124.0?
> > how to determine a cutoff value that is statistically meaningful?

>
> This newsgroup probably isn't the best place to find statisticians
> (although I guess there are a few).
>
> You could google for "outliers" or similar. "Grubbs' Test for Outliers"
> seems like a step in the right direction.
>
> Tom Hawtin

Grubbs' test is only suitable for data that has a normal distribution; mine
does not.

Cheers
Sharp

Sharp Tool
Guest
Posts: n/a

 11-07-2005

"SDB" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> "Sharp Tool" <(E-Mail Removed)> wrote in message
> news:tpjbf.9940\$(E-Mail Removed)...
>
> : Consider this list of numbers:
> :
> : 12.0
> : 5.0
> : 1.0
> : -0.1
> : -2.1
> : -124.0
>
> : what algorithm to use to remove large negative values such as -124.0?
> : how to determine a cutoff value that is statistically meaningful?
>
> : So far i have:
>
> : cutoff = smallest positive - smallest difference in negative pairs
> : = 1.0 - (2.1 - 0.1)
> : = 1.0 - 2.0
> : = -1.0
>
> How sophisticated do you need to be? Consider using the absolute value so
> you don't need to worry about positive or negative numbers.
>
> If the numbers you gave are just an example and the problem you are trying
> to solve is more generic, look at a statistics value called the 'Z-score', also
> sometimes called the 'Z-value'. It is computed by subtracting the mean from
> the number, then dividing by the standard deviation of the set. You can
> throw out values outside a range of Z-scores.
>
> From your set, the standard deviation is 52.15.
>
> The z-score of the second one, 5.0, is about 0.44.
> The z-score of the last one, -124.0, is about -2.03.
>
> In stats, the z-Score is your friend.

My data does not fit a normal distribution.
I do not want to eliminate any positive values.
I only want to eliminate large negative values.
Z-scores work only with absolute values.
So what's the best way to go now? I'm not a statistician.

Cheers
Sharp Tool

Roedy Green
Guest
Posts: n/a

 11-07-2005
On Mon, 07 Nov 2005 08:42:24 GMT, "Sharp Tool"
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone
who said :

>My data does not fit a normal distribution.
>I do not want to eliminate any positive values.
>I only want to eliminate large negative values.
>Z-scores work only with absolute values.
>So what's the best way to go now? I'm not a statistician.

What distribution do they conform to?
--
http://mindprod.com Java custom programming, consulting and coaching.

Andrew Thompson
Guest
Posts: n/a

 11-07-2005
Sharp Tool wrote:

> My data does not fit a normal distribution.

What distribution/pattern/logic does it fit, because..

> I only want to eliminate large negative values.

...knowing that will get you a lot closer to defining
(pinning down, and putting a value to) 'large'.

Beyond the hypothetical though, does this describe
an actual problem, or is it purely a mental exercise?

Sharp Tool
Guest
Posts: n/a

 11-07-2005
> Sharp Tool wrote:
>
> > My data does not fit a normal distribution.

>
> What distribution/pattern/logic does it fit, because..
>
> > I only want to eliminate large negative values.

>
> ...knowing that will get you a lot closer to defining
> (pinning down, and putting a value to) 'large'.

A large value is one that is an obvious outlier.
I only want to eliminate large negative values.
By eye-balling the list of numbers, you can see that -124.0
doesn't 'fit in'. I am wondering if there is a statistical method for this.
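
One distribution-free candidate, not suggested by anyone in this thread and offered only as a sketch, is Tukey's lower fence: discard values below Q1 - 1.5 * IQR and leave the upper end alone. The quartile convention (linear interpolation between order statistics), the 1.5 multiplier and the names `LowerFence`, `quantile`, `lowerFence` and `removeLowOutliers` are all assumptions of this example.

```java
import java.util.Arrays;

final class LowerFence {
    /** Quantile by linear interpolation; expects an ascending, non-empty array. */
    static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = Math.min(lo + 1, sorted.length - 1);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    /** Q1 - 1.5 * IQR, computed from a sorted copy of the input. */
    static double lowerFence(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double q1 = quantile(sorted, 0.25);
        double q3 = quantile(sorted, 0.75);
        return q1 - 1.5 * (q3 - q1);
    }

    /** Drops only values below the lower fence; high values are never removed. */
    static double[] removeLowOutliers(double[] values) {
        double fence = lowerFence(values);
        return Arrays.stream(values).filter(x -> x >= fence).toArray();
    }
}
```

For the six numbers in the original post this gives Q1 = -1.6, Q3 = 4.0 and a fence of -10.0, so -124.0 is dropped while -2.1 and all positive values stay.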

> Beyond the hypothetical though, does this describe
> an actual problem, or is it purely a mental exercise?

Mental exercise, but I think it could be useful for removing
negative outliers.

Sharp Tool

Sharp Tool
Guest
Posts: n/a

 11-07-2005
> <(E-Mail Removed)> wrote, quoted or indirectly quoted someone
> who said :
>
> >My data does not fit a normal distribution.
> >I do not want to eliminate any positive values.
> >I only want to eliminate large negative values.
> >Z-scores work only with absolute values.
> >So what's the best way to go now? I'm not a statistician.

>
> What distribution do they conform to?

Random, I believe.

Sharp Tool