Velocity Reviews > standard deviation

# standard deviation

Bill Cunningham

 06-05-2011
I have some code here and a snippet of unfinished, untested code which
is an attempt at a function called stddev. This is of course meant to
calculate a standard deviation. I am trying to build small helper functions
that can be built into tech analysis tools. Something I've been attempting
and thinking about for a long time. stddev's first parameter is passed the
return value of the function mean(). It may not need a second parameter but
this is what I have so far. stddev needs to do the following things.
1) find the difference of each price from the mean, whether negative or
positive.
2) square those numbers
3) sum those squares
4) calculate the square root of the total from 3 above.

tech.h

#include <stdio.h>
#include <stdlib.h>
#ifdef M
#include <math.h>
#endif

double mean(double *, int);
double stddev(double, double *);

mean.c

#include "tech.h"

double mean(double *avg, int num)
{
    double sum, average;
    int i;
    sum = average = 0;
    for (i = 0; i < num; ++i) {
        sum = sum + avg[i];
        average = sum / num;
    }
    return average;
}

stddev.c /*the attempt*/

#include "tech.h"

double stddev(double mean, double *prices)
{
    double price = 0.0;
    int i = 0;
    for (; i < prices; ++i) {
        if (prices[i] > mean) {
            price = prices - mean;
            return prices;
        } else if (prices[i] < mean) {
            price = mean - prices;
            return prices;
        }

I really have no way to code this but I don't want anyone to do my
homework. Can someone offer tips or citations as to what I might need to do
here?

Bill

Lew Pitcher

 06-05-2011
On June 5, 2011 14:25, in comp.lang.c, (E-Mail Removed) wrote:

> I have some code here and a snippet of unfinished, untested code which
> is an attempt at a function called stddev. This is of course meant to
> calculate a standard deviation.

[snip]
> double stddev(double mean, double *prices)
> {
> double price = 0.0;
> int i = 0;
> for (; i < prices; ++i) {
> if (prices[i] > mean) {
> price = prices - mean;
> return prices;
> } else if (prices[i] < mean) {
> price = mean - prices;
> return prices;
> }
>
> I really have no way to code this but I don't want anyone to do my
> homework. Can someone offer tips or citations as to what I might need to
> do here?

Sorry, Bill, but your code doesn't really reflect the accepted way that you
calculate standard deviation. I'm not mathematician enough to tell whether
you've written equivalent code or not, so I'll just assume that your code
isn't correct, and move on.

I suggest that you read the first few paragraphs of the Wikipedia article on
Standard Deviation, especially start of the "Basic Examples" section
(http://en.wikipedia.org/wiki/Standar...Basic_examples)
There, you'll find an excellent algorithm for calculating standard deviation
that is easily transformable into C code.

Let me summarize their algorithm:
1) Compute the mean of the population
2) For each element of the population,
2a) compute the difference between the element and the mean
2b) square this value, giving the "squared deviation"
3) Find the mean of the squared deviations (sum them, then divide by the
number of elements); this mean is the "variance"
4) Compute the square root of the variance

This square root is the "standard deviation"

--
Lew Pitcher
Master Codewright & JOAT-in-training | Registered Linux User #112576
Me: http://pitcher.digitalfreehold.ca/ | Just Linux: http://justlinux.ca/
---------- Slackware - Because I know what I'm doing. ------

Kleuskes & Moos

 06-05-2011
On Jun 5, 8:25 pm, "Bill Cunningham" <(E-Mail Removed)> wrote:
> I have some code here and a snippet of unfinished, untested code which
> is an attempt at a function called stddev. This is of course meant to
> calculate a standard deviation.
>
> [snip]

Erwin Kreyszig has a pretty good rundown in 'Introduction to
mathematical statistics, principles and methods', sections 3.2 and 3.3.
It used to be pretty standard when I was in college, so I guess it
should still be available in the library.

Lew Pitcher

 06-05-2011
On June 5, 2011 14:57, in comp.lang.c, (E-Mail Removed) wrote:

> On June 5, 2011 14:25, in comp.lang.c, (E-Mail Removed) wrote:
>
>> I have some code here and a snippet of unfinished, untested code
>> which
>> is an attempt at a function called stddev. This is of course meant to
>> calculate a standard deviation.

> [snip]

FWIW, from the algorithm and data given on the Wikipedia page, I coded this:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double StdDev(unsigned int samplesize, double population[])
{
    double sum, mean;
    unsigned int index;

    if (samplesize == 0) return 0.0;  /* catch obvious error */

    /* compute mean of sample population */
    for (index = 0, sum = 0.0; index < samplesize; ++index)
        sum += population[index];
    mean = sum / samplesize;

    /* sum the squared deviations from the mean */
    for (index = 0, sum = 0.0; index < samplesize; ++index)
    {
        double delta;

        delta = population[index] - mean;
        sum += (delta * delta);
    }
    return sqrt(sum / samplesize);  /* standard deviation */
}

/*
** Population values taken from the Wikipedia example
*/
int main(void)
{
    double pop[] = {2,4,4,4,5,5,7,9};
    unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));

    printf("The standard deviation = %f\n", StdDev(popsize, pop));

    return EXIT_SUCCESS;
}

When I compile and run this code
$ cc -lm -o stddev stddev.c
$ stddev
The standard deviation = 2.000000
$
I get the same Standard Deviation value as the Wikipedia article's example

HTH

Bill Cunningham

 06-05-2011
Lew Pitcher wrote:

[snip]

> delta = population[index] - mean;
> sum += (delta * delta);
Is this really saying sum=sum+(delta*delta); ?
And are the parentheses for precedence?

> [snip]
> double pop[] = {2,4,4,4,5,5,7,9};
> unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));

Is the above code the standard thing to use if you have an array and
really don't want to count the number of elements? Using sizeof?


Lew Pitcher

 06-05-2011
On June 5, 2011 17:02, in comp.lang.c, (E-Mail Removed) wrote:

> Lew Pitcher wrote:
>
> [snip]
>
>> delta = population[index] - mean;
>> sum += (delta * delta);

>
> Is this really saying sum=sum+(delta*delta);

Yes

> And are the parentheses for precedence?

Not really. The parentheses here are a visual cue to the programmer. They
are unnecessary for the logic: * binds more tightly than +=, so the
expression would compute the same without the parentheses.
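A quick check of that answer: in `sum += expr` the whole right-hand side is evaluated first, so `sum += delta * delta`, `sum += (delta * delta)` and `sum = sum + delta * delta` all do the same thing. The same rule means `x *= a + b` is `x = x * (a + b)`, not `x = x * a + b`. The helper names below are made up purely for illustration.

```c
/* Three spellings of the same accumulation step. */
double acc_plain(double sum, double delta)  { sum += delta * delta;      return sum; }
double acc_parens(double sum, double delta) { sum += (delta * delta);    return sum; }
double acc_long(double sum, double delta)   { sum = sum + delta * delta; return sum; }

/* With *=, the right-hand side is still one unit:
 * x *= a + b  means  x = x * (a + b). */
double scale(double x, double a, double b)  { x *= a + b; return x; }
```

For example, `scale(2.0, 3.0, 4.0)` gives 14.0, not 10.0.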

>> [snip]
>> double pop[] = {2,4,4,4,5,5,7,9};
>> unsigned int popsize = (sizeof(pop) / sizeof(pop[0]));

>
> Is the above code the standard thing to use if you have an array and
> really don't want to count the number of elements? Using sizeof?

(sizeof(array) / sizeof(array[0])) is a fairly standard way to determine the
number of elements in an array. You could call it an idiom.
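One caveat worth knowing: the idiom only works on a real array object that is in scope. A function parameter written as `double population[]` (as in StdDev()) is adjusted to `double *population`, so `sizeof` inside the function gives the size of a pointer. That is why StdDev() takes the count as a separate argument. A small sketch; the macro name `ARRAY_SIZE` and the helper functions are illustrative choices, not a standard API.

```c
#include <stddef.h>

/* Element count of a true array (not a pointer). */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static double pop[] = {2, 4, 4, 4, 5, 5, 7, 9};

size_t pop_count(void)
{
    return ARRAY_SIZE(pop);          /* pop is an array here: 8 elements */
}

size_t param_bytes(double *population)
{
    /* A parameter declared as double population[] is adjusted to
     * double *population, so sizeof yields the pointer's size,
     * not the size of the caller's array. */
    return sizeof population;
}
```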


Fred

 06-06-2011
On Jun 5, 12:22 pm, Lew Pitcher <(E-Mail Removed)> wrote:
> [snip]
> FWIW, from the algorithm and data given on the Wikipedia page, I coded this
> [snip]
> I get the same Standard Deviation value as the Wikipedia article's example

The above algorithm, while mathematically correct, is not good enough
for a computer. If your population is very large, or the individual
items in the population vary greatly in magnitude, you may run into
severe truncation and roundoff errors.

A more accurate way is to include the LeVeque computational correction,
computing the variance as:

var = { sum[(x[i] - mean)^2] - (1/n)*sum[(x[i] - mean) } / (n-1)
then stddev = sqrt(var)

Note that computing the mean is not really as simple as summing the
items and dividing by the number of items. What happens on a 32-bit
machine if the first item is of magnitude 10^18, followed by 10^20
items that are of magnitude 1? None of the latter items will change
the running sum, so the computed mean will be orders of magnitude in error.
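The effect Fred describes can be reproduced on a smaller scale. With IEEE 754 doubles (53-bit significand), adding 1.0 to 1e16 leaves 1e16 unchanged, so a naive loop silently drops every small item. The values 1e16 and 1000 ones in the usage below are a scaled-down stand-in chosen for illustration. One common remedy, swapped in here and not the one Fred names, is Kahan (compensated) summation, which carries the lost low-order part into the next addition. This sketch assumes ordinary IEEE double arithmetic and no `-ffast-math`, which may optimize the compensation away.

```c
#include <stddef.h>

/* Plain left-to-right sum. */
double naive_sum(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Kahan compensated sum: c holds the (negated) rounding error of the
 * previous addition and is fed back into the next term. */
double kahan_sum(const double *x, size_t n)
{
    double s = 0.0, c = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double y = x[i] - c;      /* correct the incoming term */
        double t = s + y;         /* low-order bits of y may be lost here */
        c = (t - s) - y;          /* recover what was lost */
        s = t;
    }
    return s;
}
```

With `x[0] = 1e16` followed by 1000 values of 1.0, `naive_sum` returns exactly 1e16 while `kahan_sum` returns 1e16 + 1000. Dividing either by the count gives the mean, so the naive mean inherits the same error.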
--
Fred K

Fred

 06-06-2011
On Jun 6, 7:39 am, Fred <(E-Mail Removed)> wrote:
> [snip]
>
> A more accurate way is to include the Leveque computational
> correction,
> computing the variance as:
>
> var = { sum[(x[i] - mean)^2] - (1/n)*sum[(x[i] - mean) } / (n-1)
> then stddev = sqrt(var)

Oops, missing a square. The variance with LeVeque correction is

{ sum[(x[i] - mean)^2] - (1/n)* sum[(x[i] - mean)]^2 } / (n-1)

i.e., in the first term you sum the squares of x[i]-mean,
and in the second term you square the sum of x[i]-mean

See the Stanford Computer Science report by Chan, Golub, and LeVeque


Nobody

 06-06-2011
On Mon, 06 Jun 2011 07:39:47 -0700, Fred wrote:

> The above algorithm, while mathematically correct, is not good enough
> for a computer. If your population is very large, or the individual
> items in the population vary greatly in magnitude, you may run into
> severe truncation and roundoff errors.
>
> A more accurate way is to include the Leveque computational
> correction,

While that may be true, it's a minor detail given the amount of software
I've seen which uses the single-pass algorithm:

var := (sum(x[i]^2) - sum(x[i])^2/n)/n

This can be rather inaccurate if the standard deviation is small compared
to the mean (i.e. the data has a relatively large constant offset).
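This is easy to demonstrate. The sketch below (function names are illustrative) compares the single-pass formula with a straightforward two-pass one on four values sharing a constant offset of 1e9. The exact population variance is 22.5. In double precision the squares are near 1e18, where adjacent doubles are 128 apart, so the single-pass result is swamped by rounding error in the two nearly equal terms it subtracts.

```c
#include <stddef.h>

/* Single-pass: var = (sum(x^2) - sum(x)^2 / n) / n   (n > 0 assumed) */
double var_single_pass(const double *x, size_t n)
{
    double s = 0.0, s2 = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += x[i];
        s2 += x[i] * x[i];
    }
    return (s2 - s * s / (double)n) / (double)n;
}

/* Two-pass: subtract the mean first, then average the squares. */
double var_two_pass(const double *x, size_t n)
{
    double s = 0.0, m, acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    m = s / (double)n;
    for (size_t i = 0; i < n; ++i)
        acc += (x[i] - m) * (x[i] - m);
    return acc / (double)n;
}
```

With `x = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16}`, the two-pass version returns 22.5 exactly, while the single-pass version cannot: with IEEE doubles its numerator ends up a multiple of 512, so the result is a multiple of 128.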

Dann Corbit

 06-07-2011
In article <46b643da-c836-45ce-9348-fc8758139b33
{snip}
The Welford method cited here is quite good numerically. It is the
method that I use:
http://en.wikipedia.org/wiki/Algorit...ating_variance
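Welford's method updates the mean and a running sum of squared deviations one value at a time. It needs only a single pass, like the formula Nobody quoted, but is much less prone to that formula's cancellation. A minimal sketch; the struct and function names are illustrative, and `welford_var_population` divides by n (divide `m2` by n - 1 instead for the sample variance).

```c
#include <stddef.h>

/* Running state for Welford's online algorithm. */
struct welford {
    size_t n;      /* values seen so far */
    double mean;   /* running mean */
    double m2;     /* running sum of squared deviations from the mean */
};

void welford_init(struct welford *w)
{
    w->n = 0;
    w->mean = 0.0;
    w->m2 = 0.0;
}

void welford_add(struct welford *w, double x)
{
    double delta = x - w->mean;          /* deviation from the old mean */
    w->n += 1;
    w->mean += delta / (double)w->n;     /* update the mean */
    w->m2 += delta * (x - w->mean);      /* old deviation times new deviation */
}

double welford_var_population(const struct welford *w)
{
    return w->n > 0 ? w->m2 / (double)w->n : 0.0;
}
```

Feeding in the Wikipedia data {2,4,4,4,5,5,7,9} one value at a time gives a mean of 5 and a population variance of 4 (standard deviation 2), matching Lew's result without storing the data.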