Velocity Reviews > Perl > calculate an average with every data in an array

# calculate an average with every data in an array

Erol Akman
Guest
Posts: n/a

 03-04-2009
Hi,

I have following (crazy) task and was hoping to solve it via perl, but

I have an array of values and need to calculate every possible average
and sort it by the average data:

@array = qw(4 6 2 7 1 8 3 5 9 10);

This should be printed out:
4+2 = 6, Average =3
4+6+2 = 12, Average=4
4+6 = 10, Average=5
....
10+4+6+2+3 = 25, Average=5

also nice to have, but not must have: I need to see which scalar perl
used to sum up, maybe "4+6+2 = 12, Average=4, scalar 0,1,3 were summed
up"

I know, its nuts, but I really need this.

Background: I have lots and lots of data like 23.33, 12.25, 84.56 and
so on and hundreds of values like 54, 32, 45. I don't know which and
how many of these values where summed up to get the data above. My
idea was to calculate every possible average to find out which values
where summed up and averaged out.

Is this possible? Can you help me?

Best regards
Erol

Rasmus Villemoes
Guest
Posts: n/a

 03-04-2009
Erol Akman <(E-Mail Removed)> writes:

> Background: I have lots and lots of data like 23.33, 12.25, 84.56 and
> so on and hundreds of values like 54, 32, 45. I don't know which and
> how many of these values where summed up to get the data above. My
> idea was to calculate every possible average to find out which values
> where summed up and averaged out.
>
> Is this possible? Can you help me?

If I understand the problem, you have, say, 100 integers a_1, a_2,
...., a_100 (your 54, 32, 45,... for instance). Some mean person has
taken some of these integers, computed the average and told you this
average (eg. 23.33). Your job is to find the integers which were used
to arrive at this average. Then you are given another average (like
12.25) and must do the same, etc. Is this correct? (If not, ignore the
rest.)

You do realize that the number of subsets of your "hundreds of values"
is larger than the expected lifetime of the solar system in
femtoseconds? So even calculating the average of one subset every
1E-15 seconds wouldn't complete the task in time for it to be
useful. So in theory your approach is possible (the problem is finite,
and an exhaustive search on all subsets would certainly do the job),
but in reality it is only applicable for very small sets of integers.

Maybe a quantum computer could do it, but I don't know if perl is
ported to that platfrom yet

--
Rasmus Villemoes
<http://rasmusvillemoes.dk/>

Edwards
Guest
Posts: n/a

 03-04-2009
On 2009-03-04, Erol Akman <(E-Mail Removed)> wrote:
> I have an array of values and need to calculate every possible average
> and sort it by the average data:
>
> @array = qw(4 6 2 7 1 8 3 5 9 10);
>
>
> This should be printed out:
> 4+2 = 6, Average =3
> 4+6+2 = 12, Average=4
> 4+6 = 10, Average=5
> ...
> 10+4+6+2+3 = 25, Average=5
>
> also nice to have, but not must have: I need to see which scalar perl
> used to sum up, maybe "4+6+2 = 12, Average=4, scalar 0,1,3 were summed
> up"
>
> I know, its nuts, but I really need this.
>
> Background: I have lots and lots of data like 23.33, 12.25, 84.56 and
> so on and hundreds of values like 54, 32, 45. I don't know which and
> how many of these values where summed up to get the data above. My
> idea was to calculate every possible average to find out which values
> where summed up and averaged out.
>
> Is this possible? Can you help me?

Coding-wise this seems fairly straightforward, but I see a couple of
potential problems given the background you provide above. The first,
which you've probably already realized, is that since there are
different ways of summing to a given number, the resulting averages
aren't going to be unique. In your example problem, for instance, I
get several dozen ways (75, actually) of obtaining an average of "6";
and while "unique" solutions aren't exactly rare (10 is only the
average of 10; 9.5 of "9 10", 8.6667 of "7 9 10" etc), they are
definitely in the minority.

The other is that since you have to look at combinations of elements
where the number of elements in the combination is also arbitrary, the
search space is over what I believe is called the "power set" of your
set of values. In your example, there are 10 values, so the number of
possible lists made from sets of these values is 2**10... well,
(2**10)-1, since of course we don't count the empty set. That ran
pretty quickly on my machine, but you mention "hundreds of values"
above and 2**100 is what, 10**30 or so? Which, by itself, would
merely be "nuts" as you say; but in light of the first problem, I am
betting that most of the time a given average is going to have an
incredibly huge number of "lists" associated with it, and the averages
with only a single associated list (ie, a unique list of numbers out
of your set of possible values that has the given average) are going
to be quite rare.

Sorry to sound so negative, but I guess I am trying to say that you
might want to try to figure out how long this would take (eg benchmark
starting with a small subset of your data, ten or so as in your
example problem, then gradually increase the subset size) relative to
how badly you really need it...

--
Darrin

Michael Austin
Guest
Posts: n/a

 03-04-2009
Erol Akman wrote:
> Hi,
>
> I have following (crazy) task and was hoping to solve it via perl, but
>
> I have an array of values and need to calculate every possible average
> and sort it by the average data:
>
> @array = qw(4 6 2 7 1 8 3 5 9 10);
>
>
> This should be printed out:
> 4+2 = 6, Average =3
> 4+6+2 = 12, Average=4
> 4+6 = 10, Average=5
> ...
> 10+4+6+2+3 = 25, Average=5
>
> also nice to have, but not must have: I need to see which scalar perl
> used to sum up, maybe "4+6+2 = 12, Average=4, scalar 0,1,3 were summed
> up"
>
> I know, its nuts, but I really need this.
>
> Background: I have lots and lots of data like 23.33, 12.25, 84.56 and
> so on and hundreds of values like 54, 32, 45. I don't know which and
> how many of these values where summed up to get the data above. My
> idea was to calculate every possible average to find out which values
> where summed up and averaged out.
>
> Is this possible? Can you help me?
>
> Best regards
> Erol

possible? yes.
practical? NO.
useful? I doubt it.
getting the CORRECT permutation? throw a dart.

Robert Billing
Guest
Posts: n/a

 03-05-2009
Erol Akman wrote:

> Background: I have lots and lots of data like 23.33, 12.25, 84.56 and
> so on and hundreds of values like 54, 32, 45. I don't know which and
> how many of these values where summed up to get the data above. My
> idea was to calculate every possible average to find out which values
> where summed up and averaged out.
>
> Is this possible? Can you help me?

At first sight this does not look possible because the brute force
attack, simply trying all the possibilities, requires something like
10^30 iterations. However it does remind me of a digital filter
optimisation problem that two of us did solve over one long summer with
a VAX and a Transputer array (remember those?). The final version could
find optimal integer settings for a very large filter in about 4 hours
of solid bash on the array.

The trick was to make a first guess, then decide which direction would
take us closer to what we wanted, effectively deciding which way was
uphill. Having found a hilltop we then tried making a grid of jumps
around the neighbourhood to see if we could find a higher hill, then
climb that, and so on until we reached a point where every change made
the filter worse.

It seems to me that every entry in your raw data is either above or
below the value you want, so you need to find a selection of above and
below values which pull the average in both directions with equal
weight. Some sort of "put this in and take that out" iteration might
well converge in a reasonable time.

Rasmus Villemoes
Guest
Posts: n/a

 03-05-2009
Robert Billing <(E-Mail Removed)> writes:

> Erol Akman wrote:
>
>> Background: I have lots and lots of data like 23.33, 12.25, 84.56 and
>> so on and hundreds of values like 54, 32, 45. I don't know which and

>
> It seems to me that every entry in your raw data is either above or
> below the value you want, so you need to find a selection of above and
> below values which pull the average in both directions with equal
> weight. Some sort of "put this in and take that out" iteration might
> well converge in a reasonable time.

Yes, the OP definitely must use some fancier algorithm than brute
force to achieve his goal. An obvious observation is that the average
of some integers is a rational number. 23.33 looks suspicially like it
is really 23 + 1/3 = 70/3, so one could limit the search to (3
integers summing to 70) or (6 integers summing to 140) or (9 integers
summing to 210) or ... I don't know if there are standard algorithms
to search for solutions for each problem in ().

This approach of course assumes he knows the floating point data with
sufficient precision to make an educated guess at the true rational
representation (is 84.56 really 84 + 5/9, etc?).

Also, it is still unclear if he wants all subsets with a given average
or just one.

--
Rasmus Villemoes
<http://rasmusvillemoes.dk/>

Dr.Ruud
Guest
Posts: n/a

 03-07-2009
Erol Akman wrote:

> I have following (crazy) task and was hoping to solve it via perl, but
>
> I have an array of values and need to calculate every possible average
> and sort it by the average data:
>
> @array = qw(4 6 2 7 1 8 3 5 9 10);
>
>
> This should be printed out:
> 4+2 = 6, Average =3
> 4+6+2 = 12, Average=4
> 4+6 = 10, Average=5
> ...
> 10+4+6+2+3 = 25, Average=5
>
> also nice to have, but not must have: I need to see which scalar perl
> used to sum up, maybe "4+6+2 = 12, Average=4, scalar 0,1,3 were summed
> up"
>
> I know, its nuts, but I really need this.
>
> Background: I have lots and lots of data like 23.33, 12.25, 84.56 and
> so on and hundreds of values like 54, 32, 45. I don't know which and
> how many of these values where summed up to get the data above. My
> idea was to calculate every possible average to find out which values
> where summed up and averaged out.
>
> Is this possible? Can you help me?

It seems to me that you have a goal value, like when finding out which
invoices relate to a single payment.
--
Ruud

Erol Akman
Guest
Posts: n/a

 03-09-2009
Thank you very much for your thoughts and efforts. It made me realize
how nuts this task really is

Taken into account that I could make a few educated guesses as Rasmus
Villemoes suggested, I would not have "hundreds of values" but 20 or
even less.

You've said, that you already ran a calculation on your computer with
10 values (?), how did you do that? Could you provide me your code? I
just know the basics:

#! /usr/bin/perl
# calculate an average

use strict;
use warnings;

my @array = qw(1 2 3 4 5 6 7 8 9 10 11);
my \$sum = 0;
\$sum += \$_ for(@array);
my \$avg = \$sum / scalar(@array);
print "average: ",\$avg,"\n";
print "Values that have been used:","\n";
print join("\n",@array), "\n";
print "Number of values: ", scalar(@array), "\n";

Erol

Edwards
Guest
Posts: n/a

 03-09-2009
Not sure if this was directed at me; if not, sorry, please ignore.

On 2009-03-09, Erol Akman <(E-Mail Removed)> wrote:
> Thank you very much for your thoughts and efforts. It made me realize
> how nuts this task really is
>
> Taken into account that I could make a few educated guesses as Rasmus
> Villemoes suggested, I would not have "hundreds of values" but 20 or
> even less.
>
> You've said, that you already ran a calculation on your computer with
> 10 values (?), how did you do that? Could you provide me your code?

Well, I was a bit afraid to do that because it's so ugly; keep in mind
this was just a quick hack that I threw together based only on your
initial description of the problem. The output isn't really formatted
the way you requested, but it should be pretty self-explanatory (the
number in parentheses after a given average is how many lists of
values had that average). I've reformatted it a bit (it was originally

trurl:~>perl -e 'my @data = qw(4 6 2 7 1 8 3 5 9 10);
my %avg_list;
for my \$iter (1..2**@data-1) {
# There must be a better way to get the bit flags from each number,
# but I couldn't figure out how to get unpack to do what I wanted...
my @chosen = split "", sprintf("%0" . @data . "b", \$iter);
my @indices = grep(\$chosen[\$_], (0..@data-1));
# The eval was just a silly way to do the average in one line; you're
# probably better off doing it the straightforward way (as you posted)
my \$this_avg = (eval join("+", @data[@indices]))/@indices;
push @{\$avg_list{\$this_avg}}, [@data[@indices]];
}

for my \$this_avg (sort {\$a <=> \$b} keys (%avg_list)) {
print "\$this_avg (", scalar(@{\$avg_list{\$this_avg}}), "):\n";
for my \$list (@{\$avg_list{\$this_avg}}) {
print "\t@{\$list}\n";
}
}' | more

Please consider this code "not really tested in any serious way".

--
Darrin

Erol Akman
Guest
Posts: n/a

 03-25-2009
Hi Darrin,

and made realize, that my problem is hard to solve.

Thx

-----
Erol