Hi Will,
> hi,
>
> can i please have some pointers to calculate the 'spamicity'?
>
> i've read a couple of websites, but most of them are using it in a
> programming language way, rather than a formulae.
>
> eg this from www.paulgraham.com
>
> (let ((g (* 2 (or (gethash word good) 0)))
> (b (or (gethash word bad) 0)))
> (unless (< (+ g b) 5)
> (max .01
> (min .99 (float (/ (min 1 (/ b nbad))
> (+ (min 1 (/ g ngood))
> (min 1 (/ b nbad)))))))))
>
>
> what does it mean in simple english?
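Will, that is an extract of Common Lisp. (gethash word good) looks up the
word in the hash table of good words and returns the count stored for it. If
the word is not found, gethash returns NIL, so the OR falls through to its
second argument and yields 0. The (* 2 ...) around it then doubles the good
count. Ditto for (gethash word bad), except the word is looked up in the
table of bad words and the count is not doubled.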
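If Python is easier to read, the "look it up, or use 0" idiom can be sketched
with a dictionary. This is only an illustration: the table contents and the
names good, count and missing are made up for the example.

```python
# Hypothetical table of word counts taken from good (non-spam) mail.
good = {"hello": 3}

# Rough equivalent of (or (gethash word good) 0):
# return the stored count, or 0 when the word is absent.
count = good.get("hello", 0)
missing = good.get("viagra", 0)

print(count, missing)
```

dict.get with a default plays the role of OR here: the second value is used
only when the lookup finds nothing.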
UNLESS evaluates its body only when the test is false. Here, if g + b is
less than 5, nothing is computed and the result is NIL.
MAX and MIN find the maximum or minimum of a set of numbers.
FLOAT converts a number into a floating point representation.
/ is division.
+ is addition.
* is multiplication.
Everything is in prefix format, e.g. (+ 1 (* 2 3)) = (+ 1 6) = 7. It's like
function calls where the opening bracket is before the function name instead
of directly after the function name, e.g. sqrt(x) in many other languages
would be written as (sqrt x) in Lisp.
This pseudocode translation may help:
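let g = 2 * (count of the word in the good table, or 0 if absent)
    b = count of the word in the bad table, or 0 if absent
unless (g + b) < 5
    max(0.01,
        min(0.99,
            float( min(1, b/nbad) / (min(1, g/ngood) + min(1, b/nbad)) )))

In Graham's essay, ngood and nbad are the numbers of good and bad messages.
In simple English: take the word's rate in bad mail, divide it by the sum of
its (doubled) rate in good mail and its rate in bad mail, and clamp the
result so it is never below 0.01 or above 0.99. Words seen fewer than 5
times in total (after doubling the good count) are skipped.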
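
For comparison, here is a minimal Python sketch of the same calculation. It
is my own translation rather than anything from the essay: the function name
spamicity and the sample counts are made up, and it assumes good and bad are
dictionaries of word counts with ngood and nbad both greater than zero.

```python
def spamicity(word, good, bad, ngood, nbad):
    """Return the clamped spam probability for word, or None if too rare."""
    g = 2 * good.get(word, 0)   # good count is doubled, as in the Lisp
    b = bad.get(word, 0)
    if g + b < 5:               # UNLESS (< (+ g b) 5): skip rare words
        return None
    bad_ratio = min(1.0, b / nbad)
    good_ratio = min(1.0, g / ngood)
    # Clamp the ratio to the range [0.01, 0.99].
    return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))


# Made-up counts, purely for illustration.
good = {"meeting": 10}
bad = {"viagra": 20, "meeting": 1}

print(spamicity("viagra", good, bad, 100, 100))   # only seen in bad mail
print(spamicity("meeting", good, bad, 100, 100))  # mostly seen in good mail
print(spamicity("rare", good, bad, 100, 100))     # never seen: None
```

With these numbers "viagra" hits the 0.99 ceiling, "meeting" comes out a
little under 0.05, and "rare" is skipped.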
Regards,
Adam