Velocity Reviews - Computer Hardware Reviews

Linear regression in NumPy

 
 
nikie
      03-17-2006
I'm a little bit stuck with NumPy here, and neither the docs nor
trial and error seem to lead me anywhere:
I've got a set of data points (x/y-coordinates) and want to fit a
straight line through them using least-squares (LMSE) linear regression.
Simple enough. Instead of looking up the formulas, I thought I'd just see
if there's a NumPy function that does exactly this. What I found was
"linear_least_squares", but I can't figure out what kind of parameters
it expects: I tried passing it my array of X-coordinates and my array
of Y-coordinates, but it complains that the first parameter should be
two-dimensional. But my data is 1-D. I guess I could pack the X/Y
coordinates into one 2-D array, but then what do I do with the second
parameter?

More generally: is there any kind of documentation that tells me what
the functions in NumPy do, what parameters they expect, how to call
them, etc.? All I found was:
"This function returns the least-squares solution of an overdetermined
system of linear equations. An optional third argument indicates the
cutoff for the range of singular values (defaults to 10^-10). There are
four return values: the least-squares solution itself, the sum of the
squared residuals (i.e. the quantity minimized by the solution), the
rank of the matrix a, and the singular values of a in descending
order."
It doesn't even mention what the parameters "a" and "b" are for...

 
Robert Kern
      03-17-2006
Look at the docstring. (Note: I am using the current version of numpy from SVN;
you may be using an older version of Numeric: http://numeric.scipy.org/)

In [171]: numpy.linalg.lstsq?
Type: function
Base Class: <type 'function'>
String Form: <function linear_least_squares at 0x1677630>
Namespace: Interactive
File:
/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy-0.9.6.2148-py2.4-macosx-10.4-ppc.egg/numpy/linalg/linalg.py
Definition: numpy.linalg.lstsq(a, b, rcond=1e-10)
Docstring:
returns x,resids,rank,s
where x minimizes 2-norm(|b - Ax|)
resids is the sum square residuals
rank is the rank of A
s is the singular values of A in descending order

If b is a matrix then x is also a matrix with corresponding columns.
If the rank of A is less than the number of columns of A or greater than
the number of rows, then residuals will be returned as an empty array
otherwise resids = sum((b-dot(A,x))**2).
Singular values less than s[0]*rcond are treated as zero.
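The four return values described in the docstring can be checked directly. This is a minimal sketch assuming a current NumPy release, where passing `rcond=None` opts into the newer default cutoff (machine precision scaled by the matrix size) and avoids the FutureWarning some 1.x releases emit when `rcond` is omitted. The random 6x2 system is made up purely for illustration:

```python
import numpy as np

# Made-up overdetermined system: 6 equations, 2 unknowns.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))
b = rng.normal(size=6)

# x minimizes the 2-norm of (b - A @ x)
x, resids, rank, s = np.linalg.lstsq(A, b, rcond=None)

# A has more rows than columns and full column rank, so resids is a
# length-1 array holding the sum of squared residuals.
manual_resids = np.sum((b - A @ x) ** 2)

print("x =", x)
print("resids =", resids, "manual =", manual_resids)
print("rank =", rank)
print("singular values (descending) =", s)
```

So "a" is the coefficient matrix A and "b" is the right-hand side; the solver picks x to make A @ x as close to b as possible.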

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

 
marek.rocki@wp.pl
      03-17-2006
Well, it works for me:

x = Matrix([[1, 1], [1, 2], [1, 3]])
y = Matrix([[1], [2], [4]])
print linear_least_squares(x, y)

Make sure the dimensions are right. X should be n*k, and Y should (unless
you know what you are doing) be n*1, so their first dimensions must be
equal.

If you instead wrote:
y = Matrix([1, 2, 4])
it wouldn't work, because that has dimensions 1*3. You would have to
transpose it:
y = transpose(Matrix([1, 2, 4]))
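For readers on current NumPy, where the old Numeric `Matrix` class and `linear_least_squares` name are no longer the usual route, the same shapes can be sketched with plain arrays and `numpy.linalg.lstsq`. This translation is an assumption for illustration, not marek's code. Reshaping a flat array to a column plays the role of the transpose, and current `lstsq` also accepts a 1-D right-hand side directly:

```python
import numpy as np

# Same data as the Matrix example: X is n*k (n=3 observations, k=2
# columns: a column of ones for the intercept, then the x values).
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

# A flat array has shape (3,); reshaping it to (3, 1) gives the n*1
# column that the Matrix version needed the transpose for.
y_flat = np.array([1.0, 2.0, 4.0])
y_col = y_flat.reshape(-1, 1)

# The first dimensions must match: 3 rows in X, 3 rows in y.
assert X.shape[0] == y_col.shape[0]

coef_col, _, _, _ = np.linalg.lstsq(X, y_col, rcond=None)   # shape (2, 1)
coef_flat, _, _, _ = np.linalg.lstsq(X, y_flat, rcond=None)  # shape (2,)
print(coef_col.ravel(), coef_flat)  # intercept, then slope
```

With the ones column first, the solution comes back as (intercept, slope).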

Hope this helps.

 
nikie
      03-17-2006
I still don't get it...
My data looks like this:
x = [0,1,2,3]
y = [1,3,5,7]
The expected output would be something like (2, 1), as y[i] = x[i]*2+1

(An image sometimes says more than 1000 words, so to make myself clear:
this is what I want to do:
http://www.statistics4u.info/fundsta...egression.html)

So, how am I to fill these matrices?

(As a matter of fact, I already wrote the whole thing in Python in
about 9 lines of code, but I'm pretty sure this should have been
possible using NumPy)
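For comparison, a hand-written fit only needs the closed-form least-squares formulas: slope = Sxy/Sxx and intercept = mean(y) - slope*mean(x). This is a minimal pure-Python sketch with a hypothetical function name `fit_line`; it is not nikie's actual code:

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: sum of cross-deviations; Sxx: sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```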

 
Robert Kern
      03-17-2006
As the docstring says, the problem it solves is min ||A*x - b||_2. In order to
get it to solve your problem, you need to cast it into this matrix form. This is
out of scope for the docstring, but most introductory statistics or linear
algebra texts will cover this.
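Written out for nikie's data, the matrix form looks like this: each row of A holds one x value and a 1 (for the intercept), the unknown vector holds the slope and intercept, and b holds the y values. The names p, m and c are chosen here for illustration, to avoid clashing with nikie's x data. The normal equations are shown only as a hand check; per its docstring, lstsq itself works with the singular values of A.

```latex
A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{pmatrix},\quad
p = \begin{pmatrix} m \\ c \end{pmatrix},\quad
b = \begin{pmatrix} 1 \\ 3 \\ 5 \\ 7 \end{pmatrix},\qquad
\hat{p} = \arg\min_{p} \lVert A p - b \rVert_2

A^{\mathsf{T}} A \, \hat{p} = A^{\mathsf{T}} b
\;\Longrightarrow\;
\begin{pmatrix} 14 & 6 \\ 6 & 4 \end{pmatrix}
\begin{pmatrix} m \\ c \end{pmatrix}
= \begin{pmatrix} 34 \\ 16 \end{pmatrix}
\;\Longrightarrow\; m = 2,\; c = 1
```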

In [201]: x = array([0., 1, 2, 3])

In [202]: y = array([1., 3, 5, 7])

In [203]: A = ones((len(y), 2), dtype=float)

In [204]: A[:,0] = x

In [205]: from numpy import linalg

In [206]: linalg.lstsq(A, y)
Out[206]:
(array([ 2., 1.]),
array([ 1.64987674e-30]),
2,
array([ 4.10003045, 1.09075677]))
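The same computation as a self-contained script for a current NumPy release. This is a sketch that builds A with `numpy.column_stack` instead of filling a `ones` array, and passes `rcond=None` to use the newer default cutoff:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix: first column holds x (slope term), second column is all
# ones (intercept term), matching A[:,0] = x in the session above.
A = np.column_stack([x, np.ones_like(x)])

(slope, intercept), resids, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(slope, intercept)  # close to 2.0 and 1.0
print(resids)            # essentially zero: the points lie exactly on a line
```

Column order is a free choice: putting the ones column first would return (intercept, slope) instead.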

--
Robert Kern
 
Matt Crema
      03-18-2006
I'm new to numpy myself.

The above posters are correct to say that the problem must be cast into
matrix form. However, as this is such a common technique, don't most
math/stats packages do it behind the scenes?

For example, in Matlab or Octave I could type:
polyfit(x,y,1)

and I'd get the answer with shorter, more readable code. A one-liner!
Is there a 'canned' routine to do it in numpy?

btw, I am not suggesting that one needn't understand the concepts
behind a 'canned' routine. If you do not understand this concept, you
should take Robert Kern's advice and dive into a linear algebra book.
It's not very difficult, and it is essential that a scientific
programmer understand it.

-Matt
 
Matt Crema
      03-18-2006
Hi again,

I guess I should have looked first:

m,b = numpy.polyfit(x,y,1)

-Matt
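A short sketch of the `polyfit` route on nikie's data. `numpy.polyfit` returns coefficients highest power first, so for degree 1 that is slope then intercept, and `numpy.polyval` evaluates the fitted line at new points. The inputs 4 and 5 are arbitrary example values:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Degree-1 fit: coefficients come back highest power first -> (slope, intercept)
m, b = np.polyfit(x, y, 1)

# Evaluate the fitted line at a couple of new x values
predicted = np.polyval([m, b], [4.0, 5.0])
print(m, b, predicted)
```

Current NumPy documentation recommends the `numpy.polynomial` package for new code, but `polyfit` remains the closest match to the Matlab one-liner.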
 
nikie
      03-18-2006
Thank you!

THAT's what I've been looking for from the start!

 
Matt Crema
      03-18-2006
nikie wrote:
>
> <SNIP Found that polyfit is a useful built-in tool for linear regression>


Hello,

I'm glad that helped, but let's not terminate this discussion just yet.
I am also interested in answers to your second question:

nikie wrote:

> "More generally: Is there any kind of documentation that tells me what
> the functions in NumPy do, and what parameters they expect, how to
> call them, etc.?"


As I said, I'm also new to numpy (only been using it for a week), but my
first impression is that the built-in documentation is seriously
lacking. For example, the Mathworks docs absolutely crush numpy's. I
mean this constructively, and not as a shot at numpy.

Robert Kern gave an excellent answer, but I differ with him on one point:
that the docstring for "numpy.linalg.lstsq?" contains an obvious answer
to the question. Good documentation should be written in much simpler
terms, and should include examples of the function's use.

I wonder if anyone can impart some strategies for quickly solving
problems like "How do I do a linear fit in numpy?" if, for example, I
don't know which command to use.

In Matlab, I would have typed:
"lookfor fit"
It would have returned 'polyfit'. Then:
"help polyfit"

and this problem would have been solved in under 5 minutes.

To sum up a wordy post, "What do experienced users find is the most
efficient way to navigate the numpy docs? (assuming one has already
read the FAQs and tutorials)"
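A rough equivalent of Matlab's `lookfor` can be sketched with the standard library alone: scan the docstrings of a module's public callables for a keyword, then read a hit with `help()`. The function name `lookfor_sketch` is hypothetical; this is an illustration, not a NumPy feature:

```python
import warnings

import numpy as np


def lookfor_sketch(module, keyword):
    """Hypothetical helper: return the sorted names of public callables in
    `module` whose docstring contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    hits = []
    for name in dir(module):
        if name.startswith("_"):
            continue
        try:
            # Some attributes may emit deprecation warnings or fail on access;
            # silence the warnings and skip anything that raises.
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                obj = getattr(module, name)
        except Exception:
            continue
        doc = getattr(obj, "__doc__", None)
        if callable(obj) and isinstance(doc, str) and keyword in doc.lower():
            hits.append(name)
    return sorted(hits)


# Rough analogue of Matlab's "lookfor fit"
print(lookfor_sketch(np, "least squares"))
print(lookfor_sketch(np.linalg, "least-squares"))

# Once a candidate turns up, read its full docstring:
# help(np.polyfit)
```

It only searches one module level at a time, so submodules such as `numpy.linalg` have to be passed explicitly.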

Thanks.
-Matt
 
Robert Kern
      03-18-2006
Matt Crema wrote:

> To sum up a wordy post, "What do experienced users find is the most
> efficient way to navigate the numpy docs? (assuming one has already
> read the FAQs and tutorials)"


You're not likely to get much of an answer here, but if you ask on
(E-Mail Removed), you'll get plenty of discussion.

--
Robert Kern
