Computing correlations with SciPy

tkpmep

I want to compute the correlation between two sequences X and Y, and
tried using SciPy to do so without success. Here's what I have; how
can I correct it?
>>> X = [1, 2, 3, 4, 5]
>>> Y = [5, 4, 3, 2, 1]
>>> import scipy
>>> scipy.corrcoef(X,Y)
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python24\Lib\site-packages\numpy\lib\function_base.py", line 671, in corrcoef
    d = diag(c)
  File "C:\Python24\Lib\site-packages\numpy\lib\twodim_base.py", line 80, in diag
    raise ValueError, "Input must be 1- or 2-d."
ValueError: Input must be 1- or 2-d.
Thanks in advance

Thomas Philips
 
Felipe Almeida Lessa

On Thu, 2006-03-16 at 07:49 -0800, (e-mail address removed) wrote:
I want to compute the correlation between two sequences X and Y, and
tried using SciPy to do so without success. Here's what I have; how
can I correct it?

$ python2.4
Python 2.4.2 (#2, Nov 20 2005, 17:04:48)
[GCC 4.0.3 20051111 (prerelease) (Debian 4.0.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = [1,2,3,4,5]
>>> y = [5,4,3,2,1]
>>> import scipy
>>> scipy.corrcoef(x, y)
array([[ 1., -1.],
       [-1.,  1.]])
 
John Hunter

tkpmep> I want to compute the correlation between two sequences X
tkpmep> and Y, and tried using SciPy to do so without success.
tkpmep> Here's what I have; how can I correct it?
tkpmep> >>> X = [1, 2, 3, 4, 5]
tkpmep> >>> Y = [5, 4, 3, 2, 1]
tkpmep> >>> import scipy
tkpmep> >>> scipy.corrcoef(X,Y)
tkpmep> Traceback (most recent call last):
tkpmep>   File "<interactive input>", line 1, in ?
tkpmep>   File "C:\Python24\Lib\site-packages\numpy\lib\function_base.py", line 671, in corrcoef
tkpmep>     d = diag(c)
tkpmep>   File "C:\Python24\Lib\site-packages\numpy\lib\twodim_base.py", line 80, in diag
tkpmep>     raise ValueError, "Input must be 1- or 2-d."
tkpmep> ValueError: Input must be 1- or 2-d.

Hmm, this may be a bug in scipy. matplotlib also defines a corrcoef
function, which you may want to use until this problem gets sorted out:

In [9]: matplotlib.mlab.corrcoef(X,Y)

In [10]: X = [1, 2, 3, 4, 5]

In [11]: Y = [5, 4, 3, 2, 1]

In [12]: matplotlib.mlab.corrcoef(X,Y)
Out[12]:
array([[ 1., -1.],
[-1., 1.]])
 
Travis Oliphant

I want to compute the correlation between two sequences X and Y, and
tried using SciPy to do so without success. Here's what I have; how
can I correct it?

This was a bug in NumPy (inherited from Numeric actually). The fix is
in SVN of NumPy.

Here are the new versions of those functions that should work as you
wish (again, these are in SVN, but perhaps you have a binary install).

These functions belong in <site-packages>/numpy/lib/function_base.py



def cov(m, y=None, rowvar=1, bias=0):
    """Estimate the covariance matrix.

    If m is a vector, return the variance. For matrices return the
    covariance matrix.

    If y is given it is treated as an additional (set of)
    variable(s).

    Normalization is by (N-1) where N is the number of observations
    (unbiased estimate). If bias is 1 then normalization is by N.

    If rowvar is non-zero (default), then each row is a variable with
    observations in the columns, otherwise each column
    is a variable and the observations are in the rows.
    """

    X = asarray(m, ndmin=2)
    if X.shape[0] == 1:
        rowvar = 1
    if rowvar:
        axis = 0
        tup = (slice(None), newaxis)
    else:
        axis = 1
        tup = (newaxis, slice(None))

    if y is not None:
        y = asarray(y, ndmin=2)
        X = concatenate((X, y), axis)

    X -= X.mean(axis=1-axis)[tup]
    if rowvar:
        N = X.shape[1]
    else:
        N = X.shape[0]

    if bias:
        fact = N*1.0
    else:
        fact = N-1.0

    if not rowvar:
        return (dot(X.transpose(), X.conj()) / fact).squeeze()
    else:
        return (dot(X, X.transpose().conj()) / fact).squeeze()

def corrcoef(x, y=None, rowvar=1, bias=0):
    """The correlation coefficients
    """
    c = cov(x, y, rowvar, bias)
    try:
        d = diag(c)
    except ValueError:  # scalar covariance
        return 1
    return c/sqrt(multiply.outer(d, d))
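
For readers who want to see what the last line of corrcoef does, here is a minimal sketch that repeats the same normalization by hand on the data from the original question. It assumes a current NumPy install imported as np (an assumption, since the thread predates it) and uses only the public np.cov, np.diag and np.corrcoef rather than the patched functions above.

```python
import numpy as np

X = [1, 2, 3, 4, 5]
Y = [5, 4, 3, 2, 1]

# Covariance matrix of the two sequences; each sequence is one variable (row).
c = np.cov(X, Y)

# The diagonal of the covariance matrix holds the two variances.
d = np.diag(c)

# Divide each covariance by the product of the two standard deviations,
# mirroring c/sqrt(multiply.outer(d, d)) in the function above.
r = c / np.sqrt(np.multiply.outer(d, d))

print(r)
# Compare the hand-rolled result with the library function.
print(np.allclose(r, np.corrcoef(X, Y)))
```

The off-diagonal entries of r are the correlation between X and Y; the diagonal entries are each sequence's correlation with itself.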
 
tkpmep

Tested it and it works like a charm! Thank you very much for fixing
this. Not knowing what an SVN is, I simply copied the code into the
appropriate library files and it works perfectly well.

May I suggest a simple enhancement: modify corrcoef so that if it is
fed two 1-dimensional arrays, it returns a scalar. cov does something
similar for covariances: if you feed it just one vector, it returns a
scalar, and if you feed it two, it returns the covariance matrix, i.e.:
>>> x = [1, 2, 3, 4, 5]
>>> z = [5, 4, 3, 2, 1]
>>> scipy.cov(x,z)
array([[ 2.5, -2.5],
       [-2.5,  2.5]])
>>> scipy.cov(x)
2.5

I suspect that the majority of users use corrcoef to obtain point
estimates of the correlation between two vectors, and relatively few will
estimate a full covariance matrix, as this method tends not to be robust to
the presence of noise and/or errors in the data.
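
A minimal sketch of such a scalar-returning wrapper, assuming a current NumPy imported as np. The name pearson_r is hypothetical and is not part of NumPy or SciPy; it simply picks the off-diagonal entry out of the matrix that corrcoef returns.

```python
import numpy as np

def pearson_r(x, y):
    """Return the correlation of two 1-D sequences as a plain float.

    Hypothetical helper for illustration; not part of NumPy or SciPy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or y.ndim != 1:
        raise ValueError("pearson_r expects two 1-D sequences")
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    # corrcoef returns the full 2x2 matrix; entry [0, 1] is r(x, y).
    return float(np.corrcoef(x, y)[0, 1])

x = [1, 2, 3, 4, 5]
z = [5, 4, 3, 2, 1]
print(pearson_r(x, z))  # close to -1 for perfectly anti-correlated data
```

Keeping this as a separate helper leaves corrcoef's return type unchanged for callers who do want the full matrix.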

Thomas Philips
 
