Computing correlations with SciPy

tkpmep · Mar 16, 2006

I want to compute the correlation between two sequences X and Y, and
tried using SciPy to do so without success. Here's what I have; how
can I correct it?

X = [1, 2, 3, 4, 5]
Y = [5, 4, 3, 2, 1]
import scipy
scipy.corrcoef(X,Y)

Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
  File "C:\Python24\Lib\site-packages\numpy\lib\function_base.py", line 671, in corrcoef
    d = diag(c)
  File "C:\Python24\Lib\site-packages\numpy\lib\twodim_base.py", line 80, in diag
    raise ValueError, "Input must be 1- or 2-d."
ValueError: Input must be 1- or 2-d.

Thanks in advance,

Thomas Philips

Felipe Almeida Lessa · Mar 16, 2006

On Thu, 2006-03-16 at 07:49 -0800, (e-mail address removed) wrote:

I want to compute the correlation between two sequences X and Y, and
tried using SciPy to do so without success. Here's what I have; how
can I correct it?

$ python2.4
Python 2.4.2 (#2, Nov 20 2005, 17:04:48)
[GCC 4.0.3 20051111 (prerelease) (Debian 4.0.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

x = [1,2,3,4,5]
y = [5,4,3,2,1]
import scipy
scipy.corrcoef(x, y)

array([[ 1., -1.],
       [-1.,  1.]])

John Hunter · Mar 16, 2006

tkpmep> I want to compute the correlation between two sequences X
tkpmep> and Y, and tried using SciPy to do so without success.
tkpmep> Here's what I have; how can I correct it?

tkpmep> >>> X = [1, 2, 3, 4, 5]
tkpmep> >>> Y = [5, 4, 3, 2, 1]
tkpmep> >>> import scipy
tkpmep> >>> scipy.corrcoef(X,Y)

tkpmep> Traceback (most recent call last):
tkpmep>   File "<interactive input>", line 1, in ?
tkpmep>   File "C:\Python24\Lib\site-packages\numpy\lib\function_base.py", line 671, in corrcoef
tkpmep>     d = diag(c)
tkpmep>   File "C:\Python24\Lib\site-packages\numpy\lib\twodim_base.py", line 80, in diag
tkpmep>     raise ValueError, "Input must be 1- or 2-d."
tkpmep> ValueError: Input must be 1- or 2-d.

Hmm, this may be a bug in scipy. matplotlib also defines a corrcoef
function, which you may want to use until this problem gets sorted out:

In [9]: matplotlib.mlab.corrcoef(X,Y)

In [10]: X = [1, 2, 3, 4, 5]

In [11]: Y = [5, 4, 3, 2, 1]

In [12]: matplotlib.mlab.corrcoef(X,Y)
Out[12]:
array([[ 1., -1.],
       [-1.,  1.]])

Travis Oliphant · Mar 17, 2006

I want to compute the correlation between two sequences X and Y, and
tried using SciPy to do so without success. Here's what I have; how
can I correct it?

This was a bug in NumPy (inherited from Numeric actually). The fix is
in SVN of NumPy.

Here are the new versions of those functions that should work as you
wish (again, these are in SVN, but perhaps you have a binary install).

These functions belong in <site-packages>/numpy/lib/function_base.py

def cov(m,y=None, rowvar=1, bias=0):
    """Estimate the covariance matrix.

    If m is a vector, return the variance. For matrices return the
    covariance matrix.

    If y is given it is treated as an additional (set of)
    variable(s).

    Normalization is by (N-1) where N is the number of observations
    (unbiased estimate). If bias is 1 then normalization is by N.

    If rowvar is non-zero (default), then each row is a variable with
    observations in the columns, otherwise each column
    is a variable and the observations are in the rows.
    """

    X = asarray(m,ndmin=2)
    if X.shape[0] == 1:
        rowvar = 1
    if rowvar:
        axis = 0
        tup = (slice(None),newaxis)
    else:
        axis = 1
        tup = (newaxis, slice(None))

    if y is not None:
        y = asarray(y,ndmin=2)
        X = concatenate((X,y),axis)

    X -= X.mean(axis=1-axis)[tup]
    if rowvar:
        N = X.shape[1]
    else:
        N = X.shape[0]

    if bias:
        fact = N*1.0
    else:
        fact = N-1.0

    if not rowvar:
        return (dot(X.transpose(), X.conj()) / fact).squeeze()
    else:
        return (dot(X,X.transpose().conj())/fact).squeeze()

def corrcoef(x, y=None, rowvar=1, bias=0):
    """The correlation coefficients
    """
    c = cov(x, y, rowvar, bias)
    try:
        d = diag(c)
    except ValueError: # scalar covariance
        return 1
    return c/sqrt(multiply.outer(d,d))
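
For readers who want to check the normalization described in the cov docstring by hand, here is a minimal pure-Python sketch applied to the X and Y from the original question. The helper names `covariance` and `correlation` are hypothetical and are not part of NumPy or SciPy; this only illustrates the formulas and is not the library's implementation.

```python
import math


def covariance(x, y, bias=False):
    """Sample covariance of two equal-length sequences (hypothetical helper).

    Divides by N-1 (unbiased estimate) unless bias is true, in which
    case it divides by N.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means.
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (n if bias else n - 1)


def correlation(x, y):
    """Pearson correlation: cov(x, y) / sqrt(var(x) * var(y))."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


X = [1, 2, 3, 4, 5]
Y = [5, 4, 3, 2, 1]

print(covariance(X, Y))             # -2.5: sum of products is -10, divided by N-1 = 4
print(covariance(X, Y, bias=True))  # -2.0: same sum divided by N = 5
print(covariance(X, X))             # 2.5: the variance of X
print(correlation(X, Y))            # -1.0
```

The off-diagonal entries of the matrices shown earlier in the thread correspond to `correlation(X, Y)`, and the diagonal entries to the correlation of each sequence with itself.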

tkpmep · Mar 19, 2006

Tested it and it works like a charm! Thank you very much for fixing
this. Not knowing what an SVN is, I simply copied the code into the
appropriate library files and it works perfectly well.

May I suggest a simple enhancement: modify corrcoef so that if it is
fed two one-dimensional arrays, it returns a scalar. cov does something
similar for covariances: if you feed it just one vector, it returns a
scalar, and if you feed it two, it returns the covariance matrix, i.e.:

>>> x = [1, 2, 3, 4, 5]
>>> z = [5, 4, 3, 2, 1]
>>> scipy.cov(x,z)
array([[ 2.5, -2.5],
       [-2.5,  2.5]])
>>> scipy.cov(x)
2.5

I suspect that the majority of users use corrcoef to obtain point
estimates of the covariance of two vectors, and relatively few will
estimate a covariance matrix, as this method tends not to be robust to
the presence of noise and/or errors in the data.

Thomas Philips
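
The scalar result requested above can be obtained by indexing the off-diagonal entry of the matrix that corrcoef returns. A minimal sketch, assuming a current NumPy in which the function is available as `numpy.corrcoef`; the wrapper name `corr_scalar` is hypothetical and is not a NumPy or SciPy function:

```python
import numpy as np


def corr_scalar(x, y):
    """Return the correlation of two 1-D sequences as a plain float.

    Hypothetical convenience wrapper around numpy.corrcoef.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or y.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D sequences of equal length")
    # For two variables corrcoef returns a 2x2 matrix; entry [0, 1]
    # is the correlation between x and y.
    return float(np.corrcoef(x, y)[0, 1])


x = [1, 2, 3, 4, 5]
z = [5, 4, 3, 2, 1]
r = corr_scalar(x, z)
print(r)  # approximately -1.0
```

`scipy.stats.pearsonr(x, z)` is another option when SciPy is installed: it reports the coefficient as a scalar together with a p-value.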
