2

I want to plot a correlation matrix using Python. I have tried the following script:

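  import numpy as np
  import matplotlib.pyplot as plt

  corr_matrix = np.corrcoef(vector)  # vector holds the data, one variable per row
  plt.imshow(corr_matrix, interpolation='bilinear')
  plt.colorbar()
  plt.show()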

The matrix is 2500x2500. The code above produces a plot full of dots, but I want a smooth surface. How do I get that?

Best, Sudipta

2 Answers

5

What do you mean by "smooth surface" and why do you want to visualize your correlation matrix that way?

Here are two useful examples for visualizing [correlation] matrices. Both contain an explanation as well as example code for matplotlib.

  1. Square grid pseudocolor plot http://glowingpython.blogspot.com/2012/10/visualizing-correlation-matrices.html

  2. Hinton Diagram http://www.scipy.org/Cookbook/Matplotlib/HintonDiagrams

Update: To supplement my comment, here's a pseudocolor visualization of a 1000x1000 correlation matrix, which didn't encounter memory issues on my humble laptop:

[Image: pseudocolor plot of the 1000x1000 correlation matrix]

Note that row 20 is correlated with other variables and row 40 is correlated with row 80, in the style of the GlowingPython example, yet this information is obscured by the sheer size of the matrix.


9 Comments

The pcolor() method runs out of memory for my matrix. It works for a small matrix (say 10x10), but not for the large one. Is there any other method that works similarly to pcolor()?
I'm not sure looking at a pcolor chart of a matrix that large (2500x2500) will tell you anything useful. Having said that, how large do you get before the memory error? Perhaps consider plotting a quarter of the matrix at a time?
See my update which contains a pic of a 1000x1000 corr matrix.
Same script as in the first link (GlowingPython), I just changed the size.
@user1964587 - On a side note, use pcolormesh (or just imshow as you're already doing, but with interpolation='nearest') instead of pcolor for large arrays. (pcolormesh is limited to rectangular cells, whereas pcolor isn't, thus the speedup.) The advantage (or disadvantage) to using pcolormesh or pcolor over imshow when displaying regular grids is mostly that the former produce vector output. At any rate, pcolormesh should solve your problems with the large matrix.
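A minimal sketch of the suggestion in the last comment, for anyone who wants to try it. The random data, the 1000x50 shape, and the helper name `plot_corr` are assumptions for illustration, standing in for the asker's `vector`. It draws the same matrix once with `imshow(..., interpolation='nearest')` and once with `pcolormesh`.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_corr(corr_matrix):
    """Draw a correlation matrix with imshow and with pcolormesh, side by side.

    plot_corr is a hypothetical helper name, not part of matplotlib.
    """
    fig, (ax_im, ax_mesh) = plt.subplots(1, 2, figsize=(12, 5))

    # No smoothing: every matrix entry becomes one flat-coloured cell.
    im = ax_im.imshow(corr_matrix, interpolation="nearest", vmin=-1, vmax=1)
    ax_im.set_title("imshow, interpolation='nearest'")
    fig.colorbar(im, ax=ax_im)

    # The same regular grid drawn as a single QuadMesh.
    mesh = ax_mesh.pcolormesh(corr_matrix, vmin=-1, vmax=1)
    ax_mesh.set_title("pcolormesh")
    fig.colorbar(mesh, ax=ax_mesh)
    return fig, im, mesh


# Assumed stand-in data: 1000 variables with 50 observations each.
rng = np.random.default_rng(0)
vector = rng.standard_normal((1000, 50))
corr_matrix = np.corrcoef(vector)  # shape (1000, 1000)

fig, im, mesh = plot_corr(corr_matrix)
# plt.show()  # or fig.savefig("corr_matrix.png") to write the figure to disk
```

Note that `imshow` puts row 0 at the top by default while `pcolormesh` puts it at the bottom, so the two panels are vertically flipped relative to each other.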
0

You can sort the columns based on the values obtained in the correlation matrix.
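The answer does not say which sort key to use, so the following is only one possible reading: reorder rows and columns by how strongly each variable correlates with a chosen reference variable. The function name `reorder_by_reference`, the `ref` parameter, and the toy data are all assumptions for illustration.

```python
import numpy as np


def reorder_by_reference(corr_matrix, ref=0):
    """Reorder rows and columns by descending correlation with variable `ref`.

    Hypothetical helper: the sort key is an assumption, not the answerer's method.
    """
    order = np.argsort(corr_matrix[ref])[::-1]
    # np.ix_ applies the same permutation to rows and columns,
    # so the result is still a symmetric correlation matrix.
    return corr_matrix[np.ix_(order, order)], order


# Toy data: 6 variables, 200 observations; variable 4 closely tracks variable 0.
rng = np.random.default_rng(0)
data = rng.standard_normal((6, 200))
data[4] = data[0] + 0.1 * rng.standard_normal(200)

corr_matrix = np.corrcoef(data)
sorted_corr, order = reorder_by_reference(corr_matrix, ref=0)
```

After reordering, strongly related variables sit next to each other, so blocks of high correlation are easier to see when the matrix is plotted. For many variables, a clustering-based ordering may group them better than a single reference column.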

