Let's say I have a NumPy array of numbers, roughly 43,000 × 5,000. For example:

array([[-0.  ,  0.02,  0.03,  0.05,  0.06,  0.05],
       [ 0.02,  0.  ,  0.02,  0.05,  0.04,  0.04],
       [ 0.03,  0.02,  0.  ,  0.06,  0.05,  0.05],
       [ 0.05,  0.05,  0.06,  0.  ,  0.02,  0.01],
       [ 0.06,  0.04,  0.05,  0.02, -0.  ,  0.01],
       [ 0.05,  0.04,  0.05,  0.01,  0.01, -0.  ]])

I want to print the result as a cross-tab of these values, with labels as both the column headers and the index. The array is a distance matrix of text documents, and I want a table that shows the distance for each pair of documents, with the document names along both the columns and the rows.

Something like below:

                   Austen_Emma  Austen_Pride  Austen_Sense  CBronte_Jane  CBronte_Professor  CBronte_Villette
Austen_Emma              -0.00          0.02          0.03          0.05               0.06              0.05
Austen_Pride              0.02          0.00          0.02          0.05               0.04              0.04
Austen_Sense              0.03          0.02          0.00          0.06               0.05              0.05
CBronte_Jane              0.05          0.05          0.06          0.00               0.02              0.01
CBronte_Professor         0.06          0.04          0.05          0.02              -0.00              0.01
CBronte_Villette          0.05          0.04          0.05          0.01               0.01             -0.00

I was thinking of converting the NumPy matrix to a pandas DataFrame and then adding the header and index. Are there any other suggestions?

  • np.savetxt lets you define a header. But to add the string column you'd have to define a structured array - one with 7 fields, one string and 6 float. Commented Oct 2, 2015 at 6:46
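The np.savetxt route from the comment could be sketched roughly as below. This is only an illustration, not the commenter's own code: the helper name `format_labeled_matrix`, the field names `name` and `c0`, `c1`, ..., the tab delimiter, and the two-decimal format are all assumptions made for the example.

```python
import io

import numpy as np


def format_labeled_matrix(a, labels):
    """Return a tab-separated table of `a` with `labels` on rows and columns.

    Hypothetical helper: builds a structured array with one string field
    (the row label) followed by one float field per column, then lets
    np.savetxt write the header line and the rows.
    """
    a = np.asarray(a, dtype=float)
    n_cols = a.shape[1]
    # Size the string field to the longest label so nothing is truncated.
    width = max(len(s) for s in labels)
    dtype = [('name', 'U%d' % width)] + [('c%d' % i, 'f8') for i in range(n_cols)]

    rec = np.empty(a.shape[0], dtype=dtype)
    rec['name'] = labels[:a.shape[0]]
    for i in range(n_cols):
        rec['c%d' % i] = a[:, i]

    buf = io.StringIO()
    np.savetxt(
        buf,
        rec,
        fmt=['%s'] + ['%.2f'] * n_cols,      # one format per field
        delimiter='\t',
        header='\t'.join([''] + list(labels[:n_cols])),
        comments='',                          # suppress the default '# ' prefix
    )
    return buf.getvalue()


if __name__ == '__main__':
    demo = np.array([[0.0, 0.5], [0.5, 0.0]])
    print(format_labeled_matrix(demo, ['x', 'y']))
```

Passing a file name instead of the StringIO buffer would write the same table to disk.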

1 Answer

You could do the following using pandas:

import numpy as np
import pandas as pd

pd.set_option('display.width', 150)
header = ['Austen_Emma', 'Austen_Pride', 'Austen_Sense', 'CBronte_Jane', 'CBronte_Professor', 'CBronte_Villette']

a = np.array([[-0.  ,  0.02,  0.03,  0.05,  0.06,  0.05],
       [ 0.02,  0.  ,  0.02,  0.05,  0.04,  0.04],
       [ 0.03,  0.02,  0.  ,  0.06,  0.05,  0.05],
       [ 0.05,  0.05,  0.06,  0.  ,  0.02,  0.01],
       [ 0.06,  0.04,  0.05,  0.02, -0.  ,  0.01],
       [ 0.05,  0.04,  0.05,  0.01,  0.01, -0.  ]]) 

# Use the same labels for the rows (index) and the columns.
frame = pd.DataFrame(a, index=header, columns=header)
print(frame)

This would give you the following output:

                   Austen_Emma  Austen_Pride  Austen_Sense  CBronte_Jane  CBronte_Professor  CBronte_Villette
Austen_Emma              -0.00          0.02          0.03          0.05               0.06              0.05
Austen_Pride              0.02          0.00          0.02          0.05               0.04              0.04
Austen_Sense              0.03          0.02          0.00          0.06               0.05              0.05
CBronte_Jane              0.05          0.05          0.06          0.00               0.02              0.01
CBronte_Professor         0.06          0.04          0.05          0.02              -0.00              0.01
CBronte_Villette          0.05          0.04          0.05          0.01               0.01             -0.00
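For an array as large as the one in the question, printing the whole frame is unlikely to be readable, so writing it to a delimited file may be more practical. The sketch below is one possible way to do that, under a few assumptions that are not in the original answer: the helper name `labeled_frame` is hypothetical, the row and column labels are passed separately (a 43,000 × 5,000 array is not square), and values are rounded to two decimals with `+ 0.0` added so that negative zeros show up as `0.00`.

```python
import numpy as np
import pandas as pd


def labeled_frame(a, row_labels, col_labels, decimals=2):
    """Wrap `a` in a DataFrame with the given row and column labels.

    Hypothetical helper. Rounding first turns tiny negative values into
    -0.0, and adding 0.0 then turns -0.0 into 0.0, so the table does not
    show "-0.00" entries.
    """
    cleaned = np.round(np.asarray(a, dtype=float), decimals) + 0.0
    return pd.DataFrame(cleaned, index=row_labels, columns=col_labels)


if __name__ == '__main__':
    a = np.array([[-0.0, 0.5], [0.5, -1e-17]])
    frame = labeled_frame(a, ['r1', 'r2'], ['c1', 'c2'])
    # With no path given, to_csv returns the text; pass a file name
    # as the first argument to write to disk instead.
    print(frame.to_csv(sep='\t', float_format='%.2f'))
```

The resulting tab-separated text can be opened in a spreadsheet or read back with `pd.read_csv(..., sep='\t', index_col=0)`.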