Adding string header and index to a numpy array

Question

Let's say I have a numpy array of numbers, roughly 43,000 x 5,000. For example:

array([[-0.  ,  0.02,  0.03,  0.05,  0.06,  0.05],
       [ 0.02,  0.  ,  0.02,  0.05,  0.04,  0.04],
       [ 0.03,  0.02,  0.  ,  0.06,  0.05,  0.05],
       [ 0.05,  0.05,  0.06,  0.  ,  0.02,  0.01],
       [ 0.06,  0.04,  0.05,  0.02, -0.  ,  0.01],
       [ 0.05,  0.04,  0.05,  0.01,  0.01, -0.  ]])

I want to print the result as a cross-tab of these values, with the same labels as both the column headers and the row index. The array is a distance matrix of text documents, and I want a table that shows the distance for each pair of documents, with the document names along both the columns and the rows.

Something like below:

                   Austen_Emma  Austen_Pride  Austen_Sense  CBronte_Jane  CBronte_Professor  CBronte_Villette
Austen_Emma              -0.00          0.02          0.03          0.05               0.06              0.05
Austen_Pride              0.02          0.00          0.02          0.05               0.04              0.04
Austen_Sense              0.03          0.02          0.00          0.06               0.05              0.05
CBronte_Jane              0.05          0.05          0.06          0.00               0.02              0.01
CBronte_Professor         0.06          0.04          0.05          0.02              -0.00              0.01
CBronte_Villette          0.05          0.04          0.05          0.01               0.01             -0.00

I was thinking of converting the numpy matrix to a pandas DataFrame and then adding the header and index. Any other suggestions?

np.savetxt lets you define a header. But to add the string column you'd have to define a structured array: one with 7 fields, one string and 6 float.
– hpaulj, Commented Oct 2, 2015 at 6:46
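
The structured-array route from the comment above might look roughly like this. It is only a sketch on a 3 x 3 subset of the example data: the field name `label`, the `U32` string width, and the in-memory buffer `buf` are assumptions made for illustration, not part of the original comment.

```python
import io
import numpy as np

names = ['Austen_Emma', 'Austen_Pride', 'Austen_Sense']
dist = np.array([[0.00, 0.02, 0.03],
                 [0.02, 0.00, 0.02],
                 [0.03, 0.02, 0.00]])

# One string field for the row label plus one float field per column.
# 'label' and the 'U32' width are illustrative choices.
dtype = [('label', 'U32')] + [(name, 'f8') for name in names]
table = np.empty(len(names), dtype=dtype)
table['label'] = names
for j, name in enumerate(names):
    table[name] = dist[:, j]

# Write to an in-memory buffer here; a filename would work as well.
buf = io.StringIO()
np.savetxt(buf, table,
           fmt=['%s'] + ['%.2f'] * len(names),  # one format per field
           delimiter='\t',
           header='\t'.join([''] + names),      # empty cell above the labels
           comments='')                         # no '# ' prefix on the header
text = buf.getvalue()
print(text)
```

With one field per column, this is probably only practical for a modest number of columns; for thousands of columns the pandas approach is likely easier to manage.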

Martin Evans · Accepted Answer · 2015-10-02 07:54:13Z

You could do the following using Pandas:

import numpy as np
import pandas as pd

pd.set_option('display.width', 150)
header = ['Austen_Emma', 'Austen_Pride', 'Austen_Sense', 'CBronte_Jane', 'CBronte_Professor', 'CBronte_Villette']

a = np.array([[-0.  ,  0.02,  0.03,  0.05,  0.06,  0.05],
       [ 0.02,  0.  ,  0.02,  0.05,  0.04,  0.04],
       [ 0.03,  0.02,  0.  ,  0.06,  0.05,  0.05],
       [ 0.05,  0.05,  0.06,  0.  ,  0.02,  0.01],
       [ 0.06,  0.04,  0.05,  0.02, -0.  ,  0.01],
       [ 0.05,  0.04,  0.05,  0.01,  0.01, -0.  ]]) 

# Use the same labels for both the row index and the column headers.
frame = pd.DataFrame(a, index=header, columns=header)
print(frame)

This would give you the following output:

                   Austen_Emma  Austen_Pride  Austen_Sense  CBronte_Jane  CBronte_Professor  CBronte_Villette
Austen_Emma              -0.00          0.02          0.03          0.05               0.06              0.05
Austen_Pride              0.02          0.00          0.02          0.05               0.04              0.04
Austen_Sense              0.03          0.02          0.00          0.06               0.05              0.05
CBronte_Jane              0.05          0.05          0.06          0.00               0.02              0.01
CBronte_Professor         0.06          0.04          0.05          0.02              -0.00              0.01
CBronte_Villette          0.05          0.04          0.05          0.01               0.01             -0.00
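
The question mentions an array of about 43,000 x 5,000, which is not square, so the index and the columns would each need their own list of labels. Here is a minimal sketch under that assumption. The names `row_names` and `col_names` and the tiny 3 x 2 array are hypothetical stand-ins, and using `to_csv` with a tab separator is just one way to get tab-separated text for a table too large to print on screen.

```python
import numpy as np
import pandas as pd

# Hypothetical labels: a non-square matrix needs one list per axis.
row_names = ['doc_a', 'doc_b', 'doc_c']
col_names = ['ref_x', 'ref_y']

a = np.array([[0.00, 0.02],
              [0.02, 0.00],
              [0.03, 0.02]])

frame = pd.DataFrame(a, index=row_names, columns=col_names)

# With no path argument, to_csv returns the text instead of writing a file.
text = frame.to_csv(sep='\t', float_format='%.2f')
print(text)
```

Passing a filename as the first argument to `to_csv` writes the same text to disk instead of returning it.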
