Display columns in matrix format using dataframe python

Question

I have the following table

I want to convert int into a matrix using python, to look something like below:

Can I get some direction as to where to start with this? I have used pandas to read two dataframes and merge them to create the initial table I have shown(one having two columns).

Code I am using is below is below:

import pandas as pd
from pyexcelerate import Workbook
import numpy as np
import time
start = time.process_time()
excel_file = 'Test.xlsx'
df = pd.read_excel(excel_file, sheet_name=0, index_col=0)
print(df.columns)
print(df.index)

newdf= (df.pivot(index='ColumnB',columns='ColumnA', values='ColumnB'))
myNewDF = newdf.transform(lambda x: np.where(x.isnull(), '', 'yes'))
aftercalc = time.process_time()
print(aftercalc - start)

myNewDF.to_excel("1.xlsx")
print(time.process_time() - aftercalc)

The ouput of the prints are :

Index(['ColumnB'], dtype='object') Index(['TypeA', 'TypeA', 'TypeA', 'TypeA', 'TypeA', 'TypeB', 'TypeB', 'TypeC', 'TypeC', 'TypeC', 'TypeD'], dtype='object', name='ColumnA')

The error I get while running this is :

Traceback (most recent call last): File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\indexes\base.py", line 2657, in get_loc return self._engine.get_loc(key) File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 'ColumnA'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "test.py", line 10, in newdf= (df.pivot(index='ColumnB',columns='ColumnA', values='ColumnB')) File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\frame.py", line 5628, in pivot return pivot(self, index=index, columns=columns, values=values) File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\reshape\pivot.py", line 379, in pivot index = MultiIndex.from_arrays([index, data[columns]]) File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\frame.py", line 2927, in getitem indexer = self.columns.get_loc(key) File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc return self._engine.get_loc(self._maybe_cast_indexer(key)) File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item

Please don't post images of code/data/Tracebacks. Just copy the text, paste it in your question and format it as code. — wwii
– wwii, Commented Sep 2, 2019 at 2:28
Welcome to SO. Please take the tour and take the time to read How to Ask and the other links found on that page. This isn't a discussion forum or tutorial service. — wwii
– wwii, Commented Sep 2, 2019 at 2:29

Humi · Accepted Answer · 2019-09-16 08:27:04Z

2

Does this solve?

newdf= (df.pivot(index='ColumnB',columns='ColumnA', values='ColumnB'))

newdf
Out[28]: 
ColumnA TypeA TypeB TypeC TypeD
ColumnB                        
A           A     A   NaN     A
B           B   NaN     B   NaN
C           C   NaN     C   NaN
D           D   NaN   NaN   NaN
E           E   NaN   NaN   NaN
F         NaN     F   NaN   NaN
Z         NaN   NaN     Z   NaN

newdf.transform(lambda x: np.where(x.isnull(), '', 'yes'))
Out[29]: 
ColumnA TypeA TypeB TypeC TypeD
ColumnB                        
A         yes   yes         yes
B         yes         yes      
C         yes         yes      
D         yes                  
E         yes                  
F               yes            
Z                     yes

Modified Code

import pandas as pd
#from pyexcelerate import Workbook
import time
import numpy as np
start = time.process_time()
excel_file = 'C:\\Users\\ss\\Desktop\\check.xlsx'
df = pd.read_excel(excel_file, sheet_name=0, index_col=0)
print(df.columns)
print(df.index)

newdf= (df.pivot(index='ColumnB',columns='ColumnA', values='ColumnB'))
myNewDF = newdf.transform(lambda x: np.where(x.isnull(), '', 'yes'))
aftercalc = time.process_time()
print(aftercalc - start)

myNewDF.to_excel("C:\\Users\\ss\\Desktop\\output.xlsx")

edited Sep 16, 2019 at 8:27

answered Sep 2, 2019 at 1:49

Humi

6095 silver badges15 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

misguided Over a year ago

KeyError: 'ColumnA'

Humi Over a year ago

Check your dataframe column names? are there any spaces? or spelling issue?

misguided Over a year ago

There is definitely no spelling mistake. The only thing I am doing differently is , creating the df = pd.read_excel(excel_file, sheet_name=0, index_col=0)

Humi Over a year ago

You are getting that error because your ColumnA is your dataframe's index. Please see the modified code i have added.

misguided Over a year ago

code updated as suggested but still getting the same error unfortunately.

|

BENY · Accepted Answer · 2019-09-02 01:40:23Z

1

We can do

pd.crosstab(df.ColumnA,df.ColumnB).astype(bool)

answered Sep 2, 2019 at 1:40

BENY

324k22 gold badges176 silver badges250 bronze badges

Collectives™ on Stack Overflow

Display columns in matrix format using dataframe python

2 Answers 2

6 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

6 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related