
I have the following table

(image: the input table)

I want to convert it into a matrix using Python, so that it looks something like this:

(image: the desired matrix)

Can I get some direction on where to start with this? I have used pandas to read two DataFrames and merge them to create the initial table shown above (one having two columns).

The code I am using is below:

import pandas as pd
from pyexcelerate import Workbook
import numpy as np
import time
start = time.process_time()
excel_file = 'Test.xlsx'
df = pd.read_excel(excel_file, sheet_name=0, index_col=0)
print(df.columns)
print(df.index)

newdf= (df.pivot(index='ColumnB',columns='ColumnA', values='ColumnB'))
myNewDF = newdf.transform(lambda x: np.where(x.isnull(), '', 'yes'))
aftercalc = time.process_time()
print(aftercalc - start)

myNewDF.to_excel("1.xlsx")
print(time.process_time() - aftercalc)

The output of the prints is:

Index(['ColumnB'], dtype='object')
Index(['TypeA', 'TypeA', 'TypeA', 'TypeA', 'TypeA', 'TypeB', 'TypeB', 'TypeC', 'TypeC', 'TypeC', 'TypeD'], dtype='object', name='ColumnA')

The error I get while running this is:

Traceback (most recent call last):
  File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\indexes\base.py", line 2657, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ColumnA'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    newdf= (df.pivot(index='ColumnB',columns='ColumnA', values='ColumnB'))
  File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\frame.py", line 5628, in pivot
    return pivot(self, index=index, columns=columns, values=values)
  File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\reshape\pivot.py", line 379, in pivot
    index = MultiIndex.from_arrays([index, data[columns]])
  File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:_data\learn\Miniconda\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item

  • Please don't post images of code/data/Tracebacks. Just copy the text, paste it in your question and format it as code. Commented Sep 2, 2019 at 2:28
  • Welcome to SO. Please take the tour and take the time to read How to Ask and the other links found on that page. This isn't a discussion forum or tutorial service. Commented Sep 2, 2019 at 2:29

2 Answers


Does this solve it?

newdf= (df.pivot(index='ColumnB',columns='ColumnA', values='ColumnB'))

newdf
Out[28]: 
ColumnA TypeA TypeB TypeC TypeD
ColumnB                        
A           A     A   NaN     A
B           B   NaN     B   NaN
C           C   NaN     C   NaN
D           D   NaN   NaN   NaN
E           E   NaN   NaN   NaN
F         NaN     F   NaN   NaN
Z         NaN   NaN     Z   NaN

newdf.transform(lambda x: np.where(x.isnull(), '', 'yes'))
Out[29]: 
ColumnA TypeA TypeB TypeC TypeD
ColumnB                        
A         yes   yes         yes
B         yes         yes      
C         yes         yes      
D         yes                  
E         yes                  
F               yes            
Z                     yes      

Modified Code

import time

import numpy as np
import pandas as pd

start = time.process_time()
excel_file = 'C:\\Users\\ss\\Desktop\\check.xlsx'
# index_col=0 makes ColumnA the index, so it is no longer a regular column
df = pd.read_excel(excel_file, sheet_name=0, index_col=0)
print(df.columns)
print(df.index)

# Turn ColumnA back into a regular column so that pivot can find it
df = df.reset_index()
newdf = df.pivot(index='ColumnB', columns='ColumnA', values='ColumnB')
# Mark the combinations that exist with 'yes' and leave the rest blank
myNewDF = newdf.transform(lambda x: np.where(x.isnull(), '', 'yes'))
aftercalc = time.process_time()
print(aftercalc - start)

myNewDF.to_excel("C:\\Users\\ss\\Desktop\\output.xlsx")
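For anyone who wants to reproduce the KeyError without the Excel file, here is a minimal sketch. The sample rows are reconstructed from the pivot output shown earlier in this answer, and `set_index("ColumnA")` is only my stand-in for what `index_col=0` does in `read_excel`. The names `flat`, `indexed`, `restored` and `pivot_failed` are illustrative, not from the original code.

```python
import pandas as pd

# Sample rows reconstructed from the pivot output in this answer (assumption).
pairs = [
    ("TypeA", "A"), ("TypeA", "B"), ("TypeA", "C"), ("TypeA", "D"), ("TypeA", "E"),
    ("TypeB", "A"), ("TypeB", "F"),
    ("TypeC", "B"), ("TypeC", "C"), ("TypeC", "Z"),
    ("TypeD", "A"),
]
flat = pd.DataFrame(pairs, columns=["ColumnA", "ColumnB"])

# Stand-in for read_excel(..., index_col=0): ColumnA becomes the index.
indexed = flat.set_index("ColumnA")

# pivot looks for ColumnA among the columns, where it no longer exists.
try:
    indexed.pivot(index="ColumnB", columns="ColumnA", values="ColumnB")
    pivot_failed = False
except KeyError:
    pivot_failed = True

# reset_index() moves ColumnA back into the columns, and pivot works again.
restored = indexed.reset_index()
matrix = restored.pivot(index="ColumnB", columns="ColumnA", values="ColumnB")
print(matrix)
```

If that matches what you see, either drop `index_col=0` from `read_excel` or call `reset_index()` before pivoting.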

6 Comments

KeyError: 'ColumnA'
Check your DataFrame's column names. Are there any spaces or spelling issues?
There is definitely no spelling mistake. The only thing I am doing differently is creating the DataFrame with df = pd.read_excel(excel_file, sheet_name=0, index_col=0)
You are getting that error because ColumnA is your DataFrame's index. Please see the modified code I have added.
Code updated as suggested, but I am still getting the same error, unfortunately.

We can do

pd.crosstab(df.ColumnA,df.ColumnB).astype(bool)
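A rough sketch of how this could be taken all the way to the yes/blank layout. It assumes ColumnA and ColumnB are both regular columns (call `reset_index()` first if the file was read with `index_col=0`), and the sample rows are reconstructed from the pivot output in the other answer. I have swapped the two arguments relative to the one-liner above so that the ColumnB values end up as rows, matching that pivot output. The names `counts`, `flags` and `matrix` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows reconstructed from the other answer's output (assumption).
pairs = [
    ("TypeA", "A"), ("TypeA", "B"), ("TypeA", "C"), ("TypeA", "D"), ("TypeA", "E"),
    ("TypeB", "A"), ("TypeB", "F"),
    ("TypeC", "B"), ("TypeC", "C"), ("TypeC", "Z"),
    ("TypeD", "A"),
]
df = pd.DataFrame(pairs, columns=["ColumnA", "ColumnB"])

# Count how often each (ColumnB, ColumnA) pair occurs.
counts = pd.crosstab(df["ColumnB"], df["ColumnA"])

# True where the pair occurs at least once.
flags = counts.astype(bool)

# Replace True/False with 'yes'/'' while keeping the row and column labels.
matrix = pd.DataFrame(
    np.where(flags, "yes", ""),
    index=counts.index,
    columns=counts.columns,
)
print(matrix)
```

Unlike `pivot`, `crosstab` does not complain about duplicate pairs, which may matter if the merged table contains repeated rows.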
