I'm trying to create a calendar that rolls up information across a catalog of projects and organizes it chronologically and by project type. I've been using Pandas and have been unable to get the basic structure right. For example, given this dataset:
Type Name Health Month Year
0 Marketing ProjectA OK Jan 2018
1 Science ProjectB Warning Apr 2018
2 Marketing ProjectC OK Mar 2018
3 Development ProjectD OK Feb 2018
4 Marketing ProjectE OK Jan 2018
5 Development ProjectF Warning Feb 2018
6 Development ProjectG Trouble May 2018
7 Marketing ProjectH Trouble May 2018
8 Development ProjectI Warning Feb 2018
9 Marketing ProjectJ OK May 2018
10 Science ProjectK Warning Apr 2018
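For anyone who wants to reproduce this, the dataset above can be rebuilt roughly as follows. This is a minimal sketch that copies the rows shown above; the variable name `rows` is only illustrative.

```python
import pandas as pd

# Rebuild the sample dataset, row for row, in the order shown above.
rows = [
    ("Marketing", "ProjectA", "OK", "Jan", 2018),
    ("Science", "ProjectB", "Warning", "Apr", 2018),
    ("Marketing", "ProjectC", "OK", "Mar", 2018),
    ("Development", "ProjectD", "OK", "Feb", 2018),
    ("Marketing", "ProjectE", "OK", "Jan", 2018),
    ("Development", "ProjectF", "Warning", "Feb", 2018),
    ("Development", "ProjectG", "Trouble", "May", 2018),
    ("Marketing", "ProjectH", "Trouble", "May", 2018),
    ("Development", "ProjectI", "Warning", "Feb", 2018),
    ("Marketing", "ProjectJ", "OK", "May", 2018),
    ("Science", "ProjectK", "Warning", "Apr", 2018),
]
df = pd.DataFrame(rows, columns=["Type", "Name", "Health", "Month", "Year"])
print(df)
```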
Using the trick shown at Remove none values from dataframe, I can create a field to track the rank order of each item within the final table:
df['aggval'] = df['Year'].map(str) + df['Month'] + df['Type']
df['index'] = df.groupby(['aggval']).cumcount()
This produces two extra columns:
Type Name Health Month Year aggval index
0 Marketing ProjectA OK Jan 2018 2018JanMarketing 0
1 Science ProjectB Warning Apr 2018 2018AprScience 0
2 Marketing ProjectC OK Mar 2018 2018MarMarketing 0
3 Development ProjectD OK Feb 2018 2018FebDevelopment 0
4 Marketing ProjectE OK Jan 2018 2018JanMarketing 1
5 Development ProjectF Warning Feb 2018 2018FebDevelopment 1
6 Development ProjectG Trouble May 2018 2018MayDevelopment 0
7 Marketing ProjectH Trouble May 2018 2018MayMarketing 0
8 Development ProjectI Warning Feb 2018 2018FebDevelopment 2
9 Marketing ProjectJ OK May 2018 2018MayMarketing 1
10 Science ProjectK Warning Apr 2018 2018AprScience 1
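As a side note, the concatenated `aggval` key is probably not required, since `groupby` accepts a list of column names. The sketch below rebuilds the sample data and checks that grouping on `Year`, `Month` and `Type` directly gives the same rank order; `alt_index` is just an illustrative name.

```python
import pandas as pd

# Same sample data as above, built column by column.
df = pd.DataFrame({
    "Type": ["Marketing", "Science", "Marketing", "Development", "Marketing",
             "Development", "Development", "Marketing", "Development",
             "Marketing", "Science"],
    "Name": ["Project" + c for c in "ABCDEFGHIJK"],
    "Health": ["OK", "Warning", "OK", "OK", "OK", "Warning", "Trouble",
               "Trouble", "Warning", "OK", "Warning"],
    "Month": ["Jan", "Apr", "Mar", "Feb", "Jan", "Feb", "May", "May", "Feb",
              "May", "Apr"],
    "Year": [2018] * 11,
})

# Approach from the question: build a string key, then count within each key.
df["aggval"] = df["Year"].map(str) + df["Month"] + df["Type"]
df["index"] = df.groupby(["aggval"]).cumcount()

# Alternative: group on the three columns directly, no helper key needed.
alt_index = df.groupby(["Year", "Month", "Type"]).cumcount()

print(list(alt_index) == list(df["index"]))
```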
With these extra columns, we can now pivot to create an initial version of our project roll-up table:
pv1 = pd.pivot_table(df, values='Name', index=['Type', 'index'], columns=['Year', 'Month'], aggfunc=lambda x: "".join(x)).fillna('')
pv1 = pv1.reindex(columns = zip(12 * [2018], ['Jan', 'Feb', 'Mar', 'Apr', 'May']))
to produce the report below. This is basically correct: it collects and lists projects, shows their Names, and organizes them by Type (swimlanes) and chronologically by year and month:
Year              2018
Month             Jan       Feb       Mar       Apr       May
Type        index
Development 0               ProjectD                      ProjectG
            1               ProjectF
            2               ProjectI
Marketing   0     ProjectA            ProjectC            ProjectH
            1     ProjectE                                ProjectJ
Science     0                                   ProjectB
            1                                   ProjectK
I'm now stumped in trying to extend this model to display the Name and Health for each project together.
I can add in the Health field as a second pivot table value:
pv2 = pd.pivot_table(df, values=['Name', 'Health'], index=['Type', 'index'], columns=['Year', 'Month'], aggfunc={'Name':lambda x: "|".join(x), 'Health':lambda x: ":".join(x), }).fillna('')
# pv2 = pv2.reindex(columns = zip(10 * [2018], ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar', 'Apr', 'Apr', 'May', 'May'], ['Health', 'Name', 'Health', 'Name', 'Health', 'Name', 'Health', 'Name', 'Health', 'Name', 'Health', 'Name']))
to produce:
                  Health                                       Name
Year              2018                                         2018
Month             Apr      Feb      Jan      Mar      May      Apr       Feb       Jan       Mar       May
Type        index
Development 0              OK                         Trouble            ProjectD                      ProjectG
            1              Warning                                       ProjectF
            2              Warning                                       ProjectI
Marketing   0                       OK       OK       Trouble                      ProjectA  ProjectC  ProjectH
            1                       OK                OK                           ProjectE            ProjectJ
Science     0     Warning                                      ProjectB
            1     Warning                                      ProjectK
This is the right idea: both the project Health and Name show up for each project, in the right Month and the right Type swimlane, but I'd like them side by side for each project. Reindexing the columns produces the right structure at the header level, but replaces every cell value with NaN:
pv2 = pd.pivot_table(df, values=['Name', 'Health'], index=['Type', 'index'], columns=['Year', 'Month'], aggfunc={'Name':lambda x: "|".join(x), 'Health':lambda x: ":".join(x), }).fillna('')
pv2 = pv2.reindex(columns = zip(10 * [2018], ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar', 'Apr', 'Apr', 'May', 'May'], ['Health', 'Name', 'Health', 'Name', 'Health', 'Name', 'Health', 'Name', 'Health', 'Name', 'Health', 'Name']))
produces:
2018
Year Jan Feb Mar Apr May
Month Health Name Health Name Health Name Health Name Health Name
Type index
Development 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Marketing 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Science 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Again, the structure is now correct, but the cell values are no longer showing the project-specific data. What am I missing?
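One possible explanation, offered as a sketch rather than a confirmed diagnosis: with two `values`, `pivot_table` appears to put the value name on the outermost column level, so the column tuples look like `('Health', 2018, 'Jan')`, while the tuples passed to `reindex` are ordered `(2018, 'Jan', 'Health')`. No requested tuple matches an existing column, so `reindex` creates every column fresh and fills it with NaN. Assuming that is the cause, moving the value level to the innermost position before reindexing should keep the data. The names `months`, `target` and `side_by_side` below are illustrative only.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "Type": ["Marketing", "Science", "Marketing", "Development", "Marketing",
             "Development", "Development", "Marketing", "Development",
             "Marketing", "Science"],
    "Name": ["Project" + c for c in "ABCDEFGHIJK"],
    "Health": ["OK", "Warning", "OK", "OK", "OK", "Warning", "Trouble",
               "Trouble", "Warning", "OK", "Warning"],
    "Month": ["Jan", "Apr", "Mar", "Feb", "Jan", "Feb", "May", "May", "Feb",
              "May", "Apr"],
    "Year": [2018] * 11,
})
df["index"] = df.groupby(["Year", "Month", "Type"]).cumcount()

pv2 = pd.pivot_table(
    df,
    values=["Name", "Health"],
    index=["Type", "index"],
    columns=["Year", "Month"],
    aggfunc={"Name": lambda x: "|".join(x), "Health": lambda x: ":".join(x)},
).fillna("")

# Inspect the actual level order of the pivoted columns.
print(pv2.columns.names)

# Desired column order: Year, then Month (chronological), then Health/Name.
months = ["Jan", "Feb", "Mar", "Apr", "May"]
target = pd.MultiIndex.from_product([[2018], months, ["Health", "Name"]])

# Move the value level (position 0) to the end so the existing tuples have
# the same shape as the target tuples, then reindex into the desired order.
side_by_side = pv2.reorder_levels([1, 2, 0], axis=1).reindex(columns=target)
print(side_by_side)
```

The same target could be written out as an explicit list of `(Year, Month, value)` tuples; `MultiIndex.from_product` is used here only to avoid typing all ten combinations by hand.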
