I have a dataframe of 762,106 rows × 79 columns. It contains 14 'sets' of three columns, where each column in a set indicates a different level of 'intensity' for a given feature; in any row, exactly one column of the set holds a value and the other two are NaN. The values are already encoded, and I want to condense each set into a single column so that instead of 42 of these columns I have 14.
A subset can be recreated like this:
import pandas as pd
import numpy as np
df = pd.DataFrame([[np.nan, 2, np.nan, 1, np.nan, np.nan, np.nan, np.nan, 3],
[1, np.nan, np.nan, np.nan, 2, np.nan, 1, np.nan, np.nan],
[np.nan, np.nan, 3, 1, np.nan, np.nan, np.nan, 2, np.nan]],
columns=['a','aa','aaa','b','bb','bbb','c','cc','ccc'])
Output:
a aa aaa b bb bbb c cc ccc
0 NaN 2.0 NaN 1.0 NaN NaN NaN NaN 3.0
1 1.0 NaN NaN NaN 2.0 NaN 1.0 NaN NaN
2 NaN NaN 3.0 1.0 NaN NaN NaN 2.0 NaN
I want them to look like this:
a b c
0 2 1 3
1 1 2 1
2 3 1 2
My current solution is to pull values from aa, aaa, etc. into the first column of each set using .fillna(), and then use .drop() to drop the superfluous columns:
df['a'] = df['a'].fillna(df['aa']).fillna(df['aaa'])
df = df.drop(['aa', 'aaa'], axis=1)
df['b'] = df['b'].fillna(df['bb']).fillna(df['bbb'])
df = df.drop(['bb', 'bbb'], axis=1)
And this works, but I want to know if there is a more elegant way to accomplish it without copy-pasting this code block 14 times.
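One way to generalize the fillna/drop pattern is to group the columns themselves and take the first non-NaN value per row in each group. This is a sketch that assumes each set of columns shares the same leading letter, as in the example above ('a'/'aa'/'aaa'); with different real column names you would group by a longer prefix or an explicit mapping instead:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame([[np.nan, 2, np.nan, 1, np.nan, np.nan, np.nan, np.nan, 3],
                   [1, np.nan, np.nan, np.nan, 2, np.nan, 1, np.nan, np.nan],
                   [np.nan, np.nan, 3, 1, np.nan, np.nan, np.nan, 2, np.nan]],
                  columns=['a', 'aa', 'aaa', 'b', 'bb', 'bbb', 'c', 'cc', 'ccc'])

# Transpose so columns become rows, group those rows by the first letter
# of the original column name, and take the first non-NaN value in each
# group; GroupBy.first() skips nulls by default. Transpose back at the end.
condensed = df.T.groupby(df.columns.str[0]).first().T
```

Note the result columns stay float64 because the NaNs forced that dtype; if every set is guaranteed to have a value in each row, you can restore integers with .astype(int) afterwards.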