
I have a dataframe that is 762106 rows x 79 columns. There are 14 'sets' of three columns, with each column in a set indicating a different level of 'intensity' for a given feature, and NaN wherever the value sits in another column of the set. They are already encoded, and I want to condense each set into a single column so that instead of 42 of these columns I have 14.

A subset can be recreated like this:

import pandas as pd
import numpy as np    
df = pd.DataFrame([[np.nan, 2, np.nan, 1, np.nan, np.nan, np.nan, np.nan, 3],
                    [1, np.nan, np.nan, np.nan, 2, np.nan, 1, np.nan, np.nan],
                    [np.nan, np.nan, 3, 1, np.nan, np.nan, np.nan, 2, np.nan]],
                   columns=['a','aa','aaa','b','bb','bbb','c','cc','ccc'])

Output:

    a       aa      aaa     b       bb      bbb     c       cc      ccc
0   NaN     2.0     NaN     1.0     NaN     NaN     NaN     NaN     3.0
1   1.0     NaN     NaN     NaN     2.0     NaN     1.0     NaN     NaN
2   NaN     NaN     3.0     1.0     NaN     NaN     NaN     2.0     NaN

I want them to look like this:

    a   b   c
0   2   1   3
1   1   2   1
2   3   1   2

My current solution is to pull values from aa, aaa, etc. into the base column using .fillna(), and then use .drop() to drop the superfluous columns:

df['a'] = df['a'].fillna(df['aa']).fillna(df['aaa'])
df = df.drop(['aa', 'aaa'], axis=1)

df['b'] = df['b'].fillna(df['bb']).fillna(df['bbb'])
df = df.drop(['bb', 'bbb'], axis=1)

And this works, but I want to know if there is a more elegant way to accomplish this without copy-pasting this code block 14 times.
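For reference, the repeated block above can be folded into a loop. This is a sketch on the sample data, assuming every set follows the a/aa/aaa naming pattern (string repetition gives the variant names); with the real 42 columns you would loop over the actual base names instead:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame([[np.nan, 2, np.nan, 1, np.nan, np.nan, np.nan, np.nan, 3],
                   [1, np.nan, np.nan, np.nan, 2, np.nan, 1, np.nan, np.nan],
                   [np.nan, np.nan, 3, 1, np.nan, np.nan, np.nan, 2, np.nan]],
                  columns=['a', 'aa', 'aaa', 'b', 'bb', 'bbb', 'c', 'cc', 'ccc'])

# One pass per feature set: backfill the base column from its doubled and
# tripled variants, then drop those variants.
for base in ['a', 'b', 'c']:
    df[base] = df[base].fillna(df[base * 2]).fillna(df[base * 3])
    df = df.drop([base * 2, base * 3], axis=1)

print(df)
```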

2 Answers


You can use pandas.DataFrame.groupby with axis = 1 ("columns"):

df.groupby(lambda x: x[0], axis=1).sum()
     a    b    c
0  2.0  1.0  3.0
1  1.0  2.0  1.0
2  3.0  1.0  2.0

If groupby is given a function, it is called on each value of the object's index, in this case the column names.

Since you can group by any function, this is a really flexible solution.
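For instance, with a hypothetical prefix_suffix naming scheme you could group on whatever the function returns, here the part before the underscore. (Transposing first sidesteps the axis=1 keyword, which is deprecated in recent pandas versions.)

```python
import pandas as pd
import numpy as np

# Hypothetical columns: feature name plus an intensity suffix.
df2 = pd.DataFrame([[np.nan, 2.0],
                    [1.0, np.nan]],
                   columns=['a_low', 'a_high'])

# Group on the prefix before the underscore; transposing makes the
# former column labels the index that the function is applied to.
out = df2.T.groupby(lambda c: c.split('_')[0]).sum().T
print(out)
#      a
# 0  2.0
# 1  1.0
```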


You can group by the first letter of the column names and aggregate with GroupBy.first:

df = df.groupby(df.columns.str[0], axis=1).first()
print (df)
     a    b    c
0  2.0  1.0  3.0
1  1.0  2.0  1.0
2  3.0  1.0  2.0
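One caveat worth noting (a small sketch with hypothetical data): first() takes the first non-NaN value in each group, so a row where an entire set is NaN stays NaN, whereas sum() would turn it into 0. The transposed form below is equivalent and avoids the deprecated axis=1 keyword:

```python
import pandas as pd
import numpy as np

tiny = pd.DataFrame([[np.nan, np.nan],
                     [1.0, np.nan]],
                    columns=['a', 'aa'])

# Group the former column labels by their first letter; first() skips
# NaN and leaves NaN only when the whole group is NaN.
res = tiny.T.groupby(tiny.columns.str[0]).first().T
print(res)
#      a
# 0  NaN
# 1  1.0
```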
