Adding missing rows in Pandas

Question

I have a dataframe like below.

user cat  val
u1   cat1 1
u1   cat2 2
u1   cat3 3
u2   cat1 5
u3   cat4 4

And wish to transform it like this.

user cat  val(avg)
u1   cat1 1
u1   cat2 2
u1   cat3 3
u1   cat4 4
u2   cat1 5
u2   cat2 2
u2   cat3 3
u2   cat4 4
u3   cat1 3
u3   cat2 2
u3   cat3 3
u3   cat4 4

There are also a couple of numeric columns to the right of cat, which hopefully can be filled with either NA; or if possible averaged out.

Can you clarify what you mean by "averaged out", with an example? — jpp
– jpp, Commented Jun 20, 2018 at 11:41
Edited with example. the val column is the original value, otherwise the average of existing values. — hedebyhedge
– hedebyhedge, Commented Jun 20, 2018 at 11:51

Mohamed Thasin ah · Accepted Answer · 2018-06-20 12:00:29Z

5

one way to solve this,

l1=df['user'].unique().tolist()
l2=df['cat'].unique().tolist()
new_df =  pd.DataFrame(list(itertools.product(l1,l2))).rename(columns={0:'user',1:'cat'})
new_df=pd.merge(new_df,df,on=['user','cat'],how='left')

Output:

   user   cat  val
0    u1  cat1  1.0
1    u1  cat2  2.0
2    u1  cat3  3.0
3    u1  cat4  NaN
4    u2  cat1  5.0
5    u2  cat2  NaN
6    u2  cat3  NaN
7    u2  cat4  NaN
8    u3  cat1  NaN
9    u3  cat2  NaN
10   u3  cat3  NaN
11   u3  cat4  4.0

edited Jun 20, 2018 at 12:00

answered Jun 20, 2018 at 11:45

Mohamed Thasin ah

11.2k11 gold badges65 silver badges120 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

ALollz · Accepted Answer · 2018-06-20 14:37:30Z

1

You want to basically reindex your DataFrame.

from itertools import product

idx = list(product(df['user'].unique(), df['cat'].unique()))
df  = df.set_index(['user', 'cat']).reindex(idx).reset_index()

The second part seems to be setting the missing values (in this case NaN after the reindex) to the average value from that category. You can accomplish this with a mapping and groupby to calculate the average.

df.loc[df.val.isnull(), 'val'] = df.loc[df.val.isnull(), 'cat'].map(df.groupby('cat').val.mean())

df is now:

   user   cat  val
0    u1  cat1  1.0
1    u1  cat2  2.0
2    u1  cat3  3.0
3    u1  cat4  4.0
4    u2  cat1  5.0
5    u2  cat2  2.0
6    u2  cat3  3.0
7    u2  cat4  4.0
8    u3  cat1  3.0
9    u3  cat2  2.0
10   u3  cat3  3.0
11   u3  cat4  4.0

edited Jun 20, 2018 at 14:37

answered Jun 20, 2018 at 14:04

ALollz

59.7k7 gold badges73 silver badges97 bronze badges

Comments

sammywemmy · Accepted Answer · 2021-09-12 08:45:21Z

1

You could use the complete function from pyjanitor to expose the missing values :

# pip install pyjanitor
import pandas as pd
import janitor as jn

df.complete('user', 'cat')
   user   cat  val
0    u1  cat1  1.0
1    u1  cat2  2.0
2    u1  cat3  3.0
3    u1  cat4  NaN
4    u2  cat1  5.0
5    u2  cat2  NaN
6    u2  cat3  NaN
7    u2  cat4  NaN
8    u3  cat1  NaN
9    u3  cat2  NaN
10   u3  cat3  NaN
11   u3  cat4  4.0

answered Sep 12, 2021 at 8:45

sammywemmy

28.9k4 gold badges21 silver badges35 bronze badges

3 Comments

Shubham Sharma Over a year ago

its very handy to use complete. I think pandas developers should add that in the main library itself.

sammywemmy Over a year ago

@ShubhamSharma maybe you can raise a request for it? possibly make a PR?

Shubham Sharma Over a year ago

I guess we could raise a PR, not sure if they will accept it but nevertheless that's a good idea. Let's do it together :-)

Caio Belfort · Accepted Answer · 2018-06-20 11:49:45Z

0

What you want is a cross join between the unique values of each column

You can do this creating two dataframes with unique values from each

df_col1 = pd.DataFrame(df['user'].unique(), columns=['user'])
df_col2 = pd.DataFrame(df['cat'].unique(), columns=['cat'])

# This step will make a dummy var in each dataframe which will be used to 
# merge the dataframes
merged = df_col1.assign(key=1).merge(df_col2.assign(key=1)).drop(['key'], 1)

answered Jun 20, 2018 at 11:49

Caio Belfort

5552 silver badges11 bronze badges

Collectives™ on Stack Overflow

Adding missing rows in Pandas

4 Answers 4

Comments

Comments

3 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

Comments

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related