
My df looks like this:

sprint   sprint_created
------   -----------
S100     2020-01-01    
S101     2020-01-10
NULL     2020-01-20
NULL     2020-01-31
S101     2020-01-10
...

In the above df, you can see that some of the sprint values are NULL.

I have another df2 that has sprint date ranges:

sprint   sprint_start   sprint_end
------   -----------    ----------
S100     2020-01-01     2020-01-09    
S101     2020-01-10     2020-01-19  
S102     2020-01-20     2020-01-29  
S103     2020-01-30     2020-02-09  
S104     2020-02-10     2020-02-19  
...

How can I fill in the NULL sprint values in df by comparing each sprint_created date against the date ranges in df2?

Please note that the shapes of df and df2 are different.
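
For reference, the two frames can be reconstructed like this (a minimal sketch; the values are taken from the tables above, with dates kept as strings to be parsed later):

import pandas as pd

df = pd.DataFrame({
    'sprint': ['S100', 'S101', None, None, 'S101'],
    'sprint_created': ['2020-01-01', '2020-01-10', '2020-01-20',
                       '2020-01-31', '2020-01-10'],
})

df2 = pd.DataFrame({
    'sprint': ['S100', 'S101', 'S102', 'S103', 'S104'],
    'sprint_start': ['2020-01-01', '2020-01-10', '2020-01-20',
                     '2020-01-30', '2020-02-10'],
    'sprint_end': ['2020-01-09', '2020-01-19', '2020-01-29',
                   '2020-02-09', '2020-02-19'],
})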

  • Are both DataFrames guaranteed to use the same row-based index? i.e. row 5 in df always corresponds to row 5 in df2. Or do you need to match them based on the sprint_created and sprint_start columns? (They look the same here but may not be.) Pandas has good documentation on concatenation and merging: pandas.pydata.org/pandas-docs/stable/user_guide/merging.html Commented Jan 7, 2021 at 23:55
  • As I can see, the sprint column has multiple repeated values in the first table. What key would you use to identify and join a row if your index values don't match between the two tables? Commented Jan 7, 2021 at 23:59
  • In df2, yes, since it provides the sprint date ranges. But df could be random. Commented Jan 8, 2021 at 0:04
  • df will have multiple rows with the same sprint id, and the key in df would be another column, project_id. Commented Jan 8, 2021 at 0:05
  • Thanks for pointing that out; I updated the question. Commented Jan 8, 2021 at 0:07

2 Answers


One way would be to melt and resample your df2, then build a date-to-sprint dictionary to map back to df1 (the question's df):

import pandas as pd

# make sure the date columns are in datetime format
df1['sprint_created'] = pd.to_datetime(df1['sprint_created'])
df2['sprint_start'] = pd.to_datetime(df2['sprint_start'])
df2['sprint_end'] = pd.to_datetime(df2['sprint_end'])

# melt the two date columns into one, then resample daily within each sprint group
new = (df2.melt(id_vars='sprint').drop('variable', axis=1).set_index('value')
          .groupby('sprint', group_keys=False).resample('D').ffill().reset_index())

# build a date -> sprint dictionary and map it back onto df1
dct = dict(zip(new['value'], new['sprint']))
df1['sprint'] = df1['sprint_created'].map(dct)
#or df1['sprint'] = df1['sprint'].fillna(df1['sprint_created'].map(dct))
df1
Out[1]: 
  sprint sprint_created
0   S100     2020-01-01
1   S101     2020-01-10
2   S102     2020-01-20
3   S103     2020-01-31
4   S101     2020-01-10
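
As a side note (not part of the original answer): if materializing one row per day worries you for long date ranges, an IntervalIndex lookup is a possible alternative, assuming the ranges in df2 do not overlap:

# build closed intervals from the start/end columns (they must not overlap)
intervals = pd.IntervalIndex.from_arrays(
    df2['sprint_start'], df2['sprint_end'], closed='both')

# position of the interval containing each created date; -1 means no match
pos = intervals.get_indexer(df1['sprint_created'])

# fill only the missing sprint values; dates outside every range stay NaN
looked_up = pd.Series(df2['sprint'].to_numpy()[pos], index=df1.index).where(pos >= 0)
df1['sprint'] = df1['sprint'].fillna(looked_up)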



I assumed the duplicated sprint rows in df (the first dataframe) can be dropped; please advise otherwise if not. I use merge_asof with a one-day tolerance, based on my comparison of the two dataframes you provided.

df.assign(
    sprint=pd.merge_asof(
        df.drop_duplicates(keep='first'),
        df1,                               # the ranges frame (the question's df2)
        left_on="sprint_created",
        right_on="sprint_start",
        tolerance=pd.Timedelta("1D"),
    )['sprint_y']
).dropna()

  sprint sprint_created
0   S100     2020-01-01
1   S101     2020-01-10
2   S102     2020-01-20
3   S103     2020-01-31

If your frame legitimately has multiple rows per sprint, as explained in the comments above, please try:

g = df.assign(
    sprint=pd.merge_asof(
        df.drop_duplicates(keep='first'),
        df1,
        left_on="sprint_created",
        right_on="sprint_start",
        tolerance=pd.Timedelta("1D"),
    )['sprint_y']
)
# rows dropped as duplicates come back as NaN; forward-fill them within each created date
g.loc[g.sprint.isna(), 'sprint'] = g.groupby('sprint_created').sprint.ffill()
print(g)



sprint sprint_created
0   S100     2020-01-01
1   S101     2020-01-10
2   S102     2020-01-20
3   S103     2020-01-31
4   S101     2020-01-10
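
One caveat, as I read merge_asof (an observation, not from the answer): the default direction='backward' matches each sprint_created to the latest sprint_start at or before it, and the one-day tolerance only accepts dates falling on a sprint's start or the day after. If created dates can fall anywhere inside a sprint, dropping the tolerance covers the whole range:

# both frames must be sorted on their asof keys (df1 already is here)
mapped = pd.merge_asof(
    df.sort_values('sprint_created'),
    df1,                               # the ranges frame (the question's df2)
    left_on='sprint_created',
    right_on='sprint_start',
    direction='backward')
# both frames have a 'sprint' column, so the looked-up id lands in mapped['sprint_y']

Note that without a tolerance, dates past the last sprint_end would still match the last sprint, so add a guard if that matters for your data.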

