I have two dataframes, df_1 and df_2. Both have an index, datetimecode, which holds datetime64 values, and a temp column. I want to iterate through df_1 and replace all the NaN temperature values with the corresponding temp from df_2.

Something like this:

for index, row in df_1.iterows():
      row['temp'] = df_2[index]['temp'] if row['temp'] ==np.nan

but this is invalid syntax.

2 Answers

1

If I understand correctly:

df_1.fillna(df_2, inplace=True)

or

df_1.loc[df_1.temp.isnull(), 'temp'] = df_2.temp

demonstration

import numpy as np
import pandas as pd

tidx = pd.date_range('2016-03-31', periods=5)
df_1 = pd.DataFrame(dict(temp=[1, np.nan, 3, np.nan, 5]), tidx)
df_2 = pd.DataFrame(dict(temp=np.arange(11, 16)), tidx)

df_1.fillna(df_2)

            temp
2016-03-31   1.0
2016-04-01  12.0
2016-04-02   3.0
2016-04-03  14.0
2016-04-04   5.0

df_1.loc[df_1.temp.isnull(), 'temp'] = df_2.temp

df_1

            temp
2016-03-31   1.0
2016-04-01  12.0
2016-04-02   3.0
2016-04-03  14.0
2016-04-04   5.0
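
Both forms match rows by index label rather than by position, so df_2 does not need to be in the same order as df_1 and may contain extra timestamps. Here is a minimal sketch of that behaviour. The data is made up for illustration, and the names tidx_2, filled and df_b are hypothetical:

```python
import numpy as np
import pandas as pd

tidx = pd.date_range('2016-03-31', periods=5)
df_1 = pd.DataFrame({'temp': [1, np.nan, 3, np.nan, 5]}, index=tidx)

# df_2 is deliberately in reverse order and has one extra day,
# to show that matching is done on the timestamp labels.
tidx_2 = pd.date_range('2016-03-31', periods=6)[::-1]
df_2 = pd.DataFrame({'temp': np.arange(16, 10, -1)}, index=tidx_2)

# Option 1: fillna with a DataFrame aligns on index and column labels.
filled = df_1.fillna(df_2)

# Option 2: assign only into the rows where temp is missing.
df_b = df_1.copy()
df_b.loc[df_b['temp'].isna(), 'temp'] = df_2['temp']

print(filled['temp'].tolist())  # [1.0, 12.0, 3.0, 14.0, 5.0]
print(df_b['temp'].tolist())    # [1.0, 12.0, 3.0, 14.0, 5.0]
```

Both options depend on df_2 having unique index labels.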


5 Comments

I have implemented the first method, and there is an error at the df_1.fillna statement which says pandas.indexes.base.InvalidIndexError. The error is inconsistent: it depends on the time period selected, but I can't see anything causing it.
In fact, the error is completely intermittent and apparently random when running the same data, so it looks like there is some instability in this solution?
Well then... good thing I gave you two solutions ;-) I'm looking into the warning though. That is a tad troubling.
Ha! Yes a good thing. But your second solution isn't working for me at the moment, it gives this error: ValueError: cannot reindex from a duplicate axis.
My bad! There was a duplicated data item in df_2 (because the imported timecodes had daylight savings) which caused the indexing error (though strangely, only sometimes). Both of your methods work well.
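
For anyone hitting the same errors, a duplicated timestamp in df_2 can be checked for and removed before filling. This is only a sketch with made-up data: the repeated timestamp stands in for the daylight-savings duplicate described above, and keeping the first row for each timestamp is an assumption, so choose whichever row is correct for your data. The names idx_2, df_2_unique and filled are hypothetical.

```python
import numpy as np
import pandas as pd

tidx = pd.date_range('2016-03-31', periods=4)
df_1 = pd.DataFrame({'temp': [1.0, np.nan, 3.0, np.nan]}, index=tidx)

# Hypothetical df_2 in which 2016-04-01 appears twice.
idx_2 = pd.to_datetime(['2016-03-31', '2016-04-01', '2016-04-01',
                        '2016-04-02', '2016-04-03'])
df_2 = pd.DataFrame({'temp': [11.0, 12.0, 99.0, 13.0, 14.0]}, index=idx_2)

# Check whether any index label is repeated.
print(df_2.index.is_unique)  # False

# Keep only the first row for each timestamp (an assumption, see above).
df_2_unique = df_2[~df_2.index.duplicated(keep='first')]

# With unique labels, the fill aligns cleanly on the index.
filled = df_1.fillna(df_2_unique)
print(filled['temp'].tolist())  # [1.0, 12.0, 3.0, 14.0]
```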
1

Is this what you're looking for:

df_1 = pd.DataFrame({'temp': [1,2,3,np.nan,5,np.nan,7]})
   temp
0   1.0
1   2.0
2   3.0
3   NaN
4   5.0
5   NaN
6   7.0

df_2 = pd.DataFrame({'temp': [8,9,10,11,12,13,14]})
   temp
0     8
1     9
2    10
3    11
4    12
5    13
6    14

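# Assign the result back instead of using inplace=True on a column selection,
# which may not update df_1 in recent pandas versions.
df_1['temp'] = df_1['temp'].fillna(df_2['temp'])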

   temp
0   1.0
1   2.0
2   3.0
3  11.0
4   5.0
5  13.0
6   7.0

   temp
0     8
1     9
2    10
3    11
4    12
5    13
6    14

2 Comments

@Brian Well, we both answered at exactly the same time, apparently. Just a few seconds apart, probably...
Thank you both @Brian - sorry I can't accept both answers!
