
Assume I have two DataFrames, A and B.

DataFrame A has four columns: Year, Month, Day and Temperature (e.g. 2021 || 7 || 5 || 23). Some of the temperature cells in DataFrame A are currently NaN.

DataFrame B has two columns: Date and Temp (e.g. 2021/7/7 || 28).

The two DataFrames are sampled at different intervals. The interval of DataFrame A is shorter than that of DataFrame B, but some timestamps overlap (e.g. every 5 minutes in DataFrame A and every 10 minutes in DataFrame B).

I want to copy the temperature from DataFrame B into DataFrame A wherever DataFrame A has a NaN.

I have a loop-based method, but it is very slow. I would like to use pandas vectorization instead, but I don't know how. Can anyone show me?

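    import pandas as pd
    from tqdm import tqdm

    for i in tqdm(range(len(dfA))):
        if pd.isna(dfA['Temperature'].iloc[i]):
            # Build the lookup key; it must match the string format used in dfB['Date']
            year, month, day = dfA['Year'].iloc[i], dfA['Month'].iloc[i], dfA['Day'].iloc[i]
            date_time_str = f"{year}/{month}/{day}"
            match = dfB.loc[dfB['Date'] == date_time_str, 'Temp']
            if not match.empty:
                # Write through .loc to avoid chained assignment
                dfA.loc[dfA.index[i], 'Temperature'] = float(match.iloc[0])
            else:
                print("no value")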

My solution is very slow. How can I do this with pandas vectorization?

My attempt at vectorization:

dfA.loc[df['temp'].isnull() & ((datetime.datetime(dfA['Year'], df['*Month'], dfA['Day']).strftime("%Y/%m/%d %H:%M"))in dfB.Date.values) , 'temp'] = float(dfB[dfB['Date'] == datetime.datetime(dfA['Year'], df['*Month'], dfA['Day']].iloc[:, 1])

That is my attempt, but it doesn't work.

Example data:

DataFrame A
Year    Month   Day Temperature
2020    1        17  25
2020    1        18  NaN
2020    1        19  28
2020    1        20  NaN
2020    1        21  NaN
2020    1        22  NaN

DataFrame B
Date    Temp
1/17/2020   25
1/19/2020   28
1/21/2020   31
1/23/2020   34
1/25/2020   23
1/27/2020   54

Expected Output
Year    Month   Day Temperature
2020    1        17 25
2020    1        18 NaN
2020    1        19 28
2020    1        20 NaN
2020    1        21 31
2020    1        22 NaN
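
For anyone who wants to reproduce this, the sample tables above can be built roughly like this. It is only a sketch: the names `dfA` and `dfB` follow the code earlier in the question, and the column names are taken from the table headers.

```python
import numpy as np
import pandas as pd

# DataFrame A: Year / Month / Day / Temperature, with some NaN temperatures
dfA = pd.DataFrame({
    "Year": [2020] * 6,
    "Month": [1] * 6,
    "Day": [17, 18, 19, 20, 21, 22],
    "Temperature": [25, np.nan, 28, np.nan, np.nan, np.nan],
})

# DataFrame B: Date stored as M/D/YYYY strings, plus Temp
dfB = pd.DataFrame({
    "Date": ["1/17/2020", "1/19/2020", "1/21/2020",
             "1/23/2020", "1/25/2020", "1/27/2020"],
    "Temp": [25, 28, 31, 34, 23, 54],
})
```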





  • @Chris, I added some sample data, please check. Thank you. Commented Aug 6, 2021 at 3:46
  • @Chris, I just changed the sample data, please check. Thank you. Commented Aug 6, 2021 at 3:53

2 Answers


Let's map them:

dfa['Date']=pd.to_datetime(dfa[['Day','Month','Year']])
dfb['Date']=pd.to_datetime(dfb['Date'])
# look up each date of dfa in dfb and fill only the NaN temperatures
dfa['Temperature']=dfa['Temperature'].fillna(dfa.pop('Date').map(dfb.set_index('Date')['Temp']))

OR

Let's Merge them:

dfa['Date']=pd.to_datetime(dfa[['Day','Month','Year']])
dfb['Date']=pd.to_datetime(dfb['Date'])
dfa=dfa.merge(dfb[['Date','Temp']],on='Date',how='left')
dfa['Temperature']=dfa['Temperature'].fillna(dfa.pop('Temp'))
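
As an end-to-end check of the merge approach on the question's sample data, here is a minimal sketch. `fill_from` is a hypothetical helper name, not a pandas function, and the `format="%m/%d/%Y"` argument assumes the dates in dfb are M/D/YYYY strings as in the sample. Passing `validate="m:1"` is an optional guard that makes the merge raise if a date appears more than once in dfb.

```python
import numpy as np
import pandas as pd


def fill_from(dfa: pd.DataFrame, dfb: pd.DataFrame) -> pd.DataFrame:
    """Fill NaN in dfa['Temperature'] from dfb['Temp'] on matching dates.

    Hypothetical helper for illustration; the inputs are not modified.
    """
    # Build a datetime key on both sides
    left = dfa.assign(Date=pd.to_datetime(dfa[["Year", "Month", "Day"]]))
    right = dfb.assign(Date=pd.to_datetime(dfb["Date"], format="%m/%d/%Y"))
    # Left merge keeps every row of dfa; unmatched dates get NaN in 'Temp'
    merged = left.merge(right[["Date", "Temp"]], on="Date",
                        how="left", validate="m:1")
    # Existing temperatures win; only the NaN cells are filled
    merged["Temperature"] = merged["Temperature"].fillna(merged.pop("Temp"))
    return merged.drop(columns="Date")


dfa = pd.DataFrame({
    "Year": [2020] * 6,
    "Month": [1] * 6,
    "Day": [17, 18, 19, 20, 21, 22],
    "Temperature": [25, np.nan, 28, np.nan, np.nan, np.nan],
})
dfb = pd.DataFrame({
    "Date": ["1/17/2020", "1/19/2020", "1/21/2020",
             "1/23/2020", "1/25/2020", "1/27/2020"],
    "Temp": [25, 28, 31, 34, 23, 54],
})

result = fill_from(dfa, dfb)
print(result)
```

Only 1/21 gets filled here, because it is the only date that is NaN in dfa and also present in dfb.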

2 Comments

@Arurag Dabas, I found that it should be dfa.pop('Temp'). I think I just made a mistake, so I deleted the comment. Because we merge dfa and dfb and assign the result to dfa, dfa now has the column 'Temp'. So we can pop the values from dfa['Temp'] and write them to dfa['Temperature']. I am not sure, though. Can you confirm that this is correct? Thank you so much.
@Stack ohh... then I updated the answer. Yes, you are right: since we assign the result of the merge to the dfa variable, it's dfa.pop('Temp') :)

One way using pandas.to_datetime with pandas.Series.fillna:

df1 = df1.set_index(pd.to_datetime(df1[["Year", "Month", "Day"]]))
s = df2.set_index(pd.to_datetime(df2.pop("Date"))).squeeze()
df1["Temperature"] = df1["Temperature"].fillna(s)
print(df1.reset_index(drop=True))

Output:

   Year  Month  Day  Temperature
0  2020      1   17         25.0
1  2020      1   18          NaN
2  2020      1   19         28.0
3  2020      1   20          NaN
4  2020      1   21         31.0
5  2020      1   22          NaN
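
The question also mentions sub-daily data (every 5 minutes in A, every 10 minutes in B). The same index-aligned `fillna` should carry over as long as both sides are indexed by full timestamps. The sketch below uses made-up values and hypothetical names (`temp_a`, `temp_b`), not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-minute series with gaps (stands in for DataFrame A)
idx_a = pd.date_range("2021-07-05 00:00", periods=5, freq="5min")
temp_a = pd.Series([23.0, np.nan, np.nan, np.nan, 24.0],
                   index=idx_a, name="Temperature")

# Hypothetical 10-minute series (stands in for DataFrame B)
idx_b = pd.date_range("2021-07-05 00:00", periods=3, freq="10min")
temp_b = pd.Series([23.0, 28.0, 30.0], index=idx_b, name="Temp")

# fillna aligns on the index: only NaN cells whose timestamp exists
# in temp_b are filled; existing values in temp_a are left alone
filled = temp_a.fillna(temp_b)
print(filled)
```

Timestamps that exist only in A (00:05 and 00:15 here) stay NaN, and the existing value at 00:20 is not overwritten by B.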
