Suppose I have two DataFrames, A and B.
DataFrame A has four columns: Year, Month, Day, and Temperature (e.g. 2021 || 7 || 5 || 23). Some of the Temperature cells in DataFrame A are currently NaN.
DataFrame B has two columns: Date and Temp (e.g. 2021/7/7 || 28).
The two DataFrames are sampled at different intervals. DataFrame A's interval is shorter than DataFrame B's, so only some of their timestamps coincide (e.g. a row every 5 minutes in DataFrame A and every 10 minutes in DataFrame B).
For each row of DataFrame A whose temperature is NaN, I want to copy the temperature from the row of DataFrame B with the same date, if one exists.
I have a loop-based method, but it is very slow. I would like to use pandas vectorization instead, but I don't know how. Can anyone show me?
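import pandas as pd
from tqdm import tqdm

for i in tqdm(range(len(dfA))):
    row = dfA.iloc[i]
    if pd.isna(row['Temperature']):
        # Build the lookup key in the same format as dfB['Date'] (e.g. 2021/7/7)
        date_time_str = f"{int(row['Year'])}/{int(row['Month'])}/{int(row['Day'])}"
        # Temperatures in dfB recorded on that date
        match = dfB.loc[dfB['Date'] == date_time_str, 'Temp']
        if match.empty:
            print("no value")
        else:
            # Write through .loc to avoid chained assignment
            dfA.loc[dfA.index[i], 'Temperature'] = float(match.iloc[0])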
This loop is very slow. How can I do the same thing with pandas vectorization?
Here is the vectorized version I tried:
dfA.loc[dfA['Temperature'].isnull() & (datetime.datetime(dfA['Year'], dfA['Month'], dfA['Day']).strftime("%Y/%m/%d %H:%M") in dfB.Date.values), 'Temperature'] = float(dfB[dfB['Date'] == datetime.datetime(dfA['Year'], dfA['Month'], dfA['Day'])].iloc[:, 1])
This attempt does not work: datetime.datetime() accepts only scalar values, so it raises a TypeError when given whole columns.
Example data:

DataFrame A
Year  Month  Day  Temperature
2020  1      17   25
2020  1      18   NaN
2020  1      19   28
2020  1      20   NaN
2020  1      21   NaN
2020  1      22   NaN

DataFrame B
Date       Temp
1/17/2020  25
1/19/2020  28
1/21/2020  31
1/23/2020  34
1/25/2020  23
1/27/2020  54

Expected Output
Year  Month  Day  Temperature
2020  1      17   25
2020  1      18   NaN
2020  1      19   28
2020  1      20   NaN
2020  1      21   31
2020  1      22   NaN
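
One possible way to vectorize this, sketched against the example data above: build a datetime key for every row of DataFrame A with pd.to_datetime, index DataFrame B's temperatures by their parsed dates, then use Series.map and fillna so that only the NaN cells are touched. The column names Temperature and Temp and the month/day/year format of Date are assumptions taken from the sample tables, and the names key_a, lookup, and from_b are made up for illustration. The sketch also assumes each date appears only once in DataFrame B.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
dfA = pd.DataFrame({
    "Year": [2020] * 6,
    "Month": [1] * 6,
    "Day": [17, 18, 19, 20, 21, 22],
    "Temperature": [25, np.nan, 28, np.nan, np.nan, np.nan],
})
dfB = pd.DataFrame({
    "Date": ["1/17/2020", "1/19/2020", "1/21/2020",
             "1/23/2020", "1/25/2020", "1/27/2020"],
    "Temp": [25, 28, 31, 34, 23, 54],
})

# Build one datetime per row of dfA in a single vectorized call.
# pd.to_datetime can assemble dates from year/month/day columns;
# the columns are lowercased here to match the names it documents.
key_a = pd.to_datetime(dfA[["Year", "Month", "Day"]].rename(columns=str.lower))

# Index dfB's temperatures by parsed date.
# Assumption: Date is month/day/year text and each date is unique.
lookup = pd.Series(
    dfB["Temp"].to_numpy(),
    index=pd.to_datetime(dfB["Date"], format="%m/%d/%Y"),
)

# Look up dfB's temperature for each dfA date; dates absent from dfB give NaN.
from_b = key_a.map(lookup)

# Keep existing values and fill only the NaN cells.
dfA["Temperature"] = dfA["Temperature"].fillna(from_b)

print(dfA)
```

On the sample data this fills only the 2020-01-21 row (with 31) and leaves the other NaN cells as they are, which matches the expected output. If DataFrame A also carried hour and minute columns (not shown in the question), they could be passed to pd.to_datetime in the same way, provided DataFrame B's Date column is parsed to the same resolution.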
