I want to insert a row into a dataframe based on values in another dataframe. I have attempted to reproduce my problem in a simple way. I have two dataframes df1 and df2.
import pandas as pd

data = [['apple', 'apples'], ['orange', 'oranges'], ['banana', 'bananas'], ['kiwi', 'kiwis']]
df1 = pd.DataFrame(data, columns=['fruit', 'fruits'])
data_1 = [['apple', 'A'], ['banana', 'B'], ['cherry', 'C'], ['durian', 'D'], ['kiwi', 'K'], ['elderberry', 'E'], ['fig', 'F'], ['orange', 'O']]
df2 = pd.DataFrame(data_1, columns=['fruit', 'label'])
The dataframes look like this.
df1:
fruit fruits
0 apple apples
1 orange oranges
2 banana bananas
3 kiwi kiwis
df2:
fruit label
0 apple A
1 banana B
2 cherry C
3 durian D
4 kiwi K
5 elderberry E
6 fig F
7 orange O
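Now I want to combine the two dataframes based on the values in the 'fruit' column: for every fruit in df2 that also appears in df1, a new row with the plural form and the same label should be inserted directly after it. I have written a nested for loop to achieve this.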
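for i in range(len(df1)):
    for j in range(len(df2)):
        if df1.iloc[i]['fruit'] == df2.iloc[j]['fruit']:
            # A fractional index (j + 0.5) places the plural right after its match once sorted.
            new_row = pd.DataFrame({"fruit": df1.iloc[i]['fruits'], "label": df2.iloc[j]['label']}, index=[j + 0.5])
            # pd.concat is used here because DataFrame.append was removed in pandas 2.0.
            df2 = pd.concat([df2, new_row])
# Sort by the fractional index, then renumber the rows from 0.
df2 = df2.sort_index().reset_index(drop=True)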
The result looks like this:
df2:
fruit label
0 apple A
1 apples A
2 banana B
3 bananas B
4 cherry C
5 durian D
6 kiwi K
7 kiwis K
8 elderberry E
9 fig F
10 orange O
11 oranges O
My original datasets have close to 30,000 values. This makes the nested for loop solution that I have used very slow. Is there a faster and more efficient way to do this?
Thanks
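
One possible loop-free approach, as a minimal sketch rather than a definitive answer: look up each plural with `Series.map`, build all the extra rows in one step, and reuse the same fractional-index trick to interleave them. It assumes that the 'fruit' values in df1 are unique and that df2 has the default 0..n-1 index. The names `plural`, `matched`, `extra` and `result` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['apple', 'apples'], ['orange', 'oranges'], ['banana', 'bananas'], ['kiwi', 'kiwis']],
    columns=['fruit', 'fruits'],
)
df2 = pd.DataFrame(
    [['apple', 'A'], ['banana', 'B'], ['cherry', 'C'], ['durian', 'D'],
     ['kiwi', 'K'], ['elderberry', 'E'], ['fig', 'F'], ['orange', 'O']],
    columns=['fruit', 'label'],
)

# Look up the plural for every row of df2 (NaN where df1 has no match).
# Assumption: 'fruit' is unique in df1, otherwise this lookup is ambiguous.
plural = df2['fruit'].map(df1.set_index('fruit')['fruits'])

# Build the extra rows only for the fruits that were found in df1.
matched = plural.notna()
extra = pd.DataFrame({'fruit': plural[matched], 'label': df2.loc[matched, 'label']})

# Shift the index by 0.5 so each plural sorts directly after its singular.
# Assumption: df2 still has its default integer index at this point.
extra.index = extra.index + 0.5

# Interleave the original and extra rows, then renumber from 0.
result = pd.concat([df2, extra]).sort_index().reset_index(drop=True)
print(result)
```

Because the lookup and the concatenation each run once over whole columns, the work should grow roughly linearly with the number of rows instead of with len(df1) * len(df2), which is where the nested loop spends its time.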