0

I want to insert a row into a dataframe based on values in another dataframe. I have attempted to reproduce my problem in a simple way. I have two dataframes df1 and df2.

import pandas as pd

data = [['apple', 'apples'], ['orange', 'oranges'], ['banana', 'bananas'], ['kiwi', 'kiwis']]
df1 = pd.DataFrame(data, columns=['fruit', 'fruits'])

data_1 = [['apple', 'A'], ['banana', 'B'], ['cherry', 'C'], ['durian', 'D'], ['kiwi', 'K'], ['elderberry', 'E'], ['fig', 'F'], ['orangar', 'O']]
df2 = pd.DataFrame(data_1, columns= ['fruit', 'label'])

The dataframes look like this.

df1:
    fruit   fruits
0   apple   apples
1   orange  oranges
2   banana  bananas
3   kiwi    kiwis


df2:
    fruit     label
0   apple       A
1   banana      B
2   cherry      C
3   durian      D
4   kiwi        K
5   elderberry  E
6   fig         F
7   orange      O

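Now I want to combine the two dataframes based on the values in the 'fruit' column: for every fruit in df2 that also appears in df1, the plural form from df1 should be inserted directly below it with the same label. I have written a nested for loop to achieve this.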

for i in range(len(df1)):
    for j in range(len(df2)):
        if df1.iloc[i]['fruit'] == df2.iloc[j]['fruit']:
            # a fractional index places the new row right after row j once sorted
            # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
            new_row = pd.DataFrame({"fruit": df1.iloc[i]['fruits'], "label": df2.iloc[j]['label']}, index=[j + 0.5])
            df2 = pd.concat([df2, new_row]).sort_index().reset_index(drop=True)

The result looks like this:

df2:
    fruit       label
0   apple       A
1   apples      A
2   banana      B
3   bananas     B
4   cherry      C
5   durian      D
6   kiwi        K
7   kiwis       K
8   elderberry  E
9   fig         F
10  orange      O
11  oranges     O 

My original datasets have close to 30,000 values. This makes the nested for loop solution that I have used very slow. Is there a faster and more efficient way to do this?

Thanks

2 Answers

1

Let's do:

# get labels from df2
_df = pd.merge(df1, df2, how='left', on='fruit')

# drop the old fruit column and rename fruits to fruit
_df = _df.drop('fruit', axis=1)
_df = _df.rename({'fruits': 'fruit'}, axis=1)

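# concat the 2 dataframes together; the new rows are appended after the
# existing df2 rows, and fruits missing from df2 get a NaN label
df2 = pd.concat([df2, _df], ignore_index=True)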
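
If the plural rows need to sit directly below their singular rows, as in the expected output of the question, one possible variation is to carry the original df2 row position through the merge and sort on it afterwards. This is only a sketch: it assumes df2 has a default 0..n-1 index, it uses the data as displayed in the question's tables (with 'orange' in df2), and the names matched, plurals and interleaved are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['apple', 'apples'], ['orange', 'oranges'], ['banana', 'bananas'], ['kiwi', 'kiwis']],
    columns=['fruit', 'fruits'])
df2 = pd.DataFrame(
    [['apple', 'A'], ['banana', 'B'], ['cherry', 'C'], ['durian', 'D'],
     ['kiwi', 'K'], ['elderberry', 'E'], ['fig', 'F'], ['orange', 'O']],
    columns=['fruit', 'label'])

# inner join keeps only fruits present in both frames; reset_index() exposes
# each df2 row position as a column named 'index'
matched = df1.merge(df2.reset_index(), on='fruit', how='inner')

# build the plural rows, indexed by the df2 position they belong to
plurals = (matched[['fruits', 'label', 'index']]
           .rename(columns={'fruits': 'fruit'})
           .set_index('index'))

# a stable sort keeps each original df2 row ahead of the plural that shares its index
interleaved = (pd.concat([df2, plurals])
               .sort_index(kind='mergesort')
               .reset_index(drop=True))
print(interleaved)
```

The inner join also avoids the NaN labels that a left join would give for fruits in df1 that have no match in df2.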

1

Use merge with left join and then reshape by DataFrame.stack:

df = (df2.merge(df1, on='fruit', how='left')
         .set_index('label')
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='fruit')[['fruit','label']])
print(df)
         fruit label
0        apple     A
1       apples     A
2       banana     B
3      bananas     B
4       cherry     C
5       durian     D
6         kiwi     K
7        kiwis     K
8   elderberry     E
9          fig     F
10     orangar     O
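
Note that this chain depends on stack() discarding the NaN values that the left join leaves in the 'fruits' column for unmatched rows. As far as I know that is the long-standing default, but newer pandas releases are moving to a stack implementation that keeps NaN, so a slightly more defensive sketch adds an explicit dropna(). The data below is the data as displayed in the question's tables (with 'orange' in df2), and merged, stacked and result are just illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['apple', 'apples'], ['orange', 'oranges'], ['banana', 'bananas'], ['kiwi', 'kiwis']],
    columns=['fruit', 'fruits'])
df2 = pd.DataFrame(
    [['apple', 'A'], ['banana', 'B'], ['cherry', 'C'], ['durian', 'D'],
     ['kiwi', 'K'], ['elderberry', 'E'], ['fig', 'F'], ['orange', 'O']],
    columns=['fruit', 'label'])

# left join keeps every df2 row in order; unmatched rows get NaN in 'fruits'
merged = df2.merge(df1, on='fruit', how='left')

# stack turns each row's (fruit, fruits) pair into consecutive entries;
# dropna() removes the NaN plurals regardless of stack's own NaN handling
stacked = merged.set_index('label')[['fruit', 'fruits']].stack().dropna()

# drop the inner level (the old column names) and restore a two-column frame
result = (stacked.reset_index(level=1, drop=True)
                 .reset_index(name='fruit')[['fruit', 'label']])
print(result)
```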
