Insert a row into a dataframe based on values in another dataframe

Question

I want to insert a row into a dataframe based on values in another dataframe. I have attempted to reproduce my problem in a simple way. I have two dataframes df1 and df2.

import pandas as pd

data = [['apple', 'apples'], ['orange', 'oranges'], ['banana', 'bananas'], ['kiwi', 'kiwis']]
df1 = pd.DataFrame(data, columns=['fruit', 'fruits'])

data_1 = [['apple', 'A'], ['banana', 'B'], ['cherry', 'C'], ['durian', 'D'], ['kiwi', 'K'], ['elderberry', 'E'], ['fig', 'F'], ['orangar', 'O']]
df2 = pd.DataFrame(data_1, columns= ['fruit', 'label'])

The dataframes look like this.

df1:
    fruit   fruits
0   apple   apples
1   orange  oranges
2   banana  bananas
3   kiwi    kiwis


df2:
    fruit     label
0   apple       A
1   banana      B
2   cherry      C
3   durian      D
4   kiwi        K
5   elderberry  E
6   fig         F
7   orange      O

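Now I want to combine the two dataframes based on the values in the 'fruit' column: for every fruit in df2 that also appears in df1, the plural form from df1 should be inserted directly below it with the same label. I have written a nested for loop to achieve this.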

for i in range(len(df1)):
    for j in range(len(df2)):
        if df1.iloc[i]['fruit'] == df2.iloc[j]['fruit']:
            # build the plural row with index j + 0.5 so it sorts right after row j
            new_row = pd.DataFrame({"fruit": df1.iloc[i]['fruits'], "label": df2.iloc[j]['label']}, index=[j + 0.5])
            # pd.concat is used because DataFrame.append was removed in pandas 2.0
            df2 = pd.concat([df2, new_row])
            df2 = df2.sort_index().reset_index(drop=True)

The result looks like this:

df2:
    fruit       label
0   apple       A
1   apples      A
2   banana      B
3   bananas     B
4   cherry      C
5   durian      D
6   kiwi        K
7   kiwis       K
8   elderberry  E
9   fig         F
10  orange      O
11  oranges     O

My original datasets have close to 30,000 values. This makes the nested for loop solution that I have used very slow. Is there a faster and more efficient way to do this?

Thanks

kelvt · Accepted Answer · 2021-09-17 05:17:25Z

1

Let's do:

# get labels from df2
_df = pd.merge(df1, df2, how='left', on='fruit')

# drop the old fruit column and rename fruits to fruit
_df = _df.drop('fruit', axis=1)
_df = _df.rename({'fruits': 'fruit'}, axis=1)

# concat the 2 dataframes together
df2 = pd.concat([df2, _df])
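
Note that the concat above places the plural rows at the end of df2 and keeps their old index values. A minimal sketch of one way to get them directly below their singular row, as in the question's expected output. It uses the data exactly as typed in the question's constructor (so the last df2 fruit is 'orangar'), and it swaps the left join for an inner join so that unmatched df1 rows are not added with a NaN label. The helper column pos and the names base, extra and result are only for illustration and are not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['apple', 'apples'], ['orange', 'oranges'], ['banana', 'bananas'], ['kiwi', 'kiwis']],
    columns=['fruit', 'fruits'])
df2 = pd.DataFrame(
    [['apple', 'A'], ['banana', 'B'], ['cherry', 'C'], ['durian', 'D'],
     ['kiwi', 'K'], ['elderberry', 'E'], ['fig', 'F'], ['orangar', 'O']],
    columns=['fruit', 'label'])

# Record each df2 row's position so the new rows can be placed after it.
base = df2.reset_index(drop=True).rename_axis('pos').reset_index()

# Inner join: df1 rows with no match in df2 (here 'orange') are left out
# instead of being added with a NaN label.
extra = base.merge(df1, on='fruit', how='inner')
extra = extra.drop(columns='fruit').rename(columns={'fruits': 'fruit'})

# Stable sort on position: the original rows come first in the concat,
# so each plural row ends up right after its singular row.
result = (pd.concat([base, extra])
            .sort_values('pos', kind='stable')
            .drop(columns='pos')
            .reset_index(drop=True))
print(result)
```

The stable sort matters here: with an unstable sort, the singular and plural rows that share the same pos value could come out in either order.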


jezrael · Accepted Answer · 2021-09-17 05:18:32Z

1

Use merge with left join and then reshape by DataFrame.stack:

df = (df2.merge(df1, on='fruit', how='left')
         .set_index('label')
         .stack()
         .reset_index(level=1, drop=True)
         .reset_index(name='fruit')[['fruit','label']])
print(df)
         fruit label
0        apple     A
1       apples     A
2       banana     B
3      bananas     B
4       cherry     C
5       durian     D
6         kiwi     K
7        kiwis     K
8   elderberry     E
9          fig     F
10     orangar     O
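
The chain above relies on stack leaving out the missing fruits values for rows that have no match in df1. A minimal sketch of an alternative that makes that step explicit with Series.map and dropna, and then reuses the half-step index idea from the question to put each plural row below its singular row. It assumes df2 has the default 0..n-1 index and that df1['fruit'] contains no duplicates (map needs a unique lookup index). The names lookup, plural, extra and result are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [['apple', 'apples'], ['orange', 'oranges'], ['banana', 'bananas'], ['kiwi', 'kiwis']],
    columns=['fruit', 'fruits'])
df2 = pd.DataFrame(
    [['apple', 'A'], ['banana', 'B'], ['cherry', 'C'], ['durian', 'D'],
     ['kiwi', 'K'], ['elderberry', 'E'], ['fig', 'F'], ['orangar', 'O']],
    columns=['fruit', 'label'])

# Lookup table: singular -> plural (assumes unique values in df1['fruit']).
lookup = df1.set_index('fruit')['fruits']

# Plural form for every df2 row, NaN where df1 has no matching fruit.
plural = df2['fruit'].map(lookup)

# Keep only the matched rows, with the plural in the 'fruit' column.
extra = df2.assign(fruit=plural).dropna(subset=['fruit'])

# Shift the index by 0.5 so each new row sorts right after its source row.
extra.index = extra.index + 0.5

result = pd.concat([df2, extra]).sort_index().reset_index(drop=True)
print(result)
```

On the sample data this gives the same 11 rows as the printed output above, without any Python-level loop.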
