How to update pandas dataframe columns based on another dataframe faster?

Question

I am using the code below to update a dataframe based on another one. However, it is extremely slow. I am looking for a faster solution.

for inx, row in df1.iterrows():
    # Rows of df2 that share this row's KANR
    dfTmp = df2.loc[df2['KANR'].astype(str) == row['KANR']]
    if dfTmp.empty:
        continue

    # Take Date and Time from the first row with STATUS "F5", if there is one
    dfF5 = dfTmp.loc[dfTmp['STATUS'].astype(str) == "F5"]
    if not dfF5.empty:
        timestamp = "%s %s" % (dfF5.iloc[0]["Date"], dfF5.iloc[0]["Time"])
        df1.at[inx, 'F5'] = timestamp  # set_value was removed in pandas 1.0

It's usually easier if you provide your df.head() and the expected output.
– Vaishali, Commented Feb 21, 2017 at 21:07

akuiper · Accepted Answer · 2017-02-21 21:49:33Z

You can use merge, which is vectorized and will be much faster than a row-by-row loop for this kind of matching task. Something like this, assuming you don't have duplicated date and time values for each KANR (only the first F5 row per KANR is kept):

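# Build the timestamp string from Date and Time
df2['F5'] = df2['Date'].astype(str) + " " + df2['Time'].astype(str)
# Keep the first F5 row per KANR
to_join = df2.loc[df2['STATUS'].astype(str) == 'F5', ['F5', 'KANR']].groupby('KANR').head(1)
# merge returns a new DataFrame, so assign the result
df1 = df1.merge(to_join, how='left', on='KANR')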
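
For reference, here is a minimal, self-contained sketch of the merge approach on made-up sample data. The column names follow the question. The sample values and the helper name add_f5 are hypothetical, and the sketch assumes KANR is stored as a string in df1 but possibly as a number in df2, as the astype(str) in the question's loop suggests.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df1 = pd.DataFrame({"KANR": ["100", "200", "300"]})
df2 = pd.DataFrame({
    "KANR": [100, 100, 200, 200],
    "STATUS": ["F5", "F5", "A1", "F5"],
    "Date": ["2017-02-20", "2017-02-21", "2017-02-20", "2017-02-22"],
    "Time": ["08:00", "09:30", "10:00", "11:15"],
})


def add_f5(df1, df2):
    """Return a new frame: df1 plus an F5 column with the first F5 timestamp per KANR."""
    f5 = df2.loc[df2["STATUS"].astype(str) == "F5", ["KANR", "Date", "Time"]].copy()
    # Assumption: cast the key to str so it matches df1's KANR dtype
    f5["KANR"] = f5["KANR"].astype(str)
    f5["F5"] = f5["Date"].astype(str) + " " + f5["Time"].astype(str)
    # Keep only the first F5 row per KANR so the merge cannot duplicate df1 rows
    f5 = f5.drop_duplicates(subset="KANR", keep="first")[["KANR", "F5"]]
    # Left merge keeps every row of df1; unmatched KANR values get NaN in F5
    return df1.merge(f5, how="left", on="KANR")


result = add_f5(df1, df2)
print(result)
```

If df1 already has an F5 column, drop it before merging; otherwise pandas will keep both and suffix them as F5_x and F5_y.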

edited Feb 21, 2017 at 21:49

answered Feb 21, 2017 at 21:34

akuiper
