Update particular values in a pandas dataframe from another dataframe

Question

I have a dataframe (df1) consisting of a chat transcript:

id     time        author          text
a1    06:15:19     system        aaaaa
a1    13:57:50     Agent(Human)  ssfsd
a1    14:00:05     customer      ddg
a1    14:06:08     Agent(Human)  sdfg
a1    14:08:54     customer      sdfg
a1    15:58:48     Agent(Human)  jfghdfg
a1    16:18:41     customer      urtr
a1    16:51:38     Agent(Human)  erweg

I also have another dataframe, df2, listing the agents and the time at which each one initiated the chat:

id    agent_id    agent_time
a1     D01        13:57:50
a1     D02        15:58:48

Now I want to replace the values in the 'author' column with the matching 'agent_id' at each of those times, and also fill the following "Agent(Human)" rows with the respective agent id until the next agent takes over.

Final output desired:

id     time        author          text
a1    06:15:19     system        aaaaa
a1    13:57:50     D01           ssfsd
a1    14:00:05     customer      ddg
a1    14:06:08     D01           sdfg
a1    14:08:54     customer      sdfg
a1    15:58:48     D02           jfghdfg
a1    16:18:41     customer      urtr
a1    16:51:38     D02           erweg

I tried to do this with .map():

df1['author'] = df1['time'].map(df2.set_index('agent_time')['agent_id'])

But this gives the wrong output:

id     time        author          text
a1    06:15:19     NaN           aaaaa
a1    13:57:50     D01           ssfsd
a1    14:00:05     NaN           ddg
a1    14:06:08     NaN           sdfg
a1    14:08:54     NaN           sdfg
a1    15:58:48     D02           jfghdfg
a1    16:18:41     NaN           urtr
a1    16:51:38     NaN           erweg

I also tried the .loc method, but that didn't work either.

Can anyone guide me on how to achieve the desired output? Any leads will be helpful.

What does print(df1['time'].dtype, df2['agent_time'].dtype) show?
– jezrael, Commented Feb 5, 2021 at 7:32
Except for the rows that match, all others become NaN, including system and customer.
– Shubham R, Commented Feb 5, 2021 at 7:40

jezrael · Accepted Answer · 2021-02-05 07:43:19Z


I think your solution needs two additions: GroupBy.ffill to forward-fill the missing values per id, and Series.where to restore the original author values for rows that are not Agent(Human):

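# True for the rows whose author should become an agent id
m = df1['author'].eq('Agent(Human)')

# map gives agent_id at the exact start times and NaN elsewhere,
# ffill carries each agent_id forward within the same id,
# where keeps the filled value only on Agent(Human) rows
df1['author'] = (df1['time'].map(df2.set_index('agent_time')['agent_id'])
                            .groupby(df1['id'])
                            .ffill()
                            .where(m, df1['author']))

print(df1)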
   id      time    author     text
0  a1  06:15:19    system    aaaaa
1  a1  13:57:50       D01    ssfsd
2  a1  14:00:05  customer      ddg
3  a1  14:06:08       D01     sdfg
4  a1  14:08:54  customer     sdfg
5  a1  15:58:48       D02  jfghdfg
6  a1  16:18:41  customer     urtr
7  a1  16:51:38       D02    erweg
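
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the two frames from the question and splits the chain into named steps so the intermediate results can be inspected. The names is_agent, mapped and filled are only illustrative, and the sketch assumes both time columns are plain strings and that the agent_time values in df2 are unique (map on a lookup with duplicate index values would fail).

```python
import pandas as pd

# Sample data copied from the question; times are kept as strings (an assumption)
df1 = pd.DataFrame({
    "id": ["a1"] * 8,
    "time": ["06:15:19", "13:57:50", "14:00:05", "14:06:08",
             "14:08:54", "15:58:48", "16:18:41", "16:51:38"],
    "author": ["system", "Agent(Human)", "customer", "Agent(Human)",
               "customer", "Agent(Human)", "customer", "Agent(Human)"],
    "text": ["aaaaa", "ssfsd", "ddg", "sdfg", "sdfg", "jfghdfg", "urtr", "erweg"],
})
df2 = pd.DataFrame({
    "id": ["a1", "a1"],
    "agent_id": ["D01", "D02"],
    "agent_time": ["13:57:50", "15:58:48"],
})

# Step 1: remember which rows were written by a human agent
is_agent = df1["author"].eq("Agent(Human)")

# Step 2: exact-time lookup; every time not present in df2 becomes NaN
mapped = df1["time"].map(df2.set_index("agent_time")["agent_id"])

# Step 3: carry each agent_id forward within the same chat id
filled = mapped.groupby(df1["id"]).ffill()

# Step 4: keep the filled value on agent rows, the original author elsewhere
df1["author"] = filled.where(is_agent, df1["author"])

print(df1)
```

Step 2 on its own reproduces the NaN-filled column from the question, which is why the forward fill and the final where are both needed.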

edited Feb 5, 2021 at 7:43

answered Feb 5, 2021 at 7:31

jezrael
