I have a dataframe, df1, containing a chat transcript:

id     time        author          text
a1    06:15:19     system        aaaaa
a1    13:57:50     Agent(Human)  ssfsd
a1    14:00:05     customer      ddg
a1    14:06:08     Agent(Human)  sdfg
a1    14:08:54     customer      sdfg
a1    15:58:48     Agent(Human)  jfghdfg
a1    16:18:41     customer      urtr
a1    16:51:38     Agent(Human)  erweg

I also have a second dataframe, df2, recording the time at which each agent joined the chat:

id    agent_id    agent_time
a1     D01        13:57:50
a1     D02        15:58:48
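For anyone who wants to reproduce the problem, the two frames above can be rebuilt roughly as follows. This is only a sketch: it assumes both time columns are plain strings, and it uses the names df1 and df2 that appear in the code further down.

```python
import pandas as pd

# Chat transcript: one row per message, times kept as strings
df1 = pd.DataFrame({
    "id": ["a1"] * 8,
    "time": ["06:15:19", "13:57:50", "14:00:05", "14:06:08",
             "14:08:54", "15:58:48", "16:18:41", "16:51:38"],
    "author": ["system", "Agent(Human)", "customer", "Agent(Human)",
               "customer", "Agent(Human)", "customer", "Agent(Human)"],
    "text": ["aaaaa", "ssfsd", "ddg", "sdfg",
             "sdfg", "jfghdfg", "urtr", "erweg"],
})

# Agents: the time at which each agent joined the chat
df2 = pd.DataFrame({
    "id": ["a1", "a1"],
    "agent_id": ["D01", "D02"],
    "agent_time": ["13:57:50", "15:58:48"],
})
```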

Now I want to replace the values in the 'author' column with the matching 'agent_id' wherever 'time' equals 'agent_time', and also fill the following "Agent(Human)" rows with that same agent ID until the next agent takes over.

Final output desired:

id     time        author          text
a1    06:15:19     system        aaaaa
a1    13:57:50     D01           ssfsd
a1    14:00:05     customer      ddg
a1    14:06:08     D01           sdfg
a1    14:08:54     customer      sdfg
a1    15:58:48     D02           jfghdfg
a1    16:18:41     customer      urtr
a1    16:51:38     D02           erweg

I tried to do this using .map():

df1['author'] = df1['time'].map(df2.set_index('agent_time')['agent_id'])

But I'm getting the wrong output:

id     time        author          text
a1    06:15:19     NaN           aaaaa
a1    13:57:50     D01           ssfsd
a1    14:00:05     NaN           ddg
a1    14:06:08     NaN           sdfg
a1    14:08:54     NaN           sdfg
a1    15:58:48     D02           jfghdfg
a1    16:18:41     NaN           urtr
a1    16:51:38     NaN           erweg
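The NaN values appear because Series.map does an exact key lookup: any time that is not present in the lookup index maps to NaN, and the result replaces the whole column, including the system and customer rows. A small sketch of that behaviour, using made-up variable names (times, lookup) purely for illustration:

```python
import pandas as pd

# Three message times; only the first and last are agent start times
times = pd.Series(["13:57:50", "14:06:08", "15:58:48"])

# Lookup keyed by agent start time, like df2.set_index('agent_time')['agent_id']
lookup = pd.Series(["D01", "D02"], index=["13:57:50", "15:58:48"])

# Exact-match lookup: 14:06:08 is not a key, so it becomes NaN
mapped = times.map(lookup)

# Forward-filling carries the last seen agent ID down to later rows
filled = mapped.ffill()
```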

I tried using .loc too, but that didn't work either.

Can anyone guide me on how to achieve the desired output? Any leads would be helpful.

  • What is print (df1['time'].dtype, df2['agent_time'].dtype) ? Commented Feb 5, 2021 at 7:32
  • @jezrael both are object datatypes, strings Commented Feb 5, 2021 at 7:37
  • except for those matching, all others are getting NaN, including, system and customer Commented Feb 5, 2021 at 7:40
  • Answer was edited. Commented Feb 5, 2021 at 7:46
  • @jezrael You're the best :) Commented Feb 5, 2021 at 7:52

1 Answer

Your solution is almost there. I think you need to add GroupBy.ffill to forward-fill the missing values per id, and Series.where to replace the rows that are not Agent(Human) with their original author values:

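# True only for rows written by a human agent
m = df1['author'].eq('Agent(Human)')

# Map start times to agent IDs, forward-fill per chat id,
# then keep the filled value only on agent rows
df1['author'] = (df1['time'].map(df2.set_index('agent_time')['agent_id'])
                            .groupby(df1['id'])
                            .ffill()
                            .where(m, df1['author']))

print(df1)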
   id      time    author     text
0  a1  06:15:19    system    aaaaa
1  a1  13:57:50       D01    ssfsd
2  a1  14:00:05  customer      ddg
3  a1  14:06:08       D01     sdfg
4  a1  14:08:54  customer     sdfg
5  a1  15:58:48       D02  jfghdfg
6  a1  16:18:41  customer     urtr
7  a1  16:51:38       D02    erweg
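One caveat with the code above: the lookup is keyed on time only, so it relies on every agent_time being unique across all chat ids. If the real data has several ids, a variant that matches on both id and time may be safer. The following is only a sketch of that idea, not part of the original answer. The function name assign_agents is made up, and it assumes rows are already in chronological order within each id and that each (id, agent_time) pair occurs once in df2.

```python
import pandas as pd

def assign_agents(chat, agents):
    """Return a copy of chat with Agent(Human) authors replaced by agent IDs."""
    # Left-merge on (id, time) so equal times in different chats do not collide
    merged = chat.merge(
        agents.rename(columns={"agent_time": "time"}),
        on=["id", "time"],
        how="left",
    )

    # Carry each agent ID forward within its own chat
    filled = merged.groupby("id")["agent_id"].ffill()

    # Overwrite only agent rows that have a known agent; leave the rest as-is
    is_agent = merged["author"].eq("Agent(Human)")
    new_author = merged["author"].mask(is_agent & filled.notna(), filled)

    out = chat.copy()
    out["author"] = new_author.to_numpy()  # positional assignment, index-safe
    return out
```

Calling assign_agents(df1, df2) on the sample data should give the same author column as the output shown above. Agent rows that come before any recorded agent_time keep the Agent(Human) label instead of becoming NaN.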