Creating a list column in a dataframe based on values in another dataframe

Question

I have two DataFrames:

df1:

       node        ids
0   ab          [978]
1   bc          [978, 121]

df2:

       name        id
0   alpha          978
1   bravo          121
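
For anyone who wants to reproduce the example, the two frames shown above could be built roughly like this. This is a minimal sketch that assumes the ids are stored as Python lists of integers in df1 and as an integer column in df2; the question does not state the dtypes explicitly.

```python
import pandas as pd

# Assumption: each cell of 'ids' holds a Python list of ints
df1 = pd.DataFrame({
    "node": ["ab", "bc"],
    "ids": [[978], [978, 121]],
})

# Assumption: 'id' is an integer column
df2 = pd.DataFrame({
    "name": ["alpha", "bravo"],
    "id": [978, 121],
})

print(df1)
print(df2)
```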

I would like to add a new column called names to df1 that holds, for each row, the list of names corresponding to the values in the ids column, like this:

   node            ids             names
0   ab            [978]            [alpha]
1   bc            [978, 121]       [alpha,bravo]

Would appreciate help.

What if, e.g., the first row ab [978] is changed to ab [10] and 10 is not in df2['id']?
– jezrael, Commented Feb 20, 2020 at 12:25

jezrael · Accepted Answer · 2020-02-20 12:25:44Z

4

Use this if the id values in both DataFrames have the same type (both integers or both strings):

d = df2.set_index('id')['name'].to_dict()
df1['names'] = [[d.get(y) for y in x] for x in df1['ids']]
print (df1)
  node         ids           names
0   ab       [978]         [alpha]
1   bc  [978, 121]  [alpha, bravo]
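
The same-type requirement matters because dictionary lookups compare keys exactly, so the string '978' never matches the integer 978. The sketch below illustrates this under the assumption (not from the question) that df2['id'] was read in as strings; casting the column before building the dictionary is one possible fix.

```python
import pandas as pd

df1 = pd.DataFrame({"node": ["ab", "bc"], "ids": [[978], [978, 121]]})
# Hypothetical situation: ids in df2 arrived as strings
df2 = pd.DataFrame({"name": ["alpha", "bravo"], "id": ["978", "121"]})

# With mismatched types every lookup misses and d.get returns None
d_str = df2.set_index("id")["name"].to_dict()
missed = [[d_str.get(y) for y in x] for x in df1["ids"]]

# Cast the key column to int so it matches the ints inside the lists
d = df2.assign(id=df2["id"].astype(int)).set_index("id")["name"].to_dict()
df1["names"] = [[d.get(y) for y in x] for x in df1["ids"]]
print(df1)
```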

If a value in a list might not match any value in df2['id'], it can be replaced with a placeholder such as 'no match':

d = df2.set_index('id')['name'].to_dict()
df1['names'] = [[d.get(y, 'no match') for y in x] for x in df1['ids']]
print (df1)
  node         ids              names
0   ab   [978, 10]  [alpha, no match]
1   bc  [978, 121]     [alpha, bravo]

Or the unmatched values can be omitted:

d = df2.set_index('id')['name'].to_dict()
df1['names'] = [[d[y] for y in x if y in d.keys()] for x in df1['ids']]
print (df1)
  node         ids           names
0   ab   [978, 10]         [alpha]
1   bc  [978, 121]  [alpha, bravo]

edited Feb 20, 2020 at 12:25

answered Feb 20, 2020 at 12:19

jezrael


3 Comments

Ma0 Over a year ago

if you are using the get method you do not really need the if y in d.keys(), right?

jezrael Over a year ago

@Ev.Kounis - yes, it depends on what should happen when there is no match

jezrael Over a year ago

@Ev.Kounis - I added 2 possible ideas, thank you for pointing it out.
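
To make the point from this comment exchange concrete, here is a small sketch on a plain dictionary, using the ids from the answer's examples. It shows that get alone keeps a slot for every id (filled with None or a default), whereas the membership test drops unmatched ids entirely.

```python
d = {978: "alpha", 121: "bravo"}
ids = [978, 10]  # 10 has no entry in d

# get without a default keeps the position and yields None
with_none = [d.get(y) for y in ids]

# get with a default keeps the position and yields the placeholder
with_default = [d.get(y, "no match") for y in ids]

# the membership test removes unmatched ids from the result
dropped = [d[y] for y in ids if y in d]

print(with_none, with_default, dropped)
```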

Celius Stingher · Accepted Answer · 2020-02-20 12:35:30Z

0

How about you try with this alternative solution?

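# Explode the id lists, look up each id in df2, then regroup by original row.
# Columns after groupby are selected with a list ([['name','ids']]);
# the tuple form ['name','ids'] is rejected by recent pandas versions.
df1 = (df1.reset_index()).merge(
        ((df1['ids'].explode().reset_index()).merge(
                df2,how='left',left_on='ids',right_on='id').groupby('index')[['name','ids']].agg(
                        lambda x: list(x)).reset_index()),
                how='left',on='index').drop(
                        columns=['index','ids_y']).rename(
                                columns={'ids_x':'ids'})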
print(df1)

Output:

  node         ids            name
0   ab       [978]         [alpha]
1   bc  [978, 121]  [alpha, bravo]
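
The explode idea can also be written without the merges. The following is only a sketch of that variant, not the code from this answer: it assumes pandas 0.25 or later (for Series.explode), reuses the lookup dictionary from the other answer, and assumes every row has a non-empty ids list.

```python
import pandas as pd

df1 = pd.DataFrame({"node": ["ab", "bc"], "ids": [[978], [978, 121]]})
df2 = pd.DataFrame({"name": ["alpha", "bravo"], "id": [978, 121]})

d = df2.set_index("id")["name"].to_dict()

# One row per list element; the original row label is repeated in the index
exploded = df1["ids"].explode()

# Look up each id; ids missing from d become None
looked_up = exploded.map(d.get)

# Collect the names back into one list per original row label
df1["names"] = looked_up.groupby(level=0).agg(list)
print(df1)
```

For small frames the list comprehension is usually simpler; this form mainly helps when further column-wise processing of the exploded ids is needed.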

edited Feb 20, 2020 at 12:35

answered Feb 20, 2020 at 12:29

Celius Stingher

