Pandas mapping columns from multiple dataframes

Question

I have a dataframe (FinalDF) which looks like this:

id | Movie            | Cast
-----------------------------------------
 0 | The Dark Knight  | Christopher Nolan
 1 | The Dark Knight  | Christian Bale
 2 | Pulp Fiction     | Quentin Tarantino
 3 | Pulp Fiction     | John Travolta
 4 | Schindler’s List | Steven Spielberg
 5 | Schindler’s List | Liam Neeson

Movie and cast names are mapped to IDs like this in movie_cast_DF:

id | name | uuid
-------------------------
 1 | The Dark Knight        | m1
 2 | Pulp Fiction           | m2
 3 | Schindler’s List       | m3
 4 | Christopher Nolan      | d1
 5 | Christian Bale         | a1
 6 | Quentin Tarantino      | d2
 7 | John Travolta          | a2
 8 | Steven Spielberg       | d3
 9 | Liam Neeson            | a3

I need to add the matching IDs as new columns (mid and cid) in FinalDF, like this:

id | Movie            | Cast              | mid | cid
------------------------------------------------------
 0 | The Dark Knight  | Christopher Nolan | m1  | d1
 1 | The Dark Knight  | Christian Bale    | m1  | a1
 2 | Pulp Fiction     | Quentin Tarantino | m2  | d2
 3 | Pulp Fiction     | John Travolta     | m2  | a2
 4 | Schindler’s List | Steven Spielberg  | m3  | d3
 5 | Schindler’s List | Liam Neeson       | m3  | a3

I tried the following method:

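def getID(x):
    try:
        # Case-insensitive substring search over every name (one full scan per call).
        mask = movie_cast_DF['name'].str.contains(x, case=False)
        return movie_cast_DF.loc[mask, 'uuid'].values[0]
    except Exception:
        # No match found, or x is not a usable string pattern.
        return None

FinalDF['mid'] = FinalDF['Movie'].apply(getID)
FinalDF['cid'] = FinalDF['Cast'].apply(getID)
FinalDF.head()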

Is there a more efficient and faster way to do the mapping?

Can you not merge or join all the dataframes and then drop what you don't need?
– Tony, Commented Jan 10, 2018 at 21:14
If my answer did not solve your problem, please let me know.
– cs95, Commented Jan 10, 2018 at 22:20
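
The merge approach suggested in the first comment could look roughly like the sketch below. It is not code from the thread: the names final_df, lookup, movie_ids and cast_ids are hypothetical, and the data is a shortened copy of the tables in the question. It assumes every name appears only once in the lookup table; otherwise a left merge would duplicate rows.

```python
import pandas as pd

# Shortened, hypothetical copies of FinalDF and movie_cast_DF from the question.
final_df = pd.DataFrame({
    "Movie": ["The Dark Knight", "The Dark Knight", "Pulp Fiction"],
    "Cast": ["Christopher Nolan", "Christian Bale", "Quentin Tarantino"],
})
lookup = pd.DataFrame({
    "name": ["The Dark Knight", "Pulp Fiction", "Christopher Nolan",
             "Christian Bale", "Quentin Tarantino"],
    "uuid": ["m1", "m2", "d1", "a1", "d2"],
})

# Rename the lookup columns so each merge joins on the right key
# and produces the desired output column directly.
movie_ids = lookup.rename(columns={"name": "Movie", "uuid": "mid"})
cast_ids = lookup.rename(columns={"name": "Cast", "uuid": "cid"})

# Two left merges: one for the movie IDs, one for the cast IDs.
merged = (
    final_df
    .merge(movie_ids, on="Movie", how="left")
    .merge(cast_ids, on="Cast", how="left")
)
print(merged)
```

Because the renamed lookup tables only carry the key and the new ID column, there is nothing to drop afterwards.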

cs95 · Accepted Answer · 2018-01-10 21:15:28Z

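First, set name as the index of df2 (movie_cast_DF in the question) and select the uuid column. This gives a Series that maps each name to its uuid.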

dfmap = df2.set_index("name").uuid
dfmap

name
The Dark Knight      m1
Pulp Fiction         m2
Schindler’s List     m3
Christopher Nolan    d1
Christian Bale       a1
Quentin Tarantino    d2
John Travolta        a2
Steven Spielberg     d3
Liam Neeson          a3
Name: uuid, dtype: object
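
One caveat, added here as a hedged sketch rather than as part of the original answer: to my knowledge, map expects the lookup Series to have a unique index, so repeated names in df2 should be removed before building it. The duplicated "Pulp Fiction" row below is made up for illustration.

```python
import pandas as pd

# Hypothetical lookup table in which one name appears twice.
df2 = pd.DataFrame({
    "name": ["The Dark Knight", "Pulp Fiction", "Pulp Fiction"],
    "uuid": ["m1", "m2", "m2"],
})

# Keep only the first row for each name, then build the name -> uuid Series.
deduped = df2.drop_duplicates(subset="name", keep="first")
dfmap = deduped.set_index("name")["uuid"]

# The index is now unique, so each name resolves to exactly one uuid.
print(dfmap.index.is_unique)
```

If the same name could legitimately carry different uuids, dropping duplicates would hide that conflict, so it may be worth inspecting df2[df2["name"].duplicated(keep=False)] first.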

We'll use this Series to map keys to values in df (FinalDF in the question). Next, call map twice, once for each column (replace would also work):

df['mid'] = df.Movie.map(dfmap)
df['cid'] = df.Cast.map(dfmap)

df

               Movie               Cast mid cid
id                                             
0    The Dark Knight  Christopher Nolan  m1  d1
1    The Dark Knight     Christian Bale  m1  a1
2       Pulp Fiction  Quentin Tarantino  m2  d2
3       Pulp Fiction      John Travolta  m2  a2
4   Schindler’s List   Steven Spielberg  m3  d3
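
For reference, here is a self-contained sketch of the same steps on a small made-up sample. The "Inception" row is hypothetical and is included only to show that map leaves NaN where a name has no entry in the lookup, which makes unmatched rows easy to find.

```python
import pandas as pd

# Small sample data; "Inception" is deliberately absent from the lookup.
df = pd.DataFrame({
    "Movie": ["The Dark Knight", "Pulp Fiction", "Inception"],
    "Cast": ["Christian Bale", "John Travolta", "Christian Bale"],
})
df2 = pd.DataFrame({
    "name": ["The Dark Knight", "Pulp Fiction", "Christian Bale", "John Travolta"],
    "uuid": ["m1", "m2", "a1", "a2"],
})

# Build the name -> uuid Series and map both columns through it.
dfmap = df2.set_index("name")["uuid"]
df["mid"] = df["Movie"].map(dfmap)
df["cid"] = df["Cast"].map(dfmap)

# Rows whose movie name had no match end up with NaN in mid.
missing = df[df["mid"].isna()]
print(df)
print(missing)
```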

1 Comment

Jaswanth Kumar Over a year ago

Thanks, this worked perfectly, although I need to watch out for a few case mismatches. I converted the index to lower case and tried it again. Very efficient!
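
The lower-casing described in this comment might be done along these lines. This is a guess at the approach, not the commenter's actual code, and the mixed-case sample values are invented.

```python
import pandas as pd

# Invented sample with inconsistent casing in the columns to be mapped.
df = pd.DataFrame({
    "Movie": ["the dark knight", "PULP FICTION"],
    "Cast": ["christian bale", "John TRAVOLTA"],
})
df2 = pd.DataFrame({
    "name": ["The Dark Knight", "Pulp Fiction", "Christian Bale", "John Travolta"],
    "uuid": ["m1", "m2", "a1", "a2"],
})

# Lower-case the lookup index so matching no longer depends on case.
dfmap = df2.set_index("name")["uuid"]
dfmap.index = dfmap.index.str.lower()

# Lower-case the keys only for the lookup; the original columns stay untouched.
df["mid"] = df["Movie"].str.lower().map(dfmap)
df["cid"] = df["Cast"].str.lower().map(dfmap)
print(df)
```

Note that lower-casing can turn two distinct names into the same index label, so it is worth checking dfmap.index.is_unique afterwards.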
