I have a dataframe (FinalDF) which looks like this:

id | Movie            | Cast
----------------------------------------
 0 | The Dark Knight  | Christopher Nolan
 1 | The Dark Knight  | Christian Bale
 2 | Pulp Fiction     | Quentin Tarantino
 3 | Pulp Fiction     | John Travolta
 4 | Schindler’s List | Steven Spielberg
 5 | Schindler’s List | Liam Neeson

Movie and cast names are mapped to uuids like this in movie_cast_DF:

id | name | uuid
-------------------------
 1 | The Dark Knight        | m1
 2 | Pulp Fiction           | m2
 3 | Schindler’s List       | m3
 4 | Christopher Nolan      | d1
 5 | Christian Bale         | a1
 6 | Quentin Tarantino      | d2
 7 | John Travolta          | a2
 8 | Steven Spielberg       | d3
 9 | Liam Neeson            | a3

I need to add the matching uuids to FinalDF as two new columns, mid (for Movie) and cid (for Cast), like this:

id | Movie            | Cast              | mid | cid
------------------------------------------------------
 0 | The Dark Knight  | Christopher Nolan | m1  | d1
 1 | The Dark Knight  | Christian Bale    | m1  | a1
 2 | Pulp Fiction     | Quentin Tarantino | m2  | d2
 3 | Pulp Fiction     | John Travolta     | m2  | a2
 4 | Schindler’s List | Steven Spielberg  | m3  | d3
 5 | Schindler’s List | Liam Neeson       | m3  | a3

I tried the following method:

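def getID(x):
    # Return the uuid of the first row whose name contains x (case-insensitive).
    try:
        return movie_cast_DF[movie_cast_DF['name'].str.contains(x.lower(), case=False)]['uuid'].values[0]
    except Exception:
        # No matching row, or x is not a string.
        return None

# Each call scans the whole lookup table, once per row and per column.
FinalDF['mid'] = FinalDF['Movie'].apply(getID)
FinalDF['cid'] = FinalDF['Cast'].apply(getID)
FinalDF.head()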

Is there a faster, more efficient way to do the mapping?

  • Looks like all you need to do is merge on id. Commented Jan 10, 2018 at 21:13
  • Can you not merge or join all the dataframes and then drop what you don't need? Commented Jan 10, 2018 at 21:14
  • If my answer did not solve your problem, please let me know. Commented Jan 10, 2018 at 22:20
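The merge suggested in the comments can be sketched roughly as follows. Note that the id columns of the two frames are unrelated row numbers, so this sketch joins on the names instead. The sample frames are rebuilt from the question; the `lookup` and `merged` names are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
FinalDF = pd.DataFrame({
    "Movie": ["The Dark Knight", "The Dark Knight", "Pulp Fiction",
              "Pulp Fiction", "Schindler’s List", "Schindler’s List"],
    "Cast": ["Christopher Nolan", "Christian Bale", "Quentin Tarantino",
             "John Travolta", "Steven Spielberg", "Liam Neeson"],
})
movie_cast_DF = pd.DataFrame({
    "name": ["The Dark Knight", "Pulp Fiction", "Schindler’s List",
             "Christopher Nolan", "Christian Bale", "Quentin Tarantino",
             "John Travolta", "Steven Spielberg", "Liam Neeson"],
    "uuid": ["m1", "m2", "m3", "d1", "a1", "d2", "a2", "d3", "a3"],
})

# Only the name -> uuid pairs are needed for the join.
lookup = movie_cast_DF[["name", "uuid"]]

# Two left merges: one keyed on Movie, one keyed on Cast.
# how="left" keeps every row of FinalDF, even when no name matches.
merged = (
    FinalDF
    .merge(lookup.rename(columns={"name": "Movie", "uuid": "mid"}),
           on="Movie", how="left")
    .merge(lookup.rename(columns={"name": "Cast", "uuid": "cid"}),
           on="Cast", how="left")
)
print(merged)
```

This assumes each name appears only once in movie_cast_DF; a repeated name would duplicate rows in the merged result.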

1 Answer

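First, set name as the index of df2 (movie_cast_DF in the question) and select the uuid column. This gives a Series that maps each name to its uuid.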

dfmap = df2.set_index("name").uuid
dfmap

name
The Dark Knight      m1
Pulp Fiction         m2
Schindler’s List     m3
Christopher Nolan    d1
Christian Bale       a1
Quentin Tarantino    d2
John Travolta        a2
Steven Spielberg     d3
Liam Neeson          a3
Name: uuid, dtype: object

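We'll use this Series to map keys to values in df (FinalDF in the question). Next, call map once per column. (replace would also work, but it leaves unmatched values unchanged, whereas map returns NaN for them.)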

df['mid'] = df.Movie.map(dfmap)
df['cid'] = df.Cast.map(dfmap)

df

               Movie               Cast mid cid
id                                             
0    The Dark Knight  Christopher Nolan  m1  d1
1    The Dark Knight     Christian Bale  m1  a1
2       Pulp Fiction  Quentin Tarantino  m2  d2
3       Pulp Fiction      John Travolta  m2  a2
4   Schindler’s List   Steven Spielberg  m3  d3
5   Schindler’s List        Liam Neeson  m3  a3
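Because map returns NaN for any name missing from the lookup, it may be worth checking for unmatched rows afterwards. Below is a minimal sketch under stated assumptions: the "Inception" / "Leonardo DiCaprio" row is invented here purely to show a miss, and `lookup` and `unmatched` are illustrative names, not part of the original answer.

```python
import pandas as pd

movie_cast_DF = pd.DataFrame({
    "name": ["The Dark Knight", "Pulp Fiction",
             "Christopher Nolan", "John Travolta"],
    "uuid": ["m1", "m2", "d1", "a2"],
})
FinalDF = pd.DataFrame({
    # The last row is a made-up example with no entry in movie_cast_DF.
    "Movie": ["The Dark Knight", "Pulp Fiction", "Inception"],
    "Cast": ["Christopher Nolan", "John Travolta", "Leonardo DiCaprio"],
})

# Assumption: names should be unique in the lookup. drop_duplicates keeps
# the first uuid for any repeated name so the index is unambiguous.
lookup = movie_cast_DF.drop_duplicates("name").set_index("name")["uuid"]

FinalDF["mid"] = FinalDF["Movie"].map(lookup)
FinalDF["cid"] = FinalDF["Cast"].map(lookup)

# Rows where either lookup missed end up with NaN in mid or cid.
unmatched = FinalDF[FinalDF["mid"].isna() | FinalDF["cid"].isna()]
print(unmatched)
```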

1 Comment

Thanks, this worked perfectly, although I need to watch out for a few case mismatches. I converted the index to lower case and tried it again. Very efficient!
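The lower-casing described in this comment might look something like the sketch below. The mismatched sample values are invented for illustration, and `lookup` is an assumed name; both the lookup index and the keys being mapped are lower-cased so they compare equal.

```python
import pandas as pd

movie_cast_DF = pd.DataFrame({
    "name": ["The Dark Knight", "Christian Bale"],
    "uuid": ["m1", "a1"],
})
# Made-up rows whose casing differs from movie_cast_DF.
FinalDF = pd.DataFrame({
    "Movie": ["the dark knight"],
    "Cast": ["CHRISTIAN BALE"],
})

# Index the uuids by the lower-cased name.
lookup = movie_cast_DF.set_index(movie_cast_DF["name"].str.lower())["uuid"]

# Lower-case the keys as well before looking them up.
FinalDF["mid"] = FinalDF["Movie"].str.lower().map(lookup)
FinalDF["cid"] = FinalDF["Cast"].str.lower().map(lookup)
print(FinalDF)
```

The original Movie and Cast columns keep their casing; only the temporary lookup keys are lower-cased.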
