You can keep a global dictionary that maps each value to a numerical id. If a value is already in the dictionary, reuse its id; otherwise increment the largest id so far and store that as the new value's id:
d = {}    # Dictionary to assign numerical ids
maxV = 0  # Max numerical id in the dictionary

def assignId(x):
    global d, maxV
    lst = []
    for item in x:
        if item in d:
            # Get numerical id from the dictionary.
            lst.append(d[item])
        else:
            # Increment the largest numerical id in the dictionary
            # and add it to the dictionary.
            maxV += 1
            d[item] = maxV
            lst.append(maxV)
    return lst
If I apply this to the df using:
df['genre_ids'] = df['genre'].apply(assignId)
I get:
                               genre     genre_ids
0    [Comedy, Supernatural, Romance]     [1, 2, 3]
1          [Comedy, Parody, Romance]     [1, 4, 3]
2                           [Comedy]           [1]
3  [Comedy, Drama, Romance, Fantasy]  [1, 5, 3, 6]
4           [Comedy, Drama, Romance]     [1, 5, 3]
with this dictionary d:
{'Comedy': 1,
 'Supernatural': 2,
 'Romance': 3,
 'Parody': 4,
 'Drama': 5,
 'Fantasy': 6}
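If you would rather avoid global state, the same idea can be sketched with a closure. This is only a variant under stated assumptions, not a replacement for the function above: the names make_id_assigner, assign_ids and ids are hypothetical, and it relies on ids starting at 1 and never being removed, so that len(ids) + 1 is always the next unused id. It is shown on plain lists so it runs without pandas.

```python
def make_id_assigner():
    """Return (assign_ids, ids); ids maps each value to its numerical id."""
    ids = {}  # hypothetical name; plays the role of the global d

    def assign_ids(items):
        # setdefault returns the existing id if the item is known,
        # otherwise it stores and returns the next unused id.
        # Since ids start at 1 and are never deleted, len(ids) + 1
        # is the largest id so far plus one.
        return [ids.setdefault(item, len(ids) + 1) for item in items]

    return assign_ids, ids


assign_ids, ids = make_id_assigner()
rows = [
    ["Comedy", "Supernatural", "Romance"],
    ["Comedy", "Parody", "Romance"],
    ["Comedy"],
]
result = [assign_ids(row) for row in rows]
print(result)  # [[1, 2, 3], [1, 4, 3], [1]]
print(ids)     # {'Comedy': 1, 'Supernatural': 2, 'Romance': 3, 'Parody': 4}
```

On a DataFrame it should be usable the same way as before, i.e. df['genre_ids'] = df['genre'].apply(assign_ids).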