I have two dataframes similar to the ones below:
import pandas as pd
num1 = ["1111 2222", "3333", "4444 5555 6666", "7777 8888", "9999"]
num2 = ["A1", "A2", "A3", "A4", "A5"]
linkage = pd.DataFrame({"num1":num1, "num2":num2})
num1 = ["2222", "3333", "5555", "8888", "9999"]
num2 = ['none', 'none', 'none', 'none', 'none']
df = pd.DataFrame({"num1":num1, "num2":num2})
Linkage:
num1            num2
1111 2222       A1
3333            A2
4444 5555 6666  A3
7777 8888       A4
9999            A5
df:
num1 num2
2222 none
3333 none
5555 none
8888 none
9999 none
I want to fill "num2" in the second dataframe (df) with the "num2" value from the linkage dataframe whenever the "num1" value in df is one of the "num1" values listed in a linkage row. The code I currently have is:
df.num2 = [linkage.num2[i] for y in df.num1 for i, x in enumerate(linkage.num1) if y in x]
This yields what I want:
num1 num2
2222 A1
3333 A2
5555 A3
8888 A4
9999 A5
But the code gets noticeably slower as the dataframes grow. Timing for the example above:
CPU times: total: 516 ms
Wall time: 519 ms
Is there a better method of using linkage dataframes?
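For reference, here is a minimal sketch of one possible direction. It is not a drop-in equivalent of the comprehension above. It assumes the values in linkage.num1 are whitespace-separated tokens and that each token belongs to only one linkage row. It also matches whole tokens exactly, whereas `y in x` is a substring test. The name `lookup` is made up for illustration.

```python
import pandas as pd

linkage = pd.DataFrame({
    "num1": ["1111 2222", "3333", "4444 5555 6666", "7777 8888", "9999"],
    "num2": ["A1", "A2", "A3", "A4", "A5"],
})
df = pd.DataFrame({
    "num1": ["2222", "3333", "5555", "8888", "9999"],
    "num2": ["none"] * 5,
})

# Split each linkage.num1 string on whitespace and give every token its own row.
# Assumption: tokens are whitespace-separated; if a token appears in several
# rows, only its first occurrence is kept.
lookup = (
    linkage.assign(num1=linkage["num1"].str.split())
    .explode("num1")
    .drop_duplicates("num1")
    .set_index("num1")["num2"]
)

# Exact-match lookup per row; values with no match keep their current num2.
df["num2"] = df["num1"].map(lookup).fillna(df["num2"])
print(df)
```

This builds the lookup once and then does one indexed lookup per row of df, instead of scanning every linkage row for every df row.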