I have two datasets of company names with naming variations between the two (ex: Merck & Co. vs Merck & Co vs Merck & Co., Inc.)
I am trying to create a "map" where all the unique variants of "Merck" match to "Merck & Co."
My current approach is to create a dictionary, convert to a dataframe, and ideally perform this transformation described below:
Company 1 Company 2 Company 3
0 Co1 variant 0 Co1 variant 0 Co3 variant 0
1 Co1 variant 1 Co2 variant 1
2 Co1 variant 2
Should become:
Company 1 Company 2 Company 3
Co1 variant 0 True True False
Co1 variant 1 True False False
Co1 variant 2 True False False
Co2 variant 0 False True False
Co2 variant 1 False True False
Co3 variant 0 False True True
From here if each row only has 1 True value in the row, I would match this naming variant to the column header, otherwise I would match it to itself; allowing the final data set to look like
Variants Matched Name
0 Co1 variant 0 Co1 variant 0
1 Co1 variant 1 Company 1
2 Co1 variant 2 Company 1
3 Co2 variant 1 Company 2
4 Co3 variant 0 Company 3
Thank you in advance for any input, I'm also happy to clarify further if the question is not clear enough.
