I'm trying to match each value in a DataFrame column to one of a list of substrings.
For example, take a column ('strings') with the following values:
text1C1
text2A
text2
text4
text4B
text4A3
I want to create a new column that maps each of these values to the substring it matches from this list:
vals = ['text1', 'text2', 'text3', 'text4', 'text4B']
The code I have at the moment works, but it seems like a really inefficient way of solving the problem, since it scans the whole column once for every item in vals.
import pandas as pd

df = pd.DataFrame({'strings': ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']})
for v in vals:
    # Later items in vals overwrite earlier matches, so 'text4B' replaces 'text4'
    df.loc[df['strings'].str.contains(v), 'matched strings'] = v
This produces the following DataFrame, which is what I need:
   strings  matched strings
0  text1C1            text1
1   text2A            text2
2    text2            text2
3    text4            text4
4   text4B           text4B
5  text4A3            text4
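The loop only gets row 4 right because 'text4B' happens to come after 'text4' in vals; each pass overwrites the previous one. The table above is also consistent with an order-independent rule: for each row, pick the longest item of vals that it contains. The sketch below checks both claims. loop_match and longest_match are hypothetical helper names, and longest_match is plain Python, so it illustrates the rule rather than being a faster implementation.

```python
import pandas as pd

vals = ['text1', 'text2', 'text3', 'text4', 'text4B']
df = pd.DataFrame({'strings': ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']})

def loop_match(strings, vals):
    # Hypothetical helper reproducing the question's loop:
    # each later item of vals overwrites earlier matches
    out = pd.Series(index=strings.index, dtype=object)
    for v in vals:
        out[strings.str.contains(v, regex=False)] = v
    return out

def longest_match(s, candidates):
    # Hypothetical helper: the longest candidate contained in s, or None
    hits = [c for c in candidates if c in s]
    return max(hits, key=len) if hits else None

# Order-dependent: reversing vals changes row 4 from 'text4B' to 'text4'
print(loop_match(df['strings'], vals)[4])        # text4B
print(loop_match(df['strings'], vals[::-1])[4])  # text4

# Order-independent: same result whichever order vals is in
df['matched strings'] = [longest_match(s, vals) for s in df['strings']]
```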
Is there a more efficient way to do this, especially for larger DataFrames (10k+ rows)?
I can't think of how to handle one item of vals also being a substring of another ('text4' is a substring of 'text4B').
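One possible approach, sketched here and not benchmarked: combine all the substrings into a single regular expression with the longest ones first, then let Series.str.extract pull out the match in one pass over the column instead of one pass per item in vals. Sorting by length is what handles 'text4' versus 'text4B', because alternatives are tried left to right at each position in the string.

```python
import re
import pandas as pd

vals = ['text1', 'text2', 'text3', 'text4', 'text4B']
df = pd.DataFrame({'strings': ['text1C1', 'text2A', 'text2', 'text4', 'text4B', 'text4A3']})

# Longest alternatives first, so 'text4B' is tried before 'text4'
ordered = sorted(vals, key=len, reverse=True)
# re.escape guards against regex metacharacters inside the substrings
pattern = '(' + '|'.join(map(re.escape, ordered)) + ')'

# Single vectorised pass; rows with no match get NaN
df['matched strings'] = df['strings'].str.extract(pattern, expand=False)
print(df)
```

One caveat with this sketch: the regex returns the match at the leftmost position in each string, preferring the longest alternative only at that position. If a string could contain two different items of vals at different positions, the earlier one wins, which may or may not be the behaviour you want.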