I would like to select the rows of a pandas DataFrame whose string value contains a match for a regex. The regex is different for each row and is currently stored in a separate Series, in the formats shown below:
import pandas as pd
df = pd.DataFrame(["a", "a", "b", "c", "de", "de"], columns=["Value"])
df:
| Index | Value |
| 0 | "a" |
| 1 | "a" |
| 2 | "b" |
| 3 | "c" |
| 4 | "de" |
| 5 | "de" |
series = pd.Series(["a|b|c", "a", "d|e", "c", "c|a", "f|e"])
Series holding the regex to check for each row:
| Index | Value |
| 0 | "a|b|c" |
| 1 | "a" |
| 2 | "d|e" |
| 3 | "c" |
| 4 | "c|a" |
| 5 | "f|e" |
The expected output is a boolean mask that I can use to filter the DataFrame down to the rows whose value matches the regex at the same index:
mask = [True, True, False, True, False, True]
df[mask]:
| Index | Value |
| 0 | "a" |
| 1 | "a" |
| 3 | "c" |
| 5 | "de" |
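To pin down the semantics, here is a minimal baseline sketch that reproduces the mask above. It assumes "contains" means a match anywhere in the string (re.search) and that df and series share the same positional order. It uses a plain Python loop, so it is only a reference for the expected result, not the vectorized approach I am looking for:

```python
import re

import pandas as pd

df = pd.DataFrame(["a", "a", "b", "c", "de", "de"], columns=["Value"])
series = pd.Series(["a|b|c", "a", "d|e", "c", "c|a", "f|e"])

# Pair each value with the pattern at the same position and test it with
# re.search, which matches anywhere in the string ("contains" semantics).
mask = [
    re.search(pattern, value) is not None
    for pattern, value in zip(series, df["Value"])
]

print(mask)      # [True, True, False, True, False, True]
print(df[mask])  # rows 0, 1, 3 and 5
```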
I would like to avoid lambdas and apply as much as possible, since I am processing a large dataset and need this to run as fast as possible.
Thanks a lot!