I have a pandas DataFrame with 2 columns: one containing a regex pattern and the other the actual string. I want to keep only the rows where the string in the data column matches the pattern in the pattern column.
My data is in a CSV file and looks like this:
pattern,data
1234.*,abcd
567_.*,567_hello
I am expecting the output DataFrame to look like this:
pattern,data
567_.*,567_hello
I tried using a lambda function on each row of the DataFrame, but to no avail:
df[df.apply(lambda row: re.compile(row[0]).match(row[1]))]
df[df.apply(lambda row: re.compile(row[0].str).match(row[1].str))]
df[df.apply(lambda row: re.compile(row['pattern']).match(row['data']))]
I could achieve this by iterating over the rows, filtering, and building a new DataFrame from the matches. But iterating a DataFrame row by row is inefficient, so I am looking for a more Pythonic approach.
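For reference, the apply-based attempts above fail for two reasons: apply needs axis=1 to pass whole rows to the lambda, and the mask used for indexing must be boolean (re.match returns a Match object or None, not True/False). A minimal working sketch of that approach, using the sample data from the question:

```python
import re
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'pattern': ['1234.*', '567_.*'],
                   'data': ['abcd', '567_hello']})

# axis=1 makes apply pass each row as a Series; bool() converts the
# Match-object-or-None result of re.match into a usable boolean mask
mask = df.apply(lambda row: bool(re.match(row['pattern'], row['data'])), axis=1)
filtered = df[mask]
print(filtered)
```

Note that re.match only anchors at the start of the string; use re.fullmatch if the whole string must match the pattern.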
m = [bool(re.match(p, d)) for p, d in zip(df['pattern'], df['data'])], then do df[m] to get the matches?
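Put end to end, that list-comprehension suggestion looks like this on the question's sample data (a sketch, assuming the two-column frame above):

```python
import re
import pandas as pd

df = pd.DataFrame({'pattern': ['1234.*', '567_.*'],
                   'data': ['abcd', '567_hello']})

# Zip the two columns and build a boolean mask: re.match returns None
# when the pattern does not match, which bool() turns into False
m = [bool(re.match(p, d)) for p, d in zip(df['pattern'], df['data'])]
filtered = df[m]
print(filtered)
```

This avoids apply entirely; the list comprehension over zipped columns is typically faster than row-wise apply because it skips constructing a Series for every row.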