
I have a pandas DataFrame with two columns: one contains a regex pattern and the other contains an actual string. I want to keep only the rows where the string in the data column matches the regex in the pattern column.

My data is in a CSV file and looks like this:

pattern,data
1234.*,abcd
567_.*,567_hello

I am expecting the output data frame to be as shown below.

pattern,data
567_.*,567_hello

I tried applying a lambda function to each row of the DataFrame, but none of these attempts worked:

df[df.apply(lambda row: re.compile(row[0]).match(row[1]))]
df[df.apply(lambda row: re.compile(row[0].str).match(row[1].str))]
df[df.apply(lambda row: re.compile(row['pattern']).match(row['data']))]

I could achieve this by iterating over the rows and building an entirely new DataFrame from the ones that match, but iterating over a DataFrame is not efficient. I am looking for a more Pythonic approach.

  • Is producing a boolean list and then using that acceptable enough? E.g. m = [bool(re.match(p, d)) for p, d in zip(df['pattern'], df['data'])], then do df[m] to get the matches? Commented Nov 12, 2019 at 11:53
  • @JonClements I don't want to extend the dataframe. The dataframe is already huge with millions of records. Commented Nov 12, 2019 at 11:54
  • How is that extending a DataFrame? Also - millions of bools isn't exactly generally prohibitive... Commented Nov 12, 2019 at 11:59
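The boolean-list idea from the first comment can be sketched as follows. This is only an illustration on the two sample rows from the question; the in-memory CSV and the variable names are assumptions. The list is a separate object used as a mask, so no column is added to the DataFrame.

```python
import io
import re

import pandas as pd

# Sample data from the question, read from an in-memory CSV (assumed setup).
csv_text = "pattern,data\n1234.*,abcd\n567_.*,567_hello\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# One boolean per row: True when the row's data matches the row's pattern.
# re.match anchors at the start of the string only.
matches = [bool(re.match(p, d)) for p, d in zip(df["pattern"], df["data"])]

# A plain list of booleans works as a row mask; df itself is not modified.
filtered = df[matches]
print(filtered)
```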

1 Answer


After a bit of modification, here is the result:

df[df.apply(lambda row: re.compile(row['pattern']).match(row['data']) is not None, axis=1)]
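For reference, here is a self-contained sketch of that line run against the sample data from the question. Reading the CSV from an in-memory string and the names mask and result are assumptions made for the example.

```python
import io
import re

import pandas as pd

# Sample data from the question, read from an in-memory CSV (assumed setup).
csv_text = "pattern,data\n1234.*,abcd\n567_.*,567_hello\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# axis=1 passes each row to the lambda. re.match returns a match object or
# None, so "is not None" turns the result into a True/False value per row.
mask = df.apply(
    lambda row: re.match(row["pattern"], row["data"]) is not None,
    axis=1,
)

# Boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

re.match is used here instead of re.compile(...).match; both go through the re module's pattern cache, so the behaviour should be the same.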


3 Comments

Thanks, it worked. May I know why my code didn't work? What's the reason for adding the is not None condition?
@BarathVutukuri I believe the is not None condition is there to produce a boolean for each row. The match call returns a match object when the pattern matches and None when it does not, so comparing against None gives True or False, which is what boolean indexing needs.
@BarathVutukuri Apart from is not None, axis=1 must be specified as well, so that apply passes each row, rather than each column, to the lambda.
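The point about axis can be checked with a small sketch, again using the question's sample data as an assumed setup. It records the name of whatever apply hands to the function: with the default axis the function receives whole columns, and with axis=1 it receives rows.

```python
import io

import pandas as pd

# Sample data from the question, read from an in-memory CSV (assumed setup).
csv_text = "pattern,data\n1234.*,abcd\n567_.*,567_hello\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Default axis=0: the function is called once per column,
# so each Series is named after a column.
per_column = df.apply(lambda s: s.name).tolist()

# axis=1: the function is called once per row,
# so each Series is named after a row index label.
per_row = df.apply(lambda s: s.name, axis=1).tolist()

print(per_column)
print(per_row)
```

With the default axis, the lambda in the question never sees a row, so looking up row['pattern'] on a column cannot work.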
