Matching regex in two different dataframe Python

Question

I'm having trouble matching regexes across two different dataframes that are linked by type and country. Here is sample data for the data df and the regex df. Note that the two dataframes have different shapes because the regex df contains only unique values.

             **Data df**                                          **Regex df**

  **Country    Type      Data**                       **Country    Type       Regex**
      MY       ABC     MY1234567890                        MY       ABC    ^MY[0-9]{10}
      IT       ABC     IT1234567890                        IT       ABC    ^IT[0-9]{10}
      PL       PQR     PL123456                            PL       PQR    ^PL
      MY       XYZ     456792abc                           MY       XYZ    ^\w{6,10}$
      IT       ABC     MY45889976
      IT       ABC     IT567888976

I have tried to merge them together and just use lambda to do the matching. Below is my code,

import re

df = df.merge(df_regex, left_on='Country', right_on="Country")
df['Data Quality'] = df.apply(lambda r: re.match(r['Regex'], r['Data']) and 1 or 0, axis=1)

But it adds another row for each different type and country, so there is a lot of duplication, which is inefficient and time consuming.

Is there a Pythonic way to match the data against the regex for its country and type when the reference is in another dataframe, without merging the two? If the data matches its own regex, it should return 1, else 0.

Could you include your desired output? A new column in Data df holding ones and zeros?
– JvdV, Commented Apr 20, 2020 at 9:10
Yes, @JvdV. I would love to know if there is another way besides concatenating.
– Aqilah, Commented Apr 20, 2020 at 14:38

Prince Francis · Accepted Answer · 2020-04-20 09:18:40Z

2

To avoid repetition based on Type, we should also include Type in the join condition. Then apply the lambda:

import re

df2 = df.merge(df_regex, left_on=['Country', 'Type'], right_on=['Country', 'Type'])
df2['Data Quality'] = df2.apply(lambda r: re.match(r['Regex'], r['Data']) and 1 or 0, axis=1)
df2

It will give you the following output.

  Country Type          Data         Regex  Data Quality
0      MY  ABC  MY1234567890  ^MY[0-9]{10}             1
1      IT  ABC  IT1234567890  ^IT[0-9]{10}             1
2      IT  ABC    MY45889976  ^IT[0-9]{10}             0
3      IT  ABC   IT567888976  ^IT[0-9]{10}             0
4      PL  PQR      PL123456           ^PL             1
5      MY  XYZ     456792abc    ^\w{6,10}$             1
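
For anyone who wants to run this end to end, here is a minimal self-contained sketch using the sample data from the question. Two things are assumptions added here and not part of the code above: `how="left"`, so that a data row with no entry in df_regex is kept rather than dropped by the default inner join, and the helper name `matches`, which scores such rows as 0.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Country": ["MY", "IT", "PL", "MY", "IT", "IT"],
    "Type": ["ABC", "ABC", "PQR", "XYZ", "ABC", "ABC"],
    "Data": ["MY1234567890", "IT1234567890", "PL123456",
             "456792abc", "MY45889976", "IT567888976"],
})
df_regex = pd.DataFrame({
    "Country": ["MY", "IT", "PL", "MY"],
    "Type": ["ABC", "ABC", "PQR", "XYZ"],
    "Regex": [r"^MY[0-9]{10}", r"^IT[0-9]{10}", r"^PL", r"^\w{6,10}$"],
})

# Join on both keys; how="left" (an assumption) keeps every row of df.
merged = df.merge(df_regex, on=["Country", "Type"], how="left")


def matches(row):
    """Return 1 if Data matches the row's Regex, else 0 (hypothetical helper)."""
    if pd.isna(row["Regex"]):
        # No regex defined for this (Country, Type) pair.
        return 0
    return int(re.match(row["Regex"], row["Data"]) is not None)


merged["Data Quality"] = merged.apply(matches, axis=1)
print(merged)
```

With a left join the result has the same number of rows as df, as long as each (Country, Type) pair appears only once in df_regex.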


1 Comment

Aqilah Over a year ago

Thanks, but after merging, some rows are omitted (the shape is reduced). Is there any other way to do this without merging the two dataframes?
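
One possible way to do this without merging, sketched here as an illustration and not taken from the answer: build a dictionary that maps each (Country, Type) pair to a compiled pattern, then look up the pattern for every data row. The names `patterns` and `data_quality` are hypothetical, and the sketch assumes each (Country, Type) pair appears only once in df_regex. Rows with no matching pair are scored 0 and nothing is dropped, so df keeps its shape.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Country": ["MY", "IT", "PL", "MY", "IT", "IT"],
    "Type": ["ABC", "ABC", "PQR", "XYZ", "ABC", "ABC"],
    "Data": ["MY1234567890", "IT1234567890", "PL123456",
             "456792abc", "MY45889976", "IT567888976"],
})
df_regex = pd.DataFrame({
    "Country": ["MY", "IT", "PL", "MY"],
    "Type": ["ABC", "ABC", "PQR", "XYZ"],
    "Regex": [r"^MY[0-9]{10}", r"^IT[0-9]{10}", r"^PL", r"^\w{6,10}$"],
})

# Compile each pattern once, keyed by (Country, Type).
patterns = {
    (country, type_): re.compile(regex)
    for country, type_, regex in zip(
        df_regex["Country"], df_regex["Type"], df_regex["Regex"]
    )
}


def data_quality(country, type_, value):
    """Return 1 if value matches the pattern for (country, type_), else 0."""
    pattern = patterns.get((country, type_))
    if pattern is None:
        # No regex defined for this pair.
        return 0
    return int(pattern.match(value) is not None)


# Score every row of df in place; no merge, so no rows are added or lost.
df["Data Quality"] = [
    data_quality(c, t, v)
    for c, t, v in zip(df["Country"], df["Type"], df["Data"])
]
print(df)
```

Compiling the patterns up front also avoids re-parsing the same regex string for every row, which may help on larger dataframes.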
