I'm having trouble on how to match regex in two different dataframe that is linked with its type and unique country. Here is the sample for the data df and the regex df. Note that the shape for these two dataframe is different because the regex df containing just unique value.
**Data df** **Regex df**
**Country Type Data** **Country Type Regex**
MY ABC MY1234567890 MY ABC ^MY[0-9]{10}
IT ABC IT1234567890 IT ABC ^IT[0-9]{10}
PL PQR PL123456 PL PQR ^PL
MY XYZ 456792abc MY XYZ ^\w{6,10}$
IT ABC MY45889976
IT ABC IT567888976
I have tried to merge them together and just use lambda to do the matching. Below is my code,
df.merge(df_regex,left_on='Country',right_on="Country")
df['Data Quality'] = df.apply(lambda r:re.match(r['Regex'],r['Data']) and 1 or 0, axis=1)
But, it will add another row for each of the different type and country. So there will be a lot of duplication which is not efficient and time consuming.
Is there any pythonic way to match the data to its country and type but the reference is in another dataframe. without merging those 2 df. Then if its match to its own regex, it will return 1, else 0.
Data dfholding ones and zeros?