Compare multiple values from a DataFrame against a single row from another

Question

I'm trying to check address values for inaccuracies. For example, given multiple records like:

Reference	Apartment	Address	PostCode
AS097	NaN	00 Name Road	BH1 4HB
AS097	Flat 1 Building Name	00 Name Road	BH1 4HB
AS097	Flat 2 Building Name	00 Name Rd	BH1 4HB
AS097	Flat 3 Building Name	00 Name Road	BH1 4HB
AS097	Flat 4 Building Name	00 Name Road	BH1 4HB
AS097	Flat 5 Building Name	00 Name Road	BH1 4HX
HO056	NaN	23 Street Road	XG9 9GX

I've a dataframe where I store all the "main" addresses by checking if the Apartment column is empty like so:

main_address = df['Apartment'].isnull()

df_st = pd.DataFrame({'Reference':df[main_address].Reference, 'Address':df[main_address].Address, 'PostCode':df[main_address].PostCode})

df_st will look like this:

Reference	Address	PostCode
AS097	00 Name Road	BH1 4HB
HO056	23 Street Road	XG9 9GX

df has over 1K records, but df_st containing the "main" address ends up with approx. 200 records.

I'm trying to create a new DataFrame that identifies the records that don't match, by comparing df against df_st.

THE PROBLEM

I try the below:

# Clean the Reference values
refs_list = df['Reference'].str.split('/').str[0]
df['Reference'] = refs_list

# Create a new column titled issues and flag if the references match
df['issues'] = np.where(df['Reference'] == df_st['Reference'], 'True', 'False')

I want the same check for Address and PostCode. Unfortunately, this does not work, since df and df_st don't have the same shape.

I'm struggling to find a way to achieve a comparison between the two DataFrames df against df_st.

I want to compare every row in df against the row in df_st with the same Reference and, if any of the values don't match, create a new column titled Issues and store the name of the conflicting column there.

MY DESIRED OUTCOME

Given the data above, comparing df against df_st should result in a new DataFrame like the one below:

Reference	Apartment	Address	PostCode	Issues
AS097	NaN	00 Name Road	BH1 4HB	None
AS097	Flat 1 Building Name	00 Name Road	BH1 4HB	None
AS097	Flat 2 Building Name	00 Name Rd	BH1 4HB	Address
AS097	Flat 3 Building Name	00 Name Road	BH1 4HB	None
AS097	Flat 4 Building Name	00 Name Road	BH1 4HB	None
AS097	Flat 5 Building Name	00 Name Road	BH1 4HX	PostCode
HO056	NaN	23 Street Road	XG9 9GX	None

Here Address appears in the Issues column because that row's address doesn't match the one in df_st, and PostCode appears because that row's postcode differs from the one in df_st.

IN A NUTSHELL

All I want to know is how to match rows by Reference between one DataFrame and another, and then compare the other values, Address and PostCode.

Hope that makes sense.

The best way to solve this would be to join df_st to df using pandas.DataFrame.merge and then simply compare the columns as you normally would in pandas.
– Oxbowerce, Commented May 7, 2021 at 17:17
Just to clarify: does df contain the true data? For example, for AS097, how does one tell if the correct address is 00 Name Road or 00 Name Rd? Similarly for BH1 4HX versus BH1 4HB. Is it correct to assume that we look to df for the correct PostCode or Address?
– Daren, Commented May 10, 2021 at 2:02
@Daren, df_st contains the correct data to compare against. Reference is not a reliable data source, but Address in df_st is.
– Ricardo Sanchez, Commented May 10, 2021 at 7:22

Daren · Accepted Answer · 2021-05-11 02:34:50Z


You can join df and df_st on Reference:

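df_merged = pd.merge(df, df_st, on="Reference", how="left", suffixes=("_df", "_df_st"))

Note: whether how="left" is right depends on which rows you want in the joined table; a left join keeps every row of df. The suffixes argument names the overlapping Address and PostCode columns, which would otherwise get the default _x and _y suffixes. You can then compare the values in the joined columns: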

df_merged['Address_df'] == df_merged['Address_df_st']
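To get from that comparison to the Issues column in the question, something along these lines might work. It is only a sketch: it rebuilds the sample data from the question, assumes each Reference has exactly one "main" row in df_st, and uses a helper, conflicting_columns, that is my own name rather than anything from pandas. The string "None" is used for clean rows only to mirror the desired output.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Reference": ["AS097"] * 6 + ["HO056"],
    "Apartment": [np.nan, "Flat 1 Building Name", "Flat 2 Building Name",
                  "Flat 3 Building Name", "Flat 4 Building Name",
                  "Flat 5 Building Name", np.nan],
    "Address": ["00 Name Road", "00 Name Road", "00 Name Rd", "00 Name Road",
                "00 Name Road", "00 Name Road", "23 Street Road"],
    "PostCode": ["BH1 4HB", "BH1 4HB", "BH1 4HB", "BH1 4HB", "BH1 4HB",
                 "BH1 4HX", "XG9 9GX"],
})

# "Main" addresses: rows with no Apartment value
# (assumption: one such row per Reference, otherwise the merge duplicates rows)
df_st = df.loc[df["Apartment"].isnull(), ["Reference", "Address", "PostCode"]]

# Left join keeps every row of df and attaches the main address to it
df_merged = pd.merge(df, df_st, on="Reference", how="left",
                     suffixes=("_df", "_df_st"))


def conflicting_columns(row):
    """Hypothetical helper: names of the columns that differ from df_st."""
    issues = [col for col in ("Address", "PostCode")
              if row[f"{col}_df"] != row[f"{col}_df_st"]]
    return ", ".join(issues) if issues else "None"


df_merged["Issues"] = df_merged.apply(conflicting_columns, axis=1)

# Restore the original column names and drop the df_st copies
result = df_merged.rename(
    columns={"Address_df": "Address", "PostCode_df": "PostCode"}
)[["Reference", "Apartment", "Address", "PostCode", "Issues"]]

print(result)
```

With the sample data this flags Address on the Flat 2 row and PostCode on the Flat 5 row. Note that a Reference with no main row in df_st would get NaN in the joined columns, so both columns would be flagged for it; whether that is what you want depends on your data.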
