I need to find rows that contain circular references in a CSV input file like:

start,end,weather
california,arizona,hot
colorado,kansas,cold
arizona,california,hot

The above should detect that the 1st and 3rd rows form a circular reference. I'm currently loading the CSV into a database and running a self-join query to determine whether the data has circular references, but I'd like to know if there is a way to handle this using Python pandas.

Thanks!

  • How about california -> arizona, arizona -> kansas, kansas -> california? Do you need to handle this loop? Commented Aug 13, 2018 at 6:51
  • No, only first-level circular references, not transitive loops. Thanks! Commented Aug 13, 2018 at 6:54
  • Does right / left matter? What if the last row has the right relation? Commented Aug 13, 2018 at 6:56
  • Yes, it needs to be the same relation. I updated the sample in the question. Commented Aug 13, 2018 at 7:00

1 Answer

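You can filter the rows where the value of the df.start Series is contained in the df.end Series. Then apply a second filter to keep the rows where the value of the df.end Series is contained in the df.start Series. Finally, add a "way" column holding the sorted (start, end) pair, so that a row and its reverse share the same value: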

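# Keep rows whose start appears somewhere in the end column, and vice versa.
df = df.loc[df.start.isin(df.end), :]
df = df.loc[df.end.isin(df.start), :].copy()  # copy so the new column is not set on a view
# Order-independent key: (A, B) and (B, A) produce the same "way" value.
df["way"] = df.apply(lambda x: tuple(sorted([x["start"], x["end"]])), axis=1)
print(df)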

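For the sample data, the output contains rows 0 and 2. Note that isin only checks membership in the whole column, so on larger data you should compare the "way" values (together with the third column) to confirm that two rows really mirror each other.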
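Because the isin filters test membership in a whole column rather than matching specific pairs, a self-merge may be a closer pandas equivalent of the self-join mentioned in the question. The following is only a minimal sketch, not part of the original answer: it inlines the sample data, and the names csv_text, reversed_df and circular are illustrative. It assumes that a row with identical start and end should also count as circular, since such a row matches its own reverse.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = """start,end,weather
california,arizona,hot
colorado,kansas,cold
arizona,california,hot
"""
df = pd.read_csv(io.StringIO(csv_text))

# Reversed copy: swap the start and end labels, keep weather unchanged.
reversed_df = df.rename(columns={"start": "end", "end": "start"})

# Inner merge on all three columns: a row survives only if its exact
# reverse, with the same weather value, also exists in the data.
circular = df.reset_index().merge(reversed_df, on=["start", "end", "weather"])

# A row can match several reversed rows, so keep each original row once.
circular = circular.drop_duplicates("index").set_index("index")

print(circular.index.tolist())
```

Merging on weather as well is what enforces the "same relation" requirement from the comments; dropping it from the on list would match reversed pairs regardless of the third column.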

2 Comments

Is there any way to ensure the 3rd column is the same?
I updated my answer by adding a new Series that does the job.
