I have a script that scrapes a site and puts specific site names into a CSV. Some days it has 0 site names and some days it has more than 4. I have another script that takes the CSV from today and the CSV from yesterday and compares the two. If today's CSV has site names that were also in yesterday's CSV, I want to write those site names to a separate txt file. I have:

with open(filepath + today + filename, 'r') as t1, open(filepath + yesterday + filename, 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

with open(checklistFile, 'w') as outfile:
    for line in fileone:
        if line in filetwo:
            outfile.write(line)
            print("bad")
        else:
            outfile.write("good")
            print("good")

This only works if the CSVs have the same number of lines and are in the same order. For instance, if today had "site1, site2, site3" and yesterday had "site4, site1, site5", this script would miss the shared site1. Any help would be appreciated. I'm running Python 2.7, so I can't use csv-diff.

1 Answer

You can achieve this using pandas:

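import pandas as pd

# Load both days' files; read_csv treats the first row as the header by default.
df_today = pd.read_csv(filepath + today + filename)
df_yesterday = pd.read_csv(filepath + yesterday + filename)
# Stack the two days, then keep the rows that repeat an earlier row.
df_common = pd.concat([df_today, df_yesterday])
duplicates_df = df_common[df_common.duplicated()]
# Write the repeated rows to the checklist file without the index column.
duplicates_df.to_csv(checklistFile, index=False)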
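One caveat with the concat-and-duplicated approach: `duplicated()` also flags a row that appears twice within the same day's file, not only rows shared between the two days. If that matters, a set intersection may be a simpler fit. The following is a minimal sketch using only the standard library, so it should also run on Python 2.7. It assumes one site name per row in the first column; the function names and the `has_header` parameter are hypothetical and not part of the original scripts.

```python
import csv


def read_site_names(path, has_header=False):
    """Return the set of non-empty first-column values in a CSV file.

    Assumption: each row holds one site name in its first column.
    Set has_header=True to skip a header row.
    """
    names = set()
    with open(path, "r") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row if the file has one
        for row in reader:
            if row and row[0].strip():
                names.add(row[0].strip())
    return names


def common_site_names(today_path, yesterday_path, has_header=False):
    """Return a sorted list of the site names present in both files."""
    today_names = read_site_names(today_path, has_header)
    yesterday_names = read_site_names(yesterday_path, has_header)
    # Set intersection ignores row order and differing file lengths.
    return sorted(today_names & yesterday_names)


def write_checklist(names, out_path):
    """Write one site name per line to the output text file."""
    with open(out_path, "w") as outfile:
        for name in names:
            outfile.write(name + "\n")
```

With the variables from the question, usage would look something like `write_checklist(common_site_names(filepath + today + filename, filepath + yesterday + filename), checklistFile)`. An empty CSV yields an empty set, so days with 0 site names produce an empty checklist rather than an error.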
1 Comment

I took a crash course in pandas to write the first script, and now it comes in handy again :) Thanks for the quick answer! Worked like a charm.
