Using Python, I want to write a script that compares data in two CSV files. The first, filedata.csv, contains file paths whose names embed a user name and a user id. The second, roster.csv, contains those same fields split into separate columns. I would like to search each file path string in filedata.csv for matches in roster.csv, and then write the matching columns from roster.csv into filedata.csv. Below are the CSV structures and the desired output.
filedata.csv
filename
C:\johndoe_0001_paper1.doc
C:\janedoe_0002_paper2.doc
C:\johnsmith_0003_paper3.pdf
roster.csv
first_name, last_name, user_id
john, doe, 0001
jane, doe, 0002
john, smith, 0003
Desired output for filedata.csv:
filename, first_name, last_name, user_id
C:\johndoe_0001_paper1.doc, john, doe, 0001
C:\janedoe_0002_paper2.doc, jane, doe, 0002
C:\johnsmith_0003_paper3.pdf, john, smith, 0003
I attempted the following code with Pandas to see if I can search through the strings in filedata.csv for matches from roster.csv:
import pandas as pd
df = pd.read_csv('filedata.csv')
filenames = str(df['filename'])
roster = pd.read_csv('roster.csv')
roster_last_name = str(roster['last_name'])
roster_first_name = str(roster['first_name'])
roster_user_id = str(roster['user_id'])
print(df.loc([filenames]).str.contains([roster_last_name]))
But I get the following error:
TypeError: unhashable type: 'list'
Likewise I've tried something simpler, but with no success, as "False" is always returned:
if roster_last_name in filenames:
    print("True")
else:
    print("False")
I'm sure I'm missing something simple, but unsure how to proceed. All suggestions are greatly appreciated.
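You wrote .loc([filenames]), but the actual syntax is .loc[filenames]. However, because filenames = str(df['filename']), filenames is actually the string representation of the Series object df.filename, not a list of filenames. df.filename itself, however, is. The same applies to the roster variables, which is why the "in" test always prints "False": it compares the printed form of one whole column against the printed form of another.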
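One possible way to get the desired output is sketched below. It is not the only approach, and it rests on assumptions that are not stated in the question: that every filename contains the user id as a run of digits between two underscores, and that user_id alone is enough to identify a roster row. The two files are reproduced inline with io.StringIO so the sketch runs on its own; the variable names files, roster, and merged are made up for illustration.

```python
import io
import pandas as pd

# Inline copies of the two files from the question, so the sketch runs as-is.
# With real files, pass the paths to pd.read_csv instead.
filedata_csv = io.StringIO(
    "filename\n"
    "C:\\johndoe_0001_paper1.doc\n"
    "C:\\janedoe_0002_paper2.doc\n"
    "C:\\johnsmith_0003_paper3.pdf\n"
)
roster_csv = io.StringIO(
    "first_name, last_name, user_id\n"
    "john, doe, 0001\n"
    "jane, doe, 0002\n"
    "john, smith, 0003\n"
)

files = pd.read_csv(filedata_csv)
# dtype=str keeps the leading zeros in user_id;
# skipinitialspace=True drops the blank that follows each comma.
roster = pd.read_csv(roster_csv, dtype=str, skipinitialspace=True)

# Work on the Series itself, not on str(Series): .str methods act per row.
has_doe = files["filename"].str.contains("doe")

# Assumption: the user id is the run of digits between two underscores.
files["user_id"] = files["filename"].str.extract(r"_(\d+)_", expand=False)

# Left merge keeps every filename, even if no roster row matches.
merged = files.merge(roster, on="user_id", how="left")
merged = merged[["filename", "first_name", "last_name", "user_id"]]

print(merged.to_csv(index=False))
```

If the id cannot be pulled out with a single pattern, the regular expression is the part to adapt. Filenames with no matching roster row are kept by the left merge, with empty name columns.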