1

I want to write a python script which merges scores for two csv files based on a specific key.

file1.csv

    id, uid, score1, score2
    1,abc,3,5
    2,def,2,4

file2.csv

    id, uid, score3
    1,def,5
    2,abc,4

example of desired joined file for given key 'uid':

    uid, score1, score2, score3
    abc, 3, 5, 4
    def, 2, 4, 5

My code looks like it should work but for some reason I keep getting

    KeyError: 'uid'

when I try and run this:

    import pandas as pd

    csv1 = pd.read_csv('file1.csv')
    csv2 = pd.read_csv('file2.csv')
    csv1.drop(csv1.columns[[0]], axis=1, inplace=True)
    csv2.drop(csv2.columns[[0]], axis=1, inplace=True)

    merged = pd.merge(csv1, csv2, on='uid')
    print merged

I even tried replacing

    merged = pd.merge(csv1, csv2, on='uid')

with

    merged = csv1.merge(csv2, on='uid')

and I got the same error.

I think the drop might be modifying the indexes somehow so that merge can't read 'uid' but I don't know how to fix it.

1
  • 1
    what does csv1 and csv2 look like after you do .drop? Commented Dec 16, 2015 at 3:22

2 Answers 2

1

Because pandas' read_csv method will not strip spaces in the first line for you. You can see all the keys in your csv dataframe by printing csv1.keys(), which will like this:

Index([u'id', u' uid', u' score1', u' score2'], dtype='object')

So you have to use ' uid' as merge key or change the first lines in your file1.csv or file2.csv.

P.S. You may look at this question to save a little strip work by hands

Sign up to request clarification or add additional context in comments.

Comments

0

Instead of dropping the 'id' column in both, can you try

merged = pd.merge(csv1,csv2, on=['id','uid'])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.