How to write a Python script that merges columns in two CSV files based on a specific key

Question

I want to write a Python script that merges the scores from two CSV files based on a specific key.

file1.csv

    id, uid, score1, score2
    1,abc,3,5
    2,def,2,4

file2.csv

    id, uid, score3
    1,def,5
    2,abc,4

Example of the desired joined file for the key 'uid':

    uid, score1, score2, score3
    abc, 3, 5, 4
    def, 2, 4, 5

My code looks like it should work, but for some reason I keep getting

    KeyError: 'uid'

when I try to run this:

    import pandas as pd

    csv1 = pd.read_csv('file1.csv')
    csv2 = pd.read_csv('file2.csv')
    csv1.drop(csv1.columns[[0]], axis=1, inplace=True)
    csv2.drop(csv2.columns[[0]], axis=1, inplace=True)

    merged = pd.merge(csv1, csv2, on='uid')
    print(merged)

I even tried replacing

    merged = pd.merge(csv1, csv2, on='uid')

with

    merged = csv1.merge(csv2, on='uid')

and I got the same error.

I suspect the drop is modifying the indexes somehow so that merge can't find 'uid', but I don't know how to fix it.
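One way to test the suspicion about drop is to print the column labels before and after it runs. The sketch below rebuilds file1.csv in memory with io.StringIO (an assumption, so it runs without the file on disk). It suggests that drop only removes the first column and leaves the remaining labels, leading spaces included, untouched.

```python
import io

import pandas as pd

# Assumption: an in-memory copy of file1.csv from the question, header spaces included.
FILE1 = "id, uid, score1, score2\n1,abc,3,5\n2,def,2,4\n"

csv1 = pd.read_csv(io.StringIO(FILE1))
print(list(csv1.columns))  # labels exactly as read from the header line

# Drop the first column ('id') the same way the question does.
csv1.drop(csv1.columns[[0]], axis=1, inplace=True)
print(list(csv1.columns))  # 'id' is gone; the other labels are unchanged

print('uid' in csv1.columns)   # no label is spelled exactly 'uid'
print(' uid' in csv1.columns)  # the label still carries its leading space
```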

What do csv1 and csv2 look like after you call .drop?

– Liam Foley, Commented Dec 16, 2015 at 3:22

Community · Accepted Answer · 2017-05-23 12:30:55Z


This happens because pandas' read_csv does not strip the spaces that follow the commas in the header line. You can see all the column names of your DataFrame by printing csv1.keys(), which will look like this:

    Index([u'id', u' uid', u' score1', u' score2'], dtype='object')

So you either have to use ' uid' (with the leading space) as the merge key, or remove the spaces from the header lines of file1.csv and file2.csv.
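Both fixes can be sketched as follows. The files are rebuilt in memory with io.StringIO so the snippet runs on its own (an assumption; pass 'file1.csv' and 'file2.csv' to read the real files), and drop(columns='id') is used in place of the positional drop from the question.

```python
import io

import pandas as pd

# Assumption: in-memory copies stand in for file1.csv and file2.csv (header spaces included).
FILE1 = "id, uid, score1, score2\n1,abc,3,5\n2,def,2,4\n"
FILE2 = "id, uid, score3\n1,def,5\n2,abc,4\n"

csv1 = pd.read_csv(io.StringIO(FILE1))
csv2 = pd.read_csv(io.StringIO(FILE2))

# Option 1: merge on the label exactly as pandas read it, leading space and all.
merged_raw = pd.merge(csv1.drop(columns='id'), csv2.drop(columns='id'), on=' uid')

# Option 2: strip whitespace from every column label, then merge on the clean name.
csv1.columns = csv1.columns.str.strip()
csv2.columns = csv2.columns.str.strip()
merged = pd.merge(csv1.drop(columns='id'), csv2.drop(columns='id'), on='uid')

print(merged)
```

Option 2 yields the uid, score1, score2, score3 layout from the question; merged.to_csv('merged.csv', index=False) would write it out as a file.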

P.S. You may want to look at this question to save yourself some manual stripping.
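One alternative to stripping the labels by hand is read_csv's skipinitialspace=True option, which skips whitespace right after each delimiter, including in the header line. A minimal sketch, again assuming in-memory copies of the two files:

```python
import io

import pandas as pd

# Assumption: in-memory copies stand in for file1.csv and file2.csv.
FILE1 = "id, uid, score1, score2\n1,abc,3,5\n2,def,2,4\n"
FILE2 = "id, uid, score3\n1,def,5\n2,abc,4\n"

# skipinitialspace=True drops the space after each comma, so the header
# label ' uid' is read as 'uid' and no manual stripping is needed.
csv1 = pd.read_csv(io.StringIO(FILE1), skipinitialspace=True)
csv2 = pd.read_csv(io.StringIO(FILE2), skipinitialspace=True)

# Drop the unrelated 'id' columns, then join on 'uid'.
merged = csv1.drop(columns='id').merge(csv2.drop(columns='id'), on='uid')
print(merged.to_csv(index=False))
```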

answered Dec 16, 2015 at 3:45 by YCFlame; edited May 23, 2017 at 12:30 by Community Bot


Jay · Accepted Answer · 2015-12-16 03:27:43Z


Instead of dropping the 'id' column from both files, can you try:

    merged = pd.merge(csv1, csv2, on=['id', 'uid'])

answered Dec 16, 2015 at 3:27 by Jay
