Merge two csv files with custom columns

Question

I have two csv files: test1

test2

A   C   D   B
3   x   25  101
2   y   0.35    11
1   z   0.45    111
6   k   0.55    1101
7   l   0.65    1010

I want to merge them on column A, but I need only columns A and B from test1 and columns D and B from test2 in the final file. Since both files have a column named B, I need to rename one of them as part of the join. The output file should look like this:

A   B   D       B1
1   a   0.45    1110
2   b   0.35    1010
3   c   25      1011
4   d       
5   e   0.55    
6       0.65    1000
7               1111

where B1 corresponds to column B in test2. The B columns in test1 and test2 hold different data.

import pandas

csv1 = pandas.read_csv('test1.csv',dtype='unicode')
csv2 = pandas.read_csv('test2.csv',dtype='unicode')
merged = pandas.merge(csv1[list('AB')],csv2[list('DB')], on='A',how="outer")
merged.to_csv("outputtest.csv", index=False)

This gives me an error:

KeyError: "['B'] not in index"

jezrael · Accepted Answer · 2016-06-25 06:14:29Z


You can drop column C from csv2, merge with the suffixes parameter so that the overlapping B column from csv2 becomes B1, and finally replace missing values with empty strings using fillna:

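import pandas as pd  # the question imports pandas without the pd alias

merged = pd.merge(csv1,
                  csv2.drop('C', axis=1),  # drop the unwanted column before joining
                  on='A',
                  how="outer",             # keep keys from both files
                  suffixes=('','1')).fillna('')  # left B stays B, right B becomes B1
print(merged)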
     A  B     D    B1
0  1.0  a  0.45   111
1  2.0  b  0.35    11
2  3.0  c    25   101
3  4.0  d            
4  5.0  e            
5  6.0     0.55  1101
6  7.0     0.65  1010

If the CSV files have many columns, select only the columns you need plus the join column (A in this solution) before merging:

merged = pd.merge(csv1[['A','B']],
                  csv2[['A','D','B']], 
                  on='A',
                  how="outer", 
                  suffixes=('','1')).fillna('')
print(merged)
     A  B     D    B1
0  1.0  a  0.45   111
1  2.0  b  0.35    11
2  3.0  c    25   101
3  4.0  d            
4  5.0  e            
5  6.0     0.55  1101
6  7.0     0.65  1010

Or, because these column names are single characters, build the same lists with list():

merged = pd.merge(csv1[list('AB')],
                  csv2[list('ADB')], 
                  on='A',
                  how="outer", 
                  suffixes=('','1')).fillna('')
print(merged)
     A  B     D    B1
0  1.0  a  0.45   111
1  2.0  b  0.35    11
2  3.0  c    25   101
3  4.0  d            
4  5.0  e            
5  6.0     0.55  1101
6  7.0     0.65  1010
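
The merge above can be tried without any files on disk. The following is a minimal sketch, not the answer's own code: the contents of test1 are an assumption (the question does not show them; they are inferred from the printed result), and test2 is the table from the question rewritten with commas.

```python
import io
import pandas as pd

# ASSUMPTION: test1.csv is not shown in the question; these rows are
# inferred from the merged output printed in the answer.
TEST1 = """A,B
1,a
2,b
3,c
4,d
5,e
"""

# test2.csv as listed in the question, comma-separated.
TEST2 = """A,C,D,B
3,x,25,101
2,y,0.35,11
1,z,0.45,111
6,k,0.55,1101
7,l,0.65,1010
"""

# Read every column as text, as the question does.
csv1 = pd.read_csv(io.StringIO(TEST1), dtype=str)
csv2 = pd.read_csv(io.StringIO(TEST2), dtype=str)

merged = pd.merge(
    csv1[["A", "B"]],
    csv2[["A", "D", "B"]],  # the join key A has to be part of the subset
    on="A",
    how="outer",            # keep keys that appear in either file
    suffixes=("", "1"),     # left B keeps its name, right B becomes B1
).fillna("")                # show missing values as empty strings

print(merged)
```

Note that the subset taken from csv2 includes A. The question's code selects only D and B from csv2, which leaves nothing to join on.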

edited Jun 25, 2016 at 6:14

answered Jun 25, 2016 at 6:08

jezrael


6 Comments

jezrael Over a year ago

I think this error means the column names may contain stray spaces. Try checking with print(csv1.columns.tolist()).
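
A small sketch of that check, assuming the header really does contain stray spaces (the sample text below is hypothetical, not the asker's file):

```python
import io
import pandas as pd

# Hypothetical CSV whose header has spaces around the column names.
raw = "A , B\n1,a\n2,b\n"
df = pd.read_csv(io.StringIO(raw), dtype=str)

# Printing the names as a list makes leading/trailing spaces visible.
print(df.columns.tolist())

# Strip whitespace from every column name, then select as usual.
df.columns = df.columns.str.strip()
subset = df[["A", "B"]]
print(subset.columns.tolist())
```

If the names print cleanly and the error persists, the cause is probably something else, such as a different delimiter in the file.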

Diganta Bharali Over a year ago

Instead of adding a suffix, what if we need to rename the column altogether, say to 'BiNumeral' instead of B? Should I change the names while reading csv2, or is there a shorter way?

jezrael Over a year ago

IIUC you can use df.rename(columns={'B':'BiNumeral'}, inplace=True)
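
One way to apply that suggestion is to rename B in the csv2 subset before merging, so that no suffix is needed. This is only a sketch with small made-up frames standing in for the two files:

```python
import pandas as pd

# Stand-in data; the real frames would come from pd.read_csv.
csv1 = pd.DataFrame({"A": ["1", "2"], "B": ["a", "b"]})
csv2 = pd.DataFrame({"A": ["1", "2"],
                     "D": ["0.45", "0.35"],
                     "B": ["111", "11"]})

# rename returns a new frame here, so csv2 itself is left unchanged.
right = csv2[["A", "D", "B"]].rename(columns={"B": "BiNumeral"})

# The column names no longer overlap (apart from the key), so suffixes
# is not required.
merged = pd.merge(csv1, right, on="A", how="outer")
print(merged.columns.tolist())
```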

Diganta Bharali Over a year ago

yup. I did something like that.

import pandas

csv1 = pandas.read_csv('test1.csv', dtype='unicode')
csv2 = pandas.read_csv('test2.csv', dtype='unicode')
csv2 = csv2.rename(columns={'C': 'Binomial'})
merged = pandas.merge(csv1, csv2[['A', 'Binomial', 'D']], on='A')
merged.to_csv("outputtest.csv", index=False)

Diganta Bharali Over a year ago

One last question: if I need the output to contain only the rows of test1, which join should I perform? An inner join returns only the rows whose key appears in both files, i.e. the rows where A=1,2,3, but I want the rows with A=1,2,3,4,5.
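
No reply to this last comment is shown here. In pandas, keeping every row of the first frame is what how="left" does, so a hedged sketch might look like the following. The test1 rows are again an assumption inferred from the answer's output.

```python
import pandas as pd

# ASSUMPTION: test1 contents inferred from the answer's printed result.
csv1 = pd.DataFrame({"A": ["1", "2", "3", "4", "5"],
                     "B": ["a", "b", "c", "d", "e"]})
# test2 rows as listed in the question (column C left out).
csv2 = pd.DataFrame({"A": ["3", "2", "1", "6", "7"],
                     "D": ["25", "0.35", "0.45", "0.55", "0.65"],
                     "B": ["101", "11", "111", "1101", "1010"]})

# how="left" keeps every row of csv1; rows of csv2 without a match
# (A=6 and A=7) are dropped, and unmatched csv1 rows get empty cells.
merged = pd.merge(csv1, csv2, on="A", how="left",
                  suffixes=("", "1")).fillna("")
print(merged)
```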
