combine/merge two csv using pandas/python

Question

I have two CSV files that I want to merge as a left join on the key column "id". Both files also share a non-key column, "result". Wherever the second CSV has a value in "result", it should override the value from the first. How can I achieve this using pandas or any other scripting language? My expected final output is shown below.

Input

input.csv:

id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,

output.csv:

id,result
1,"{s1,added}"
3,"{s3,added}"

Expected Output

final_output.csv

id,scenario,data1,data2,result
1,s1,300,400,"{s1,added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,"{s3,added}"

Current Code:

import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
merged = a.merge(b, on='id', how='left')
merged.to_csv("final_output.csv", index=False)

Question:

With this code I get the result column twice. I want it only once, with the value from the second CSV overriding the first wherever one exists. How do I get a single result column?
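For context on why the column shows up twice: when both frames share a non-key column, merge keeps both copies and appends the suffixes _x (left frame) and _y (right frame). A minimal sketch that reproduces this, using inline copies of the two files from the question (the io.StringIO wrappers and the variable names input_csv/output_csv are only there so the example runs without files on disk):

```python
import io
import pandas as pd

# Inline copies of input.csv and output.csv from the question.
input_csv = '''id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,
'''
output_csv = '''id,result
1,"{s1,added}"
3,"{s3,added}"
'''

a = pd.read_csv(io.StringIO(input_csv))
b = pd.read_csv(io.StringIO(output_csv))

# "result" exists in both frames and is not a join key,
# so pandas keeps both copies as result_x and result_y.
merged = a.merge(b, on="id", how="left")
print(list(merged.columns))

# Rows of the left frame with no match in the right frame get NaN in result_y.
print(merged["result_y"].isna().tolist())
```

So the remaining work is to collapse result_x and result_y into one column, preferring result_y where it is not NaN.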

Do you want this in Python, or is an awk solution acceptable?

– Inian, Commented Jan 16, 2017 at 4:42

Mahesh · Accepted Answer · 2017-01-16 11:52:56Z

2

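Try this; it works as well:

import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
c = pd.merge(a, b, on='id', how='left')

# Empty CSV cells are read as NaN, so test with pd.notnull rather than != ''.
lst = []
for i in c.index:
    if pd.notnull(c.iloc[i]['result_x']):
        lst.append(c.iloc[i]['result_x'])  # keep the value from input.csv
    else:
        lst.append(c.iloc[i]['result_y'])  # otherwise fall back to output.csv
c['result'] = pd.Series(lst)
del c['result_x']
del c['result_y']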
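The row-by-row loop can also be written without explicit iteration. A minimal sketch, assuming the value from the second file should win whenever it is present (as the question asks) and using inline copies of the question's data in place of the files; the suffix names _old/_new are arbitrary choices for illustration:

```python
import io
import pandas as pd

# Inline copies of input.csv and output.csv from the question.
input_csv = '''id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,
'''
output_csv = '''id,result
1,"{s1,added}"
3,"{s3,added}"
'''

a = pd.read_csv(io.StringIO(input_csv))
b = pd.read_csv(io.StringIO(output_csv))

# Name the two "result" copies explicitly instead of relying on _x/_y.
merged = a.merge(b, on="id", how="left", suffixes=("_old", "_new"))

# combine_first takes result_new where it is not NaN, else result_old.
merged["result"] = merged["result_new"].combine_first(merged["result_old"])
merged = merged.drop(columns=["result_old", "result_new"])

print(merged)
```

To prefer the first file instead, swap the two Series in the combine_first call.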


2 Comments

Madhu Over a year ago

Hey @Mahesh, I modified the code a little, since I wanted the value from the right CSV's "result" column whenever it exists.

for i in c.index:
    if pd.isnull(c.iloc[i]['result_y']):
        lst.append(c.iloc[i]['result_x'])
    else:
        lst.append(c.iloc[i]['result_y'])

Mahesh Over a year ago

@Madhura Mhatre OK. You got the desired result, right?

Stephen Rauch · Accepted Answer · 2017-01-16 05:11:06Z

1

This will combine the columns as desired:

import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
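merged = a.merge(b, on='id', how='left')  # left join, as the question asks

def merge_results(row):
    # Prefer the value from output.csv; fall back to input.csv when it is missing (NaN).
    y = row['result_y']
    return row['result_x'] if pd.isnull(y) else y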

merged['result'] = merged.apply(merge_results, axis=1)
del merged['result_x']
del merged['result_y']

merged.to_csv("final_output.csv", index=False)
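Another way to get the same override without creating result_x/result_y at all is DataFrame.update, which overwrites values in place with the non-NaN values of another frame aligned on the index. A minimal sketch, assuming "id" is unique in both files and using inline copies of the question's data (the names left and right are illustrative):

```python
import io
import pandas as pd

# Inline copies of input.csv and output.csv from the question.
input_csv = '''id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,
'''
output_csv = '''id,result
1,"{s1,added}"
3,"{s3,added}"
'''

# Index both frames by the key so update() can align rows on it.
left = pd.read_csv(io.StringIO(input_csv)).set_index("id")
right = pd.read_csv(io.StringIO(output_csv)).set_index("id")

# In-place: only columns present in `right` are touched, and only
# where `right` holds a non-NaN value. Rows of `left` are never dropped.
left.update(right)

final = left.reset_index()
print(final)
```

Because update never adds rows, ids that appear only in the second file are ignored, which matches left-join behaviour.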


Legolas · Accepted Answer · 2019-05-15 08:34:53Z

1

You can also use concat as below.

import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
frames = [a, b]
mergedFrames = pd.concat(frames, sort=True)
mergedFrames.to_csv("path/to/location")

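NOTE: sort=True is passed explicitly to silence the warning that some pandas versions raise when the frames' columns are not aligned. Also note that concat stacks the rows of b beneath those of a rather than joining on "id", so the overriding values still have to be resolved afterwards.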
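Since concat only stacks rows, one way to finish the job is to group the stacked rows by "id" and keep the last non-null value in each column. A minimal sketch of that idea, using inline copies of the question's data; it is an assumption of this example (not part of the answer above) that rows from the second file come last in the stack and should win:

```python
import io
import pandas as pd

# Inline copies of input.csv and output.csv from the question.
input_csv = '''id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,
'''
output_csv = '''id,result
1,"{s1,added}"
3,"{s3,added}"
'''

a = pd.read_csv(io.StringIO(input_csv))
b = pd.read_csv(io.StringIO(output_csv))

# Stack b under a; columns missing from b are filled with NaN.
stacked = pd.concat([a, b], sort=False)

# GroupBy.last() skips NaN by default, so each column ends up with the
# last non-null value seen for that id (b's value where b has one).
final = stacked.groupby("id", as_index=False).last()
print(final)
```

Two caveats: ids that exist only in the second file are kept (closer to an outer join than a left join), and integer columns absent from the second file, such as data1 and data2 here, are upcast to float because of the NaN fill.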

