
I have two CSVs and I want to merge them with a left join. My key column is "id". Both CSVs also have the same non-key column, "result", and I want the value in the 2nd CSV's "result" column to override the one from the 1st CSV whenever it exists. How can I achieve that using pandas or any scripting language? Please see my expected final output below.

Input

input.csv:

id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,

output.csv:

id,result
1,"{s1,added}"
3,"{s3,added}"

Expected Output

final_output.csv

id,scenario,data1,data2,result
1,s1,300,400,"{s1,added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,"{s3,added}"

Current Code:

import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
merged = a.merge(b, on='id', how='left')
merged.to_csv("final_output.csv", index=False)

Question:

With this code I get the result column twice (as result_x and result_y). I want it only once, with the value from output.csv overriding the original whenever one exists. How do I get a single result column?
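For context on why this happens: when both frames contain a non-key column with the same name, DataFrame.merge keeps both copies and renames them using its suffixes parameter, which defaults to ("_x", "_y"). A minimal sketch, assuming the two sample files above are loaded from in-memory strings (INPUT_CSV and OUTPUT_CSV are illustrative names, not part of the question):

```python
import io

import pandas as pd

# Illustrative in-memory copies of the sample input.csv and output.csv
INPUT_CSV = '''id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,
'''
OUTPUT_CSV = '''id,result
1,"{s1,added}"
3,"{s3,added}"
'''

a = pd.read_csv(io.StringIO(INPUT_CSV))
b = pd.read_csv(io.StringIO(OUTPUT_CSV))

# "result" exists in both frames and is not a join key, so merge keeps
# both copies and renames them with the default suffixes ("_x", "_y").
merged = a.merge(b, on="id", how="left")
print(list(merged.columns))
# ['id', 'scenario', 'data1', 'data2', 'result_x', 'result_y']
```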

  • Do you want this in Python, or is awk code acceptable? Commented Jan 16, 2017 at 4:42

3 Answers


Try this; it works as well:

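import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
c = pd.merge(a, b, on='id', how='left')

# read_csv turns empty cells into NaN (not ''), so test with pd.notnull().
# Keep result_x when it is present; otherwise fall back to result_y.
lst = []
for i in c.index:
    if pd.notnull(c.loc[i, 'result_x']):
        lst.append(c.loc[i, 'result_x'])
    else:
        lst.append(c.loc[i, 'result_y'])
c['result'] = lst
del c['result_x']
del c['result_y']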

2 Comments

Hey @Mahesh, I modified the code a little bit, as I wanted the value from the right CSV's "result" column whenever it exists:
for i in c.index:
    if pd.isnull(c.iloc[i]['result_y']):
        lst.append(c.iloc[i]['result_x'])
    else:
        lst.append(c.iloc[i]['result_y'])
@Madhura Mhatre OK, so you got the desired result, right?
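The same "prefer result_y, fall back to result_x" rule from the comment above can also be written without a Python loop, using Series.fillna, which fills the missing entries of result_y with the values of result_x at the same index. A sketch, assuming the sample data above loaded from in-memory strings (INPUT_CSV and OUTPUT_CSV are illustrative names):

```python
import io

import pandas as pd

# Illustrative in-memory copies of the sample input.csv and output.csv
INPUT_CSV = '''id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,
'''
OUTPUT_CSV = '''id,result
1,"{s1,added}"
3,"{s3,added}"
'''

a = pd.read_csv(io.StringIO(INPUT_CSV))
b = pd.read_csv(io.StringIO(OUTPUT_CSV))

merged = a.merge(b, on="id", how="left")
# Take result_y where it is present; where it is NaN, use result_x instead.
merged["result"] = merged["result_y"].fillna(merged["result_x"])
merged = merged.drop(columns=["result_x", "result_y"])

print(merged["result"].tolist())
# ['{s1,added}', '{s2 added}', '{s3,added}']
```

To write the file, merged.to_csv("final_output.csv", index=False) works as in the question.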

This will combine the columns as desired:

import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
merged = a.merge(b, on='id', how='left')

def merge_results(row):
    # Prefer result_y (from output.csv); fall back to result_x when it is NaN
    y = row['result_y']
    return row['result_x'] if pd.isnull(y) else y

merged['result'] = merged.apply(merge_results, axis=1)
del merged['result_x']
del merged['result_y']

merged.to_csv("final_output.csv", index=False)
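A different technique from the apply-based answer above, offered as a sketch: DataFrame.update overwrites values in place with the non-NA values of another frame, aligned on index and column labels. Indexing both frames by "id" gives the override-if-present behavior directly, and because update never adds rows, ids that exist only in output.csv are ignored, as in a left join. It assumes "id" is unique in each file; INPUT_CSV and OUTPUT_CSV are illustrative in-memory copies of the sample files.

```python
import io

import pandas as pd

# Illustrative in-memory copies of the sample input.csv and output.csv
INPUT_CSV = '''id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,
'''
OUTPUT_CSV = '''id,result
1,"{s1,added}"
3,"{s3,added}"
'''

a = pd.read_csv(io.StringIO(INPUT_CSV))
b = pd.read_csv(io.StringIO(OUTPUT_CSV))

final = a.set_index("id")
# update() modifies final in place (it returns None): every non-NA value in
# b's "result" replaces the value for the same id; NaNs in b are skipped.
final.update(b.set_index("id"))
final = final.reset_index()  # bring "id" back as the first column

print(final)
```

Only the columns present in b are touched, so data1 and data2 keep their integer dtype.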


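You can also use concat as below. Note that concat stacks the rows of both files instead of matching them on "id", so ids that appear in both files show up twice in the result.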

import pandas as pd

a = pd.read_csv("input.csv")
b = pd.read_csv("output.csv")
frames = [a, b]
# Stacks the rows of b under the rows of a (no matching on "id")
mergedFrames = pd.concat(frames, sort=True)
mergedFrames.to_csv("path/to/location", index=False)

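NOTE: sort=True sorts the column names when the two frames' columns are not aligned. Passing sort explicitly silences the FutureWarning that pandas 0.23 emitted in that case.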
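With the sample files, the row-wise concat yields five rows, with ids 1 and 3 each appearing twice. If you want to stay with concat, one way to collapse back to one row per id is GroupBy.last(), which takes the last non-null value of each column within each group; since b is stacked after a, b's "result" wins wherever it is non-null. A sketch under those assumptions (INPUT_CSV and OUTPUT_CSV are illustrative in-memory copies of the sample files):

```python
import io

import pandas as pd

# Illustrative in-memory copies of the sample input.csv and output.csv
INPUT_CSV = '''id,scenario,data1,data2,result
1,s1,300,400,"{s1,not added}"
2,s2,500,101,"{s2 added}"
3,s3,600,202,
'''
OUTPUT_CSV = '''id,result
1,"{s1,added}"
3,"{s3,added}"
'''

a = pd.read_csv(io.StringIO(INPUT_CSV))
b = pd.read_csv(io.StringIO(OUTPUT_CSV))

# Row-wise stack: b's rows go under a's rows, nothing is matched on "id".
stacked = pd.concat([a, b], sort=False)
print(len(stacked))             # 5
print(stacked["id"].tolist())   # [1, 2, 3, 1, 3]

# One row per id: last() keeps the last non-null value of each column,
# so b's "result" overrides a's where b has a value.
collapsed = stacked.groupby("id", as_index=False).last()
# b's rows have no data1/data2, and the NaNs introduced by stacking turned
# those columns into floats; cast them back to int.
collapsed = collapsed.astype({"data1": int, "data2": int})
print(collapsed)
```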
