I need to convert a CSV file to JSON, but the output is not what I expected: the address block (line1, line2, and so on) is repeated in the JSON output. I need to remove the repeated part.
Input data
7,priya,kannan,[email protected],07-12-1994,"123","456",67,mdu,tn,india
7,priya,kannan,[email protected],07-12-1994,"123","456",67,mdu,tn,india
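The code in the question reads this file with `header=0`, so the real file presumably has a header row that is not shown above. Here is a minimal sketch of loading the sample. It makes two assumptions. The header names are inferred from the expected JSON, not taken from the real file. The emails are hypothetical stand-ins, because the originals are redacted and the expected output implies they differ between the two rows.

```python
import io
import pandas as pd

# Assumed header: column names inferred from the expected JSON, not from the real file.
# The emails are hypothetical stand-ins for the redacted addresses in the post.
CSV_TEXT = """source_id,fname,lname,email,date_of_birth,line1,line2,line3,city,state,country
7,priya,kannan,priya.a@example.com,07-12-1994,"123","456",67,mdu,tn,india
7,priya,kannan,priya.b@example.com,07-12-1994,"123","456",67,mdu,tn,india
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=",", header=0)
print(df.dtypes)
```

Quoting in a CSV only affects delimiter handling, so pandas still infers `"123"` and `"456"` as integers. That is why they appear unquoted in the JSON.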
Expected output
[ {
  "source_id": 7,
  "fname": "priya",
  "lname": "kannan",
  "date_of_birth": "07-12-1994",
  "email": ["[email protected]", "[email protected]"],
  "address": [{
    "line1": 123,
    "line2": 456,
    "line3": 67,
    "city": "mdu",
    "state": "tn",
    "country": "india"
  }]
}]
Output I'm getting
[ {
  "source_id": 7,
  "fname": "priya",
  "lname": "kannan",
  "date_of_birth": "07-12-1994",
  "email": ["[email protected]", "[email protected]"],
  "address": [{
    "line1": 123,
    "line2": 456,
    "line3": 67,
    "city": "mdu",
    "state": "tn",
    "country": "india"
  }, {
    "line1": 123,
    "line2": 456,
    "line3": 67,
    "city": "mdu",
    "state": "tn",
    "country": "india"
  }]
}]
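The repetition follows from the data. After the emails are merged into one tuple, both input rows have identical values in every grouping column, so they fall into the same group. `to_dict('records')` then emits one dict per row, even though both rows carry the same address. The minimal illustration below uses a hypothetical two-row address group that mirrors the sample.

```python
import pandas as pd

# Two rows with identical address values, as in the sample input.
group = pd.DataFrame({
    "line1": [123, 123], "line2": [456, 456], "line3": [67, 67],
    "city": ["mdu", "mdu"], "state": ["tn", "tn"], "country": ["india", "india"],
})

# One dict per row, so the same address appears twice.
with_dups = group.to_dict("records")
# Dropping duplicate rows first leaves a single address.
deduped = group.drop_duplicates().to_dict("records")

print(len(with_dups), len(deduped))  # 2 1
```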
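Code I tried
import pandas as pd

g_cols = ['source_id', 'fname', 'lname', 'email', 'date_of_birth']
df = pd.read_csv(path, sep=",", header=0)  # path: location of the input CSV
# Address columns: everything that is not a person-level column
cols = df.columns[~df.columns.isin(g_cols)]
g_cols.remove('email')
# Collapse each person's emails into a tuple of unique values
df = (df.sort_values(g_cols)
        .set_index(g_cols)
        .assign(email=df.groupby(g_cols)['email'].agg(lambda x: tuple(pd.unique(x))))
        .reset_index())
g_cols.append('email')
# Nest the address columns as a list of dicts per person
# ('records' is the valid orient; the abbreviation 'record' was removed in pandas 2.0)
df1 = df.groupby(g_cols)[cols].apply(lambda x: x.to_dict('records')).reset_index(name='address').to_dict('records')
print(df1)
df2 = pd.DataFrame(df1)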
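One possible fix is to drop duplicate address rows inside each group before converting them to dicts. In the code above, the equivalent one-line change would be `lambda x: x.drop_duplicates().to_dict('records')`. The sketch below makes the same idea explicit with a plain loop over `groupby` instead of the chained version. It rests on a few assumptions. The header names are inferred from the expected JSON. The emails are hypothetical stand-ins for the redacted ones. `to_builtin` is a hypothetical helper that converts NumPy scalars, which the standard `json` module cannot serialize.

```python
import io
import json
import pandas as pd

# Hypothetical sample: header names inferred from the expected JSON,
# emails are stand-ins for the redacted ones in the question.
CSV_TEXT = """source_id,fname,lname,email,date_of_birth,line1,line2,line3,city,state,country
7,priya,kannan,priya.a@example.com,07-12-1994,"123","456",67,mdu,tn,india
7,priya,kannan,priya.b@example.com,07-12-1994,"123","456",67,mdu,tn,india
"""

person_cols = ["source_id", "fname", "lname", "date_of_birth"]


def to_builtin(obj):
    # NumPy scalars (e.g. numpy.int64) are not JSON-serializable; .item() converts them.
    if hasattr(obj, "item"):
        return obj.item()
    raise TypeError(f"not JSON serializable: {type(obj)!r}")


df = pd.read_csv(io.StringIO(CSV_TEXT))
# Address columns: everything except the person-level columns and email.
addr_cols = [c for c in df.columns if c not in person_cols + ["email"]]

records = []
for key, grp in df.groupby(person_cols):
    rec = dict(zip(person_cols, key))
    # Unique emails, in first-seen order.
    rec["email"] = list(pd.unique(grp["email"]))
    # Drop repeated address rows before converting them to dicts.
    rec["address"] = grp[addr_cols].drop_duplicates().to_dict("records")
    records.append(rec)

output = json.dumps(records, indent=2, default=to_builtin)
print(output)
```

With the sample data this yields one person with two emails and a single address entry. To write a file instead of printing, `json.dump(records, f, indent=2, default=to_builtin)` on an open file handle works the same way.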