Converting CSV to JSON but not getting the expected output

Question

I need to convert a CSV file to JSON, but the output is not what I expect: the address block (line1, line2, etc.) is repeated in the JSON output. I need to remove the repeated part.

Input data

7,priya,kannan,[email protected],07-12-1994,"123","456",67,mdu,tn,india
7,priya,kannan,[email protected],07-12-1994,"123","456",67,mdu,tn,india

Expected output

[ {
    "source_id": 7,
    "fname": "priya",
    "lname": "kannan",
    "date_of_birth": "07-12-1994",
    "email": ["[email protected]", "[email protected]"],
    "address": [{
        "line1": 123,
        "line2": 456,
        "line3": 67,
        "city": "mdu",
        "state": "tn",
        "country": "india"
    }]
}]

Output I am getting

[ {
    "source_id": 7,
    "fname": "priya",
    "lname": "kannan",
    "date_of_birth": "07-12-1994",
    "email": ["[email protected]", "[email protected]"],
    "address": [{
        "line1": 123,
        "line2": 456,
        "line3": 67,
        "city": "mdu",
        "state": "tn",
        "country": "india"
    }, {
        "line1": 123,
        "line2": 456,
        "line3": 67,
        "city": "mdu",
        "state": "tn",
        "country": "india"
    }]
}]

Code tried

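import pandas as pd

g_cols = ['source_id', 'fname', 'lname', 'email', 'date_of_birth']
df = pd.read_csv(path, sep=",", header=0)  # path: location of the input CSV

# Address columns: everything that is not a person-level column
cols = df.columns[~df.columns.isin(g_cols)]
g_cols.remove('email')

# Collapse each person's e-mails into a tuple of unique values
df = (df.sort_values(g_cols)
      .set_index(g_cols)
      .assign(email=df.groupby(g_cols)['email'].agg(lambda x: tuple(pd.unique(x))))
      .reset_index())

# Group by person + e-mail tuple and nest the address columns as a list of records
g_cols.append('email')
df1 = df.groupby(g_cols)[cols].apply(lambda x: x.to_dict('records')).reset_index(name='address').to_dict('records')
print(df1)
df2 = pd.DataFrame(df1)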
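A minimal sketch that reproduces the duplication. It assumes the CSV header is `source_id,fname,lname,email,date_of_birth,ln1,ln2,ln3,cty,state,cntry` (inferred from the answer's output below) and uses invented e-mail addresses in place of the redacted ones. Once the e-mails are collapsed into one tuple, the two rows no longer differ in any column, so the groupby sees two rows and emits two identical address records for the same person:

```python
import io

import pandas as pd

# Hypothetical CSV: header assumed from the answer's output, e-mails invented.
CSV = """source_id,fname,lname,email,date_of_birth,ln1,ln2,ln3,cty,state,cntry
7,priya,kannan,a@example.com,07-12-1994,"123","456",67,mdu,tn,india
7,priya,kannan,b@example.com,07-12-1994,"123","456",67,mdu,tn,india
"""

key_cols = ["source_id", "fname", "lname", "date_of_birth"]
addr_cols = ["ln1", "ln2", "ln3", "cty", "state", "cntry"]

df = pd.read_csv(io.StringIO(CSV))

# Same idea as the question: collapse each person's e-mails into one tuple.
emails = df.groupby(key_cols)["email"].agg(lambda x: tuple(pd.unique(x)))
df = df.set_index(key_cols).assign(email=emails).reset_index()

# "email" was the only column that differed, so the rows are now identical.
print(df.duplicated().tolist())  # [False, True]

# Grouping therefore sees two rows and emits two equal address records.
grouped = df.groupby(key_cols + ["email"])[addr_cols].apply(lambda g: g.to_dict("records"))
print(len(grouped.iloc[0]))  # 2
```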

Anurag Dabas · Accepted Answer · 2021-08-10 10:54:37Z

1

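In this step, call drop_duplicates() before grouping. Once the e-mails have been collapsed into a tuple, the two input rows are identical, so drop_duplicates() keeps only one of them and each person gets a single address record:

df1 = df.drop_duplicates().groupby(g_cols)[cols].apply(lambda x: x.to_dict('records')).reset_index(name='address').to_dict('records')

Output of df1: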

[{'source_id': 7,
  'fname': 'priya',
  'lname': 'kannan',
  'date_of_birth': '07-12-1994',
  'email': ('[email protected]', '[email protected]'),
  'address': [{'ln1': 123,
    'ln2': 456,
    'ln3': 67,
    'cty': 'mdu',
    'state': 'tn',
    'cntry': 'india'}]}]
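An alternative sketch, not taken from the answer: deduplicate inside each person's group rather than across the whole frame, so only address rows that repeat for the same person are dropped. It assumes the column names shown in the output above and uses invented e-mail addresses. With this data it gives the same result as `df.drop_duplicates()`:

```python
import io

import pandas as pd

# Hypothetical CSV: header assumed from the answer's output, e-mails invented.
CSV = """source_id,fname,lname,email,date_of_birth,ln1,ln2,ln3,cty,state,cntry
7,priya,kannan,a@example.com,07-12-1994,"123","456",67,mdu,tn,india
7,priya,kannan,b@example.com,07-12-1994,"123","456",67,mdu,tn,india
"""

key_cols = ["source_id", "fname", "lname", "date_of_birth"]
addr_cols = ["ln1", "ln2", "ln3", "cty", "state", "cntry"]

df = pd.read_csv(io.StringIO(CSV))

# Collapse each person's e-mails into a tuple of unique values.
emails = df.groupby(key_cols)["email"].agg(lambda x: tuple(pd.unique(x)))
df = df.set_index(key_cols).assign(email=emails).reset_index()

records = (
    df.groupby(key_cols + ["email"])[addr_cols]
    # Drop repeated addresses within each person's group only.
    .apply(lambda g: g.drop_duplicates().to_dict("records"))
    .reset_index(name="address")
    .to_dict("records")
)
print(records)
```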



Naveen Over a year ago

Can you please help me on this. stackoverflow.com/questions/68921025/…

Naveen · Accepted Answer · 2021-08-10 10:58:10Z

0

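import pandas as pd

g_cols = ['source_id', 'fname', 'lname', 'email', 'date_of_birth']
df = pd.read_csv(path, sep=",", header=0)  # path: location of the input CSV

# Address columns: everything that is not a person-level column
cols = df.columns[~df.columns.isin(g_cols)]
g_cols.remove('email')

# Collapse each person's e-mails into a tuple of unique values
df = (df.sort_values(g_cols)
      .set_index(g_cols)
      .assign(email=df.groupby(g_cols)['email'].agg(lambda x: tuple(pd.unique(x))))
      .reset_index())

# drop_duplicates() removes the rows that became identical once the e-mails were collapsed
g_cols.append('email')
df1 = df.drop_duplicates().groupby(g_cols)[cols].apply(lambda x: x.to_dict('records')).reset_index(name='address').to_dict('records')
print(df1)
df2 = pd.DataFrame(df1)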
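To get actual JSON text rather than a Python list, the records can be passed to the standard-library `json` module. A minimal sketch, using the df1 value printed in the first answer with invented e-mail addresses in place of the redacted ones:

```python
import json

# df1 as printed in the first answer; the e-mail addresses are hypothetical.
df1 = [{
    "source_id": 7,
    "fname": "priya",
    "lname": "kannan",
    "date_of_birth": "07-12-1994",
    "email": ("a@example.com", "b@example.com"),
    "address": [{"ln1": 123, "ln2": 456, "ln3": 67,
                 "cty": "mdu", "state": "tn", "cntry": "india"}],
}]

# json encodes tuples as JSON arrays, so "email" becomes a list.
# default= is only a fallback for NumPy scalars (e.g. numpy.int64),
# which the json module cannot serialize on its own.
text = json.dumps(df1, indent=4, default=lambda o: o.item())
print(text)
```

To write a file instead of a string, `json.dump(df1, fh, indent=4, default=...)` takes an open file handle and the same keyword arguments.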

