Convert CSV to JSON with nested array

Question

I have a CSV file:

group, first, last
fans, John, Smith
fans, Alice, White
students, Ben, Smith
students, Joan, Carpenter
...

The output JSON file needs this format:

[
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
},
{
  "group" : "students",
  "user" : [
    {
      "first" : "Ben",
      "last" :  "Smith"
    },
    {
      "first" : "Joan",
      "last" :  "Carpenter"
    }
  ]
}
]

What is wrong with your code? What language are you using?
– taras, Commented Jul 27, 2018 at 19:51
Sorry, I am using Python. The problem for me is how to nest the array. Do I need to create a JSON file with first and last and ...? The strategy is confusing to me. I can create a JSON file with three fields (group, first and last), but how do I nest first and last under group?
– FU USF, Commented Jul 27, 2018 at 21:16

jschnurr · Accepted Answer · 2018-07-28 15:31:39Z

Short answer
Use itertools.groupby, as described in the documentation.

Long answer
This is a multi-step process.

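Start by reading your CSV into a list of dicts. The sample file has a space after each comma, so pass skipinitialspace=True to keep that space out of the field names and values: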

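from csv import DictReader

# newline='' is what the csv module docs recommend when opening a file for a reader
with open('data.csv', newline='') as csvfile:
    # skipinitialspace=True drops the space that follows each comma
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]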
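To see what this step produces without a file on disk, here is a minimal sketch that runs the sample rows from the question through DictReader. io.StringIO stands in for data.csv, and the names sample_csv and raw_fields are only for illustration.

```python
import io
from csv import DictReader

# Sample rows copied from the question; io.StringIO stands in for data.csv.
sample_csv = (
    "group, first, last\n"
    "fans, John, Smith\n"
    "fans, Alice, White\n"
    "students, Ben, Smith\n"
    "students, Joan, Carpenter\n"
)

# Without skipinitialspace, the space after each comma stays in the field names.
raw_fields = DictReader(io.StringIO(sample_csv)).fieldnames

# With skipinitialspace=True, that leading space is dropped.
data = [dict(d) for d in DictReader(io.StringIO(sample_csv), skipinitialspace=True)]

print(raw_fields)  # ['group', ' first', ' last']
print(data[0])     # {'group': 'fans', 'first': 'John', 'last': 'Smith'}
```

Each row becomes one flat dict, which is the shape the later steps expect.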

groupby only merges consecutive rows that share a key, so the data has to be sorted by that key first. Define a function that returns the key, and pass it to sorted like so:

def keyfunc(x):
    return x['group']

data = sorted(data, key=keyfunc)
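The sort matters because groupby starts a new group every time the key changes. A small sketch of the difference, using rows from the question rearranged so the two fans rows are not adjacent (the rows list and the two result names are illustrative):

```python
from itertools import groupby

def keyfunc(x):
    return x['group']

# Rows rearranged for illustration so the 'fans' rows are not adjacent.
rows = [
    {'group': 'fans', 'first': 'John'},
    {'group': 'students', 'first': 'Ben'},
    {'group': 'fans', 'first': 'Alice'},
]

# Unsorted: 'fans' shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(rows, keyfunc)]

# Sorted first: each key appears exactly once.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=keyfunc), keyfunc)]

print(unsorted_keys)  # ['fans', 'students', 'fans']
print(sorted_keys)    # ['fans', 'students']
```

sorted is stable, so rows inside a group keep their order from the file. The groups themselves come out in key order, which may differ from the order in which they first appear in the CSV.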

Last, call groupby, providing your sorted data and your key function:

from itertools import groupby

groups = []
for k, g in groupby(data, keyfunc):
    groups.append({
        "group": k,
        # copy each row without its 'group' column
        "user": [{field: v for field, v in d.items() if field != 'group'} for d in g]
    })

groupby walks the sorted data and yields one (k, g) pair for each run of rows with the same key: k is the key for that group, and g is an iterator over the dict objects that belong to it. Each pass through the for block appends one finished group to the groups list. Because g is an iterator, consume it inside the loop body, as is done here.

In this example, the value for the user key is built with a fairly dense nested comprehension that removes the group key from every row. If you can live with the group value being repeated inside each user, that whole line can be simplified to:

"user": list(g)

Serialized as JSON, for example with json.dumps(groups, indent=2), the result looks like this:

[
  {
    "group": "fans",
    "user": [
      {
        "first": "John",
        "last": "Smith"
      },
      {
        "first": "Alice",
        "last": "White"
      }
    ]
  },
  {
    "group": "students",
    "user": [
      {
        "first": "Ben",
        "last": "Smith"
      },
      {
        "first": "Joan",
        "last": "Carpenter"
      }
    ]
  }
]
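The question asks for a JSON file, and the steps above stop at a Python list. One way to finish is the standard json module. This is a sketch rather than part of the original answer: build_groups is an illustrative name, io.StringIO stands in for the CSV file, and output.json is a hypothetical filename.

```python
import io
import json
from csv import DictReader
from itertools import groupby

def build_groups(rows):
    """Nest rows under their 'group' value (build_groups is an illustrative name)."""
    def keyfunc(x):
        return x['group']

    groups = []
    for k, g in groupby(sorted(rows, key=keyfunc), keyfunc):
        users = [{f: v for f, v in d.items() if f != 'group'} for d in g]
        groups.append({"group": k, "user": users})
    return groups

# io.StringIO stands in for open('data.csv') so the sketch runs on its own.
csvfile = io.StringIO(
    "group, first, last\n"
    "fans, John, Smith\n"
    "fans, Alice, White\n"
    "students, Ben, Smith\n"
    "students, Joan, Carpenter\n"
)
rows = [dict(d) for d in DictReader(csvfile, skipinitialspace=True)]
groups = build_groups(rows)

# json.dumps returns a string; indent=2 pretty-prints it.
text = json.dumps(groups, indent=2)
print(text)

# To write a file instead (output.json is a hypothetical name):
# with open('output.json', 'w') as f:
#     json.dump(groups, f, indent=2)
```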

edited Jul 28, 2018 at 15:31

answered Jul 28, 2018 at 2:55

jschnurr


3 Comments

FU USF Over a year ago

Thanks. And I got NameError: name 'groupby' is not defined. Do you have any idea what is wrong with it?

jschnurr Over a year ago

I missed an import - from itertools import groupby. Fixed now.

FU USF Over a year ago

If I have another column called group ID, how can I group users by both group and group ID?
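
The thread does not answer this follow-up. One possible approach, offered as a sketch under assumptions, is a key function that returns a tuple, so that groupby treats each (group, group ID) pair as a single key. The column is assumed to be named group_id here, and the ID values are invented for the example.

```python
from itertools import groupby

# Invented rows; 'group_id' is an assumed column name, not from the question.
data = [
    {'group': 'fans', 'group_id': '1', 'first': 'John', 'last': 'Smith'},
    {'group': 'students', 'group_id': '2', 'first': 'Ben', 'last': 'Smith'},
    {'group': 'fans', 'group_id': '1', 'first': 'Alice', 'last': 'White'},
]

def keyfunc(x):
    # A tuple key makes rows match only when both columns agree.
    return (x['group'], x['group_id'])

key_fields = ('group', 'group_id')
groups = []
for (group, group_id), g in groupby(sorted(data, key=keyfunc), keyfunc):
    groups.append({
        "group": group,
        "group_id": group_id,
        # copy each row without the two key columns
        "user": [{f: v for f, v in d.items() if f not in key_fields} for d in g],
    })

print(groups[0]["user"])
```

The same key function is passed to both sorted and groupby, which is what keeps rows with the same pair adjacent.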
