I have a CSV file like this:

group, first, last
fans, John, Smith
fans, Alice, White
students, Ben, Smith
students, Joan, Carpenter
...

The output JSON file needs this format:

[
{
  "group" : "fans",
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
},
{
  "group" : "students",
  "user" : [
    {
      "first" : "Ben",
      "last" :  "Smith"
    },
    {
      "first" : "Joan",
      "last" :  "Carpenter"
    }
  ]
}
]
5 Comments
  • Sorry, the CSV file has 3 columns: group, first, last. Commented Jul 27, 2018 at 18:34
  • So, what is your problem? Commented Jul 27, 2018 at 19:40
  • How do I convert this CSV to a JSON file with a nested array? Commented Jul 27, 2018 at 19:49
  • I mean, what is wrong with your code? What language do you use? Commented Jul 27, 2018 at 19:51
  • Sorry, I am using Python. My problem is how to nest the array. Do I need to create a JSON file with first and last and ...? The strategy is kind of confusing. I can create a JSON file with three fields (group, first and last), but how do I group first and last under group? Commented Jul 27, 2018 at 21:16

1 Answer

Short answer
Use itertools.groupby, as described in the documentation.

Long answer
This is a multi-step process.

Start by getting your CSV into a list of dict:

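from csv import DictReader

# skipinitialspace=True drops the blank after each comma, so the keys are
# 'group', 'first', 'last' rather than 'group', ' first', ' last'
with open('data.csv', newline='') as csvfile:
    r = DictReader(csvfile, skipinitialspace=True)
    data = [dict(d) for d in r]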
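To see what DictReader actually produces, the question's sample rows can be parsed from an in-memory string. io.StringIO is used here only as a stand-in for data.csv, and the comparison shows why skipinitialspace=True matters for a file written with a space after each comma:

```python
import io
from csv import DictReader

# Stand-in for data.csv, using the sample rows from the question
sample = """group, first, last
fans, John, Smith
fans, Alice, White
students, Ben, Smith
students, Joan, Carpenter
"""

# Without skipinitialspace the blank after each comma stays in keys and values
raw = [dict(d) for d in DictReader(io.StringIO(sample))]
print(raw[0])   # {'group': 'fans', ' first': ' John', ' last': ' Smith'}

# With skipinitialspace=True the leading blanks are dropped
data = [dict(d) for d in DictReader(io.StringIO(sample), skipinitialspace=True)]
print(data[0])  # {'group': 'fans', 'first': 'John', 'last': 'Smith'}
```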

groupby only groups consecutive rows that share a key, so the data must first be sorted by that same key. Define a function that returns the key and pass it to sorted():

# The grouping key for each row
def keyfunc(x):
    return x['group']

# Sort by the same key that groupby will use
data = sorted(data, key=keyfunc)
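A quick way to see why the sort matters: with rows where the two fans entries are not adjacent (a reordered subset of the question's sample), groupby reports fans twice until the data is sorted:

```python
from itertools import groupby

def keyfunc(x):
    return x['group']

# Rows where the 'fans' entries are not next to each other
data = [
    {'group': 'fans', 'first': 'John', 'last': 'Smith'},
    {'group': 'students', 'first': 'Ben', 'last': 'Smith'},
    {'group': 'fans', 'first': 'Alice', 'last': 'White'},
]

# Unsorted: groupby starts a new group whenever the key changes
unsorted_keys = [k for k, _ in groupby(data, keyfunc)]
print(unsorted_keys)  # ['fans', 'students', 'fans']

# Sorted by the same key: each group appears exactly once
sorted_keys = [k for k, _ in groupby(sorted(data, key=keyfunc), keyfunc)]
print(sorted_keys)    # ['fans', 'students']
```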

Finally, call groupby with the sorted data and the same key function:

from itertools import groupby

groups = []
for k, g in groupby(data, keyfunc):
    groups.append({
        "group": k,
        # copy every column except 'group' into each user entry
        "user": [{col: val for col, val in d.items() if col != 'group'} for d in g]
    })

groupby walks through the sorted rows and yields a new pair each time the key changes: k is the key for that run of rows, and g is an iterator over the dicts that belong to it. The loop body turns each pair into one output dict and appends it to groups.
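Because g shares its underlying source with groupby, it has to be consumed before the loop moves on to the next key. A toy example with single-letter strings (not the question's data) shows the difference; the empty lists in the second case are what current CPython versions return:

```python
from itertools import groupby

rows = ['a', 'a', 'b', 'b', 'b']  # toy data, already sorted

# Materialize each group inside the loop, as the answer does
good = [(k, list(g)) for k, g in groupby(rows)]
print(good)   # [('a', ['a', 'a']), ('b', ['b', 'b', 'b'])]

# Keep the iterators and read them only after groupby has moved on
stale = [(k, g) for k, g in groupby(rows)]
stale_lists = [(k, list(g)) for k, g in stale]
print(stale_lists)  # [('a', []), ('b', [])]
```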

In this example, the "user" value is built with a nested comprehension that drops the group key from every row. If you can live with that bit of redundant data, the whole line can be simplified to:

"user": list(g)
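For comparison, here is the simplified form run on a few already-sorted rows from the question; each user entry keeps its own "group" field, which is the extra data mentioned above:

```python
from itertools import groupby

def keyfunc(x):
    return x['group']

# Already sorted by 'group' (a subset of the question's sample rows)
data = [
    {'group': 'fans', 'first': 'John', 'last': 'Smith'},
    {'group': 'fans', 'first': 'Alice', 'last': 'White'},
    {'group': 'students', 'first': 'Ben', 'last': 'Smith'},
]

# Simplified form: every user dict keeps its 'group' entry
groups = [{"group": k, "user": list(g)} for k, g in groupby(data, keyfunc)]
print(groups[0]["user"][0])  # {'group': 'fans', 'first': 'John', 'last': 'Smith'}
```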

Serialized with json.dumps(groups, indent=2), the result looks like this:

[
  {
    "group": "fans",
    "user": [
      {
        "first": "John",
        "last": "Smith"
      },
      {
        "first": "Alice",
        "last": "White"
      }
    ]
  },
  {
    "group": "students",
    "user": [
      {
        "first": "Ben",
        "last": "Smith"
      },
      {
        "first": "Joan",
        "last": "Carpenter"
      }
    ]
  }
]
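The steps above stop at the in-memory groups list, while the question asks for a JSON file. A minimal end-to-end sketch, assuming the CSV is laid out as in the question, might look like this. The function name csv_to_grouped_json and the output path are hypothetical, and operator.itemgetter('group') stands in as an equivalent of keyfunc:

```python
import json
from csv import DictReader
from itertools import groupby
from operator import itemgetter

def csv_to_grouped_json(csv_path, json_path):
    """Hypothetical helper: read csv_path, group rows by 'group', write nested JSON."""
    with open(csv_path, newline='') as csvfile:
        data = [dict(d) for d in DictReader(csvfile, skipinitialspace=True)]

    keyfunc = itemgetter('group')
    data.sort(key=keyfunc)  # groupby only merges adjacent rows

    groups = []
    for k, g in groupby(data, keyfunc):
        groups.append({
            "group": k,
            # every column except 'group' goes into the user entry
            "user": [{col: val for col, val in d.items() if col != 'group'} for d in g],
        })

    with open(json_path, 'w') as jsonfile:
        json.dump(groups, jsonfile, indent=2)
    return groups
```

Calling csv_to_grouped_json('data.csv', 'output.json') would write the structure shown above. Because list.sort is stable, users keep their original CSV order within each group.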

3 Comments

Thanks. And I got NameError: name 'groupby' is not defined. Do you have any idea what is wrong with it?
I missed an import - from itertools import groupby. Fixed now.
If I have another column called group ID, how can I group users by both group and group ID?
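For the follow-up about a second column, one option is to sort and group on a tuple key. This sketch assumes the column header is literally group ID and that the output should carry it next to group; the group ID values below are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows with an extra 'group ID' column (IDs invented for illustration)
data = [
    {'group': 'fans', 'group ID': '2', 'first': 'Alice', 'last': 'White'},
    {'group': 'students', 'group ID': '1', 'first': 'Ben', 'last': 'Smith'},
    {'group': 'fans', 'group ID': '1', 'first': 'John', 'last': 'Smith'},
    {'group': 'students', 'group ID': '1', 'first': 'Joan', 'last': 'Carpenter'},
]

# itemgetter with two fields returns a tuple key, e.g. ('fans', '1')
keyfunc = itemgetter('group', 'group ID')
data.sort(key=keyfunc)

groups = []
for (grp, gid), g in groupby(data, keyfunc):
    groups.append({
        "group": grp,
        "group ID": gid,
        # drop both key columns from each user entry
        "user": [{col: val for col, val in d.items() if col not in ('group', 'group ID')}
                 for d in g],
    })
```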
