Parsing JSON data into CSV using pandas

Question

I am trying to convert a JSON file into a CSV file using the pandas package in Python.

Code being used:

import pandas
json_file = ("/home/joe/Documents/code/facebook/json/message_1.json")
output = pandas.read_json(json_file)
f = open("/home/joe/Documents/code/facebook/csv/test_output.csv", "w+")
f.write(output.to_csv())

Sample json:

{
  "messages": [
    { 
      "sender_name": "Joe P",
      "timestamp_ms": 1576878720049,
      "content": "message 3",
      "type": "Generic"
    },
    { 
      "sender_name": "Joe P",
      "timestamp_ms": 1576878681386,
      "content": "message 2",
      "type": "Generic"
    },
    {
      "sender_name": "Aimee C",
      "timestamp_ms": 1576878665008,
      "content": "message 1",
      "type": "Generic"
    }
  ]
}

I would like the output CSV data to be formatted like this:

sender_name |timestamp_ms  |content   |type
Joe P       |1576878720049 |Message 3 |generic
Joe P       |1576878681386 |Message 2 |generic
Aimee C     |1576878665008 |Message 1 |generic

However, the output data looks like this (only 2 columns instead of 4):

    |messages
0   |{'sender_name': 'Joe P', 'timestamp_ms': 1576878720049, 'content': 'message 3', 'type': 'Generic'}
1   |{'sender_name': 'Joe P', 'timestamp_ms': 1576878681386, 'content': 'message 2', 'type': 'Generic'}
2   |{'sender_name': 'Aimee C', 'timestamp_ms': 1576878665008, 'content': 'message 1', 'type': 'Generic'}

I've read through lots of threads related to parsing JSON data with pandas, but I can't quite pin down the solution to this.

alkasm · Accepted Answer · 2020-01-02 22:33:56Z

If you have a {"key": [v0, v1, ...], ...} structure, pandas assumes that key is the name of a column and that v0, v1, ... are the values of that column, which is exactly the output you're getting. So you don't want to pass it a dict of lists.

Instead, you want a list in which each element corresponds to an entire row. This is exactly the structure of the value stored under the "messages" key. So if you simply index your JSON with the "messages" key, you'll get an array of rows (dictionaries mapping column names to values), and you can pass that to pandas to create a dataframe.
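To make the difference between the two shapes concrete, here is a minimal sketch. The two-row `rows` list is made-up illustration data with fewer fields than the question's messages, not part of the original file.

```python
import pandas as pd

# Hypothetical, shortened rows used only to show the two shapes.
rows = [
    {"sender_name": "Joe P", "content": "message 3"},
    {"sender_name": "Aimee C", "content": "message 1"},
]

# Dict of lists: each key becomes a column, so the whole dicts
# end up as cell values in a single "messages" column.
as_columns = pd.DataFrame({"messages": rows})
print(as_columns.shape)

# List of dicts: each dict becomes a row, and its keys become the columns.
as_rows = pd.DataFrame(rows)
print(as_rows.shape)
```

The first frame has one column holding dicts, which mirrors the output in the question; the second has one column per key.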

In [87]: import pandas as pd

In [88]: import json

In [89]: sample_json = """
    ...: {
    ...:   "messages": [
    ...:     {
    ...:       "sender_name": "Joe P",
    ...:       "timestamp_ms": 1576878720049,
    ...:       "content": "message 3",
    ...:       "type": "Generic"
    ...:     },
    ...:     {
    ...:       "sender_name": "Joe P",
    ...:       "timestamp_ms": 1576878681386,
    ...:       "content": "message 2",
    ...:       "type": "Generic"
    ...:     },
    ...:     {
    ...:       "sender_name": "Aimee C",
    ...:       "timestamp_ms": 1576878665008,
    ...:       "content": "message 1",
    ...:       "type": "Generic"
    ...:     }
    ...:   ]
    ...: }
    ...: """

In [90]: json_data = json.loads(sample_json)

In [91]: df = pd.DataFrame(json_data["messages"])

In [92]: df
Out[92]:
     content sender_name   timestamp_ms     type
0  message 3       Joe P  1576878720049  Generic
1  message 2       Joe P  1576878681386  Generic
2  message 1     Aimee C  1576878665008  Generic
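Applied to files rather than an inline string, the same idea might look like the sketch below. The helper name `messages_json_to_csv` is hypothetical, and a temporary directory stands in for the paths from the question so the example is self-contained. It assumes every file has a top-level "messages" list.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def messages_json_to_csv(json_path, csv_path):
    """Write the rows under the "messages" key of json_path to csv_path."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    # One dict per message becomes one row; the dict keys become the columns.
    df = pd.DataFrame(data["messages"])
    # index=False leaves the 0, 1, 2 row labels out of the CSV.
    df.to_csv(csv_path, index=False)
    return df


# Demo: a temporary directory stands in for the real input and output paths.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "message_1.json"
    dst = Path(tmp) / "test_output.csv"
    sample = {"messages": [
        {"sender_name": "Joe P", "timestamp_ms": 1576878720049,
         "content": "message 3", "type": "Generic"},
        {"sender_name": "Aimee C", "timestamp_ms": 1576878665008,
         "content": "message 1", "type": "Generic"},
    ]}
    src.write_text(json.dumps(sample), encoding="utf-8")
    df = messages_json_to_csv(src, dst)
    csv_text = dst.read_text(encoding="utf-8")
    print(csv_text)
```

Passing a path to `to_csv` lets pandas open and close the file itself, so there is no separate `open`/`write` step to forget to close.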

If your end goal is just to convert the JSON to a CSV, you don't need pandas at all. You can use a csv.DictWriter and write the inner dicts directly. For example:

In [95]: import sys

In [96]: import csv

In [97]: writer = csv.DictWriter(sys.stdout, json_data["messages"][0].keys())

In [98]: writer.writeheader()
sender_name,timestamp_ms,content,type

In [99]: writer.writerows(json_data["messages"])
Joe P,1576878720049,message 3,Generic
Joe P,1576878681386,message 2,Generic
Aimee C,1576878665008,message 1,Generic
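Outside an interactive session, the DictWriter approach could be wrapped up as in the sketch below. The function name `write_messages_csv` is hypothetical, and an `io.StringIO` buffer stands in for a real output file. One assumption goes beyond the sample data: in case some messages carry keys that others lack, the field names are collected from all rows rather than only the first.

```python
import csv
import io
import json

sample_json = """
{
  "messages": [
    {"sender_name": "Joe P", "timestamp_ms": 1576878720049,
     "content": "message 3", "type": "Generic"},
    {"sender_name": "Aimee C", "timestamp_ms": 1576878665008,
     "content": "message 1", "type": "Generic"}
  ]
}
"""


def write_messages_csv(messages, fileobj):
    """Write a list of message dicts to fileobj as CSV with a header row."""
    # Collect keys from every row in first-seen order, so a key that is
    # missing from the first message still gets a column.
    fieldnames = list(dict.fromkeys(key for msg in messages for key in msg))
    # Keys absent from a given row are written as empty strings (restval).
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(messages)


messages = json.loads(sample_json)["messages"]
buffer = io.StringIO()
write_messages_csv(messages, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file instead of a buffer, open it with `newline=""` (for example `open(path, "w", newline="", encoding="utf-8")`), as the csv module documentation recommends, so that line endings are not translated twice.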
