I am trying to convert a JSON file into a CSV file using the pandas package in Python.

Code being used:

import pandas
json_file = ("/home/joe/Documents/code/facebook/json/message_1.json")
output = pandas.read_json(json_file)
f = open("/home/joe/Documents/code/facebook/csv/test_output.csv", "w+")
f.write(output.to_csv())

Sample JSON:

{
  "messages": [
    { 
      "sender_name": "Joe P",
      "timestamp_ms": 1576878720049,
      "content": "message 3",
      "type": "Generic"
    },
    { 
      "sender_name": "Joe P",
      "timestamp_ms": 1576878681386,
      "content": "message 2",
      "type": "Generic"
    },
    {
      "sender_name": "Aimee C",
      "timestamp_ms": 1576878665008,
      "content": "message 1",
      "type": "Generic"
    }
  ]
}

I would like the output CSV data to be formatted like this:

sender_name |timestamp_ms  |content   |type
Joe P       |1576878720049 |Message 3 |generic
Joe P       |1576878681386 |Message 2 |generic
Aimee C     |1576878665008 |Message 1 |generic

However, the output data looks like this (only 2 columns instead of 4):

    |messages
0   |{'sender_name': 'Joe P', 'timestamp_ms': 1576878720049, 'content': 'message 3', 'type': 'Generic'}
1   |{'sender_name': 'Joe P', 'timestamp_ms': 1576878681386, 'content': 'message 2', 'type': 'Generic'}
2   |{'sender_name': 'Aimee C', 'timestamp_ms': 1576878665008, 'content': 'message 1', 'type': 'Generic'}

I've read through lots of threads related to parsing JSON data with pandas, but I can't quite pin down the solution to this.

1 Answer

If you have a {"key": [v0, v1, ...], ...} structure, pandas assumes that key is the name of a column and that v0, v1, ... are the values of that column, which is exactly the output you're getting. So you don't want to pass it a dict of lists.

Instead, you want a list where each value corresponds to an entire row. This is exactly the structure of the value stored under the "messages" key. So if you simply index your JSON with the "messages" key, you'll get an array of rows (dictionaries mapping column names to values), and you can pass that to pandas to create a DataFrame.

In [87]: import pandas as pd

In [88]: import json

In [89]: sample_json = """
    ...: {
    ...:   "messages": [
    ...:     {
    ...:       "sender_name": "Joe P",
    ...:       "timestamp_ms": 1576878720049,
    ...:       "content": "message 3",
    ...:       "type": "Generic"
    ...:     },
    ...:     {
    ...:       "sender_name": "Joe P",
    ...:       "timestamp_ms": 1576878681386,
    ...:       "content": "message 2",
    ...:       "type": "Generic"
    ...:     },
    ...:     {
    ...:       "sender_name": "Aimee C",
    ...:       "timestamp_ms": 1576878665008,
    ...:       "content": "message 1",
    ...:       "type": "Generic"
    ...:     }
    ...:   ]
    ...: }
    ...: """

In [90]: json_data = json.loads(sample_json)

In [91]: df = pd.DataFrame(json_data["messages"])

In [92]: df
Out[92]:
     content sender_name   timestamp_ms     type
0  message 3       Joe P  1576878720049  Generic
1  message 2       Joe P  1576878681386  Generic
2  message 1     Aimee C  1576878665008  Generic
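
To apply the same idea to a file on disk rather than an inline string, a minimal sketch might look like the following. The helper name messages_json_to_csv and the temporary paths are made up for illustration. Passing index=False to to_csv leaves out the 0, 1, 2 row labels, so only the four fields end up in the CSV. Column order may differ between pandas versions, as the alphabetical order in the output above suggests.

```python
import json
import os
import tempfile

import pandas as pd


def messages_json_to_csv(json_path, csv_path):
    """Hypothetical helper: write the rows under "messages" to a CSV file."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    # Each dict in data["messages"] becomes one row of the DataFrame.
    df = pd.DataFrame(data["messages"])
    # index=False leaves out the 0, 1, 2, ... row labels.
    df.to_csv(csv_path, index=False)
    return df


# Small demo with throwaway files (these paths are placeholders).
sample = {
    "messages": [
        {"sender_name": "Joe P", "timestamp_ms": 1576878720049,
         "content": "message 3", "type": "Generic"},
        {"sender_name": "Aimee C", "timestamp_ms": 1576878665008,
         "content": "message 1", "type": "Generic"},
    ]
}
tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, "message_1.json")
csv_path = os.path.join(tmp_dir, "test_output.csv")
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

df = messages_json_to_csv(json_path, csv_path)
```

Using a with block for the input file also avoids leaving a file handle open, which the open(...)/f.write(...) pattern in the question does.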

If your end goal is just to convert the JSON to a CSV, you don't need pandas at all. You can use a csv.DictWriter and write the inner dicts directly. For example:

In [95]: import sys

In [96]: import csv

In [97]: writer = csv.DictWriter(sys.stdout, json_data["messages"][0].keys())

In [98]: writer.writeheader()
sender_name,timestamp_ms,content,type

In [99]: writer.writerows(json_data["messages"])
Joe P,1576878720049,message 3,Generic
Joe P,1576878681386,message 2,Generic
Aimee C,1576878665008,message 1,Generic
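
One caveat with taking the field names from the first message: csv.DictWriter raises a ValueError by default if a later dict contains a key that is not in the field names. If the messages in your file do not all share the same keys (an assumption, since the sample only shows uniform ones), a sketch like the one below collects every key first and fills missing values with an empty string. The function name write_messages_csv is hypothetical.

```python
import csv
import io


def write_messages_csv(messages, out_file):
    """Hypothetical helper: write a list of dicts as CSV rows to out_file."""
    # Collect every key that appears, preserving first-seen order.
    fieldnames = []
    for message in messages:
        for key in message:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval is written for any key a particular row is missing.
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(messages)
    return fieldnames


messages = [
    {"sender_name": "Joe P", "timestamp_ms": 1576878720049,
     "content": "message 3", "type": "Generic"},
    # This row deliberately has no "content" key.
    {"sender_name": "Aimee C", "timestamp_ms": 1576878665008,
     "type": "Generic"},
]

buffer = io.StringIO()  # stands in for a real output file
fieldnames = write_messages_csv(messages, buffer)
csv_text = buffer.getvalue()
```

When writing to a real file instead of a StringIO, open it with newline="" as the csv module documentation recommends, for example open(path, "w", newline="").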