
I am trying to read a JSON file using pandas. The JSON file is in this format:

{
    "category": "CRIME", 
    "headline": "There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV", 
    "authors": "Melissa Jeltsen", 
    "link": "https://www.huffingtonpost.com/entry/texas-amanda-painter-mass-shooting_us_5b081ab4e4b0802d69caad89",
    "short_description": "She left her husband. He killed their children. Just another day in America.",
    "date": "2018-05-26"
}
{
    "category": "ENTERTAINMENT", 
    "headline": "Will Smith Joins Diplo And Nicky Jam For The 2018 World Cup's Official Song", 
    "authors": "Andy McDonald", 
    "link": "https://www.huffingtonpost.com/entry/will-smith-joins-diplo-and-nicky-jam-for-the-official-2018-world-cup-song_us_5b09726fe4b0fdb2aa541201", 
    "short_description": "Of course, it has a song.", 
    "date": "2018-05-26"
}

However, I get the following error, and I don't understand why:

ValueError                                Traceback (most recent call last)
/var/folders/j6/rj901v4j40368zfdw64pbf700000gn/T/ipykernel_11792/4234726591.py in <module>
----> 1 df = pd.read_json('db.json', lines=True)
      2 df.head()

~/opt/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    205                 else:
    206                     kwargs[new_arg_name] = new_arg_value
--> 207             return func(*args, **kwargs)
    208 
    209         return cast(F, wrapper)

~/opt/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, encoding_errors, lines, chunksize, compression, nrows, storage_options)
    610 
    611     with json_reader:
--> 612         return json_reader.read()
    613 
    614 

~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/json/_json.py in read(self)
    742                 data = ensure_str(self.data)
    743                 data_lines = data.split("\n")
--> 744                 obj = self._get_object_parser(self._combine_lines(data_lines))
    745         else:
    746             obj = self._get_object_parser(self.data)

~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
    766         obj = None
    767         if typ == "frame":
--> 768             obj = FrameParser(json, **kwargs).parse()
    769 
    770         if typ == "series" or obj is None:

~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/json/_json.py in parse(self)
    878             self._parse_numpy()
    879         else:
--> 880             self._parse_no_numpy()
    881 
    882         if self.obj is None:

~/opt/anaconda3/lib/python3.9/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
   1131         if orient == "columns":
   1132             self.obj = DataFrame(
-> 1133                 loads(json, precise_float=self.precise_float), dtype=None
   1134             )
   1135         elif orient == "split":

ValueError: Expected object or value

My code is written as follows:

import pandas as pd

df = pd.read_json('db.json', lines=True)
df.head()

I tried changing the structure of the JSON file as suggested here, but it doesn't work: I get the same error as above. Is there any other way I can solve this issue?

Answer

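Your file contains several JSON objects one after another, which is not a single valid JSON document. Wrap them in square brackets [] and add a comma between the objects to make it a valid JSON array: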

[{
    "category": "CRIME",
    "headline": "There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV",
    "authors": "Melissa Jeltsen",
    "link": "https://www.huffingtonpost.com/entry/texas-amanda-painter-mass-shooting_us_5b081ab4e4b0802d69caad89",
    "short_description": "She left her husband. He killed their children. Just another day in America.",
    "date": "2018-05-26"
},
{
    "category": "ENTERTAINMENT",
    "headline": "Will Smith Joins Diplo And Nicky Jam For The 2018 World Cup's Official Song",
    "authors": "Andy McDonald",
    "link": "https://www.huffingtonpost.com/entry/will-smith-joins-diplo-and-nicky-jam-for-the-official-2018-world-cup-song_us_5b09726fe4b0fdb2aa541201",
    "short_description": "Of course, it has a song.",
    "date": "2018-05-26"
}]
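To see why the brackets matter, here is a small sketch using only the standard json module. The two records are abridged to two fields each, purely for illustration. The original content is two objects back to back, which json.loads rejects with an "Extra data" error, whereas the bracketed version parses as one array holding a dict per record:

```python
import json

# Abridged copies of the two records from the question (illustrative only).
original = """{
    "category": "CRIME",
    "date": "2018-05-26"
}
{
    "category": "ENTERTAINMENT",
    "date": "2018-05-26"
}"""

# The same records wrapped in [] with a comma between the objects.
bracketed = """[{
    "category": "CRIME",
    "date": "2018-05-26"
},
{
    "category": "ENTERTAINMENT",
    "date": "2018-05-26"
}]"""


def is_single_json_document(text):
    """Return True if the whole string parses as exactly one JSON value."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_single_json_document(original))   # False: two objects back to back
print(is_single_json_document(bracketed))  # True: one array
print(len(json.loads(bracketed)))          # 2
```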

Then read the file without lines=True, because it is now a single JSON array rather than one object per line:

import pandas as pd


df = pd.read_json("/path/to/file/db.json")
print(df)
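The call above leaves out lines=True. That option is meant for JSON Lines input, where every physical line is one complete JSON object. A pretty-printed array spans several lines per record, so keeping lines=True after adding the brackets would most likely still raise the same ValueError. If you would rather keep lines=True, the records need to be written one per line instead. A minimal stdlib sketch (the abridged records and the to_json_lines helper name are illustrative assumptions):

```python
import json

# Abridged records (illustrative only); in practice these come from your data.
records = [
    {"category": "CRIME", "date": "2018-05-26"},
    {"category": "ENTERTAINMENT", "date": "2018-05-26"},
]


def to_json_lines(rows):
    """Serialize each record as one compact JSON object per line (JSON Lines)."""
    return "".join(json.dumps(row) + "\n" for row in rows)


text = to_json_lines(records)
print(text, end="")
# {"category": "CRIME", "date": "2018-05-26"}
# {"category": "ENTERTAINMENT", "date": "2018-05-26"}

# Each line now parses on its own, which is what lines=True relies on.
round_trip = [json.loads(line) for line in text.splitlines()]
print(round_trip == records)  # True
```

Saved to a file (db.jsonl is a hypothetical name), this layout can be read with pd.read_json("db.jsonl", lines=True). If the data is already in a DataFrame, df.to_json("db.jsonl", orient="records", lines=True) also writes one object per line.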

Output:

        category                                                                     headline          authors                                                                                                 link                                                             short_description       date
0          CRIME             There Were 2 Mass Shootings In Texas Last Week, But Only 1 On TV  Melissa Jeltsen  https://www.huffingtonpost.com/entry/texas-amanda-painter-mass-shooting_us_5b081ab4e4b0802d69caad89  She left her husband. He killed their children. Just another day in America. 2018-05-26
1  ENTERTAINMENT  Will Smith Joins Diplo And Nicky Jam For The 2018 World Cup's Official Song    Andy McDonald  https://www.huffingtonpost.com/entry/will-smith-joins-diplo-and-nicky-jam-for-the-official-2018-...                                                     Of course, it has a song. 2018-05-26
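If editing db.json by hand is not practical, for example because the file is large and already in the original back-to-back format, one alternative is to parse the objects yourself with the standard library's json.JSONDecoder.raw_decode, which decodes a single JSON value and reports the index where it stopped, and then build the DataFrame from the resulting list. This is a sketch; load_concatenated_json is a hypothetical helper name, and the sample string is abridged:

```python
import json


def load_concatenated_json(text):
    """Parse a string containing several JSON values written back to back
    (separated only by whitespace) and return them as a list."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return values
        # Decode one value starting at pos. CPython's raw_decode accepts a
        # start index as its second argument (not shown in the documented
        # signature) and returns (value, index just past the value).
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)


sample = """{
    "category": "CRIME",
    "date": "2018-05-26"
}
{
    "category": "ENTERTAINMENT",
    "date": "2018-05-26"
}"""
print([r["category"] for r in load_concatenated_json(sample)])
# ['CRIME', 'ENTERTAINMENT']
```

With the real file, something like records = load_concatenated_json(open("db.json", encoding="utf-8").read()) followed by df = pd.DataFrame(records) should give the same frame as reading the bracketed file.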

Comments

I have tried this, but it still gives me the same error.
You were also missing the pd. prefix on read_json.
