1

I am trying to load a JSON file into a DataFrame, I know this question was answered multiple times but I've beein trying all the possible solutions but with no success, here is how my JSON file looks like:

{
"event": {
    "origin": "devicename",
    "module": "",
    "interface": "",
    "component": "",
    "payload": "{\"typeofsensor\" : \"US_distance\",\"distance\": 2}"
}}
{
"event": {
    "origin": "devicename",
    "module": "",
    "interface": "",
    "component": "",
    "payload": "{\"typeofsensor\" : \"mpu6050\",\"accelX\": 0.06, \"accelY\": 0.50, \"accelZ\": -0.88, \"temp\": 25.45}"
}}

What I am trying to do is to extract the information in the "payload" in order to create a DataFrame with columns are typeofsensor and the sensor values since I have diffrent types of sensors.


I tried:

data = []
for line in open('data.JSON', 'r'):
    data.append(json.loads(line))

I got this error:

JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)

I also tried:

df = pd.read_json('data.JSON', lines=True)

I got this error:

ValueError                                Traceback (most recent call last)
C:\Users\DEVELO~1\AppData\Local\Temp/ipykernel_204/911564313.py in <module>
----> 1 df = pd.read_json('data.JSON', lines=True)

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    205                 else:
    206                     kwargs[new_arg_name] = new_arg_value
--> 207             return func(*args, **kwargs)
    208 
    209         return cast(F, wrapper)

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    309                     stacklevel=stacklevel,
    310                 )
--> 311             return func(*args, **kwargs)
    312 
    313         return wrapper

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\json\_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, encoding_errors, lines, chunksize, compression, nrows, storage_options)
    612 
    613     with json_reader:
--> 614         return json_reader.read()
    615 
    616 

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\json\_json.py in read(self)
    744                 data = ensure_str(self.data)
    745                 data_lines = data.split("\n")
--> 746                 obj = self._get_object_parser(self._combine_lines(data_lines))
    747         else:
    748             obj = self._get_object_parser(self.data)

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\json\_json.py in _get_object_parser(self, json)
    768         obj = None
    769         if typ == "frame":
--> 770             obj = FrameParser(json, **kwargs).parse()
    771 
    772         if typ == "series" or obj is None:

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\json\_json.py in parse(self)
    883 
    884         else:
--> 885             self._parse_no_numpy()
    886 
    887         if self.obj is None:

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\pandas\io\json\_json.py in _parse_no_numpy(self)
   1138         if orient == "columns":
   1139             self.obj = DataFrame(
-> 1140                 loads(json, precise_float=self.precise_float), dtype=None
   1141             )
   1142         elif orient == "split":

ValueError: Expected object or value
10
  • Please add the desired output and at least one of your many code attempts to the question and describe how your code attempt(s) failed to generate the desired output. If your code attempt raises an exception, please add the full traceback to the question. Commented Oct 21, 2021 at 9:47
  • 2
    This is not a valid json file Commented Oct 21, 2021 at 9:47
  • The problem with your input file is duplicate keys and styling errors, solve them and you've been all set to go :) Commented Oct 21, 2021 at 10:10
  • Spark will be able to read this JSON file as it treats each line as a separate JSON object. e.g {...}\n{...} Commented Oct 21, 2021 at 10:12
  • @TomRon Could you please explain more? Commented Oct 21, 2021 at 10:49

1 Answer 1

1

Change your file into a valid format like below (eliminate {} from the middle and change key names to become unique):

{
    "event1": {
        "origin": "devicename",
        "module": "",
        "interface": "",
        "component": "",
        "payload": "{\"typeofsensor\" : \"US_distance\",\"distance\": 2}"
    },

    "event2": {
        "origin": "devicename",
        "module": "",
        "interface": "",
        "component": "",
        "payload": "{\"typeofsensor\" : \"mpu6050\",\"accelX\": 0.06, \"accelY\": 0.50, \"accelZ\": -0.88, \"temp\": 25.45}"
    }
}

easily read the file:

pd.read_json('path to/your_file.json')
#output
                                                   event1                                             event2
origin                                         devicename                                         devicename
module                                                                                                      
interface                                                                                                   
component                                                                                                   
payload    {"typeofsensor" : "US_distance","distance": 2}  {"typeofsensor" : "mpu6050","accelX": 0.06, "a...
Sign up to request clarification or add additional context in comments.

2 Comments

This solution helps only if the JSON file is valid but the file that I have is really big and it's not possible to just modify it manually, is there any automated solution to fix the JSON file? @meti
The only solution that comes into my mind is to read the file and parse it into a valid format using python :), but I advise waiting to see anybody comes up with an automated solution.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.