
I need some help parsing a JSON file. I've tried a couple of different ways to get the data I need. Below is a sample of the code and a section of the JSON data, but when I run the code I get the error listed above.

There are 500K lines of text in the JSON, and it first fails about 1400 lines in. I can't see anything in that section to indicate why.

I've run it successfully by only checking blocks of JSON up to the first 1400 lines and I've used a different parser and got the same error.

I'm debating whether it's an error in the code, an error in the JSON, or a result of the JSON being made up of different kinds of data: some entries (like the example below) are for a forklift and others are for fixed machines, but all of it is structured just like below.

All help sincerely appreciated.

Code:

import json

file_list = ['filename.txt'] #insert filename(s) here

for x in range(len(file_list)):

    with open(file_list[x], 'r') as f:
        distros_dict = json.load(f)

#list the headlines to be parsed
for distro in distros_dict:
    print(distro['name'], distro['positionTS'], distro['smoothedPosition'][0], distro['smoothedPosition'][1], distro['smoothedPosition'][2])

And here is a section of the JSON:

{
    "id": "b4994c877c9c",
    "name": "Trukki_0001",
    "areaId": "Tracking001",
    "areaName": "Ajoneuvo",
    "color": "#FF0000",
    "coordinateSystemId": "CoordSys001",
    "coordinateSystemName": null,
    "covarianceMatrix": [
        0.47,
        0.06,
        0.06,
        0.61
    ],
    "position": [
        33.86,
        33.07,
        2.15
    ],
    "positionAccuracy": 0.36,
    "positionTS": 1489363199493,
    "smoothedPosition": [
        33.96,
        33.13,
        2.15
    ],
    "zones": [
        {
            "id": "Zone001",
            "name": "Halli1"
        }
    ],
    "direction": [
        0,
        0,
        0
    ],
    "collisionId": null,
    "restrictedArea": "",
    "tagType": "VEHICLE_MANNED",
    "drivenVehicleId": null,
    "drivenByEmployeeIds": null,
    "simpleXY": "33|33",
    "EventProcessedUtcTime": "2017-03-13T00:00:00.3175072Z",
    "PartitionId": 1,
    "EventEnqueuedUtcTime": "2017-03-13T00:00:00.0470000Z"
}
  • *appreciated... Commented May 16, 2018 at 13:01
  • Which line in the code triggers the error, and what is the exact error message? If it is the json.load(f) line, what are the first lines of the file? Commented May 16, 2018 at 13:02
  • To rule out malformed JSON: can you put your entire JSON through a validator, e.g. jsonlint.com and see if it comes out as valid? If not, maybe the validator can point you in the right direction. Commented May 16, 2018 at 13:03
  • The JSON you've posted seems perfectly fine and I'd bet it will parse as such in Python without producing the error in the question's title. On the other hand, I'm pretty sure it will break once you get to the for .. loop as your distros_dict is actually the object itself instead of a list of parsed JSONs so it will iterate over its keys. Commented May 16, 2018 at 13:06
  • One more question: is your JSON one huge object, or is it an array with multiple objects? Commented May 16, 2018 at 13:10

3 Answers


The actual problem was that the JSON file was saved in a UTF encoding rather than ASCII. Re-saving it with a different encoding, using something like Notepad++, solves it.
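If re-saving the file by hand is not practical, the decoding can also be done in Python. The following is a minimal sketch, not the asker's code: it assumes the file is UTF-16 or UTF-8 (with or without a byte order mark), and the helper name `load_json_any_utf` is made up for illustration.

```python
import codecs
import json


def load_json_any_utf(path):
    """Parse a JSON file that may be UTF-16 (with BOM) or UTF-8 (BOM optional).

    Hypothetical helper; UTF-32 and other encodings are not handled here.
    """
    # Read raw bytes so we can inspect the byte order mark ourselves.
    with open(path, 'rb') as f:
        raw = f.read()

    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The 'utf-16' codec reads the BOM to pick the byte order and drops it.
        text = raw.decode('utf-16')
    else:
        # 'utf-8-sig' strips a UTF-8 BOM if one is present.
        text = raw.decode('utf-8-sig')

    return json.loads(text)
```

Calling `load_json_any_utf('filename.txt')` in place of `json.load(f)` would then return the parsed data without the file having to be converted first.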


1 Comment

Worked for me, I changed UTF-16 LE encoding to UTF-8. In VSCode, one can use this: stackoverflow.com/a/40365121/3799680

Using the file provided, I got it to work by changing "distros_dict" to a list. In your code you assign to distros_dict rather than append to it, so if more than one file were read, it would end up holding only the last one.

This is my implementation:

import json

file_list = ['filename.txt'] #insert filename(s) here
distros_list = []

for filename in file_list:
    with open(filename, 'r') as f:
        distros_list.append(json.load(f))

#list the headlines to be parsed
for distro in distros_list:
    print(distro['name'], distro['positionTS'], distro['smoothedPosition'][0], distro['smoothedPosition'][1], distro['smoothedPosition'][2])

You will be left with a list of dictionaries.

4 Comments

Many thanks, sincerely, but I get an error referring to line 8 of the code, distros_list.append(json.load(f)), then to line 299 in load, 354 in loads, 339 in decode and 357 in raw_decode, ending with: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0). Any ideas?
@exmatelote from what I can see, it sounds like an error in your JSON. I would recommend running it through a JSON formatter or error checker, as suggested in the other replies.
@examtelote if it is possible, could you post all your JSON so I can have a look?
It seems the error was indeed in the JSON. It was in UTF and when I encoded it to ANSI it worked just fine.

I'm guessing that your JSON is actually a list of objects, i.e. the whole stream looks like:

[
    { "x": 1, "y": 2 },
    { "x": 3, "y": 4 },
    ...
]

... with each element being structured like the section you provided above. This is perfectly valid JSON, and if I store it in a file named file.txt and paste your snippet between a set of [ ], thus making it a list, I can parse it in Python. Note, however, that the result will be again a Python list, not a dict, so you'd iterate like this over each list-item:

import json
import pprint

file_list = ['file.txt']

# Just iterate over the file-list like this, no need for range()
for x in file_list:

    with open(x, 'r') as f:
        # distros is a list!
        distros = json.load(f)

    for distro in distros:
        print(distro['name'])
        print(distro['positionTS'])
        print(distro['smoothedPosition'][1])

        pprint.pprint(distro)

Edit: I moved the second for-loop into the loop over the files. This seems to make more sense, as otherwise you'll iterate once over all files, store the last one in distros, then print elements only from the last one. By nesting the loops, you'll iterate over all files, and for each file iterate over all elements in the list. Hat-tip to the commenters for pointing this out!
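Since it is not yet clear whether the file holds one object or a list of them, a small wrapper can make the loop work in both cases. This is only a sketch under that assumption; `iter_records` is a name I made up, and the sample string is a shortened version of the snippet from the question.

```python
import json


def iter_records(parsed):
    """Yield record dicts whether the top-level JSON value is an object or a list."""
    if isinstance(parsed, dict):
        # A single object: treat it as a one-element sequence.
        yield parsed
    elif isinstance(parsed, list):
        yield from parsed
    else:
        raise TypeError('expected a JSON object or array, got ' + type(parsed).__name__)


# Shortened sample record wrapped in [ ] to make it a list.
sample = json.loads('[{"name": "Trukki_0001", "smoothedPosition": [33.96, 33.13, 2.15]}]')

for distro in iter_records(sample):
    print(distro['name'], distro['smoothedPosition'][0])
```

Without such a check, iterating directly over a dict walks its keys (plain strings), and `distro['name']` then fails with a TypeError.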

2 Comments

Hi. I'll see about following up the ideas you kindly gave. However, I still get an error, which is referring to line 8 of the code distros_list.append(json.load(f)) referring to line 299 in load, 354 in loads, 339 in decode and 357 in raw_decode ending with: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) Any ideas?
Turns out it was because the JSON was in a UTF, not ASCII, format... annoying.
