Unable to load JSON data read from a file

Code:

import json
import pymysql
import os

with open("C:/feed/datafile2021-02-23_15.json", "r") as myfile:
    data = myfile.read()

obj = json.loads(data)
ac_obj = obj["data"]
print(ac_obj)

JSON data:

{
  "query_status": "ok",
  "data": [
    {
      "sha256_hash": "8291db6ed7f2be2e014d6ad586a2fa2021c6f59334416e1042ed88edea137d0b",
      "sha3_384_hash": "d684410f118253b96c5799aa44e3e2ef1d9ef9728ee6ee13cd6c076368f5a9de8189779a0214e26398f4e48915284013",
      "sha1_hash": "20590d6000caf00092b80e3bfc740c492e2f7e50",
      "md5_hash": "ff26664f179c0a189471183aa87e3c4a",
      "first_seen": "2021-02-23 08:46:04",
      "last_seen": null,
      "file_name": "SecuriteInfo.com.Variant.Zusy.368685.25618.6070",
      "file_size": 2684128,
      "file_type_mime": "application/x-dosexec",
      "file_type": "exe",
      "reporter": "SecuriteInfoCom",
      "origin_country": "FR",
      "anonymous": 0,
      "signature": null,
      "imphash": "4328f7206db519cd4e82283211d98e83",
      "tlsh": "34C533817B3D457AE4E3C93293F3F61E4FB4920C956869FB5B79C1203DA9F0250A924B",
      "ssdeep": "49152:z+cw5wkXuOdHLP+0ZSq5DhWNA0ZriICJEz3eUdWot2K0pHcAZaRMEZpk:KcWwG1dH60tj0ZWIyEz3tWotZ2HcA7Ek",
      "tags": null,
      "code_sign": [],
      "intelligence": {
        "clamav": null,
        "downloads": "20",
        "uploads": "1",
        "mail": null
      }
    },
    {
      "sha256_hash": "3d3112ce7c1a80e0378b15c7084b1b49a9805a5e47a85a97acdd7841d0a9b40b",
      "sha3_384_hash": "1ff30f891e4b6eb421a5181373943bf23cc8633f66cc20265450ffc255047aae308344a71a74d1794b14323c41c4276b",
      "sha1_hash": "b24be163878f851e0b9bc5da8967879d5ff3d846",
      "md5_hash": "e48ba1147b75508b7f58cace584373cb",
      "first_seen": "2021-02-23 08:45:59",
      "last_seen": null,
      "file_name": "SecuriteInfo.com.Trojan.GenericKDZ.73123.31244.15546",
      "file_size": 555008,
      "file_type_mime": "application/x-dosexec",
      "file_type": "exe",
      "reporter": "SecuriteInfoCom",
      "origin_country": "US",
      "anonymous": 0,
      "signature": null,
      "imphash": "71b77d57e8aec8db116eba9e387ce755",
      "tlsh": "79C4D010BBF1D035F6B266F4497992A5A93ABD717B3480CF53C626DA1A386E09C31723",
      "ssdeep": "12288:it0DzYl40RFrFMFf7CphShPDd+ByKZz+RfCJP079dwkkV46D25sT6fVU:imDItRFZMIphShZzsyCJP0pcV46DusT7",
      "tags": null,
      "code_sign": [],
      "intelligence": {
        "clamav": null,
        "downloads": "15",
        "uploads": "1",
        "mail": null
      }
    }
  ]
}

Error log:

Traceback (most recent call last):
  File ".\sampletest.py", line 10, in <module>
    obj = json.loads(data)
  File "C:\Program Files\Python38\lib\json\__init__.py", line 357, in loads    
    return _default_decoder.decode(s)
  File "C:\Program Files\Python38\lib\json\decoder.py", line 337, in decode    
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files\Python38\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) 

4 Comments

  • Check your file contents. Print out data to make sure it's what you expect it to be. Commented Feb 23, 2021 at 10:32
  • The error indicates an empty file. Commented Feb 23, 2021 at 10:33
  • @rdas When I print the data, I get this: ÿþ{ "query_status": "ok", "data": [ { "sha256_hash": "8291db6ed7f2be2e014d6ad586a2fa2021c6f59334416e1042ed88edea137d0b", "md5_hash": "ff26664f179c0a189471183aa87e3c4a", "first_seen": "2021-02-23 08:46:04", "last_seen": null, "file_name": "SecuriteInfo.com.Variant.Zusy.368685.25618.6070", "file_size": 2684128, "mail": null } } After passing the data to the loads function, I don't get the parsed data: obj = json.loads(data) Commented Feb 23, 2021 at 10:39
  • @yugandarA: Remove the ÿþ at the start of the JSON file. Commented Feb 23, 2021 at 11:03
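
The ÿþ in the printed data is how the bytes FF FE, the UTF-16 little-endian byte order mark (BOM), look when they are decoded with a single-byte Windows code page such as cp1252. That suggests the file is probably saved as UTF-16 rather than being empty, which would explain why the parser fails on the very first character. Instead of editing the file, it may be enough to tell open() the encoding. The sketch below assumes the file really is UTF-16 with a BOM; read_utf16_json is a hypothetical helper name, not something from the question.

```python
import codecs
import json

# What a UTF-16 LE file with a BOM looks like when misread as cp1252:
raw = codecs.BOM_UTF16_LE + '{"query_status": "ok"}'.encode("utf-16-le")
mojibake = raw.decode("cp1252")[:2]  # the two characters "ÿþ"


def read_utf16_json(path):
    """Load a JSON file assumed to be UTF-16 encoded with a BOM."""
    # The "utf-16" codec uses the BOM to pick the byte order and strips it,
    # so the parser sees text that starts with "{".
    with open(path, "r", encoding="utf-16") as file:
        return json.load(file)


# Hypothetical usage with the path from the question:
# obj = read_utf16_json("C:/feed/datafile2021-02-23_15.json")
# print(obj["data"])
```

If the file turns out to use a different encoding, the encoding argument would need to change accordingly.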

1 Answer

You don't need to read the file and then call json.loads(); you can pass the file object straight to json.load(). The data you posted loads fine for me:

import json

def read_json(path):
    with open(path, 'r') as file:
        return json.load(file)

data = read_json('data.json')

# pretty print the output
from pprint import pprint
pprint(data)

Output:

{'data': [{'anonymous': 0,
           'code_sign': [],
           'file_name': 'SecuriteInfo.com.Variant.Zusy.368685.25618.6070',      
           'file_size': 2684128,
           'file_type': 'exe',
           'file_type_mime': 'application/x-dosexec',
           'first_seen': '2021-02-23 08:46:04',
           'imphash': '4328f7206db519cd4e82283211d98e83',
           'intelligence': {'clamav': None,
                            'downloads': '20',
                            'mail': None,
                            'uploads': '1'},
           'last_seen': None,
           'md5_hash': 'ff26664f179c0a189471183aa87e3c4a',
           'origin_country': 'FR',
           'reporter': 'SecuriteInfoCom',
           'sha1_hash': '20590d6000caf00092b80e3bfc740c492e2f7e50',
           'sha256_hash': '8291db6ed7f2be2e014d6ad586a2fa2021c6f59334416e1042ed88edea137d0b',
           'sha3_384_hash': 'd684410f118253b96c5799aa44e3e2ef1d9ef9728ee6ee13cd6c076368f5a9de8189779a0214e26398f4e48915284013',
           'signature': None,
           'ssdeep': '49152:z+cw5wkXuOdHLP+0ZSq5DhWNA0ZriICJEz3eUdWot2K0pHcAZaRMEZpk:KcWwG1dH60tj0ZWIyEz3tWotZ2HcA7Ek',
           'tags': None,
           'tlsh': '34C533817B3D457AE4E3C93293F3F61E4FB4920C956869FB5B79C1203DA9F0250A924B'},
          {'anonymous': 0,
           'code_sign': [],
           'file_name': 'SecuriteInfo.com.Trojan.GenericKDZ.73123.31244.15546', 
           'file_size': 555008,
           'file_type': 'exe',
           'file_type_mime': 'application/x-dosexec',
           'first_seen': '2021-02-23 08:45:59',
           'imphash': '71b77d57e8aec8db116eba9e387ce755',
           'intelligence': {'clamav': None,
                            'downloads': '15',
                            'mail': None,
                            'uploads': '1'},
           'last_seen': None,
           'md5_hash': 'e48ba1147b75508b7f58cace584373cb',
           'origin_country': 'US',
           'reporter': 'SecuriteInfoCom',
           'sha1_hash': 'b24be163878f851e0b9bc5da8967879d5ff3d846',
           'sha256_hash': '3d3112ce7c1a80e0378b15c7084b1b49a9805a5e47a85a97acdd7841d0a9b40b',
           'sha3_384_hash': '1ff30f891e4b6eb421a5181373943bf23cc8633f66cc20265450ffc255047aae308344a71a74d1794b14323c41c4276b',
           'signature': None,
           'ssdeep': '12288:it0DzYl40RFrFMFf7CphShPDd+ByKZz+RfCJP079dwkkV46D25sT6fVU:imDItRFZMIphShZzsyCJP0pcV46DusT7',
           'tags': None,
           'tlsh': '79C4D010BBF1D035F6B266F4497992A5A93ABD717B3480CF53C626DA1A386E09C31723'}],
 'query_status': 'ok'}
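
The comments on the question suggest the original file starts with a byte order mark, in which case opening it in text mode with the default encoding may still fail. One possible variation, sketched here under that assumption, is to open the file in binary mode: since Python 3.6, json.load() and json.loads() accept bytes and detect UTF-8, UTF-16 or UTF-32 input themselves, including a leading BOM. The name read_json_any_encoding is hypothetical.

```python
import json


def read_json_any_encoding(path):
    """Load JSON from a file without knowing its Unicode encoding up front."""
    # In binary mode json.load() receives bytes and picks the decoder itself
    # (UTF-8, UTF-16 or UTF-32, with or without a BOM).
    with open(path, "rb") as file:
        return json.load(file)


# Hypothetical usage:
# data = read_json_any_encoding("data.json")
```

This keeps the helper as short as the text-mode version, at the cost of only covering the Unicode encodings that the json module detects.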

4 Comments

@GoldenLion That's what is done here, just wrapped in a reusable function so as not to introduce spaghetti into the code.
The JSON string can be assigned directly to a DataFrame: pd.DataFrame(myjson). If the JSON is hierarchical, then use json_normalize to flatten the data before moving it into a DataFrame.
The question didn't mention pandas. It is an extra package to install, and IMO pandas can be overkill in many applications; too many people rely on it for all their ETL nowadays.
pandas has some overhead, but most of the visualization can be done with it. pandas is a desired end goal.
