
I have a nested JSON file which I read with Python. I am only interested in specific information. Suppose I have the following JSON file:

  {
    "metadata_id": "3596fe93-5e4d-4ba9-8b10-c7ce28b18475",
    "kistler": {
      "metadata_id": "c2faa0df-e07a-40bb-91db-ac95281e0bfb",
      "actualValues": {
        "Zykluszaehler": {
          "name": "Zykluszaehler",
          "unit": "-",
          "value": 196.0,
          "type": 2
        },
        "Zykluszeit": {
          "name": "Zykluszeit",
          "unit": "s",
          "value": 5082.0,
          "type": 2
        },
        "Parameter 1": {
          "name": "Parameter 1",
          "unit": "s",
          "value": 0.0,
          "type": 1
        },
        "Parameter 2": {
          "name": "Parameter 2",
          "unit": "Schalter",
          "value": 0.0,
          "type": 1
        }
      },
      "mdeValues": {
        "Status": "0",
        "Vollautomatik": "1",
        "Manuell": "0",
        "Maschinenstatus": "4",
        "Maschinenstillstand": "0",
        "ManuelleBuchung": "",
        "ManuellerStatus": "0GM",
        "Alarme": "",
        "Auftrag": "NONE",
        "Artikel": "NONE",
        "Werkzeug": "NONE",
        "Charge": "NONE",
        "Zykluszaehler": "196",
        "Anfahrausschuss": "",
        "Aussschusszaehler": "-1",
        "GutteileZaehler": "-1",
        "Kavitaeten": "1",
        "Sollstueck": "0",
        "GutZaehler": "10325",
        "BruttoZaehler": "10325",
        "SchlechtZaehler": "0",
        "AusschussGrund": "",
        "AusschussAnzahl": "",
        "Bemerkung": "",
        "Message": "",
        "SollZykluszeit": "180",
        "SollStkProStd": "",
        "QS_TPP_ID": "200",
        "Folgeauftrag": "",
        "SollAusschuss": "0",
        "SchusszahlProduktion": "3",
        "PdeCommand": "",
        "ExternalStrValues": "",
        "ExternalValues": "",
        "ULDL": "00",
        "Personal": "NONE",
        "SummeIstLaufzeitAuftrag": "644713",
        "SummeSollLaufzeitAuftrag": "26029726",
        "SummeIstLaufzeit": "644713",
        "SummeSollLaufzeit": "22286461",
        "Freizeit": "0",
        "SummeAusschuss": "0",
        "StartIstMenge": "0",
        "StartSchlechtMenge": "0",
        "StartSummeAusschuss": "0",
        "OrderId": "",
        "Option1": "",
        "Option2": "",
        "Rohstoff": "",
        "SchichtID": "Spätschicht",
        "Startzeit": "2019-05-08 14:00:00",
        "Endzeit": "2019-05-08 22:00:00",
        "SollLaufzeit": "28800",
        "Barcode": "NONE",
        "ProgStillstand": "0",
        "SummeLaufzeit": "26029726",
        "ExternalCommand": "zyklusoutofrange=1"
      },
      "processDataCreatedUtc": "2019-05-08T14:54:07.4391900Z",
      "processDataCreatedLocal": "2019-05-08T16:54:07.4391900+02:00",
      "cycle": 196,
      "machineId": "T_68539",
      "pdeMachineId": 102,
      "dmc": "0020000000NONE20190508145407000000242T_68539000NONE"
    },
    "id": { "dmc": "0020000000NONE20190508145407000000242T_68539000NONE" },
    "MAi": {
      "metadata_id": "2c8b4fd6-7142-4e1d-83af-199ba6967cde",
      "samples": [
        {
          "metadata_id": "ad123bdd-1d1d-49b8-84d6-6308eab38744",
          "sampleID": 74517,
          "parameterID": 9,
          "equipmentID": 1,
          "SampleDateTimestamp": 1557327571,
          "parameterName": "Produktcode",
          "Unit": "Status1",
          "UTCBias": -120,
          "dataKeys": [
            {
              "metadata_id": "120c99f7-c768-4605-9874-75be45d0a0fc",
              "name": "Bauteil",
              "description": "",
              "ID": 2,
              "LDDID": 1,
              "value": "1",
              "groupID": 1,
              "dkType": 0
            },
            {
              "metadata_id": "9a5a52a5-1e1e-4d99-a9f9-728e33cad934",
              "name": "Produktcode",
              "description": "",
              "ID": 1,
              "LDDID": 1,
              "value": "0020000000NONE20190508145407000000242T_68539000NONE",
              "groupID": 2,
              "dkType": 0
            }
          ],
          "rawValues": [
            {
              "metadata_id": "ac178e72-86a0-4dc6-8b5f-10129cce4156",
              "value": 1.0,
              "dataKeys": []
            }
          ],
          "SMAX": 0.0,
          "SMEAN": 0.0,
          "SMEDIAN": 0.0,
          "SMIN": 0.0,
          "SN": 0,
          "SSD": 0.0
        }
      ]
    },
    "processDataCreatedUtc": "2019-05-08T14:54:07.4400000",
    "processDataCreatedLocal": "2019-05-08T16:54:07.4400000"
  }

I would like to have a pandas DataFrame with the following structure:

Zykluszaehler | Zykluszeit | Parameter 1 | Parameter 2
        196.0 |     5082.0 |         0.0 |         0.0

This is what I have so far:

import json
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

with open(r"filepath") as data_file:
    data = json.load(data_file)

# flatten the whole "kistler" dict into one row with dotted column names
df = json_normalize(data["kistler"])

With these lines I get all the information, i.e. these columns:

'actualValues.Prozessueberwachung_Schalter_SW.name',
   'actualValues.Prozessueberwachung_Schalter_SW.type',
   'actualValues.Prozessueberwachung_Schalter_SW.unit',
   'actualValues.Prozessueberwachung_Schalter_SW.value',
   'actualValues.S11013_Hub_A_Auswerfer_zurueck_verzoegert.name',
   'actualValues.S11013_Hub_A_Auswerfer_zurueck_verzoegert.type',
   'actualValues.S11013_Hub_A_Auswerfer_zurueck_verzoegert.unit',
   'actualValues.S11013_Hub_A_Auswerfer_zurueck_verzoegert.value',
   'actualValues.Zykluszaehler.name', 'actualValues.Zykluszaehler.type',
   'actualValues.Zykluszaehler.unit', 'actualValues.Zykluszaehler.value',
   'actualValues.Zykluszeit.name', 'actualValues.Zykluszeit.type',
   'actualValues.Zykluszeit.unit', 'actualValues.Zykluszeit.value',
   'cycle', 'dmc', 'machineId', 'mdeValues.Alarme',
   'mdeValues.Anfahrausschuss', 'mdeValues.Artikel', 'mdeValues.Auftrag',
   'mdeValues.AusschussAnzahl', 'mdeValues.AusschussGrund',
   'mdeValues.Aussschusszaehler', 'mdeValues.Barcode',
   'mdeValues.Bemerkung', 'mdeValues.BruttoZaehler', 'mdeValues.Charge',
   'mdeValues.Endzeit', 'mdeValues.ExternalCommand',
   'mdeValues.ExternalStrValues', 'mdeValues.ExternalValues',
   'mdeValues.Folgeauftrag', 'mdeValues.Freizeit', 'mdeValues.GutZaehler',
   'mdeValues.GutteileZaehler', 'mdeValues.Kavitaeten',
   'mdeValues.Manuell', 'mdeValues.ManuelleBuchung',
   'mdeValues.ManuellerStatus', 'mdeValues.Maschinenstatus',
   'mdeValues.Maschinenstillstand', 'mdeValues.Message',
   'mdeValues.Option1', 'mdeValues.Option2', 'mdeValues.OrderId',
   'mdeValues.PdeCommand', 'mdeValues.Personal',
   'mdeValues.ProgStillstand', 'mdeValues.QS_TPP_ID', 'mdeValues.Rohstoff',
   'mdeValues.SchichtID', 'mdeValues.SchlechtZaehler',
   'mdeValues.SchusszahlProduktion', 'mdeValues.SollAusschuss',
   'mdeValues.SollLaufzeit', 'mdeValues.SollStkProStd',
   'mdeValues.SollZykluszeit', 'mdeValues.Sollstueck',
   'mdeValues.StartIstMenge', 'mdeValues.StartSchlechtMenge',
   'mdeValues.StartSummeAusschuss', 'mdeValues.Startzeit',
   'mdeValues.Status', 'mdeValues.SummeAusschuss',
   'mdeValues.SummeIstLaufzeit', 'mdeValues.SummeIstLaufzeitAuftrag',
   'mdeValues.SummeLaufzeit', 'mdeValues.SummeSollLaufzeit',
   'mdeValues.SummeSollLaufzeitAuftrag', 'mdeValues.ULDL',
   'mdeValues.Vollautomatik', 'mdeValues.Werkzeug',
   'mdeValues.Zykluszaehler', 'metadata_id', 'pdeMachineId',
   'processDataCreatedLocal', 'processDataCreatedUtc'
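The flattened names above follow the pattern actualValues.<name>.value, so one way to stay with json_normalize is to normalize only the actualValues dict and keep the columns ending in .value. A minimal sketch, assuming pandas >= 1.0 (for pd.json_normalize) and using a trimmed copy of the sample data inline instead of reading the file:

```python
import pandas as pd

# Trimmed copy of data["kistler"]["actualValues"] from the sample JSON above
actual_values = {
    "Zykluszaehler": {"name": "Zykluszaehler", "unit": "-", "value": 196.0, "type": 2},
    "Zykluszeit": {"name": "Zykluszeit", "unit": "s", "value": 5082.0, "type": 2},
    "Parameter 1": {"name": "Parameter 1", "unit": "s", "value": 0.0, "type": 1},
    "Parameter 2": {"name": "Parameter 2", "unit": "Schalter", "value": 0.0, "type": 1},
}

# json_normalize flattens the nested dicts into a single row with dotted
# column names such as "Zykluszaehler.value" or "Parameter 1.unit"
flat = pd.json_normalize(actual_values)

# Keep only the "<name>.value" columns, then drop the ".value" suffix
df = flat.filter(regex=r"\.value$")
df = df.rename(columns=lambda col: col.rsplit(".", 1)[0])
print(df)
```

This still creates the name/unit/type columns before throwing them away, which is harmless for a handful of parameters.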

If I add the following to my code

df = json_normalize(data["kistler"], "actualValues")

I get the names, but not the values and columns. I feel I am not that far away. I would be really grateful for any kind of help.

Cheers

  • To be specific, I would like to have a DataFrame constructed from the nested dict "actualValues". Again, thanks a lot for any kind of help :) Commented May 14, 2019 at 15:23

1 Answer


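I would just build the data by hand:

import json
import pandas as pd

with open(r"filepath") as data_file:
    data = json.load(data_file)["kistler"]["actualValues"]

# one column per parameter, each holding a one-element list with its "value"
df = pd.DataFrame({k: [v["value"]] for k, v in data.items()})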

It should give:

   Zykluszaehler  Zykluszeit  Parameter 1  Parameter 2
0          196.0      5082.0          0.0          0.0
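An equivalent route without the explicit comprehension is DataFrame.from_dict with orient="index", which turns the actualValues dict into one row per parameter (columns name, unit, value, type); selecting value and transposing gives the same one-row layout. A sketch with the trimmed sample data inlined instead of read from the file:

```python
import pandas as pd

# Trimmed copy of data["kistler"]["actualValues"] from the question
actual_values = {
    "Zykluszaehler": {"name": "Zykluszaehler", "unit": "-", "value": 196.0, "type": 2},
    "Zykluszeit": {"name": "Zykluszeit", "unit": "s", "value": 5082.0, "type": 2},
    "Parameter 1": {"name": "Parameter 1", "unit": "s", "value": 0.0, "type": 1},
    "Parameter 2": {"name": "Parameter 2", "unit": "Schalter", "value": 0.0, "type": 1},
}

# One row per parameter; the inner keys become the columns name/unit/value/type
table = pd.DataFrame.from_dict(actual_values, orient="index")

# Keep the "value" column and flip it into a single row with index 0
wide = table[["value"]].T.reset_index(drop=True)
print(wide)
```

The intermediate table is handy if you later also need the units, e.g. table["unit"].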

4 Comments

Wow, thank you so much :) This really helps me. Another question: is this approach time-efficient? To keep the example readable I deleted many items from the JSON file; in real life I have to deal with many items, each with many values, so a for loop might be very time-consuming. If you know a faster approach, that would be great :) Cheers and thank you very much
IMHO, it should be reasonably efficient. You have to load the JSON data as a nested Python dictionary anyway. Building a new dictionary with a dict comprehension is a standard operation in Python and should be reasonably fast. And building a DataFrame from a dict of lists is fast, because it just builds NumPy arrays from the Python lists.
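If there are many documents (for example one JSON file per cycle; that layout is an assumption here), a common pattern is to build one small dict per document and create the DataFrame once at the end, rather than growing a frame with pd.concat inside the loop. A sketch; the helper value_row and the two in-memory documents are hypothetical stand-ins for files loaded with json.load:

```python
import pandas as pd


def value_row(document):
    """Hypothetical helper: map one loaded JSON document to {name: value}."""
    actual = document["kistler"]["actualValues"]
    return {name: item["value"] for name, item in actual.items()}


# Hypothetical stand-ins for documents loaded with json.load(...)
documents = [
    {"kistler": {"actualValues": {
        "Zykluszaehler": {"value": 196.0},
        "Zykluszeit": {"value": 5082.0},
    }}},
    {"kistler": {"actualValues": {
        "Zykluszaehler": {"value": 197.0},
        "Zykluszeit": {"value": 5080.0},
    }}},
]

# One row per document, built with a single DataFrame call at the end
df = pd.DataFrame([value_row(doc) for doc in documents])
print(df)
```

If a document lacks one of the parameters, that cell simply becomes NaN in its row.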
OK, thank you very much. I do have one (last) question :) Suppose there is an additional dict inside the JSON file, named "setvalues", which I would also like to read. How can I read this data as well? Just adding ["setvalues"] to data=json.load(..) does not work. What can I do here? :)
The "setvalues" dict has exactly the same structure. What I could do is build one DataFrame from "actualValues" and one from "setValues" and combine them with pd.concat([frame1, frame2]). However, I think there is a better solution for this.
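One way to avoid two frames and a concat is to loop over both section names in a single comprehension and let DataFrame.from_dict(orient="index") put each section in its own row. A minimal sketch; the key name "setValues" (the comments spell it both "setvalues" and "setValues") and its numbers are hypothetical placeholders, so use whatever key your file actually contains:

```python
import pandas as pd

# Trimmed stand-in for data["kistler"]; the "setValues" entries are hypothetical
kistler = {
    "actualValues": {
        "Zykluszaehler": {"name": "Zykluszaehler", "unit": "-", "value": 196.0, "type": 2},
        "Zykluszeit": {"name": "Zykluszeit", "unit": "s", "value": 5082.0, "type": 2},
    },
    "setValues": {
        "Zykluszaehler": {"name": "Zykluszaehler", "unit": "-", "value": 200.0, "type": 2},
        "Zykluszeit": {"name": "Zykluszeit", "unit": "s", "value": 5000.0, "type": 2},
    },
}

sections = ["actualValues", "setValues"]

# {section: {parameter name: value}}, one inner dict per section
values = {
    section: {name: item["value"] for name, item in kistler[section].items()}
    for section in sections
}

# Rows are the sections, columns are the parameter names
df = pd.DataFrame.from_dict(values, orient="index")
print(df)
```

If you prefer everything in one Series, df.stack() indexes the values by (section, name) pairs.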
