I'm having difficulty structuring the following data and would appreciate help from anyone experienced with this.

I need to load a JSON document into a DataFrame in PySpark. I don't have its complete schema, but it has the nested structure below, which doesn't change:

import http.client
import json

conn = http.client.HTTPSConnection("xxx")
payload = ""
conn.request("GET", "xxx", payload)

res = conn.getresponse()
data = res.read().decode("utf-8")

# Parse the response, then serialize it back to an indented JSON string.
json_obj = json.loads(data)
df = json.dumps(json_obj, indent=2)

This is the JSON:

{
  "car": {
    "top1": {
      "cl": [
        {
          "nm": "Setor A",
          "prc": "40,00 %",
          "tv": [
            {
              "logo": "https://www.test.com/ddd.jpg",
              "nm": "BDFG",
              "lk1": "https://www.test.com/ddd/BDFG/",
              "lk2": "https://www.test-ddd.com",
              "dta": [
                {
                  "nm": "PA",
                  "cp": "nl",
                  "vl": "$ 2,50"
                },
                {
                  "nm": "FVP",
                  "cp": "UV",
                  "vl": "No"
                }
              ],
              "prc": "30,00 %"
            },
            {
              "logo": "https://www.test.com/ccc.jpg",
              "nome": "BDFH",
              "lk1": "https://www.test.com/ddd/BDFH/",
              "lk2": "https://www.test-ddd.com",
              "dta": [
                {
                  "nm": "PA",
                  "cp": "nl",
                  "vl": "$ 2,50"
                },
                {
                  "nm": "FVP",
                  "cp": "UV",
                  "vl": "No"
                }
              ],
              "prc": "70,00 %"
            }
          ]
        },
        {
          "nm": "B",
          "prc": "60,00 %",
          "tv": [
            {
              "logo": "https://www.test.com/bomm.jpg",
              "nm": "BOOM",
              "lk1": "https://www.test.com/ddd/BDFH/",
              "lk2": "https://www.test-ddd.com",
              "dta": [
                {
                  "nm": "PA",
                  "cp": "nl",
                  "vl": "$ 2,50"
                },
                {
                  "nm": "FVP",
                  "cp": "UV",
                  "vl": "No"
                }
              ],
              "prc": "100,00 %"
            }
          ]
        }
      ]
    },
    "top2": {
      "cl": [{}]
    },
    "top3": {
      "cl": [{}]
    }
  }
}

Example of a json file

I tried to define a schema for the data, but without success:

schema = StructType(
    [
      StructField("car", ArrayType(StructType([
        StructField("top1", ArrayType(StructType([
          StructField("cl", ArrayType(StructType([
            StructField("nm", StringType(),True),
            StructField("prc", StringType(),True),
            StructField("tv", ArrayType(StructType([
              StructField("logo", StringType(),True),
              StructField("nm", StringType(),True),
              StructField("lk1", StringType(),True),
              StructField("lk2", StringType(),True),
              StructField("dta", ArrayType(StructType([
                StructField("nm", StringType(),True),
                StructField("cp", StringType(),True),
                StructField("vl", StringType(),True)]))),
              StructField("prc", StringType(),True)])))])))]))),
        StructField("top2", ArrayType(StructType([
          StructField("cl", ArrayType(StructType([
            StructField("nm", StringType(),True),
            StructField("prc", StringType(),True),
            StructField("tv", ArrayType(StructType([
              StructField("logo", StringType(),True),
              StructField("nm", StringType(),True),
              StructField("lk1", StringType(),True),
              StructField("lk2", StringType(),True),
              StructField("dta", ArrayType(StructType([
                StructField("nm", StringType(),True),
                StructField("cp", StringType(),True),
                StructField("vl", StringType(),True)]))),
              StructField("prc", StringType(),True)])))])))]))),  
        StructField("top3", ArrayType(StructType([
          StructField("cl", ArrayType(StructType([
            StructField("nm", StringType(),True),
            StructField("prc", StringType(),True),
            StructField("tv", ArrayType(StructType([
              StructField("logo", StringType(),True),
              StructField("nm", StringType(),True),
              StructField("lk1", StringType(),True),
              StructField("lk2", StringType(),True),
              StructField("dta", ArrayType(StructType([
                StructField("nm", StringType(),True),
                StructField("cp", StringType(),True),
                StructField("vl", StringType(),True)]))),
              StructField("prc", StringType(),True)])))])))])))])))])


df2 = sqlContext.read.json(df, schema)
df2.printSchema()

I receive this message: error message

I want to transform it into something like this:

Example of the DataFrame

Is there a function that makes it easier to break down and structure this data?

Comments
  • Side note: this is not JSON, this is a Python dict literal. It might be better if you run print(json.dumps(..., indent=2)) to show data in true JSON format. Commented Aug 19, 2021 at 0:00
  • On topic: there is no single unambiguous way to represent this data in a 2-D table (i.e. a data frame). Can you give an example of how you want it to be formatted? Commented Aug 19, 2021 at 0:01
  • @shadowtalker I edited the question, thanks! Commented Aug 19, 2021 at 0:19
  • Your JSON is corrupted. Check that the brackets are all balanced. Commented Aug 21, 2021 at 6:20

1 Answer

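The json() method accepts either a path to a JSON file or an RDD of JSON strings. A plain string is interpreted as a path, so passing the JSON text directly does not work.

Create an RDD from your JSON string with parallelize(), then pass that RDD to json().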

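import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
# json_obj is the parsed response from the question; wrap its JSON text in a one-element RDD.
rdd = spark.sparkContext.parallelize([json.dumps(json_obj, indent=2)])
# Schema will be inferred automatically. You can pass schema if you want.
json_df = spark.read.json(rdd)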
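
If the goal is one row per innermost "dta" entry, one possible approach is to flatten the nested lists in plain Python before building a DataFrame. The sketch below is only an illustration: the helper flatten_car and the output column names (top, cl_nm, tv_nm, and so on) are hypothetical, and the sample is a trimmed copy of the question's JSON, since the desired output layout is not shown in text.

```python
import json

# Trimmed sample in the same shape as the question's JSON (values copied from it).
sample = json.loads("""
{
  "car": {
    "top1": {
      "cl": [
        {
          "nm": "Setor A",
          "prc": "40,00 %",
          "tv": [
            {
              "nm": "BDFG",
              "prc": "30,00 %",
              "dta": [
                {"nm": "PA", "cp": "nl", "vl": "$ 2,50"},
                {"nm": "FVP", "cp": "UV", "vl": "No"}
              ]
            }
          ]
        }
      ]
    },
    "top2": {"cl": [{}]}
  }
}
""")


def flatten_car(obj):
    """Yield one flat dict per innermost "dta" entry (hypothetical helper)."""
    for top_name, top in obj.get("car", {}).items():
        for cl in top.get("cl", []):
            # Empty "cl" objects have no "tv" list and produce no rows.
            for tv in cl.get("tv", []):
                for dta in tv.get("dta", []):
                    yield {
                        "top": top_name,
                        "cl_nm": cl.get("nm"),
                        "cl_prc": cl.get("prc"),
                        "tv_nm": tv.get("nm"),
                        "tv_prc": tv.get("prc"),
                        "dta_nm": dta.get("nm"),
                        "dta_cp": dta.get("cp"),
                        "dta_vl": dta.get("vl"),
                    }


rows = list(flatten_car(sample))
for row in rows:
    print(row)
```

The resulting list of dicts could then be handed to spark.createDataFrame(rows) if a Spark DataFrame is needed. Missing keys come through as None because the helper uses .get() throughout.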