I have a JSON file like the one below:

{"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"},{"CName":"013","CValue":"ABC1234","CLevel":"1","msg":"","CType":"event"}}

I want to create a schema for this file, and if the JSON file is empty ({}), the value should be an empty string.

However, this is what df.show prints:

[[012, XYZ1234, 0, event, ], [013, ABC1234, 1, event, ]]

I created the schema like this:

val schemaF = ArrayType(
  StructType(
    Array(
      StructField("CName", StringType),
      StructField("CValue", StringType),
      StructField("CLevel", StringType),
      StructField("msg", StringType),
      StructField("CType", StringType)
    )
  )
)

When I tried the following:

val df1 = df.withColumn("Codes",from_json('Codes, schemaF))

it throws an AnalysisException:

org.apache.spark.sql.AnalysisException: cannot resolve 'jsontostructs(Codes)' due to data type mismatch: argument 1 requires string type, however, 'Codes' is of array<struct<CName:string,CValue:string,CLevel:string,CType:string,msg:string>> type.;; 'Project [valid#51, jsontostructs(ArrayType(StructType(StructField(CName,StringType,true), StructField(CValue,StringType,true), StructField(CLevel,StringType,true), StructField(msg,StringType,true), StructField(CType,StringType,true)),true), Codes#8, Some(America/Bogota)) AS errorCodes#77]

Can someone please tell me why and how to resolve this issue?

  • The Codes column is already of type array of struct, so why do you want to use from_json? Commented Mar 22, 2021 at 18:15
  • I see that your JSON file is not well formed: where is the closing ] for your array? Commented Mar 22, 2021 at 18:37
  • I forgot to copy the ] bracket. @itIsNaz Commented Mar 23, 2021 at 4:33
  • Because if Codes is empty (i.e. { Codes : [] }), I want to make use of the schema. @blackbishop Commented Mar 23, 2021 at 4:57

2 Answers

val schema = StructType(
  Array(
    StructField("CName", StringType),
    StructField("CValue", StringType),
    StructField("CLevel", StringType),
    StructField("msg", StringType),
    StructField("CType", StringType)
  )
)

val df0 = spark.read.schema(schema).json("/path/to/data.json")

2 Comments

I still get the same error when I try your schema: org.apache.spark.sql.AnalysisException: cannot resolve 'jsontostructs(Codes)' due to data type mismatch: argument 1 requires string type, however, 'Codes' is of array<struct<CName:string,CValue:string,CLevel:string,CType:string,msg:string>> type @itIsNaz
I suppose you have a column named Codes in df. In that case, just add it: val df1 = df0.withColumn("Codes",f.from_json(f.col("Codes"), schema)) where f is defined as import org.apache.spark.sql.{functions => f}

Your schema does not correspond to the JSON file you're trying to read. It's missing the top-level field Codes of array type. It should look like this:

val schema = StructType(
  Array(
    StructField(
      "Codes",
      ArrayType(
        StructType(
          Array(
            StructField("CLevel", StringType, true),
            StructField("CName", StringType, true),
            StructField("CType", StringType, true),
            StructField("CValue", StringType, true),
            StructField("msg", StringType, true)
          )
        ),
        true
      ),
      true
    )
  )
)

You should also apply it when reading the JSON file, not with the from_json function. from_json expects a string column, and Codes is already parsed as an array of structs:

val df = spark.read.schema(schema).json("path/to/json/file")

df.printSchema
//root
// |-- Codes: array (nullable = true)
// |    |-- element: struct (containsNull = true)
// |    |    |-- CLevel: string (nullable = true)
// |    |    |-- CName: string (nullable = true)
// |    |    |-- CType: string (nullable = true)
// |    |    |-- CValue: string (nullable = true)
// |    |    |-- msg: string (nullable = true)
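
As a minimal sketch of how this handles the empty cases raised in the comments, the snippet below reads three in-memory JSON lines with the same schema. The object name EmptyCodesExample, the local SparkSession settings, and the sample lines are assumptions made for illustration, not part of the original question. With an explicit schema, the column type stays array of struct for every row; a record with "Codes": [] should come back as an empty array and {} as null.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical example object; names and sample data are illustrative only.
object EmptyCodesExample {
  // Same shape as the schema above: a top-level Codes array of structs.
  val codeStruct: StructType = StructType(Seq(
    StructField("CLevel", StringType),
    StructField("CName", StringType),
    StructField("CType", StringType),
    StructField("CValue", StringType),
    StructField("msg", StringType)
  ))
  val schema: StructType = StructType(Seq(StructField("Codes", ArrayType(codeStruct))))

  // Parse JSON strings with the explicit schema instead of inferring one.
  def read(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._
    spark.read.schema(schema).json(lines.toDS())
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder().master("local[1]").appName("empty-codes").getOrCreate()
    val df = read(spark, Seq(
      """{"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"}]}""",
      """{"Codes":[]}""",
      """{}"""
    ))
    df.printSchema() // the schema is the declared one, regardless of the data
    df.show(false)
    spark.stop()
  }
}
```

Because the schema is supplied up front, there is no need to call from_json afterwards: the column never passes through a string representation.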

EDIT:

To answer the question in your comment about adding a lid struct with an idNo string field, you can use this schema definition:

val schema = StructType(
    Array(
      StructField(
        "Codes",
        ArrayType(
          StructType(
            Array(
              StructField("CLevel", StringType, true),
              StructField("CName", StringType, true),
              StructField("CType", StringType, true),
              StructField("CValue", StringType, true),
              StructField("msg", StringType, true)
            )
          ),
          true
        ),
        true
      ),
      StructField("lid", StructType(Array(StructField("idNo", StringType, true))), true)
    )
  )
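
A rough sketch of using this extended schema follows. The object name LidSchemaExample, the local session, and the sample record are assumptions for illustration; note that the sample quotes the "lid" key, since Spark's JSON reader does not accept unquoted field names by default.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types._

// Hypothetical example object; names and sample data are illustrative only.
object LidSchemaExample {
  val codeStruct: StructType = StructType(Seq(
    StructField("CLevel", StringType),
    StructField("CName", StringType),
    StructField("CType", StringType),
    StructField("CValue", StringType),
    StructField("msg", StringType)
  ))

  // Codes array plus the nested lid struct with a single idNo field.
  val schema: StructType = StructType(Seq(
    StructField("Codes", ArrayType(codeStruct)),
    StructField("lid", StructType(Seq(StructField("idNo", StringType))))
  ))

  // One output row per element of Codes, carrying lid.idNo alongside it.
  def flatten(df: DataFrame): DataFrame =
    df.select(col("lid.idNo").as("idNo"), explode(col("Codes")).as("code"))
      .select("idNo", "code.CName", "code.CValue")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("lid-schema").getOrCreate()
    import spark.implicits._
    val lines = Seq(
      """{"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"}],"lid":{"idNo":"1234"}}"""
    ).toDS()
    val df = spark.read.schema(schema).json(lines)
    flatten(df).show(false)
    spark.stop()
  }
}
```

Nested fields are addressed with dot notation (lid.idNo), and explode is only one option here; keeping Codes as an array works just as well if one row per file record is preferred.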

4 Comments

How do I add one more column to this schema? lid:struct -> idNo:string
@YOGESHS I'm not sure I understand your question. How do you want to add a column?
How do I add one more column to this schema? lid:struct -> idNo:string. Example: {"Codes":[{"CName":"012","CValue":"XYZ1234","CLevel":"0","msg":"","CType":"event"},{"CName":"013","CValue":"ABC1234","CLevel":"1","msg":"","CType":"event"}],lid :{"idNo": "1234"}}
@YOGESHS Please see the edited answer for how to add that field to the schema. If the question was how to add that column to the DataFrame, then use: df.withColumn("lid", struct(lit("1234").as("idNo"))).
