I have the logs below, which contain plain text followed by a JSON string:

2020-09-24T08:03:01.633Z 11.21.23.1 {"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}

I created a DataFrame from these logs, as shown below:


| Date       | Source IP  | Event Type         |
| 2020-09-24 | 11.21.23.1 | {"EventTime":"202  |
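
For reference, here is a minimal plain-Python sketch of how a raw log line like the one above could be split into those three columns. The `parse_line` name is hypothetical, and the sketch assumes the timestamp and IP contain no spaces and that everything after the second space is the JSON payload:

```python
import json


def parse_line(line):
    """Split a log line into (timestamp, source_ip, json_payload)."""
    # Split on the first two spaces only: the JSON payload itself
    # contains a space inside the EventTime value.
    timestamp, source_ip, payload = line.split(" ", 2)
    return timestamp, source_ip, payload


line = (
    '2020-09-24T08:03:01.633Z 11.21.23.1 '
    '{"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}'
)

timestamp, source_ip, payload = parse_line(line)

# Parsing the payload here confirms it is valid JSON on its own.
event = json.loads(payload)
```

If `json.loads` succeeds on the third part, the string itself is well formed, so a null result would more likely come from the schema than from the data.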

I created a schema to convert the JSON string into another DataFrame:

json_schema = StructType([
        StructField("EventTime", StringType()),
        StructField("Hostname", StringType()),
        StructField("Keywords", IntegerType())
    ])

json_converted_df= df.select(F.from_json(F.col('Event Type'), json_schema).alias("data")).select("data.*").show()

but the resulting DataFrame returns null for every field in the new JSON schema:

+---------+--------+--------+
|EventTime|Hostname|Keywords|
+---------+--------+--------+
|     null|    null|    null|
+---------+--------+--------+

How can I resolve this issue?

1 Answer

Works fine for me ...

# Preparation of test dataset (assumes an existing SparkSession named `spark`)

a = [
    (
        "2020-09-24T08:03:01.633Z",
        "11.21.23.1",
        '{"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}',
    ),
]

b = ["Date", "Source IP", "Event Type"]

df = spark.createDataFrame(a, b)

df.show()
#+--------------------+----------+--------------------+
#|                Date| Source IP|          Event Type|
#+--------------------+----------+--------------------+
#|2020-09-24T08:03:...|11.21.23.1|{"EventTime":"202...|
#+--------------------+----------+--------------------+

df.printSchema()
#root
# |-- Date: string (nullable = true)
# |-- Source IP: string (nullable = true)
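# |-- Event Type: string (nullable = true)

# Your code executed
from pyspark.sql import functions as F  # needed for F.from_json and F.col
from pyspark.sql.types import IntegerType, StringType, StructField, StructType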

json_schema = StructType(
    [
        StructField("EventTime", StringType()),
        StructField("Hostname", StringType()),
        StructField("Keywords", IntegerType()),
    ]
)

json_converted_df = df.select(
    F.from_json(F.col("Event Type"), json_schema).alias("data")
).select("data.*")

json_converted_df.show()
#+-------------------+-------------------+--------+
#|          EventTime|           Hostname|Keywords|
#+-------------------+-------------------+--------+
#|2020-09-24 13:33:01|abc-cde.india.local|   -1234|
#+-------------------+-------------------+--------+

json_converted_df.printSchema()
#root
# |-- EventTime: string (nullable = true)
# |-- Hostname: string (nullable = true)
# |-- Keywords: integer (nullable = true)
3 Comments

@Adhi What was the issue? Apparently, there was no issue at all.
Hi @Steven, I am not sure what went wrong. By the way, can you please tell me what usually causes null values to be returned when transforming a JSON string into another DataFrame?
@Adhi A wrong type in json_schema. If you replace the Hostname type with IntegerType, you'll end up with all NULL.
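
To illustrate the point about wrong types, here is a rough plain-Python check that compares one sample JSON string against the types the schema expects before handing it to `from_json`. The helper name `find_type_mismatches` and the field-to-type mapping are hypothetical, and the check only approximates Spark's own parsing rules (for example, it does not check integer ranges):

```python
import json

# Hypothetical mapping: the Python type json.loads should produce for
# each field, mirroring StringType / IntegerType in json_schema.
EXPECTED_TYPES = {"EventTime": str, "Hostname": str, "Keywords": int}


def find_type_mismatches(payload, expected):
    """Return {field: (expected_type_name, actual_type_name)} for mismatches."""
    record = json.loads(payload)
    mismatches = {}
    for field, expected_type in expected.items():
        value = record.get(field)
        if value is None:
            # Missing or null fields are skipped; they are not type errors.
            continue
        if not isinstance(value, expected_type):
            mismatches[field] = (expected_type.__name__, type(value).__name__)
    return mismatches


sample = '{"EventTime":"2020-09-24 13:33:01","Hostname":"abc-cde.india.local","Keywords":-1234}'

# Schema from the question: every field matches.
ok = find_type_mismatches(sample, EXPECTED_TYPES)

# Same sample, but with Hostname declared as an integer.
bad = find_type_mismatches(sample, {**EXPECTED_TYPES, "Hostname": int})
```

With the schema from the question, `ok` is empty. Declaring `Hostname` as an integer makes `bad` report that field, which is the kind of mismatch described in the comment above.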
