I am using Spark 4.0 and have the following code:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Step 1: Build a DataFrame holding the raw JSON string
json_data = [
    ('{"auswahl":{"a":{"b":10032,"c":201844}',)
]

df_raw = spark.createDataFrame(json_data, ["data_str"])

df_raw.display()

# Step 2: Convert JSON string to VARIANT
df_variant = df_raw.select(
    F.col("data_str").cast("VARIANT").alias("data")
)

# Step 3: Extract nested field with variant_get
df_result = df_variant.select(
    F.variant_get("data", "$.auswahl.a.b", "string")
)

df_result.display()

However, I always get the following output:

variant_get(data, $.auswahl.a.b, 'STRING')
null

I looked up the spec, and I would expect to see 10032 instead of null.

df_variant looks like this:

data
"{"auswahl":{"a":{"b":10032,"c":201844}"

What am I doing wrong?

  • Your example is not reproducible (stackoverflow.com/help/minimal-reproducible-example): F appears in your code without being declared. Can you make your question clearer? "What am I doing wrong?" is not a clear question. Commented Sep 11 at 18:21
  • Added the imports and clarified the expected output. Commented Sep 11 at 18:29
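
Two things in the question look like likely causes, though neither is confirmed by the asker. First, the string literal opens three braces and closes only one, so it is not well-formed JSON. Second, df_variant shows the value wrapped in quotes, which suggests that cast("VARIANT") stored the text as a variant string instead of parsing it into an object; a path such as $.auswahl.a.b would then have nothing to navigate. The sketch below checks the first point with the standard library and outlines one possible fix using F.parse_json. The helper names is_valid_json and extract_b, and the repaired literal fixed, are illustrative assumptions and not part of the original code.

```python
import json

# The literal from the question, copied verbatim.
raw = '{"auswahl":{"a":{"b":10032,"c":201844}'


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Assumed repair: append the two closing braces that appear to be missing.
fixed = raw + "}}"


def extract_b(spark, json_text: str):
    """Hypothetical helper: parse the text into a VARIANT, then read a.b.

    Assumes PySpark 4.0+, where parse_json and variant_get are available.
    The import is local so the JSON check above runs without Spark installed.
    """
    from pyspark.sql import functions as F

    df = spark.createDataFrame([(json_text,)], ["data_str"])
    # parse_json interprets the string as JSON instead of wrapping it as text.
    parsed = df.select(F.parse_json(F.col("data_str")).alias("data"))
    return parsed.select(
        F.variant_get("data", "$.auswahl.a.b", "string").alias("b")
    )


print(is_valid_json(raw))    # the literal as posted
print(is_valid_json(fixed))  # the literal with the braces balanced
```

With an active session, extract_b(spark, fixed).display() would be the call to try. If malformed rows are expected in real data, F.try_parse_json may be the safer choice, since it is intended to return null instead of failing on input it cannot parse.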
