I use Spark 4.0 and have the following code:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Step 1: Create a DataFrame with a JSON string column
json_data = [
    ('{"auswahl":{"a":{"b":10032,"c":201844}',)
]
df_raw = spark.createDataFrame(json_data, ["data_str"])
df_raw.display()

# Step 2: Convert JSON string to VARIANT
df_variant = df_raw.select(
    F.col("data_str").cast("VARIANT").alias("data")
)

# Step 3: Extract nested field with variant_get
df_result = df_variant.select(
    F.variant_get("data", "$.auswahl.a.b", "string")
)
df_result.display()
```
However, I always get:
| variant_get(data, $.auswahl.a.b, 'STRING') |
|---|
| null |
I looked up the spec, and I would expect to see 10032 instead of null.
df_variant looks like the following:
| data |
|---|
| "{"auswahl":{"a":{"b":10032,"c":201844}" |
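One thing worth checking independently of Spark: the sample string above seems to have three opening braces but only one closing brace, so it may not be valid JSON at all. A minimal sketch of that check using only the Python standard library (the helper name `is_valid_json` is made up for illustration):

```python
import json

# The literal from the question, copied unchanged.
sample = '{"auswahl":{"a":{"b":10032,"c":201844}'


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(sample))         # the string as posted
print(is_valid_json(sample + "}}"))  # with the two missing braces appended
```

If the first call reports the string as invalid, any JSON parser, including Spark's, would be expected to reject it or return null for it.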
What am I doing wrong?
`F` appears in your code without declaration. Can you make your question clearer? "What am I doing wrong?" is not a clear question.