3

I have a dataframe df with the following schema:

root
 |-- city_name: string (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- age: long (nullable = true)
 |    |-- name: string (nullable = true)

What I want to do is add a nested column, say car_brand to my person structure. How would I do it?

The expected final schema would look like this:

root
 |-- city_name: string (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- age: long (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- car_brand: string (nullable = true)

4 Answers 4

5

You can unpack the struct and add it to a new one, including the new column at the same time. For example, adding "bmw" to all persons in the dataframe be done like this:

df.withColumn("person", struct($"person.*", lit("bmw").as("car_brand")))
Sign up to request clarification or add additional context in comments.

4 Comments

This solution does not work with three level nested fields. How can we add nested_col field here? root |-- city_name: string (nullable = true) |-- person: struct (nullable = true) | |-- age: long (nullable = true) | |-- name: string (nullable = true) |-- nested_col: string (nullable = true)
@Sindhu If I understand correctly you want to add the nested_col to the person struct? You can do this by using the method in the answer, just changelit("bmw") to $"nested_col".
No, Actually I want to add a column when there is one more level of nesting per say name here, assuming name is struct here just like person.
@Sindhu If there isn't too many columns then you could do something like follows: withColumn("person", struct($"person.age", struct($"person.name", $"nested_col").as("name))). However, this will change a bit depending on the type of the column (here your name column is a string, not a struct). If you want a more detailed answer for your use case, I would recommend asking a new question on the site with more information (you can leave the link in a comment here).
2

Adding a new nested column within person:

df = df.withColumn(
        "person",
        struct(
            $"person.*",
            struct(
                lit("value_1").as("person_field_1"),
                lit("value_2").as("person_field_2"),
            ).as("nested_column_within_person")
       )
    )

Final schema :

root
 |-- city_name: string (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- age: long (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- nested_column_within_person: struct (nullable = true)
 |    |    |-- person_field_1: string (nullable = true)
 |    |    |-- person_field_2: string (nullable = true)

1 Comment

Is there any syntax changed in the recent versions? i'm getting SyntaxError: invalid syntax for cdc_df.withColumn("AuditPayload", struct( lit("value_1").as("person_field_1"),lit("value_2").as("person_field_2"),).as("nested_column_within_person")))
1
import pyspark.sql.functions as func
dF = dF.withColumn(
        "person",
   func.struct(
            "person.age",
                func.struct(
                            "person.name",
                            func.lit(None).alias("NestedCol_Name")
                    ).alias("name")
       )
       )
O/P Schema:-
root
 |-- city_name: string (nullable = true)
 |-- person: struct (nullable = false)
 |    |-- age: string (nullable = true)
 |    |-- name: struct (nullable = false)
 |    |    |-- name: string (nullable = true)
 |    |    |-- NestedCol_Name: null (nullable = true)

1 Comment

this solution will help to add the nested column at more than one level and as per above example it will add a nested column i.e. ('NestedCol_Name') inside name column which we already have. Hope it answers @Sindhu's question
1

The function withField is available starting Spark 3.1. As per the doc, "it can be used to add/replace a nested field in StructType by name".

In this case, it can be used as follows: -

import org.apache.spark.sql.functions

df.withColumn("person", functions.col("person").withField("car_brand", functions.col("some car brand here")))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.