I have this DataFrame:

val df: DataFrame = Seq(
("1", "1234 58", "SYSTEM_A", "5", "2022"),
("2", "1458 65", "SYSTEM_B", "2", "2021")
).toDF("id", "pseudo_id", "system", "number", "date")

I need to build a nested DataFrame using the df DataFrame with the following schema:

root
 |-- id: string (nullable = true)
 |-- pseudo_id: string (nullable = true)
 |-- system: string (nullable = true)
 |-- version: struct (nullable = false)
 |    |-- number: string (nullable = true)
 |    |-- date: string (nullable = true)

I tried to build it with:

val nestedDf: DataFrame = df
  .groupBy("id", "pseudo_id", "system")
  .agg(
    struct("number", "date").as("version")
  )

But I got the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 'number' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

Any ideas?

  • The error message explains it pretty well: every column that is not part of the group by must be wrapped in an aggregation expression. The question is what you actually intend to do. Is it just about restructuring the data (changing the schema), or do you want to aggregate (deduplicate?) the data? Commented Sep 5, 2022 at 16:19
  • I just want to change the schema of the initial data. Commented Sep 5, 2022 at 16:24

1 Answer

You can use a struct expression:

val df = ...
val df2 = df.selectExpr("id", "pseudo_id", "system", "struct(number, date) as version")
df2.printSchema()

Output:

root
 |-- id: string (nullable = true)
 |-- pseudo_id: string (nullable = true)
 |-- system: string (nullable = true)
 |-- version: struct (nullable = false)
 |    |-- number: string (nullable = true)
 |    |-- date: string (nullable = true)
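
For comparison, the same restructuring can probably be written with the typed Column API instead of a SQL expression string. This is only a sketch: the object name `NestVersionExample`, the helper `nestVersion`, and the local `SparkSession` settings are made up for illustration and are not part of the question. No `groupBy` is involved, since the goal is to change the schema rather than to aggregate rows.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object NestVersionExample {
  // Hypothetical helper: keeps the three flat columns and packs
  // `number` and `date` into a single struct column named `version`.
  def nestVersion(df: DataFrame): DataFrame =
    df.select(
      col("id"),
      col("pseudo_id"),
      col("system"),
      struct(col("number"), col("date")).as("version")
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nest-version-example")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the question.
    val df = Seq(
      ("1", "1234 58", "SYSTEM_A", "5", "2022"),
      ("2", "1458 65", "SYSTEM_B", "2", "2021")
    ).toDF("id", "pseudo_id", "system", "number", "date")

    val nested = nestVersion(df)
    nested.printSchema()

    // Nested fields can be addressed with dot notation.
    nested.select("id", "version.number").show()

    spark.stop()
  }
}
```

If aggregation really were intended, the error message's own suggestion of wrapping the non-grouped columns in `first()` would apply, but for a pure schema change a plain `select` should be enough.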