I have a schema like the one below. What is the best way in Spark to select the elements seat and drive and cast them to strings? I am reading this into a DataFrame with Spark 1.6.

 |-- cars: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- carId: string (nullable = true)
 |    |    |-- carCode: string (nullable = true)
 |    |    |-- carNumber: string (nullable = true)
 |    |    |-- features: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- seat: string (nullable = true)
 |    |    |    |    |-- drive: string (nullable = true)

The output of cars.features as car_features in json:

"cars_features":[[{"seat":"Auto","drive":"Manual"}]]

I am trying to extract "Auto" into one DataFrame column and "Manual" into another.

My current attempt returns the whole structure:

+-------------------+
|car_features       |
+-------------------+
|  [[Auto,Manual]]  |
+-------------------+

col("car.features").getItem(0).as("car_features_seat")
  • So do you want to select seat and drive as arrays of arrays, as a single array, or as rows? Commented Sep 3, 2019 at 19:13
  • I edited the question to make that more clear. Commented Sep 3, 2019 at 19:21
  • getItem("key") not 0. Commented Sep 3, 2019 at 23:26

1 Answer

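I had to index into the array twice, because features is an array nested inside each element of the outer array, so selecting it yields an array of arrays: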

col("car.features").getItem(0).getItem(0).getItem("seat").cast("String").as("car_features_seat")

This extracts "Auto". The same expression with getItem("drive") extracts "Manual".
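
For completeness, here is a minimal sketch that pulls both values into separate columns in one select. It assumes the top-level column is named cars, as in the printed schema, and it builds a one-row DataFrame from a made-up JSON record (the carId, carCode and carNumber values, the object name and the app name are all hypothetical). It only reads the first feature of the first car; handling every element would need something like explode instead.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.col

object CarFeaturesSketch {
  def main(args: Array[String]): Unit = {
    // Spark 1.6 style entry points, run locally for illustration.
    val conf = new SparkConf().setAppName("car-features-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical record shaped like the schema in the question.
    val json =
      """{"cars":[{"carId":"1","carCode":"A","carNumber":"42",""" +
        """"features":[{"seat":"Auto","drive":"Manual"}]}]}"""
    val df = sqlContext.read.json(sc.parallelize(Seq(json)))

    // cars.features is an array (one entry per car) of arrays (one entry per feature),
    // so two index lookups are needed to reach a single struct.
    val firstFeature = col("cars.features").getItem(0).getItem(0)

    val result = df.select(
      firstFeature.getItem("seat").cast("string").as("car_features_seat"),
      firstFeature.getItem("drive").cast("string").as("car_features_drive")
    )

    result.show()
    sc.stop()
  }
}
```

Holding the doubly indexed column in a value keeps the two field lookups short and makes it obvious that both columns come from the same struct.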
