2

I am currently using Apache Spark 2.1.1 to process an XML file into a CSV. My goal is to flatten the XML but the problem I am currently facing is unbounded occurrences of elements. Spark automatically infer these unbounded occurrences into array. Now what I want to do is explode an array column.

 Sample Schema

 |-- Instrument_XREF_Identifier: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- @bsid: string (nullable = true)
 |    |    |-- @exch_code: string (nullable = true)
 |    |    |-- @id_bb_sec_num: string (nullable = true)
 |    |    |-- @market_sector: string (nullable = true)

I know I can explode the array by this method

result = result.withColumn(p.name, explode(col(p.name)))

which will produce multiple rows with each array value containing struct. But the output I want to produce is to explode it into multiple columns instead of row.

Here is my expected output according to the schema I mentioned above:

Lets say that there are two struct values in the array.

bsid1   exch_code1   id_bb_sec_num1   market_sector1   bsid2   exch_code2   id_bb_sec_num2   market_sector2
123     3            1                13               234     12           212              221
1
  • How would variable length array map to fixed number of columns? Please post example input and expected output. Commented Jan 16, 2018 at 17:36

1 Answer 1

2

suppose Instrument_XREF_Identifier is a column of type array<struct<..>>, then you have to do it in two steps:

result
.withColumn("tmp",explode(col("Instrument_XREF_Identifier")))
.select("tmp.*")

This will give you a column for each of the struct elements.

There seems not to be a way to do it in 1 select/withColumn statement, see Explode array of structs to columns in Spark

Sign up to request clarification or add additional context in comments.

1 Comment

But that would still just be exploded into multiple rows. I'm trying to approach it so that I'll create new columns when they are exploded.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.