I have a table with two columns: an ID and a value. The value column contains 1488 characters, and I need to split it into multiple rows of 12 characters each. Example:

Dataframe:

ID  Value
 1  123456789987653ABCDEFGHI

Expected output:

ID  Value
1   123456789987
1   653ABCDEFGHI

How can this be done in Spark?

1 Answer

Create a UDF that splits a string into equal-length parts using grouped, then use explode on the resulting sequence of strings to flatten it into one row per part.

import org.apache.spark.sql.functions._

// UDF that splits a string into chunks of the given length
def splitOnLength(len: Int) = udf((str: String) => {
  str.grouped(len).toSeq
})

// Replace the Value column with one row per 12-character chunk
df.withColumn("Value", explode(splitOnLength(12)($"Value")))
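The chunking itself is plain Scala, so it can be sanity-checked without a Spark session. A minimal sketch, using the example value from the question (the object name GroupedDemo is arbitrary):

```scala
object GroupedDemo {
  def main(args: Array[String]): Unit = {
    // The 24-character example value from the question
    val value = "123456789987653ABCDEFGHI"

    // grouped(n) yields successive substrings of length n
    // (the last chunk may be shorter if the length is not a multiple of n)
    val parts = value.grouped(12).toSeq

    parts.foreach(println)
    // 123456789987
    // 653ABCDEFGHI
  }
}
```

Inside the UDF, each such sequence becomes an array column, and explode then turns each element into its own row, carrying the ID along.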