1

I want to append to dataframe with a new map column of existing columns which start with a given common prefix.

For example, I have input of

{"Prefix_A": "v_A", "Prefix_B": "v_B", "Field": "v"}, {"Prefix_A": "v_A", "Prefix_B": "v_B", "Prefix_C": "v_C", "Field": "v"}

I want to combine all fields with prefix "Prefix_" and get an output of

{"NewColumn": {"Prefix_A": "v_A", "Prefix_B": "v_B"}, "Field": "v"}, {"NewColumn": {"Prefix_A": "v_A", "Prefix_B": "v_B", "Prefix_C": "v_C"}, "Field": "v"}

I want to do this on the fly, i.e. I don't know the columns as the data is schemaless json dump.

Further, I want to construct a new map column of existing columns which matches given regular expression.

2
  • Can you provide an example of your expected input and output? It is hard to tell what exactly you are trying to do. It would also be helpful if you had a specific question to answer. Commented Mar 8, 2019 at 20:22
  • @KitMenke editted Commented Mar 8, 2019 at 20:51

1 Answer 1

1

Assuming, you have an input file data.json containing your json entries, you can get the expected output with the following code:

import org.apache.spark.sql.functions.{col,struct}
import spark.implicits._

val df = spark.read.json("data.json")
val (prefixedColumns, otherColumns) = df.columns.partition(_.startsWith("Prefix"))
val transformedDf = df.select(
    struct(prefixedColumns.map(col):_*).as("NewColumn")
    +: otherColumns.map(col):_*)
transformedDf.write.json("output.json")

Basically, you recreate a new list of columns using the necessary functions and then apply them in select using the scala :_* notation to transform the sequence to varargs

Sign up to request clarification or add additional context in comments.

2 Comments

Nice solution. Not sure why you refer to those built-in functions as UDFs though.
I tend to call all non-sql standard functions "udfs". I fixed the text to just reference "functions"

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.