
I wrote Scala code that transforms a string into an array of structs. I would like to do the same in Python. Do you have any clue how I can do it?

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column


val df: DataFrame = Seq(
  "adserviceCalculateCpcAlgorithmV1:2;searchProductsDecorator:3;searchOffersDecorator:3;bundlediscounts:5;searchGridType:3"
).toDF("abTests")

display(
  df
    // split on ";" into an array of "name:group" tokens
    .withColumn("abTestsArr", split($"abTests", ";"))
    .withColumn("abTestsArr",
      transform(col("abTestsArr"), (c: Column) => {
        // turn each "name:group" token into a struct with fields name and group
        struct(
          split(c, ":").getItem(0) as "name",
          split(c, ":").getItem(1) as "group"
        )
      })
    )
)
  • All of these functions are available in PySpark with similar names and functionality. Have you checked out the docs? – Commented Aug 2, 2022 at 15:12

1 Answer


You would do the same in Python using a lambda expression as the second parameter of the transform function:

from pyspark.sql import functions as F

df.withColumn(
    "abTestsArr",
    F.transform(
        F.split("abTests", ";"), lambda x: F.struct(
            F.substring_index(x, ":", 1).alias("name"),
            F.substring_index(x, ":", -1).alias("group")
        )
    )
).show(truncate=False)
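To make the substring_index calls concrete: with a positive count it returns everything before the first delimiter, and with a negative count everything after the last one. Here is a plain-Python sketch of the same per-element logic (the helper name parse_ab_tests is made up for illustration; the field names name/group match the answer):

```python
def parse_ab_tests(ab_tests):
    """Split "name:group;name:group;..." into a list of dicts,
    mimicking the array-of-structs the PySpark transform produces."""
    result = []
    for token in ab_tests.split(";"):
        # substring_index(x, ":", 1)  -> text before the first ":"
        name = token.split(":", 1)[0]
        # substring_index(x, ":", -1) -> text after the last ":"
        group = token.rsplit(":", 1)[-1]
        result.append({"name": name, "group": group})
    return result

print(parse_ab_tests("bundlediscounts:5;searchGridType:3"))
# [{'name': 'bundlediscounts', 'group': '5'}, {'name': 'searchGridType', 'group': '3'}]
```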

Instead of parsing it yourself, you could also consider using str_to_map to get a MapType column, then convert it into an array of structs with the map_entries function:

df.withColumn(
    "abTestsMap", 
    F.map_entries(F.expr("str_to_map(abTests, ';', ':')"))
).show(truncate=False)
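Semantically, str_to_map splits the string on the pair delimiter and each pair on the key-value delimiter, and map_entries then turns the map into an array of (key, value) structs. A plain-Python sketch of that pipeline (these helpers are local stand-ins named after the Spark functions, not the real implementations):

```python
def str_to_map(text, pair_delim=";", kv_delim=":"):
    # split into "k:v" pairs, then each pair into (key, value);
    # like Spark's str_to_map, a later duplicate key overwrites an earlier one
    return dict(pair.split(kv_delim, 1) for pair in text.split(pair_delim))

def map_entries(mapping):
    # each map entry becomes one struct-like (key, value) pair, in map order
    return [(k, v) for k, v in mapping.items()]

entries = map_entries(str_to_map("bundlediscounts:5;searchGridType:3"))
print(entries)
# [('bundlediscounts', '5'), ('searchGridType', '3')]
```

Note that with this approach the resulting structs are named key and value rather than name and group; if the field names matter you can rename them, e.g. by casting the column to array&lt;struct&lt;name:string,group:string&gt;&gt;.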