
I have a dataframe with a single row and multiple columns. I would like to convert it into multiple rows. I found a similar question here on Stack Overflow.

The question shows how it can be done in Scala, but I want to do this in PySpark. I tried to replicate the code in PySpark but wasn't able to.

I am not able to convert the Scala code below to Python:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, map}

// For each column, emit a pair of expressions: the column name as a literal and its value
val columnsAndValues: Array[Column] = df1.columns.flatMap(c => Array(lit(c), col(c)))
val df2 = df1.withColumn("myMap", map(columnsAndValues: _*))

1 Answer


In PySpark you can use the create_map function to build a map column, and a list comprehension flattened with itertools.chain to get the equivalent of Scala's flatMap:

import itertools
from pyspark.sql import functions as F

# Interleave each column name (as a literal) with its value column,
# flattening the (name, value) pairs just like Scala's flatMap
columns_and_values = itertools.chain(*[(F.lit(c), F.col(c)) for c in df1.columns])
df2 = df1.withColumn("myMap", F.create_map(*columns_and_values))
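
For completeness, here is a minimal sketch of how the resulting map column can then be exploded into one row per original column, which is what the question ultimately asks for. The sample dataframe and its column names are made up for illustration, and casting every value to string is an assumption so that all map values share a single type:

import itertools
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical single-row dataframe, purely for illustration
df1 = spark.createDataFrame([(1, "a", 3.5)], ["id", "name", "score"])

# Cast every value to string so create_map sees a single value type
pairs = [(F.lit(c), F.col(c).cast("string")) for c in df1.columns]
columns_and_values = itertools.chain(*pairs)

df2 = df1.withColumn("myMap", F.create_map(*columns_and_values))

# explode turns each map entry into its own row: (column name, value)
df_rows = df2.select(F.explode("myMap").alias("column", "value"))
df_rows.show()

With the sample data above this yields three rows, one per original column.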