I have a Python list of 10000 elements (10000×1). I want to put it into a Spark DataFrame so that the DataFrame has 10000 rows. How do I do that?
2 Answers
First, create a DataFrame from the list (avoid calling the variable `list`, which shadows the Python built-in; `values` is used here instead):
new_df = spark.createDataFrame([(value,) for value in values], ['id'])
Then union the two DataFrames:
base.union(new_df).show()
Remember that the column names and types in both DataFrames must match.
1 Comment
Kevin E
To clarify, I think the OP probably meant "I want to create a Spark DataFrame" from a Python list, in which case the first part of this answer suffices. At first I had no idea why this answer suggested a union, until I read the wording of the original question more carefully.

It looks like you want to add a literal value:
from pyspark.sql import functions as f

# single-row DataFrame; the auto-generated column is named `_1`
df = spark.sparkContext.parallelize([('idx',)]).toDF()
# lit() attaches the same constant value to every row
res = df.withColumn('literal_col', f.lit('strings'))
res.show(truncate=False)
# output:
+---+-----------+
|_1 |literal_col|
+---+-----------+
|idx|strings    |
+---+-----------+