
I have a Python list with 10000 elements (shape 10000×1). I want to add it to a Spark DataFrame so that the DataFrame has 10000 rows. How do I do that?

2 Answers


First, create a DataFrame from the list:

# your_list is the Python list; avoid naming it `list`, which shadows the builtin
new_df = spark.createDataFrame([(value,) for value in your_list], ['id'])

Then union the two DataFrames:

base.union(new_df).show()

Remember that the column names and types in both DataFrames must match.


1 Comment

To clarify, I think the OP probably meant "I want to create a Spark DataFrame" from a Python list, in which case the first part of this answer will suffice. At first I had no idea why this answer was suggesting to union anything, until I looked at the wording of the original question more carefully.

It looks like you want to add a literal value:

from pyspark.sql import functions as f

df = spark.sparkContext.parallelize([('idx',)]).toDF()
res = df.withColumn('literal_col', f.lit('strings'))
res.show(truncate=False)

# output:
+---+-----------+
|_1 |literal_col|
+---+-----------+
|idx|strings    |
+---+-----------+

