I have a Python list of 10000 elements (10000×1). I want to put it into a Spark DataFrame so that the DataFrame has 10000 rows. How do I do that?
2 Answers
First, create a DataFrame from the list (avoid calling the variable `list`, which shadows the Python built-in; `values` is used here instead):
new_df = spark.createDataFrame([(value,) for value in values], ['id'])
Then union the two DataFrames:
base.union(new_df).show()
Remember that the column names and types in both DataFrames must match.
1 Comment
Kevin E
To clarify, I think the OP probably meant "I want to create a Spark DataFrame" from a Python list, in which case the first part of this answer suffices. At first I had no idea why this answer suggested a union, until I read the wording of the original question more carefully.

It looks like you want to add a literal value:
from pyspark.sql import functions as f

# single-row DataFrame; the auto-generated column is named `_1`
df = spark.sparkContext.parallelize([('idx',)]).toDF()
# lit() attaches the same constant value to every row
res = df.withColumn('literal_col', f.lit('strings'))
res.show(truncate=False)
# output:
+---+-----------+
|_1 |literal_col|
+---+-----------+
|idx|strings    |
+---+-----------+