
How to create a PySpark DataFrame inside a loop? On each iteration of the loop I am printing two values with print(a1, a2). Now I want to store all of these values in a PySpark DataFrame.

1 Answer


Before the loop, you can create an empty DataFrame with your preferred schema. Then, on each iteration, create a new single-row DataFrame with the same schema and union it with the original. Refer to the code below. Note that the schema uses IntegerType because the loop produces integers; with a StringType schema, createDataFrame would reject the integer values with a TypeError.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()

# Schema types must match the Python values passed in below
schema = StructType([
    StructField('a1', IntegerType(), True),
    StructField('a2', IntegerType(), True)
])

df = spark.createDataFrame([], schema)

for i in range(1, 5):
    a1 = i
    a2 = i + 1
    # Build a one-row DataFrame and append it to the accumulated result
    newRow = spark.createDataFrame([(a1, a2)], schema)
    df = df.union(newRow)

# show() does the printing itself; wrapping it in print() would also print "None"
df.show()

This gives the result below, where a new row is appended to df on each iteration.

+---+---+
| a1| a2|
+---+---+
|  1|  2|
|  2|  3|
|  3|  4|
|  4|  5|
+---+---+
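
As a side note, calling union inside a loop grows the query plan by one step per iteration, which can become slow when the loop runs many times. If that is a concern, a common alternative is to accumulate plain Python tuples in a list and call createDataFrame once after the loop. A minimal sketch, using the same schema and loop values as above:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField('a1', IntegerType(), True),
    StructField('a2', IntegerType(), True)
])

# Collect the values as plain Python tuples first...
rows = []
for i in range(1, 5):
    a1 = i
    a2 = i + 1
    rows.append((a1, a2))

# ...then build the DataFrame in a single call
df = spark.createDataFrame(rows, schema)
df.show()

This produces the same output as the union-based version, with a single createDataFrame call instead of one per iteration.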