How can I create a DataFrame of empty structs, please? Thank you.

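from pyspark.sql.types import IntegerType, StructField, StructType

dataxx = []  # no rows, only the schema below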
schema = StructType(
[
    StructField('Info1',
        StructType([
            StructField('fld', IntegerType(),True),
            StructField('fld1', IntegerType(),True),
            StructField('fld2', IntegerType(),True),
            StructField('fld3', IntegerType(),True),
            StructField('fld4', IntegerType(),True),   
            ])
    ),
]
)
df = sqlCtx.createDataFrame(dataxx, schema)

Thank you for your help

4 Comments
  • Not related to pandas; removed. Commented Dec 22, 2019 at 16:04
  • Have you tried spark.createDataFrame([], schema)? Commented Dec 22, 2019 at 20:26
  • Does this answer your question? How to create an empty DataFrame? Why "ValueError: RDD is empty"? Commented Dec 22, 2019 at 20:26
  • @blackbishop Thank you, but it's not really what I mean. I want to create a DataFrame with this struct schema. I have added a picture to make it clearer. Commented Dec 22, 2019 at 20:33

1 Answer

If you want to create a DataFrame that has a specific schema but contains no data, you can simply pass an empty list to the createDataFrame function:

from pyspark.sql.types import IntegerType, StructField, StructType

schema = StructType(
[
    StructField('Info1',
        StructType([
            StructField('fld', IntegerType(),True),
            StructField('fld1', IntegerType(),True),
            StructField('fld2', IntegerType(),True),
            StructField('fld3', IntegerType(),True),
            StructField('fld4', IntegerType(),True),   
            ])
    ),
]
)
df = spark.createDataFrame([], schema)

df.printSchema()

root
 |-- Info1: struct (nullable = true)
 |    |-- fld: integer (nullable = true)
 |    |-- fld1: integer (nullable = true)
 |    |-- fld2: integer (nullable = true)
 |    |-- fld3: integer (nullable = true)
 |    |-- fld4: integer (nullable = true)

Here, spark is a SparkSession instance.

2 Comments

Thank you, David. To set a value in fld2, for example, can I do this: df.Info1.fld2 = 22?
@ceo No, I am afraid it is not going to work like that. If you want to set a value in Info1.fld2 (and have a single row in the DataFrame), you can call the withColumn transformation (or just select), redefine the struct, and use lit(22) for fld2.
