
I have a dataframe with the following schema:

root
 |-- Id: integer (nullable = true)
 |-- Id_FK: integer (nullable = true)
 |-- Foo: integer (nullable = true)
 |-- Bar: string (nullable = true)
 |-- XPTO: string (nullable = true)

From that dataframe I want to create a JSON file with each column name and its type, as follows:

{
 "Id": "integer",
 "Id_FK": "integer",
 "Foo": "integer",
 "Bar": "string",
 "XPTO": "string"
}

I'm trying to do this using PySpark, but I can't find any way to accomplish it. Can anyone help me?

1 Answer

Here is a solution that first builds a dictionary by iterating over the fields of the schema, then uses json.dumps to convert that dictionary into a JSON string:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
import json

# sample schema matching the question
schema = StructType(
    [
        StructField("Id", IntegerType()),
        StructField("Id_FK", IntegerType()),
        StructField("Foo", IntegerType()),
        StructField("Bar", StringType()),
        StructField("XPTO", StringType()),
    ])

# build a dictionary mapping each column name to its type
col_types = {}
for field in schema:
    # typeName() gives Spark's lowercase type name ("integer", "string");
    # str(field.dataType) would produce "IntegerType()" instead
    col_types[field.name] = field.dataType.typeName()

# convert the dictionary to a JSON string
data = json.dumps(col_types)

# save to file
with open("output.txt", "w") as text_file:
    text_file.write(data)
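
If you already have the DataFrame itself rather than a hand-built schema, you can derive the same mapping straight from its schema attribute and write it in one step with json.dump. A minimal sketch, assuming an existing DataFrame named df and an illustrative output filename:

import json

# assuming `df` is your existing DataFrame; map each column name
# to Spark's lowercase type name, e.g. "integer" or "string"
col_types = {field.name: field.dataType.typeName() for field in df.schema.fields}

# write the mapping directly to a JSON file
with open("column_types.json", "w") as f:
    json.dump(col_types, f, indent=1)

Note that dict(df.dtypes) produces a similar mapping, but with Spark's short type strings ("int" rather than "integer").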