
I have a dataframe with the following schema:

root
 |-- Id: integer (nullable = true)
 |-- Id_FK: integer (nullable = true)
 |-- Foo: integer (nullable = true)
 |-- Bar: string (nullable = true)
 |-- XPTO: string (nullable = true)

From that dataframe I want to create a JSON file with each column name and its type, as follows:

{
 "Id": "integer",
 "Id_FK": "integer",
 "Foo": "integer",
 "Bar": "string",
 "XPTO": "string"
}

I'm trying to do this using PySpark, but I can't find any way to accomplish it. Can anyone help me?

1 Answer

Here is a solution that first builds a dictionary by iterating over the fields of the schema, then uses json.dumps to convert that dictionary into a JSON string:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
import json

# sample schema matching the question
schema = StructType(
    [
        StructField("Id", IntegerType()),
        StructField("Id_FK", IntegerType()),
        StructField("Foo", IntegerType()),
        StructField("Bar", StringType()),
        StructField("XPTO", StringType()),
    ])

# build a dictionary mapping each column name to its type
col_types = {}
for field in schema:
    # typeName() gives Spark's lowercase type name ("integer", "string");
    # str(field.dataType) would produce "IntegerType()" instead
    col_types[field.name] = field.dataType.typeName()

# convert the dictionary to a JSON string
data = json.dumps(col_types)

# save to file
with open("output.txt", "w") as text_file:
    text_file.write(data)
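
If you already have the DataFrame itself rather than a hand-built schema, you can derive the same mapping straight from its schema attribute and write it in one step with json.dump. A minimal sketch, assuming an existing DataFrame named df and an illustrative output filename:

import json

# assuming `df` is your existing DataFrame; map each column name
# to Spark's lowercase type name, e.g. "integer" or "string"
col_types = {field.name: field.dataType.typeName() for field in df.schema.fields}

# write the mapping directly to a JSON file
with open("column_types.json", "w") as f:
    json.dump(col_types, f, indent=1)

Note that dict(df.dtypes) produces a similar mapping, but with Spark's short type strings ("int" rather than "integer").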