I need to expand the JSON objects in Column B into multiple columns.
From this table,
| Column A | Column B |
|---|---|
| id1 | [{a:1,b:'letter1'}] |
| id2 | [{a:1,b:'letter2',c:3,d:4}] |
To this table,
| Column A | a | b | c | d |
|---|---|---|---|---|
| id1 | 1 | letter1 | | |
| id2 | 1 | letter2 | 3 | 4 |
I have tried transforming the dataframe both locally (with pandas) and in Spark, but neither worked.
Locally, I extracted the key-value pairs in Column B with nested loops (that step succeeded).
But when I tried to turn the extracted key-value pairs (a dictionary) into a dataframe, I got this error: "ValueError: All arrays must be of the same length".
It failed because keys c and d are missing from some of the JSON objects.
I referred to this answer:
Expand Dataframe containing JSON object into larger dataframe
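For the local side, the ValueError seems avoidable by building the dataframe row-wise from a list of dicts (or with `pd.json_normalize`), which aligns the keys across records and fills the missing ones with NaN. A minimal sketch with illustrative stand-in data (the `records` list is hypothetical, mirroring the sample table above):

```python
import pandas as pd

# Illustrative stand-in for the key-value pairs extracted by the loops;
# keys c and d are missing from the first record, which is what made the
# column-wise construction raise "All arrays must be of the same length".
records = [
    {"a": 1, "b": "letter1"},
    {"a": 1, "b": "letter2", "c": 3, "d": 4},
]

# Building row-wise from a list of dicts aligns keys across records and
# fills the missing ones with NaN instead of raising.
expanded = pd.DataFrame(records, index=["id1", "id2"])
print(expanded)
```

`pd.json_normalize(records)` gives the same alignment behavior if the dicts are nested.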
In Spark, I got a type error (something like LongType and StringType cannot be recognized) when converting the pandas dataframe to a Spark dataframe.
So I cast the pandas dataframe to strings with df.astype(str), and then I could convert it to a Spark dataframe.
```python
def func(df):
    spark = (
        SparkSession.builder.appName("data")
        .enableHiveSupport()
        .getOrCreate()
    )
    df1 = spark.createDataFrame(df)
```

Now, when I tried to expand it:

```python
for i in df.columns:
    if i == 'a_column':
        # Since the rows became strings instead of lists,
        # I need to remove the first and last characters, which are the [ and ].
        # But I get an error here: "Column is not iterable"
        df.withColumn(i, substring(i, 2, length(i)))
        df.withColumn(i, substring(i, 1, length(i) - 1))
        # Transform each row (a JSON string) into a JSON object.
        # But I get errors here: ValueError: 'json' is not in list; AttributeError: json
        # I assume x.json means converting the row to a JSON object?
        df = df.map(lambda x: x.json)
        print(df.take(10))
```
I referred to these answers, but I can't hardcode the schemas because there are a lot of different JSON columns:
Pyspark: explode json in column to multiple columns
Pyspark: Parse a column of json strings
Could you show me how to do this both locally and in Spark?
Any help is appreciated.