I have a dataframe with a schema as follows:
root
 |-- column: struct (nullable = true)
 |    |-- column-string: string (nullable = true)
 |-- count: long (nullable = true)
What I want to do is:
- Get rid of the struct, i.e. "promote" column-string to the top level, so that my dataframe has only 2 columns: column-string and count
- I then want to split column-string into 3 different columns, so I end up with the schema:
The text within column-string always fits the format: Some-Text,Text,MoreText
Does anyone know how this is possible?
I'm using PySpark (Python).
PS. I am new to PySpark and don't know much about the struct type. I also couldn't work out how to write an example into my post to make it reproducible, sorry.
