I have the following DataFrame:
>>> df.printSchema()
root
|-- I: string (nullable = true)
|-- F: string (nullable = true)
|-- D: string (nullable = true)
|-- T: string (nullable = true)
|-- S: string (nullable = true)
|-- P: string (nullable = true)
Column F holds a dictionary (a JSON object) stored as a string:
{"P1":"1:0.01","P2":"3:0.03,4:0.04","P3":"3:0.03,4:0.04",...}
I need to parse column F as follows and create two new columns, P (the key) and N (the value), with one output row per key:
P1 => "1:0.01"
P2 => "3:0.03,4:0.04"
and so on, so that the result looks like this:
+----+-----+---------------+----+----+----+----+
| I  | P   | N             | D  | T  | S  | P  |
+----+-----+---------------+----+----+----+----+
| i1 | p1  | 1:0.01        | d1 | t1 | s1 | p1 |
| i1 | p2  | 3:0.03,4:0.04 | d1 | t1 | s1 | p1 |
| i1 | p3  | 3:0.03,4:0.04 | d1 | t1 | s1 | p1 |
| i2 | ... | ....          | d2 | t2 | s2 | p2 |
+----+-----+---------------+----+----+----+----+
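To make the target concrete, here is a minimal plain-Python sketch of the reshaping for a single row. It is not PySpark: the function name `explode_f` and the sample row are made up for illustration, and it assumes F is always valid JSON with string values.

```python
import json


def explode_f(row):
    """Return one output tuple per key in the row's F dictionary."""
    # F is a JSON object stored as a string, e.g. '{"P1":"1:0.01", ...}'
    entries = json.loads(row["F"])
    # Output column order follows the table above: I, P (key), N (value), D, T, S, P
    return [
        (row["I"], key, value, row["D"], row["T"], row["S"], row["P"])
        for key, value in entries.items()
    ]


# Hypothetical sample row mirroring the schema above
sample = {
    "I": "i1",
    "F": '{"P1":"1:0.01","P2":"3:0.03,4:0.04","P3":"3:0.03,4:0.04"}',
    "D": "d1",
    "T": "t1",
    "S": "s1",
    "P": "p1",
}

rows = explode_f(sample)
for r in rows:
    print(r)
```

In PySpark, I assume the equivalent would be something like `from_json` with a `MapType(StringType(), StringType())` schema followed by `explode` on the resulting map column, but I have not verified that this handles my data.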
Any suggestions on how to do this in PySpark?