I have a pyspark.sql.dataframe.DataFrame with four columns of strings such as:

id      col1                    col2                 col3
z10234  Header One : teacher    Header Two : salary  Header Three : 12
z10235  Header One : plumber    Header Two : hourly  Header Three : 15
z10236  Header One : executive  Header Two : salary  Header Three : 17
z10237  Header One : teacher    Header Two : salary  Header Three : 15
z10238  Header One : manager    Header Two : hourly  Header Three : 11

I need to split each string in col1, col2, and col3 so that the part before the colon becomes the column header and the part after it becomes the value:

id      HeaderOne  HeaderTwo  HeaderThree
z10234  teacher    salary     12
z10235  plumber    hourly     15
z10236  executive  salary     17
z10237  teacher    salary     15
z10238  manager    hourly     11

2 Answers

You can split by colon to get the first part as the column names and the second part as the column values:

import pyspark.sql.functions as F

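# The first row supplies the header text: take the part before ' : ' in each non-id column
names = df.limit(1).select(
    [F.split(c, ' : ')[0].alias(c) for c in df.columns[1:]]
).head().asDict()

# Keep the part after ' : ' as the value and rename each column to its header
df2 = df.select(
    'id',
    *[F.split(c, ' : ')[1].alias(names[c]) for c in df.columns[1:]]
)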

df2.show()
+------+----------+----------+------------+
|    id|Header One|Header Two|Header Three|
+------+----------+----------+------------+
|z10234|   teacher|    salary|          12|
|z10235|   plumber|    hourly|          15|
|z10236| executive|    salary|          17|
|z10237|   teacher|    salary|          15|
|z10238|   manager|    hourly|          11|
+------+----------+----------+------------+
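
The question asks for headers without spaces (HeaderOne), while the result above keeps them (Header One). A minimal sketch of that last renaming step, written in plain Python so it can be checked without a Spark session. The `names` dict here is a hard-coded stand-in for the one built from the first row above, and `normalize` and `clean_names` are hypothetical names introduced only for illustration:

```python
# Hard-coded stand-in for the `names` dict built from the first row (assumption).
names = {"col1": "Header One", "col2": "Header Two", "col3": "Header Three"}

def normalize(header: str) -> str:
    """Drop all whitespace from a header label."""
    return "".join(header.split())

# Map each source column to its space-free header.
clean_names = {col: normalize(label) for col, label in names.items()}
print(clean_names)
```

If this fits your data, aliasing with `clean_names[c]` instead of `names[c]` in the second select should give the headers in the form the question asks for.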

Another approach: build a map (MapType) for each column, concatenate the maps, explode the result, and then group by id and pivot on the key:

from pyspark.sql import functions as F

columns = ['col1', 'col2', 'col3']

def fun(c):
    # Each string has exactly two parts: "header : value".
    # Trim both parts so the padding around the colon does not end up in the names or values.
    parts = F.split(c, ':')
    return F.create_map(F.trim(parts[0]), F.trim(parts[1]))

out = (df.select("id", *[fun(x).alias(x) for x in columns])
         .select("id", F.explode(F.map_concat(*columns)))
         .groupby("id").pivot("key").agg(F.first("value")))

out.show()

+------+----------+------------+----------+
|    id|Header One|Header Three|Header Two|
+------+----------+------------+----------+
|z10234|   teacher|          12|    salary|
|z10235|   plumber|          15|    hourly|
|z10236| executive|          17|    salary|
|z10237|   teacher|          15|    salary|
|z10238|   manager|          11|    hourly|
+------+----------+------------+----------+
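
To make the intermediate shape easier to picture, here is a rough plain-Python mirror of the explode and pivot steps on two of the sample rows. It is only an illustration of the reshaping, not Spark code; `rows`, `long_rows`, and `wide` are names made up for this sketch:

```python
# Two sample rows from the question, as plain dicts (illustration only, not Spark).
rows = [
    {"id": "z10234", "col1": "Header One : teacher",
     "col2": "Header Two : salary", "col3": "Header Three : 12"},
    {"id": "z10235", "col1": "Header One : plumber",
     "col2": "Header Two : hourly", "col3": "Header Three : 15"},
]
columns = ["col1", "col2", "col3"]

# "Explode": one (id, key, value) record per source column.
long_rows = []
for row in rows:
    for c in columns:
        key, value = (part.strip() for part in row[c].split(":", 1))
        long_rows.append((row["id"], key, value))

# "Group by id and pivot on key": fold the long records back into one dict per id.
wide = {}
for id_, key, value in long_rows:
    wide.setdefault(id_, {"id": id_})[key] = value

for record in wide.values():
    print(record)
```

The long form has one record per id and header, which is why the pivot can turn each distinct key into its own column.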
