I have a dataframe with a schema as follows:

root
 |-- column: struct (nullable = true)
 |    |-- column-string: string (nullable = true)
 |-- count: long (nullable = true)  

What I want to do is:

  1. Get rid of the struct - or by that I mean "promote" column-string, so my dataframe only has 2 columns - column-string and count
  2. I then want to split column-string into 3 different columns, so I end up with the schema:

[image: target schema]

The text within column-string always fits the format: Some-Text,Text,MoreText

Does anyone know how this is possible?

I'm using PySpark (Python).

PS. I am new to PySpark and don't know much about the struct format, and I couldn't find out how to write an example into my post to make it reproducible - sorry.

2 Answers

You can also use from_csv to convert the comma-delimited string into a struct, and then star expand the struct:

import pyspark.sql.functions as F

df2 = df.withColumn(
    'col',
    F.from_csv(
        'column.column-string',
        '`column-string` string, `column-string2` string, `column-string3` string'
    )
).select('col.*', 'count')

df2.show()
+-------------+--------------+--------------+-----+
|column-string|column-string2|column-string3|count|
+-------------+--------------+--------------+-----+
|     SomeText|          Text|      MoreText|    1|
+-------------+--------------+--------------+-----+

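Note that it's better to avoid hyphens in column names: in SQL expressions a hyphen is parsed as the subtraction operator, so such names have to be quoted with backticks (as in the schema string above). Underscores are a safer choice.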

You can select the column-string field from the struct using column.column-string, then simply split it on commas to get three columns:

from pyspark.sql import functions as F

df1 = df.withColumn(
    "column_string", F.split(F.col("column.column-string"), ",")
).select(
    F.col("column_string")[0].alias("column-string"),
    F.col("column_string")[1].alias("column-string2"),
    F.col("column_string")[2].alias("column-string3"),
    F.col("count")
)
