I have a PySpark DataFrame in which some columns contain arrays of strings (and one column contains a nested array). As a result, I cannot write the DataFrame to a CSV file.

Here is an example of the DataFrame I am dealing with:

    +-------+--------------------+---------+
    |ID     |emailed             |clicked  |
    +-------+--------------------+---------+
    |9000316|[KBR, NRT, AOR]     |[[AOR]]  |
    |9000854|[KBR, NRT, LAX]     |Null     |
    |9001996|[KBR, JFK]          |[[JFK]]  |
    +-------+--------------------+---------+

I would like to get the following structure so that it can be saved as a CSV:

    +-------+--------------------+---------+
    |ID     |emailed             |clicked  |
    +-------+--------------------+---------+
    |9000316|KBR, NRT, AOR       |AOR      |
    |9000854|KBR, NRT, LAX       |Null     |
    |9001996|KBR, JFK            |JFK      |
    +-------+--------------------+---------+

I am very new to pyspark. Your help is greatly appreciated. Thank you!

Will the column clicked always have the format [[value]], or can it be [[val1, val2, ...]]? Commented Sep 11, 2017 at 17:51

1 Answer

Can you try it this way? You will have to import the function first.

    from pyspark.sql.functions import concat_ws

    # emailed is already an array column, so no split() is needed
    df.select(concat_ws(',', df.emailed).alias('string_form')).collect()

Let me know if that helps.

-----Update----

The code is explained in the link; I modified it a bit.

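    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def getter(column):
        # A null array reaches the UDF as None; pass it through unchanged
        if column is None:
            return None
        col_new = ''
        for i, col in enumerate(column):
            if i == 0:
                col_new = col  # first element: no leading comma
            else:
                col_new = col_new + ',' + col
        return col_new

    getterUDF = udf(getter, StringType())

    # Replace df.emailed with your own array column
    df.select(getterUDF(df.emailed))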

You can try this as well.
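
For the nested `clicked` column, one possible extension of the UDF above is sketched below. This is only a sketch under assumptions: `join_nested` and `stringify_array_columns` are hypothetical names, the nesting is assumed to be at most one level deep (as in `[[AOR]]`), and null arrays are passed through as null. The pyspark imports sit inside the wrapper so that the plain-Python helper can be tried without Spark.

```python
def join_nested(value, sep=", "):
    """Join a flat or one-level-nested list of strings into one string.

    Hypothetical helper: returns None for a null (None) array and skips
    null elements and null inner arrays.
    """
    if value is None:
        return None
    parts = []
    for item in value:
        if isinstance(item, (list, tuple)):
            # Inner array: add its non-null elements
            parts.extend(str(x) for x in item if x is not None)
        elif item is not None:
            parts.append(str(item))
    return sep.join(parts)


def stringify_array_columns(df, column_names, sep=", "):
    """Return df with each named array column replaced by a joined string."""
    # Imported here so join_nested stays usable without a Spark install
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    join_udf = udf(lambda value: join_nested(value, sep), StringType())
    for name in column_names:
        df = df.withColumn(name, join_udf(df[name]))
    return df
```

With the example data, `stringify_array_columns(df, ['emailed', 'clicked'])` should leave only string columns, which `df.write.csv(...)` can handle. On Spark 2.4 or later, the built-in `flatten` together with `concat_ws` may be a UDF-free alternative.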

4 Comments

Not sure if the above answer will work. As far as I checked, split only works on strings here. You can see this question on Stack Overflow as well: stackoverflow.com/questions/37689878/…
You can use this function:
You have to use i==0 instead of i==1.
Agreed. Editing the same. Thanks.
