
Let's say I have a df like this:

   id  type  length  key1             key2     key3
0  1   A     144     [value1,value2]  value3
1  1   B     20      value4           [value5]
2  4   A     54                                [value6]

Is there a way in PySpark to get value5 and value6 out of the lists, since they are the only elements? I want to apply this to all cells. The output would be:

   id  type  length  key1             key2     key3
0  1   A     144     [value1,value2]  value3
1  1   B     20      value4           value5
2  4   A     54                                value6
  • Can you please give the answer a try? Commented Jul 2, 2020 at 18:59

1 Answer

    import pyspark.sql.functions as F
    df = df.withColumn('key3', F.when(F.size('key3') == 1, F.col('key3').getItem(0))
                                .otherwise(F.col('key3')))

You can keep adding conditions like this:

    .when(F.size('col_2') == 1, F.col('col_2').getItem(0))

First you need to find the size of the array, and if it is 1, get the first element. I responded from mobile, so please give it a try; I believe it should work.
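
For reference, a self-contained sketch of the same idea (the sample data and the array<string> schema here are assumptions for illustration; multi-element arrays are cast to a string so both branches of the when/otherwise return the same type):

    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Made-up sample data: key3 is an array<string> column.
    df = spark.createDataFrame(
        [(1, ["value1", "value2"]), (2, ["value5"]), (3, ["value6"])],
        ["id", "key3"],
    )

    # If the array has exactly one element, unwrap it; otherwise keep the
    # array, cast to string so both branches have the same type.
    df = df.withColumn(
        "key3",
        F.when(F.size("key3") == 1, F.col("key3").getItem(0))
         .otherwise(F.col("key3").cast("string")),
    )

    df.show(truncate=False)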


5 Comments

This is only for one column though. Also, what is F?
I would like to apply it to all cells in the dataframe.
To get F, run import pyspark.sql.functions as F
Can you please accept the answer if it serves the purpose? Thanks in advance.
@dsk I did not, sorry. Also, is there a way to easily apply this to the whole df? There are many columns.
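
Regarding the follow-up about applying this to the whole df: one possible sketch, assuming the columns to fix are all ArrayType (the schema check and the column loop are additions for illustration, not part of the original answer):

    import pyspark.sql.functions as F
    from pyspark.sql.types import ArrayType

    # Find every array-typed column and unwrap single-element arrays in each.
    # Assumes df already exists; non-array columns are left untouched.
    array_cols = [f.name for f in df.schema.fields if isinstance(f.dataType, ArrayType)]

    for c in array_cols:
        df = df.withColumn(
            c,
            F.when(F.size(c) == 1, F.col(c).getItem(0))
             .otherwise(F.col(c).cast("string")),
        )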
