6

I have existing logic that converts a pandas dataframe to a list of tuples:

list(zip(*[df[c].values.tolist() for c in df])) 

where df is a pandas dataframe.
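
To spell out what that one-liner does: it takes one list per column and transposes them into one tuple per row. A minimal sketch with plain Python lists standing in for `df[c].values.tolist()` (the `columns` dict and its values are made up for illustration):

```python
# Hypothetical column data standing in for df[c].values.tolist()
columns = {
    "Name": ["name1", "name2", "name3"],
    "Score": [11.23, 14.57, 2.21],
}

# zip(*lists) pairs up the i-th element of every column,
# which yields one tuple per row
rows = list(zip(*[columns[c] for c in columns]))
print(rows)
# [('name1', 11.23), ('name2', 14.57), ('name3', 2.21)]
```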

How can I implement the same logic directly in PySpark, without going through pandas?

2
  • It isn't clear to me what the relation between pandas and spark is and why you're mentioning it. Commented Oct 14, 2019 at 21:57
  • df is created by calling toPandas() on a Spark dataframe. I would like to convert the Spark dataframe directly to a list of tuples. Commented Oct 14, 2019 at 22:22

2 Answers

6

You can first convert the dataframe to an RDD through its rdd attribute. A Row in a dataframe is a tuple subclass, so you can simply map tuple over the rows:

rdd = df.rdd
b = rdd.map(tuple)
b.collect()

Example DF:

df.show()
+-----+-----+
| Name|Score|
+-----+-----+
|name1|11.23|
|name2|14.57|
|name3| 2.21|
|name4| 8.76|
|name5|18.71|
+-----+-----+

After b.collect()

[('name1', 11.23), ('name2', 14.57), ('name3', 2.21), ('name4', 8.76), ('name5', 18.71)]

EDIT

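If you need an actual Python list of tuples, call collect(), which returns the whole result to the driver as a list. If you are only going to loop over the tuples, toLocalIterator() is usually the better choice: it returns an iterator and needs only as much driver memory as the largest partition.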
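
A rough sketch of the difference, assuming `df` is the Spark dataframe from the example above. The helper name `iter_tuples` is made up for illustration; it accepts any iterable of rows, so it can be fed either `df.collect()` or `df.toLocalIterator()`:

```python
from typing import Any, Iterable, Iterator, Tuple


def iter_tuples(rows: Iterable[Any]) -> Iterator[Tuple[Any, ...]]:
    """Lazily convert row-like objects (e.g. pyspark Row) to plain tuples."""
    for row in rows:
        yield tuple(row)


# Hypothetical usage with a Spark dataframe `df` (not executed here):
#   all_rows = list(iter_tuples(df.collect()))      # whole result held on the driver
#   for t in iter_tuples(df.toLocalIterator()):     # rows streamed to the driver
#       ...
```

Because the helper is a generator, nothing is converted until the caller iterates, so wrapping `toLocalIterator()` keeps the streaming behaviour intact.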


4 Comments

I liked your solution, can we do it without collect?
@Thomas collect was only used to show you the output. The solution works without collect.
I have another method that expects a list of tuples, and it doesn't work if I pass b to it, since b is still an RDD.
@Thomas I extended my answer
2

An alternative that uses collect_list instead of collect:

import pyspark.sql.functions as F

df.show()
+-----+-----+
| Name|Score|
+-----+-----+
|name1|11.23|
|name2|14.57|
|name3| 2.21|
|name4| 8.76|
|name5|18.71|
+-----+-----+

@F.udf
def combo(*args):
  # Return the first argument unchanged, i.e. the array built by F.array below
  return args[0]

df.withColumn('Combo', combo(F.array('Name','Score'))).agg(F.collect_list('Combo')).show(truncate=False)

+--------------------------------------------------------------------------+
|collect_list(Combo)                                                       |
+--------------------------------------------------------------------------+
|[[name1, 11.23],[name2, 14.57],[name3, 2.21],[name4, 8.76],[name5, 18.71]]|
+--------------------------------------------------------------------------+


