I have a PySpark dataframe and want to add a column that assigns values from a list in a repeating fashion. If this were plain Python, I would probably use itertools' cycle function, but I don't know how to do the equivalent in PySpark.

names = ['Julia', 'Tim', 'Zoe']
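
For reference, this is roughly the plain-Python version with itertools.cycle that I have in mind (the ids list here is just a stand-in for the dataframe column):

from itertools import cycle

ids = ['a', 'b', 'b', 'b', 'b', 'b']
pairs = list(zip(ids, cycle(names)))
# [('a', 'Julia'), ('b', 'Tim'), ('b', 'Zoe'), ('b', 'Julia'), ('b', 'Tim'), ('b', 'Zoe')]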

My dataframe looks like this:

+-----+------+
| id_A| idx_B|
+-----+------+
|    a|     0|
|    b|     0|
|    b|     2|
|    b|     2|
|    b|     2|
|    b|     2|
+-----+------+

I want it to look like this:

+-----+------+--------+
| id_A| idx_B| names  |
+-----+------+--------+
|    a|     0|   Julia|
|    b|     0|     Tim|
|    b|     2|     Zoe|
|    b|     2|   Julia|
|    b|     2|     Tim|
|    b|     2|     Zoe|
+-----+------+--------+

1 Answer

Here's one way.

1 - add a unique, incremental id to your dataframe:

from pyspark.sql import Row

df = spark.createDataFrame(
    df.rdd.zipWithIndex().map(lambda x: Row(*x[0], x[1]))  # append the index to each original row
).toDF("id_A", "idx_B", "id")

df.show()
#+----+-----+---+
#|id_A|idx_B| id|
#+----+-----+---+
#|   a|    0|  0|
#|   b|    0|  1|
#|   b|    2|  2|
#|   b|    2|  3|
#|   b|    2|  4|
#|   b|    2|  5|
#+----+-----+---+
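
If you'd rather not drop to the RDD API, here is a sketch of the same step using monotonically_increasing_id plus row_number (the increasing id alone is not contiguous, hence the window; note the window has no partitioning, so it collects all rows into a single partition):

from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = df.withColumn(
    "id",
    F.row_number().over(Window.orderBy(F.monotonically_increasing_id())) - 1  # 0-based contiguous id
)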

2 - create a dataframe from the list of names:

names_df = spark.createDataFrame([(idx, name) for idx, name in enumerate(names)], ["name_id", "names"])
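
which should give:

names_df.show()
#+-------+-----+
#|name_id|names|
#+-------+-----+
#|      0|Julia|
#|      1|  Tim|
#|      2|  Zoe|
#+-------+-----+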

3 - join on the row id modulo the length of the names list:

from pyspark.sql import functions as F

result = df.join(
    names_df,
    F.col("id") % len(names) == F.col("name_id")  # cycle through the names list
).orderBy("id").drop("id", "name_id")

result.show()
#+----+-----+-----+
#|id_A|idx_B|names|
#+----+-----+-----+
#|   a|    0|Julia|
#|   b|    0|  Tim|
#|   b|    2|  Zoe|
#|   b|    2|Julia|
#|   b|    2|  Tim|
#|   b|    2|  Zoe|
#+----+-----+-----+
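
As a side note, the join can be skipped entirely by indexing into an array literal; a sketch assuming Spark 2.4+ for element_at (which is 1-based, hence the + 1):

names_array = F.array(*[F.lit(n) for n in names])

result = df.withColumn(
    "names",
    # element_at uses 1-based indexing and expects an int index
    F.element_at(names_array, (F.col("id") % len(names) + 1).cast("int"))
).drop("id")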