I have a PySpark DataFrame df, like the following:

+---+----+---+
| id|name|  c|
+---+----+---+
|  1|   a|  5|
|  2|   b|  4|
|  3|   c|  2|
|  4|   d|  3|
|  5|   e|  1|
+---+----+---+

I want to add a column match_name that holds the name from the row whose id equals the current row's c.

Is it possible to do this with withColumn()?

Currently I have to create two DataFrames and then perform a join, which is inefficient on a large dataset.

Expected Output:

+---+----+---+----------+
| id|name|  c|match_name|
+---+----+---+----------+
|  1|   a|  5|         e|
|  2|   b|  4|         d|
|  3|   c|  2|         b|
|  4|   d|  3|         c|
|  5|   e|  1|         a|
+---+----+---+----------+
  • Possible duplicate of pyspark conditions on multiple columns and returning new column Commented Nov 3, 2017 at 12:02
  • What is c?? Where is the original match column?? Commented Nov 3, 2017 at 12:26
  • c and match are the same column; I renamed it for simplicity. Commented Nov 3, 2017 at 12:27
  • No answers then? Commented Nov 6, 2017 at 5:11

1 Answer

Yes, it is possible, with when:

from pyspark.sql.functions import when, col

condition = col("id") == col("match")
result = df.withColumn("match_name", when(condition, col("name")))

result.show()

id name match match_name
1  a    3     null
2  b    2     b
3  c    5     null
4  d    4     d
5  e    1     null

You may also use otherwise to provide a different value if the condition is not met.

7 Comments

... faster by some seconds (+1)... :)
from pyspark.sql import functions as func
df = spark.createDataFrame([(1, 'a', 5), (2, 'b', 4), (3, 'c', 2), (4, 'd', 3), (5, 'e', 1)], ['id', 'name', 'c'])
condition = func.col("id") == func.col("c")
result = df.withColumn("match_name", func.when(condition, func.col("name")))
result.show()
Sir, I want to take the name from the row where id is 3 and add it to the row with id 1.
@JugrajSingh 1) please, do not put long code snippets in the comments - they are unreadable 2) of course you do, because in these (new) data the condition is nowhere matched! I have confirmed that the solution with the data you provided originally in your post is indeed as shown and as expected...
Agreed, and apologies, but my requirement is different: it is to fill every row with its possible match_name, like a join does.