I'm using Spark 2.0.1.

 df.show()
+--------+------+---+-----+-----+----+
|Survived|Pclass|Sex|SibSp|Parch|Fare|
+--------+------+---+-----+-----+----+
|     0.0|   3.0|1.0|  1.0|  0.0| 7.3|
|     1.0|   1.0|0.0|  1.0|  0.0|71.3|
|     1.0|   3.0|0.0|  0.0|  0.0| 7.9|
|     1.0|   1.0|0.0|  1.0|  0.0|53.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.5|
|     0.0|   1.0|1.0|  0.0|  0.0|51.9|

I have a DataFrame and I want to add a new column to df using withColumn, where the value of the new column is based on the value of another column. I used something like this:

>>> dfnew = df.withColumn('AddCol' , when(df.Pclass.contains('3.0'),'three').otherwise('notthree'))

It gives this error:

TypeError: 'Column' object is not callable

Can anyone help me overcome this error?

2 Answers

10

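It's because you are calling contains on the column, and contains does not exist in PySpark 2.0.1, the version you are using. An unknown attribute on a Column is treated as a nested field lookup and returns another Column, so calling it raises "'Column' object is not callable". Try like instead: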

import pyspark.sql.functions as F

# like only acts as a substring match when the pattern has % wildcards
df = df.withColumn("AddCol", F.when(F.col("Pclass").like("%3%"), "three").otherwise("notthree"))

Or if you just want it to be exactly the number 3 you should do:

import pyspark.sql.functions as F

# If the column Pclass is numeric
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit(3),"three").otherwise("notthree"))

# If the column Pclass is string
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit("3"),"three").otherwise("notthree"))

2 Comments

The function contains is described in the documentation without any "New in version X.X" warning, as seen here. Any idea why it is not available?
In this case, the user was using PySpark 2.0.1, in which contains is not available. Check your PySpark version, because contains is only available from 2.2 onward. Cheers.
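
The error message itself can look odd, since a missing method would normally raise AttributeError rather than TypeError. Here is a minimal sketch in plain Python (no Spark needed) of how that can happen, assuming the column class resolves unknown attribute names as nested field lookups. FakeColumn is a hypothetical stand-in written for this illustration, not the real pyspark.sql.Column.

```python
class FakeColumn:
    """Hypothetical stand-in for a Spark column; not the real pyspark class."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Only called when normal attribute lookup fails.
        # Assumption: unknown names are treated as nested field access,
        # so a new column object is returned instead of raising AttributeError.
        if item.startswith("__"):
            raise AttributeError(item)
        return FakeColumn(self.name + "." + item)


pclass = FakeColumn("Pclass")

# "contains" is not defined on the class, so this is a field lookup
# that silently yields another FakeColumn.
attr = pclass.contains

try:
    attr("3.0")  # calling a column object, not a method
    message = None
except TypeError as exc:
    message = str(exc)

print(attr.name)
print(message)
```

So the name lookup succeeds and only the call fails, which is why the traceback complains about the object not being callable.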
0

To get the equivalent of contains, use the like function with % before and after the string you are searching for:

from pyspark.sql.functions import when

dfnew = df.withColumn('AddCol', when(df.Pclass.like('%3.0%'), 'three').otherwise('notthree'))

As noted in the comments, in newer versions of PySpark the contains function can be used instead.
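
To see why the % wildcards matter, here is a rough sketch of SQL LIKE matching in plain Python. It is only an approximation for illustration (sql_like is a hypothetical helper, escape characters are ignored, and this is not how Spark implements like). It also assumes a numeric Pclass of 3.0 is compared as the string "3.0".

```python
import re


def sql_like(value, pattern):
    """Approximate SQL LIKE: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Everything else must match literally.
            parts.append(re.escape(ch))
    # LIKE has to match the whole value, not just a part of it.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


print(sql_like("3.0", "3"))       # False: no wildcards means exact match
print(sql_like("3.0", "%3.0%"))   # True: behaves like "contains"
print(sql_like("13.0", "%3.0%"))  # True: substring match also hits 13.0
print(sql_like("1.0", "%3.0%"))   # False
```

Note the third case: a substring match on '3.0' would also accept a value such as 13.0, so an equality comparison is the safer choice when you want exactly 3.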
