How to replace a string in a column with other string from the same column

Question

I have the DataFrame below.

id,code
1,GSTR
2,GSTR
3,NA
4,NA
5,NA

Here GSTR is only an example; it can be any string. I want to replace NA with the other string that is present in the same column.

In this case that means replacing NA with GSTR. I tried to use UDFs, but because the replacement string is not known in advance, I could not figure out how to do it.

Note: the code column will contain only two distinct strings. One is "NA" and the other can be anything; in this example it is GSTR.

Expected output

1,GSTR
2,GSTR
3,GSTR
4,GSTR
5,GSTR

Will the code column always have only two values, 'NA' and some other string?
– Suresh, commented Jan 5, 2018 at 10:17

Suresh · Accepted Answer · 2018-01-05 10:55:59Z

We can take the distinct string other than NA and use it:

>>> from pyspark.sql import functions as F
>>> df = spark.createDataFrame([(1,'GSTR'),(2,'GSTR'),(3,'NA'),(4,'NA'),(5,'NA')],['id','code'])
>>> df.show()
+---+----+
| id|code|
+---+----+
|  1|GSTR|
|  2|GSTR|
|  3|  NA|
|  4|  NA|
|  5|  NA|
+---+----+
>>> rstr = df.where(df.code != 'NA')[['code']].first().code
>>> df.withColumn('code',F.lit(rstr)).show()
+---+----+
| id|code|
+---+----+
|  1|GSTR|
|  2|GSTR|
|  3|GSTR|
|  4|GSTR|
|  5|GSTR|
+---+----+

Hope this helps.

edited Jan 5, 2018 at 10:55

answered Jan 5, 2018 at 10:23

4 Comments

user8510536 Over a year ago

Thanks for your input. GSTR can be anywhere, not only in the first position. Can you do anything about that?

Suresh Over a year ago

@AshSr, code will have only two values, and we keep only the rows that are not NA, so every remaining row contains GSTR. Taking the first of those rows gives us the string dynamically, wherever it appears in the column.

user8510536 Over a year ago

Okay Suresh, suppose I have a few more columns, like code1 and code2, holding the same type of data. Should I write code for each and every column? Can't we make that dynamic?

Suresh Over a year ago

Let us continue this discussion in chat.
