
I used the following code to replace None values in a DataFrame row with an empty string:

def replaceNone(row):
  row_len = len(row)
  for i in range(0, row_len):
    if row[i] is None:
      row[i] = ""    
  return row

in my pyspark code:

data_out = df.rdd.map(lambda row : replaceNone(row)).map(
  lambda row : "\t".join( [x.encode("utf-8") if isinstance(x, basestring) else str(x).encode("utf-8") for x in row])
)

Then I got the following errors:

File "<ipython-input-10-8e5d8b2c3a7f>", line 1, in <lambda>
  File "<ipython-input-2-d1153a537442>", line 6, in replaceNone
TypeError: 'Row' object does not support item assignment

Does anyone have any idea about the error? How do I replace a "None" value in a row to an empty string? Thanks!

  • try df.replace('None',' '). Commented Jun 9, 2016 at 9:17

1 Answer


Row is a subclass of tuple, and tuples in Python are immutable, hence they don't support item assignment. If you want to replace an item stored in a tuple, you have to rebuild it from scratch:

# replace "" with a placeholder of your choice
tuple(x if x is not None else "" for x in row)
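Since Row is a tuple subclass, the same pattern works on any tuple; a minimal sketch using a plain tuple to stand in for a Row (the function name `replace_none` and the sample values are illustrative, not from the original code):

```python
def replace_none(row):
    # Tuples can't be mutated in place, so build a new one,
    # substituting "" for every None value.
    return tuple(x if x is not None else "" for x in row)

row = ("foo", None, 42)  # stand-in for a pyspark Row
print(replace_none(row))  # ('foo', '', 42)
```

In the original pipeline this would be applied as `df.rdd.map(replace_none)` in place of the mutating version.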

If you simply want to concatenate a flat schema, handling nulls along the way, you can use concat_ws:

from pyspark.sql.functions import concat_ws

df.select(concat_ws("\t", *df.columns)).rdd.flatMap(lambda x: x)

To prepare output, it makes more sense to use the spark-csv package and specify the nullValue, delimiter and quoteMode options.
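A sketch of what that write call might look like, assuming the Databricks spark-csv package is on the classpath and using a placeholder output path (not runnable without a Spark cluster):

```python
# Requires an active SparkSession/SQLContext and the spark-csv package;
# "output_path" is a placeholder, not a real path.
df.write \
    .format("com.databricks.spark.csv") \
    .option("delimiter", "\t") \
    .option("nullValue", "") \
    .option("quoteMode", "MINIMAL") \
    .save("output_path")
```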

