
I added a column to a DataFrame that is an array of the other columns.


How can I ignore the null values when I construct the name_source column? For example, the row for Robert should show [internet, Robert] instead of [internet, Robert,], and the row with a null name should show [internet, 65878] instead of [internet,, 65878].

  • Please let me know whether the answer using filter works for you. Commented Jul 17, 2021 at 0:13

3 Answers


In PySpark you can drop the nulls from the existing array column with the SQL higher-order function filter (available since Spark 2.4):

from pyspark.sql.functions import expr
df = df.withColumn('name_source', expr('filter(name_source, x -> x is not null)'))

Scala:

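import org.apache.spark.sql.functions.{col, filter}  // functions.filter requires Spark 3.0+
df.withColumn("name_source", filter(col("name_source"), x => x.isNotNull))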
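To make the per-row effect of the filter lambda concrete, here is a plain-Python model (not Spark code; `filter_not_null` and the sample rows are hypothetical, shaped after the rows described in the question). It keeps every non-null element in its original order and leaves duplicates alone.

```python
from typing import List, Optional


def filter_not_null(values: List[Optional[str]]) -> List[str]:
    """Plain-Python model of Spark SQL's filter(arr, x -> x is not null)."""
    # Keep each element that is not None, preserving order; duplicates survive.
    return [v for v in values if v is not None]


# Hypothetical rows shaped like the ones in the question.
rows = [
    ["internet", "Robert", None],
    ["internet", None, "65878"],
]
for row in rows:
    print(filter_not_null(row))
# ['internet', 'Robert']
# ['internet', '65878']
```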

2 Comments

Can you please provide a Scala version?
@LearnHadoop I have updated the answer; could you check whether that helps?

Assuming you want to build the new column from all the columns in the initial DataFrame (pandas):

df["name_source"] = df.apply(lambda row: row.dropna().values, axis=1)



Build an array from all the columns, then remove the nulls with array_except, casting the null literal to string so its type matches the array's element type:

from pyspark.sql.functions import array, array_except, col, lit
df.withColumn("name_source", array(*df.columns)).withColumn("name_source", array_except(col("name_source"), array(lit(None).cast("string")))).show(truncate=False)
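One behavior of array_except is worth seeing before choosing it over the filter approach: it returns the elements of the first array that are not in the second *without duplicates*, so a row with a repeated value also loses the repeat. The plain-Python model below is not Spark code; `array_except_model` is a hypothetical helper, and keeping first-occurrence order is an assumption of the model.

```python
from typing import Iterable, List, Optional


def array_except_model(
    left: Iterable[Optional[str]], right: Iterable[Optional[str]]
) -> List[Optional[str]]:
    """Plain-Python model of Spark's array_except(left, right):
    elements of `left` that are not in `right`, without duplicates."""
    exclude = set(right)  # None is hashable, so a null can be excluded too
    seen = set()
    result = []
    for v in left:
        # Skip excluded values (here, None) and values already emitted.
        if v in exclude or v in seen:
            continue
        seen.add(v)
        result.append(v)
    return result


print(array_except_model(["internet", "Robert", None], [None]))
# ['internet', 'Robert']
print(array_except_model(["internet", "Robert", "Robert", None], [None]))
# ['internet', 'Robert']  (the duplicate "Robert" is dropped as well)
```

If repeated values must be kept, filtering out nulls element by element is the safer choice.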

