How to remove columns after any row has a NaN value in Python pandas dataframe

Question

Toy example code

Let's say I have the following DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame({"A":[11,21,31], "B":[12,22,32], "C":[np.nan,23,33], "D":[np.nan,24,34], "E":[15,25,35]})

Which would return:

>>> df
    A   B     C     D   E
0  11  12   NaN   NaN  15
1  21  22  23.0  24.0  25
2  31  32  33.0  34.0  35

Remove all columns with `nan` values

I know how to remove all the columns which have any row with a nan value like this:

out1 = df.dropna(axis=1, how="any")

Which returns:

>>> out1
    A   B   E
0  11  12  15
1  21  22  25
2  31  32  35

Expected output

However, what I want is to remove every column from the first one that contains a NaN onward. In the toy example, the expected output would be:

    A   B
0  11  12
1  21  22
2  31  32

Question

How can I remove all columns after a NaN is found within any row of a pandas DataFrame?

Your question asks how to remove columns after a NaN, so column C should remain (unless the question is how to remove columns with a NaN and the column immediately after it).
– Alexander, Commented Oct 9, 2020 at 17:06
Maybe I should rephrase. The expected output is what I'm looking for. Once a NaN is found within a column, that column and all the columns after it should be dropped.
– Cedric Zoppolo, Commented Oct 9, 2020 at 17:12
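
The two readings discussed in these comments differ only in whether the first NaN column itself is kept. A minimal sketch of both, assuming the toy DataFrame from the question; the names `has_nan`, `seen`, `after_first`, `inclusive_drop`, and `exclusive_drop` are illustrative and do not come from the original post:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [11, 21, 31], "B": [12, 22, 32], "C": [np.nan, 23, 33],
                   "D": [np.nan, 24, 34], "E": [15, 25, 35]})

# True for each column that contains at least one NaN
has_nan = df.isna().any(axis=0).to_numpy()

# Running logical OR, left to right: True from the first NaN column onward
seen = np.logical_or.accumulate(has_nan)

# Reading 1 (what the asker wants): drop the first NaN column and everything after it
inclusive_drop = df.loc[:, ~seen]

# Reading 2 (the literal "after" reading): keep the first NaN column, drop only what follows
after_first = np.concatenate(([False], seen[:-1]))
exclusive_drop = df.loc[:, ~after_first]

print(list(inclusive_drop.columns))  # ['A', 'B']
print(list(exclusive_drop.columns))  # ['A', 'B', 'C']
```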

Paul H · Accepted Answer · 2020-10-09 16:52:13Z

4

What I would do:

1. Check every element for being null or not null.
2. Take the cumulative sum of every row across the columns.
3. Check `any` for every column, across the rows.
4. Use that result as an indexer:

df.loc[:, ~df.isna().cumsum(axis=1).any(axis=0)]

This gives me:

    A   B
0  11  12
1  21  22
2  31  32
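
To make the one-liner easier to follow, here is a sketch that splits it into its intermediate steps, assuming the toy DataFrame from the question. The variable names (`is_null`, `nan_count_so_far`, `drop_column`, `result`) are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [11, 21, 31], "B": [12, 22, 32], "C": [np.nan, 23, 33],
                   "D": [np.nan, 24, 34], "E": [15, 25, 35]})

# Step 1: boolean frame, True where a value is NaN
is_null = df.isna()

# Step 2: running count of NaNs along each row, moving left to right
nan_count_so_far = is_null.cumsum(axis=1)

# Step 3: a column is dropped if any row has already seen a NaN at or before it
drop_column = nan_count_so_far.any(axis=0)

# Step 4: keep only the columns that are not flagged
result = df.loc[:, ~drop_column]

print(drop_column.tolist())  # [False, False, True, True, True]
print(list(result.columns))  # ['A', 'B']
```

Because the cumulative sum never decreases along a row, every column to the right of the first NaN is flagged as well, even if it contains no NaN itself (column E here).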


Cedric Zoppolo · Answer · 2020-10-09 16:43:36Z

0

I found the following way to get the expected output:

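colFirstNaN = df.isna().any(axis=0).idxmax()  # Label of the first column that has a NaN in any row
indexColFirstNaN = df.columns.get_loc(colFirstNaN)  # Integer position of that column (assumes unique labels)
out2 = df.iloc[:, :indexColFirstNaN]  # Keep only the columns before it
# Note: if no column contains a NaN, idxmax() returns the first column and out2 is empty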

The output would then be:

    A   B
0  11  12
1  21  22
2  31  32
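
A hedged sketch that wraps the same idea in a helper and handles two edge cases explicitly: a frame with no NaN at all (where `idxmax` on an all-False mask returns the first label) and a NaN in the very first column. The function name `drop_from_first_nan` is hypothetical, and returning a copy of the whole frame when there is no NaN is an assumption about the desired behaviour:

```python
import numpy as np
import pandas as pd


def drop_from_first_nan(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns to the left of the first column containing a NaN."""
    has_nan = frame.isna().any(axis=0).to_numpy()
    if not has_nan.any():
        # Assumption: with no NaN anywhere, nothing should be dropped
        return frame.copy()
    first_nan_pos = int(has_nan.argmax())  # position of the first True
    return frame.iloc[:, :first_nan_pos]


df = pd.DataFrame({"A": [11, 21, 31], "B": [12, 22, 32], "C": [np.nan, 23, 33],
                   "D": [np.nan, 24, 34], "E": [15, 25, 35]})
no_nan = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
nan_first = pd.DataFrame({"A": [np.nan, 2], "B": [3, 4]})

print(list(drop_from_first_nan(df).columns))         # ['A', 'B']
print(list(drop_from_first_nan(no_nan).columns))     # ['A', 'B']
print(list(drop_from_first_nan(nan_first).columns))  # []
```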


2 Comments

Dan Over a year ago

df.iloc[:, :colFirstNaN]?

Cedric Zoppolo Over a year ago

@Dan if I use that code I get TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [C] of <type 'str'>
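
The error in this exchange arises because `iloc` slices by integer position, while `colFirstNaN` holds a column label. A small sketch of the two working alternatives, assuming the toy DataFrame from the question and unique column labels; `label_slice`, `position`, and `position_slice` are illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [11, 21, 31], "B": [12, 22, 32], "C": [np.nan, 23, 33],
                   "D": [np.nan, 24, 34], "E": [15, 25, 35]})

colFirstNaN = df.isna().any(axis=0).idxmax()  # a label, here 'C'

# Label-based slicing with loc includes the end label, so column C is kept
label_slice = df.loc[:, :colFirstNaN]

# Convert the label to an integer position, then slice with iloc (end is excluded)
position = df.columns.get_loc(colFirstNaN)
position_slice = df.iloc[:, :position]

print(list(label_slice.columns))     # ['A', 'B', 'C']
print(list(position_slice.columns))  # ['A', 'B']
```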
