3

Toy example code

Let's say I have following DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame({"A":[11,21,31], "B":[12,22,32], "C":[np.nan,23,33], "D":[np.nan,24,34], "E":[15,25,35]})

Which would return:

>>> df
    A   B     C     D   E
0  11  12   NaN   NaN  15
1  21  22  23.0  24.0  25
2  31  32  33.0  34.0  35

Remove all columns with nan values

I know how to remove all the columns which have any row with a nan value like this:

out1 = df.dropna(axis=1, how="any")

Which returns:

>>> out1
    A   B   E
0  11  12  15
1  21  22  25
2  31  32  35

Expected output

However what I expect is to remove all columns after a nan value is found. In the toy example code the expected output would be:

    A   B
0  11  12
1  21  22
2  31  32

Question

How can I remove all columns after a nan is found within any row in a pandas DataFrame ?

2
  • Your question asks how to remove columns after a nan, so column C should remain (unless the question is how to remove columns with a NaN and the column immediately after it). Commented Oct 9, 2020 at 17:06
  • Maybe I should refrase. The expected output is what I’m looking for. Once a nan is found within a column that column and the remaining ones after that one should be dropped. Commented Oct 9, 2020 at 17:12

2 Answers 2

4

What I would do:

  1. check every element for being null/not null
  2. cumulative sum every row across the columns
  3. check any for every column, across the rows
  4. use that result as an indexer:
df.loc[:, ~df.isna().cumsum(axis=1).any(axis=0)]

Give me:

    A   B
0  11  12
1  21  22
2  31  32
Sign up to request clarification or add additional context in comments.

Comments

0

I could find a way as follows to get the expected output:

colFirstNaN = df.isna().any(axis=0).idxmax() # Find column that has first NaN element in any row
indexColLastValue = df.columns.tolist().index(colFirstNaN) -1
ColLastValue = df.columns[indexColLastValue]
out2 = df.loc[:, :ColLastValue]

And the output would be then:

>>> out2
    A   B
0  11  12
1  21  22
2  31  32

2 Comments

df.iloc[:, :colFirstNaN]?
@Dan if I use that code I get TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [C] of <type 'str'>

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.