3

I want to drop all columns whose name starts by 'var' and whose content is 'None'. Sample of my dataframe:

id var1 var2 newvar1 var3 var4 newvar2
1  x    y    dt      None f    None

Dataframe that I want:

id var1 var2 newvar1 var4 newvar2
1  x    y    dt      f    None

I want to do this for several files and I do not know how many 'var' I have in all of them. My dataframe has only one row. Here is the code that I tried:

for i in range(1,300):
    df.drop(df.loc[df['var'+str(i)] == 'None' ].index, inplace=True)

Error obtained:

KeyError: 'var208'

I also tried:

df.drop(df.loc[df['var'+str(i) for i in range(1,300)] == 'None'].index, inplace=True)
SyntaxError: invalid syntax

Could anyone help me improve my code?

2 Answers 2

3

Your error occurs because you have no column with that name. You can use df.columns to get a list of available columns, check if the name .startswith("var") and use df[col].isnull().all() to check if all values are None.

import pandas as pd

df = pd.DataFrame(columns=["id", "var1", "var2", "newvar1", "var3", "var4", "newvar2"],
                  data=[[1, "x", "y", "dt", None, "f", None]])


df.drop([col for col in df.columns if col.startswith("var") and df[col].isnull().all()], axis=1, inplace=True)
Sign up to request clarification or add additional context in comments.

4 Comments

Thank you for your suggestion, but it does not work. Maybe because what I have is "None" not "NaN"
Is your "None" a string? Or is it None like I defined above?
If it is a string (because you read it from a file and it maybe didnt get converted), try to replace df[col].isnull().all() with df[col].isin(['None']).all()
Great, you're welcome :)
1

Let's try

out = df.drop(columns=df.filter(regex='^var').isna().all().pipe(lambda s: s.index[s]))
print(out)

   id var1 var2 newvar1 var4 newvar2
0   1    x    y      dt    f    None

Step by step explanation

out = df.drop(columns=(df.filter(regex='^var')       # get columns where column header starts with var
                       .isna()                       # is the value none
                       .all()                        # is the whole column none
                       .pipe(lambda s: s.index[s]))  # get the index name where the whole column is none )

2 Comments

Thank you for your suggestion, but it does not work. Maybe because what I have is "None" not "NaN"
@MGFern How about doing df = df.replace('None', None) before the drop?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.