Python drop columns in string range

Question

I want to drop all columns whose name starts with 'var' and whose content is 'None'. Sample of my dataframe:

id var1 var2 newvar1 var3 var4 newvar2
1  x    y    dt      None f    None

Dataframe that I want:

id var1 var2 newvar1 var4 newvar2
1  x    y    dt      f    None

I want to do this for several files, and I do not know how many 'var' columns each of them has. My dataframe has only one row. Here is the code that I tried:

for i in range(1,300):
    df.drop(df.loc[df['var'+str(i)] == 'None' ].index, inplace=True)

Error obtained:

KeyError: 'var208'

I also tried:

df.drop(df.loc[df['var'+str(i) for i in range(1,300)] == 'None'].index, inplace=True)

SyntaxError: invalid syntax

Could anyone help me improve my code?

bitflip · Accepted Answer · 2022-09-24 11:14:53Z

3

The KeyError occurs because the loop asks for a column ('var208') that does not exist in your DataFrame. Instead of guessing the names, iterate over df.columns, keep the names for which col.startswith("var") is true, and use df[col].isnull().all() to check whether every value in that column is None.

import pandas as pd

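# Sample data from the question; None here is the Python object, not the string "None"
df = pd.DataFrame(columns=["id", "var1", "var2", "newvar1", "var3", "var4", "newvar2"],
                  data=[[1, "x", "y", "dt", None, "f", None]])

# Drop every column whose name starts with "var" and whose values are all missing
df.drop([col for col in df.columns if col.startswith("var") and df[col].isnull().all()], axis=1, inplace=True)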
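
Since the question mentions several files, the same check could be wrapped in a small helper that returns a new DataFrame instead of modifying the original in place. This is only a sketch: the function name `drop_empty_prefixed` and its `prefix` parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def drop_empty_prefixed(df: pd.DataFrame, prefix: str = "var") -> pd.DataFrame:
    """Return a copy of df without the columns that start with `prefix`
    and contain only missing values (hypothetical helper)."""
    to_drop = [
        col
        for col in df.columns
        # str() guards against non-string column labels
        if str(col).startswith(prefix) and df[col].isna().all()
    ]
    return df.drop(columns=to_drop)


df = pd.DataFrame(
    columns=["id", "var1", "var2", "newvar1", "var3", "var4", "newvar2"],
    data=[[1, "x", "y", "dt", None, "f", None]],
)

result = drop_empty_prefixed(df)
print(list(result.columns))
```

Because the helper only looks at the columns that actually exist, it does not matter how many 'var' columns a given file has.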

edited Sep 24, 2022 at 11:14

answered Sep 24, 2022 at 11:09

bitflip


4 Comments

MG Fern Over a year ago

Thank you for your suggestion, but it does not work. Maybe because what I have is "None" not "NaN"

bitflip Over a year ago

Is your "None" a string? Or is it None like I defined above?

bitflip Over a year ago

If it is a string (because you read it from a file and it maybe didn't get converted), try replacing df[col].isnull().all() with df[col].isin(['None']).all()

bitflip Over a year ago

Great, you're welcome :)
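
For reference, the string variant suggested in the comments could look roughly like this. The sketch assumes the missing values are the literal string "None" (for example after reading from a text file); the sample data is adapted from the question accordingly.

```python
import pandas as pd

# Same layout as the question, but with the string "None" instead of the None object
df = pd.DataFrame(
    columns=["id", "var1", "var2", "newvar1", "var3", "var4", "newvar2"],
    data=[[1, "x", "y", "dt", "None", "f", "None"]],
)

# Columns that start with "var" and hold nothing but the string "None"
str_none_cols = [
    col
    for col in df.columns
    if col.startswith("var") and df[col].isin(["None"]).all()
]

out = df.drop(columns=str_none_cols)
print(out)
```

With this data, isnull() would report no missing values at all, which would explain why the original check dropped nothing.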

Ynjxsjmh · Accepted Answer · 2022-09-24 11:21:22Z

1

Let's try

out = df.drop(columns=df.filter(regex='^var').isna().all().pipe(lambda s: s.index[s]))

print(out)

   id var1 var2 newvar1 var4 newvar2
0   1    x    y      dt    f    None

Step by step explanation

out = df.drop(columns=(df.filter(regex='^var')       # keep only columns whose header starts with "var"
                       .isna()                       # mark missing values (None/NaN)
                       .all()                        # True where the whole column is missing
                       .pipe(lambda s: s.index[s]))  # names of the columns that are entirely missing
              )
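
To see what each step produces, the chain can also be split into named intermediates. This is a sketch using the sample data from the question; the variable names are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    columns=["id", "var1", "var2", "newvar1", "var3", "var4", "newvar2"],
    data=[[1, "x", "y", "dt", None, "f", None]],
)

# Step 1: columns whose name starts with "var" (newvar1/newvar2 do not match '^var')
var_cols = df.filter(regex="^var")

# Step 2: boolean Series indexed by column name, True if the whole column is missing
all_missing = var_cols.isna().all()

# Step 3: the column names where that flag is True
to_drop = all_missing.index[all_missing]

out = df.drop(columns=to_drop)
print(list(var_cols.columns))
print(list(to_drop))
```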

answered Sep 24, 2022 at 11:21

Ynjxsjmh


2 Comments

MG Fern Over a year ago

Thank you for your suggestion, but it does not work. Maybe because what I have is "None" not "NaN"

Ynjxsjmh Over a year ago

@MGFern How about doing df = df.replace('None', None) before the drop?
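
A sketch of that replace-then-drop idea, assuming the values are the string "None". It uses `np.nan` as the replacement rather than `None`, because passing `None` as the replacement value has, as far as I know, not behaved the same way across pandas versions; that substitution is a choice made here, not part of the comment above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    columns=["id", "var1", "var2", "newvar1", "var3", "var4", "newvar2"],
    data=[[1, "x", "y", "dt", "None", "f", "None"]],
)

# Turn the string "None" into a real missing value everywhere
cleaned = df.replace("None", np.nan)

# Then apply the same filter/isna/all logic as in the answer
empty_var_cols = cleaned.filter(regex="^var").isna().all()
out = cleaned.drop(columns=empty_var_cols.index[empty_var_cols])
print(out)
```

Note that the replacement also affects columns that are kept, so newvar2 ends up as NaN instead of the string "None".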
