
I have a DataFrame that looks like this:

                                    Label                   Type  
Name                                                              
ppppp                         Base brute          UnweightedBase  
pbaaa                               Base                    Base  
pb4a1                      Très à gauche                Category 
pb4a2                           A gauche   pb4a2        Category  
pb4a3                          Au centre   pb4a3        Category  
pb4a4                           A droite   pb4a4        Category  

If the value in the "Type" column is "UnweightedBase" or "Base", I would like to delete that row from the data.

I can do this, but only for one value at a time, with the following code:

to_del = df[df['Type'] == "UnweightedBase"].index.tolist()

df = df.drop(to_del, axis=0)
return df

How do I modify my code so that I can delete rows matching more than one value at once?

My failed attempt:

to_del = df[df['Type'] in ["UnweightedBase","Base"]].index.tolist()

df = df.drop(to_del, axis=0)
return df
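For context, here is a minimal sketch of why this attempt fails. Python's `in` operator on a list compares the Series with each list element using `==`, which yields a boolean Series, and pandas refuses to collapse that Series into a single True/False. The one-column frame below is an assumed, trimmed-down stand-in for the table above.

```python
import pandas as pd

# Assumed sample data: only the "Type" column from the question's table
df = pd.DataFrame({"Type": ["UnweightedBase", "Base", "Category", "Category"]})

error_message = None
try:
    # `x in some_list` compares x with each element via ==; for a Series that
    # comparison is element-wise, and turning the result into a bool fails
    df["Type"] in ["UnweightedBase", "Base"]
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The vectorized counterpart of `in` for a Series is `Series.isin`, which is what the answer below relies on.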

1 Answer


You could select the rows you want to keep and reassign the resulting DataFrame to df:

In [60]: df = df.loc[~df['Type'].isin(['UnweightedBase', 'Base'])]

In [61]: df
Out[61]: 
    Name              Label      Type
2  pb4a1      Très à gauche  Category
3  pb4a2   A gauche   pb4a2  Category
4  pb4a3  Au centre   pb4a3  Category
5  pb4a4   A droite   pb4a4  Category
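A self-contained version of the same idea follows, assuming the sample data are rebuilt by hand from the question's table (with the labels simplified) instead of being read from a file. `type_val` is the list of values to remove, matching the name used in the comment further down.

```python
import pandas as pd

# Sample data rebuilt from the question's table (labels simplified; assumed)
df = pd.DataFrame(
    {
        "Name": ["ppppp", "pbaaa", "pb4a1", "pb4a2", "pb4a3", "pb4a4"],
        "Label": ["Base brute", "Base", "Très à gauche", "A gauche", "Au centre", "A droite"],
        "Type": ["UnweightedBase", "Base", "Category", "Category", "Category", "Category"],
    }
)

# Values of "Type" whose rows should be removed
type_val = ["UnweightedBase", "Base"]

# isin builds a boolean mask of rows to remove; ~ inverts it to keep the rest
df = df.loc[~df["Type"].isin(type_val)]
print(df)
```

Adding another value to `type_val` is all it takes to remove more categories; the selection line itself does not change.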

I think this is more direct and safer than using

to_del = df[df['Type'].isin(type_val)].index.tolist()
df = df.drop(to_del, axis=0)

since the latter does essentially the same selection as an intermediate step:

df[df['Type'].isin(type_val)]

Moreover, index.tolist() returns index labels, and drop removes every row carrying one of those labels. If the index contains duplicate values, you might delete unintended rows.

For example:

In [85]: df = pd.read_table('data', sep=r'\s{4,}', engine='python')

In [86]: df.index = ['a','b','c','d','e','a']

In [87]: df
Out[87]: 
    Name              Label            Type
a  ppppp         Base brute  UnweightedBase
b  pbaaa               Base            Base
c  pb4a1      Très à gauche        Category
d  pb4a2   A gauche   pb4a2        Category
e  pb4a3  Au centre   pb4a3        Category
a  pb4a4   A droite   pb4a4        Category  #<-- note the repeated index

In [88]: to_del = df[df['Type'].isin(['UnweightedBase', 'Base'])].index.tolist()

In [89]: to_del
Out[89]: ['a', 'b']

In [90]: df = df.drop(to_del)

In [91]: df
Out[91]: 
    Name              Label      Type
c  pb4a1      Très à gauche  Category
d  pb4a2   A gauche   pb4a2  Category
e  pb4a3  Au centre   pb4a3  Category
#<--- Oops, we've lost the last row, even though its Type was Category.
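To see that the boolean-mask approach is not affected by duplicate labels, here is a sketch that repeats the experiment side by side. The frame is an assumed, hand-built stand-in for the 'data' file, keeping only the Name and Type columns.

```python
import pandas as pd

# Hand-built stand-in for the 'data' file, with the label "a" repeated
df = pd.DataFrame(
    {
        "Name": ["ppppp", "pbaaa", "pb4a1", "pb4a2", "pb4a3", "pb4a4"],
        "Type": ["UnweightedBase", "Base", "Category", "Category", "Category", "Category"],
    },
    index=["a", "b", "c", "d", "e", "a"],
)

mask = df["Type"].isin(["UnweightedBase", "Base"])

# Label-based: drop() removes every row labelled "a" or "b", including the last row
by_label = df.drop(df[mask].index.tolist())

# Mask-based: each row is kept or discarded by its own mask entry
by_mask = df.loc[~mask]

print(len(by_label), len(by_mask))  # 3 4
```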

1 Comment

Figured it out, this is what I wanted: to_del = meta[meta['Type'].isin(type_val)].index.tolist()
