
I want to drop some columns whose names I build with format() (the columns to drop are new_0_cost, new_0_quantity, new_2_cost, and new_2_quantity), but not every one of them is dropped. Below are the data frame and my code.

dataFrame

     | new_0_cost | new_0_quantity | new_2_cost | new_2_quantity | quality | weights |
   0 | 10         | 20             | 10         | 20             | good    | 40      |

function

def drop_cost_and_quan(data):
    # data is a dataframe described above
    # try to drop new_0_cost, new_0_quantity, new_2_cost, and new_2_quantity
    data3 = data.copy()
    for i, item in enumerate(data3.columns):
        if item == 'new_{0}_cost'.format(i):
            data3 = data3.drop(item, axis=1)
        print('cost:',item == 'new_{0}_cost'.format(i))

    for i, item in enumerate(data3.columns):
        if item == 'new_{0}_quantity'.format(i):
            data3 = data3.drop(item, axis=1)
        print('quantity:', item == 'new_{0}_quantity'.format(i))

    return data3

Output:

data3 = drop_cost_and_quan(data)

cost: True
cost: False
cost: True
cost: False
cost: False
cost: False
quantity: True
quantity: False
quantity: False
quantity: False

data3 
   | new_2_quantity | quality | weights |
 0 | 20             | good    | 40      |

3 Answers


As an alternative to @vinod's method, you can also do it this way:

In [148]: df
Out[148]:
   new_0_cost  new_0_quantity  new_2_cost  new_2_quantity  new_0_total_cost  new_2_total_cost quality  weights
0          10              20          10              20              1111              2222    good       40

In [151]: df.drop(df.columns[df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')], axis=1, inplace=True)

In [152]: df
Out[152]:
   new_0_total_cost  new_2_total_cost quality  weights
0              1111              2222    good       40

Explanation:

In [149]: df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')
Out[149]: array([ True,  True,  True,  True, False, False, False, False], dtype=bool)

In [150]: df.columns[df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')]
Out[150]: Index(['new_0_cost', 'new_0_quantity', 'new_2_cost', 'new_2_quantity'], dtype='object')
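The three steps of the explanation can also be chained without inplace=True. This is a minimal sketch that rebuilds the example frame from the transcript; drop(columns=...) returns a new DataFrame and leaves df untouched, which is often easier to debug than in-place mutation.

```python
import pandas as pd

df = pd.DataFrame({
    'new_0_cost': [10], 'new_0_quantity': [20],
    'new_2_cost': [10], 'new_2_quantity': [20],
    'new_0_total_cost': [1111], 'new_2_total_cost': [2222],
    'quality': ['good'], 'weights': [40],
})

# Step 1: one boolean per column (as in In [149]).
mask = df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')

# Step 2: the column labels selected by the mask (as in In [150]).
to_drop = df.columns[mask]

# Step 3: drop them; without inplace=True, drop returns a new DataFrame.
result = df.drop(columns=to_drop)
print(list(result.columns))
# ['new_0_total_cost', 'new_2_total_cost', 'quality', 'weights']
```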

5 Comments

With this method, if I also have columns called new_0_total_cost and new_2_total_cost that I don't want to drop (I only want to drop the quantity and cost columns), how can I modify your code?
@K.heer, you don't have to modify it; it'll work correctly. Check the update.
I had added | after cost by mistake, which dropped most of the columns. Thank you!
But do you know why 'new_{0}_quantity'.format(i) doesn't work?
@K.heer, I don't know, but you can easily debug it: add print('col#: {}, col_name: {}'.format(i, item)) to your loop.
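Running that debugging print shows the cause: in the question's loops, i is the position of the column in data3.columns, not the number embedded in its name. In the first loop, enumerate walks the original columns, so new_0_cost (position 0) and new_2_cost (position 2) match by coincidence. By the second loop those two are gone, so new_0_quantity sits at position 0 and matches, but new_2_quantity sits at position 1 and is compared against 'new_1_quantity', so it is never dropped. A minimal sketch of a fix that matches on the name itself (the regex is an assumption about the naming scheme: "new_", a number, then cost or quantity):

```python
import re

import pandas as pd

data = pd.DataFrame({
    'new_0_cost': [10], 'new_0_quantity': [20],
    'new_2_cost': [10], 'new_2_quantity': [20],
    'quality': ['good'], 'weights': [40],
})

# The suggested debugging print: i is the position, not the number in the name.
for i, item in enumerate(data.columns):
    print('col#: {}, col_name: {}'.format(i, item))

def drop_cost_and_quan(data):
    # Test the whole name against the pattern; \d+ accepts any embedded number,
    # and fullmatch rejects names such as new_0_total_cost.
    pattern = re.compile(r'new_\d+_(?:cost|quantity)')
    to_drop = [col for col in data.columns if pattern.fullmatch(col)]
    return data.drop(columns=to_drop)

result = drop_cost_and_quan(data)
print(list(result.columns))  # ['quality', 'weights']
```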

Use del df['column_name'] to delete a single column.

To delete multiple columns (drop returns a new DataFrame, so assign the result):

df.drop(['column_name1', 'column_name2'], axis=1)
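Applied to the question's frame, the list passed to drop can still be built with format(), as long as the loop runs over the numbers that actually appear in the names rather than over column positions. A minimal sketch, assuming those numbers (0 and 2) are known in advance:

```python
import pandas as pd

data = pd.DataFrame({
    'new_0_cost': [10], 'new_0_quantity': [20],
    'new_2_cost': [10], 'new_2_quantity': [20],
    'quality': ['good'], 'weights': [40],
})

# Single column: del removes it from the frame in place.
single = data.copy()
del single['weights']

# Multiple columns: build the names from the embedded numbers (assumed to be
# 0 and 2 here), then drop them all at once.
to_drop = ['new_{0}_{1}'.format(n, kind)
           for n in (0, 2)
           for kind in ('cost', 'quantity')]
result = data.drop(to_drop, axis=1)  # new DataFrame; data is unchanged

print(list(single.columns))
print(list(result.columns))  # ['quality', 'weights']
```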

Using filter

setup

import pandas as pd

data = pd.DataFrame(dict(
        new_0_cost=[10, 10],
        new_0_quantity=[20, 20],
        new_2_cost=[10, 10],
        new_2_quantity=[20, 20],
        quality=['good', 'good'],
        weights=[40, 40],
        new_0_total_cost=[1, 2],
        new_2_total_cost=[3, 4],
    ))

data

data.filter(regex=r'^(?!new_\d+_(?:quantity|cost))')

1 Comment

Yeah, at first I thought of using filter() too, but I personally don't like negative-lookahead regexes; they can be painful to change. +1
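If the negative lookahead is hard to maintain, the same selection can be written with a plain positive pattern and an inverted boolean mask. A minimal sketch on the setup frame above; str.match anchors at the start of each name, like the ^ in the filter regex, so the two selections should agree:

```python
import pandas as pd

data = pd.DataFrame(dict(
    new_0_cost=[10, 10], new_0_quantity=[20, 20],
    new_2_cost=[10, 10], new_2_quantity=[20, 20],
    quality=['good', 'good'], weights=[40, 40],
    new_0_total_cost=[1, 2], new_2_total_cost=[3, 4],
))

pattern = r'new_\d+_(?:quantity|cost)'

# filter keeps names where the regex matches; ^(?!...) matches every name
# that does NOT start with the pattern.
kept_filter = data.filter(regex=r'^(?!' + pattern + ')')

# Positive pattern instead: mark the names to drop, then invert with ~.
kept_mask = data.loc[:, ~data.columns.str.match(pattern)]

print(list(kept_filter.columns))
print(list(kept_mask.columns))
```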