Python format and pandas

Question

I want to drop some columns using format (the columns to drop are new_0_cost, new_0_quantity, new_2_cost, and new_2_quantity), but not every column is dropped. Below are the DataFrame and the code.

DataFrame:

   new_0_cost  new_0_quantity  new_2_cost  new_2_quantity quality  weights
0          10              20          10              20    good       40

Function:

def drop_cost_and_quan(data):
    # data is a dataframe described above
    # try to drop new_0_cost, new_0_quantity, new_2_cost, and new_2_quantity
    data3 = data.copy()
    for i, item in enumerate(data3.columns):
        if item == 'new_{0}_cost'.format(i):
            data3 = data3.drop(item, axis=1)
        print('cost:', item == 'new_{0}_cost'.format(i))

    for i, item in enumerate(data3.columns):
        if item == 'new_{0}_quantity'.format(i):
            data3 = data3.drop(item, axis=1)
        print('quantity:', item == 'new_{0}_quantity'.format(i))

    return data3

Output:

data3 = drop_cost_and_quan(data)

cost: True
cost: False
cost: True
cost: False
cost: False
cost: False
quantity: True
quantity: False
quantity: False
quantity: False

data3
   new_2_quantity quality  weights
0              20    good       40

MaxU - stand with Ukraine · Accepted Answer · 2016-11-09 15:36:21Z

3

Alternatively to @Vinod's method, you can also do it this way:

In [148]: df
Out[148]:
   new_0_cost  new_0_quantity  new_2_cost  new_2_quantity  new_0_total_cost  new_2_total_cost quality  weights
0          10              20          10              20              1111              2222    good       40

In [151]: df.drop(df.columns[df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')], axis=1, inplace=True)

In [152]: df
Out[152]:
   new_0_total_cost  new_2_total_cost quality  weights
0              1111              2222    good       40

Explanation:

In [148]: df
Out[148]:
   new_0_cost  new_0_quantity  new_2_cost  new_2_quantity  new_0_total_cost  new_2_total_cost quality  weights
0          10              20          10              20              1111              2222    good       40

In [149]: df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')
Out[149]: array([ True,  True,  True,  True, False, False, False, False], dtype=bool)

In [150]: df.columns[df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')]
Out[150]: Index(['new_0_cost', 'new_0_quantity', 'new_2_cost', 'new_2_quantity'], dtype='object')
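A minimal sketch of the same idea written for current pandas, where the extra arguments of DataFrame.drop are keyword-only: the mask-selected labels are passed via the columns= keyword and the result is reassigned instead of using inplace=True. The DataFrame is rebuilt from the output shown above.

```python
import pandas as pd

# Rebuild the example frame shown in the answer.
df = pd.DataFrame({
    'new_0_cost': [10],
    'new_0_quantity': [20],
    'new_2_cost': [10],
    'new_2_quantity': [20],
    'new_0_total_cost': [1111],
    'new_2_total_cost': [2222],
    'quality': ['good'],
    'weights': [40],
})

# Boolean mask: True for names starting with new_<digits>_cost or new_<digits>_quantity.
# new_0_total_cost is not matched because 'total_cost' follows 'new_0_'.
mask = df.columns.str.contains(r'^new_\d+_(?:quantity|cost)')

# drop() returns a new DataFrame; the labels to remove are passed via columns=.
df = df.drop(columns=df.columns[mask])
print(df.columns.tolist())
# ['new_0_total_cost', 'new_2_total_cost', 'quality', 'weights']
```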

edited Nov 9, 2016 at 15:36

answered Nov 9, 2016 at 14:48

MaxU - stand with Ukraine



5 Comments

K.heer Over a year ago

For this method, if I also have columns called new_0_total_cost and new_2_total_cost that I don't want to drop (drop only the quantity and cost columns, not new_0_total_cost and new_2_total_cost), how can I modify the code you wrote?

MaxU - stand with Ukraine Over a year ago

@K.heer, you don't have to modify it; it'll work correctly. Check the update.

K.heer Over a year ago

I added a | after cost by mistake, which dropped most of the columns. Thank you!

K.heer Over a year ago

But do you know why 'new_{0}_quantity'.format(i) doesn't work?

MaxU - stand with Ukraine Over a year ago

@K.heer, I don't know, but you can easily debug it: add print('col#: {}, col_name: {}'.format(i, item)) to your loop.
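Following that debugging suggestion shows the likely cause: enumerate() yields each column's position (0, 1, 2, ...), while the numbers embedded in the names are 0 and 2, so 'new_{0}_cost'.format(i) only matches when a column's position happens to equal the number in its name. A minimal sketch that prints the comparison and then, as one possible fix, matches the naming pattern with a regex instead of using the position. The helper name drop_cost_and_quan_fixed is hypothetical.

```python
import re

import pandas as pd

data = pd.DataFrame({
    'new_0_cost': [10], 'new_0_quantity': [20],
    'new_2_cost': [10], 'new_2_quantity': [20],
    'quality': ['good'], 'weights': [40],
})

# Debug print: i is the column's position, not the number inside its name,
# so the formatted name only matches at positions 0 and 2 for this frame.
for i, item in enumerate(data.columns):
    print('col#: {}, col_name: {}, formatted: {}'.format(i, item, 'new_{0}_cost'.format(i)))


def drop_cost_and_quan_fixed(df):
    """Hypothetical fix: match the naming pattern instead of relying on the loop position."""
    # Collect the matching names first, then drop them in a single call.
    to_drop = [item for item in df.columns
               if re.fullmatch(r'new_\d+_(?:cost|quantity)', item)]
    return df.drop(columns=to_drop)


print(drop_cost_and_quan_fixed(data).columns.tolist())  # ['quality', 'weights']
```

Collecting the names before calling drop() keeps the selection independent of how many columns have already been removed.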

Vinod · Accepted Answer · 2016-11-09 14:46:36Z

2

Use del df['column_name'] to delete a single column (this modifies df in place).

To delete multiple columns (drop returns a new DataFrame, so assign the result):

df = df.drop(['column_name1', 'column_name2'], axis=1)
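Applied to the question, the explicit list can still be built with str.format, as long as the loop runs over the numbers that actually occur in the column names rather than over enumerate() positions. A minimal sketch; the numbers (0, 2) are hard-coded here as an assumption taken from the example frame.

```python
import pandas as pd

data = pd.DataFrame({
    'new_0_cost': [10], 'new_0_quantity': [20],
    'new_2_cost': [10], 'new_2_quantity': [20],
    'quality': ['good'], 'weights': [40],
})

# Assumption: the numbered column groups present in the frame are 0 and 2.
numbers = (0, 2)
cols = ['new_{0}_{1}'.format(n, kind) for n in numbers for kind in ('cost', 'quantity')]

# drop() returns a new DataFrame, so assign the result.
data = data.drop(cols, axis=1)
print(data.columns.tolist())  # ['quality', 'weights']
```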

answered Nov 9, 2016 at 14:46

Vinod


piRSquared · Accepted Answer · 2016-11-09 16:09:05Z

2

Using filter

setup

import pandas as pd

data = pd.DataFrame(dict(
        new_0_cost=[10, 10],
        new_0_quantity=[20, 20],
        new_2_cost=[10, 10],
        new_2_quantity=[20, 20],
        quality=['good', 'good'],
        weights=[40, 40],
        new_0_total_cost=[1, 2],
        new_2_total_cost=[3, 4]
    ))

data

data.filter(regex=r'^(?!new_\d+_(?:quantity|cost))')
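DataFrame.filter selects the columns to keep rather than the ones to drop, and it returns a new DataFrame, so the result has to be assigned. A short sketch checking which columns the negative-lookahead pattern keeps for the setup frame above (the column order assumes Python 3.7+ dict ordering):

```python
import pandas as pd

data = pd.DataFrame(dict(
    new_0_cost=[10, 10],
    new_0_quantity=[20, 20],
    new_2_cost=[10, 10],
    new_2_quantity=[20, 20],
    quality=['good', 'good'],
    weights=[40, 40],
    new_0_total_cost=[1, 2],
    new_2_total_cost=[3, 4],
))

# Keep every column whose name does NOT start with new_<digits>_quantity or new_<digits>_cost.
# The lookahead does not match new_0_total_cost because 'total_cost' follows 'new_0_'.
kept = data.filter(regex=r'^(?!new_\d+_(?:quantity|cost))')
print(kept.columns.tolist())
# ['quality', 'weights', 'new_0_total_cost', 'new_2_total_cost']
```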

edited Nov 9, 2016 at 16:09

answered Nov 9, 2016 at 15:57

piRSquared


1 Comment

MaxU - stand with Ukraine Over a year ago

Yeah, at first I thought of using filter() too, but I personally don't like negative-lookahead regexes; sometimes they're painful to change. +1
