Pandas: Drop() int64 based on value returns object

Question

I need to drop all rows where one column is below a certain value. I used the command below, but it returns the column with object dtype. I need to keep it as int64:

df["customer_id"] = df.drop(df["customer_id"][df["customer_id"] < 9999999].index)
df = df.dropna()

I have tried to re-cast the field as int64 afterwards, but this causes the following error, which involves data from a completely different column:

invalid literal for long() with base 10: '2014/03/09 11:12:27'

df["customer_id"] = df.drop(df[df["customer_id"] < 9999999].index)
– Merlin, Commented Jun 12, 2016 at 14:40

jezrael · Accepted Answer · 2016-06-12 16:51:08Z

1

I think you need boolean indexing with reset_index:

import pandas as pd

df = pd.DataFrame({'a': ['s', 'd', 'f', 'g'],
                'customer_id':[99999990, 99999997, 1000, 8888]})
print (df) 
   a  customer_id
0  s     99999990
1  d     99999997
2  f         1000
3  g         8888

df1 = df[df["customer_id"] > 9999999].reset_index(drop=True)
print (df1)
   a  customer_id
0  s     99999990
1  d     99999997
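
To confirm that this approach keeps the column as int64, the dtypes can be checked after filtering. The sketch below reuses the sample frame from this answer; the only addition is that the dtype is pinned explicitly so the check does not depend on platform defaults. The likely reason the original attempt lost the dtype is that assigning the result of drop() back into one column realigns it on the index and leaves missing values behind, which an int64 column cannot hold. Filtering the whole frame avoids that.

```python
import pandas as pd

# Same sample data as in the answer; int64 is set explicitly here.
df = pd.DataFrame({
    "a": ["s", "d", "f", "g"],
    "customer_id": pd.Series([99999990, 99999997, 1000, 8888], dtype="int64"),
})

# Filter the whole frame with a boolean mask instead of assigning the
# result of drop() back into a single column.
df1 = df[df["customer_id"] > 9999999].reset_index(drop=True)

# No missing values are introduced, so customer_id stays int64.
print(df1.dtypes)
```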

Solution with drop, but it is slower:

df2 = (df.drop(df.loc[df["customer_id"] < 9999999, 'customer_id'].index))
print (df2)
   a  customer_id
0  s     99999990
1  d     99999997

Timings:

In [12]: %timeit df[df["customer_id"] > 9999999].reset_index(drop=True)
1000 loops, best of 3: 676 µs per loop

In [13]: %timeit (df.drop(df.loc[df["customer_id"] < 9999999, 'customer_id'].index))
1000 loops, best of 3: 921 µs per loop
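
The timings above come from a four-row frame, so they mostly measure fixed overhead. A rough way to repeat the comparison outside IPython on a larger frame is sketched below with the standard timeit module. The frame size, the random data and the helper names with_mask and with_drop are assumptions made for illustration. Note also that the answer's mask uses > while the drop version uses <, so the two differ for a value of exactly 9999999; the sketch uses >= so that both select the same rows.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame; the size and value range are arbitrary choices.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    "a": rng.choice(list("sdfg"), size=100_000),
    "customer_id": rng.integers(0, 100_000_000, size=100_000, dtype=np.int64),
})


def with_mask(frame):
    # Keep rows at or above the threshold, then renumber the index.
    return frame[frame["customer_id"] >= 9999999].reset_index(drop=True)


def with_drop(frame):
    # Drop rows below the threshold, then renumber the index.
    below = frame.index[frame["customer_id"] < 9999999]
    return frame.drop(below).reset_index(drop=True)


# Both approaches should select exactly the same rows.
assert with_mask(big).equals(with_drop(big))

# Total seconds for 20 runs of each approach; numbers vary by machine.
print("mask:", timeit.timeit(lambda: with_mask(big), number=20))
print("drop:", timeit.timeit(lambda: with_drop(big), number=20))
```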

edited Jun 12, 2016 at 16:51

answered Jun 12, 2016 at 16:32

jezrael


4 Comments

user6453877 Over a year ago

Thank you, that worked! Is it possible to run this on several columns in one command?

jezrael Over a year ago

Do you need to compare multiple columns against 9999999? If so, should a row be excluded when all of the columns contain 9999999, or when at least one column does?

user6453877 Over a year ago

E.g. 9999999 for one column and 999 for another? Currently I need to define a series of DFs to capture the changes. I am sure there is a smarter way of doing this.

jezrael Over a year ago

Sorry, but do you need a different condition for each column? E.g. values higher than 999 for column col1, values higher than 77777 for another column col2, and so on for each column? This is not clear to me.
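
The thread does not settle this, but if the goal is a different threshold per column, one possible approach is to combine several boolean masks in a single expression. This is a minimal sketch under that assumption: the order_count column and its threshold of 999 are invented for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical frame: "order_count" is an invented second column.
df = pd.DataFrame({
    "customer_id": pd.Series([99999990, 99999997, 1000, 8888], dtype="int64"),
    "order_count": pd.Series([1500, 12, 2000, 5], dtype="int64"),
})

# One threshold per column; a row is kept only if both conditions hold.
mask = (df["customer_id"] > 9999999) & (df["order_count"] > 999)
filtered = df[mask].reset_index(drop=True)

print(filtered)
```

Replacing & with | would keep rows where at least one condition holds. The parentheses around each comparison are needed because & and | bind more tightly than > in Python.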

Clinton Boys · Accepted Answer · 2016-06-12 15:34:36Z

0

What's wrong with slicing the whole frame (and reindexing if necessary)?

df = df[df["customer_id"] >= 9999999]
df.index = range(0,len(df))

answered Jun 12, 2016 at 15:34

Clinton Boys

