I need to drop all rows where one column is below a certain value. I used the command below, but it returns the column with dtype object. I need to keep it as int64:

df["customer_id"] = df.drop(df["customer_id"][df["customer_id"] < 9999999].index)
df = df.dropna()

I have tried to re-cast the field as int64 afterwards, but this causes the following error, which involves data from a totally different column:

invalid literal for long() with base 10: '2014/03/09 11:12:27'
  • df["customer_id"] = df.drop(df[df["customer_id"] < 9999999].index) Commented Jun 12, 2016 at 14:40
  • This does not change anything. Thank you. Commented Jun 12, 2016 at 15:23

2 Answers

I think you need boolean indexing with reset_index:

import pandas as pd

df = pd.DataFrame({'a': ['s', 'd', 'f', 'g'],
                'customer_id':[99999990, 99999997, 1000, 8888]})
print (df) 
   a  customer_id
0  s     99999990
1  d     99999997
2  f         1000
3  g         8888

df1 = df[df["customer_id"] > 9999999].reset_index(drop=True)
print (df1)
   a  customer_id
0  s     99999990
1  d     99999997

A solution with drop also works, but it is slower:

df2 = (df.drop(df.loc[df["customer_id"] < 9999999, 'customer_id'].index))
print (df2)
   a  customer_id
0  s     99999990
1  d     99999997

Timings:

In [12]: %timeit df[df["customer_id"] > 9999999].reset_index(drop=True)
1000 loops, best of 3: 676 µs per loop

In [13]: %timeit (df.drop(df.loc[df["customer_id"] < 9999999, 'customer_id'].index))
1000 loops, best of 3: 921 µs per loop

4 Comments

Thank you, that worked! Is it possible to run this on several columns in one command?
Do you need to compare multiple columns against 9999999? If so, should a row be excluded only if all of the columns contain 9999999, or if at least one of the columns contains 9999999?
E.g. 9999999 for one column and 999 for another? Currently I need to define a series of DFs to capture the changes. I am sure there is a smarter way of doing this.
Sorry, but do you need a different condition for each column? E.g. values higher than 999 for column col1, values higher than 77777 for another column col2, and so on for each column? This is not clear to me.
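If the goal is a different minimum per column, one possible approach is to build a boolean mask per column and combine the masks with &, so that a row is kept only when every column passes. This is a minimal sketch, not something confirmed in the thread: the order_count column, the min_values dict, and both thresholds are made up for illustration.

```python
import pandas as pd

# Hypothetical data: "order_count" is an invented second column.
df = pd.DataFrame({
    "customer_id": [99999990, 99999997, 1000, 8888],
    "order_count": [1500, 3, 2000, 4000],
})

# Hypothetical per-column minimums; a row survives only if it meets all of them.
min_values = {"customer_id": 9999999, "order_count": 999}

# Start with an all-True mask and AND in one condition per column.
mask = pd.Series(True, index=df.index)
for col, minimum in min_values.items():
    mask &= (df[col] >= minimum)

filtered = df[mask].reset_index(drop=True)
print(filtered)
print(filtered.dtypes)
```

Because the frame is filtered with a mask rather than assigned back into a single column, the numeric columns should keep their integer dtype. Swapping & for | would keep rows that pass at least one condition instead.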
What's wrong with slicing the whole frame (and reindexing if necessary)?

df = df[df["customer_id"] >= 9999999]
df.index = range(0,len(df))
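
To check that slicing the whole frame leaves the dtype alone, here is a small sketch. It reuses the sample data from the other answer and assumes, as the question describes, that rows below the threshold are the ones to drop; the name kept is just for illustration.

```python
import pandas as pd

# Sample data borrowed from the other answer.
df = pd.DataFrame({"a": ["s", "d", "f", "g"],
                   "customer_id": [99999990, 99999997, 1000, 8888]})

# Keep rows at or above the threshold, then renumber the index from 0.
kept = df[df["customer_id"] >= 9999999].reset_index(drop=True)

print(kept["customer_id"].dtype)
print(kept.index.tolist())
```

Since the filtered frame is assigned as a whole instead of being written into one column, no cast back to int64 should be needed.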
