I have the table below stored in mytest.csv:

timestamp   val1    val2    user_id  val3  val4    val5    val6
01/01/2011  1   100 3    5     100     3       5
01/02/2013  20  8        6     12      15      3
01/07/2012      19  57   10    9       6       6        
01/11/2014  3100    49  6        12    15      3
21/12/2012          240  30    240     30       
01/12/2013          63                  
01/12/2013  3200    51  63       50

The table above was produced by the following code, in which I tried to remove all duplicates (based on 'timestamp' and 'user_id'), but unfortunately some remained:

import pandas as pd

newnames = ['timestamp', 'val1', 'val2','val3', 'val4','val5', 'val6','user_id']
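# header=0: the first line of the file is a header row and is replaced by newnames
# (header=False is rejected by recent pandas versions)
df = pd.read_csv('mytest.csv', names=newnames, header=0, parse_dates=True, dayfirst=True)
df['timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=True)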
df = df.loc[:,['timestamp', 'user_id', 'val1', 'val2','val3', 'val4','val5', 'val6']]
df_clean = df.drop_duplicates().fillna(0)

I would also like to know how to efficiently remove all duplicates from the data (pre-processing), and whether I should do this before reading it into a dataframe. For example, the last two rows are considered duplicates, and only the last one, which does not have an empty val1 (val1 = 3200), should remain in the dataframe.

Thanks in advance for your help.

1 Answer

If you want to drop duplicates based on specific columns, you can use the subset argument (older pandas versions: cols) in drop_duplicates:

df_clean = df.drop_duplicates(subset=['timestamp', 'user_id'])
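Note that drop_duplicates keeps the first row of each duplicate group by default, which is not necessarily the row with a filled val1. One possible way to get the behaviour asked for in the question is to sort so that rows with a missing val1 come first and then keep the last row of each group. This is only a sketch: the small frame below is a hypothetical sample mirroring the last rows of the question's table, and among rows with a non-missing val1 it keeps the one with the largest val1, which may or may not be what you want.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two rows share (timestamp, user_id), one has an empty val1.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["01/12/2013", "01/12/2013", "01/01/2011"], dayfirst=True
    ),
    "user_id": [63, 63, 3],
    "val1": [np.nan, 3200, 1],
    "val2": [np.nan, 51, 100],
})

df_clean = (
    # Put rows with a missing val1 first so they are never the "last" of a group
    df.sort_values("val1", na_position="first")
      # Keep the last row for each (timestamp, user_id) pair
      .drop_duplicates(subset=["timestamp", "user_id"], keep="last")
      # Restore the original row order
      .sort_index()
)
```

If every row in a duplicate group has a missing val1, one of them is still kept; chain dropna(subset=['val1']) afterwards if those should go too.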

6 Comments

Is it possible to equally delete rows for which val1 is nan or equal zero?
Do you mean something like df.dropna(subset=['val1'])?
Will that delete the entire row?
Yes, this deletes every row that has a NaN value in the val1 column. What do you mean by 'equally delete rows for which val1 is nan'?
cols doesn't work anymore since v0.18, it was replaced by subset
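
For the case raised in the comments (dropping rows where val1 is NaN or zero), dropna alone only covers the NaN part. A boolean mask can handle both conditions; the following is a minimal sketch on a hypothetical four-row frame, not the question's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one zero and one missing val1.
df = pd.DataFrame({
    "user_id": [3, 6, 63, 10],
    "val1": [1.0, 0.0, np.nan, 20.0],
})

# True only where val1 is present and not equal to zero
mask = df["val1"].notna() & (df["val1"] != 0)

# Whole rows are removed, not just the val1 cells
df_nonzero = df[mask]
```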