I have a pandas dataframe, one of the columns of which contains dates.

My objective is to set an initial date and discard all rows of the dataframe dated before it. Snippet of the dataframe:

 ID         fecha         
519457    25/02/2020 10:03    
519462    25/02/2020 10:07     
519468    25/02/2020 10:12
 ...           ...

The code I have been trying to use is the following:

xls=pd.ExcelFile(r'/home/.../Final.xlsx')
xls.sheet_names
df=pd.read_excel(xls,"Hoja1")
Date_initial=['25/02/2020 10:07:00']
df=df.drop(df[["fecha"]<Date_initial].index)

This did not work. I also tried replacing the last line with:

df[(df['fecha']>=Date_initial)]

As a result, I obtained the error:

ValueError: Lengths must match to compare

Am I missing something in the expression, or am I going about this in completely the wrong way? Thanks for your input!

1 Answer

Maybe something like this, comparing against a single value instead of a list:

Date_initial='25/02/2020 10:07:00'
df=df[df["fecha"]>=Date_initial]
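
The length error in the question most likely comes from comparing the column against a one-element list: pandas attempts an element-wise comparison and the lengths differ. A minimal sketch of the difference, assuming a small in-memory Series (the name `fechas` and its values are only illustrative, not the real sheet):

```python
import pandas as pd

# Hypothetical stand-in for the "fecha" column, already parsed as datetimes
fechas = pd.Series(
    pd.to_datetime(["2020-02-25 10:03", "2020-02-25 10:07", "2020-02-25 10:12"])
)

# Comparing against a one-element list is treated as element-wise,
# so a 3-row column versus a 1-item list raises ValueError
try:
    fechas >= [pd.Timestamp("2020-02-25 10:07")]
    list_error = None
except ValueError as exc:
    list_error = str(exc)

# Comparing against a scalar broadcasts over every row
mask = fechas >= pd.Timestamp("2020-02-25 10:07")
print(list_error)
print(mask)
```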

Also, I recommend parsing the column as a datetime type, so that the comparison is chronological rather than textual:

df = pd.read_excel(xls, 'Hoja1', parse_dates=['fecha'], dayfirst=True)

Date_initial = pd.to_datetime('25/02/2020 10:07:00', dayfirst=True)
df = df[df['fecha'] >= Date_initial]
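
For anyone who wants to try the datetime approach without the Excel file, here is a rough self-contained sketch. It assumes the column holds day-first strings like those in the question's snippet and parses them explicitly after loading with `pd.to_datetime`; the in-memory frame and the names `date_initial` and `filtered` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the "Hoja1" sheet, using the rows shown in the question
df = pd.DataFrame({
    "ID": [519457, 519462, 519468],
    "fecha": ["25/02/2020 10:03", "25/02/2020 10:07", "25/02/2020 10:12"],
})

# Parse the day-first strings with an explicit format to avoid day/month ambiguity
df["fecha"] = pd.to_datetime(df["fecha"], format="%d/%m/%Y %H:%M")

# The cutoff is a single Timestamp, not a list
date_initial = pd.to_datetime("25/02/2020 10:07:00", format="%d/%m/%Y %H:%M:%S")

# Keep only rows at or after the cutoff
filtered = df[df["fecha"] >= date_initial]
print(filtered)
```

If the column already arrives from Excel as a datetime dtype, the parsing step should be unnecessary and only the comparison is needed.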
1 Comment

This did the trick! Thanks a lot. PS. I removed an extra bracket from your answer.
