I would like to standardize my DataFrame so that it starts and ends on precise dates, but I can't find a solution. I am dealing with time series, so it is crucial that everything starts and ends on the same day.

I have tried several pieces of code, including code from Stack Overflow, but nothing works.

Right now I just want the rows between 01/01/2010 and 31/12/2017. This is the code I have so far:

df=pd.read_csv("AREX.csv", sep = ";")
df[~df['Date'].isin(pd.date_range(start='20100101', end='20171231'))]        
print(df)
df.drop(["Open","High","Low","Volume","Open interest"],axis = 1, inplace=True)
print(df)

But it does not affect the number of rows; it only drops the columns I ask it to.

Does anyone have any idea how to fix this?

Thank you in advance for any piece of advice you can give me!

  • You have to assign the result back: df = df[~df['Date'].isin(pd.date_range(start='20100101', end='20171231'))] Commented Oct 14, 2018 at 18:51
  • No I tried that earlier and it does not work either... :/ But thank you for the input :) Commented Oct 14, 2018 at 19:56
  • Look, there are several things that could be happening. First of all, is Date an actual datetime dtype column? Second, you definitely have to assign back, otherwise your code is doing nothing. Third, instead of using pd.date_range, use (df.Date <= '2017-12-31') & (df.Date >= '2010-01-01') Commented Oct 14, 2018 at 20:00
  • Thank you so much for your help! Part of it works now: it starts at 2010 but does not stop at 2017, so I'll need to figure out a solution. Here is the change I have made: df["Date"] = pd.to_datetime(df['Date']) df = df[~df['Date'].isin(df.Date <= '2017-12-31') & (df.Date >= '2010-01-01')] Thank you @RafaelC Commented Oct 14, 2018 at 20:33
  • Thank you for all the advice and the solution! I have put it in my loop and it works perfectly now! :) Commented Oct 14, 2018 at 20:49
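
The three points raised in the comments (a real datetime column, a mask with both bounds, assigning the result back) can be put together in a minimal sketch. The data here is made up for illustration and is not the AREX.csv file; the dates are assumed to be ISO-formatted strings:

```python
import pandas as pd

# Made-up sample standing in for the CSV; dates are plain strings at first.
df = pd.DataFrame({
    "Date": ["2009-12-31", "2010-01-01", "2013-06-15", "2017-12-31", "2018-01-01"],
    "Close": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# 1. Make Date an actual datetime column.
df["Date"] = pd.to_datetime(df["Date"])

# 2. Build a boolean mask with both bounds (inclusive on each side).
mask = (df["Date"] >= "2010-01-01") & (df["Date"] <= "2017-12-31")

# 3. Assign the filtered frame back; indexing alone only returns a new object.
df = df[mask]

print(df)
```

Without step 3 the original df is left untouched, which matches the symptom in the question: the row count never changes.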

1 Answer

OK, so thanks to @RafaelC, here is the answer to my problem.

import glob
import os
import shutil

import pandas as pd

def concatenate(indir="../Equity_Merton", outfile="../Merged.csv"):
    # Note: outfile is not used; the result is written to ../Merton_Merged.csv below.
    os.chdir(indir)
    fileList = glob.glob("*.csv")
    main_df = pd.DataFrame()

    for filename in fileList:
        print(filename)
        df = pd.read_csv(filename, sep=";")
        # Parse the dates first so the comparisons below are real date comparisons.
        df["Date"] = pd.to_datetime(df["Date"])
        # Keep only 2010-01-01 through 2017-12-31 and assign the result back.
        df = df[(df.Date <= "2017-12-31") & (df.Date >= "2010-01-01")].copy()
        df.set_index("Date", inplace=True)
        # Name the Close column after the file, minus the ".csv" extension.
        df.rename(columns={"Close": filename[:-4]}, inplace=True)
        df.drop(["Open", "High", "Low", "Volume", "Open interest"], axis=1, inplace=True)

        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df, how="outer")

#        main_df = main_df.dropna(axis = 0, how="any")
        main_df.sort_index(ascending=False, inplace=True)

    print(main_df.head())
    main_df.to_csv("Merton_Merged.csv")
    shutil.move("Merton_Merged.csv", "../Merton_Merged.csv")

Thank you for your help!!
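
For reference, the same inclusive date filter can also be written in two other ways. This is only a sketch on made-up data, not part of the code above: `Series.between` includes both endpoints by default, and label slicing with `.loc` on a sorted DatetimeIndex does the same.

```python
import pandas as pd

# Made-up, deliberately unsorted sample data.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2009-12-31", "2010-01-01", "2017-12-31"]),
    "Close": [5.0, 1.0, 2.0, 4.0],
})

# Option A: Series.between is inclusive on both ends by default.
in_range = df[df["Date"].between("2010-01-01", "2017-12-31")]

# Option B: set Date as the index, sort it, then slice by label with .loc.
by_index = df.set_index("Date").sort_index().loc["2010-01-01":"2017-12-31"]

print(in_range)
print(by_index)
```

Option B needs the index to be sorted before slicing, which is why `sort_index()` comes first.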
