
I have the following problem:

1) df1: a DataFrame with the following columns:

date        Unnamed: 1  Unnamed: 2  Unnamed: 3  ...  Unnamed: 102
2001-12-28  v1          v2          v4          ...  v102
2002-1-30   v1          v3          v7          ...  v102
2002-2-24   v2          v4          v5          ...  v102
...
2020-05-20  v1          v8          v9          ...  v102

In this DataFrame, I have the date and the names (v1, v2, ..., v102) of the stocks present in the portfolio on that date.
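Since df1 is wide (one column per portfolio slot), it may help to turn each row into a plain list of tickers first. A minimal sketch with made-up toy data, assuming that a smaller portfolio leaves empty (NaN/None) cells in the trailing columns; the `portfolios` dict is a hypothetical name:

```python
import pandas as pd

# Toy stand-in for df1 with hypothetical values and only 3 ticker columns.
df1 = pd.DataFrame({
    "date": ["2001-12-28", "2002-01-30", "2002-02-24"],
    "Unnamed: 1": ["v1", "v1", "v2"],
    "Unnamed: 2": ["v2", "v3", "v4"],
    "Unnamed: 3": ["v4", "v7", None],  # assumed: a smaller portfolio leaves a blank cell
})
df1["date"] = pd.to_datetime(df1["date"])

# Map each portfolio date to the tickers in that row, skipping blank cells.
portfolios = {
    row_date: row.dropna().tolist()
    for row_date, row in df1.set_index("date").iterrows()
}
```

A lookup such as portfolios[pd.Timestamp("2002-02-24")] then gives the tickers held on that date.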

2) df2: the second DataFrame:

date        code  price
2002-04-21  v1    50
2002-04-23  v1    50.2
2002-04-23  v2    10.1
...         (955809 rows later)
2020-05-20  v3    14.3

In the second DataFrame, I have the name (code) and the price of each stock on each date.
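Because df2 is in long format (one row per date and code), one way to pull out a date range, as an alternative to boolean masks, is to put the dates in a sorted DatetimeIndex and slice with .loc; both endpoints are included. A sketch on hypothetical toy values:

```python
import pandas as pd

# Toy stand-in for df2 in long format; values are hypothetical.
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2002-04-21", "2002-04-23", "2002-04-23", "2002-08-01"]),
    "code": ["v1", "v1", "v2", "v3"],
    "price": [50.0, 50.2, 10.1, 14.3],
})

# A sorted DatetimeIndex supports label slicing by date (both ends included).
prices = df2.set_index("date").sort_index()
window = prices.loc["2002-04-21":"2002-06-30"]
```

The index must be sorted for this kind of slicing, which is why sort_index() is called first.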

3) I want to create several DataFrames, each covering a 3-month period. The first one starts at the date in the first row of df1, and each following one starts 3 months later. Each DataFrame should contain only the stocks listed in the df1 row that matches its start date.

For example:

df3: start date 2001-12-28

date        code  price
2001-12-28  v1    50
2001-12-29  v1    50.2
2001-12-29  v2    13.1
...
2002-03-28  v3    6.5

I don't know how to start, or how to write this compactly. If you could point me in a direction, that would be great.

Answer


Get the tickers from the first row of df1, and take its date as the start date:

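import pandas as pd

# if the date columns are not datetime yet:
# df1['date'] = pd.to_datetime(df1['date'])
# df2['date'] = pd.to_datetime(df2['date'])
tickers = df1.iloc[0, 1:].tolist()           # stock names in the first row
start_date = df1.date.iloc[0]                # first portfolio date
end_date = start_date + pd.DateOffset(months=3)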
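A note on pd.DateOffset(months=3): it keeps the day of the month and clips it to the last valid day when the target month is shorter. So stepping forward from the previous end date (start_date = end_date in the loop further down) can drift away from month ends, while offsetting the first date by 3*k months does not. A small sketch with illustrative dates:

```python
import pandas as pd

three_months = pd.DateOffset(months=3)

# Day of month is kept when it exists in the target month...
end_first = pd.Timestamp("2001-12-28") + three_months  # 2002-03-28

# ...and clipped to the month's last day when it does not.
clipped = pd.Timestamp("2001-11-30") + three_months  # 2002-02-28

# Chaining end -> next start carries the clipped day forward.
chained = [pd.Timestamp("2001-11-30")]
for _ in range(2):
    chained.append(chained[-1] + three_months)
# chained: 2001-11-30, 2002-02-28, 2002-05-28

# Offsetting the first date by 3*k months keeps the original day where possible.
first = pd.Timestamp("2001-11-30")
anchored = [first + pd.DateOffset(months=3 * k) for k in range(3)]
# anchored: 2001-11-30, 2002-02-28, 2002-05-30
```

For start dates like 2001-12-28 the two approaches agree; the difference only shows up for days 29 to 31.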

Then filter out the rows of df2 you do not want:

df3 = df2[(df2.date >= start_date) & (df2.date <= end_date) & (df2.code.isin(tickers))]
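Here is the same filter run on a tiny made-up df2, using Series.between (inclusive on both ends by default) in place of the two date comparisons; the data and the tickers list are hypothetical:

```python
import pandas as pd

# Hypothetical toy data; df2 uses the question's `code` column.
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2001-12-28", "2001-12-29", "2001-12-29", "2002-03-28", "2002-04-02"]),
    "code": ["v1", "v1", "v3", "v2", "v1"],
    "price": [50.0, 50.2, 13.1, 6.5, 51.0],
})
tickers = ["v1", "v2", "v4"]
start_date = pd.Timestamp("2001-12-28")
end_date = start_date + pd.DateOffset(months=3)

# Keep rows inside [start_date, end_date] whose code is in the portfolio.
mask = df2["date"].between(start_date, end_date) & df2["code"].isin(tickers)
df3 = df2[mask]
```

The v3 row is dropped because v3 is not in tickers, and the 2002-04-02 row because it falls after end_date.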

Then put those two steps in a loop, one iteration per 3-month window:

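import pandas as pd

list_df = []
last_date = df1.iloc[-1, 0]       # last portfolio date (date is the first column)
start_date = df1.date.iloc[0]     # first portfolio date
while True:
    end_date = start_date + pd.DateOffset(months=3)
    cut_off_date = min(last_date, end_date)   # don't go past the last date in df1
    # stocks held on start_date (needs a df1 row with exactly this date)
    tickers = df1[df1.date == start_date].iloc[0, 1:].tolist()
    list_df.append(df2[(df2.date >= start_date) & (df2.date <= cut_off_date) & (df2.code.isin(tickers))])

    if end_date > last_date:
        break
    start_date = end_date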

3 Comments

Wow, I was only expecting some tips on how to proceed. Thank you very much!
I got the following error: IndexError: single positional indexer is out-of-bounds, on the line tickers = df1[df1.date == start_date].iloc[0, 1:].tolist()
@costas It seems df1 has no row with that exact date.
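The IndexError comes from df1[df1.date == start_date] being empty: after the first window, start_date is computed by adding 3 months, and df1 usually has no row with exactly that date. One way around it, assuming the most recent portfolio on or before each window start is the right one to use, is to take the last df1 row with date <= start_date. The helper name portfolio_windows and the toy data are hypothetical. Unlike the answer's loop, this sketch uses half-open windows [start, end) so a boundary date is not counted twice, and it does not cap the last window at df1's last date:

```python
import pandas as pd


def portfolio_windows(df1, df2, months=3):
    """Split df2 into consecutive `months`-long windows (hypothetical helper).

    Assumes df1 has a datetime `date` column plus ticker columns, and df2 has
    datetime `date`, `code` and `price` columns, as in the question.
    """
    df1 = df1.sort_values("date")
    first, last = df1["date"].iloc[0], df1["date"].iloc[-1]
    windows = {}
    k = 0
    while True:
        # Offset from the first date so month-end starts do not drift.
        start = first + pd.DateOffset(months=months * k)
        if start > last:
            break
        end = first + pd.DateOffset(months=months * (k + 1))
        # Latest portfolio on or before `start`; never empty because start >= first.
        row = df1[df1["date"] <= start].iloc[-1]
        tickers = row.drop("date").dropna().tolist()
        # Half-open window [start, end): a boundary date belongs to one window only.
        in_window = (df2["date"] >= start) & (df2["date"] < end)
        windows[start] = df2[in_window & df2["code"].isin(tickers)]
        k += 1
    return windows


# Toy data (hypothetical): no df1 row is dated 2002-03-28, the second window start.
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2001-12-28", "2002-01-30", "2002-04-15"]),
    "Unnamed: 1": ["v1", "v1", "v2"],
    "Unnamed: 2": ["v2", "v3", "v4"],
})
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2001-12-28", "2002-01-05", "2002-03-28", "2002-04-01"]),
    "code": ["v1", "v3", "v2", "v1"],
    "price": [50.0, 9.9, 6.5, 51.0],
})
windows = portfolio_windows(df1, df2)
for start, frame in windows.items():
    print(start.date(), frame["code"].tolist())
```

Returning a dict keyed by the window start makes it easy to pick out one period, e.g. windows[pd.Timestamp("2001-12-28")] for the df3 in the question.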
