Create a DataFrame based on the dates and values of another DataFrame

Question

I have the following problem:

1) df1: a DataFrame with the following columns:

date         Unnamed: 1  Unnamed: 2  Unnamed: 3  ...  Unnamed: 102
2001-12-28   v1          v2          v4          ...  v102
2002-1-30    v1          v3          v7          ...  v102
2002-2-24    v2          v4          v5          ...  v102
...
2020-05-20   v1          v8          v9          ...  v102

Each row of this DataFrame holds a date and the names (v1, v2, ..., v102) of the stocks present in the portfolio on that date.
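If it helps to see the portfolio table in another shape, df1 can be reshaped from wide (one column per stock slot) to long (one (date, code) pair per row) with DataFrame.melt, which matches the layout of df2. A minimal sketch, assuming the date column is already datetime; the three-row df1 below is made-up data with only three stock columns, not the real table.

```python
import pandas as pd

# Made-up df1 with three stock columns instead of 102
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2001-12-28", "2002-01-30", "2002-02-24"]),
    "Unnamed: 1": ["v1", "v1", "v2"],
    "Unnamed: 2": ["v2", "v3", "v4"],
    "Unnamed: 3": ["v4", "v7", "v5"],
})

# Wide -> long: one (date, code) pair per row; the 'Unnamed: n' labels are dropped
holdings = (
    df1.melt(id_vars="date", value_name="code")
    .drop(columns="variable")
    .dropna(subset=["code"])          # rows with fewer stocks leave empty cells
    .sort_values(["date", "code"])
    .reset_index(drop=True)
)
print(holdings)
```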

2) df2: the second DataFrame:

date          code   price
2002-04-21    v1     50
2002-04-23    v1     50.2
2002-04-23    v2     10.1
...           ...    ...      (955809 rows later)
2020-05-20    v3     14.3

The second DataFrame holds the code (name) and the price of each stock on each date.

3) I want to create several DataFrames, each covering a 3-month period. The first starts at the date in the first row of df1, and each following one starts 3 months later. Each DataFrame should contain only the stocks listed in the df1 row that matches its start date.

For example:

df3: start date 2001-12-28

date          code   price
2001-12-28    v1     50
2001-12-29    v1     50.2
2001-12-29    v2     13.1
...           ...    ...
2002-03-28    v3     6.5
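The 3-month periods themselves can be generated up front. A sketch under two assumptions the question does not state: the windows are half-open [start, start + 3 months), so a boundary day is not counted twice, and `last_date` (a made-up value here) marks where the data ends.

```python
import pandas as pd

first_date = pd.Timestamp("2001-12-28")  # first date in df1, from the question
last_date = pd.Timestamp("2002-12-31")   # hypothetical last date of the data

# Window starts: first_date, then every 3 calendar months up to last_date
starts = pd.date_range(first_date, last_date, freq=pd.DateOffset(months=3))

# Half-open windows [start, start + 3 months)
windows = [(s, s + pd.DateOffset(months=3)) for s in starts]
for s, e in windows:
    print(s.date(), "->", e.date())
```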

I don't know where to start, or how to write this compactly. If you could point me in the right direction, that would be great.

galaxyan · Accepted Answer · 2020-05-21 22:28:38Z


Take the first row of df1 to get the stock names and the start date:

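import pandas as pd

# convert the date columns first if they are not datetime yet
# df1['date'] = pd.to_datetime(df1['date'])
# df2['date'] = pd.to_datetime(df2['date'])
tickers = df1.iloc[0, 1:].tolist()  # stock names in the first row of df1
start_date = df1.date.iloc[0]
end_date = start_date + pd.DateOffset(months=3)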
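pd.DateOffset(months=3) adds calendar months rather than a fixed number of days, and when the target month is too short it clips to that month's last day. A quick check using the first date from df1 and one made-up month-end date:

```python
import pandas as pd

start_date = pd.Timestamp("2001-12-28")        # first date in df1
print(start_date + pd.DateOffset(months=3))    # 2002-03-28 00:00:00

# A made-up month-end start: Feb 2002 has no 30th, so the day is clipped to the 28th
print(pd.Timestamp("2001-11-30") + pd.DateOffset(months=3))  # 2002-02-28 00:00:00
```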

Then filter out the rows you do not want:

df2[(df2.date >= start_date) & (df2.date <= end_date) & (df2.code.isin(tickers))]
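A self-contained check of that filter on a made-up five-row df2 with the question's columns. Series.between is inclusive on both ends by default, so it is equivalent to the >= / <= pair above.

```python
import pandas as pd

# Made-up df2 in the question's layout
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2001-12-28", "2001-12-29", "2001-12-29",
                            "2002-03-28", "2002-04-02"]),
    "code": ["v1", "v1", "v3", "v2", "v1"],
    "price": [50.0, 50.2, 9.9, 13.1, 51.0],
})
tickers = ["v1", "v2", "v4"]              # stocks held on the start date (made up)
start_date = pd.Timestamp("2001-12-28")
end_date = start_date + pd.DateOffset(months=3)

# Keep rows inside [start_date, end_date] whose code is in the portfolio
mask = df2.date.between(start_date, end_date) & df2.code.isin(tickers)
window = df2[mask]
print(window)
```

The v3 row is dropped because v3 is not in the portfolio, and the 2002-04-02 row falls after the window end.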

You can then put those two steps in a loop:

list_df = []
last_date = df1.date.iloc[-1]
start_date = df1.date.iloc[0]
while True:
    end_date = start_date + pd.DateOffset(months=3)
    cut_off_date = min(last_date, end_date)
    # take the latest df1 row on or before start_date (df1 sorted by date); an exact
    # match, df1[df1.date == start_date], raises IndexError when that date is missing
    tickers = df1[df1.date <= start_date].iloc[-1, 1:].dropna().tolist()
    list_df.append(df2[(df2.date >= start_date) & (df2.date <= cut_off_date) & (df2.code.isin(tickers))])

    if end_date > last_date:
        break
    start_date = end_date
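The same loop can be packaged as a function and tried on toy data. This is a sketch, not the answer's exact code: `split_into_windows` is a hypothetical name, the windows are half-open [start, end) so a boundary date is not included twice, the loop runs until the last date in df2, and each window's portfolio comes from the latest df1 row on or before the window start. The df1 and df2 below are made up.

```python
import pandas as pd


def split_into_windows(df1, df2, months=3):
    """Slice df2 into consecutive `months`-long windows (hypothetical helper).

    Assumes df1 has a 'date' column followed by stock-name columns and df2 has
    'date', 'code' and 'price' columns, with both date columns as datetime64.
    """
    df1 = df1.sort_values("date")
    last_date = df2["date"].max()
    start = df1["date"].iloc[0]
    windows = []
    while start <= last_date:
        end = start + pd.DateOffset(months=months)
        # portfolio in force at `start`: latest df1 row dated on or before it
        row = df1[df1["date"] <= start].iloc[-1]
        tickers = row.drop("date").dropna().tolist()
        # half-open window [start, end): a boundary date belongs to one window only
        mask = (df2["date"] >= start) & (df2["date"] < end) & df2["code"].isin(tickers)
        windows.append(df2[mask])
        start = end
    return windows


# Made-up data in the question's layout
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2001-12-28", "2002-03-01"]),
    "Unnamed: 1": ["v1", "v3"],
    "Unnamed: 2": ["v2", "v1"],
})
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2001-12-28", "2002-01-15", "2002-02-01",
                            "2002-03-28", "2002-04-10"]),
    "code": ["v1", "v3", "v2", "v1", "v3"],
    "price": [50.0, 10.0, 20.0, 51.0, 11.0],
})

windows = split_into_windows(df1, df2)
for w in windows:
    print(w, end="\n\n")
```

Unlike the loop above, the window end here is exclusive; pick whichever boundary rule matches how the periods should be counted.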

answered May 21, 2020 at 22:08 by galaxyan (edited May 21, 2020 at 22:28)


3 Comments

costas Over a year ago

Wow, I was only expecting some tips on how to proceed. Thank you very much!

costas Over a year ago

I got the following error: IndexError: single positional indexer is out-of-bounds, on this line: tickers = df1[df1.date == start_date].iloc[0, 1:].tolist()

galaxyan Over a year ago

@costas It seems df1 has no row for that date.
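If df1 is sorted by date, Series.searchsorted can find the row in force on any date without an exact-match filter, so a start date that is missing from df1 no longer raises IndexError. A minimal sketch; `tickers_on` is a hypothetical helper and the df1 below is made-up data with two stock columns.

```python
import pandas as pd

# Made-up df1 in the question's layout, sorted by date
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2001-12-28", "2002-01-30", "2002-02-24"]),
    "Unnamed: 1": ["v1", "v1", "v2"],
    "Unnamed: 2": ["v2", "v3", "v4"],
})


def tickers_on(df1, when):
    """Stock names from the latest df1 row dated on or before `when` (hypothetical helper).

    Assumes df1['date'] is datetime64 and sorted ascending.
    Returns an empty list if `when` is earlier than every date in df1.
    """
    # side="right" puts an exact match before the insertion point, so pos lands on it
    pos = df1["date"].searchsorted(when, side="right") - 1
    if pos < 0:
        return []
    return df1.iloc[pos, 1:].dropna().tolist()


print(tickers_on(df1, pd.Timestamp("2002-03-28")))  # ['v2', 'v4']
```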
