Somewhat like Excel's VLOOKUP function, I want to use a value in one dataframe (portfolios, below) to find an associated value in a second dataframe (returns, below) and populate a third dataframe (let's call it dataframe3 for now) with the returned values. I have found several posts based on left merges and map, but my two original dataframes have different structures, so those methods don't seem to fit (to me, at least).
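For a single key, the closest pandas analogue to VLOOKUP is Series.map with a Series argument: the Series' index acts as the lookup column and its values as the returned column. A minimal sketch, using the four returns rows from the snippet in the question and a hypothetical column of identifiers that ends in a NaN pad:

```python
import pandas as pd

# Lookup table: the four 20131231 rows from the returns snippet.
returns = pd.DataFrame({
    "PERMNO": [93044, 79702, 85751, 85576],
    "FROMDATE": [20131231] * 4,
    "MORET": [-0.022304, 0.012283, -0.016453, 0.038766],
})

# A Series indexed by PERMNO plays the role of VLOOKUP's lookup range.
moret_by_permno = returns.set_index("PERMNO")["MORET"]

# Hypothetical column of identifiers, stored as floats because of the NaN pad.
ids = pd.Series([93044.0, 79702.0, 85751.0, float("nan")])

# map() looks each identifier up in the Series index; misses and NaN give NaN.
looked_up = ids.map(moret_by_permno)
print(looked_up)
```

This only handles one date at a time; matching on both identifier and date is the harder part of the question.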
I haven't made much progress, but here is the code I have:
Code
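import pandas as pd

portfolios = pd.read_csv('portstst5_1.csv')
returns = pd.read_csv('Example_Returns.csv')

total_cols = len(portfolios.columns)
headers = list(portfolios)  # column headers, i.e. the YYYYMMDD dates as strings

# Build a PERMNO+FROMDATE lookup key, e.g. '93044' + '20131231' -> '9304420131231'
concat = returns['PERMNO'].map(str) + returns['FROMDATE'].map(str)
idx = 2
returns.insert(loc=idx, column="concat", value=concat)

for i in range(total_cols):
    # number of non-NaN identifiers in column i (assumes the NaN padding is at the bottom)
    col_len = portfolios.iloc[:, i].count()
    for j in range(col_len):
        # identifiers are read as floats (e.g. 93044.0), so cast to int before str
        print(str(int(portfolios.iat[j, i])) + headers[i])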
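One way to finish the concatenated-key approach is to index returns by the key and then map each portfolios column's keys against it, column by column. A minimal sketch, assuming the column headers are strings (as read_csv produces) and each PERMNO/FROMDATE pair appears only once in returns (Series.map needs a unique index). The toy frames and the helper lookup_column are hypothetical, and the 20131130 return of 0.010000 is made up for illustration:

```python
import pandas as pd

# Toy stand-ins for the two CSV files (the 20131130 return is made up).
portfolios = pd.DataFrame({
    "20131231": [93044.0, 79702.0, 85751.0],
    "20131130": [93044.0, 91515.0, float("nan")],
})
returns = pd.DataFrame({
    "PERMNO": [93044, 79702, 85751, 93044],
    "FROMDATE": [20131231, 20131231, 20131231, 20131130],
    "MORET": [-0.022304, 0.012283, -0.016453, 0.010000],
})

# Same key as in the question: PERMNO followed by FROMDATE, as a string.
returns["concat"] = returns["PERMNO"].astype(str) + returns["FROMDATE"].astype(str)
moret_by_key = returns.set_index("concat")["MORET"]


def lookup_column(col: pd.Series) -> pd.Series:
    """Build the PERMNO+date key for each non-NaN cell and look up MORET."""
    ids = col.dropna().astype(int).astype(str)  # 93044.0 -> '93044'
    keys = ids + col.name                       # '93044' + '20131231'
    # Keys missing from returns map to NaN; reindex restores the NaN padding
    # rows so the column keeps its original length.
    return keys.map(moret_by_key).reindex(col.index)


# apply() calls lookup_column once per column, so dataframe3 has the same shape.
dataframe3 = portfolios.apply(lookup_column)
print(dataframe3)
```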
Data
This code will make a little more sense if I first describe my data:
portfolios is a dataframe with 13 columns of varying lengths. The column headers are dates in YYYYMMDD format, and below each date header are identifiers, which are five-digit numeric codes. Because the columns have different lengths, the shorter ones are padded with NaN, which is why the identifiers load as floats (e.g. 93044.0). A snippet of portfolios looks like this:
20131231 20131130 20131031 20130930 20130831 20130731 20130630 \
0 93044.0 93044.0 13264.0 13264.0 89169.0 82486.0 91274.0
1 79702.0 91515.0 90710.0 81148.0 47387.0 88359.0 93353.0
2 85751.0 85724.0 88810.0 11513.0 85576.0 47387.0 85576.0
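The structural mismatch here is wide versus long format: portfolios stores one date per column, whereas returns stores one (PERMNO, FROMDATE) pair per row. DataFrame.melt converts the wide layout to the long one, after which a merge on both keys becomes possible. A minimal sketch on three columns of the snippet above, keeping each identifier's original row position in an "index" column (created by reset_index) so the original layout could be rebuilt later; long_ports is a hypothetical name:

```python
import pandas as pd

# Three columns from the portfolios snippet.
portfolios = pd.DataFrame({
    "20131231": [93044.0, 79702.0, 85751.0],
    "20131130": [93044.0, 91515.0, 85724.0],
    "20131031": [13264.0, 90710.0, 88810.0],
})

long_ports = (
    portfolios.reset_index()  # row position becomes a column named "index"
    .melt(id_vars="index", var_name="FROMDATE", value_name="PERMNO")
    .dropna(subset=["PERMNO"])  # drops NaN padding from shorter columns
    # Match the integer dtypes that read_csv gives the returns columns.
    .astype({"FROMDATE": int, "PERMNO": int})
)
print(long_ports)
```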
Before my code inserts the concat column, returns consists of three columns and 799 rows and looks like this (all elements are populated):
PERMNO FROMDATE MORET
0 93044 20131231 -0.022304
1 79702 20131231 0.012283
2 85751 20131231 -0.016453
3 85576 20131231 0.038766
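String concatenation is one way to build a two-part key, but pandas can also index by the pair directly with a MultiIndex. A minimal sketch using the four rows above; the lookup pair with date 20130831 is hypothetical and included only to show what a miss looks like:

```python
import pandas as pd

# The four rows from the returns snippet.
returns = pd.DataFrame({
    "PERMNO": [93044, 79702, 85751, 85576],
    "FROMDATE": [20131231] * 4,
    "MORET": [-0.022304, 0.012283, -0.016453, 0.038766],
})

# Index MORET by the (PERMNO, FROMDATE) pair instead of a concatenated string.
moret = returns.set_index(["PERMNO", "FROMDATE"])["MORET"]

# A single lookup, equivalent to finding '9304420131231' in returns['concat'].
print(moret.loc[(93044, 20131231)])

# Many lookups at once; pairs absent from returns come back as NaN.
pairs = pd.MultiIndex.from_tuples([(85576, 20131231), (85576, 20130831)])
print(moret.reindex(pairs))
```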
Desired Output
I would like to create a third dataframe that is structured identically to portfolios: it should have the same column header dates and the same number of rows in each column as portfolios, but instead of identifiers it should contain the MORET for the corresponding identifier/date combination. This is the reason for the concatenations in my code above: I am trying (perhaps unnecessarily) to create unique lookup keys so I can match rows between portfolios and returns. For example, to populate dataframe3.iat[0, 0], I would look up the concatenation of portfolios.iat[0, 0] and headers[0] (i.e. 9304420131231) in returns['concat'] and return the associated value in returns['MORET'] (i.e. -0.022304). I am stuck on how to use the concatenated values to retrieve that data.
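One possible way to get this layout without string keys is melt, left merge, then pivot back. A minimal sketch, assuming each (PERMNO, FROMDATE) pair occurs at most once in returns (otherwise the merge duplicates rows and pivot raises an error). The toy data comes from the snippets, except for the made-up 20131130 return of 0.010000:

```python
import pandas as pd

# Toy inputs (the 20131130 return is made up for illustration).
portfolios = pd.DataFrame({
    "20131231": [93044.0, 79702.0, 85751.0],
    "20131130": [93044.0, 91515.0, float("nan")],
})
returns = pd.DataFrame({
    "PERMNO": [93044, 79702, 85751, 93044],
    "FROMDATE": [20131231, 20131231, 20131231, 20131130],
    "MORET": [-0.022304, 0.012283, -0.016453, 0.010000],
})

# 1. Wide -> long: one row per (row position, date, identifier).
long_ports = (
    portfolios.reset_index()
    .melt(id_vars="index", var_name="FROMDATE", value_name="PERMNO")
    .dropna(subset=["PERMNO"])
    .astype({"FROMDATE": int, "PERMNO": int})
)

# 2. Left merge on both keys: every identifier keeps its row; MORET is NaN if absent.
merged = long_ports.merge(returns, on=["PERMNO", "FROMDATE"], how="left")

# 3. Long -> wide: back to the portfolios layout (row positions and column order).
dataframe3 = merged.pivot(index="index", columns="FROMDATE", values="MORET")
dataframe3.columns = dataframe3.columns.astype(str)
dataframe3 = dataframe3.reindex(index=portfolios.index, columns=portfolios.columns)
dataframe3.index.name = None
dataframe3.columns.name = None
print(dataframe3)
```

Because the row position is carried through in the "index" column, NaN cells anywhere in a portfolios column come back as NaN in dataframe3, not only trailing padding.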
Any thoughts are greatly appreciated.