
I am trying to create a new DataFrame that summarises my key information by taking it from three (say) other DataFrames.

import numpy as np
import pandas as pd

dfdate = {'x1': [2, 4, 7, 5, 6],
          'x2': [2, 2, 2, 6, 7],
          'y1': [3, 1, 4, 5, 9]}
dfdate = pd.DataFrame(dfdate, index=range(0, 5))

dfqty = {'x1': [1, 2, 6, 6, 8],
         'x2': [3, 1, 1, 7, 5],
         'y1': [2, 4, 3, 2, 8]}
dfqty = pd.DataFrame(dfqty, index=range(0, 5))

dfprices = {'x1': [0, 2, 2, 4, 4],
            'x2': [2, 0, 0, 3, 4],
            'y1': [1, 3, 2, 1, 3]}
dfprices = pd.DataFrame(dfprices, index=range(0, 5))

Let us say the above three DataFrames are my data: some dates, quantities and prices of goods. My new df is to be constructed from this data:

rng = len(dfprices.columns) * len(dfprices.index)  # number of rows in the new df (3 * 5 = 15)
dfnew = pd.DataFrame(np.nan, index=range(0, rng), columns=['Letter', 'Number', 'date', 'qty', 'price'])

Now, this is where I struggle to piece things together. I am trying to take all the data in dfdate and put it into a single column of the new df, and the same with dfqty and dfprices (so each 5x3 matrix essentially becomes a 15-element vector that is placed into the new df).
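A minimal sketch of that flattening step, assuming the three 5x3 frames from the question with the default 0 to 4 index. NumPy's ravel(order='F') reads the values column by column (Fortran order), which gives the x1, x2, y1 ordering of the desired output, so the preallocated dfnew can be filled without loops. The identifier split assumes one-letter, one-digit headers, and the letters come out lower case, as they are in the headers.

```python
import numpy as np
import pandas as pd

# The three source frames from the question (5 rows x 3 columns each)
dfdate = pd.DataFrame({'x1': [2, 4, 7, 5, 6], 'x2': [2, 2, 2, 6, 7], 'y1': [3, 1, 4, 5, 9]})
dfqty = pd.DataFrame({'x1': [1, 2, 6, 6, 8], 'x2': [3, 1, 1, 7, 5], 'y1': [2, 4, 3, 2, 8]})
dfprices = pd.DataFrame({'x1': [0, 2, 2, 4, 4], 'x2': [2, 0, 0, 3, 4], 'y1': [1, 3, 2, 1, 3]})

nrows, ncols = dfdate.shape
dfnew = pd.DataFrame(np.nan, index=range(nrows * ncols),
                     columns=['Letter', 'Number', 'date', 'qty', 'price'])

# ravel(order='F') walks the values column by column: all of x1, then x2, then y1
dfnew['date'] = dfdate.to_numpy().ravel(order='F')
dfnew['qty'] = dfqty.to_numpy().ravel(order='F')
dfnew['price'] = dfprices.to_numpy().ravel(order='F')

# Repeat each column header once per row so the identifiers line up with the values
headers = np.repeat(dfdate.columns.to_numpy(), nrows)
dfnew['Letter'] = [h[0] for h in headers]        # assumes a one-character letter
dfnew['Number'] = [int(h[1:]) for h in headers]  # everything after the first character

print(dfnew)
```

Assigning a whole column with dfnew['date'] = ... replaces the NaN placeholder column outright, so each column ends up with the dtype of its data rather than float.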

As well as that, I need a couple of identifier columns in dfnew, taken from the column names of the original dfs.
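Splitting a header such as x1 into its letter and number parts can be done with a regular expression via Series.str.extract, which also copes with longer headers such as ab12 where slicing off a single character would not. A sketch assuming every header is letters followed by digits; the upper-casing is only there to match the X/Y shown in the desired output.

```python
import pandas as pd

headers = pd.Series(['x1', 'x2', 'y1'])

# Capture group 1: the leading letters; capture group 2: the trailing digits
parts = headers.str.extract(r'^([A-Za-z]+)(\d+)$')
parts.columns = ['Letter', 'Number']
parts['Letter'] = parts['Letter'].str.upper()  # X/Y as in the desired output
parts['Number'] = parts['Number'].astype(int)

print(parts)
```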

I've tried for loops, but to no avail, and I don't know how to convert a df to a Series. My desired output is:

dfnew:
   Letter  Number  date  qty  price
0       X       1     2    1      0
1       X       1     4    2      2
2       X       1     7    6      2
3       X       1     5    6      4
4       X       1     6    8      4
5       X       2     2    3      2
6       X       2     2    1      0
7       X       2     2    1      0
8       X       2     6    7      3
9       X       2     7    5      4
10      Y       1     3    2      1
11      Y       1     1    4      3
12      Y       1     4    3      2
13      Y       1     5    2      1
14      Y       1     9    8      3

where 0 to 14 is the index, Letter is the letter from the column header of the original dfs, Number is the number from that header, and the next three columns are the data from the original dfs.
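For the "df to Series" part of the question: DataFrame.unstack() on a frame with an ordinary (non-MultiIndex) row index returns a Series indexed by (column, row) pairs, already in column-by-column order, so the column label each value came from is kept alongside it. A short sketch using the date frame from the question:

```python
import pandas as pd

dfdate = pd.DataFrame({'x1': [2, 4, 7, 5, 6], 'x2': [2, 2, 2, 6, 7], 'y1': [3, 1, 4, 5, 9]})

# unstack() moves the column labels into the outer level of the index,
# producing a 15-element Series with a (column, row) MultiIndex
s = dfdate.unstack()
print(s.head(6))

# The source column of each value is level 0 of that index
labels = s.index.get_level_values(0)
print(list(labels[:6]))
```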

(don't ask why the original data is in that funny format :)

Thanks so much. My last question wasn't well received, so I have tried to make this one better.

1 Answer


Use:

import pandas as pd

# list of DataFrames
dfs = [dfdate, dfqty, dfprices]

# list comprehension: unstack() reshapes each DataFrame into a Series indexed by (column, row)
comb = [x.unstack() for x in dfs]
# join the Series together side by side
df = pd.concat(comb, axis=1, keys=['date', 'qty', 'price'])
# drop the second (row) level of the MultiIndex and move the column labels into a column
df = df.reset_index(level=1, drop=True).reset_index().rename(columns={'index': 'col'})
# the number is everything after the first character ([1:]), the letter is the first character ([0])
df['Number'] = df['col'].str[1:]
df['Letter'] = df['col'].str[0]

cols = ['Letter', 'Number', 'date', 'qty', 'price']
# change the order of the columns
df = df.reindex(columns=cols)
print(df)
   Letter Number  date  qty  price
0       x      1     2    1      0
1       x      1     4    2      2
2       x      1     7    6      2
3       x      1     5    6      4
4       x      1     6    8      4
5       x      2     2    3      2
6       x      2     2    1      0
7       x      2     2    1      0
8       x      2     6    7      3
9       x      2     7    5      4
10      y      1     3    2      1
11      y      1     1    4      3
12      y      1     4    3      2
13      y      1     5    2      1
14      y      1     9    8      3
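A variation on the same idea, offered as a sketch rather than a replacement: DataFrame.melt turns each wide frame into long (column name, value) form, stacking the columns one after another. It assumes the three frames have identical columns and index, so their melted rows line up position by position and the value columns can be placed side by side.

```python
import pandas as pd

dfdate = pd.DataFrame({'x1': [2, 4, 7, 5, 6], 'x2': [2, 2, 2, 6, 7], 'y1': [3, 1, 4, 5, 9]})
dfqty = pd.DataFrame({'x1': [1, 2, 6, 6, 8], 'x2': [3, 1, 1, 7, 5], 'y1': [2, 4, 3, 2, 8]})
dfprices = pd.DataFrame({'x1': [0, 2, 2, 4, 4], 'x2': [2, 0, 0, 3, 4], 'y1': [1, 3, 2, 1, 3]})

# melt() stacks the columns one after another: all of x1, then x2, then y1
long_df = dfdate.melt(var_name='col', value_name='date')
long_df['qty'] = dfqty.melt()['value']
long_df['price'] = dfprices.melt()['value']

# Split the header into its letter and number parts
long_df['Letter'] = long_df['col'].str[0]
long_df['Number'] = long_df['col'].str[1:].astype(int)

df = long_df[['Letter', 'Number', 'date', 'qty', 'price']]
print(df)
```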

2 Comments

Thank you! So much more elegant than my attempts at using loops etc. Thanks
If we had an extra line of code, df['identifier'] = df['col'], i.e. a column of x1, x2 and y1, and I had another DataFrame with transaction costs, say:

tcostmatrix = pd.DataFrame(np.nan, index=range(0, 4), columns=['x1', 'x2', 'y1'])
tcostmatrix.iloc[:, 0] = 0.2  # tcost x1
tcostmatrix.iloc[:, 1] = 0.3  # tcost x2
tcostmatrix.iloc[:, 2] = 0.5  # tcost y1

(i.e. x1=0.2, x2=0.3, y1=0.5), how would I create a new column holding the respective tcost values? Essentially I'm looking for a VLOOKUP equivalent in Python. My desired output is:

    tcost
0     0.2
1     0.2
...
14    0.5
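One way to get that VLOOKUP-style column is Series.map, which replaces each value with the entry stored under that label in another Series. A sketch assuming, as in the comment, that the cost is constant down each column, so a single row of the cost frame (a Series indexed by x1, x2, y1) is enough as the lookup table. The stand-in df below only carries the identifier column; the cost frame is built directly with its constant values.

```python
import pandas as pd

# Stand-in for the answer's result: only the identifier column matters here
df = pd.DataFrame({'identifier': ['x1'] * 5 + ['x2'] * 5 + ['y1'] * 5})

# Transaction costs per column, constant down each column (as in the comment)
tcostmatrix = pd.DataFrame({'x1': [0.2] * 4, 'x2': [0.3] * 4, 'y1': [0.5] * 4})

# One row of the cost frame is a Series indexed by x1, x2, y1: the lookup table
lookup = tcostmatrix.iloc[0]

# map() replaces each identifier with the cost stored under that label
df['tcost'] = df['identifier'].map(lookup)
print(df)
```

Identifiers with no matching label in the lookup Series come back as NaN, much like a failed VLOOKUP.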
