populate a dense dataframe given a key-value dataframe

Question

I have a key-value dataframe:

df = pd.DataFrame(columns=['X','Y','val'], data=[['a','z',5],['b','g',3],['b','y',6],['e','r',9]])

   X  Y  val
0  a  z    5
1  b  g    3
2  b  y    6
3  e  r    9

Which I'd like to convert into a denser dataframe:

     X z g y r
   0 a 5 0 0 0
   1 b 0 3 6 0
   2 e 0 0 0 9

Before I resort to a pure-Python solution, I was wondering if there is a simple way to do this with pandas.

It's easy to pivot to get this without the empty line of b 0 0 0 0; is that important?
– DSM, Commented Sep 5, 2013 at 17:30

Andy Hayden · Accepted Answer · 2013-09-05 18:55:11Z


You can use get_dummies:

In [11]: dummies = pd.get_dummies(df['Y'])

In [12]: dummies
Out[12]: 
   g  r  y  z
0  0  0  0  1
1  1  0  0  0
2  0  0  1  0
3  0  1  0  0

and then multiply by the val column:

In [13]: res = dummies.mul(df['val'], axis=0)

In [14]: res
Out[14]: 
   g  r  y  z
0  0  0  0  5
1  3  0  0  0
2  0  0  6  0
3  0  9  0  0
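The two steps above can be sketched as one self-contained script. This assumes the question's sample frame is bound to the name df; passing dtype=int is my addition, since newer pandas versions return boolean indicator columns by default.

```python
import pandas as pd

# Sample key-value frame from the question (the name df is assumed).
df = pd.DataFrame(
    columns=['X', 'Y', 'val'],
    data=[['a', 'z', 5], ['b', 'g', 3], ['b', 'y', 6], ['e', 'r', 9]],
)

# One indicator column per distinct Y; dtype=int keeps them numeric.
dummies = pd.get_dummies(df['Y'], dtype=int)

# Scale each row of indicators by that row's val (aligned on the row index).
res = dummies.mul(df['val'], axis=0)
print(res)
```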

To fix the index, you could add X to it by first applying set_index:

In [21]: df1 = df.set_index('X', append=True)

In [22]: df1
Out[22]: 
     Y  val
  X        
0 a  z    5
1 b  g    3
2 b  y    6
3 e  r    9

In [23]: dummies = pd.get_dummies(df1['Y'])

In [24]: dummies.mul(df1['val'], axis=0)
Out[24]: 
     g  r  y  z
  X            
0 a  0  0  0  5
1 b  3  0  0  0
2 b  0  0  6  0
3 e  0  9  0  0
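If the goal is one row per X, as in the question's desired output, the get_dummies result can be collapsed with a groupby. This is only a sketch of one possible follow-up step and not part of the original answer; the names wide and dense are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['X', 'Y', 'val'],
    data=[['a', 'z', 5], ['b', 'g', 3], ['b', 'y', 6], ['e', 'r', 9]],
)

# One row per original record, with val placed in the matching Y column.
wide = pd.get_dummies(df['Y'], dtype=int).mul(df['val'], axis=0)

# Sum the rows that share an X, so b's 3 and 6 end up on the same row.
dense = wide.groupby(df['X']).sum().reset_index()
print(dense)
```

Summing is safe here because each (X, Y) pair occurs at most once; with repeated pairs the values would be added together.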

If you want a pivot instead (you can also use pivot_table):

In [31]: df.pivot(index='X', columns='Y').fillna(0)
Out[31]: 
   val         
Y    g  r  y  z
X              
a    0  0  0  5
b    3  0  6  0
e    0  9  0  0

Perhaps you want to reset_index, to make X a column (I'm not sure whether that makes sense):

In [32]: df.pivot(index='X', columns='Y').fillna(0).reset_index()
Out[32]: 
   X  val         
Y       g  r  y  z
0  a    0  0  0  5
1  b    3  0  6  0
2  e    0  9  0  0
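A possible variation on the pivot above: passing values='val' avoids the extra val level in the column header, and astype(int) undoes the float conversion that the NaN fill introduces. This is a sketch with keyword arguments, not the answer's exact call; the name wide is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['X', 'Y', 'val'],
    data=[['a', 'z', 5], ['b', 'g', 3], ['b', 'y', 6], ['e', 'r', 9]],
)

# pivot requires each (X, Y) pair to be unique, which holds for this data.
wide = df.pivot(index='X', columns='Y', values='val').fillna(0).astype(int)
print(wide)
```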

For completeness, the pivot_table:

In [33]: df.pivot_table('val', 'X', 'Y', fill_value=0)
Out[33]: 
Y  g  r  y  z
X            
a  0  0  0  5
b  3  0  6  0
e  0  9  0  0

In [34]: df.pivot_table('val', 'X', 'Y', fill_value=0).reset_index()
Out[34]: 
Y  X  g  r  y  z
0  a  0  0  0  5
1  b  3  0  6  0
2  e  0  9  0  0

Note: after resetting the index, the columns axis is still named Y. I'm not sure whether this makes sense, but it is easy to rectify via res.columns.name = None.
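Putting the pivot_table pieces together, a sketch that also clears the columns name and restores the column order from the question might look like this. The aggfunc='sum' argument and the reordering step are my assumptions, not part of the answer; dense is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame(
    columns=['X', 'Y', 'val'],
    data=[['a', 'z', 5], ['b', 'g', 3], ['b', 'y', 6], ['e', 'r', 9]],
)

# aggfunc='sum' is an assumption; the default aggregation is the mean.
dense = df.pivot_table(
    values='val', index='X', columns='Y', fill_value=0, aggfunc='sum'
).reset_index()

# Reorder the Y columns by first appearance (z, g, y, r), as in the question.
dense = dense[['X'] + list(df['Y'].unique())]

# Drop the leftover "Y" label on the columns axis.
dense.columns.name = None
print(dense)
```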

edited Sep 5, 2013 at 18:55

answered Sep 5, 2013 at 17:50

Andy Hayden


5 Comments

DSM Over a year ago

Hmm. Using get_dummies preserves all the rows the OP wants, but doesn't put the 3 and 6 in the same row; .pivot("X", "Y").fillna(0) puts the 3 and 6 in the same row but loses the 0 row. I'm not sure which is closer to what the OP is after.

Andy Hayden Over a year ago

Hmmm, that positioning looks wrong. The thing I'm missing atm is the df['X'] col being part of the index

DSM Over a year ago

Yeah, I guess it could be an error on the OP's part. +1 anyway. :^)

Andy Hayden Over a year ago

:) I see what you're saying. Yeah, depends what they are after. If it's the thing OP wrote they should throw away the first index (as that doesn't make much sense)...

stites Over a year ago

Yeah, sorry about not being clear - the pivot tables were all I was looking for... forgot about those. However, after testing out get_dummies, this works out better for what I need to work with. Thank you!

Dale · Accepted Answer · 2013-09-05 18:52:40Z

If you want something that feels more direct, something akin to DataFrame.lookup but for np.put might make sense.

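import numpy as np

def lookup_index(self, row_labels, col_labels):
    """Return flat (row-major) positions of the (row, column) label pairs."""
    # get_indexer returns -1 for any label that is not present
    ridx = self.index.get_indexer(row_labels)
    cidx = self.columns.get_indexer(col_labels)
    if (ridx == -1).any():
        raise ValueError('One or more row labels was not found')
    if (cidx == -1).any():
        raise ValueError('One or more column labels was not found')
    flat_index = ridx * len(self.columns) + cidx
    return flat_index

# vals is the key-value frame; df is the pre-built dense frame to fill
flat_index = lookup_index(df, vals.X, vals.Y)
# writes in place only if df.values is a writable view (single-dtype frame)
np.put(df.values, flat_index, vals.val.values)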

This assumes that df has the appropriate columns and index to hold the X/Y values. Here's an ipython notebook http://nbviewer.ipython.org/6454120
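For readers without the notebook, here is a self-contained sketch of the same flat-index idea. It fills a fresh NumPy array and wraps it in a DataFrame afterwards rather than writing through df.values, which is my substitution to avoid depending on whether .values is a writable view. The names rows, cols, out, and dense are hypothetical.

```python
import numpy as np
import pandas as pd

vals = pd.DataFrame(
    columns=['X', 'Y', 'val'],
    data=[['a', 'z', 5], ['b', 'g', 3], ['b', 'y', 6], ['e', 'r', 9]],
)

# Row and column labels of the dense result, in order of first appearance.
rows = pd.Index(vals['X'].unique())
cols = pd.Index(vals['Y'].unique())

# Integer positions of each record's row and column label.
ridx = rows.get_indexer(vals['X'])
cidx = cols.get_indexer(vals['Y'])

# Fill a zero array through row-major flat positions, then add the labels.
out = np.zeros((len(rows), len(cols)), dtype=vals['val'].dtype)
np.put(out, ridx * len(cols) + cidx, vals['val'].to_numpy())
dense = pd.DataFrame(out, index=rows, columns=cols)
print(dense)
```

If a (X, Y) pair appeared more than once, the later value would overwrite the earlier one, unlike the summing behavior of a groupby.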
