Manipulating a DataFrame using pandas: create new columns and fill them with values based on looking up existing data within the DataFrame

Question

Given the data

import pandas as pd

df = pd.DataFrame(
    {
        'c': ['p1', 'p2', 'p3'],
        'v': [ 2  ,  8  ,  3],
    }
)

which prints as

    c  v
0  p1  2
1  p2  8
2  p3  3

I'm wondering how to create the following using pandas

    c  v  p1  p2  p3
0  p1  2   2   0   0
1  p2  8   0   8   0
2  p3  3   0   0   3

It needs to scale up to 1000 rows rather than 3 (so no hard-coding).

Edit

My current approach is as follows:

import pandas as pd

df = pd.DataFrame(
    {
        'c': ['p1', 'p2', 'p3'],
        'v': [ 2  ,  8  ,  3],
    }
)

# create one zero-filled column per value in c
for p in df['c']:
    df[p] = 0
# iterate over those columns and set the values
for p in df['c']:
    # boolean mask of the rows whose c equals p
    idx = df.loc[:, 'c'] == p
    # copy those rows' v values into column p
    df.loc[idx, p] = df.loc[idx, 'v']

This outputs the correct result, but it feels like a very clunky approach.
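To check that a vectorized version matches this loop at the scale mentioned above, here is a sketch on hypothetical random data (1000 rows, 50 made-up labels p1..p50; the names looped and vectorized are illustrative). The vectorized side is the get_dummies idea used in the answers below, with dtype=int assumed so the indicators are 0/1 integers.

```python
import numpy as np
import pandas as pd

# Hypothetical test data: 1000 rows drawn from 50 labels p1..p50
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame(
    {
        'c': [f'p{i}' for i in rng.integers(1, 51, size=n)],
        'v': rng.integers(1, 100, size=n),
    }
)

# The loop from the question, run once per distinct label
looped = df.copy()
for p in looped['c'].unique():
    looped[p] = 0
for p in looped['c'].unique():
    idx = looped['c'] == p
    looped.loc[idx, p] = looped.loc[idx, 'v']

# A vectorized alternative: 0/1 indicator columns scaled row-wise by v
vectorized = df.join(pd.get_dummies(df['c'], dtype=int).mul(df['v'], axis=0))

# Same values once the label columns are put in the same order
same = (looped == vectorized[looped.columns]).all().all()
print(same)
```

The column order differs between the two (the loop adds labels in order of first appearance, get_dummies sorts them), which is why the comparison reindexes the columns first.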

Edit two

The solution must also work for the following data, where p1 appears twice:

df = pd.DataFrame(
    {
        'c': ['p1', 'p2', 'p3', 'p1'],
        'v': [ 2  ,  8  ,  3, 4],
    }
)

returning

    c  v  p1  p2  p3
0  p1  2   2   0   0
1  p2  8   0   8   0
2  p3  3   0   0   3
3  p1  4   4   0   0

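This means that the pivot table approach

piv = df.pivot_table(index='c', columns='c', values='v', fill_value=0)
df = df.join(piv.reset_index(drop=True))

doesn't work here, although it was fine for the original data set: pivot_table produces one row per unique value of c (the two p1 rows are aggregated, by mean by default), so its rows no longer line up with the rows of df.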
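A minimal sketch of that aggregation on the second data set: calling pivot_table with only index='c' shows that the two p1 rows collapse into a single row whose value is (2 + 4) / 2 = 3 under the default aggfunc='mean'.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'c': ['p1', 'p2', 'p3', 'p1'],
        'v': [ 2  ,  8  ,  3, 4],
    }
)

# pivot_table groups rows by the key; the default aggfunc is 'mean'
summary = df.pivot_table(index='c', values='v')
print(summary)
```

With 3 rows in the pivot table but 4 rows in df, any join on a reset 0..n-1 index is bound to misalign.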

jezrael · Accepted Answer · 2019-09-14 13:43:47Z

2

Multiply the indicator DataFrame created by get_dummies by column v, then use DataFrame.join to add it to the original:

df1 = df.join(pd.get_dummies(df["c"]).mul(df['v'], axis=0))
print (df1)
    c  v  p1  p2  p3
0  p1  2   2   0   0
1  p2  8   0   8   0
2  p3  3   0   0   3

EDIT: the same code also works for the second data set, where p1 appears twice:

df1 = df.join(pd.get_dummies(df["c"]).mul(df['v'], axis=0))
print (df1)
    c  v  p1  p2  p3
0  p1  2   2   0   0
1  p2  8   0   8   0
2  p3  3   0   0   3
3  p1  4   4   0   0

Details:

# indicator columns (shown as 0/1 here; pandas >= 2.0 returns bool columns by default)
print (pd.get_dummies(df["c"]))
   p1  p2  p3
0   1   0   0
1   0   1   0
2   0   0   1
3   1   0   0

# each indicator row multiplied by that row's value in column v
print (pd.get_dummies(df["c"]).mul(df['v'], axis=0))
   p1  p2  p3
0   2   0   0
1   0   8   0
2   0   0   3
3   4   0   0
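To make the mul(..., axis=0) step concrete, here is a sketch (assuming pandas >= 0.23 for the dtype argument of get_dummies) comparing it with plain NumPy broadcasting, where v is treated as an (n, 1) column vector.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'c': ['p1', 'p2', 'p3', 'p1'],
        'v': [ 2  ,  8  ,  3, 4],
    }
)

# dtype=int gives 0/1 columns (pandas >= 2.0 would otherwise return bool)
dummies = pd.get_dummies(df['c'], dtype=int)

# mul(..., axis=0) aligns v with the row index and scales each row
via_mul = dummies.mul(df['v'], axis=0)

# the same thing with NumPy broadcasting: v reshaped to an (n, 1) column
via_numpy = dummies.to_numpy() * df['v'].to_numpy()[:, None]

print((via_mul.to_numpy() == via_numpy).all())  # True
```

The axis=0 matters: the default axis='columns' would try to match v's index labels (0..3) against the column labels (p1..p3), which share nothing, and the result would be all NaN.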

edited Sep 14, 2019 at 13:43

answered Sep 14, 2019 at 13:05

jezrael

7 Comments

baxx Over a year ago

Does this depend on there being as many p_i values as there are rows in the data? That's not necessarily true (see update).

jezrael Over a year ago

@baxx - My solution works for that case too; check the edited answer.

baxx Over a year ago

Yeah, this works well, and it's the only one that works with the data I have locally as well, so it seems I've translated something poorly. Thanks though.

jezrael Over a year ago

@baxx - You probably tested the other solution, with pivot_table from Erfan's answer; it works only if the values in c are unique.

baxx Over a year ago

Yes, the one with just get_dummies and join didn't work with the data I have (~3400 rows) for some reason; I'm not too sure why. I should get a subset of the data, obfuscate it, and include it in the OP so that you (and others) can see any mistakes, and why your solution was (in this case) the most suitable.

bharatk · Accepted Answer · 2019-09-14 12:07:56Z

2

Use

pd.get_dummies() - Convert categorical variable into dummy/indicator variables.
df.join() - Join columns of another DataFrame.

Ex.

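import pandas as pd
df = pd.DataFrame(
    {
        'c': ['p1', 'p2', 'p3'],
        'v': [ 2  ,  8  ,  3],
    }
)
# integer 0/1 indicator columns (pandas >= 2.0 would otherwise return bool,
# and assigning 2, 8, 3 into a bool array would just store True)
s = pd.get_dummies(df["c"], dtype=int)
# writing through s.values is not guaranteed to modify s, so use a copy
arr = s.to_numpy(copy=True)
# each row has exactly one nonzero entry; the mask fills them row by row
arr[arr != 0] = df['v'].to_numpy()
s = pd.DataFrame(arr, index=s.index, columns=s.columns)
df = df.join(s)
print(df)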

O/P:

    c  v  p1  p2  p3
0  p1  2   2   0   0
1  p2  8   0   8   0
2  p3  3   0   0   3

edited Sep 14, 2019 at 12:07

answered Sep 14, 2019 at 11:50

bharatk

Manualmsdos · Accepted Answer · 2019-09-14 12:37:08Z

1

You can use a NumPy matrix:

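import numpy as np
import pandas as pd

n = df['c'].shape[0]
# n x n zero matrix (np.int was removed in NumPy 1.24; use the builtin int)
t = np.zeros(shape=(n, n), dtype=int)
# put each row's v value on the diagonal: row i, column i
np.fill_diagonal(t, df['v'].to_numpy())
# one column per row of df, so this assumes the values in c are unique
t = pd.DataFrame(t, index=df.index, columns=df['c'])
df = pd.concat([df, t], axis=1)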

df:

    c   v   p1  p2  p3
0   p1  2   2   0   0
1   p2  8   0   8   0
2   p3  3   0   0   3
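The diagonal matrix gives one new column per row, so it only matches the expected output when every value in c is unique; with the second data set it would produce two p1 columns. A sketch of one way to generalize the NumPy idea, swapping np.fill_diagonal for np.unique(..., return_inverse=True) plus fancy indexing (the names labels, codes and out are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'c': ['p1', 'p2', 'p3', 'p1'],
        'v': [ 2  ,  8  ,  3, 4],
    }
)

n = len(df)
# distinct labels (sorted) and, for each row, the position of its label
labels, codes = np.unique(df['c'].to_numpy(), return_inverse=True)

# n x k zero matrix: one column per distinct label instead of one per row
t = np.zeros((n, len(labels)), dtype=int)
# row i gets its v value in the column of its own label
t[np.arange(n), codes] = df['v'].to_numpy()

out = pd.concat([df, pd.DataFrame(t, index=df.index, columns=labels)], axis=1)
print(out)
```

For unique labels this reduces to the diagonal case (up to the sorted column order), and it scales with the number of distinct labels rather than the number of rows.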

answered Sep 14, 2019 at 12:37

Manualmsdos

1 Comment

baxx Over a year ago

Nice to see an alternative approach (although the question was about pandas, it's still good to see a different take).

Erfan · Accepted Answer · 2019-09-14 12:43:14Z

1

Using pivot_table (this works only if the values in c are unique):

piv = df.pivot_table(index='c', columns='c', values='v', fill_value=0)
df = df.join(piv.reset_index(drop=True))

Output

    c  v  p1  p2  p3
0  p1  2   2   0   0
1  p2  8   0   8   0
2  p3  3   0   0   3
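Because this pivots on c, duplicate values in c are aggregated into one row, which is why it breaks on the second data set. A sketch of a variant (a swapped-in technique, not part of the original answer) that pivots on the row label instead, via reset_index() and DataFrame.pivot, so no rows are merged:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'c': ['p1', 'p2', 'p3', 'p1'],
        'v': [ 2  ,  8  ,  3, 4],
    }
)

# Key the pivot on the original row label ('index'), not on c
piv = (
    df.reset_index()
      .pivot(index='index', columns='c', values='v')
      .fillna(0)     # cells with no matching row become 0
      .astype(int)   # the NaNs made the pivot float; convert back
)
out = df.join(piv)
print(out)
```

Since every row label is unique, pivot never has to aggregate, and the result joins back onto df row for row.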

answered Sep 14, 2019 at 12:43

Erfan
