Replace ones in binary columns with values from another column

Question

I have a data frame that looks like this:

df = pd.DataFrame({"value": [4, 5, 3], "item1": [0, 1, 0], "item2": [1, 0, 0], "item3": [0, 0, 1]})
df

   value  item1  item2  item3
0      4      0      1      0
1      5      1      0      0
2      3      0      0      1

I want to replace each 1 in the one-hot encoded columns with the value from the "value" column in the same row, and then delete the "value" column. The resulting data frame should look like this:

df_out = pd.DataFrame({"item1": [0, 5, 0], "item2": [4, 0, 0], "item3": [0, 0, 3]})

   item1  item2  item3
0      0      4      0
1      5      0      0
2      0      0      3

I think this can be solved if you just use df["columnNameToReplace"] = df["value"] and then delete the "value" column from the data frame?
– Vaibhav gusain, Commented Dec 5, 2018 at 12:47

cs95 · Accepted Answer · 2018-12-05 13:09:56Z

14

Why not just multiply?

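df.pop('value').values[:, None] * df

   item1  item2  item3
0      0      4      0
1      5      0      0
2      0      0      3

DataFrame.pop removes a column in place and returns it, so you can do this in a single step. The [:, None] reshapes the values into a column vector so that each row of df is scaled by its own value; without it, the 1-D array is broadcast across the columns instead of down the rows.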
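To make the broadcasting direction concrete, here is a minimal sketch on the question's sample data. It compares multiplying by a 1-D array with multiplying by a column vector; the names `items`, `by_column`, and `by_row` are illustrative and not part of the original answer, and the frame is left intact instead of popping the column.

```python
import pandas as pd

df = pd.DataFrame({"value": [4, 5, 3], "item1": [0, 1, 0],
                   "item2": [1, 0, 0], "item3": [0, 0, 1]})
items = df[["item1", "item2", "item3"]]
values = df["value"].to_numpy()  # shape (3,)

# 1-D array: every row of `items` is multiplied element-wise by [4, 5, 3],
# so each value is matched to a column rather than to a row.
by_column = values * items

# Column vector of shape (3, 1): row i is multiplied by values[i],
# which is what the question asks for.
by_row = values[:, None] * items

print(by_column)
print(by_row)
```

On a square frame like this one the 1-D version runs without complaint but puts the values in the wrong places. When the number of rows differs from the number of columns, the 1-D version should raise a ValueError instead, which matches the error reported in the comments.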

If the "item_*" columns contain nonzero values other than 1, multiply with bools instead:

df.pop('value').values[:, None] * df.astype(bool)

   item1  item2  item3
0      0      4      0
1      5      0      0
2      0      0      3
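A short sketch of why the boolean cast matters, assuming hypothetical indicator columns that hold counts such as 2 or 7 instead of strict 0/1 flags (this data is made up for illustration and is not from the question):

```python
import pandas as pd

# Hypothetical data: the item columns are nonzero where the item is present,
# but the nonzero entries are not all equal to 1.
df = pd.DataFrame({"value": [4, 5, 3], "item1": [0, 2, 0],
                   "item2": [7, 0, 0], "item3": [0, 0, 1]})
items = df[["item1", "item2", "item3"]]
values = df["value"].to_numpy()[:, None]  # column vector, one value per row

# Plain multiplication scales the value by whatever number is stored.
scaled = values * items

# Casting to bool first turns every nonzero entry into True (treated as 1),
# so the value is copied through unchanged.
replaced = values * items.astype(bool)

print(scaled)
print(replaced)
```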

If your DataFrame has other columns, then do this:

df
   value  name  item1  item2  item3
0      4  John      0      1      0
1      5  Mike      1      0      0
2      3  Stan      0      0      1

# cols = df.columns[df.columns.str.startswith('item')]
cols = df.filter(like='item').columns
df[cols] = df.pop('value').values[:, None] * df[cols]

df
   name  item1  item2  item3
0  John      0      4      0
1  Mike      5      0      0
2  Stan      0      0      3
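The two column selectors shown above are not quite interchangeable: `filter(like=...)` matches a substring anywhere in the label, while `str.startswith` matches only a prefix. The sketch below illustrates the difference with a hypothetical extra column `n_items`, which is an assumption added for the example and does not appear in the question.

```python
import pandas as pd

# Hypothetical frame: "n_items" is an extra column added for illustration.
df = pd.DataFrame({
    "value": [4, 5, 3],
    "name": ["John", "Mike", "Stan"],
    "n_items": [1, 1, 1],
    "item1": [0, 1, 0],
    "item2": [1, 0, 0],
    "item3": [0, 0, 1],
})

# like="item" keeps every label that contains "item" anywhere.
like_cols = list(df.filter(like="item").columns)

# str.startswith("item") keeps only labels that begin with "item".
prefix_cols = list(df.columns[df.columns.str.startswith("item")])

# Scale only the prefix-matched columns, row by row.
df[prefix_cols] = df.pop("value").to_numpy()[:, None] * df[prefix_cols]

print(like_cols)
print(prefix_cols)
print(df)
```

If other column names might contain the same substring, the prefix match (or `filter(regex='^item')`) is the safer choice.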

edited Dec 5, 2018 at 13:09

answered Dec 5, 2018 at 12:51

cs95


8 Comments

horro Over a year ago

Most elegant answer so far

gorjan Over a year ago

I like it, but I should have been more specific in my question. Here is what my data frame actually looks like:

df_in = pd.DataFrame({"value": [4, 5, 3], "name": ["John", "Mike", "Stan"], "item1": [0, 1, 0], "item2": [1, 0, 0], "item3": [0, 0, 1]})

And the output df should be: df_out = pd.DataFrame({"name": ["John", "Mike", "Stan"], "item1": [0, 5, 0], "item2": [4, 0, 0], "item3": [0, 0, 3]})

cs95 Over a year ago

@GorjanRadevski Let me know if the edit does it for you.

gorjan Over a year ago

@coldspeed I get a really strange ValueError now: "ValueError: operands could not be broadcast together with shapes (2918590,) (2918590,26)", even though in the sample data frame the shapes are similar: (3,) (3, 3). Do you know what might be causing the issue? Besides that, yes, the edit definitely should do it for me.

gorjan Over a year ago

That works! I have no idea why on the sample data frame it worked without the addition. Thank you!


horro · Answer · 2018-12-05 12:57:47Z

1

You could do something like:

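# Build the result from a dict so each product becomes a named column
# (passing a list of Series would lay them out as rows, i.e. transposed).
df = pd.DataFrame({'item1': df['value']*df['item1'], 'item2': df['value']*df['item2'], 'item3': df['value']*df['item3']})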

EDIT: As @coldspeed comments, this does not scale well to many columns, so it is better to iterate over the columns in a loop:

cols = ['item1','item2','item3']
for c in cols:
    df[c] *= df['value']
df.drop('value',axis=1,inplace=True)

edited Dec 5, 2018 at 12:57

answered Dec 5, 2018 at 12:50

horro


Sociopath · Answer · 2018-12-05 12:52:55Z

0

You need:

col = ['item1','item2','item3']

for c in col:
    df[c] = df[c] * df['value']

# pass the axis by keyword; recent pandas versions no longer accept it positionally
df.drop(['value'], axis=1, inplace=True)

edited Dec 5, 2018 at 12:52

answered Dec 5, 2018 at 12:51

Sociopath


jpp · Answer · 2019-01-16 04:01:32Z

0

`pd.DataFrame.mul`

You can use mul, or equivalently multiply, with either label-based or integer positional indexing:

# label-based indexing
res = df.filter(regex='^item').mul(df['value'], axis='index')

# integer positional indexing
res = df.iloc[:, 1:].mul(df.iloc[:, 0], axis='index')

print(res)

#    item1  item2  item3
# 0      0      4      0
# 1      5      0      0
# 2      0      0      3
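One practical difference between `mul(..., axis='index')` and multiplying by a raw NumPy array is that `mul` lines rows up by index label, while an array is applied purely by position. A minimal sketch of that behaviour follows; reversing the Series is a contrived step added here only to make the difference visible.

```python
import pandas as pd

df = pd.DataFrame({"value": [4, 5, 3], "item1": [0, 1, 0],
                   "item2": [1, 0, 0], "item3": [0, 0, 1]})
items = df.filter(regex="^item")

# Reverse the row order of the Series; its index labels are now [2, 1, 0].
reversed_values = df["value"].iloc[::-1]

# mul matches rows by index label, so the reversed order does not matter.
aligned = items.mul(reversed_values, axis="index")

# A bare NumPy array carries no labels and is applied by position,
# so row 0 is scaled by 3 and row 2 by 4.
positional = items * reversed_values.to_numpy()[:, None]

print(aligned)
print(positional)
```

When the frame and the Series may have been sorted or filtered separately, the label-aligned form is usually the less surprising one.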

answered Jan 16, 2019 at 4:01

jpp
