10

I have a data frame that looks like this:

import pandas as pd

df = pd.DataFrame({"value": [4, 5, 3], "item1": [0, 1, 0], "item2": [1, 0, 0], "item3": [0, 0, 1]})
df

   value  item1  item2  item3
0      4      0      1      0
1      5      1      0      0
2      3      0      0      1

I want to replace each 1 in the one-hot encoded columns with the corresponding value from the "value" column, and then delete the "value" column. The resulting data frame should look like this:

df_out = pd.DataFrame({"item1": [0, 5, 0], "item2": [4, 0, 0], "item3": [0, 0, 3]})

   item1  item2  item3
0      0      4      0
1      5      0      0
2      0      0      3
1
  • I think this can be solved if you just use df["columnNameToReplace"] = df["value"] and then delete the "value" column from the data frame? Commented Dec 5, 2018 at 12:47

4 Answers

14

Why not just multiply?

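df.pop('value').values[:, None] * df

   item1  item2  item3
0      0      4      0
1      5      0      0
2      0      0      3

DataFrame.pop removes a column in place and returns it, so you can do this in a single step. The [:, None] reshapes the values into a column vector so that each row is scaled by its own value. A plain 1-D array would instead be matched against the columns, which happens to run on a 3x3 frame but gives the wrong result.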
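As a quick check of how the array's shape affects the result, here is a minimal sketch on the question's sample data. It is an illustration rather than part of the original answer, and the names flat and column are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"value": [4, 5, 3], "item1": [0, 1, 0],
                   "item2": [1, 0, 0], "item3": [0, 0, 1]})

values = df.pop("value").values   # df now holds only the item columns

# A 1-D array is matched against the columns: item1 * 4, item2 * 5, item3 * 3.
flat = values * df

# A column vector of shape (3, 1) scales each row by its own value.
column = values[:, None] * df

print(flat)
print(column)
```

Only the column-vector form matches the desired output. On a frame whose row count differs from its column count, the 1-D form cannot be lined up with the columns at all and fails with an error instead.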


If the item columns contain nonzero values other than 1, multiply by a boolean mask instead:

df.pop('value').values[:, None] * df.astype(bool)

   item1  item2  item3
0      0      4      0
1      5      0      0
2      0      0      3
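To illustrate the boolean-mask variant, here is a small sketch with made-up counts in the item columns (the data is hypothetical, not from the question). It pops the column first and then uses DataFrame.mul with axis=0 to scale row by row.

```python
import pandas as pd

# Hypothetical data: the item columns hold counts rather than 0/1 flags.
df = pd.DataFrame({"value": [4, 5, 3], "item1": [0, 2, 0],
                   "item2": [7, 0, 0], "item3": [0, 0, 9]})

value = df.pop("value")           # remove "value" before building the mask
mask = df.astype(bool)            # True wherever the count is nonzero
result = mask.mul(value, axis=0)  # True -> that row's value, False -> 0

print(result)
```

Popping before calling astype matters here: if the mask were built first, it would still include the "value" column.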

If your DataFrame has other columns, then do this:

df
   value  name  item1  item2  item3
0      4  John      0      1      0
1      5  Mike      1      0      0
2      3  Stan      0      0      1

# cols = df.columns[df.columns.str.startswith('item')]
cols = df.filter(like='item').columns
df[cols] = df.pop('value').values[:, None] * df[cols]

df
   name  item1  item2  item3
0  John      0      4      0
1  Mike      5      0      0
2  Stan      0      0      3
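If you would rather not modify the frame in place, the same idea can be wrapped in a small helper. This is only a sketch: the function name spread_value and its parameters are hypothetical, and it assumes the flag columns share a common string prefix.

```python
import pandas as pd

def spread_value(df, value_col="value", prefix="item"):
    """Return a copy in which each prefixed column is scaled row-wise
    by value_col, with value_col dropped afterwards."""
    out = df.copy()
    cols = [c for c in out.columns if c.startswith(prefix)]
    out[cols] = out[cols].mul(out[value_col], axis=0)
    return out.drop(columns=value_col)

df = pd.DataFrame({"value": [4, 5, 3], "name": ["John", "Mike", "Stan"],
                   "item1": [0, 1, 0], "item2": [1, 0, 0], "item3": [0, 0, 1]})

result = spread_value(df)
print(result)
```

Other columns such as "name" pass through untouched, and the input frame keeps its "value" column.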

8 Comments

Most elegant answer so far
I like it, but I should have been more specific in my question. Here is what my data frame actually looks like: df_in = pd.DataFrame({"value": [4, 5, 3], "name": ["John", "Mike", "Stan"], "item1": [0, 1, 0], "item2": [1, 0, 0], "item3": [0, 0, 1]}) And the output df should be: df_out = pd.DataFrame({"name": ["John", "Mike", "Stan"], "item1": [0, 5, 0], "item2": [4, 0, 0], "item3": [0, 0, 3]})
@GorjanRadevski Let me know if the edit does it for you.
@coldspeed I get a really strange error now: ValueError: operands could not be broadcast together with shapes (2918590,) (2918590,26), even though the shapes in the sample data frame are similar: (3,) (3, 3). Do you know what might be causing the issue? Besides that, yes, the edit should definitely do it for me.
That works! I have no idea why on the sample data frame it worked without the addition. Thank you!
1

You could do something like:

# each product Series becomes a row, so transpose to get one column per item
df = pd.DataFrame([df['value']*df['item1'],df['value']*df['item2'],df['value']*df['item3']]).T
df.columns = ['item1','item2','item3']

EDIT: As @coldspeed points out in the comments, this does not scale well to many columns, so it is better to loop over the columns:

 cols = ['item1','item2','item3']
 for c in cols:
     df[c] *= df['value']
 df.drop('value',axis=1,inplace=True)


0

You need:

col = ['item1','item2','item3']

for c in col:
    df[c] = df[c] * df['value']

df.drop(['value'], axis=1, inplace=True)


0

pd.DataFrame.mul

You can use mul, or equivalently multiply, with either label-based or integer positional indexing:

# label-based indexing
res = df.filter(regex='^item').mul(df['value'], axis='index')

# integer positional indexing
res = df.iloc[:, 1:].mul(df.iloc[:, 0], axis='index')

print(res)

#    item1  item2  item3
# 0      0      4      0
# 1      5      0      0
# 2      0      0      3
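One property of mul worth noting is that it aligns on index labels rather than on position. The sketch below uses a hypothetical setup (string row labels, and the values arriving as a separate Series in a different order) to show this.

```python
import pandas as pd

df = pd.DataFrame({"item1": [0, 1, 0], "item2": [1, 0, 0], "item3": [0, 0, 1]},
                  index=["a", "b", "c"])

# Hypothetical: the values come as a separate Series in a different row order.
value = pd.Series([3, 4, 5], index=["c", "a", "b"])

# Rows are matched by label ("a" with "a", and so on), not by position.
res = df.mul(value, axis="index")

print(res)
```

By contrast, multiplying by value.values ignores the labels and relies on the rows already being in the same order.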
