3

I would like to replace some of the values in the following dataframe:

dataframe_a

Y2000   Y2001   Y2002    Y2003    Y2004    Item    Item Code
34        43      0      0          25     Test      Val

I would like to replace the zero values in its columns with a numeric value derived by multiplying a scalar (say 0.5) by the corresponding values in this dataframe:

dataframe_b

Y2000   Y2001   Y2002    Y2003    Y2004    Item    Item Code
34        43      10      20        25     Test      Val

So, in dataframe_a, the value for column Y2002 should be 10 * 0.5 and the value for column Y2003 should be 20 * 0.5.

Currently, I am doing this:

df = dataframe_a[dataframe_a == 0]
df = df * dataframe_b * 0.5

However, I am not sure how to update dataframe_a with the new values.

2

5 Answers

2

You can use the boolean mask and then call fillna:

In [58]:
fill = df2.select_dtypes(include = [np.number]) * 0.5
df1 = df1[df1!=0].fillna(fill)
df1

Out[58]:
   Y2000  Y2001  Y2002  Y2003  Y2004  Item  Item Code
0     34     43      5     10     25  Test        Val

Here df1[df1 != 0] produces a DataFrame of the same shape, with NaN wherever the condition is not met. Calling fillna on that result and passing the other DataFrame then replaces the NaN values wherever the index and columns align.

The result of the boolean mask:

In [63]:
df1[df1!=0]

Out[63]:
   Y2000  Y2001  Y2002  Y2003  Y2004  Item  Item Code
0     34     43    NaN    NaN     25  Test        Val
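
For anyone who wants to run the mask-then-fillna idea end to end, here is a minimal sketch using the data from the question. The frame construction and the names num_cols, masked and result are my own assumptions, not part of the answer above; I also restrict the mask to the numeric columns so that the string columns are never compared against 0.

```python
import numpy as np
import pandas as pd

cols = ["Y2000", "Y2001", "Y2002", "Y2003", "Y2004", "Item", "Item Code"]
df1 = pd.DataFrame([[34, 43, 0, 0, 25, "Test", "Val"]], columns=cols)
df2 = pd.DataFrame([[34, 43, 10, 20, 25, "Test", "Val"]], columns=cols)

# Work only on the numeric columns (an assumption made for this sketch)
num_cols = df1.select_dtypes(include=[np.number]).columns

# Replacement values: the other frame scaled by 0.5
fill = df2[num_cols] * 0.5

# Zeros become NaN, every other value is kept
masked = df1[num_cols].where(df1[num_cols] != 0)

# Fill the NaN holes from the aligned replacement frame
result = df1.copy()
result[num_cols] = masked.fillna(fill)
print(result)
```

The non-zero values and the two string columns should come through unchanged; only the former zeros are taken from fill.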

3 Comments

That replaces the NaN values by the ones from df2, not by a scalar multiplier as he asked.
No idea, hope that'll help
thanks @EdChum, excellent explanation as usual. I didn't downvote :)
2

A generic one, in case you don't know the location of the 0 value:

new_df = 0.5*df2[df==0]
new_df.fillna(df, inplace=True)
print(new_df)

    0   1  2  3   4     5    6
0  34  43  5  5  25  Test  Val

Here dataframe_a is df and dataframe_b is df2.


1
import pandas as pd

# Small example frame; 'two' has an extra row, so 'one' gets NaN at index 'd'
d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df
# replace() returns a new DataFrame, so assign the result back
df = df.replace(1, 12*4)  # replace all values equal to 1 with 12*4
df

Ref about replace() : Replace all occurrences of a string in a pandas dataframe (Python)


1
dataframe_a[dataframe_a == 0] = 0.5 * dataframe_b[dataframe_a == 0]
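
A runnable sketch of that one-liner on the question's data might look like the following. The frame construction, the year_cols list and the cast to float are assumptions on my part: I limit the assignment to the year columns so the string columns stay out of the comparison, and I cast first so that fractional results fit the column dtype.

```python
import pandas as pd

cols = ["Y2000", "Y2001", "Y2002", "Y2003", "Y2004", "Item", "Item Code"]
dataframe_a = pd.DataFrame([[34, 43, 0, 0, 25, "Test", "Val"]], columns=cols)
dataframe_b = pd.DataFrame([[34, 43, 10, 20, 25, "Test", "Val"]], columns=cols)

# Hypothetical helper list: only the year columns take part
year_cols = [c for c in dataframe_a.columns if c.startswith("Y")]

# Cast to float so values such as 0.5 * 10 can be stored as-is
a_years = dataframe_a[year_cols].astype(float)
b_years = dataframe_b[year_cols]

# Boolean-mask assignment: only the cells that are 0 get overwritten
zero_mask = a_years == 0
a_years[zero_mask] = 0.5 * b_years[zero_mask]

# Write the updated numeric block back into the original frame
dataframe_a[year_cols] = a_years
print(dataframe_a)
```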


1

pandas.DataFrame.where might be what you need. You would have to construct another dataframe with the specific column values that you want to substitute.

I don't have Pandas installed here so I can't show a dataframe example - but it works similarly with numpy arrays.

>>> a
array([1, 2, 0, 3, 4, 0, 5])
>>> subst
array([10, 20, 30, 40, 50, 60, 70])
>>> k = -.5
>>> np.where(a == 0, subst * k, a)
array([  1.,   2., -15.,   3.,   4., -30.,   5.])
>>>

One difference with the dataframe is that it can do an in-place substitution and you only have to specify the other dataframe (the one with the substitute values).

Finally a Pandas example:

>>> 
>>> df
   d  e  f
a  0  1  1
b  1  1  0
c  1  0  1
>>> s
    d   e   f
a  10  20  30
b  10  20  30
c  10  20  30
>>> k = -.5
>>> df.where(df != 0, other = s * k)
   d   e   f
a -5   1   1
b  1   1 -15
c  1 -10   1
>>> 
>>> df.where(df != 0, other = s * k, inplace = True)
>>> df
   d   e   f
a -5   1   1
b  1   1 -15
c  1 -10   1
>>>
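
The same where call can be applied to frames shaped like the ones in the question. This is only a sketch: the two-row data, the names a, b, num and out, and the choice to touch only the numeric columns are assumptions made for illustration.

```python
import pandas as pd

# Illustrative two-row data (made up for this sketch)
a = pd.DataFrame({"Y2002": [0, 7], "Y2003": [0, 0], "Item": ["Test", "Other"]})
b = pd.DataFrame({"Y2002": [10, 30], "Y2003": [20, 40], "Item": ["Test", "Other"]})
k = 0.5

# Only the numeric columns are compared against 0
num = ["Y2002", "Y2003"]

# Keep values where the condition holds, otherwise take k * b
out = a.copy()
out[num] = a[num].where(a[num] != 0, other=b[num] * k)
print(out)
```

Non-zero entries such as the 7 in the second row are kept, while each zero is replaced by k times the value at the same position in b.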

Some examples from the pydata site.
