37

I have a DataFrame (df1) with a dimension 2000 rows x 500 columns (excluding the index) for which I want to divide each row by another DataFrame (df2) with dimension 1 rows X 500 columns. Both have the same column headers. I tried:

df.divide(df2) and df.divide(df2, axis='index') and multiple other solutions and I always get a df with nan values in every cell. What argument am I missing in the function df.divide?

5 Answers 5

43

In df.divide(df2, axis='index'), you need to provide the axis/row of df2 (ex. df2.iloc[0]).

import pandas as pd

data1 = {"a":[1.,3.,5.,2.],
         "b":[4.,8.,3.,7.],
         "c":[5.,45.,67.,34]}
data2 = {"a":[4.],
         "b":[2.],
         "c":[11.]}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2) 

df1.div(df2.iloc[0], axis='columns')

or you can use df1/df2.values[0,:]

Sign up to request clarification or add additional context in comments.

2 Comments

Very good. The key message is stated in very first sentence of the answer - the DataFrame must be divided by a vector (pd.Series).
tried df1/df2.values[0,:] got IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed. df1/df2.values[0] works.
15

You can divide by the series i.e. the first row of df2:

In [11]: df = pd.DataFrame([[1., 2.], [3., 4.]], columns=['A', 'B'])

In [12]: df2 = pd.DataFrame([[5., 10.]], columns=['A', 'B'])

In [13]: df.div(df2)
Out[13]: 
     A    B
0  0.2  0.2
1  NaN  NaN

In [14]: df.div(df2.iloc[0])
Out[14]: 
     A    B
0  0.2  0.2
1  0.6  0.4

Comments

6

Small clarification just in case: the reason why you got NaN everywhere while Andy's first example (df.div(df2)) works for the first line is div tries to match indexes (and columns). In Andy's example, index 0 is found in both dataframes, so the division is made, not index 1 so a line of NaN is added. This behavior should appear even more obvious if you run the following (only the 't' line is divided):

df_a = pd.DataFrame(np.random.rand(3,5), index= ['x', 'y', 't'])
df_b = pd.DataFrame(np.random.rand(2,5), index= ['z','t'])
df_a.div(df_b)

So in your case, the index of the only row of df2 was apparently not present in df1. "Luckily", the column headers are the same in both dataframes, so when you slice the first row, you get a series, the index of which is composed by the column headers of df2. This is what eventually allows the division to take place properly.

For a case with index and column matching:

df_a = pd.DataFrame(np.random.rand(3,5), index= ['x', 'y', 't'], columns = range(5))
df_b = pd.DataFrame(np.random.rand(2,5), index= ['z','t'], columns = [1,2,3,4,5])
df_a.div(df_b)

Comments

2

If you want to divide each row of a column with a specific value you could try:

df['column_name'] = df['column_name'].div(10000)

For me, this code divided each row of 'column_name' with 10,000.

Comments

-3

to divide a row (with single or multiple columns), we need to do the below:

df.loc['index_value'] = df.loc['index_value'].div(10000)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.