Adding new variable to dataframe

Question

I am new to Python. I am trying to add a randomly generated variable to an already existing dataframe. I get an error message, but can't figure out why.

import pandas as pd
import numpy as np

data=[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
df=pd.DataFrame(data, columns=['age'])


 # Add income:
income_5 = np.random.randint(low=0, high=4, size=(nrows(df,))+1                          
df['income5'] = income_5

What am I doing wrong?

Sunderam Dubey · Accepted Answer · 2022-10-05 03:50:38Z

1

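The line has two problems: nrows is not a Python or pandas function, and a parenthesis is misplaced, so the call to np.random.randint( is never closed. After changing size=(nrows(df,)) to size=(len(df),) it works, so: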

import pandas as pd
import numpy as np

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
df = pd.DataFrame(data, columns=['age'])

# Add income: random integers from 1 to 4, one per row
income_5 = np.random.randint(low=0, high=4, size=(len(df),)) + 1
df['income5'] = income_5
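
As a quick sanity check on the fix above, here is a minimal sketch that rebuilds the frame and confirms the new column has one value per row, all between 1 and 4. The np.random.seed call is an assumption added only so that reruns give the same numbers; it is not part of the original code.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed, used only for reproducibility

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
df = pd.DataFrame(data, columns=['age'])

# randint excludes `high`, so this draws 0..3; adding 1 shifts it to 1..4
df['income5'] = np.random.randint(low=0, high=4, size=(len(df),)) + 1

# One value per row, and every value inside the expected range
same_length = len(df['income5']) == len(df)
in_range = df['income5'].between(1, 4).all()
print(same_length, in_range)
```

Both checks should hold for any seed, since the range follows from randint's exclusive upper bound.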

edited Oct 5, 2022 at 3:50

Sunderam Dubey


answered Oct 4, 2022 at 7:47

Deepak Tripathi


mozway · Accepted Answer · 2022-10-04 07:48:13Z

0

The correct syntax would be size=df.shape[0] or size=len(df):

income_5 = np.random.randint(low=0, high=4, size=df.shape[0])
df['income5'] = income_5

Example:

   age  income5
0   10        0
1   20        3
2   30        0
3   40        3
4   50        0
5   60        3
6   70        0
7   80        0
8   90        2
9  100        1

NB. You don't need the intermediate variable:

df['income5'] = np.random.randint(low=0, high=4, size=df.shape[0])
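
To illustrate why both spellings work, the sketch below (the variable names are only illustrative) shows that len(df) and df.shape[0] return the same row count, and that size accepts either a plain integer or a one-element tuple:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], columns=['age'])

# Two equivalent ways to get the number of rows
n_rows_len = len(df)
n_rows_shape = df.shape[0]

# `size` can be an int or a tuple; both give a 1-D array with one entry per row
a = np.random.randint(low=0, high=4, size=n_rows_len)
b = np.random.randint(low=0, high=4, size=(n_rows_shape,))

print(n_rows_len, n_rows_shape)
print(a.shape, b.shape)
```

Either form can be assigned directly to a new column, because the array length matches the number of rows.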

answered Oct 4, 2022 at 7:48

mozway
