
I have a sample schema consisting of 12 columns, and each column takes its values from a fixed set of categories. Now I need to simulate that data into a DataFrame of around 1000 rows. How do I go about it?

I have used the code below to generate data for each column:

      import random

      Location = ['USA','India','Prague','Berlin','Dubai','Indonesia','Vienna']
      location = random.choice(Location)  # picks a single value, not a column

      Age = ['Under 18','Between 18 and 64','65 and older']
      age = random.choice(Age)

      Gender = ['Female','Male','Other']
      gender = random.choice(Gender)

and so on for the remaining columns.

I need output like the following:

       Location        Age          Gender
       Dubai           Under 18     Female
       India           65 and older Male
       ...

2 Answers


You can create each column one by one using np.random.choice:

import numpy as np
import pandas as pd

N = 1000  # number of rows to simulate
df = pd.DataFrame()
df["Location"] = np.random.choice(Location, size=N)
df["Age"] = np.random.choice(Age, size=N)
df["Gender"] = np.random.choice(Gender, size=N)
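If the simulated data needs to be reproducible, or some categories should appear more often than others, NumPy's newer `Generator` API (`np.random.default_rng`) supports both a seed and per-category weights via `p`. A minimal sketch; the seed `42` and the `Age` weights are illustrative assumptions, not part of the original schema:

```python
import numpy as np
import pandas as pd

Location = ['USA', 'India', 'Prague', 'Berlin', 'Dubai', 'Indonesia', 'Vienna']
Age = ['Under 18', 'Between 18 and 64', '65 and older']
Gender = ['Female', 'Male', 'Other']

N = 1000
rng = np.random.default_rng(42)  # arbitrary fixed seed so reruns give the same data

df = pd.DataFrame({
    "Location": rng.choice(Location, size=N),
    # assumed weights, for illustration only; p must sum to 1
    "Age": rng.choice(Age, size=N, p=[0.2, 0.6, 0.2]),
    "Gender": rng.choice(Gender, size=N),
})
print(df.head())
```

Without `p`, every category is drawn with equal probability, which matches the `np.random.choice` version above.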

Or do that using a list comprehension:

column_to_choice = {"Location": Location, "Age": Age, "Gender": Gender}

# one row per column, then transpose so each sampled list becomes a column
df = pd.DataFrame(
    [np.random.choice(column_to_choice[c], N) for c in column_to_choice]
).T

df.columns = list(column_to_choice.keys())
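A dict comprehension passed straight to `pd.DataFrame` is a slightly more direct variant: it skips the transpose and the manual `df.columns` assignment, and it scales to all 12 columns by adding entries to the mapping. A minimal sketch, where `schema` is a hypothetical name for that column-to-categories mapping:

```python
import numpy as np
import pandas as pd

# Hypothetical schema: column name -> allowed categories.
# Add the remaining columns of the real 12-column schema here.
schema = {
    "Location": ['USA', 'India', 'Prague', 'Berlin', 'Dubai', 'Indonesia', 'Vienna'],
    "Age": ['Under 18', 'Between 18 and 64', '65 and older'],
    "Gender": ['Female', 'Male', 'Other'],
}

N = 1000

# One sampled column per schema entry; dict insertion order sets the column order.
df = pd.DataFrame({col: np.random.choice(values, size=N)
                   for col, values in schema.items()})
print(df.shape)  # (1000, 3)
```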

Result:

>>> print(df.head())
    Location                Age  Gender
0      India       65 and older  Female
1     Berlin  Between 18 and 64  Female
2        USA  Between 18 and 64    Male
3  Indonesia           Under 18    Male
4      Dubai           Under 18   Other
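Because every column draws from a fixed set of categories, it may be worth storing the simulated columns with pandas' `Categorical` dtype, so that categories which happen not to be sampled are still known to the column. This step is optional and not part of the answer above; a minimal sketch for the `Age` column, where treating the age bands as ordered is an assumption:

```python
import numpy as np
import pandas as pd

Age = ['Under 18', 'Between 18 and 64', '65 and older']
N = 1000

df = pd.DataFrame({"Age": np.random.choice(Age, size=N)})

# Declare the full category list (in the given order) on the column,
# so unsampled categories are kept and comparisons follow that order.
df["Age"] = pd.Categorical(df["Age"], categories=Age, ordered=True)

print(df["Age"].dtype)
print(df["Age"].value_counts(sort=False))  # counts listed in category order
```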

You can loop once for each row you want in the DataFrame, build one dictionary per row, and collect those dictionaries in a list. Then pass the list of dictionaries to pd.DataFrame.

In [16]: import random
    ...: list2 = []
    ...: for i in range(5):  # use range(1000) for 1000 rows
    ...:     loc = random.choice(Location)
    ...:     age = random.choice(Age)
    ...:     gen = random.choice(Gender)
    ...:     list2.append({'Location': loc, 'Age': age, 'Gender': gen})
    ...:

In [17]: import pandas as pd

In [18]: df = pd.DataFrame(list2)

In [19]: df
Out[19]:
                 Age Gender   Location
0  Between 18 and 64  Other     Berlin
1       65 and older  Other        USA
2       65 and older   Male      Dubai
3  Between 18 and 64   Male      Dubai
4  Between 18 and 64   Male  Indonesia
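The loop can also be written as a single list comprehension driven by a column-to-categories mapping, which avoids one variable per column when the schema has 12 columns. A minimal sketch under that assumption (`schema` is a hypothetical name; the seed is arbitrary and only makes reruns repeatable):

```python
import random
import pandas as pd

# Hypothetical schema mapping column names to their allowed categories;
# add the remaining columns of the real schema here.
schema = {
    'Location': ['USA', 'India', 'Prague', 'Berlin', 'Dubai', 'Indonesia', 'Vienna'],
    'Age': ['Under 18', 'Between 18 and 64', '65 and older'],
    'Gender': ['Female', 'Male', 'Other'],
}

random.seed(0)  # arbitrary seed for reproducible output

# One dict per row, with one random.choice per column.
rows = [{col: random.choice(values) for col, values in schema.items()}
        for _ in range(1000)]

df = pd.DataFrame(rows)
print(df.head())
```

For large row counts the column-wise np.random.choice approach is usually faster, since it samples each column in one vectorized call instead of one Python call per cell.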
