0

I have the following DataFrame df:

d = {'1': ['25', 'AAA', 2], '2': ['30', 'BBB', 3], '3': ['5', 'CCC', 2], \
     '4': ['300', 'DDD', 2], '5': ['30', 'DDD', 3],  '6': ['100', 'AAA', 3]}

columns=['Price', 'Name', 'Class']

df = pd.DataFrame.from_dict(data=d, orient='index')
df.columns = columns

I want to duplicate rows based on values of the column Class. In particular, I want to randomly select rows where Class is equal to 3, and duplicate them. For example, in current df I have 3 rows with Class equal to 3. How can I create N duplicates, where N is configurable, for example:

N = 2
target_column = "Class"
target_value = 3
new_df = create_duplicates(df, target_column, target_value, N)

I was thinking to use for-loop and at each iteration (when Class is equal to 3) generate a random number. If it's greater than 0.5, then the row is added to a list of selected rows. This process continues until a list of selected rows contains N rows. Then these N rows are appended to df.

Is there a more elegant and shorter way to do the same? Maybe some built-in pandas functions?

2
  • I'm not following your question. Do you want to duplicate the entire dataset by a number N? or just some specific classes? Commented Dec 3, 2019 at 20:33
  • @QuangHoang: Sorry if it's unclear. I want to duplicate specific rows by a number N. The "specific" rows are those ones that have Class equal to 3. Commented Dec 3, 2019 at 20:44

1 Answer 1

1

I think this script below will do what you need. I lifted the repetition part from: Repeat Rows in Data Frame n Times

n=3
pd.concat([df,df[df['Class']==3].loc[df.index.repeat(n)].dropna()]).sort_values('Name')
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks! How does it select rows? Randomly? Or does it take a first row and duplicate it N times?
In this case it selects the rows with a class = 3. Can you better define what you mean by randomly? You could select one row by index, randomly; you could select N rows by index, but you might end up randomly selecting the same row more than one, so you could put some controls around that; you could randomly select a "class" and duplicate all the rows with the select class.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.