2

I have a dataframe that looks like this:

fruit   cost
apples  2
oranges 3
bananas 5
grapefruit  1

I want to pass a list that includes specified "fruit" column values and use that to duplicate those rows in the dataframe. For example, duplicated fruits = ['apples', 'oranges'].

These rows should then be copied back into the dataframe with an extra column that denotes that they are a copy (can be a binary 1/0).

3
  • What is your expected output? Commented Nov 1, 2019 at 13:10
  • Do you want df['indicator'] = df['fruit'].isin(fruits)? Commented Nov 1, 2019 at 13:11
  • I only want to duplicate "oranges" and "apples," so values I have specified. The desired output is to duplicate these rows in the dataframe and create a new column marking which rows are originals and which are copies. Commented Nov 1, 2019 at 13:12

2 Answers 2

3

Use Series.isin for get matching rows and for duplication use DataFrame.append to original data with DataFrame.assign for indicato column:

duplicated  = ['apples', 'oranges']
df1 = df[df['fruit'].isin(duplicated)].assign(new=1)
df = df.assign(new=0).append(df1, ignore_index=True)
print (df)
        fruit  cost  new
0      apples     2    0
1     oranges     3    0
2     bananas     5    0
3  grapefruit     1    0
4      apples     2    1
5     oranges     3    1

Another idea is use parameter keys in concat - it create new level filled by 0 and 1, so necessary DataFrame.reset_index by first level for convert this level for column:

df = (pd.concat([df, df1], keys=(0,1))
       .rename_axis(('new', None))
       .reset_index(level=0)
       .reset_index(drop=True))
print (df)
   new       fruit  cost
0    0      apples     2
1    0     oranges     3
2    0     bananas     5
3    0  grapefruit     1
4    1      apples     2
5    1     oranges     3
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks! Follow up q - if I want to specify that row now, i.e. where apples is duplicated (fruit = apples, new = 1) and I want to replace "apples" with "apple juice," how could I do that replacement?
@dataelephant use m =(df['fruit'] =='apples') & (df['new'] ==1) and then df.loc[m, 'fruit'] = "apple juice"
@dataelephant also check this.
0

You could use concat:

result = pd.concat([df, df[df.fruit.isin(fruits)].assign(new=1)], sort=False).fillna(0)

Output

        fruit  cost  new
0      apples     2  0.0
1     oranges     3  0.0
2     bananas     5  0.0
3  grapefruit     1  0.0
0      apples     2  1.0
1     oranges     3  1.0

As an alternative you could reindex, with default_value=0, before concat:

filtered = df[df.fruit.isin(fruits)].assign(new=1)

result = pd.concat([df.reindex(columns=filtered.columns, fill_value=0), filtered], sort=False)

print(result)

Output

        fruit  cost  new
0      apples     2    0
1     oranges     3    0
2     bananas     5    0
3  grapefruit     1    0
0      apples     2    1
1     oranges     3    1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.