Duplicate Rows in Pandas Dataframe if Values are in a List

Question

I have a dataframe that looks like this:

fruit       cost
apples      2
oranges     3
bananas     5
grapefruit  1

I want to pass a list that includes specified "fruit" column values and use that to duplicate those rows in the dataframe. For example, duplicated fruits = ['apples', 'oranges'].

These rows should then be copied back into the dataframe with an extra column that denotes that they are a copy (can be a binary 1/0).

I only want to duplicate the values I have specified, here "oranges" and "apples". The desired output duplicates these rows in the dataframe and adds a new column marking which rows are originals and which are copies.
– dataelephant, Commented Nov 1, 2019 at 13:12

jezrael · Accepted Answer · 2019-11-01 13:47:27Z

Use Series.isin to select the matching rows, DataFrame.assign to add the indicator column, and pd.concat to append the copies to the original data (DataFrame.append did the same job before pandas 2.0, where it was removed):

duplicated = ['apples', 'oranges']
df1 = df[df['fruit'].isin(duplicated)].assign(new=1)
df = pd.concat([df.assign(new=0), df1], ignore_index=True)
print(df)
        fruit  cost  new
0      apples     2    0
1     oranges     3    0
2     bananas     5    0
3  grapefruit     1    0
4      apples     2    1
5     oranges     3    1
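
The approach above can be wrapped in a small helper. This is only a sketch: the function name duplicate_rows and its flag parameter are made up for illustration and are not part of pandas. It uses pd.concat, which also works on pandas 2.0 and later.

```python
import pandas as pd


def duplicate_rows(df, column, values, flag="new"):
    """Append copies of the rows whose `column` value is in `values`.

    Originals get flag=0 and copies get flag=1. The helper name and the
    `flag` parameter are illustrative, not a pandas API.
    """
    copies = df[df[column].isin(values)].assign(**{flag: 1})
    originals = df.assign(**{flag: 0})
    # ignore_index=True renumbers the rows 0..n-1 after stacking
    return pd.concat([originals, copies], ignore_index=True)


df = pd.DataFrame({
    "fruit": ["apples", "oranges", "bananas", "grapefruit"],
    "cost": [2, 3, 5, 1],
})
out = duplicate_rows(df, "fruit", ["apples", "oranges"])
print(out)
```

Values in the list that do not occur in the column simply match nothing, so no extra rows are added for them.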

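Another idea is to use the keys parameter of concat. It creates a new outer index level filled with 0 and 1, so DataFrame.reset_index on that first level is needed to convert it into a column. Here df is the original two-column frame and df1 holds the matching rows without the new column:

df1 = df[df['fruit'].isin(duplicated)]
df = (pd.concat([df, df1], keys=(0, 1))
       .rename_axis(('new', None))
       .reset_index(level=0)
       .reset_index(drop=True))
print(df)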
   new       fruit  cost
0    0      apples     2
1    0     oranges     3
2    0     bananas     5
3    0  grapefruit     1
4    1      apples     2
5    1     oranges     3
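
To see why two reset_index calls are involved, it can help to inspect the intermediate frame. A minimal sketch, assuming the original two-column frame; the names matches and stacked are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apples", "oranges", "bananas", "grapefruit"],
    "cost": [2, 3, 5, 1],
})
matches = df[df["fruit"].isin(["apples", "oranges"])]

# keys=(0, 1) adds an outer index level: 0 for rows from df, 1 for rows from matches
stacked = pd.concat([df, matches], keys=(0, 1))
print(stacked.index.tolist())

# Name the outer level "new", move it into a column, then discard the old row labels
result = (stacked.rename_axis(("new", None))
                 .reset_index(level=0)
                 .reset_index(drop=True))
print(result)
```

The first reset_index turns only the outer level into the new column; the second replaces the leftover, repeated row labels with a fresh 0..n-1 range.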

edited Nov 1, 2019 at 13:47

answered Nov 1, 2019 at 13:11

jezrael


3 Comments

dataelephant Over a year ago

Thanks! Follow up q - if I want to specify that row now, i.e. where apples is duplicated (fruit = apples, new = 1) and I want to replace "apples" with "apple juice," how could I do that replacement?

jezrael Over a year ago

@dataelephant use m =(df['fruit'] =='apples') & (df['new'] ==1) and then df.loc[m, 'fruit'] = "apple juice"

jezrael Over a year ago

@dataelephant also check this.
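
The replacement suggested in the comments can be sketched end to end as follows. The frame is rebuilt by hand here so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apples", "oranges", "bananas", "grapefruit", "apples", "oranges"],
    "cost": [2, 3, 5, 1, 2, 3],
    "new": [0, 0, 0, 0, 1, 1],
})

# Boolean mask: True only where the row is a copy (new == 1) of "apples"
m = (df["fruit"] == "apples") & (df["new"] == 1)

# .loc with the mask assigns in place and leaves the original apples row untouched
df.loc[m, "fruit"] = "apple juice"
print(df)
```

The parentheses around each comparison matter, because & binds more tightly than == in Python.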

Dani Mesejo · Accepted Answer · 2019-11-01 13:41:43Z

You could use concat:

fruits = ['apples', 'oranges']
result = pd.concat([df, df[df.fruit.isin(fruits)].assign(new=1)], sort=False).fillna(0)

Output

        fruit  cost  new
0      apples     2  0.0
1     oranges     3  0.0
2     bananas     5  0.0
3  grapefruit     1  0.0
0      apples     2  1.0
1     oranges     3  1.0
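
The new column comes out as float (0.0 and 1.0) because the rows from df have no value for it, so concat fills them with NaN before fillna runs. If integer flags are preferred, one option is to cast after filling. A sketch, with the fruits list defined here as an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apples", "oranges", "bananas", "grapefruit"],
    "cost": [2, 3, 5, 1],
})
fruits = ["apples", "oranges"]  # assumed list of values to duplicate

result = pd.concat([df, df[df["fruit"].isin(fruits)].assign(new=1)], sort=False)

# Fill only the indicator column, then cast it back from float to int
result["new"] = result["new"].fillna(0).astype(int)
print(result)
```

Filling just the new column also avoids overwriting genuine missing values elsewhere in the frame, which a frame-wide fillna(0) would do.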

As an alternative you could reindex with fill_value=0 before concat, which keeps the new column as integers:

filtered = df[df.fruit.isin(fruits)].assign(new=1)

result = pd.concat([df.reindex(columns=filtered.columns, fill_value=0), filtered], sort=False)

print(result)

Output

        fruit  cost  new
0      apples     2    0
1     oranges     3    0
2     bananas     5    0
3  grapefruit     1    0
0      apples     2    1
1     oranges     3    1
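
Both outputs in this answer keep the original row labels, so 0 and 1 appear twice in the index. If unique labels are wanted, passing ignore_index=True to concat renumbers the rows. A minimal sketch, again assuming a fruits list that the answer does not define:

```python
import pandas as pd

df = pd.DataFrame({
    "fruit": ["apples", "oranges", "bananas", "grapefruit"],
    "cost": [2, 3, 5, 1],
})
fruits = ["apples", "oranges"]  # assumed list of values to duplicate

filtered = df[df["fruit"].isin(fruits)].assign(new=1)

# reindex adds the missing "new" column to df, filled with 0
originals = df.reindex(columns=filtered.columns, fill_value=0)

# ignore_index=True replaces the repeated labels with 0..n-1
result = pd.concat([originals, filtered], ignore_index=True)
print(result)
```

Unique labels make later label-based selection such as result.loc[0] return a single row rather than both the original and its copy.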
