How to repeat rows based on value of a column in Python

Question

I have the following df that comprises of code/product & weeks columns.

code.  Product  .   weeks
123 .  product1 .    1;2
123 .  product1 .    3
321 .  product2 .    4;5;6
321 .  product2 .    7

For those rows that have more than 1 week (eg 1;2 or 4;5;6), I want to repeat these rows. My Desirsed output is as follows:

code.  Product  .   weeks
123 .  product1 .    1
123 .  product1 .    2
123 .  product1 .    3
321 .  product2 .    4
321 .  product2 .    5
321 .  product2 .    6
321 .  product2 .    7

What is the best approach to take using pandas or numpy?

Possible duplicate of How to explode a list inside a Dataframe cell into separate rows — Matthias
– Matthias, Commented Jul 23, 2018 at 7:37

jezrael · Accepted Answer · 2018-07-23 07:51:58Z

2

Use:

df = (df.set_index(['code','Product'])['weeks']
       .str.split(';', expand=True)
       .stack()
       .reset_index(level=2, drop=True)
       .reset_index(name='weeks'))
print (df)
   code   Product weeks
0   123  product1     1
1   123  product1     2
2   123  product1     3
3   321  product2     4
4   321  product2     5
5   321  product2     6
6   321  product2     7

Explanation:

First set_index by all repeated columns
Create DataFrame by split
Reshape by stack
Last data cleaning by reset_index

Another solution:

from itertools import chain

weeks = df['weeks'].str.split(';')
lens = weeks.str.len()
df = pd.DataFrame({
    'code' : df['code'].repeat(lens),
    'Product' : df['Product'].repeat(lens),
    'weeks' : list(chain.from_iterable(weeks.values.tolist())), 
})

print (df)
   code   Product weeks
0   123  product1     1
0   123  product1     2
1   123  product1     3
2   321  product2     4
2   321  product2     5
2   321  product2     6
3   321  product2     7

Explanation:

Create lists by split
Get lengths of lsits by len
Last repeat columns and flatten weeks

edited Jul 23, 2018 at 7:51

answered Jul 23, 2018 at 7:39

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

jwlon81 Over a year ago

The first solution appears to be working correctly except for the fact that in my df, it excludes rows that don't have (' ; ') in the ['weeks'] column value. To relate back to the original df above, data for weeks 3 and week 7 are not appearing. I think it has something to do with the str.split function accidentally excluding the additional data. Your help in finding the last missing piece of code will be most helpful!

jezrael Over a year ago

@jwlon81 - So first solution failed and second working? If data are not confidental, is possible send me csv to my email in my profile for test it?

jwlon81 Over a year ago

Hi Jezrael, I'm not sure what happened but I ran the code again and I now works! Thanks

Kiruparan Balachandran · Accepted Answer · 2018-07-23 07:40:09Z

0

#assume test.xlsx is your data
test = pd.read_excel('test.xlsx')  
test_processed = pd.DataFrame(columns=test.columns)
for index, row in test.iterrows():
   weeks = row['weeks'].split(';')
   for week in weeks:
       test_processed = test_processed.append({'code':row['code'], 'Product':row['Product'],'weeks':week}, ignore_index=True)

answered Jul 23, 2018 at 7:40

Kiruparan Balachandran

3621 gold badge2 silver badges10 bronze badges

Collectives™ on Stack Overflow

How to repeat rows based on value of a column in Python

2 Answers 2

3 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related