4

I am trying to explode an existing dataframe based on a numeric value in a column. For example, if the column has a numeric value of 3, I want to have 3 of those rows, so on and so on.

Assuming we start with this dataframe:

inventory_partner inventory_partner2  calc
0              A1                 aa     1
1              A2                 bb     2
2              A3                 cc     5
3              A4                 dd     4
4              A5                 ee     5
5              A6                 ff     3

How do we get to this dataframe?

  inventory_partner inventory_partner2  calc
0                A1                 aa     1
1                A2                 bb     2
1                A2                 bb     2
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
5                A6                 ff     3
5                A6                 ff     3
5                A6                 ff     3

I have gotten this to work by using the below code, but I was wondering if there is an easier way to accomplish this without having to manually create the comma separated lists to feed into the explode method.

import pandas as pd

#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'], 'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'], 'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)

print(df1) #print original dataframe

#create my_comma_list column based on number values in calc column
df1.insert(3, 'my_comma_list', '')
df1.loc[df1['calc'] == 1, 'my_comma_list'] = '1'
df1.loc[df1['calc'] == 2, 'my_comma_list'] = '1, 2'
df1.loc[df1['calc'] == 3, 'my_comma_list'] = '1, 2, 3'
df1.loc[df1['calc'] == 4, 'my_comma_list'] = '1, 2, 3, 4'
df1.loc[df1['calc'] == 5, 'my_comma_list'] = '1, 2, 3, 4, 5'

print(df1) #print before row explosion

#explode the rows using the my_comma_list column to get desired number of rows
df1 = df1.assign(my_comma_list=df1['my_comma_list'].str.split(',')).explode('my_comma_list')
#drop the my_comma_list column since we no longer need it
del df1['my_comma_list']

print(df1) #print after row explosion

2 Answers 2

5

You can use Index.repeat and DataFrame.loc to repeat rows.

import pandas as pd

#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
     'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
     'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)
print (df1)
df1 = df1.loc[df1.index.repeat(df1['calc'])]
print (df1)

Output is:

Original DataFrame:

  inventory_partner inventory_partner2  calc
0                A1                 aa     1
1                A2                 bb     2
2                A3                 cc     5
3                A4                 dd     4
4                A5                 ee     5
5                A6                 ff     3

Updated DataFrame with repeated rows:

  inventory_partner inventory_partner2  calc
0                A1                 aa     1
1                A2                 bb     2
1                A2                 bb     2
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
5                A6                 ff     3
5                A6                 ff     3
5                A6                 ff     3

If you want to repeat rows based on a column value with a reference lookup, you can create a dictionary and identify how many times you want it to repeat, then use map to pass the value.

Let's say, you want to repeat based on the value in inventory_partner. Then you can do this:

import pandas as pd

inv_partner_dict = {'A1':1, 'A2':2, 'A3':5, 'A4':4,'A5':5,'A6':3}

#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
     'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
     'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)


print (df1)
df1 = df1.loc[df1.index.repeat(df1['inventory_partner2'].map(inv_partner_dict))]
print (df1)

This will do the same thing.

The output of this will be:

Original DataFrame:

  inventory_partner inventory_partner2  calc
0                A1                 aa     1
1                A2                 bb     2
2                A3                 cc     5
3                A4                 dd     4
4                A5                 ee     5
5                A6                 ff     3

Updated DataFrame with repeated rows:

  inventory_partner inventory_partner2  calc
0                A1                 aa     1
1                A2                 bb     2
1                A2                 bb     2
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
5                A6                 ff     3
5                A6                 ff     3
5                A6                 ff     3
Sign up to request clarification or add additional context in comments.

1 Comment

This is exactly what I am looking for. I never thought about using repeat to accomplish this task. Thanks so much for your help!
1

Use pd.Series.repeat to get the index and then reindex:

df.reindex(df.inventory_partner2.repeat(df.calc).index)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.