I want to split the comma-separated values in the Race and TGR1 columns of a DataFrame into separate rows. The DataFrame has more columns like Race and TGR1, and in each row they all hold the same number of values. I can't think of a good way to achieve this.

Any help would be greatly appreciated.

   Track          Date        Race                                 TGR1
0  Addington      24/09/2021  R1,R2,R3,R4,R5,R6,R7,R8,R9,R0,R1,R2  5,8,2,5,6,1,6,3,1,2,1,2
1  Mount Gambier  26/09/2021  R1,R2,R3,R4,R5,R6,R7,R8,R9,R0        8,1,4,8,8,1,2,1,2,2

Expected output

Track           Date                  Race             TGR1
Addington       24/09/2021                R1                 5
Addington       24/09/2021                R2                 8
Addington       24/09/2021                R3                 2
Addington       24/09/2021                R4                 5
Addington       24/09/2021                R5                 6
Addington       24/09/2021                R6                 1
Addington       24/09/2021                R7                 6
Addington       24/09/2021                R8                 3
Addington       24/09/2021                R9                 1
Addington       24/09/2021                R0                 2
Addington       24/09/2021                R1                 1
Addington       24/09/2021                R2                 2

Mount Gambier   26/09/2021                R1                 8
Mount Gambier   26/09/2021                R2                 1
Mount Gambier   26/09/2021                R3                 4
Mount Gambier   26/09/2021                R4                 8
Mount Gambier   26/09/2021                R5                 8
Mount Gambier   26/09/2021                R6                 1
Mount Gambier   26/09/2021                R7                 2
Mount Gambier   26/09/2021                R8                 1
Mount Gambier   26/09/2021                R9                 2
Mount Gambier   26/09/2021                R0                 2

1 Answer

You can use apply with pd.Series.explode. First split the comma-separated strings into lists with str.split, then set aside the columns that should not be exploded using set_index, and finally bring them back as columns with reset_index.

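import pandas as pd

# df is the DataFrame from the question
(df.assign(Race=df['Race'].str.split(','),   # strings -> lists
           TGR1=df['TGR1'].str.split(','))
   .set_index(['Track', 'Date'])             # keep these columns out of the explode
   .apply(pd.Series.explode)                 # explode every remaining column
   .reset_index()                            # restore Track and Date as columns
)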

output:

            Track        Date Race TGR1
0       Addington  24/09/2021   R1    5
1       Addington  24/09/2021   R2    8
2       Addington  24/09/2021   R3    2
3       Addington  24/09/2021   R4    5
4       Addington  24/09/2021   R5    6
5       Addington  24/09/2021   R6    1
6       Addington  24/09/2021   R7    6
7       Addington  24/09/2021   R8    3
8       Addington  24/09/2021   R9    1
9       Addington  24/09/2021   R0    2
10      Addington  24/09/2021   R1    1
11      Addington  24/09/2021   R2    2
12  Mount Gambier  26/09/2021   R1    8
13  Mount Gambier  26/09/2021   R2    1
14  Mount Gambier  26/09/2021   R3    4
15  Mount Gambier  26/09/2021   R4    8
16  Mount Gambier  26/09/2021   R5    8
17  Mount Gambier  26/09/2021   R6    1
18  Mount Gambier  26/09/2021   R7    2
19  Mount Gambier  26/09/2021   R8    1
20  Mount Gambier  26/09/2021   R9    2
21  Mount Gambier  26/09/2021   R0    2
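
As a possible alternative to the set_index/apply approach, DataFrame.explode accepts a list of columns in pandas 1.3 and later and expands them together. The sketch below is self-contained: it rebuilds the sample data from the question, and the names cols, split and out are only illustrative. It assumes every listed column holds the same number of values in each row, as stated in the question; pandas raises an error otherwise.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Track": ["Addington", "Mount Gambier"],
    "Date": ["24/09/2021", "26/09/2021"],
    "Race": ["R1,R2,R3,R4,R5,R6,R7,R8,R9,R0,R1,R2",
             "R1,R2,R3,R4,R5,R6,R7,R8,R9,R0"],
    "TGR1": ["5,8,2,5,6,1,6,3,1,2,1,2",
             "8,1,4,8,8,1,2,1,2,2"],
})

cols = ["Race", "TGR1"]  # columns holding comma-separated values

# Turn each comma-separated string into a list, one column at a time.
split = df.assign(**{c: df[c].str.split(",") for c in cols})

# Explode all list columns in lockstep (multi-column explode needs pandas >= 1.3).
out = split.explode(cols, ignore_index=True)

print(out.shape)  # (22, 4)
```

Any further columns of the same kind can be added to cols. The exploded values are still strings, so something like out["TGR1"].astype(int) would be needed if numbers are wanted.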
2 Comments

I applied the above method to the DataFrame but nothing seems to change. I have the same DataFrame as before. The dtype of all the columns is object.
@ChukypedroOkolie check the update ;)
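
A likely explanation for the behaviour described in the first comment, offered here as an assumption rather than something confirmed in the thread: explode only expands list-like values, so an object column that still holds plain comma-separated strings passes through unchanged until it is split. A small sketch with made-up values:

```python
import pandas as pd

races = pd.Series(["R1,R2,R3", "R1,R2"], name="Race")

# explode only expands list-like values; plain strings pass through untouched.
not_split = races.explode()

# Splitting first yields lists, which explode turns into one row per item.
split_first = races.str.split(",").explode()

print(len(not_split), len(split_first))  # prints: 2 5
```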