I have a dataframe that looks like the following (in the real data, the values in the last 3 columns are usually 4-5 alphanumeric codes):
import pandas as pd

data = {'ID': ['P39', 'S32'],
        'Name': ['Pipe', 'Screw'],
        'Col3': ['Test1, Test2, Test3', 'Test6, Test7'],
        'Col4': ['', 'Test8, Test9'],
        'Col5': ['Test4, Test5', 'Test10, Test11, Test12, Test13']
        }
df = pd.DataFrame(data)
| | ID | Name | Col3 | Col4 | Col5 |
|---|---|---|---|---|---|
| 0 | P39 | Pipe | Test1, Test2, Test3 | | Test4, Test5 |
| 1 | S32 | Screw | Test6, Test7 | Test8, Test9 | Test10, Test11, Test12, Test13 |
I want to expand this dataframe, or create a new one, based on the values in the last 3 columns of each row. Each original row should become as many rows as the largest number of comma-separated values in any one of its last 3 columns. The first 2 columns should stay the same in all of the expanded rows, and each of the last 3 columns should hold only one value per row, taken in order from the original cell.
In the above example, the first row would indicate I need 3 total rows (Col3 has the most at 3 values), and the second row would indicate I need 4 total rows (Col5 has the most at 4 values). A desired output would be along the lines of:
| | ID | Name | Col3 | Col4 | Col5 |
|---|---|---|---|---|---|
| 0 | P39 | Pipe | Test1 | | Test4 |
| 1 | P39 | Pipe | Test2 | | Test5 |
| 2 | P39 | Pipe | Test3 | | |
| 3 | S32 | Screw | Test6 | Test8 | Test10 |
| 4 | S32 | Screw | Test7 | Test9 | Test11 |
| 5 | S32 | Screw | | | Test12 |
| 6 | S32 | Screw | | | Test13 |
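For reference, one possible way to get from the input to this output is sketched below. It is only a sketch under a few assumptions: pandas 1.3 or newer (where `DataFrame.explode` accepts a list of columns), values always separated by a comma and a space, and empty cells stored as `''`. The names `value_cols`, `split_cell`, `split`, `n_rows`, `padded` and `result` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['P39', 'S32'],
    'Name': ['Pipe', 'Screw'],
    'Col3': ['Test1, Test2, Test3', 'Test6, Test7'],
    'Col4': ['', 'Test8, Test9'],
    'Col5': ['Test4, Test5', 'Test10, Test11, Test12, Test13'],
})
value_cols = ['Col3', 'Col4', 'Col5']


def split_cell(cell):
    """Split 'a, b, c' into ['a', 'b', 'c']; an empty cell gives []."""
    return cell.split(', ') if cell else []


# One list of values per cell, keyed by column name.
split = {col: [split_cell(cell) for cell in df[col]] for col in value_cols}

# Rows needed for each original row = length of its longest list.
n_rows = [max(len(vals) for vals in row) for row in zip(*split.values())]

# Pad the shorter lists with '' so that all lists in a row have the same
# length, which explode requires when it is given several columns.
padded = df[['ID', 'Name']].copy()
for col in value_cols:
    padded[col] = pd.Series(
        [vals + [''] * (n - len(vals)) for vals, n in zip(split[col], n_rows)],
        index=df.index,
    )

# Each list element becomes its own row; ID and Name are repeated.
result = padded.explode(value_cols, ignore_index=True)
print(result)
```

The padding step is what keeps the three columns aligned: without it, exploding several columns at once would fail on rows whose lists have different lengths.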
I first found a way to figure out the number of rows needed, and I had the idea to append the values to a new dataframe in the same loop. However, I'm not sure how to separate the values in the last 3 columns and append them one by one to the rows. I know str.split() is useful for putting the values into a list. My only idea is to loop through each column separately and append each value to the correct row, but I'm not sure how to do that.
output1 = pd.DataFrame(
    columns=['ID', 'Name', 'Col3', 'Col4', 'Col5'])
for index, row in df.iterrows():
    output2 = pd.DataFrame(
        columns=['ID', 'Name', 'Col3', 'Col4', 'Col5'])
    # Count the commas in each of the last 3 columns
    col3counter = df.iloc[index, 2].count(',')
    col4counter = df.iloc[index, 3].count(',')
    col5counter = df.iloc[index, 4].count(',')
    # Number of new rows needed for this original row
    numofnewcols = max(col3counter, col4counter, col5counter) + 1
    # Split each cell into a list of values
    iter1 = df.iloc[index, 2].split(', ')
    iter2 = df.iloc[index, 3].split(', ')
    iter3 = df.iloc[index, 4].split(', ')
    # This is where I'm stuck:
    #for q in iter1
        #output2.iloc[ , 2] =
    output1 = pd.concat([output1, output2], ignore_index=True)
    del output2
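The loop idea above could perhaps be finished along the following lines. This is a hedged sketch rather than a definitive answer: it swaps the per-row `pd.concat` for a plain list of dicts that is turned into a dataframe once at the end, and it uses `itertools.zip_longest` from the standard library to walk the three lists in parallel. The names `value_cols`, `rows`, `parts` and `output` are illustrative, and it assumes the values are always separated by a comma and a space.

```python
from itertools import zip_longest

import pandas as pd

df = pd.DataFrame({
    'ID': ['P39', 'S32'],
    'Name': ['Pipe', 'Screw'],
    'Col3': ['Test1, Test2, Test3', 'Test6, Test7'],
    'Col4': ['', 'Test8, Test9'],
    'Col5': ['Test4, Test5', 'Test10, Test11, Test12, Test13'],
})
value_cols = ['Col3', 'Col4', 'Col5']

rows = []
for record in df.to_dict('records'):
    # Split each of the three cells; an empty string means "no values".
    parts = [record[col].split(', ') if record[col] else []
             for col in value_cols]
    # zip_longest runs until the longest list is used up and fills the
    # gaps in the shorter lists with ''. Note that an original row whose
    # three cells are all empty produces no output rows here.
    for values in zip_longest(*parts, fillvalue=''):
        new_row = {'ID': record['ID'], 'Name': record['Name']}
        new_row.update(zip(value_cols, values))
        rows.append(new_row)

# Build the dataframe once instead of concatenating inside the loop.
output = pd.DataFrame(rows, columns=['ID', 'Name'] + value_cols)
print(output)
```

With this approach the number of new rows never has to be counted explicitly, because `zip_longest` stops on its own after the longest of the three lists.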