I apologize for the confusing title, but this is kind of a confusing question.
I have a CSV file with multiple columns, like in this example:
header_a | header_b | header_c | header_d
abc 1 data1 data2
abc 1 data3 data4
abc 2 data5 data6
abc 2 data7 data8
abc 3 data9 data10
I need a script that would be able to transform this data to the following format:
header_a | header_b | header_c | header_d
abc 1 data1 data2 data3 data4
abc 2 data5 data6 data7 data8
abc 3 data9 data10
I do not care about the header as much, since there could be multiple entries. In short, whenever rows share the same value in header_b, I need the values that follow it in each later row to be appended to the first row with that value in the data frame.
I kind of have a skeleton of how I would approach the problem, but I am stuck:
dd.sort_values('Purchase Order #', inplace=True)
values = dd['Purchase Order #'].unique().tolist()
for x in values:
    header_flag = False
    # Iterate over full rows so the whole line is available, not just the key value
    for _, row in dd.iterrows():
        if x == row['Purchase Order #']:
            if not header_flag:
                # This is the first purchase order, copy entire line
                print(row.tolist())
                # Set the flag to True
                header_flag = True
            else:
                # We already have the first header, only copy next 5
                print('Else Block')
        else:
            # Do nothing
            print('False')
The first 2 lines sort the data frame by the column that needs to match and pull a list of its unique values. Is pandas perhaps not suited for this?
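For what it's worth, here is a rough sketch of how the transformation might look with a pandas groupby instead of the nested loops. It assumes the example column names from above (header_a to header_d), treats header_a and header_b together as the matching key, and uses made-up variable names (key_cols, value_cols, wide), so it would need adjusting for the real 'Purchase Order #' data:

```python
import pandas as pd

# Sample data mirroring the example in the question (assumed column names).
df = pd.DataFrame(
    {
        "header_a": ["abc", "abc", "abc", "abc", "abc"],
        "header_b": [1, 1, 2, 2, 3],
        "header_c": ["data1", "data3", "data5", "data7", "data9"],
        "header_d": ["data2", "data4", "data6", "data8", "data10"],
    }
)

key_cols = ["header_a", "header_b"]    # columns that identify a group (assumption)
value_cols = ["header_c", "header_d"]  # columns whose values get appended

rows = []
# sort=False keeps the groups in order of first appearance.
for key, group in df.groupby(key_cols, sort=False):
    # Flatten the group's value columns row by row into a single list.
    flat = group[value_cols].to_numpy().ravel().tolist()
    rows.append(list(key) + flat)

# The rows have different lengths; pandas fills the shorter ones with missing values.
wide = pd.DataFrame(rows)
print(wide)
```

Since the header does not matter much here, the resulting frame just gets default integer column names. If the real file has five trailing columns to append, value_cols would list those five instead.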