Sample Input DataFrame:
merged_df
   Full Name          Kommata 2007   Kommata 2015                Kommata 2019
0  Athanasios bouras  New democracy  New democracy               New democracy
1  Andreas loverdos   Pasok          Pasok-democratic alignment  Movement for change
2  Theodora tzakri    Pasok          Pasok                       Syriza
3  Thanasis zempilis  Pasok          NaN                         New democracy
Desired DataFrame:
edges_df
   Source                           Target
0  New democracy_2007               New democracy_2015
1  New democracy_2015               New democracy_2019
2  Pasok_2007                       Pasok-democratic alignment_2015
3  Pasok-democratic alignment_2015  Movement for change_2019
4  Pasok_2007                       Pasok_2015
5  Pasok_2015                       Syriza_2019
6  Pasok_2007                       New democracy_2019
As shown above, I have an input DataFrame with n columns: the first one (Full Name) has unique values, and the other n-1 (Kommata YYYY) are attributes of each row. I want to generate a new DataFrame with two columns (Source and Target) as follows:
- Each Full Name produces 0 or more rows.
- Starting from the leftmost Kommata column, take every adjacent pair of non-null values, e.g. Kommata 2007-Kommata 2015 and Kommata 2015-Kommata 2019; the pair Kommata 2007-Kommata 2019 can only exist if Kommata 2015 is null.
- Every pair becomes a new row.
- Each value is rewritten as value_YYYY, where value stays the same and YYYY is taken from the column name (e.g. '{}_{}'.format(prev_value, col_name.split()[-1])).
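A minimal pandas sketch of the rules above, assuming the first column is the identifier, every other column is named like "Kommata YYYY", and those columns are already in chronological order. `build_edges` is a hypothetical helper name, not a pandas function. The idea is to drop the NaN values of each row first, label what remains as value_YYYY, and then pair each label with the next one, so a missing middle year automatically links its neighbours.

```python
import numpy as np
import pandas as pd


def build_edges(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn each row's non-null values into Source -> Target edges.

    Assumes column 0 is the identifier (e.g. "Full Name") and the remaining
    columns are named like "Kommata YYYY", ordered from earliest to latest.
    """
    value_cols = list(df.columns[1:])
    # The year suffix is the last whitespace-separated token of the column name.
    years = [col.split()[-1] for col in value_cols]

    edges = []
    for values in df[value_cols].itertuples(index=False, name=None):
        # Skip NaN before pairing: this is what makes 2007 -> 2019 an edge
        # when the 2015 value is missing.
        labels = ["{}_{}".format(v, y) for v, y in zip(values, years) if pd.notna(v)]
        # Adjacent pairs only: (l0, l1), (l1, l2), ...
        edges.extend(zip(labels, labels[1:]))

    return pd.DataFrame(edges, columns=["Source", "Target"])


merged_df = pd.DataFrame({
    "Full Name": ["Athanasios bouras", "Andreas loverdos", "Theodora tzakri", "Thanasis zempilis"],
    "Kommata 2007": ["New democracy", "Pasok", "Pasok", "Pasok"],
    "Kommata 2015": ["New democracy", "Pasok-democratic alignment", "Pasok", np.nan],
    "Kommata 2019": ["New democracy", "Movement for change", "Syriza", "New democracy"],
})

edges_df = build_edges(merged_df)
print(edges_df)
```

Rows are emitted person by person, left to right, which gives the same order as the desired edges_df. A row with fewer than two non-null values contributes no edges. Identical edges coming from different people are kept; calling drop_duplicates() on the result is one option if they are not wanted.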
Thanks in advance