I've done this with iterrows(), but I'm hoping there is a faster and more elegant way to achieve the desired outcome.
Problem Statement:
A subset of columns (Product1, Product2, ...) in my dataframe (df_orders) contains a mix of NaN and non-null values. I want to collect every non-null value from this subset into a single new column, reading each row left to right, from the first row all the way to the last.
Example: Create a single column containing all the products ordered.
>>> df_orders = pd.read_csv('orders.csv')
>>> df_orders
   OrderNo              CustName  Product1  Product2  Product3  Product4  Product5
0    20043          Sanjay Singh       131       320       320       131       nan
1    20042        William Sonoma       420       420       131       320       511
2    20041          Maria Alonso       320       420       320       nan       nan
3    20040              Jim Beam       511       131       nan       nan       nan
4    20039          Gunter Grass       320       131       131       131       nan
5    20038         Billy Joe Bob       420       511       511       nan       nan
6    20037  Cynthia Silvia Stout        55        12       131        55        12
7    20036         Alan Ginsburg       131       320       320        12       nan
8    20035       Ronald McDonald       131       131       511       nan       nan
The result I'm looking for:
Create a new dataframe called df_product_list. Starting with the first row in df_orders, create a new row in df_product_list for each non-null product column value.
Because the order from Sanjay Singh is first and has four non-null values in the product columns, the first four rows of df_product_list will be 131, 320, 320, and 131.
>>> df_product_list
ProdCode
0 131
1 320
2 320
3 131
4 420
5 420
6 131
7 320
8 511
9 320
10 420
11 320
12 511
13 131
14 320
15 131
16 131
17 131
...
...
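For reference, here is a minimal sketch of one vectorized way to get that output without iterrows(). It is only an illustration under a few assumptions: the product columns are numeric, they can be picked out by the "Product" name prefix, and a small inline dataframe (the first three orders above) stands in for orders.csv. The name product_cols is made up for this example. The idea is to flatten the product columns row by row with NumPy and then filter out the NaN entries.

```python
import numpy as np
import pandas as pd

# Stand-in for orders.csv: the first three orders from the example above.
df_orders = pd.DataFrame({
    "OrderNo": [20043, 20042, 20041],
    "CustName": ["Sanjay Singh", "William Sonoma", "Maria Alonso"],
    "Product1": [131, 420, 320],
    "Product2": [320, 420, 420],
    "Product3": [320, 131, 320],
    "Product4": [131, 320, np.nan],
    "Product5": [np.nan, 511, np.nan],
})

# Assumption: every product column name starts with "Product".
product_cols = [c for c in df_orders.columns if c.startswith("Product")]

# ravel() flattens in row-major order, so values come out row by row,
# left to right within each row.
values = df_orders[product_cols].to_numpy(dtype=float).ravel()

# Keep only the non-null entries.
values = values[~np.isnan(values)]

# Cast back to int, since NaN forces the columns to float.
df_product_list = pd.DataFrame({"ProdCode": values.astype(int)})
print(df_product_list)
```

A pandas-only variant along the lines of df_orders[product_cols].stack() should give the same ordering, but how stack() treats NaN depends on the pandas version, so the NumPy route is used here to keep the behavior explicit.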