Context: I have data in Excel that we clean up with pandas and then use in an ML model. As part of the clean-up, I'm trying to filter the data on multiple columns combined with an OR condition. Each of these columns has a week-start date as its header, so 7 such columns represent 7 weeks. The header names change every week, so I can't hard-code them and keep the same code in place from week to week.
Logic that I have tried: I wrote a code chunk that prints the "OR" condition built from these date columns, and then I copy-paste the printed statement into the DataFrame indexing expression. The code is shown below, after the sample data.
For now I'm copy-pasting the column names, but I guess I could build logic to identify the date columns by applying a type-based condition to the column names.
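A minimal sketch of that idea, assuming the headers arrive either as strings in month/day/year form (as in the sample data) or as datetime objects from the Excel import. The helper name `is_date_header` and the non-date columns `Item` and `Site` are hypothetical and only there for illustration:

```python
from datetime import datetime

import pandas as pd


def is_date_header(name, fmt="%m/%d/%Y"):
    """Return True if a column header is a date (hypothetical helper)."""
    # Headers read from Excel may already be datetime-like objects.
    if isinstance(name, datetime):
        return True
    try:
        datetime.strptime(str(name), fmt)
        return True
    except ValueError:
        return False


# Hypothetical frame: two non-date columns plus week-start date headers.
df = pd.DataFrame(columns=["Item", "Site", "1/20/2019", "1/27/2019", "2/3/2019"])

# Collect the date columns without typing their names by hand.
avail_col = [col for col in df.columns if is_date_header(col)]
print(avail_col)
```

If the dates use a different layout, the `fmt` argument would need to change accordingly.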
Sample Data:
1/20/2019 1/27/2019 2/3/2019 2/10/2019 2/17/2019 2/24/2019 3/3/2019 \
0 0(80CS,8H) 0(80CS) 0(80CS) 0(80CS) 0(80CS) 0(80CS) 0(80CS)
1 0(50CS,8H) 0(50CS) 0(50CS) 0(50CS) 0(50CS) 0(50CS) 0(50CS)
2 0(40CS,8H) 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS)
3 0(40CS,8H) 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS)
4 0(40CS,8H) 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS)
5 0(40CS,8H) 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS)
6 12(25CS,8H) 15(25CS) 15(25CS) 15(25CS) 15(25CS) 15(25CS) 15(25CS)
7 11(28CS,8H) 12(28CS) 12(28CS) 12(28CS) 12(28CS) 12(28CS) 12(28CS)
8 8(30CS,8H) 10(30CS) 10(30CS) 10(30CS) 2(30CS,32T) 10(30CS) 10(30CS)
9 0(40CS,8H) 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS)
3/10/2019 3/17/2019 3/24/2019 3/31/2019 4/7/2019
0 0(80CS) 0(80CS) 0(80CS) 0(80CS) 0(80CS)
1 0(50CS) 0(50CS) 0(50CS) 0(50CS) 0(50CS)
2 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS)
3 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS)
4 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS)
5 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS)
6 15(25CS) 15(25CS) 15(25CS) 20(20CS) 20(20CS)
7 12(28CS) 12(28CS) 12(28CS) 12(28CS) 12(28CS)
8 10(30CS) 10(30CS) 10(30CS) 10(30CS) 10(30CS)
9 0(40CS) 0(40CS) 0(40CS) 0(40CS) 0(40CS)
avail_col = ['1/20/2019',
'1/27/2019', '2/3/2019', '2/10/2019', '2/17/2019', '2/24/2019',
'3/3/2019', '3/10/2019', '3/17/2019', '3/24/2019', '3/31/2019',
'4/7/2019']
# Convert the selected columns to numeric: keep the part before '(' and treat '-' as 0
for i in avail_col:
    avail_dat[i] = avail_dat[i].astype(str).apply(lambda x: x.split('(')[0])
    avail_dat[i] = avail_dat[i].str.replace('-', '0')
    avail_dat[i] = avail_dat[i].astype(float)

# Print one "(...) | " condition per column, to be copy-pasted into the filter
or_str = ''
for i in avail_col:
    or_str = "(avail_dat['" + i + "'] >= 24) | "
    print(or_str)
Apparently I can't pass the variable to the DataFrame to filter it, or I don't know how to do that yet, so I copy-paste the printed statement into the code below to filter the DataFrame:
avail_dat = avail_dat[(avail_dat['1/20/2019'] >= 24) |
(avail_dat['1/27/2019'] >= 24) |
(avail_dat['2/3/2019'] >= 24) |
(avail_dat['2/10/2019'] >= 24) |
(avail_dat['2/17/2019'] >= 24) |
(avail_dat['2/24/2019'] >= 24) |
(avail_dat['3/3/2019'] >= 24) |
(avail_dat['3/10/2019'] >= 24) |
(avail_dat['3/17/2019'] >= 24) |
(avail_dat['3/24/2019'] >= 24) |
(avail_dat['3/31/2019'] >= 24) |
(avail_dat['4/7/2019'] >= 24)
]
Is there a way to pass a variable instead of copy-pasting every time?
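One possible approach, sketched here on made-up toy values rather than the real data: select the columns with the list itself, compare the whole block against the threshold, and use `any(axis=1)` to get the row-wise OR. This assumes the columns in `avail_col` have already been converted to float as in the loop above.

```python
import pandas as pd

# Toy stand-in for the cleaned frame (values are illustrative only).
avail_dat = pd.DataFrame({
    "1/20/2019": [0.0, 12.0, 30.0],
    "1/27/2019": [0.0, 25.0, 10.0],
    "2/3/2019": [0.0, 15.0, 10.0],
})
avail_col = ["1/20/2019", "1/27/2019", "2/3/2019"]

# True for a row if ANY of the listed columns is >= 24 (row-wise OR).
mask = (avail_dat[avail_col] >= 24).any(axis=1)
filtered = avail_dat[mask]
print(filtered)
```

Because the mask is built from the list, only `avail_col` has to change from week to week; the filter line stays the same.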