I need to subset rows of df based on several columns (c1 through c100 columns) which have strings. Subset rows which equal a particular value for any of the 100 columns. Minimal example with 3 columns is:
df = pd.DataFrame({"": [0,1,2,3,4,5,6,7,8],
"c1": ["abc1", "", "dfg", "abc1", "","dfg","ghj","abc1","abc1"],
"c2": ["abc1", "abc1", "dfg", "dfg", "","dfg","","ghj","abc1"],
"c3": ["abc1", "", "dfg", "dfg", "dfg","dfg","abc1","ghj","abc1"]})
c1 c2 c3
0 0 abc1 abc1 abc1
1 1 abc1
2 2 dfg dfg dfg
3 3 abc1 dfg dfg
4 4 dfg
5 5 dfg dfg dfg
6 6 ghj abc1
7 7 abc1 ghj ghj
8 8 abc1 abc1 abc1
The loc command gives us the answer but too many 'or' conditions to write. I am looking for something iterative over 100 columns. Also what if I want to check for a list of strings instead of just 1 string. For example ['abc1', 'ghj'], instead of just 'abc1'.
df.loc[(df['c1'] == "abc1") | (df['c2'] == "abc1") | (df['c3'] == "abc1")]