I have a large DataFrame with thousands of columns; a shortened version is shown here:
import pandas as pd
largedf = pd.DataFrame({'arow': ['row1', 'row2', 'row3', 'row4'], 'bread': ['b', 'b', 'b', 'a'], 'fruit': ['c', 'b', 'b', 'a'],
                        'tea': ['b', 'a', 'b', 'a'], 'water': ['b', 'c', 'b', 'c']})
   arow bread fruit tea water
0  row1     b     c   b     b
1  row2     b     b   a     c
2  row3     b     b   b     b
3  row4     a     a   a     c
I want to keep the rows in which exactly one category contains no b, where each category is defined as a list of columns (again, there are actually many more than two lists):
food = ['bread', 'fruit']
drink = ['tea', 'water']
row2 is the only row that would be kept in this case:
row1 doesn't have a category without b,
row3 is all b,
row4 is all notb, so both categories are without b rather than exactly one.
The preferred output would have one column naming the single notb category and another giving the proportion of that row that is notb:
   arow bread fruit tea water category  perc
1  row2     b     b   a     c    drink   0.5
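For reference, here is a minimal sketch of one way the filter described above might be written. It rests on two assumptions: the category lists are collected in a dict (the name `categories` is hypothetical, not part of my real code), and `perc` means the share of notb values across all category columns in the row, which gives 2 of 4 = 0.5 for row2.

```python
import pandas as pd

largedf = pd.DataFrame({
    'arow': ['row1', 'row2', 'row3', 'row4'],
    'bread': ['b', 'b', 'b', 'a'],
    'fruit': ['c', 'b', 'b', 'a'],
    'tea': ['b', 'a', 'b', 'a'],
    'water': ['b', 'c', 'b', 'c'],
})

# Hypothetical mapping from category name to its columns (assumption)
categories = {'food': ['bread', 'fruit'], 'drink': ['tea', 'water']}

# One boolean column per category: True where no column in it equals 'b'
no_b = pd.DataFrame({
    name: largedf[cols].ne('b').all(axis=1)
    for name, cols in categories.items()
})

# Keep rows where exactly one category is free of 'b'
keep = no_b.sum(axis=1).eq(1)

# All category columns, used for the row-wide notb share (assumed meaning of perc)
all_cols = [c for cols in categories.values() for c in cols]

result = largedf.loc[keep].copy()
# In a kept row exactly one category is True, so idxmax returns its name
result['category'] = no_b.loc[keep].idxmax(axis=1)
result['perc'] = largedf.loc[keep, all_cols].ne('b').mean(axis=1)
print(result)
```

Building one boolean column per category keeps the check vectorised, so it should scale to many categories without a Python loop over rows; whether it is fast enough for thousands of columns is something I have not measured.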