
The title might be hard to understand, so here's a small sample of my DataFrame.

    A   B   C   D   E   F   G   H   J   K   action
0                       22                  noise
1                           68              junk
2                   93                      junk
3           80                              junk
4                                   57      noise

The action column has only two values (noise and junk). For instance, in the first row, column 'F' has a value of 22 and its action is noise. I want to count how many times 'F' has a non-null value when action is 'noise' and how many times when action is 'junk', and likewise for all the other single-letter columns. So I want a dictionary that looks something like this, where each inner dictionary holds the counts per action.

{'F': {'noise': 1, 'junk': 0},
 'G': {'noise': 0, 'junk': 1},
 'E': {'noise': 0, 'junk': 1},
 'C': {'noise': 0, 'junk': 1},
 'J': {'noise': 1, 'junk': 0}
}

I've tried going through with df.iterrows() and df.notnull() but I can't seem to get the logic right.

edit - Updated the expected output.

  • Your dictionary doesn't seem right for your sample data Commented Jul 24, 2019 at 19:30
  • Oh, sorry that example output was just me giving a really random example, I'll edit it to actually mirror the sample data. Commented Jul 24, 2019 at 19:31

1 Answer


Use notnull() to mask your df, group by action, and sum:

df.iloc[:, :-1].notnull().astype(int).groupby(df.action).sum().to_dict()
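To try the one-liner end to end, here is a minimal sketch. The DataFrame is a hypothetical reconstruction of the question's sample (the exact column positions in the printed table are an assumption), and the `nonempty` filter at the end is an addition, not part of the original answer. It trims the result down to the columns that have at least one non-null value, as in the expected output.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's sample: one non-null value per row.
cols = list("ABCDEFGHJK")
df = pd.DataFrame(np.nan, index=range(5), columns=cols)
df.loc[0, "F"] = 22
df.loc[1, "G"] = 68
df.loc[2, "E"] = 93
df.loc[3, "C"] = 80
df.loc[4, "J"] = 57
df["action"] = ["noise", "junk", "junk", "junk", "noise"]

counts = (
    df.iloc[:, :-1]            # every column except the trailing 'action'
      .notnull()               # True where a value is present
      .astype(int)             # True/False -> 1/0 so the values can be summed
      .groupby(df["action"])   # group the 0/1 rows by their action label
      .sum()                   # non-null count per action, per column
      .to_dict()               # {column: {action: count}}
)

# Optional extra step (not in the answer): drop columns that are null everywhere.
nonempty = {col: per_action for col, per_action in counts.items()
            if any(per_action.values())}

print(counts["F"])
print(sorted(nonempty))
```

Note that `counts` contains an entry for every letter column, including those with zero counts for both actions; `nonempty` keeps only the ones that appear in the expected output.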

3 Comments

Thanks a lot, it worked perfectly! Could you explain what's happening here? My take is that the .iloc statement indexes the whole dataframe, then notnull() changes the values to booleans, and .astype(int) only retains the columns with integers?
@animusdx Glad it helped! iloc[:, :-1] just drops the last column (i.e. we select only columns A to K). notnull() transforms the dataframe so that every NaN value becomes False and every other value becomes True. Then astype(int) turns these into 0 and 1 respectively (so that we can sum them! :] ). Then we simply group by action and sum. The to_dict method is built in and creates the final dictionary!
Thanks again for the help! If it wasn't evident I'm a novice with pandas and working with it is like opening a whole can of worms.
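The explanation in the comments can be followed one step at a time. This is only a sketch on a hypothetical two-column frame (the data and the intermediate variable names are made up for illustration), with each link of the chain assigned to its own variable so it can be inspected.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, smaller than the question's sample.
df = pd.DataFrame({
    "F": [22, np.nan, np.nan],
    "G": [np.nan, 68, 5],
    "action": ["noise", "junk", "junk"],
})

letters = df.iloc[:, :-1]        # all rows, every column except the last ('action')
present = letters.notnull()      # boolean mask: False for NaN, True otherwise
as_ints = present.astype(int)    # same shape, True/False converted to 1/0
per_action = as_ints.groupby(df["action"]).sum()  # rows: actions, columns: letters
result = per_action.to_dict()    # {column: {action: count}}

print(per_action)
print(result)
```

So astype(int) does not select columns; it only converts the True/False mask into numbers that sum() can add up within each action group.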
