I am trying to build a query that tells me how many distinct women and men there are in a given dataset. Each person is identified by a number, 'tel'. The same 'tel' can appear multiple times, but each tel's gender should be counted only once!
7136609221 - male
7136609222 - male
7136609223 - female
7136609228 - male
7136609222 - male
7136609223 - female
For this example_dataset, the expected result is:
Total unique gender count: 4
Total unique male count: 3
Total unique female count: 1
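To pin down the intended semantics, here is a minimal plain-Python sketch, assuming each tel always carries the same gender: collapse the rows to one gender per tel first, then tally. The names rows and unique_gender_counts are illustrative only.

```python
# Sample rows from the question as (tel, gender); tels ...222 and ...223 repeat.
rows = [
    ("7136609221", "male"),
    ("7136609222", "male"),
    ("7136609223", "female"),
    ("7136609228", "male"),
    ("7136609222", "male"),
    ("7136609223", "female"),
]


def unique_gender_counts(rows):
    """Count each tel once, then tally genders (assumes a tel never changes gender)."""
    gender_by_tel = {}
    for tel, gender in rows:
        # A repeated tel just overwrites its entry, so it is kept only once.
        gender_by_tel[tel] = gender
    males = sum(1 for g in gender_by_tel.values() if g == "male")
    females = sum(1 for g in gender_by_tel.values() if g == "female")
    return len(gender_by_tel), males, females


print(unique_gender_counts(rows))  # (4, 3, 1)
```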
My attempted query:
SELECT COUNT(DISTINCT tel, gender) as gender_count,
COUNT(DISTINCT tel, gender = 'male') as man_count,
SUM(if(gender = 'female', 1, 0)) as woman_count
FROM example_dataset;
There are actually two attempts in there. COUNT(DISTINCT tel, gender = 'male') as man_count seems to return the same value as COUNT(DISTINCT tel, gender); it ignores the gender = 'male' condition. And SUM(if(gender = 'female', 1, 0)) counts every female row (2 for this data instead of 1), so it is not restricted to distinct tels.
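Both results can be reproduced with a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, as is the TEXT column layout of the hypothetical table). SQLite's COUNT(DISTINCT ...) accepts only one argument, so the two-argument MySQL form is emulated by counting the rows of a SELECT DISTINCT subquery, and MySQL's IF() is written as CASE. The point it illustrates: gender = 'male' only adds a 0/1 column to each pair, so every tel still forms exactly one distinct pair and nothing is filtered.

```python
import sqlite3

# Sample rows from the question; TEXT columns are an assumption.
ROWS = [
    ("7136609221", "male"),
    ("7136609222", "male"),
    ("7136609223", "female"),
    ("7136609228", "male"),
    ("7136609222", "male"),
    ("7136609223", "female"),
]


def make_db():
    """Build an in-memory SQLite copy of example_dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE example_dataset (tel TEXT, gender TEXT)")
    conn.executemany("INSERT INTO example_dataset VALUES (?, ?)", ROWS)
    return conn


def attempted_counts(conn):
    """Re-create the three expressions from the attempted query."""
    # Emulates COUNT(DISTINCT tel, gender): number of distinct (tel, gender) pairs.
    (gender_count,) = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT tel, gender FROM example_dataset)"
    ).fetchone()
    # Emulates COUNT(DISTINCT tel, gender = 'male'): the comparison is just 0 or 1,
    # so each tel still contributes one pair, male or not.
    (man_count,) = conn.execute(
        "SELECT COUNT(*) FROM ("
        "SELECT DISTINCT tel, gender = 'male' AS is_male FROM example_dataset)"
    ).fetchone()
    # Same as SUM(if(gender = 'female', 1, 0)): adds 1 per female row, not per tel.
    (woman_count,) = conn.execute(
        "SELECT SUM(CASE WHEN gender = 'female' THEN 1 ELSE 0 END) "
        "FROM example_dataset"
    ).fetchone()
    return gender_count, man_count, woman_count


print(attempted_counts(make_db()))  # (4, 4, 2)
```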
COUNT(DISTINCT tel, gender = 'male') gives man_count = 4, which is wrong; it should be 3, counting each tel only once.
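One common fix is conditional aggregation: put the condition inside COUNT(DISTINCT ...) so that it decides which tel values get counted, instead of adding another column to the distinct key. The sketch below again uses SQLite through Python's sqlite3 as a stand-in (an assumption); the CASE ... END expression is standard SQL and should also work in MySQL, where COUNT(DISTINCT IF(gender = 'male', tel, NULL)) is an equivalent spelling. The helper name unique_counts is hypothetical.

```python
import sqlite3

# Sample rows from the question; TEXT columns are an assumption.
ROWS = [
    ("7136609221", "male"),
    ("7136609222", "male"),
    ("7136609223", "female"),
    ("7136609228", "male"),
    ("7136609222", "male"),
    ("7136609223", "female"),
]

# Conditional aggregation: CASE without ELSE yields NULL for rows that do not
# match, and COUNT(DISTINCT ...) skips NULLs, so each tel of the requested
# gender is counted exactly once.
QUERY = """
SELECT
    COUNT(DISTINCT tel) AS gender_count,
    COUNT(DISTINCT CASE WHEN gender = 'male' THEN tel END) AS man_count,
    COUNT(DISTINCT CASE WHEN gender = 'female' THEN tel END) AS woman_count
FROM example_dataset
"""


def unique_counts(rows):
    """Load rows into an in-memory example_dataset table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE example_dataset (tel TEXT, gender TEXT)")
    conn.executemany("INSERT INTO example_dataset VALUES (?, ?)", rows)
    return conn.execute(QUERY).fetchone()


print(unique_counts(ROWS))  # (4, 3, 1)
```

Here gender_count is COUNT(DISTINCT tel), i.e. people rather than (tel, gender) pairs; the two agree as long as no tel appears with two different genders. If one row per gender is acceptable, SELECT gender, COUNT(DISTINCT tel) FROM example_dataset GROUP BY gender returns the same per-gender numbers.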