I am trying to build a query that tells me how many distinct women and men there are in a given dataset. Each person is identified by a number, 'tel'. The same 'tel' can appear multiple times, but its gender should be counted only once!

7136609221 - male
7136609222 - male
7136609223 - female
7136609228 - male
7136609222 - male
7136609223 - female

This example_dataset would yield the following.
Total unique gender count: 4
Total unique male count: 3
Total unique female count: 1
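
To try the queries in this thread, the sample rows can be loaded into a scratch table. This is only a minimal sketch assuming MySQL; the column types are guesses, since the real schema isn't shown here.

```sql
-- Hypothetical schema: the actual column types are not given in the question.
CREATE TABLE example_dataset (
    tel    BIGINT      NOT NULL,
    gender VARCHAR(10) NOT NULL
);

-- The six sample rows listed above, including the two repeated tels.
INSERT INTO example_dataset (tel, gender) VALUES
    (7136609221, 'male'),
    (7136609222, 'male'),
    (7136609223, 'female'),
    (7136609228, 'male'),
    (7136609222, 'male'),
    (7136609223, 'female');
```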

My attempted query:

SELECT COUNT(DISTINCT tel, gender) as gender_count, 
       COUNT(DISTINCT tel, gender = 'male') as man_count, 
       SUM(if(gender = 'female', 1, 0)) as woman_count 
FROM example_dataset;

There are actually two attempts in there. COUNT(DISTINCT tel, gender = 'male') as man_count seems to just return the same as COUNT(DISTINCT tel, gender) -- it doesn't take the qualifier into account. And SUM(if(gender = 'female', 1, 0)) counts all the female records, but is not filtered by DISTINCT tels.
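
One way to see why the second count comes out as 4: with several arguments, MySQL's COUNT(DISTINCT ...) counts distinct combinations of the values, and gender = 'male' simply evaluates to 1 or 0, so it behaves like a second column rather than a filter. A rough sketch against the sample rows above (is_male is an alias made up for illustration):

```sql
-- List the combinations that COUNT(DISTINCT tel, gender = 'male') would count.
SELECT DISTINCT tel, gender = 'male' AS is_male
FROM example_dataset;
-- With the six sample rows this returns four distinct pairs (row order may vary):
--   7136609221, 1
--   7136609222, 1
--   7136609223, 0
--   7136609228, 1
-- The female tel still forms a pair of its own, which is why the count is 4 and not 3.
```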

  • What are you getting as an answer when you run this? Commented Oct 30, 2013 at 1:09
  • COUNT(DISTINCT tel, gender = 'male') gives man_count = 4 wrongly; it should be 3 -- unique per tel. Commented Oct 30, 2013 at 3:37
  • SUM(if(gender = 'female', 1, 0)) gives woman_count = 2, wrongly. It should be 1 (unique per tel) Commented Oct 30, 2013 at 3:37

2 Answers

Here's one option using a subquery with DISTINCT:

SELECT COUNT(*) gender_count,
   SUM(IF(gender='male',1,0)) male_count,
   SUM(IF(gender='female',1,0)) female_count
FROM (
   SELECT DISTINCT tel, gender
   FROM example_dataset
) t

This will also work if you don't want to use a subquery:

SELECT COUNT(DISTINCT tel) gender_count,
    COUNT(DISTINCT CASE WHEN gender = 'male' THEN tel END) male_count,  
    COUNT(DISTINCT CASE WHEN gender = 'female' THEN tel END) female_count
FROM example_dataset
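
The two queries above agree as long as every tel has a single gender. If the same tel could appear with both genders (an assumption; the sample data has no such case), the subquery version would count that tel twice in gender_count, while the second version would count it once in gender_count but once in each of male_count and female_count. A quick sketch to check for such tels:

```sql
-- Find tels that appear with more than one gender value.
SELECT tel
FROM example_dataset
GROUP BY tel
HAVING COUNT(DISTINCT gender) > 1;
-- Returns no rows for the six sample rows in the question,
-- so both queries give 4 / 3 / 1 there.
```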

4 Comments

DISTINCT CASE WHEN gender = 'male' THEN tel END worked perfectly. It was the solution I was looking for. Thanks!!
COUNT(DISTINCT CASE WHEN) was also exactly what I was looking for, thanks a lot.
COUNT(DISTINCT CASE WHEN) worked for me. Thank You So Much.
Why does the second example use CASE WHEN instead of IF()?

There is another solution, similar to @segeddes's second solution:

SELECT COUNT(DISTINCT tel) AS gender_count,
       COUNT(DISTINCT IF(gender = 'male', tel, NULL)) AS male_count,
       COUNT(DISTINCT IF(gender = 'female', tel, NULL)) AS female_count
FROM example_dataset

Explanation:

IF(gender = 'male', tel, NULL)

This expression returns tel when gender is male and NULL otherwise.

Next,

DISTINCT

removes duplicate values, so each tel is considered only once.

Finally,

COUNT(DISTINCT IF(gender = 'male', tel, NULL))

counts the distinct tel values that appear on rows where gender is male.

Note: COUNT with an expression counts only the rows where that expression is not NULL. For a detailed explanation, see http://www.mysqltutorial.org/mysql-count/
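
To make the NULL handling concrete, here is a small sketch that puts the three kinds of count side by side on the question's sample rows (the aliases all_rows, male_rows and male_tels are made up for illustration):

```sql
SELECT COUNT(*)                                       AS all_rows,   -- every row, NULLs irrelevant
       COUNT(IF(gender = 'male', tel, NULL))          AS male_rows,  -- rows where the IF() is not NULL
       COUNT(DISTINCT IF(gender = 'male', tel, NULL)) AS male_tels   -- distinct non-NULL tels
FROM example_dataset;
-- For the six sample rows in the question:
--   all_rows = 6, male_rows = 4, male_tels = 3
```

The gap between male_rows and male_tels comes from the repeated row for 7136609222, which is the duplicate the question wants ignored.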
