JOIN with multiple columns in postgresql

Question

I have the following two tables in postgresql:

     TABLE: act_codes
    ===================
     act_code  act_desc
    ____________________
        1      sleeping
        2      commuting
        3      eating
        4      working

     TABLE: data
    ===================
    act1_1     act1_2      act1_3     act1_4
    ---------------------------------------------
      1         1           3           4
      1         2           2           3
      1         1           2           2
      1         2           2           3
      1         1           1           2
      1         1           3           4
      1         2           2           4
      1         1           1           3
      1         3           3           4
      1         1           4           4

The act_codes table is basically a table of activities (with a code and a description), and the data table contains the activity codes for (in this case) 4 different times (act1_1, act1_2, act1_3 and act1_4).
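
For anyone who wants to reproduce the queries in this thread, the sample tables could be created roughly like this. This is only a sketch: the integer and text column types are an assumption (the question does not state them), and the key column is named act_code to match the queries below.

```sql
-- Lookup table of activities (column types are assumed, not given in the question)
CREATE TABLE act_codes (
    act_code integer PRIMARY KEY,
    act_desc text NOT NULL
);

INSERT INTO act_codes (act_code, act_desc) VALUES
    (1, 'sleeping'),
    (2, 'commuting'),
    (3, 'eating'),
    (4, 'working');

-- One row per respondent, one column per time segment
CREATE TABLE data (
    act1_1 integer,
    act1_2 integer,
    act1_3 integer,
    act1_4 integer
);

INSERT INTO data (act1_1, act1_2, act1_3, act1_4) VALUES
    (1, 1, 3, 4),
    (1, 2, 2, 3),
    (1, 1, 2, 2),
    (1, 2, 2, 3),
    (1, 1, 1, 2),
    (1, 1, 3, 4),
    (1, 2, 2, 4),
    (1, 1, 1, 3),
    (1, 3, 3, 4),
    (1, 1, 4, 4);
```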

I am trying to query this to get a table of counts for each activity. I have managed to do this for each individual column (in this case act1_4) like this:

    SELECT A.act_code, A.act_desc, COUNT (act1_4) 
    FROM act_codes AS A
    LEFT JOIN data AS D 
    ON D.act1_4 = A.act_code
    GROUP BY A.act_code, A.act_desc;

Which works fine for that column, but I have a very large number of columns to work through, so would prefer it if there was a way to do this within an SQL query.

I now have the following query (many thanks to banazs):

    SELECT
        ac.act_code, 
        ac.act_desc,
        act_time,
        COUNT(activity) AS act_count
    FROM
        (SELECT
            UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS act_time,
            UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS activity
        FROM
            data d) t
    RIGHT JOIN
        act_codes ac ON t.activity = ac.act_code
    GROUP BY
        ac.act_code, 
        ac.act_desc,
        act_time, activity
    ORDER BY 
        activity, 
        act_time
    ;

Which outputs:

    act_code        act_desc        act_time        act_count
    ---------------------------------------------------------
        1           sleeping            act1_1          10
        1           sleeping            act1_2          6
        1           sleeping            act1_3          2
        2           commuting           act1_2          3
        2           commuting           act1_3          4
        2           commuting           act1_4          2
        3           eating              act1_2          1
        3           eating              act1_3          3
        3           eating              act1_4          3
        4           working             act1_3          1
        4           working             act1_4          5

Which is basically what I was looking for. Ideally, the rows with zero counts could be added in somehow, but I am guessing that this is perhaps best done as a separate process (e.g. constructing a crosstab in R or something).

Please confirm: the data table has a separate column for each time?
– Richard, Commented Feb 20, 2017 at 8:21
The columns in the data table correspond to time segments (e.g. act1_1 is the first 5 mins, act1_2 is the second, etc.). It is not my design, so I have to work with this shape of data.
– T Craig, Commented Feb 20, 2017 at 8:38

banazs · Accepted Answer · 2017-02-20 16:11:22Z

2

You can "unpivot" the data using UNNEST:

    SELECT
        UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
        UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS value
    FROM
        data d
    ;
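
As a possible alternative to the paired UNNEST calls, the same unpivot can be sketched with a LATERAL VALUES list, which makes the pairing of each column name with its value explicit. This assumes PostgreSQL 9.3 or later (for LATERAL); the aliases v, act_time and activity are illustrative names only.

```sql
-- Unpivot: emit one (act_time, activity) row per column of each data row
SELECT
    v.act_time,
    v.activity
FROM
    data AS d
CROSS JOIN LATERAL (
    VALUES
        ('act1_1', d.act1_1),
        ('act1_2', d.act1_2),
        ('act1_3', d.act1_3),
        ('act1_4', d.act1_4)
) AS v(act_time, activity);
```

Each row of data produces four output rows, so the result has the same long shape as the UNNEST version.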

Count the activities:

SELECT
    ac.act_code, 
    ac.act_desc,
    COUNT(*)
FROM
    (SELECT
        UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
        UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS val
    FROM
        data d) t
INNER JOIN
    act_codes ac ON t.val = ac.act_code
GROUP BY
    ac.act_code, 
    ac.act_desc 
;

edited Feb 20, 2017 at 16:11

answered Feb 20, 2017 at 10:50

banazs


T Craig Over a year ago

I just tried that, and it works as a reshaping step, but I am not sure how to convert this into the table of counts that I am looking to generate.

banazs · Accepted Answer · 2017-02-21 07:38:09Z

To achieve the table described above, the query needs to be redesigned a bit.

First you have to create an auxiliary table which contains the cartesian product of the column names and the activities:

SELECT 
    *
FROM
    act_codes ac
-- if you have lots of columns, you can query their
-- names from the information_schema.columns system view
CROSS JOIN -- CROSS JOIN pairs every row of one table with every row of the other
    (SELECT 
        column_name 
    FROM 
        information_schema.columns 
    WHERE 
        table_schema = 'stackoverflow' 
        AND table_name = 'data' 
        AND column_name LIKE 'act%') cn 
;

Adding the number of activities to this:

SELECT 
    ac.act_code,
    ac.act_desc,
    cn.column_name,
    -- COALESCE substitutes 0 where the count is NULL (no matching rows)
    COALESCE(ad.act_no, 0) AS act_no
FROM
    act_codes ac
CROSS JOIN
    (SELECT 
        column_name
    FROM 
        information_schema.columns 
    WHERE 
        table_schema = 'stackoverflow' 
        AND table_name = 'data' 
        AND column_name LIKE 'act%') cn
-- you need to use LEFT JOIN to preserve all rows
-- from the cartesian product
LEFT JOIN
    (SELECT 
        t.column_name,
        t.act_code,
        COUNT(*) AS act_no
    FROM
        (SELECT
            UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
            UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
        FROM
            data d) t
    GROUP BY
        t.column_name,
        t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name 
;
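
The column list inside the UNNEST calls above is still written out by hand. If there are very many columns, one way to avoid listing them is to turn each row into JSON and unpivot the key/value pairs. The following is only a sketch under some assumptions: PostgreSQL 9.5 or later (for to_jsonb), every column whose name starts with act holds an integer activity code, and the CTE names unpivoted and slots are made up for illustration.

```sql
WITH unpivoted AS (
    -- to_jsonb(d) builds one JSON object per row, keyed by column name;
    -- jsonb_each_text expands it into (key, value) pairs
    SELECT
        j.key        AS column_name,
        j.value::int AS act_code
    FROM
        data AS d
    CROSS JOIN LATERAL jsonb_each_text(to_jsonb(d)) AS j(key, value)
    WHERE
        j.key LIKE 'act%'
),
slots AS (
    -- the distinct time-segment columns found in the data
    SELECT DISTINCT column_name FROM unpivoted
)
SELECT
    ac.act_code,
    ac.act_desc,
    s.column_name,
    -- COUNT over a column ignores NULLs, so unmatched pairs give 0
    COUNT(u.act_code) AS act_no
FROM
    act_codes AS ac
CROSS JOIN
    slots AS s
LEFT JOIN
    unpivoted AS u
    ON u.act_code = ac.act_code
    AND u.column_name = s.column_name
GROUP BY
    ac.act_code,
    ac.act_desc,
    s.column_name
ORDER BY
    ac.act_code,
    s.column_name;
```

Note that the slots are derived from the data itself, so an empty data table yields an empty result rather than rows of zeros.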

Formatting the result to look like yours is possible, but a little bit messy. You need to create two tables: the first has to contain the result set of the previous query, the second the column names.

CREATE TABLE acts AS
    SELECT 
        ac.act_code,
        ac.act_desc,
        cn.column_name,
        COALESCE(ad.act_no ,0) AS act_no
    FROM
        act_codes ac
    CROSS JOIN
        (SELECT 
            column_name
        FROM 
            information_schema.columns 
        WHERE 
            table_schema = 'stackoverflow' 
            AND table_name = 'data' 
            AND column_name LIKE 'act%') cn
    LEFT JOIN
        (SELECT 
            t.column_name,
            t.act_code,
            COUNT(*) AS act_no
        FROM
            (SELECT
                UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
                UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
            FROM
                data d) t
        GROUP BY
            t.column_name,
            t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name 
;

CREATE TABLE column_names AS
    SELECT 
        column_name
    FROM 
        information_schema.columns 
    WHERE 
        table_schema = 'stackoverflow' 
        AND table_name = 'data' 
        AND column_name LIKE 'act%'
;

Install the tablefunc extension.

CREATE EXTENSION tablefunc;

It provides the crosstab() function, and using this you can get the described output.

SELECT 
    *
FROM   
    crosstab(
        'SELECT act_desc, column_name, act_no FROM acts ORDER BY 1',
        -- order the category list so it lines up with the output columns below
        'SELECT column_name FROM column_names ORDER BY 1'
    )
AS 
    ct (
        "act_desc" text, 
        "act1_1" int, 
        "act1_2" int, 
        "act1_3" int, 
        "act1_4" int
    );

+-----------+--------+--------+--------+--------+
| act_desc  | act1_1 | act1_2 | act1_3 | act1_4 |
+-----------+--------+--------+--------+--------+
| commuting |      0 |      3 |      4 |      2 |
| eating    |      0 |      1 |      3 |      3 |
| sleeping  |     10 |      6 |      2 |      0 |
| working   |      0 |      0 |      1 |      5 |
+-----------+--------+--------+--------+--------+
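
If installing tablefunc is not an option, a similar wide table (including the zeros) can probably be produced with conditional aggregation instead. This is a sketch rather than a drop-in replacement: it assumes PostgreSQL 9.4 or later (for the FILTER clause), the count_act1_N aliases are just illustrative, and every time column still has to be listed by hand.

```sql
SELECT
    ac.act_code,
    ac.act_desc,
    -- each FILTER counts only the data rows where that column holds this activity
    COUNT(*) FILTER (WHERE d.act1_1 = ac.act_code) AS count_act1_1,
    COUNT(*) FILTER (WHERE d.act1_2 = ac.act_code) AS count_act1_2,
    COUNT(*) FILTER (WHERE d.act1_3 = ac.act_code) AS count_act1_3,
    COUNT(*) FILTER (WHERE d.act1_4 = ac.act_code) AS count_act1_4
FROM
    act_codes AS ac
-- ON true pairs every activity with every data row and keeps
-- activities even when data is empty
LEFT JOIN
    data AS d ON true
GROUP BY
    ac.act_code,
    ac.act_desc
ORDER BY
    ac.act_code;
```

The join examines every activity against every data row, so this is best suited to a small act_codes table; for large inputs the unpivot-then-count approach above may scale better.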

T Craig · Accepted Answer · 2017-02-20 20:22:52Z

0

Thanks @banazs - that is really useful in terms of helping me understand how to structure queries like this.

However, I still have difficulty arranging the query so that the output has a column of counts for each time. Apologies, I think the labeling here is a bit confusing (act1_1 refers to activities done at time_1, act1_2 refers to time_2, etc.). The result I am trying to get to looks like this:

    act_code    act_desc        count_act1_1    count_act1_2    count_act1_3    count_act1_4
    ----------------------------------------------------------------------------------------
        1       sleeping            10              6               2               0
        2       commuting           0               3               4               2
        3       eating              0               1               3               3
        4       working             0               0               1               5

I am not concerned about the output being in columns, as I can easily reshape it, but it is important that the zeros are present in the table. Is this possible?

edited Feb 20, 2017 at 20:22

answered Feb 20, 2017 at 17:42

T Craig


banazs Over a year ago

I created a new answer to this, it's a little bit complicated so feel free to ask about it.
