Using this database:

create table t1 (t1_id int);
insert into t1 values (1);
insert into t1 values (2);

create table t2 (t2_id int, t1_id int, t2_value int);
insert into t2 values (1, 1, 1);
insert into t2 values (2, 1, 1);
insert into t2 values (3, 2, 2);
insert into t2 values (4, 2, 3);

And running this query that aggregates data from a correlated subquery into a JSON array:

select json_arrayagg(json_object(
  key 't1_id' value t1_id,
  key 't2' value (
    select json_arrayagg(json_object(
      key 't2_value' value t2_value
    ))
    from (
      select distinct t2.t2_value
      from t2
      where t2.t1_id = t1.t1_id
    ) t
  ) format json
))
from t1;

I'm getting the following wrong result on Oracle 18c XE:

[{
  "t1_id":1,
  "t2":[{ "t2_value":1 }, { "t2_value":1 }]
}, {
  "t1_id":2,
  "t2":[{ "t2_value":2 }, { "t2_value":3 }]
}]

Notice that although the derived table uses DISTINCT t2.t2_value, the result contains duplicates for t1_id = 1.
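For comparison, here is a minimal sketch of the same DISTINCT applied outside any JSON function, run against the tables above. The correlation is replaced by selecting t1_id directly, which is my simplification and not part of the original query.

```sql
-- Baseline without JSON aggregation: list the distinct (t1_id, t2_value)
-- pairs that the JSON document is expected to contain.
select distinct t2.t1_id, t2.t2_value
from t2
order by t2.t1_id, t2.t2_value;

-- Expected rows for the sample data:
--   T1_ID  T2_VALUE
--       1         1
--       2         2
--       2         3
```

Only one row comes back for t1_id = 1, which is what the nested "t2" array should reflect as well.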

Is this a bug? It probably is, since I can't reproduce it on 19c. How can I work around it on 18c?

2 Answers

It does seem to be a bug. The execution plan contains no SORT UNIQUE (or similar) step, which suggests the DISTINCT is not being applied at all:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     6 (100)|          |
|   1 |  SORT GROUP BY     |      |     1 |    26 |            |          |
|*  2 |   TABLE ACCESS FULL| T2   |     1 |    26 |     3   (0)| 00:00:01 |
|   3 |  SORT GROUP BY     |      |     1 |    13 |            |          |
|   4 |   TABLE ACCESS FULL| T1   |     2 |    26 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------
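In case it helps to reproduce this, a plan like the one above can usually be fetched with DBMS_XPLAN.DISPLAY_CURSOR. This is a sketch that assumes the query was the last statement executed in the same session and that the session is allowed to read the underlying V$ views.

```sql
-- Run the query under test first, then fetch the plan of the last
-- statement executed in this session (sql_id defaults to NULL).
-- Assumption: in SQL*Plus, SERVEROUTPUT is off, otherwise the "last
-- statement" may be an internal DBMS_OUTPUT call instead.
select *
from table(dbms_xplan.display_cursor(format => 'TYPICAL'));
```

What to look for is a SORT UNIQUE or HASH UNIQUE line under the subquery that reads T2.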

Workaround 1

Use a dummy HAVING COUNT(*) = COUNT(*) predicate:

select json_arrayagg(json_object(
  key 't1_id' value t1_id,
  key 't2' value (
    select json_arrayagg(json_object(
      key 't2_value' value t2_value
    ))
    from (
      select distinct t2.t2_value
      from t2
      where t2.t1_id = t1.t1_id
    ) t
    having count(*) = count(*) -- Workaround
  ) format json
))
from t1;

This produces the correct result:

[{
  "t1_id":1,
  "t2":[{ "t2_value":1 }]
}, {
  "t1_id":2,
  "t2":[{ "t2_value":2 }, { "t2_value":3 }]
}]

The plan is now:

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |       |       |     7 (100)|          |
|*  1 |  FILTER               |      |       |       |            |          |
|   2 |   SORT GROUP BY       |      |     1 |    13 |            |          |
|   3 |    VIEW               |      |     1 |    13 |     4  (25)| 00:00:01 |
|   4 |     SORT UNIQUE       |      |     1 |    26 |     4  (25)| 00:00:01 | <--
|*  5 |      TABLE ACCESS FULL| T2   |     1 |    26 |     3   (0)| 00:00:01 |
|   6 |  SORT GROUP BY        |      |     1 |    13 |            |          |
|   7 |   TABLE ACCESS FULL   | T1   |     2 |    26 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Workaround 2

Use a UNION to enforce distinctness:

select json_arrayagg(json_object(
  key 't1_id' value t1_id,
  key 't2' value (
    select json_arrayagg(json_object(
      key 't2_value' value t2_value
    ))
    from (
      select distinct t2.t2_value
      from t2
      where t2.t1_id = t1.t1_id
      union select null from dual where 1 = 0 -- Dummy union
    ) t
  ) format json
))
from t1;

The plan is now:

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |       |       |     6 (100)|          |
|   1 |  SORT GROUP BY        |      |     1 |    13 |            |          |
|   2 |   VIEW                |      |     2 |    26 |     3   (0)| 00:00:01 |
|   3 |    SORT UNIQUE        |      |     2 |    26 |     3   (0)| 00:00:01 | <--
|   4 |     UNION-ALL         |      |       |       |            |          |
|*  5 |      TABLE ACCESS FULL| T2   |     1 |    26 |     3   (0)| 00:00:01 |
|*  6 |      FILTER           |      |       |       |            |          |
|   7 |       FAST DUAL       |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |  SORT GROUP BY        |      |     1 |    13 |            |          |
|   9 |   TABLE ACCESS FULL   | T1   |     2 |    26 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

The result is also correct.
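To check a generated document for leftover duplicates without reading it by eye, one option is to unnest it again with JSON_TABLE. The sketch below feeds in the wrong result from the question as a literal; the CTE name r and the column aliases doc and occurrences are made up for illustration.

```sql
-- Unnest the JSON array and report every (t1_id, t2_value) pair that
-- appears more than once inside the nested "t2" array.
with r as (
  select '[{"t1_id":1,"t2":[{"t2_value":1},{"t2_value":1}]},'
      || '{"t1_id":2,"t2":[{"t2_value":2},{"t2_value":3}]}]' as doc
  from dual
)
select jt.t1_id, jt.t2_value, count(*) as occurrences
from r,
     json_table(r.doc, '$[*]' columns (
       t1_id number path '$.t1_id',
       nested path '$.t2[*]' columns (
         t2_value number path '$.t2_value'
       )
     )) jt
group by jt.t1_id, jt.t2_value
having count(*) > 1;

-- For the wrong result above this reports one row:
--   T1_ID = 1, T2_VALUE = 1, OCCURRENCES = 2
```

Run against the output of either workaround, the same check should return no rows.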

This could be Bug 27757725 - JSON GENERATION AGGREGATION FUNCTIONS IGNORE DISTINCT. Can you request a patch?
