24

I have a table with JSON data in it, and a statement that pulls out an array of IDs for each row:

SELECT items.data->"$.matrix[*].id" as ids
FROM items

This results in something like..

+------------+
|    ids     |
+------------+
| [1,2,3]    |
+------------+

Next I want to select from another table where the ID of that other table is in the array, similar to WHERE id IN (1,2,3) but using the JSON array.

Something along the lines of...

SELECT * FROM other_items 
WHERE id IN ( 
  SELECT items.data->"$.matrix[*].id" FROM items
);

but it needs some JSON magic and I can't work it out...

5
  • Is this [1,2,3] how data stored in that corresponding column? I mean comma separated ids enclosed by [ and ]? Commented Jul 18, 2016 at 9:18
  • this post is relevant Commented Jul 18, 2016 at 9:26
  • the resulting data is a mySQL JSON ARRAY Commented Jul 18, 2016 at 9:27
  • thanks @ImranAli, but that appears to be about selecting from a single value, what I want to do is select all rows where a value is contained in a type of ARRAY Commented Jul 18, 2016 at 9:33
  • there is a wild card to select multiple values please read the section Searching and Modifying JSON Values in this documentation Commented Jul 18, 2016 at 10:03

4 Answers

15

Starting from MySQL 8.0.17, there is a MEMBER OF() operator, which does exactly what you're looking for.

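The query should be rewritten as a JOIN, though. The path '$.id' below matches the example table further down; with the layout from the question it would be '$.matrix[*].id':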

SELECT o.* FROM other_items o
JOIN items i ON o.id MEMBER OF(i.data->>'$.id')

If you want your query to have better performance, consider using multi-valued indexes on your JSON column.
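As a rough sketch of what such an index could look like, assuming MySQL 8.0.17 or later, ids stored as unsigned integers in a top-level "id" array, and an items table that already exists (the index name idx_ids is made up for illustration):

```sql
-- Hypothetical index name; assumes MySQL 8.0.17+ and an array of unsigned integers.
ALTER TABLE items
    ADD INDEX idx_ids ( (CAST(data->'$.id' AS UNSIGNED ARRAY)) );

-- The optimizer can only consider the index when the query uses the same
-- JSON expression as the index definition (here data->'$.id').
EXPLAIN
SELECT * FROM items
WHERE 3 MEMBER OF(data->'$.id');
```

Whether the index is actually picked depends on the optimizer and on the expression matching the index definition, so it is worth checking the EXPLAIN output rather than assuming it.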


The use of MEMBER OF() can be explained more clearly with the following example:

CREATE TABLE items ( data JSON );

INSERT INTO items
SET data = '{"id":[1,2,3]}';

This is how you find out whether a value is present in the JSON array:

SELECT * FROM items
WHERE 3 MEMBER OF(data->>'$.id');
+-------------------+
| data              |
+-------------------+
| {"id": [1, 2, 3]} |
+-------------------+
1 row in set (0.00 sec)

Note that the type of the value matters here, unlike in a regular comparison. If you pass it as a string, there will be no match:

SELECT * FROM items
WHERE "3" MEMBER OF(data->>'$.id');

Empty set (0.00 sec)

A regular comparison, by contrast, returns 1:

SELECT 3 = "3";
+---------+
| 3 = "3" |
+---------+
|       1 |
+---------+
1 row in set (0.00 sec)
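If the value arrives as a string (for example from application code), one possible workaround is to cast it to a number before the membership test. This is only a sketch and assumes the items table from the example above, holding {"id": [1, 2, 3]}:

```sql
-- Casting the string to a number first lets it match the numeric array elements.
SELECT * FROM items
WHERE CAST('3' AS UNSIGNED) MEMBER OF(data->>'$.id');
```

With the sample row this should find the same row as the numeric literal 3 did.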

1 Comment

Fantastic - exactly what I was looking for. Kudos!
11

Below is a complete answer. You may want a 'use <db_name>;' statement at the top of the script. The point is to show that JSON_CONTAINS() may be used to achieve the desired join.

DROP TABLE IF EXISTS `tmp_items`;
DROP TABLE IF EXISTS `tmp_other_items`;

CREATE TABLE `tmp_items` (`id` int NOT NULL PRIMARY KEY AUTO_INCREMENT, `data` json NOT NULL);
CREATE TABLE `tmp_other_items` (`id` int NOT NULL, `text` nvarchar(30) NOT NULL);

INSERT INTO `tmp_items` (`data`) 
VALUES 
    ('{ "matrix": [ { "id": 11 }, { "id": 12 }, { "id": 13 } ] }')
,   ('{ "matrix": [ { "id": 21 }, { "id": 22 }, { "id": 23 }, { "id": 24 } ] }')
,   ('{ "matrix": [ { "id": 31 }, { "id": 32 }, { "id": 33 }, { "id": 34 }, { "id": 35 } ] }')
;

INSERT INTO `tmp_other_items` (`id`, `text`) 
VALUES 
    (11, 'text for 11')
,   (12, 'text for 12')
,   (13, 'text for 13')
,   (14, 'text for 14 - never retrieved')
,   (21, 'text for 21')
,   (22, 'text for 22')
-- etc...
;

-- Show join working:
SELECT 
    t1.`id` AS json_table_id
,   t2.`id` AS joined_table_id
,   t2.`text` AS joined_table_text
FROM 
    (SELECT st1.id, st1.data->'$.matrix[*].id' as ids FROM `tmp_items` st1) t1
INNER JOIN `tmp_other_items` t2 ON JSON_CONTAINS(t1.ids, CAST(t2.`id` as json), '$')

You should see one joined row for each id in tmp_other_items that appears in one of the matrix arrays; the row with id 14 is never returned.
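If the goal is closer to the WHERE ... IN shape from the question, the same JSON_CONTAINS() test can sit inside an EXISTS subquery. This is a sketch that assumes the tmp_items and tmp_other_items tables created above; the aliases o and i are arbitrary:

```sql
-- Returns each matching row of tmp_other_items once, even if its id
-- appears in the matrix arrays of several tmp_items rows.
SELECT o.*
FROM `tmp_other_items` o
WHERE EXISTS (
    SELECT 1
    FROM `tmp_items` i
    WHERE JSON_CONTAINS(i.data->'$.matrix[*].id', CAST(o.`id` AS JSON), '$')
);
```

Like the join, this still evaluates JSON_CONTAINS() per row pair, so it is not expected to be faster on large tables.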

3 Comments

Good example! The takeaway is the JSON_CONTAINS() function: dev.mysql.com/doc/refman/5.7/en/json-search-functions.html
This seems not to use the potential index on the t2.id (because of CAST(t2.id AS JSON)) leading to utterly slow results on bigger tables.
JSON_TABLE() now simplifies things and avoids the problem of the index not being used:

SELECT
    st1.id AS json_table_id
,   t2.id AS joined_table_id
,   t2.text AS joined_table_text
FROM tmp_items st1
JOIN JSON_TABLE(st1.data, '$.matrix[*]' COLUMNS(other_item_id int PATH '$.id')) AS t1
INNER JOIN tmp_other_items t2 ON t1.other_item_id = t2.id
5

Before JSON was introduced in MySQL, I used this:

  1. Your original data: [1,2,3]

  2. After replacing each comma with '][': [1][2][3]

  3. Wrap the id you are looking for in '[]': [1]

  4. Then use a reversed LIKE instead of IN: WHERE '[1][2][3]' LIKE '%[1]%'

Answer to your question:

SELECT o.* FROM other_items o
WHERE EXISTS (
    SELECT 1 FROM items i
    -- MySQL prints JSON arrays as '[1, 2, 3]', so replace ', ' (comma + space)
    WHERE REPLACE(CAST(i.data->'$.matrix[*].id' AS CHAR), ', ', '][')
          LIKE CONCAT('%[', o.id, ']%')
);

Why wrap the ids in '[]'? Without the brackets, a partial id also matches:

'[12,23,34]' LIKE '%1%' --> true
'[12,23,34]' LIKE '%12%' --> true

With the brackets, only whole ids match:

'[12][23][34]' LIKE '%[1]%' --> false
'[12][23][34]' LIKE '%[12]%' --> true
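The comparisons above can be tried directly with string literals. The sketch below assumes the array text uses ', ' (comma plus space) between elements, which is how MySQL prints JSON arrays; the column aliases are arbitrary:

```sql
SELECT
    REPLACE('[12, 23, 34]', ', ', '][')                AS wrapped,      -- '[12][23][34]'
    '[12, 23, 34]' LIKE '%1%'                          AS naive_match,  -- 1 (false positive)
    REPLACE('[12, 23, 34]', ', ', '][') LIKE '%[1]%'   AS wrapped_1,    -- 0
    REPLACE('[12, 23, 34]', ', ', '][') LIKE '%[12]%'  AS wrapped_12;   -- 1
```

In MySQL's LIKE, square brackets are ordinary characters, which is what makes the wrapping trick work.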


4

Be aware that the accepted answer won't use the index on tmp_other_items, leading to slow performance on bigger tables.

In such cases, I usually use an integers table containing the integers from 0 to an arbitrary fixed number N (below, about 1 million), and I join on that integers table to get the nth JSON element:

DROP TABLE IF EXISTS `integers`;
DROP TABLE IF EXISTS `tmp_items`;
DROP TABLE IF EXISTS `tmp_other_items`;

CREATE TABLE `integers` (`n` int NOT NULL PRIMARY KEY);
CREATE TABLE `tmp_items` (`id` int NOT NULL PRIMARY KEY AUTO_INCREMENT, `data` json NOT NULL);
CREATE TABLE `tmp_other_items` (`id` int NOT NULL PRIMARY KEY, `text` nvarchar(30) NOT NULL);

INSERT INTO `tmp_items` (`data`) 
VALUES 
    ('{ "matrix": [ { "id": 11 }, { "id": 12 }, { "id": 13 } ] }'),
   ('{ "matrix": [ { "id": 21 }, { "id": 22 }, { "id": 23 }, { "id": 24 } ] }'),
   ('{ "matrix": [ { "id": 31 }, { "id": 32 }, { "id": 33 }, { "id": 34 }, { "id": 35 } ] }')
;

-- Put a lot of rows in integers (~1M)
INSERT INTO `integers` (`n`) 
(
    SELECT 
        a.X
        + (b.X << 1)
        + (c.X << 2)
        + (d.X << 3)
        + (e.X << 4)
        + (f.X << 5)
        + (g.X << 6)
        + (h.X << 7)
        + (i.X << 8)
        + (j.X << 9)
        + (k.X << 10)
        + (l.X << 11)
        + (m.X << 12)
        + (n.X << 13)
        + (o.X << 14)
        + (p.X << 15)
        + (q.X << 16)
        + (r.X << 17)
        + (s.X << 18)
        + (t.X << 19) AS i
    FROM (SELECT 0 AS x UNION SELECT 1) AS a
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS b ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS c ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS d ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS e ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS f ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS g ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS h ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS i ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS j ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS k ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS l ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS m ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS n ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS o ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS p ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS q ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS r ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS s ON TRUE
        INNER JOIN (SELECT 0 AS x UNION SELECT 1) AS t ON TRUE)
;

-- Insert normal rows (a lot!)
INSERT INTO `tmp_other_items` (`id`, `text`) 
    (SELECT n, CONCAT('text for ', n) FROM integers);

Now you can retry the accepted answer's query, which takes about 11 seconds to run (but is simple):

-- Show join working (slow)
SELECT 
    t1.`id` AS json_table_id
,   t2.`id` AS joined_table_id
,   t2.`text` AS joined_table_text
FROM 
    (SELECT st1.id, st1.data->'$.matrix[*].id' as ids FROM `tmp_items` st1) t1
INNER JOIN `tmp_other_items` t2 ON JSON_CONTAINS(t1.ids, CAST(t2.`id` as JSON), '$')
;

Compare it to the faster approach of converting the JSON into a (temporary) table of ids and then doing a JOIN over it (which leads to instant results, 0.000 sec according to HeidiSQL):

-- Fast
SELECT
    i.json_table_id,
    t2.id AS joined_table_id,
    t2.`text` AS joined_table_text
FROM (
    SELECT 
        j.json_table_id,
        -- Don't forget to CAST if needed, so the column type matches the index type
        -- Do an "EXPLAIN" and check its warnings if needed
        CAST(JSON_EXTRACT(j.ids, CONCAT('$[', i.n - 1, ']')) AS UNSIGNED) AS id
    FROM (
        SELECT 
            st1.id AS json_table_id,
            st1.data->'$.matrix[*].id' as ids,
            JSON_LENGTH(st1.data->'$.matrix[*].id') AS len
        FROM `tmp_items` AS st1) AS j
        INNER JOIN integers AS i ON i.n BETWEEN 1 AND len) AS i
    INNER JOIN tmp_other_items AS t2 ON t2.id = i.id
    ;

The innermost SELECT retrieves the list of JSON ids along with its length (used by the join on integers one level up).

The second SELECT takes this list of ids and JOINs it on integers to retrieve the nth id of every JSON list, producing a table of ids (instead of a table of JSON arrays).

The outermost SELECT now only has to join this table of ids with the table containing the data you want.
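To see the middle step in isolation, here is a small sketch that unpacks a literal JSON array. It uses a four-row derived table as a stand-in for the integers table (an assumption made purely to keep the example self-contained); the alias pos is arbitrary:

```sql
-- One output row per array element: n = 1 picks $[0], n = 2 picks $[1], and so on.
-- The row with n = 4 is filtered out because the array only has 3 elements.
SELECT
    n.n AS pos,
    JSON_EXTRACT('[11, 12, 13]', CONCAT('$[', n.n - 1, ']')) AS id
FROM (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) AS n
WHERE n.n BETWEEN 1 AND JSON_LENGTH('[11, 12, 13]');
```

This should yield three rows with the ids 11, 12 and 13, which is the "table of ids" that the outer join then works on.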

Below is the same query using WHERE IN, to match the question title:

-- Fast (using WHERE IN)
SELECT t2.*
FROM tmp_other_items AS t2
WHERE t2.id IN (
    SELECT 
        CAST(JSON_EXTRACT(j.ids, CONCAT('$[', i.n - 1, ']')) AS UNSIGNED) AS id
    FROM (
        SELECT 
            st1.data->'$.matrix[*].id' as ids, 
            JSON_LENGTH(st1.data->'$.matrix[*].id') AS len
        FROM `tmp_items` AS st1) AS j
        INNER JOIN integers AS i ON i.n BETWEEN 1 AND len)
    ;
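On MySQL 8.0 versions that provide JSON_TABLE(), the integers table may not be needed at all. The following is a sketch of the same WHERE IN query along the lines suggested in a comment on another answer; it assumes the tmp_items and tmp_other_items tables from above, and the alias jt is arbitrary:

```sql
-- JSON_TABLE() expands each matrix element into its own row,
-- so the subquery returns a plain list of integer ids.
SELECT t2.*
FROM tmp_other_items AS t2
WHERE t2.id IN (
    SELECT jt.id
    FROM tmp_items AS st1,
         JSON_TABLE(st1.data, '$.matrix[*]' COLUMNS (id INT PATH '$.id')) AS jt
);
```

Because the subquery produces ordinary integers, the primary key on tmp_other_items can be used for the lookup, though it is still worth confirming that with EXPLAIN.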

5 Comments

Hey this looks extremely ingenious, the JSON_EXTRACT trick based on length is very clever! But I can't wrap my head around the 1m integers table: why so many? Do you really expect for a matrix key to contain so many ids? Am I missing something obvious? Thank you!
@Gruber Thanks! There is no specific reason for the integers table to have 1M rows: it has 1M rows in my case because I also use it for some other stuff that requires 100K+ rows. You only need a table with "enough rows to be greater than your JSON_LENGTH during your application life"
Thank you for your answer! Sorry, I'm quite a rookie with SQL. I noticed that with a 1M-row ints table the query can be a bit slow on my machine; here is the gist with an EXPLAIN. The number of rows scanned in the ints table is 530258, about half the total. This happens because of the BETWEEN 1 AND j.len. I've noticed that with a hard-coded int instead, like BETWEEN 1 AND 10, the query is instant; reducing the ints table to 10k rows also makes the query instant, but about 5k rows are still scanned. Why is that? Thank you again!
PS. I'm using MariaDB 10.3.18 if that makes any difference
The integers table has an index on its n column, so (in short) the number of rows doesn't matter at all (even 1 billion rows would be about as fast as 100 rows). The performance depends on the number of elements in your matrix JSON array (cutting the table to 10K rows when you have 500K elements will not yield all elements! :) ). Don't forget to also index your tmp_other_items. I don't know about the MySQL/MariaDB difference; as long as you are on recent versions, it's fine.
