Check if a Postgres array contains a subarray in order

Question

I need to write an SQL query (or an SQL function) that checks whether the arrays in a Postgres row contain a given slice. I'll call such a row an 'Item'. An 'Item' looks like this:

[Image: a table of sample items with two array columns, e1 and e2]

The size of e1 is always the same as e2.

Now take two arrays, which I'll call a1 and a2, that look like this:

'{b,c,d}', '{2,3,4}'.

The values of a2 have to match in e2 at exactly the same index at which a1 matches in e1. So in this example both the 1st and the 4th items in the table match, but not the 2nd or the 3rd. If the two arrays are '{c,d}' and '{3,4}' instead, then the 1st, the 3rd and the 4th items match.

I have no idea how to do this. Do I need to use something like generate_series() to generate all possible slices of the item and then check each one? I am a bit confused now.

Hi. In future please don't post screenshots of data; instead, use the site's formatting features to display tables etc. This makes it searchable for people later, and helps out readers with vision difficulties. It also helps people help you since they can copy and paste your sample data when testing. Thanks!
– Craig Ringer, Commented Oct 9, 2017 at 1:23
For the benefit of other readers, the @> operator won't help here since it doesn't respect order; it's a set operation.
– Craig Ringer, Commented Oct 9, 2017 at 1:25
Sorry about that, it is my first time asking a question on Stack Overflow. I won't do that again :)
– GrandmaChen, Commented Oct 9, 2017 at 4:54

klin · Accepted Answer · 2017-10-09 00:16:11Z

The function returns the 1-based index at which the subarray sub first occurs in the array arr, or 0 if arr does not contain sub:

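create or replace function index_of_subarray(arr anyarray, sub anyarray)
returns integer language plpgsql immutable as $$
begin
    -- try every start position where sub could still fit inside arr
    for i in 1 .. cardinality(arr) - cardinality(sub) + 1 loop
        -- compare the slice of arr that starts at i with sub
        if arr[i:i + cardinality(sub) - 1] = sub then
            return i;
        end if;
    end loop;
    -- no matching slice was found
    return 0;
end $$;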
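
As a quick sanity check, the function can be called directly on literal arrays. This is only an illustrative sketch with made-up inputs; the expected values in the comments follow from the definition above (first occurrence, 1-based, 0 when absent).

```sql
-- Made-up inputs for illustration; at least one argument needs an explicit
-- array type so that the polymorphic anyarray parameters can be resolved.
select index_of_subarray('{a,b,c,d}'::text[], '{c,d}') as found,        -- 3
       index_of_subarray('{a,b,c,d}'::text[], '{d,c}') as wrong_order,  -- 0
       index_of_subarray(array[1,2,3,2,3], array[2,3]) as first_match;  -- 2
```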

Use:

with my_table(e1, e2) as (
values
    ('{a,b,c,d}'::text[], '{1,2,3,4}'::int[]),
    ('{b,c,d,a}', '{1,2,3,4}'),
    ('{c,d}', '{3,4}'),
    ('{b,c,d}', '{2,3,4}')
)

select e1, e2
from my_table
where index_of_subarray(e1, '{b,c,d}') > 0
and index_of_subarray(e1, '{b,c,d}') = index_of_subarray(e2, '{2,3,4}');

    e1     |    e2     
-----------+-----------
 {a,b,c,d} | {1,2,3,4}
 {b,c,d}   | {2,3,4}
(2 rows)
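
One caveat: index_of_subarray() reports only the first occurrence, so comparing the two results can miss a row in which the slice occurs more than once and the aligned match is not the first one. The following is a minimal sketch of an alternative, not part of the answer above, that tests every start position with generate_series(), as the question suggested. The third sample row is hypothetical and was added only to show the difference; the slice length 3 is hard-coded for this example.

```sql
with my_table(e1, e2) as (
values
    ('{a,b,c,d}'::text[], '{1,2,3,4}'::int[]),
    ('{b,c,d,a}', '{1,2,3,4}'),
    ('{b,c,d,b,c,d}', '{9,9,9,2,3,4}')  -- hypothetical row with a repeated slice
)
select e1, e2
from my_table
where exists (
    select 1
    -- candidate start positions; 3 is the length of the searched slices
    from generate_series(1, cardinality(e1) - 3 + 1) as i
    -- both slices must match at the same start position i
    where e1[i:i + 2] = '{b,c,d}'::text[]
      and e2[i:i + 2] = '{2,3,4}'::int[]
);
```

Under these assumptions the first and third rows should be returned. Comparing first occurrences would reject the third row, because the slice is first found at index 1 in e1 but at index 4 in e2.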

c_froehlich · Answer · 2023-12-31 11:17:55Z

Depending on the situation, you can convert the array to a string and query it with a regex or LIKE.

What sounds like a naive approach turns out to be faster than the plpgsql function from the accepted answer, at least in the following situation:

drop table if exists temp;
create table temp as
  SELECT array[
    chr((floor(random()*26) + 65)::int),
    chr((floor(random()*26) + 65)::int),
    chr((floor(random()*26) + 65)::int),
    chr((floor(random()*26) + 65)::int),
    chr((floor(random()*26) + 65)::int),
    chr((floor(random()*26) + 65)::int)
    ] as a FROM GENERATE_SERIES(1, 1000000);

Query with like

\timing

select count(*) from temp
where array_to_string(a, '>>') like '%A>>B%';
 count
-------
  7505
(1 row)

Time: 270,404 ms

Query with function

select count(*) from temp
where index_of_subarray (a, array['A', 'B']) != 0;
 count
-------
  7505
(1 row)

Time: 1999,002 ms (00:01,999)

The difference in performance becomes even clearer when you need to match wildcards:

select count(*) from temp
where array_to_string(a, '>>') like '%A>>B%>>C>>D%';
 count
-------
     7
(1 row)

Time: 173,343 ms

With function

select count(*) from temp
where index_of_subarray (a, array['A', 'B']) != 0
and index_of_subarray (a, array['A', 'B']) < index_of_subarray (a, array['C', 'D']);
 count
-------
     7
(1 row)

Time: 1999,791 ms (00:02,000)

I'm astonished by the difference in performance.

Maybe it's because plpgsql is so much slower than the native implementation of LIKE/regex matching, which is most probably highly optimized.

Maybe something specific to this test setup gives the string approach its advantage.

Furthermore, to make the stringify solution reliable, you need to pass array_to_string a delimiter that does not occur in any of the array elements. There are situations where this may not be feasible.
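
A related pitfall is that a pattern such as '%A>>B%' also matches across partial elements, for example an element ending in A followed by one starting with B. The test above avoids this only because every element is a single character. A possible workaround, sketched here with made-up sample rows, is to wrap the string in the delimiter on both sides and include the delimiters in the pattern:

```sql
with samples(a) as (
values
    (array['A', 'B', 'C']),  -- really contains the elements A, B in order
    (array['XA', 'BC'])      -- only contains the characters "A>>B" once stringified
)
select a,
       -- plain pattern: true for both rows
       array_to_string(a, '>>') like '%A>>B%' as unanchored,
       -- delimiter-wrapped pattern: true only for the first row
       '>>' || array_to_string(a, '>>') || '>>' like '%>>A>>B>>%' as anchored
from samples;
```

This still relies on '>>' never appearing inside an element. Note also that array_to_string() omits NULL elements unless a third argument supplies a replacement string, which would shift the positions being matched.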

Apart from that, converting the array to a string and querying it with LIKE or a regex is worth considering.
