20

I have a table invoices with a field invoice_number. This is what happens when i execute select invoice_number from invoice:

invoice_number
--------------
1
2
3
5
6
10
11

I want a SQL that gives me the following result:

gap_start | gap_end
4         | 4
7         | 9

How can i write a SQL to perform such query? I am using PostgreSQL.

1
  • 1
    I think you will find the LEAD() window function (or LAG()) very helpful. Commented May 11, 2015 at 13:08

3 Answers 3

40

The name of this problem is the "Gaps and Islands problem" which can be done with any modern SQL, using window functions:

select invoice_number + 1 as gap_start, 
       next_nr - 1 as gap_end
from (
  select invoice_number, 
         lead(invoice_number) over (order by invoice_number) as next_nr
  from invoices
) nr
where invoice_number + 1 <> next_nr;

SQLFiddle: http://sqlfiddle.com/#!15/1e807/1

Walkthrough example here using row_number over partition and interval: Postgres Consecutive Days, gaps and islands, Tabibitosan

Sign up to request clarification or add additional context in comments.

Comments

6

We can use a simpler technique to get all missing values first, by joining on a generated sequence column like so:

select series
from generate_series(1, 11, 1) series
left join invoices on series = invoices.invoice_number
where invoice_number is null;

This gets us the series of missing numbers, which can be useful on it's own in some cases.

To get the gap start/end range, we can instead join the source table with itself.

select invoices.invoice_number + 1 as start, 
       min(fr.invoice_number) - 1 as stop
from invoices
left join invoices r on invoices.invoice_number = r.invoice_number - 1
left join invoices fr on invoices.invoice_number < fr.invoice_number
where r.invoice_number is null
      and fr.invoice_number is not null
group by invoices.invoice_number,
         r.invoice_number;

dbfiddle: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=32c5f3c021b0f1a876305a2bd3afafc9

This is probably less optimised than the above solutions, but could be useful in SQL servers that don't support lead() function perhaps.


Full credit goes to this excellent page in SILOTA docs: http://www.silota.com/docs/recipes/sql-gap-analysis-missing-values-sequence.html

I highly recommend reading it, as it explains the solution step by step.

Comments

1

I found another query:

 select invoice_number + lag gap_start, 
        invoice_number + lead - 1 gap_end
    from (select invoice_number, 
                 invoice_number - lag(invoice_number) over w lag,
                 lead(invoice_number) over w - invoice_number lead 
              from invoices window w as (order by invoice_number)) x 
  where lag = 1 and lead > 1;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.