1

I have a large database with connections between cities. Each connection has a start and destination town, a start date, and a price for that connection.

I'd like to compute any combinations of outgoing+return connections, for any connections, and for dates where the return connection is between 1-20 days. Then select the best price for each date combination.

Example:

Table:

city_start,     city_end,   date_start,     price
Hamburg         Berlin      01.01.2016      100.00
Berlin          Hamburg     10.01.2016      112.00
Berlin          Hamburg     10.01.2016      70.00
Berlin          Hamburg     12.01.2016      50.00
Berlin          Hamburg     30.02.2016      20.00
Paris           Madrid      ...
Madrid          Paris
London          Paris

Desired result:

Hamburg-Berlin-Hamburg, 01.01.2016, 10.01.2016, 170.00 (100+70)
Hamburg-Berlin-Hamburg, 01.01.2016, 12.01.2016, 150.00 (100+50)
...
(not Berlin-Hamburg on 30.02.2016 because it's >20 days from departure drive)
(not London-Paris, as there is no return Paris-London)

I can get the possible combinations by:

SELECT DISTINCT city_start, city_end, city_end, city_start from table

But how can I now compute their permutations?

3 Answers 3

3

The query to get all pairs uses a join:

select tto.city_start, tto.city_end, tto.date_start, tfrom.date_end,
       (tto.price + tfrom.price) as price
from t tto join
     t tfrom
     on tto.city_end = tfrom.city_start and
        tto.city_start = tfrom.city_end and
        tfrom.date_start >= tto.date_start + interval '1 day' and
        tfrom.date_end <= tto.date_start + interval '20 day';

To get the cheapest price, use window functions:

select tt.*
from (select tto.city_start, tto.city_end, tto.date_start, tfrom.date_end,
             (tto.price + tfrom.price) as price,
             row_number() over (partition by tto.city_start, tto.city_end order by (tto.price + tfrom.price) asc) as seqnum
      from t tto join
           t tfrom
           on tto.city_end = tfrom.city_start and
              tto.city_start = tfrom.city_end and
              tfrom.date_start >= tto.date_start + interval '1 day' and
              tfrom.date_end <= tto.date_start + interval '20 day'
      ) tt
where seqnum = 1;
Sign up to request clarification or add additional context in comments.

1 Comment

Great, that seems to work in general. Is there an alternative to the row_number partition part? (because the windowAggr functions performance is quite poor)?
1

Here is a solution without the row_number partition part:

SELECT
    a.city_start, a.city_end, b.city_end, a.date_start, b.date_start,
    min(a.price + b.price)
FROM
    flight AS a
    JOIN
    flight AS b ON a.city_start = b.city_end AND a.city_end = b.city_start
WHERE b.date_start BETWEEN a.date_start + 1 AND a.date_start + 20
GROUP BY a.city_start, a.city_end, b.city_end, a.date_start, b.date_start;

13 Comments

That's also great thanks. One question: what if I want to select a property outside the group by? Eg a.carName?
I'm not sure because I don't see carName in the data. I would need to see that in the context of the rest of the data to be able to answer the question...
If course the question shows just an example. Just assume each row has an additional column carName. The question is: if not within group by, how can I then select this value
Choose the carName for the outgoing or the return trip?
Preferably for both.
|
0

If you want to include additional columns, try the following:

SELECT
    a.city_start, a.city_end, b.city_end, a.date_start, b.date_start,
    a.price + b.price, a.car_name, b.car_name
FROM
    flight AS a
    JOIN
    flight AS b ON a.city_start = b.city_end AND a.city_end = b.city_start
    LEFT JOIN
    flight AS c ON
         a.city_start = c.city_start
         AND
         a.city_end = c.city_end
         AND
         a.date_start = c.date_start
         AND (
             a.price > c.price
             OR (
                 a.price = c.price
                 AND
                 a.id > c.id))
    LEFT JOIN
    flight AS d ON
         b.city_start = d.city_start
         AND
         b.city_end = d.city_end
         AND
         b.date_start = d.date_start
         AND (
             b.price > d.price
             OR (
                 b.price = d.price
                 AND
                 b.id > d.id))
WHERE
    b.date_start BETWEEN a.date_start + 1 AND a.date_start + 20
    AND
    c.id IS NULL
    AND
    d.id IS NULL;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.