We are developing a flight retail engine and store its orders as JSON documents in a PostgreSQL database.
The order table is defined as:
CREATE TABLE IF NOT EXISTS orders (
    id SERIAL PRIMARY KEY,
    order_data JSONB NOT NULL
);
A simplified version of a typical order document looks like this:
{
"orderID":"ORD000001",
"invalid":false,
"creationDate":"2017-11-19T15:49:53.897",
"orderItems":[
{
"orderItemID":"ITEM000001",
"flight":{
"id":"FL000001",
"segments":[
{
"origin":"FRA",
"destination":"LHR",
"departure":"2018-05-12T14:00:00",
"arrival":"2018-05-12T14:40:00",
"marketingCarrier":"LH",
"marketingFlightNumber":"LH908"
}
]
},
"passenger":{
"lastName":"Test",
"firstName":"Thomas",
"passengerTypeCode":"ADT"
}
},
{
"orderItemID":"ITEM000002",
"flight":{
"id":"FL000002",
"segments":[
{
"origin":"LHR",
"destination":"FRA",
"departure":"2018-05-17T11:30:00",
"arrival":"2018-05-17T14:05:00",
"marketingCarrier":"LH",
"marketingFlightNumber":"LH905"
}
]
},
"passenger":{
"lastName":"Test",
"firstName":"Thomas",
"passengerTypeCode":"ADT"
}
}
]
}
The number of rows in this table can grow rather large (to more than 100 million).
Creating a GIN index on "orderID" works fine and, as expected, significantly speeds up queries for orders with a specific ID.
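The exact index definition is not reproduced here. As a rough sketch of one way such a lookup could be set up, assuming a GIN index over the whole document with the jsonb_path_ops operator class and a containment query (the index name orders_order_data_gin is made up for illustration):

```sql
-- Assumption: one GIN index over the entire JSONB column.
-- jsonb_path_ops supports the @> containment operator and tends to be
-- smaller than the default jsonb_ops operator class.
CREATE INDEX IF NOT EXISTS orders_order_data_gin
    ON orders USING gin (order_data jsonb_path_ops);

-- Look up a single order by its top-level "orderID" key.
SELECT id, order_data
FROM orders
WHERE order_data @> '{"orderID": "ORD000001"}';
```

A B-tree expression index on (order_data->>'orderID') would be another option for this particular lookup; which of the two fits better depends on the other queries that need to be supported.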
But we also need fast execution times for much more complex queries, such as searching for orders that contain a specific flight segment.
Thanks to this thread I was able to write a query like:
SELECT *
FROM orders,
     jsonb_array_elements(order_data->'orderItems') orderItems,
     jsonb_array_elements(orderItems->'flight'->'segments') segments
WHERE order_data->>'invalid' = 'false'
  AND segments->>'origin' = 'LHR'
  AND (
        (segments->>'marketingCarrier' = 'LH' AND segments->>'marketingFlightNumber' = 'LH905')
     OR (segments->>'operatingCarrier' = 'LH' AND segments->>'operatingFlightNumber' = 'LH905')
      )
  AND segments->>'departure' BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00';
This works fine, but is too slow for our requirements.
What is the best way to speed up such a query?
Creating a materialized view like
CREATE MATERIALIZED VIEW order_segments AS
SELECT id, order_data->>'orderID' AS orderID,
       segments->>'origin' AS origin,
       segments->>'marketingCarrier' AS marketingCarrier,
       segments->>'marketingFlightNumber' AS marketingFlightNumber,
       segments->>'operatingCarrier' AS operatingCarrier,
       segments->>'operatingFlightNumber' AS operatingFlightNumber,
       segments->>'departure' AS departure
FROM orders,
     jsonb_array_elements(order_data->'orderItems') orderItems,
     jsonb_array_elements(orderItems->'flight'->'segments') segments
WHERE order_data->>'invalid' = 'false';
works, but has the disadvantage of not being updated automatically.
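To make the drawback concrete, here is a minimal sketch of how the view would have to be maintained and queried by hand. The index name order_segments_flight_idx and the choice of indexed columns are assumptions for illustration, not something we have benchmarked:

```sql
-- Assumption: an ordinary B-tree index on the flattened columns.
-- Unquoted identifiers are folded to lower case, so marketingFlightNumber
-- refers to the view column marketingflightnumber.
CREATE INDEX IF NOT EXISTS order_segments_flight_idx
    ON order_segments (origin, marketingFlightNumber, departure);

-- The view only changes when it is refreshed explicitly.
-- A plain refresh recomputes the whole view and blocks reads while it runs;
-- REFRESH ... CONCURRENTLY avoids that but requires a unique index on the
-- view, which this definition does not obviously provide.
REFRESH MATERIALIZED VIEW order_segments;

-- Segment search against the flattened view (marketing carrier only).
SELECT id, orderID
FROM order_segments
WHERE origin = 'LHR'
  AND marketingCarrier = 'LH'
  AND marketingFlightNumber = 'LH905'
  AND departure BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00';
```

Any order inserted or changed after the last refresh is invisible to this query until the next refresh, which is the problem for a table of this size.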
So, how would I define indices on the orders table to achieve fast execution times? Or is there an entirely different solution?