For our development of a flight retail engine we store orders as JSON documents in a PostgreSQL database.

The order table is defined as:

CREATE TABLE IF NOT EXISTS orders (
  id          SERIAL PRIMARY KEY,
  order_data  JSONB NOT NULL
);

A simplified version of a typical order document looks like this:

{  
   "orderID":"ORD000001",
   "invalid":false,
   "creationDate":"2017-11-19T15:49:53.897",
   "orderItems":[  
      {  
         "orderItemID":"ITEM000001",
         "flight":{  
            "id":"FL000001",
            "segments":[  
               {  
                  "origin":"FRA",
                  "destination":"LHR",
                  "departure":"2018-05-12T14:00:00",
                  "arrival":"2018-05-12T14:40:00",
                  "marketingCarrier":"LH",
                  "marketingFlightNumber":"LH908"
               }
            ]
         },
         "passenger":{  
            "lastName":"Test",
            "firstName":"Thomas",
            "passengerTypeCode":"ADT"
         }
      },
      {  
         "orderItemID":"ITEM000002",
         "flight":{  
            "id":"FL000002",
            "segments":[  
               {  
                  "origin":"LHR",
                  "destination":"FRA",
                  "departure":"2018-05-17T11:30:00",
                  "arrival":"2018-05-17T14:05:00",
                  "marketingCarrier":"LH",
                  "marketingFlightNumber":"LH905"
               }
            ]
         },
         "passenger":{  
            "lastName":"Test",
            "firstName":"Thomas",
            "passengerTypeCode":"ADT"
         }
      }
   ]
}

The number of rows in this table can grow rather large (potentially more than 100 million).

Creating a GIN index on "orderID" works fine and, as expected, significantly speeds up queries for orders with a specific ID.
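The index definition itself is not shown here. As a minimal sketch, assuming a GIN index over the whole document with the jsonb_path_ops operator class (which supports the @> containment operator), it might look like the following. The index name ix_orders_data is hypothetical.

```sql
-- Hypothetical index name; jsonb_path_ops supports the @> containment operator.
CREATE INDEX IF NOT EXISTS ix_orders_data
  ON orders USING gin (order_data jsonb_path_ops);

-- A lookup written as a containment test, which this kind of index can serve.
SELECT id, order_data
FROM orders
WHERE order_data @> '{"orderID": "ORD000001"}';
```

If the only lookup is equality on a single key, a B-tree expression index on (order_data->>'orderID') is usually a smaller alternative to a GIN index over the entire document.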

But we also require fast execution times for much more complex requests, such as searching for orders that contain a specific flight segment.

Thanks to this thread I was able to write a request like

SELECT *
FROM orders,
  jsonb_array_elements(order_data->'orderItems') orderItems,
  jsonb_array_elements(orderItems->'flight'->'segments') segments
WHERE order_data->>'invalid'='false'
  AND segments->>'origin'='LHR'
  AND (
    (segments->>'marketingCarrier'='LH' AND segments->>'marketingFlightNumber'='LH905')
    OR
    (segments->>'operatingCarrier'='LH' AND segments->>'operatingFlightNumber'='LH905')
  )
  AND segments->>'departure' BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00';

This works fine, but is too slow for our requirements.

What is the best way to speed up such a query?

Creating a materialized view like

CREATE MATERIALIZED VIEW order_segments AS
SELECT id,
  order_data->>'orderID' AS orderID,
  segments->>'origin' AS origin,
  segments->>'marketingCarrier' AS marketingCarrier,
  segments->>'marketingFlightNumber' AS marketingFlightNumber,
  segments->>'operatingCarrier' AS operatingCarrier,
  segments->>'operatingFlightNumber' AS operatingFlightNumber,
  segments->>'departure' AS departure
FROM orders,
  jsonb_array_elements(order_data -> 'orderItems') orderItems,
  jsonb_array_elements(orderItems -> 'flight'->'segments') segments
WHERE order_data->>'invalid'='false';

works, but has the disadvantage of not being updated automatically.
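For illustration, a sketch of what keeping the view usable would involve: an index on the view for the segment lookup, plus an explicit refresh whenever the underlying orders change. The index name ix_order_segments_flight and the choice of indexed columns are assumptions, not part of the original setup.

```sql
-- Hypothetical index on the view so segment lookups avoid a full scan.
-- Unquoted column aliases in the view definition are folded to lower case.
CREATE INDEX IF NOT EXISTS ix_order_segments_flight
  ON order_segments (marketingflightnumber, departure);

-- Re-run the view's defining query and replace its contents.
-- This has to be triggered explicitly, for example from a scheduled job.
REFRESH MATERIALIZED VIEW order_segments;
```

REFRESH MATERIALIZED VIEW CONCURRENTLY avoids blocking readers during the refresh, but it requires a unique index on the view, and the view as defined above has no obvious unique key.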

So, how would I define indices on the orders table to achieve fast execution times? Or is there an entirely different solution?

  • I don't know what your queries look like in general, but have you considered indexing other columns as well? Like marketingCarrier and/or origin? Commented Nov 20, 2017 at 17:19
  • Thanks for your suggestion! Yes, I have considered this, but unfortunately I never found out how to set proper indices on nested arrays within arrays. Commented Nov 20, 2017 at 22:25

1 Answer

Finally found an answer to my own question:

Creating the index

CREATE INDEX ix_order_items ON orders USING gin (((order_data->'orderItems')) jsonb_path_ops);

and running the query

SELECT DISTINCT id, order_data
FROM orders,
  jsonb_array_elements(order_data -> 'orderItems') orderItems,
  jsonb_array_elements(orderItems -> 'flight'->'segments') segments
WHERE id IN
( SELECT id
  FROM orders
  WHERE order_data->'orderItems'@>'[{"flight": {"segments": [{"origin":"LHR"}]}}]'
    AND (
      order_data->'orderItems'@>'[{"flight": {"segments": [{"marketingCarrier":"LH","marketingFlightNumber":"LH905"}]}}]'
      OR
      order_data->'orderItems'@>'[{"flight": {"segments": [{"operatingCarrier":"LH","operatingFlightNumber":"LH905"}]}}]'
    )
)
AND order_data@>'{"invalid": false}'
AND segments->>'departure' BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00'

speeds up the request from several seconds to a few milliseconds.
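One thing to be aware of in the query above: the departure filter is applied to any segment of a matching order, not necessarily the segment that matched the origin and flight number. As a hedged sketch of an alternative (not the form used above), the containment tests can act as an index-assisted prefilter while an EXISTS subquery rechecks all conditions against a single segment. This also makes DISTINCT unnecessary, because each order is returned at most once. The aliases o, item, and seg are introduced here for illustration.

```sql
SELECT o.id, o.order_data
FROM orders o
WHERE o.order_data @> '{"invalid": false}'
  -- Prefilter on the indexed expression; these tests can use ix_order_items.
  AND o.order_data->'orderItems' @> '[{"flight": {"segments": [{"origin": "LHR"}]}}]'
  AND (
    o.order_data->'orderItems' @> '[{"flight": {"segments": [{"marketingCarrier": "LH", "marketingFlightNumber": "LH905"}]}}]'
    OR
    o.order_data->'orderItems' @> '[{"flight": {"segments": [{"operatingCarrier": "LH", "operatingFlightNumber": "LH905"}]}}]'
  )
  -- Exact recheck: every condition must hold for the same segment.
  AND EXISTS (
    SELECT 1
    FROM jsonb_array_elements(o.order_data->'orderItems') AS item,
         jsonb_array_elements(item->'flight'->'segments') AS seg
    WHERE seg->>'origin' = 'LHR'
      AND (
        (seg->>'marketingCarrier' = 'LH' AND seg->>'marketingFlightNumber' = 'LH905')
        OR
        (seg->>'operatingCarrier' = 'LH' AND seg->>'operatingFlightNumber' = 'LH905')
      )
      AND seg->>'departure' BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00'
  );
```

Prefixing either query with EXPLAIN (ANALYZE) shows whether the planner actually uses ix_order_items for the containment tests on a given data set.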
