
The following query takes around 300-400 ms on PostgreSQL 9.1. The table contains about 2 million rows. Is this performance reasonable, and can it be improved?

SELECT "Products"."Id"
      , "Products"."Title"
      , "Products"."ThumbHeight"
      , "Products"."LargeImageWidth"
      , "Products"."LargeImageHeight"
      , "Products"."Url"
      , "Products"."BrowseNodeId"
FROM "Products"
WHERE  "Products"."Id" = ANY(ARRAY(SELECT (random()*2233071)::int
                FROM generate_series(1, 100)));

And here is the explain plan:

--------------------------------------------------------------------------------
 Bitmap Heap Scan on "Products"  (cost=60.48..100.46 rows=10 width=268)
   Recheck Cond: ("Id" = ANY ($0))
   InitPlan 1 (returns $0)
     ->  Function Scan on generate_series  (cost=0.00..17.50 rows=1000 width=0)
   ->  Bitmap Index Scan on "Products_pkey"  (cost=0.00..42.97 rows=10 width=0)
         Index Cond: ("Id" = ANY ($0))

Explain analyze:

Bitmap Heap Scan on "Products"  (cost=60.48..100.46 rows=10 width=268) (actual time=77.702..80.944 rows=100 loops=1)
   Recheck Cond: ("Id" = ANY ($0))
   InitPlan 1 (returns $0)
     ->  Function Scan on generate_series  (cost=0.00..17.50 rows=1000 width=0) (actual time=0.097..0.348 rows=100 loops=1)
   ->  Bitmap Index Scan on "Products_pkey"  (cost=0.00..42.97 rows=10 width=0) (actual time=77.601..77.601 rows=104 loops=1)
         Index Cond: ("Id" = ANY ($0))
 Total runtime: 81.409 ms

Id is the primary key: "Products_pkey" PRIMARY KEY, btree ("Id")

Thank You!

2 Answers


Try this in comparison to your query:

SELECT "Products"."Id"
      , "Products"."Title"
      , "Products"."ThumbHeight"
      , "Products"."LargeImageWidth"
      , "Products"."LargeImageHeight"
      , "Products"."Url"
      , "Products"."BrowseNodeId"
FROM "Products"
ORDER BY random()
LIMIT 100

1 Comment

This is the worst solution when you are looking for performance, especially for large tables.

Here is a solution which works well for my use case (select 100 random products for a web page):

  1. Duplicate the table
  2. Shuffle table rows
  3. Add auto increment column
  4. Select from a random range (e.g. 100-200, 1567000-1567100)

Query time dropped to under 2 ms.

Here is the set of commands I've used:

create table RandProducts as select * from "Products" order by random();
alter table RandProducts add column RandId serial8;
create index on RandProducts(randid);

And then to get 100 random rows I just do something like this:

select * from RandProducts where RandId between 8001 and 8100;
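The random range itself has to be picked somewhere, typically in the application that renders the page. Below is a minimal Python sketch of that step. It assumes that RandId runs from 1 to the table's maximum without gaps, which should hold right after the table is built as above. The helper name `random_window` and the `%s` placeholder style are illustrative assumptions, not part of the original answer.

```python
import random

def random_window(max_rand_id, size=100, rng=random):
    """Pick inclusive (low, high) bounds spanning exactly `size` consecutive RandId values."""
    if size < 1 or max_rand_id < size:
        raise ValueError("the table must hold at least `size` rows")
    # randint is inclusive on both ends, so the last valid start is max_rand_id - size + 1.
    low = rng.randint(1, max_rand_id - size + 1)
    return low, low + size - 1

# Hypothetical usage: max_rand_id would come from SELECT max(RandId) FROM RandProducts.
low, high = random_window(2000000)
sql = "SELECT * FROM RandProducts WHERE RandId BETWEEN %s AND %s"
params = (low, high)  # pass these to the database driver together with sql
```

Because BETWEEN includes both ends, the window is low to low + 99 so that it returns 100 rows rather than 101. Rows that are neighbours in the shuffled table always come back together, so the table would need to be rebuilt from time to time if that matters.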
