postgresql fetch 100 random rows

Question

The following query takes around 300-400 ms on PostgreSQL 9.1. The table contains about 2 million rows. Is this performance reasonable, and can it be improved?

SELECT "Products"."Id"
      , "Products"."Title"
      , "Products"."ThumbHeight"
      , "Products"."LargeImageWidth"
      , "Products"."LargeImageHeight"
      , "Products"."Url"
      , "Products"."BrowseNodeId"
FROM "Products"
WHERE  "Products"."Id" = ANY(ARRAY(SELECT (random()*2233071)::int
                FROM generate_series(1, 100)));

And here is the explain plan:

--------------------------------------------------------------------------------
 Bitmap Heap Scan on "Products"  (cost=60.48..100.46 rows=10 width=268)
   Recheck Cond: ("Id" = ANY ($0))
   InitPlan 1 (returns $0)
     ->  Function Scan on generate_series  (cost=0.00..17.50 rows=1000 width=0)
   ->  Bitmap Index Scan on "Products_pkey"  (cost=0.00..42.97 rows=10 width=0)
         Index Cond: ("Id" = ANY ($0))

Explain analyze:

Bitmap Heap Scan on "Products"  (cost=60.48..100.46 rows=10 width=268) (actual time=77.702..80.944 rows=100 loops=1)
   Recheck Cond: ("Id" = ANY ($0))
   InitPlan 1 (returns $0)
     ->  Function Scan on generate_series  (cost=0.00..17.50 rows=1000 width=0) (actual time=0.097..0.348 rows=100 loops=1)
   ->  Bitmap Index Scan on "Products_pkey"  (cost=0.00..42.97 rows=10 width=0) (actual time=77.601..77.601 rows=104 loops=1)
         Index Cond: ("Id" = ANY ($0))
 Total runtime: 81.409 ms

Id is the primary key: "Products_pkey" PRIMARY KEY, btree ("Id")

Thank You!

Mithrandir · Accepted Answer · 2012-05-30 14:48:28Z

1

Try this in comparison to your query:

SELECT "Products"."Id"
      , "Products"."Title"
      , "Products"."ThumbHeight"
      , "Products"."LargeImageWidth"
      , "Products"."LargeImageHeight"
      , "Products"."Url"
      , "Products"."BrowseNodeId"
FROM "Products"
ORDER BY random()
LIMIT 100

answered May 30, 2012 at 14:48


1 Comment

michaelr524 Over a year ago

This is the worst solution when you are looking for performance, especially for large tables.
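One way to check this claim on your own data is to look at the plan. This is only a minimal sketch using the table from the question. On a large table the plan for ORDER BY random() typically shows a sequential scan over every row feeding a top-N sort, so the work grows with the table size no matter how small the LIMIT is.

```sql
-- Runs the query and reports the actual plan and timings.
-- Expect a Seq Scan on "Products" under a Sort node: every row gets a
-- random sort key before the first 100 are returned.
EXPLAIN ANALYZE
SELECT "Products"."Id"
FROM "Products"
ORDER BY random()
LIMIT 100;
```

Comparing the reported total runtime with that of the primary-key lookup in the question should show which approach is cheaper on a given table.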

michaelr524 · Accepted Answer · 2012-06-01 07:50:42Z

0

Here is a solution which works well for my use case (select 100 random products for a web page):

1. Duplicate the table.
2. Shuffle the rows of the copy.
3. Add an auto-increment column.
4. Select a random contiguous range (e.g. 101-200, 1567001-1567100).

Query time dropped to under 2 ms.

Here is the set of commands I've used:

create table RandProducts as select * from "Products" order by random();
alter table RandProducts add column RandId serial8;
create index on RandProducts(randid);

And then to get 100 random rows I just do something like this:

select * from RandProducts where RandId between 8001 and 8100;
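The range does not have to be hard-coded. The following is a sketch of one way to pick the window start at random inside the database. It assumes the RandProducts table and RandId column created above, and that RandId runs from 1 to max(RandId) without gaps. The CTE name "start" and the column alias "lo" are made up for illustration.

```sql
-- Choose a random window start so that the 100-row window
-- [lo, lo + 99] always fits inside 1 .. max(RandId).
WITH start AS (
    SELECT 1 + floor(random() * greatest(max(RandId) - 99, 1))::bigint AS lo
    FROM RandProducts
)
SELECT *
FROM RandProducts
-- Both subqueries read the same CTE row, so they see the same random value.
WHERE RandId BETWEEN (SELECT lo FROM start)
                 AND (SELECT lo FROM start) + 99;
```

Two caveats with this design: rows that are neighbours in RandProducts always come back together, so separate page loads are not independent samples, and the copy has to be rebuilt (or otherwise kept in sync) when "Products" changes.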

edited Jun 1, 2012 at 7:50

answered May 31, 2012 at 5:15
