I run this query in Postgres:

SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER () AS n
    FROM {table_name}
) t
WHERE n < 10000

I've noticed the result is different on each run.
To test whether the content differs in addition to the order, I run an avg on a column:

SELECT avg(person_id) FROM (SELECT *, ROW_NUMBER() OVER () AS n FROM {table_name}) t WHERE n < 10000

The result is interesting: the table with a primary key returns a consistent value, while the table without a primary key returns a different value on each run.
The execution plan for the table with a PK:
"Aggregate (cost=139391585.22..139391585.23 rows=1 width=32)"
" -> WindowAgg (cost=0.58..99288350.02 rows=3208258816 width=9090)"
" Run Condition: (row_number() OVER (?) < 10000)"
" -> Index Only Scan using mea_vit_pi_4221ef4deeadcabf_ix on {table_name} (cost=0.58..59185114.82 rows=3208258816 width=8)"
The execution plan for the table without a PK:
"Aggregate (cost=83580303.64..83580303.65 rows=1 width=32)"
" -> WindowAgg (cost=0.00..61837074.84 rows=1739458304 width=650)"
" Run Condition: (row_number() OVER (?) < 10000)"
" -> Seq Scan on {table_2} (cost=0.00..40093846.04 rows=1739458304 width=8)"
Why is the behavior different between the two tables? And if possible, what should I do to get a stable result?
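My current guess is that I need to give the window function an explicit ordering. A sketch of what I have in mind (assuming person_id, the column from my avg test, is unique enough to order by; if it isn't, ties would presumably still be broken arbitrarily):

```sql
-- Sketch: pin the row numbering to an explicit order so runs are repeatable.
-- Assumes person_id uniquely identifies rows; this is my assumption, not verified.
SELECT avg(person_id)
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY person_id) AS n
    FROM {table_name}
) t
WHERE n < 10000;
```

Is that the right approach, or is there a cheaper way that avoids sorting the whole table?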