1

I'm trying to insert test data to test postgresql partionning, using Postgresql 11.

Here is my table:

CREATE TABLE measurement (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
);

Here is my statement to insert test data:

explain insert into measurement(
  city_id,
  logdate,
  peaktemp,
  unitsales
)
select
  ( 
    select (random() * 10000)::int + (generator*0) as city_id 
  ),
  ( 
    select 
        * 
    from 
        (select * from generate_series('2006-01-01'::timestamp,
                                       '2006-12-31'::timestamp,
                                       '1 day')) as rng
    order by
        random()
    limit 1
  ) as logdate,
  ( 
    select (random() * 10000)::int + (generator*0) as peaktemp 
  ),
  ( 
    select (random() * 10000)::int + (generator*0) as unitsales 
  )
from generate_series(1, 1000) as generator

The problem is: when I run this statement, the select subquery which generates the field logdate seems to be cached and the table measurement contains a single value for the field logdate.

So, if I run the following query:

select distinct(logdate)
from measurement

I only get a single value: 2006-02-10

How can I prevent the caching side-effect?

1
  • 1
    Your query seems to be gratuitously complicated, but if you don't want to rearrange it, you could just order by random()+generator to force the subselect to re-evaluated each time. Commented Oct 25, 2019 at 23:19

2 Answers 2

3

As you've observed, the ORDER BY is calculated once for the sub-query, such that for each row that's generated in your outer query, the selected row is going to be the same. As such, you won't be able to use sub-selects and ORDER BY to achieve what you're aiming for, even though you call random().

Try this:

SELECT
  (random() * 10000)::int AS city_id,
  '2006-01-01'::timestamp + random() * ('2006-12-31'::timestamp - '2006-01-01'::timestamp) AS logdate,
  (random() * 10000)::int AS peaktemp,
  (random() * 10000)::int AS unitsales
FROM
generate_series(1,1000);

I'm not sure what your requirements are, but I don't think you need to have 1) a bunch of sub-selects or 2) all the references to generator*0, so I've removed them too.

Disclosure: I work for EnterpriseDB (EDB)

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for the tip. Concerning 1) and 2), I was following a generate_series guide.
1

Maybe something like this:

select
  (random() * 10000)::int as city_id,
  (
    '2006-01-01'::timestamptz
    +random()*(
      ('2006-01-01'::timestamptz+'1 year'::interval)
      -'2006-01-01'::timestamptz
    )
  ) as logdate,
  (random() * 10000)::int as peaktemp,
  (random() * 10000)::int as unitsales
from generate_series(1, 1000) as generator;

Although I'm confused whether you want a timestamptz (you definitely don't want timestamp, without timezone, as this type is evil) or a date in logdate.

1 Comment

Thanks for your answer, I use Timestamp just to test stuff, but I probably wont use it in production code.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.