1

I have written an sql for a table that have about 50.000.000 users. The query takes too much time than I expected that it did not finished about 23 hours.

UPDATE users
    SET building_id = B.id
    FROM (
      SELECT *
      FROM buildings B
    ) AS B
    WHERE B.city          = address_city
      AND B.town          = address_town
      AND B.neighbourhood = address_neighbourhood
      AND B.street        = address_street
      AND B.no            = address_building_no

The idea of this sql is to that removing the building/address info from users and instead referencing it to the buildings table.

EXPLAIN

Update on users  (cost=22226900.43..22548054.14 rows=15212 width=166) 
->  Merge Join  (cost=22226900.43..22548054.14 rows=15212 width=166)
         Merge Cond: (((users.address_city)::text = (b.city)::text) AND ((users.address_town)::text = (b.town)::text) AND ((users.address_neighbourhood)::text = (b.neighbourhood)::text) AND ((users.address_street)::text = (b.street)::text) AND ((users.address_building_no)::text = (b.no)::text))
         ->  Sort  (cost=21352886.76..21401078.96 rows=96384398 width=156)
               Sort Key: users.address_city, users.address_town, users.address_neighbourhood, users.address_street, users.address_building_no
               ->  Seq Scan on users  (cost=0.00..2559921.19 rows=96384398 width=156)
         ->  Materialize  (cost=874013.68..883606.86 rows=9593179 width=63)
               ->  Sort  (cost=874013.68..878810.27 rows=9593179 width=63)
                     Sort Key: b.city, b.town, b.neighbourhood, b.street, b.no
                     ->  Seq Scan on buildings b  (cost=0.00..136253.54 rows=9593179 width=63) (10 rows)

I don't know whether this sql uses the inner SELECT sql for each of the users or caches for transaction. Also, if it caches, does it use the indexes for the cached temp table?

I couldn't write the sql like this:

FROM (
  SELECT * 
  FROM buildings B
  WHERE B.city          = users.address_city
    AND B.town          = users.address_town
    AND B.neighbourhood = users.address_neighbourhood
    AND B.street        = users.address_street
    AND B.no            = users.address_building_no
  )

it says that users cannot be accessed from the inner select. Do you have any suggestion how to access buildings in the inner sql statement.

16
  • 1
    first i wouldn't use select * instead of select the fields you really use. I would ssuggest you to use Toad or another program that can help you optimizing queries. Also you can talk to your dba to check if indexes are correct. Commented Mar 16, 2015 at 9:16
  • 1
    Can you show us an output of EXPLAIN for your first query? Commented Mar 16, 2015 at 9:17
  • 1
    did you check if there are table or row locks active? And please drop the subselect, you don't need it. Commented Mar 16, 2015 at 9:17
  • 1
    I presume create table as select join on those two tables and then renaming it to "users" will be faster... and will produce microsecond lock only... Of course if a table is not editable at the moment and you have enough space in temp_tablespace Commented Mar 16, 2015 at 11:35
  • 1
    @exculuber sure - thanks! (we had horrible outage just after my comment :) Commented Mar 17, 2015 at 7:25

2 Answers 2

1

I presume

create table t as select column_list from a join b on column=column;
alter table t rename to users;

will be faster, and will produce microsecond lock only... Of course if a table is not editable at the moment and there is enough space in temp_tablespace

Sign up to request clarification or add additional context in comments.

Comments

0

Not sure but wouldn't this be (at least slightly, if not considerably) faster?

UPDATE users
SET building_id = B.id
FROM buildings B
WHERE B.city          = address_city
  AND B.town          = address_town
  AND B.neighbourhood = address_neighbourhood
  AND B.street        = address_street
  AND B.no            = address_building_no

If nothing else, it wouldn't require the Materialize stage as given in EXPLAIN above.

2 Comments

I did not know that we can just FROM buildnigs B until seeing it in comments. However, it doesn't change a bit of performance, db engine optimizes them same.
In that case its possible that you're looking at the wrong end for a solution. While at it, do both tables have appropriate indexes on the said 5 fields? If not, a Seq scan is inevitable and then any form of query re-writing is not going to make this UPDATE any faster.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.