
First of all, I'm new to DB programming, and the following seems a little strange to me.

I have the following very large table (details):

id       name         name_details
PK     varchar(32)    varchar(32)
--------------------------------
 1      'core'         'xAj6l3Fg5d'
 2      'core'         '8lEfs01nkf'
 3      'smt'          'oij3Gll4d6'
...................................

I need to write the following query:

SELECT name_details
FROM details
WHERE name = 'core' OR name = 'smt'

I've noticed that if I open two separate windows in PGAdmin and execute these two queries:

SELECT name_details
FROM details
WHERE name = 'core'

SELECT name_details
FROM details
WHERE name = 'smt' 

The execution time is almost the same as if I had executed only one of the queries. So I presume that every SQL connection is handled in its own thread. I have a 16-core system.

Question: is it generally useful to split the whole query into smaller parts (16) and execute each part in a different thread? Is this generally useful for queries that handle large amounts of data?

In particular, I'd use ThreadPoolExecutor(Runtime.getRuntime().availableProcessors() /* cores */, /* other params */) for handling that.
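Roughly, I imagine something like the following sketch (the connection URL, the credentials, and the idea of splitting on the id primary key are my own assumptions, just to illustrate what I mean):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDetailsQuery {

    // Placeholder connection details -- not from the real setup.
    private static final String URL  = "jdbc:postgresql://localhost:5432/mydb";
    private static final String USER = "user";
    private static final String PASS = "password";

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        long maxId = 100_000_000_000L;   // assumed rough upper bound on id
        long chunk = maxId / cores + 1;

        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Long>> results = new ArrayList<>();

        // One task (and one connection) per id range.
        for (int i = 0; i < cores; i++) {
            long lo = i * chunk;
            results.add(pool.submit(countRange(lo, lo + chunk)));
        }

        long total = 0;
        for (Future<Long> f : results) {
            total += f.get();            // wait for every chunk
        }
        pool.shutdown();
        System.out.println("matching rows: " + total);
    }

    // Each task scans its own id range; here it only counts rows,
    // a real version would process name_details as it streams in.
    private static Callable<Long> countRange(long lo, long hi) {
        return () -> {
            String sql = "SELECT name_details FROM details "
                       + "WHERE (name = 'core' OR name = 'smt') "
                       + "AND id >= ? AND id < ?";
            try (Connection conn = DriverManager.getConnection(URL, USER, PASS);
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, lo);
                ps.setLong(2, hi);
                long count = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        count++;
                    }
                }
                return count;
            }
        };
    }
}

Each task would open its own connection, so on the server side this would look like 16 concurrent sessions, each scanning a disjoint id range.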

2 Comments
  • We really need more info to provide any sort of sensible response to this. In particular, is the table split 50/50 between "core" and "smt" records, or are there other records in there? What is the table structure, in particular any indexes or horizontal partitioning in place? How big is "very large" compared to available system memory? Commented Jul 27, 2015 at 16:26
  • @Gary No, I just need all rows with 'core' and 'smt'. I don't know anything about their counts. Very large means ~10e10 rows; I can't even load the table into memory entirely. Commented Jul 27, 2015 at 16:27

2 Answers


One possible reason why all the queries take approximately the same time could be simple table scanning.

In that particular situation, assuming no indexes, Postgres will just read the whole table, caching blocks as it reads them.

In your single-query scenario, it reads each record, caches it for a bit, and then discards it.

In the two-query scenario, each query will be doing a table scan, but whenever the second query wants to access a particular block, it will probably find that the first query has already loaded it into the cache, so no disk access is required. In terms of disk access, the second query is effectively free.

This would obviously be completely different if the table were indexed on name, as each query would then access only the parts of the table it is interested in.
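If you want to see the difference on your own data, here is a rough sketch over JDBC (the connection details and index name are placeholders; running the same two statements in PgAdmin works just as well):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExplainDetails {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- adjust to your environment.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Statement st = conn.createStatement()) {

            // An index on name lets Postgres touch only the matching rows.
            // (Building it on a table this size will itself take a long time.)
            st.execute("CREATE INDEX details_name_idx ON details (name)");

            // EXPLAIN shows whether the planner now prefers the index
            // (e.g. a Bitmap Index Scan) over a Seq Scan.
            try (ResultSet rs = st.executeQuery(
                     "EXPLAIN SELECT name_details FROM details "
                   + "WHERE name = 'core' OR name = 'smt'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}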


1 Comment

Actually, Postgres can share a single table scan between two different connections. So if those two queries run "at the same time", there may be only a single scan of the table in progress (one query piggy-backs on the scan of the other).

In general, you should not split the query. The DBMS takes care of efficient execution, and you should trust the system. "Optimizations" like the one you suggest should only be done in rare circumstances (and only if there is a performance issue in the first place). Furthermore, such "optimizations" might backfire, i.e., result in bad performance, if you do not know very well what you are doing (only expert users should attempt this, and with care).

@Gary has already explained some of the details of why it performs the way you observe.

