In MySQL (and PostgreSQL), what exactly constitutes a DB instance and a DB partition?

For example, do different DB partitions need to necessarily live on different database instances? Or can a single DB instance manage multiple partitions? If the latter, what's the point of calling it a "partition"? Would the DB have any knowledge of it in this case?

Here's a quote from a document describing a system design from an online course:

How can we plan for the future growth of our system?

We can have a large number of logical partitions to accommodate future data growth, such that in the beginning, multiple logical partitions reside on a single physical database server. Since each database server can have multiple database instances on it, we can have separate databases for each logical partition on any server. So whenever we feel that a particular database server has a lot of data, we can migrate some logical partitions from it to another server. We can maintain a config file (or a separate database) that can map our logical partitions to database servers; this will enable us to move partitions around easily. Whenever we want to move a partition, we only have to update the config file to announce the change.
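
A minimal sketch of the quoted scheme, assuming integer user ids mapped by modulo into a fixed number of logical partitions. `PartitionMap`, the server names, and the count of 64 are hypothetical and only illustrate the idea; the quote itself does not specify any of them. The dictionary plays the role of the "config file" that maps partitions to servers.

```python
# Hypothetical sketch of the quoted design; names are illustrative only.
NUM_LOGICAL_PARTITIONS = 64  # assumption: fixed up front, far more than the initial server count


class PartitionMap:
    """In-memory stand-in for the 'config file' mapping logical partitions to servers."""

    def __init__(self, servers):
        # Initially several logical partitions share each physical server (round-robin).
        self.mapping = {p: servers[p % len(servers)] for p in range(NUM_LOGICAL_PARTITIONS)}

    @staticmethod
    def partition_for(user_id: int) -> int:
        # A key's logical partition never changes; only its server assignment can.
        return user_id % NUM_LOGICAL_PARTITIONS

    def server_for(self, user_id: int) -> str:
        return self.mapping[self.partition_for(user_id)]

    def partitions_on(self, server: str) -> list:
        return sorted(p for p, s in self.mapping.items() if s == server)

    def move_partition(self, partition: int, new_server: str) -> None:
        # Call this only after the partition's database has been copied to new_server.
        self.mapping[partition] = new_server


pm = PartitionMap(["db1", "db2"])
pm.move_partition(5, "db3")
print(pm.server_for(69))  # 69 % 64 == 5, so this user now lives on db3
```

Moving partition 5 reroutes every user id with `user_id % 64 == 5` and no other key, which is why the number of logical partitions is fixed up front instead of being tied to the number of servers.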

  • Be specific - which database system? I think that text is talking about sharding, not partitions. Commented Jul 13, 2020 at 16:16
  • @LaurenzAlbe I am primarily interested in MySQL, but left it open to PostgreSQL since it sounds like these concepts may be relatively universal Commented Jul 13, 2020 at 16:20
  • Can you provide a link to the quoted material? I am guessing that that is not from the official docs of either mysql or postgresql. Commented Jul 13, 2020 at 17:10
  • Looks like it is from quizlet.com/444358211/designing-instagram-flash-cards Commented Jul 13, 2020 at 18:57
  • Sharding involves multiple instances. In MySQL, Partitioning should be limited to a single instance. See my answers. Commented Jul 13, 2020 at 19:04

2 Answers

These terms are confusing, misused, and inconsistently defined.

For MySQL:

A Database has multiple definitions:

  • A "schema" (as used by other vendors/standards). This is a collection of tables. There are one or more "databases" in an instance.
  • The instance. You should use "server" or "database server" to be clearer.
  • The data. "Dataset" might be a better term.

An instance refers to a copy of mysqld running on some machine somewhere.

  • You can have multiple instances on a single piece of hardware. (Rare)
  • You can have multiple instances on a single piece of hardware, with the instances in different VMs or Docker containers. (Handy for testing)
  • Usually "instance" refers to one server with one copy of MySQL on it. (Typical for larger-scale situations)

A PARTITION is a specific way to lay out a table (in a database).

  • It is seen in CREATE TABLE (...) PARTITION BY ....
  • It is a "horizontal" split of the data, often by date, but could be by some other 'column'.
  • It has no direct impact on performance, which makes it rarely useful.
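
To make the "horizontal split by date" concrete, here is a small Python model of how a RANGE-partitioned table assigns rows to partitions and how a date-filtered query can skip partitions (partition pruning). It is a conceptual sketch under stated assumptions, not MySQL's implementation; the table layout in the comment (a hypothetical `created` column partitioned by year) is made up for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical layout, modelled on a MySQL table declared roughly as:
#   PARTITION BY RANGE (YEAR(created)) (
#     PARTITION p2019 VALUES LESS THAN (2020),
#     PARTITION p2020 VALUES LESS THAN (2021),
#     PARTITION p2021 VALUES LESS THAN (2022),
#     PARTITION pmax  VALUES LESS THAN MAXVALUE)
UPPER_BOUNDS = [2020, 2021, 2022]  # exclusive "LESS THAN" bounds
NAMES = ["p2019", "p2020", "p2021", "pmax"]


def partition_for(created: date) -> str:
    """Partition a row lands in: the first one whose bound exceeds the row's year."""
    return NAMES[bisect_right(UPPER_BOUNDS, created.year)]


def partitions_scanned(start: date, end: date) -> list:
    """Partitions a query with WHERE created BETWEEN start AND end must read."""
    first = bisect_right(UPPER_BOUNDS, start.year)
    last = bisect_right(UPPER_BOUNDS, end.year)
    return NAMES[first:last + 1]


print(partition_for(date(2020, 1, 1)))                             # p2020
print(partitions_scanned(date(2020, 3, 1), date(2020, 9, 30)))     # ['p2020']
```

Pruning only helps when the WHERE clause constrains the partitioning column; a query filtering on any other column still has to read every partition, which is consistent with partitioning rarely being a performance win on its own.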

Sharding is not implemented in MySQL, but can be done on top of MySQL.

  • It is also a "horizontal" split of the data, but in this case across multiple "instances".
  • The use case is, for example, social media where there are millions of "users" that are mostly handled by themselves. That is, most of the queries focus on a single slice of the data, hence it is practical to put a bunch of users on one server and do all those queries there.
  • It can be called "horizontal partitioning" but should not be confused with PARTITIONs of a table.
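
Since MySQL itself does not shard, the routing lives in the application. A minimal sketch, assuming users are assigned to instances by a stable hash of `user_id`; the shard names are hypothetical. Every query about one user goes to exactly one instance, which is what makes the "single slice" workload practical.

```python
import hashlib

SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2"]  # hypothetical instance names


def shard_for(user_id: int) -> str:
    # Use a stable hash so every application server routes a user identically
    # (Python's built-in hash() is randomized per process for strings).
    digest = hashlib.sha1(str(user_id).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def route(user_ids):
    """Group user ids by the instance that owns them (one query batch per shard)."""
    plan = {}
    for uid in user_ids:
        plan.setdefault(shard_for(uid), []).append(uid)
    return plan


print(route([7]))  # a single user's queries touch exactly one shard
```

Taking the hash modulo the shard count is the simplest choice but remaps most users when a shard is added, which is one reason designs like the one quoted in the question keep a fixed set of logical partitions and move those between servers instead.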

Vertical partitioning is where some columns are pulled out of a table and put into a parallel table.

  • Both tables would (normally) have the same PRIMARY KEY, thereby facilitating JOINs.
  • Vertical partitioning would (normally) be done only in a single "instance".
  • The purposes include splitting off big TEXT/BLOB columns, and splitting off optional columns (using LEFT JOIN to get NULLs where the optional row is absent).
  • Vertical partitioning was somewhat useful in MyISAM, but rarely useful in InnoDB, since that engine automatically does something similar by storing large TEXT/BLOB values off-page.
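
A small runnable illustration of the bullets above, using Python's built-in `sqlite3` as a stand-in for MySQL (the SQL here is portable enough for the idea). The `users`/`user_bios` tables are hypothetical: the big, optional column lives in a parallel table sharing the same PRIMARY KEY, and a LEFT JOIN brings it back with NULL (`None` in Python) for users who have no row there.

```python
import sqlite3

# SQLite stands in for MySQL; the table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Main table keeps the small, frequently read columns.
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    -- Parallel table holds the big/optional column, keyed by the same PRIMARY KEY.
    CREATE TABLE user_bios (
        user_id INTEGER PRIMARY KEY REFERENCES users(user_id),
        bio     TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO user_bios VALUES (1, 'Writes about databases.');
""")

# LEFT JOIN on the shared key; users without a bio row get NULL (None).
rows = conn.execute("""
    SELECT u.user_id, u.name, b.bio
    FROM users AS u
    LEFT JOIN user_bios AS b ON b.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()
print(rows)  # [(1, 'alice', 'Writes about databases.'), (2, 'bob', None)]
```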

Replication and Clustering

  • Multiple instances contain the same data.
  • Used for "High Availability" (HA).
  • Used for scaling out reads.
  • Orthogonal to partitioning or sharding.
  • Does not make sense to have the instances on the same server (except for testing/experimenting/staging/etc).
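
Scaling out reads usually means the application sends writes to one instance and spreads reads over the copies. A minimal sketch, assuming a single primary with asynchronous replicas; the class and host names are hypothetical, and the SELECT check is deliberately naive (for example, it would misroute `SELECT ... FOR UPDATE`).

```python
from itertools import cycle


class ReplicatedCluster:
    """Hypothetical router: one primary takes writes, replicas share reads."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        self._replicas = cycle(replicas)  # round-robin over read replicas

    def target_for(self, sql: str) -> str:
        # Naive classification: anything that is not a SELECT goes to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary


cluster = ReplicatedCluster("db-primary", ["db-replica-1", "db-replica-2"])
print(cluster.target_for("UPDATE t SET x = 1"))  # db-primary
print(cluster.target_for("SELECT * FROM t"))     # db-replica-1
```

Because replicas can lag behind the primary, a read issued right after a write may not see it yet; applications that need read-your-writes typically send those reads to the primary.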

Partitions, in terms of MySQL and PostgreSQL feature set, are physical segmentations of data. They exist within a single database instance, and are used to reduce the scope of data you're interacting with at a particular time, to cope with high data volume situations.

The document you're quoting from is speaking of a more abstract concept of a data partition at the system design level.

2 Comments

Thanks @chaos so it's correct to say that (at least in MySQL and PostgreSQL), a server can contain a number of instances, and an instance can contain a number of partitions (for a given set of tables), but we can't have instances share partitions, or servers share instances, right?
Correct, as far as the MySQL/PostgreSQL kind of partitioning goes. See also en.wikipedia.org/wiki/Shard_(database_architecture)