PostgreSQL Queries With Table Partitions

Performance

By Mayuri Fakirpure

Updated on Jan 27, 2024

In this article, we'll explain how to increase performance of PostgreSQL queries with table partitions.

It is a powerful, open source object-relational database system. PostgreSQL runs on all major operating systems and comes with many features aimed to help developers build applications, administrators to protect data integrity and build fault-tolerant environments.

What is Table Partitions

Table partitioning in PostgreSQL is a function that allows you to break up large tables into smaller, more manageable pieces known as partitions. Each partition is largely a small sub-table with its own storage indexing scheme. This can enhance query performance and simplify facts management, specifically for tables with large amount of records.

Partitioning can be done based on one or more columns, such as a date column or a range of values. For instance, you may divide a table into partitions according to the record dates, with each partition containing data for a particular set of dates. PostgreSQL can instantly remove partitions from a query that are not related to it, which speeds up the query execution process while searching the data.

  • In some cases, query performance can be significantly increased, especially if most of the table's frequently used rows are included in one or a small number of partitions. By effectively replacing the higher tree levels of indexes, partitioning increases the possibility that the frequently used parts of the indexes will fit in memory.
  • Bulk loads and deletes can be accomplished by adding or removing partitions, if the usage pattern is accounted for in the partitioning design. Dropping an individual partition using DROP TABLE, or doing ALTER TABLE DETACH PARTITION, is far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.
  • Partition-based queries narrow a search to a specific partition. This approach allows PostgreSQL to cache frequently used parts of a partition into the system RAM. The cached data reduces disk I/O and improves access performance.

Create Database

Before we proceed further, let's create a new database to use and execute Partitioning queries. Execute following query to create database in PostgreSQL:

CREATE DATABASE ecommerce_db;

You can name database anything you want.

Connect to the newly created database by using following query:

\c ecommerce_db

That's it. Now let's check the types of partitioning

Types of Partitioning

Here are the types of partitioning that PostgreSQL supports.

Range Partitioning:

Divides the data based on a range of values. For example, you might partition a table by date ranges or by ranges of identifiers for particular business objects. For example, if one partition's range is from 1 to 10, and the next one's range is from 10 to 20, then value 10 belongs to the second partition not the first.

Example

Let's create orders table first with few columns. Then, instruct the PostgreSQL to partition the table using the category column by including the PARTITION BY RANGE (order_date).


CREATE TABLE orders (
    oid SERIAL,
    product_id INT,
    customer_id INT,
    order_date DATE,
    quantity INT,
    status TEXT,
    amount NUMERIC,
    CONSTRAINT order_pkey PRIMARY KEY (oid, order_date)
) PARTITION BY RANGE (order_date);

It is the test table for demonstration purpose and to understand the concept.

Next, create 4 partition tables with the value of "order_q1_2023", "order_q2_2023", "order_q3_2023", "order_q4_2023". So that we can retrieve data as per the quarter.

CREATE TABLE order_q1_2023 PARTITION OF orders FOR VALUES FROM ('2023-01-01') TO ('2023-03-31');

CREATE TABLE order_q2_2023 PARTITION OF orders FOR VALUES FROM ('2023-04-01') TO ('2023-06-30');

CREATE TABLE order_q3_2023 PARTITION OF orders FOR VALUES FROM ('2023-07-01') TO ('2023-09-30');

CREATE TABLE order_q4_2023 PARTITION OF orders FOR VALUES FROM ('2023-10-01') TO ('2023-12-31');

Now, insert data into the orders table. PostgreSQL will automatically route the data to the specific partition based on the *order_date*.

INSERT INTO orders (product_id, customer_id, order_date, quantity, status, amount) VALUES (3, 5, '2023-01-25', 1, 'paid', 500);

INSERT INTO orders (product_id, customer_id, order_date, quantity, status, amount) VALUES (10, 3, '2023-02-12', 2, 'paid', 450);

INSERT INTO orders (product_id, customer_id, order_date, quantity, status, amount) VALUES (7, 9, '2023-09-01', 4, 'paid', 320);

INSERT INTO orders (product_id, customer_id, order_date, quantity, status, amount) VALUES (6, 13, '2023-11-10', 1, 'paid', 211);

INSERT INTO orders (product_id, customer_id, order_date, quantity, status, amount) VALUES (8, 20, '2023-12-18', 1, 'paid', 99);

We have successfully inserted the data. Now lets verify the partition. Execute following SELECT query:

SELECT * FROM order_q1_2023;

Output should look similar like:

 oid | product_id | customer_id | order_date | quantity | status | amount 
-----+------------+-------------+------------+----------+--------+--------
   2 |          3 |           5 | 2023-01-25 |        1 | paid   |    500
   3 |         10 |           3 | 2023-02-12 |        2 | paid   |    450
(2 rows)


List Partitioning:

Divides the data based on a predefined list of values. Each partition contains rows with a specific value from the list. The table is partitioned by explicitly listing which key value(s) appear in each partition. PostgreSQL examines the data in each partition field before routing the data to a specific table. 

Example

First, we need to create a table with few columns. Then, instruct the PostgreSQL to partition the table using the category column by including the PARTITION BY LIST (product_category).

CREATE TABLE products (
    pid SERIAL NOT NULL,
    product_name TEXT NOT NULL,
    product_category TEXT NOT NULL,
    product_description TEXT,
    product_price NUMERIC,
    CONSTRAINT product_pkey PRIMARY KEY (pid, product_category)
) PARTITION BY LIST (product_category);


Next, create 2 partition tables with the value of "electronics_items," and "clothing_items". *Electronics_items* partition stores all electronics items while *clothing_items* partition stores all the clothing items. Execute following 2 queries to create partitions:


CREATE TABLE electronics_items PARTITION OF products FOR VALUES IN ('Electronics');
CREATE TABLE clothing_items PARTITION OF products FOR VALUES IN ('Clothing');


Now, insert data into the products table. PostgreSQL will automatically route the data to the specific partition based on the category.

INSERT INTO products (product_name, product_category, product_description, product_price) VALUES ('Xiaomi Pad 6', 'Electronics', 'It comes with Snapdragon®870', 330.00);

INSERT INTO products (product_name, product_category, product_description, product_price) VALUES ('Apple iPhone 15', 'Electronics', 'It has 12MP camera', 700.00);

INSERT INTO products (product_name, product_category, product_description, product_price) VALUES ('Striped Padded Jacket', 'Clothing', 'Charcoal striped jacket', 18.00);

INSERT INTO products (product_name, product_category, product_description, product_price) VALUES ('Stretchable Jeans', 'Clothing', 'Black dark wash', 25.00);

Once you execute all the above queries, you can retrieve data by partitioning and verify that the data is partitioned correctly.

SELECT * FROM electronics_items;

Output:

 pid |  product_name   | product_category |     product_description          | product_price 
-----+-----------------+------------------+------------------------------+---------------
   1 | Xiaomi Pad 6      | Electronics      | It comes with Snapdragon®870 |        330.00
   2 | Apple iPhone 15 | Electronics      | It has 12MP camera                    |        700.00
(2 rows)

We have successfully inserted data and retrieve it in partitioned. Here you can see that only *Electronics* category data retrieve. It is easiest way to retrieve specific type of list data.

Hash Partitioning:

Distributes rows across partitions based on a hash function. This method can be useful for evenly distributing data.  Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.  Hash partitioning is best used when each partition is on different table spaces residing on separate physical disks, so the IO is equally divided by more devices.

Create a new employees table with few columns. Then, instruct the PostgreSQL to partition the table using the category column by including the PARTITION BY HASH (eid).

CREATE TABLE employees ( 
    eid SERIAL, 
    first_name TEXT,
    last_name TEXT,
    department_id INT
) PARTITION BY HASH (eid);

Next, create 3 partition tables. Each partition covering a specific range of hash values.

CREATE TABLE employees_1 PARTITION OF employees FOR VALUES WITH (MODULUS 3, REMAINDER 0);

CREATE TABLE employees_2 PARTITION OF employees FOR VALUES WITH (MODULUS 3, REMAINDER 1);

CREATE TABLE employees_3 PARTITION OF employees FOR VALUES WITH (MODULUS 3, REMAINDER 2);

Now, insert data into the employees table. PostgreSQL will automatically route the data to the specific partition.

INSERT INTO employees (first_name, last_name, department_id) VALUES ('Racheal', 'Bing', 2);

INSERT INTO employees (first_name, last_name, department_id) VALUES ('MARY', 'Cooper', 1);

INSERT INTO employees (first_name, last_name, department_id) VALUES ('John', 'Smith', 3);

INSERT INTO employees (first_name, last_name, department_id) VALUES ('Jennifer', 'Mathew', 1);

INSERT INTO employees (first_name, last_name, department_id) VALUES ('Cory', 'Mathew', 3);

Now, retrieve the data using partition.

SELECT * FROM employees_1;

Output will similar like:

eid | first_name | last_name | department_id 
-----+------------+-----------+---------------
   2 | Racheal    | Bing      |             2
   4 | John       | Smith     |             3
   6 | Cory       | Mathew    |             3
(3 rows)

We have seen PostgreSQL queries With table partitions. We have implemented three types of partitions. Partition is very useful and makes querying faster. With the right implementation, table partitioning can be a game-changer for handling massive amounts of data in PostgreSQL.