Contact
Offices

PostgreSQL 12 released

news rss

Bernd Helmle

The PostgreSQL Global Development Group (PGDG) has released version 12 of the popular free PostgreSQL database. As our article for Beta 4 has already indicated, a number of new features, improvements and optimizations have been incorporated into the release. These include among others:

Optimized disk space utilization and speed for btree indexes

btree-Indexes, the default index type in PostgreSQL, has experienced some optimizations in PostgreSQL 12.

btree indexes used to store duplicates (multiple entries with the same key values) in an unsorted order. This has resulted in suboptimal use of physical representation in these indexes. An optimization now stores these multiple key values in the same order as they are physically stored in the table. This improves disk space utilization and the effort required to manage corresponding btree type indexes. In addition, indexes with multiple indexed columns use an improved physical representation so that their storage utilization is also improved. To take advantage of this in PostgreSQL 12, however, if they were upgraded to the new version using pg_upgrade via a binary upgrade, these indexes must be recreated or re-indexed.

Insert operations in btree indexes are also accelerated by improved locking.

Improvements for pg_checksums

credativ has contributed an extension for pg_checksums that allows to enable or disable block checksums in stopped PostgreSQL instances. Previously, this could only be done by recreating the physical data representation of the cluster using initdb.
pg_checksums now has the option to display a status history on the console with the --progress parameter. The corresponding code contributions come from the colleagues Michael Banck and Bernd Helmle.

Optimizer Inlining of Common Table Expressions

Up to and including PostgreSQL 11, the PostgreSQL Optimizer was unable to optimize common table expressions (also called CTE or WITH queries). If such an expression was used in a query, the CTE was always evaluated and materialized first before the rest of the query was processed. This resulted in expensive execution plans for more complex CTE expressions. The following generic example illustrates this. A join is given with a CTE expression that filters all even numbers from a numeric column:

WITH t_cte AS (SELECT id FROM foo WHERE id % 2 = 0) SELECT COUNT(*) FROM t_cte JOIN bar USING(id);

In PostgreSQL 11, using a CTE always leads to a CTE scan that materializes the CTE expression first:

EXPLAIN (ANALYZE, BUFFERS) WITH t_cte AS (SELECT id FROM foo WHERE id % 2 = 0) SELECT COUNT(*) FROM t_cte JOIN bar USING(id) ;
                                                       QUERY PLAN                                                        
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=2231.12..2231.14 rows=1 width=8) (actual time=48.684..48.684 rows=1 loops=1)
   Buffers: shared hit=488
   CTE t_cte
     ->  Seq Scan on foo  (cost=0.00..1943.00 rows=500 width=4) (actual time=0.055..17.146 rows=50000 loops=1)
           Filter: ((id % 2) = 0)
           Rows Removed by Filter: 50000
           Buffers: shared hit=443
   ->  Hash Join  (cost=270.00..286.88 rows=500 width=0) (actual time=7.297..47.966 rows=5000 loops=1)
         Hash Cond: (t_cte.id = bar.id)
         Buffers: shared hit=488
         ->  CTE Scan on t_cte  (cost=0.00..10.00 rows=500 width=4) (actual time=0.063..31.158 rows=50000 loops=1)
               Buffers: shared hit=443
         ->  Hash  (cost=145.00..145.00 rows=10000 width=4) (actual time=7.191..7.192 rows=10000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 480kB
               Buffers: shared hit=45
               ->  Seq Scan on bar  (cost=0.00..145.00 rows=10000 width=4) (actual time=0.029..3.031 rows=10000 loops=1)
                     Buffers: shared hit=45
 Planning Time: 0.832 ms
 Execution Time: 50.562 ms
(19 rows)

This plan first materializes the CTE with a sequential scan with a corresponding filter (id % 2 = 0). Here no functional index is used, therefore this scan is correspondingly more expensive. Then the result of the CTE is linked to the table bar by Hash Join with the corresponding Join condition. With PostgreSQL 12, the optimizer now has the ability to inline these CTE expressions without prior materialization. The underlying optimized plan in PostgreSQL 12 will look like this:

EXPLAIN (ANALYZE, BUFFERS) WITH t_cte AS (SELECT id FROM foo WHERE id % 2 = 0) SELECT COUNT(*) FROM t_cte JOIN bar USING(id) ;
                                                                QUERY PLAN                                                                 
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=706.43..706.44 rows=1 width=8) (actual time=9.203..9.203 rows=1 loops=1)
   Buffers: shared hit=148
   ->  Merge Join  (cost=0.71..706.30 rows=50 width=0) (actual time=0.099..8.771 rows=5000 loops=1)
         Merge Cond: (foo.id = bar.id)
         Buffers: shared hit=148
         ->  Index Only Scan using foo_id_idx on foo  (cost=0.29..3550.29 rows=500 width=4) (actual time=0.053..3.490 rows=5001 loops=1)
               Filter: ((id % 2) = 0)
               Rows Removed by Filter: 5001
               Heap Fetches: 10002
               Buffers: shared hit=74
         ->  Index Only Scan using bar_id_idx on bar  (cost=0.29..318.29 rows=10000 width=4) (actual time=0.038..3.186 rows=10000 loops=1)
               Heap Fetches: 10000
               Buffers: shared hit=74
 Planning Time: 0.646 ms
 Execution Time: 9.268 ms
(15 rows)

The advantage of this method is that there is no initial materialization of the CTE expression. Instead, the query is executed directly with a Join. This works for all non-recursive CTE expressions without side effects (for example, CTEs with write statements) and those that are referenced only once per query. The old behavior of the optimizer can be forced with the WITH ... AS MATERIALIZED ... directive.

Generated Columns

Generated Columns in PostgreSQL 12 are materialized columns, which calculate a result based on expressions using existing column values. These are stored with the corresponding result values in the tuple. The advantage is that there is no need to create triggers for subsequent calculation of column values. The following simple example illustrates the new functionality using a price table with net and gross prices:

CREATE TABLE preise(netto numeric,
                    brutto numeric GENERATED ALWAYS AS (netto * 1.19) STORED);
 
INSERT INTO preise VALUES(17.30);
INSERT INTO preise VALUES(225);
INSERT INTO preise VALUES(247);
INSERT INTO preise VALUES(19.15);
 
SELECT * FROM preise;
 netto │ brutto
───────┼─────────
 17.30 │ 20.5870
   225 │  267.75
   247 │  293.93
 19.15 │ 22.7885
(4 rows)

The column brutto is calculated directly from the net price. The keyword STORED is mandatory. Of course, indexes can also be created on Generated Columns, but they cannot be part of a primary key. Furthermore, the SQL expression must be unique, i.e. it must return the same result even if the input quantity is the same. Columns declared as Generated Columns cannot be used explicitly in INSERT or UPDATE operations. If a column list is absolutely necessary, the corresponding value can be indirectly referenced with the keyword DEFAULT.

Omission of explicit OID columns

Explicit OID columns have historically been a way to create unique column values so that a table row can be uniquely identified database-wide. However, for a long time PostgreSQL has only created these explicitly and considered their basic functionality obsolete. With PostgreSQL the possibility to create such columns explicitly is now finally abolished. This means that it will no longer be possible to specify the WITH OIDS directive for tables. System tables that have always referenced OID objects uniquely will now return OID values without explicitly specifying OID columns in the result set. Especially older software, which handled catalog queries carelessly, could get problems with a double column output.

Moving recovery.conf to postgresql.conf

Up to and including PostgreSQL 11, database recovery and streaming replication instances were configured via a separate configuration file recovery.conf.

With PostgreSQL 12, all configuration work done there now migrates to postgresql.conf. The recovery.conf file is no longer required. PostgreSQL 12 refuses to start as soon as this file exists. Whether recovery or streaming standby is desired is now decided either by a recovery.signal file (for recovery) or by a standby.signal file (for standby systems). The latter has priority if both files are present. The old parameter standby_mode, which controlled this behavior since then, has been removed.

For automatic deployments of high-availability systems, this means a major change. However, it is now also possible to perform corresponding configuration work almost completely using the ALTER SYSTEM command.

REINDEX CONCURRENTLY

With PostgreSQL 12 there is now a way to re-create indexes with as few locks as possible. This greatly simplifies one of the most common maintenance tasks in very write-intensive databases. Previously, a combination of CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY had to be used. In doing so, it was also necessary to ensure that index names were reassigned accordingly, if required.

The release notes give an even more detailed overview of all new features and above all incompatibilities with previous PostgreSQL versions.

Add new comment

Image CAPTCHA