PostgreSQL 12 (2019-10-03 => 2024-11-14) SQL/JSON path expression support; generated columns

../../_images/pg_12_is_out.png

https://postgresweekly.com/issues/326

../../_images/postgresql1.png
../../_images/modern_sql.png

https://fediverse.org/MarkusWinand/status/1179804815657967616?s=20 https://modern-sql.com/

Announce

In english

PostgreSQL 12 Released !

OCTOBER 3, 2019 - The PostgreSQL Global Development Group today announced the release of PostgreSQL 12, the latest version of the world’s most advanced open source database.

PostgreSQL 12 enhancements include notable improvements to query performance, particularly over larger data sets, and overall space utilization.

This release provides application developers with new capabilities such as SQL/JSON path expression support, optimizations for how common table expression (WITH) queries are executed, and generated columns.

The PostgreSQL community continues to support the extensibility and robustness of PostgreSQL, with further additions to internationalization, authentication, and providing easier ways to administrate PostgreSQL.

This release also introduces the pluggable table storage interface, which allows developers to create their own methods for storing data.

The development community behind PostgreSQL contributed features for PostgreSQL 12
that offer performance and space management gains that our users can achieve
with minimal effort, as well as improvements in enterprise authentication,
administration functionality, and SQL/JSON support.

said Dave Page, a core team member of the PostgreSQL Global Development Group.

This release continues the trend of making it easier to manage database
workloads large and small while building on PostgreSQL's reputation of
flexibility, reliability and stability in production environments.

PostgreSQL benefits from over 20 years of open source development and has become the preferred open source relational database for organizations of all sizes.

The project continues to receive recognition across the industry, including being featured for the second year in a row as the DBMS of the Year in 2018 by DB-Engines and receiving the “Lifetime Achievement” open source award at OSCON 2019.

En français

PostgreSQL 12 est publiée !

Le PostgreSQL Global Development Group annonce aujourd’hui la sortie de PostgreSQL 12, la dernière version du SGBD open source le plus avancé du monde.

PostgreSQL 12 inclut des améliorations notables sur la performance des requêtes , particulièrement sur les gros volumes de données et sur l’utilisation générale de l’espace disque.

Cette version offre aux développeurs d’applications de nouvelles fonctionnalités comme le support des expressions SQL/JSON path, des optimisations sur l’exécution des requêtes common tables expression (WITH) et l’ajout des colonnes calculées.

La communauté PostgreSQL poursuit les objectifs d’extensibilité et de robustesse de PostgreSQL en y incluant plusieurs ajouts à l’internationalisation et l’authenfication et en simplifiant l’administration de PostgreSQL.

Cette version introduit également l’interface de stockage connectable permettant de développer sa propre méthode de stockage des données.

La communauté des développeurs de PostgreSQL a ajouté à PostgreSQL 12 des
fonctionnalités qui apportent de la performance et des gains dans la gestion
de l'espace disque que nos clients peuvent mettre en œuvre avec un minimum d'effort.
Cette version apporte également une authentification de niveau entreprise, des
fonctionnalités d'administration et le support de SQL/JSON

déclare Dave Page, membre du noyau des développeurs du PostgreSQL Global Development Group.

Cette version poursuit l'objectif de simplifier la gestion des bases de données,
quelle que soit la charge de travail, grande ou petite. Elle contribue également
à consolider la réputation de flexibilité, sûreté et stabilité de PostgreSQL
dans des environnements de production.

PostgreSQL bénéficie d’un développement de plus de 20 années et est devenu le SGBD relationnel open source de référence pour les entreprises et institutions de toutes tailles.

La reconnaissance du projet en entreprise est toujours plus grande.

Le projet a ainsi reçu pour la seconde année consécutive le prix du SGBD de l’année (« DBMS of the Year ») dans le classement DB-Engines de 2018. Il a également reçu le prix open source Lifetime Achievement lors de l’OSCON 2019.

Indexing & Constraints

../../_images/indexing.png

Announce ( https://www.postgresql.org/about/press/presskit12/ )

PostgreSQL 12 provides significant performance and maintenance enhancements to its indexing system and to partitioning.

B-tree Indexes, the standard type of indexing in PostgreSQL, have been optimized in PostgreSQL 12 to better handle workloads where the indexes are frequently modified.

Using a fair use implementation of the TPC-C benchmark, PostgreSQL 12 demonstrated on average a 40% reduction in space utilization and an overall gain in query performance.

There are additional enhancements to indexing in PostgreSQL 12 that affect overall performance, including lower overhead in write-ahead log generation for the GiST, GIN, and SP-GiST index types, the ability to create covering indexes (the INCLUDE clause) on GiST indexes, the ability to perform K-nearest neighbor queries with the distance operator (<->) using SP-GiST indexes, and CREATE STATISTICS now supporting most-common value (MCV) statistics to help generate better query plans when using columns that are nonuniformly distributed.

Covering indexes for GiST (INCLUDE)

Allows additional columns to be included in as “non-key” columns in a GiST index, and if requirements are met, can return data from these columns in an index-only scan.

K-nearest neighbor SP-GiST Support

Add support for K-nearest neighbor (K-NN) searches on SP-GiST indexes when the distance operator “<->” is defined

SQL

../../_images/sql.png

Announce ( https://www.postgresql.org/about/press/presskit12/ )

Just-in-time (JIT) compilation using LLVM, introduced in PostgreSQL 11, is now enabled by default. JIT compilation can provide performance benefits to the execution of expressions in WHERE clauses, target lists, aggregates, and some internal operations, and is available if your PostgreSQL installation is compiled or packaged with LLVM.

COPY FROM WHERE

Specify which rows are loaded from a data source via a WHERE clause when using COPY FROM.

JSON Path

Announce ( https://www.postgresql.org/about/press/presskit12/ )

PostgreSQL is known for its conformance to the SQL standard, one reason why it was renamed from POSTGRES to PostgreSQL , and PostgreSQL 12 adds several features to continue its implementation of the SQL standard with enhanced functionality.

PostgreSQL 12 introduces the ability to run queries over JSON documents using JSON path expressions defined in the SQL/JSON standard.

Such queries may utilize the existing indexing mechanisms for documents stored in the JSONB format to efficiently retrieve data.

../../_images/json_path.png

https://modern-sql.com/blog/2017-06/whats-new-in-sql-2016

Data Definition Language (DDL)

../../_images/ddl.png

Announce ( https://www.postgresql.org/about/press/presskit12/ )

PostgreSQL 12 introduces the ability to rebuild indexes without blocking writes to an index via the REINDEX CONCURRENTLY command, allowing users to avoid downtime scenarios for lengthy index rebuilds.

CREATE ACCESS METHOD

Define a new access method.

Example

Create an index access method heptree with handler function heptree_handler:

CREATE ACCESS METHOD heptree TYPE INDEX HANDLER heptree_handler;

Rebuilding Indexes Concurrently

Rebuilding an index can interfere with regular operation of a database. Normally PostgreSQL locks the table whose index is rebuilt against writes and performs the entire index build with a single scan of the table. Other transactions can still read the table, but if they try to insert, update, or delete rows in the table they will block until the index rebuild is finished.

This could have a severe effect if the system is a live production database.

Very large tables can take many hours to be indexed, and even for smaller tables, an index rebuild can lock out writers for periods that are unacceptably long for a production system.

PostgreSQL supports rebuilding indexes with minimum locking of writes. This method is invoked by specifying the CONCURRENTLY option of REINDEX.

When this option is used, PostgreSQL must perform two scans of the table for each index that needs to be rebuilt and wait for termination of all existing transactions that could potentially use the indexation

Examples

Rebuild a single index:

REINDEX INDEX my_index;

Rebuild all the indexes on the table my_table:

REINDEX TABLE my_table;

Generated Columns

PostgreSQL 12 introduces generated columns . Defined in the SQL standard, this type of column computes its value from the contents of other columns in the same table. In this version, PostgreSQL supports stored generated columns where the computed value is stored on the disk. ( https://www.postgresql.org/about/press/presskit12/ )

A generated column is a special column that is always computed from other columns.

Thus, it is for columns what a view is for tables.

There are two kinds of generated columns: stored and virtual.

A stored generated column is computed when it is written (inserted or updated) and occupies storage as if it were a normal column.

A virtual generated column occupies no storage and is computed when it is read.

Thus, a virtual generated column is similar to a view and a stored generated column is similar to a materialized view (except that it is always updated automatically).

PostgreSQL currently implements only stored generated columns .

To create a generated column, use the GENERATED ALWAYS AS clause in CREATE TABLE, for example:

CREATE TABLE people (
    ...,
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);

The keyword STORED must be specified to choose the stored kind of generated column.

Performance

../../_images/performance.png

Accelerated partition pruning

Announce ( https://www.postgresql.org/about/press/presskit12/ )

Queries on partitioned tables have also seen demonstrable improvements, particularly for tables with thousands of partitions that only need to retrieve data from a limited subset. PostgreSQL 12 also improves the performance of adding data to partitioned tables with INSERT and COPY, and includes the ability to attach a new partition to a table without blocking queries.

Detail

Performance improvements for queries on tables with thousands of partitions when only a limited subset of partitions need to be accessed.

2ndQuadrant

SELECT Performance

Back in PostgreSQL 10, the query planner would check the constraint of each partition one-by-one to see if it could possibly be required for the query. This meant a per-partition overhead, resulting in planning times increasing with higher numbers of partitions.

PostgreSQL 11 improved this by adding partition pruning , an algorithm which can much more quickly identify matching partitions. However, PostgreSQL 11 still did some unnecessary processing and still loaded meta-data for each partition, regardless of if it was pruned or not.

PostgreSQL 12 changes things so this meta-data loading is performed after partition pruning. This results in significant performance improvements in the query planner when many partitions are pruned.

CREATE STATISTICS - most-common values (MCV) statistics collection

Detail

CREATE STATISTICS can collection statistics on the most common value, which can improve optimizations for columns that contain nonuniform distributions.

Enterprisedb

This update, which has been in development for several years, is meant to address an issue that has generated complaints over the years: the edge case of correlated columns in a query.

Take the example of Cincinnati, Ohio — you have one field labeled city, and another field called state, with Cincinnati in one column and Ohio in another. Cincinnati, Ohio, is going to be fairly common, but Cincinnati, Arizona, is very rare.

PostgreSQL, up until this feature, only recorded a single correlation value for multiple columns. In essence, it would count Cincinnati, Ohio and Cincinnati, Arizona as the same thing.

Now you can compare multiple columns and correlate the combinations to optimize indexes for queries.

Inlined Common Table Expressions (WITH queries)

Detail

A WITH query that is neither recursive nor has any side-effects (e.g. an INSERT/UPDATE/DELETE) can be executed inline, which can lead to performance improvements.

This behavior can be forced on a query by using the NOT MATERIALIZED clause, e.g.:

WITH cte AS NOT MATERIALIZED ( SELECT * FROM a ) SELECT * FROM cte JOIN b ON b.id = cte.id;

info.crunchydata (WITH Queries Get a Big Boost)

When the inlined common table expression patch was committed (aka CTEs, aka WITH queries) I could not wait to write an article on how big a deal this was for PostgreSQL application developers. This is one of those features where you can see your applications get faster, well, if you make use of CTEs.

I’ve often found that developers that are new to SQL like to make use of CTEs: if you write them in a certain way, it can feel like you’re writing an imperative program.

I also enjoyed rewriting those queries to not use CTEs and demonstrate a performance gain. Alas, these days are now gone.

PostgreSQL 12 now allows a certain kind of CTE to be inlined, i.e. one that has no side-effects (a SELECT) that is used only once later in a query. If I had collected statistics on the number of queries using CTEs I would rewrite, the majority would fall into this group. This will help developers to write code that can feel more readable that is now performant as well .

What’s better is that PostgreSQL 12 will optimize the execution of this SQL without you having to do any additional work. And while I may no longer have to optimize this type of query pattern, it’s certainly better that PostgreSQL is continuing to improve its query optimizations.

Partitioning & Inheritance

../../_images/partition_inheritance.png

2ndquadrant: Partitioning performance

Partitioning isn’t a new feature, it’s been around for several years, but the partitioning overhead detracted from performance.

While PostgreSQL 11 introduced some performance improvements for partitioning, PostgreSQL 12 delivers a polished implementation.

For users moving from other databases with thousands of partitions, PostgreSQL 12 now delivers performance benefits by delivering capabilities that can efficiently process thousands of partitions simultaneously.

Partitioning performance enhancements can improve query performance, particularly performance with INSERT and COPY statements.

In addition, users now have the ability to alter partitioned tables without blocking queries and use foreign keys to reference partitioned tables.

info.crunchydata: Partitioning is Bigger, Better, Faster

PostgreSQL 10 introduced declarative partitioning. PostgreSQL 11 made it much easier to use.

PostgreSQL 12 lets you really scale your partitions .

PostgreSQL 12 received significant performance improvements to the partitioning system, notably around how it can process tables that have thousands of partitions.

For example, a query that only affects a few partitions on a table with thousands of them will perform significantly faster. In addition to seeing performance improvements on those types of queries, you should also see an improvement in INSERT speed on tables with many partitions as well.

Writing data with COPY, which is a great way to bulk load data (here’s an example of JSON ingestion ) to partitioned tables, also received a boost in PostgreSQL 12. Using COPY was already fast;

PostgreSQL 12 has made it noticeably faster.

All of the above makes it possible to store even larger data sets in PostgreSQL while making it easier to retrieve the data and, even better, it should just work.

Applications that tend to have a lot of partitions, e.g. ones that record time series data, should see noticeable performance improvements just with an upgrade.

And while it may not broadly fall under the “make better just by upgrading” category, PostgreSQL 12 allows you to create foreign keys that reference partitioned tables, eliminating a “gotcha” that you may have experienced with partitioning.

Foreign Key references for partitioned tables

A foreign key can reference a partitioned table.

Backup, Restore, Data Integrity, & Replication

../../_images/backup_restore.png

Enable/Disable page checksums in an offline cluster

Page checksums can be enabled or disabled in an offline PostgreSQL cluster via the pg_checksums command: https://www.postgresql.org/docs/12/app-pgchecksums.html

Configuration Management

../../_images/configuration_management.png

https://www.postgresql.org/about/featurematrix/#configuration-management

Fractional input for “integer” values

Fractional input for “integer” server variables is now accepted e.g. SET work_mem = ‘24.2MB’

Security

../../_images/security.png

Announce ( https://www.postgresql.org/about/press/presskit12/ )

PostgreSQL expands on its robust authentication method support with several enhancements that provide additional security and functionality.

This release introduces both client and server-side encryption for authentication over GSSAPI interfaces, as well as the ability for PostgreSQL to discover LDAP servers if PostgreSQL is compiled with OpenLDAP.

Additionally, PostgreSQL 12 now supports a form of multifactor authentication. A PostgreSQL server can now require an authenticating client to provide a valid SSL certificate with their username using the clientcert=verify-full option and combine this with the requirement of a separate authentication method (e.g. scram-sha-256).

GSSAPI client and server-side encryption

Support for client and server-side connection encryption when using GSSAPI for authentication. https://www.postgresql.org/docs/12/gssapi-auth.html

LDAP server discovery

LDAP servers with DNS SRV can be discovered if PostgreSQL is built with OpenLDAP

Multifactor authentication via valid client SSL certificate

If an authentication entry in the pg_hba.conf specifies the “clientcert=verify-full”, then the client must present a valid SSL certificate that matches the login name, or the client name based on a map.

Internationalisation

Announce ( https://www.postgresql.org/about/press/presskit12 )

PostgreSQL 12 extends its support of ICU collations by allowing users to define nondeterministic collations that can, for example, allow case-insensitive or accent-insensitive comparisons

../../_images/i18n.png

Nondeterministic Collations

A collation is either deterministic or nondeterministic.

A deterministic collation uses deterministic comparisons, which means that it considers strings to be equal only if they consist of the same byte sequence.

Nondeterministic comparison may determine strings to be equal even if they consist of different bytes.

Typical situations include case-insensitive comparison, accent-insensitive comparison, as well as comparison of strings in different Unicode normal forms.

It is up to the collation provider to actually implement such insensitive comparisons; the deterministic flag only determines whether ties are to be broken using bytewise comparison.

See also Unicode Technical Standard 10 for more information on the terminology.

To create a nondeterministic collation, specify the property deterministic = false to CREATE COLLATION, for example:

CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);

psql

PG_COLOR=auto will bring colour to my psql !

Upgrade considerations

Since index storage has been changed in v12, a new B-tree index version 4 has been introduced.

Since upgrading with pg_upgrade does not change the data files, indexes will still be in version 3 after an upgrade.

PostgreSQL v12 can use these indexes, but the above optimizations will not be available.

You need to REINDEX an index to upgrade it to version 4 (this has been made easier with REINDEX CONCURRENTLY in PostgreSQL v12).