The PostgreSQL^® Global Development Group (PGDG) has released version 12 of the popular free PostgreSQL^® database. As our article for Beta 4 has already indicated, a number of new features, improvements and optimizations have been incorporated into the release. These include among others:

Optimized disk space utilization and speed for btree indexes

btree-Indexes, the default index type in PostgreSQL^®, has experienced some optimizations in PostgreSQL^® 12.

btree indexes used to store duplicates (multiple entries with the same key values) in an unsorted order. This has resulted in suboptimal use of physical representation in these indexes. An optimization now stores these multiple key values in the same order as they are physically stored in the table. This improves disk space utilization and the effort required to manage corresponding btree type indexes. In addition, indexes with multiple indexed columns use an improved physical representation so that their storage utilization is also improved. To take advantage of this in PostgreSQL^® 12, however, if they were upgraded to the new version using pg_upgrade via a binary upgrade, these indexes must be recreated or re-indexed.

Insert operations in btree indexes are also accelerated by improved locking.

Improvements for pg_checksums

credativ has contributed an extension for pg_checksums that allows to enable or disable block checksums in stopped PostgreSQL^® instances. Previously, this could only be done by recreating the physical data representation of the cluster using initdb.
pg_checksums now has the option to display a status history on the console with the --progress parameter. The corresponding code contributions come from the colleagues Michael Banck and Bernd Helmle.

Optimizer Inlining of Common Table Expressions

Up to and including PostgreSQL^® 11, the PostgreSQL^® Optimizer was unable to optimize common table expressions (also called CTE or WITH queries). If such an expression was used in a query, the CTE was always evaluated and materialized first before the rest of the query was processed. This resulted in expensive execution plans for more complex CTE expressions. The following generic example illustrates this. A join is given with a CTE expression that filters all even numbers from a numeric column:

WITH t_cte AS (SELECT id FROM foo WHERE id % 2 = 0) SELECT COUNT(*) FROM t_cte JOIN bar USING(id);

In PostgreSQL^® 11, using a CTE always leads to a CTE scan that materializes the CTE expression first:

EXPLAIN (ANALYZE, BUFFERS) WITH t_cte AS (SELECT id FROM foo WHERE id % 2 = 0) SELECT COUNT(*) FROM t_cte JOIN bar USING(id) ;
                                                       QUERY PLAN                                                        
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=2231.12..2231.14 rows=1 width=8) (actual time=48.684..48.684 rows=1 loops=1)
   Buffers: shared hit=488
   CTE t_cte
     ->  Seq Scan on foo  (cost=0.00..1943.00 rows=500 width=4) (actual time=0.055..17.146 rows=50000 loops=1)
           Filter: ((id % 2) = 0)
           Rows Removed by Filter: 50000
           Buffers: shared hit=443
   ->  Hash Join  (cost=270.00..286.88 rows=500 width=0) (actual time=7.297..47.966 rows=5000 loops=1)
         Hash Cond: (t_cte.id = bar.id)
         Buffers: shared hit=488
         ->  CTE Scan on t_cte  (cost=0.00..10.00 rows=500 width=4) (actual time=0.063..31.158 rows=50000 loops=1)
               Buffers: shared hit=443
         ->  Hash  (cost=145.00..145.00 rows=10000 width=4) (actual time=7.191..7.192 rows=10000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 480kB
               Buffers: shared hit=45
               ->  Seq Scan on bar  (cost=0.00..145.00 rows=10000 width=4) (actual time=0.029..3.031 rows=10000 loops=1)
                     Buffers: shared hit=45
 Planning Time: 0.832 ms
 Execution Time: 50.562 ms
(19 rows)

This plan first materializes the CTE with a sequential scan with a corresponding filter (id % 2 = 0). Here no functional index is used, therefore this scan is correspondingly more expensive. Then the result of the CTE is linked to the table bar by Hash Join with the corresponding Join condition. With PostgreSQL^® 12, the optimizer now has the ability to inline these CTE expressions without prior materialization. The underlying optimized plan in PostgreSQL^® 12 will look like this:

EXPLAIN (ANALYZE, BUFFERS) WITH t_cte AS (SELECT id FROM foo WHERE id % 2 = 0) SELECT COUNT(*) FROM t_cte JOIN bar USING(id) ;
                                                                QUERY PLAN                                                                 
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=706.43..706.44 rows=1 width=8) (actual time=9.203..9.203 rows=1 loops=1)
   Buffers: shared hit=148
   ->  Merge Join  (cost=0.71..706.30 rows=50 width=0) (actual time=0.099..8.771 rows=5000 loops=1)
         Merge Cond: (foo.id = bar.id)
         Buffers: shared hit=148
         ->  Index Only Scan using foo_id_idx on foo  (cost=0.29..3550.29 rows=500 width=4) (actual time=0.053..3.490 rows=5001 loops=1)
               Filter: ((id % 2) = 0)
               Rows Removed by Filter: 5001
               Heap Fetches: 10002
               Buffers: shared hit=74
         ->  Index Only Scan using bar_id_idx on bar  (cost=0.29..318.29 rows=10000 width=4) (actual time=0.038..3.186 rows=10000 loops=1)
               Heap Fetches: 10000
               Buffers: shared hit=74
 Planning Time: 0.646 ms
 Execution Time: 9.268 ms
(15 rows)

The advantage of this method is that there is no initial materialization of the CTE expression. Instead, the query is executed directly with a Join. This works for all non-recursive CTE expressions without side effects (for example, CTEs with write statements) and those that are referenced only once per query. The old behavior of the optimizer can be forced with the WITH ... AS MATERIALIZED ... directive.

Generated Columns

Generated Columns in PostgreSQL^® 12 are materialized columns, which calculate a result based on expressions using existing column values. These are stored with the corresponding result values in the tuple. The advantage is that there is no need to create triggers for subsequent calculation of column values. The following simple example illustrates the new functionality using a price table with net and gross prices:

CREATE TABLE preise(netto numeric,
                    brutto numeric GENERATED ALWAYS AS (netto * 1.19) STORED);
 
INSERT INTO preise VALUES(17.30);
INSERT INTO preise VALUES(225);
INSERT INTO preise VALUES(247);
INSERT INTO preise VALUES(19.15);
 
SELECT * FROM preise;
 netto │ brutto
───────┼─────────
 17.30 │ 20.5870
   225 │  267.75
   247 │  293.93
 19.15 │ 22.7885
(4 rows)

The column brutto is calculated directly from the net price. The keyword STORED is mandatory. Of course, indexes can also be created on Generated Columns, but they cannot be part of a primary key. Furthermore, the SQL expression must be unique, i.e. it must return the same result even if the input quantity is the same. Columns declared as Generated Columns cannot be used explicitly in INSERT or UPDATE operations. If a column list is absolutely necessary, the corresponding value can be indirectly referenced with the keyword DEFAULT.

Omission of explicit OID columns

Explicit OID columns have historically been a way to create unique column values so that a table row can be uniquely identified database-wide. However, for a long time PostgreSQL^® has only created these explicitly and considered their basic functionality obsolete. With PostgreSQL^® the possibility to create such columns explicitly is now finally abolished. This means that it will no longer be possible to specify the WITH OIDS directive for tables. System tables that have always referenced OID objects uniquely will now return OID values without explicitly specifying OID columns in the result set. Especially older software, which handled catalog queries carelessly, could get problems with a double column output.

Moving `recovery.conf` to `postgresql.conf`

Up to and including PostgreSQL^® 11, database recovery and streaming replication instances were configured via a separate configuration file recovery.conf.

With PostgreSQL^® 12, all configuration work done there now migrates to postgresql.conf. The recovery.conf file is no longer required. PostgreSQL^® 12 refuses to start as soon as this file exists. Whether recovery or streaming standby is desired is now decided either by a recovery.signal file (for recovery) or by a standby.signal file (for standby systems). The latter has priority if both files are present. The old parameter standby_mode, which controlled this behavior since then, has been removed.

For automatic deployments of high-availability systems, this means a major change. However, it is now also possible to perform corresponding configuration work almost completely using the ALTER SYSTEM command.

REINDEX CONCURRENTLY

With PostgreSQL^® 12 there is now a way to re-create indexes with as few locks as possible. This greatly simplifies one of the most common maintenance tasks in very write-intensive databases. Previously, a combination of CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY had to be used. In doing so, it was also necessary to ensure that index names were reassigned accordingly, if required.

The release notes give an even more detailed overview of all new features and above all incompatibilities with previous PostgreSQL^® versions.

Yesterday, the fourth beta of the upcoming PostgreSQL^®-major version 12 was released.

Compared to its predecessor PostgreSQL^® 11, there are many new features:

Performance improvements for indexes: btree indexes now manage space more efficiently. The REINDEX command now also supports CONCURRENTLY, which was previously only possible with new indexes.
WITH queries are now embedded in the main query and thus optimized much better by the planner. Previously, WITH queries were always executed independently.
The native partitioning was further improved. Foreign keys can now also reference partitioned tables. Maintenance commands such as ATTACH PARTITION no longer require an exclusive table lock.
The support of page checksums and the tool pg_checksums was further improved, also with substantial cooperation by credativ.
It is now possible to integrate additional storage engines. The “zheap”, which is still under development, will be based on this, which promises more compact data storage with less bloat.

Of course, PostgreSQL^® 12 will be tested using sqlsmith, the SQL “fuzzer” from our colleague Andreas Seltenreich. Numerous bugs in different PostgreSQL^® versions were found with sqlsmith by using randomly generated SQL queries.

Debian and Ubuntu packages for PostgreSQL^® 12 are going to be published on apt.postgresql.org with credativ’s help. This work will be handled by our colleague Christoph Berg.

The release of PostgreSQL^® 12 is expected in the next weeks.

In this article we will look at the highly available operation of PostgreSQL^® in a Kubernetes environment. A topic that is certainly of particular interest to many of our PostgreSQL^® users.

Together with our partner company MayaData, we will demonstrate below the application possibilities and advantages of the extremely powerful open source project – OpenEBS.

OpenEBS is a freely available storage management system, whose development is supported and backed by MayaData.

We would like to thank Murat-Karslioglu from MayaData and our colleague Adrian Vondendriesch for this interesting and helpful article. This article simultaneously also appeared on OpenEBS.io.

PostgreSQL^® anywhere — via Kubernetes with some help from OpenEBS and credativ engineering

by Murat Karslioglu, OpenEBS and Adrian Vondendriesch, credativ

Introduction

If you are already running Kubernetes on some form of cloud whether on-premises or as a service, you understand the ease-of-use, scalability and monitoring benefits of Kubernetes — and you may well be looking at how to apply those benefits to the operation of your databases.

PostgreSQL^® remains a preferred relational database, and although setting up a highly available Postgres cluster from scratch might be challenging at first, we are seeing patterns emerging that allow PostgreSQL^® to run as a first class citizen within Kubernetes, improving availability, reducing management time and overhead, and limiting cloud or data center lock-in.

There are many ways to run high availability with PostgreSQL^®; for a list, see the PostgreSQL^® Documentation. Some common cloud-native Postgres cluster deployment projects include Crunchy Data’s, Sorint.lab’s Stolon and Zalando’s Patroni/Spilo. Thus far we are seeing Zalando’s operator as a preferred solution in part because it seems to be simpler to understand and we’ve seen it operate well.

Some quick background on your authors:

OpenEBS is a broadly deployed OpenSource storage and storage management project sponsored by MayaData.
credativ is a leading open source support and engineering company with particular depth in PostgreSQL^®.

In this blog, we’d like to briefly cover how using cloud-native or “container attached” storage can help in the deployment and ongoing operations of PostgreSQL^® on Kubernetes. This is the first of a series of blogs we are considering — this one focuses more on why users are adopting this pattern and future ones will dive more into the specifics of how they are doing so.

At the end you can see how to use a Storage Class and a preferred operator to deploy PostgreSQL^® with OpenEBS underlying

If you are curious about what container attached storage of CAS is you can read more from the Cloud Native Computing Foundation (CNCF) here.

Conceptually you can think of CAS as being the decomposition of previously monolithic storage software into containerized microservices that themselves run on Kubernetes. This gives all the advantages of running Kubernetes that already led you to run Kubernetes — now applied to the storage and data management layer as well. Of special note is that like Kubernetes, OpenEBS runs anywhere so the same advantages below apply whether on on-premises or on any of the many hosted Kubernetes services.

PostgreSQL^® plus OpenEBS

®-with-OpenEBS-persistent-volumes.png”>Postgres-Operator (for cluster deployment)

Docker installed

Kubernetes 1.9+ cluster installed

kubectl installed

OpenEBS installed

Install OpenEBS

If OpenEBS is not installed in your K8s cluster, this can be done from here. If OpenEBS is already installed, go to the next step.
Connect to MayaOnline (Optional): Connecting the Kubernetes cluster to MayaOnline provides good visibility of storage resources. MayaOnline has various support options for enterprise customers.

Configure cStor Pool

If cStor Pool is not configured in your OpenEBS cluster, this can be done from here. As PostgreSQL^® is a StatefulSet application, it requires a single storage replication factor. If you prefer additional redundancy you can always increase the replica count to 3.
During cStor Pool creation, make sure that the maxPools parameter is set to >=3. If a cStor pool is already configured, go to the next step. Sample YAML named openebs-config.yaml for configuring cStor Pool is provided in the Configuration details below.

openebs-config.yaml

#Use the following YAMLs to create a cStor Storage Pool.
# and associated storage class.
apiVersion: openebs.io/v1alpha1
kind: StoragePoolClaim
metadata:
 name: cstor-disk
spec:
 name: cstor-disk
 type: disk
 poolSpec:
 poolType: striped
 # NOTE — Appropriate disks need to be fetched using `kubectl get disks`
 #
 # `Disk` is a custom resource supported by OpenEBS with `node-disk-manager`
 # as the disk operator
# Replace the following with actual disk CRs from your cluster `kubectl get disks`
# Uncomment the below lines after updating the actual disk names.
 disks:
 diskList:
# Replace the following with actual disk CRs from your cluster from `kubectl get disks`
# — disk-184d99015253054c48c4aa3f17d137b1
# — disk-2f6bced7ba9b2be230ca5138fd0b07f1
# — disk-806d3e77dd2e38f188fdaf9c46020bdc
# — disk-8b6fb58d0c4e0ff3ed74a5183556424d
# — disk-bad1863742ce905e67978d082a721d61
# — disk-d172a48ad8b0fb536b9984609b7ee653
 — -

Create Storage Class

You must configure a StorageClass to provision cStor volume on a cStor pool. In this solution, we are using a StorageClass to consume the cStor Pool which is created using external disks attached on the Nodes. The storage pool is created using the steps provided in the Configure StoragePool section. In this solution, PostgreSQL^® is a deployment. Since it requires replication at the storage level the cStor volume replicaCount is 3. Sample YAML named openebs-sc-pg.yaml to consume cStor pool with cStorVolume Replica count as 3 is provided in the configuration details below.

openebs-sc-pg.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-postgres
  annotations:
    openebs.io/cas-type: cstor
    cas.openebs.io/config: |
      - name: StoragePoolClaim
        value: "cstor-disk"
      - name: ReplicaCount
        value: "3"       
provisioner: openebs.io/provisioner-iscsi
reclaimPolicy: Delete
---

Launch and test Postgres Operator

Clone Zalando’s Postgres Operator.

git clone https://github.com/zalando/postgres-operator.git
cd postgres-operator

Use the OpenEBS storage class

Edit manifest file and add openebs-postgres as the storage class.

nano manifests/minimal-postgres-manifest.yaml

After adding the storage class, it should look like the example below:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  namespace: default
spec:
  teamId: "ACID"
  volume:
    size: 1Gi
    storageClass: openebs-postgres
  numberOfInstances: 2
  users:
    # database owner
    zalando:
    - superuser
    - createdb
 
# role for application foo
    foo_user: []
 
#databases: name->owner
  databases:
    foo: zalando
  postgresql:
    version: "10"
    parameters:
      shared_buffers: "32MB"
      max_connections: "10"
      log_statement: "all"

Start the Operator

Run the command below to start the operator

kubectl create -f manifests/configmap.yaml # configuration
kubectl create -f manifests/operator-service-account-rbac.yaml # identity and permissions
kubectl create -f manifests/postgres-operator.yaml # deployment

Create a Postgres cluster on OpenEBS

Optional: The operator can run in a namespace other than default. For example, to use the test namespace, run the following before deploying the operator’s manifests:

kubectl create namespace test
kubectl config set-context $(kubectl config current-context) — namespace=test

Run the command below to deploy from the example manifest:

kubectl create -f manifests/minimal-postgres-manifest.yaml

2. It only takes a few seconds to get the persistent volume (PV) for the pgdata-acid-minimal-cluster-0 up. Check PVs created by the operator using the kubectl get pv command:

$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-8852ceef-48fe-11e9–9897–06b524f7f6ea 1Gi RWO Delete Bound default/pgdata-acid-minimal-cluster-0 openebs-postgres 8m44s
pvc-bfdf7ebe-48fe-11e9–9897–06b524f7f6ea 1Gi RWO Delete Bound default/pgdata-acid-minimal-cluster-1 openebs-postgres 7m14s

Connect to the Postgres master and test

If it is not installed previously, install psql client:

sudo apt-get install postgresql-client

2. Run the command below and note the hostname and host port.

kubectl get service — namespace default |grep acid-minimal-cluster

3. Run the commands below to connect to your PostgreSQL^® DB and test. Replace the [HostPort] below with the port number from the output of the above command:

export PGHOST=$(kubectl get svc -n default -l application=spilo,spilo-role=master -o jsonpath="{.items[0].spec.clusterIP}")
export PGPORT=[HostPort]
export PGPASSWORD=$(kubectl get secret -n default postgres.acid-minimal-cluster.credentials -o ‘jsonpath={.data.password}’ | base64 -d)
psql -U postgres -c ‘create table foo (id int)’

Congrats you now have the Postgres-Operator and your first test database up and running with the help of cloud-native OpenEBS storage.

Partnership and future direction

As this blog indicates, the teams at MayaData / OpenEBS and credativ are increasingly working together to help organizations running PostgreSQL^® and other stateful workloads. In future blogs, we’ll provide more hands-on tips.

We are looking for feedback and suggestions on where to take this collaboration. Please provide feedback below or find us on Twitter or on the OpenEBS slack community.

Patroni is a PostgreSQL high availability solution with a focus on containers and Kubernetes. Until recently, the available Debian packages had to be configured manually and did not integrate well with the rest of the distribution. For the upcoming Debian 10 “Buster” release, the Patroni packages have been integrated into Debian’s standard PostgreSQL framework by credativ. They now allow for an easy setup of Patroni clusters on Debian or Ubuntu.

Patroni employs a “Distributed Consensus Store” (DCS) like Etcd, Consul or Zookeeper in order to reliably run a leader election and orchestrate automatic failover. It further allows for scheduled switchovers and easy cluster-wide changes to the configuration. Finally, it provides a REST interface that can be used together with HAProxy in order to build a load balancing solution. Due to these advantages Patroni has gradually replaced Pacemaker as the go-to open-source project for PostgreSQL high availability.

However, many of our customers run PostgreSQL on Debian or Ubuntu systems and so far Patroni did not integrate well into those. For example, it does not use the postgresql-common framework and its instances were not displayed in pg_lsclusters output as usual.

Integration into Debian

In a collaboration with Patroni lead developer Alexander Kukushkin from Zalando the Debian Patroni package has been integrated into the postgresql-common framework to a large extent over the last months. This was due to changes both in Patroni itself as well as additional programs in the Debian package. The current Version 1.5.5 of Patroni contains all these changes and is now available in Debian “Buster” (testing) in order to setup Patroni clusters.

The packages are also available on apt.postgresql.org and thus installable on Debian 9 “Stretch” and Ubuntu 18.04 “Bionic Beaver” LTS for any PostgreSQL version from 9.4 to 11.

The most important part of the integration is the automatic generation of a suitable Patroni configuration with the pg_createconfig_patroni command. It is run similar to pg_createcluster with the desired PostgreSQL major version and the instance name as parameters:

pg_createconfig_patroni 11 test

This invocation creates a file /etc/patroni/11-test.yml, using the DCS configuration from /etc/patroni/dcs.yml which has to be adjusted according to the local setup. The rest of the configuration is taken from the template /etc/patroni/config.yml.in which is usable in itself but can be customized by the user according to their needs. Afterwards the Patroni instance is started via systemd similar to regular PostgreSQL instances:

systemctl start patroni@11-test

A simple 3-node Patroni cluster can be created and started with the following few commands, where the nodes pg1, pg2 and pg3 are considered to be hostnames and the local file dcs.yml contains the DCS configuration:


for i in pg1 pg2 pg3; do ssh $i 'apt -y install postgresql-common'; done
for i in pg1 pg2 pg3; do ssh $i 'sed -i "s/^#create_main_cluster = true/create_main_cluster = false/" /etc/postgresql-common/createcluster.conf'; done
for i in pg1 pg2 pg3; do ssh $i 'apt -y install patroni postgresql'; done
for i in pg1 pg2 pg3; do scp ./dcs.yml $i:/etc/patroni; done
for i in pg1 pg2 pg3; do ssh @$i 'pg_createconfig_patroni 11 test' && systemctl start patroni@11-test'; done

Afterwards, you can get the state of the Patroni cluster via

ssh pg1 'patronictl -c /etc/patroni/11-patroni.yml list'
+---------+--------+------------+--------+---------+----+-----------+
| Cluster | Member |    Host    |  Role  |  State  | TL | Lag in MB |
+---------+--------+------------+--------+---------+----+-----------+
| 11-test |  pg1   | 10.0.3.111 | Leader | running |  1 |           |
| 11-test |  pg2   | 10.0.3.41  |        | stopped |    |   unknown |
| 11-test |  pg3   | 10.0.3.46  |        | stopped |    |   unknown |
+---------+--------+------------+--------+---------+----+-----------+

Leader election has happened and pg1 has become the primary. It created its instance with the Debian-specific pg_createcluster_patroni program that runs pg_createcluster in the background. Then the two other nodes clone from the leader using the pg_clonecluster_patroni program which sets up an instance using pg_createcluster and then runs pg_basebackup from the primary. After that, all nodes are up and running:

+---------+--------+------------+--------+---------+----+-----------+
| Cluster | Member |    Host    |  Role  |  State  | TL | Lag in MB |
+---------+--------+------------+--------+---------+----+-----------+
| 11-test |  pg1   | 10.0.3.111 | Leader | running |  1 |         0 |
| 11-test |  pg2   | 10.0.3.41  |        | running |  1 |         0 |
| 11-test |  pg3   | 10.0.3.46  |        | running |  1 |         0 |
+---------+--------+------------+--------+---------+----+-----------+

The well-known Debian postgresql-common commands work as well:

ssh pg1 'pg_lsclusters'
Ver Cluster Port Status Owner    Data directory                 Log file
11  test    5432 online postgres /var/lib/postgresql/11/test    /var/log/postgresql/postgresql-11-test.log

Failover Behaviour

If the primary is abruptly shutdown, its leader token will expire after a while and Patroni will eventually initiate failover and a new leader election:

+---------+--------+-----------+------+---------+----+-----------+
| Cluster | Member |    Host   | Role |  State  | TL | Lag in MB |
+---------+--------+-----------+------+---------+----+-----------+
| 11-test |  pg2   | 10.0.3.41 |      | running |  1 |         0 |
| 11-test |  pg3   | 10.0.3.46 |      | running |  1 |         0 |
+---------+--------+-----------+------+---------+----+-----------+
[...]
+---------+--------+-----------+--------+---------+----+-----------+
| Cluster | Member |    Host   |  Role  |  State  | TL | Lag in MB |
+---------+--------+-----------+--------+---------+----+-----------+
| 11-test |  pg2   | 10.0.3.41 | Leader | running |  2 |         0 |
| 11-test |  pg3   | 10.0.3.46 |        | running |  1 |         0 |
+---------+--------+-----------+--------+---------+----+-----------+
[...]
+---------+--------+-----------+--------+---------+----+-----------+
| Cluster | Member |    Host   |  Role  |  State  | TL | Lag in MB |
+---------+--------+-----------+--------+---------+----+-----------+
| 11-test |  pg2   | 10.0.3.41 | Leader | running |  2 |         0 |
| 11-test |  pg3   | 10.0.3.46 |        | running |  2 |         0 |
+---------+--------+-----------+--------+---------+----+-----------+

The old primary will rejoin the cluster as standby once it is restarted:

+---------+--------+------------+--------+---------+----+-----------+
| Cluster | Member |    Host    |  Role  |  State  | TL | Lag in MB |
+---------+--------+------------+--------+---------+----+-----------+
| 11-test |  pg1   | 10.0.3.111 |        | running |    |   unknown |
| 11-test |  pg2   | 10.0.3.41  | Leader | running |  2 |         0 |
| 11-test |  pg3   | 10.0.3.46  |        | running |  2 |         0 |
+---------+--------+------------+--------+---------+----+-----------+
[...]
+---------+--------+------------+--------+---------+----+-----------+
| Cluster | Member |    Host    |  Role  |  State  | TL | Lag in MB |
+---------+--------+------------+--------+---------+----+-----------+
| 11-test |  pg1   | 10.0.3.111 |        | running |  2 |         0 |
| 11-test |  pg2   | 10.0.3.41  | Leader | running |  2 |         0 |
| 11-test |  pg3   | 10.0.3.46  |        | running |  2 |         0 |
+---------+--------+------------+--------+---------+----+-----------+

If a clean rejoin is not possible due to additional transactions on the old timeline the old primary gets re-cloned from the current leader. In case the data is too large for a quick re-clone, pg_rewind can be used. In this case a password needs to be set for the postgres user and regular database connections (as opposed to replication connections) need to be allowed between the cluster nodes.

Creation of additional Instances

It is also possible to create further clusters with pg_createconfig_patroni, one can either assign a PostgreSQL port explicitly via the --port option, or let pg_createconfig_patroni assign the next free port as is known from pg_createcluster:

for i in pg1 pg2 pg3; do ssh $i 'pg_createconfig_patroni 11 test2 && systemctl start patroni@11-test2'; done
ssh pg1 'patronictl -c /etc/patroni/11-test2.yml list'
+----------+--------+-----------------+--------+---------+----+-----------+
| Cluster  | Member |       Host      |  Role  |  State  | TL | Lag in MB |
+----------+--------+-----------------+--------+---------+----+-----------+
| 11-test2 |  pg1   | 10.0.3.111:5433 | Leader | running |  1 |         0 |
| 11-test2 |  pg2   |  10.0.3.41:5433 |        | running |  1 |         0 |
| 11-test2 |  pg3   |  10.0.3.46:5433 |        | running |  1 |         0 |
+----------+--------+-----------------+--------+---------+----+-----------+

Ansible Playbook

In order to easily deploy a 3-node Patroni cluster we have created an Ansible playbook on Github. It automates the installation and configuration of PostgreSQL and Patroni on the three nodes, as well as the DCS server on a fourth node.

Questions and Help

Do you have any questions or need help? Feel free to write to info@credativ.com.

This week version 1.3 of our PostgreSQL^® appliance Elephant Shed was released.

The highlight of the new version is support for Red Hat Enterprise Linux 7 and CentOS 7. As is already the case for Debian, the appliance heavily relies on the postgresql-common infrastructure which was previously ported to RPM.
The well-known PostgreSQL^® RPM packages from yum.postgresql.org are integrated into the system via pg_createcluster and can be administrated from the Elephant Shed web interface.

All other Elephant Shed components like pgAdmin4, Grafana, Prometheus, pgbackrest, Cockpit or shellinabox work in the same way as in the Debian version of the appliance. Only the SELinux functionality has to be deactivated in order to run pgAdmin4 and shellinabox as their packages do not support this.

Besides the port to RPM the appliance infrastructure was updated. The Prometheus node-exporter is now available in version 0.16 in which many metric names were adjusted to the Prometheus naming scheme. The Grafana dashboard was updated accordingly. The Apache configuration was switched from authnz_pam to authnz_external as the former is not available on CentOS and stable functionality could not longer be guaranteed on Debian Buster.

The next items on the Elephant-Shed roadmap are the integration of the REST-API in order to control particular components, as well as multi-host support so that several Elephant-Shed instances can be controlled simultaneously.
An overhaul of the user interface is planned as well.

The updated packages are available for download at packages.credativ.com. If Elephant-Shed was installed already, the updates are provided via apt as usual.

The open-source PostgreSQL^® appliance Elephant-Shed is developed by credativ and is increasingly popular, as the most important compontents for the administration of PostgreSQL^® servers are already included. Adjustments and extensions can be done at any time.

Comprehensible technical support for Elephant-Shed is offered by credativ including guaranteed service-level agreements and optional 365 days and 24/7 hours.

Today PostgreSQL^® version 11 was released. The new release brings improvements in many areas.

Since version 9.6 query plans can be executed on multiple CPU cores in parallel, this is now supported for other plan types, especially the creation of B-tree indexes. Sequential scans and UNION queries have also been improved.

Brand new is the possibility to optimize queries via Just-in-Time Compilation (JIT). When PostgreSQL^® is compiled, the source code is stored as LLVM bit code. When executing a query whose planner cost exceeds a limit, libllvm then translates this bit code into native machine code specifically for that query. Since all used data types etc. are known in advance, the machine code eliminates all case distinctions that are generally necessary. The feature is disabled by default and can be enabled with “SET jit = on;”. In PostgreSQL^® 12 it should then be active by default.

Until now only functions could be defined on SQL level. New are now procedures which can manage BEGIN/COMMIT independently. Batch operations can now be completely transferred to the database side.

Table Partitioning section has been further improved to support hash partitioning. Integration with partitions using postgres_fdw has also been improved. It is now possible to create a default partition that holds data that does not fall into any of the existing partitions.

This Amp goes to 11!

Other improvements include the ability to add columns with default values to tables without having to rewrite the table completely. “Covering Indexes” allows more index accesses than “Index Only Scan”. Window functions now also support the RANGE and GROUPS keywords.

More information can be found in the release announcement of the PostgreSQL^® project.

Next week the PGConf.EU will take place in Lisbon. We from credativ are present with a booth and celebrate the PostgreSQL^® release with our “This Amp goes to 11” T-shirt.

The shirt is available for free at our booth. If you don’t attend the conference, you can order a shirt. The profits will be donated to the PostgreSQL^® project.

The second beta release of PostgreSQL 11 (which is now feature frozen) has been released recently. Time to look at some of the improvements that credativ has contributed in the area of checksums and backups.

Checksum verification during base backups

Since version 9.3 it is possible to enable checksums for the underlying storage of tables and indices during instance creation. Those checksums raise SQL errors if bit errors are encountered in their respective data pages, which allows for early discovery of storage issues. However those checksums are only verified if queries access the corrupted page. Running an explicit check is only possible with the forthcoming pg_verify_checksums application from version 11, however, it requires an offline instance in order to work.

Our change allows for verification of checksums during base backups. This is a good opportunity to verify the checksum consistency as all data blocks needs to be read during a base backup anyway. Checksum failures are logged as warnings (rather than errors) in order not to abort the whole base backup when they occur. The commonly used pg_basebackup application was extended with the –no-verify-checksums option which disables the verification.

Replication slots during base backups

The second change concerns the handling of replication slots by pg_basebackup during the setup of standby servers. Replication slots allow a primary to reserve the required transaction logs for the standby associated with the slot, even if it is temporarily down. Previous releases already allowed using a replication slot with pg_basebackup, however, this slot had to be created manually beforehand. Our change adds the new option -C or –create-slot and allows the on-shot creation of a standby clone including the usage of replication slots:

$ pg_basebackup -v -h primary.lan -D data --slot=standby1 --create-slot --write-recovery-conf
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/1D000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created replication slot "standby1"
pg_basebackup: write-ahead log end point: 0/1D0000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
$ cat data2/recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=postgres passfile=''/var/lib/postgresql/.pgpass'' host=primary.lan
  port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
primary_slot_name = 'standby1'

Afterwards the standby just has to be started and will replicate automatically.

In addition, several other small improvements to pg_basebackup and its testsuite were done by us.

Parallel dump to /dev/null

A patch that did not make it into the release is presented here nevertheless: parallel pg_dump to /dev/null in the directory format. The reason for it is the common usage of pg_dump to check for errors in a PostgreSQL instance where /dev/null/ is used a target in order not to use additional disk space. The problem is that /dev/null can only be used in the custom format which does not allow dumping in parallel. The directory format supports multiple concurrent processes but cannot use /dev/null as target as it is not a directory. Our patch adds support for /dev/null as a target when using the directory format.

The reasons for the rejection were not technical issues with the patch but the fact that pg_dump is not a diagnostics tool and no special support for that should be included. Nevertheless, the submitted patch works and is being used by our clients. Versions of the patch for PostgreSQL 9.3, 9.4, 9.5, 9.6 and 10 are available.

The credativ PostgreSQL^® Competence Center has released a project roadmap for the PostgreSQL^® Appliance Elephant Shed.

Elephant Shed is a freely available PostgreSQL^® solution developed by credativ, which combines management, monitoring and administration into one system.

The project roadmap for 2018 includes following points:

Q2 2018: Support for Ubuntu 18.04
Q3 2018: Support for CentOS 7

An additional planned feature is the implementation of REST API to control individual components. REST stands for REpresentational State Transfer and is an application programming interface based on the behavior of the World Wide Web. Specifically the PostgreSQL^® database and the backup via pgBackRest should be addressed.

Multi host support is also planned. A central control of several Elephant Shed instances is thus possible.

In order to make Elephant Shed even more user friendly, various configuration parameters of the web interface are going to be adjusted.

The project roadmap is of course also constantly being worked on. On GitHub you can leave us your feedback at any time.

We would like to take this opportunity to thank all users and testers, and look forward to further development of the project!

For further information please visit elephant-shed.io and GitHub.

This article was originally written by Philip Haas

Elephant Shed is now available as a Vagrant box. This makes it very easy to test and try out the PostgreSQL^® appliance.

Vagrant is an Open Source tool for creating portable virtual software environments. By being fully script controlled Vagrant makes it easy to generate virtual machines, in which a software component is installed for testing purposes. Vagrant itself is only the manager, whereas various backends such as Virtualbox or cloud providers can be used for the actual virtualization.

For the development of Elephant Shed we have relied on Vagrant from the very beginning. This “box” is now also available in the Vagrant Cloud.

To use this box Vagrant and VirtualBox must be installed. The host operating system hereby is irrelevant (Linux/MacOS/Windows), but inside the box runs Debian Stretch. The box is then automatically downloaded.

vagrant init credativ/elephant-shed
vagrant up

This creates a virtual machine where Elephant Shed runs in a VirtualBox on your computer.

Default-User: admin
Default-Passwort: admin
The web interface listens to port 4433:https://localhost:4433
PostgreSQL^® listens to port 55432: psql -h localhost -p 55432 -U admin

We are looking forward to your feedback!

Further information can be found on our Elephant Shed project page and on Github as well.

The pgAudit extension allows for fine-grained auditing of access to a PostgreSQL^® database. Among other things, it is an important component of the recently published PostgreSQL® STIG (“Security Technical Implementation Guide”) by the US Department of Defense. Debian packages created by us as part of the pkg-postgresql project are now available in the official PostgreSQL^® APT repository, as well as the developer branches of Debian and Ubuntu.

While PostgreSQL^® offers configurable logging by itself, the audit messages generated by pgAudit go significantly further and also cover most compliance guidelines. They are also written to the PostgreSQL^® log, but in a uniform format and, in contrast to conventional logging via e.g. log_statement, they are deterministic and comprehensive on the one hand, and targeted on the other. For example, it was previously possible to log executed SELECT commands to detect unwanted access to a specific table, but since this then applies to all SELECT commands, it is not manageable in practice. With pgAudit’s so-called object audit logging, it is possible to write only access to specific tables to the audit log by assigning appropriate permissions to an auditor role, e.g.:

Prepare Database

CREATE ROLE AUDITOR;
SET pgaudit.role = 'auditor';
CREATE TABLE account
(
 id INT,
 name TEXT,
 password TEXT,
 description TEXT
);
GRANT SELECT (password) ON public.account TO auditor;

Abfragen, die die Spalte password betreffen (und nur solche) erscheinen nun im Audit-Log:

SELECT id, name FROM account;
SELECT password FROM account;
AUDIT: OBJECT,1,1,READ,SELECT,TABLE,public.account,SELECT password FROM account

Log content

The first field is either SESSION or OBJECT for the corresponding audit logging type. The two subsequent fields are statement IDs, the fourth field is the query class (READ, WRITE, ROLE, DDL, etc.), followed by the command type and (if applicable) the object type and name; the last field is finally the command actually executed. Crucial for auditing is that the commands actually executed are logged, so that circumvention through deliberate obfuscation is not possible. An example of this from the pgAudit documentation is:

AUDIT: SESSION,1,1,FUNCTION,DO,,,"DO $$
BEGIN
 EXECUTE 'CREATE TABLE import' || 'ant_table (id INT)';
END $$;"
AUDIT: SESSION,1,2,DDL,CREATE TABLE,TABLE,public.important_table,CREATE TABLE important_table (id INT)

As can be seen from the command used (lines 1-4), an attempt is made here to prevent the table name important_table from appearing in the log file, but pgAudit reliably logs the table name (field 7) as well as the actually executed CREATE TABLE statement. However, in the case of the conventional PostgreSQL^® log, this attempt is successful; only the entered command is logged here:

LOG: statement: DO $$
BEGIN
 EXECUTE 'CREATE TABLE import' || 'ant_table (id INT)';
END $$;

The pgAudit extension can, in principle, be used from PostgreSQL^® version 9.5 onwards. While the version 1.1 of pgAudit packaged by us officially only supports 9.6, the created Debian package can also be used with 9.5 thanks to an additional patch. Therefore, packages for both 9.6 and 9.5 are available on apt.postgresql.org for all supported Debian and Ubuntu versions.

Informix FDW

In the context of many Informix migrations, credativ GmbH developed a Foreign Data Wrapper (FDW) for Informix databases. This supports migration, as well as the integration of PostgreSQL^® into existing Informix installations, to facilitate data exchange and processing. The Informix FDW supports all PostgreSQL® versions with SQL/MED implementation, i.e., from 9.1 up to the current 9.3. Furthermore, with PostgreSQL^® 9.3, the Informix FDW also supports write operations on Foreign Tables.

Installation

The installation requires at least an existing IBM CSDK for Informix. The CSDK can be obtained directly via download. In the following, we assume that a CSDK installation exists in /opt/IBM/informix. The path may vary depending on the installation. The Informix FDW sources can be cloned directly from the credativ GitHub repository, or a release tarball can be acquired (version 0.2.1 at the time of this article). The following installation description assumes a system with CentOS 6, but can be performed with variations on any other Linux system.

% wget 'https://github.com/credativ/informix_fdw/archive/REL0_2_1.tar.gz'
% tar -xzf REL0_2_1.tar.gz

Subsequently, the FDW can be built by specifying the CSDK installation folder:

% export INFORMIXDIR=/opt/IBM/informix
% export PATH=$INFORMIXDIR/bin:$PATH
% PG_CONFIG=/usr/pgsql-9.3/bin/pg_config make

In the listing, the environment variable PG_CONFIG was explicitly set to the PostgreSQL^® 9.3 installation. This refers to an installation of PGDG-RPM packages on CentOS 6, where pg_config is located outside the standard path. Furthermore, the postgresql93-devel package must be installed. If all paths have been set correctly, the FDW can be installed.

% PG_CONFIG=/usr/pgsql-9.3/bin/pg_config make install

This installs all necessary libraries. For these to function correctly and for the required Informix libraries to be found when loading into the PostgreSQL^® instance, the latter must still be configured in the system’s dynamic linker. Based on the previously mentioned CentOS 6 platform, this is most easily done via an additional configuration file in /etc/ld.so.conf.d (this requires root privileges in this case!):

% vim /etc/ld.so.conf.d/informix.conf

This file should contain the paths to the required Informix libraries:

/opt/IBM/informix/lib
/opt/IBM/informix/lib/esql

Subsequently, the dynamic linker’s cache must be refreshed:

% ldconfig

The Informix FDW installation should now be ready for use.

Configuration

To use the Informix FDW in a database, the FDW must be loaded into the respective database. The FDW is a so-called EXTENSION, and these are loaded with the CREATE EXTENSION command:

#= CREATE EXTENSION informix_fdw;
CREATE EXTENSION

To connect an Informix database via the Informix FDW, one first needs a definition of how the external data source should be accessed. For this, a definition is created with the CREATE SERVER command, containing the information required by the Informix FDW. It should be noted that the options passed to a SERVER directive depend on the respective FDW. For the Informix FDW, at least the following parameters are required:

informixdir – CSDK installation directory
informixserver – Name of the server connection configured via the Informix CSDK (see Informix documentation, or $INFORMIXDIR/etc/sqlhosts for details)

The SERVER definition is then created as follows:

=# CREATE SERVER centosifx_tcp
  FOREIGN DATA WRAPPER informix_fdw
  OPTIONS(informixdir '/opt/IBM/informix',
  informixserver 'ol_informix1210');

The informixserver variable is the instance name of the Informix instance; here, one simply needs to follow the specifications of the Informix installation. The next step now creates a so-called Usermapping, with which a PostgreSQL^® role (or user) is bound to the Informix access data. If this role is logged in to the PostgreSQL^® server, it automatically uses the specified login information via the Usermapping to log in to the Informix server centosifx_tcp.

=# CREATE USER MAPPING FOR bernd
  SERVER centosifx_tcp
  OPTIONS(username 'informix',
  password 'informix')

Here too, the parameters specified in the OPTIONS directive depend on the respective FDW. Now, so-called Foreign Tables can be created to integrate Informix tables. In the following example, a simple relation with street names from the Informix server ol_informix1210 is integrated into the PostgreSQL® instance. It is important that the conversions of data types from Informix to PostgreSQL® occur with compatible data types. First, the definition of the Informix table as it was created in the Informix instance:

CREATE TABLE osm_roads(id bigint primary key,
  name varchar(255),
  highway varchar(32),
  area varchar(10));

The definition in PostgreSQL^® should be analogous; if data types cannot be converted, an error will be thrown when creating the Foreign Table. When converting string types such as varchar, it is also important that a corresponding target conversion is defined when creating the Foreign Table. The Informix FDW therefore requires the following parameters when creating a Foreign Table:

client_locale – Client locale (should be identical to the server encoding of the PostgreSQL^® instance)
db_locale – Database server locale
table or query – The table or SQL query on which the Foreign Table is based. A Foreign Table based on a query cannot execute modifying SQL operations.

#= CREATE FOREIGN TABLE osm_roads(id bigint,
  name varchar(255),
  highway varchar(32),
  area varchar(10))
  SERVER centosifx_tcp
  OPTIONS(table 'osm_roads',
  database 'kettle',
  client_locale 'en_US.utf8',
  db_locale 'en_US.819');

If you are unsure what the default locale of the PostgreSQL ^® database looks like, you can most easily display it via SQL:

=# SELECT name, setting FROM pg_settings WHERE name IN ('lc_ctype', 'lc_collate');
  name | setting  
------------+------------
  lc_collate | en_US.utf8
  lc_ctype | en_US.utf8
(2 rows)

The locale and encoding settings should definitely be chosen to be as consistent as possible. If all settings have been chosen correctly, the data of the osm_roads table can be accessed directly via PostgreSQL^®:

=# SELECT id, name FROM osm_roads WHERE name = 'Albert-Einstein-Straße';
  id | name  
------+------------------------
  1002 | Albert-Einstein-Straße
(1 row)

Depending on the number of tuples in the Foreign Table, selection can take a significant amount of time, as with incompatible WHERE clauses, the entire data set must first be transmitted to the PostgreSQL^® server via a full table scan. However, the Informix FDW supports Predicate Pushdown under certain conditions. This refers to the ability to transmit parts of the WHERE condition that concern the Foreign Table to the remote server and apply the filter condition there. This saves the transmission of inherently useless tuples, as they would be filtered out on the PostgreSQL^® server anyway. The above example looks like this in the execution plan:

#= EXPLAIN (ANALYZE, VERBOSE)
  SELECT id, name  
  FROM osm_roads WHERE name = 'Albert-Einstein-Straße';
  QUERY PLAN  
-----------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.osm_roads (cost=2925.00..3481.13 rows=55613 width=24) (actual time=85.726..4324.341 rows=1 loops=1)
  Output: id, name
  Filter: ((osm_roads.name)::text = 'Albert-Einstein-Straße'::text)
  Rows Removed by Filter: 55612
  Informix costs: 2925.00
  Informix query: SELECT *, rowid FROM osm_roads
  Total runtime: 4325.351 ms

The Filter display in this execution plan shows that a total of 55612 tuples were filtered out, and ultimately only a single tuple was returned because it met the filter condition. The problem here lies in the implicit cast that PostgreSQL^® applies in the WHERE condition to the string column name. Current versions of the Informix FDW do not yet account for this. However, predicates can be transmitted to the Foreign Server if they correspond to integer types, for example:

#= EXPLAIN (ANALYZE, VERBOSE) SELECT id, name FROM osm_roads WHERE id = 1002;
  QUERY PLAN  
--------------------------------------------------------------------------------------------------------------------------
  Foreign Scan on public.osm_roads (cost=2925.00..2980.61 rows=5561 width=40) (actual time=15.849..16.410 rows=1 loops=1)
  Output: id, name
  Filter: (osm_roads.id = 1002)
  Informix costs: 2925.00
  Informix query: SELECT *, rowid FROM osm_roads WHERE (id = 1002)
  Total runtime: 17.145 ms
(6 rows)

Data Modification

With PostgreSQL^® 9.3, the Informix FDW also supports manipulating (UPDATE), deleting (DELETE), and inserting (INSERT) data into the Foreign Table. Furthermore, transactions initiated in the PostgreSQL® client implicitly interact with transactions on the Foreign Server (provided that the Informix database was created with logging and is therefore transactional). The following example demonstrates this capability:

=# BEGIN;
BEGIN
 
*=# INSERT INTO osm_roads(id, name, highway) VALUES(55614, 'Hans-Mustermann-Straße', 'no');
INSERT 0 1
 
*=# SAVEPOINT A;
SAVEPOINT
 
*=# UPDATE osm_roads SET area = 'Nordrhein-Westfalen' WHERE id = 55614;
ERROR: value too long for type character varying(10)
 
!=# ROLLBACK TO A;
ROLLBACK
 
*=# UPDATE osm_roads SET area = 'NRW' WHERE id = 55614;
UPDATE 1
 
*=# COMMIT;
COMMIT
 
=# SELECT * FROM osm_roads WHERE id = 55614;
  id | name | highway | area  
-------+------------------------+---------+------
  55614 | Hans-Mustermann-Straße | no | NRW
(1 row)

The example first creates a new record after starting a transaction. Subsequently, a SAVEPOINT is set to secure the current state of this transaction. In the next step, the new record is modified because the locality of the new record was forgotten to be specified. However, since only country codes are allowed, this fails due to the excessive length of the new identifier, and the transaction becomes invalid. Via the SAVEPOINT, the transaction is rolled back to the last created savepoint, and the Informix FDW also implicitly rolls back to this SAVEPOINT on the Foreign Server. Subsequently, the correct country code can be entered within this transaction. After the COMMIT, the Informix FDW also confirms the transaction on the Foreign Server, and the record has been correctly entered.

Summary

Foreign Data Wrappers are a very powerful and flexible tool for integrating PostgreSQL^® into heterogeneous database landscapes. These interfaces are by no means limited to purely relational data sources. Furthermore, with the support of modifying SQL queries (DML), the FDW API also enables the integration of writable data sources into PostgreSQL^® databases.

This article was originally written by Bernd Helmle.

PostgreSQL^® 9.3 is here! The new major version brings many improvements and new functions. This article introduces some of the interesting innovations.

Improved Locking with Foreign Keys

Foreign keys are indispensable for the referential integrity of the data. However, up to and including PostgreSQL^® 9.2, these also caused locking problems under certain conditions when UPDATE was performed on columns with foreign keys. Especially with overlapping updates of several transactions via a foreign key, deadlocks can even occur. The following example is problematic in PostgreSQL^® 9.2, exemplified by this simple data model:

CREATE TABLE parent(id integer, value text);
ALTER TABLE parent ADD PRIMARY KEY(id);
CREATE TABLE child(id integer, parent_id integer REFERENCES parent(id), value text);
INSERT INTO parent VALUES(1, 'bob');
INSERT INTO parent VALUES(2, 'andrea');

Two transactions, called Session 1 and Session 2, are started simultaneously in the database:

Session 1

BEGIN;
INSERT INTO child VALUES(1, 1, 'abcdef');
UPDATE parent SET value = 'thomas' WHERE id = 1;

Session 2

BEGIN;
INSERT INTO child VALUES(2, 1, 'ghijkl');
UPDATE parent SET value = 'thomas' WHERE id = 1;

Session 1 and Session 2 both execute the INSERT successfully, but the UPDATE blocks in Session 1 because the INSERT in Session 2 holds a so-called SHARE LOCK on the tuple with the foreign key. This conflicts with the lock that the UPDATE wants to have on the foreign key. Now Session 2 tries to execute its UPDATE, but this in turn creates a conflict with the UPDATE from Session 1; A deadlock has occurred and is automatically resolved in PostgreSQL® 9.2. In this case, however, the transaction in Session 2 is aborted:

ERROR: deadlock detected
DETAIL: Process 88059 waits for ExclusiveLock on tuple (0,1) of relation 1144033 of database 1029038; blocked by process 88031.
Process 88031 waits for ShareLock on transaction 791327; blocked by process 88059.

In PostgreSQL^® 9.3, there is now a significantly refined locking mechanism for updates on rows and tables with foreign keys, which alleviates the problem mentioned. Here, tuples that do not represent a targeted UPDATE to a foreign key (i.e. the value of a foreign key is not changed) are locked with a weaker lock (FOR NO KEY UPDATE). For the simple checking of a foreign key, PostgreSQL^® also uses the new FOR KEY SHARE lock, which does not conflict with it. In the example shown, this prevents a deadlock; the UPDATE in Session 1 is executed first, and the conflicting UPDATE in Session 2 must wait until Session 1 successfully confirms or rolls back the transaction.

Parallel pg_dump

With the new command line option -j, pg_dump has the ability to back up tables and their data in parallel. This function can significantly speed up a dump of very large databases, as tables can be backed up simultaneously. This only works with the Directory output format (-Fd) of pg_dump. These dumps can then also be restored simultaneously with multiple restore processes using pg_restore.

Updatable Foreign Data Wrapper and postgres_fdw

With PostgreSQL^® 9.3, the API for Foreign Data Wrapper (FDW) for DML commands has been extended. This enables the implementation of updatable FDW modules. At the same time, the Foreign Data Wrapper for PostgreSQL® (postgres_fdw) was integrated as an extension. This now allows transparent access to remote PostgreSQL® instances. Tables appear in the database as local tables and can easily be used, for example, in complex SQL constructs such as JOINs or Views. FDWs will be explained in more detail in a later article at this point.

Event Trigger

A feature that users have been requesting for some time are Event Trigger for DDL (Data Definition Language) operations, such as ALTER TABLE or CREATE TABLE. This enables automatic reactions to certain events via DDL, such as logging certain actions. Currently, actions for ddl_command_start, sql_drop and ddl_command_end are supported. Trigger functions can be created with C or PL/PgSQL. The following example for sql_drop, for example, prevents the deletion of tables:

CREATE OR REPLACE FUNCTION del_event_func() RETURNS event_trigger AS $$
DECLARE
 v_item record;
BEGIN
 FOR v_item IN SELECT * FROM pg_event_trigger_dropped_objects()
 LOOP
 RAISE EXCEPTION 'deletion of object %.% forbidden', v_item.schema_name, v_item.object_name;
 END LOOP;
END;
$$ LANGUAGE plpgsql;
 
CREATE EVENT TRIGGER delete_table_obj ON sql_drop WHEN tag IN ('drop table') EXECUTE PROCEDURE del_event_func();
 
DROP TABLE child ;
FEHLER: deletion of object public.child forbidden

Checksums in tables

With PostgreSQL^® 9.3, it is now possible in environments in which the reliability of the storage solutions used cannot be guaranteed (for example, certain cloud environments) to initialize the database cluster with checksums. PostgreSQL^® organizes tuples in blocks that are 8KB in size by default. These form the smallest unit with which the database works on the storage system. Checksums now provide these blocks with additional information in order to detect storage errors such as corrupt block headers or tuple headers at an early stage. Checksums cannot be activated subsequently, but require the initialization of the physical database directory with initdb and the new command line parameter –data-checksums. This then applies to all database objects such as tables or indexes and has an impact on the speed.

Materialized Views

The new PostgreSQL^® version 9.3 now also contains a basic implementation for Materialized Views. Normal views in PostgreSQL® are not materialized objects in the sense of a table. In a figurative sense, views must be imagined as a kind of macro that PostgreSQL^® applies to the underlying view. For example, a SELECT * FROM <view> is then rewritten to the actual, arbitrarily complex SELECT. However, this has the consequence that with large amounts of data, the result must be planned, executed and materialized again and again. With Materialized Views, views can be created that directly hold the materialized result set. However, the implementation in PostgreSQL^® 9.3 is still very generic. The REFRESH MATERIALIZED VIEW command also blocks all access to the Materialized View.

These are just a few examples from the long list of innovations that have found their way into the new PostgreSQL^® version. With JSON, improvements in streaming replication and some improvements in the configuration such as shared memory usage, many other new features have been added. For those who are interested in detailed explanations of all the new features, the Release Notes are highly recommended. RPM packages for RedHat, CentOS, Scientific Linux and Fedora are available at http://yum.postgresql.org. The PGAPT repository is available for Debian. Exact instructions can be found at http://wiki.postgresql.org/wiki/Apt