planetpostgresql Archives - credativ®

Moodle is a popular Open Source online learning platform. Especially since the beginning of the COVID-19 pandemic the importance of Moodle for schools and universities has further increased. In some states in Germany all schools had to switch to Moodle and other platforms like BigBlueButton in the course of a few days. This leads to scalability problems if suddenly several tens of thousands of pupils need to access Moodle.
Besides scaling the Moodle application itself, the database needs to be considered as well. One of the database options for Moodle is PostgreSQL. In this blog post, we present load-balancing options for Moodle using PostgreSQL.

High-Availability via Patroni

An online learning platform can be considered critical infrastructure from the point of view of the educational system and should be made highly available, in particular the database. A good solution for PostgreSQL is Patroni, we reported on its Debian-integration in the past.

In short, Patroni uses a distributed consensus store (DCS) to elect a leader from a typically 3-node cluster or initiate a failover and elect a new leader in the case of a leader failure, without entering a split-brain scenario. In addition, Patroni provides a REST API used for communication among nodes and from the patronictl program, e.g. to change the Postgres configuration online on all nodes or to initiate a switchover.

Client-solutions for high availability

From Moodle’s perspective, however, it must additionally be ensured that it is connected to the leader, otherwise no write transactions are possible. Traditional high-availability solutions such as Pacemaker use virtual IPs (VIPs) here, which are pivoted to the new primary node in the event of a failover. For Patroni there is the vip-manager project instead, which monitors the leader key in the DCS and sets or removes cluster VIP locally. This is also integrated into Debian as well.

An alternative is to use client-side failover based on PostgreSQL’s libpq library. For this, all cluster members are listed in the connection string and the connection option target_session_attrs=read-write is added. Configured this way, if a connection is broken, the client will try to reach the other nodes until a new primary is found.

Another option is HAProxy, a highly scalable TCP/HTTP load balancer. By performing periodic health checks on Patroni‘s REST API of each node, it can determine the current leader and forward client queries to it.

Moodle database configuration

Moodle’s connection to a PostgreSQL database is configured in config.php, e.g. for a simple stand-alone database:

$CFG->dbtype    = 'pgsql';
$CFG->dblibrary = 'native';
$CFG->dbhost    = '192.168.1.1';
$CFG->dbname    = 'moodle';
$CFG->dbuser    = 'moodle';
$CFG->dbpass    = 'moodle';
$CFG->prefix    = 'mdl_';
$CFG->dboptions = array (
  'dbport' => '',
  'dbsocket' => ''
);

The default port 5432 is used here.

If streaming replication is used, the standbys can additionally be defined as readonly and assigned to an own database user (which only needs read permissions):

$CFG->dboptions = array (
[...]
  'readonly' => [
    'instance' => [
      [
      'dbhost' => '192.168.1.2',
      'dbport' =>  '',
      'dbuser' => 'moodle_safereads',
      'dbpass' => 'moodle'
      ],
      [
      'dbhost' => '192.168.1.3',
      'dbport' =>  '',
      'dbuser' => 'moodle_safereads',
      'dbpass' => 'moodle'
      ]
    ]
  ]
);

Failover/load balancing with libpq

If a highly available Postgres cluster is used with Patroni, the primary, as described above, can be switched to prevent loss of data or shutdown of the system, in case of a failover or switchover incident. Moodle does not provide a way to set generic database options here and thus setting target_session_attrs=read-write directly is not possible. Therefore we developed a patch for this and implemented it in the Moodle tracker. This allows the additional option 'dbfailover' => 1, in the $CFG->dboptions array, which adds the necessary connection option target_session_attrs=read-write. A customized config.php would look like this:

$CFG->dbtype    = 'pgsql';
$CFG->dblibrary = 'native';
$CFG->dbhost    = '192.168.1.1,192.168.1.2,192.168.1.3';
$CFG->dbname    = 'moodle';
$CFG->dbuser    = 'moodle';
$CFG->dbpass    = 'moodle';
$CFG->prefix    = 'mdl_';
$CFG->dboptions = array (
  'dbfailover' => 1,
  'dbport' => '',
  'dbsocket' => '',
  'readonly' => [
    'instance' => [
      [
      'dbhost' => '192.168.1.1',
      'dbport' =>  '',
      'dbuser' => 'moodle_safereads',
      'dbpass' => 'moodle'
      ],
      [
      'dbhost' => '192.168.1.2',
      'dbport' =>  '',
      'dbuser' => 'moodle_safereads',
      'dbpass' => 'moodle'
      ],
      [
      'dbhost' => '192.168.1.3',
      'dbport' =>  '',
      'dbuser' => 'moodle_safereads',
      'dbpass' => 'moodle'
      ]
    ]
  ]
);

Failover/load balancing with HAProxy

If HAProxy is to be used instead, then $CFG->dbhost must be set to the HAProxy host e.g. 127.0.0.1 in case HAProxy is running locally on the Moodle server(s). Moreover a second port (e.g. 65432) can be defined for read queries, which is configured as readonly in $CFG->dboptions, same as the streaming replication standby above. The config.php would then look like this:

$CFG->dbtype    = 'pgsql';
$CFG->dblibrary = 'native';
$CFG->dbhost    = '127.0.0.1';
$CFG->dbname    = 'moodle';
$CFG->dbuser    = 'moodle';
$CFG->dbpass    = 'moodle';
$CFG->prefix    = 'mdl_';
$CFG->dboptions = array (
  'dbport' => '',
  'dbsocket' => '',
  'readonly' => [
    'instance' => [
      'dbhost' => '127.0.0.1',
      'dbport' =>  '65432',
      'dbuser' => 'moodle_safereads',
      'dbpass' => 'moodle'
    ]
  ]
);

The HAProxy configuration file haproxy.cfg can look like the following example:

global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres_write
    bind *:5432
    mode tcp
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 3 on-marked-down shutdown-sessions
    server pg1 192.168.1.1:5432 maxconn 100 check port 8008
    server pg2 192.168.1.2:5432 maxconn 100 check port 8008
    server pg3 192.168.1.3:5432 maxconn 100 check port 8008

HAProxy expects incoming write connections (postgres_write) on port 5432 and forwards them to port 5432 of the cluster members. The primary is determined by an HTTP check on port 8008 (the default Patroni REST API port); Patroni returns status 200 here for the primary and status 503 for standbys.

For read queries (postgres_read), it must be decided whether the primary should also serve read-only queries or not. If this is the case, a simple Postgres check (pgsql-check) can be used; however, this may lead to entries in the PostgreSQL log regarding incorrect or incomplete logins:

listen postgres_read
    bind *:65432
    mode tcp
    balance leastconn
    option pgsql-check user haproxy
    default-server inter 3s fall 3 rise 3 on-marked-down shutdown-sessions
    server pg1 192.168.1.1:5432 check
    server pg2 192.168.1.2:5432 check
    server pg3 192.168.1.3:5432 check

If you don’t want the primary to participate in the read scaling you can simply use the same HTTP check as in the postgres_write section, this time expecting HTTP status 503:

listen postgres_read
    bind *:65432
    mode tcp
    balance leastconn
    option httpchk
    http-check expect status 503
    default-server inter 3s fall 3 rise 3 on-marked-down shutdown-sessions
    server pg1 192.168.1.1:5432 check port 8008
    server pg2 192.168.1.2:5432 check port 8008
    server pg3 192.168.1.3:5432 check port 8008

Revised Ansible playbook

HAProxy support has also been implemented in version 0.3 of our Ansible playbooks for automated setup of a three-node PostgreSQL Patroni cluster on Debian. The new variable haproxy_primary_read_scale can be used to decide whether HAProxy should also issue requests on the read-only port to the primary node or only to the followers.

We are happy to help!

Whether it’s PostgreSQL, Patroni, HAProxy, Moodle, or any other open source software; with over 22+ years of development and service experience in the open source space, credativ GmbH can assist you with unparalleled and individually customizable support. We are there to help and assist you in all your open source infrastructure needs – if desired 24 hours a day, 365 days a year!

We look forward to hearing from you.

SQLreduce: Reduce verbose SQL queries to minimal examples

Developers often face very large SQL queries that raise some errors. SQLreduce is a tool to reduce that complexity to a minimal query.

SQLsmith generates random SQL queries

SQLsmith is a tool that generates random SQL queries and runs them against a PostgreSQL server (and other DBMS types). The idea is that by fuzz-testing the query parser and executor, corner-case bugs can be found that would otherwise go unnoticed in manual testing or with the fixed set of test cases in PostgreSQL’s regression test suite. It has proven to be an effective tool with over 100 bugs found in different areas in the PostgreSQL server and other products since 2015, including security bugs, ranging from executor bugs to segfaults in type and index method implementations. For example, in 2018, SQLsmith found that the following query triggered a segfault in PostgreSQL:

select
  case when pg_catalog.lastval() < pg_catalog.pg_stat_get_bgwriter_maxwritten_clean() then case when pg_catalog.circle_sub_pt(
          cast(cast(null as circle) as circle),
          cast((select location from public.emp limit 1 offset 13)
             as point)) ~ cast(nullif(case when cast(null as box) &> (select boxcol from public.brintest limit 1 offset 2)
                 then (select f1 from public.circle_tbl limit 1 offset 4)
               else (select f1 from public.circle_tbl limit 1 offset 4)
               end,
          case when (select pg_catalog.max(class) from public.f_star)
                 ~~ ref_0.c then cast(null as circle) else cast(null as circle) end
            ) as circle) then ref_0.a else ref_0.a end
       else case when pg_catalog.circle_sub_pt(
          cast(cast(null as circle) as circle),
          cast((select location from public.emp limit 1 offset 13)
             as point)) ~ cast(nullif(case when cast(null as box) &> (select boxcol from public.brintest limit 1 offset 2)
                 then (select f1 from public.circle_tbl limit 1 offset 4)
               else (select f1 from public.circle_tbl limit 1 offset 4)
               end,
          case when (select pg_catalog.max(class) from public.f_star)
                 ~~ ref_0.c then cast(null as circle) else cast(null as circle) end
            ) as circle) then ref_0.a else ref_0.a end
       end as c0,
  case when (select intervalcol from public.brintest limit 1 offset 1)
         >= cast(null as "interval") then case when ((select pg_catalog.max(roomno) from public.room)
             !~~ ref_0.c)
        and (cast(null as xid) <> 100) then ref_0.b else ref_0.b end
       else case when ((select pg_catalog.max(roomno) from public.room)
             !~~ ref_0.c)
        and (cast(null as xid) <> 100) then ref_0.b else ref_0.b end
       end as c1,
  ref_0.a as c2,
  (select a from public.idxpart1 limit 1 offset 5) as c3,
  ref_0.b as c4,
    pg_catalog.stddev(
      cast((select pg_catalog.sum(float4col) from public.brintest)
         as float4)) over (partition by ref_0.a,ref_0.b,ref_0.c order by ref_0.b) as c5,
  cast(nullif(ref_0.b, ref_0.a) as int4) as c6, ref_0.b as c7, ref_0.c as c8
from
  public.mlparted3 as ref_0
where true;

However, just like in this 40-line, 2.2kB example, the random queries generated by SQLsmith that trigger some error are most often very large and contain a lot of noise that does not contribute to the error. So far, manual inspection of the query and tedious editing was required to reduce the example to a minimal reproducer that developers can use to fix the problem.

Reduce complexity with SQLreduce

This issue is solved by SQLreduce. SQLreduce takes as input an arbitrary SQL query which is then run against a PostgreSQL server. Various simplification steps are applied, checking after each step that the simplified query still triggers the same error from PostgreSQL. The end result is a SQL query with minimal complexity.

SQLreduce is effective at reducing the queries from original error reports from SQLsmith to queries that match manually-reduced queries. For example, SQLreduce can effectively reduce the above monster query to just this:

SELECT pg_catalog.stddev(NULL) OVER () AS c5 FROM public.mlparted3 AS ref_0

Note that SQLreduce does not try to derive a query that is semantically identical to the original, or produces the same query result – the input is assumed to be faulty, and we are looking for the minimal query that produces the same error message from PostgreSQL when run against a database. If the input query happens to produce no error, the minimal query output by SQLreduce will just be SELECT.

How it works

We’ll use a simpler query to demonstrate how SQLreduce works and which steps are taken to remove noise from the input. The query is bogus and contains a bit of clutter that we want to remove:

$ psql -c 'select pg_database.reltuples / 1000 from pg_database, pg_class where 0 < pg_database.reltuples / 1000 order by 1 desc limit 10'
ERROR:  column pg_database.reltuples does not exist

Let’s pass the query to SQLreduce:

$ sqlreduce 'select pg_database.reltuples / 1000 from pg_database, pg_class where 0 < pg_database.reltuples / 1000 order by 1 desc limit 10'

SQLreduce starts by parsing the input using pglast and libpg_query which expose the original PostgreSQL parser as a library with Python bindings. The result is a parse tree that is the basis for the next steps. The parse tree looks like this:

selectStmt
├── targetList
│   └── /
│       ├── pg_database.reltuples
│       └── 1000
├── fromClause
│   ├── pg_database
│   └── pg_class
├── whereClause
│   └── <
│       ├── 0
│       └── /
│           ├── pg_database.reltuples
│           └── 1000
├── orderClause
│   └── 1
└── limitCount
    └── 10

Pglast also contains a query renderer that can render back the parse tree as SQL, shown as the regenerated query below. The input query is run against PostgreSQL to determine the result, in this case ERROR: column pg_database.reltuples does not exist.

Input query: select pg_database.reltuples / 1000 from pg_database, pg_class where 0 < pg_database.reltuples / 1000 order by 1 desc limit 10
Regenerated: SELECT pg_database.reltuples / 1000 FROM pg_database, pg_class WHERE 0 < ((pg_database.reltuples / 1000)) ORDER BY 1 DESC LIMIT 10
Query returns: ✔ ERROR:  column pg_database.reltuples does not exist

SQLreduce works by deriving new parse trees that are structurally simpler, generating SQL from that, and run these queries against the database. The first simplification steps work on the top level node, where SQLreduce tries to remove whole subtrees to quickly find a result. The first reduction tried is to remove LIMIT 10:

SELECT pg_database.reltuples / 1000 FROM pg_database, pg_class WHERE 0 < ((pg_database.reltuples / 1000)) ORDER BY 1 DESC ✔

The query result is still ERROR: column pg_database.reltuples does not exist, indicated by a ✔ check mark. Next, ORDER BY 1 is removed, again successfully:

SELECT pg_database.reltuples / 1000 FROM pg_database, pg_class WHERE 0 < ((pg_database.reltuples / 1000)) ✔

Now the entire target list is removed:

SELECT FROM pg_database, pg_class WHERE 0 < ((pg_database.reltuples / 1000)) ✔

This shorter query is still equivalent to the original regarding the error message returned when it is run against the database. Now the first unsuccessful reduction step is tried, removing the entire FROM clause:

SELECT WHERE 0 < ((pg_database.reltuples / 1000)) ✘ ERROR:  missing FROM-clause entry for table "pg_database"

That query is also faulty, but triggers a different error message, so the previous parse tree is kept for the next steps. Again a whole subtree is removed, now the WHERE clause:

SELECT FROM pg_database, pg_class ✘ no error

We have now reduced the input query so much that it doesn’t error out any more. The previous parse tree is still kept which now looks like this:

selectStmt
├── fromClause
│   ├── pg_database
│   └── pg_class
└── whereClause
    └── <
        ├── 0
        └── /
            ├── pg_database.reltuples
            └── 1000

Now SQLreduce starts digging into the tree. There are several entries in the FROM clause, so it tries to shorten the list. First, pg_database is removed, but that doesn’t work, so pg_class is removed:

SELECT FROM pg_class WHERE 0 < ((pg_database.reltuples / 1000)) ✘ ERROR:  missing FROM-clause entry for table "pg_database"
SELECT FROM pg_database WHERE 0 < ((pg_database.reltuples / 1000)) ✔

Since we have found a new minimal query, recursion restarts at top-level with another try to remove the WHERE clause. Since that doesn’t work, it tries to replace the expression with NULL, but that doesn’t work either.

SELECT FROM pg_database ✘ no error
SELECT FROM pg_database WHERE NULL ✘ no error

Now a new kind of step is tried: expression pull-up. We descend into WHERE clause, where we replace A < B first by A and then by B.

SELECT FROM pg_database WHERE 0 ✘ ERROR:  argument of WHERE must be type boolean, not type integer
SELECT FROM pg_database WHERE pg_database.reltuples / 1000 ✔
SELECT WHERE pg_database.reltuples / 1000 ✘ ERROR:  missing FROM-clause entry for table "pg_database"

The first try did not work, but the second one did. Since we simplified the query, we restart at top-level to check if the FROM clause can be removed, but it is still required.

From A / B, we can again pull up A:

SELECT FROM pg_database WHERE pg_database.reltuples ✔
SELECT WHERE pg_database.reltuples ✘ ERROR:  missing FROM-clause entry for table "pg_database"

SQLreduce has found the minimal query that still raises ERROR: column pg_database.reltuples does not exist with this parse tree:

selectStmt
├── fromClause
│   └── pg_database
└── whereClause
    └── pg_database.reltuples

At the end of the run, the query is printed along with some statistics:

Minimal query yielding the same error:
SELECT FROM pg_database WHERE pg_database.reltuples

Pretty-printed minimal query:
SELECT
FROM pg_database
WHERE pg_database.reltuples

Seen: 15 items, 915 Bytes
Iterations: 19
Runtime: 0.107 s, 139.7 q/s

This minimal query can now be inspected to fix the bug in PostgreSQL or in the application.

About credativ

The credativ GmbH is a manufacturer-independent consulting and service company located in Moenchengladbach, Germany. With over 22+ years of development and service experience in the open source space, credativ GmbH can assist you with unparalleled and individually customizable support. We are here to help and assist you in all your open source infrastructure needs.

Since the successful merger with Instaclustr in 2021, credativ GmbH has been the European headquarters of the Instaclustr Group, which helps organizations deliver applications at scale through its managed platform for open source technologies such as Apache Cassandra®, Apache Kafka®, Apache Spark™, Redis™, OpenSearch®, PostgreSQL®, and Cadence.
Instaclustr combines a complete data infrastructure environment with hands-on technology expertise to ensure ongoing performance and optimization. By removing the infrastructure complexity, we enable companies to focus internal development and operational resources on building cutting edge customer-facing applications at lower cost. Instaclustr customers include some of the largest and most innovative Fortune 500 companies.