HowTos Archive - credativ®

Foreman is clearly rooted in an IPv4 world. You can see it everywhere: for example, it will not give you a host without IPv4, but a host without IPv6 is fine. It does not have to be this way, though, so let’s go on a journey together and bring Foreman and the things around it into the present.

ProxCLMC – What is it?

Live migration is one of the most powerful and frequently used functions in a Proxmox VE Cluster. However, it presupposes a requirement that is often underestimated: consistent CPU compatibility across all nodes. In real-world environments, clusters rarely consist of identical hardware where you could simply use the CPU type host. Nodes are added over time, CPU generations differ, and functional scopes continue to evolve. Although Proxmox VE allows a flexible CPU configuration, determining a safe and optimal CPU baseline for the entire cluster has so far been largely a manual and experience-based task.

ProxCLMC (Prox CPU Live Migration Checker) was developed by our colleague Florian Paul Azim Hoberg (also known as gyptazy) in Rust as an open-source solution under the GPLv3 license to close this gap in a simple, automated, and reproducible way. The tool examines all nodes of a Proxmox VE cluster, subsequently analyzes their CPU capabilities, and calculates the highest possible CPU compatibility level supported by each node. Instead of relying on assumptions, spreadsheets, or trial and error, administrators receive a clear and deterministic result that can be used directly when selecting VM CPU models.

Comparable mechanisms already exist in other virtualization ecosystems. Enterprise platforms often offer integrated tools or automatic assistance to detect compatible CPU baselines and prevent invalid live migration configurations. Proxmox VE currently does not have such an automated detection mechanism, so administrators must manually compare CPU flags or rely on operational experience. ProxCLMC closes this gap by providing a cluster-wide CPU compatibility analysis specifically tailored to Proxmox environments.

How does ProxCLMC work?

ProxCLMC is designed to integrate seamlessly into existing Proxmox VE clusters without requiring additional services, agents, or configuration changes. It is written entirely in Rust (fully open source under GPLv3), compiled as a static binary, and provided as a Debian package via the gyptazy repository to facilitate easy installation. The workflow follows a clear and transparent process that reflects how administrators think about CPU compatibility but automates it reliably and reproducibly.

After starting, the tool parses the local corosync.conf on the node on which it is running. This allows ProxCLMC to automatically detect all members of the cluster without relying on external inventories or manual input. The determined node list thus always corresponds to the actual state of the cluster.
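
For illustration, the nodelist section of such a corosync.conf, which is exactly the membership information ProxCLMC reads, typically looks like the following (node names and addresses are placeholders):

nodelist {
  node {
    name: test-pmx01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.21
  }
  node {
    name: test-pmx02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.22
  }
}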

Once all cluster nodes are identified, ProxCLMC establishes an SSH connection to each node. Via this connection, it remotely reads the content of /proc/cpuinfo. This file provides a detailed and authoritative view of the CPU capabilities provided by the host kernel, including the complete set of supported CPU flags.
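
The manual equivalent of this step would be a command along the following lines, run once per node (hostname and user are placeholders):

ssh root@test-pmx01 "grep -m1 '^flags' /proc/cpuinfo"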

From the collected data, ProxCLMC extracts the relevant CPU flags and evaluates them based on clearly defined x86-64 CPU baseline definitions. These baselines are directly based on the CPU models supported by Proxmox VE and QEMU, including x86-64-v2, x86-64-v2-AES, x86-64-v3, and x86-64-v4.

By mapping the CPU flags of each node to these standardized baselines, ProxCLMC can determine which CPU levels are supported per node. The tool then calculates the lowest common CPU type shared by all nodes in the cluster. This resulting baseline represents the maximum CPU compatibility level that can be safely used for virtual machines while still allowing unrestricted live migrations between all nodes. To get a general idea of the output:


test-pmx01 | 10.10.10.21 | x86-64-v3
test-pmx02 | 10.10.10.22 | x86-64-v3
test-pmx03 | 10.10.10.23 | x86-64-v4
Cluster CPU type: x86-64-v3
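
As an independent sanity check, recent glibc versions (2.33 or newer) can report which x86-64 microarchitecture levels a node supports. This is not part of ProxCLMC, merely one way to cross-check its per-node result:

/lib64/ld-linux-x86-64.so.2 --help | grep 'x86-64-v'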

With this approach, ProxCLMC brings automated CPU compatibility checking to Proxmox VE-based clusters. Comparable concepts are already known from other virtualization platforms, such as VMware EVC, where CPU compatibility is enforced cluster-wide to ensure secure migrations. ProxCLMC transfers this basic idea to Proxmox environments but implements it in a lightweight, transparent, and completely open manner, thus integrating seamlessly into existing operating procedures and workflows.

Installation of ProxCLMC

ProxCLMC was developed with the goal of enabling easy deployment and integrating cleanly into existing Proxmox VE environments. It can be used directly from the source code or installed as a packaged Debian binary, making it suitable for both development and production environments.

The complete source code is publicly available on GitHub and can be accessed at the following address:
https://github.com/gyptazy/ProxCLMC

This enables complete transparency, auditability, and the ability to adapt or build the tool according to individual requirements.

Prerequisites and Dependencies

Before installing ProxCLMC, a few prerequisites must be met.

ProxCLMC uses SSH to remotely examine each node and read CPU information. Passwordless SSH authentication is therefore recommended to ensure smooth and automated execution.
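
If key-based access is not yet in place, it can be prepared roughly as follows on the node that will run ProxCLMC (hostnames are placeholders):

ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
for node in test-pmx01 test-pmx02 test-pmx03; do
  ssh-copy-id -i ~/.ssh/id_ed25519.pub root@"$node"
done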

Installation via the Debian Repository

The recommended way to install ProxCLMC on Debian-based systems, including Proxmox VE, is via the Debian repository provided by gyptazy. This repository is also used to distribute the ProxLB project and integrates seamlessly into the usual package management workflows.

To add the repository and install ProxCLMC, execute the following commands:

echo "deb https://repo.gyptazy.com/stable /" > /etc/apt/sources.list.d/proxlb.list
wget -O /etc/apt/trusted.gpg.d/proxlb.asc https://repo.gyptazy.com/repository.gpg
apt-get update && apt-get -y install proxclmc

Using the repository ensures that ProxCLMC can be easily installed, updated, and managed together with other system packages.

Installation via a Debian Package

Alternatively, ProxCLMC can also be installed manually via a pre-built Debian package. This is particularly useful for environments without direct repository access or for offline installations.

The package can be downloaded directly from the gyptazy CDN and installed with dpkg:

wget https://cdn.gyptazy.com/debian/proxclmc/proxclmc_1.0.0_amd64.deb
dpkg -i proxclmc_1.0.0_amd64.deb

This method offers the same functionality as installation via the repository, but without automatic updates.

Conclusion

ProxCLMC exemplifies how quickly gaps in the open-source virtualization ecosystem can be closed when real operational requirements are directly addressed. Similar to the ProxLB project (GitHub), which provides advanced scheduling and balancing functions for Proxmox VE-based clusters, ProxCLMC focuses on a very specific but critical area that was previously largely left to manual processes and experience.

By introducing automated CPU compatibility detection, ProxCLMC brings a functionality to Proxmox VE clusters that is commonly expected in enterprise virtualization platforms but has not yet been available in automated form. It shows that open-source solutions are not limited by missing functions but rather offer the freedom to extend and adapt platforms exactly where it is most important.

With ProxCLMC, operators can now automatically determine the most suitable CPU type for virtual machines in a Proxmox VE cluster, ensuring secure live migrations and consistent behavior across all nodes. Together with projects like ProxLB, this underscores the strength of the open-source model: missing enterprise functions can be transparently added, adapted to real requirements, and shared with the community to continuously improve the Proxmox ecosystem. Should you require further adjustments or developments around Proxmox VE, we would be happy to support you in their realization. Do not hesitate to contact us – we are glad to advise you on your project!

Efficient Storage Automation in Proxmox with the proxmox_storage Module

Managing various storage systems in Proxmox environments often involves recurring tasks, whether it’s creating new storage, connecting NFS or CIFS shares and iSCSI targets, or integrating more complex backends like CephFS or the Proxmox Backup Server. In larger environments with multiple nodes or entire clusters, this can quickly become time-consuming, error-prone, and difficult to track.

With Ansible, these processes can be efficiently automated and standardized. Instead of manual configurations, Infrastructure as Code ensures a clear structure, reproducibility, and traceability of all changes. Similar to the relatively new module proxmox_cluster, which automates the creation of Proxmox clusters and the joining of nodes, this now applies analogously to storage systems. This is precisely where the Ansible module proxmox_storage, developed by our highly esteemed colleague Florian Paul Azim Hoberg (also well-known in the open-source community as gyptazy), comes into play. It enables the simple and flexible integration of various storage types directly into Proxmox nodes and clusters: automated, consistent, and repeatable at any time. The module is already part of the Ansible community.proxmox collection and has been included since version 1.3.0.

This makes storage management in Proxmox not only faster and more secure, but also seamlessly integrates into modern automation workflows.

Ansible Module: proxmox_storage

The proxmox_storage module is an Ansible module developed in-house at credativ for the automated management of storage in Proxmox VE. It supports various storage types such as NFS, CIFS, iSCSI, CephFS, and Proxmox Backup Server.

The module allows you to create new storage resources, adjust existing configurations, and completely automate the removal of no longer needed storage. Its integration into Ansible Playbooks enables idempotent and reproducible storage management in Proxmox nodes and clusters. The module simplifies complex configurations and reduces sources of error that can occur during manual setup.

Add iSCSI Storage

Integrating iSCSI storage into Proxmox enables centralized access to block-based storage that can be flexibly used by multiple nodes in the cluster. By using the proxmox_storage module, the connection can be configured automatically and consistently, which saves time and prevents errors during manual setup.

- name: Add iSCSI storage to Proxmox VE Cluster
  community.proxmox.proxmox_storage:
    api_host: proxmoxhost
    api_user: root@pam
    api_password: password123
    validate_certs: false
    nodes: ["de-cgn01-virt01", "de-cgn01-virt02", "de-cgn01-virt03"]
    state: present
    type: iscsi
    name: net-iscsi01
    iscsi_options:
      portal: 10.10.10.94
      target: "iqn.2005-10.org.freenas.ctl:s01-isci01"
    content: ["rootdir", "images"]

The integration takes place within a single task, where the consuming nodes and the iSCSI-relevant information are defined. It is also possible to define for which “content” this storage should be used.

Add Proxmox Backup Server

The Proxmox Backup Server (PBS) is also considered storage in Proxmox VE and can therefore be integrated into the environment just like other storage types. With the proxmox_storage module, a PBS can be easily integrated into individual nodes or entire clusters, making backups available centrally, consistently, and automatically.

- name: Add PBS storage to Proxmox VE Cluster
  community.proxmox.proxmox_storage:
    api_host: proxmoxhost
    api_user: root@pam
    api_password: password123
    validate_certs: false
    nodes: ["de-cgn01-virt01", "de-cgn01-virt02"]
    state: present
    name: backup-backupserver01
    type: pbs
    pbs_options:
      server: proxmox-backup-server.example.com
      username: backup@pbs
      password: password123
      datastore: backup
      fingerprint: "F3:04:D2:C1:33:B7:35:B9:88:D8:7A:24:85:21:DC:75:EE:7C:A5:2A:55:2D:99:38:6B:48:5E:CA:0D:E3:FE:66"
      export: "/mnt/storage01/b01pbs01"
    content: ["backup"]

Note: The fingerprint of the Proxmox Backup Server must be defined whenever the instance’s certificate was not issued by a trusted root CA. If you use and trust your own root CA, this setting is not necessary.
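
If the fingerprint is not at hand, it can usually be read out directly on the Proxmox Backup Server host:

proxmox-backup-manager cert info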

Remove Storage

No longer needed or outdated storage can be removed just as easily from Proxmox VE. With the proxmox_storage module, this process is automated and performed idempotently, ensuring that the cluster configuration remains consistent and unused resources are cleanly removed. A particular advantage is evident during storage migrations, as old storage can be removed in a controlled manner after successful data transfer. This way, environments can be gradually modernized without manual intervention or unnecessary configuration remnants remaining in the cluster.

- name: Remove storage from Proxmox VE Cluster
  community.proxmox.proxmox_storage:
    api_host: proxmoxhost
    api_user: root@pam
    api_password: password123
    validate_certs: false
    state: absent
    name: net-nfsshare01
    type: nfs

Conclusion

The example of automated storage integration with Ansible and Proxmox impressively demonstrates the advantages and extensibility of open-source solutions. Open-source products like Proxmox VE and Ansible can be flexibly combined, offering an enormous range of applications that also prove their worth in enterprise environments.

A decisive advantage is the independence from individual manufacturers, meaning companies do not have to fear vendor lock-in and retain more design freedom in the long term. At the same time, it becomes clear that the successful implementation of such scenarios requires sound knowledge and experience to optimally leverage the possibilities of open source.

While this only covers one part of the picture, our colleague Florian Paul Azim Hoberg (gyptazy) impressively demonstrates in his video “Proxmox Cluster Fully Automated: Cluster Creation, NetApp Storage & SDN Networking with Ansible” what full automation with Proxmox can look like.

This is precisely where we stand by your side as your partner and are happy to support you in the areas of automation, development and all questions relating to Proxmox and modern infrastructures. Please do not hesitate to contact us – we will be happy to advise you!

Automated Proxmox Subscription Handling with Ansible

When deploying Proxmox VE in enterprise environments, whether for new locations, expanding existing clusters, or migrating from platforms like VMware, automation becomes essential. These scenarios typically involve rolling out dozens or even hundreds of nodes across multiple sites. Manually activating subscriptions through the Proxmox web interface is not practical at this scale.

To ensure consistency and efficiency, every part of the deployment process should be automated from the beginning. This includes not just the installation and configuration of nodes and automated cluster creation, but also the activation of the Proxmox subscription. In the past, this step often required manual interaction, which slowed down provisioning and introduced unnecessary complexity.

Now there is a clean solution to this. With the introduction of the new Ansible module proxmox_node, subscription management is fully integrated. This module allows you to handle subscription activation as part of your Ansible playbooks, making it possible to automate the entire process without ever needing to open the web interface.

This improvement is particularly valuable for mass deployments, where reliability and repeatability matter most. Every node can now be automatically configured, licensed, and production-ready right after boot. It is a great example of how Proxmox VE continues to evolve into a more enterprise-friendly platform, while still embracing the flexibility and openness that sets it apart.

Ansible Module: proxmox_node

With automation becoming more critical in modern IT operations, managing Proxmox VE infrastructure through standardized tools like Ansible has become a common practice. Until now, while there were various community modules available to interact with Proxmox resources, node-level management often required custom workarounds or direct SSH access. That gap has now been closed with the introduction of the new proxmox_node module.

This module was developed by our team at credativ GmbH, specifically by our colleague known in the community under the handle gyptazy. It has been contributed upstream and is already part of the official Ansible Community Proxmox collection, available to anyone using the collection via Ansible Galaxy or automation controller integrations.

The proxmox_node module focuses on tasks directly related to the lifecycle and configuration of a Proxmox VE node. What makes this module particularly powerful is that it interacts directly with the Proxmox API, without requiring any SSH access to the node. This enables a cleaner, more secure, and API-driven approach to automation.

The module currently supports several key node-level features that are essential in real-world operations, with subscription management being a prime example.

By bringing all of this functionality into a single, API-driven Ansible module, the process of managing Proxmox nodes becomes much more reliable and maintainable. You no longer need to script around pveproxy with shell commands or use SSH just to manage node settings.

Subscription Integration Example

Adding a subscription to a Proxmox VE node is as simple as the following task. While this shows the easiest way for a single node, this can also be used in a loop over a dictionary holding the related subscriptions for each node.

- name: Place a subscription license on a Proxmox VE Node
  community.proxmox.node:
    api_host: proxmoxhost
    api_user: gyptazy@pam
    api_password: password123
    validate_certs: false
    node_name: de-cgn01-virt01
    subscription:
        state: present
        key: ABCD-EFGH-IJKL-MNOP-QRST-UVWX-YZ0123456789
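
A sketch of the loop variant mentioned above could look like the following. The task mirrors the single-node example; the node_subscriptions dictionary (mapping node names to license keys) is an assumed variable, not something provided by the module itself.

- name: Place subscription licenses on all Proxmox VE nodes
  community.proxmox.node:
    api_host: proxmoxhost
    api_user: gyptazy@pam
    api_password: password123
    validate_certs: false
    node_name: "{{ item.key }}"
    subscription:
        state: present
        key: "{{ item.value }}"
  loop: "{{ node_subscriptions | dict2items }}"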

Conclusion

For us at credativ, this module fills a real gap in the automation landscape around Proxmox and demonstrates how missing features in open-source projects can be addressed effectively by contributing upstream. It also reinforces the broader movement of managing infrastructure declaratively, where configuration is versioned, documented, and easily reproducible.

In combination with other modules from the community Proxmox collection like our recent proxmox_cluster module, proxmox_node helps complete the picture of a fully automated Proxmox VE environment — from cluster creation and VM provisioning to node configuration and licensing. If you’re looking for help or assistance for creating Proxmox VE based virtualization infrastructures, automation or custom development to fit your needs, we’re always happy to help! Feel free to contact us at any time.

Efficient Proxmox Cluster Deployment through Automation with Ansible

Manually setting up and managing servers is usually time-consuming, error-prone, and difficult to scale. This becomes especially evident during large-scale rollouts, when building complex infrastructures, or during the migration from other virtualization environments. In such cases, traditional manual processes quickly reach their limits. Consistent automation offers an effective and sustainable solution to these challenges.

Proxmox is a powerful virtualization platform known for its flexibility and comprehensive feature set. When combined with Ansible, a lightweight and agentless automation tool, the management of entire system landscapes becomes significantly more efficient. Ansible allows for the definition of reusable configurations in the form of playbooks, ensuring that deployment processes are consistent, transparent, and reproducible.

To enable fully automated deployment of Proxmox clusters, our team member, known in the open-source community under the alias gyptazy, has developed a dedicated Ansible module called proxmox_cluster. This module handles all the necessary steps to initialize a Proxmox cluster and add additional nodes. It has been officially included in the upstream Ansible Community Proxmox collection and is available for installation via Ansible Galaxy starting with version 1.1.0. As a result, the manual effort required for cluster deployment is significantly reduced. Further insights can be found in his blog post titled “How My BoxyBSD Project Boosted the Proxmox Ecosystem”.

By adopting this solution, not only can valuable time be saved, but a solid foundation for scalable and low-maintenance infrastructure is also established. Unlike fragile task-based approaches that often rely on Ansible’s shell or command modules, this solution leverages the full potential of the Proxmox API through a dedicated module. As a result, it can be executed in various scopes and does not require SSH access to the target systems.

This automated approach makes it possible to deploy complex setups efficiently while laying the groundwork for stable and future-proof IT environments. Such environments can be extended at a later stage and are built according to a consistent and repeatable structure.

Benefits

Using the proxmox_cluster module for Proxmox cluster deployment brings several key advantages to modern IT environments. The focus lies on secure, flexible, and scalable interaction with the Proxmox API, improved error handling, and simplified integration across various use cases.

Ansible Proxmox Module: proxmox_cluster

The newly added proxmox_cluster module in Ansible significantly simplifies the automated provisioning of Proxmox VE clusters. With just a single task, it enables the seamless creation of a complete cluster, reducing complexity and manual effort to a minimum.

Creating a Cluster

Creating a cluster now requires only a single task in Ansible by using the proxmox_cluster module:

- name: Create a Proxmox VE Cluster
  community.proxmox.proxmox_cluster:
    state: present
    api_host: proxmoxhost
    api_user: root@pam
    api_password: password123
    api_ssl_verify: false
    link0: 10.10.1.1
    link1: 10.10.2.1
    cluster_name: "devcluster"

Afterwards, the cluster is created and additional Proxmox VE nodes can join the cluster.

Joining a Cluster

Additional nodes can now also join the cluster using a single task. When combined with the use of a dynamic inventory, it becomes easy to iterate over a list of nodes from a defined group and add them to the cluster within a loop. This approach enables the rapid deployment of larger Proxmox clusters in an efficient and scalable manner.

- name: Join a Proxmox VE Cluster
  community.proxmox.proxmox_cluster:
    state: present
    api_host: proxmoxhost
    api_user: root@pam
    api_password: password123
    master_ip: "{{ primary_node }}"
    fingerprint: "{{ cluster_fingerprint }}"
    cluster_name: "devcluster"
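
Building on the task above, joining several nodes from an inventory group can be sketched roughly as follows. The group name proxmox_nodes and the assumption that api_host may point at the joining node are illustrative choices, not something prescribed by the module.

- name: Join all remaining nodes to the Proxmox VE Cluster
  community.proxmox.proxmox_cluster:
    state: present
    api_host: "{{ item }}"
    api_user: root@pam
    api_password: password123
    master_ip: "{{ primary_node }}"
    fingerprint: "{{ cluster_fingerprint }}"
    cluster_name: "devcluster"
  loop: "{{ groups['proxmox_nodes'] | difference([primary_node]) }}"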

Cluster Join Information

In order for a node to join a Proxmox cluster, the cluster’s join information is generally required. To avoid gathering this information manually for each individual cluster, this step can also be automated. As part of this feature, a new module called proxmox_cluster_join_info has been introduced. It allows the necessary data to be retrieved automatically via the Proxmox API and made available for further use in the automation process.

- name: List existing Proxmox VE cluster join information
  community.proxmox.proxmox_cluster_join_info:
    api_host: proxmox1
    api_user: root@pam
    api_password: "{{ password | default(omit) }}"
    api_token_id: "{{ token_id | default(omit) }}"
    api_token_secret: "{{ token_secret | default(omit) }}"
  register: proxmox_cluster_join
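
The registered result can then be reused in subsequent tasks, for example to feed the fingerprint into the join task shown earlier. Since the exact structure of the returned data depends on the module version, a quick way to inspect it is a simple debug task:

- name: Inspect the retrieved cluster join information
  ansible.builtin.debug:
    var: proxmox_cluster_join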

Conclusion

While automation in the context of virtualization technologies is often focused on the provisioning of guest systems or virtual machines (VMs), this approach demonstrates that automation can be applied at a much deeper level within the underlying infrastructure. It is also possible to fully automate scenarios in which nodes are initially deployed using a customer-specific image with Proxmox VE preinstalled, and then proceed to automatically create the cluster.

As an official Proxmox partner, we are happy to support you in implementing a comprehensive automation strategy tailored to your environment and based on Proxmox products. You can contact us at any time!

 

Introduction

Puppet is a software configuration management solution to manage IT infrastructure. One of the first things to be learnt about Puppet is its domain-specific language – the Puppet-DSL – and the concepts that come with it.

Users can organize their code in classes and modules and use pre-defined resource types to manage resources like files, packages, services and others.

The most commonly used types are part of the Puppet core, implemented in Ruby. Composite resource types may be defined via the Puppet-DSL from already known types by Puppet users themselves, or imported as part of external Puppet modules maintained by third-party module developers.

It so happens that Puppet users can stay within the Puppet-DSL for a long time even when they deal with Puppet on a regular basis.

The first time I had a glimpse into this topic was when Debian Stable was shipping Puppet 5.5, which was not too long ago. The Puppet 5.5 documentation includes chapters on custom type and provider development, but to me they felt incomplete and lacking in self-contained examples. Apparently I was not the only one feeling that way, even though Puppet’s gem documentation gives a good overview of what is possible in principle.

Gary Larizza’s blog post on the subject is more than ten years old. I recently had another look into the Puppet 7 documentation on that topic, as this is the Puppet version in the current Debian Stable.

The Puppet 5.5 way to type & provider development is now called the low level method, and its documentation has not changed significantly. However, Puppet 6 upwards recommends a new method to create custom types & providers via the so-called Resource-API, whose documentation is a major improvement compared to the low-level method’s. The Resource-API is not a replacement, though, and has several documented limitations.

Nevertheless, for the remainder of this blog post, we will re-prototype a small portion of the file type’s functionality, namely the ensure and content properties, using the low-level method as well as the Resource-API.

Preparations

The following preparations are not necessary in an agent-server setup. We use bundle to obtain a puppet executable for this demo.
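
The transcript below assumes that Bundler installs into ./.vendor and that a bin/puppet binstub exists. If you start from scratch, something along these lines establishes that setup (these commands are an assumption and not part of the original transcript):

bundle config set --local path .vendor   # run before bundle install
bundle binstubs puppet                   # run after bundle install, creates bin/puppet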

demo@85c63b50bfa3:~$ cat > Gemfile <<EOF
source 'https://rubygems.org'

gem 'puppet', '>= 6'
EOF
demo@85c63b50bfa3:~$ bundle install
Fetching gem metadata from https://rubygems.org/........
Resolving dependencies...
...
Installing puppet 8.10.0
Bundle complete! 1 Gemfile dependency, 17 gems now installed.
Bundled gems are installed into `./.vendor`
demo@85c63b50bfa3:~$ cat > file_builtin.pp <<EOF
$file = '/home/demo/puppet-file-builtin'
file {$file: content => 'This is madness'}
EOF
demo@85c63b50bfa3:~$ bin/puppet apply file_builtin.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.01 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-builtin]/ensure: defined content as '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: Applied catalog in 0.04 seconds

demo@85c63b50bfa3:~$ sha256sum /home/demo/puppet-file-builtin
0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e /home/demo/puppet-file-builtin

Keep in mind the state change information printed by Puppet.

Low-level prototype

Custom types and providers that are not installed via a Gem need to be part of some Puppet module, so they can be copied to Puppet agents via the pluginsync mechanism.

A common location for Puppet modules is the modules directory inside a Puppet environment. For this demo, we declare a demo module.

Basic functionality

Our first attempt is the following type definition for a new type we will call file_llmethod. It has no documentation or validation of input values.

# modules/demo/lib/puppet/type/file_llmethod.rb

Puppet::Type.newtype(:file_llmethod) do
  newparam(:path, namevar: true) {}
  newproperty(:content) {}
  newproperty(:ensure) do
    newvalues(:present, :absent)
    defaultto(:present)
  end
end

We have declared a path parameter that serves as the namevar for this type – there cannot be other file_llmethod instances managing the same path. The ensure property is restricted to two values and defaults to present.

The following provider implementation consists of a getter and a setter for each of the two properties content and ensure.

# modules/demo/lib/puppet/provider/file_llmethod/ruby.rb

Puppet::Type.type(:file_llmethod).provide(:ruby) do
  def ensure
    File.exist?(@resource[:path]) ? :present : :absent
  end

  def ensure=(value)
    if value == :present
      # reuse setter
      self.content=(@resource[:content])
    else
      File.unlink(@resource[:path])
    end
  end

  def content
    File.read(@resource[:path])
  end

  def content=(value)
    File.write(@resource[:path], value)
  end
end

This gives us the following:

demo@85c63b50bfa3:~$ cat > file_llmethod_create.pp <<EOF
$file = '/home/demo/puppet-file-lowlevel-method-create'
file {$file: ensure => absent} ->
file_llmethod {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.01 seconds
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-create]/ensure: defined 'ensure' as 'present'

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.01 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-create]/ensure: removed
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-create]/ensure: defined 'ensure' as 'present'

demo@85c63b50bfa3:~$ cat > file_llmethod_change.pp <<EOF
$file = '/home/demo/puppet-file-lowlevel-method-change'
file {$file: content => 'This is madness'} ->
file_llmethod {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-change]/ensure: defined content as '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-change]/content: content changed 'This is madness' to 'This is Sparta!'

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-change]/content: content changed '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2' to '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-change]/content: content changed 'This is madness' to 'This is Sparta!'

Our custom type already kind of works, even though we have not implemented any explicit comparison of is- and should-state. Puppet does this for us based on the Puppet catalog and the property getter return values. Our setters are likewise invoked by Puppet only on demand.

We can also see that the ensure state change notice is defined 'ensure' as 'present' and does not incorporate the desired content in any way, while the content state change notice shows plain text. Both tell us that the SHA256 checksum from the file_builtin.pp example is already something non-trivial.

Validating input

As a next step we add validation for path and content.

# modules/demo/lib/puppet/type/file_llmethod.rb

Puppet::Type.newtype(:file_llmethod) do
  newparam(:path, namevar: true) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
      fail "#{value} is not an absolute path" unless File.absolute_path?(value)
    end
  end

  newproperty(:content) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
    end
  end

  newproperty(:ensure) do
    newvalues(:present, :absent)
    defaultto(:present)
  end
end

Failed validations will look like these:

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules --exec 'file_llmethod {"./relative/path": }'
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Error: Parameter path failed on File_llmethod[./relative/path]: ./relative/path is not an absolute path (line: 1)

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules --exec 'file_llmethod {"/absolute/path": content => 42}'
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Error: Parameter content failed on File_llmethod[/absolute/path]: 42 is not a String (line: 1)

Content Checksums

We override change_to_s so that state changes include content checksums:

# modules/demo/lib/puppet/type/file_llmethod.rb

require 'digest'

Puppet::Type.newtype(:file_llmethod) do
  newparam(:path, namevar: true) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
      fail "#{value} is not an absolute path" unless File.absolute_path?(value)
    end
  end

  newproperty(:content) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
    end
    define_method(:change_to_s) do |currentvalue, newvalue|
      old = "{sha256}#{Digest::SHA256.hexdigest(currentvalue)}"
      new = "{sha256}#{Digest::SHA256.hexdigest(newvalue)}"
      "content changed '#{old}' to '#{new}'"
    end
  end

  newproperty(:ensure) do
    define_method(:change_to_s) do |currentvalue, newvalue|
      if currentvalue == :absent
        should = @resource.property(:content).should
        digest = "{sha256}#{Digest::SHA256.hexdigest(should)}"
        "defined content as '#{digest}'"
      else
        super(currentvalue, newvalue)
      end
    end
    newvalues(:present, :absent)
    defaultto(:present)
  end
end

The above type definition yields:

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-create]/ensure: removed
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-create]/ensure: defined content as '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-change]/content: content changed '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2' to '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-change]/content: content changed '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e' to '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

Improving memory footprint

So far so good. While our current implementation apparently works, it has at least one major flaw. If the managed file already exists, the provider stores the file’s whole content in memory.

demo@85c63b50bfa3:~$ cat > file_llmethod_change_big.pp <<EOF
$file = '/home/demo/puppet-file-lowlevel-method-change_big'
file_llmethod {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ rm -f /home/demo/puppet-file-lowlevel-method-change_big
demo@85c63b50bfa3:~$ ulimit -Sv 200000

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change_big.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-change_big]/ensure: defined content as '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'
Notice: Applied catalog in 0.02 seconds

demo@85c63b50bfa3:~$ dd if=/dev/zero of=/home/demo/puppet-file-lowlevel-method-change_big seek=8G bs=1 count=1
1+0 records in
1+0 records out
1 byte copied, 8.3047e-05 s, 12.0 kB/s

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change_big.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Error: Could not run: failed to allocate memory

Instead, the implementation should only store the checksum so that Puppet can decide based on checksums if our content= setter needs to be invoked.

This also means that the Puppet catalog’s content needs to be checksummed by munge before it is processed by Puppet’s internal comparison routine. Luckily, we still have access to the original value via shouldorig.

# modules/demo/lib/puppet/type/file_llmethod.rb

require 'digest'

Puppet::Type.newtype(:file_llmethod) do
  newparam(:path, namevar: true) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
      fail "#{value} is not an absolute path" unless File.absolute_path?(value)
    end
  end

  newproperty(:content) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
    end
    munge do |value|
      "{sha256}#{Digest::SHA256.hexdigest(value)}"
    end
    # No need to override change_to_s with munging
  end

  newproperty(:ensure) do
    define_method(:change_to_s) do |currentvalue, newvalue|
      if currentvalue == :absent
        should = @resource.property(:content).should
        "defined content as '#{should}'"
      else
        super(currentvalue, newvalue)
      end
    end
    newvalues(:present, :absent)
    defaultto(:present)
  end
end
# modules/demo/lib/puppet/provider/file_llmethod/ruby.rb

Puppet::Type.type(:file_llmethod).provide(:ruby) do
  ...

  def content
    File.open(@resource[:path], 'r') do |file|
      sha = Digest::SHA256.new
      while chunk = file.read(2**16)
        sha << chunk
      end
      "{sha256}#{sha.hexdigest}"
    end
  end

  def content=(value)
    # value is munged, but we need to write the original
    File.write(@resource[:path], @resource.parameter(:content).shouldorig[0])
  end
end

Now we can manage big files:

demo@85c63b50bfa3:~$ ulimit -Sv 200000
demo@85c63b50bfa3:~$ dd if=/dev/zero of=/home/demo/puppet-file-lowlevel-method-change_big seek=8G bs=1 count=1
1+0 records in
1+0 records out
1 byte copied, 9.596e-05 s, 10.4 kB/s
demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_change_big.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-change_big]/content: content changed '{sha256}ef17a425c57a0e21d14bec2001d8fa762767b97145b9fe47c5d4f2fda323697b' to '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

Ensure it the Puppet way

There is still something not right. Maybe you have noticed that our provider’s content getter attempts to open a file unconditionally, and yet the file_llmethod_create.pp run has not produced an error. It seems that an ensure transition from absent to present short-circuits the content getter, even though we have not expressed a wish to do so.

It turns out that an ensure property gets special treatment by Puppet. If we had attempted to use a makeitso property instead of ensure, there would be no short-circuiting and the content getter would raise an exception.

We will not fix the content getter though. If Puppet has special treatment for ensure, we should use Puppet’s intended mechanism for it, and declare the type ensurable:

# modules/demo/lib/puppet/type/file_llmethod.rb

require 'digest'

Puppet::Type.newtype(:file_llmethod) do
  ensurable

  newparam(:path, namevar: true) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
      fail "#{value} is not an absolute path" unless File.absolute_path?(value)
    end
  end

  newproperty(:content) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
    end
    munge do |value|
      "{sha256}#{Digest::SHA256.hexdigest(value)}"
    end
  end
end

With ensurable the provider needs to implement three new methods, but we can drop the ensure accessors:

# modules/demo/lib/puppet/provider/file_llmethod/ruby.rb

Puppet::Type.type(:file_llmethod).provide(:ruby) do
  def exists?
    File.exist?(@resource[:name])
  end

  def create
    self.content=(:dummy)
  end

  def destroy
    File.unlink(@resource[:name])
  end

  def content
    File.open(@resource[:path], 'r') do |file|
      sha = Digest::SHA256.new
      while chunk = file.read(2**16)
        sha << chunk
      end
      "{sha256}#{sha.hexdigest}"
    end
  end

  def content=(value)
    # value is munged, but we need to write the original
    File.write(@resource[:path], @resource.parameter(:content).shouldorig[0])
  end
end

However, now we have lost the SHA256 checksum on file creation:

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-create]/ensure: removed
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-create]/ensure: created

To get it back, we replace ensurable by an adapted implementation of it, which includes our previous change_to_s override:

newproperty(:ensure, :parent => Puppet::Property::Ensure) do
  defaultvalues
  define_method(:change_to_s) do |currentvalue, newvalue|
    if currentvalue == :absent
      should = @resource.property(:content).should
      "defined content as '#{should}'"
    else
      super(currentvalue, newvalue)
    end
  end
end
demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_llmethod_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.02 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-lowlevel-method-create]/ensure: removed
Notice: /Stage[main]/Main/File_llmethod[/home/demo/puppet-file-lowlevel-method-create]/ensure: defined content as '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

Our final low-level prototype is thus as follows.

Final low-level prototype

# modules/demo/lib/puppet/type/file_llmethod.rb

# frozen_string_literal: true

require 'digest'

Puppet::Type.newtype(:file_llmethod) do
  newparam(:path, namevar: true) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
      fail "#{value} is not an absolute path" unless File.absolute_path?(value)
    end
  end

  newproperty(:content) do
    validate do |value|
      fail "#{value} is not a String" unless value.is_a?(String)
    end
    munge do |value|
      "{sha256}#{Digest::SHA256.hexdigest(value)}"
    end
  end

  newproperty(:ensure, :parent => Puppet::Property::Ensure) do
    defaultvalues
    define_method(:change_to_s) do |currentvalue, newvalue|
      if currentvalue == :absent
        should = @resource.property(:content).should
        "defined content as '#{should}'"
      else
        super(currentvalue, newvalue)
      end
    end
  end
end
# modules/demo/lib/puppet/provider/file_llmethod/ruby.rb

# frozen_string_literal: true

require 'digest'

Puppet::Type.type(:file_llmethod).provide(:ruby) do
  def exists?
    File.exist?(@resource[:name])
  end

  def create
    self.content=(:dummy)
  end

  def destroy
    File.unlink(@resource[:name])
  end

  def content
    File.open(@resource[:path], 'r') do |file|
      sha = Digest::SHA256.new
      while (chunk = file.read(2**16))
        sha << chunk
      end
      "{sha256}#{sha.hexdigest}"
    end
  end

  def content=(_value)
    # value is munged, but we need to write the original
    File.write(@resource[:path], @resource.parameter(:content).shouldorig[0])
  end
end

Resource-API prototype

According to the Resource-API documentation we need to define our new file_rsapi type by calling Puppet::ResourceApi.register_type with several parameters, amongst which are the desired attributes, even ensure.

# modules/demo/lib/puppet/type/file_rsapi.rb

require 'puppet/resource_api'

Puppet::ResourceApi.register_type(
  name: 'file_rsapi', 
  attributes: {
    content: {
      desc: 'description of content parameter',
      type: 'String' 
    },
    ensure: {
      default: 'present',
      desc: 'description of ensure parameter',
      type: 'Enum[present, absent]'
    },
    path: {
      behaviour: :namevar,
      desc: 'description of path parameter',
      type: 'Pattern[/\A\/([^\n\/\0]+\/*)*\z/]'
    },
  },
  desc: 'description of file_rsapi'
)

The path type uses a built-in Puppet data type. Stdlib::Absolutepath would have been more convenient but external data types are not possible with the Resource-API yet.

In comparison with our low-level prototype, the above type definition has no SHA256-munging and SHA256-output counterparts. The canonicalize provider feature looks similar to munging, but we skip it for now.

The Resource-API documentation tells us to implement a get and a set method in our provider, stating

The get method reports the current state of the managed resources. It returns an enumerable of all existing resources. Each resource is a hash with attribute names as keys, and their respective values as values.

This demand is the first bummer, as we definitely do not want to read all files with their content and store it in memory. We can ignore this demand – how would the Resource-API know anyway.

However, the documented signature is def get(context) {...} where context has no information about the resource we want to manage.

This would have been a show-stopper, if the simple_get_filter provider feature didn’t exist, which changes the signature to def get(context, names = nil) {...}.

Our first version of file_rsapi is thus the following.

Basic functionality

# modules/demo/lib/puppet/type/file_rsapi.rb

require 'puppet/resource_api'

Puppet::ResourceApi.register_type(
  name: 'file_rsapi',
  features: %w[simple_get_filter],
  attributes: {
    content: {
      desc: 'description of content parameter',
      type: 'String' 
    },
    ensure: {
      default: 'present',
      desc: 'description of ensure parameter',
      type: 'Enum[present, absent]'
    },
    path: {
      behaviour: :namevar,
      desc: 'description of path parameter',
      type: 'Pattern[/\A\/([^\n\/\0]+\/*)*\z/]'
    },
  },
  desc: 'description of file_rsapi'
)
# modules/demo/lib/puppet/provider/file_rsapi/file_rsapi.rb

require 'digest'

class Puppet::Provider::FileRsapi::FileRsapi
  def get(context, names)
    (names or []).map do |name|
      File.exist?(name) ? {
        path: name,
        ensure: 'present',
        content: filedigest(name),
      } : nil
    end.compact # remove non-existing resources
  end

  def set(context, changes)
    changes.each do |path, change|
      if change[:should][:ensure] == 'present'
        File.write(path, change[:should][:content])
      elsif File.exist?(path)
        File.delete(path)
      end
    end
  end

  def filedigest(path)
    File.open(path, 'r') do |file|
      sha = Digest::SHA256.new
      while chunk = file.read(2**16)
         sha << chunk
      end
      "{sha256}#{sha.hexdigest}"
    end
  end
end

The desired content is written correctly into the file, but again we get no SHA256 checksum on creation, as well as unnecessary writes on every run, because the checksum returned by get never matches the cleartext from the catalog:

demo@85c63b50bfa3:~$ cat > file_rsapi_create.pp <<EOF
$file = '/home/demo/puppet-file-rsapi-create'
file       {$file: ensure  => absent} ->
file_rsapi {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_rsapi_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.03 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-rsapi-create]/ensure: removed
Notice: /Stage[main]/Main/File_rsapi[/home/demo/puppet-file-rsapi-create]/ensure: defined 'ensure' as 'present'
Notice: Applied catalog in 0.02 seconds

demo@85c63b50bfa3:~$ cat > file_rsapi_change.pp <<EOF
$file = '/home/demo/puppet-file-rsapi-change'
file       {$file: content => 'This is madness'} ->
file_rsapi {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_rsapi_change.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.03 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-rsapi-change]/content: content changed '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2' to '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: /Stage[main]/Main/File_rsapi[/home/demo/puppet-file-rsapi-change]/content: content changed '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e' to 'This is Sparta!'
Notice: Applied catalog in 0.03 seconds

Hence we enable and implement the canonicalize provider feature.

Canonicalize

According to the documentation, canonicalize is applied to the results of get as well as to the catalog properties. On the one hand, we do not want to read the file’s content into memory; on the other hand, we must not checksum the already checksummed values returned by get a second time.

An easy way would be to check whether the canonicalized call happens after the get call:

def canonicalize(context, resources)
  if @stage == :get
    # do nothing, is-state canonicalization already performed by get()
  else
    # catalog canonicalization
    ...
  end
end

def get(context, paths)
  @stage = :get

  ...
end

While this works for the current implementation of the Resource-API, there is no guarantee about the order of canonicalize calls. Instead we subclass from String and handle checksumming internally. We also add some state change calls to the appropriate context methods.

Our final Resource-API-based prototype implementation is:

# modules/demo/lib/puppet/type/file_rsapi.rb

# frozen_string_literal: true

require 'puppet/resource_api'

Puppet::ResourceApi.register_type(
  name: 'file_rsapi',
  features: %w[simple_get_filter canonicalize],
  attributes: {
    content: {
      desc: 'description of content parameter',
      type: 'String'
    },
    ensure: {
      default: 'present',
      desc: 'description of ensure parameter',
      type: 'Enum[present, absent]'
    },
    path: {
      behaviour: :namevar,
      desc: 'description of path parameter',
      type: 'Pattern[/\A\/([^\n\/\0]+\/*)*\z/]'
    }
  },
  desc: 'description of file_rsapi'
)
# modules/demo/lib/puppet/provider/file_rsapi/file_rsapi.rb

# frozen_string_literal: true

require 'digest'
require 'pathname'

class Puppet::Provider::FileRsapi::FileRsapi
  class CanonicalString < String
    attr_reader :original

    def class
      # Mask as String for YAML.dump to mitigate
      #   Error: Transaction store file /var/cache/puppet/state/transactionstore.yaml
      #   is corrupt ((/var/cache/puppet/state/transactionstore.yaml): Tried to
      #   load unspecified class: Puppet::Provider::FileRsapi::FileRsapi::CanonicalString)
      String
    end

    def self.from(obj)
      return obj if obj.is_a?(self)
      return new(filedigest(obj)) if obj.is_a?(Pathname)

      new("{sha256}#{Digest::SHA256.hexdigest(obj)}", obj)
    end

    def self.filedigest(path)
      File.open(path, 'r') do |file|
        sha = Digest::SHA256.new
        while (chunk = file.read(2**16))
          sha << chunk
        end
        "{sha256}#{sha.hexdigest}"
      end
    end

    def initialize(canonical, original = nil)
      @original = original
      super(canonical)
    end
  end

  def canonicalize(_context, resources)
    resources.each do |resource|
      next if resource[:ensure] == 'absent'

      resource[:content] = CanonicalString.from(resource[:content])
    end
    resources
  end

  def get(_context, names)
    (names or []).map do |name|
      next unless File.exist?(name)

      {
        content: CanonicalString.from(Pathname.new(name)),
        ensure: 'present',
        path: name
      }
    end.compact # remove non-existing resources
  end

  def set(context, changes)
    changes.each do |path, change|
      if change[:should][:ensure] == 'present'
        File.write(path, change[:should][:content].original)
        if change[:is][:ensure] == 'present'
          # The only other possible change is due to content,
          # but content change transition info is covered implicitly
        else
          context.created("#{path} with content '#{change[:should][:content]}'")
        end
      elsif File.exist?(path)
        File.delete(path)
        context.deleted(path)
      end
    end
  end
end

It gives us:

demo@85c63b50bfa3:~$ cat > file_rsapi_create.pp <<EOF
$file = '/home/demo/puppet-file-rsapi-create'
file       {$file: ensure  => absent} ->
file_rsapi {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_rsapi_create.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.03 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-rsapi-create]/ensure: removed
Notice: /Stage[main]/Main/File_rsapi[/home/demo/puppet-file-rsapi-create]/ensure: defined 'ensure' as 'present'
Notice: file_rsapi: Created: /home/demo/puppet-file-rsapi-create with content '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

demo@85c63b50bfa3:~$ cat > file_rsapi_change.pp <<EOF
$file = '/home/demo/puppet-file-rsapi-change'
file       {$file: content => 'This is madness'} ->
file_rsapi {$file: content => 'This is Sparta!'}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_rsapi_change.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.03 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-rsapi-change]/content: content changed '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2' to '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e'
Notice: /Stage[main]/Main/File_rsapi[/home/demo/puppet-file-rsapi-change]/content: content changed '{sha256}0549defd0a7d6d840e3a69b82566505924cacbe2a79392970ec28cddc763949e' to '{sha256}823cbb079548be98b892725b133df610d0bff46b33e38b72d269306d32b73df2'

demo@85c63b50bfa3:~$ cat > file_rsapi_remove.pp <<EOF
$file = '/home/demo/puppet-file-rsapi-remove'
file       {$file: ensure => present} ->
file_rsapi {$file: ensure => absent}
EOF

demo@85c63b50bfa3:~$ bin/puppet apply --modulepath modules file_rsapi_remove.pp
Notice: Compiled catalog for 85c63b50bfa3 in environment production in 0.03 seconds
Notice: /Stage[main]/Main/File[/home/demo/puppet-file-rsapi-remove]/ensure: created
Notice: /Stage[main]/Main/File_rsapi[/home/demo/puppet-file-rsapi-remove]/ensure: undefined 'ensure' from 'present'
Notice: file_rsapi: Deleted: /home/demo/puppet-file-rsapi-remove

Thoughts

The apparent need for a CanonicalString masking as String makes it look like we are missing something. If the Resource-API only checked data types after canonicalization, we could make get return something simpler than CanonicalString to signal an already canonicalized value.

The Resource-API’s default demand to return all existing resources simplifies development when one had this plan anyway. The low-level version of doing so is often a combination of prefetch and instances.
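
For comparison, the canonical low-level prefetch/instances pattern mentioned above looks roughly like the sketch below. For a file-like type there is no sensible way to enumerate every possible path, which is exactly why simple_get_filter matters; the sketch is illustrative only and not part of the prototype.

# Hypothetical sketch of the low-level enumeration pattern; not used by file_llmethod.
def self.instances
  # Would have to discover all managed paths somehow, e.g. from a registry file.
  []
end

def self.prefetch(resources)
  instances.each do |provider|
    if (resource = resources[provider.name])
      resource.provider = provider
    end
  end
end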

Overview

Infrastructures that offer services via TLS often require an automated mechanism for the regular exchange of certificates. This can be achieved, among other things, by using Certmonger and FreeIPA.

By default, a FreeIPA server is configured to create certificates using a profile with RSA keys. However, if cryptographic requirements exist that, for example, necessitate EC keys, a change to the FreeIPA profile must be made to enable them. Once this is done, these types of keys can also be used via Certmonger – by specifying the adapted profile.

The following article addresses precisely the aforementioned FreeIPA adjustment to demonstrate how certificates with EC keys can also be utilized via Certmonger.

Setup

For the following examples, a FreeIPA server and a corresponding client, which has also been registered as a client with FreeIPA, are required. In this example, both systems are set up with CentOS 9 Stream.

Certmonger

Certmonger typically runs as a daemon in the background and monitors certificates for their expiration. The service can renew certificates if a CA is available for it, or it can also handle the complete deployment of certificates, including key generation, if desired.

The typical sequence for Certmonger is as follows:

  1. Creating a key pair
  2. Creating a CSR (Certificate Signing Request) with the public key
  3. Submitting the CSR to a corresponding CA
  4. Verifying the CA’s response
  5. Monitoring the certificate for expiration

The design documentation provides further details on this.

EC

EC keys are commonly associated with ECDSA (Elliptic Curve Digital Signature Algorithm). Elliptic Curve Cryptography (ECC) is used to create such keys, which allows for smaller key lengths. Like RSA, ECDSA works asymmetrically, meaning a public and a private key are used here as well.

Requirements

The prerequisites for the specified setup are as follows:

  1. FreeIPA (version used: 4.12.2)
  2. At least one client that is to manage certificates using Certmonger (version: 0.79.17)

The FreeIPA server is already configured and offers its services accordingly. The clients can reach the FreeIPA server and already have Certmonger installed (it is installed along with ipa-client).

Creating a Profile for Using EC Keys on FreeIPA

With the default configuration, the FreeIPA server already offers several profiles for certificate creation:


[root@ipa ~]# ipa certprofile-find
------------------
4 profiles matched
------------------
Profile ID: acmeIPAServerCert
Profile description: ACME IPA service certificate profile
Store issued certificates: False


Profile ID: caIPAserviceCert
Profile description: Standard profile for network services
Store issued certificates: True


Profile ID: IECUserRoles
Profile description: User profile that includes IECUserRoles extension from request
Store issued certificates: True


Profile ID: KDCs_PKINIT_Certs
Profile description: Profile for PKINIT support by KDCs
Store issued certificates: False
----------------------------
Number of entries returned 4
----------------------------

It is therefore easiest, for example, to export the existing profile caIPAserviceCert and make the necessary changes there. For this, the corresponding profile is exported and saved, for example, in a file named caIPAserviceCert_ECDSA.cfg:


[root@ipa ~]# ipa certprofile-show caIPAserviceCert --out caIPAserviceCert_ECDSA.cfg
-----------------------------------------------------------------
Profile configuration stored in file 'caIPAserviceCert_ECDSA.cfg'
-----------------------------------------------------------------
Profile ID: caIPAserviceCert
Profile description: Standard profile for network services
Store issued certificates: True

In the exported profile, the necessary changes are now made using a simple text editor to support other key types as well:


policyset.serverCertSet.3.constraint.params.keyParameters=nistp256,nistp384,nistp521,sect163k1,nistk163
policyset.serverCertSet.3.constraint.params.keyType=EC
profileId=caIPAserviceCert_ECDSA

The possible values for the parameter keyParameters can be determined with the following command:


[root@ipa ~]# certutil -G -H
[...]
-k key-type Type of key pair to generate ("dsa", "ec", "rsa" (default))
-g key-size Key size in bits, (min 512, max 8192, default 2048) (not for ec)
-q curve-name Elliptic curve name (ec only)
One of nistp256, nistp384, nistp521, curve25519.
If a custom token is present, the following curves are also supported:
sect163k1, nistk163, sect163r1, sect163r2,
nistb163, sect193r1, sect193r2, sect233k1, nistk233,
sect233r1, nistb233, sect239k1, sect283k1, nistk283,
sect283r1, nistb283, sect409k1, nistk409, sect409r1,
nistb409, sect571k1, nistk571, sect571r1, nistb571,
secp160k1, secp160r1, secp160r2, secp192k1, secp192r1,
nistp192, secp224k1, secp224r1, nistp224, secp256k1,
secp256r1, secp384r1, secp521r1,
prime192v1, prime192v2, prime192v3,
prime239v1, prime239v2, prime239v3, c2pnb163v1,
c2pnb163v2, c2pnb163v3, c2pnb176v1, c2tnb191v1,
c2tnb191v2, c2tnb191v3,
c2pnb208w1, c2tnb239v1, c2tnb239v2, c2tnb239v3,
c2pnb272w1, c2pnb304w1,
c2tnb359w1, c2pnb368w1, c2tnb431r1, secp112r1,
secp112r2, secp128r1, secp128r2, sect113r1, sect113r2
sect131r1, sect131r2

For demonstration purposes, the command output has been greatly reduced. Only the possible key types and the key size (configurable later in the Certmonger request) are included in the output shown here. The curves listed under curve-name are the values that can be used for keyParameters.

Of course, when using other key types, client applications must also be considered, as not every browser, for example, supports every available curve name. Which key types a browser supports can be determined, for example, via Qualys SSL Labs (in the named groups section).

Additionally, further settings can be made within the profile itself, such as the validity period of the issued certificates.

After these adjustments, the profile (with the new name caIPAserviceCert_ECDSA) can be imported into FreeIPA to use it afterwards. The following command is used for importing:


[root@ipa ~]# ipa certprofile-import caIPAserviceCert_ECDSA --file caIPAserviceCert_ECDSA.cfg
Profile description: Profile for network service with ECDSA
Store issued certificates [True]:
-----------------------------------------
Imported profile "caIPAserviceCert_ECDSA"
-----------------------------------------
Profile ID: caIPAserviceCert_ECDSA
Profile description: Profile for network service with ECDSA
Store issued certificates: True

Thus, another profile is now available in FreeIPA, which can be passed along with certificate requests via Certmonger to obtain corresponding certificates and keys.

Requesting a Certificate via Certmonger

EC

For Certmonger to manage a certificate, a corresponding “Request” must be created. The CA and the appropriate profile to be used for creation are provided with this. Additionally, the storage locations for the private key and the certificate are also specified:


[root@ipa-client-01 ~]# getcert request --ca=IPA --profile=caIPAserviceCert_ECDSA --certfile=/etc/pki/tls/certs/$HOSTNAME.crt --keyfile=/etc/pki/tls/private/$HOSTNAME.key --key-type=ec --subject-name="$HOSTNAME" --principal="HTTP/$HOSTNAME" -g 256

We can observe the successful request in the output as follows:


Number of certificates and requests being tracked: 1.
Request ID '20250206122718':
status: MONITORING
stuck: no
key pair storage: type=FILE,location='/etc/pki/tls/private/ipa-client-01.vrc.lan.key'
certificate: type=FILE,location='/etc/pki/tls/certs/ipa-client-01.vrc.lan.crt'
CA: IPA
issuer: CN=Certificate Authority,O=VRC.LAN
subject: CN=ipa-client-01.vrc.lan,O=VRC.LAN
issued: 2025-02-06 13:27:18 CET
expires: 2027-02-07 13:27:18 CET
dns: ipa-client-01.vrc.lan
principal name: HTTP/ipa-client-01.vrc.lan@VRC.LAN
key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
eku: id-kp-serverAuth,id-kp-clientAuth
profile: caIPAserviceCert_ECDSA
pre-save command:
post-save command:
track: yes
auto-renew: yes

Subsequently, we can now also inspect the created certificate more closely with openssl:


[root@ipa-client-01 ~]# openssl x509 -in '/etc/pki/tls/certs/ipa-client-01.vrc.lan.crt' -noout -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 11 (0xb)
Signature Algorithm: sha256WithRSAEncryption
Issuer: O=VRC.LAN, CN=Certificate Authority
Validity
Not Before: Feb 6 12:27:18 2025 GMT
Not After : Feb 7 12:27:18 2027 GMT
Subject: O=VRC.LAN, CN=ipa-client-01.vrc.lan
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:91:15:7d:ac:83:9e:91:cc:9b:ea:f9:0a:5b:53:
03:37:a5:c7:33:69:73:88:38:e4:c1:38:57:8b:b4:
d8:c5:5e:18:8d:83:af:80:fc:9d:64:ab:32:69:dd:
05:50:27:57:be:32:3b:e1:25:10:f3:57:74:e5:42:
a7:16:8e:41:1a
ASN1 OID: prime256v1
NIST CURVE: P-256
X509v3 extensions:
X509v3 Authority Key Identifier:
03:4B:F3:FE:CB:F9:EC:26:14:F8:61:56:BB:81:6A:CE:A1:DB:4C:0B
Authority Information Access:
OCSP - URI:http://ipa-ca.vrc.lan/ca/ocsp
X509v3 Key Usage: critical
Digital Signature, Non Repudiation, Key Encipherment, Data Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 CRL Distribution Points:
Full Name:
URI:http://ipa-ca.vrc.lan/ipa/crl/MasterCRL.bin CRL Issuer:
DirName:O = ipaca, CN = Certificate Authority
X509v3 Subject Key Identifier:
00:B5:C7:96:FA:D1:18:D8:6A:11:B4:E0:83:ED:CE:A8:8F:A1:19:7B
X509v3 Subject Alternative Name:
othername: UPN::HTTP/ipa-client-01.vrc.lan@VRC.LAN, othername: 1.3.6.1.5.2.2::, DNS:ipa-client-01.vrc.lan
Signature Algorithm: sha256WithRSAEncryption
[...]

The important point is that we have indeed received a certificate with an EC key (prime256v1 / P-256) from the new profile.

RSA

In parallel to EC keys, we can, of course, also request and monitor other types of certificates with Certmonger. By specifying a different profile, an RSA-based certificate can thus also be requested additionally:


[root@ipa-client-01 ~]# getcert request --ca=IPA --profile=caIPAserviceCert --certfile=/etc/pki/tls/certs/${HOSTNAME}_rsa.crt --keyfile=/etc/pki/tls/private/${HOSTNAME}_rsa.key --subject-name="$HOSTNAME" --principal="HTTP/$HOSTNAME"

The details of the tracked requests can be viewed more closely via the list parameter of getcert:


[root@ipa-client-01 ~]# getcert list
Number of certificates and requests being tracked: 2.
Request ID '20250206122718':
status: MONITORING
stuck: no
key pair storage: type=FILE,location='/etc/pki/tls/private/ipa-client-01.vrc.lan.key'
certificate: type=FILE,location='/etc/pki/tls/certs/ipa-client-01.vrc.lan.crt'
CA: IPA
issuer: CN=Certificate Authority,O=VRC.LAN
subject: CN=ipa-client-01.vrc.lan,O=VRC.LAN
issued: 2025-02-06 13:27:18 CET
expires: 2027-02-07 13:27:18 CET
dns: ipa-client-01.vrc.lan
principal name: HTTP/ipa-client-01.vrc.lan@VRC.LAN
key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
eku: id-kp-serverAuth,id-kp-clientAuth
profile: caIPAserviceCert_ECDSA
pre-save command:
post-save command:
track: yes
auto-renew: yes
Request ID '20250206123339':
status: MONITORING
stuck: no
key pair storage: type=FILE,location='/etc/pki/tls/private/ipa-client-01.vrc.lan_rsa.key'
certificate: type=FILE,location='/etc/pki/tls/certs/ipa-client-01.vrc.lan_rsa.crt'
CA: IPA
issuer: CN=Certificate Authority,O=VRC.LAN
subject: CN=ipa-client-01.vrc.lan,O=VRC.LAN
issued: 2025-02-06 13:33:40 CET
expires: 2027-02-07 13:33:40 CET
dns: ipa-client-01.vrc.lan
principal name: HTTP/ipa-client-01.vrc.lan@VRC.LAN
key usage: digitalSignature,nonRepudiation,keyEncipherment,dataEncipherment
eku: id-kp-serverAuth,id-kp-clientAuth
profile: caIPAserviceCert
pre-save command:
post-save command:
track: yes
auto-renew: yes

Here, certificates can also be re-requested or management can be terminated by specifying the request ID. The parameters resubmit and stop-tracking are available for this.
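
Based on the request ID shown above, a renewal or the end of monitoring could, for example, be triggered roughly as follows (a minimal sketch; substitute the ID reported by getcert list on your system):


[root@ipa-client-01 ~]# getcert resubmit -i 20250206122718
[root@ipa-client-01 ~]# getcert stop-tracking -i 20250206122718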

Adjusting the Signature Algorithm

The signature algorithms can also be adjusted via the profile. For this, the entry


policyset.serverCertSet.8.default.params.signingAlg=SHA512withRSA

is adjusted. By default, only a “-” is found there. The algorithm can be selected from the list of allowed options in the following parameter:


policyset.serverCertSet.8.constraint.params.signingAlgsAllowed=SHA1withRSA,SHA256withRSA,SHA384withRSA,SHA512withRSA,MD5withRSA,MD2withRSA,SHA1withDSA,SHA1withEC,SHA256withEC,SHA384withEC,SHA512withEC


Subsequently, the changed profile must, of course, be updated in IPA itself. This is done with:


[root@ipa ~]# ipa certprofile-mod caIPAserviceCert_ECDSA --file caIPAserviceCert_ECDSA.cfg
Profile ID: caIPAserviceCert_ECDSA
Profile description: Profile for network service with ECDSA
Store issued certificates: True

After the certificate has been renewed (for example with getcert resubmit), a check with openssl shows the new algorithm:


[root@ipa-client-01 ~]# openssl x509 -in '/etc/pki/tls/certs/ipa-client-01.vrc.lan.crt' -noout -text | grep Sig
Signature Algorithm: sha512WithRSAEncryption
Digital Signature, Non Repudiation, Key Encipherment, Data Encipherment
[...]

Conclusion

Certmonger in combination with FreeIPA can seamlessly manage certificates with EC keys. Only the FreeIPA profile configuration and the corresponding Certmonger requests need to be adjusted, as shown in the preceding sections. This makes FreeIPA well suited for infrastructures that require EC keys. As part of proper operations, it is also recommended to continuously monitor the results of certificate requests in a central monitoring system.

TOAST (The Oversized Attribute Storage Technique) is PostgreSQL‘s mechanism for handling large data objects that exceed the 8KB data page limit. Introduced in PostgreSQL 7.1, TOAST is an improved version of the out-of-line storage mechanism used in Oracle databases for handling large objects (LOBs). Both databases store variable-length data either inline within the table or in a separate structure. PostgreSQL limits the maximum size of a single tuple to one data page. When the size of the tuple, including compressed data in a variable-length column, exceeds a certain threshold, the compressed part is moved to a separate data file and automatically chunked to optimize performance.

TOAST can be used for storing long texts, binary data in bytea columns, JSONB data, HSTORE long key-value pairs, large arrays, big XML documents, or custom-defined composite data types. Its behavior is influenced by two parameters: TOAST_TUPLE_THRESHOLD and TOAST_TUPLE_TARGET. The first is a hardcoded parameter defined in PostgreSQL source code in the heaptoast.h file, based on the MaximumBytesPerTuple function, which is calculated for four toast tuples per page, resulting in a 2000-byte limit. This hardcoded threshold prevents users from storing values that are too small in out-of-line storage, which would degrade performance. The second parameter, TOAST_TUPLE_TARGET, is a table-level storage parameter initialized to the same value as TOAST_TUPLE_THRESHOLD, but it can be adjusted for individual tables. It defines the minimum tuple length required before trying to compress and move long column values into TOAST tables.
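
For example, TOAST_TUPLE_TARGET can be adjusted per table via its storage parameters. A minimal sketch, assuming a hypothetical table named my_table:

-- try to compress and move long values out-of-line earlier than the default
ALTER TABLE my_table SET (toast_tuple_target = 256);
-- revert to the built-in default
ALTER TABLE my_table RESET (toast_tuple_target);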

In the source file heaptoast.h, a comment explains: “If a tuple is larger than TOAST_TUPLE_THRESHOLD, we will try to toast it down to no more than TOAST_TUPLE_TARGET bytes through compressing compressible fields and moving EXTENDED and EXTERNAL data out-of-line. The numbers need not be the same, though they currently are. It doesn’t make sense for TARGET to exceed THRESHOLD, but it could be useful to make it be smaller.” This means that in real tables, data stored directly in the tuple may or may not be compressed, depending on its size after compression. To check if columns are compressed and which algorithm is used, we can use the PostgreSQL system function pg_column_compression. Additionally, the pg_column_size function helps check the size of individual columns. PostgreSQL 17 introduces a new function, pg_column_toast_chunk_id, which indicates whether a column’s value is stored in the TOAST table.
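
These checks can be combined in a single query. A minimal sketch, assuming a hypothetical table my_table with a JSONB column jsonb_data (pg_column_toast_chunk_id requires PostgreSQL 17):

SELECT pg_column_compression(jsonb_data)                AS compression,    -- pglz, lz4 or NULL
       pg_column_size(jsonb_data)                       AS stored_bytes,   -- size as stored on disk
       pg_column_toast_chunk_id(jsonb_data) IS NOT NULL AS in_toast_table  -- true if stored out-of-line
FROM my_table
LIMIT 5;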

In the latest PostgreSQL versions, two compression algorithms are used: PGLZ (PostgreSQL LZ) and LZ4. Both are variants of the LZ77 algorithm, but they are designed for different use cases. PGLZ is suitable for mixed text and numeric data, such as XML or JSON in text form, providing a balance between compression speed and ratio. It uses a sliding window mechanism to detect repeated sequences in the data, offering a reasonable balance between compression speed and compression ratio. LZ4, on the other hand, is a fast compression method designed for real-time scenarios. It offers high-speed compression and decompression, making it ideal for performance-sensitive applications. LZ4 is significantly faster than PGLZ, particularly for decompression, and processes data in fixed-size blocks (typically 64KB), using a hash table to find matches. This algorithm excels with binary data, such as images, audio, and video files.

In my internal research project aimed at understanding the performance of JSONB data under different use cases, I ran multiple performance tests on queries that process JSONB data. Some of the results showed interesting and sometimes surprising performance differences between these algorithms. However, the presented examples are anecdotal and cannot be generalized. The aim of this article is to raise awareness that there can be huge differences in performance, which vary depending on the specific data, use cases and hardware. Therefore, these results cannot be applied blindly.

JSONB data is stored as a binary object with a tree structure, where keys and values are stored in separate cells, and keys at the same JSON level are stored in sorted order. Nested levels are stored as additional tree structures under their corresponding keys from the higher level. This structure means that retrieving data for the first keys in the top JSON layer is quicker than retrieving values for highly nested keys stored deeper in the binary tree. While this difference is usually negligible, it becomes significant in queries that perform sequential scans over the entire dataset, where these small delays can cumulatively degrade overall performance.

The dataset used for the tests consisted of GitHub historical events available as JSON objects from gharchive.org covering the first week of January 2023. I tested three different tables: one using PGLZ, one using LZ4, and one using EXTERNAL storage without compression. A Python script downloaded the data, unpacked it, and loaded it into the respective tables. Each table was loaded separately to prevent prior operations from influencing the PostgreSQL storage format.

The first noteworthy observation was the size difference between the tables. The table using LZ4 compression was the smallest, around 38GB, followed by the table using PGLZ at 41GB. The table using external storage without compression was significantly larger at 98GB. As the testing machines had only 32GB of RAM, none of the tables could fit entirely in memory, making disk I/O a significant factor in performance. About one-third of the records were stored in TOAST tables, which reflected a typical data size distribution seen by our clients.
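
Table and TOAST sizes can be compared with PostgreSQL's size functions. A minimal sketch, assuming hypothetical table names such as events_lz4:

SELECT pg_size_pretty(pg_table_size('events_lz4'))    AS heap_including_toast,
       pg_size_pretty(pg_relation_size('events_lz4')) AS heap_only;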

To minimize caching effects, I performed several tests with multiple parallel sessions running testing queries, each with randomly chosen parameters. In addition to use cases involving different types of indexes, I also ran sequential scans across the entire table. Tests were repeated with varying numbers of parallel sessions to gather sufficient data points, and the same tests were conducted on all three tables with different compression algorithms.

The first graph shows the results of select queries performing sequential scans, retrieving JSON keys stored at the beginning of the JSONB binary object. As expected, external storage without compression (blue line) provides nearly linear performance, with disk I/O being the primary factor. On an 8-core machine, the PGLZ algorithm (red line) performs reasonably well under smaller loads. However, as the number of parallel queries reaches the number of available CPU cores (8), its performance starts to degrade and becomes worse than the performance of uncompressed data. Under higher loads, it becomes a serious bottleneck. In contrast, LZ4 (green line) handles parallel queries exceptionally well, maintaining better performance than uncompressed data, even with up to 32 parallel queries on 8 cores.

The second test targeted JSONB keys stored at different positions (beginning, middle, and end) within the JSONB binary object. The results, measured on a 20-core machine, demonstrate that PGLZ (red line) is slower than the uncompressed table right from the start. In this case, the performance of PGLZ degrades linearly, rather than geometrically, but still lags significantly behind LZ4 (green line). LZ4 consistently outperformed uncompressed data throughout the test.

But if we decide to change the compression algorithm, simply creating a new table with the default_toast_compression setting set to “lz4” and running INSERT INTO my_table_lz4 SELECT * FROM my_table_pglz; will not change the compression algorithm of existing records. Each already compressed record retains its original compression algorithm. You can use the pg_column_compression system function to check which algorithm was used for each record. The default compression setting only applies to new, uncompressed data; old, already compressed data is copied as-is.

To truly convert old data to a different compression algorithm, we need to recast it through text. For JSONB data, we would use a query like: INSERT INTO my_table_lz4 (jsonb_data, …) SELECT jsonb_data::text::jsonb, … FROM my_table_pglz; This ensures that old data is stored using the new LZ4 compression. However, this process can be time and resource-intensive, so it’s important to weigh the benefits before undertaking it.
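
Put together, a conversion could look roughly like the following sketch (table and column names are hypothetical; the re-cast through text is the essential step):

-- compress new values with LZ4 by default
-- (alternatively: ALTER TABLE my_table_lz4 ALTER COLUMN jsonb_data SET COMPRESSION lz4;)
SET default_toast_compression = 'lz4';

CREATE TABLE my_table_lz4 (id bigint, jsonb_data jsonb);

-- re-cast through text so the values are recompressed with the new algorithm
INSERT INTO my_table_lz4 (id, jsonb_data)
SELECT id, jsonb_data::text::jsonb FROM my_table_pglz;

-- verify: should return only 'lz4' (or NULL for values that were not compressed at all)
SELECT DISTINCT pg_column_compression(jsonb_data) FROM my_table_lz4;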

To summarize: my tests showed significant performance differences between the PGLZ and LZ4 algorithms for storing compressed JSONB data. These differences are particularly pronounced when the machine is under high parallel load. The tests showed a strong degradation in performance on data stored with the PGLZ algorithm once the number of parallel sessions exceeded the number of available cores. In some cases, PGLZ performed worse than uncompressed data right from the start. In contrast, LZ4 consistently outperformed both uncompressed and PGLZ-compressed data, especially under heavy loads. Setting LZ4 as the default compression for new data seems to be the right choice, and some cloud providers have already adopted this approach. However, these results should not be applied blindly to existing data. You should test your specific use cases and data to determine whether conversion is worth the time and resource investment, as converting data requires re-casting and can be a resource-intensive process.

Introduction

Redis is a widely popular in-memory key-value high-performance database, which can also be used as a cache and message broker. It has been a go-to choice for many due to its performance and versatility. Many cloud providers offer Redis-based solutions:

However, due to recent changes in the licensing model of Redis, its prominence and usage are changing. Redis was initially developed under the open-source BSD license, allowing developers to freely use, modify and distribute the source code for both commercial and non-commercial purposes. As a result, Redis quickly gained popularity in the developer community.

However, Redis has recently changed to a dual source-available license. To be precise, it will in the future be available under RSALv2 (Redis Source Available License Version 2) or SSPLv1 (Server Side Public License Version 1); commercial use requires individual agreements, potentially increasing costs for cloud service providers. For a detailed overview of these changes, refer to the Redis licensing page. With the Redis Community Edition, the source code will remain freely available for developers, customers and partners of the company. However, cloud service providers and others who want to use Redis as part of commercial offerings will have to make individual agreements with the vendor.

Due to these recent changes in Redis’s licensing model, many developers and organizations are re-evaluating their in-memory key-value database choices. Valkey, an open-source fork of Redis, maintains the same high performance and versatility while ensuring unrestricted use for both developers and commercial entities. The fork is hosted by the Linux Foundation, and a broad group of contributors is now supporting the Valkey project. More information can be found here and here. Its commitment to open-source principles has gained support from major cloud providers, including AWS. Amazon Web Services (AWS) announced that “AWS is committed to supporting open source Valkey for the long term”; more information can be found here. So it may be the right time to switch your infrastructure from Redis to Valkey.

In this article, we will set up a Valkey instance with TLS and outline the steps to migrate your data from Redis seamlessly.

Overview of possible migration approaches

In general, there are several approaches to migrate:

  1. Reuse the database file
    In this approach, Redis is shut down so that the RDB file on disk is up to date, and Valkey is then started using this file in its data directory.
  2. Use REPLICAOF to connect Valkey to the Redis instance
    Register the new Valkey instance as a replica of a Redis master to stream the data (see the sketch after this list). The Valkey instance and its network must be able to reach the Redis service.
  3. Automated data migration to Valkey
    A migration script can be run on a machine that can reach both the Redis and Valkey databases.
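
For the second approach, replication is configured on the Valkey side. A minimal sketch, assuming the Redis source is reachable as redis-host:6379 and protected by a password (host names, ports and TLS options have to be adjusted to your environment):

$ valkey-cli
127.0.0.1:6379> CONFIG SET masterauth secret
127.0.0.1:6379> REPLICAOF redis-host 6379
127.0.0.1:6379> INFO replication
127.0.0.1:6379> REPLICAOF NO ONE

Once INFO replication reports the initial synchronization as finished, REPLICAOF NO ONE promotes the Valkey instance back to a standalone primary.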

In this blog article, we assume that direct access to the file system of the Redis server in the cloud is not possible, so the database file cannot simply be reused, and that the Valkey and Redis services are located in different networks and cannot reach each other, so a replica cannot be set up. As a result, we choose the third option and run an automated data migration script on a separate machine that can connect to both servers and transfer the data.

Setup of Valkey

In case you are using a cloud service, please consult its documentation on how to set up a Valkey instance. Since Valkey is a young project, only a few distributions provide ready-to-use packages, for example Red Hat Enterprise Linux 8 and 9 via Extra Packages for Enterprise Linux (EPEL). In this blog post, we use an on-premises Debian 12 server to host the Valkey server in version 7.2.6 with TLS. Please consult your distribution's guides to install Valkey or use the manual provided on GitHub. The migration itself will be done by a Python 3 script using TLS.
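
If you are on an EPEL-enabled Red Hat Enterprise Linux 8 or 9 system instead, the installation could look roughly like this (a sketch; package and service names may differ between distributions and repositories):

$ sudo dnf install valkey
$ sudo systemctl enable --now valkey
$ valkey-cli ping
PONG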

Start the server and establish a client connection:

In this blog article, we start the server with the TLS parameters listed below. We specify all required TLS parameters and set port 0 to disable the non-TLS port completely:

$ valkey-server --tls-port 6379 --port 0 --tls-cert-file ./tls/redis.crt --tls-key-file ./tls/redis.key --tls-ca-cert-file ./tls/ca.crt

                .+^+.                                                
            .+#########+.                                            
        .+########+########+.           Valkey 7.2.6 (579cca5f/0) 64 bit
    .+########+'     '+########+.                                    
 .########+'     .+.     '+########.    Running in standalone mode
 |####+'     .+#######+.     '+####|    Port: 6379
 |###|   .+###############+.   |###|    PID: 436767                     
 |###|   |#####*'' ''*#####|   |###|                                 
 |###|   |####'  .-.  '####|   |###|                                 
 |###|   |###(  (@@@)  )###|   |###|          https://valkey.io      
 |###|   |####.  '-'  .####|   |###|                                 
 |###|   |#####*.   .*#####|   |###|                                 
 |###|   '+#####|   |#####+'   |###|                                 
 |####+.     +##|   |#+'     .+####|                                 
 '#######+   |##|        .+########'                                 
    '+###|   |##|    .+########+'                                    
        '|   |####+########+'                                        
             +#########+'                                            
                '+v+'                                                

436767:M 27 Aug 2024 16:08:56.058 * Server initialized
436767:M 27 Aug 2024 16:08:56.058 * Loading RDB produced by valkey version 7.2.6
[...]
436767:M 27 Aug 2024 16:08:56.058 * Ready to accept connections tls

Now it is time to test the connection with a client using TLS:

$ valkey-cli --tls --cert ./tls/redis.crt --key ./tls/redis.key --cacert ./tls/ca.crt -p 6379

127.0.0.1:6379> INFO SERVER
# Server
server_name:valkey
valkey_version:7.2.6
[...]

Automated data migration to Valkey

Finally, we migrate the data in this example using a Python 3 script. The script establishes connections to both the Redis source and the Valkey target database, fetches all keys from the Redis database and creates or updates each key-value pair in the Valkey database. This is not an off-the-shelf approach; it uses the redis-py library, which provides a number of examples. Using Python 3, the process could even be extended to filter unwanted data, alter values to suit the new environment, or add sanity checks. The script used here provides progress updates during the migration process:

#!/usr/bin/env python3

import redis

# Connect to the Redis source database, which is password protected, via IP and port
redis_client = redis.StrictRedis(host='172.17.0.3', port=6379, password='secret', db=0)

# Connect to the Valkey target database, which is using TLS
ssl_certfile="./tls/client.crt"
ssl_keyfile="./tls/client.key"
ssl_ca_certs="./tls/ca.crt"
valkey_client = redis.Redis(
    host="192.168.0.3",
    port=6379,
    ssl=True,
    ssl_certfile=ssl_certfile,
    ssl_keyfile=ssl_keyfile,
    ssl_cert_reqs="required",
    ssl_ca_certs=ssl_ca_certs,
)

# Fetch all keys from the Redis database
keys = redis_client.keys('*')
print("Found", len(keys), "Keys in Source!")

# Migrate each key-value pair to the Valkey database
for counter, key in enumerate(keys):
    value = redis_client.get(key)
    valkey_client.set(key, value)
    print("Status: ", round((counter+1) / len(keys) * 100, 1), "%", end='\r')
print()

To start the process execute the script:

$ python3 redis_to_tls_valkey.py
Found 569383 Keys in Source!
Status: 100.0 %

As a last step, configure your application to connect to the new Valkey server.

Conclusion

Since the change of Redis’ license, the new Valkey project has been gaining more and more traction. Migrating to Valkey ensures continued access to a robust, open-source in-memory database without the licensing restrictions of Redis. Whether you’re running your infrastructure on-premises or in the cloud, this guide provides the steps needed for a successful migration. Migrating from a cloud instance to a new environment can be cumbersome because of missing direct file access or isolated networks. Given these circumstances, we used a Python script, which is a flexible way to implement the various steps needed to master the task.

If you find this guide helpful and need support migrating your databases, feel free to contact us. We are happy to support you on-premises or in cloud environments.

Artificial Intelligence (AI) is often regarded as a groundbreaking innovation of the modern era, yet its roots extend much further back than many realize. In 1943, neuroscientist Warren McCulloch and logician Walter Pitts proposed the first computational model of a neuron. The term “Artificial Intelligence” was coined in 1956. The subsequent creation of the Perceptron in 1957, the first model of a neural network, and the expert system Dendral designed for chemical analysis demonstrated the potential of computers to process data and apply expert knowledge in specific domains. From the 1970s to the 1990s, expert systems proliferated. A pivotal moment for AI in the public eye came in 1997 when IBM’s chess-playing computer Deep Blue defeated chess world champion Garry Kasparov.

The Evolution of Modern AI: From Spam Filters to LLMs and the Power of Context

The new millennium brought a new era for AI, with the integration of rudimentary AI systems into everyday technology. Spam filters, recommendation systems, and search engines subtly shaped online user experiences. In 2006, deep learning emerged, marking the renaissance of neural networks. The landmark development came in 2017 with the introduction of Transformers, a neural network architecture that became the most important ingredient for the creation of Large Language Models (LLMs). Its key component, the attention mechanism, enables the model to discern relationships between words over long distances within a text. This mechanism assigns varying weights to words depending on their contextual importance, acknowledging that the same word can hold different meanings in different situations. However, modern AI, as we know it, was made possible mainly thanks to the availability of large datasets and powerful computational hardware. Without the vast resources of the internet and electronic libraries worldwide, modern AI would not have enough data to learn and evolve. And without modern performant GPUs, training AI would be a challenging task.

The LLM is a sophisticated, multilayer neural network comprising numerous interconnected nodes. These nodes are the micro-decision-makers that underpin the collective intelligence of the system. During its training phase, an LLM learns to balance myriad small, simple decisions, which, when combined, enable it to handle complex tasks. The intricacies of these internal decisions are typically opaque to us, as we are primarily interested in the model’s output. However, these complex neural networks can only process numbers, not raw text. Text must be tokenized into words or sub-words, standardized, and normalized — converted to lowercase, stripped of punctuation, etc. These tokens are then put into a dictionary and mapped to unique numerical values. Only this numerical representation of the text allows LLMs to learn the complex relationships between words, phrases, and concepts and the likelihood of certain words or phrases following one another. LLMs therefore process texts as huge numerical arrays without truly understanding the content. They lack a mental model of the world and operate solely on mathematical representations of word relationships and their probabilities. This focus on the answer with the highest probability is also the reason why LLMs can “hallucinate” plausible yet incorrect information or get stuck in response loops, regurgitating the same or similar answers repeatedly.

The Art of Conversation: Prompt Engineering and Guiding an LLM’s Semantic Network

Based on the relationships between words learned from texts, LLMs also create vast webs of semantic associations that interconnect words. These associations form the backbone of an LLM’s ability to generate contextually appropriate and meaningful responses. When we provide a prompt to an LLM, we are not merely supplying words; we are activating a complex network of related concepts and ideas. Consider the word “apple”. This simple term can trigger a cascade of associated concepts such as “fruit,” “tree,” “food,” and even “technology” or “computer”. The activated associations depend on the context provided by the prompt and the prevalence of related concepts in the training data. The specificity of a prompt greatly affects the semantic associations an LLM considers. A vague prompt like “tell me about apples” may activate a wide array of diverse associations, ranging from horticultural information about apple trees to the nutritional value of the fruit or even cultural references like the tale of Snow White. An LLM will typically use the association with the highest occurrence in its training data when faced with such a broad prompt. For more targeted and relevant responses, it is crucial to craft focused prompts that incorporate specific technical jargon or references to particular disciplines. By doing so, the user can guide the LLM to activate a more precise subset of semantic associations, thereby narrowing the scope of the response to the desired area of expertise or inquiry.

LLMs have internal parameters that influence their creativity and determinism, such as “temperature”, “top-p”, “max length”, and various penalties. However, these are typically set to balanced defaults, and users should not modify them; otherwise, they could compromise the ability of LLMs to provide meaningful answers. Prompt engineering is therefore the primary method for guiding LLMs toward desired outputs. By crafting specific prompts, users can subtly direct the model’s responses, ensuring relevance and accuracy. The LLM derives a wealth of information from the prompt, determining not only semantic associations for the answer but also estimating its own role and the target audience’s knowledge level. By default, an LLM assumes the role of a helper and assistant, but it can adopt an expert’s voice if prompted. However, to elicit an expert-level response, one must not only set an expert role for the LLM but also specify that the inquirer is an expert as well. Otherwise, an LLM assumes an “average Joe” as the target audience by default. Therefore, even when asked to impersonate an expert role, an LLM may decide to simplify the language for the “average Joe” if the knowledge level of the target audience is not specified, which can result in a disappointing answer.

Ideas & samples

Consider two prompts for addressing a technical issue with PostgreSQL:

1. “Hi, what could cause delayed checkpoints in PostgreSQL?”

2. “Hi, we are both leading PostgreSQL experts investigating delayed checkpoints. The logs show checkpoints occasionally taking 3-5 times longer than expected. Let us analyze this step by step and identify probable causes.”

The depth of the responses will vary significantly, illustrating the importance of prompt specificity. The second prompt employs common prompting techniques, which we will explore in the following paragraphs. However, it is crucial to recognize the limitations of LLMs, particularly when dealing with expert-level knowledge, such as the issue of delayed checkpoints in our example. Depending on the AI model and the quality of its training data, users may receive either helpful or misleading answers. The quality and amount of training data representing the specific topic play a crucial role.

LLM Limitations and the Need for Verification: Overcoming Overfitting and Hallucinations

Highly specialized problems may be underrepresented in the training data, leading to overfitting or hallucinated responses. Overfitting occurs when an LLM focuses too closely on its training data and fails to generalize, providing answers that seem accurate but are contextually incorrect. In our PostgreSQL example, a hallucinated response might borrow facts from other databases (like MySQL or MS SQL) and adjust them to fit PostgreSQL terminology. Thus, the prompt itself is no guarantee of a high-quality answer—any AI-generated information must be carefully verified, which is a task that can be challenging for non-expert users.

With these limitations in mind, let us now delve deeper into prompting techniques. “Zero-shot prompting” is a baseline approach where the LLM operates without additional context or supplemental reference material, relying on its pre-trained knowledge and the prompt’s construction. By carefully activating the right semantic associations and setting the correct scope of attention, the output can be significantly improved. However, LLMs, much like humans, can benefit from examples. By providing reference material within the prompt, the model can learn patterns and structure its output accordingly. This technique is called “few-shot prompting”. The quality of the output is directly related to the quality and relevance of the reference material; hence, the adage “garbage in, garbage out” always applies.

For complex issues, “chain-of-thought” prompting can be particularly effective. This technique can significantly improve the quality of complicated answers because LLMs can struggle with long-distance dependencies in reasoning. Chain-of-thought prompting addresses this by instructing the model to break down the reasoning process into smaller, more manageable parts. It leads to more structured and comprehensible answers by focusing on better-defined sub-problems. In our PostgreSQL example prompt, the phrase “let’s analyze this step by step” instructs the LLM to divide the processing into a chain of smaller sub-problems. An evolution of this technique is the “tree of thoughts” technique. Here, the model not only breaks down the reasoning into parts but also creates a tree structure with parallel paths of reasoning. Each path is processed separately, allowing the model to converge on the most promising solution. This approach is particularly useful for complex problems requiring creative brainstorming. In our PostgreSQL example prompt, the phrase “let’s identify probable causes” instructs the LLM to discuss several possible pathways in the answer.

Plausibility vs. Veracity: The Critical Need to Verify All LLM Outputs

Of course, prompting techniques have their drawbacks. Few-shot prompting is limited by the number of tokens, which restricts the amount of information that can be included. Additionally, the model may ignore parts of excessively long prompts, especially the middle sections. Care must also be taken with the frequency of certain words in the reference material, as overlooked frequency can bias the model’s output. Chain-of-thought prompting can also lead to overfitted or “hallucinated” responses for some sub-problems, compromising the overall result.

Instructing the model to provide deterministic, factual responses is another prompting technique, vital for scientific and technical topics. Formulations like “answer using only reliable sources and cite those sources” or “provide an answer based on peer-reviewed scientific literature and cite the specific studies or articles you reference” can direct the model to base its responses on trustworthy sources. However, as already discussed, even with instructions to focus on factual information, the AI’s output must be verified to avoid falling into the trap of overfitted or hallucinated answers.

In conclusion, effective prompt engineering is a skill that combines creativity with strategic thinking, guiding the AI to deliver the most useful and accurate responses. Whether we are seeking simple explanations or delving into complex technical issues, the way we communicate with the AI always makes a difference in the quality of the response. However, we must always keep in mind that even the best prompt is no guarantee of a quality answer, and we must double-check received facts. The quality and amount of training data are paramount, and this means that some problems with received answers can persist even in future LLMs simply because they would have to use the same limited data for some specific topics.

When the model’s training data is sparse or ambiguous in certain highly focused areas, it can produce responses that are syntactically valid but factually incorrect. One reason AI hallucinations can be particularly problematic is their inherent plausibility. The generated text is usually grammatically correct and stylistically consistent, making it difficult for users to immediately identify inaccuracies without external verification. This highlights a key distinction between plausibility and veracity: just because something sounds right does not mean it is true.

Conclusion

Whether the response is an insightful solution to a complex problem or completely fabricated nonsense is a distinction that must be made by human users, based on their expertise in the topic at hand. Our clients have repeatedly had exactly this experience with different LLMs. They tried to solve their technical problems using AI, but the answers were partially incorrect or did not work at all. This is why human expert knowledge is still the most important factor when it comes to solving difficult technical issues. The inherent limitations of LLMs are unlikely to be fully overcome, at least not with current algorithms. Therefore, expert knowledge will remain essential for delivering reliable, high-quality solutions in the future. As people increasingly use AI tools in the same way they rely on Google, as a resource or assistant, true expertise will still be needed to interpret, refine, and implement these tools effectively. On the other hand, AI is emerging as a key driver of innovation. Progressive companies are investing heavily in AI and face challenges related to security and performance. This is an area where NetApp can also help: its cloud-based, AI-focused solutions are designed to address exactly these issues.

(Picture generated by my colleague Felix Alipaz-Dicke using ChatGPT-4.)


Mastering Cloud Infrastructure with Pulumi: Introduction

In today’s rapidly changing landscape of cloud computing, managing infrastructure as code (IaC) has become essential for developers and IT professionals. Pulumi, an open-source IaC tool, brings a fresh perspective to the table by enabling infrastructure management using popular programming languages like JavaScript, TypeScript, Python, Go, and C#. This approach offers a unique blend of flexibility and power, allowing developers to leverage their existing coding skills to build, deploy, and manage cloud infrastructure. In this post, we’ll explore the world of Pulumi and see how it pairs with Amazon FSx for NetApp ONTAP—a robust solution for scalable and efficient cloud storage.


Pulumi – The Theory

Why Pulumi?

Pulumi distinguishes itself among IaC tools for several compelling reasons:

Challenges with Pulumi

Like any tool, Pulumi comes with its own set of challenges:

State Management in Pulumi: Ensuring Consistency Across Deployments

Effective infrastructure management hinges on proper state handling. Pulumi excels in this area by tracking the state of your infrastructure, enabling it to manage resources efficiently. This capability ensures that Pulumi knows exactly what needs to be created, updated, or deleted during deployments. Pulumi offers several options for state storage:

Managing state effectively is crucial for maintaining consistency across deployments, especially in scenarios where multiple team members are working on the same infrastructure.
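
The backend is selected with pulumi login; a few examples (the bucket name is a placeholder):

pulumi login                       # Pulumi Cloud, the default managed backend
pulumi login --local               # state kept on the local filesystem under ~/.pulumi
pulumi login s3://my-pulumi-state  # self-managed state in an S3 bucket (Azure Blob and GCS URLs work analogously)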

Other IaC Tools: Comparing Pulumi to Traditional IaC Tools

When comparing Pulumi to other Infrastructure as Code (IaC) tools, several drawbacks of traditional approaches become evident:


Pulumi – In Practice

Introduction

In this section, we’ll dive into a practical example to better understand Pulumi’s capabilities. We’ll also explore how to set up a project using Pulumi with AWS and automate it using GitHub Actions for CI/CD.

Prerequisites

Before diving into using Pulumi with AWS and automating your infrastructure management through GitHub Actions, ensure you have the following prerequisites in place:

Project Structure

When working with Infrastructure as Code (IaC) using Pulumi, maintaining an organized project structure is essential. A clear and well-defined directory structure not only streamlines the development process but also improves collaboration and deployment efficiency. In this post, we’ll explore a typical directory structure for a Pulumi project and explain the significance of each component.

Overview of a Typical Pulumi Project Directory

A standard Pulumi project might be organized as follows:


/project-root
├── .github
│ └── workflows
│ └── workflow.yml # GitHub Actions workflow for CI/CD
├── __main__.py # Entry point for the Pulumi program
├── infra.py # Infrastructure code
├── pulumi.dev.yml # Pulumi configuration for the development environment
├── pulumi.prod.yml # Pulumi configuration for the production environment
├── pulumi.yml # Pulumi configuration (common or default settings)
├── requirements.txt # Python dependencies
└── test_infra.py # Tests for infrastructure code

NetApp FSx on AWS

Introduction

Amazon FSx for NetApp ONTAP offers a fully managed, scalable storage solution built on the NetApp ONTAP file system. It provides high-performance, highly available shared storage that seamlessly integrates with your AWS environment. Leveraging the advanced data management capabilities of ONTAP, FSx for NetApp ONTAP is ideal for applications needing robust storage features and compatibility with existing NetApp systems.

Key Features

What It’s About

Setting up Pulumi for managing your cloud infrastructure can revolutionize the way you deploy and maintain resources. By leveraging familiar programming languages, Pulumi brings Infrastructure as Code (IaC) to life, making the process more intuitive and efficient. When paired with Amazon FSx for NetApp ONTAP, it unlocks advanced storage solutions within the AWS ecosystem.

Putting It All Together

Using Pulumi, you can define and deploy a comprehensive AWS infrastructure setup, seamlessly integrating the powerful FSx for NetApp ONTAP file system. This combination simplifies cloud resource management and ensures you harness the full potential of NetApp’s advanced storage capabilities, making your cloud operations more efficient and robust.

In the next sections, we’ll walk through the specifics of setting up each component using Pulumi code, illustrating how to create a VPC, configure subnets, set up a security group, and deploy an FSx for NetApp ONTAP file system, all while leveraging the robust features provided by both Pulumi and AWS.

Architecture Overview

A visual representation of the architecture we’ll deploy using Pulumi: Single AZ Deployment with FSx and EC2

The diagram above illustrates the architecture for deploying an FSx for NetApp ONTAP file system within a single Availability Zone. The setup includes a VPC with public and private subnets, an Internet Gateway for outbound traffic, and a Security Group controlling access to the FSx file system and the EC2 instance. The EC2 instance is configured to mount the FSx volume using NFS, enabling seamless access to storage.

Setting up Pulumi

Follow these steps to set up Pulumi and integrate it with AWS:

Install Pulumi: Begin by installing Pulumi using the following command:

curl -fsSL https://get.pulumi.com | sh

Install AWS CLI: If you haven’t installed it yet, install the AWS CLI to manage AWS services:

pip install awscli

Configure AWS CLI: Configure the AWS CLI with your credentials:

aws configure

Create a New Pulumi Project: Initialize a new Pulumi project with AWS and Python:

pulumi new aws-python

Configure Your Pulumi Stack: Set the AWS region for your Pulumi stack:

pulumi config set aws:region eu-central-1

Deploy Your Stack: Deploy your infrastructure using Pulumi:

pulumi preview ; pulumi up

Example: VPC, Subnets, and FSx for NetApp ONTAP

Let’s dive into an example Pulumi project that sets up a Virtual Private Cloud (VPC), subnets, a security group, an Amazon FSx for NetApp ONTAP file system, and an EC2 instance.

Pulumi Code Example: VPC, Subnets, and FSx for NetApp ONTAP

The first step is to define all the parameters required to set up the infrastructure. You can use the following example to configure these parameters as specified in the pulumi.dev.yaml file.

This pulumi.dev.yaml file contains configuration settings for a Pulumi project. It specifies various parameters for the deployment environment, including the AWS region, availability zones, and key name. It also defines CIDR blocks for subnets. These settings are used to configure and deploy cloud infrastructure resources in the specified AWS region.


config:
  aws:region: eu-central-1
  demo:availabilityZone: eu-central-1a
  demo:keyName: XYZ
  demo:subnet1CIDER: 10.0.3.0/24
  demo:subnet2CIDER: 10.0.4.0/24

The following code snippet should be placed in the infra.py file. It details the setup of the VPC, subnets, security group, and FSx for NetApp ONTAP file system. Each step in the code is explained through inline comments.


import pulumi
import pulumi_aws as aws
import pulumi_command as command
import os

# Retrieve configuration values from Pulumi configuration files
aws_config = pulumi.Config("aws")
region = aws_config.require("region")  # The AWS region where resources will be deployed

demo_config = pulumi.Config("demo")
availability_zone = demo_config.require("availabilityZone")  # Availability Zone for the deployment
subnet1_cidr = demo_config.require("subnet1CIDER")  # CIDR block for the public subnet
subnet2_cidr = demo_config.require("subnet2CIDER")  # CIDR block for the private subnet
key_name = demo_config.require("keyName")  # Name of the SSH key pair for EC2 instance access

# Create a new VPC with DNS support enabled
vpc = aws.ec2.Vpc(
    "fsxVpc",
    cidr_block="10.0.0.0/16",  # VPC CIDR block
    enable_dns_support=True,    # Enable DNS support in the VPC
    enable_dns_hostnames=True   # Enable DNS hostnames in the VPC
)

# Create an Internet Gateway to allow internet access from the VPC
internet_gateway = aws.ec2.InternetGateway(
    "vpcInternetGateway",
    vpc_id=vpc.id  # Attach the Internet Gateway to the VPC
)

# Create a public route table for routing internet traffic via the Internet Gateway
public_route_table = aws.ec2.RouteTable(
    "publicRouteTable",
    vpc_id=vpc.id,
    routes=[aws.ec2.RouteTableRouteArgs(
        cidr_block="0.0.0.0/0",  # Route all traffic (0.0.0.0/0) to the Internet Gateway
        gateway_id=internet_gateway.id
    )]
)

# Create a single public subnet in the specified Availability Zone
public_subnet = aws.ec2.Subnet(
    "publicSubnet",
    vpc_id=vpc.id,
    cidr_block=subnet1_cidr,  # CIDR block for the public subnet
    availability_zone=availability_zone,  # The specified Availability Zone
    map_public_ip_on_launch=True  # Assign public IPs to instances launched in this subnet
)

# Create a single private subnet in the same Availability Zone
private_subnet = aws.ec2.Subnet(
    "privateSubnet",
    vpc_id=vpc.id,
    cidr_block=subnet2_cidr,  # CIDR block for the private subnet
    availability_zone=availability_zone  # The same Availability Zone
)

# Associate the public subnet with the public route table to enable internet access
public_route_table_association = aws.ec2.RouteTableAssociation(
    "publicRouteTableAssociation",
    subnet_id=public_subnet.id,
    route_table_id=public_route_table.id
)

# Create a security group to control inbound and outbound traffic for the FSx file system
security_group = aws.ec2.SecurityGroup(
    "fsxSecurityGroup",
    vpc_id=vpc.id,
    description="Allow NFS traffic",  # Description of the security group
    ingress=[
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=2049,  # NFS protocol port
            to_port=2049,
            cidr_blocks=["0.0.0.0/0"]  # Allow NFS traffic from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=111,  # RPCBind port for NFS
            to_port=111,
            cidr_blocks=["0.0.0.0/0"]  # Allow RPCBind traffic from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="udp",
            from_port=111,  # RPCBind port for NFS over UDP
            to_port=111,
            cidr_blocks=["0.0.0.0/0"]  # Allow RPCBind traffic over UDP from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=22,  # SSH port for EC2 instance access
            to_port=22,
            cidr_blocks=["0.0.0.0/0"]  # Allow SSH traffic from anywhere
        )
    ],
    egress=[
        aws.ec2.SecurityGroupEgressArgs(
            protocol="-1",  # Allow all outbound traffic
            from_port=0,
            to_port=0,
            cidr_blocks=["0.0.0.0/0"]  # Allow all outbound traffic to anywhere
        )
    ]
)

# Create the FSx for NetApp ONTAP file system in the private subnet
file_system = aws.fsx.OntapFileSystem(
    "fsxFileSystem",
    subnet_ids=[private_subnet.id],  # Deploy the FSx file system in the private subnet
    preferred_subnet_id=private_subnet.id,  # Preferred subnet for the FSx file system
    security_group_ids=[security_group.id],  # Attach the security group to the FSx file system
    deployment_type="SINGLE_AZ_1",  # Single Availability Zone deployment
    throughput_capacity=128,  # Throughput capacity in MB/s
    storage_capacity=1024  # Storage capacity in GB
)

# Create a Storage Virtual Machine (SVM) within the FSx file system
storage_virtual_machine = aws.fsx.OntapStorageVirtualMachine(
    "storageVirtualMachine",
    file_system_id=file_system.id,  # Associate the SVM with the FSx file system
    name="svm1",  # Name of the SVM
    root_volume_security_style="UNIX"  # Security style for the root volume
)

# Create a volume within the Storage Virtual Machine (SVM)
volume = aws.fsx.OntapVolume(
    "fsxVolume",
    storage_virtual_machine_id=storage_virtual_machine.id,  # Associate the volume with the SVM
    name="vol1",  # Name of the volume
    junction_path="/vol1",  # Junction path for mounting
    size_in_megabytes=10240,  # Size of the volume in MB
    storage_efficiency_enabled=True,  # Enable storage efficiency features
    tiering_policy=aws.fsx.OntapVolumeTieringPolicyArgs(
        name="SNAPSHOT_ONLY"  # Tiering policy for the volume
    ),
    security_style="UNIX"  # Security style for the volume
)

# Extract the DNS name from the list of SVM endpoints
dns_name = storage_virtual_machine.endpoints.apply(lambda e: e[0]['nfs'][0]['dns_name'])
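
# Optional addition (not part of the original listing): export the NFS DNS name as a stack
# output so it can be inspected later with "pulumi stack output" (e.g. in the GitHub Actions
# workflow shown further below).
pulumi.export("svm_nfs_dns_name", dns_name)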

# Get the latest Amazon Linux 2 AMI for the EC2 instance
ami = aws.ec2.get_ami(
    most_recent=True,
    owners=["amazon"],
    filters=[{"name": "name", "values": ["amzn2-ami-hvm-*-x86_64-gp2"]}]  # Filter for Amazon Linux 2 AMI
)

# Create an EC2 instance in the public subnet
ec2_instance = aws.ec2.Instance(
    "fsxEc2Instance",
    instance_type="t3.micro",  # Instance type for the EC2 instance
    vpc_security_group_ids=[security_group.id],  # Attach the security group to the EC2 instance
    subnet_id=public_subnet.id,  # Deploy the EC2 instance in the public subnet
    ami=ami.id,  # Use the latest Amazon Linux 2 AMI
    key_name=key_name,  # SSH key pair for accessing the EC2 instance
    tags={"Name": "FSx EC2 Instance"}  # Tag for the EC2 instance
)

# User data script to install NFS client and mount the FSx volume on the EC2 instance
user_data_script = dns_name.apply(lambda dns: f"""#!/bin/bash
sudo yum update -y
sudo yum install -y nfs-utils
sudo mkdir -p /mnt/fsx
if ! mountpoint -q /mnt/fsx; then
sudo mount -t nfs {dns}:/vol1 /mnt/fsx
fi
""")

# Retrieve the private key for SSH access from an environment variable (provided by GitHub Actions)
# Note: never print or log the key, it is a secret and would otherwise end up in the build logs
private_key_content = os.getenv("PRIVATE_KEY")

# Run the mount script on the EC2 instance; depends_on=[volume] ensures the FSx volume (and thus the file system) exists first
pulumi.Output.all(file_system.id, ec2_instance.public_ip).apply(lambda args: command.remote.Command(
    "mountFsxFileSystem",
    connection=command.remote.ConnectionArgs(
        host=args[1],
        user="ec2-user",
        private_key=private_key_content
    ),
    create=user_data_script,
    opts=pulumi.ResourceOptions(depends_on=[volume])
))
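
A quick note on the block above: wrapping the resource creation in Output.all(...).apply(...) works, but Pulumi generally recommends passing Outputs directly as resource inputs. The following sketch shows the same remote command declared that way; it is meant as a replacement for the block above (not an addition, since the resource name would otherwise clash), and the variable name mount_command is new while everything else reuses the objects defined earlier.

# Alternative sketch (same effect, more idiomatic): pulumi_command accepts Outputs
# directly as inputs, so the Command does not have to be created inside an apply()
mount_command = command.remote.Command(
    "mountFsxFileSystem",
    connection=command.remote.ConnectionArgs(
        host=ec2_instance.public_ip,  # Output[str] is accepted as-is
        user="ec2-user",
        private_key=private_key_content
    ),
    create=user_data_script,  # Output[str] containing the rendered mount script
    opts=pulumi.ResourceOptions(depends_on=[volume, ec2_instance])
)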


Pytest with Pulumi

Introduction

Pytest is a widely-used Python testing framework that allows developers to create simple and scalable test cases. When paired with Pulumi, an infrastructure as code (IaC) tool, Pytest enables thorough testing of cloud infrastructure code, akin to application code testing. This combination is crucial for ensuring that infrastructure configurations are accurate, secure, and meet the required state before deployment. By using Pytest with Pulumi, you can validate resource properties, mock cloud provider responses, and simulate various scenarios. This reduces the risk of deploying faulty infrastructure and enhances the reliability of your cloud environments. Although integrating Pytest into your CI/CD pipeline is not mandatory, it is highly beneficial as it leverages Python’s robust testing capabilities with Pulumi.

Testing Code

The following code snippet should be placed in the infra_test.py file. It is designed to test the infrastructure setup defined in infra.py, including the VPC, subnets, security group, and FSx for NetApp ONTAP file system. Each test case focuses on different aspects of the infrastructure to ensure correctness, security, and that the desired state is achieved. Inline comments are provided to explain each test case.


# Importing necessary libraries
import pulumi
import pulumi_aws as aws
from typing import Any, Dict, List

# Setting up configuration values for AWS region and various parameters
pulumi.runtime.set_config('aws:region', 'eu-central-1')
pulumi.runtime.set_config('demo:availabilityZone1', 'eu-central-1a')
pulumi.runtime.set_config('demo:availabilityZone2', 'eu-central-1b')
pulumi.runtime.set_config('demo:subnet1CIDER', '10.0.3.0/24')
pulumi.runtime.set_config('demo:subnet2CIDER', '10.0.4.0/24')
pulumi.runtime.set_config('demo:keyName', 'XYZ')  # Change based on your own key

# Creating a class MyMocks to mock Pulumi's resources for testing
class MyMocks(pulumi.runtime.Mocks):
    def new_resource(self, args: pulumi.runtime.MockResourceArgs) -> List[Any]:
        # Initialize outputs with the resource's inputs
        outputs = args.inputs

        # Mocking specific resources based on their type
        if args.typ == "aws:ec2/instance:Instance":
            # Mocking an EC2 instance with some default values
            outputs = {
                **args.inputs,  # Start with the given inputs
                "ami": "ami-0eb1f3cdeeb8eed2a",  # Mock AMI ID
                "availability_zone": "eu-central-1a",  # Mock availability zone
                "publicIp": "203.0.113.12",  # Mock public IP
                "publicDns": "ec2-203-0-113-12.compute-1.amazonaws.com",  # Mock public DNS
                "user_data": "mock user data script",  # Mock user data
                "tags": {"Name": "test"}  # Mock tags
            }
        elif args.typ == "aws:ec2/securityGroup:SecurityGroup":
            # Mocking a Security Group with default ingress rules
            outputs = {
                **args.inputs,
                "ingress": [
                    {"from_port": 80, "cidr_blocks": ["0.0.0.0/0"]},  # Allow HTTP traffic from anywhere
                    {"from_port": 22, "cidr_blocks": ["192.168.0.0/16"]}  # Allow SSH traffic from a specific CIDR block
                ]
            }
        
        # Returning a mocked resource ID and the output values
        return [args.name + '_id', outputs]

    def call(self, args: pulumi.runtime.MockCallArgs) -> Dict[str, Any]:
        # Mocking a call to get an AMI
        if args.token == "aws:ec2/getAmi:getAmi":
            return {
                "architecture": "x86_64",  # Mock architecture
                "id": "ami-0eb1f3cdeeb8eed2a",  # Mock AMI ID
            }
        
        # Return an empty dictionary if no specific mock is needed
        return {}

# Setting the custom mocks for Pulumi
pulumi.runtime.set_mocks(MyMocks())

# Import the infrastructure to be tested
import infra

# Define a test function to validate the AMI ID of the EC2 instance
@pulumi.runtime.test
def test_instance_ami():
    def check_ami(ami_id: str) -> None:
        print(f"AMI ID received: {ami_id}")
        # Assertion to ensure the AMI ID is the expected one
        assert ami_id == "ami-0eb1f3cdeeb8eed2a", 'EC2 instance must have the correct AMI ID'

    # Running the test to check the AMI ID
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.ami.apply(check_ami))

# Define a test function to validate the availability zone of the EC2 instance
@pulumi.runtime.test
def test_instance_az():
    def check_az(availability_zone: str) -> None:
        print(f"Availability Zone received: {availability_zone}")
        # Assertion to ensure the instance is in the correct availability zone
        assert availability_zone == "eu-central-1a", 'EC2 instance must be in the correct availability zone'
    
    # Running the test to check the availability zone
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.availability_zone.apply(check_az))

# Define a test function to validate the tags of the EC2 instance
@pulumi.runtime.test
def test_instance_tags():
    def check_tags(tags: Dict[str, Any]) -> None:
        print(f"Tags received: {tags}")
        # Assertions to ensure the instance has tags and a 'Name' tag
        assert tags, 'EC2 instance must have tags'
        assert 'Name' in tags, 'EC2 instance must have a Name tag'
    
    # Running the test to check the tags
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.tags.apply(check_tags))

# Define a test function to validate the user data script of the EC2 instance
@pulumi.runtime.test
def test_instance_userdata():
    def check_user_data(user_data_script: str) -> None:
        print(f"User data received: {user_data_script}")
        # Assertion to ensure the instance has user data configured
        assert user_data_script is not None, 'EC2 instance must have user_data_script configured'
    
    # Running the test to check the user data script
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.user_data.apply(check_user_data))
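
The same pattern can be extended to security-related checks. Below is a hedged sketch of one additional test that asserts no ingress rule of the security group opens SSH to the whole internet. It assumes the security group from infra.py is accessible as infra.security_group and reads the rule fields defensively, because the mocked ingress entries may surface either as plain dicts or as typed output objects. With the mock defined above, which only opens port 22 to 192.168.0.0/16, the assertion passes.

# Helper to read a field from a mocked ingress rule (dict or typed output object)
def _field(rule, name):
    return rule[name] if isinstance(rule, dict) else getattr(rule, name, None)

# Define a test function to ensure SSH is not exposed to the whole internet
@pulumi.runtime.test
def test_security_group_no_public_ssh():
    def check_ingress(ingress) -> None:
        for rule in ingress or []:
            if _field(rule, "from_port") == 22:
                # Assertion to ensure port 22 is not reachable from 0.0.0.0/0
                assert "0.0.0.0/0" not in (_field(rule, "cidr_blocks") or []), \
                    'Security group must not expose SSH to 0.0.0.0/0'

    # Running the test to check the ingress rules (mirrors the style of the tests above)
    pulumi.runtime.run_in_stack(lambda: infra.security_group.ingress.apply(check_ingress))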

Github Actions

Introduction

GitHub Actions is a powerful automation tool integrated within GitHub, enabling developers to automate their workflows, including testing, building, and deploying code. Pulumi, on the other hand, is an Infrastructure as Code (IaC) tool that allows you to manage cloud resources using familiar programming languages. In this post, we’ll explore why you should use GitHub Actions and its specific purpose when combined with Pulumi.

Why Use GitHub Actions and Its Importance

GitHub Actions is a powerful tool for automating workflows within your GitHub repository, and it offers several key benefits when combined with Pulumi: every change is automatically tested with Pytest, previewed, and deployed from a single, versioned workflow, without any manual steps on a developer machine.

Execution

To execute the GitHub Actions workflow, store the required secrets (the AWS credentials, the SSH private key, and the Pulumi access token) in your repository settings and push your changes to the main branch; the workflow below is triggered on every push to main.

By incorporating this workflow, you ensure that your Pulumi infrastructure is continuously integrated and deployed with proper validation, significantly improving the reliability and efficiency of your infrastructure management process.

Example: Deploy infrastructure with Pulumi


name: Pulumi Deployment

on:
  push:
    branches:
      - main

env:
  # Environment variables for AWS credentials and private key.
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
  PRIVATE_KEY: ${{ secrets.PRIVATE_KEY }}

jobs:
  pulumi-deploy:
    runs-on: ubuntu-latest
    environment: dev

    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        # Check out the repository code to the runner.

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v3
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-central-1
        # Set up AWS credentials for use in subsequent actions.

      - name: Set up SSH key
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/XYZ.pem
          chmod 600 ~/.ssh/XYZ.pem
        # Create an SSH directory, add the private SSH key, and set permissions.

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
        # Set up Python 3.9 environment for running Python-based tasks.
  
      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '14'
        # Set up Node.js 14 environment for running Node.js-based tasks.

      - name: Install project dependencies
        run: npm install
        working-directory: .
        # Install Node.js project dependencies specified in `package.json`.
      
      - name: Install Pulumi
        run: npm install -g pulumi
        # Install the Pulumi CLI globally.

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
        working-directory: .
        # Upgrade pip and install Python dependencies from `requirements.txt`.

      - name: Login to Pulumi
        run: pulumi login
        env:
          PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
        # Log in to Pulumi using the access token stored in secrets.
        
      - name: Set Pulumi configuration for tests
        run: pulumi config set aws:region eu-central-1 --stack dev
        # Set Pulumi configuration to specify AWS region for the `dev` stack.

      - name: Pulumi stack select
        run: pulumi stack select dev  
        working-directory: .
        # Select the `dev` stack for Pulumi operations.

      - name: Run tests
        run: |
          pulumi config set aws:region eu-central-1
          pytest
        working-directory: .
        # Set AWS region configuration and run tests using pytest.
    
      - name: Preview Pulumi changes
        run: pulumi preview --stack dev
        working-directory: .
        # Preview the changes that Pulumi will apply to the `dev` stack.
    
      - name: Update Pulumi stack
        run: pulumi up --yes --stack dev
        working-directory: . 
        # Apply the changes to the `dev` stack with Pulumi.

      - name: Pulumi stack output
        run: pulumi stack output
        working-directory: .
        # Retrieve and display outputs from the Pulumi stack.

      - name: Cleanup Pulumi stack
        run: pulumi destroy --yes --stack dev
        working-directory: . 
        # Destroy the `dev` stack to clean up resources.

      - name: Pulumi stack output (after destroy)
        run: pulumi stack output
        working-directory: .
        # Retrieve and display outputs from the Pulumi stack after destruction.

      - name: Logout from Pulumi
        run: pulumi logout
        # Log out from the Pulumi session.

Output:

Finally, let’s take a look at how everything appears both in GitHub and in AWS. Check out the screenshots below to see the GitHub Actions workflow in action and the resulting AWS resources.

GitHub Actions workflow

AWS FSx resource

AWS FSx storage virtual machine


Outlook: Exploring Advanced Features of Pulumi and Amazon FSx for NetApp ONTAP

As you become more comfortable with Pulumi and Amazon FSx for NetApp ONTAP, there are numerous advanced features and capabilities to explore. These can significantly enhance your infrastructure automation and storage management strategies. In a follow-up blog post, we will delve into these advanced topics, providing a deeper understanding and practical examples.

Advanced Features of Pulumi

  1. Cross-Cloud Infrastructure Management
    Pulumi supports multiple cloud providers, including AWS, Azure, Google Cloud, and Kubernetes. In more advanced scenarios, you can manage resources across different clouds in a single Pulumi project, enabling true multi-cloud and hybrid cloud architectures.
  2. Component Resources
    Pulumi allows you to create reusable components by grouping related resources into custom classes. This is particularly useful for complex deployments where you want to encapsulate and reuse configurations across different projects or environments (see the sketch after this list).
  3. Automation API
    Pulumi’s Automation API enables you to embed Pulumi within your own applications, allowing for infrastructure to be managed programmatically. This can be useful for building custom CI/CD pipelines or integrating with other systems.
  4. Policy as Code with Pulumi CrossGuard
    Pulumi CrossGuard allows you to enforce compliance and security policies across your infrastructure using familiar programming languages. Policies can be applied to ensure that resources adhere to organizational standards, improving governance and reducing risk.
  5. Stack References and Dependency Management
    Pulumi’s stack references enable you to manage dependencies between different Pulumi stacks, allowing for complex, interdependent infrastructure setups. This is crucial for large-scale environments where components must interact and be updated in a coordinated manner.
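
To make item 2 above more concrete, here is a minimal, hedged sketch of a component resource in Python. The class name FsxOntapEnvironment, the type token demo:storage:FsxOntapEnvironment, and the parameter values are purely illustrative; the child resources simply reuse the aws.ec2.Subnet and aws.fsx.OntapFileSystem arguments already shown in the infrastructure code earlier.

import pulumi
import pulumi_aws as aws

class FsxOntapEnvironment(pulumi.ComponentResource):
    def __init__(self, name, vpc_id, cidr_block, availability_zone, opts=None):
        # The type token "demo:storage:FsxOntapEnvironment" is an arbitrary example
        super().__init__("demo:storage:FsxOntapEnvironment", name, None, opts)

        # Parent the children to the component so they are grouped in the stack
        child_opts = pulumi.ResourceOptions(parent=self)

        self.subnet = aws.ec2.Subnet(
            f"{name}-subnet",
            vpc_id=vpc_id,
            cidr_block=cidr_block,
            availability_zone=availability_zone,
            opts=child_opts
        )

        self.file_system = aws.fsx.OntapFileSystem(
            f"{name}-fsx",
            subnet_ids=[self.subnet.id],
            preferred_subnet_id=self.subnet.id,
            deployment_type="SINGLE_AZ_1",
            throughput_capacity=128,
            storage_capacity=1024,
            opts=child_opts
        )

        # Expose the outputs that callers of the component are expected to use
        self.register_outputs({
            "subnet_id": self.subnet.id,
            "file_system_id": self.file_system.id
        })

Such a component could then be instantiated with env = FsxOntapEnvironment("dev", vpc.id, "10.0.5.0/24", "eu-central-1a") (the CIDR is again only an example), and another stack could consume its registered outputs via pulumi.StackReference, which ties directly into item 5 above.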

Advanced Features of Amazon FSx for NetApp ONTAP

  1. Data Protection and Snapshots
    FSx for NetApp ONTAP offers advanced data protection features, including automated snapshots, SnapMirror for disaster recovery, and integration with AWS Backup. These features help safeguard your data and ensure business continuity.
  2. Data Tiering and Cost Optimization
    FSx for ONTAP includes intelligent data tiering, which automatically moves infrequently accessed data to lower-cost storage. This feature is vital for optimizing costs, especially in environments with large amounts of data that have varying access patterns (a small configuration sketch follows after this list).
  3. Multi-Protocol Access and CIFS/SMB Integration
    FSx for ONTAP supports multiple protocols, including NFS, SMB, and iSCSI, enabling seamless access from both Linux/Unix and Windows clients. This is particularly useful in mixed environments where applications or users need to access the same data using different protocols.
  4. Performance Tuning and Quality of Service (QoS)
    FSx for ONTAP allows you to fine-tune performance parameters and implement QoS policies, ensuring that critical workloads receive the necessary resources. This is essential for applications with stringent performance requirements.
  5. ONTAP System Manager and API Integration
    Advanced users can leverage the ONTAP System Manager or integrate with NetApp’s extensive API offerings to automate and customize the management of FSx for ONTAP. This level of control is invaluable for organizations looking to tailor their storage solutions to specific needs.
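
To illustrate item 2 above: the data tiering described there maps directly onto the tiering_policy argument that was already used for vol1 in the infrastructure code. The following hedged sketch shows how an additional volume could tier cold data automatically; the resource name archiveVolume, the volume name vol2, the size, and the cooling period are illustrative assumptions, while storage_virtual_machine and the aws alias refer to the objects already defined in infra.py.

# Illustrative sketch: a volume whose cold data is tiered automatically after ~31 days
archive_volume = aws.fsx.OntapVolume(
    "archiveVolume",
    storage_virtual_machine_id=storage_virtual_machine.id,  # SVM created earlier
    name="vol2",
    junction_path="/vol2",
    size_in_megabytes=20480,
    storage_efficiency_enabled=True,
    tiering_policy=aws.fsx.OntapVolumeTieringPolicyArgs(
        name="AUTO",        # Tier both snapshot and cold active file system data
        cooling_period=31   # Days before data is considered cold
    ),
    security_style="UNIX"
)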

What’s Next?

In the next blog post, we will explore these advanced features in detail, providing practical examples and use cases. We’ll dive into multi-cloud management with Pulumi, demonstrate the creation of reusable infrastructure components, and explore how to enforce security and compliance policies with Pulumi CrossGuard. Additionally, we’ll examine advanced data management strategies with FSx for NetApp ONTAP, including snapshots, data tiering, and performance optimization.

Stay tuned as we take your infrastructure as code and cloud storage management to the next level!

Conclusion

This example demonstrates how Pulumi can be used to manage AWS infrastructure using Python. By defining resources like VPCs, subnets, security groups, and FSx file systems in code, you can version control your infrastructure and easily reproduce environments.

Amazon FSx for NetApp ONTAP offers a powerful and flexible solution for running file-based workloads in the cloud, combining the strengths of AWS and NetApp ONTAP. Pulumi’s ability to leverage existing programming languages for infrastructure management allows for more complex logic and better integration with your development workflows. However, it requires familiarity with these languages and has a smaller ecosystem compared to Terraform. Despite these differences, Pulumi is a powerful tool for managing modern cloud infrastructure.


Disclaimer

The information provided in this blog post is for educational and informational purposes only. The features and capabilities of Pulumi and Amazon FSx for NetApp ONTAP mentioned are subject to change as new versions and updates are released. While we strive to ensure that the content is accurate and up-to-date, we cannot guarantee that it reflects the latest changes or improvements. Always refer to the official documentation and consult with your cloud provider or technology partner for the most current and relevant information. The author and publisher of this blog post are not responsible for any errors or omissions, or for any actions taken based on the information provided.

Suggested Links

  1. Pulumi Official Documentation: https://www.pulumi.com/docs/
  2. Amazon FSx for NetApp ONTAP Documentation: https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/what-is.html
  3. Pulumi GitHub Repository: https://github.com/pulumi/pulumi
  4. NetApp ONTAP Documentation: https://docs.netapp.com/ontap/index.jsp
  5. AWS VPC Documentation: https://docs.aws.amazon.com/vpc/
  6. AWS Security Group Documentation: https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html
  7. AWS EC2 Documentation: https://docs.aws.amazon.com/ec2/index.html
This article was initially created by Harshil Gupta.

Veeam & Proxmox VE

Veeam has made a strategic move by integrating the open-source virtualization solution Proxmox VE (Virtual Environment) into its portfolio. Signaling its commitment to the evolving needs of the open-source community and the open-source virtualization market, this integration positions Veeam as a forward-thinking player in the industry, ready to support the rising tide of open-source solutions. The combination of Veeam’s data protection solutions with the flexibility of Proxmox VE’s platform offers enterprises a compelling alternative that promises cost savings and enhanced data security.

With Proxmox VE, one of the most important and most frequently requested open-source hypervisors is now natively supported, and this could well shift the virtualization market!

Opportunities for Open-Source Virtualization

In many enterprises, a major hypervisor platform is already in place, accompanied by a robust backup solution – often Veeam. However, until recently, Veeam lacked direct support for Proxmox VE, leaving a gap for those who have embraced or are considering this open-source virtualization platform. The latest version of Veeam changes the game by introducing the capability to create and manage backups and restores directly within Proxmox VE environments, without the need for agents inside the VMs.

This advancement means that entire VMs can now be backed up and restored across any hypervisor, providing unparalleled flexibility. Moreover, enterprises can seamlessly integrate a new Proxmox VE-based cluster into their existing Veeam setup, managing everything from a single, central point. This integration simplifies operations, reduces complexity, and enhances the overall efficiency of data protection strategies in environments with multiple hypervisors, simply by having a one-size-fits-all solution in place.

Another heavily underestimated benefit is the possibility to easily migrate, copy, back up, and restore entire VMs independently of their underlying hypervisor – also known as cross-platform recovery. As a result, operators are now able to shift VMs from VMware ESXi / vSphere or Hyper-V nodes to Proxmox VE nodes. This provides a great way to introduce and evaluate a new virtualization platform without taking any risks. For organizations looking to unify their virtualization and backup infrastructure, this update offers a significant leap forward.

Integration into Veeam

Integrating a new Proxmox cluster into an existing Veeam setup is a testament to the simplicity and user-centric design of both systems. Those familiar with Veeam will find the process to be intuitive and minimally disruptive, allowing for a seamless extension of their virtualization environment. This ease of integration means that your new Proxmox VE cluster can be swiftly brought under the protective umbrella of Veeam’s robust backup and replication services.

Despite the general ease of the process, it’s important to recognize that unique configurations and specific environments may present their own set of challenges. These corner cases, while not common, are worth noting as they can require special attention to ensure a smooth integration. Rest assured, however, that these are merely nuances in an otherwise straightforward procedure, and with a little extra care, even these can be managed effectively.

Overview

Starting with version 12.2, Proxmox VE support is enabled through a plugin that is installed on the Veeam Backup server. Veeam Backup for Proxmox incorporates a distributed architecture that necessitates the deployment of worker nodes. These nodes function analogously to data movers, facilitating the transfer of virtual machine payloads from the Proxmox VE hosts to the designated Backup Repository. The workers run on Linux and are seamlessly instantiated via the Veeam Backup Server console. Their role is critical and akin to that of the proxy components in comparable setups such as AHV or VMware backup solutions.

At least one such worker is needed per cluster. For improved performance, one worker per Proxmox VE node can be considered. Keep in mind that each worker requires 6 vCPUs, 6 GB of memory, and 100 GB of disk space.

Requirements

This blog post assumes that an installation of Veeam Backup & Replication in version 12.2 or later is already in place and fully configured for another environment such as VMware. It also assumes that the Proxmox VE cluster already exists and that a credential with the roles needed to perform the backup/restore actions is available.

Configuration

The integration and configuration of a Proxmox VE cluster can be done entirely within the Veeam Backup & Replication Console application; no additional commands need to be executed on any CLI. The previously mentioned worker nodes can be installed fully automatically.

Adding a Proxmox Server

To integrate a new Proxmox Server into the Veeam Backup & Replication environment, one must initiate the process by accessing the Veeam console. Subsequently, navigate through the designated sections to complete the addition:

Virtual Infrastructure -> Add Server

This procedure is consistent with the established protocol for incorporating nodes from other virtualization platforms that are compatible with Veeam.

Afterwards, Veeam shows you a selection of possible and supported hypervisors:

In this case we simply choose Proxmox VE and proceed with the setup wizard.

During the next steps of the setup wizard, the authentication details, the hostname or IP address of the target Proxmox VE server, and a snapshot storage on the Proxmox VE server must be defined.

Hint: When it comes to the authentication details, take care to use working credentials for the SSH service on the Proxmox VE server. If you usually use the root@pam credentials for the web interface, you simply need to provide the user root to Veeam. Veeam will initiate a connection to the system over SSH.

In one of the last steps of the setup wizard, Veeam offers to automatically install the required worker node. Such a worker node is a small VM that runs inside the cluster on the targeted Proxmox VE server. In general, a single worker node per cluster is enough, but to enhance the overall performance, one worker for each node is recommended.

Usage

Once the Proxmox VE server has been successfully integrated into the Veeam inventory, it can be managed as effortlessly as any other supported hypervisor, such as VMware vSphere or Microsoft Hyper-V. A significant advantage, as shown in the screenshot, is the capability to centrally administer various hypervisors and server clusters. This eliminates the need for a separate Veeam instance per cluster, streamlining operations. Nonetheless, there may be specific scenarios where individual setups for each cluster are preferable.

As a result, this not only simplifies the operator’s work with different servers and clusters but also finally provides the opportunity for cross-hypervisor recoveries.

Creating Backup Jobs

Creating a new backup job for a single VM or even multiple VMs in a Proxmox environment is just as simple and works exactly the same way as you already know from other hypervisors. Nevertheless, let us have a quick summary of the needed tasks:

Open the Veeam Backup & Replication console on your backup server or management workstation. To start creating a backup job, navigate to the Home tab and click on Backup Job, then select Virtual machine from the drop-down menu.

When the New Backup Job wizard opens, you will need to enter a name and a description for the backup job. Click Next to proceed to the next step. Now, you will need to select the VMs that you want to back up. Click Add in the Virtual Machines step and choose the individual VMs or containers like folders, clusters, or entire hosts that you want to include in the backup. Once you have made your selection, click Next.

The next step is to specify where you want to store the backup files. In the Storage step, select the backup repository and decide on the retention policy that dictates how long you want to keep the backup data. After setting this up, click Next.

If you have configured multiple backup proxies, the next step allows you to specify which one to use. If you are not sure or if you prefer, you can let Veeam Backup & Replication automatically select the best proxy for the job. Click Next after making your choice.

Now it is time to schedule when the backup job should run. In the Schedule step, you can set up the job to run automatically at specific times or in response to certain events. After configuring the schedule, click Next.

Review all the settings on the summary page to ensure they are correct. If everything looks good, click Finish to create the backup job.

 

If you want to run the backup job immediately for ensuring everything works as expected, you can do so by right-clicking on the job and selecting Start. Alternatively, you can wait for the scheduled time to trigger the job automatically.

Restoring an entire VM

The restore and replication process for a full VM restore follows the standard procedures. However, it now includes the significant feature of cross-hypervisor restore. This functionality allows for the migration of VMs between different hypervisor types without compatibility issues. For example, when introducing Proxmox VE into a corporate setting, operators can effortlessly migrate VMs from an existing hypervisor to the Proxmox VE cluster. Should any issues arise during the testing phase, the process also supports the reverse migration back to the original hypervisor. Let us have a look at the details.

Open the Veeam Backup & Replication console on your backup server or management workstation. To start the restore, navigate to the Home tab and click on Restore, then select the backup of the virtual machine you want to restore from the Disk view.

Choose the Entire VM restore option, which will launch the wizard for restoring a full virtual machine. The first step in the wizard will ask you to select a backup from which you want to restore. You will see a list of available backups; select the one that contains the VM you wish to restore and proceed to the next step by clicking Next.

Now, you must decide on the restore point. Typically, this will be the most recent backup, but you may choose an earlier point if necessary. After selecting the restore point, continue to the next step.

The wizard will then prompt you to specify the destination for the VM. This is the very handy point for cross-hypervisor-restore where this could be the original location or a new location if you are performing a migration or don’t want to overwrite the existing VM. Configure the network settings as required, ensuring that the restored VM will have the appropriate network access.

In the next step, you will have options regarding the power state of the VM after the restoration. You can choose to power on the VM automatically or leave it turned off, depending on your needs.

Before finalizing the restore process, review all the settings to make sure they align with your intended outcome. This is your chance to go back and make any necessary adjustments. Once you’re satisfied with the configuration, proceed to restore the VM by clicking Finish.

The restoration process will begin, and its progress can be monitored within the Veeam Backup & Replication console. Depending on the size of the VM and the performance of your backup storage and network, the restoration can take some time.

File-Level-Restore

Open the Veeam Backup & Replication console on your backup server or management workstation. To start the file-level restore, navigate to the Home tab and click on Restore, then select the backup of the virtual machine from the Disk view.

Select Restore guest files. The wizard for file-level recovery will start, guiding you through the necessary steps. The first step involves choosing the VM backup from which you want to restore files. Browse through the list of available backups, select the appropriate one, and then click Next to proceed.

Choose the restore point that you want to use for the file-level restore. This is typically the most recent backup, but you can select an earlier one if needed. After picking the restore point, click Next to continue.

At this stage, you may need to choose the operating system of the VM that you are restoring files from. This is particularly important if the backup is of a different OS than the one on the Veeam Backup & Replication server because it will determine the type of helper appliance required for the restore.

Veeam Backup & Replication will prompt you to deploy a helper appliance if the backup is from an OS that is not natively supported by the Windows-based Veeam Backup & Replication server. Follow the on-screen instructions to deploy the helper appliance, which will facilitate the file-level restore process.

Once the helper appliance is ready, you will be able to browse the file system of the backup. Navigate through the backup to locate the files or folders you wish to restore.

After selecting the files or folders for restoration, you will be prompted to choose the destination where you want to restore the data. You can restore to the original location or specify a new location, depending on your requirements.

Review your selections to confirm that the correct files are being restored and to the right destination. If everything is in order, proceed with the restoration by clicking Finish.

The file-level restore process will start, and you can monitor the progress within the Veeam Backup & Replication console. The time it takes to complete the restore will depend on the size and number of files being restored, as well as the performance of your backup storage and network.

Conclusion

To summarise, the latest update to Veeam introduces a very important and welcome integration with Proxmox VE, filling a significant gap for enterprises that have adopted this open-source virtualization platform. By enabling direct backups and restores of entire VMs across different hypervisors without the need for in-VM agents, Veeam now offers unparalleled flexibility and simplicity in managing mixed environments. This advancement not only streamlines operations and enhances data protection strategies but also empowers organizations to easily migrate to and evaluate new open-source virtualization platforms like Proxmox VE with minimal risk. It is great to see that more and more companies are putting effort into supporting open-source solutions, which underlines the ongoing importance of open-source products in enterprises.

Additionally, for those starting fresh with Proxmox, the Proxmox Backup Server remains a viable open-source alternative and you can find our blog post about configuring the Proxmox Backup Server right here. Overall, this update represents a significant step forward in unifying virtualization and backup infrastructures, offering both versatility and ease of integration.

We are always here to help and assist you with further consulting, planning, and integration needs. Whether you are exploring new virtualization platforms, optimizing your current infrastructure, or looking for expert guidance on your backup strategies, our team is dedicated to ensuring your success every step of the way. Do not hesitate to reach out to us for personalized support and tailored solutions to meet your unique requirements in virtualization or backup environments.