Introduction
Redis is a widely used, high-performance in-memory key-value database that can also serve as a cache and message broker. It has been a go-to choice for many due to its performance and versatility. Many cloud providers offer Redis-based solutions:
- Amazon Web Services (AWS) – Amazon ElastiCache for Redis
- Microsoft Azure – Azure Cache for Redis
- Google Cloud Platform (GCP) – Google Cloud Memorystore for Redis
However, due to recent changes in the licensing model of Redis, its prominence and usage are changing. Redis was initially developed under the open-source BSD license, allowing developers to freely use, modify and distribute the source code for both commercial and non-commercial purposes. As a result, Redis quickly gained popularity in the developer community.
However, Redis has recently moved to a dual source-available license. To be precise, future versions will be available under RSALv2 (Redis Source Available License Version 2) or SSPLv1 (Server Side Public License Version 1); commercial use requires individual agreements, potentially increasing costs for cloud service providers. For a detailed overview of these changes, refer to the Redis licensing page. The source code of the Redis Community Edition will remain freely available to developers, customers and partners of the company. However, cloud service providers and others who want to offer Redis as part of commercial products will have to make individual agreements with the vendor.
Due to these recent changes in Redis’s licensing model, many developers and organizations are re-evaluating their in-memory key-value database choices. Valkey, an open-source fork of Redis, maintains the same high performance and versatility while ensuring unrestricted use for both developers and commercial entities. The project was forked under the umbrella of the Linux Foundation and is now supported by a broad group of contributors; more information can be found here and here. Its commitment to open-source principles has gained support from major cloud providers, including AWS, which announced that “AWS is committed to supporting open source Valkey for the long term“ (more information can be found here). So it may be the right time to switch your infrastructure from Redis to Valkey.
In this article, we will set up a Valkey instance with TLS and outline the steps to migrate your data from Redis seamlessly.
Overview of possible migration approaches
In general, there are several approaches to migrate:
- Reuse the database file
Redis is shut down so that the current RDB file is written to disk, and Valkey is started with this file in its data directory.
- Use REPLICAOF to connect Valkey to the Redis instance
Register the new Valkey instance as a replica of the Redis master to stream the data. The Valkey instance and its network must be able to reach the Redis service.
- Automated data migration to Valkey
A migration script is run on a machine that can reach both the Redis and the Valkey database.
In this blog article, we encounter the situation that direct access to the file system of the Redis server in the cloud is not feasible, so reusing the database file is not an option, and that the Valkey service and the Redis service are in different networks and cannot reach each other, which rules out setting up a replica. As a result, we choose the third option and run an automated data migration script on a separate machine that can connect to both servers and transfer the data.
Setup of Valkey
In case you are using a cloud service, please consult its instructions on how to set up a Valkey instance. Since Valkey is a new project, only a few distributions provide ready-to-use packages, for example Red Hat Enterprise Linux 8 and 9 via Extra Packages for Enterprise Linux (EPEL). In this blog post, we use an on-premises Debian 12 server to host the Valkey server in version 7.2.6 with TLS. Please consult your distribution’s guides to install Valkey or use the manual provided on GitHub. The migration itself will be done by a Python 3 script using TLS.
Start the server and establish a client connection:
In this blog article, we will use a server with the listed TLS parameters. We specify all required TLS parameters, including port 0 to disable the non-TLS port completely:
$ valkey-server --tls-port 6379 --port 0 --tls-cert-file ./tls/redis.crt --tls-key-file ./tls/redis.key --tls-ca-cert-file ./tls/ca.crt
Valkey 7.2.6 (579cca5f/0) 64 bit
Running in standalone mode
Port: 6379
PID: 436767
https://valkey.io
436767:M 27 Aug 2024 16:08:56.058 * Server initialized
436767:M 27 Aug 2024 16:08:56.058 * Loading RDB produced by valkey version 7.2.6
[...]
436767:M 27 Aug 2024 16:08:56.058 * Ready to accept connections tls
Now it is time to test the connection with a client using TLS:
$ valkey-cli --tls --cert ./tls/redis.crt --key ./tls/redis.key --cacert ./tls/ca.crt -p 6379
127.0.0.1:6379> INFO SERVER
# Server
server_name:valkey
valkey_version:7.2.6
[...]
Automated data migration to Valkey
Finally, we migrate the data in this example using a Python 3 script. The script establishes connections to both the Redis source and the Valkey target database, fetches all keys from the Redis database and creates or updates each key-value pair in the Valkey database. This approach is not an off-the-shelf solution; it uses the redis-py library, which provides a list of examples. By using Python 3, the process can easily be extended to filter unwanted data, alter values to fit the new environment or add sanity checks. The script used here prints progress updates during the migration process:
#!/usr/bin/env python3
import redis

# Connect to the Redis source database, which is password protected, via IP and port
redis_client = redis.StrictRedis(host='172.17.0.3', port=6379, password='secret', db=0)

# Connect to the Valkey target database, which is using TLS
ssl_certfile = "./tls/client.crt"
ssl_keyfile = "./tls/client.key"
ssl_ca_certs = "./tls/ca.crt"

valkey_client = redis.Redis(
    host="192.168.0.3",
    port=6379,
    ssl=True,
    ssl_certfile=ssl_certfile,
    ssl_keyfile=ssl_keyfile,
    ssl_cert_reqs="required",
    ssl_ca_certs=ssl_ca_certs,
)

# Fetch all keys from the Redis database
keys = redis_client.keys('*')
print("Found", len(keys), "Keys in Source!")

# Migrate each key-value pair to the Valkey database
for counter, key in enumerate(keys):
    value = redis_client.get(key)
    valkey_client.set(key, value)
    print("Status: ", round((counter+1) / len(keys) * 100, 1), "%", end='\r')
print()
To start the process execute the script:
$ python3 redis_to_tls_valkey.py
Found 569383 Keys in Source!
Status: 100.0 %
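If you want the filtering or sanity checks mentioned above, the migration loop can be extended along the following lines. This is only a minimal sketch that reuses the redis_client and valkey_client connections from the script; the session: key prefix used as a filter rule is a hypothetical example, and it assumes plain string values as in the script above:

# Sketch of an extended migration loop: skip keys with an unwanted prefix,
# preserve TTLs and verify each value after writing it.
skipped, migrated = 0, 0
for key in redis_client.keys('*'):
    if key.startswith(b"session:"):   # hypothetical filter rule
        skipped += 1
        continue
    value = redis_client.get(key)
    ttl = redis_client.ttl(key)       # -1 means the key has no expiration
    if ttl > 0:
        valkey_client.set(key, value, ex=ttl)
    else:
        valkey_client.set(key, value)
    # Sanity check: read the value back from Valkey and compare it
    assert valkey_client.get(key) == value, f"Verification failed for {key!r}"
    migrated += 1
print("Migrated", migrated, "keys, skipped", skipped)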
As a last step, configure your application to connect to the new Valkey server.
Conclusion
Since the change of Redis’ license, the new Valkey project has been gaining more and more traction. Migrating to Valkey ensures continued access to a robust, open-source in-memory database without the licensing restrictions of Redis. Whether you’re running your infrastructure on-premises or in the cloud, this guide provides the steps needed for a successful migration. Migrating from a cloud instance to a new environment can be cumbersome because of missing direct file access or isolated networks. Under these circumstances, a Python script is a flexible way to implement the necessary steps and master the task.
If you find this guide helpful and need support migrating your databases, feel free to contact us. We are happy to support you on-premises or in cloud environments.
Mastering Cloud Infrastructure with Pulumi: Introduction
In today’s rapidly changing landscape of cloud computing, managing infrastructure as code (IaC) has become essential for developers and IT professionals. Pulumi, an open-source IaC tool, brings a fresh perspective to the table by enabling infrastructure management using popular programming languages like JavaScript, TypeScript, Python, Go, and C#. This approach offers a unique blend of flexibility and power, allowing developers to leverage their existing coding skills to build, deploy, and manage cloud infrastructure. In this post, we’ll explore the world of Pulumi and see how it pairs with Amazon FSx for NetApp ONTAP—a robust solution for scalable and efficient cloud storage.
Pulumi – The Theory
Why Pulumi?
Pulumi distinguishes itself among IaC tools for several compelling reasons:
- Use Familiar Programming Languages: Unlike traditional IaC tools that rely on domain-specific languages (DSLs), Pulumi allows you to use familiar programming languages. This means there is no need to learn a new syntax, and you can incorporate sophisticated logic, conditionals, and loops directly in your infrastructure code (see the sketch after this list).
- Seamless Integration with Development Workflows: Pulumi integrates effortlessly with existing development workflows and tools, making it a natural fit for modern software projects. Whether you’re managing a simple web app or a complex, multi-cloud architecture, Pulumi provides the flexibility to scale without sacrificing ease of use.
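To make the first point more concrete, here is a minimal sketch of how ordinary language constructs replace DSL-specific ones. It assumes the pulumi and pulumi_aws packages are installed and AWS credentials are configured; the environment names are made up for illustration:

import pulumi
import pulumi_aws as aws

environments = ["dev", "staging", "prod"]  # hypothetical environment names

for env in environments:
    # A plain Python loop and f-strings instead of DSL-specific constructs
    bucket = aws.s3.Bucket(
        f"app-logs-{env}",
        tags={"Environment": env},
    )
    pulumi.export(f"bucket_name_{env}", bucket.id)

A DSL would typically need special-purpose count or for-each constructs to achieve the same result.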
Challenges with Pulumi
Like any tool, Pulumi comes with its own set of challenges:
- Learning Curve: While Pulumi leverages general-purpose languages, developers need to be proficient in the language they choose, such as Python or TypeScript. This can be a hurdle for those unfamiliar with these languages.
- Growing Ecosystem: As a relatively new tool, Pulumi’s ecosystem is still expanding. It might not yet match the extensive plugin libraries of older IaC tools, but its vibrant and rapidly growing community is a promising sign of things to come.
State Management in Pulumi: Ensuring Consistency Across Deployments
Effective infrastructure management hinges on proper state handling. Pulumi excels in this area by tracking the state of your infrastructure, enabling it to manage resources efficiently. This capability ensures that Pulumi knows exactly what needs to be created, updated, or deleted during deployments. Pulumi offers several options for state storage:
- Local State: Stored directly on your local file system. This option is ideal for individual projects or simple setups.
- Remote State: By default, Pulumi stores state remotely on the Pulumi Service (a cloud-hosted platform provided by Pulumi), but it also allows you to configure storage on AWS S3, Azure Blob Storage, or Google Cloud Storage. This is particularly useful in team environments where collaboration is essential.
Managing state effectively is crucial for maintaining consistency across deployments, especially in scenarios where multiple team members are working on the same infrastructure.
Other IaC Tools: Comparing Pulumi to Traditional IaC Tools
When comparing Pulumi to other Infrastructure as Code (IaC) tools, several drawbacks of traditional approaches become evident:
- Domain-Specific Language (DSL) Limitations: Many IaC tools depend on DSLs, such as Terraform’s HCL, requiring users to learn a specialized language specific to the tool.
- YAML/JSON Constraints: Tools that rely on YAML or JSON can be both restrictive and verbose, complicating the management of more complex configurations.
- Steep Learning Curve: The necessity to master DSLs or particular configuration formats adds to the learning curve, especially for newcomers to IaC.
- Limited Logical Capabilities: DSLs often lack support for advanced logic constructs such as loops, conditionals, and reusability. This limitation can lead to repetitive code that is challenging to maintain.
- Narrow Ecosystem: Some IaC tools have a smaller ecosystem, offering fewer plugins, modules, and community-driven resources.
- Challenges with Code Reusability: The inability to reuse code across different projects or components can hinder efficiency and scalability in infrastructure management.
- Testing Complexity: Testing infrastructure configurations written in DSLs can be challenging, making it difficult to ensure the reliability and robustness of the infrastructure code.
Pulumi – In Practice
Introduction
In this section, we’ll dive into a practical example to better understand Pulumi’s capabilities. We’ll also explore how to set up a project using Pulumi with AWS and automate it using GitHub Actions for CI/CD.
Prerequisites
Before diving into using Pulumi with AWS and automating your infrastructure management through GitHub Actions, ensure you have the following prerequisites in place:
- Pulumi CLI: Begin by installing the Pulumi CLI by following the official installation instructions. After installation, verify that Pulumi is correctly set up and accessible in your system’s PATH by running a quick version check.
- AWS CLI: Install the AWS CLI, which is essential for interacting with AWS services. Configure the AWS CLI with your AWS credentials to ensure you have access to the necessary AWS resources. Ensure your AWS account is equipped with the required permissions, especially for IAM, EC2, S3, and any other AWS services you plan to manage with Pulumi.
- AWS IAM User/Role for GitHub Actions: Create a dedicated IAM user or role in AWS specifically for use in your GitHub Actions workflows. This user or role should have permissions necessary to manage the resources in your Pulumi stack. Store the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY securely as secrets in your GitHub repository.
- Pulumi Account: Set up a Pulumi account if you haven’t already. Generate a Pulumi access token and store it as a secret in your GitHub repository to facilitate secure automation.
- Python and Pip: Install Python (version 3.7 or higher is recommended) along with Pip, which are necessary for Pulumi’s Python SDK. Once Python is installed, proceed to install Pulumi’s Python SDK along with any required AWS packages to enable infrastructure management through Python.
- GitHub Account: Ensure you have an active GitHub account to host your code and manage your repository. Create a GitHub repository where you’ll store your Pulumi project and related automation workflows. Store critical secrets like AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and your Pulumi access token securely in the GitHub repository’s secrets section.
- GitHub Runners: Utilize GitHub-hosted runners to execute your GitHub Actions workflows, or set up self-hosted runners if your project requires them. Confirm that the runners have all necessary tools installed, including Pulumi, AWS CLI, Python, and any other dependencies your Pulumi project might need.
Project Structure
When working with Infrastructure as Code (IaC) using Pulumi, maintaining an organized project structure is essential. A clear and well-defined directory structure not only streamlines the development process but also improves collaboration and deployment efficiency. In this post, we’ll explore a typical directory structure for a Pulumi project and explain the significance of each component.
Overview of a Typical Pulumi Project Directory
A standard Pulumi project might be organized as follows:
/project-root
├── .github
│ └── workflows
│ └── workflow.yml # GitHub Actions workflow for CI/CD
├── __main__.py # Entry point for the Pulumi program
├── infra.py # Infrastructure code
├── pulumi.dev.yaml # Pulumi configuration for the development environment
├── pulumi.prod.yaml # Pulumi configuration for the production environment
├── pulumi.yaml # Pulumi configuration (common or default settings)
├── requirements.txt # Python dependencies
└── test_infra.py # Tests for infrastructure code
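With this layout, the entry point itself can stay minimal. As a sketch – assuming the resources are defined in infra.py, as shown later in this post – __main__.py only needs to import that module:

# __main__.py – minimal entry point for this layout (a sketch; it assumes all
# resources live in infra.py). Importing the module is enough: the resource
# declarations run at import time and register themselves with Pulumi.
import infra  # noqa: F401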
NetApp FSx on AWS
Introduction
Amazon FSx for NetApp ONTAP offers a fully managed, scalable storage solution built on the NetApp ONTAP file system. It provides high-performance, highly available shared storage that seamlessly integrates with your AWS environment. Leveraging the advanced data management capabilities of ONTAP, FSx for NetApp ONTAP is ideal for applications needing robust storage features and compatibility with existing NetApp systems.
Key Features
- High Performance: FSx for ONTAP delivers low-latency storage designed to handle demanding, high-throughput workloads.
- Scalability: Capable of scaling to support petabytes of storage, making it suitable for both small and large-scale applications.
- Advanced Data Management: Leverages ONTAP’s comprehensive data management features, including snapshots, cloning, and disaster recovery.
- Multi-Protocol Access: Supports NFS and SMB protocols, providing flexible access options for a variety of clients.
- Cost-Effectiveness: Implements tiering policies to automatically move less frequently accessed data to lower-cost storage, helping optimize storage expenses.
What It’s About
In the next sections, we’ll walk through the specifics of setting up each component using Pulumi code, illustrating how to create a VPC, configure subnets, set up a security group, and deploy an FSx for NetApp ONTAP file system, all while leveraging the robust features provided by both Pulumi and AWS.
Architecture Overview
Below is a visual representation of the architecture we’ll deploy using Pulumi: a single-AZ deployment with FSx and EC2.
The diagram above illustrates the architecture for deploying an FSx for NetApp ONTAP file system within a single Availability Zone. The setup includes a VPC with public and private subnets, an Internet Gateway for outbound traffic, and a Security Group controlling access to the FSx file system and the EC2 instance. The EC2 instance is configured to mount the FSx volume using NFS, enabling seamless access to storage.
Setting up Pulumi
Follow these steps to set up Pulumi and integrate it with AWS:
Install Pulumi: Begin by installing Pulumi using the following command:
curl -fsSL https://get.pulumi.com | sh
Install AWS CLI: If you haven’t installed it yet, install the AWS CLI to manage AWS services:
pip install awscli
Configure AWS CLI: Configure the AWS CLI with your credentials:
aws configure
Create a New Pulumi Project: Initialize a new Pulumi project with AWS and Python:
pulumi new aws-python
Configure Your Pulumi Stack: Set the AWS region for your Pulumi stack:
pulumi config set aws:region eu-central-1
Deploy Your Stack: Deploy your infrastructure using Pulumi:
pulumi preview ; pulumi up
Example: VPC, Subnets, and FSx for NetApp ONTAP
Let’s dive into an example Pulumi project that sets up a Virtual Private Cloud (VPC), subnets, a security group, an Amazon FSx for NetApp ONTAP file system, and an EC2 instance.
Pulumi Code Example: VPC, Subnets, and FSx for NetApp ONTAP
The first step is to define all the parameters required to set up the infrastructure. You can use the following example to configure these parameters as specified in the pulumi.dev.yaml file.
This pulumi.dev.yaml file contains configuration settings for a Pulumi project. It specifies various parameters for the deployment environment, including the AWS region, availability zones, and key name. It also defines CIDR blocks for subnets. These settings are used to configure and deploy cloud infrastructure resources in the specified AWS region.
config:
  aws:region: eu-central-1
  demo:availabilityZone: eu-central-1a
  demo:keyName: XYZ
  demo:subnet1CIDER: 10.0.3.0/24
  demo:subnet2CIDER: 10.0.4.0/24
The following code snippet should be placed in the infra.py file. It details the setup of the VPC, subnets, security group, and FSx for NetApp ONTAP file system. Each step in the code is explained through inline comments.
import pulumi
import pulumi_aws as aws
import pulumi_command as command
import os

# Retrieve configuration values from Pulumi configuration files
aws_config = pulumi.Config("aws")
region = aws_config.require("region")  # The AWS region where resources will be deployed

demo_config = pulumi.Config("demo")
availability_zone = demo_config.require("availabilityZone")  # Availability Zone for the deployment
subnet1_cidr = demo_config.require("subnet1CIDER")  # CIDR block for the public subnet
subnet2_cidr = demo_config.require("subnet2CIDER")  # CIDR block for the private subnet
key_name = demo_config.require("keyName")  # Name of the SSH key pair for EC2 instance access

# Create a new VPC with DNS support enabled
vpc = aws.ec2.Vpc(
    "fsxVpc",
    cidr_block="10.0.0.0/16",       # VPC CIDR block
    enable_dns_support=True,        # Enable DNS support in the VPC
    enable_dns_hostnames=True       # Enable DNS hostnames in the VPC
)

# Create an Internet Gateway to allow internet access from the VPC
internet_gateway = aws.ec2.InternetGateway(
    "vpcInternetGateway",
    vpc_id=vpc.id                   # Attach the Internet Gateway to the VPC
)

# Create a public route table for routing internet traffic via the Internet Gateway
public_route_table = aws.ec2.RouteTable(
    "publicRouteTable",
    vpc_id=vpc.id,
    routes=[aws.ec2.RouteTableRouteArgs(
        cidr_block="0.0.0.0/0",     # Route all traffic (0.0.0.0/0) to the Internet Gateway
        gateway_id=internet_gateway.id
    )]
)

# Create a single public subnet in the specified Availability Zone
public_subnet = aws.ec2.Subnet(
    "publicSubnet",
    vpc_id=vpc.id,
    cidr_block=subnet1_cidr,            # CIDR block for the public subnet
    availability_zone=availability_zone,  # The specified Availability Zone
    map_public_ip_on_launch=True        # Assign public IPs to instances launched in this subnet
)

# Create a single private subnet in the same Availability Zone
private_subnet = aws.ec2.Subnet(
    "privateSubnet",
    vpc_id=vpc.id,
    cidr_block=subnet2_cidr,            # CIDR block for the private subnet
    availability_zone=availability_zone  # The same Availability Zone
)

# Associate the public subnet with the public route table to enable internet access
public_route_table_association = aws.ec2.RouteTableAssociation(
    "publicRouteTableAssociation",
    subnet_id=public_subnet.id,
    route_table_id=public_route_table.id
)

# Create a security group to control inbound and outbound traffic for the FSx file system
security_group = aws.ec2.SecurityGroup(
    "fsxSecurityGroup",
    vpc_id=vpc.id,
    description="Allow NFS traffic",    # Description of the security group
    ingress=[
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=2049,             # NFS protocol port
            to_port=2049,
            cidr_blocks=["0.0.0.0/0"]   # Allow NFS traffic from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=111,              # RPCBind port for NFS
            to_port=111,
            cidr_blocks=["0.0.0.0/0"]   # Allow RPCBind traffic from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="udp",
            from_port=111,              # RPCBind port for NFS over UDP
            to_port=111,
            cidr_blocks=["0.0.0.0/0"]   # Allow RPCBind traffic over UDP from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=22,               # SSH port for EC2 instance access
            to_port=22,
            cidr_blocks=["0.0.0.0/0"]   # Allow SSH traffic from anywhere
        )
    ],
    egress=[
        aws.ec2.SecurityGroupEgressArgs(
            protocol="-1",              # Allow all outbound traffic
            from_port=0,
            to_port=0,
            cidr_blocks=["0.0.0.0/0"]   # Allow all outbound traffic to anywhere
        )
    ]
)

# Create the FSx for NetApp ONTAP file system in the private subnet
file_system = aws.fsx.OntapFileSystem(
    "fsxFileSystem",
    subnet_ids=[private_subnet.id],         # Deploy the FSx file system in the private subnet
    preferred_subnet_id=private_subnet.id,  # Preferred subnet for the FSx file system
    security_group_ids=[security_group.id], # Attach the security group to the FSx file system
    deployment_type="SINGLE_AZ_1",          # Single Availability Zone deployment
    throughput_capacity=128,                # Throughput capacity in MB/s
    storage_capacity=1024                   # Storage capacity in GB
)

# Create a Storage Virtual Machine (SVM) within the FSx file system
storage_virtual_machine = aws.fsx.OntapStorageVirtualMachine(
    "storageVirtualMachine",
    file_system_id=file_system.id,          # Associate the SVM with the FSx file system
    name="svm1",                            # Name of the SVM
    root_volume_security_style="UNIX"       # Security style for the root volume
)

# Create a volume within the Storage Virtual Machine (SVM)
volume = aws.fsx.OntapVolume(
    "fsxVolume",
    storage_virtual_machine_id=storage_virtual_machine.id,  # Associate the volume with the SVM
    name="vol1",                            # Name of the volume
    junction_path="/vol1",                  # Junction path for mounting
    size_in_megabytes=10240,                # Size of the volume in MB
    storage_efficiency_enabled=True,        # Enable storage efficiency features
    tiering_policy=aws.fsx.OntapVolumeTieringPolicyArgs(
        name="SNAPSHOT_ONLY"                # Tiering policy for the volume
    ),
    security_style="UNIX"                   # Security style for the volume
)

# Extract the DNS name from the list of SVM endpoints
dns_name = storage_virtual_machine.endpoints.apply(lambda e: e[0]['nfs'][0]['dns_name'])

# Get the latest Amazon Linux 2 AMI for the EC2 instance
ami = aws.ec2.get_ami(
    most_recent=True,
    owners=["amazon"],
    filters=[{"name": "name", "values": ["amzn2-ami-hvm-*-x86_64-gp2"]}]  # Filter for Amazon Linux 2 AMI
)

# Create an EC2 instance in the public subnet
ec2_instance = aws.ec2.Instance(
    "fsxEc2Instance",
    instance_type="t3.micro",               # Instance type for the EC2 instance
    vpc_security_group_ids=[security_group.id],  # Attach the security group to the EC2 instance
    subnet_id=public_subnet.id,             # Deploy the EC2 instance in the public subnet
    ami=ami.id,                             # Use the latest Amazon Linux 2 AMI
    key_name=key_name,                      # SSH key pair for accessing the EC2 instance
    tags={"Name": "FSx EC2 Instance"}       # Tag for the EC2 instance
)

# User data script to install NFS client and mount the FSx volume on the EC2 instance
user_data_script = dns_name.apply(lambda dns: f"""#!/bin/bash
sudo yum update -y
sudo yum install -y nfs-utils
sudo mkdir -p /mnt/fsx
if ! mountpoint -q /mnt/fsx; then
    sudo mount -t nfs {dns}:/vol1 /mnt/fsx
fi
""")

# Retrieve the private key for SSH access from environment variables while running with GitHub Actions
private_key_content = os.getenv("PRIVATE_KEY")

# Ensure the FSx file system is available before executing the script on the EC2 instance
pulumi.Output.all(file_system.id, ec2_instance.public_ip).apply(lambda args: command.remote.Command(
    "mountFsxFileSystem",
    connection=command.remote.ConnectionArgs(
        host=args[1],
        user="ec2-user",
        private_key=private_key_content
    ),
    create=user_data_script,
    opts=pulumi.ResourceOptions(depends_on=[volume])
))
Pytest with Pulumi
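The test file test_infra.py from the project structure above uses Pulumi’s built-in mocking support: instead of talking to AWS, Pulumi replaces resource creation and provider calls with the mocks defined below, so the properties of the EC2 instance can be asserted with pytest without deploying anything.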
# Importing necessary libraries
import pulumi
import pulumi_aws as aws
from typing import Any, Dict, List
# Setting up configuration values for AWS region and various parameters
pulumi.runtime.set_config('aws:region', 'eu-central-1')
pulumi.runtime.set_config('demo:availabilityZone1', 'eu-central-1a')
pulumi.runtime.set_config('demo:availabilityZone2', 'eu-central-1b')
pulumi.runtime.set_config('demo:subnet1CIDER', '10.0.3.0/24')
pulumi.runtime.set_config('demo:subnet2CIDER', '10.0.4.0/24')
pulumi.runtime.set_config('demo:keyName', 'XYZ')  # Change based on your own key
# Creating a class MyMocks to mock Pulumi's resources for testing
class MyMocks(pulumi.runtime.Mocks):
    def new_resource(self, args: pulumi.runtime.MockResourceArgs) -> List[Any]:
        # Initialize outputs with the resource's inputs
        outputs = args.inputs
        # Mocking specific resources based on their type
        if args.typ == "aws:ec2/instance:Instance":
            # Mocking an EC2 instance with some default values
            outputs = {
                **args.inputs,  # Start with the given inputs
                "ami": "ami-0eb1f3cdeeb8eed2a",  # Mock AMI ID
                "availability_zone": "eu-central-1a",  # Mock availability zone
                "publicIp": "203.0.113.12",  # Mock public IP
                "publicDns": "ec2-203-0-113-12.compute-1.amazonaws.com",  # Mock public DNS
                "user_data": "mock user data script",  # Mock user data
                "tags": {"Name": "test"}  # Mock tags
            }
        elif args.typ == "aws:ec2/securityGroup:SecurityGroup":
            # Mocking a Security Group with default ingress rules
            outputs = {
                **args.inputs,
                "ingress": [
                    {"from_port": 80, "cidr_blocks": ["0.0.0.0/0"]},  # Allow HTTP traffic from anywhere
                    {"from_port": 22, "cidr_blocks": ["192.168.0.0/16"]}  # Allow SSH traffic from a specific CIDR block
                ]
            }
        # Returning a mocked resource ID and the output values
        return [args.name + '_id', outputs]

    def call(self, args: pulumi.runtime.MockCallArgs) -> Dict[str, Any]:
        # Mocking a call to get an AMI
        if args.token == "aws:ec2/getAmi:getAmi":
            return {
                "architecture": "x86_64",  # Mock architecture
                "id": "ami-0eb1f3cdeeb8eed2a",  # Mock AMI ID
            }
        # Return an empty dictionary if no specific mock is needed
        return {}

# Setting the custom mocks for Pulumi
pulumi.runtime.set_mocks(MyMocks())

# Import the infrastructure to be tested
import infra

# Define a test function to validate the AMI ID of the EC2 instance
@pulumi.runtime.test
def test_instance_ami():
    def check_ami(ami_id: str) -> None:
        print(f"AMI ID received: {ami_id}")
        # Assertion to ensure the AMI ID is the expected one
        assert ami_id == "ami-0eb1f3cdeeb8eed2a", 'EC2 instance must have the correct AMI ID'
    # Running the test to check the AMI ID
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.ami.apply(check_ami))

# Define a test function to validate the availability zone of the EC2 instance
@pulumi.runtime.test
def test_instance_az():
    def check_az(availability_zone: str) -> None:
        print(f"Availability Zone received: {availability_zone}")
        # Assertion to ensure the instance is in the correct availability zone
        assert availability_zone == "eu-central-1a", 'EC2 instance must be in the correct availability zone'
    # Running the test to check the availability zone
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.availability_zone.apply(check_az))

# Define a test function to validate the tags of the EC2 instance
@pulumi.runtime.test
def test_instance_tags():
    def check_tags(tags: Dict[str, Any]) -> None:
        print(f"Tags received: {tags}")
        # Assertions to ensure the instance has tags and a 'Name' tag
        assert tags, 'EC2 instance must have tags'
        assert 'Name' in tags, 'EC2 instance must have a Name tag'
    # Running the test to check the tags
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.tags.apply(check_tags))

# Define a test function to validate the user data script of the EC2 instance
@pulumi.runtime.test
def test_instance_userdata():
    def check_user_data(user_data_script: str) -> None:
        print(f"User data received: {user_data_script}")
        # Assertion to ensure the instance has user data configured
        assert user_data_script is not None, 'EC2 instance must have user_data_script configured'
    # Running the test to check the user data script
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.user_data.apply(check_user_data))
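Following the same pattern, further resources from infra.py can be checked as well. As a sketch – assuming the security_group resource exported by infra.py – a test for the security group’s ingress rules could look like this:

# Define a test function to validate that the security group defines ingress rules
@pulumi.runtime.test
def test_security_group_ingress():
    def check_ingress(ingress) -> None:
        print(f"Ingress rules received: {ingress}")
        # Assertion to ensure at least one ingress rule is present
        assert ingress, 'Security group must define at least one ingress rule'
    # Running the test to check the ingress rules
    pulumi.runtime.run_in_stack(lambda: infra.security_group.ingress.apply(check_ingress))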
GitHub Actions
Introduction
GitHub Actions is a powerful automation tool integrated within GitHub, enabling developers to automate their workflows, including testing, building, and deploying code. Pulumi, on the other hand, is an Infrastructure as Code (IaC) tool that allows you to manage cloud resources using familiar programming languages. In this post, we’ll explore why you should use GitHub Actions and its specific purpose when combined with Pulumi.
Why Use GitHub Actions and Its Importance
GitHub Actions is a powerful tool for automating workflows within your GitHub repository, offering several key benefits, especially when combined with Pulumi:
- Integrated CI/CD: GitHub Actions seamlessly integrates Continuous Integration and Continuous Deployment (CI/CD) directly into your GitHub repository. This automation enhances consistency in testing, building, and deploying code, reducing the risk of manual errors.
- Custom Workflows: It allows you to create custom workflows for different stages of your software development lifecycle, such as code linting, running unit tests, or managing complex deployment processes. This flexibility ensures your automation aligns with your specific needs.
- Event-Driven Automation: You can trigger GitHub Actions with events like pushes, pull requests, or issue creation. This event-driven approach ensures that tasks are automated precisely when needed, streamlining your workflow.
- Reusable Code: GitHub Actions supports reusable “actions” that can be shared across multiple workflows or repositories. This promotes code reuse and maintains consistency in automation processes.
- Built-in Marketplace: The GitHub Marketplace offers a wide range of pre-built actions from the community, making it easy to integrate third-party services or implement common tasks without writing custom code.
- Enhanced Collaboration: By using GitHub’s pull request and review workflows, teams can discuss and approve changes before deployment. This process reduces risks and improves collaboration on infrastructure changes.
- Automated Deployment: GitHub Actions automates the deployment of infrastructure code, using Pulumi to apply changes. This automation reduces the risk of manual errors and ensures a consistent deployment process.
- Testing: Running tests before deploying with GitHub Actions helps confirm that your infrastructure code works correctly, catching potential issues early and ensuring stability.
- Configuration Management: It manages and sets up necessary configurations for Pulumi and AWS, ensuring your environment is correctly configured for deployments.
- Preview and Apply Changes: GitHub Actions allows you to preview changes before applying them, helping you understand the impact of modifications and minimizing the risk of unintended changes.
- Cleanup: You can optionally destroy the stack after testing or deployment, helping control costs and maintain a clean environment.
Execution
To execute the GitHub Actions workflow:
- Placement: Save the workflow YAML file in your repository’s .github/workflows directory. This setup ensures that GitHub Actions will automatically detect and execute the workflow whenever there’s a push to the main branch of your repository.
- Workflow Actions: The workflow file performs several critical actions:
- Environment Setup: Configures the necessary environment for running the workflow.
- Dependency Installation: Installs the required dependencies, including Pulumi CLI and other Python packages.
- Testing: Runs your tests to verify that your infrastructure code functions as expected.
- Preview and Apply Changes: Uses Pulumi to preview and apply any changes to your infrastructure.
- Cleanup: Optionally destroys the stack after tests or deployment to manage costs and maintain a clean environment.
By incorporating this workflow, you ensure that your Pulumi infrastructure is continuously integrated and deployed with proper validation, significantly improving the reliability and efficiency of your infrastructure management process.
Example: Deploy infrastructure with Pulumi
name: Pulumi Deployment

on:
  push:
    branches:
      - main

env:
  # Environment variables for AWS credentials and private key.
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
  PRIVATE_KEY: ${{ secrets.PRIVATE_KEY }}

jobs:
  pulumi-deploy:
    runs-on: ubuntu-latest
    environment: dev
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        # Check out the repository code to the runner.

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v3
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-central-1
        # Set up AWS credentials for use in subsequent actions.

      - name: Set up SSH key
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/XYZ.pem
          chmod 600 ~/.ssh/XYZ.pem
        # Create an SSH directory, add the private SSH key, and set permissions.

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
        # Set up Python 3.9 environment for running Python-based tasks.

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '14'
        # Set up Node.js 14 environment for running Node.js-based tasks.

      - name: Install project dependencies
        run: npm install
        working-directory: .
        # Install Node.js project dependencies specified in `package.json`.

      - name: Install Pulumi
        run: npm install -g pulumi
        # Install the Pulumi CLI globally.

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
        working-directory: .
        # Upgrade pip and install Python dependencies from `requirements.txt`.

      - name: Login to Pulumi
        run: pulumi login
        env:
          PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
        # Log in to Pulumi using the access token stored in secrets.

      - name: Set Pulumi configuration for tests
        run: pulumi config set aws:region eu-central-1 --stack dev
        # Set Pulumi configuration to specify AWS region for the `dev` stack.

      - name: Pulumi stack select
        run: pulumi stack select dev
        working-directory: .
        # Select the `dev` stack for Pulumi operations.

      - name: Run tests
        run: |
          pulumi config set aws:region eu-central-1
          pytest
        working-directory: .
        # Set AWS region configuration and run tests using pytest.

      - name: Preview Pulumi changes
        run: pulumi preview --stack dev
        working-directory: .
        # Preview the changes that Pulumi will apply to the `dev` stack.

      - name: Update Pulumi stack
        run: pulumi up --yes --stack dev
        working-directory: .
        # Apply the changes to the `dev` stack with Pulumi.

      - name: Pulumi stack output
        run: pulumi stack output
        working-directory: .
        # Retrieve and display outputs from the Pulumi stack.

      - name: Cleanup Pulumi stack
        run: pulumi destroy --yes --stack dev
        working-directory: .
        # Destroy the `dev` stack to clean up resources.

      - name: Pulumi stack output (after destroy)
        run: pulumi stack output
        working-directory: .
        # Retrieve and display outputs from the Pulumi stack after destruction.

      - name: Logout from Pulumi
        run: pulumi logout
        # Log out from the Pulumi session.
Veeam & Proxmox VE
Veeam has made a strategic move by integrating the open-source virtualization solution Proxmox VE (Virtual Environment) into its portfolio. Signaling its commitment to the evolving needs of the open-source community and the open-source virtualization market, this integration positions Veeam as a forward-thinking player in the industry, ready to support the rising tide of open-source solutions. The combination of Veeam’s data protection solutions with the flexibility of Proxmox VE’s platform offers enterprises a compelling alternative that promises cost savings and enhanced data security.
With Proxmox VE, one of the most important and most frequently requested open-source hypervisors is now natively supported – and it could well mark a turning point in the virtualization market!
Opportunities for Open-Source Virtualization
In many enterprises, a major hypervisor platform is already in place, accompanied by a robust backup solution – often Veeam. However, until recently, Veeam lacked direct support for Proxmox VE, leaving a gap for those who have embraced or are considering this open-source virtualization platform. The latest version of Veeam changes the game by introducing the capability to create and manage backups and restores directly within Proxmox VE environments, without the need for agents inside the VMs.
This advancement means that entire VMs can now be backed up and restored across any hypervisor, providing unparalleled flexibility. Moreover, enterprises can seamlessly integrate a new Proxmox VE-based cluster into their existing Veeam setup, managing everything from a single, central point. This integration simplifies operations, reduces complexity, and enhances the overall efficiency of data protection strategies in environments that include multiple hypervisors by simply having a one-size-fits-all solution in place.
Another heavily underestimated benefit is the possibility to easily migrate, copy, back up and restore entire VMs independently of their underlying hypervisor – also known as cross-platform recovery. As a result, operators are now able to shift VMs from VMware ESXi / vSphere or Hyper-V to Proxmox VE nodes. This provides a great way to introduce and evaluate a new virtualization platform without taking any risks. For organizations looking to unify their virtualization and backup infrastructure, this update offers a significant leap forward.
Integration into Veeam
Integrating a new Proxmox cluster into an existing Veeam setup is a testament to the simplicity and user-centric design of both systems. Those familiar with Veeam will find the process to be intuitive and minimally disruptive, allowing for a seamless extension of their virtualization environment. This ease of integration means that your new Proxmox VE cluster can be swiftly brought under the protective umbrella of Veeam’s robust backup and replication services.
Despite the general ease of the process, it’s important to recognize that unique configurations and specific environments may present their own set of challenges. These corner cases, while not common, are worth noting as they can require special attention to ensure a smooth integration. Rest assured, however, that these are merely nuances in an otherwise straightforward procedure, and with a little extra care, even these can be managed effectively.
Overview
Starting with version 12.2, Proxmox VE support is enabled and integrated via a plugin that is installed on the Veeam Backup server. Veeam Backup for Proxmox incorporates a distributed architecture that necessitates the deployment of worker nodes. These nodes function analogously to data movers, facilitating the transfer of virtual machine payloads from the Proxmox VE hosts to the designated Backup Repository. The workers operate on a Linux platform and are seamlessly instantiated via the Veeam Backup Server console. Their role is critical and akin to that of proxy components in analogous systems such as AHV or VMware backup solutions.
At least one such worker is needed per cluster. For improved performance, one worker for each Proxmox VE node might be considered. Keep in mind that each worker requires 6 vCPUs, 6 GB of memory and 100 GB of disk space.
Requirements
This blog post assumes that an installation of Veeam Backup & Replication in version 12.2 or later is already in place and fully configured for another environment such as VMware. It also assumes that the Proxmox VE cluster is already present and that credentials with the roles needed to perform the backup/restore actions are available.
Configuration
The integration and configuration of a Proxmox VE cluster can be done entirely within the Veeam Backup & Replication Console application and does not require any additional commands to be executed on a CLI. The previously mentioned worker nodes can be installed fully automatically.
Adding a Proxmox Server
Virtual Infrastructure -> Add Server
This procedure is consistent with the established protocol for incorporating nodes from other virtualization platforms that are compatible with Veeam.
Afterwards, Veeam shows you a selection of possible and supported Hypervisors:
- VMware vSphere
- Microsoft Hyper-V
- Nutanix AHV
- Red Hat Virtualization
- Oracle Virtualization Manager
- Proxmox VE
In this case, we simply choose Proxmox VE and proceed with the setup wizard.
During the next steps in the setup wizard, the authentication details, the hostname or IP address of the target Proxmox VE server and also a snapshot storage of the Proxmox VE server must be defined.
Hint: When it comes to the authentication details, take care to use working credentials for the SSH service on the Proxmox VE server. If you usually use the root@pam credentials for the web interface, simply provide root to Veeam. Veeam will initiate a connection to the system over the SSH protocol.
Usage
As a result, this not only simplifies the operator’s work when dealing with different servers and clusters but also finally provides the opportunity for cross-hypervisor recoveries.
Creating Backup Jobs
Creating a new backup job for a single VM or even multiple VMs in a Proxmox environment is as simple as, and works exactly the same way as, what you already know from other hypervisors. However, let us have a quick summary of the needed tasks:
Open the Veeam Backup & Replication console on your backup server or management workstation. To start creating a backup job, navigate to the Home tab and click on Backup Job, then select Virtual machine from the drop-down menu.
When the New Backup Job wizard opens, you will need to enter a name and a description for the backup job. Click Next to proceed to the next step. Now, you will need to select the VMs that you want to back up. Click Add in the Virtual Machines step and choose the individual VMs or containers like folders, clusters, or entire hosts that you want to include in the backup. Once you have made your selection, click Next.
If you have configured multiple backup proxies, the next step allows you to specify which one to use. If you are not sure or if you prefer, you can let Veeam Backup & Replication automatically select the best proxy for the job. Click Next after making your choice.
Review all the settings on the summary page to ensure they are correct. If everything looks good, click Finish to create the backup job.
If you want to run the backup job immediately for ensuring everything works as expected, you can do so by right-clicking on the job and selecting Start. Alternatively, you can wait for the scheduled time to trigger the job automatically.
Restoring an entire VM
The restore and replication process for a full VM restore follows the standard procedures. However, it now includes the significant feature of cross-hypervisor restore. This functionality allows for the migration of VMs between different hypervisor types without compatibility issues. For example, when introducing Proxmox VE into a corporate setting, operators can effortlessly migrate VMs from an existing hypervisor to the Proxmox VE cluster. Should any issues arise during the testing phase, the process also supports the reverse migration back to the original hypervisor. Let us have a look at the details.
Open the Veeam Backup & Replication console on your backup server or management workstation. To start the restore, navigate to the Home tab and click on Restore.
Choose the Entire VM restore option, which will launch the wizard for restoring a full virtual machine. The first step in the wizard will ask you to select a backup from which you want to restore. You will see a list of available backups; select the one that contains the VM you wish to restore and proceed to the next step by clicking Next.
Now, you must decide on the restore point. Typically, this will be the most recent backup, but you may choose an earlier point if necessary. After selecting the restore point, continue to the next step.
In the next step, you will have options regarding the power state of the VM after the restoration. You can choose to power on the VM automatically or leave it turned off, depending on your needs.
The restoration process will begin, and its progress can be monitored within the Veeam Backup & Replication console. Depending on the size of the VM and the performance of your backup storage and network, the restoration can take some time.
File-Level-Restore
Open the Veeam Backup & Replication console on your backup server or management workstation. To start a file-level restore, navigate to the Home tab, click on Restore and choose the guest files restore option for the backup of the VM you want to recover files from.
Choose the restore point that you want to use for the file-level restore. This is typically the most recent backup, but you can select an earlier one if needed. After picking the restore point, click Next to continue.
At this stage, you may need to choose the operating system of the VM that you are restoring files from. This is particularly important if the backup is of a different OS than the one on the Veeam Backup & Replication server because it will determine the type of helper appliance required for the restore.
Once the helper appliance is ready, you will be able to browse the file system of the backup. Navigate through the backup to locate the files or folders you wish to restore.
After selecting the files or folders for restoration, you will be prompted to choose the destination where you want to restore the data. You can restore to the original location or specify a new location, depending on your requirements.
The file-level restore process will start, and you can monitor the progress within the Veeam Backup & Replication console. The time it takes to complete the restore will depend on the size and number of files being restored, as well as the performance of your backup storage and network.
Conclusion
To summarise, the latest update to Veeam introduces a very important and welcome integration with Proxmox VE, filling a significant gap for enterprises that have adopted this open-source virtualization platform. By enabling direct backups and restores of entire VMs across different hypervisors without the need for in-VM agents, Veeam now offers unparalleled flexibility and simplicity in managing mixed environments. This advancement not only streamlines operations and enhances data protection strategies but also empowers organizations to easily migrate to and evaluate new open-source virtualization platforms like Proxmox VE with minimal risk. It is great to see that more and more companies are putting effort into supporting open-source solutions, which underlines the ongoing importance of open-source based products in enterprises.
Additionally, for those starting fresh with Proxmox, the Proxmox Backup Server remains a viable open-source alternative and you can find our blog post about configuring the Proxmox Backup Server right here. Overall, this update represents a significant step forward in unifying virtualization and backup infrastructures, offering both versatility and ease of integration.
We are always here to help and assist you with further consulting, planning, and integration needs. Whether you are exploring new virtualization platforms, optimizing your current infrastructure, or looking for expert guidance on your backup strategies, our team is dedicated to ensuring your success every step of the way. Do not hesitate to reach out to us for personalized support and tailored solutions to meet your unique requirements in virtualization or backup environments.
In the world of virtualization, ensuring data redundancy and high availability is crucial. Proxmox Virtual Environment (PVE) is a powerful open-source platform for enterprise virtualization, combining KVM hypervisor and LXC containers. One of the key features that Proxmox offers is local storage replication, which helps in maintaining data integrity and availability in case of hardware failures. In this blog post, we will delve into the concept of local storage replication in Proxmox, its benefits, and how to set it up.
What is Local Storage Replication?
Local storage replication in Proxmox refers to the process of duplicating data from one local storage device to another within the same Proxmox cluster. This ensures that if one storage device fails, the data is still available on another device, thereby minimizing downtime and data loss. This is particularly useful in environments where high availability is critical.
Benefits
- Data Redundancy: By replicating data across multiple storage devices, you ensure that a copy of your data is always available, even if one device fails.
- High Availability: In the event of hardware failure, the system can quickly switch to the replicated data, ensuring minimal disruption to services.
Caveat
Please note that data written between the last synchronization and the failure of the node may be lost. If you cannot tolerate even a small data loss, use shared storage (Ceph, NFS, …) in your cluster instead.
Setting Up Local Storage Replication in Proxmox
Setting up local storage replication in Proxmox involves a few steps. Here’s a step-by-step guide to help you get started:
Step 1: Prepare Your Environment
Ensure that you have a Proxmox cluster set up with at least two nodes. Each node should have local ZFS storage configured.
Step 2: Configure Storage Replication
- Access the Proxmox Web Interface: Log in to the Proxmox web interface.
- Navigate to Datacenter: In the left-hand menu, click on Datacenter.
- Select Storage: Under the Datacenter menu, click on Storage.
- Add Storage: Click on Add and select the type of storage you want to replicate.
- Configure Storage: Fill in the required details for the ZFS storage (one local storage per node).
Step 3: Set Up Replication
- Navigate to the Node: In the left-hand menu, select the node where you want to set up replication.
- Select the VM/CT: Click on the virtual machine (VM) or container (CT) you want to replicate.
- Configure Replication: Go to the Replication tab and click on Add.
- Select Target Node: Choose the target node where the data will be replicated to.
- Schedule Replication: Set the replication schedule according to your needs (e.g. every 5 minutes, hourly).
Step 4: Monitor Replication
Once replication is set up, you can monitor its status in the Replication tab. Proxmox provides detailed logs and status updates to help you ensure that replication is functioning correctly.
Best Practices for Local Storage Replication
- Regular Backups: While replication provides redundancy, it is not a substitute for regular backups. Ensure that you have a robust backup strategy in place. Use tools like the Proxmox Backup Server (PBS) for this task.
- Monitor Storage Health: Regularly check the health of your storage devices to preemptively address any issues.
- Test Failover: Periodically test the failover process to ensure that your replication setup works as expected in case of an actual failure.
- Optimize Replication Schedule: Balance the replication frequency with your performance requirements and network bandwidth to avoid unnecessary load.
Conclusion
Local storage replication in Proxmox is a powerful feature that enhances data redundancy and high availability. By following the steps outlined in this blog post, you can set up and manage local storage replication in your Proxmox environment, ensuring that your data remains safe and accessible even in the face of hardware failures. Remember to follow best practices and regularly monitor your replication setup to maintain optimal performance and reliability.
You can find further information about Proxmox storage replication here:
https://pve.proxmox.com/wiki/Storage_Replication
https://pve.proxmox.com/pve-docs/chapter-pvesr.html
Happy virtualizing!
If you had the choice, would you rather take Salsa or Guacamole? Let me explain why you should choose Guacamole over Salsa.
In this blog article, we want to take a look at one of the smaller Apache projects out there called Apache Guacamole. Apache Guacamole allows administrators to run a web-based client tool for accessing remote applications and servers. This can include remote desktop systems, applications or terminal sessions. Users can simply access them using their web browsers; no special client or other tools are required. From there, they can log in and access all pre-configured remote connections that have been specified by an administrator.
Guacamole supports a wide variety of protocols like VNC, RDP, and SSH. This way, users can access basically anything from remote terminal sessions to full-fledged graphical user interfaces provided by operating systems like Debian, Ubuntu, Windows and many more.
Convert every window application to a web application
If we spin this idea further, technically every window application that isn’t designed to run as a web application can be transformed into a web application by using Apache Guacamole. We helped a customer bring its legacy application to Kubernetes, so that its users could run it from their web browsers. Sure, reimplementing the application from the ground up, so that it follows Cloud Native principles, is the preferred solution. As always though, effort, experience and costs may exceed the available time and budget, and in those cases Apache Guacamole can provide a relatively easy way to realize such projects.
In this blog article, I want to show you how easy it is to run a legacy window application as a web app on Kubernetes. For this, we will use a Kubernetes cluster created by kind and create a Kubernetes Deployment to make kate – a KDE-based text editor – our own web application. It’s just an example, so there might be better applications to transform, but this one should be fine to show you the concepts behind Apache Guacamole. So, without further ado, let’s create our kate web application.
Preparation of Kubernetes
Before we can start, we must make sure that we have a Kubernetes cluster that we can test on. If you already have a cluster, simply skip this section. If not, let’s spin one up using kind.
kind is a lightweight implementation of Kubernetes that can run on virtually every machine. It’s written in Go and can be installed like this:
# For AMD64 / x86_64
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
# For ARM64
[ $(uname -m) = aarch64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-arm64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
Next, we need to install some dependencies for our cluster. This includes, for example, docker and kubectl.
$ sudo apt install docker.io kubernetes-client
Since we create our Kubernetes cluster with kind, we need docker because the cluster nodes run as Docker containers on your host machine. Installing kubectl allows us to access the Kubernetes cluster after creating it.
Once we have installed those packages, we can start creating our cluster. First, we must define a cluster configuration. It defines which ports are accessible from our host machine, so that we can access our Guacamole application. Remember, the cluster itself is operated within Docker containers, so we must ensure that we can access it from our machine. For this, we define the following configuration, which we save in a file called cluster.yaml:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
extraPortMappings:
- containerPort: 30000
hostPort: 30000
listenAddress: "127.0.0.1"
protocol: TCP
Hereby, we basically map the container's port 30000 to our local machine's port 30000, so that we can easily access it later on. Keep this in mind, because it will be the port that we use with our web browser to access our kate instance.
Ultimately, this configuration is consumed by kind. With it, you can also adjust multiple other parameters of your cluster besides the port configuration, which are not covered here. It's worth taking a look at kind's documentation for this.
As soon as you have saved the configuration to cluster.yaml, we can start creating our cluster:
$ sudo kind create cluster --name guacamole --config cluster.yaml
Creating cluster "guacamole" ...
✓ Ensuring node image (kindest/node:v1.29.2) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-guacamole"
You can now use your cluster with:
kubectl cluster-info --context kind-guacamole
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
Since we don't want to run everything in root context, let's export the kubeconfig, so that we can use it with kubectl as our unprivileged user:
$ sudo kind export kubeconfig \
--name guacamole \
--kubeconfig $PWD/config
$ export KUBECONFIG=$PWD/config
$ sudo chown $(logname): $KUBECONFIG
By doing so, we are ready and can access our Kubernetes cluster using kubectl now. This is our baseline to start migrating our application.
Creation of the Guacamole Deployment
In order to run our application on Kubernetes, we need some sort of workload resource. Typically, you could create a Pod, Deployment, StatefulSet or DaemonSet to run workloads on a cluster.
Let's create the Kubernetes Deployment for our application. The example below shows the deployment's general structure. Each container definition has its own dedicated example afterwards to explain it in more detail.
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: web-based-kate
name: web-based-kate
spec:
replicas: 1
selector:
matchLabels:
app: web-based-kate
template:
metadata:
labels:
app: web-based-kate
spec:
containers:
# The guacamole server component that each
# user will connect to via their browser
- name: guacamole-server
image: docker.io/guacamole/guacamole:1.5.4
...
# The daemon that opens the connection to the
# remote entity
- name: guacamole-guacd
image: docker.io/guacamole/guacd:1.5.4
...
# Our own self written application that we
# want to make accessible via the web.
- name: web-based-kate
image: registry.example.com/own-app/web-based-kate:0.0.1
...
volumes:
- name: guacamole-config
secret:
secretName: guacamole-config
- name: guacamole-server
emptyDir: {}
- name: web-based-kate-home
emptyDir: {}
- name: web-based-kate-tmp
emptyDir: {}
As you can see, we need three containers and some volumes for our application. The first two containers are dedicated to Apache Guacamole itself. First, there is the server component, which is the external endpoint for clients to access our web application. It provides the web server as well as the user management and configuration needed to run Apache Guacamole.
Next to this, there is the guacd daemon. This is the core component of Guacamole, which creates the remote connections to the application based on the configuration of the server. It makes these remote connections accessible to the Guacamole server, which in turn forwards them to the end users.
Finally, we have our own application. It offers a connection endpoint for the guacd daemon using one of Guacamole's supported protocols and provides the graphical user interface (GUI).
Guacamole Server
Now, let's take a closer look at each container specification. We start with the Guacamole server instance. It handles the session and user management and contains the configuration that defines which remote connections are available and which are not.
- name: guacamole-server
image: docker.io/guacamole/guacamole:1.5.4
env:
- name: GUACD_HOSTNAME
value: "localhost"
- name: GUACD_PORT
value: "4822"
- name: GUACAMOLE_HOME
value: "/data/guacamole/settings"
- name: HOME
value: "/data/guacamole"
- name: WEBAPP_CONTEXT
value: ROOT
volumeMounts:
- name: guacamole-config
mountPath: /data/guacamole/settings
- name: guacamole-server
mountPath: /data/guacamole
ports:
- name: http
containerPort: 8080
securityContext:
allowPrivilegeEscalation: false
privileged: false
readOnlyRootFilesystem: true
capabilities:
drop: ["all"]
resources:
limits:
cpu: "250m"
memory: "256Mi"
requests:
cpu: "250m"
memory: "256Mi"
Since it needs to connect to the guacd daemon, we have to provide the connection information for guacd by passing it into the container using environment variables like GUACD_HOSTNAME or GUACD_PORT. In addition, Guacamole would usually be accessible via http://<your domain>/guacamole.
This behavior can be adjusted by modifying the WEBAPP_CONTEXT environment variable. In our case, for example, we don't want users to have to type in /guacamole; they should simply access it via http://<your domain>/.
Guacamole Guacd
Then, there is the guacd daemon.
- name: guacamole-guacd
image: docker.io/guacamole/guacd:1.5.4
args:
- /bin/sh
- -c
- /opt/guacamole/sbin/guacd -b 127.0.0.1 -L $GUACD_LOG_LEVEL -f
securityContext:
allowPrivilegeEscalation: true
privileged: false
readOnlyRootFilesystem: true
capabilities:
drop: ["all"]
resources:
limits:
cpu: "250m"
memory: "512Mi"
requests:
cpu: "250m"
memory: "512Mi"
It's worth mentioning that you should modify the arguments used to start the guacd container. In the example above, we want guacd to only listen on localhost for security reasons. All containers within the same pod share the same network namespace. As a result, they can access each other via localhost. This said, there is no need to make this service accessible to other services running outside of this pod, so we can limit it to localhost only. To achieve this, you need to set the -b 127.0.0.1 parameter, which sets the corresponding listen address. Since you need to overwrite the whole command, don't forget to also specify the -L and -f parameters: the first one sets the log level and the second one keeps the process in the foreground.
Web Based Kate
To finish everything off, we have the kate application, which we want to transform into a web application.
- name: web-based-kate
image: registry.example.com/own-app/web-based-kate:0.0.1
env:
- name: VNC_SERVER_PORT
value: "5900"
- name: VNC_RESOLUTION_WIDTH
value: "1280"
- name: VNC_RESOLUTION_HEIGHT
value: "720"
securityContext:
allowPrivilegeEscalation: true
privileged: false
readOnlyRootFilesystem: true
capabilities:
drop: ["all"]
volumeMounts:
- name: web-based-kate-home
mountPath: /home/kate
- name: web-based-kate-tmp
mountPath: /tmp
Configuration of our Guacamole setup
After having the deployment in place, we need to prepare the configuration for our Guacamole setup. In order to know which users exist and which connections should be offered, we need to provide a mapping configuration to Guacamole.
In this example, a simple user mapping is shown for demonstration purposes. It uses a static mapping defined in an XML file that is handed over to the Guacamole server. Typically, you would use other authentication methods instead, like a database or LDAP.
This said however, let's continue with our static one. For this, we simply define a Kubernetes Secret which is mounted into the Guacamole server. It defines two configuration files. One is the so-called guacamole.properties, Guacamole's main configuration file. Next to this, we also define the user-mapping.xml, which contains all available users and their connections.
apiVersion: v1
kind: Secret
metadata:
name: guacamole-config
stringData:
guacamole.properties: |
enable-environment-properties: true
user-mapping.xml: |
<user-mapping>
<authorize username="admin" password="PASSWORD" encoding="sha256">
<connection name="web-based-kate">
<protocol>vnc</protocol>
<param name="hostname">localhost</param>
<param name="port">5900</param>
</connection>
</authorize>
</user-mapping>
As you can see, we only defined one specific user called admin, which can use a connection called web-based-kate. In order to access the kate instance, Guacamole uses VNC as the configured protocol. To make this happen, our web application must offer a VNC server port on the other side, so that the guacd daemon can access it and forward the remote session to clients. Keep in mind that you need to replace the string PASSWORD with a proper sha256 sum of the password. Such a sha256 sum could look like this, for example:
$ echo -n "test" | sha256sum
9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08 -
Next, the hostname parameter references the corresponding VNC server of our kate container. Since we are starting this container alongside our Guacamole containers within the same pod, the Guacamole server as well as the guacd daemon can access the application via localhost. There is no need to set up a Kubernetes Service in front of it, since only guacd will access the VNC server and forward the remote session via HTTP to clients accessing Guacamole with their web browsers. Finally, we also need to specify the VNC server port, which is typically 5900 but can be adjusted if needed.
The corresponding guacamole.properties is quite short. By enabling the enable-environment-properties configuration parameter, we make sure that every Guacamole configuration parameter can also be set via environment variables. This way, we don't need to modify this configuration file every time we want to adjust the configuration; we only need to provide updated environment variables to the Guacamole server container.
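As an illustration (a hypothetical addition, not part of the manifests above): with this setting enabled, a property such as api-session-timeout could be supplied to the guacamole-server container as an environment variable instead of a line in guacamole.properties, for example:
- name: API_SESSION_TIMEOUT
  value: "60"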
Make Guacamole accessible
Last but not least, we must make the Guacamole server accessible for clients. Although the services within the pod can access each other via localhost, the same does not apply to clients trying to access Guacamole. Therefore, we must make Guacamole's server port 8080 available to the outside world. This can be achieved by creating a Kubernetes Service of type NodePort. Such a service forwards each request from a node port to the corresponding container that offers the configured target port. In our case, this is the Guacamole server container, which offers port 8080.
apiVersion: v1
kind: Service
metadata:
name: web-based-kate
spec:
type: NodePort
selector:
app: web-based-kate
ports:
- name: http
protocol: TCP
port: 8080
targetPort: 8080
nodePort: 30000
This specific port is then mapped to the node's port 30000, and we configured the kind cluster in such a way that it forwards its node port 30000 to the host system's port 30000. This is the port that we will use to access Guacamole with our web browsers.
Preparation of the Application container
Before we can start to deploy our application, we need to prepare our kate container. For this, we simply create a Debian container that runs kate. Keep in mind that you would typically use lightweight base images like alpine to run applications like this. For this demonstration however, we use the Debian image since it is easier to spin up, but in general you only need a small fraction of the functionality provided by this base image. Moreover – from a security point of view – you want to keep your images small to minimize the attack surface and make them easier to maintain. For now however, we will continue with the Debian image.
In the example below, you can see a Dockerfile for the kate container.
FROM debian:12
# Install all required packages
RUN apt update && \
apt install -y x11vnc xvfb kate
# Add user for kate
RUN adduser kate --system --home /home/kate --uid 999
# Copy our entrypoint in the container
COPY entrypoint.sh /opt
USER 999
ENTRYPOINT [ "/opt/entrypoint.sh" ]
Here you can see that we create a dedicated user called kate (user ID 999), for which we also create a home directory. This home directory is used for all files that kate creates during runtime. Since we set readOnlyRootFilesystem to true, we must make sure that we mount some sort of writable volume (e.g. an EmptyDir) to kate's home directory. Otherwise, kate wouldn't be able to write any runtime data.
Moreover, we have to install the following three packages:
- x11vnc
- xvfb
- kate
These are the only packages we need for our container. In addition, we also need to create an entrypoint script that starts the application and prepares the container accordingly. This entrypoint script creates the configuration for kate, starts it in a virtual display by using xvfb-run and provides this virtual display to end users via the VNC server x11vnc. In the meantime, xdriinfo is used to check whether the virtual display came up successfully after starting kate. If it takes too long, the entrypoint script fails by returning exit code 1.
By doing this, we ensure that the container is not stuck in an infinite loop during a failure and let Kubernetes restart the container whenever it couldn't start the application successfully. Furthermore, it is important to check whether the virtual display came up prior to handing it over to the VNC server, because the VNC server would crash if the virtual display is not up and running, since it needs something to share. On the other hand, our container will be killed whenever kate is terminated, because that also terminates the virtual display, which in turn terminates the VNC server and lets the container exit, too. This way, we don't need to take care of it on our own.
#!/bin/bash
set -e
# If no resolution is provided
if [ -z $VNC_RESOLUTION_WIDTH ]; then
VNC_RESOLUTION_WIDTH=1920
fi
if [ -z $VNC_RESOLUTION_HEIGHT ]; then
VNC_RESOLUTION_HEIGHT=1080
fi
# If no server port is provided
if [ -z $VNC_SERVER_PORT ]; then
VNC_SERVER_PORT=5900
fi
# Prepare configuration for kate
mkdir -p $HOME/.local/share/kate
echo "[MainWindow0]
"$VNC_RESOLUTION_WIDTH"x"$VNC_RESOLUTION_HEIGHT" screen: Height=$VNC_RESOLUTION_HEIGHT
"$VNC_RESOLUTION_WIDTH"x"$VNC_RESOLUTION_HEIGHT" screen: Width=$VNC_RESOLUTION_WIDTH
"$VNC_RESOLUTION_WIDTH"x"$VNC_RESOLUTION_HEIGHT" screen: XPosition=0
"$VNC_RESOLUTION_WIDTH"x"$VNC_RESOLUTION_HEIGHT" screen: YPosition=0
Active ViewSpace=0
Kate-MDI-Sidebar-Visible=false" > $HOME/.local/share/kate/anonymous.katesession
# We need to define an XAuthority file
export XAUTHORITY=$HOME/.Xauthority
# Define execution command
APPLICATION_CMD="kate"
# Let's start our application in a virtual display
xvfb-run \
-n 99 \
-s ':99 -screen 0 '$VNC_RESOLUTION_WIDTH'x'$VNC_RESOLUTION_HEIGHT'x16' \
-f $XAUTHORITY \
$APPLICATION_CMD &
# Let's wait until the virtual display is initialized before
# we proceed. But don't wait infinitely: give up after
# TIMEOUT seconds and exit with code 1.
TIMEOUT=10
while ! xdriinfo -display :99 nscreens > /dev/null 2>&1; do
  sleep 1
  TIMEOUT=$((TIMEOUT - 1))
  if [ "$TIMEOUT" -le 0 ]; then
    echo "Virtual display :99 did not come up in time" >&2
    exit 1
  fi
done
# Now, let's make the virtual display accessible by
# exposing it via the VNC Server that is listening on
# localhost and the specified port (e.g. 5900)
x11vnc \
-display :99 \
-nopw \
-localhost \
-rfbport $VNC_SERVER_PORT \
-forever
After preparing those files, we can now create our image and import it to our Kubernetes cluster by using the following commands:
# Do not forget to give your entrypoint script
# the proper permissions to be executed
$ chmod +x entrypoint.sh
# Next, build the image and import it into kind,
# so that it can be used from within the clusters.
$ sudo docker build -t registry.example.com/own-app/web-based-kate:0.0.1 .
$ sudo kind load -n guacamole docker-image registry.example.com/own-app/web-based-kate:0.0.1
The image is imported into kind, so that every workload resource operated in our kind cluster can access it. If you use some other Kubernetes cluster, you would need to upload the image to a registry that your cluster can pull images from.
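For such a setup, pushing the image could look like this (assuming you are logged in to registry.example.com and allowed to push to it):
$ sudo docker push registry.example.com/own-app/web-based-kate:0.0.1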
Finally, we can apply our previously created Kubernetes manifests to the cluster. Let's say we saved everything to one file called kubernetes.yaml. Then you can simply apply it like this:
$ kubectl apply -f kubernetes.yaml
deployment.apps/web-based-kate configured
secret/guacamole-config configured
service/web-based-kate unchanged
This way, a Kubernetes Deployment, Secret and Service are created, which ultimately creates a Kubernetes Pod that we can access afterwards.
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
web-based-kate-7894778fb6-qwp4z 3/3 Running 0 10m
Verification of our Deployment
Now, it's money time! After preparing everything, we should be able to access our web-based kate application using our web browser. As mentioned earlier, we configured kind in such a way that we can access our application via the local port 30000. Every request to this port is forwarded to the kind control-plane node, from where it is picked up by the Kubernetes Service of type NodePort. This service then forwards all requests to our designated Guacamole server container, which provides the web server for accessing remote applications via Guacamole.
If everything works out, you should be able to see the following login screen:
After successfully logging in, the remote connection is established and you should be able to see the welcome screen of kate:
If you click on New, you can create a new text file:
Those text files can even be saved, but keep in mind that they will only exist as long as our Kubernetes Pod exists. Once it gets deleted, the corresponding EmptyDir that we mounted into our kate container gets deleted as well, and all files in it are lost. Moreover, the container's root filesystem is set to read-only, meaning that a user can only write files to the volumes (e.g. the EmptyDir) that we mounted into the container.
Conclusion
After seeing how relatively easy it is to convert an application to a web-based one by using Apache Guacamole, there is only one major question left…
What do you prefer more: Salsa or Guacamole?
Integrating Proxmox Backup Server into Proxmox Clusters
Proxmox Backup Server
In today’s digital landscape, where data reigns supreme, ensuring its security and integrity is paramount for businesses of all sizes. Enter Proxmox Backup Server, a robust solution poised to revolutionize data protection strategies with its unparalleled features and open-source nature.
At its core, Proxmox Backup Server is a comprehensive backup solution designed to safeguard critical data and applications effortlessly in virtualized environments based on Proxmox VE. Unlike traditional backup methods, Proxmox Backup Server offers a streamlined approach, simplifying the complexities associated with data backup and recovery.
One of the standout features of Proxmox Backup Server is its seamless integration with Proxmox Virtual Environment (PVE), creating a cohesive ecosystem for managing virtualized environments. This integration allows for efficient backup and restoration of Linux containers and virtual machines, ensuring minimal downtime and maximum productivity. Without the need for backup clients on each container or virtual machine, this solution can back up and restore entire systems, but also single files directly from the filesystem.
Proxmox Backup Server provides a user-friendly interface, making it accessible to both seasoned IT professionals and newcomers alike. With its intuitive design, users can easily configure backup tasks, monitor progress, and retrieve data with just a few clicks, eliminating the need for extensive training or technical expertise.
Data security is a top priority for businesses across industries, and Proxmox Backup Server delivers on this front. Bundled with solutions like ZFS, it brings in all the enterprise filesystem features such as encryption at rest, encryption in transit, checksums, snapshots, deduplication and compression. In addition, iSCSI or NFS storage from enterprise storage solutions such as those from NetApp can be integrated.
Another notable aspect of Proxmox Backup Server is its cost effectiveness. As an open-source solution, it eliminates the financial barriers (also in addition with the Proxmox VE solutions) associated with proprietary backup software.
Integrating Proxmox Backup Server into Proxmox Clusters
General
This guide expects you to already have at least one Proxmox VE system up and running and a system where a basic installation of Proxmox Backup Server has been performed. Within this example, the Proxmox Backup Server is installed on a single disk, and the datastore is placed on an additional block device holding the backups. The Proxmox VE and Proxmox Backup Server instances do not have to be in the same network, but they must be able to reach each other. The integration requires administrative access to the datacenter of the Proxmox VE instance(s) and to the Backup Server.
Prerequisites
- Proxmox VE (including the datacenter).
- Proxmox Backup Server (basic installation).
- Administrative access to all systems.
- Network reachability.
- Storage device holding the backups (in this case a dedicated block storage device).
Administration: Proxmox Backup Server
Like the Proxmox VE environment, the Proxmox Backup Server comes along with a very intuitive web frontend. Unlike the web frontend of Proxmox VE, which runs on tcp/8006, the Proxmox Backup Server can be reached on tcp/8007. Therefore, all following tasks will be done on https://<IP-PROXMOX-BACKUP-SERVER>:8007.
After logging in to the web frontend, the dashboard overview welcomes the user.
Adding Datastore / Managing Storage
The initial and major task lies in managing the storage and adding a usable datastore for the virtualization environment that holds the backup data. Therefore, we switch to the Administration chapter and click on Storage / Disks. This provides an overview of the available devices on the Proxmox Backup Server.
By clicking on ZFS in the top menu bar, a ZFS pool can be created as a datastore. Within this dialog, a name, the RAID level, the compression and the devices to use must be defined.
Ensure that the Add as Datastore option (checked by default) stays enabled; this will directly create a usable datastore with the given name. In our example this will be backup01.
Keep in mind that this part is not needed when using an NFS share. Also, do not use this in combination with hardware RAID controllers.
Adding User for Backup
In the next step, a dedicated user is created that will be used for the datastore permissions and by the Proxmox VE instances for authentication and authorization. This allows even complex setups with different datastores and different users with different access levels (e.g., reading, writing, auditing, …) on different clusters and instances. To keep it simple for this demonstration, just a single user for everything will be used.
Adding Permission of User for Datastore
Having already mentioned the possibility of creating complex setups regarding authentication and authorization, the datastore must be linked to at least one user that can access it. Therefore, we go back to the Datastore section and select the previously created backup01 datastore. In the top menu bar, the permissions can be created and adjusted in the Permissions chapter. Initially, a new one is created now. Within the following dialog, the datastore or path, the user and the role must be defined:
Path: /datastore/backup01
User: dc01cluster22backup01@pbs
Role: DatastoreAdmin
Propagate: True
To provide a short overview, the possible roles are listed here without further explanation:
- Admin
- Audit
- DatastoreAdmin
- DatastoreAudit
- DatastoreBackup
- DatastorePowerUser
- DatastoreReader
Administration: Proxmox VE
The integration of the backup datastore will be performed from the Proxmox VE instances via the Datacenter. As a result, the Proxmox VE web frontend will now be used for further administrative actions. The Proxmox VE web frontend runs on tcp/8006. Therefore, all following tasks will be done on https://<IP-PROXMOX-VE-SERVER>:8006.
Adding Storage
In the left menu we choose the active datacenter and select the Storage options. There, we can find all natively supported storage options of Proxmox (NFS, SMB/CIFS, iSCSI, ZFS, GlusterFS, …) and finally select Proxmox Backup Server as a dedicated item.
Afterwards, the details for adding this datastore to the datacenter must be inserted. The following options need to be defined:
Server: <FQDN-OR-IP-OF-BACKUP-SERVER>
Username: dc01cluster22backup01@pbs
Password: <THE-PASSWORD-OF-THE-USER>
Enable: True
Datastore: backup01
Fingerprint: <SYSTEM-FINGERPRINT-OF-BACKUP-SERVER>
After adding this backup datastore to the datacenter, it can immediately be used for backups and the integration is finalized.
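For example, a first manual backup of a guest can be triggered from the shell of a Proxmox VE node (a minimal sketch; 101 is a placeholder VM ID):
vzdump 101 --storage backup01 --mode snapshot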
Conclusion
With the Proxmox Backup Server, Proxmox provides an enterprise backup solution for backing up Linux containers and virtual machines. Supporting features like incremental and fully deduplicated backups by using the benefits of different open-source solutions, combined with strong encryption and data integrity, this solution is proof that open-source software can compete with closed-source enterprise software. Together with Proxmox VE, enterprise-grade virtualization environments can be created and managed without missing the typical enterprise feature set. Proxmox VE and the Proxmox Backup Server can also be used in combination with storage appliances from vendors like NetApp, by directly using iSCSI or NFS.
Providing this simple example, there are of course much more complex scenarios which can be created and should also be considered. We are happy to provide you with more information and to assist you in creating such setups. We also provide help for migrating from other products to Proxmox VE setups. Feel free to contact us at any time for more information.
Migrating VMs from VMware ESXi to Proxmox
In response to Broadcom’s recent alterations in VMware’s subscription model, an increasing number of enterprises are reevaluating their virtualization strategies. With heightened concerns over licensing costs and accessibility to features, businesses are turning towards open source solutions for greater flexibility and cost-effectiveness. Proxmox, in particular, has garnered significant attention as a viable alternative. Renowned for its robust feature set and open architecture, Proxmox offers a compelling platform for organizations seeking to mitigate the impact of proprietary licensing models while retaining comprehensive virtualization capabilities. This trend underscores a broader industry shift towards embracing open-source technologies as viable alternatives in the virtualization landscape. Just to mention, Proxmox is widely known as a viable alternative to VMware ESXi but there are also other options available, such as bhyve which we also covered in one of our blog posts.
Benefits of Opensource Solutions
In the dynamic landscape of modern business, the choice to adopt open source solutions for virtualization presents a strategic advantage for enterprises. With platforms like KVM, Xen and even LXC containers, organizations can capitalize on the absence of license fees, unlocking significant cost savings and redirecting resources towards innovation and growth. This financial flexibility empowers companies to make strategic investments in their IT infrastructure without the burden of proprietary licensing costs. Moreover, open source virtualization promotes collaboration and transparency, allowing businesses to tailor their environments to suit their unique needs and seamlessly integrate with existing systems. Through community-driven development and robust support networks, enterprises gain access to a wealth of expertise and resources, ensuring the reliability, security, and scalability of their virtualized infrastructure. Embracing open source virtualization not only delivers tangible financial benefits but also equips organizations with the agility and adaptability needed to thrive in an ever-evolving digital landscape.
Migrating a VM
Prerequisites
To ensure a smooth migration process from VMware ESXi to Proxmox, several key steps must be taken. First, SSH access must be enabled on both the VMware ESXi host and the Proxmox host, allowing for remote management and administration. Additionally, it’s crucial to have access to both systems, facilitating the migration process. Furthermore, establishing SSH connectivity between VMware ESXi and Proxmox is essential for seamless communication between the two platforms. This ensures efficient data transfer and management during migration. Moreover, it’s imperative to configure the Proxmox system or cluster in a manner similar to the ESXi setup, especially concerning networking configurations. This includes ensuring compatibility with VLANs or VXLANs for more complex setups. Additionally, both systems should either run on local storage or have access to shared storage, such as NFS, to facilitate the transfer of virtual machine data. Lastly, before initiating the migration, it’s essential to verify that the Proxmox system has sufficient available space to accommodate the imported virtual machine, ensuring a successful transition without storage constraints.
Activate SSH on ESXi
The SSH server must be activated in order to copy the content from the ESXi system to the new location on the Proxmox server. The copy will later be initiated from the Proxmox server. Therefore, it is necessary that the Proxmox system can establish an SSH connection on tcp/22 to the ESXi system (a quick connectivity test is shown after the following steps):
- Log in to the VMware ESXi host.
- Navigate to Configuration > Security Profile.
- Enable SSH under Services.
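Before continuing, it can be helpful to verify from the Proxmox system that the SSH connection to the ESXi host actually works (using the hostname from this example):
ssh root@esx02-test.gyptazy.ch exit
This should succeed without errors once the SSH service is enabled.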
Find Source Information about VM on ESXi
One of the challenging matters is finding the location of the virtual machine directory holding the virtual machine disk. The path can be found within the web UI of the ESXi system:
- Locate the ESXi node that runs the Virtual Machine that should be migrated
- Identify the virtual machine to be migrated (e.g., pgsql07.gyptazy.ch).
- Obtain the location of the virtual disk (VMDK) associated with the VM from the configuration panel.
- The VM location path should be shown (e.g., /vmfs/volumes/137b4261-68e88bae-0000-000000000000/pgsql07.gyptazy.ch).
- Stop and shutdown the VM.
Create a New Empty VM on Proxmox
- Create a new empty VM in Proxmox.
- Assign the same resources like in the ESXi setup.
- Set the network type to VMware vmxnet3.
- Ensure the needed network resources (e.g., VLAN, VXLAN) are properly configured.
- Set the SCSI controller for the disk to VMware PVSCSI.
- Do not create a new disk (this will be imported later from the ESXi source).
- Each VM gets an ID assigned by Proxmox (note it down, it will be needed later).
Copy VM from ESXi to Proxmox
The content of the virtual machine (VM) will be transferred from the ESXi to the Proxmox system using the open source tool rsync for efficient synchronization and copying. Therefore, the following commands need to be executed from the Proxmox system, where we create a temporary directory to store the VM’s content:
mkdir /tmp/migration_pgsql07.gyptazy.ch
cd /tmp/migration_pgsql07.gyptazy.ch
rsync -avP root@esx02-test.gyptazy.ch:/vmfs/volumes/137b4261-68e88bae-0000-000000000000/pgsql07.gyptazy.ch/* .
Depending on the file size of the virtual machine and the network connectivity, this process may take some time.
Import VM in Proxmox
qm disk import 119 pgsql07.gyptazy.ch.vmdk local-lvm
Depending on the creation format of the VM or the exporting format, there may be multiple disk files, which may also be suffixed with _flat. This procedure needs to be repeated for all available disks.
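After the import, the disk typically shows up as an unused disk on the VM and still has to be attached to the previously configured PVSCSI controller, for example (a sketch; the exact volume name is reported by the import command and may differ):
qm set 119 --scsi0 local-lvm:vm-119-disk-0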
Starting the VM
In the final step, all settings, resources, definitions and customizations of the system should be thoroughly reviewed. Once validated, the VM can be launched, ensuring that all components are correctly configured for operation within the Proxmox environment.
Conclusion
This article only covers one of many possible methods for migrations in simple, standalone setups. In more complex environments involving multiple host nodes and different storage systems like Fibre Channel or network storage, there are significant differences and additional considerations. Additionally, there may be specific requirements regarding availability and Service Level Agreements (SLAs) to be considered, which may be very specific to each environment. Feel free to contact us for personalized guidance on your specific migration needs at any time. We are also pleased to offer our support in related open source areas such as virtualization (e.g., OpenStack, VirtualBox) and topics pertaining to cloud migrations.
Addendum
On the 27th of March, Proxmox released their new import wizard (pve-esxi-import-tools) which makes migrations from VMware ESXi instances to a Proxmox environment much easier. Within an upcoming blog post we will provide more information about the new tooling and cases where this might be more useful but also covering the corner cases where the new import wizard cannot be used.
SQLreduce: Reduce verbose SQL queries to minimal examples
Developers often face very large SQL queries that raise some errors. SQLreduce is a tool to reduce that complexity to a minimal query.
SQLsmith generates random SQL queries
SQLsmith is a tool that generates random SQL queries and runs them against a PostgreSQL server (and other DBMS types). The idea is that by fuzz-testing the query parser and executor, corner-case bugs can be found that would otherwise go unnoticed in manual testing or with the fixed set of test cases in PostgreSQL’s regression test suite. It has proven to be an effective tool with over 100 bugs found in different areas in the PostgreSQL server and other products since 2015, including security bugs, ranging from executor bugs to segfaults in type and index method implementations. For example, in 2018, SQLsmith found that the following query triggered a segfault in PostgreSQL:
select
case when pg_catalog.lastval() < pg_catalog.pg_stat_get_bgwriter_maxwritten_clean() then case when pg_catalog.circle_sub_pt(
cast(cast(null as circle) as circle),
cast((select location from public.emp limit 1 offset 13)
as point)) ~ cast(nullif(case when cast(null as box) &> (select boxcol from public.brintest limit 1 offset 2)
then (select f1 from public.circle_tbl limit 1 offset 4)
else (select f1 from public.circle_tbl limit 1 offset 4)
end,
case when (select pg_catalog.max(class) from public.f_star)
~~ ref_0.c then cast(null as circle) else cast(null as circle) end
) as circle) then ref_0.a else ref_0.a end
else case when pg_catalog.circle_sub_pt(
cast(cast(null as circle) as circle),
cast((select location from public.emp limit 1 offset 13)
as point)) ~ cast(nullif(case when cast(null as box) &> (select boxcol from public.brintest limit 1 offset 2)
then (select f1 from public.circle_tbl limit 1 offset 4)
else (select f1 from public.circle_tbl limit 1 offset 4)
end,
case when (select pg_catalog.max(class) from public.f_star)
~~ ref_0.c then cast(null as circle) else cast(null as circle) end
) as circle) then ref_0.a else ref_0.a end
end as c0,
case when (select intervalcol from public.brintest limit 1 offset 1)
>= cast(null as "interval") then case when ((select pg_catalog.max(roomno) from public.room)
!~~ ref_0.c)
and (cast(null as xid) <> 100) then ref_0.b else ref_0.b end
else case when ((select pg_catalog.max(roomno) from public.room)
!~~ ref_0.c)
and (cast(null as xid) <> 100) then ref_0.b else ref_0.b end
end as c1,
ref_0.a as c2,
(select a from public.idxpart1 limit 1 offset 5) as c3,
ref_0.b as c4,
pg_catalog.stddev(
cast((select pg_catalog.sum(float4col) from public.brintest)
as float4)) over (partition by ref_0.a,ref_0.b,ref_0.c order by ref_0.b) as c5,
cast(nullif(ref_0.b, ref_0.a) as int4) as c6, ref_0.b as c7, ref_0.c as c8
from
public.mlparted3 as ref_0
where true;
However, just like in this 40-line, 2.2kB example, the random queries generated by SQLsmith that trigger some error are most often very large and contain a lot of noise that does not contribute to the error. So far, manual inspection of the query and tedious editing was required to reduce the example to a minimal reproducer that developers can use to fix the problem.
Reduce complexity with SQLreduce
This issue is solved by SQLreduce. SQLreduce takes as input an arbitrary SQL query which is then run against a PostgreSQL server. Various simplification steps are applied, checking after each step that the simplified query still triggers the same error from PostgreSQL. The end result is a SQL query with minimal complexity.
SQLreduce is effective at reducing the queries from original error reports from SQLsmith to queries that match manually-reduced queries. For example, SQLreduce can effectively reduce the above monster query to just this:
SELECT pg_catalog.stddev(NULL) OVER () AS c5 FROM public.mlparted3 AS ref_0
Note that SQLreduce does not try to derive a query that is semantically identical to the original, or produces the same query result – the input is assumed to be faulty, and we are looking for the minimal query that produces the same error message from PostgreSQL when run against a database. If the input query happens to produce no error, the minimal query output by SQLreduce will just be SELECT
.
How it works
We’ll use a simpler query to demonstrate how SQLreduce works and which steps are taken to remove noise from the input. The query is bogus and contains a bit of clutter that we want to remove:
$ psql -c 'select pg_database.reltuples / 1000 from pg_database, pg_class where 0 < pg_database.reltuples / 1000 order by 1 desc limit 10'
ERROR: column pg_database.reltuples does not exist
Let’s pass the query to SQLreduce:
$ sqlreduce 'select pg_database.reltuples / 1000 from pg_database, pg_class where 0 < pg_database.reltuples / 1000 order by 1 desc limit 10'
SQLreduce starts by parsing the input using pglast and libpg_query which expose the original PostgreSQL parser as a library with Python bindings. The result is a parse tree that is the basis for the next steps. The parse tree looks like this:
selectStmt
├── targetList
│ └── /
│ ├── pg_database.reltuples
│ └── 1000
├── fromClause
│ ├── pg_database
│ └── pg_class
├── whereClause
│ └── <
│ ├── 0
│ └── /
│ ├── pg_database.reltuples
│ └── 1000
├── orderClause
│ └── 1
└── limitCount
└── 10
Pglast also contains a query renderer that can render back the parse tree as SQL, shown as the regenerated query below. The input query is run against PostgreSQL to determine the result, in this case ERROR: column pg_database.reltuples does not exist
.
Input query: select pg_database.reltuples / 1000 from pg_database, pg_class where 0 < pg_database.reltuples / 1000 order by 1 desc limit 10
Regenerated: SELECT pg_database.reltuples / 1000 FROM pg_database, pg_class WHERE 0 < ((pg_database.reltuples / 1000)) ORDER BY 1 DESC LIMIT 10
Query returns: ✔ ERROR: column pg_database.reltuples does not exist
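To get a feeling for what pglast provides here, a minimal sketch (not part of SQLreduce itself; module paths may differ slightly between pglast versions) looks like this:
from pglast import parse_sql
from pglast.stream import RawStream

tree = parse_sql("select 1 from pg_class limit 10")  # parse the query into a parse tree
print(RawStream()(tree))                             # render the parse tree back to SQL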
SQLreduce works by deriving new parse trees that are structurally simpler, generating SQL from that, and run these queries against the database. The first simplification steps work on the top level node, where SQLreduce tries to remove whole subtrees to quickly find a result. The first reduction tried is to remove LIMIT 10
:
SELECT pg_database.reltuples / 1000 FROM pg_database, pg_class WHERE 0 < ((pg_database.reltuples / 1000)) ORDER BY 1 DESC ✔
The query result is still ERROR: column pg_database.reltuples does not exist
, indicated by a ✔ check mark. Next, ORDER BY 1
is removed, again successfully:
SELECT pg_database.reltuples / 1000 FROM pg_database, pg_class WHERE 0 < ((pg_database.reltuples / 1000)) ✔
Now the entire target list is removed:
SELECT FROM pg_database, pg_class WHERE 0 < ((pg_database.reltuples / 1000)) ✔
This shorter query is still equivalent to the original regarding the error message returned when it is run against the database. Now the first unsuccessful reduction step is tried, removing the entire FROM
clause:
SELECT WHERE 0 < ((pg_database.reltuples / 1000)) ✘ ERROR: missing FROM-clause entry for table "pg_database"
That query is also faulty, but triggers a different error message, so the previous parse tree is kept for the next steps. Again a whole subtree is removed, now the WHERE
clause:
SELECT FROM pg_database, pg_class ✘ no error
We have now reduced the input query so much that it doesn’t error out any more. The previous parse tree is still kept which now looks like this:
selectStmt
├── fromClause
│ ├── pg_database
│ └── pg_class
└── whereClause
└── <
├── 0
└── /
├── pg_database.reltuples
└── 1000
Now SQLreduce starts digging into the tree. There are several entries in the FROM
clause, so it tries to shorten the list. First, pg_database
is removed, but that doesn’t work, so pg_class
is removed:
SELECT FROM pg_class WHERE 0 < ((pg_database.reltuples / 1000)) ✘ ERROR: missing FROM-clause entry for table "pg_database"
SELECT FROM pg_database WHERE 0 < ((pg_database.reltuples / 1000)) ✔
Since we have found a new minimal query, recursion restarts at top-level with another try to remove the WHERE
clause. Since that doesn’t work, it tries to replace the expression with NULL
, but that doesn’t work either.
SELECT FROM pg_database ✘ no error
SELECT FROM pg_database WHERE NULL ✘ no error
Now a new kind of step is tried: expression pull-up. We descend into WHERE
clause, where we replace A < B
first by A
and then by B
.
SELECT FROM pg_database WHERE 0 ✘ ERROR: argument of WHERE must be type boolean, not type integer
SELECT FROM pg_database WHERE pg_database.reltuples / 1000 ✔
SELECT WHERE pg_database.reltuples / 1000 ✘ ERROR: missing FROM-clause entry for table "pg_database"
The first try did not work, but the second one did. Since we simplified the query, we restart at top-level to check if the FROM
clause can be removed, but it is still required.
From A / B
, we can again pull up A
:
SELECT FROM pg_database WHERE pg_database.reltuples ✔
SELECT WHERE pg_database.reltuples ✘ ERROR: missing FROM-clause entry for table "pg_database"
SQLreduce has found the minimal query that still raises ERROR: column pg_database.reltuples does not exist
with this parse tree:
selectStmt
├── fromClause
│ └── pg_database
└── whereClause
└── pg_database.reltuples
At the end of the run, the query is printed along with some statistics:
Minimal query yielding the same error:
SELECT FROM pg_database WHERE pg_database.reltuples
Pretty-printed minimal query:
SELECT
FROM pg_database
WHERE pg_database.reltuples
Seen: 15 items, 915 Bytes
Iterations: 19
Runtime: 0.107 s, 139.7 q/s
This minimal query can now be inspected to fix the bug in PostgreSQL or in the application.
About credativ
The credativ GmbH is a manufacturer-independent consulting and service company located in Moenchengladbach, Germany. With over 22 years of development and service experience in the open source space, credativ GmbH can assist you with unparalleled and individually customizable support. We are here to help and assist you in all your open source infrastructure needs.
Since the successful merger with Instaclustr in 2021, credativ GmbH has been the European headquarters of the Instaclustr Group, which helps organizations deliver applications at scale through its managed platform for open source technologies such as Apache Cassandra®, Apache Kafka®, Apache Spark™, Redis™, OpenSearch®, PostgreSQL®, and Cadence.
Instaclustr combines a complete data infrastructure environment with hands-on technology expertise to ensure ongoing performance and optimization. By removing the infrastructure complexity, we enable companies to focus internal development and operational resources on building cutting edge customer-facing applications at lower cost. Instaclustr customers include some of the largest and most innovative Fortune 500 companies.
Patroni is a clustering solution for PostgreSQL® that is getting more and more popular in the cloud and Kubernetes sector due to its operator pattern and integration with Etcd or Consul. Some time ago we wrote a blog post about the integration of Patroni into Debian. Recently, the vip-manager project which is closely related to Patroni has been uploaded to Debian by us. We will present vip-manager and how we integrated it into Debian in the following.
To recap, Patroni uses a distributed consensus store (DCS) for leader election and failover. The current cluster leader periodically updates its leader key in the DCS. As soon as the key cannot be updated by Patroni for whatever reason, it becomes stale. A new leader election is then initiated among the remaining cluster nodes.
PostgreSQL Client-Solutions for High-Availability
From the user’s point of view it needs to be ensured that the application is always connected to the leader, as no write transactions are possible on the read-only standbys. Conventional high-availability solutions like Pacemaker utilize virtual IPs (VIPs) that are moved to the primary node in the case of a failover.
For Patroni, such a mechanism did not exist so far. Usually, HAProxy (or a similar solution) is used which does periodic health-checks on each node’s Patroni REST-API and routes the client requests to the current leader.
An alternative is client-based failover (which is available since PostgreSQL 10), where all cluster members are configured in the client connection string. After a connection failure the client tries each remaining cluster member in turn until it reaches a new primary.
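With libpq, such a connection string could look like this (a simplified example with placeholder hostnames; target_session_attrs=read-write makes the client skip read-only standbys):
postgresql://pg1:5432,pg2:5432,pg3:5432/postgres?target_session_attrs=read-write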
vip-manager
A new and comfortable approach to client failover is vip-manager. It is a service written in Go that gets started on all cluster nodes and connects to the DCS. If the local node owns the leader-key, vip-manager starts the configured VIP. In case of a failover, vip-manager removes the VIP on the old leader and the corresponding service on the new leader starts it there. The clients are configured for the VIP and will always connect to the cluster leader.
Debian-Integration of vip-manager
For Debian, the pg_createconfig_patroni
program from the Patroni package has been adapted so that it can now create a vip-manager configuration:
pg_createconfig_patroni 11 test --vip=10.0.3.2
Similar to Patroni, we start the service for each instance:
systemctl start vip-manager@11-test
The output of patronictl
shows that pg1
is the leader:
+---------+--------+------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+---------+--------+------------+--------+---------+----+-----------+
| 11-test | pg1 | 10.0.3.247 | Leader | running | 1 | |
| 11-test | pg2 | 10.0.3.94 | | running | 1 | 0 |
| 11-test | pg3 | 10.0.3.214 | | running | 1 | 0 |
+---------+--------+------------+--------+---------+----+-----------+
In the journal of pg1 it can be seen that the VIP has been configured:
Jan 19 14:53:38 pg1 vip-manager[9314]: 2020/01/19 14:53:38 IP address 10.0.3.2/24 state is false, desired true
Jan 19 14:53:38 pg1 vip-manager[9314]: 2020/01/19 14:53:38 Configuring address 10.0.3.2/24 on eth0
Jan 19 14:53:38 pg1 vip-manager[9314]: 2020/01/19 14:53:38 IP address 10.0.3.2/24 state is true, desired true
If LXC containers are used, one can also see the VIP in the output of lxc-ls -f
:
NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED
pg1 RUNNING 0 - 10.0.3.2, 10.0.3.247 - false
pg2 RUNNING 0 - 10.0.3.94 - false
pg3 RUNNING 0 - 10.0.3.214 - false
The vip-manager packages are available for Debian testing (bullseye
) and unstable, as well as for the upcoming 20.04 LTS Ubuntu release (focal
) in the official repositories. For Debian stable (buster
), as well as for Ubuntu 19.04 and 19.10, packages are available at apt.postgresql.org
maintained by credativ, along with the updated Patroni packages with vip-manager integration.
Switchover Behaviour
In case of a planned switchover, e.g. pg2
becomes the new leader:
# patronictl -c /etc/patroni/11-test.yml switchover --master pg1 --candidate pg2 --force
Current cluster topology
+---------+--------+------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+---------+--------+------------+--------+---------+----+-----------+
| 11-test | pg1 | 10.0.3.247 | Leader | running | 1 | |
| 11-test | pg2 | 10.0.3.94 | | running | 1 | 0 |
| 11-test | pg3 | 10.0.3.214 | | running | 1 | 0 |
+---------+--------+------------+--------+---------+----+-----------+
2020-01-19 15:35:32.52642 Successfully switched over to "pg2"
+---------+--------+------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+---------+--------+------------+--------+---------+----+-----------+
| 11-test | pg1 | 10.0.3.247 | | stopped | | unknown |
| 11-test | pg2 | 10.0.3.94 | Leader | running | 1 | |
| 11-test | pg3 | 10.0.3.214 | | running | 1 | 0 |
+---------+--------+------------+--------+---------+----+-----------+
The VIP has now been moved to the new leader:
NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED
pg1 RUNNING 0 - 10.0.3.247 - false
pg2 RUNNING 0 - 10.0.3.2, 10.0.3.94 - false
pg3 RUNNING 0 - 10.0.3.214 - false
This can also be seen in the journals, both from the old leader:
Jan 19 15:35:31 pg1 patroni[9222]: 2020-01-19 15:35:31,634 INFO: manual failover: demoting myself
Jan 19 15:35:31 pg1 patroni[9222]: 2020-01-19 15:35:31,854 INFO: Leader key released
Jan 19 15:35:32 pg1 vip-manager[9314]: 2020/01/19 15:35:32 IP address 10.0.3.2/24 state is true, desired false
Jan 19 15:35:32 pg1 vip-manager[9314]: 2020/01/19 15:35:32 Removing address 10.0.3.2/24 on eth0
Jan 19 15:35:32 pg1 vip-manager[9314]: 2020/01/19 15:35:32 IP address 10.0.3.2/24 state is false, desired false
As well as from the new leader pg2
:
Jan 19 15:35:31 pg2 patroni[9229]: 2020-01-19 15:35:31,881 INFO: promoted self to leader by acquiring session lock
Jan 19 15:35:31 pg2 vip-manager[9292]: 2020/01/19 15:35:31 IP address 10.0.3.2/24 state is false, desired true
Jan 19 15:35:31 pg2 vip-manager[9292]: 2020/01/19 15:35:31 Configuring address 10.0.3.2/24 on eth0
Jan 19 15:35:31 pg2 vip-manager[9292]: 2020/01/19 15:35:31 IP address 10.0.3.2/24 state is true, desired true
Jan 19 15:35:32 pg2 patroni[9229]: 2020-01-19 15:35:32,923 INFO: Lock owner: pg2; I am pg2
As one can see, the VIP is moved within one second.
Updated Ansible Playbook
Our Ansible-Playbook for the automated setup of a three-node cluster on Debian has also been updated and can now configure a VIP if so desired:
# ansible-playbook -i inventory -e vip=10.0.3.2 patroni.yml
Questions and Help
Do you have any questions or need help? Feel free to write to info@credativ.com.
There are two ways to authenticate yourself as a client to Icinga2. On the one hand, there is the possibility to authenticate yourself with a username and password. The other option is authentication using client certificates. For automated queries of the Icinga2 API, using client certificates is not only advantageous from a security perspective, but also much more practical to implement on the client side.
Unfortunately, the official Icinga2 documentation does not describe the exact certificate creation process. Therefore, here is a short manual:
After installing Icinga2 the API feature has to be activated first:
icinga2 feature enable api
The next step is to configure the Icinga2-node as master, the easiest way to do this is with the “node-wizard” program:
icinga2 node wizard
Icinga2 creates the necessary CA certificates with which the client certificates still to be created must be signed. Now the client certificate is created:
icinga2 pki new-cert --cn <common-name> --key <name>.key --csr <name>.csr
The parameter cn stands for the so-called common name. This is the name used in the Icinga2 user configuration to assign the user certificate to the user. Usually the common name is the FQDN. In this scenario, however, this name is freely selectable. All other names can also be freely chosen, but it is recommended to use a name that suggests that the three files belong together.
Now the certificate has to be signed by the Icinga2 CA:
icinga2 pki sign-csr --csr <name>.csr --cert <name>.crt
Finally, the API user must be created in the file api-users.conf. This file is located in the conf.d subfolder of the Icinga2 configuration:
object ApiUser "<common-name>" {
  client_cn = "<common-name>"
  permissions = [ ]
}
For a detailed explanation of the user’s assignment of rights, it is worth taking a look at the documentation.
Last but not least, Icinga2 has to be restarted. Then the user can access the Icinga2 API without entering a username and password, as long as the client certificates are passed along with the query.
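A query with curl could then look like this (a sketch; replace host and file names accordingly, the Icinga2 API listens on port 5665 by default and ca.crt is the Icinga2 CA certificate):
curl --cacert ca.crt --cert <name>.crt --key <name>.key 'https://icinga2-master.example.com:5665/v1/objects/hosts'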
You can read up on the services we provide for Icinga2 right here.
This post was originally written by Bernd Borowski.
One would think that microcode updates are basically unproblematic on modern Linux distributions. This is fundamentally correct. Nevertheless, there are always edge cases in which distribution developers may have missed something.
Using the example of Ubuntu 18.04 LTS “Bionic Beaver” in connection with the XEN hypervisor, this becomes obvious when it comes to processor microcode updates.
Ubuntu delivers updated microcode packages for both AMD and Intel. However, these are apparently not applied to the processor.
XEN Microkernel
The reason for this is not too obvious. In XEN, the host system is already paravirtualized and cannot directly influence the CPU for security reasons. Accordingly, manual attempts to change the current microcode fail.
Therefore, the XEN microkernel has to take care of the microcode patching. Instructed correctly, it will do so at boot time.
Customize command line in Grub
For the XEN kernel to patch the microcode of the CPU, it must have access to the microcode files at boot time, and it must also be instructed to apply them. We can achieve the latter via the Grub boot loader configuration. To do so, we set up a parameter on the kernel command line.
In the case of Ubuntu 18.04 LTS, the Grub configuration files can be found under /etc/default. There you should find the file xen.cfg. This is of course only the case if the XEN hypervisor package is installed. Open the config file in your editor and look for the variable GRUB_CMDLINE_XEN_DEFAULT. Add the parameter ucode=scan. In the default state, the line in xen.cfg should then look like this:
GRUB_CMDLINE_XEN_DEFAULT="ucode=scan"
Customize Initramfs
In addition to this instruction, the microkernel of the XEN hypervisor also needs access to the respective microcode files as well as the Intel microcode tool, if applicable.
While the microcode packages are usually already installed correctly, the Intel tool may have to be installed via sudo apt-get install iucode-tool. Care must also be taken to ensure that the microcode files actually end up in the initial ramdisk. For this purpose, Ubuntu already provides matching scripts.
In the default state, the system tries to select the applicable microcode for the CPU in the initramfs. Unfortunately, this does not always succeed, so you might have to help here.
With the command sudo lsinitrd /boot/initrd.img-4.15.0-46-generic you can, for example, check which contents are stored in the initramfs with the name initrd.img-4.15.0-46-generic. If, on an Intel system, something from AMD but nothing from Intel is shown, the automatic processor detection went wrong when creating the initial ramdisk.
To get this right, you need to look at the files amd64-microcode and intel-microcode in the directory /etc/default. Each of these two config files has an initramfs variable, AMD64UCODE_INITRAMFS or IUCODE_TOOL_INITRAMFS respectively. The valid values are “no”, “auto” or “early”; the default is “auto”. With “auto” the system tries the auto-detection mentioned above. If that does not work, you should set the value to early in the file matching the manufacturer of your CPU and to no in the other file. If the manufacturer is Intel, you can use the file intel-microcode to set the following additional variable:
IUCODE_TOOL_SCANCPUS=yes
This causes the scripts to perform a more precise CPU detection, so that only the microcode files matching the Intel CPU actually installed are included in the initramfs. This helps to avoid an oversized initial ramdisk.
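On a host with an Intel CPU, the relevant part of /etc/default/intel-microcode could then look like the following excerpt (a sketch; the rest of the file stays untouched):
# /etc/default/intel-microcode (excerpt)
# Force early inclusion of the Intel microcode in the initramfs
IUCODE_TOOL_INITRAMFS=early
# Only pack microcode matching the CPUs actually found in this system
IUCODE_TOOL_SCANCPUS=yes
The counterpart /etc/default/amd64-microcode would then get AMD64UCODE_INITRAMFS=no.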
Finalize changes
Both the changes to the grub config and the adjustments to the InitramFS must also be finalized. This is done via
sudo update-initramfs -u
sudo update-grub
A subsequent restart of the hypervisor will then let the XEN microkernel apply the microcode patches provided in the initramfs to the CPU.
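Whether the update actually reached the CPU can be checked after the reboot; a sketch, as the exact wording of the log messages differs between XEN and microcode versions:
# Microcode revision as reported by the kernel
grep -m1 microcode /proc/cpuinfo
# Messages of the XEN hypervisor about microcode loading
sudo xl dmesg | grep -i -E 'microcode|ucode'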
Is it worth the effort?
Adjustments to the microcode of the processors are important. CPU manufacturers troubleshoot the “hardware” they sell, and these fixes can be very important to maintain the integrity or security of your server system, as we saw last year when the Spectre and Meltdown bugs were disclosed. Of course, microcode updates can also be seen in a negative light, since the fixes for “Spectre” as well as “Meltdown” come with performance losses. Whether to integrate the microcode updates or not is therefore a question of risk versus reward, and the answer depends on how the system is used.
A virtualization host that runs third-party virtual machines has entirely different security requirements than a hypervisor that sits deep inside the internal infrastructure and only runs trusted VMs. Between these two extremes there are, of course, quite a few shades to deal with.
In this article we will look at the highly available operation of PostgreSQL® in a Kubernetes environment. A topic that is certainly of particular interest to many of our PostgreSQL® users.
Together with our partner company MayaData, we will demonstrate below the application possibilities and advantages of the extremely powerful open source project – OpenEBS.
OpenEBS is a freely available storage management system, whose development is supported and backed by MayaData.
We would like to thank Murat Karslioglu from MayaData and our colleague Adrian Vondendriesch for this interesting and helpful article. This article also appeared on OpenEBS.io.
PostgreSQL® anywhere — via Kubernetes with some help from OpenEBS and credativ engineering
by Murat Karslioglu, OpenEBS and Adrian Vondendriesch, credativ
Introduction
If you are already running Kubernetes on some form of cloud whether on-premises or as a service, you understand the ease-of-use, scalability and monitoring benefits of Kubernetes — and you may well be looking at how to apply those benefits to the operation of your databases.
PostgreSQL® remains a preferred relational database, and although setting up a highly available Postgres cluster from scratch might be challenging at first, we are seeing patterns emerging that allow PostgreSQL® to run as a first class citizen within Kubernetes, improving availability, reducing management time and overhead, and limiting cloud or data center lock-in.
There are many ways to run high availability with PostgreSQL®; for a list, see the PostgreSQL® Documentation. Some common cloud-native Postgres cluster deployment projects include Crunchy Data’s, Sorint.lab’s Stolon and Zalando’s Patroni/Spilo. Thus far we are seeing Zalando’s operator as a preferred solution in part because it seems to be simpler to understand and we’ve seen it operate well.
Some quick background on your authors:
- OpenEBS is a broadly deployed OpenSource storage and storage management project sponsored by MayaData.
- credativ is a leading open source support and engineering company with particular depth in PostgreSQL®.
In this blog, we’d like to briefly cover how using cloud-native or “container attached” storage can help in the deployment and ongoing operations of PostgreSQL® on Kubernetes. This is the first of a series of blogs we are considering — this one focuses more on why users are adopting this pattern and future ones will dive more into the specifics of how they are doing so.
At the end you can see how to use a Storage Class and a preferred operator to deploy PostgreSQL® with OpenEBS as the underlying storage.
If you are curious about what container attached storage or CAS is, you can read more from the Cloud Native Computing Foundation (CNCF) here.
Conceptually you can think of CAS as being the decomposition of previously monolithic storage software into containerized microservices that themselves run on Kubernetes. This gives you all the advantages that already led you to run Kubernetes, now applied to the storage and data management layer as well. Of special note is that, like Kubernetes, OpenEBS runs anywhere, so the same advantages below apply whether on-premises or on any of the many hosted Kubernetes services.
PostgreSQL® plus OpenEBS
[Figure: PostgreSQL® with OpenEBS persistent volumes]
- Postgres-Operator (for cluster deployment)
Install OpenEBS
- If OpenEBS is not installed in your K8s cluster, this can be done from here (see the sketch after this list). If OpenEBS is already installed, go to the next step.
- Connect to MayaOnline (Optional): Connecting the Kubernetes cluster to MayaOnline provides good visibility of storage resources. MayaOnline has various support options for enterprise customers.
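If OpenEBS still needs to be installed, the first step of the list above usually boils down to applying the operator manifest with kubectl; a minimal sketch, assuming cluster-admin access and the manifest URL published by the OpenEBS project (check the OpenEBS documentation for the version matching your cluster):
kubectl apply -f https://openebs.github.io/charts/openebs-operator.yaml
# Wait until all OpenEBS control-plane pods are Running
kubectl get pods -n openebs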
Configure cStor Pool
- If cStor Pool is not configured in your OpenEBS cluster, this can be done from here. As PostgreSQL® is a StatefulSet application, it requires a single storage replication factor. If you prefer additional redundancy you can always increase the replica count to 3.
During cStor Pool creation, make sure that the maxPools parameter is set to >=3. If a cStor pool is already configured, go to the next step. Sample YAML named openebs-config.yaml for configuring cStor Pool is provided in the Configuration details below.
openebs-config.yaml
# Use the following YAML to create a cStor Storage Pool
# and associated storage class.
apiVersion: openebs.io/v1alpha1
kind: StoragePoolClaim
metadata:
  name: cstor-disk
spec:
  name: cstor-disk
  type: disk
  poolSpec:
    poolType: striped
  # NOTE - Appropriate disks need to be fetched using `kubectl get disks`
  #
  # `Disk` is a custom resource supported by OpenEBS with `node-disk-manager`
  # as the disk operator
  # Replace the following with actual disk CRs from your cluster `kubectl get disks`
  # Uncomment the below lines after updating the actual disk names.
  disks:
    diskList:
    # Replace the following with actual disk CRs from your cluster from `kubectl get disks`
    # - disk-184d99015253054c48c4aa3f17d137b1
    # - disk-2f6bced7ba9b2be230ca5138fd0b07f1
    # - disk-806d3e77dd2e38f188fdaf9c46020bdc
    # - disk-8b6fb58d0c4e0ff3ed74a5183556424d
    # - disk-bad1863742ce905e67978d082a721d61
    # - disk-d172a48ad8b0fb536b9984609b7ee653
---
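Once the actual disk names have been filled in, the pool claim from openebs-config.yaml can be applied and checked; a short sketch, assuming the OpenEBS CRDs were installed under their usual names:
kubectl apply -f openebs-config.yaml
# Verify that the StoragePoolClaim and the resulting cStor pools exist
kubectl get storagepoolclaims
kubectl get cstorpools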
Create Storage Class
- You must configure a StorageClass to provision cStor volumes on a cStor pool. In this solution, we are using a StorageClass to consume the cStor Pool which is created using external disks attached to the Nodes. The storage pool is created using the steps provided in the Configure StoragePool section. In this solution, PostgreSQL® is a deployment; since it requires replication at the storage level, the cStor volume replicaCount is 3. Sample YAML named openebs-sc-pg.yaml to consume the cStor pool with a cStorVolume replica count of 3 is provided in the configuration details below.
openebs-sc-pg.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-postgres
  annotations:
    openebs.io/cas-type: cstor
    cas.openebs.io/config: |
      - name: StoragePoolClaim
        value: "cstor-disk"
      - name: ReplicaCount
        value: "3"
provisioner: openebs.io/provisioner-iscsi
reclaimPolicy: Delete
---
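The storage class is created the same way; a small sketch:
kubectl apply -f openebs-sc-pg.yaml
# Verify that the new storage class is available
kubectl get sc openebs-postgres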
Launch and test Postgres Operator
- Clone Zalando’s Postgres Operator.
git clone https://github.com/zalando/postgres-operator.git
cd postgres-operator
Use the OpenEBS storage class
- Edit manifest file and add openebs-postgres as the storage class.
nano manifests/minimal-postgres-manifest.yaml
After adding the storage class, it should look like the example below:
apiVersion: "acid.zalan.do/v1" kind: postgresql metadata: name: acid-minimal-cluster namespace: default spec: teamId: "ACID" volume: size: 1Gi storageClass: openebs-postgres numberOfInstances: 2 users: # database owner zalando: - superuser - createdb # role for application foo foo_user: [] #databases: name->owner databases: foo: zalando postgresql: version: "10" parameters: shared_buffers: "32MB" max_connections: "10" log_statement: "all"
Start the Operator
- Run the command below to start the operator
kubectl create -f manifests/configmap.yaml                      # configuration
kubectl create -f manifests/operator-service-account-rbac.yaml  # identity and permissions
kubectl create -f manifests/postgres-operator.yaml              # deployment
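Before deploying a cluster it is worth checking that the operator pod actually came up; a quick sketch, assuming the deployment keeps its default label name=postgres-operator:
# The operator pod should be in state Running
kubectl get pod -l name=postgres-operator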
Create a Postgres cluster on OpenEBS
Optional: The operator can run in a namespace other than default. For example, to use the test namespace, run the following before deploying the operator’s manifests:
kubectl create namespace test
kubectl config set-context $(kubectl config current-context) --namespace=test
- Run the command below to deploy from the example manifest:
kubectl create -f manifests/minimal-postgres-manifest.yaml
2. It only takes a few seconds to get the persistent volume (PV) for the pgdata-acid-minimal-cluster-0 up. Check PVs created by the operator using the kubectl get pv command:
$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                   STORAGECLASS       REASON   AGE
pvc-8852ceef-48fe-11e9-9897-06b524f7f6ea   1Gi        RWO            Delete           Bound    default/pgdata-acid-minimal-cluster-0   openebs-postgres            8m44s
pvc-bfdf7ebe-48fe-11e9-9897-06b524f7f6ea   1Gi        RWO            Delete           Bound    default/pgdata-acid-minimal-cluster-1   openebs-postgres            7m14s
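Besides the persistent volumes, you can also check the database pods created by the operator; a sketch, assuming the default Spilo labels application=spilo and spilo-role:
# Show the PostgreSQL pods together with their master/replica role
kubectl get pods -l application=spilo -L spilo-role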
Connect to the Postgres master and test
- If it is not installed previously, install psql client:
sudo apt-get install postgresql-client
2. Run the command below and note the hostname and host port.
kubectl get service --namespace default | grep acid-minimal-cluster
3. Run the commands below to connect to your PostgreSQL® DB and test. Replace the [HostPort] below with the port number from the output of the above command:
export PGHOST=$(kubectl get svc -n default -l application=spilo,spilo-role=master -o jsonpath="{.items[0].spec.clusterIP}")
export PGPORT=[HostPort]
export PGPASSWORD=$(kubectl get secret -n default postgres.acid-minimal-cluster.credentials -o 'jsonpath={.data.password}' | base64 -d)
psql -U postgres -c 'create table foo (id int)'
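To confirm that the connection works and that the test table was created, you can for example list the relations, with the environment variables from above still set; a small sketch:
# List the tables in the default database; "foo" should show up
psql -U postgres -c '\dt'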
Congrats, you now have the Postgres-Operator and your first test database up and running with the help of cloud-native OpenEBS storage.
Partnership and future direction
As this blog indicates, the teams at MayaData / OpenEBS and credativ are increasingly working together to help organizations running PostgreSQL® and other stateful workloads. In future blogs, we’ll provide more hands-on tips.
We are looking for feedback and suggestions on where to take this collaboration. Please provide feedback below or find us on Twitter or on the OpenEBS slack community.