
Having a Build Infrastructure for Extended Support of End-of-Life Linux Distributions

Introduction

When it comes to providing extended support for end-of-life (EOL) Linux distributions, the traditional method of building packages from sources may present certain challenges. One of the biggest challenges is the need for individual developers to install build dependencies on their own build environments. This often leads to different build dependencies being installed, resulting in issues related to binary compatibility. Additionally, there are other challenges to consider, such as establishing an isolated and clean build environment for different package builds in parallel, implementing an effective package validation strategy, and managing repository publishing, etc.

To overcome these challenges, it is highly recommended to have a dedicated build infrastructure that enables the building of packages in a consistent and reproducible way. With such an infrastructure, teams can address these issues effectively and ensure that the necessary packages and dependencies are available, eliminating the need for individual developers to manage their own build environments.

OpenSUSE’s Open Build Service (OBS) provides a widely recognized public build infrastructure that supports RPM-based, Debian-based, and Arch-based package builds. This infrastructure is well-regarded among users who require customized package builds on major Linux distributions. Additionally, OBS can be deployed as a private instance, offering a robust solution for organizations engaged in private software package builds. With its extensive feature set, OBS serves as a comprehensive build infrastructure that meets the requirements of teams building security-patched packages on top of already existing packages, for example in order to extend the support of end-of-life (EOL) Linux distributions.

Facts

On June 30, 2024, two major Linux distributions, Debian 10 “Buster” and CentOS 7, reached their End-of-Life (EOL) status. These distributions gained widespread adoption in numerous large enterprise environments. However, due to various factors, organizations may face challenges in completing migration processes in a timely manner, requiring temporary extended support for these distributions.

When a distribution reaches its End-of-Life (EOL), upstream support, mirrors, and the corresponding infrastructure are typically removed. This creates a significant challenge for organizations still using those distributions and seeking extended support: they not only require access to a local offline mirror but also need the infrastructure to rebuild packages, whenever security fixes make it necessary, in a way that is consistent with the former original upstream builds.

Goal

OBS provides utilities that support teams in building packages. It thereby also gives them a build infrastructure that builds all existing source and binary packages, along with their security patches, in a semi-original build environment, further extending the support of EOL Linux distributions. This is achieved by keeping package builds at a similar level of quality and compatibility to their former original upstream builds.

Requirements

To achieve the goal mentioned above, two essential components are required:

  1. A local mirror of the EOL Linux distribution
  2. A private OBS instance

In order to set up a private OBS instance, you can fetch OBS from the upstream GitHub repository and build and install it on your local machine, or alternatively, you can simply download the OBS Appliance Installer to integrate a private OBS instance as a new build infrastructure.
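For the first option, the OBS sources are available upstream. A minimal sketch of fetching them could look like this (afterwards, follow the upstream documentation for the actual installation):

git clone https://github.com/openSUSE/open-build-service.git
cd open-build-service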

Keep in mind that the local mirror typically doesn’t contain the security patches to properly extend the support of the EOL distribution. Just rebuilding the existing packages won’t automatically extend the support of such distributions. Either those security patches must be maintained by the teams themselves and added to the package update by hand, or they are obtained from an upstream project that provides those patches already.

Nevertheless, once the private OBS instance is successfully installed, it can be linked with the local offline mirror of the distribution that the team wants to extend the support for, and be configured as a semi-original build environment. By doing so, the build environment can be kept as close as possible to the original upstream build environment in order to maintain a similar level of quality and compatibility, even though packages might be rebuilt for security updates with new patches.

This way, the already ended support of a Linux distribution can be extended because a designated team can not only build packages on their own but also provide patches for packages that are affected by bugs or security vulnerabilities.

Achievement

The OBS build infrastructure provides a semi-original build environment to simplify the package building process, reduce inconsistencies, and achieve reproducibility. By adopting this approach, teams can simplify their development and testing workflows and enhance the overall quality and reliability of the generated packages.

Demonstration

Allow us to demonstrate how OBS can be utilized as a build infrastructure for teams in order to provide the build and publish infrastructure needed for extending the support of End-of-Life (EOL) Linux distributions. However, please note that the following examples are purely for demonstration purposes and may not precisely reflect the specific approach.

Config OBS

Once your private OBS instance is set up, you can log in to OBS by using its WebUI. You must use the ‘Admin’ account with its default password ‘opensuse’. After that, you can create a new project for the EOL Linux distribution as well as add users and groups for teams that are going to work on this project. In addition, you must link your project with a local mirror of the EOL Linux distribution and customize your OBS project to replicate the former upstream build environment.

OBS supports the Download on Demand (DoD) feature, which connects the designated project on the OBS instance with a local mirror so that build dependencies are fetched on demand during the build process. This simplifies the build workflow and reduces the need for manual intervention.

Note: some configurations are only available with the Admin account, including managing users and groups and adding a Download on Demand (DoD) repository.

Now, let’s dive into the OBS configuration of such a project.

Where to Add DoD Repository

After you have created a new project in OBS, you will find the “Add DoD Repository” option in the Repositories tab under the ‘Admin’ account.

There, you must click the Add DoD Repository button as shown below:

Add DoD Repository in the Repositories tab on the private OBS instance.

Link your local mirror via DoD

Next, you can provide a URL to link your local mirror via “Add DoD Repository”.

By doing so, you configure the DoD repository as shown below:

Configure local mirror as DoD repository.
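Behind the WebUI form, the DoD settings end up in the project meta. A minimal sketch of the resulting XML (project name, architecture, and mirror URL are placeholders for your own setup) might look like this:

<project name="internal-os:10:staging">
  <title>EOL distribution with extended support (staging)</title>
  <repository name="standard">
    <download arch="x86_64" url="http://mirror.internal/eol-distro/os/x86_64" repotype="rpmmd"/>
    <arch>x86_64</arch>
  </repository>
</project>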

The build dependencies should now be available for OBS to download automatically via the DoD feature.

As you can see in the example below, OBS was able to fetch all the build dependencies for the shown “kernel” package via DoD before it actually started to build the package:

Download on Demand(DoD): fetching build dependencies packages from offline local mirror.

After connecting the local mirror of the EOL Linux distribution, we should also customize the project configuration to re-create the semi-original build environment of the former upstream project in our OBS project.

Project Config

OBS offers a customizable build environment via Project Config, allowing teams to meet specific project requirements.

It allows teams to customize the build environment through the project configuration, enabling them to define macros, packages, flags, and other build-related settings. This helps teams adapt their project to all requirements needed to keep compatibility with the original packages.

You can find the Project Config here:

OBS offers a customizable build environment.

Let’s have a closer look at some examples showing what the project configuration could look like for building RPM-based or Debian-based packages.

Depending on the actual package type that has been used by the EOL Linux distribution, the project configuration must be adjusted accordingly.

RPM-based config

Here is an example for an RPM-based distribution where you can define macros, packages, flags, and other build-related settings.

Please note that this example is only for demonstration purposes:

An example of what the project config may look like.

Debian-based config

Here is another example for a Debian-based distribution where you can specify Repotype: debian, define preinstalled packages, and set other build-related options for the build environment.

Again, please note that this example is for demonstration purposes only:

Customizable Build Environment.

As you can see in the examples above, Repotype defines the repository format for the published repositories, while Preinstall defines packages that should be available in the build environment while building other packages. You may find more details about the syntax in the Build Configuration chapter of the openSUSE manual: https://en.opensuse.org/openSUSE:Build_Service_prjconf#Release
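To give a rough feel for the syntax, a small, purely illustrative prjconf for a Debian-based project (all package names are examples, not a recommendation) might contain lines such as:

Repotype: debian
Preinstall: base-files base-passwd bash coreutils dpkg libc6 tar
Required: build-essential fakeroot
Support: ccache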

With this customizable build environment feature, each build starts in a clean chroot or KVM environment, providing consistency and reproducibility in the build process. Teams may have different build environments in different projects, which are isolated and can be built in parallel. Separate projects, together with dedicated chroot or KVM environments for building packages, prevent interference or conflicts between builds.

User Management

In addition to projects, OBS offers flexible access control, allowing team members to have different roles and permissions for various projects or distributions. This ensures efficient collaboration and streamlined workflow management.

To use this feature properly, we must create users and groups by using the Admin account in OBS. You may find the ‘Users’ tab within the project, where you can add users and groups with different roles.

The following example shows how to add users and groups in a given project:

Built-in Access Control: different users and groups in different roles.

Flexible Project Setup

OBS allows teams to create separate projects for development, staging, and publishing purposes. This controlled workflow ensures thorough testing and validation of packages before they are submitted to the production project for publishing.

Testing Strategy with Projects

Before packages are released and published, proper testing of those packages is crucial. To achieve this, OBS offers repository publish integration that helps to implement a Testing Strategy. Developers may get their packages published in a corresponding Staging repository first before starting the validation process of the packages, so that they can be submitted to the Production project. From there, they will then be published and made available to other systems.

It seamlessly integrates with local repository publishing, ensuring that all packages in the build process are readily available. This eliminates the need for additional configuration on internal or external repositories.

The next example shows a ‘home:andrew:internal-os:10:staging’ project, which is used as a staging repository. There, we import the nginx source package in order to build and validate it afterward.

Start the Build Process

Once the build dependencies are downloaded via DoD, the build process starts automatically.

Bootstrap in a clean chroot or KVM environment and start the build process. The progress is shown via the WebUI.

As soon as the build process for a package like ‘nginx’ succeeds, we will also see it in the interface:

Staging repo: Build Result on WebUI.

Last but not least, the package will automatically be published into a repository that is ready to be used with package managers like YUM. In our test case, the YUM-ready repository helps us to easily perform validation by conducting tests on a VM or actual hardware.

Below, you can see an example of this staging repository (Index of the YUM-ready staging repository):

Repository Publish Integration: automatically published in staging repo once package builds.
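On a test VM, such a published staging repository can be consumed like any other YUM repository. A hedged example of a repo file (host name, port, and path depend on your OBS setup; port 82 is the appliance default for published repositories):

[internal-os-staging]
name=internal-os 10 staging
baseurl=http://obs.internal:82/home:/andrew:/internal-os:/10:/staging/standard/
enabled=1
gpgcheck=0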

Please note: To meet the testing strategy in our example, packages should only be submitted to the Prod project once they have successfully passed the validation process in the Staging repository.

Submit to Prod after Validation

As soon as we validate the functionality of a package in the staging project and repository, we can move on by submitting the package to the production project. Underneath the package overview page (e.g., in our case, the ‘nginx’ package), you can find the Submit Package button to start this process:

Submit package

In our example, we use the ‘internal-os:10:prod’ project for the Production repository. Previously, we built and validated the ‘nginx’ package in the Staging repository. Now we want to submit this package into the Production project which we call ‘internal-os:10:prod’.

Once you click “Submit package”, the following dialog should pop up. Here, you can fill out the form to submit the newly built package to production:

Submit a package from staging to prod.

Fill out this form and click ‘Create’ as soon as you are ready to submit the new package to the production project.
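If you prefer the command line over the WebUI, the same submit request can be created with the osc client (project and package names are taken from our example):

osc submitrequest home:andrew:internal-os:10:staging nginx internal-os:10:prod -m "nginx validated in staging"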

Find Requests

After a package has been submitted, a so-called repository master must approve the newly created request for the designated project. This master can find a list of all requests in the Requests tab, where he/she can perform the Request Review Control.

The master can find ‘Requests’ in the project overview page like this:

See Requests

Here, you can see an example of a package that was submitted from Staging to the Prod project.

The master must click on the link marked with a red circle to perform the Request Review Control:

List of Requests

This will bring up the interface for Request Review Control, in which the repository master from the production project may decline or accept the package request:

Review by prod repo master to accept or decline.

Reproduce in Prod Project

Once the package is accepted into the production project, it will automatically download all required dependencies from DoD, and reproduce the build process.

The example below shows the successfully built package in the production repo after the build process has been completed:

Prod repo: See Build Result on WebUI.

Publish to YUM-ready repository for Prod

Finally, the package will then also be published to the YUM-ready Prod repository as soon as the package has been built successfully. Here, you can see an example of the YUM-ready Prod repository that OBS handles.

(Note: you can name the project and the repository according to your preferences):

Automatically published to the prod repo once the package builds.

Extras

In this chapter, we want to have a look at some possible extra customizations for OBS in your build infrastructures.

Automated Build Pipelines

OBS supports automated builds that may seamlessly integrate into delivery pipelines. Teams can set up CI triggers to commit new packages into OBS to initiate builds automatically based on predefined criteria, reducing manual effort and ensuring timely delivery of updated packages.

Packages get automated builds when committed into an OBS project, which enables administrators to easily integrate automated builds via pipelines:

Packages get automated builds when committed into an OBS project and are easy to integrate into delivery pipelines.

OBS Worker

Further, the Open Build Service is capable of managing multiple workers (builders) across multiple architectures. If you want to build for other architectures, just install the ‘obs-worker’ package on additional hardware with a different architecture and then simply edit the /etc/sysconfig/obs-server file.

In that file, you must set at least those parameters on the worker node:

  • OBS_SRC_SERVER
  • OBS_REPO_SERVERS
  • OBS_WORKER_INSTANCES
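A hedged sketch of those settings in /etc/sysconfig/obs-server (the host name and instance count are placeholders; 5352 and 5252 are the default source and repository server ports):

OBS_SRC_SERVER="obs.internal:5352"
OBS_REPO_SERVERS="obs.internal:5252"
OBS_WORKER_INSTANCES="2"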

This method may also be useful if you want to have more workers of the same architecture available in OBS. openSUSE’s public OBS instance has a large-scale setup, which is a good example of this.

Emulated Build

In the Open Build Service, it is also possible to configure emulated builds for cross-architecture build capabilities via QEMU. By using this feature, almost no changes are needed in package sources. Developers can build for target architectures without the need for the corresponding hardware to be available. However, you might encounter problems with bugs or missing support in QEMU, and emulation may slow down the build process significantly. For instance, see how openSUSE managed this on RISC-V when they didn’t have the real hardware in the initial porting stage.

Conclusion

Integrating a private OBS instance as the build infrastructure provides teams with a robust and comprehensive solution to build security-patched packages for extended support of end-of-life (EOL) Linux distributions. By doing so, teams can unify the package building process, reduce inconsistencies, and achieve reproducibility. This approach simplifies the development and testing workflow while enhancing the overall quality and reliability of the generated packages. Additionally, by replicating the semi-original build environment, the resulting packages are built with a similar level of quality and compatibility to the upstream builds.

If you are interested in integrating a private build infrastructure or require extended support for any EOL Linux distribution, please reach out to us. Our team is equipped with the expertise to assist you in integrating a private OBS instance into your existing infrastructure.

This article was originally written by Andrew Lee.

Artificial Intelligence (AI) is often regarded as a groundbreaking innovation of the modern era, yet its roots extend much further back than many realize. In 1943, neuroscientist Warren McCulloch and logician Walter Pitts proposed the first computational model of a neuron. The term “Artificial Intelligence” was coined in 1956. The subsequent creation of the Perceptron in 1957, the first model of a neural network, and the expert system Dendral designed for chemical analysis demonstrated the potential of computers to process data and apply expert knowledge in specific domains. From the 1970s to the 1990s, expert systems proliferated. A pivotal moment for AI in the public eye came in 1997 when IBM’s chess-playing computer Deep Blue defeated chess world champion Garry Kasparov.

The Evolution of Modern AI: From Spam Filters to LLMs and the Power of Context

The new millennium brought a new era for AI, with the integration of rudimentary AI systems into everyday technology. Spam filters, recommendation systems, and search engines subtly shaped online user experiences. In 2006, deep learning emerged, marking the renaissance of neural networks. The landmark development came in 2017 with the introduction of Transformers, a neural network architecture that became the most important ingredient for the creation of Large Language Models (LLMs). Its key component, the attention mechanism, enables the model to discern relationships between words over long distances within a text. This mechanism assigns varying weights to words depending on their contextual importance, acknowledging that the same word can hold different meanings in different situations. However, modern AI, as we know it, was made possible mainly thanks to the availability of large datasets and powerful computational hardware. Without the vast resources of the internet and electronic libraries worldwide, modern AI would not have enough data to learn and evolve. And without modern performant GPUs, training AI would be a challenging task.

The LLM is a sophisticated, multilayer neural network comprising numerous interconnected nodes. These nodes are the micro-decision-makers that underpin the collective intelligence of the system. During its training phase, an LLM learns to balance myriad small, simple decisions, which, when combined, enable it to handle complex tasks. The intricacies of these internal decisions are typically opaque to us, as we are primarily interested in the model’s output. However, these complex neural networks can only process numbers, not raw text. Text must be tokenized into words or sub-words, standardized, and normalized — converted to lowercase, stripped of punctuation, etc. These tokens are then put into a dictionary and mapped to unique numerical values. Only this numerical representation of the text allows LLMs to learn the complex relationships between words, phrases, and concepts and the likelihood of certain words or phrases following one another. LLMs therefore process texts as huge numerical arrays without truly understanding the content. They lack a mental model of the world and operate solely on mathematical representations of word relationships and their probabilities. This focus on the answer with the highest probability is also the reason why LLMs can “hallucinate” plausible yet incorrect information or get stuck in response loops, regurgitating the same or similar answers repeatedly.

The Art of Conversation: Prompt Engineering and Guiding an LLM’s Semantic Network

Based on the relationships between words learned from texts, LLMs also create vast webs of semantic associations that interconnect words. These associations form the backbone of an LLM’s ability to generate contextually appropriate and meaningful responses. When we provide a prompt to an LLM, we are not merely supplying words; we are activating a complex network of related concepts and ideas. Consider the word “apple”. This simple term can trigger a cascade of associated concepts such as “fruit,” “tree,” “food,” and even “technology” or “computer”. The activated associations depend on the context provided by the prompt and the prevalence of related concepts in the training data. The specificity of a prompt greatly affects the semantic associations an LLM considers. A vague prompt like “tell me about apples” may activate a wide array of diverse associations, ranging from horticultural information about apple trees to the nutritional value of the fruit or even cultural references like the tale of Snow White. An LLM will typically use the association with the highest occurrence in its training data when faced with such a broad prompt. For more targeted and relevant responses, it is crucial to craft focused prompts that incorporate specific technical jargon or references to particular disciplines. By doing so, the user can guide the LLM to activate a more precise subset of semantic associations, thereby narrowing the scope of the response to the desired area of expertise or inquiry.

LLMs have internal parameters that influence their creativity and determinism, such as “temperature”, “top-p”, “max length”, and various penalties. However, these are typically set to balanced defaults, and users should not modify them; otherwise, they could compromise the ability of LLMs to provide meaningful answers. Prompt engineering is therefore the primary method for guiding LLMs toward desired outputs. By crafting specific prompts, users can subtly direct the model’s responses, ensuring relevance and accuracy. The LLM derives a wealth of information from the prompt, determining not only semantic associations for the answer but also estimating its own role and the target audience’s knowledge level. By default, an LLM assumes the role of a helper and assistant, but it can adopt an expert’s voice if prompted. However, to elicit an expert-level response, one must not only set an expert role for the LLM but also specify that the inquirer is an expert as well. Otherwise, an LLM assumes an “average Joe” as the target audience by default. Therefore, even when asked to impersonate an expert role, an LLM may decide to simplify the language for the “average Joe” if the knowledge level of the target audience is not specified, which can result in a disappointing answer.
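To make role setting concrete, here is a minimal sketch using the OpenAI Python client (the model name and prompt text are example assumptions; the same pattern applies to other chat-based LLM APIs):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Parameters such as temperature and top_p are left at their balanced defaults,
# as recommended above; the prompt does the guiding instead.
response = client.chat.completions.create(
    model="gpt-4o",  # example model name
    messages=[
        # Assign an expert role to the model ...
        {"role": "system",
         "content": "You are a leading PostgreSQL performance expert."},
        # ... and declare the inquirer as an expert too, to avoid simplified answers.
        {"role": "user",
         "content": "As a fellow PostgreSQL expert, let's analyze step by step why "
                    "checkpoints occasionally take 3-5 times longer than expected."},
    ],
)
print(response.choices[0].message.content)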

Ideas & samples

Consider two prompts for addressing a technical issue with PostgreSQL:

1. “Hi, what could cause delayed checkpoints in PostgreSQL?”

2. “Hi, we are both leading PostgreSQL experts investigating delayed checkpoints. The logs show checkpoints occasionally taking 3-5 times longer than expected. Let us analyze this step by step and identify probable causes.”

The depth of the responses will vary significantly, illustrating the importance of prompt specificity. The second prompt employs common prompting techniques, which we will explore in the following paragraphs. However, it is crucial to recognize the limitations of LLMs, particularly when dealing with expert-level knowledge, such as the issue of delayed checkpoints in our example. Depending on the AI model and the quality of its training data, users may receive either helpful or misleading answers. The quality and amount of training data representing the specific topic play a crucial role.

LLM Limitations and the Need for Verification: Overcoming Overfitting and Hallucinations

Highly specialized problems may be underrepresented in the training data, leading to overfitting or hallucinated responses. Overfitting occurs when an LLM focuses too closely on its training data and fails to generalize, providing answers that seem accurate but are contextually incorrect. In our PostgreSQL example, a hallucinated response might borrow facts from other databases (like MySQL or MS SQL) and adjust them to fit PostgreSQL terminology. Thus, the prompt itself is no guarantee of a high-quality answer—any AI-generated information must be carefully verified, which is a task that can be challenging for non-expert users.

With these limitations in mind, let us now delve deeper into prompting techniques. “Zero-shot prompting” is a baseline approach where the LLM operates without additional context or supplemental reference material, relying on its pre-trained knowledge and the prompt’s construction. By carefully activating the right semantic associations and setting the correct scope of attention, the output can be significantly improved. However, LLMs, much like humans, can benefit from examples. By providing reference material within the prompt, the model can learn patterns and structure its output accordingly. This technique is called “few-shot prompting”. The quality of the output is directly related to the quality and relevance of the reference material; hence, the adage “garbage in, garbage out” always applies.

For complex issues, “chain-of-thought” prompting can be particularly effective. This technique can significantly improve the quality of complicated answers because LLMs can struggle with long-distance dependencies in reasoning. Chain-of-thought prompting addresses this by instructing the model to break down the reasoning process into smaller, more manageable parts. It leads to more structured and comprehensible answers by focusing on better-defined sub-problems. In our PostgreSQL example prompt, the phrase “let’s analyze this step by step” instructs the LLM to divide the processing into a chain of smaller sub-problems. An evolution of this technique is the “tree of thoughts” technique. Here, the model not only breaks down the reasoning into parts but also creates a tree structure with parallel paths of reasoning. Each path is processed separately, allowing the model to converge on the most promising solution. This approach is particularly useful for complex problems requiring creative brainstorming. In our PostgreSQL example prompt, the phrase “let’s identify probable causes” instructs the LLM to discuss several possible pathways in the answer.

Plausibility vs. Veracity: The Critical Need to Verify All LLM Outputs

Of course, prompting techniques have their drawbacks. Few-shot prompting is limited by the number of tokens, which restricts the amount of information that can be included. Additionally, the model may ignore parts of excessively long prompts, especially the middle sections. Care must also be taken with the frequency of certain words in the reference material, as overlooked frequency can bias the model’s output. Chain-of-thought prompting can also lead to overfitted or “hallucinated” responses for some sub-problems, compromising the overall result.

Instructing the model to provide deterministic, factual responses is another prompting technique, vital for scientific and technical topics. Formulations like “answer using only reliable sources and cite those sources” or “provide an answer based on peer-reviewed scientific literature and cite the specific studies or articles you reference” can direct the model to base its responses on trustworthy sources. However, as already discussed, even with instructions to focus on factual information, the AI’s output must be verified to avoid falling into the trap of overfitted or hallucinated answers.

In conclusion, effective prompt engineering is a skill that combines creativity with strategic thinking, guiding the AI to deliver the most useful and accurate responses. Whether we are seeking simple explanations or delving into complex technical issues, the way we communicate with the AI always makes a difference in the quality of the response. However, we must always keep in mind that even the best prompt is no guarantee of a quality answer, and we must double-check received facts. The quality and amount of training data are paramount, and this means that some problems with received answers can persist even in future LLMs simply because they would have to use the same limited data for some specific topics.

When the model’s training data is sparse or ambiguous in certain highly focused areas, it can produce responses that are syntactically valid but factually incorrect. One reason AI hallucinations can be particularly problematic is their inherent plausibility. The generated text is usually grammatically correct and stylistically consistent, making it difficult for users to immediately identify inaccuracies without external verification. This highlights a key distinction between plausibility and veracity: just because something sounds right it does not mean it is true.

Conclusion

Whether the response is an insightful solution to a complex problem or completely fabricated nonsense is a distinction that must be made by human users, based on their expertise in the topic at hand. Our clients have repeatedly had exactly this experience with different LLMs: they tried to solve their technical problems using AI, but the answers were partially incorrect or did not work at all. This is why human expert knowledge is still the most important factor when it comes to solving difficult technical issues. The inherent limitations of LLMs are unlikely to be fully overcome, at least not with current algorithms. Therefore, expert knowledge will remain essential for delivering reliable, high-quality solutions in the future. As people increasingly use AI tools in the same way they rely on Google — using them as a resource or assistant — true expertise will still be needed to interpret, refine, and implement these tools effectively. On the other hand, AI is emerging as a key driver of innovation. Progressive companies are investing heavily in AI, facing challenges related to security and performance. This is an area where NetApp can also help: its cloud-based, AI-focused solutions are designed to address exactly these issues.

(Picture generated by my colleague Felix Alipaz-Dicke using ChatGPT-4.)


Mastering Cloud Infrastructure with Pulumi: Introduction

In today’s rapidly changing landscape of cloud computing, managing infrastructure as code (IaC) has become essential for developers and IT professionals. Pulumi, an open-source IaC tool, brings a fresh perspective to the table by enabling infrastructure management using popular programming languages like JavaScript, TypeScript, Python, Go, and C#. This approach offers a unique blend of flexibility and power, allowing developers to leverage their existing coding skills to build, deploy, and manage cloud infrastructure. In this post, we’ll explore the world of Pulumi and see how it pairs with Amazon FSx for NetApp ONTAP—a robust solution for scalable and efficient cloud storage.


Pulumi – The Theory

Why Pulumi?

Pulumi distinguishes itself among IaC tools for several compelling reasons:

Challenges with Pulumi

Like any tool, Pulumi comes with its own set of challenges:

State Management in Pulumi: Ensuring Consistency Across Deployments

Effective infrastructure management hinges on proper state handling. Pulumi excels in this area by tracking the state of your infrastructure, enabling it to manage resources efficiently. This capability ensures that Pulumi knows exactly what needs to be created, updated, or deleted during deployments. Pulumi offers several options for state storage:

Managing state effectively is crucial for maintaining consistency across deployments, especially in scenarios where multiple team members are working on the same infrastructure.
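For reference, the state backend is selected with pulumi login. The following commands sketch the common options (the bucket name is a placeholder): the hosted Pulumi Cloud, a self-managed object store such as S3, or a local file backend.

pulumi login
pulumi login s3://my-pulumi-state-bucket
pulumi login file://~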

Other IaC Tools: Comparing Pulumi to Traditional IaC Tools

When comparing Pulumi to other Infrastructure as Code (IaC) tools, several drawbacks of traditional approaches become evident:


Pulumi – In Practice

Introduction

In this section, we’ll dive into a practical example to better understand Pulumi’s capabilities. We’ll also explore how to set up a project using Pulumi with AWS and automate it using GitHub Actions for CI/CD.

Prerequisites

Before diving into using Pulumi with AWS and automating your infrastructure management through GitHub Actions, ensure you have the following prerequisites in place:

Project Structure

When working with Infrastructure as Code (IaC) using Pulumi, maintaining an organized project structure is essential. A clear and well-defined directory structure not only streamlines the development process but also improves collaboration and deployment efficiency. In this post, we’ll explore a typical directory structure for a Pulumi project and explain the significance of each component.

Overview of a Typical Pulumi Project Directory

A standard Pulumi project might be organized as follows:


/project-root
├── .github
│ └── workflows
│ └── workflow.yml # GitHub Actions workflow for CI/CD
├── __main__.py # Entry point for the Pulumi program
├── infra.py # Infrastructure code
├── pulumi.dev.yml # Pulumi configuration for the development environment
├── pulumi.prod.yml # Pulumi configuration for the production environment
├── pulumi.yml # Pulumi configuration (common or default settings)
├── requirements.txt # Python dependencies
└── test_infra.py # Tests for infrastructure code
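The __main__.py entry point can stay minimal; as a sketch matching the structure above, it only needs to load the infrastructure definitions from infra.py so that Pulumi registers the resources:

# __main__.py: entry point executed by the Pulumi engine
import infra  # noqa: F401  (importing infra.py registers all resources)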

NetApp FSx on AWS

Introduction

Amazon FSx for NetApp ONTAP offers a fully managed, scalable storage solution built on the NetApp ONTAP file system. It provides high-performance, highly available shared storage that seamlessly integrates with your AWS environment. Leveraging the advanced data management capabilities of ONTAP, FSx for NetApp ONTAP is ideal for applications needing robust storage features and compatibility with existing NetApp systems.

Key Features

What It’s About

Setting up Pulumi for managing your cloud infrastructure can revolutionize the way you deploy and maintain resources. By leveraging familiar programming languages, Pulumi brings Infrastructure as Code (IaC) to life, making the process more intuitive and efficient. When paired with Amazon FSx for NetApp ONTAP, it unlocks advanced storage solutions within the AWS ecosystem.

Putting It All Together

Using Pulumi, you can define and deploy a comprehensive AWS infrastructure setup, seamlessly integrating the powerful FSx for NetApp ONTAP file system. This combination simplifies cloud resource management and ensures you harness the full potential of NetApp’s advanced storage capabilities, making your cloud operations more efficient and robust.

In the next sections, we’ll walk through the specifics of setting up each component using Pulumi code, illustrating how to create a VPC, configure subnets, set up a security group, and deploy an FSx for NetApp ONTAP file system, all while leveraging the robust features provided by both Pulumi and AWS.

Architecture Overview

A visual representation of the architecture we’ll deploy using Pulumi: Single AZ Deployment with FSx and EC2

The diagram above illustrates the architecture for deploying an FSx for NetApp ONTAP file system within a single Availability Zone. The setup includes a VPC with public and private subnets, an Internet Gateway for outbound traffic, and a Security Group controlling access to the FSx file system and the EC2 instance. The EC2 instance is configured to mount the FSx volume using NFS, enabling seamless access to storage.

Setting up Pulumi

Follow these steps to set up Pulumi and integrate it with AWS:

Install Pulumi: Begin by installing Pulumi using the following command:

curl -fsSL https://get.pulumi.com | sh

Install AWS CLI: If you haven’t installed it yet, install the AWS CLI to manage AWS services:

pip install awscli

Configure AWS CLI: Configure the AWS CLI with your credentials:

aws configure

Create a New Pulumi Project: Initialize a new Pulumi project with AWS and Python:

pulumi new aws-python

Configure Your Pulumi Stack: Set the AWS region for your Pulumi stack:

pulumi config set aws:region eu-central-1

Deploy Your Stack: Deploy your infrastructure using Pulumi:

pulumi preview ; pulumi up

Example: VPC, Subnets, and FSx for NetApp ONTAP

Let’s dive into an example Pulumi project that sets up a Virtual Private Cloud (VPC), subnets, a security group, an Amazon FSx for NetApp ONTAP file system, and an EC2 instance.

Pulumi Code Example: VPC, Subnets, and FSx for NetApp ONTAP

The first step is to define all the parameters required to set up the infrastructure. You can use the following example to configure these parameters in the pulumi.dev.yml file.

This pulumi.dev.yml file contains configuration settings for a Pulumi project. It specifies various parameters for the deployment environment, including the AWS region, availability zone, and key name. It also defines CIDR blocks for subnets. These settings are used to configure and deploy cloud infrastructure resources in the specified AWS region.


config:
  aws:region: eu-central-1
  demo:availabilityZone: eu-central-1a
  demo:keyName: XYZ
  demo:subnet1CIDER: 10.0.3.0/24
  demo:subnet2CIDER: 10.0.4.0/24

The following code snippet should be placed in the infra.py file. It details the setup of the VPC, subnets, security group, and FSx for NetApp ONTAP file system. Each step in the code is explained through inline comments.


import pulumi
import pulumi_aws as aws
import pulumi_command as command
import os

# Retrieve configuration values from Pulumi configuration files
aws_config = pulumi.Config("aws")
region = aws_config.require("region")  # The AWS region where resources will be deployed

demo_config = pulumi.Config("demo")
availability_zone = demo_config.require("availabilityZone")  # Availability Zone for the deployment
subnet1_cidr = demo_config.require("subnet1CIDER")  # CIDR block for the public subnet
subnet2_cidr = demo_config.require("subnet2CIDER")  # CIDR block for the private subnet
key_name = demo_config.require("keyName")  # Name of the SSH key pair for EC2 instance access

# Create a new VPC with DNS support enabled
vpc = aws.ec2.Vpc(
    "fsxVpc",
    cidr_block="10.0.0.0/16",  # VPC CIDR block
    enable_dns_support=True,    # Enable DNS support in the VPC
    enable_dns_hostnames=True   # Enable DNS hostnames in the VPC
)

# Create an Internet Gateway to allow internet access from the VPC
internet_gateway = aws.ec2.InternetGateway(
    "vpcInternetGateway",
    vpc_id=vpc.id  # Attach the Internet Gateway to the VPC
)

# Create a public route table for routing internet traffic via the Internet Gateway
public_route_table = aws.ec2.RouteTable(
    "publicRouteTable",
    vpc_id=vpc.id,
    routes=[aws.ec2.RouteTableRouteArgs(
        cidr_block="0.0.0.0/0",  # Route all traffic (0.0.0.0/0) to the Internet Gateway
        gateway_id=internet_gateway.id
    )]
)

# Create a single public subnet in the specified Availability Zone
public_subnet = aws.ec2.Subnet(
    "publicSubnet",
    vpc_id=vpc.id,
    cidr_block=subnet1_cidr,  # CIDR block for the public subnet
    availability_zone=availability_zone,  # The specified Availability Zone
    map_public_ip_on_launch=True  # Assign public IPs to instances launched in this subnet
)

# Create a single private subnet in the same Availability Zone
private_subnet = aws.ec2.Subnet(
    "privateSubnet",
    vpc_id=vpc.id,
    cidr_block=subnet2_cidr,  # CIDR block for the private subnet
    availability_zone=availability_zone  # The same Availability Zone
)

# Associate the public subnet with the public route table to enable internet access
public_route_table_association = aws.ec2.RouteTableAssociation(
    "publicRouteTableAssociation",
    subnet_id=public_subnet.id,
    route_table_id=public_route_table.id
)

# Create a security group to control inbound and outbound traffic for the FSx file system
security_group = aws.ec2.SecurityGroup(
    "fsxSecurityGroup",
    vpc_id=vpc.id,
    description="Allow NFS traffic",  # Description of the security group
    ingress=[
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=2049,  # NFS protocol port
            to_port=2049,
            cidr_blocks=["0.0.0.0/0"]  # Allow NFS traffic from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=111,  # RPCBind port for NFS
            to_port=111,
            cidr_blocks=["0.0.0.0/0"]  # Allow RPCBind traffic from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="udp",
            from_port=111,  # RPCBind port for NFS over UDP
            to_port=111,
            cidr_blocks=["0.0.0.0/0"]  # Allow RPCBind traffic over UDP from anywhere
        ),
        aws.ec2.SecurityGroupIngressArgs(
            protocol="tcp",
            from_port=22,  # SSH port for EC2 instance access
            to_port=22,
            cidr_blocks=["0.0.0.0/0"]  # Allow SSH traffic from anywhere
        )
    ],
    egress=[
        aws.ec2.SecurityGroupEgressArgs(
            protocol="-1",  # Allow all outbound traffic
            from_port=0,
            to_port=0,
            cidr_blocks=["0.0.0.0/0"]  # Allow all outbound traffic to anywhere
        )
    ]
)

# Create the FSx for NetApp ONTAP file system in the private subnet
file_system = aws.fsx.OntapFileSystem(
    "fsxFileSystem",
    subnet_ids=[private_subnet.id],  # Deploy the FSx file system in the private subnet
    preferred_subnet_id=private_subnet.id,  # Preferred subnet for the FSx file system
    security_group_ids=[security_group.id],  # Attach the security group to the FSx file system
    deployment_type="SINGLE_AZ_1",  # Single Availability Zone deployment
    throughput_capacity=128,  # Throughput capacity in MB/s
    storage_capacity=1024  # Storage capacity in GB
)

# Create a Storage Virtual Machine (SVM) within the FSx file system
storage_virtual_machine = aws.fsx.OntapStorageVirtualMachine(
    "storageVirtualMachine",
    file_system_id=file_system.id,  # Associate the SVM with the FSx file system
    name="svm1",  # Name of the SVM
    root_volume_security_style="UNIX"  # Security style for the root volume
)

# Create a volume within the Storage Virtual Machine (SVM)
volume = aws.fsx.OntapVolume(
    "fsxVolume",
    storage_virtual_machine_id=storage_virtual_machine.id,  # Associate the volume with the SVM
    name="vol1",  # Name of the volume
    junction_path="/vol1",  # Junction path for mounting
    size_in_megabytes=10240,  # Size of the volume in MB
    storage_efficiency_enabled=True,  # Enable storage efficiency features
    tiering_policy=aws.fsx.OntapVolumeTieringPolicyArgs(
        name="SNAPSHOT_ONLY"  # Tiering policy for the volume
    ),
    security_style="UNIX"  # Security style for the volume
)

# Extract the DNS name from the list of SVM endpoints
dns_name = storage_virtual_machine.endpoints.apply(lambda e: e[0]['nfs'][0]['dns_name'])

# Get the latest Amazon Linux 2 AMI for the EC2 instance
ami = aws.ec2.get_ami(
    most_recent=True,
    owners=["amazon"],
    filters=[{"name": "name", "values": ["amzn2-ami-hvm-*-x86_64-gp2"]}]  # Filter for Amazon Linux 2 AMI
)

# Create an EC2 instance in the public subnet
ec2_instance = aws.ec2.Instance(
    "fsxEc2Instance",
    instance_type="t3.micro",  # Instance type for the EC2 instance
    vpc_security_group_ids=[security_group.id],  # Attach the security group to the EC2 instance
    subnet_id=public_subnet.id,  # Deploy the EC2 instance in the public subnet
    ami=ami.id,  # Use the latest Amazon Linux 2 AMI
    key_name=key_name,  # SSH key pair for accessing the EC2 instance
    tags={"Name": "FSx EC2 Instance"}  # Tag for the EC2 instance
)

# User data script to install NFS client and mount the FSx volume on the EC2 instance
user_data_script = dns_name.apply(lambda dns: f"""#!/bin/bash
sudo yum update -y
sudo yum install -y nfs-utils
sudo mkdir -p /mnt/fsx
if ! mountpoint -q /mnt/fsx; then
sudo mount -t nfs {dns}:/vol1 /mnt/fsx
fi
""")

# Retrieve the private key for SSH access from environment variables when running with GitHub Actions
private_key_content = os.getenv("PRIVATE_KEY")
# Do not print the key: echoing secrets would leak them into CI logs

# Ensure the FSx file system is available before executing the script on the EC2 instance
pulumi.Output.all(file_system.id, ec2_instance.public_ip).apply(lambda args: command.remote.Command(
    "mountFsxFileSystem",
    connection=command.remote.ConnectionArgs(
        host=args[1],
        user="ec2-user",
        private_key=private_key_content
    ),
    create=user_data_script,
    opts=pulumi.ResourceOptions(depends_on=[volume])
))


Pytest with Pulumi

Introduction

Pytest is a widely-used Python testing framework that allows developers to create simple and scalable test cases. When paired with Pulumi, an infrastructure as code (IaC) tool, Pytest enables thorough testing of cloud infrastructure code, akin to application code testing. This combination is crucial for ensuring that infrastructure configurations are accurate, secure, and meet the required state before deployment. By using Pytest with Pulumi, you can validate resource properties, mock cloud provider responses, and simulate various scenarios. This reduces the risk of deploying faulty infrastructure and enhances the reliability of your cloud environments. Although integrating Pytest into your CI/CD pipeline is not mandatory, it is highly beneficial as it leverages Python’s robust testing capabilities with Pulumi.

Testing Code

The following code snippet should be placed in the test_infra.py file (matching the project structure shown earlier). It is designed to test the infrastructure setup defined in infra.py, including the VPC, subnets, security group, and FSx for NetApp ONTAP file system. Each test case focuses on different aspects of the infrastructure to ensure correctness, security, and that the desired state is achieved. Inline comments are provided to explain each test case.


# Importing necessary libraries
import pulumi
import pulumi_aws as aws
from typing import Any, Dict, List

# Setting up configuration values for AWS region and various parameters
pulumi.runtime.set_config('aws:region', 'eu-central-1')
pulumi.runtime.set_config('demo:availabilityZone', 'eu-central-1a')  # Must match the key required in infra.py
pulumi.runtime.set_config('demo:subnet1CIDER', '10.0.3.0/24')
pulumi.runtime.set_config('demo:subnet2CIDER', '10.0.4.0/24')
pulumi.runtime.set_config('demo:keyName', 'XYZ')  # Change this based on your own key

# Creating a class MyMocks to mock Pulumi's resources for testing
class MyMocks(pulumi.runtime.Mocks):
    def new_resource(self, args: pulumi.runtime.MockResourceArgs) -> List[Any]:
        # Initialize outputs with the resource's inputs
        outputs = args.inputs

        # Mocking specific resources based on their type
        if args.typ == "aws:ec2/instance:Instance":
            # Mocking an EC2 instance with some default values
            outputs = {
                **args.inputs,  # Start with the given inputs
                "ami": "ami-0eb1f3cdeeb8eed2a",  # Mock AMI ID
                "availability_zone": "eu-central-1a",  # Mock availability zone
                "publicIp": "203.0.113.12",  # Mock public IP
                "publicDns": "ec2-203-0-113-12.compute-1.amazonaws.com",  # Mock public DNS
                "user_data": "mock user data script",  # Mock user data
                "tags": {"Name": "test"}  # Mock tags
            }
        elif args.typ == "aws:ec2/securityGroup:SecurityGroup":
            # Mocking a Security Group with default ingress rules
            outputs = {
                **args.inputs,
                "ingress": [
                    {"from_port": 80, "cidr_blocks": ["0.0.0.0/0"]},  # Allow HTTP traffic from anywhere
                    {"from_port": 22, "cidr_blocks": ["192.168.0.0/16"]}  # Allow SSH traffic from a specific CIDR block
                ]
            }
        
        # Returning a mocked resource ID and the output values
        return [args.name + '_id', outputs]

    def call(self, args: pulumi.runtime.MockCallArgs) -> Dict[str, Any]:
        # Mocking a call to get an AMI
        if args.token == "aws:ec2/getAmi:getAmi":
            return {
                "architecture": "x86_64",  # Mock architecture
                "id": "ami-0eb1f3cdeeb8eed2a",  # Mock AMI ID
            }
        
        # Return an empty dictionary if no specific mock is needed
        return {}

# Setting the custom mocks for Pulumi
pulumi.runtime.set_mocks(MyMocks())

# Import the infrastructure to be tested
import infra

# Define a test function to validate the AMI ID of the EC2 instance
@pulumi.runtime.test
def test_instance_ami():
    def check_ami(ami_id: str) -> None:
        print(f"AMI ID received: {ami_id}")
        # Assertion to ensure the AMI ID is the expected one
        assert ami_id == "ami-0eb1f3cdeeb8eed2a", 'EC2 instance must have the correct AMI ID'

    # Running the test to check the AMI ID
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.ami.apply(check_ami))

# Define a test function to validate the availability zone of the EC2 instance
@pulumi.runtime.test
def test_instance_az():
    def check_az(availability_zone: str) -> None:
        print(f"Availability Zone received: {availability_zone}")
        # Assertion to ensure the instance is in the correct availability zone
        assert availability_zone == "eu-central-1a", 'EC2 instance must be in the correct availability zone'
    
    # Running the test to check the availability zone
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.availability_zone.apply(check_az))

# Define a test function to validate the tags of the EC2 instance
@pulumi.runtime.test
def test_instance_tags():
    def check_tags(tags: Dict[str, Any]) -> None:
        print(f"Tags received: {tags}")
        # Assertions to ensure the instance has tags and a 'Name' tag
        assert tags, 'EC2 instance must have tags'
        assert 'Name' in tags, 'EC2 instance must have a Name tag'
    
    # Running the test to check the tags
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.tags.apply(check_tags))

# Define a test function to validate the user data script of the EC2 instance
@pulumi.runtime.test
def test_instance_userdata():
    def check_user_data(user_data_script: str) -> None:
        print(f"User data received: {user_data_script}")
        # Assertion to ensure the instance has user data configured
        assert user_data_script is not None, 'EC2 instance must have user_data_script configured'
    
    # Running the test to check the user data script
    pulumi.runtime.run_in_stack(lambda: infra.ec2_instance.user_data.apply(check_user_data))
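
With the mocks in place, the suite runs like any other pytest project and needs no cloud credentials, since every resource call is intercepted:

pytest infra_test.py -v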

GitHub Actions

Introduction

GitHub Actions is a powerful automation tool integrated within GitHub, enabling developers to automate their workflows, including testing, building, and deploying code. Pulumi, on the other hand, is an Infrastructure as Code (IaC) tool that allows you to manage cloud resources using familiar programming languages. In this post, we’ll explore why you should use GitHub Actions and its specific purpose when combined with Pulumi.

Why Use GitHub Actions and Its Importance

GitHub Actions is a powerful tool for automating workflows within your GitHub repository. Combined with Pulumi, it offers several key benefits: every push can automatically run your infrastructure tests, changes are previewed and validated before they are applied, and deployments become repeatable and traceable instead of manual one-off operations.

Execution

To execute the GitHub Actions workflow, commit the workflow file shown below to the .github/workflows/ directory of your repository and push to the main branch; the on: push trigger then starts the pipeline automatically.

By incorporating this workflow, you ensure that your Pulumi infrastructure is continuously integrated and deployed with proper validation, significantly improving the reliability and efficiency of your infrastructure management process.

Example: Deploy infrastructure with Pulumi


name: Pulumi Deployment

on:
  push:
    branches:
      - main

env:
  # Environment variables for AWS credentials and private key.
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
  PRIVATE_KEY: ${{ secrets.PRIVATE_KEY }}

jobs:
  pulumi-deploy:
    runs-on: ubuntu-latest
    environment: dev

    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        # Check out the repository code to the runner.

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v3
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-central-1
        # Set up AWS credentials for use in subsequent actions.

      - name: Set up SSH key
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/XYZ.pem
          chmod 600 ~/.ssh/XYZ.pem
        # Create an SSH directory, add the private SSH key, and set permissions.

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
        # Set up Python 3.9 environment for running Python-based tasks.
  
      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '14'
        # Set up Node.js 14 environment for running Node.js-based tasks.

      - name: Install project dependencies
        run: npm install
        working-directory: .
        # Install Node.js project dependencies specified in `package.json`.
      
      - name: Install Pulumi
        run: npm install -g pulumi
        # Install the Pulumi CLI globally.

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
        working-directory: .
        # Upgrade pip and install Python dependencies from `requirements.txt`.

      - name: Login to Pulumi
        run: pulumi login
        env:
          PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
        # Log in to Pulumi using the access token stored in secrets.
        
      - name: Set Pulumi configuration for tests
        run: pulumi config set aws:region eu-central-1 --stack dev
        # Set Pulumi configuration to specify AWS region for the `dev` stack.

      - name: Pulumi stack select
        run: pulumi stack select dev  
        working-directory: .
        # Select the `dev` stack for Pulumi operations.

      - name: Run tests
        run: |
          pulumi config set aws:region eu-central-1
          pytest
        working-directory: .
        # Set AWS region configuration and run tests using pytest.
    
      - name: Preview Pulumi changes
        run: pulumi preview --stack dev
        working-directory: .
        # Preview the changes that Pulumi will apply to the `dev` stack.
    
      - name: Update Pulumi stack
        run: pulumi up --yes --stack dev
        working-directory: . 
        # Apply the changes to the `dev` stack with Pulumi.

      - name: Pulumi stack output
        run: pulumi stack output
        working-directory: .
        # Retrieve and display outputs from the Pulumi stack.

      - name: Cleanup Pulumi stack
        run: pulumi destroy --yes --stack dev
        working-directory: . 
        # Destroy the `dev` stack to clean up resources.

      - name: Pulumi stack output (after destroy)
        run: pulumi stack output
        working-directory: .
        # Retrieve and display outputs from the Pulumi stack after destruction.

      - name: Logout from Pulumi
        run: pulumi logout
        # Log out from the Pulumi session.
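
Assuming the workflow is saved as .github/workflows/pulumi-deploy.yml (the file name is arbitrary), pushing it to the main branch is all that is needed to trigger the first run:

git add .github/workflows/pulumi-deploy.yml
git commit -m "Add Pulumi deployment workflow"
git push origin main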

Output:

Finally, let’s take a look at how everything appears both in GitHub and in AWS. Check out the screenshots below to see the GitHub Actions workflow in action and the resulting AWS resources.

GitHub Actions workflow

AWS FSx resource

AWS FSx storage virtual machine


Outlook: Exploring Advanced Features of Pulumi and Amazon FSx for NetApp ONTAP

As you become more comfortable with Pulumi and Amazon FSx for NetApp ONTAP, there are numerous advanced features and capabilities to explore. These can significantly enhance your infrastructure automation and storage management strategies. In a follow-up blog post, we will delve into these advanced topics, providing a deeper understanding and practical examples.

Advanced Features of Pulumi

  1. Cross-Cloud Infrastructure Management
    Pulumi supports multiple cloud providers, including AWS, Azure, Google Cloud, and Kubernetes. In more advanced scenarios, you can manage resources across different clouds in a single Pulumi project, enabling true multi-cloud and hybrid cloud architectures.
  2. Component Resources
    Pulumi allows you to create reusable components by grouping related resources into custom classes. This is particularly useful for complex deployments where you want to encapsulate and reuse configurations across different projects or environments.
  3. Automation API
    Pulumi’s Automation API enables you to embed Pulumi within your own applications, allowing for infrastructure to be managed programmatically. This can be useful for building custom CI/CD pipelines or integrating with other systems.
  4. Policy as Code with Pulumi CrossGuard
    Pulumi CrossGuard allows you to enforce compliance and security policies across your infrastructure using familiar programming languages. Policies can be applied to ensure that resources adhere to organizational standards, improving governance and reducing risk.
  5. Stack References and Dependency Management
    Pulumi’s stack references enable you to manage dependencies between different Pulumi stacks, allowing for complex, interdependent infrastructure setups. This is crucial for large-scale environments where components must interact and be updated in a coordinated manner.

Advanced Features of Amazon FSx for NetApp ONTAP

  1. Data Protection and Snapshots
    FSx for NetApp ONTAP offers advanced data protection features, including automated snapshots, SnapMirror for disaster recovery, and integration with AWS Backup. These features help safeguard your data and ensure business continuity.
  2. Data Tiering and Cost Optimization
    FSx for ONTAP includes intelligent data tiering, which automatically moves infrequently accessed data to lower-cost storage. This feature is vital for optimizing costs, especially in environments with large amounts of data that have varying access patterns.
  3. Multi-Protocol Access and CIFS/SMB Integration
    FSx for ONTAP supports multiple protocols, including NFS, SMB, and iSCSI, enabling seamless access from both Linux/Unix and Windows clients. This is particularly useful in mixed environments where applications or users need to access the same data using different protocols.
  4. Performance Tuning and Quality of Service (QoS)
    FSx for ONTAP allows you to fine-tune performance parameters and implement QoS policies, ensuring that critical workloads receive the necessary resources. This is essential for applications with stringent performance requirements.
  5. ONTAP System Manager and API Integration
    Advanced users can leverage the ONTAP System Manager or integrate with NetApp’s extensive API offerings to automate and customize the management of FSx for ONTAP. This level of control is invaluable for organizations looking to tailor their storage solutions to specific needs.

What’s Next?

In the next blog post, we will explore these advanced features in detail, providing practical examples and use cases. We’ll dive into multi-cloud management with Pulumi, demonstrate the creation of reusable infrastructure components, and explore how to enforce security and compliance policies with Pulumi CrossGuard. Additionally, we’ll examine advanced data management strategies with FSx for NetApp ONTAP, including snapshots, data tiering, and performance optimization.

Stay tuned as we take your infrastructure as code and cloud storage management to the next level!

Conclusion

This example demonstrates how Pulumi can be used to manage AWS infrastructure using Python. By defining resources like VPCs, subnets, security groups, and FSx file systems in code, you can version control your infrastructure and easily reproduce environments.

Amazon FSx for NetApp ONTAP offers a powerful and flexible solution for running file-based workloads in the cloud, combining the strengths of AWS and NetApp ONTAP. Pulumi’s ability to leverage existing programming languages for infrastructure management allows for more complex logic and better integration with your development workflows. However, it requires familiarity with these languages and has a smaller ecosystem compared to Terraform. Despite these differences, Pulumi is a powerful tool for managing modern cloud infrastructure.


Disclaimer

The information provided in this blog post is for educational and informational purposes only. The features and capabilities of Pulumi and Amazon FSx for NetApp ONTAP mentioned are subject to change as new versions and updates are released. While we strive to ensure that the content is accurate and up-to-date, we cannot guarantee that it reflects the latest changes or improvements. Always refer to the official documentation and consult with your cloud provider or technology partner for the most current and relevant information. The author and publisher of this blog post are not responsible for any errors or omissions, or for any actions taken based on the information provided.

Suggested Links

  1. Pulumi Official Documentation: https://www.pulumi.com/docs/
  2. Amazon FSx for NetApp ONTAP Documentation: https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/what-is.html
  3. Pulumi GitHub Repository: https://github.com/pulumi/pulumi
  4. NetApp ONTAP Documentation: https://docs.netapp.com/ontap/index.jsp
  5. AWS VPC Documentation: https://docs.aws.amazon.com/vpc/
  6. AWS Security Group Documentation: https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html
  7. AWS EC2 Documentation: https://docs.aws.amazon.com/ec2/index.html
This article was initially created by Harshil Gupta.

DebConf 2024 from 28 July to 4 August 2024 https://debconf24.debconf.org/

Last week the annual Debian Community Conference DebConf happened in Busan, South Korea. Four NetApp employees (Michael, Andrew, Christoph and Noël) participated the whole week at the Pukyong National University. The camp takes place before the conference; there, the infrastructure is set up and the first collaborations take place. The camp is described in a separate article: https://www.credativ.de/en/blog/credativ-inside/debcamp-bootstrap-for-debconf24/
There was a heat wave with high humidity in Korea at the time, but the venue and accommodation at the university are air-conditioned, so collaboration work, talks and BoFs were possible under the circumstances.

Around 400 Debian enthusiasts from all over the world were on site, and additional people attended remotely via the video streaming and the Matrix online chat #debconf:matrix.debian.social

The content team created a schedule covering different aspects of Debian: technical, social, political, and more.
https://debconf24.debconf.org/schedule/

There were two bigger announcements during DebConf24:

  1. the new distribution eLxr https://elxr.org/ based on Debian, initiated by Wind River
    https://debconf24.debconf.org/talks/138-a-unified-approach-for-intelligent-deployments-at-the-edge/
    Two takeaway points I understood from this talk: Wind River wants to replace CentOS and prefers a binary distribution.
  2. The Debian package management system will get a new solver https://debconf24.debconf.org/talks/8-the-new-apt-solver/

The list of interesting talks from a full conference week is much longer. Most talks and BoFs were streamed live, and the recordings can be found in the video archive:
https://meetings-archive.debian.net/pub/debian-meetings/2024/DebConf24/

It is a tradition to have a day trip for socializing and to get a more interesting view of the city and the country. https://wiki.debian.org/DebConf/24/DayTrip/ (sorry, the details of the three day trips are on the website for participants).

For the annual conference group photo we had to go outside into the heat and high humidity, but I hope you will not see us sweating.

The Debian Conference 2025 will take place in July in Brest, France: https://wiki.debian.org/DebConf/25/ and we will be there. :) Maybe it will be a chance for you to join us.

See also Debian News: DebConf24 closes in Busan and DebConf25 dates announced

DebConf24 https://debconf24.debconf.org/ took place from 2024-07-28 to 2024-08-04 in Busan, Korea.

Four employees (three Debian developers) from NetApp had the opportunity to participate in the annual event, which is the most important conference in the Debian world: Christoph Senkel, Andrew Lee, Michael Meskes and Noël Köthe.

DebCamp

What is DebCamp? DebCamp usually takes place a week before DebConf begins. For participants, DebCamp is a hacking session that takes place just before DebConf. It’s a week dedicated to Debian contributors focusing on their Debian-related projects, tasks, or problems without interruptions.

DebCamps are largely self-organized since it’s a time for people to work. Some prefer to work individually, while others participate in or organize sprints. Both approaches are encouraged, although it’s recommended to plan your DebCamp week in advance.

During this DebCamp, several public sprints took place.

Scheduled workshops included:

GPG Workshop for Newcomers:
Asymmetric cryptography is a daily tool in Debian operations, used to establish trust and secure communications through email encryption, package signing, and more. In this workshop, participants learn to create a PGP key and perform essential tasks such as file encryption/decryption, content signing, and sending encrypted emails. After creation, the key is uploaded to public keyservers, enabling attendees to participate in our Continuous Keysigning Party.

Creating Web Galleries with Geo-Tagged Photos:
Learn how to create a web gallery with integrated maps from a geo-tagged photo collection. The session will cover the use of fgallery, openlayers, and a custom Python script, all orchestrated by a Makefile. This method, used for a South Korea gallery in 2018, will be taught hands-on, empowering others to showcase their photo collections similarly.

Introduction to Creating .deb Files (Debian Packaging):
This session will delve into the basics of Debian packaging and the Debian release cycle, including stable, unstable, and testing branches. Attendees will set up a Debian unstable system, build existing packages from source, and learn to create a Debian package from scratch. Discussions will extend online at #debconf24-bootcamp on irc.oftc.net.

In addition to the organizational part, our colleague Andrew is part of the orga team this year. He helped arrange the Cheese and Wine party and proposed the idea of organizing a “Coffee Lab” where people can bring coffee equipment and beans from their country and share them with each other during the conference. Andrew successfully set up the Coffee Lab in the social space with support from the “Local Team” and contributors Kitt, Clement, and Steven. They provided a diverse selection of beans and teas from countries such as Colombia, Ethiopia, India, Peru, Taiwan, Thailand, and Guatemala. Additionally, they shared various coffee-making tools, including the “Mr. Clever Dripper”, AeroPress, and AerSpeed grinder.

DebCamp also allows the DebConf committee to work together with the local team to prepare additional details for the conference. During DebCamp, the organization team typically handles the following tasks:

Setting up the Frontdesk: This involves providing conference badges (with maps and additional information) and distributing SWAG such as food vouchers, conference t-shirts, conference cups, USB-powered fans, and sponsor gifts.
Setting up the network: This includes configuring the network in conference rooms, hack labs, and video team equipment for live streaming during the event.
Accommodation arrangements: Assigning rooms for participants to check in to on-site accommodations.
Food arrangements: Catering to various dietary requirements, including regular, vegetarian, vegan, and accommodating special religious and allergy-related needs.
Setting up a social space: Providing a relaxed environment for participants to socialize and get to know each other.
Writing daily announcements: Keeping participants informed about ongoing activities.
Arranging childcare service.
Organizing day trip options.
Arranging parties.

In addition to the organizational part, our colleague Andrew also attended and arranged private sprints during DebCamp, continuing through DebConf with his LXQt team BoF and a private LXQt newcomer workshop, where the team received contributions from newcomers. The youngest was only 13 years old; he created his first GPG key during the GPG key workshop and attended the LXQt team workshop, where he managed to fix a few bugs in Debian during the session.

Young kids in DebCamp

At DebCamp, two young attendees, aged 13 and 10, participated in a GPG workshop for newcomers and created their own GPG keys. The older child hastily signed another new attendee’s key without proper verification, not fully grasping that Debian’s security relies on the trustworthiness of GPG keys. This prompted a lesson from his Debian Developer father, who explained the importance of trust by comparing it to entrusting someone with the keys to one’s home. Realizing his mistake, the child considered how to rectify the situation since he had already signed and uploaded the key. He concluded that he could revoke the old key and create a new one after DebConf, which he did, securing his new GPG and SSH keys with a Yubikey.

This article was initially written by Andrew Lee.

How and when to use Software-Defined Networks in Proxmox VE

Proxmox VE is still the current go-to solution for VM workloads based on open-source software. In the past, we already covered several topics around Proxmox, like migrating virtual machines from an ESXi environment to Proxmox, using Proxmox in combination with NVMe-oF for breakthrough performance, or integrating the Proxmox Backup Server into a Proxmox cluster.
There are clearly still many other very interesting topics around Proxmox, and therefore we want to cover Proxmox's SDN (software-defined networking) feature. Historically, this feature is not really new – it was introduced in Proxmox's web UI with Proxmox 6.2 but was always flagged as experimental. This finally changed with Proxmox 8.x, where it not only became fully integrated but, with Proxmox 8.1, also gained the essential feature of IP address management (IPAM). The SDN integration is now also installed by default in Proxmox. Note, however, that this does not mean all features are already stable – IPAM with DHCP management as well as FRRouting and its controller integration are still in a tech preview state. So far, this sounds pretty interesting!

Proxmox VE with SDN (symbolic image)

What is Software-Defined Networking?

But what is SDN, and what does it have to do with Proxmox? Software-Defined Networking (SDN), often simply called a software-defined network, is a network architecture that centralizes network intelligence in a programmable controller, which maintains a global view of the network. This architecture allows for dynamic, scalable, and automated network configurations, in contrast to traditional networking where control and data planes are tightly coupled within network devices. The benefits of SDN include increased flexibility and agility, centralized management, improved resource utilization, and enhanced security. These benefits enable a quick deployment and adjustment of network services, simplify the management of large and complex networks, enhance the efficiency of resource allocation, and facilitate the implementation of comprehensive security policies and monitoring.
Proxmox VE also supports SDN to extend its capabilities in managing virtualized networks. With SDN, Proxmox VE offers centralized network management through a unified interface which simplifies the management of virtual networks across multiple nodes. Administrators can define and manage virtual networks at a central point for the whole cluster which reduces the complexity of network configurations. SDN in Proxmox VE also enables network virtualization, allowing the creation of virtual networks that are abstracted from the physical network infrastructure. This capability supports isolated network environments for different virtual machines (VMs) and containers.
Dynamic network provisioning is another key feature of SDN in Proxmox VE, which leverages SDN to dynamically allocate network resources based on the needs of VMs and containers, optimizing performance and resource utilization. The integration of Proxmox VE with Open vSwitch (OVS) enhances these capabilities. OVS is a production-quality, multilayer virtual switch designed to enable SDN and supports advanced network functions such as traffic shaping, QoS, and network isolation. Furthermore, Proxmox VE supports advanced networking features like VLAN tagging, network bonding, and firewall rules, providing comprehensive network management capabilities.

How to configure a SDN

Now that we know the basics and possibilities of Software-Defined Networking (SDN), it gets interesting to set up such a network within a Proxmox cluster.

Proxmox comes with support for software-defined networking (SDN), allowing users to integrate various types of network configurations to suit their specific networking needs. With Proxmox, you have the flexibility to select from several SDN types, including “Simple”, which is aimed at straightforward networking setups without the need for advanced features. For environments requiring network segmentation, VLAN support is available, providing the means to isolate and manage traffic within distinct virtual LANs. More complex scenarios might benefit from QinQ support, which allows multiple VLAN tags on a single interface. Particularly interesting for data centers, Proxmox also includes VXLAN support, which extends layer 2 networking over a layer 3 infrastructure and significantly increases the number of possible virtual networks beyond the usual limit of 4096 VLANs. Lastly, EVPN support is also part of Proxmox's SDN offerings, facilitating advanced layer 2 and layer 3 virtualization and providing a scalable control plane with BGP (Border Gateway Protocol) for multi-tenancy environments.

In this guide, we’ll walk through the process of setting up a simple Software-Defined Network (SDN) within a Proxmox cluster environment. The primary goal is to establish a new network, including its own network configuration, that is automatically propagated across all nodes within the cluster. This newly created network gets its own IP space in which virtual machines (VMs) receive their IP addresses dynamically via DHCP. This setup eliminates the need for manual IP forwarding or Network Address Translation (NAT) on the host machines. An additional advantage of this configuration is the consistency it offers; the gateway for the VMs always remains constant regardless of the specific host node they are running on.

Configuration

The configuration of Software-Defined Networking (SDN) has become very easy with the latest Proxmox VE versions, where the whole process can be done in the Proxmox web UI. Therefore, we just connect to the Proxmox management web interface, which is typically reachable at https://<node-address>:8006.

The SDN options are integrated within the datacenter chapter, in the sub chapter SDN. All further work will only be done within this chapter. Therefore, we navigate to:
–> Datacenter
—-> SDN
——–> Zones

The menu on the right side offers to add a new zone, where a new zone of the type Simple will be selected. A new window pops up, where we directly activate the advanced options at the bottom. Afterwards, the further required details can be provided:

ID: devnet01
MTU: Auto
Nodes: All
IPAM: pve
Automatic DHCP: Activate

The ID represents the unique identifier of this zone. It makes sense to give it a recognisable name. Usually, the MTU size does not need to be adjusted for this kind of default setup; however, there may always be corner cases. In the nodes section, this zone can be assigned to specific nodes or simply to all of them. There may be scenarios where zones should only be available on specific nodes. Via the advanced options, further details like DNS servers and the forward & reverse zones can be defined. For this basic setup, these are not used, but the automatic DHCP option must be activated.

Now, the next steps will be done in the chapter VNets where the previously created zone will be linked to a virtual network. In the same step we will also provide additional network information like the network range etc.

When creating a new VNet, an identifier or name must be given. It often makes sense to align the virtual network name to the previously generated zone name. In this example, the same names will be used. Optionally, an alias can be defined. The important part is to select the desired zone that should be used (e.g., devnet01). After creating the new VNet, we have the possibility to create a new subnet in the same window by clicking on the Create Subnet button.

Within this dialog, some basic network information will be entered. In general, we need to provide the desired subnet in CIDR notation (e.g., 10.11.12.0/24). Defining the IP address of the gateway is also possible; in this example, the gateway will be placed on the IP address 10.11.12.1. It is important to activate the SNAT option. SNAT (Source Network Address Translation) is a technique that modifies the source IP address of outgoing network traffic so that it appears to originate from a different IP address, usually that of the router or firewall. This method is commonly employed to allow multiple devices on a private network to access external networks.

After creating and linking the zone, the VNet and the subnet, the configuration can simply be applied in the web interface by clicking on the apply button. The configuration will now be synced to the desired nodes (in our example, all of them).
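
If you are curious what was rolled out, the generated definitions can be inspected on any node. A quick check, assuming the default file locations of current Proxmox VE releases, could look like this:

# Show the network definitions the SDN stack generated for this node
cat /etc/network/interfaces.d/sdn
# Show the IPAM database used by the built-in pve IPAM plugin
cat /etc/pve/priv/ipam.db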

Usage

After applying the configuration on the nodes within the cluster, virtual machines still need to be assigned to this network. Luckily, this can easily be done via the regular Proxmox web interface, which now also offers the newly created network devnet01 in the networking section of the VM. Already existing virtual machines can be reassigned to this network as well.

When it comes to DevOps and automation, this is also available via the API, where virtual machines can be assigned to the new network. Such a task could look like the following Ansible example:

- name: Create Container in Custom Network
  community.general.proxmox:
    vmid: 100
    node: de01-dus01-node03
    api_user: root@pam
    api_password: "{{ api_password }}"
    api_host: de01-dus01-node01
    password: "{{ container_password }}"
    hostname: "{{ container_fqdn }}"
    ostemplate: 'local:vztmpl/debian-12-x86_64.tar.gz'
    netif: '{"net0":"name=eth0,ip=dhcp,ip6=dhcp,bridge=devnet01"}'

Virtual machines assigned to this network immediately get IP addresses within our previously defined network 10.11.12.0/24 and can access the internet without any further configuration. VMs can also be moved across nodes in the cluster without any need to adjust the gateway, even when a node gets shut down or rebooted for maintenance.

Conclusion

In conclusion, the integration of Software-Defined Networking (SDN) into Proxmox VE represents a huge benefit from a technical as well as a user perspective, since the feature is fully usable from Proxmox's web UI. This ease of configuration empowers even those with limited networking experience to set up and manage more complex network setups.

With simple SDNs, Proxmox also makes it easier to create basic networks that let virtual machines connect to the internet. You don't have to deal with complicated settings or gateways on the host nodes. This makes it quicker to get virtual setups up and running and lowers the chance of making mistakes that could lead to security problems.

For people just starting out, Proxmox offers a user-friendly web interface that makes it easy to set up and control networks. This is really helpful because it means they don't have to learn a lot of complicated details to get started. Instead, they can spend more time working with their virtual machines and worry less about how to connect everything.

People with deeper technical knowledge will like how Proxmox lets them set up complex networks. This is good for large-scale setups because it can make the network run better, handle more traffic, and keep different parts of the network separate from each other.

Just like other useful integrations (e.g. Ceph), also the SDN integration provides huge benefits to its user base and shows the ongoing integration of useful tooling in Proxmox.

On Thursday, 27 June, and Friday, 28 June 2024, I had the amazing opportunity to attend Swiss PGDay 2024. The conference was held at the OST Eastern Switzerland University of Applied Sciences, Campus Rapperswil, which is beautifully situated on the banks of Lake Zurich in a green environment. With approximately 110 attendees, the event had a mainly, though not exclusively, B2B focus. While the conference may seem small compared to PostgreSQL events in larger countries, it perfectly reflected the scope relevant for Switzerland.

During the conference, I presented my talk “GIN, BTREE_GIN, GIST, BTREE_GIST, HASH & BTREE Indexes on JSONB Data“. The talk summarized the results of my long-term project at NetApp, including newer interesting findings compared to the presentation I gave in Prague at the beginning of June. As far as I could tell, my talk was well received by the audience, and I received very positive feedback.

At the very end on Friday, I also presented a lightning talk, “Can PostgreSQL Have a More Prominent Role in the AI Boom?” (my slides are at the end of the file). In this brief talk, I raised the question of whether it would be possible to implement AI functionality directly in PostgreSQL, including storing embedding models and trained neural networks within the database. Several people in the audience involved with ML/AI reacted positively to this proposal, acknowledging that PostgreSQL could indeed play a more significant role in ML and AI topics.

The conference featured two tracks of presentations, one in English and the other in German, allowing for a diverse range of topics and speakers.

At the end of the first day, all participants were invited to a social event for networking and personal exchange, which was very well organized. I would like to acknowledge the hard work and dedication of all the organizers and thank them for their efforts. Swiss PGDay 2024 was truly a memorable and valuable experience, offering great learning opportunities. I am grateful for the chance to participate and contribute to the conference, and I look forward to future editions of this event. I am also very thankful to NetApp-credativ for making my participation in the conference possible.

Photos by organizers, Gülçin Yıldırım Jelínek and the author:

NetApp Storage and NVMe-oF for Breakthrough Performance in Proxmox Virtualization Environments

What is NVMe-oF

NVMe over Fabrics (NVMe-oF) is a cutting-edge protocol developed to bring the impressive performance and low-latency characteristics of NVMe storage devices to network fabrics. This innovation is particularly transformative for data centers, as it facilitates the separation of storage and compute resources, granting administrators the ability to deploy these resources more flexibly and with greater scalability, which makes it a perfect fit for virtualization workloads.

NVMe-oF is versatile in its support for multiple transport layers: it can operate over Fibre Channel (FC), over Ethernet with plain TCP/IP or with RDMA capabilities through RoCE or iWARP, and even over InfiniBand, each offering unique performance features tailored to different deployment needs.

NVMe-oF via TCP

When NVMe-oF is deployed over Ethernet with TCP, it brings the benefits of NVMe storage to the broadest possible range of environments without the need for specialized network infrastructure like Fibre Channel or InfiniBand. It provides access to high-performance storage over the common and familiar TCP stack, significantly reducing complexity and cost. The adoption of NVMe-oF with TCP is further facilitated by the widespread availability of Ethernet hardware and expertise, making it a compelling choice for organizations looking to upgrade their storage networks without a complete overhaul.

The protocol’s efficiency is well maintained even over TCP, allowing NVMe commands to be passed with minimal overhead, thus keeping latency at bay, which is critical for latency-sensitive applications like virtualized data servers.

Configuring NetApp Storage

General

This guide presumes that users have already established the foundational storage setup, including the configuration of Storage Virtual Machines (SVMs). The administration of these systems is relatively straightforward thanks to the intuitive web interface provided by NetApp storage systems; users can expect a user-friendly experience, as the web interface is designed to simplify complex tasks. This also covers the whole setup for NVMe-oF storage, which requires enabling NVMe on the SVM, setting up the NVMe namespace, and creating the NVMe subsystem.

Note: All changes can of course also be performed in an automated way by orchestrating the ONTAP API.
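
As an illustration only, such an automation could be sketched against the ONTAP REST API roughly as follows; the endpoint and payload follow the ONTAP 9 REST documentation, and the host, credentials, and SVM name are placeholders you have to adapt:

# Hypothetical sketch: enable the NVMe service for an SVM via the ONTAP REST API
# (verify the endpoint and fields against the documentation of your ONTAP version)
curl -k -u admin:<password> -X POST "https://<cluster-mgmt-ip>/api/protocols/nvme/services" \
  -H "Content-Type: application/json" \
  -d '{"svm": {"name": "svm01"}, "enabled": true}'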

Enable NVMe on SVM

NVMe can be enabled at the SVM level on a NetApp storage system by following these summarized steps in the system's web interface: simply navigate to the Storage menu, then to Storage VMs, and finally select the specific SVM you wish to configure:

  1. Configure NVMe Protocol: Within the SVM settings, look for a section or tab related to protocols. Locate the NVMe option and enable it. This might involve checking a box or switching a toggle to the ‘on’ position.
  2. Save and Apply Changes: After enabling NVMe, ensure to save the changes. There might be additional prompts or steps to confirm the changes, depending on the specific NetApp system and its version.

Remember to check for any prerequisites or additional configuration settings that might be required for NVMe operation, such as network settings, licensing, or compatible hardware checks. The exact steps may vary slightly depending on the version of ONTAP or the specific NetApp model you are using. Always refer to the latest official NetApp documentation or support resources for the most accurate guidance.

Creating the NVMe Subsystem

Afterwards, a new NVMe subsystem can be created on the NetApp storage system. This is done by selecting the Hosts section and choosing NVMe Subsystem to start the process of adding a new subsystem. A new wizard opens which requires additional information about the subsystem to be created:

Name: <An identifier of the NVMe subsystem>
Storage VM: <The previously adjusted SVM>
Host Operating System: Linux (important for the block size)
Host NQN: <NQN of the Proxmox VE node>

It’s essential to ensure that all information, especially the host's NQN, is entered correctly to avoid connectivity issues. Additionally, consult the official NetApp documentation for any version-specific instructions or additional configuration steps that may be required.

Creating the NVMe Namespace

As a last step, the NVMe namespace must be created. To add an NVMe namespace on a NetApp storage system, you begin by logging into the web interface and navigating to the Storage section, followed by selecting NVMe Namespaces. In this area, you initiate the process of creating a new namespace. You will be prompted to provide specific details for the namespace, such as its name, size, and the associated SVM that will manage it.

Note: Also pay attention to the Performance Service Level, which might need to be switched to a custom profile to avoid any limitations.

Once you have entered all the necessary information, you will typically proceed to choose the performance characteristics, like the service level or tiering policy, depending on your performance needs and the capabilities of your NetApp system.

After configuring these settings, you need to review your choices and save the new namespace configuration to make it available for use. It is important to ensure that the namespace is properly configured to match the requirements of your environment for optimal performance and compatibility. Always check for any additional steps or prerequisites by consulting the NetApp documentation relevant to your ONTAP version or storage model.

The configuration on the storage part is now complete. The next steps will be performed on the Proxmox VE node(s).

Configuring Proxmox Node

General

After configuring the NetApp storage appliance, all Proxmox VE nodes within the cluster must be configured to use and access the NVMe-oF storage. Unfortunately, Proxmox VE does not support this type of storage out of the box, so it cannot simply be configured via the Proxmox web interface. Luckily, Proxmox VE is based on Debian Linux, from which all needed dependencies and configurations can be obtained, but it requires us to do everything on the command line (CLI). Depending on the number of nodes within the cluster, config management tools like Ansible may speed up the initial setup process and make it repeatable for new nodes in the future. We can also assist you in setting up custom config management environments fitting your needs.

In general, this process consists of:

Installing the needed packages (nvme-cli) and loading the nvme_tcp kernel module.
Exchanging the host NQNs, then discovering and connecting the NVMe storage.
Creating an LVM volume group (VG) on the new NVMe device.
Integrating the volume group at cluster level via the web interface.

The next steps in this blog post will cover the process in detail and guide you through the needed steps on the Proxmox VE nodes which must be done on the command line.

Installing Needed Packages

Using and accessing NVMe-oF requires the related userland tools (nvme-cli) to be present on the Proxmox VE node. Debian Linux already provides these tools in the Debian repository, so the overall installation process is very easy. The package can simply be installed with the following command:

apt-get install nvme-cli

The required kernel module is shipped with the Debian kernel and can simply be loaded by running:

modprobe nvme_tcp

Afterwards, the module should be added to be loaded at boot time:

echo "nvme_tcp" > /etc/modules-load.d/20-nvme_tcp.conf

After these steps, a connection with the storage can be initialized.

Connecting With the Storage

Interacting with the NetApp storage and its NVMe-oF functionality is a multi-step process and requires us to provide the NVMe Qualified Name (NQN) of each Proxmox VE node accessing the NVMe storage. The NQN of a Proxmox VE node can be obtained by running the command:

cat /etc/nvme/hostnqn

Add the host NQN to the NVMe subsystem on the NetApp side to allow the node to access it. An example output is given in the screenshot.

Discovery & Connecting

In the next step, the NVMe targets will be discovered and connected to the Proxmox VE node. The discovery and connect process is simply done by running the following commands:

nvme discover -t tcp -a 192.168.164.100 -s 4420
nvme connect -t tcp -n na01-nqn01 -a 192.168.164.100 -s 4420

To make this configuration persistent across system reboots, the discovery command is also added to the NVMe discovery file. The nvmf-autoconnect systemd unit ensures this file is loaded at boot; therefore, this unit must also be enabled.

echo "discover -t tcp -a 192.168.164.100 -s 4420" >> etc/nvme/discovery.conf
systemctl enable nvmf-autoconnect.service
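
If everything worked, the new namespace shows up as a local NVMe block device; the device name (here nvme0n1) may differ on your system:

# List all connected NVMe devices and namespaces
nvme list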

Volume Group

The last steps will partially be done on the command line and on the Proxmox web interface to add the new storage to the cluster.

Important:
These steps must only be performed on a single Proxmox VE node, not on all of them. The integration happens at cluster level and, once done, applies to all Proxmox VE nodes.

The last task on the command line is creating a new LVM Volume Group (VG) on the new NVMe device, which can simply be done by executing the command:

vgcreate shared_nvme01 /dev/nvme0n1

The newly created LVM Volume Group (VG) can be validated by running the command:

vgdisplay shared_nvme01

An output like in the given screenshot should be returned, including all further details of this VG. After validating the information, all tasks are completed on the command line.

To finally use this LVM Volume Group on all Proxmox VE nodes within the cluster, this Volume Group must be added and integrated at cluster level. Therefore, we log in to the web frontend of the cluster and add it under:
–> Datacenter
—-> Storage
——–> Add: LVM

In the new window, some more details for the new LVM storage have to be defined: typically an ID (the name under which the storage will appear in the cluster), the previously created volume group shared_nvme01, the content type, the nodes that may use it, and the Shared flag.

By pressing Add, the storage will be attached to the selected nodes as a new volume and can be used directly.

Conclusion

The utilization of NVMe over Fabrics (NVMe-oF) via TCP in combination with Proxmox VE in a virtualization environment presents a compelling solution for organizations looking for cost-effective yet high-performance storage architectures. This approach leverages the widespread availability and compatibility of Ethernet-based networks, avoiding the need for specialized hardware such as Fibre Channel or InfiniBand, which can be cost-prohibitive for many enterprises.

By integrating NVMe-oF with Proxmox, a popular open-source virtualization platform, users can benefit from significantly improved data transfer speeds and lower latency compared to traditional storage solutions. NVMe-oF may not only be used with Proxmox VE but also with other operating systems like FreeBSD and hypervisors like bhyve. This provides a great benefit for latency-sensitive workloads, such as virtualized database servers, where rapid access to data is critical for performance. The NVMe protocol is designed to exploit the full potential of solid-state storage technologies; when used over a network fabric with TCP, it can deliver near-local NVMe performance while being very cost-effective.

At your convenience, we’re available to provide more insights into NetApp storage systems, covering both hardware and software aspects. Our expertise also extends to open-source products, especially in establishing virtualization environments using technologies like Proxmox and OpenShift or in maintaining them with config management. We invite you to reach out for any assistance you require.

You might also be interested in learning how to migrate VMs from VMware ESXi to Proxmox VE or how to include the Proxmox Backup Server into your infrastructure.

Creating a Virtualization Environment

Everyone is talking about containerization nowadays, but there are still many reasons to run virtual machines, and it doesn't always have to be Proxmox VE on Linux systems when creating a virtualization environment with open-source tools!

Next to the well-known Proxmox, there are also other open-source alternatives when it comes to virtualization. A promising solution is bhyve (pronounced bee-hive), which runs on FreeBSD-based systems. It was initially written for FreeBSD but now also runs on a number of illumos-based distributions like OpenIndiana. bhyve offers a robust and high-performance virtualization solution that operates directly on the bare metal, utilizing hardware virtualization features for enhanced performance and isolation between virtual machines. Known for its performance, stability and security, bhyve is integrated into FreeBSD, benefiting from the reliability of the FreeBSD kernel. Of course, it also provides the typical feature set such as snapshotting and cloning of VMs; especially in such cases it benefits from additional FreeBSD features like the ZFS filesystem. Unfortunately, bhyve does not offer any web frontend for its administration. This is where bhyve-webadmin (BVCP) steps in to fill the gap. In this blog post we will cover the initial setup and features of bhyve and bhyve-webadmin to provide a fully usable virtualization environment.

Features

bhyve-webadmin (also known as BVCP) provides an API, a CLI and a secure web interface for administering bhyve and its virtual machines.

Requirements

Some general requirements must be fulfilled to run bhyve bundled with bhyve-webadmin, most importantly a FreeBSD installation with root access and a CPU with hardware virtualization support (Intel VT-x with EPT or AMD-V with RVI).

Additional configuration of the underlying storage and network is needed, but since these are individual for each setup, they are not covered in detail within this guide.

Installation

This guide is based on FreeBSD 14 and bhyve-webadmin v1.9.8p9 to provide a virtualization infrastructure. bhyve-webadmin's concept relies on working as closely as possible with the FreeBSD system, and it will not change the system's configuration. Instead, it installs and maintains everything in dedicated directories, configurations and services. As a result, the whole installation, including all further dependencies and configuration, can be done in minutes.

Overview

File-Structure

By default, bhyve-webadmin keeps its content below the /vms directory; ISO images, for example, are stored in /vms/iso_images.

Additional helper tools can be found within these directories; a forgotten password, for example, can be reset with one of them.

Services

As already mentioned, bhyve-webadmin consists of multiple, independently working software components that can also be controlled individually:

Frontend:

service bvcp-frontend start / stop / restart

Backend:

service bvcp-backend start / stop / restart

Helper:

service bvcp-helper start / stop / restart

Software Installation

First, the needed code from the GitHub project must be obtained; it will be downloaded to /tmp/. Afterwards, the .tar.gz archive must be extracted by invoking the tar command:

cd /tmp/
fetch https://github.com/DaVieS007/bhyve-webadmin/archive/refs/tags/v1.9.8p9.tar.gz
tar xfvz v1.9.8p9.tar.gz

The integrity of the downloaded archive should be verified via its SHA256 hash when obtained from external resources:

sha256sum /tmp/v1.9.8p9.tar.gz
$> 758f5900c75a1832c980ed23c74a893f05252aa130e7c803843dac1d2531516f  /tmp/v1.9.8p9.tar.gz

Afterwards, the installation of bhyve-webadmin can be started. The installation process is designed as a “one click” installer and performs everything on its own:

cd bhyve-webadmin-1.9.8p9/
./install.sh

After the installation finishes, all needed directories, configurations and services have been created, and the services are already running on the system. The login credentials for the web interface are printed on the CLI, and a login on the web frontend at https://<ip>:8086 is possible. By default, self-signed certificates are generated for the encryption of the web frontend and VNC sessions; they can later be replaced with proper ones (e.g. by using Let's Encrypt).

Adding ISO-Images

ISO images for virtual machines are located in /vms/iso_images. Adding images is simply done by dropping them into that directory. This can be done by uploading images via SCP or SFTP, or on the system itself by downloading an image from a remote source as in the following example:

cd /vms/iso_images/
fetch https://download.freebsd.org/releases/amd64/amd64/ISO-IMAGES/14.0/FreeBSD-14.0-RELEASE-amd64-bootonly.iso

Configuration

An initial login on the web frontend can now be performed. The web frontend is reachable at https://<ip>:8086 and uses self-signed certificates by default. The credentials for the root user were printed on the CLI during the setup run.

bhyve BVCP Login

After a successful login, the default overview page is displayed.

After an initial installation, it will guide you through the following three next steps, which are covered in detail below: 1. Configure at least one virtual network, 2. Configure at least one data store, and 3. Create your first Virtual Machine.

Configure Network

Networking is one of the most important and complex parts when it comes to virtualized environments. Even more complex setups, including VLAN separation etc., are possible but not covered within this guide, which primarily focuses on a simple bridged & NAT networking setup.

Bridged Network

A simple bridged network can be created by clicking on the first option, 1. Configure at least one virtual network, where we can create a new network on our cluster. Creating a new network is simple: just the desired network interface on the cluster must be selected.

Within this guide we select the following options which may vary depending on your personal setup:

Network Gateway to bound: (+) [em0] (mtu: 1500) 10.10.10.77
Unique Network Name: uplink-dev
Descriptive Text: Uplink network for the dev environments

bhyve BVCP – Create Network

In the given drop-down menu, all available network cards are shown with their interface name and assigned IP address(es), which makes it easy to choose the right one. After saving the configuration, the newly created network is directly usable.

NAT Network

NAT is considered a more advanced networking setup but is often needed when working with RFC 1918 addresses. NAT is not supported by default and needs to be configured manually. Within this guide, a NAT network will be created for the interface em1 by using PF (FreeBSD's integrated firewall).

Therefore, we enable the PF service in the rc.conf file by running the following commands, enable IP forwarding, and edit the firewall configuration file directly:

echo 'pf_enable="YES"' >> /etc/rc.conf
echo "net.inet.ip.forwarding=1" >> /etc/sysctl.conf
sysctl -f /etc/sysctl.conf
vi /etc/pf.conf

The firewall configuration for NAT networking on the em1 interface will be extended by the following content:

nat on em1 from {10.10.10.1/24} to any -> (em1)

Optional: Port forwarding can also be defined in this file, which would require adding:

# Forward SSH (tcp/22) to NAT destination host 192.168.1.99
rdr on em1 proto tcp from any to em1 port 22 -> 192.168.1.99 port 22
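
Before activating the new rules, the configuration syntax can be validated and PF can then be started; both tools are part of the FreeBSD base system:

# Check /etc/pf.conf for syntax errors without loading the rules
pfctl -nf /etc/pf.conf
# Start the PF service (picks up pf_enable from rc.conf)
service pf start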

As a last step, a dummy interface must be created that can then be selected in the web frontend. For this, the following is added to the /etc/rc.conf file:

cloned_interfaces="nat0"
ifconfig_nat0="inet 192.168.1.1 netmask 255.255.255.0 up"

Finally, the required services can be restarted by running:

service netif restart && service routing restart

Configure Storage

In the next step, the storage must be configured. This can simply be initiated by clicking on 2. Configure at least one data store in the web UI. By default, none is created and configured:

bhyve BVCP – Storage Overview

By clicking on Create Storage, a new storage will be created. Each newly created storage must be a mount point.
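
If the host uses ZFS, a dedicated dataset is a clean way to provide such a mount point. A minimal sketch, assuming a default pool named zroot and a hypothetical store location, could be:

# Create a ZFS dataset whose mount point will back the new BVCP storage
zfs create -o mountpoint=/vms/storage01 zroot/vms-storage01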

Create VM

Within the last step, the first virtual machine gets created. This is finally done by clicking on 3. Create your first Virtual Machine, where the options for the VM are defined.

bhyve BVCP – Create VM

Within this menu, a unique name, a description, and the hardware resources like the number of cores and memory, as well as the underlying hardware architecture, must be defined. After saving, the VM object gets created, and we can perform additional tasks like defining an ISO image that should be booted in the VM.

bhyve BVCP – Add ISO Image to VM

As already mentioned in the File Structure chapter, all available ISO images must be located in /vms/iso_images. Placed images can be selected directly from the drop-down menu.

Within the last step, the VM can be started by clicking on the play button. Switching back to the virtual machine overview (List of Virtual Machines), the VM is displayed with a green icon as an indication that it is up and running.

The VM can now be accessed by the integrated VNC server by clicking on the monitor icon in the middle of the option menu.

Statistics

Monitoring performance metrics and statistics of system usage is an important task. BVCP offers many options to validate the machine's health status, including performance metrics and statistics for memory, CPU, storage, networking and more, to quickly detect any negative impact on an overloaded host node. Besides this, it can also help to track down the source of performance problems in the setup or of a single virtual machine instance.

Conclusion

It doesn’t always have to be Proxmox on Linux systems to create a fast and secure virtualization environment with open-source tools. FreeBSD, bhyve, and bhyve-webadmin (BVCP) offer a great bundled solution to run a dedicated virtualization infrastructure that does not require the end user to have any knowledge of FreeBSD, bhyve, or CLI commands. Thanks to the included IAM, users are able to log in to a graphical web frontend and manage their virtual machines on their own.

As specialists for open source and open-source infrastructure, we are also happy to assist you and your business with BSD-based systems, including features like ZFS, PF, Jails, and bhyve. Since 1999, credativ® has been recognized for providing 24/7 open source support that rivals manufacturer support. We do not just provide technical support; we also provide all other services across the entire lifecycle of open-source landscapes, combined with a high degree of flexibility.

From April 18th until Friday, April 21st, KubeCon, in combination with CloudNativeCon, took place in Amsterdam: COMMUNITY IN BLOOM. An exciting event for people with an interest in Kubernetes and cloud native technologies.

At credativ, it is a must to keep our training up to date in many relevant areas. This of course includes Kubernetes and cloud native technologies. KubeCon/CloudNativeCon has been one of the conferences on our must-attend list for several years now.

A short diary

We started our journey to KubeCon on Tuesday evening with the badge pickup. On Wednesday, the keynotes started with the usual welcome words and opening remarks.

The information that 10,000 attendees had registered, with an additional 2,000 people on the waiting list, was really impressive and shows the importance of cloud native technologies. Nearly 58% of the attendees were new to the conference, which shows that more and more people are getting in touch with Kubernetes and co.

In addition to the usual sponsored keynotes, a short update on the CNCF graduated projects was presented. There was a wide variety of projects, from FluxCD to Prometheus, Linkerd, Harbor, and many more.

The second day started once again with keynotes, which included several project updates, e.g. for Kubernetes and incubating projects.

The last day, as usual, opened with keynotes. A highlight here was the presentation “Enabling Real-Time Media in Kubernetes”, which gave some insights into Media Streaming Mesh.

In addition to the talks and presentations, several tutorials took place. These tutorials usually take at least two time slots and therefore provide deeper insight into a specific topic and leave room for questions. The tutorials we attended were well prepared, and several helpers moved through the audience to answer questions. One of these tutorials showed the usage and benefits of Pixie, which provides deep insights into a system using eBPF and various open-source projects.

Beyond the tracks, a booth area was available; it was divided (by halls) into company-related booths and an area with projects. NetApp was represented at several booths.

The main theme this year seemed to be all about eBPF and Cilium. Various presentations in different tracks highlighted this topic and showed areas of application for eBPF. Several Cilium talks presented various aspects of Cilium, e.g. observability or multi-cluster connections and application failover.

Not so good

One downside has to be mentioned: some talks were full. Really full. We could not get into some of them because the room had already filled up 15-30 minutes before the talk started. Maybe next time it would be possible to ask all attendees to create a personal schedule in the corresponding app and assign the rooms based on the number of interested (scheduled) people.

Keynotes, Talks and Presentations

A short overview of the highlights among the talks and presentations we attended:

Conclusion

As always, the conference was worthwhile for gaining new impressions, exchanging ideas with interesting people, and expanding one’s knowledge. We were certainly happy to participate and are already looking forward to attending the next KubeCon.


In November 1999, 20 years ago, credativ GmbH was founded in Germany, laying the first foundation for today’s credativ group.

At that time, Dr. Michael Meskes and Jörg Folz started business operations in the Technology Centre of Jülich, Germany. Our mission has always been not only to work to live, but also to live to work, because we love the work we do. Our aim is to support the widespread use of open-source software and to ensure independence from software vendors.

Furthermore, it is very important for us to support and remain active in open source communities. Since 1999 we have continuously taken part in PostgreSQL and Debian events, and supported them financially with sponsorships. Additionally, the development of the Linux operating system has also been a dear and important project of ours. Therefore, we have been a member of the Linux Foundation for over 10 years.

In 2006 we opened our Open Source Support Center. Here, for the first time, our customers had the opportunity to get support for their entire open-source infrastructure with just one contract. Since then, we have expanded to different locations and grown into a globally operating Open Source Support Center.

Thanks to our healthy and steady growth, credativ grew to over 35 employees across its worldwide locations by our 10th anniversary.

The founding of credativ international GmbH in 2013 marked another milestone in credativ’s history, as the focus shifted from a local to a global market. We were also able to expand into different countries, such as the USA and India.

After 20 years of company history, we have now grown to over 80 employees. credativ is now one of the leading providers of services and support for open-source software in enterprise use. We thank our customers, business partners, and employees for the time we have spent together.

This article was originally written by Philip Haas.