The proper functioning of IT infrastructure and server landscapes needs to be monitored constantly. This can be achieved by regularly performing automated tests. In their simplest form, these tests merely validate a web server’s availability. However, highly complex tests are also possible, such as regularly sending automated messages through a message broker like RabbitMQ.
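As an illustration of the simplest kind of test, the following sketch checks whether a server accepts TCP connections on a given port. The function name, host and timeout are assumptions made for this example; a real availability check would usually go further and issue an actual HTTP request.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A scheduler could call `is_port_open("www.example.org", 443)` at regular intervals and report a fault whenever the result is `False`.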
Modern monitoring systems do not rely exclusively on spot tests. They typically also store a range of measured values, which can be used to project future behavior, provided that the right data is selected and the shape of the resulting curve is taken into account.
The basic objective of a monitoring system is to avoid or at least reduce failures of important systems or infrastructure components.
A monitoring concept ensures that all aspects of the operator’s requirements have been covered. The actual monitoring method is also defined as part of the concept.
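Here we distinguish between fact-checking systems, which test the current state against fixed conditions such as thresholds, and trend-based systems, which evaluate measured values over time. Both approaches are generally used in tandem.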
A concept also requires a corresponding catalog of measures. These may range from simple fault indication through to complex notification cascades and automatic failover controls.
The term “alerting” covers all automatic measures that a monitoring system can use to alert the responsible administrators to a possible fault.
Types of notification
A notification can be as simple as a message on a website, an email or an SMS. Various chat systems, such as IRC, Slack, Mattermost, WhatsApp and Telegram, can also be connected so that the monitoring system sends its warnings or fault messages to them. Further extension levels also enable notification by phone call.
Signaling units such as flashing lights and sound generators are occasionally also used in these contexts.
These kinds of alerting systems also enable escalations, which sometimes involve complex conditions. Conceivable triggers include the severity of an individual fault, the number of faults within a defined period, or an emergency contact failing to respond within a defined maximum response time.
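The three escalation conditions named above can be sketched as a single decision function. All names, the severity scale and the default limits are hypothetical values chosen for this example, not the rules of any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass
class Fault:
    severity: int                              # assumed scale: 1 = warning, 3 = critical
    raised_at: datetime
    acknowledged_at: Optional[datetime] = None


def should_escalate(
    fault: Fault,
    recent_faults: Sequence[Fault],
    now: datetime,
    severity_limit: int = 3,
    max_faults: int = 5,
    window: timedelta = timedelta(minutes=10),
    max_response: timedelta = timedelta(minutes=15),
) -> bool:
    """Decide whether a fault should be passed to the next escalation level."""
    # Condition 1: the individual fault is severe enough on its own.
    if fault.severity >= severity_limit:
        return True
    # Condition 2: too many faults were raised within the defined period.
    in_window = [f for f in recent_faults if now - window <= f.raised_at <= now]
    if len(in_window) >= max_faults:
        return True
    # Condition 3: nobody acknowledged the fault within the maximum response time.
    if fault.acknowledged_at is None and now - fault.raised_at > max_response:
        return True
    return False
```

In practice, each escalation level would typically carry its own limits and its own list of contacts.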
In the context of monitoring, logging relates to the recording of faults or warning statuses within the monitoring system.
This must not be confused with the centralized logging of various systems and the automatic evaluation of log files. However, the latter may also be a possible test within a monitoring concept.
The aim of collecting this information as part of the monitoring is to give operations a history of relevant faults, from which appropriate, specific measures can be derived quickly where necessary.
On the one hand, trend analysis in modern monitoring concepts aims to identify emerging problems at an early stage, before they become noticeable in operation. For example, a storage unit that is slowly but surely filling up can be detected early on. Depending on its quality, the available data can even be used to make a projection for the future.
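The storage example can be illustrated with a straight-line fit through the stored measurements. This is a minimal sketch that assumes roughly linear growth; the function name and the choice of hours as the time unit are assumptions, and real systems may use more elaborate forecasting models.

```python
from typing import Optional, Sequence


def hours_until_full(
    hours: Sequence[float],
    used_percent: Sequence[float],
    limit: float = 100.0,
) -> Optional[float]:
    """Project how many hours after the last sample usage reaches `limit`.

    Fits a least-squares line through the samples. Returns None if usage
    is flat or shrinking, because no fill-up time can be projected then.
    """
    n = len(hours)
    if n < 2 or n != len(used_percent):
        raise ValueError("need at least two samples of equal length")
    mean_x = sum(hours) / n
    mean_y = sum(used_percent) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    if sxx == 0:
        raise ValueError("timestamps must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, used_percent))
    slope = sxy / sxx                      # percentage points per hour
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    t_full = (limit - intercept) / slope   # time at which the line hits the limit
    return max(0.0, t_full - hours[-1])
```

With daily samples of 70, 72, 74 and 76 percent, the fitted line rises by two percentage points per day, so the projection is 12 days (288 hours) after the last sample.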
On the other hand, trend analysis can be used to account for periodic fluctuations, and the current development of these systems is heading in this direction. In a real-life scenario, a task that runs on a server every Monday at 7:00 PM might fill the hard drive up to 90% before clearing it again. A traditional, threshold-based system would either trigger a warning every Monday or have to use a higher threshold of, for example, 95%, which may be too high for reliable operation. In an ideal case, a trend-based system recognizes this recurring pattern and filters it out.
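One simple way to approximate this behavior is to compare each reading with the median of earlier readings from the same weekday and hour. This is a deliberately simplified stand-in for what trend-based systems do; the function name, the tolerance and the fallback threshold are assumptions for this example.

```python
from datetime import datetime
from statistics import median
from typing import Iterable, Tuple


def is_unusual(
    history: Iterable[Tuple[datetime, float]],
    timestamp: datetime,
    value: float,
    tolerance: float = 10.0,
    min_samples: int = 3,
    fallback_threshold: float = 85.0,
) -> bool:
    """Return True if `value` is unusually high for its weekly time slot."""
    # Collect earlier readings taken on the same weekday and in the same hour.
    same_slot = [
        v for ts, v in history
        if ts.weekday() == timestamp.weekday() and ts.hour == timestamp.hour
    ]
    if len(same_slot) < min_samples:
        # Too little history for a baseline: fall back to a fixed threshold.
        return value > fallback_threshold
    # Alert only if the reading clearly exceeds what is normal for this slot.
    return value > median(same_slot) + tolerance
```

With several weeks of history, a 90% reading on Monday at 7:00 PM matches the baseline and stays silent, while the same reading at any other time still raises a warning.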