
Fail Safe Monitoring - How to Ensure This with Server Clusters?


Computer networks have long been critical components of business processes. A network failure can effectively halt the work of the entire organization, so monitoring and diagnosing network performance is one of the main tasks in keeping an enterprise healthy. Monitoring is a continuous process of watching the network in order to detect faults and errors promptly and respond to them quickly and appropriately. This raises the question of how to keep the monitoring service itself highly available and running without interruption.

10-Strike Network Monitor Pro implements several mechanisms that make the monitoring system fault tolerant and keep the network infrastructure under reliable control. Let's review the fail-safe monitoring methods available in the software, ending with a failover cluster of monitoring servers.

 

Monitoring Service Guard

The monitoring server watchdog is implemented as a service called 10-Strike Network Monitor Watchdog. It is installed on each host where the main program's installation package or the monitoring server's installation package is deployed. The service watches the 10-Strike Network Monitor Pro Service (the monitoring server service) and the Firebird Server DBMS. If either of these services stops or crashes, the guard automatically restarts it. If five attempts to start the services fail, the guard sends an e-mail notification to the specified address. Keep in mind that if you stop the monitoring server service deliberately, for example to maintain the database, the active guard will start it again within a few seconds. In such situations, stop the watchdog service first, and only then stop the monitoring server.

The watchdog monitors not only the state of the service (started or not started) but also the activity of the monitoring process itself and its parameters. If the monitoring process stops showing signs of activity (writing a tag to the database), the guard also restarts the service. It does the same if a certain number of checks are no longer being performed, which may signal an accumulation of internal errors in the monitoring process.

Using a watchdog prevents monitoring from stopping due to software errors in the monitoring server.
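
The watchdog behavior described above can be illustrated with a minimal Python sketch. It is not the product's implementation: the class, the callback names, and the 60-second heartbeat timeout are assumptions made for illustration. Only the limit of five start attempts and the e-mail notification come from the text.

```python
from dataclasses import dataclass
from typing import Callable

MAX_START_ATTEMPTS = 5  # from the text: five failed starts trigger an e-mail


@dataclass
class Watchdog:
    # All callbacks are hypothetical stand-ins for real service and DB calls.
    is_running: Callable[[], bool]        # is the monitored service started?
    start_service: Callable[[], bool]     # returns True if the start succeeded
    stop_service: Callable[[], None]      # stops a hung service
    last_heartbeat: Callable[[], float]   # time of the last activity tag in the DB
    notify: Callable[[str], None]         # sends the e-mail notification
    heartbeat_timeout: float = 60.0       # assumed value, not from the manual

    def check(self, now: float) -> str:
        """Run one watchdog pass and report what happened."""
        if self.is_running():
            if now - self.last_heartbeat() <= self.heartbeat_timeout:
                return "ok"
            # The service is started but shows no activity: stop it first.
            self.stop_service()
        for _ in range(MAX_START_ATTEMPTS):
            if self.start_service():
                return "restarted"
        self.notify("Service failed to start after 5 attempts")
        return "failed"
```

A real watchdog would call `check` on a timer. Passing the current time in as an argument keeps the sketch easy to test.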

 

Database Backup Server

In the distributed monitoring system, the central database is located on one physical server, and the monitoring servers are usually installed on others. Monitoring services connect to the database over TCP and exchange information with it: they receive settings and record the collected statistics and test results. The database server can fail, or the connection to it can be lost. This puts at risk the operational information that the monitoring server records to the database.

reserve monitoring database server

To solve this problem, you can use a backup server with the database installed, to which the entire file system (or part of the files) is replicated. The address of the backup database server is set in the program's database connection settings. The monitoring server and the GUI console can switch to it automatically if the main database server becomes unavailable. You must configure the database replication yourself (file system mirroring is one option) to keep the backup copy up to date.

setting up database server connection for monitoring
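
The switch to the backup database server can be sketched as follows. This is a simplified Python illustration, not the program's code: the function names are hypothetical, and port 3050 is the Firebird default, assumed here because the manual does not state which port the installation uses.

```python
import socket
from typing import Callable, Optional


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_database_host(
    primary: str,
    backup: Optional[str],
    port: int = 3050,  # Firebird default port; an assumption for this sketch
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> Optional[str]:
    """Prefer the main database server; fall back to the backup if set."""
    for host in (primary, backup):
        if host and probe(host, port):
            return host
    return None  # neither server is reachable
```

The `probe` parameter exists only so the selection logic can be exercised without a real network.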

 

Protection against Information Loss when the Connection to the Central Database Is Lost

There is an alternative solution to the problem of losing the connection between a remote monitoring server and the central database. If no backup database server is specified and configured, the monitoring service switches to the local database, which is installed and configured automatically from the distribution kit (c:\ProgramData\10-Strike\Network Monitor Pro\LocalDB\NETMONITOR.FDB). The monitoring server writes the host monitoring statistics to the local database until normal communication with the main database is restored. When the connection is restored, all the data accumulated in the local database during the downtime is automatically transferred to the main database. The entire process is automatic and requires neither user action nor additional software settings.

synchronization of the 10-Strike Network Monitor Pro program's monitoring database

In such situations, the service records these events in the program log (c:\ProgramData\10-Strike\Network Monitor Pro\Logs\NetMonitorPro.log). If the connection to the database was interrupted for a long time, you can use the log to check whether the data was successfully transferred to the central database.
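
The buffering and later transfer of results can be illustrated with a small Python sketch. It uses in-memory SQLite databases as stand-ins for the local and central Firebird databases. The table layout and function names are hypothetical and do not reflect the actual schema of the product.

```python
import sqlite3

# Hypothetical table layout, used for illustration only.
SCHEMA = "CREATE TABLE IF NOT EXISTS results (ts REAL, host TEXT, status TEXT)"


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and make sure the results table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, ts: float, host: str, status: str) -> None:
    """Store one check result."""
    conn.execute("INSERT INTO results VALUES (?, ?, ?)", (ts, host, status))
    conn.commit()


def sync_local_to_central(local: sqlite3.Connection,
                          central: sqlite3.Connection) -> int:
    """Copy buffered rows to the central database, then clear the buffer."""
    rows = local.execute(
        "SELECT ts, host, status FROM results ORDER BY ts"
    ).fetchall()
    if not rows:
        return 0
    central.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    central.commit()
    # Clear the local buffer only after the central commit has succeeded.
    local.execute("DELETE FROM results")
    local.commit()
    return len(rows)
```

Deleting the local rows only after the central commit means a failure mid-transfer leaves the buffered data in place for the next attempt.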

 

Monitoring Server Redundancy - Working with a Secondary Backup Server

Redundancy can be provided not only for the central database server but also for the monitoring server itself. Consider what happens when a host with the monitoring service installed fails: the monitoring process stops completely. To avoid such incidents, you can install a backup monitoring server on a second physical host and give it the same ID. This gives two running monitoring services connected to the same database and having the same ID in the configuration file c:\ProgramData\10-Strike\Network Monitor Pro\NetMonitorPro.ini. The service that starts first automatically becomes active and begins performing all checks. The service that starts second becomes the backup service. It reads the same list of checks from the database but does not execute them. The backup service constantly watches the activity of the first service through certain indicators in the database. If the first service becomes inactive for any reason, the backup monitoring service automatically starts performing the checks without downtime.

reservation of monitoring servers in the 10-Strike Network Monitor Pro system

When the first server returns to normal operation, its service takes the backup role and begins monitoring the activity of the second service. The two services can keep swapping roles this way after each failure.

This solution can be used when creating a failover cluster of servers.
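
The role switching between the two services can be modeled with a short Python sketch. It is a conceptual illustration only: the `SharedState` object stands in for the activity indicators kept in the database, and the class names and the 30-second staleness threshold are assumptions, not values from the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SharedState:
    """Stand-in for the activity indicators stored in the shared database."""
    holder: Optional[str] = None   # which instance currently performs checks
    heartbeat: float = 0.0         # time of the holder's last activity


@dataclass
class MonitorInstance:
    name: str
    state: SharedState
    stale_after: float = 30.0      # assumed threshold, not from the manual

    def tick(self, now: float) -> str:
        """Decide this instance's role for the current cycle."""
        s = self.state
        holder_silent = now - s.heartbeat > self.stale_after
        if s.holder is None or s.holder == self.name or holder_silent:
            # Become (or remain) active and refresh the activity indicator.
            # A real system would need this update to be atomic in the DB.
            s.holder = self.name
            s.heartbeat = now
            return "active"
        return "standby"  # the other instance is alive; do not run checks
```

In this model, an instance that comes back after a failure finds a fresh heartbeat from the other instance and therefore stays in standby, which matches the role swap described above.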

 

 
