You are here: Home > Products > Network Monitor (Pro) > User Manual > Network Monitoring Checks > Monitoring a Hard Drive's S.M.A.R.T. Parameters

Monitoring a Hard Drive's S.M.A.R.T. Parameters

This feature is available in the PRO version only. HDD, SSD, NVMe drives are supported.

S.M.A.R.T. is a hard drive's built-in self-diagnostic system that measures and reports various parameters. It can also predict when the hard drive is going to fail. S.M.A.R.T. can monitor the HDD's temperature and other indicators, including some attributes that indicate its imminent failure. It should be noted that S.M.A.R.T. can predict mechanical failures, which account for about 60 percent of all HDD failures.

The Application automatically estimates the specified reliability parameters by comparing them with threshold values, and warns you if a hard drive failure is imminent. Moreover, the Application can estimate the overall condition of a hard drive based on a set of parameters. For this check to work, you need to install the Remote Agent on the target computer. Read more about the Remote Agents in the Installing and Configuring the Remote Agent section.

Adding the SMART check and selecting a HDD or SSD for monitoring

adding the SMART check and selecting HDD for monitoring

Configuring the SMART check and selecting parameters for monitoring

configuring the SMART check and selecting parameters for monitoring

You can display the selected parameter value on a widget which will be added to the dashboard or placed on the graphic network map.

 

Solving the SMART Data Receiving Problem

If the program does not receive the S.M.A.R.T. data from one of your disks, or it is not listed at all, try switching the Agent to another mode. With this mode, the Agent will try to receive the S.M.A.R.T. data via the smartmontools utility. To enable this feature, open the NMAgent.ini Agent configuration file with any text editor and add the line:

GetSMARTMode=1

Example:

[AGENT]
TCPPort=45668
UsePasswd=0
Passwd=
UseIPFilter=0
IPList=
GetSMARTMode=1

Save the file and restart the 10-Strike Network Monitor Agent Service.

The NMAgent.ini file can be stored in the following folders:

  1. c:\Program Files (x86)\10-Strike Network Monitor Agent\ if the Agent was installed without the main program.
  2. c:\Program Files (x86)\10-Strike Network Monitor Pro\ if the Agent was installed together with the main program.

 

Monitoring S.M.A.R.T. Parameters for SSD Disks Connected via NVMe

The S.M.A.R.T. parameter set for NVMe SSD disks is different comparing with the ATA hard drive parameters. In most cases, this is a set of ready-made indicators, without specifying thresholds and worst-case values. Here's an example S.M.A.R.T. parameters for the M.2 SSD WD Blue SN500:

critical_warning: 0
temperature: 39
available_spare: 100
available_spare_threshold: 10
percentage_used: 1
data_units_read: 13733602
data_units_written: 14396473
host_reads: 365745477
host_writes: 391133456
controller_busy_time: 836
power_cycles: 1419
power_on_hours: 4612
unsafe_shutdowns: 13
media_errors: 0
num_err_log_entries: 1
warning_temp_time: 0
critical_comp_time: 0

Among these parameters, the most interesting are:

critical_warning

The parameter indicating the status of the disk:

  • 0 - disk is fine
  • 1 - disk resource is below threshold
  • 2 - temperature has exceeded the threshold
  • 4 - reliability decreases due to internal errors
  • 8 - disk is in read-only mode
  • 16 - volatile memory backup system error

temperature

Temperature in Celsius (sometimes it can be in Kelvin - you need to pay attention to this). Constant overheating of the SSD can lead to its rapid failure, therefore it is necessary to monitor this parameter.

percentage_used

Percentage of SSD resource consumed. As soon as this parameter reaches 100%, the SSD will go into read mode (locked). This is a very important parameter that needs monitoring. The task of the administrator is to track in advance those disks, the remaining resource of which is approaching the maximum value, and replace them.

media_errors

The number of times the controller encountered a fatal data integrity error. If the value of this parameter is constantly growing, you should consider replacing the SSD with a new one.

num_err_log_entries

The number of error log entries over the lifetime of the controller. As in the previous case, you should pay attention to the growth of this parameter.

 

Learn more on HDD and SDD SMART monitoring over the network in our article...