I have a data center that I don’t physically visit very often. I recently replaced some UPS units whose external temperature sensors I had been using to monitor the room temperature. One of my HVAC units does report temperature, but it isn’t very close to the main equipment rack, and I really wanted to know what temperature my servers were experiencing in the rack itself. I finally came up with a way to do this with some of my HP ProLiant DL360 G7s. I used to do this with Dells that had built-in temperature sensors, but we don’t have any of them around any more. The hardest part was figuring out which drivers and agents are required to get the sensors monitored and queryable over SNMP on the HPs.
The first thing you need to do is download and install the HP Health command line utility and SNMP agents on the server you want to monitor. I found them by going to http://www.hp.com, then Support > enter “DL360 G7” as the model > Linux > choose the “Software – System Management” category > download both the HP SNMP Agents for Linux and the HP Health command line utility.
Then I installed them and some standard RHEL packages:
# rpm -ivh ./hp-health-10.10-1710.30.rhel6.x86_64.rpm
# rpm -ivh ./hp-snmp-agents-10.10-2732.25.rhel6.x86_64.rpm
# yum install net-snmp net-snmp-utils lm_sensors
** Don’t forget to edit the /etc/snmp/snmpd.conf file to allow only appropriate access **
I’m not going to detail those steps, as that is generic setup covered in the man pages. However, you WILL need to load the HP agent’s dynamic module (dlmod) by adding the following line to snmpd.conf:
dlmod cmaX /usr/lib64/libcmaX64.so
Alternatively, you can run:
# /sbin/hpsnmpconfig
(and then choose to use the existing snmpd.conf or create a new one. However, I found it easier to simply add the dynamic module line as I did above.)
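If you would rather script that edit than open the file by hand, something like the following should work. It is only a sketch: `ensure_line` is a helper name I made up for illustration, and it assumes the target file already ends with a newline. Back up snmpd.conf before trying it.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Hypothetical helper (not part of any HP or Net-SNMP package):
# appends LINE to FILE unless an identical line is already present,
# so it is safe to run more than once.
ensure_line() {
    file=$1
    line=$2
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (as root):
#   cp /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf.bak
#   ensure_line /etc/snmp/snmpd.conf 'dlmod cmaX /usr/lib64/libcmaX64.so'
```

Running it twice leaves a single dlmod line in the file, which is handy if the step ends up in a kickstart or provisioning script.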
** Also, I ran sensors-detect to see if there was anything else of interest I could monitor
# sensors-detect
…and chose to update the config file when prompted “Do you want to overwrite /etc/sysconfig/lm_sensors? (YES/no): YES”
Once all that was done I gave the new services a clean start if they weren’t already running and made sure they start at boot time:
# /etc/init.d/hp-snmp-agents restart
# /etc/init.d/snmpd restart
# chkconfig snmpd on
Now for the fun part!
I searched around online and figured out that the health/thermal MIB OIDs start at roughly 1.3.6.1.4.1.232.6.2.6,
and that each temperature sensor’s location is reported as an INTEGER, decoded as follows (I was really only interested in the ambient “11” sensor for this project, but you could track whichever ones you want):
Locations: other(1), unknown(2), system(3), systemBoard(4), ioBoard(5), cpu(6),
memory(7), storage(8), removableMedia(9), powerSupply(10), ambient(11),
chassis(12), bridgeCard(13)
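To make walk output easier to read, the location codes above can be turned into names with a small lookup. This is just a sketch; `locale_name` is a function name I picked for illustration, and the mapping is exactly the list above.

```shell
#!/bin/sh
# locale_name CODE
# Hypothetical helper: prints the sensor location name for an integer code,
# using the location list given above.
locale_name() {
    case "$1" in
        1)  echo other ;;
        2)  echo unknown ;;
        3)  echo system ;;
        4)  echo systemBoard ;;
        5)  echo ioBoard ;;
        6)  echo cpu ;;
        7)  echo memory ;;
        8)  echo storage ;;
        9)  echo removableMedia ;;
        10) echo powerSupply ;;
        11) echo ambient ;;
        12) echo chassis ;;
        13) echo bridgeCard ;;
        *)  echo "unrecognized($1)" ;;   # anything not in the list
    esac
}

# Example:
#   locale_name 11     prints: ambient
```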
Using the snmpwalk utility, I was able to list the location of every available sensor:
# snmpwalk -Of -c public -v 1 localhost 1.3.6.1.4.1.232.6.2.6.8
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.1 = INTEGER: 11
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.2 = INTEGER: 6
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.4 = INTEGER: 7
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.5 = INTEGER: 7
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.7 = INTEGER: 7
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.9 = INTEGER: 7
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.11 = INTEGER: 7
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.12 = INTEGER: 10
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.13 = INTEGER: 10
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.14 = INTEGER: 7
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.15 = INTEGER: 6
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.16 = INTEGER: 6
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.17 = INTEGER: 7
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.18 = INTEGER: 6
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.19 = INTEGER: 3
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.20 = INTEGER: 3
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.21 = INTEGER: 3
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.22 = INTEGER: 3
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.23 = INTEGER: 3
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.24 = INTEGER: 3
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.25 = INTEGER: 3
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.26 = INTEGER: 3
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.27 = INTEGER: 8
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.3.1.28 = INTEGER: 3
And the current reading of each sensor in Celsius:
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.1 = INTEGER: 22
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.2 = INTEGER: 40
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.4 = INTEGER: 34
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.5 = INTEGER: 33
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.7 = INTEGER: 31
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.9 = INTEGER: 32
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.11 = INTEGER: 32
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.12 = INTEGER: 34
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.13 = INTEGER: 45
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.14 = INTEGER: 29
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.15 = INTEGER: 31
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.16 = INTEGER: 31
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.17 = INTEGER: 28
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.18 = INTEGER: 40
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.19 = INTEGER: 35
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.20 = INTEGER: 37
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.21 = INTEGER: 43
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.22 = INTEGER: 45
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.23 = INTEGER: 40
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.24 = INTEGER: 48
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.25 = INTEGER: 36
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.26 = INTEGER: 48
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.27 = INTEGER: 35
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.28 = INTEGER: 72
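The two listings line up by the last number in each OID (the sensor index): the `.3.` column holds the location code and the `.4.` column holds the Celsius reading. A rough sketch for pairing them up from walk output follows. `join_sensors` is a name I made up, and it assumes output lines shaped like the ones shown above (`OID = INTEGER: value`).

```shell
#!/bin/sh
# join_sensors
# Hypothetical helper: reads snmpwalk output on stdin and prints one line per
# sensor as "INDEX LOCATION_CODE CELSIUS", sorted by sensor index.
join_sensors() {
    awk '
        / = INTEGER: / {
            n = split($1, p, ".")   # break the OID into its components
            col = p[n - 2]          # 3 = location column, 4 = Celsius column
            idx = p[n]              # last component is the sensor index
            if (col == 3)      loc[idx]  = $NF
            else if (col == 4) temp[idx] = $NF
        }
        END {
            # only print sensors for which both columns were seen
            for (i in temp) if (i in loc) print i, loc[i], temp[i]
        }
    ' | sort -n
}

# Example (community string and host as used above):
#   snmpwalk -Of -c public -v 1 localhost 1.3.6.1.4.1.232.6.2.6.8 | join_sensors
```

On the data above, the first line of output would be `1 11 22`: sensor index 1, location 11 (ambient), 22 degrees Celsius.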
So, looking specifically at the ambient sensor (index 1, location 11), we see a reading of 22 degrees Celsius (~72 deg F):
# snmpwalk -Of -c public -v 1 localhost 1.3.6.1.4.1.232.6.2.6.8.1.4.1.1
.iso.org.dod.internet.private.enterprises.232.6.2.6.8.1.4.1.1 = INTEGER: 22
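The agent reports whole degrees Celsius. If you think in Fahrenheit, the conversion is F = C × 9/5 + 32. A tiny sketch of that follows (`c_to_f` is just an illustrative name, and it uses awk because plain sh has no floating point):

```shell
#!/bin/sh
# c_to_f CELSIUS
# Hypothetical helper: prints the Fahrenheit equivalent to one decimal place.
c_to_f() {
    awk -v c="$1" 'BEGIN { printf "%.1f\n", c * 9 / 5 + 32 }'
}

# Example:
#   c_to_f 22      prints: 71.6
```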
Then on my Nagios server, I defined the new service and its supporting parameters and restarted Nagios. Below are the config files and custom commands I set up for this (your setup may be slightly different, but should be pretty close):
** /etc/nagios/conf.d/service_templates.cfg
define service {
register 0
use default-service
name hp_check_ambienttemp
service_description HP Ambient Temperature
servicegroups hpenv
check_command snmp_hpenv_ambienttemp!public
}
** /etc/nagios/servers/servername.cfg
define host {
host_name someservername
address someservername.fqdn
hostgroups acceptance
alias Some HP DL360 G7 Server
use server
}
define service {
host_name someservername
use ping,lowpriority
}
define service {
host_name someservername
use hp_check_ambienttemp,graph,critical
notifications_enabled 0
}
** /etc/nagios/conf.d/commands.cfg (Note how I set 25 degrees as the warning threshold and 26 degrees as critical)
# ‘snmp_hpenv_ambienttemp’ command definition
define command {
command_name snmp_hpenv_ambienttemp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ -C $ARG1$ -o .1.3.6.1.4.1.232.6.2.6.8.1.4.1.1 -w 25 -c 26 -l 'Ambient Air Temp' -u 'Celsius'
}
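A note on how those two numbers are evaluated. As I understand the standard Nagios plugin range syntax, a bare threshold N means "alert when the value is outside 0..N", so with whole-degree readings 25 is still OK, 26 raises a WARNING, and 27 or more is CRITICAL. The sketch below only mimics that logic so you can sanity-check thresholds before editing the config. It is not how check_snmp is implemented, and `temp_state` is a made-up name.

```shell
#!/bin/sh
# temp_state VALUE WARN CRIT
# Hypothetical helper: prints OK, WARNING or CRITICAL for an integer reading,
# treating each bare threshold N as the range 0..N (alert when outside it).
# Returns the usual Nagios exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
temp_state() {
    value=$1
    warn=$2
    crit=$3
    if [ "$value" -lt 0 ] || [ "$value" -gt "$crit" ]; then
        echo CRITICAL
        return 2
    elif [ "$value" -gt "$warn" ]; then
        echo WARNING
        return 1
    fi
    echo OK
    return 0
}

# Example with the thresholds used above:
#   temp_state 22 25 26     prints: OK
#   temp_state 26 25 26     prints: WARNING
#   temp_state 27 25 26     prints: CRITICAL
```

If you want a wider gap between warning and critical, adjust -w and -c in the command definition accordingly.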
** /etc/nagios/conf.d/servicegroups.cfg
define servicegroup {
servicegroup_name hpenv
alias HP Server Envir Monitors
}
———
Here’s what the result looks like in Nagios (graphing is a good idea too, but that could be another post).