Introduction
Using keepalived in combination with a couple of HAProxy instances is a convenient yet powerful way of ensuring high availability of services.

Up until now, I’ve considered it enough to monitor the VMs where the services run, and the general availability of a HAProxy listener on the common address. The drawback is that it’s hard to see if the site is served by the intended master or the backup load balancer at a glance. The image to the right shows the intended – and at the end of this article achieved – result, with the color of the lines between nodes giving contextual information about the state of the running services.
Monitoring state changes could naïvely be achieved by continuously tailing the syslog and searching for “entered the MASTER state”. This would be a pretty resource-intensive way of solving the issue, though. A less amateurish way to go about it would be to use keepalived’s built-in capability of running scripts on state changes, but there are a number of situations in which you can’t be sure that the scripts are able to run, so that’s not really what we want to do either.
Fortunately, keepalived supports SNMP, courtesy of the original author of the SNMP patch for keepalived, Vincent Bernat. In addition to tracking state changes, it potentially allows us to pull out all kinds of interesting statistics from keepalived, as long as we have a third machine from which to monitor things. Let’s set it up.
The deed
First of all, snmpd must be installed on the load balancers:
$ sudo apt update; sudo apt install snmpd
Next, let’s create a very basic SNMP listener. We begin by backing up our default snmpd configuration.
$ sudo mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf.orig
Re-create an empty /etc/snmp/snmpd.conf and add the following lines:
master agentx # Keepalived requires agentx for SNMP
rocommunity public zabbixserver.mydomain # Only accept SNMP queries from the Zabbix server.
authtrapenable 1 # Yell on auth errors.
trapcommunity public
trap2sink zabbixserver.mydomain
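A reader reports in the comments at the end of this article that on CentOS the agentx line only worked once the inline comments were removed. If you run into the same thing, a small filter can strip them before the file is written. This is a minimal Python sketch, not part of my original setup: the function name `strip_inline_comments` is made up for illustration, and it assumes that no directive value legitimately contains a `#`.

```python
def strip_inline_comments(config_text: str) -> str:
    """Return config_text with '#' comments and blank lines removed.

    Assumption: no directive value contains a literal '#'.
    """
    cleaned = []
    for line in config_text.splitlines():
        # Keep only what precedes the first '#', dropping trailing spaces.
        directive = line.split("#", 1)[0].rstrip()
        if directive:
            cleaned.append(directive)
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    sample = (
        "master agentx # Keepalived requires agentx for SNMP\n"
        "rocommunity public zabbixserver.mydomain # Only the Zabbix server\n"
        "authtrapenable 1 # Yell on auth errors.\n"
    )
    print(strip_inline_comments(sample), end="")
```

The result can then be saved as /etc/snmp/snmpd.conf before restarting snmpd.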
Restart snmpd:
$ sudo service snmpd restart
Now let’s edit the startup options for keepalived to ask it to actually speak SNMP with us. The file is called /etc/default/keepalived and for this purpose only needs to contain this single line:
DAEMON_ARGS=" -x" # ..or " --snmp"
And finally we need to tell keepalived to throw traps. Add the following clause to the top of /etc/keepalived/keepalived.conf (this example assumes that you already have a valid keepalived configuration):
global_defs {
enable_traps
}
Now let’s restart keepalived and see what happens.
$ sudo service keepalived restart
Monitoring
After putting the relevant MIB (KEEPALIVED-MIB, available from Vincent Bernat’s GitHub site, or from Zabbix Share linked elsewhere in this article) in one of the MIB directories on the monitoring server and restarting the monitoring server’s instance of snmpd, we should be able to test the features available. Since I’ve only installed and configured SNMP on the backup server so far, let’s check that one:
$ snmpwalk -v2c -cpublic testmachine2 KEEPALIVED-MIB::vrrpInstanceState
KEEPALIVED-MIB::vrrpInstanceState.1 = INTEGER: backup(1)
KEEPALIVED-MIB::vrrpInstanceState.2 = INTEGER: backup(1)
OK, that looks like it should. Now let’s shut down HAProxy on the master server, which should cause keepalived to fail over:
$ sudo service haproxy stop
And let’s check on the monitoring server again:
$ snmpwalk -v2c -cpublic testmachine2 KEEPALIVED-MIB::vrrpInstanceState
KEEPALIVED-MIB::vrrpInstanceState.1 = INTEGER: master(2)
KEEPALIVED-MIB::vrrpInstanceState.2 = INTEGER: master(2)
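To use this state in a script instead of reading it by eye, the snmpwalk output can be parsed. The following is a minimal Python sketch that assumes the output format shown above, with the MIB loaded so that values appear as `backup(1)` or `master(2)`. The helper name `parse_instance_states` is hypothetical.

```python
import re

# Matches e.g. "KEEPALIVED-MIB::vrrpInstanceState.1 = INTEGER: backup(1)"
STATE_LINE = re.compile(
    r"KEEPALIVED-MIB::vrrpInstanceState\.(\d+) = INTEGER: (\w+)\((\d+)\)"
)


def parse_instance_states(snmpwalk_output: str) -> dict:
    """Map each VRRP instance index to its state name."""
    states = {}
    for match in STATE_LINE.finditer(snmpwalk_output):
        index, name, _value = match.groups()
        states[int(index)] = name
    return states


if __name__ == "__main__":
    sample = (
        "KEEPALIVED-MIB::vrrpInstanceState.1 = INTEGER: master(2)\n"
        "KEEPALIVED-MIB::vrrpInstanceState.2 = INTEGER: master(2)\n"
    )
    print(parse_instance_states(sample))
```

In practice the text would come from running the snmpwalk command above, for instance through Python's `subprocess` module.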
That works as intended. Starting the HAProxy service on the master server again made the backup server return to its initial state, so failing back works too.
My monitoring server is already configured as an SNMP trap sink, so let’s just see what happened in my SNMP trap log when I killed the master instance of HAProxy:
15:19:30 2016/10/14 ZBXTRAP testmachine2
PDU INFO:
errorindex 0
errorstatus 0
receivedfrom UDP: [testmachine2]:53447->[zabbixserver.mydomain]:162
notificationtype TRAP
messageid 0
version 1
requestid 1067853884
community public
transactionid 23400
VARBINDS:
DISMAN-EXPRESSION-MIB::sysUpTimeInstance type=67 value=Timeticks: (36857) 0:06:08.57
SNMPv2-MIB::snmpTrapOID.0 type=6 value=OID: KEEPALIVED-MIB::vrrpInstanceStateChange
KEEPALIVED-MIB::vrrpInstanceName type=4 value=STRING: "SITEINSTANCE1"
KEEPALIVED-MIB::vrrpInstanceState type=2 value=INTEGER: 2
KEEPALIVED-MIB::vrrpInstanceInitialState type=2 value=INTEGER: 1
KEEPALIVED-MIB::routerId.0 type=4 value=STRING: "testmachine2"
The relevant line is the KEEPALIVED-MIB::vrrpInstanceState varbind, whose value is 2. This corresponds to what we saw from snmpwalk earlier, which means that we’re technically done.
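If you want to process such a log entry outside the monitoring tool, the varbinds are easy to pull apart. Here is a minimal Python sketch that assumes the log layout shown above, with one varbind per line after a `VARBINDS:` marker. The function name `parse_varbinds` is my own, and other trap handlers may format their output differently.

```python
import re

# One varbind per line: "<MIB>::<object> type=<n> value=<TYPE>: <value>"
VARBIND = re.compile(r"^(\S+)\s+type=\d+\s+value=([^:]+):\s*(.*)$")


def parse_varbinds(trap_text: str) -> dict:
    """Return {object name: value string} for the lines after 'VARBINDS:'."""
    varbinds = {}
    in_varbinds = False
    for line in trap_text.splitlines():
        line = line.strip()
        if line == "VARBINDS:":
            in_varbinds = True
            continue
        if not in_varbinds:
            continue  # still in the PDU INFO part
        match = VARBIND.match(line)
        if match:
            name, _type_name, value = match.groups()
            varbinds[name] = value.strip('"')
    return varbinds


if __name__ == "__main__":
    sample = (
        "PDU INFO:\n"
        "  version 1\n"
        "VARBINDS:\n"
        '  KEEPALIVED-MIB::vrrpInstanceName type=4 value=STRING: "SITEINSTANCE1"\n'
        "  KEEPALIVED-MIB::vrrpInstanceState type=2 value=INTEGER: 2\n"
    )
    print(parse_varbinds(sample))
```

Values are kept as strings, so convert the state to an integer before comparing it with anything.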
What remains is to catch the trap in my monitoring application and to apply it. I use Zabbix as my preferred monitoring tool. Thanks to Stephen E. Fritz, there are pre-made templates for keepalived on Zabbix Share, which we can use to gather statistics information.
We now have two ways of keeping track of what keepalived is up to: Polling the load balancer with regular SNMP queries, we can create graphs and trends of the uptime and various traffic data from keepalived, and when hit by an SNMP trap, we can easily trigger notification events.
A Zabbix 3.2 SNMP trigger example
I’ve created an SNMP Trap type item, with Type of information: Log. The key for the item is snmptrap[KEEPALIVED-MIB::vrrpInstanceStateChange]. This is complemented by a trigger with a Problem expression where I compare the vrrpInstanceState and the vrrpInstanceInitialState provided by the trap to determine the nature of the event:
{Template SNMP Traps Keepalived:snmptrap[KEEPALIVED-MIB::vrrpInstanceStateChange].str(KEEPALIVED-MIB::vrrpInstanceState type=2 value=INTEGER: 2)}=1 and {Template SNMP Traps Keepalived:snmptrap[KEEPALIVED-MIB::vrrpInstanceStateChange].str(KEEPALIVED-MIB::vrrpInstanceInitialState type=2 value=INTEGER: 1)}=1
To reset the trigger, I’ve set up a corresponding OK Event that closes the problem when the values of vrrpInstanceState and of vrrpInstanceInitialState are equal.
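The decision the trigger makes can be restated in a few lines of code, which may help when porting the idea to another monitoring tool. This is only a Python sketch of the logic described above, not how Zabbix evaluates expressions internally. The function name `evaluate_trap` and the returned labels are invented for illustration.

```python
from typing import Optional


def evaluate_trap(state: int, initial_state: int) -> Optional[str]:
    """Mirror the trigger logic described above.

    Returns "PROBLEM" when a node configured as backup (1) has become
    master (2), "OK" when the current state equals the initial state,
    and None for any other combination, which the trigger ignores.
    """
    if state == 2 and initial_state == 1:
        return "PROBLEM"
    if state == initial_state:
        return "OK"
    return None


if __name__ == "__main__":
    print(evaluate_trap(2, 1))  # backup took over
    print(evaluate_trap(1, 1))  # backup is a backup again
```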
Shutting down the Master HAProxy service to trigger a fail over, the following line shows up in the Zabbix Problems screen:


To return to the at-a-glance view mentioned at the start of this article: What do we see when we’ve failed over? As per the image at the start of the article, I’ve configured a bright blue “Color OK” value for the line indicating the connection state of the backup load balancer to indicate that the services on the server are running and ready to take over in case of failure of the master node. The image to the right clearly shows that it’s easy to see when a failover occurs with green lines turning red and the previously blue link to the backup load balancer turning green.
This article has illustrated one of my favourite things about the Unix philosophy: By passing simple text content between programs that each do one thing well, we have created a whole that is greater than the sum of its parts.
Hi, I used your howto. On CentOS the config has to be written without comments; with comments, agentx does not work.
Client step
1) Open FW
firewall-cmd --add-service=snmp --permanent
firewall-cmd --reload
2) Install snmp
yum install net-snmp
3) Change config (my config)
vi /etc/snmp/snmpd.conf
#agentAddress udp:127.0.0.1:161
master agentx
rocommunity public ZABBIX_SERVER_HOSTNAME
authtrapenable 1
trapcommunity public
trap2sink ZABBIX_SERVER_HOSTNAME
4) Restart SNMP
systemctl restart snmpd.service
5) Change startup option
vi /etc/sysconfig/keepalived
# Options for keepalived. See `keepalived --help' output and keepalived(8) and
# keepalived.conf(5) man pages for a list of all options. Here are the most
# common ones :
#
# --vrrp -P Only run with VRRP subsystem.
# --check -C Only run with Health-checker subsystem.
# --dont-release-vrrp -V Dont remove VRRP VIPs & VROUTEs on daemon stop.
# --dont-release-ipvs -I Dont remove IPVS topology on daemon stop.
# --dump-conf -d Dump the configuration data.
# --log-detail -D Detailed log messages.
# --log-facility -S 0-7 Set local syslog facility (default=LOG_DAEMON)
#
KEEPALIVED_OPTIONS="--snmp -D"
6) Add traps (I added this at the top of the file)
vi /etc/keepalived/keepalived.conf
global_defs {
enable_traps
}
7) Restart and check keepalived
systemctl restart keepalived.service
Feb 10 13:02:49 ts4000zkdblb02 Keepalived[8533]: Stopping Keepalived v1.2.13 (11/05,2016)
Feb 10 13:02:49 ts4000zkdblb02 systemd: Cannot add dependency job for unit microcode.service, ignoring: Unit is not loaded properly: Invalid argument.
Feb 10 13:02:49 ts4000zkdblb02 systemd: Stopping LVS and VRRP High Availability Monitor…
Feb 10 13:02:49 ts4000zkdblb02 systemd: Starting LVS and VRRP High Availability Monitor…
Feb 10 13:02:49 ts4000zkdblb02 Keepalived[9431]: Starting Keepalived v1.2.13 (11/05,2016)
Feb 10 13:02:49 ts4000zkdblb02 systemd: PID file /var/run/keepalived.pid not readable (yet?) after start.
Feb 10 13:02:49 ts4000zkdblb02 Keepalived[9432]: Starting Healthcheck child process, pid=9433
Feb 10 13:02:49 ts4000zkdblb02 Keepalived[9432]: Starting VRRP child process, pid=9434
Feb 10 13:02:49 ts4000zkdblb02 systemd: Started LVS and VRRP High Availability Monitor.
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_healthcheckers[9433]: Netlink reflector reports IP 10.253.50.11 added
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_healthcheckers[9433]: Netlink reflector reports IP fe80::250:56ff:fe8b:24d4 added
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_healthcheckers[9433]: Registering Kernel netlink reflector
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_healthcheckers[9433]: Registering Kernel netlink command channel
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_healthcheckers[9433]: Starting SNMP subagent
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_vrrp[9434]: Netlink reflector reports IP 10.253.50.11 added
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_vrrp[9434]: Netlink reflector reports IP fe80::250:56ff:fe8b:24d4 added
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_vrrp[9434]: Starting SNMP subagent
Feb 10 13:02:49 ts4000zkdblb02 Keepalived_vrrp[9434]: NET-SNMP version 5.7.2 AgentX subagent connected
Thank you, I should have thought about that.