Thursday, 19 June 2014

HSRP and Static Route

I don't use HSRP often, but it does fill a very handy gap in certain situations.  I had a strange scenario recently with multiple HSRP groups and a static route which wouldn't behave.

We have two gateways, and both "inside" interfaces have two HSRP groups so that the clients can load-balance a bit more effectively.  Here's a bad picture:


The scenario is a bit more complicated than this in reality, but for the rough and ready sake of this post it will do.  Here are the R1 and R2 fa 0/1 "inside" configurations:

R1
interface FastEthernet0/1
 ip address 10.1.1.1 255.255.255.0
 ip nat inside
 standby use-bia
 standby 1 ip 10.1.1.254
 standby 1 preempt
 standby 1 track FastEthernet0/0 20
 standby 2 ip 10.1.1.253
 standby 2 priority 99
 standby 2 preempt
 standby 2 track FastEthernet0/0 20

R2
interface FastEthernet0/1
 ip address 10.1.1.2 255.255.255.0
 ip nat inside
 standby use-bia
 standby 1 ip 10.1.1.254
 standby 1 priority 99
 standby 1 preempt
 standby 1 track FastEthernet0/0 20
 standby 2 ip 10.1.1.253
 standby 2 preempt
 standby 2 track FastEthernet0/0 20

So that's fine, isn't it?  R1 has the higher priority for group 1 and is standby for group 2, and vice versa for R2.  Note the "use-bia" command.  This is required when, as Cisco say, "controllers in low-end products can only have a single unicast Media Access Control (MAC) address in their address filter. These platforms only permit a single HSRP group, and they change the interface address to the HSRP virtual MAC address when the group becomes active. Load sharing on platforms with this limitation is not possible with HSRP."  Which is exactly what was happening - despite HSRP being completely aware of changed priorities, multiple groups did not function without use-bia.

Anyway, blah, blah.  To the point, if there is one.  Due to a multiple VPN issue where we'd have both gateways establishing a session back to head-office it was decided not to load-balance the subnets on the other side of the VPN, so all clients must use the active group 1 virtual gateway 10.1.1.254 when sending traffic via VPN.  So how to achieve this?  Well, I thought, first I will put a static route on R2 pointing the VPN subnet at the group 1 address:

ip route 10.255.255.0 255.255.255.0 10.1.1.254

This works, but what happens when R2 becomes active for group 1?  The static route will then have a next hop of the router itself.  Would IOS ignore the static route?  I tried it.  The static route persisted, though note that I could not add additional static routes because of a next-hop error (also please ignore the next-hop fa 0/0 on the default route, I wouldn't normally...):

*Jun 19 22:59:54.587: %HSRP-5-STATECHANGE: FastEthernet0/1 Grp 1 state Standby -> Active

R2#show ip route

Gateway of last resort is 0.0.0.0 to network 0.0.0.0

     1.0.0.0/24 is subnetted, 1 subnets
C       1.1.1.0 is directly connected, FastEthernet0/0
     10.0.0.0/24 is subnetted, 2 subnets
S       10.255.255.0 [1/0] via 10.1.1.254
C       10.1.1.0 is directly connected, FastEthernet0/1

S*   0.0.0.0/0 is directly connected, FastEthernet0/0

R2(config)#ip route 4.4.4.4 255.255.255.255 10.1.1.254

%Invalid next hop address (it's this router)

Crucially, of course, it breaks the routing: this static route kills the path.  I really wanted IOS to be clever and ignore the command.  This was the whole point of the post; however, it's not all that great a point I suppose, just a curio.

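So anyway, what to do?  I wanted to resolve this with static routes, but this "faulty" static needs to be removed when the R1 outside interface is down.  Some fairly basic stuff could solve this.  So first an IP SLA job, with a static route on R2 for the R1 WAN address.  The host route sends the probe via R1's inside address rather than straight out of R2's own directly connected outside interface: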


ip route 1.1.1.1 255.255.255.255 10.1.1.1

ip sla 1
 icmp-echo 1.1.1.1 source-interface FastEthernet0/1
ip sla schedule 1 life forever start-time now
track 1 ip sla 1

Now a new static route for the VPN subnet tracking this SLA job:

ip route 10.255.255.0 255.255.255.0 10.1.1.254 track 1

And this of course worked fine, the static route was added and removed as required.  Now I need to think of better ways to do this, or is this the ultimate?  (Hahaha)
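
For anyone checking the same chain on their own kit, these are the show commands I would reach for on R2, working from the probe up to the route.  This is a rough sketch only: it assumes the SLA and track numbers used above, availability and output vary by IOS release, and I have left the output out rather than guess at it.

```
! Is the ICMP probe towards the R1 WAN address succeeding?
R2#show ip sla statistics 1
! Is track object 1 Up or Down as a result?
R2#show track 1
! Is the tracked static for the VPN subnet currently installed?
R2#show ip route 10.255.255.0
! Which router is active for each HSRP group right now?
R2#show standby brief
```

If the track object is Down, the static for 10.255.255.0 should be absent from the routing table, which is the behaviour we were after.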



Tuesday, 11 March 2014

SIP-UA Registration Alarm EEM Script Thing

Today I had yet another instance of an EEM script saving the day - or at least proposing a solution which might one day save the day.  I desperately wanted to monitor a sip-ua registration on a Cisco gateway; something was up with one of my gateways and it would spontaneously stop registering.  And not start again, which is deeply not funny when you are on call on GMT and they are in Australia.  The only fix was to remove and reapply the sip-ua config; the cause turned out to be an IOS bug, I believe.

However, going forward and moving on to the brighter, less-buggy future, how could we monitor the trunk going down if this happened again?  As it stood we relied on irate punters to sound the alarm - as we all know, being forewarned of an impending angry caller is far preferable to ignorance, as you can magic up brilliant sounding diagnostic measures in advance (reboot it).  Was there a MIB to show this status, however, so we could beat them to it?  I couldn't find one.  Enter EEM.

Step 1.  Use event manager to run the “show sip-ua register status” command and loop over each line of output looking for a regex match.  The output would be:

XX-XX-2911#show sip-ua register status
Line                             peer       expires(sec) reg survival P-Associ-URI
================================ ========== ============ === ======== ============
1XX437                           -1         2194         yes normal

So our script could look like this (if like me you have hashed it together from far more intelligent posts than this):

event manager applet trunk
event none
action 100 cli command "en"
action 200 cli command "show sip-ua register status"
action 250 foreach line "$_cli_result" "\n"
action 300  regexp "1XX437(.*) no " $line
action 400  if $_regexp_result eq "1"
action 600   syslog msg "The trunk is down"
action 700  end
action 800 end

Note the white spaces in action 300: unless you want to match the "no" of "normal", you are going to need them.  I would change the event to a cron timer running every so often; 5 minutes or so should do it.  To test the reverse, say the trunk is up, change the match in action 300 to “yes” and see the result:

XX-XX-2911#
Mar 12 06:27:20.812 WST: %HA_EM-6-LOG: trunk: The trunk is down  <- really means up


Finally, add in an action 601 to send the monitoring solution of your choice a custom trap if the result is positive, and yes, we have an alarm should the SIP trunk not be registered.  Happy days.
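
Putting the cron timer and the action 601 idea together, the applet might end up looking something like the sketch below.  Treat it as untested on my part: the timer name "trunk-check" and the trap text are made up for illustration, the five-minute cron entry is just the interval suggested above, and it assumes the gateway already has an snmp-server host configured and that "snmp-server enable traps event-manager" is available on your IOS release.

```
! Assumption: needed so that EEM-generated traps are actually sent
snmp-server enable traps event-manager
!
event manager applet trunk
 ! Run every 5 minutes instead of "event none" (timer name is hypothetical)
 event timer cron name trunk-check cron-entry "*/5 * * * *"
 action 100 cli command "en"
 action 200 cli command "show sip-ua register status"
 action 250 foreach line "$_cli_result" "\n"
 ! Trailing space after "no" avoids matching the "no" in "normal"
 action 300  regexp "1XX437(.*) no " "$line"
 action 400  if $_regexp_result eq "1"
 action 600   syslog msg "The trunk is down"
 ! Hypothetical trap text; the NMS needs to know how to interpret it
 action 601   snmp-trap strdata "SIP trunk 1XX437 not registered"
 action 700  end
 action 800 end
```

The syslog message is kept alongside the trap so there is still a local record if the trap never reaches the monitoring box.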