In the last post, I explained how to set up an anycast address on a loopback interface, and add a static route to the loopback.
Now, it’s entirely possible that you’d like to stop after the last entry. If this is a public-facing service, I’d actually recommend you stop there, as a mitigation against DoS attacks: if one of your anycast servers is taken out (let’s say it’s the closest server to Comcast), your router will continue to send traffic to the now-dead service, which prevents the nearby attackers from shifting their attack to the other anycast servers. In other words, a DoS attack could shut down one class of users while leaving everyone else unaffected, or end up split among a lot of different servers.
If you did fail over, then you’d set up a cascade failure similar to the one that took out the electrical grid: one server dies, and the traffic is shifted to the next-closest server, which dies and shifts the traffic over to the third server, and so on.
However, if this is an internal service, you probably do want to fail over: when one server is taken down for maintenance or whatever, users are automatically redirected via the network itself to the next-closest server, and if that server is offline for whatever reason, they should connect to the next-closest server, and so on.
In order to do that, you have to figure out a way to have your routing table influenced in some way by your application state.
There are two methods to accomplish this. The first is to run a routing daemon on your server that speaks one of the IGPs. So you have something like Quagga running on your server, and it talks to your router and exchanges routes. You then write a little monitoring script which watches your daemon and, when the daemon goes away, either tells Quagga to stop announcing a route to your anycast address or simply shuts Quagga down.
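As a rough illustration of that monitoring script, here is a minimal Python sketch. Everything in it is an assumption rather than part of the setup described here: the function names (`build_dns_query`, `dns_answers`, `reconcile`, `watch`) are hypothetical, the health check is a plain UDP DNS query for an A record, and the `announce`/`withdraw` callbacks are placeholders for whatever you use to make Quagga start or stop advertising the anycast route.

```python
import socket
import struct
import time
from typing import Callable, Optional

QUERY_ID = 0x1234  # arbitrary ID used to match the reply to our query


def build_dns_query(name: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS query packet asking for the A record of `name`."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def dns_answers(server: str, name: str, timeout: float = 2.0) -> bool:
    """Return True if `server` sends a matching, error-free DNS response."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_dns_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
    except OSError:  # covers timeouts and unreachable-host errors
        return False
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)  # QR bit
    rcode = flags & 0x000F              # 0 means "no error"
    return reply_id == QUERY_ID and is_response and rcode == 0


def reconcile(healthy: bool, announced: bool,
              announce: Callable[[], None],
              withdraw: Callable[[], None]) -> bool:
    """Align the route announcement with service health; return the new state."""
    if healthy and not announced:
        announce()
        return True
    if not healthy and announced:
        withdraw()
        return False
    return announced


def watch(server: str, name: str,
          announce: Callable[[], None], withdraw: Callable[[], None],
          interval: float = 3.0, max_checks: Optional[int] = None) -> None:
    """Poll the local DNS service and announce/withdraw the route as it changes."""
    announced = True  # assumption: the route is being announced at startup
    checks = 0
    while max_checks is None or checks < max_checks:
        announced = reconcile(dns_answers(server, name), announced,
                              announce, withdraw)
        checks += 1
        time.sleep(interval)
```

You would call something like `watch("127.0.0.1", "example.org", announce, withdraw)`, where the two callbacks wrap whatever mechanism you trust to change what Quagga advertises. Note that the script shares the weakness discussed next: it only works while the server itself is behaving.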
Meanwhile, your network engineer is off to the side, breathing into a paper bag, because when Quagga blows up, he will be the one who has to put your network back together. So, let’s look at a slightly less crazy way of controlling routes: one that doesn’t depend on your server behaving in order to handle the case when your server isn’t behaving 🙂
The less crazy way to do this is to use a pair of features that are present in Cisco routers: IP SLA and route tracking. For this I’ll assume you have a pair of servers attached to separate WS-C3750 router/switches (a fairly common 48-port switch which also has IPv4 routing hardware built in) in two different sites, connected by a 10Mbps Ethernet line.
So, let’s assume you’ve already got your static routes to the anycast address, and OSPF set up between your two routers:
    ! Router at the 10.30.30.0/24 site
    ip route 10.10.10.10 255.255.255.255 10.30.30.30 name dns.example.org
    !
    router ospf 10
     log-adjacency-changes
     router-id 10.30.30.0
     passive-interface default
     no passive-interface GigabitEthernet1/0/52
     network 10.30.30.0 0.0.0.255 area 0

    ! Router at the 10.20.20.0/24 site
    ip route 10.10.10.10 255.255.255.255 10.20.20.20 name dns.example.org
    !
    router ospf 10
     log-adjacency-changes
     router-id 10.20.20.0
     passive-interface default
     no passive-interface GigabitEthernet1/0/52
     network 10.20.20.0 0.0.0.255 area 0
The first thing to do after that is set up redistribution, so each site knows about the other site’s route to the DNS server:
    ! Router at the 10.30.30.0/24 site
    ip access-list standard static-to-ospf-list
     permit 10.10.10.10
    !
    route-map static-to-ospf-rmap 10
     match ip address static-to-ospf-list
    !
    router ospf 10
     log-adjacency-changes
     router-id 10.30.30.0
     redistribute static subnets metric-type 1 route-map static-to-ospf-rmap
     passive-interface default
     no passive-interface GigabitEthernet1/0/52
     network 10.30.30.0 0.0.0.255 area 0

    ! Router at the 10.20.20.0/24 site
    ip access-list standard static-to-ospf-list
     permit 10.10.10.10
    !
    route-map static-to-ospf-rmap 10
     match ip address static-to-ospf-list
    !
    router ospf 10
     log-adjacency-changes
     router-id 10.20.20.0
     redistribute static subnets metric-type 1 route-map static-to-ospf-rmap
     passive-interface default
     no passive-interface GigabitEthernet1/0/52
     network 10.20.20.0 0.0.0.255 area 0
The next step is to set up IP SLA on each router so it tests DNS:
    ! Router at the 10.30.30.0/24 site
    ip sla 10
     dns example.org name-server 10.30.30.30
     frequency 3
    ip sla schedule 10 life forever start-time now

    ! Router at the 10.20.20.0/24 site
    ip sla 10
     dns example.org name-server 10.20.20.20
     frequency 3
    ip sla schedule 10 life forever start-time now
So now you’ve got each router performing a DNS lookup for example.org every 3 seconds against the regular (unicast) IP address of its local server. The last step is to hook the static routes into the SLA monitors:
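    ! Router at the 10.30.30.0/24 site (IP SLA 10 probes 10.30.30.30)
    track 10 ip sla 10
    !
    ip route 10.10.10.10 255.255.255.255 10.30.30.30 name dns.example.org track 10

    ! Router at the 10.20.20.0/24 site (IP SLA 10 probes 10.20.20.20)
    track 10 ip sla 10
    !
    ip route 10.10.10.10 255.255.255.255 10.20.20.20 name dns.example.org track 10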
And that’s it. You’ve now configured your routers to watch the DNS service on each server’s physical interface, and to remove the static route to your anycast address from their route tables if they can’t properly query the address record for example.org. When that happens, the network will automatically shift the traffic to the anycast IP on the other server.
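To make the resulting behavior concrete, here is a small Python model of the two-site setup. It is only a sketch of the routing logic, not something that runs on the routers, and the site labels and function name are made up for illustration. It assumes exactly two sites and relies on the fact that a static route (administrative distance 1) is preferred over an OSPF-learned route (110) for as long as the static route is installed.

```python
from typing import Dict, Optional


def serving_site(router: str, sla_ok: Dict[str, bool]) -> Optional[str]:
    """Return the site whose server gets anycast traffic arriving at `router`.

    `sla_ok` maps each site label to the state of its track object, i.e.
    whether the local IP SLA DNS probe is currently succeeding.
    """
    if sla_ok[router]:
        # Track object is up: the local static route is installed and wins
        # over the OSPF external route learned from the other site.
        return router
    # Track object is down: the local static route is removed, so the only
    # remaining path is the other site's static route redistributed into OSPF.
    for other, ok in sla_ok.items():
        if other != router and ok:
            return other
    # Both probes are failing: neither router has a route to the anycast IP.
    return None


# Hypothetical site labels, named after the server subnets used above.
healthy = {"site-30": True, "site-20": True}
site_30_down = {"site-30": False, "site-20": True}
all_down = {"site-30": False, "site-20": False}

for state in (healthy, site_30_down, all_down):
    print({router: serving_site(router, state) for router in state})
```

In the healthy case each router keeps traffic local; with one probe failing, both routers send traffic to the surviving server; with both failing, the anycast address simply becomes unreachable.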
In my next post, I’ll cover some of the finer points of when and where you may want to use anycast services.
An excellent addition to this would be to use IP SLA to monitor the anycast address at each location in addition to the unicast address; then tie /that/ SLA instance into your monitoring & alerting system.
(Else you have no way of knowing that the customer-facing interface is actually working – any number of issues can make the unicast address test OK and yet fail to work with the anycast address).
@mibus: I don’t believe you could do that without also doing PBR to force router-originated traffic to the anycast IP towards the local server (though I’m open to alternative suggestions). Depending on what you’re doing with your anycast service, that could get real ugly, real fast (punting a 2GB netboot/install on a 3750? No thanks :-)).
Ah, but that’s why it’s for alerting, not routing. Then you don’t care that it goes to a local service, as long as it goes to one that works. Not awesome-automatic failover, but something extra above just unicast monitoring.
@mibus: You’re misunderstanding: IP SLA cannot know anything about the loopback interface directly, because the router needs an entry in its table to send packets for the loopback address to the local server. Unfortunately, if it has that route, it will route all traffic there (breaking failover and defeating the purpose). The only way around that is to use policy routing or, theoretically, a crazy NAT setup.
It doesn’t need to. I’m talking about using IP SLA to monitor the customer experience at each PoP. (else, how do you ensure that the customer’s DNS is working? Your monitoring server can only see the closest anycast node by that IP).
If a node is failed and the route is removed, this other check will continue to pass (by design), because it’s testing the customer’s experience.
If you don’t test the customer experience at each PoP, you can run into issues where it hasn’t failed over because the unicast queries work, but anycast ones don’t (for any reason).
(I should have noted before, I look after a DNS anycast ‘cluster’ for an ISP)
mibus: Ah, I see what you’re going for: a second SLA that points at the anycast IP to test DNS, i.e. using SLA for its obvious/intended purpose 🙂
I thought you meant always testing the local anycast IP and failing over in that case as well.
No; as you said that’s utterly impractical. But using IP SLA for.. SLA monitoring.. is not to be forgotten. (Remote outages being detected by customer complaints isn’t how you want to find that stuff out, when there’s such a good failover mechanism in place!).