Woke up this morning to a bunch of the following:
root@foo:/etc/bind# dig @1.2.3.4 foo.example.com
; <<>> DiG 9.6.1-P2 <<>> @1.2.3.4 foo.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 36121
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;;foo.example.com. IN A
;; Query time: 0 msec
;; SERVER: 1.2.3.4#53(1.2.3.4)
;; WHEN: Thu Apr 1 09:57:59 2010
;; MSG SIZE rcvd: 31
Some background on the fictitious "1.2.3.4". It's a slave name server in my nameserver "farm". Technically I have ns1 (being the master) and ns2/ns3. Currently ns1/ns2 are down for maintenance, so I left ns3 at it serving live traffic. That's the point, DNS is supposed to be resilient.
Now the odd part is, "1.2.3.4" was serving requests for example.com just fine for the last 4-5 days. This morning I get a phone call that it's non-responsive. After investigation I see the message you see above, SERVFAIL.
I looked into the zone file and saw the following:
example.com IN SOA ns1.example.com. hostmaster.mail.example.com. (
I wondered if at this point that the nameserver thought it was not authoritative over example.com and adjusted it to the following:
example.com IN SOA ns3.example.com. hostmaster.mail.example.com. (
After that, it started responding again for all authoritative queries for example.com. I have no idea why. I thought these things were supposed to be normalized upon zone transfer from ns1 - ns3?
Can someone please example why this happened and how to prevent it from happening in the future? I've never had a similar problem, and because I don't understand it well, I might be missing some critical information in this question. So please let me know if I can further add any detail to make things clearer as well.
One more thing to note: I have other domains that I'm authoritative for that have their SOA still saying ns1.example.com. and not ns3.example.com. Those domains are serving requests just fine! Is it a matter of time before they stop also and I have to change SOA to ns3.example.com? Is this also only required because ns1 and ns2 are currently offline?