
I'm experiencing an odd resolution issue on my DNS servers. I have a couple of decades of experience administering Windows DNS servers, but less than a year's experience administering Ubuntu/BIND 9 servers. A little background on my environment:

I work for a small service provider and administer three Ubuntu/BIND 9 servers, configured as a Master and two Slaves. All three servers have private IP addresses on a VLAN reserved for servers, with static NATs on our firewall to the Slaves. The Slaves are accessible from both our internal network and the public internet, but public recursion is limited to the IP subnets we host. The Master only allows access from the two Slaves and from our internal Management VLAN. The Master and Slave2 are Ubuntu 12.04 running BIND 9.8.1-P1. Slave1 is an older system, scheduled for replacement, running Ubuntu 9.04 and BIND 9.8.1-P1. I am seeing the same problem behavior on both Slaves. I built the Master and Slave2, and inherited Slave1 from a previous admin.

Here's the problem: If I run NSLOOKUP from a system on one of our hosted IP subnets for office365.com., I get a successful resolution. If I try to resolve outlook.office365.com. from the same system, I get the following error:

***UnKnown can't find outlook.office365.com.: Unspecified error

I can successfully resolve, through NSLOOKUP, both of those URLs from a system on the server VLAN and from the console of both slave servers. This problem was reported to me by a client who stated that he's seen this issue on a handful of URLs, but outlook.office365.com is the only one he could specifically remember. I've tried a number of other URLs and they all resolve successfully. I can only replicate the issue with that one URL. (Hopefully the client will remember some more.)

I set up a query.log, based on an article I found on this site, and I see the request come in regardless of where it originates.

Example:
client MYIPADDRESS#1067: query: outlook.office365.com IN A + (BINDSERVERIPADDRESS)
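
For comparing entries from working and failing clients, a line like the one above can be split into fields. This is only a minimal sketch in Python: the regular expression is modeled on the single example line shown here, the function name `parse_query_line` is made up for illustration, and the meaning of the flag letters (`+` for recursion desired, `E` for EDNS, `T` for TCP) is my understanding of BIND's query log format rather than something confirmed for this exact build.

```python
import re

# Pattern modeled on the example log line above; other BIND versions
# may add fields (for example a view name), so treat it as an assumption.
QUERY_RE = re.compile(
    r"client (?P<client>\S+)#(?P<port>\d+): query: "
    r"(?P<name>\S+) (?P<qclass>\S+) (?P<qtype>\S+) "
    r"(?P<flags>\S+)(?: \((?P<server>[^)]+)\))?"
)


def parse_query_line(line):
    """Return a dict of fields from one query log line, or None if no match."""
    match = QUERY_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["port"] = int(record["port"])
    # Flag interpretation is assumed: "+" = recursion desired,
    # "E" = EDNS used, "T" = query arrived over TCP.
    record["recursion_desired"] = record["flags"].startswith("+")
    record["edns"] = "E" in record["flags"]
    record["tcp"] = "T" in record["flags"]
    return record
```

Under that reading, the example entry has only `+` in the flags field, which would mean a recursive query sent over UDP without EDNS.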

If I change my DNS server to 8.8.8.8 or 4.2.2.2, the name resolves correctly, but adding both of those as forwarders on my BIND servers doesn't fix the problem. I've checked my syslog, but see no entries regarding that query that could offer clues. I also tried allowing recursion from "any", with the same result. I've also reviewed our firewall rule set and don't see anything that could account for this. It seems that if an access list were the problem, no DNS queries would work. Anyone have any ideas? Is there a way to log a reason for a query failure?
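
Since NSLOOKUP only reports "Unspecified error", one way to see more is to send the same query by hand and look at the response header. The following is a rough sketch in Python using only the standard library, not a tested diagnostic: the helper names `build_query`, `parse_header`, and `probe` are made up for illustration, and it assumes a plain UDP query without EDNS. It reports the response code, the truncation (TC) bit, the answer count, and the response size.

```python
import socket
import struct

# Common DNS response codes (RFC 1035).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, qid=0x1234, qtype=1):
    """Build a plain (non-EDNS) query for `name`; qtype 1 is an A record."""
    # Header: ID, flags (only RD set), one question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = [part for part in name.rstrip(".").split(".") if part]
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in labels)
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def parse_header(response):
    """Decode the 12-byte DNS header of a response."""
    qid, flags, _qd, an, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0x000F
    return {
        "id": qid,
        "truncated": bool(flags & 0x0200),  # TC bit
        "rcode": RCODES.get(rcode, str(rcode)),
        "answers": an,
        "size": len(response),
    }


def probe(server, name, timeout=3.0):
    """Send one UDP query to `server` on port 53 and summarize the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        data, _addr = sock.recvfrom(4096)
    return parse_header(data)
```

Calling `probe("8.8.8.8", "outlook.office365.com.")` and then the same against one of the Slaves, from a client on a hosted subnet, would show whether the two servers return different response codes or whether either reply comes back truncated.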

Johann
  • Are you willing to share any of your bind configuration? It would really help to pinpoint your problem. – earthmeLon Jan 06 '14 at 20:53
  • Sure. My named.conf.local is kind of long. Are there any specific files, or portions of files that you're looking for? I don't want to bombard you with a bunch of extraneous info. – Johann Jan 06 '14 at 21:04
  • just add the info as a link to pastebin.com – virtualxtc Jan 06 '14 at 21:10
  • I'm heading to an appointment and want to edit out site-specific info in the config files. I will have everything posted up in the morning and will update this thread. I will post my named.conf.local and my named.conf.options. Is there anything else that would be helpful? Thank you for your help! – Johann Jan 06 '14 at 21:28
  • Here's a link to my named.conf.options file. I'm redacting identifying info from my named.conf.local file and will have it up shortly. http://pastebin.com/8Pf7fJWq – Johann Jan 07 '14 at 15:08
  • And here's my named.conf.local: http://pastebin.com/GXKUJ0j3 There are 2 "NOTE" entries in it offering further explanation/clarification. Obviously, those notes aren't in the production file. – Johann Jan 07 '14 at 15:22
  • One more note: I've been at my current employer for a little over a year. We have over 150 clients, most of which use our DNS servers. This is the first, and only, complaint I've heard like this, so I think the number of URLs that have this problem constitutes a fraction of a percent of all requests. – Johann Jan 07 '14 at 15:33
  • UPDATE: OK, this is weird. I've been using an old Windows XP system for testing and have been able to consistently duplicate the issues I've outlined above on two separate Public DNS subnets. On a whim (desperately grasping at straws actually) I tried disabling "NetBIOS over TCP/IP"... and resolution for that URL in NSLOOKUP now works flawlessly, every time. I re-enabled the setting and the problem came back. Disabled it again and flawless operation. So now I'm wondering if this is a Windows issue and not a problem with my DNS, but why that SPECIFIC URL? – Johann Jan 10 '14 at 20:00
