Monthly Archives: August 2012

hands up who’s got an all numerical domain suffix!

By “all numerical domain suffix”, I mean something like 1234.com, or server.567.pants.local. Does this mean you? Got Exchange 2010? Wanna buy a DAG?

Cue maniacal laughter.
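
To pin down what “all numerical” means here, a rough Python sketch follows. It assumes the problem case is any label after the host name that is made up entirely of digits, which matches the examples above but is only a guess at what Exchange actually objects to. The function names are made up for illustration.

```python
def numeric_labels(domain_suffix: str) -> list[str]:
    """Return the labels in a domain suffix that consist only of ASCII digits."""
    labels = domain_suffix.strip(".").split(".")
    return [label for label in labels if label.isascii() and label.isdigit()]


def has_all_numeric_suffix_label(fqdn: str) -> bool:
    """True if any label after the host name is purely numeric."""
    # Everything after the first dot is treated as the domain suffix.
    _host, _, suffix = fqdn.partition(".")
    return bool(numeric_labels(suffix))


if __name__ == "__main__":
    for name in ("blahblahblah.1234.com", "server.567.pants.local", "mail.example.com"):
        print(name, has_all_numeric_suffix_label(name))
```

Only the suffix is checked, so a numeric host name on a normal domain (1234.example.com) is not flagged.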

If you have an all numerical domain suffix, and you try to implement Database Availability Groups in Exchange 2010, you will probably experience a fail:

“blahblahblah.1234.com is not an acceptable witness server name. You must specify a host name or fully-qualifed domain name for the witness server.
Parameter name: FileShareWitnessServerName”.

And in the log there’ll be something similar to this:

[2012-08-13T010:38:27] WriteError! Exception = Microsoft.Exchange.Management.Tasks.DagFswUnableToParseWitnessServerNameException: The task was unable to use the specified witness server. Error: “blahblahblah.1234.com” is not an acceptable witness server name. You must specify a host name or fully-qualifed domain name for the witness server.
Parameter name: FileShareWitnessServerName
at Microsoft.Exchange.Management.Common.FileShareWitness.Initialize()
at Microsoft.Exchange.Management.SystemConfigurationTasks.AddDatabaseAvailabilityGroupServer.CheckFswSettings()

Ouch. Then you’ll get all distressed and google the error, and you’ll probably find that there is next to nothing on it, because no-one uses all numeric domain suffixes (suffices?). There are a couple of threads on TechNet, which are confusing, and that’s your lot, pretty much.

Until June, anyway. Scott Schnoll did a marvellous piece at TechEd 2012 which includes an explanation of the error (it’s a bug…), when it will be fixed (no date as yet…) and how to fix it (use a different server). Importantly, he also mentioned how NOT to fix it (CNAME). The bit you’re after is about 55 minutes in. You’ll probably want to listen carefully to the section that follows it as well, as it explains a far more common problem that you will probably hit once you’ve got a machine that you can use as a file share witness (FSW).

Mr Schnoll:

http://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/EXL305

Other ways to fix it that don’t work:

Editing hosts files to point a spurious FQDN at the real IP address

The two methods we came up with:

1 – The one we used:

  1. Created a new server (DAGWitness) and added it to the 1234.com domain
  2. Changed the FQDN of DAGWitness to DAGWitness.dag.local
  3. Added the Exchange Trusted Subsystem group to the DAGWitness\Administrators group
  4. Added <ip addresses of DAGWitness>  DAGWitness.dag.local to the mailbox servers’ hosts files.
  5. Unchecked “Register this connection’s address in DNS” in the IPv4 properties (you may have to delete the DNS record if it has already been created)
  6. Disabled the Windows Firewall on the DAGWitness server.
  7. Created DAG.
  8. Added DAG members.
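
Step 4 leaves an otherwise unexplained line in each mailbox server’s hosts file. Below is a minimal Python sketch of adding such an entry together with a comment saying why it is there. It only works on the text of a hosts file (on Windows that lives at C:\Windows\System32\drivers\etc\hosts) and leaves reading and writing the file to you. The function name, the 192.0.2.10 address and the comment text are placeholders, not anything from the real setup.

```python
def add_hosts_entry(hosts_text: str, ip: str, name: str, reason: str) -> str:
    """Return hosts_text with 'ip name # reason' appended, unless name is already mapped."""
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into address + host names.
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2 and name.lower() in (n.lower() for n in entry[1:]):
            return hosts_text  # already present, leave the text alone
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip}\t{name}\t# {reason}\n"


if __name__ == "__main__":
    before = "127.0.0.1\tlocalhost\n"
    after = add_hosts_entry(
        before,
        "192.0.2.10",  # placeholder address for DAGWitness
        "DAGWitness.dag.local",
        "FSW for the DAG, workaround for the numeric domain suffix bug",
    )
    print(after, end="")
```

Running it twice with the same name changes nothing the second time, so it is safe to repeat on every mailbox server.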

2 – My solution. I suspect this is far more robust:

  1. Create a new server in a workgroup
  2. Install DNS services
  3. Create a new domain with forwarders for AD
  4. Create forwarders for new domain in AD
  5. Fix permissions on the new server so the Exchange Trusted Subsystem (ETS) group can create the share
  6. Create DAG

Will it work? Who knows – I have suspicions about the first method, and manually editing hosts files always sends a shiver down my spine. What happens in five years’ time when everyone who knows about the entry is gone? However, it got the DAG installed, and, as this is a three-node DAG, I’m not sure anyone cares too much about the FSW anyway. Not until the primary site fails, anyway…

At what point can I blame the storage?

I quite often find myself wishing a problem would go away. I get desperate to hand off the whole thing to, say, a SAN engineer and be done with it. At what point then is it possible, when troubleshooting poor storage latency, to do this?

Say I’m looking at some pretty shoddy secs/write on a LUN. I can say to the storage guy “I don’t like your storage”. He then tells me that the secs/transaction he sees are fine. Is there any way to get just a little bit more info out of the box to narrow down the problem (or slope my shoulders just a little bit more)?

Yes, there is.

http://blogs.msdn.com/b/ntdebugging/archive/2010/04/22/etw-storport.aspx

As far as I can make out, if you follow that article you can get a measure of how long it takes the storport driver to get a response from EVERYTHING ELSE, meaning everything below it. So, if you’re seeing 50ms/read in perfmon, but storport never takes more than 30ms to service a request, the extra time is being added on the host, and you might want to make sure you’ve got the latest version of storport on your system. Conversely, if (like me) you’re seeing 500ms request times in storport, you get together with Mr Storage and go yell at the network elves.
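
The comparison above can be sketched in a few lines of Python. This is only an illustration of the reasoning: the function name and both “acceptable” thresholds are invented, and the two inputs are numbers you would read off perfmon and the storport trace yourself.

```python
def split_latency(perfmon_ms: float, storport_ms: float,
                  host_ok_ms: float = 5.0, storage_ok_ms: float = 30.0) -> dict:
    """Split the latency perfmon reports into a host-side part and a below-storport part."""
    # Whatever perfmon sees on top of the storport figure was added on the host.
    host_side_ms = max(perfmon_ms - storport_ms, 0.0)
    suspects = []
    if host_side_ms > host_ok_ms:
        suspects.append("host")
    if storport_ms > storage_ok_ms:
        suspects.append("below storport")
    return {
        "host_side_ms": host_side_ms,
        "below_storport_ms": storport_ms,
        "suspects": suspects,
    }


if __name__ == "__main__":
    print(split_latency(50, 30))    # the 50ms in perfmon vs 30ms in storport case
    print(split_latency(502, 500))  # 500ms in storport; the perfmon figure is assumed
```

With the made-up defaults, the first case points at the host and the second at whatever sits below storport.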

Dear God, why is the Management Console so slow?

I’ve got an Exchange 2010 lab. (Obviously I’ve got an Exchange 2007 lab, and a 2003 lab as well. I also have an Exchange 2000 lab and a 5.5 lab, but last year I got rid of my 5.0 lab. I felt a sense of accomplishment.)
My Exchange 2010 lab is on VMware, and it’s isolated from the internet because we don’t have that many IPv4 addresses to go around. If only someone would come up with a new form of IP addressing. When I open the console or the shell it is so sloooow. So very, very slow.
Last week, for a different problem, I had cause to look at this article:
http://support.microsoft.com/kb/2469863
which says that a failure to contact crl.microsoft.com will cause Exchange Search to time out and silently fail. It’s not just Search, though, that tries to contact that URL. Lots of .NET stuff sends Windows there to validate origin, including, it appears, the Management Console.
So I added 127.0.0.1 crl.microsoft.com to my hosts file, and bingo! No more hanging around for five minutes waiting for the Console to start.
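
To check whether a box is in the same boat, a small Python sketch can time a connection attempt to crl.microsoft.com. It assumes the CRL is fetched over plain HTTP on port 80, and the function name and the timeout are arbitrary choices.

```python
import socket
import time


def time_connect(host: str, port: int = 80, timeout: float = 5.0):
    """Try a TCP connection; return (connected, seconds_taken)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            connected = True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        connected = False
    return connected, time.monotonic() - start


if __name__ == "__main__":
    ok, took = time_connect("crl.microsoft.com")
    print(f"crl.microsoft.com reachable={ok} after {took:.1f}s")
```

A slow failure here suggests the machine is stuck waiting on the CRL check; a quick failure (or a success) suggests the slowness is coming from somewhere else.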