Exchange 2010 doesn’t like Rainbow Trout.

I’ve just finished dealing with a case where, after applying Service Pack 2, one of my customers started experiencing frequent mass disconnections. When they examined the event logs they could see that the RPCClientAccessService was frequently crashing with event ID 4999 and text similar to:

M.E.RpcClientAccess.Service, M.E.D.S.M.F.Enumerator.MoveNext, System.InvalidOperationException,

They are using Riverbed Steelhead WAN optimizers to reduce their bandwidth requirement between branch offices and the main datacentres that house the Exchange servers. Riverbed have a knowledge base article that explains that, while many things may cause the CAS to log 4999 events, one possible cause is the Riverbed corrupting RPC/MAPI calls. In this instance the way to tell is by running

protocol mapi skip-copy enable

on the client-side and server-side Riverbed devices. This will disable acceleration for a specific RPC operation type (EcDoRpc Remote OpCode 0x4D – FXSrcCopyTo) of a specific Microsoft RPC operation (RPC OpNum 0x2 – EcDoRpc), which is used by some Microsoft Outlook plug-ins. It doesn’t disable ALL the MAPI optimisation. The article goes on to helpfully reiterate that there are many things that may cause this corruption (although I can’t think of any off the top of my head), and that if enabling this command *doesn’t* fix the problem, then you should log a call with PSS. I would link to the article, but I can’t, so you’ll have to ask Riverbed for it yourselves – it’s variously recorded as S18158 and S15635, and it’s called “Event ID 4999 occurs on exchange CAS server”.
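If you want something firmer than a gut feeling that the crashes have stopped, you can count the 4999 events on either side of the change. What follows is only a rough sketch: it assumes you have already got the events out of the log as (timestamp, event ID, message) tuples, which is a shape I have made up for illustration and not any real export format, and the function names are hypothetical too.

```python
from collections import Counter
from datetime import datetime


def crashes_per_day(events, event_id=4999, source_hint="RpcClientAccess"):
    """Count matching crash events per calendar day.

    `events` is an iterable of (timestamp, event_id, message) tuples.
    That tuple shape is an assumption for this sketch, not a real export format.
    """
    counts = Counter()
    for timestamp, eid, message in events:
        # Keep only the crash events whose text mentions the RPC Client Access service.
        if eid == event_id and source_hint in message:
            counts[timestamp.date()] += 1
    return counts


def split_by_change(counts, change_date):
    """Return (crashes before, crashes on or after) the day the setting was changed."""
    before = sum(n for day, n in counts.items() if day < change_date)
    after = sum(n for day, n in counts.items() if day >= change_date)
    return before, after


if __name__ == "__main__":
    # Made-up sample data, purely to show the shape of the input.
    sample = [
        (datetime(2013, 1, 1, 9, 0), 4999, "M.E.RpcClientAccess.Service, System.InvalidOperationException"),
        (datetime(2013, 1, 1, 10, 0), 4999, "M.E.RpcClientAccess.Service, System.InvalidOperationException"),
        (datetime(2013, 1, 3, 9, 0), 4999, "M.E.RpcClientAccess.Service, System.InvalidOperationException"),
    ]
    per_day = crashes_per_day(sample)
    print(split_by_change(per_day, datetime(2013, 1, 2).date()))
```

If the count after the change drops to nothing, the Riverbed was probably your culprit. If it doesn’t, the article’s advice stands and it’s time to log a call.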

So far, so what? This affects a veeery small proportion of people. Why bother writing about it? Because, dear reader, it highlights a much wider problem. Vendors are not like the Avengers, hanging round on some hovering sky platform with Samuel L. Jackson, oiling their muscles. They don’t talk to each other that much, mostly, and when they do it’s like cats in a bag. Microsoft do NOT test Exchange with third party products – it’s up to the third party vendor to join the Technology Adoption Programme, get hold of beta and pre-RTM software and make sure their stuff works as expected.

If the vendor can’t or won’t do it – WE are responsible for the service we provide to the customer. WE are responsible for making sure that the stuff we provide and maintain continues to work when we patch it – that might mean we have to ask vendors such as Symantec what effect going to a particular patch level will have on Enterprise Vault, say, or it might mean we just have to bite the bullet, build a lab, and test stuff for ourselves (incidentally, this is a really good reason to keep it simple, stupid).

OK, so we just sit on a level that we know works, then? NO, because sooner or later we will run out of vendor support. Microsoft stop supporting SP2 for Exchange 2010 in less than 6 months. After that time anyone wanting to log a call will be told they should be on SP3. And do you know what – they’re right. We should be on SP3*.

We are the guardians of complex heterogeneous environments. It’s time to man up.

*The good news is that once we’re on SP3, that’s likely the last service pack for 2010 – no more major upgrades until 2010 goes end-of-life in 2020. Lovely. Get the deck chairs out.

503 5.0.0 polite people say HELO