Workaround for broken connection management in Exchange

For legacy reasons (don’t even ask…) we still have an old NLB-based Exchange 2010 mail server farm at work, with a CASArray consisting of two servers in front of a DAG cluster.

The interesting thing, of course, is that when one of the CASs fails, Outlook clients don’t automatically start using the other one as you’d expect in a sane system. And which Outlook clients stopped working seemed somewhat arbitrary.

A couple of minutes with my preferred search engine gave me the tools to show what’s wrong:

Get-MailboxDatabase | ft Identity, RpcClientAccessServer

Identity        RpcClientAccessServer
--------        ---------------------
Mailbox DB05    CAS1.tld
Mailbox DB03    CAS2.tld
...

The example output above shows that each database has a preferred CAS, which explains the apparent arbitrariness: clients whose mailboxes live in a database pointing at the failed CAS lose their connection, while everyone else carries on as usual.
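To see at a glance how many databases would be stranded if a given CAS went down, the same listing can be grouped by server. This is just a small sketch on top of the command above, using the standard Group-Object cmdlet; I haven’t tuned it for anything beyond a quick overview:

```powershell
# Count how many mailbox databases point at each CAS.
Get-MailboxDatabase |
    Group-Object RpcClientAccessServer |
    Format-Table Name, Count -AutoSize
```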

The funny thing is that even after an hour and a half, and long after NLB Manager had stopped presenting the second CAS in its GUI, Exchange hadn’t understood that one of the members of the CASArray was down. The workaround is to manually point each affected mailbox database at the healthy CAS:

Set-MailboxDatabase "Mailbox DB03" -RPCClientAccessServer CAS1.tld

Get-MailboxDatabase | ft Identity, RpcClientAccessServer

Identity        RpcClientAccessServer
--------        ---------------------
Mailbox DB05    CAS1.tld
Mailbox DB03    CAS1.tld
...
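With more than a couple of databases, repointing them one at a time gets old quickly. Below is a rough sketch of the same workaround applied in bulk. The variables $failedCas and $healthyCas are placeholders of mine, and the filter assumes the RpcClientAccessServer value compares as a plain FQDN string, as it appears in the output above. The -WhatIf switch means nothing is actually changed until you remove it:

```powershell
# Placeholder names; substitute the FQDNs from your own environment.
$failedCas  = 'CAS2.tld'
$healthyCas = 'CAS1.tld'

# Find every database still pointing at the failed CAS and repoint it.
# The [string] cast is a precaution in case the property is not a plain string.
Get-MailboxDatabase |
    Where-Object { [string]$_.RpcClientAccessServer -eq $failedCas } |
    Set-MailboxDatabase -RpcClientAccessServer $healthyCas -WhatIf
```

Once the failed CAS is back, swapping the two variables and running it again would undo the change, although that also moves any databases that originally pointed at the healthy CAS, so it is worth saving the original listing first.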

Fortunately it looks as though modern Exchange solutions with real load balancers in front of them don’t experience this issue.