Tag Archives: exchange 2010

Interesting things that I have seen on the internet, October 3rd

Right, this is hopefully a little more timely than the last one. Those of you sitting an MCP exam at a Pearson VUE testing centre may notice that the interface is slightly different. Hopefully this will help you feel good about your exam. Probably not as good as passing the thing, though.

 

Exchange Troubleshooting:

Short and sweet – how to check the autodiscover SRV record using NSLookup, from Rhoderick Milne
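As a rough illustration of that check (this is not taken from the linked article, and fabrikam.com is only a placeholder domain), the SRV lookup can be run from a PowerShell or command prompt:

```powershell
# Query the Autodiscover SRV record for a domain.
# fabrikam.com is a placeholder; substitute your own SMTP domain.
nslookup -type=srv _autodiscover._tcp.fabrikam.com
```

If the record exists, the answer should name the target host and the port it is published on; if it does not, nslookup reports that the name cannot be found.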

Having problems accessing automapped mailboxes in Exchange 2010 recently? Have a look at this article.

New! Exciting! An Exchange 2013 CU6 bug (sorry, design feature)! Load balancer marks Exchange server as down in an Exchange Server 2013 Cumulative Update 6 environment.

Can’t create an Exchange 2013 public folder mailbox? “An existing Public Folder deployment has been detected” error when you try to create a public folder mailbox in Exchange Server 2013.

 

Exchange General:

Good news, bad news. Good news; my friend Justin Harris has earned a “2014 Microsoft Exchange Server MVP” Award. Thoroughly deserved! Congratulations Justin. He does an excellent podcast with Larry Novak, a great Exchange engineer at Microsoft.

Bad news; Microsoft has laid off senior technical writers. I can’t see how this is possibly a good thing. The writers in question produce some of the most authoritative and in-depth articles on Exchange available. Exchange 2010 has been really well documented, Exchange 2013 less so (where is all the performance monitoring stuff, for instance?) – it looks like 2015 will be barely detailed.

 

Core General:

Some interesting and useful information from the Defrag show on the latest and greatest Microsoft product; not Sway, not Windows 10, but Minecraft. I can’t begin to tell you how excited my boys are that their father is now officially a Minecraft support engineer. There’s also some stuff on Windows perfmon counters for HDDs, and yes, some stuff on Windows 10. Who cares? “It looks like you’re building an underground labyrinth filled with zombies. Would you like help?”

[Image: minecraftclippy]

Clippy has his own Realm, where his army of countless slaves build mile high idols in his image.

Also very exciting (if you support stuff) is Mark Russinovich discussing SysMon on the defrag tool show. Also his latest novel, Rogue Code.

Keith Mayer has advice on using Azure to look at the Windows 10 technical preview here, if that’s what floats your boat.

The recommended hotfixes for 2008 R2 clusters article has been updated.

 

Office 365:

Free/busy lookups between Exchange Online and on-premises users stop working after you set up OAuth authentication. You’ll need (it says) to have a mix of Ex2k13 and Ex2k10 on prem, so hopefully it won’t be a problem, but it’s worth checking for if you see any issues with free/busy and OAuth.

The EHLO blog has an article on the new bulk email feature in Exchange Online Protection. If you’re at all interested in how Microsoft handles your spam, then you may enjoy the linked video – “How does Microsoft handle my spam?”

I know there have been a few comments about the wealth of material available for the Office 365 exams. Just when you think you can’t possibly fit any more in, along comes the official Microsoft Learning Study Group for MCSA: Office 365.

Once again, Microsoft would like to reassure us all that Office 365 does not mean that we will all lose our jobs. Ummm. More kool-aid here. Strangely, they never wrote a part 2.

How to enable a debug trace for the Microsoft Online Services Sign-in Assistant, but not how to analyze it. Sigh.

Troubleshoot single sign-on setup issues in Office 365, Windows Intune, or Azure.

Refreshed advice on Using WAN Optimization Controller devices with Office 365. That’s those riverbed steelhead things… Tricksy.

Mixing Office 2007 and Office 365 causes problems editing Office documents in OWA.

Troubleshooting Lync Online DNS configuration issues in Office 365.

Troubleshooting *more* sign-in issues in Office 365, Azure or Windows Intune.

A List of Attributes that are Synced by the Windows Azure Active Directory Sync Tool.

 

Lync:

Having problems with Lync after migrating your users to Office 365? Richard Brynteson explains how to force Lync to autodiscover again.

Troubleshooting Lync Sign in issues. Not new, but good. Plus it was linked to in this excellent article that explains a bit about the troubleshooting process…

 

And finally, those crazy cats at MSL have another video for you. If you can bear it, see super sigma and psychomagician explain how online proctored exams work. What, girls? No hats?

 

That’s it for now. Keep an eye out for that creeper.

Interesting things that I see on the internet, 19th May

I know, so soon? These things are always too long, so I’m going to try and get them out more frequently, so people don’t give up after the first four items.

I also wanted to share with you an interesting post on being a career hermit crab. In among all the good advice Ashley has for those who are both technically able and hate chasing other people for their timesheets there are two things that really stand out for me:

  1. Learn to code
  2. Learn PowerShell DSC

If you’ve not considered the former, and you’ve never heard of the latter, you might want to explore the possibilities.

Anyway, on to the meat. This is what you come here for, right?

 

Exchange Design:

“What BDMs and architects need to know about Exchange Online and Exchange Server deployments”. In a poster. I assume that’s not *everything* architects need to know. Good work Microsoft, on reinforcing stereotypes… 😀

Paul Robichaux has written a post about running Exchange on Azure, and why it is a bad thing.

 

Exchange Troubleshooting:

A one-hour webcast on troubleshooting ActiveSync. It’s due on May 20th, but will be available for download shortly after, if you can’t make it.

 

Exchange General:

Ross Smith IV has published an article on the upcoming changes in OAB that we can expect in Ex2013 CU5. When’s CU5 out, btw, Nick? No official date yet, but I’d put money on May 27th at the earliest. Will there be an Exchange 2010 RU? I hope so…

Michel de Rooij, UC architect and MVP, has a slide deck here on the things he found interesting and useful at MEC.

Steve Goodman (another UC architect and MVP) has written a good explanation on his blog about why it’s not worth spending a ton of money on storage.  He’s also posted the slides from the recent Office 365 UK Midlands User group meeting if you’re interested.

 

Core General:

There have been a whole load of hotfixes published just recently for Windows 2008 and 2012. Some highlights include the ability to use a range of ports for the UDP comms in a failover cluster, instead of just port 3343, long certificate authority hostnames, a memory leak in Network Store Interface Service, a web client service cookie fix, XML errors due to Audit Event 4661, an interesting CRL related hotfix that requires careful thinking about before applying, NetLogon 3210 events, stop 50 errors in remote desktop sessions, yet another fix for multiple authentication prompt problems, iSCSI stress testing causes your computer to give up, new HBAs cause Windows 2008 R2 to crash, system state backups fail, Pass the Hash vulnerability, group policy preferences allow elevated privilege attack, MS14-027: Vulnerability in Windows shell handler could allow elevation of privilege, retrieval of paged results is interrupted when an LDAP server receives queries that generate many results and finally Vulnerabilities in iSCSI could allow denial of service. Phew.

A marvellous article on cleaning up the WinSxS directory by Charity Shelburne on the AskPFE blog. I’m sure I’m not the only one who has computer semiliterate friends and relatives who ask “why can’t I just delete it? It’s got 7 drokking gig!”

Channel 9 are far quicker at getting the TechEd videos up than they were with Lync or MEC. Sigh. </poor relations>. Here’s the keynote, one on cloud for IT professionals featuring a man in a hat, indoors, (!) and an actually decent one on PowerShell with Don Jones. I’ve not watched the keynote, as it’s two hours long.

 

Office 365:

A Windows hotfix to address an apparent Office 365 problem; Outlook may take two to three minutes to connect to an Office 365 mailbox.

This article has big pictures and friendly colours. I find this helpful. Choosing a sign-in model for Office 365.

MSExchange.org has just started a three part series on multifactor authN for Office 365. This is part 1. Parts two and three over the next couple of weeks, I expect.

Tony Redmond has a nice op-ed piece comparing Gmail and Office 365, here. Lots of good links in there, too.

Your cluster log is TINY, and the source of much amusement.

I quite often get calls logged asking for help understanding why the active copy of a DAG database moves from one server to another. There can be a number of reasons for this, not all of them particularly well recorded in the event logs – a favourite is the DAG networks not being collapsed when they span sites, and therefore different subnets, but that’s not what I wanted to write about.

Quite often, the best way to understand what happened is to go through the failover cluster log – if you’ve not looked at this log before, I urge you to try it, particularly if you suffer from insomnia. In Windows 2008 R2 you can have a look at it by running get-clusterlog -destination <location> in PowerShell.

A normal cluster log would look something like this:

000016c0.0000162c::2014/03/12-12:15:15.892 INFO  [GUM] Node 2: Processing RequestLock 4:689542
000016c0.00003dcc::2014/03/12-12:15:15.892 INFO  [GUM] Node 2: Processing GrantLock to 4 (sent by 1 gumid: 6354235)
000016c0.000015b4::2014/03/12-12:15:23.192 INFO  [GUM] Node 2: Processing RequestLock 2:144215
000016c0.0000162c::2014/03/12-12:15:23.192 INFO  [GUM] Node 2: Processing GrantLock to 2 (sent by 4 gumid: 6354236)

With a couple of events every few seconds. At this rate of generation, the default log size of 100MB is usually enough for about 24 hours’ worth of events. However, say you have a problem (like DAG networks not being collapsed correctly, as below*)? Then your log may look more like this:

000018bc.00001998::2014/02/13-11:53:54.854 DBG   [NETFTAPI] Signaled NetftRemoteUnreachable  event, local address xxx.xxx.41.xxx:003853 remote address xxx.xxx.141.xxx:003853
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [IM] got event: Remote endpoint xxx.xxx.141.xxx:~3343~ unreachable from xxx.xxx.41.xxx:~3343~
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [IM] Marking Route from xxx.xxx.41.xxx:~3343~ to xxx.xxx.141.xxx:~3343~ as down
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [NDP] Checking to see if all routes for route (virtual) local fe80::b8ac:d730:1392:4e4d:~0~ to remote fe80::698d:34a4:a5c9:2e77:~0~ are down
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [NDP] Route local xxx.xxx.201.xxx:~3343~ to remote xxx.xxx.202.xxx:~3343~ is up
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [IM] Adding information for route Route from local xxx.xxx.41.xxx:~0~ to remote xxx.xxx.141.xxx:~0~, status: true, attributes: 0
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [IM] Adding information for route Route from local xxx.xxx.41.xxx:~0~ to remote xxx.xxx.141.xxx:~0~, status: false, attributes: 0
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [IM] Sending connectivity report to leader (node 2): <class mscs::InterfaceReport>
000018bc.0000199c::2014/02/13-11:53:54.854 INFO    <fromInterface>d8430531-25e6-4749-8b1d-2bf5f06da430</fromInterface>
000018bc.0000199c::2014/02/13-11:53:54.854 INFO    <upInterfaces><vector len='2'>
000018bc.0000199c::2014/02/13-11:53:54.854 INFO      <item>d8430531-25e6-4749-8b1d-2bf5f06da430</item>
000018bc.0000199c::2014/02/13-11:53:54.854 INFO      <item>62a2fefa-9b12-436d-a270-fec45ee86d23</item>
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  </vector>
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  </upInterfaces>
000018bc.0000199c::2014/02/13-11:53:54.854 INFO    <downInterfaces><vector len='1'>
000018bc.0000199c::2014/02/13-11:53:54.854 INFO      <item>c16aa803-1446-41d0-8b1f-338a6093ec37</item>
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  </vector>
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  </downInterfaces>

As you can see, the rate of entry generation has increased dramatically. In this particular example the default log size of 100MB covers approximately fifteen MINUTES. It would be a good idea, then, to increase the cluster log size from the default of 100MB to a larger number. 400MB is quoted in some of the literature, although not particularly strongly. The best article on this suggests 72 hours of log data should be kept; however, in my experience the maximum log size of 1GB can sometimes only hold 12 hours of data. This is the best article, by the way. It also contains instructions for setting the cluster log size in Windows 2008. For 2008 R2, use set-clusterlog -size 1024 (the size is given in MB).
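To find out how much time your own log actually covers, one option is to compare the first and last timestamps in a generated log file. The following is a minimal sketch, not a built-in tool: Get-ClusterLogSpanHours is a hypothetical helper name, and it assumes the timestamp layout shown in the excerpts above.

```powershell
# Sketch: estimate how many hours a generated cluster log covers.
# Get-ClusterLogSpanHours is a hypothetical helper, not a built-in cmdlet.
function Get-ClusterLogSpanHours {
    param([string[]]$Lines)

    # Matches the "::yyyy/MM/dd-HH:mm:ss.fff " part of each log entry.
    $pattern = '::(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s'
    $culture = [System.Globalization.CultureInfo]::InvariantCulture

    # Keep only lines that carry a timestamp, and parse each one.
    $stamps = foreach ($line in $Lines) {
        if ($line -match $pattern) {
            [datetime]::ParseExact($Matches[1], 'yyyy/MM/dd-HH:mm:ss.fff', $culture)
        }
    }
    if (-not $stamps) { return 0 }

    # Difference between the newest and oldest entry, in hours.
    $sorted = @($stamps | Sort-Object)
    ($sorted[-1] - $sorted[0]).TotalHours
}

# Usage (the path is an assumption; point it at a log file you have generated):
# Get-ClusterLogSpanHours -Lines (Get-Content 'C:\Temp\ClusterLogs\node1_cluster.log')
```

If a full log only spans a few hours, that is a fair hint that the log size needs raising, or that something is generating far more entries than it should. Reading a very large log line by line is slow, so treat this as a one-off check rather than something to schedule.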

But nick, I can’t run get-clusterlog?

You need to import the failover clustering module

Start powershell as an administrator

Run import-module failoverclusters

And Bob’s your uncle.
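Put together, the whole sequence might look like the sketch below. The destination folder is a placeholder, and I am assuming it already exists.

```powershell
# Run from an elevated PowerShell session on a cluster node (Windows 2008 R2).
Import-Module FailoverClusters

# Generate the cluster logs into a folder. The path is a placeholder.
Get-ClusterLog -Destination 'C:\Temp\ClusterLogs'

# Raise the maximum log size (the value is in MB), then confirm the setting.
Set-ClusterLog -Size 1024
Get-Cluster | Format-List Name, ClusterLogSize
```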

Oh, and how do I know that DAG networks aren’t collapsed? Well, first of all I can see there is a problem replicating across the nominated repl network:

000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [IM] got event: Remote endpoint xxx.xxx.141.xxx:~3343~ unreachable from xxx.xxx.41.xxx:~3343~
000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [IM] Marking Route from xxx.xxx.41.xxx:~3343~ to xxx.xxx.141.xxx:~3343~ as down

The cluster then checks that all possible paths are down:

000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [NDP] Checking to see if all routes for route (virtual) local fe80::b8ac:d730:1392:4e4d:~0~ to remote fe80::698d:34a4:a5c9:2e77:~0~ are down

It is thrilled to see it can get there along another network:

000018bc.0000199c::2014/02/13-11:53:54.854 INFO  [NDP] Route local xxx.xxx.201.xxx:~3343~ to remote xxx.xxx.202.xxx:~3343~ is up

If we run get-databaseavailabilitygroupnetwork then we can see there are six networks for this DAG, which is four too many. The six networks are two MAPI networks (one for each subnet, one subnet per physical AD site), which need collapsing, two replication networks which also need collapsing and two backup networks which need to be excluded from the DAG altogether. For more on sorting your DAG networks out, please see this article from Tim McMichael.
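As a rough sketch of what the clean-up involves (the DAG name, network names and subnets below are all placeholders, and this is not lifted from Tim’s article), collapsing means giving one DAG network both subnets, and excluding means telling the DAG to ignore a network:

```powershell
# List every DAG network with the properties that matter when collapsing.
Get-DatabaseAvailabilityGroupNetwork |
    Format-List Identity, Subnets, MapiAccessEnabled, ReplicationEnabled, IgnoreNetwork

# Collapse two per-site MAPI networks into one by assigning both subnets
# to a single network. DAG1, MapiDagNetwork and the subnets are placeholders.
Set-DatabaseAvailabilityGroupNetwork -Identity 'DAG1\MapiDagNetwork' `
    -Subnets '10.0.41.0/24', '10.0.141.0/24'

# Exclude a backup network from the DAG altogether.
Set-DatabaseAvailabilityGroupNetwork -Identity 'DAG1\BackupNetwork' -IgnoreNetwork:$true
```

Once a network has had its subnets moved elsewhere it is left empty, and can be tidied up with Remove-DatabaseAvailabilityGroupNetwork. Run the first command again afterwards to check you are down to the number of networks you expected.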

Interesting things I see on the internet – 27/01/2014

First of all, you should all be planning your SP3 upgrades, if you haven’t started already. MSExchange.org are starting a new series this week on this very topic, so as well as reading my earlier blog post on this, you should read their article as well.

Exchange design:

Here are some nice test lab guides/posters on cross product solutions with Exchange, Lync and SharePoint, and here’s a brief (very brief) article on setting up an Exchange 2013 lab from Steve Goodman.

Exchange troubleshooting:

I’ve seen a wonderful script for troubleshooting unexpected database growth. This script will snapshot a database and compare it to previous snapshots, and then tell you which mailbox is growing, by how much and by how many items. Like using ExMon, but about a million times easier. I heartily recommend that everyone has a good play with this, so that when you come to use it in anger (and you probably will), you know exactly what you’re doing with it.

This looks like it may save some of you some pain in the near future; the right way to create additional receive connectors in Exchange 2013.

An old post, but an interesting one – do you have sleepy NICs? A common cause of databases moving around unexpectedly in a DAG. We’ve got a couple of customers experiencing this, and we’ve checked that this isn’t the case, but it would be great if people would check again. This is one for the best practices document, I think.

The Romanian Exchange support engineers have suddenly become active on their blog, after years of very occasional posting. There’s a couple of pretty detailed posts covering some interesting troubleshooting issues up there at the moment. The mailflow troubleshooting guide is good, if brief.

Rhoderick Milne has a great post on mailbox quarantine that’s got some great hands on advice on configuration, which may explain to you why some users cannot reach their mailboxes.

Exchange general:

Here’s an interesting article on the new restrictions on upgrading the database schema in Exchange 2013. Note that’s the DATABASE SCHEMA, not the DIRECTORY SCHEMA. In Exchange 2010 the database schema of each DAG member upgraded as soon as the service pack/rollup was applied, making it tricky to then move databases around until all the nodes were on the correct version. In 2013 the schema won’t update until all members of the DAG are at the correct software level to support the updated version. Why is this good? Because it means we are less likely to get in a situation where nodes get stuck on the wrong version and are unable to support an active database. It does happen. Twice, in my experience – 2007 was particularly prone to it.

As usual Tony Redmond has a batch of interesting posts; how Exchange 2013 measures and monitors server health, ten predictions for Exchange in 2014, calculating client access licenses, service packs and cumulative updates, and the reappearance of PowerShell command logging. Yay. There’s a lot of other posts there, all of which are great. You should read them.

Here’s a great way to start reporting message flow statistics – how many messages are generated, NDRs and so on, using the ExLogAnalyzer.

The Redmond Interoperability Plugfest 2013 has a video on MAPIHttp, the replacement for RPC over HTTP which I mentioned in the last mail. There’s also one on Exchange 2013 protocols and one on Outlook 2013 protocols. They’re not long, which is probably just as well. Soo… sleeepy…

MSExchange.org are starting a new series on monitoring Exchange 2013 with SCOM 2012. Given that we are sooo up on monitoring Exchange 2010 with SCOM 2007 R2, we should probably start reading this now, yes?

They’ve also got a new series on transport high availability in Exchange 2013, which looks like it may be useful – shadow redundancy and Safety Net are in there, somewhere.

Here’s a really nice script for automating Exchange mailbox audit logging. Remember to keep an eye on your disk space. What does the mailbox audit log contain? Who accessed a mailbox, what was deleted, if mail was sent using a “send as” permission and lots more. Of course you want to keep this information.
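This is not the linked script, just a minimal sketch of the underlying cmdlets it automates, assuming Exchange 2010 SP1 or later. The mailbox name and retention period are placeholders.

```powershell
# Turn on audit logging for one mailbox and keep entries for 90 days.
# 'jsmith' and the retention period are placeholders.
Set-Mailbox -Identity 'jsmith' -AuditEnabled $true -AuditLogAgeLimit '90.00:00:00'

# Check which mailboxes currently have auditing switched on.
Get-Mailbox -ResultSize Unlimited |
    Where-Object { $_.AuditEnabled } |
    Select-Object Name, AuditEnabled, AuditLogAgeLimit
```

The longer the age limit, the more space the audit entries take up in each mailbox, which is where the disk space warning comes from.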

Tony Redmond (again?) has published an article on managing ActiveSync partnerships for multiple devices on his personal blog.

Core general:

Perfmon incorrectly calculates disk latency in Windows Server 2008. If we don’t apply this hotfix, then we can’t trust what perfmon is telling us; given that disk latency is a major cause of poor user experience, you really need to get this installed.

MS have published a complete and updated list on Microsoft product virtualization – what is supported and what is not, here.

There is a new MATS tool released for analysing the storage stack on Windows 2012 and 2008. What’s MATS?

It’s not great, yet, but it’s highly promising – the solutions node in technet. How-to guides for all things, eventually. In the meantime, single sign on federation in hybrid environments…

Office 365:

Ali Larter demonstrates how Office 365 stops her from trashing banks, cop cars, hotel rooms and so on. Save the cheerleader and that.

MS is developing a series of Test Lab Guides on Hybrid solutions – the Office 365 trial subscription guide is discussed here. Is it great? Well no, but it’s just part of the whole hybrid stack of test lab guides – see also the solutions stuff above – if MS manage to pull this off, it’ll be awesome. If they don’t, well, hopefully it will have given you some ideas. Here is the first step on the stack, the Windows 2012 configuration test lab for public cloud technologies.

The EHLO blog has some useful, if basic, guided walkthroughs on mailbox and folder sharing scenarios. It’s Nino Bilic – if he thinks it’s worth writing about, it’s probably worth you reading about it.

It’s Windows Azure jump start week at the Microsoft Virtual Academy. Live video from 8pm til midnight every night from now until Friday. Or you can wait a week or so until the recordings get posted.

MS have published a useful collection of KB articles on troubleshooting common Office 365 issues.

And that’s it. My inbox is now clear. Time to move to Asana. Not.

Oh no, I’ve got a cert that’s about to expire!

[Screenshot: cert001]

The scenario – you know that your cert is about to expire, you’ve bought a new cert, and you’ve installed it correctly, but you’re still getting an error. Event ID 12017, to be exact.

I run

 get-exchangecertificate | fl

which lists the certificates I’ve got installed on this CAS:

[Screenshot: cert01]

There are two certs there – the first listed cert has been issued by an internal certification authority – the “issuer” value is cert.fabrikam.com – and is NOT self signed. I use it for IIS (OWA, ECP, Outlook Anywhere, EAS). The second cert is the original self-signed cert that Exchange produces when it’s installed on a server (the issuer is red-cas1, the local server name) and it’s only valid for IMAP and POP – two services I don’t bother configuring because none of our customers use them.

So let’s get all the services onto a certificate that’s not about to expire. I run

enable-exchangecertificate -thumbprint <thumbprint> -services pop,imap

using the thumbprint of the good certificate. <tip – to copy thumbprints and the like, right click in the PowerShell window, select mark, then highlight the thumbprint and press Enter. Then when you are ready to type the thumbprint, right click anywhere again and hit “paste”>

Now if I run

get-exchangecertificate | fl

again I can see that all the services are now on my new cert, and the old self signed cert is doing nothing:

[Screenshot: cert02]

So now there are no services on the old cert, I can remove it using remove-exchangecertificate -thumbprint <thumbprint>
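Since the screenshots only show the output, here is a sketch of the sequence so far as commands. The two thumbprint variables are placeholders you would fill in from your own get-exchangecertificate output, and the -WhatIf line is my addition, not part of the original walkthrough.

```powershell
# List installed certificates with the fields needed to tell them apart.
Get-ExchangeCertificate |
    Format-List Thumbprint, Subject, Issuer, IsSelfSigned, NotAfter, Services

# Placeholders: replace with the real thumbprints from the output above.
$newCert = '<thumbprint of the new certificate>'
$oldCert = '<thumbprint of the expiring certificate>'

# Move POP and IMAP onto the new certificate.
Enable-ExchangeCertificate -Thumbprint $newCert -Services POP, IMAP

# Preview the removal first, then remove the old certificate for real.
Remove-ExchangeCertificate -Thumbprint $oldCert -WhatIf
Remove-ExchangeCertificate -Thumbprint $oldCert
```

Check the Services column before removing anything: a cert that still lists services is a cert that is still in use.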

[Screenshot: cert03]

Has it gone? Hell yeah.

[Screenshot: cert04]

Now – here’s a word of warning. It’s perfectly possible to remove a certificate you are using. You get asked are you sure, and if you say “yes”, well… you’re the boss:

[Screenshot: cert05]

And you’ll need to reimport your cert.

[Screenshot: cert06]

In this cmdlet I import a certificate from a password protected .pfx file, which I created by exporting a certificate I’d previously requested on another server – this allows me to use the same certificate on a number of servers (for the same thing, obviously).

[Screenshot: cert07]

I then enable the cert for the required services – the first cmdlet (get-exchangecertificate) gets all the Exchange certs on the server, I then run a couple of select cmdlets to create an array of psobjects which only contains the subjects and thumbprints of the original objects where the subject is like mail.fabrikam.com (select thumbprint, subject | ?{$_.subject -like "*mail.fabrikam.com*"}) – I then get the thumbprint of that psobject (select thumbprint) and pass it to the final cmdlet, which enables the chosen cert for IIS, IMAP and POP (enable-exchangecertificate -services iis,imap,pop)
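Written out as text rather than screenshots, the import and the enabling pipeline described above might look something like this. It is a reconstruction from the description, using the Exchange 2010 import syntax; the .pfx path is a placeholder, and Get-Credential is only there to capture the password securely.

```powershell
# Import a certificate from a password-protected .pfx file (Exchange 2010 syntax).
# The path is a placeholder.
$pfxBytes = [byte[]](Get-Content -Path 'C:\Certs\mail.pfx' -Encoding Byte -ReadCount 0)
Import-ExchangeCertificate -FileData $pfxBytes -Password (Get-Credential).Password

# Find the certificate by subject and enable it for IIS, IMAP and POP.
Get-ExchangeCertificate |
    Where-Object { $_.Subject -like '*mail.fabrikam.com*' } |
    Select-Object Thumbprint |
    Enable-ExchangeCertificate -Services IIS, IMAP, POP
```

Filtering on the subject saves copying thumbprints around, but it will happily match more than one certificate if several share that name, so look at what the filter returns before piping it onwards.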

Finally I run get-exchangecertificate | fl again to make sure that it has taken and the correct services are enabled.

I’d be a pretty sorry feller if I didn’t then run iisreset /restart at some point, to get IIS to pick up the new cert.

Best practices around Mobile Devices and Exchange 2010

Introduction

Mobile devices have moved on since they first became popular ten years ago, and almost all access to Exchange mailboxes is via Exchange ActiveSync these days.  This is a protocol that allows mobile devices such as smartphones and tablets to connect to Exchange. Depending on the device software, ActiveSync is capable of mail, calendar and task connectivity, and also varying levels of device management, including “remote wipe” capability. There is a really good primer on ActiveSync here; don’t be put off because it’s for programmers – there’s a wealth of explanation and troubleshooting information further down.

Almost all mobile devices, including the latest BlackBerry 10 devices and smartphones running the Jolla operating system, now come with an implementation of ActiveSync, or can have a third party ActiveSync client installed, such as NitroDesk.

Challenges

This presents a real challenge for Exchange. Different manufacturers implement ActiveSync in different ways, supporting a varying number of features. While ActiveSync is a Microsoft protocol, Microsoft have no control over how a third party manufacturer might choose to implement it, and so troubleshooting ActiveSync problems can become very tricky indeed. Before blaming Exchange for any issues that are experienced with third party devices, it’s always a good idea to have a good read of this article:

Current issues with Microsoft Exchange ActiveSync and third party devices.

I find that this is kept up to date and includes information on the latest versions of phone software. It is worth paying close attention to where the support boundaries lie, as well.

There is a reasonable table of implementations here, which lists which operating systems support which features. Unfortunately it doesn’t cover Exchange 2010 SP3 or Exchange 2013, but I guess we can’t have everything.

I cannot stress enough that new versions of OS software often break Exchange – iOS 6.1, iOS 7 and Android have all had problems, some of them very serious. If you don’t block novel devices as a matter of course, then you really must keep an eye on the article above. Here it is again, just in case: Current issues with Microsoft Exchange ActiveSync and third party devices.

Solutions

So, how can we make things better? Three areas require optimisation and management:

Users

By default, all users are enabled for ActiveSync access. This will probably not be suitable for your environment. Disable users by running

set-casmailbox <username> -activesyncenabled $false

Check a user’s status by running

get-casmailbox <username>

 You may see reference to running

get-casmailbox | set-casmailbox -activesyncenabled $false

on a regular basis or a schedule to ensure that all users in an organisation are disabled, then running cmdlets to re-enable on a group by group or user basis. Alternatively, by using the cmdlet extension agent (http://www.ehloworld.com/194) and this handy script (http://www.flobee.net/automatically-disable-activesync-for-new-mailboxes-in-exchange-2010) we can provision new users with ActiveSync already disabled. You can fine-tune individual users or groups through the application of policies, below.
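A minimal sketch of the disable-then-re-enable approach follows. The user and group names are placeholders, and the group is assumed to contain only mailbox users.

```powershell
# Disable ActiveSync for one user, then confirm. 'jsmith' is a placeholder.
Set-CASMailbox -Identity 'jsmith' -ActiveSyncEnabled $false
Get-CASMailbox -Identity 'jsmith' | Format-List Name, ActiveSyncEnabled

# Disable for everyone. Get-CASMailbox returns at most 1000 results by default,
# so -ResultSize Unlimited is needed in larger organisations.
Get-CASMailbox -ResultSize Unlimited | Set-CASMailbox -ActiveSyncEnabled $false

# Re-enable for the members of a group. 'Mobile Users' is a placeholder group name.
Get-DistributionGroupMember -Identity 'Mobile Users' -ResultSize Unlimited |
    ForEach-Object { Set-CASMailbox -Identity $_.DistinguishedName -ActiveSyncEnabled $true }
```

Group members that are not mailboxes (contacts, nested groups) will throw errors in the last step, which is harmless but noisy.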

Servers

By default, client access servers are enabled for ActiveSync, and SSL and basic authentication are set. It may be necessary to configure other forms of authentication such as RSA two-factor to suitably protect access to the environment. There is more information on this here.

Devices

By default, if a user is enabled for ActiveSync (and by default they are) they can configure any device that supports ActiveSync to synchronise with the Exchange server. It is just as easy to connect a cheap Chinese clone phone as it is a top of the range iOS device, although arguably no more harmful. It is therefore essential that mailbox policies are configured to properly control access to the Exchange server. As noted above, there are many implementations of the ActiveSync protocol that pay little regard to the performance of the server and the impact the device may have on other users.

There are three main ways to manage these; mailbox policies, throttling and ABQ.

Mailbox Policies

As noted above, users and devices require mailbox policies to be set correctly to control access and use of the Exchange environment. This is a good article to start from to understand mailbox policies, but remember that not all devices will implement all settings. Mailbox policies allow you to set things like minimum password length and complexity for the device, whether users are allowed to download attachments, how often they must refresh their passwords and so on.

Like most policy objects in Exchange, there is a built in default policy which should be copied rather than altered, and then the copies altered and applied to users according to their need – so you might choose to prevent a group of users downloading attachments, or strengthen the security settings for users who have access to particularly sensitive material.

Another approach is to set policies according to the device, so for example, you can create a policy for Windows Phone 8 devices and a separate policy for iOS devices. You may also choose to differentiate between devices that are owned by the customer, and devices that are owned by the users themselves. The drawback with this approach is that the policy is always applied to the user, not the device, so if the user has multiple devices, or changes devices, then a policy change will be required. Notice that a user can only be assigned to one ActiveSync mailbox policy at a time, and setting a new policy removes any previous policy.
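As a sketch of the per-user approach in Exchange 2010 (the policy name, user and values below are placeholders, not recommendations), a separate policy can be created and then assigned:

```powershell
# Create a stricter policy rather than editing the default one.
# The name and values are placeholders, not recommendations.
New-ActiveSyncMailboxPolicy -Name 'Sensitive Users' `
    -DevicePasswordEnabled $true `
    -MinDevicePasswordLength 6 `
    -DevicePasswordExpiration '90.00:00:00' `
    -AttachmentsEnabled $false

# Assign it to a user; this replaces whatever policy they had before.
Set-CASMailbox -Identity 'jsmith' -ActiveSyncMailboxPolicy 'Sensitive Users'

# Check which policy each user ends up with.
Get-CASMailbox -ResultSize Unlimited | Select-Object Name, ActiveSyncMailboxPolicy
```

Because the assignment lives on the mailbox, the last command is the quickest way to spot users who have ended up on the wrong policy after a device change.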

Client throttling

Client throttling is a new feature in Exchange 2010 that is intended to prevent single users or groups of users taking down Exchange by performing an unwitting denial of service attack. Now, it may be that your execs have been unknowingly co-opted as zombie shock troops by some shadowy criminal cabal, possibly in a secret volcano base, or it may be that they have just installed the latest version of iOS – personally, I find it hard to tell the difference. And Exchange can’t differentiate either; in either case, it just stops working. The client throttling policy has sections for each access method – OWA, ActiveSync, MAPI etc. – so you can set limits for critical indicators above which Exchange will delay or simply deny connections to that user. Again, as with most Exchange policies there is a default that is automatically applied to all users, which should not be altered; it should be copied, and then the copies tailored and applied to individual users or groups of users. As above, you can only apply one policy per user. Client throttling was originally “off” in RTM, but was switched on in SP1 by default, and has been further tweaked since then, so your default policy will depend on your service pack level – service pack 3, right? You can track the effect throttling has on your Exchange environment using the instructions in this article.

The throttling policy itself can be a little tricky to understand; Elan Shudnow’s blog does a pretty good job of deciphering it – but remember this is for an earlier version of Exchange. I’d thoroughly recommend anyone who was looking to implement non-default policies do some serious reading. The MSExchange.org article and Chapter 10 of “Exchange Server 2010 Inside Out” are also very good.
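To make that a little more concrete, here is a sketch of looking at the default policy and creating a separate one, assuming Exchange 2010 SP1 or later. The policy name, the user and the limits are illustrative only; do the reading before picking real values.

```powershell
# Inspect the ActiveSync settings of the default policy without changing it.
Get-ThrottlingPolicy | Where-Object { $_.IsDefault } | Format-List Name, EAS*

# Create a separate policy with tighter ActiveSync limits. Values are illustrative only.
New-ThrottlingPolicy -Name 'RestrictedEAS' -EASMaxConcurrency 4 -EASMaxDevices 3

# Assign the policy to one mailbox. 'jsmith' is a placeholder.
$policy = Get-ThrottlingPolicy -Identity 'RestrictedEAS'
Set-Mailbox -Identity 'jsmith' -ThrottlingPolicy $policy
```

Settings you leave out of a new policy fall back to built-in values, so compare the new policy against the default before assigning it widely.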

ABQ

Allow/Block/Quarantine lists are another new feature of Exchange 2010 that are intended to work together with mailbox policies. The policies allow you to control device *features*, whereas ABQ lists allow you to control which devices are allowed access. There is a great post on the EHLO blog about the lists, and the work that went into designing them, and some generalisations about the way they are expected to be used – whether to block by default and only allow certain types of devices, or whether to allow or quarantine by default, and only block devices known to cause issues – iOS 6.1 take a bow.

Quarantine does not stop the device from connecting to Exchange, only downloading content – so items may be added to calendars and other folders while the device is in quarantine – this can catch people out! For a full explanation of how this works, and screenshots, have a look at the MSExchange.org pair of articles on mobile device management with Exchange 2010.
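A minimal sketch of the “quarantine by default, allow known-good devices” approach follows. The admin address and the device type string are placeholders; check what your devices actually report before writing rules.

```powershell
# Quarantine unknown devices by default and tell an administrator about them.
# The address is a placeholder.
Set-ActiveSyncOrganizationSettings -DefaultAccessLevel Quarantine `
    -AdminMailRecipients 'mobile-admins@fabrikam.com'

# Allow one device family outright. The query string must match what devices report.
New-ActiveSyncDeviceAccessRule -QueryString 'iPhone' -Characteristic DeviceType -AccessLevel Allow

# See what is currently sitting in quarantine.
Get-ActiveSyncDevice |
    Where-Object { $_.DeviceAccessState -eq 'Quarantined' } |
    Select-Object UserDisplayName, DeviceType, DeviceModel, DeviceOS
```

Changing the organisation default affects devices that do not match a rule or a per-user exemption, so it is worth agreeing the rules before flipping the default on a live system.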

All three tools should be used to ensure that servers are appropriately secured, users have a suitable level of functionality and the organisation is protected from novel devices.

The future

There are some great features in Windows Intune, such as the ability to manage Android devices with SCCM; the only drawback is that it is a subscription service. In the meantime, ActiveSync it is. If you follow the advice above and use the provided tools to limit your implementation, you should be fine.

Exciting things i have seen on the internet, 29/11/13

Exchange 2013 SP1 has been announced – Edge servers, support for Windows Server 2012 R2 and a shift in how service packs work – from now on they’re going to be a lot more like cumulative updates, and we need to see them as such. SP1 will be in place of CU4, and CU5 will include SP1, which is different to previous releases where rollups were for a particular service pack – from now on it will all flow in together. In an earlier mail I suggested that the difference between a service pack and a cumulative update is a schema update; however, CU1, CU2 and CU3 all include a schema update, so that blows that out of the water. Expect a post on how we need to redefine our services in response to this in the near future.

 

Exchange 2010 SP3 Rollup 3 has also been announced. Note the large number of fixes for CAS crashes. Be aware that issues with the client access service crashing may be fixed (or substantially altered) by the code changes made in this rollup.

 

Rhod Milne has highlighted some useful Microsoft Virtual Academy courses in a post here. If you’ve not yet tried the MVA stuff then I urge you to find some time for it – it’s much better than free training has any right to be.

 

He has also posted an exceptionally handy list of things you should do to Exchange 2010. You should consider these things best practice, and think about how to incorporate them into your Exchange 2010 environment. If you are unsure whether they apply to you, then please give me a ring. There are also some recommendations here that Rhod refers to; these are interesting and useful, but less likely to apply to everybody.

 

Scott Schnoll is always an interesting read, and he’s just published some new documentation for Exchange 2013 managed availability.

 

After last week’s post about IPv6, Microsoft have updated their article on how to re-enable it or temporarily disable it for troubleshooting purposes only.

 

Microsoft have changed article 297019 again. This is the networked PST article. It no longer explicitly states that PST files accessed over a network are not supported (although it is still heavily implied). For the avoidance of doubt: PST files that are accessed on a network share are not supported. There are too many issues associated with it, and if MS find out that you have PST files stored on a network share then they will stop doing any troubleshooting until you disable them all. Please do not store PST files on a network share, and please don’t suggest to your customers that it won’t be a problem. It ALWAYS causes a problem. The MS document that is referenced in the KB article is titled “network stored pst files: don’t do it”. That should be a good clue as to what the product group think about it.

 

No-one has reported a problem with iOS 7 to me, but just in case, here’s a hotfix that Microsoft have released for a problem that iOS 7 created.

 

And just to prove it’s not just Apple, the latest version of Android (KitKat, or 4.4) breaks ActiveSync. Google have marked the problem as “closed (to be fixed in a later release)”, so basically there’s a reasonable chance that upgrading to KitKat on an Android device will BREAK Exchange ActiveSync for the long term. Not a good idea.

Exchange 2010 doesn’t like Rainbow Trout.

I’ve just finished dealing with a case where after applying service pack 2 one of my customers started experiencing frequent mass disconnections. When they examined the event logs they could see that the RPCClientAccessService was frequently crashing with event ID 4999 and a text similar to:

M.E.RpcClientAccess.Service, M.E.D.S.M.F.Enumerator.MoveNext, System.InvalidOperationException,

They are using Riverbed Steelhead WAN optimizers to reduce their bandwidth requirement between branch offices and the main datacentres that house the Exchange servers. Riverbed have a knowledge base article that explains that while many things may cause the CAS to experience 4999 events, one of them may be the Riverbed corrupting RPC/MAPI calls. In this instance the way to tell is by running

Protocol mapi skip-copy enable

on the client and server side Riverbed devices. This will disable acceleration for a specific RPC operation type (EcDoRPC Remote OpCode 0x4D – FXSrcCopyTo) of a specific Microsoft RPC Operation (RPC OpNum 0x2 – EcDoRpc), which is used by some Microsoft Outlook plug-ins. It doesn’t disable ALL the MAPI optimisation. The article goes on to helpfully reiterate that there are many things that may cause this corruption (although I can’t think of any off the top of my head), and that if running this command *doesn’t* fix the problem, then you should log a call with PSS. I would link to the article, but I can’t, so you’ll have to ask them for it yourselves – it’s variously recorded as S18158 and S15635, and it’s called Event ID 4999 occurs on exchange CAS server.

So far, so what? This affects a veeery small proportion of people. Why bother writing about it? Because, dear reader, it highlights a much wider problem. Vendors are not like the Avengers, hanging round on some hovering sky platform with Samuel L Jackson, oiling their muscles. They don’t talk to each other that much, mostly, and when they do it’s like cats in a bag. Microsoft do NOT test Exchange with third party products – it’s up to the third party vendor to join the Technology Adoption Programme, get hold of beta and pre-RTM software and make sure their stuff works as expected. If the vendor can’t or won’t do it – WE are responsible for the service we provide to the customer. WE are responsible for making sure that the stuff we provide and maintain continues to work when we patch it – that might mean we have to ask vendors such as Symantec what effect going to a particular patch level will have on Enterprise Vault, say, or it might mean we just have to bite the bullet, build a lab, and test stuff for ourselves (incidentally, this is a really good reason to keep it simple, stupid). OK, so we just sit on a level that we know works, then? NO, because sooner or later we will run out of vendor support. Microsoft stop supporting SP2 for Exchange 2010 in less than 6 months. After that time anyone wanting to log a call will be told they should be on SP3. And do you know what – they’re right. We should be on SP3*.

We are the guardians of complex heterogeneous environments. It’s time to man up.

*The good news is that once we’re on SP3, that’s likely the last service pack for 2010 – no more major upgrades until 2010 goes end-of-life in 2020. Lovely. Get the deck chairs out.

Load balancing algorithms and Exchange 2010

Last week one of our customers had a service outage brought on by their load balancer. It wasn’t misbehaving, it was doing exactly what it was supposed to do, and in doing so made sure that no users could connect to their mailboxes. Good work, load balancer. As part of the mop-up, I was asked what Microsoft recommend with regard to load balancers.

First – the problem. One of the CAS that the load balancer was ummm… balancing became unhappy – not so unhappy that it couldn’t respond to requests, but unhappy enough that it couldn’t service them, so users would disconnect. Unfortunately, this meant that as far as the load balancer was concerned, it was available – it was still responding, right?

The load balancer looked at its farm, and each new request that came in got sent to the server with the least load. Load *balancer*, see. The clue is in the name. Guess which server has the least load… and in fairly short order, everyone is disconnected.

So, how do you get around this problem? Well, it depends on the load balancer and how clever it is, but without relying on cleverness, the answer is to use round-robin load balancing. Is it perfect? No. Is it better than having all your users disconnected? Yes.
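If you want to see why least connections falls over here, the failure mode can be reproduced with a toy simulation. This is a minimal Python sketch under some loud assumptions: three made-up servers, one of which accepts connections and immediately drops them (so its connection count never rises), and no modelling of existing sessions or real health checks. It is not how any particular load balancer is implemented.

```python
from itertools import cycle


def least_connections(servers, active):
    """Pick the server with the fewest active connections."""
    return min(servers, key=lambda s: active[s])


def simulate(pick, servers, broken, requests):
    """Send `requests` new connections; return how many each server received.

    The broken server still accepts connections (so a simple health check
    passes) but drops them straight away, so its active count never grows.
    """
    active = {s: 0 for s in servers}
    received = {s: 0 for s in servers}
    for _ in range(requests):
        target = pick(servers, active)
        received[target] += 1
        if target != broken:
            active[target] += 1  # healthy servers hold on to the session
    return received


servers = ["cas1", "cas2", "cas3"]  # hypothetical server names

# Least connections: the broken server always looks the least loaded.
lc = simulate(least_connections, servers, broken="cas2", requests=300)

# Round robin: ignore load entirely and just take turns.
rotation = cycle(servers)
rr = simulate(lambda _servers, _active: next(rotation), servers,
              broken="cas2", requests=300)

print("least connections:", lc)
print("round robin:      ", rr)
```

In this model the broken server soaks up 299 of the 300 new connections under least connections, against 100 under round robin. A third of your users still have a bad day with round robin, which is the "not perfect" part, but it is not everyone.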

 

And what do Microsoft do? Well, in a way, it doesn’t matter what MS do – their setup and budget is probably very different to yours – but they have chosen to go with round robin, as detailed in this presentation from Andrew Ehrensing, given at TechEd Australia in 2011:

http://channel9.msdn.com/Events/TechEd/Australia/Tech-Ed-Australia-2011/EXL304 slide 32.

 

The slide notes say this:

“Outlook.com / MSIT learnings around Round Robin

1.) When using least connections, 3 node pool.   When server goes down for maintenance and comes back online, gets POUNDED with new connections

2.) When a server “misbehaves” it may be “healthy by the LB healthcheck” but not processing new inbound connections, but all new connections keep directed there causing an outage”

 

I’d certainly not suggest you ignore the load balancer manufacturer’s recommendations, either. But be aware that a problem that may appear a little esoteric can actually occur in the real world.

 

So now you know. As an aside, this customer has experienced two outages in recent months which have been caused by things performing poorly – not badly enough that they completely fail and trigger high availability, but still not well enough to provide a service. Most high availability relies on fairly simple tests to see if a service is available – “can a connection be made on a given port?” – with no regard for whether anything useful can be done through that port. This is great if the service fails, or the server crashes, but not so good if, as in this instance, authN is broken. High availability features are great, but they do not replace the need for proper planning and effective monitoring, which would have saved this particular customer.
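The difference between the two kinds of test is easy to show. Below is a minimal Python sketch, not a real Exchange monitor: the function names are made up, and the `request` and `expected` bytes are placeholders for whatever "do something useful" means for your service (for Exchange that would need to be an authenticated request, which this sketch does not attempt).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Shallow check: can a TCP connection be made on the given port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_answers(host, port, request, expected, timeout=3.0):
    """Deeper check: send a request and look for the expected reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            return expected in conn.recv(4096)
    except OSError:  # includes refused connections and timeouts
        return False
```

A server that accepts connections but never answers will pass the first check and fail the second, which is precisely the state this customer's server was in.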

 

Failing databases, sulking network manager

Interesting call here. After a hardware firewall change and a reboot, my customer’s DAG had a database copy in a failed state. The setup is a two node DAG across two sites, with an FSW and an Alternate FSW preconfigured. It’s also IL3. If you don’t know what IL3 is, please stop reading this article. You don’t have clearance. Look out the window. See the guy with the dark glasses watching you? No? THAT’S how good we are.

So, just change the firewall back, dummy. Big deal. Sheesh. Except, it didn’t fix the problem. Interestingly – this is the first time that cluster failover has been tested… DAG failover has been tested a number of times.

So… he’s got one database copy mounted, one failed.
We ran Get-MailboxDatabaseCopyStatus and saw this error: “replication server encountered transient network error. Network manager not yet initialised”
It’s been in this state for a while now, and through multiple reboots. Sitting watching it won’t help. It’s not really transient.
Oh right, so the FSW needs rebooting, right? I’ve seen this before… (http://port25guy.com/2012/12/10/witness-server-boot-time-getdagnetworkconfig-and-the-pain-of-exchange-2010-dr-tests/). No. The boottime cookies are the correct way around.
So we start checking things. The IP addresses show as down in the DAG. This picture is not their DAG. IL3, remember?

The cluster node shows as “down” in the failover cluster manager.
So, let’s see what happens when we try to start the node. Lots of errors in the event log (which I can’t see… IL3…), but one sticks out like a sore thumb – event ID 4123:

Log Name: Application
Source: MSExchangeRepl
Date: 2/26/2012 11:12:08 AM
Event ID: 4123
Task Category: Service
Level: Error
Keywords: Classic
User: N/A
Computer: LABMBX-1.exlab.mydomain.com
Description:
Failed to get the boot time of witness server ‘labcas-1.exlab.mydomain.com’. Error: The remote procedure call failed. (Exception from HRESULT: 0x800706BE)

There’s a great big clue right there. “The remote procedure call failed”. For some reason the endpoint mapper on the FSW isn’t responding. This is a resource domain which just contains a DC, the two Exchange boxes and a vCenter manager. (I did mention the VMware, yes?) What is the FSW machine? Well, it’s the vCenter console machine in the domain.

And there is the problem.

When you install Exchange on a box, it adds a security group to the local admins group, and makes changes to the Windows firewall (http://marksmith.netrends.com/Lists/Posts/Post.aspx?ID=83). When you put the FSW on a NON-Exchange box, you need to add the Exchange Trusted Subsystem group to the local admins manually – you’ve not installed Exchange, so setup won’t do it for you. It’s documented here: http://technet.microsoft.com/en-us/library/dd351172.aspx

If the witness server you specify isn’t an Exchange 2013 or Exchange 2010 server, you must add the Exchange Trusted Subsystem universal security group to the local Administrators group on the witness server. These security permissions are necessary to ensure that Exchange can create a directory and share on the witness server as needed. If the proper permissions aren’t configured, the following error is returned:
Error: An error occurred during discovery of the database availability group topology. Error: An error occurred while attempting a cluster operation. Error: Cluster API “AddClusterNode() (MaxPercentage=12) failed with 0x80070005. Error: Access is denied.”
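As an aside, the two hex codes quoted in this post (0x800706BE in the event log and 0x80070005 in the TechNet text) are both HRESULTs, and it is worth knowing how to pull them apart. Here is a minimal Python sketch; the function names are mine, and the lookup table deliberately covers only the two Win32 errors that appear in this post.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    return {
        "failure": bool((value >> 31) & 0x1),  # top bit set means failure
        "facility": (value >> 16) & 0x7FF,     # 11-bit facility number
        "code": value & 0xFFFF,                # low 16 bits: the error code
    }


FACILITY_WIN32 = 7  # facility 7: the code is an ordinary Win32 error number

# A tiny lookup table, limited to the two errors quoted in this post.
WIN32_ERRORS = {
    5: "Access is denied.",
    1726: "The remote procedure call failed.",
}


def describe(value):
    """Return a readable description of an HRESULT, as far as we know it."""
    fields = decode_hresult(value)
    if fields["facility"] == FACILITY_WIN32:
        return WIN32_ERRORS.get(fields["code"],
                                "Win32 error %d" % fields["code"])
    return "facility %d, code %d" % (fields["facility"], fields["code"])


print(describe(0x800706BE))
print(describe(0x80070005))
```

The point is that the low 16 bits of 0x800706BE are 0x06BE, or 1726 in decimal, which is the plain old Win32 RPC failure. That tells you to go looking at RPC rather than at Exchange itself.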

What it doesn’t say, but assumes, is that RPC will work. Why does it need RPC? It’s just a fileshare, yes? It doesn’t say anything about RPC here: http://technet.microsoft.com/en-us/library/bb331973.aspx

• The Clustering data path listed in the preceding table uses dynamic RPC over TCP to communicate cluster status and activity between the different cluster nodes. The Cluster service (ClusSvc.exe) also uses UDP/3343 and randomly allocated, high TCP ports to communicate between cluster nodes.
• For intra-node communications, cluster nodes communicate over User Datagram Protocol (UDP) port 3343. Each node in the cluster periodically exchanges sequenced, unicast UDP datagrams with every other node in the cluster. The purpose of this exchange is to determine whether all nodes are running correctly, and also to monitor the health of network links.
• Port 64327/TCP is the default port used for log shipping. Administrators can specify a different port for log shipping.
• For HTTP authentication in which Negotiate is listed, Kerberos is tried first, and then NTLM.

Well it does, but for nodes, not the FSW. However, when the single remaining node checks it has quorum it needs to compare the current boot time of the FSW against the time stored in the boottime cookie. How does it get the current boot time? I originally reckoned remote registry; it’s actually WMI (see the edit at the end of this post), which requires RPC.

So… open Windows firewall for RPC, reboot FSW and… bingo. Everything up, sweet as a nut.

We ran the cluster validator (http://technet.microsoft.com/en-us/library/bb676379(v=exchg.80).aspx) and Paul Cunningham’s DAG healthcheck script (http://exchangeserverpro.com/get-daghealth-ps1-database-availability-group-health-check-script/) and everything comes back clean.

The moral of this story? Stop being clever.

A great takeaway for everyone is this:

unlike earlier versions of Microsoft Exchange where IT administrators had to perform multiple procedures to lock down their servers that were running Microsoft Exchange, Exchange 2010 requires no lock-down or hardening

From the Exchange 2010 Security Guide, here: http://technet.microsoft.com/en-us/library/bb691338(v=exchg.141).aspx

 

 

edit: if you look at Scott Schnoll’s wonderful high availability deep dive, here, then you will find that the node gets the FSW boot time using WMI, not remote registry.