Back in the fall, I received a question about monitoring Azure AD Connect Sync with SCOM. The preferred solution is generally Azure AD Connect Health, and if you have SCOM you couple that with the various on-prem AD/ADFS Management Packs to monitor your hybrid environment end-to-end.
I love that our product teams who build cloud services are taking a proactive approach to monitoring and treating it as integral to the product development cycle. A part of me would love to see a Product Group Management Pack for Azure AD Connect Sync, but I also understand that in this new cloud-first world you have to focus your resources carefully, and sometimes that means developing solutions that can benefit a broader pool of customers.
The biggest challenge that I have seen for some Hybrid Cloud customers is that the built-in, out-of-box notification mechanism of these monitoring solutions is e-mail only. Many of the customers I work with have fairly advanced notification/ticketing systems, and while e-mail is one avenue of alerting, it isn’t the only one. Customers with SCOM have often put in the legwork of integrating SCOM with their existing notification system, so certain alerts are e-mails, others are tickets, and some might kick off a page or text to wake up an engineer at 2 AM.
With some cloud services I can understand the argument that the paging at 2 AM is going to happen on the Microsoft side, so your engineers can continue to sleep peacefully. But with a hybrid solution like Azure AD Connect Sync, that isn’t really the case. You can absolutely have a problem that only your engineers can fix, and you may want to have the flexibility to leverage your existing notification systems. You could certainly explore integrating directly between your ticketing/notification system and Azure AD Connect Health, and for some customers this may be the correct path. (No need to add an extra hop/point of failure if you don’t need to.) But for those who have already invested heavily in SCOM, it would be nice to have a management pack that could provide basic integration with minimal development effort.
I had started poking around the problem in the fall, but I hadn’t had time to sit down and write an MP to address it. It was basically a lot of pseudocode floating around in my head that I was pretty sure would work if I ever sat down and wrote it. I have a nice week of vacation ahead of me starting today, but I had promised some colleagues I would build an MP if I had some free time, so I spent this past weekend putting together a Management Pack that I believe should address this problem.
The MP is still very much in beta form, and it falls under the usual AS-IS/test heavily/use at your own risk disclaimer that accompanies all community-based MPs. I am actively seeking feedback and will come out with additional versions as time allows, so if you have suggestions please feel free to send them my way. If you DM @OpsConfig on Twitter or leave a comment, I will respond via e-mail.
The core functionality of the MP is simple. Every 15 minutes it makes an API call to your instance of Azure AD Connect Sync Health to retrieve alerts. If there is a new warning alert, it will generate a corresponding warning alert in SCOM. If there is a new critical alert, it will generate a corresponding critical alert. If an alert closes in Azure AD Connect Health, the MP will automatically detect the resolution and close out the alert in SCOM. Nothing fancy, but it works and is pretty lightweight.
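The open/close behavior described above boils down to comparing two sets of alert IDs on each polling cycle. Here is a minimal Python sketch of that comparison. It is an illustration only, not the code inside the MP: the `HealthAlert` type, its field names, and the `reconcile` function are hypothetical, and the actual API call to Azure AD Connect Health is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthAlert:
    """Hypothetical shape of one active alert returned by a poll."""
    alert_id: str
    level: str   # assumed values: "warning" or "critical"
    title: str


def reconcile(active_health_alerts, open_scom_alert_ids):
    """Compare one poll's results with the alerts currently open in SCOM.

    Returns a tuple (to_raise, to_close):
      to_raise: alerts active in the health service but not yet in SCOM
      to_close: IDs open in SCOM that are no longer active upstream
    """
    active_ids = {a.alert_id for a in active_health_alerts}
    open_ids = set(open_scom_alert_ids)

    # New upstream alerts keep their level (warning stays warning, etc.).
    to_raise = [a for a in active_health_alerts if a.alert_id not in open_ids]

    # Anything SCOM still has open that the service no longer reports
    # is treated as resolved.
    to_close = sorted(open_ids - active_ids)
    return to_raise, to_close


if __name__ == "__main__":
    polled = [
        HealthAlert("a1", "warning", "Sync has not run recently"),
        HealthAlert("a2", "critical", "Health service data is stale"),
    ]
    raise_these, close_these = reconcile(polled, {"a2", "a3"})
    print([a.alert_id for a in raise_these], close_these)
```

In this example `a1` is new and would be raised, `a2` is already open and is left alone, and `a3` has disappeared upstream and would be closed. Running the comparison once per 15-minute cycle also explains why a resolved alert can take up to one cycle to close in SCOM.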
I also added in a custom class/monitor that looks for instances of the Microsoft Azure AD Sync Service:
AAD Connect Health will monitor this too, but not as close to real time as SCOM does. I would rather know within 60 seconds that Sync is down than have to wait, so having this work in conjunction with Azure AD Connect Sync Health makes for a nice better-together story.
In addition, the MP monitors the core services which feed AAD Connect Sync Health:
Again, if these services go down you will eventually be alerted by AAD Connect Sync Health, but why wait? Since these services are delayed start, I built a custom Unit Monitor Type that gives them a little more leeway. We check the service state every 30 seconds, but unlike the default NT Service Unit Monitor Type, we wait until we have detected 6 consecutive samples of the service being stopped before we alert. Since these monitors are tied to the class based on the presence of the Azure AD Sync service, they will also alert if you have a server with the sync service which doesn’t have the Azure AD Connect Health Sync Agent/services installed. (If this is an issue, you can always shut the monitors off, but without those services installed and running you are losing 95% of the functionality provided by this pack.)
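The consecutive-sample idea is easy to show in a few lines. The Python sketch below is only an illustration of the logic, not the Unit Monitor Type itself (which lives in the MP's XML); the class and method names are hypothetical, and the threshold of 6 comes from the description above.

```python
class ConsecutiveStoppedMonitor:
    """Alert only after N consecutive 'stopped' samples (hypothetical helper)."""

    def __init__(self, threshold: int = 6):
        self.threshold = threshold
        self.consecutive_stopped = 0

    def record(self, is_running: bool) -> bool:
        """Record one sample; return True when the monitor should be unhealthy."""
        if is_running:
            # Any running sample resets the streak, which is what gives
            # delayed-start services their leeway.
            self.consecutive_stopped = 0
            return False
        self.consecutive_stopped += 1
        return self.consecutive_stopped >= self.threshold


if __name__ == "__main__":
    monitor = ConsecutiveStoppedMonitor(threshold=6)
    # Five stopped samples, one running sample, then six stopped samples.
    samples = [False] * 5 + [True] + [False] * 6
    print([monitor.record(s) for s in samples])
```

With samples taken every 30 seconds, a service has to stay stopped for on the order of three minutes before this logic would flip to unhealthy, and a single running sample in between starts the count over.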
To get started with the pack there are some prerequisites:
- You need Azure AD Connect Health to be installed and configured. I won’t go into the details for that, but you can find everything you need to know via the awesome guide/videos which can be found here:
- For authentication I leverage the Active Directory Authentication Library (ADAL). The key components are these two .dlls:
If you download and install the Azure PowerShell module, this should give you everything you need:
You will need to install this on each of the management servers, as I leverage the All Management Servers Resource Pool as the source of API calls to allow for high-availability. (If having the ability to have a dedicated AAD Connect Health Watcher is more desirable than the AMSRP just let me know and I can make another version of the MP which can support this.)
Your management servers will need to allow communication to the following URLs through both Windows and network firewalls:
- You will need a user account with necessary access to Azure AD Connect Health Sync.
- When logged into https://portal.azure.com, navigate to Azure Active Directory in the left-hand pane.
Then select Azure AD Connect
Select Azure AD Connect Health
Right now you can see that my environment is unhealthy as I have intentionally stopped the Azure Active Directory Connect Sync Monitoring to force an error condition:
Click Users – Add, select a role, and add the user that we will later add to a Run As Profile in SCOM:
As this is still early in the testing phase, I have lazily done most of my testing with an account with Owner privs. I believe the Monitoring Reader Service Role should be sufficient (subsequent testing shows that this works; see comments for details), but I need to do some more testing to ensure that will always hold true.
There is one more prereq. Click Azure Active Directory Connect (Sync)
Then click the service name that you want to monitor:
Take note of the URL in your browser bar, as you will need to copy the small portion highlighted in yellow for an overridable parameter in SCOM:
Once you have all the above prerequisites in place, you can download and import the MP from here:
Once imported, you will need to add the Azure AD Connect Health account you configured above to a custom Run As Profile.
I use an account configured with Basic Auth that I then distribute to my management servers.
Once this is in place we need to modify the core rule that drives the MP:
Right-click Azure AD Connect Rule – Overrides – Override the Rule – For all objects of class: All Management Servers Resource Pool
Override AADSync URL (the portion of the URL highlighted in yellow that you copied before) – Add your AdTenant – Set the rule to enabled.
Then any time an alert gets generated in Azure AD Connect Sync Health:
A corresponding alert will be generated in SCOM:
Once the alert closes in AAD Connect Sync Health, it will close out in SCOM within 15 minutes.
When I get back from vacation I will put together a post or a video walking through the underlying mechanics of exactly how the MP works, and then I will most likely post the Visual Studio project files on GitHub. In the meantime, you are welcome to download it from TechNet Gallery and test it out. Now I am off to my vacation. Cheers!