I recently had a question from a customer on how to create wildcard service monitors and recoveries. The Service Monitor from the Monitoring template and a simple Unit Monitor for services both require an explicit service name; no wildcards are allowed. For most services this is fine, but some applications do fun and interesting things like concatenating computername + service to create a unique service name. This creates a bit of a problem for monitoring. You can create individual monitors, but if you have hundreds of services, each with a unique name that follows a particular pattern, creating hundreds of corresponding monitors could get a little time consuming.
Brian Wren has a great article from back in the SCOM 2007 R2 days that answers part of this question, but when I went through the steps I found it needed some slight tweaking and updating for SCOM 2012 R2. Once that was complete, I also needed to come up with a simple, low-overhead wildcard service recovery for when one of the services stops and needs to be brought back online.
Below are my steps:
Launch SCOM Console
Tasks – Create a Management Pack
Enter Name + Description – Next – Create
Right Click Windows Service
Add Monitoring Wizard
Enter a Name, Save to the MP you just created
Enter Service Name. Use % as a wildcard representing multiple characters. As I don’t have any unique services in my environment, I am using m% to demonstrate how this can work. For the rest of these instructions, wherever you see m%, keep in mind that you need to modify this value to match your own service name wildcard. Be careful: too broad a wildcard could create a lot of noise and load very quickly in your environment.
Pick a Target Group. In this case I am using All Windows Computers. Generally you would want to target this as precisely as possible. Leave Monitor only automatic service checked.
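Before committing to a wildcard, it can help to get a rough feel for how many service names it would catch. The following is only a minimal Python sketch of LIKE-style matching, not anything SCOM runs: it handles just % and _ (note that _ is itself a single-character wildcard in WQL), it assumes case-insensitive comparison, and the service names in the sample list are hypothetical.

```python
import re

def like_to_regex(pattern):
    """Translate a WQL LIKE pattern into a compiled regex.

    Only % (any run of characters) and _ (exactly one character) are
    handled; bracket expressions such as [a-f] are outside this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Assumption: comparison is case-insensitive, as m% and M% are used
    # interchangeably in this post.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def matching_services(pattern, service_names):
    """Return the names that the whole pattern matches, in input order."""
    regex = like_to_regex(pattern)
    return [name for name in service_names if regex.fullmatch(name)]

# Hypothetical service names, for illustration only.
sample = ["MpsSvc", "MSDTC", "msiserver", "Spooler", "W32Time", "SRV01_AppSvc"]
print(matching_services("m%", sample))
print(matching_services("SRV01_%", sample))
```

Feeding it a real list of service names exported from a representative server gives a quick sanity check on how broad a pattern is before it goes anywhere near the management group.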
Select Management Packs
Select the Custom Management Pack you just created
Select Export Management Pack
Select a location to save the unsealed xml file
Open the File in your XML editor of choice (Notepad will do, but Visual Studio or Notepad++ will make it a bit easier to read)
Search the file for your wildcard; in my case this is m%
We’ll be making a few replacements in the code.
You will be modifying:
<DataSource ID="DS" TypeID="MicrosoftWindowsLibrary7585010!Microsoft.Windows.Win32ServiceInformationProviderWithClassSnapshotDataMapper">
To: (remember to also swap the m% with the appropriate value)
<DataSource ID="DS" TypeID="MicrosoftWindowsLibrary7585010!Microsoft.Windows.WmiProviderWithClassSnapshotDataMapper">
<Query>select * from win32_service where name like 'm%'</Query>
- In Brian Wren’s instructions he used TypeID="Windows!Microsoft.Windows.Win32…". The alias in my console-generated custom MP is MicrosoftWindowsLibrary7585010! If you run into any errors, keep in mind that the alias you use must match the alias declared in the manifest references. I haven’t tested to confirm, but based on the output it looks like the console-generated alias is built from the MP name plus its version number. If you have a different version of the MP and you follow my steps exactly, you will likely hit an error, as the alias I provide for Microsoft.Windows.Library is going to be off by a few numbers from yours. If this is the case, just modify the alias in my example to match what you have in the rest of the .xml file.
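If you want to check the alias before importing, something along these lines could flag a mismatch. This is a rough Python sketch under a few assumptions: the XML below is a trimmed, hypothetical stand-in for an exported pack (the discovery ID is made up), it assumes the elements carry no default XML namespace, and the helper names are mine.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for an exported management pack.
SAMPLE_MP = """<ManagementPack>
  <Manifest>
    <References>
      <Reference Alias="MicrosoftWindowsLibrary7585010">
        <ID>Microsoft.Windows.Library</ID>
      </Reference>
    </References>
  </Manifest>
  <Monitoring>
    <Discoveries>
      <Discovery ID="Example.Discovery">
        <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.WmiProviderWithClassSnapshotDataMapper" />
      </Discovery>
    </Discoveries>
  </Monitoring>
</ManagementPack>"""

def declared_aliases(root):
    """Map each alias declared in the manifest to the MP ID it points at."""
    return {ref.get("Alias"): ref.findtext("ID") for ref in root.iter("Reference")}

def undeclared_type_aliases(root):
    """Return TypeID values whose alias prefix is not declared in the manifest."""
    known = set(declared_aliases(root))
    problems = []
    for element in root.iter():
        type_id = element.get("TypeID", "")
        # The alias is not stripped on purpose, so a stray leading space
        # in the attribute value is reported as well.
        if "!" in type_id and type_id.split("!", 1)[0] not in known:
            problems.append(type_id)
    return problems

root = ET.fromstring(SAMPLE_MP)
print(declared_aliases(root))
print(undeclared_type_aliases(root))
```

In the sample, the copied "Windows!" alias is reported because the manifest only declares MicrosoftWindowsLibrary7585010. For a real file you would swap ET.fromstring for ET.parse on the exported .xml.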
Save the .xml file
Go back to the SCOM console – Administration
Import Management Packs
Add from disk
Select the newly modified .xml file
To check and confirm that the discovery associated with the wildcard monitor is working:
Change Target Type
Select Custom Target
A few minutes after importing the updated pack you should see services discovered.
Now we need to create a wildcard recovery. If this were a single service recovery, I would create a standard SCOM recovery, call net.exe, and pass a start command with the service name. Since this is a wildcard service, we have to do things a little differently, as I don’t know of a way to pass wildcards to net.exe. (We could use PowerShell, but for this I want to be as lightweight as possible from an overhead perspective, even if that means sacrificing some more advanced error handling that we could easily add with PowerShell.)
Go to Authoring:
Select Windows Service
Right Click your custom Service Monitor – View Management Pack Objects – Monitors
Expand Entity Health -Availability – Right Click the Basic Service Monitor Stored in your custom MP – Properties
Select the Diagnostics and Recovery Tab
Under Configure recovery tasks select Add – Recovery for critical health state
Select Run Command
Name your Recovery – Check the Boxes for run recovery automatically and recalculate monitor state after recovery finishes
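Enter Full path to file (the parameters below are WMIC arguments, so this should be the path to wmic.exe, typically %SystemRoot%\System32\wbem\wmic.exe)
Parameters: (originally I used slightly different parameters, but found that while they worked from the command line they failed when run as a recovery. This method works consistently.)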
/interactive :off service where "name like 'm%'" call startservice
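Since the pattern has to be swapped in by hand and the quoting is easy to get wrong, here is a small Python sketch that builds the Parameters string with plain ASCII quotes. The function name and the two sanity checks are my own additions for illustration; the rest of the string is copied as written in this post.

```python
def recovery_parameters(like_pattern):
    """Build the Parameters string for the Run Command recovery.

    Hypothetical helper: it only substitutes the pattern into the
    parameter string used in this post and applies two sanity checks.
    """
    if not like_pattern or any(quote in like_pattern for quote in "'\""):
        raise ValueError("pattern must be non-empty and contain no quote characters")
    if like_pattern.strip("%_") == "":
        raise ValueError("pattern is only wildcards and would match every service")
    return (
        "/interactive :off service where "
        f"\"name like '{like_pattern}'\" call startservice"
    )

print(recovery_parameters("m%"))
```

Pasting the printed line into the Parameters box avoids the curly quotes that blog formatting tends to introduce.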
You should now be all set to test out and validate your new monitor.
Just keep in mind that a wildcard discovery targeted incorrectly (too broad a wildcard, too broad a target group, or both) is a recipe for a single monitor that causes a lot of churn, performance issues, and noise in your environment. So be cautious and test very carefully. Make sure you have a good sense of the number of objects this monitor will pick up, not just in your test environment but once you move it into production. To be clear, I would never recommend using a wildcard as broad as m% in production. It picks up far too many services that you likely don’t care about.
Also please note that the recovery is just as general as the monitor, if not more so. It does not check whether the services it applies to are already started. In my example, m% picks up a bunch of services. If a single service matching that pattern goes down, the recovery will attempt to start every service that matches m%. So if you are building your wildcard service monitor to pick up multiple services on a single system that follow a common pattern, a failure of one will result in an attempt to recover all.
In theory this shouldn’t be a problem. The method I am using is extremely lightweight, and if a service is already started, the start call will just return an “I’m already started” exit code and the service will remain started. With that said, this is only a sample, and it’s still worth testing in your environment to confirm the behavior and make sure you understand exactly how the recovery works before you consider implementing it.
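To make the "one fails, all get a start call" behavior concrete, here is a toy Python simulation. It does not touch any real services: the service names and states are hypothetical, the matching covers only % and _, and the return codes are the documented Win32_Service StartService values for success (0) and service already running (10).

```python
import re

SUCCESS = 0
ALREADY_RUNNING = 10  # StartService return value for an already running service

def like_matches(pattern, name):
    """Case-insensitive LIKE match supporting only % and _ (a simplification)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, name, re.IGNORECASE) is not None

def simulate_recovery(pattern, running):
    """Issue a simulated start call to every service that matches pattern.

    running maps service name -> True if currently running; it is updated
    in place. Returns {service name: simulated return code}.
    """
    results = {}
    for name, is_running in running.items():
        if not like_matches(pattern, name):
            continue  # the recovery never touches non-matching services
        results[name] = ALREADY_RUNNING if is_running else SUCCESS
        running[name] = True
    return results

# Hypothetical state: only MSDTC matches m% and is stopped.
state = {"MpsSvc": True, "MSDTC": False, "msiserver": True, "Spooler": False}
print(simulate_recovery("m%", state))
print(state)
```

Every service matching m% receives a start call, the ones already running simply report code 10, and Spooler is left alone because it does not match.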
An example of running the recovery for an instance of m% being stopped: