Category Archives: How do I

How do I: Monitor Azure AD Connect Sync with SCOM?

Back in the Fall I had a question regarding monitoring Azure AD Connect Sync with SCOM. The preferred solution is generally Azure AD Connect Health, and if you have SCOM you couple that with the various on-prem AD/ADFS management packs to monitor your hybrid environment end-to-end.

I love that our product teams who build cloud services are taking a proactive approach to monitoring and thinking about it as integral to the product development cycle. A part of me would love to see a Product Group management pack for Azure AD Connect Sync, but I also understand that in this new cloud-first world you have to focus your resources carefully, and sometimes that means developing solutions that can potentially benefit a broader pool of customers.

The biggest challenge that I have seen for some Hybrid Cloud customers is that the out-of-box built-in notification mechanism of these monitoring solutions is e-mail only. Many of the customers I work with have fairly advanced notification/ticketing systems, and while e-mail is one avenue of alerting, it isn’t the only one. For customers with SCOM, they have often put in the leg work of integrating SCOM with their notification system such that certain alerts are e-mails, others are tickets, and some might kick-off a page or text to wake-up an engineer at 2 AM.

With some cloud services I can understand the argument that the paging at 2 AM is going to happen on the Microsoft side, so your engineers can continue to sleep peacefully. But with a hybrid solution like Azure AD Connect Sync, that isn’t really the case. You can absolutely have a problem that only your engineers can fix, and you may want to have the flexibility to leverage your existing notification system. You could certainly explore integrating directly between your ticketing/notification system and Azure AD Connect Health, and for some customers this might be the correct path. No need to add an extra hop/point of failure if you don’t need to. But for those who have already invested heavily in SCOM, it would be nice to have a management pack that could provide basic integration with minimal development effort.

I had started poking around the problem in the Fall, but I hadn’t had time to sit down and write an MP to address it. It was basically a lot of pseudo code floating around in my head that I was pretty sure would work if I ever sat down and wrote it. I have a nice week of vacation ahead of me starting today, but I had promised some colleagues I would build an MP if I had some free time, so I spent this past weekend putting together a Management Pack that I believe should address this problem.

The MP is still very much in beta form, and it falls under the usual AS-IS/test heavily/use at your own risk disclaimer that accompanies all community based MPs. I am actively seeking feedback and will come out with additional versions as time allows, so if you have suggestions please feel free to send them my way. If you DM @OpsConfig on Twitter, or leave a comment I will respond via e-mail.

The core functionality of the MP is pretty simple. It makes an API call to your instance of Azure AD Connect Sync Health for alerts every 15 minutes. If there is a new warning alert, it will generate a corresponding warning alert in SCOM. If there is a new critical alert, it will generate a corresponding critical alert. If an alert closes in Azure AD Connect Health, the MP will automatically detect the resolution and close out the alert in SCOM. Nothing fancy, but it works and is pretty lightweight.
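In pseudo-PowerShell, that reconciliation pass looks something like the sketch below. This is only an illustration of the logic, not the MP's actual code: `Get-HealthAlerts` and `New-ScomAlertFromHealth` are hypothetical placeholders for the authenticated REST call to the Azure AD Connect Health alerts endpoint and for the MP's write action, and the matching on CustomField1 is an assumption for the example.

```powershell
# Sketch of the 15-minute reconciliation pass (hypothetical helper names).
Import-Module OperationsManager

# Placeholder for the authenticated REST call to the Health alerts endpoint.
$healthAlerts = Get-HealthAlerts -TenantId $AdTenant -ServiceName $AADSyncUrl

foreach ($healthAlert in $healthAlerts) {
    # Match on a unique key stashed in the SCOM alert's custom field.
    $scomAlert = Get-SCOMAlert -ResolutionState 0 |
        Where-Object { $_.CustomField1 -eq $healthAlert.Id }

    if ($healthAlert.State -eq 'Active' -and -not $scomAlert) {
        # New Health alert: raise a matching SCOM alert at the same severity.
        New-ScomAlertFromHealth $healthAlert   # placeholder for the rule's write action
    }
    elseif ($healthAlert.State -eq 'Resolved' -and $scomAlert) {
        # Health alert closed: close the corresponding SCOM alert.
        $scomAlert | Set-SCOMAlert -ResolutionState 255
    }
}
```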

I also added in a custom class/monitor that looks for instances of the Microsoft Azure AD Sync Service:

AAD Connect Health will monitor this too, but it doesn't monitor it in as close to real time as SCOM does. I would rather know within 60 seconds that Sync is down than have to wait, so having this run in conjunction with Azure AD Connect Sync Health is a nice better-together story.

In addition, the MP monitors the core services which feed Azure AD Connect Sync Health:

Again, if these services go down you will eventually be alerted by AAD Connect Sync Health, but why wait? Since these services are delayed start, I built a custom Unit Monitor Type that gives them a little more leeway: we check the service state every 30 seconds, but unlike the default NT Service Unit Monitor Type we wait until we have 6 consecutive samples of the service being stopped before we alert. Since these monitors are tied to a class based on the presence of the Azure AD Sync service, they will also alert if you have a server running the sync service without the Azure AD Connect Health Sync Agent/services installed. (If this is an issue, you can always shut the monitors off, but without those services installed and running you are losing 95% of the functionality provided by this pack.)
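The consecutive-sample idea behind that custom Unit Monitor Type can be illustrated in a few lines of PowerShell. This is purely an illustration (the real monitor type implements it with the standard service-check and consolidation modules), and the service name shown is just an example:

```powershell
# Illustration only: alert after 6 consecutive "stopped" samples, 30s apart.
$serviceName    = 'AzureADConnectHealthSyncInsights'  # example service name
$threshold      = 6
$stoppedSamples = 0

while ($true) {                                        # runs until stopped
    $service = Get-Service -Name $serviceName -ErrorAction SilentlyContinue
    if ($service -and $service.Status -eq 'Running') {
        $stoppedSamples = 0        # any healthy sample resets the count
    }
    else {
        $stoppedSamples++          # stopped or missing counts as a bad sample
    }

    if ($stoppedSamples -ge $threshold) {
        Write-Warning "$serviceName down for $($threshold * 30) seconds"
        $stoppedSamples = 0
    }
    Start-Sleep -Seconds 30
}
```

A delayed-start service gets three minutes of grace this way instead of alerting on the first stopped sample.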

To get started with the pack there are some prerequisites:

  • You need Azure AD Connect Health to be installed and configured. I won’t go into the details here, but you can find everything you need to know in the awesome guide/videos found here:

https://docs.microsoft.com/en-us/azure/active-directory/connect-health/active-directory-aadconnect-health

  • For authentication I leverage the Active Directory Authentication Library (ADAL). The key components are these two .dlls:

If you download and install the Azure Powershell Module this should give you everything you need:

https://aka.ms/webpi-azps

You will need to install this on each of the management servers, as I leverage the All Management Servers Resource Pool as the source of API calls to allow for high availability. (If having a dedicated AAD Connect Health Watcher is more desirable than the AMSRP, just let me know and I can make another version of the MP which supports this.)

Your management servers will need to allow communication to the following URLs through both Windows/network firewalls:

https://management.azure.com/
https://login.windows.net/

  • You will need a user account with necessary access to Azure AD Connect Health Sync.
  • When logged into https://portal.azure.com navigate to Azure Active Directory in the left-hand pane.

Then select Azure AD Connect

Select Azure AD Connect Health

Right now you can see that my environment is unhealthy as I have intentionally stopped the Azure Active Directory Connect Sync Monitoring to force an error condition:

If you click Users – Add – select a role and add a user that we will later add to a Run As Profile in SCOM:

As this is still early in the testing phase, I have lazily done most of my testing with an account with Owner privs. I believe the Monitoring Reader service role should be sufficient (subsequent testing shows that this works; see comments for details), but I need to do more testing to ensure that will always hold true.

There is one more prereq: click Azure Active Directory Connect (Sync)

Then click the service name that you want to monitor:

Take note of the url in your browser bar as you will need to copy the small portion highlighted in yellow for an overridable parameter in SCOM:

Once you have all the above prerequisites in place you can download and import the MP from here:

Azure AD Connect Sync Custom MP

Once imported you will need to add your Azure AD Connect account configured above to a custom Run As Profile.

I use an account configured with Basic Auth that I then distribute to my management servers.

Once this is in place we need to modify the core rule that drives the MP:

Right-click Azure AD Connect Rule – Overrides – Override the Rule – For all objects of class: All Management Servers Resource Pool

Override AADSync URL (the portion of the URL highlighted in yellow that you copied before) – Add your AdTenant – Set the rule to Enabled.

Then any time an alert gets generated in Azure AD Connect Sync Health:

A corresponding alert will be generated in SCOM:

Once the alert closes in AAD Connect Sync Health it will close out in SCOM within 15 minutes.

When I get back from vacation I will put together a post or a video walking through the underlying mechanics of exactly how the MP works, and then I will most likely post the Visual Studio project files on GitHub. But in the meantime you are welcome to download and test it out from TechNet Gallery. Now I am off to my vacation. Cheers!


How do I: Create An Advanced SQL Database Backup Monitor In Visual Studio?

My favorite part of my job is getting to work with customers and understanding exactly how they use a product, and how they need it to work. Some of the time this just means listening closely, answering questions, and relaying information back to the product group. But often there are those tiny changes and use cases that fall outside the realm of things the product group is likely to address.

A change that might be hugely valuable for Customer A, won’t happen if it breaks backwards compatibility for Customers B-Z.

One of my customers has a SCOM environment for SQL that monitors over 20,000 databases. Due to their size there are often cases where the out-of-box monitoring provided by the SQL Product Group's management packs isn't able to completely meet their needs. While the pack provides fantastic monitoring for SQL, there are times where they need a more granular view of the health of their environment.

An instance of this from the past year was monitoring SQL Database backups. The SQL Product Group’s pack gives you the ability to alert if a database hasn’t been backed up in a certain number of days. By default it is set to 7 days, but you can override that to any integer value of days.

For my customer this wasn't really good enough. They wanted to be able to drill down and alert on hours since last backup. They also wanted multiple severities, so if it had been 20 hours since a backup an e-mail could go out, but at 30 hours we would generate a page. The 20 and 30 hours would be customizable at the individual database level, and they also wanted added logic so that backups would only be checked for databases with a status of “ONLINE”. We have other monitors that look at DB status in general, so if a database was OFFLINE they either knew about it from those monitors and were fixing it, or it was intentional, in which case they didn't want a backup alert.
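The requested behavior boils down to a simple two-threshold evaluation per database. A rough PowerShell sketch of just that decision logic (the real implementation lives in the monitor's data source; the function name and the 20/30-hour defaults here are only illustrative, and in the MP the thresholds are overridable per database):

```powershell
# Hypothetical illustration of the two-threshold backup-age check.
function Get-BackupState {
    param(
        [double]$HoursSinceLastBackup,
        [int]$WarningHours  = 20,   # e-mail threshold
        [int]$CriticalHours = 30    # paging threshold
    )
    if     ($HoursSinceLastBackup -ge $CriticalHours) { 'Critical' }
    elseif ($HoursSinceLastBackup -ge $WarningHours)  { 'Warning' }
    else                                              { 'Healthy' }
}

Get-BackupState -HoursSinceLastBackup 25   # Warning
Get-BackupState -HoursSinceLastBackup 31   # Critical
```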

The basic logic behind the SQL PG's MP is a simple T-SQL query wrapped in a fairly complex VBScript. The unwrapped T-SQL is below:

The T-SQL modifications we need to make are relatively simple: swap DAY to HOUR, and add a line to only return backup info for databases with a status of ONLINE.
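For readers without the screenshot, the shape of the modified query is roughly the following. This is an approximation for illustration only, not the MP's exact T-SQL (the real query differs in its column list and grouping); it uses the standard `msdb.dbo.backupset` and `sys.databases` tables:

```sql
-- Approximate shape of the modified backup-age query (illustrative only).
SELECT d.name,
       DATEDIFF(HOUR, MAX(b.backup_finish_date), GETDATE()) AS HoursSinceLastBackup
FROM master.sys.databases d
LEFT JOIN msdb.dbo.backupset b ON b.database_name = d.name
WHERE d.state_desc = 'ONLINE'          -- added: only check ONLINE databases
GROUP BY d.name
```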

Getting this into a working management pack is a little more complex: it requires isolating and cloning the Product Group's Database Backup Monitor in Visual Studio, and then making a few changes to the XML for our custom iteration. To prevent screenshot overload I did a quick step-by-step video walkthrough of the process. For this video I opted to leave out the three-state severity request, and will show how to add that functionality in a follow-up video.

If you have any questions or need any help, just leave a comment.


How do I: Create a task that will allow me to bulk adjust a regkey via the SCOM Console

There are lots of ways to adjust reg keys in bulk: SCCM, Group Policy, and remote PowerShell, to name a few.

Occasionally I find that SCOM customers like to have the ability to modify a registry setting via a Task in the SCOM console. This gives them the ability to modify the regkey for a single server, a group of servers, all servers, whatever they want in a matter of seconds without having to rely on outside tools.

Recently I have had a few customers need to adjust the MaxQueueSize reg key for their agents:

This is actually a good, simple MP authoring exercise, so I will quickly walk through the process.

The end design in Visual Studio will look like this:

regkeymp

Easy enough: two Tasks and two Scripts, with standard out-of-box references, which then generate two tasks in the console:

task

Usually for something like this I like to start with the PowerShell before I break open Visual Studio. It is easier for me to get the script working in the PowerShell ISE and then start a new MP once I know I have the PowerShell working.

For the most part the PowerShell is pretty straightforward. The only complication I ran into in testing was that since some of my customer's agents are multi-homed and some aren't, I needed a way to handle either scenario without erroring out. Handling multiple management groups adds three lines of code to my original script, but it's still not too bad:

$GetParentKey = Get-Item -Path 'HKLM:\SYSTEM\CurrentControlSet\services\HealthService\Parameters\Management Groups'
$MGName = $GetParentKey.GetSubKeyNames()

Foreach ($Name in $MGName){
    Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\services\HealthService\Parameters\Management Groups\$Name" -Name 'maximumQueueSizeKb' -Value 76800 -Force
}

To make things as simple as possible in this example I am using hardcoded queue-size values: one task to increase the queue size to 75 MB, and one to set it back to the default of 15 MB.

$GetParentKey = Get-Item -Path 'HKLM:\SYSTEM\CurrentControlSet\services\HealthService\Parameters\Management Groups'
$MGName = $GetParentKey.GetSubKeyNames()

Foreach ($Name in $MGName){
    Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\services\HealthService\Parameters\Management Groups\$Name" -Name 'maximumQueueSizeKb' -Value 15360 -Force
}
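Before wiring either script into a task, it's easy to confirm the change took effect by reading the value back. This check isn't part of the MP, just a sanity test for the scripts above:

```powershell
# Read back maximumQueueSizeKb for each management group the agent reports to.
$base = 'HKLM:\SYSTEM\CurrentControlSet\services\HealthService\Parameters\Management Groups'
foreach ($Name in (Get-Item -Path $base).GetSubKeyNames()) {
    $value = (Get-ItemProperty -Path "$base\$Name").maximumQueueSizeKb
    "{0}: {1} KB" -f $Name, $value
}
```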

Now that we have the scripts we can open up our copy of Visual Studio with the Visual Studio Authoring Extensions:

File – New Project

newproj

Management Pack – Operations Manager 2012 R2

opsproj

create

We are going to create two folders. These aren't required; I just like adding a little bit of organization rather than dealing with one large .mpx file. Ultimately how you divide things up is somewhat arbitrary and more a matter of personal preference than any specific rules.

To create a folder, right-click MaxQueueSize – Add – New Folder

new-folder

Do this two times. We will create one folder called Scripts and one called Tasks:

scripts

Now we need to populate our Scripts folders with the two PowerShell scripts we wrote in the ISE earlier.

Right-Click the Scripts folder – Add – New Item

addnewitem

PowerShell script file – Name file – Add

powershell-script-file

Now you can paste in the code we wrote in the PowerShell ISE

increasemaxsize

This takes care of the Increase Max Queue Size PowerShell. Now repeat the steps above for the reset max queue size script:

powershell-scripts

Now we need to populate our Tasks folder

Right-Click Tasks Folder – Add – New Item

add-new-task

Empty Management Pack Fragment – IncreaseMaxQueueSize.mpx – Add

increasetask

The code for a task that kicks off a PowerShell script is pretty easy:

<ManagementPackFragment SchemaVersion="2.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Monitoring>
    <Tasks>
      <Task ID="Sample.RegKey.IncreaseMaxQueueSize.AgentTask" Accessibility="Internal" Target="SC!Microsoft.SystemCenter.ManagedComputer" Enabled="true" Timeout="300" Remotable="true">
        <Category>Custom</Category>
        <ProbeAction ID="Probe" TypeID="Windows!Microsoft.Windows.PowerShellProbe">
          <ScriptName>IncreaseMaxQueueSize.ps1</ScriptName>
          <ScriptBody>$IncludeFileContent/Scripts/IncreaseMaxQueueSize.ps1$</ScriptBody>
          <SnapIns />
          <Parameters />
          <TimeoutSeconds>300</TimeoutSeconds>
          <StrictErrorHandling>true</StrictErrorHandling>
        </ProbeAction>
      </Task>
    </Tasks>
  </Monitoring>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="Sample.RegKey.IncreaseMaxQueueSize.AgentTask">
          <Name>Max Queue Size Increase</Name>
          <Description>Increase Max Queue Size Regkey to 75 MB</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPackFragment>

taskxml

You do this for both tasks and associate each with the appropriate PowerShell file.

So Visual Studio will look like this:

regkeymp

And once you build and import the pack you will have two tasks that will show up as options when you are in the Windows Computer State view:

task

If anyone wants these instructions in video form, just post a comment below and I will record a step-by-step video walkthrough.

If the source files or finished MP are helpful again don’t hesitate to ask. Just post a comment and I will zip up the files and upload to TechNet or GitHub.


How do I: Generate a single report of all healthy agents + grey agents + timestamp of last recorded heartbeat?

This week is a training week, which means I have tiny windows of time to catch up on some blogging.

I have had this question a few times over the years. It seems like it should have a straightforward answer, but if there is one, I have not been able to find it.

When customers have asked this in the past I usually refer them to the following three posts:

https://blogs.msdn.microsoft.com/mariussutara/2008/07/24/last-contacted/

http://www.systemcentercentral.com/quicktricks-last-agent-heartbeat/

http://blog.scomskills.com/grey-agents-with-reason-gray-agents/

These do an excellent job in different ways of getting at the question of what agents are greyed out and when did heartbeats stop coming in.

Unfortunately, these do nothing to address the first part of the question: the customer wants all agents, both those that have stopped heartbeating and those that haven't.

This is a little trickier. It is easy enough to get a list of all agents, a list of grey agents, and to query for when health service heartbeat failures occur. But there is nothing easily accessible via the SDK or the DW (at least that I am aware of) that allows us to capture a timestamp for when a non-grey agent's last heartbeat came in.

So my natural question to my customer was: why do you need the healthy agents' heartbeat timestamps? The answer was basically that they want to feed that data into other systems in their org, and they don't want to deal with two different lists/files. They want one file, but at the end of the day they don't actually need an exact timestamp for the last heartbeat of a healthy agent.

This makes things a lot easier and lends itself to a relatively simple potential solution:

Import-Module OperationsManager

$Agent = Get-SCOMClass -Name "Microsoft.SystemCenter.Agent"
$MonitoringObjects = Get-SCOMMonitoringObject $Agent
$Date = Get-Date
$DateSString = $Date.ToShortDateString()
$TimeLString = $Date.ToLongTimeString()
$DateTimeCombine = $DateSString + " " + $TimeLString
$UserDesktop = [Environment]::GetFolderPath("Desktop")

function GenerateAgentReport
{
    foreach ($object in $MonitoringObjects)
    {
        $result = New-Object -TypeName PSObject
        $result | Add-Member -MemberType NoteProperty -Name DisplayName -Value $object.DisplayName
        $result | Add-Member -MemberType NoteProperty -Name Agent_Healthy -Value $object.IsAvailable
        if ($object.IsAvailable -eq $true)
        {
            $result | Add-Member -MemberType NoteProperty -Name LastHeartbeat -Value $DateTimeCombine -PassThru
        }
        else
        {
            $result | Add-Member -MemberType NoteProperty -Name LastHeartbeat -Value $object.AvailabilityLastModified -PassThru
        }
    }
}

#GenerateAgentReport | Export-Csv "$UserDesktop\AgentReport.csv" -NoTypeInformation
GenerateAgentReport | Out-GridView

heartbeat

Basically this returns each agent in your management group. If the agent is greyed out, we use the AvailabilityLastModified property to pull an approximate timestamp. If the agent is still heartbeating, as determined by the IsAvailable property, then AvailabilityLastModified isn't going to contain useful information, so in this case we substitute the current date/time for that field, indicating that we have had a successful heartbeat within the past 5 minutes.

I said “approximate timestamp” when referring to agents with an IsAvailable value of false (greyed-out agents) because, while in many cases AvailabilityLastModified should correspond to the heartbeat failure that flipped the agent from healthy to critical, an agent that was already in a critical state but still heartbeating would only have the property capture when it went critical, not the moment of its last heartbeat. If you need a more or less exact moment-of-last-heartbeat report, I suggest using one of the links above. But if you need a quick PowerShell report to feed into other systems to help prioritize agent remediation, the above script, or some modified form of it, might be mildly useful.


How do I: Create an Event View that excludes a particular Event ID

I had a large enterprise customer recently who was monitoring ADFS with the default management pack. They liked being able to glance at the event view, which gave them a single place to look at the ADFS events occurring across their environment. They were using this event data as part of their correlation and tuning process to determine if there were additional actionable events being missed for their unique infrastructure. The eventual goal was to stop collecting the events altogether and only have alert-generating rules/monitors in place for the patterns of events they cared about.

01

They quickly found that at least for their environment some of the events being collected were essentially noise, and they asked how to adjust the view so it would exclude one particular event.

This is one of those “sounds really easy, and of course the product should do this out of the box” questions that SCOM has never really had a great answer for.

If we take a look at the view it is populated by the following criteria:

02

And if we dig into the corresponding rule that collects the events we find a wildcard regex-style collection rule targeted at the ADFS log:

03

04

05

Since the collection rule is part of a sealed MP, the best we could do at the rule level is to shut off this collection rule and create a new one with a modified wildcard expression, such that it would collect everything the old rule did except the event ID the customer doesn't like.

The problem with this solution is that it isn't particularly efficient or self-service friendly. If next week the customer realizes there is an additional event they want excluded, the AD team has to contact the SCOM team and request further modifications.

In an ideal world the exclusion would be possible at the view level, but if you ever dig into modifying the classic OpsMgr views you will find that while you can use wildcards to perform exclusions for some fields, like Event Source:

06

The same is not true for event IDs, where wildcard exclusions are not allowed:

07

I briefly toyed with the idea of making modifications to the MP at the XML level to allow exclusions as I have occasionally done in the past to hack a subscription into meeting a customer need, but in this case such a solution doesn’t really fit. The customer needed something that was easy for them to change as they gradually winnow down the list of events they see to only the ones they care about.

They needed something that was extremely easy to edit.

Enter PowerShell and the SCOM SDK.

The first solution I put together for them to test was the following:

PowerShell Grid Widget

08

with a Where-Object {$_.Number -ne 31552 -and $_.PublisherName -eq "Health Service Modules"}. I used a SCOM PublisherName since I didn't have any ADFS events in my test environment, and I wanted something that would let me confirm the exclusion was working as expected:

11

Everything looked good; the event I wanted excluded was dealt with properly. (The Description dataObject is commented out in the code for this screenshot to make it easier to view. With Description uncommented, each event takes up more lines of screen real estate. I recommend creating two views, one with the description commented out and one with it uncommented, so customers can easily toggle between them.)

12

And if we remove the $_.Number -ne 31552 I get results as below with the event present:

10

In theory this should be all we needed, but when my customer tested out the script nothing happened. After a little bit of head scratching it became apparent what the problem was.

We were calling Get-SCOMEvent | Where-Object, which means we were telling the OpsMgr SDK to please go retrieve every single event in the OpsDB, and then once that was done we piped the results to a Where-Object and told it what we really needed.

In my relatively small test environment this wasn’t that big of an ask and the results returned quickly.

In my customer's environment, with thousands of servers and friendly event-generating MPs like the Exchange 2010 MP, getting every event in the OpsDB was basically a great way to enter an endless loop of dashboard timeouts with nothing ever being displayed.

So we needed to filter things down a bit up front, before piping to the Where-Object.

If you search the blogs you will find that Stefan Stranger has a nice post describing how to deal with this issue when calling the Get-SCOMAlert cmdlet with a Where-Object: basically, you use Get-SCOMAlert -Criteria and then pipe to a Where-Object if still needed.
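For reference, the -Criteria approach for alerts looks like this. The filter string is evaluated server-side, so only matching alerts come back over the SDK; the severity/resolution-state values here are just an example:

```powershell
# Server-side filtering: only new (ResolutionState = 0) critical (Severity = 2)
# alerts are returned, instead of every alert being piped to Where-Object.
Import-Module OperationsManager
Get-SCOMAlert -Criteria "ResolutionState = 0 AND Severity = 2" |
    Select-Object Name, MonitoringObjectDisplayName, TimeRaised
```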

Unfortunately, Get-SCOMEvent doesn’t have a -criteria parameter because that would make things too easy and intuitive.

It does, however, have a -rule parameter which looked promising:

13

First I tried passing it a rule Name, followed by a second try with a rule GUID for an event collection rule I was interested in. In both cases I got a nice red error message:

14

While a little cryptic, it is saying that I am passing a parameter of type string, and it wants a SCOM-specific rule type.

To give it what it wants, we first need to retrieve the rule using the Get-SCOMRule cmdlet and then pass it to Get-SCOMEvent as a variable:

$rule = Get-SCOMRule -DisplayName "Operations Manager Data Access Service Event Collector Rule"

15

$rule = Get-SCOMRule -DisplayName "Operations Manager Data Access Service Event Collector Rule"

Get-SCOMEvent -Rule $rule

16

So our final script would look something like this. (I have added some additional filtering to allow pulling only events from the past hour. *Keep in mind this date/time filtering doesn't increase the efficiency of the script, since it occurs after the Where-Object; the only thing making this script more efficient is that we are only pulling back events collected by a specific rule.*)

$rule = Get-SCOMRule -DisplayName "Operations Manager Data Access Service Event Collector Rule"

$DateNow = Get-Date

#Modify the .AddMinutes below to determine how far back to pull events
$DateAgo = $DateNow.AddMinutes(-60)

#$_.Number -ne (not equals) is used to indicate the event number that you want to exclude from the view
$eventView = Get-SCOMEvent -Rule $rule | Where-Object {$_.Number -ne 17 -and $_.TimeGenerated -ge $DateAgo -and $_.TimeGenerated -le $DateNow} | Select-Object Id, MonitoringObjectDisplayName, Number, TimeGenerated, PublisherName, Description | Sort-Object TimeGenerated -Descending

foreach ($object in $eventView){
     $dataObject = $ScriptContext.CreateInstance("xsd://OpsConfig!sample/dashboard")
     $dataObject["Id"] = [String]($object.Id)
     $dataObject["Event Number"] = [Int]($object.Number)
     $dataObject["Source"] = [String]($object.MonitoringObjectDisplayName)
     $dataObject["Time Created"] = [String]($object.TimeGenerated)
     $dataObject["Event Source"] = [String]($object.PublisherName)
     $dataObject["Description"] = [String]($object.Description)
     $ScriptContext.ReturnCollection.Add($dataObject)
}

And then the ADFS code would look like this, though event 17 was not the event they wanted to exclude:

$rule = Get-SCOMRule -DisplayName "Federation server events collection"

$DateNow = Get-Date

#Modify the .AddMinutes below to determine how far back to pull events
$DateAgo = $DateNow.AddMinutes(-60)

#$_.Number -ne (not equals) is used to indicate the event number that you want to exclude from the view
$eventView = Get-SCOMEvent -Rule $rule | Where-Object {$_.Number -ne 17 -and $_.TimeGenerated -ge $DateAgo -and $_.TimeGenerated -le $DateNow} | Select-Object Id, MonitoringObjectDisplayName, Number, TimeGenerated, PublisherName, Description | Sort-Object TimeGenerated -Descending

foreach ($object in $eventView){
     $dataObject = $ScriptContext.CreateInstance("xsd://OpsConfig!sample/dashboard")
     $dataObject["Id"] = [String]($object.Id)
     $dataObject["Event Number"] = [Int]($object.Number)
     $dataObject["Source"] = [String]($object.MonitoringObjectDisplayName)
     $dataObject["Time Created"] = [String]($object.TimeGenerated)
     $dataObject["Event Source"] = [String]($object.PublisherName)
     $dataObject["Description"] = [String]($object.Description)
     $ScriptContext.ReturnCollection.Add($dataObject)
}

Hopefully this helps save a little bit of time for anyone else who comes across a question like this one.


How do I: Change the default behavior of the DB Mirror Status Monitor

MP Authoring Series (For an explanation of this series read this post first.)

Real World Issue: Customer is seeing a lot of Critical alerts for Mirrored Databases in a Disconnected state, but only Warning alerts for Mirrored Databases in Suspended state. In this customer’s environment brief disconnects are common and not necessarily indicative of a Critical issue, whereas a mirrored database in a Suspended state is always Critical for this customer. The customer wants to swap the default states such that Disconnected will now be Warning and Suspended will be Critical.

There are three Mirrored Database Mirror Status Monitors

01 02

When we dig into the properties

2.5

We see the three states as well as the corresponding Statuses that map to those states.

03

When we check Overrides we find there is nothing we can override to meet the customer needs:

04

So to accommodate this request we need to do a little custom authoring in Visual Studio + VSAE.

The process isn't too complex; however, it is much easier to absorb via video than through the many-page article that would result if I tried to document the steps by hand:

Once you have your final successful build you will find your files in the bin\Debug folder of your project:

05

Once you import your custom MP into SCOM you will have a cloned monitor with your modified behavior:

06

 


How Do I: Access SCOM Properties Programmatically

For some background: the question that led to this post was about being able to access properties that SCOM was discovering, in order to detect config drift in some networking hardware. Normally when I get a question like this my first answer is don't use SCOM for this; use OMS, SCCM, or some other tool designed specifically for the purpose. With that said, they had a specific use case that made sense, and SCOM was already collecting all the properties they cared about as part of a 3rd-party management pack, so the primary goal became giving the customer a better picture of where this data gets stored and the easiest way to access it.

The first way of getting at discovered property data is via the OperationsManager database. (The usual caveats about directly querying the OpsDB not being recommended or supported apply.)

There are tables prefixed dbo.MT_ which contain the various properties associated with a certain class of object.

prop01

If I look at something like SQL 2014 Databases I find the following: (There are more properties, but they get truncated off screen)

Select * from dbo.MT_Microsoft$SQLServer$2014$Database

prop02

To make this a little more meaningful we need to pick which tables we are interested in and join on BaseManagedEntityId so we can pull in FullName and understand which systems these databases are associated with. For this I wrote the following query:

SELECT
    BME.FullName,
    MT.DatabaseName_3AD1AB73_FD77_E630_3CDE_2CA224473213 AS 'DB Name',
    MT.DisplayName,
    MT.DatabaseAutogrow_E32D36C4_7E11_62BE_D5B4_B77C841DCCA1 AS 'DB Autogrow',
    MT.RecoveryModel_772240AD_E512_377C_8986_E4F8369BDC21 AS 'DB RecoveryModel',
    MT.LogAutogrow_75D233F6_0569_DB26_0207_8894057F498C AS 'LogAutogrow',
    MT.Collation_4BC5C384_34F3_4C3F_A398_2298DBA85BCD AS 'Collation',
    MT.BaseManagedEntityId
FROM dbo.MT_Microsoft$SQLServer$2014$Database MT
JOIN dbo.BaseManagedEntity BME ON BME.BaseManagedEntityID = MT.BaseManagedEntityId

Which gives this output:

prop03

You could also get at similar data through the SDK via PowerShell (This would technically be the officially supported technique, though sometimes not as flexible as SQL). To do this you would use something like:

Import-Module OperationsManager

$WindowsServerClass= Get-SCOMClass -Name Microsoft.SQLServer.2014.Database

$ServerObjects = Get-SCOMClassInstance -Class $WindowsServerClass | Select Fullname, *.DatabaseName,*.RecoveryModel,*.DatabaseAutogrow,*.LogAutogrow,*.Collation

$ServerObjects

This will give you results that look as follows: (I just arbitrarily picked a few properties; there are more available that you can look at with either | Get-Member or | Select *)

prop04

From there we can make things a little more readable with the following:

Import-Module OperationsManager

$WindowsServerClass= Get-SCOMClass -Name Microsoft.SQLServer.2014.Database

$ServerObjects = Get-SCOMClassInstance -Class $WindowsServerClass

$ServerObjectsB = $ServerObjects | Select *.DatabaseName, *.RecoveryModel, *.DatabaseAutogrow, *.LogAutogrow, *.Updateability, *.UserAccess, *.Collation, *.Owner, *.ResourcePool | FT

prop05

From there we started playing around with ways to quickly identify differences:

prop06

This is still a work in progress, but I figured I would share in case this can be of use to anyone.
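For illustration, the comparison step can be sketched outside SCOM: once the discovered properties are exported (for example, by piping the PowerShell output above to Export-Csv), a small script can flag any instance that deviates from a chosen baseline. The property names and instance names below are illustrative, not from a real environment.

```python
# Sketch: flag config drift by comparing each instance's discovered
# properties against a chosen baseline. Names and values are illustrative.
def find_drift(baseline, instances):
    """Return {instance: {prop: (expected, actual)}} for any mismatches."""
    drift = {}
    for name, props in instances.items():
        diffs = {p: (baseline[p], v) for p, v in props.items()
                 if p in baseline and v != baseline[p]}
        if diffs:
            drift[name] = diffs
    return drift

baseline = {"RecoveryModel": "Full", "DatabaseAutogrow": "True"}
instances = {
    "SQL01\\AppDB": {"RecoveryModel": "Full", "DatabaseAutogrow": "True"},
    "SQL02\\AppDB": {"RecoveryModel": "Simple", "DatabaseAutogrow": "True"},
}
print(find_drift(baseline, instances))
# {'SQL02\\AppDB': {'RecoveryModel': ('Full', 'Simple')}}
```

Only properties present in the baseline are compared, so instance-unique fields (IDs, display names) don't generate noise.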

Tagged ,

How do I: Send SMS Text Message Notifications for Heartbeat Failures

Continuing my series of interesting questions from last year and my answers, here is one on sending SMS notifications for heartbeat failures for a subset of mission-critical servers. The added wrinkle to this question: due to the security requirements of their environment, they also needed to be certain that no server name, IP address, or other info of that nature which might be part of a typical alert description makes it into the text alerts.

Text Alert on Heartbeat failures without Confidential information/Server names

SCOM Heartbeat Failure Chain of events:

01SMS

Above Diagram pilfered with attribution from TechNet.

First you need to set up a new E-Mail Notification Channel

Select Administration

02SMS

Channels:

03SMS

New

04SMS

Select E-Mail (SMTP)

05SMS

Enter a Channel Name:

06SMS

Enter an SMTP Server and a Return address (you will likely need an exception that will allow the SMTP server to send messages outside your domain)

07SMS

Modify the Subject and Message as follows:

E-mail subject:

Alert: $Data[Default='Not Present']/Context/DataItem/AlertName$ Resolution state: $Data[Default='Not Present']/Context/DataItem/ResolutionStateName$

E-mail Message:

Alert: $Data[Default='Not Present']/Context/DataItem/AlertName$

Last modified by: $Data[Default='Not Present']/Context/DataItem/LastModifiedBy$

Last modified time: $Data[Default='Not Present']/Context/DataItem/LastModifiedLocal$

(Ultimately you could add additional text here as well; the key is that we are pulling out the variables from the Channel that would normally populate the server name when there is a heartbeat failure.)
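To see why this works, you can render the stripped-down subject template locally and confirm no server-identifying fields survive. This is only a local simulation: the keys below mirror the $Data/Context/DataItem/...$ placeholders used by the channel, and the alert dictionary is a made-up stand-in for a real SCOM alert.

```python
# Sketch: render the stripped-down subject template locally to confirm no
# server-identifying fields survive. Keys mirror the channel's
# $Data/Context/DataItem/...$ placeholders; the alert dict is simulated.
SUBJECT = "Alert: {AlertName} Resolution state: {ResolutionStateName}"

def render(template, alert):
    fields = ("AlertName", "ResolutionStateName")
    return template.format(**{k: alert.get(k, "Not Present") for k in fields})

alert = {"AlertName": "Health Service Heartbeat Failure",
         "ResolutionStateName": "New",
         "NetbiosComputerName": "SRV01"}  # present in SCOM, never rendered
print(render(SUBJECT, alert))
# Alert: Health Service Heartbeat Failure Resolution state: New
```

The server name field exists on the alert but is simply never referenced by the template, which is exactly the effect of trimming the channel variables.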

08SMS

Click Finish

09SMS

Create a new Subscription

10SMS

Created by specific rules or monitors — Health Service Heartbeat Failure

With a specific resolution state–New

11SMS

Add subscribers. If you want it to send text messages you can create a new unique subscriber with an address that consists of the appropriate cell number + service provider combination:

Sprint

cellnumber@messaging.sprintpcs.com 

Verizon

cellnumber@vtext.com

T-Mobile

cellnumber@tmomail.net

AT&T

cellnumber@txt.att.net
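The four email-to-SMS gateways above can be captured as a small lookup. The helper below is a hypothetical illustration (not part of SCOM) of building a subscriber address from a cell number and carrier:

```python
# Sketch: the four email-to-SMS gateways listed above as a lookup, plus
# a hypothetical helper to build a subscriber address from a cell number.
GATEWAYS = {
    "sprint": "messaging.sprintpcs.com",
    "verizon": "vtext.com",
    "t-mobile": "tmomail.net",
    "att": "txt.att.net",
}

def sms_address(cell_number, carrier):
    # Strip any punctuation/spaces so only the digits reach the gateway
    digits = "".join(ch for ch in cell_number if ch.isdigit())
    return f"{digits}@{GATEWAYS[carrier.lower()]}"

print(sms_address("555-123-4567", "Verizon"))  # 5551234567@vtext.com
```

Gateway domains do change occasionally, so verify them with your carrier before relying on this for paging.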

For my example I am just using an internal account in my environment.

12SMS

Select your newly created notification channel. You may want to delay notifications by 15 minutes.  That way if the server is down for less than 15 minutes you won’t get a text message at 3 AM.

13SMS

Click Finish

14SMS

Now if a server goes offline the console will still generate an alert as before with the server name:

15SMS

But the e-mail or text message will be generic without any confidential information:

16SMS

For alerts other than heartbeat you might have to craft a slightly modified channel to ensure no info you don’t want texted is sent out.

A quick example to illustrate this:

Ultimately $Data/Context/DataItem/AlertName$ will map to a different value for each type of alert. So for the alert below:

17SMS

That variable maps to:

18SMS

So Alert Name by itself will not map to anything proprietary like IP Address/domain/computername etc unless you have created a custom alert which contains any of this info in the Alert Name field. Though with that said it may still map to info about specific technologies. So one might be able to use the Alert Name to determine what types of applications you are running which could in some cases be a security concern. To get a sense of the type of values that typically show up in your environment the quick and easy method is to just look at your Monitoring Pane – Active Alerts  Name column:

19SMS

So for my environment you could learn from this info what apps I am running (SharePoint, SQL, ACS), in the case of the Page Life Expectancy you are able to find out the version of SQL etc. If this kind of info isn’t a security concern for your business you could just pass the Alert Name field from any alerts that meet a certain Severity/Priority Criteria. If this type of info is a concern then you need to determine which alerts are ok to pass alert name like Health Service Heartbeat failure and which need to be withheld and then filter your notification subscription criteria accordingly.
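The filtering decision above can be sketched as a simple screen: check each alert name against a deny-list of technology keywords before allowing a subscription to pass the Alert Name through. The keyword list below is purely illustrative; tailor it to whatever is sensitive in your environment.

```python
# Sketch: screen alert names against a deny-list of technology keywords
# before allowing a subscription to pass the Alert Name through to SMS.
# The keyword list is illustrative -- tailor it to your environment.
SENSITIVE = ("sql", "sharepoint", "acs")

def safe_to_text(alert_name):
    lowered = alert_name.lower()
    return not any(word in lowered for word in SENSITIVE)

print(safe_to_text("Health Service Heartbeat Failure"))          # True
print(safe_to_text("SQL DB Page Life Expectancy is too low"))    # False
```

In practice you would express the same decision as notification subscription criteria, but a quick pass like this over your Active Alerts names helps you decide which alerts are safe to forward.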

If you want a slightly better view of this info you could use PowerShell:

Import-Module OperationsManager

Get-SCOMAlert 

20SMS

Get-SCOMAlert | Select Name

This will give you possible values that could populate that variable. (Keep in mind this will only pull back values that are currently in the OpsDB so this will be all alerts in that DB based on your grooming/retention settings.)

 

Tagged , , ,

How do I: Add Exclusion Criteria to SCOM Notification Subscriptions

By default, all filtering criteria in SCOM notification subscriptions are inclusion-based. There is no native ability via the GUI to indicate that a subscription should pick up every alert matching certain criteria with the exception of a specific subset of alerts. The only way to accomplish this via the GUI is if your inclusion criteria specifically enumerates every other alert instance except the alerts you want to exclude.

01

An example where this would cause problems is if you want to have one notification subscription that notifies for every Alert of a Severity of Critical with the exception of alerts from a specific monitor.

02

The first part is easy with the above config, but the second part (excluding one specific monitor’s alert) is not possible. This type of scenario becomes important when you want to have two subscriptions:

 

  1. One that sends all critical alerts immediately with the exception of one specific monitor alert that has a recovery.
  2. And a second subscription that is on a 5 minute delay and gives the monitor a chance to recover and only sends an alert if post recovery running and health recalc the monitor still shows an unhealthy condition.

So to accomplish this we need to do a little custom work at the XML level.

First please note that according to the following TechNet article

https://technet.microsoft.com/en-us/library/hh212805.aspx

03

This is of course in reference to what you can and cannot do at the GUI level, but keep in mind that what you are doing here is not officially supported, and you need to test carefully because it is very easy to accidentally break your subscription. (Also note that modifying a subscription in this way changes your procedure for future modifications: any later change made in the GUI will blow away your manual XML changes, so if you need to tweak the subscription at the GUI level at a later date, remember that you will have to run through the process below again to re-establish the exclusion.)

So to Add an Exclusion to a Notification Subscription

Administration Pane

04

Management Packs

05

Export Notifications Internal Library MP

06

Make a backup copy of this MP XML for safekeeping

Open the non-backup exported XML in a text editor of your choice. I am using Visual Studio, but anything including Notepad will work.

The Channel, Subscription, and Subscriber info is all defined within this pack. You need to find the section of XML that corresponds to the subscription you are interested in.

In the case of my environment the Subscription is called Test Subscription:

08

If I scroll to the end of the Management Pack XML I will hit the <DisplayStrings></DisplayStrings> section where I can find the corresponding ID that will allow me to find my subscription. (If you only have a few subscriptions you may be able to figure this out without the ID, but just to be safe it can be helpful to make sure you are editing the right subscription.)

I find my Test Subscription and see that it has a unique element ID of: Subscription7adf1953_5ea7_4f20_85c9_67271662212a

09

If I then search the XML for references to this Element ID I will find the relevant portion of XML that we are going to want to edit.

10

 

The important part that we will need to modify is contained within the <AlertChangedSubscription></AlertChangedSubscription> tags.

 

In the case of this particular notification subscription we will change:

 

<AlertChangedSubscription Property="Any">
  <Criteria>
    <Expression>
      <SimpleExpression xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <ValueExpression>
          <Property>Severity</Property>
        </ValueExpression>
        <Operator>Equal</Operator>
        <ValueExpression>
          <Value>2</Value>
        </ValueExpression>
      </SimpleExpression>
    </Expression>
  </Criteria>
  <ExpirationStartTime>12/11/2015 22:12:44</ExpirationStartTime>
  <PollingIntervalMinutes>1</PollingIntervalMinutes>
  <UserSid>S-1-5-21-2573163049-3319608367-1007842708-1106</UserSid>
  <LanguageCode>ENU</LanguageCode>
  <ExcludeNonNullConnectorIds>false</ExcludeNonNullConnectorIds>
  <RuleId>$MPElement$</RuleId>
  <TargetBaseManagedEntityId>$Target/Id$</TargetBaseManagedEntityId>
  <TimeZone>E001000000000000C4FFFFFF00000B0000000100020000000000000000000300000002000200000000000000|Pacific Standard Time</TimeZone>
</AlertChangedSubscription>

To:

<AlertChangedSubscription Property="Any">
  <Criteria>
    <Expression>
      <And xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <Property>ProblemId</Property>
            </ValueExpression>
            <Operator>NotEqual</Operator>
            <ValueExpression>
              <Value>b59f78ce-c42a-8995-f099-e705dbb34fd4</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <Property>Severity</Property>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value>2</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
      </And>
    </Expression>
  </Criteria>
  <ExpirationStartTime>12/11/2015 19:50:38</ExpirationStartTime>
  <PollingIntervalMinutes>1</PollingIntervalMinutes>
  <UserSid>S-1-5-21-2573163049-3319608367-1007842708-1106</UserSid>
  <LanguageCode>ENU</LanguageCode>
  <ExcludeNonNullConnectorIds>false</ExcludeNonNullConnectorIds>
  <RuleId>$MPElement$</RuleId>
  <TargetBaseManagedEntityId>$Target/Id$</TargetBaseManagedEntityId>
  <TimeZone>E001000000000000C4FFFFFF00000B0000000100020000000000000000000300000002000200000000000000|Pacific Standard Time</TimeZone>
</AlertChangedSubscription>

This is going to vary depending on the complexity of your existing subscription; you have to be careful to take existing <And> and <Or> tags into account when present.

<Value>b59f78ce-c42a-8995-f099-e705dbb34fd4</Value> needs to be set to the appropriate ID for the alert you want to exclude.
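If you find yourself doing this regularly, the hand edit can also be scripted. The sketch below uses Python's xml.etree to wrap a subscription's existing <Expression> in an <And> that adds the ProblemId NotEqual branch. It works on minimal XML with no namespaces, so treat it as illustrative only; the real Notifications Internal Library MP carries more attributes and the same backup/testing warnings apply.

```python
import xml.etree.ElementTree as ET

# Sketch: script the hand edit above -- wrap the existing <Expression>
# in an <And> that also requires ProblemId != <excluded monitor ID>.
# Minimal XML, no namespaces; illustrative only.
def add_exclusion(criteria_xml, excluded_problem_id):
    criteria = ET.fromstring(criteria_xml)
    old_expr = criteria.find("Expression")
    criteria.remove(old_expr)

    and_el = ET.SubElement(ET.SubElement(criteria, "Expression"), "And")

    # New branch: ProblemId NotEqual <excluded monitor ID>
    excl = ET.SubElement(ET.SubElement(and_el, "Expression"), "SimpleExpression")
    ET.SubElement(ET.SubElement(excl, "ValueExpression"), "Property").text = "ProblemId"
    ET.SubElement(excl, "Operator").text = "NotEqual"
    ET.SubElement(ET.SubElement(excl, "ValueExpression"), "Value").text = excluded_problem_id

    # Re-attach the original criteria as the second branch of the <And>
    keep = ET.SubElement(and_el, "Expression")
    for child in list(old_expr):
        keep.append(child)
    return ET.tostring(criteria, encoding="unicode")

original = ("<Criteria><Expression><SimpleExpression>"
            "<ValueExpression><Property>Severity</Property></ValueExpression>"
            "<Operator>Equal</Operator>"
            "<ValueExpression><Value>2</Value></ValueExpression>"
            "</SimpleExpression></Expression></Criteria>")
print(add_exclusion(original, "b59f78ce-c42a-8995-f099-e705dbb34fd4"))
```

Even if you script the edit, still diff the output against your backup copy and import into a test environment first.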

For my example I am using the HealthService Heartbeat failure alert from my environment.

To determine the ID that is associated with a specific rule/monitor-generated alert that is currently present in the console, you can use the following PowerShell from a Management Server. Keep in mind, if testing between a test and prod environment, that ID values on custom monitors may be different. Run the PowerShell in both environments to be sure before implementing in prod.

11 12

For my test I will exclude any Health Service Heartbeat failure alerts which have the following ID:

13

If you don’t have an alert in the console to find the ID you could use the following query which will give you the ID of every Monitor in SCOM:

get-SCOMMonitor | select-object @{Name="MP";Expression={ foreach-object {$_.GetSCOMManagementPack().DisplayName }}},DisplayName, Priority, Enabled, Id | Out-GridView

 

14

15

You can use the Add criteria button to filter things down further:

16

Once the modifications to the management pack are complete you can reimport the newly updated management pack.

*WARNING* Keep in mind that this UNSEALED MP will replace the existing MP on import, so if there is an error in your code you could potentially break all subscriptions in your environment. This is why having a backup copy is extremely important. It is also why you need to test this procedure in a test environment before trying it in prod. *WARNING* Again, keep in mind that future changes to this notification subscription via the GUI will break your exclusion criteria and require you to manually modify the subscription again.

17

Click Install

18 19

Once imported, generate one of the alerts that corresponds with the excluded ID to see if it is properly excluded from the notification. Also generate alerts that should be picked up by the subscription to confirm they are still being sent and that the subscription is not broken. Finally, watch the console for any notification-subscription-specific alerts; if there are any errors in your syntax it can create a situation where you break all notifications.

Now when I have Critical Alerts The Health Service Heartbeat failures will be excluded from my subscription, but all other alerts including those made by monitors that are created in the future with a severity of critical will get picked up:

20

21

Again, keep in mind the one caveat to modifying subscription XML in this way: you lose the ability to edit that subscription via the GUI in the future. If I modify the subscription further via the GUI after making manual XML changes, it will blow away the exclusion/NotEqual XML that was added. If you need to edit the subscription via the GUI, just remember that you will need to go through the process above again to manually edit the XML.

 

Tagged , , ,

How do I: Create a Wildcard SCOM Service Monitor and Recovery

I recently had a question from a customer on how to create wildcard service monitors + recoveries. The Service Monitor from the Monitoring template and a simple Unit Monitor for services both require an explicit service name; no wildcards allowed. For most services this is fine, but there are some applications which do fun and interesting things like concatenate computername + service to create a unique service name. This creates a bit of a problem for monitoring. You can create individual monitors, but if you have hundreds of services, each with unique service names that follow a particular pattern, creating hundreds of corresponding monitors could get a little time consuming.

Brian Wren has a great article from back in the SCOM 2007 R2 days that answers part of this question, but when I went through the steps I found it needed some slight tweaking and updating for SCOM 2012 R2. Once that was complete I also needed to come up with a simple low overhead wildcard service recovery for when one of the services stops and needs to be brought back online.

Below are my steps:

Launch SCOM Console

Administration

01

Management Packs

02

Tasks – Create a Management Pack

03

Enter Name + Description – Next – Create

04

Select Authoring

05

Right Click Windows Service

08

Add Monitoring Wizard

09

Windows Service

10

Enter a Name, Save to the MP you just created

11

Enter the Service Name. Use % for a wildcard representing multiple characters. As I don’t have any unique services in my environment I am using m% to demonstrate how this can work. For the rest of these instructions, wherever you see m% keep in mind that you need to modify this value to match your unique service name wildcard value. Be careful: too broad a wildcard could create a lot of noise and load very quickly in your environment.
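The % wildcard here follows SQL/WQL LIKE semantics (% matches any run of characters, _ matches a single character). A quick, illustrative way to preview what a pattern would match before committing to it:

```python
import fnmatch

# Sketch: preview which service names a WQL LIKE pattern would match.
# WQL's % maps to a glob *, and _ maps to ?; the comparison is done
# case-insensitively, as service name matching in WMI is.
def like_matches(pattern, names):
    glob = pattern.lower().replace("%", "*").replace("_", "?")
    return [n for n in names if fnmatch.fnmatchcase(n.lower(), glob)]

services = ["MSSQLSERVER", "MpsSvc", "W32Time", "Netlogon"]
print(like_matches("m%", services))  # ['MSSQLSERVER', 'MpsSvc']
```

Running your candidate pattern against a dump of actual service names (e.g. from Get-Service) is a cheap way to gauge how many objects the discovery will pick up.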

Pick a Target Group. In this case I am using All Windows Computers. Generally you would want to target this as precisely as possible. Leave Monitor only automatic service checked

12

Click Next

13

Click Create

14

Select Administration

01

Select Management Packs

02

Select the Custom Management Pack you just created

15

Select Export Management Pack

16

Select a location to save the unsealed xml file

17

Click OK

18

Open the file in your XML editor of choice (Notepad will do, but Visual Studio or Notepad++ will make it a bit easier to read)

19

Search the file for your wildcard in my case this is M%

20

We’ll be making a few replacements in the code.

21

You will be modifying:

<DataSource ID="DS" TypeID="MicrosoftWindowsLibrary7585010!Microsoft.Windows.Win32ServiceInformationProviderWithClassSnapshotDataMapper">
  <ComputerName>$Target/Property[Type="MicrosoftWindowsLibrary7585010!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
  <ServiceName>m%</ServiceName>

 

To: (remember to also swap the m% with the appropriate value)

 

<DataSource ID="DS" TypeID="MicrosoftWindowsLibrary7585010!Microsoft.Windows.WmiProviderWithClassSnapshotDataMapper">
  <NameSpace>root\cimv2</NameSpace>
  <Query>select * from win32_service where name like 'm%'</Query>

_________

  • In Brian Wren’s instructions he used TypeID="Windows!Microsoft.Windows.Win32…". The alias in my custom console-generated MP is MicrosoftWindowsLibrary7585010! If you run into any errors, keep in mind that whatever alias is present in the manifest references must be consistent. I haven’t tested to confirm, but based on the output it looks like the console-generated MP alias is based on MP name + version number. If you have a different version of the MP and you follow my steps exactly you will likely hit an error, as the alias I provide for Microsoft.Windows.Library is going to be off by a few numbers from yours. If this is the case, just modify the alias in my example to match what you have in the rest of the .xml file.

 

And

 

<Name>$MPElement[Name="MicrosoftSystemCenterNTServiceLibrary!Microsoft.SystemCenter.NTService"]/ServiceProcessName$</Name>
<Value>$Data/Property[@Name='BinaryPathName']$</Value>

 

To:

 

<Name>$MPElement[Name="MicrosoftSystemCenterNTServiceLibrary!Microsoft.SystemCenter.NTService"]/ServiceProcessName$</Name>
<Value>$Data/Property[@Name='PathName']$</Value>

Save the .xml file

Go back to the SCOM console – Administration

01

Import Management Packs

22

Add from disk

23

Select the newly modified .xml file

24

Install

25

Close

26

To check and confirm that the discovery associated with the wildcard monitor is working:

Select Monitoring

27

Discovered Inventory

28

Change Target Type

29

Select Custom Target

30

A few minutes after importing the updated pack you should see services discovered.

31

Now we need to create a wildcard recovery. If this was a single service recovery I would create a standard SCOM recovery and call net.exe and pass a start command with the service name. Since this is a wildcard service we have to do things a little differently as I don’t know of a way to pass wildcards to net.exe. (We could use PowerShell, but for this I want to try to be as light weight as possible from an overhead perspective even if that means sacrificing some more advanced error handling that we could easily add in with PowerShell.)

Go to Authoring:

05

Select Windows Service

32

Right Click your custom Service Monitor – View Management Pack Objects – Monitors

33

Expand Entity Health -Availability – Right Click the Basic Service Monitor Stored in your custom MP – Properties

34

Select the Diagnostics and Recovery Tab

35

Under Configure recovery tasks select Add – Recovery for critical health state

36

Select Run Command

37

Name your Recovery – Check the Boxes for run recovery automatically and recalculate monitor state after recovery finishes

38

Enter Full path to file

c:\windows\system32\wbem\WMIC.exe

Parameters: (originally I used slightly different parameters, but found that while they worked at the command line they failed when run as a recovery. This method works consistently.)

/interactive:off service where "name like 'm%'" call startservice

fix

Click Create

You should now be all set to test out and validate your new monitor.

EndNote/Cautionary Tangent:

Just keep in mind that with a wildcard discovery, if targeted incorrectly (too broad a wildcard, too broad a target group, or both), you have the recipe for a single monitor that can cause a lot of churn, perf issues, and noise in your environment. So be cautious and test very carefully. Make sure you have a good sense of the number of objects this monitor will pick up, not just in your test environment, but once you move it into production. To be clear, I would never recommend using a wildcard as broad as m% in production; it picks up way too many services that you likely don’t care about.

Also please note that the recovery is at least as general as the monitor, if not more so. It is also not checking to see whether the services that apply to it are already started. In the case of my example, m% picks up a bunch of services. If a single service matching that criteria goes down, the recovery will attempt to recover/start every single service that matches m%. So if you are building your wildcard service monitor to pick up multiple services on a single system that follow a common pattern, a failure of one will result in an attempt to recover all.

In theory this shouldn’t be a problem. The method I am using is extremely lightweight, and if a service is already started it will just return an "already started" exit code and remain started. With that said, this is only a sample, and it’s still worth testing in your environment to confirm the behavior and make sure you understand exactly how the recovery is working before you consider implementing it.
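If the start-everything behavior bothers you, the selection logic of a more surgical recovery can be sketched as follows. This is not the WMIC one-liner used above; it is an illustrative stand-in, with the service table simulated since querying WMI is environment-specific.

```python
# Sketch: the WMIC recovery above calls StartService on every matching
# service. A variant that only targets stopped services might select
# like this; the service table is simulated (a real recovery would
# query win32_service via WMI).
def services_to_start(services, prefix):
    """Return names of stopped services whose name starts with prefix."""
    return [name for name, state in services.items()
            if name.lower().startswith(prefix.lower()) and state == "Stopped"]

services = {"MyApp-SRV01": "Stopped", "MyApp-SRV02": "Running", "W32Time": "Running"}
print(services_to_start(services, "MyApp"))  # ['MyApp-SRV01']
```

The tradeoff is the extra query per recovery run; for most environments the blanket start is harmless, which is why I kept the lightweight WMIC approach.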

An example of running the recovery for an instance of m% being stopped:

40.1

 

Tagged , , , , ,