Tag Archives: SCOM

How do I: Create An Advanced SQL Database Backup Monitor In Visual Studio?

My favorite part of my job is getting to work with customers and understanding exactly how they use a product and how they need it to work. Some of the time this just means listening closely, answering questions, and relaying information back to the product group. But often there are those tiny changes and use cases that fall outside the realm of things the product group is likely to address.

A change that might be hugely valuable for Customer A won’t happen if it breaks backwards compatibility for Customers B-Z.

One of my customers has a SCOM environment for SQL that monitors over 20,000 databases. Due to the size of that environment, there are often cases where the out-of-box monitoring provided by the SQL Product Group’s management packs isn’t able to completely meet their needs. While the pack provides fantastic monitoring for SQL, there are times when they need a more granular view of the health of their environment.

An instance of this from the past year was monitoring SQL Database backups. The SQL Product Group’s pack gives you the ability to alert if a database hasn’t been backed up in a certain number of days. By default it is set to 7 days, but you can override that to any integer value of days.

For my customer this wasn’t really good enough. They wanted to be able to drill down and alert on hours since the last backup. They also wanted multiple severities, so that if it had been 20 hours since a backup an e-mail could go out, but at 30 hours we would generate a page. The 20- and 30-hour thresholds would be customizable at the individual database level, and they also wanted some added logic so that backups would only be checked for databases with a status of “ONLINE”. We have other monitors that look at DB status in general, so if a database was OFFLINE they either knew about it from the other monitors and were fixing it, or it was intentional, in which case they didn’t want a backup alert.
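The requirement above boils down to a small decision rule. Here is a minimal Python sketch of that rule, not the management pack's actual implementation: the function name, the treatment of a never-backed-up database as Critical, and the use of "greater than or equal" at each threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def backup_alert_state(status, last_backup, now, warn_hours=20, crit_hours=30):
    """Return 'Healthy', 'Warning', or 'Critical' for a single database.

    warn_hours and crit_hours stand in for the per-database overridable
    thresholds described in the post (20 and 30 hours in the example).
    """
    # Only ONLINE databases are evaluated; other monitors cover DB status.
    if status.upper() != "ONLINE":
        return "Healthy"
    # Assumption: a database with no recorded backup is treated as Critical.
    if last_backup is None:
        return "Critical"
    hours_since_backup = (now - last_backup).total_seconds() / 3600
    if hours_since_backup >= crit_hours:
        return "Critical"   # would generate a page
    if hours_since_backup >= warn_hours:
        return "Warning"    # would send an e-mail
    return "Healthy"
```

For example, an ONLINE database last backed up 25 hours ago would come back as Warning, while the same database in an OFFLINE state would be skipped regardless of backup age.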

The basic logic behind the SQL PG’s MP is a simple T-SQL query wrapped in a fairly complex VBScript. The unwrapped T-SQL is below:

The T-SQL modifications we need to make are relatively simple: swap DAY for HOUR, and add a line to only return database backup info for databases with a status of ONLINE.

Getting this into a working Management Pack is a little more complex. It requires isolating and cloning the Product Group’s Database Backup Monitor in Visual Studio, and then making a few changes to the XML for our custom iteration. To prevent screenshot overload I did a quick step-by-step video walkthrough of the process. For this video I opted to leave out the three-state severity request, and will show how to add that functionality in a follow-up video.

If you have any questions or need any help, just leave a comment.


How do I: Change the default behavior of the DB Mirror Status Monitor

MP Authoring Series (For an explanation of this series read this post first.)

Real World Issue: Customer is seeing a lot of Critical alerts for Mirrored Databases in a Disconnected state, but only Warning alerts for Mirrored Databases in Suspended state. In this customer’s environment brief disconnects are common and not necessarily indicative of a Critical issue, whereas a mirrored database in a Suspended state is always Critical for this customer. The customer wants to swap the default states such that Disconnected will now be Warning and Suspended will be Critical.
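The requested change amounts to swapping two entries in a status-to-severity mapping. The Python sketch below only illustrates that idea; it is not how the monitor is defined in the management pack. It covers just the two statuses named in this post, and treating any other status as Healthy is an assumption.

```python
# Default severities for the two statuses discussed in the post.
DEFAULT_STATES = {"DISCONNECTED": "Critical", "SUSPENDED": "Warning"}

def swap_states(mapping, first, second):
    """Return a copy of the mapping with the severities of two statuses swapped."""
    swapped = dict(mapping)
    swapped[first], swapped[second] = mapping[second], mapping[first]
    return swapped

# The customer's desired behavior: Disconnected -> Warning, Suspended -> Critical.
CUSTOM_STATES = swap_states(DEFAULT_STATES, "DISCONNECTED", "SUSPENDED")

def health_state(mirror_status, mapping):
    """Look up the severity for a mirror status (assumption: unlisted statuses are Healthy)."""
    return mapping.get(mirror_status.upper(), "Healthy")
```

With `CUSTOM_STATES`, a Suspended mirror evaluates to Critical and a Disconnected one to Warning, which is the behavior the cloned monitor needs to reproduce.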

There are three Mirrored Database Mirror Status monitors. When we dig into the properties, we see the three states as well as the corresponding statuses that map to those states.

When we check Overrides we find there is nothing we can override to meet the customer’s needs.

So to accommodate this request we need to do a little custom authoring in Visual Studio + VSAE.

The process isn’t too complex; however, it is much easier to absorb via video than the many-page article that would result if I tried to document the steps by hand:

Once you have your final successful build you will find your files in the bin\Debug folder of your project.

Once you import your custom MP into SCOM you will have a cloned monitor with your modified behavior.

How do I: Create a Wildcard SCOM Service Monitor and Recovery

I recently had a question from a customer on how to create wildcard Service Monitors + Recoveries. The Service Monitor from the Monitoring template and a simple Unit Monitor for services both require an explicit service name, no wildcards allowed. For most services this is fine, but there are some applications which do fun and interesting things like concatenate computername + service to create a unique service name. This creates a bit of a problem for monitoring. You can create individual monitors, but if you have hundreds of services, each with a unique service name that follows a particular pattern, creating hundreds of corresponding monitors could get a little time-consuming.

Brian Wren has a great article from back in the SCOM 2007 R2 days that answers part of this question, but when I went through the steps I found it needed some slight tweaking and updating for SCOM 2012 R2. Once that was complete I also needed to come up with a simple, low-overhead wildcard service recovery for when one of the services stops and needs to be brought back online.

Below are my steps:

Launch the SCOM console

Administration

Management Packs

Tasks – Create a Management Pack

Enter Name + Description – Next – Create

Select Authoring

Right-click Windows Service

Add Monitoring Wizard

Windows Service

Enter a name, and save to the MP you just created

Enter Service Name. Use % as a wildcard representing multiple characters. As I don’t have any unique services in my environment I am using m% to demonstrate how this can work. For the rest of these instructions, wherever you see m% keep in mind that you need to modify this value to match your own service name wildcard. Be careful: using too broad a wildcard could create a lot of noise and load very quickly in your environment.

Pick a Target Group. In this case I am using All Windows Computers. Generally you would want to target this as precisely as possible. Leave “Monitor only automatic service” checked.

Click Next

Click Create

Select Administration

Select Management Packs

Select the custom Management Pack you just created

Select Export Management Pack

Select a location to save the unsealed XML file

Click OK

Open the file in your XML editor of choice (Notepad will do, but Visual Studio or Notepad++ will make it a bit easier to read)

Search the file for your wildcard; in my case this is m%

We’ll be making a few replacements in the code.

You will be modifying:

<DataSource ID="DS" TypeID="MicrosoftWindowsLibrary7585010!Microsoft.Windows.Win32ServiceInformationProviderWithClassSnapshotDataMapper">

<ComputerName>$Target/Property[Type="MicrosoftWindowsLibrary7585010!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>

<ServiceName>m%</ServiceName>

To: (remember to also swap the m% with the appropriate value)

<DataSource ID="DS" TypeID="MicrosoftWindowsLibrary7585010!Microsoft.Windows.WmiProviderWithClassSnapshotDataMapper">

<NameSpace>root\cimv2</NameSpace>

<Query>select * from win32_service where name like 'm%'</Query>

  • In Brian Wren’s instructions he used TypeID="Windows!Microsoft.Windows.Win32…". The alias in my console-generated MP is MicrosoftWindowsLibrary7585010! instead. If you run into any errors, keep in mind that whatever alias you use must match the one declared in the manifest references. I haven’t tested to confirm, but based on the output it looks like the console-generated alias is based on the MP name plus version number. If you have a different version of the MP and follow my steps exactly, you will likely hit an error, as the alias I show for Microsoft.Windows.Library will be off by a few digits from yours. If this is the case, just modify the alias in my example to match what you have in the rest of the .xml file.

And

<Name>$MPElement[Name="MicrosoftSystemCenterNTServiceLibrary!Microsoft.SystemCenter.NTService"]/ServiceProcessName$</Name>

<Value>$Data/Property[@Name='BinaryPathName']$</Value>

To:

<Name>$MPElement[Name="MicrosoftSystemCenterNTServiceLibrary!Microsoft.SystemCenter.NTService"]/ServiceProcessName$</Name>

<Value>$Data/Property[@Name='PathName']$</Value>

Save the .xml file

Go back to the SCOM console – Administration

Import Management Packs

Add from disk

Select the newly modified .xml file

Install

Close

To confirm that the discovery associated with the wildcard monitor is working:

Select Monitoring

Discovered Inventory

Change Target Type

Select your custom target

A few minutes after importing the updated pack you should see services discovered.

Now we need to create a wildcard recovery. If this were a single-service recovery I would create a standard SCOM recovery that calls net.exe and passes a start command with the service name. Since this is a wildcard service we have to do things a little differently, as I don’t know of a way to pass wildcards to net.exe. (We could use PowerShell, but for this I want to be as lightweight as possible from an overhead perspective, even if that means sacrificing some of the more advanced error handling that we could easily add with PowerShell.)

Go to Authoring:

Select Windows Service

Right-click your custom Service Monitor – View Management Pack Objects – Monitors

Expand Entity Health – Availability – right-click the Basic Service Monitor stored in your custom MP – Properties

Select the Diagnostics and Recovery tab

Under Configure recovery tasks select Add – Recovery for critical health state

Select Run Command

Name your recovery, and check the boxes to run the recovery automatically and to recalculate monitor state after the recovery finishes

Enter the full path to the file:

c:\windows\system32\wbem\WMIC.exe

Parameters (originally I used slightly different parameters, but found that while they worked from the command line they failed when run as a recovery; this method works consistently):

/interactive:off service where "name like 'm%'" call startservice

Click Create

You should now be all set to test out and validate your new monitor.
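If the import fails after hand-editing the XML, a mismatched reference alias (the issue described in the note on aliases above) is a likely culprit. The Python sketch below is a rough heuristic for spotting that, and it makes some assumptions: that aliases are declared as `<Reference Alias="...">` elements in the manifest, that they are used as an `Alias!` prefix, and that the function name and example file name are hypothetical. Script bodies embedded in a pack could produce false positives.

```python
import re

def undeclared_aliases(mp_xml: str) -> set[str]:
    """Return aliases used as an 'Alias!' prefix but never declared in the manifest."""
    # Aliases declared in the manifest, e.g. <Reference Alias="Windows">.
    declared = set(re.findall(r'<Reference\s+Alias="([^"]+)"', mp_xml))
    # Aliases used in references such as TypeID="Windows!Microsoft.Windows...".
    used = set(re.findall(r'\b([A-Za-z_][A-Za-z0-9_]*)!(?=[A-Za-z])', mp_xml))
    return used - declared

# Example usage (hypothetical file name):
# with open("Custom.Wildcard.Service.xml", encoding="utf-8") as f:
#     print(undeclared_aliases(f.read()))
```

An empty set means every alias prefix found in the file matches a declared reference; anything else is worth checking against the manifest before importing again.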

EndNote/Cautionary Tangent:

Just keep in mind that a wildcard discovery, if targeted incorrectly (too broad a wildcard, too broad a target group, or both), is a recipe for a single monitor that can cause a lot of churn, perf issues, and noise in your environment. So be cautious and test very carefully. Make sure you have a good sense of the number of objects this monitor will pick up, not just in your test environment but once you move it into production. To be clear, I would never recommend using a wildcard as broad as m% in production. It picks up far too many services that you likely don’t care about.

Also please note that the recovery is just as general as the monitor, if not more so. It does not check whether the services it applies to are already started. In the case of my example, m% picks up a bunch of services. If a single service matching that criteria goes down, the recovery will attempt to start every single service that matches m%. So if you are building your wildcard service monitor to pick up multiple services on a single system that follow a common pattern, a failure of one will result in an attempt to recover all.

In theory this shouldn’t be a problem. The method I am using is extremely lightweight, and if a service is already started it will just return an exit code of “I’m already started” and remain started. With that said, this is only a sample, and it’s still worth testing in your environment to confirm the behavior and make sure you understand exactly how the recovery works before you consider implementing it.
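One way to get a feel for how many services a wildcard would pick up, and therefore how many the recovery would try to start, is to run the pattern against a list of service names taken from a representative server. The Python sketch below approximates LIKE matching under a few assumptions: only the % and _ wildcards are handled (bracket expressions are not), matching is case-insensitive, and the function names and sample service names are hypothetical.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE pattern using % and _ into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

def matching_services(pattern, service_names):
    """Return the service names that the wildcard pattern would match in full."""
    rx = like_to_regex(pattern)
    return [name for name in service_names if rx.fullmatch(name)]

# Sample names for illustration only.
services = ["MSSQLSERVER", "msiserver", "Spooler", "MpsSvc", "W32Time"]
print(matching_services("m%", services))
# prints ['MSSQLSERVER', 'msiserver', 'MpsSvc']
```

Even in this tiny sample m% matches three of five services, which is a good hint that the wildcard needs tightening before it goes anywhere near production.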

An example of running the recovery for an instance of m% being stopped:


Reading List: SCOMzilla/scom_atlas

Continuing this week’s series on blogs that you may not know about but should be reading, I present Matt T’s blog:

http://blogs.technet.com/b/scom_atlas/

Best known for his awesome scheduled maintenance mode SCORCH runbook solution, Matt T is an excellent source of SCOM/SCORCH-related knowledge.

How do I: Alert on SQL Errors that aren’t logged to the windows event log

This is one of those common questions where a SQL DBA will probably know the answer, but the information is less well known within the SCOM community.

First, if you want to get a sense of all the errors that SQL can write to its own internal logs, query the sys.messages catalog view on your server, filtering on language_id (the language ID will of course vary).


For my SQL 2014 Server I am getting back 11548 rows of messages:


| Column name | Data type | Description |
| --- | --- | --- |
| message_id | int | ID of the message. Is unique across server. Message IDs less than 50000 are system messages. |
| language_id | smallint | Language ID for which the text in text is used, as defined in syslanguages. This is unique for a specified message_id. |
| severity | tinyint | Severity level of the message, between 1 and 25. This is the same for all message languages within a message_id. |
| is_event_logged | bit | 1 = Message is event-logged when an error is raised. This is the same for all message languages within a message_id. |
| text | nvarchar(2048) | Text of the message used when the corresponding language_id is active. |

For the most part the SQL MPs will give you access to any of the events you might care about in both the SQL and Windows Application event logs. In those cases where this doesn’t happen, there is a built-in stored procedure in SQL that lets you write SQL errors to the Windows Application log so that other systems like SCOM can pick them up:

sp_altermessage

If you dive into the code for the SQL replication MPs you will find that this is how replication monitoring is implemented in SCOM: a series of sp_altermessage commands turns on logging to the app log for different replication errors, followed by corresponding alert-generating rules that target those event IDs.

https://msdn.microsoft.com/en-us/library/ms175094.aspx

The effect of sp_altermessage with the WITH_LOG option is similar to that of the RAISERROR WITH LOG parameter, except that sp_altermessage changes the logging behavior of an existing message. If a message has been altered to be WITH_LOG, it is always written to the Windows application log, regardless of how a user invokes the error. Even if RAISERROR is executed without the WITH_LOG option, the error is written to the Windows application log.

If a message is written to the Windows application log, it is also written to the Database Engine error log file.


Talk: Tips & Tricks for Creating Custom Management Packs

I was perusing some of the talks from last year’s TechEd and came across this excellent talk by Mickey Gousset on creating custom management packs:

For more talks from TechEd 2012 click here.


On The Importance Of Building Test Environments

One of the things I didn’t quite grasp when I first started using SCOM a few years back was the importance of test environments. SCOM was this bright and shiny new tool that was going to help proactively monitor our servers and increase uptime, and as long as I only installed Microsoft-approved Management Packs everything would be all right. This was admittedly extremely naive, but it was a good starting point. I was enthusiastic, and fortunate enough to learn that this was a terrible idea long before making a critical mistake.

SCOM is an incredibly powerful tool, but it has to be used and implemented intelligently:

-Installation guides must be read.

-MPs should be evaluated in Test or Dev environments first. (If you don’t have a test environment, build one.)

-Blogs should be scoured for relevant info.

-Management Packs should be installed in production because they provide value, not just because you happen to have the associated product installed.

Anytime an engineer or admin asks to have a shiny new management pack installed in production and doesn’t want to test it first I remember this slide from a talk I stumbled across from Microsoft’s Management Pack University entitled “Getting Manageability Right” given by Nistha Soni, a program manager on the Ops Manager team at Microsoft:

[Slide: “Getting Manageability Right”, Nistha Soni]

The talk was aimed at the different Microsoft product teams, to help them think about how to build better management packs that are useful to their customers. If an MP reduces total cost of ownership this is a good thing; if it increases TCO then we have a problem. This slide was referencing an iteration of a Microsoft MP (name omitted to protect the guilty) which provided feedback that, while potentially useful for a developer at Microsoft, was also inundating their customers/operators with alerts.

Building a useful MP is a delicate balancing act, and it’s important to remember that even the ones made by Microsoft are essentially a work in progress. Each successive iteration tends to get better, but if you just import into production without testing and research you are asking for trouble.

The talk itself is an interesting look at how Microsoft thinks about monitoring and building management packs and is still available here.
