Category Archives: Best Practices

How do I: Create an Event View that excludes a particular Event ID

I had a large enterprise customer recently who was monitoring ADFS with the default management pack. They liked being able to glance at the event view, which gave them a single place where they could look at the ADFS events occurring across their environment. They were using this event data as part of their correlation and tuning process to determine if there were additional actionable events that were being missed for their unique infrastructure. The eventual goal was to stop collecting the events altogether and have only alert-generating rules/monitors in place for the patterns of events that they cared about.

They quickly found that at least for their environment some of the events being collected were essentially noise, and they asked how to adjust the view so it would exclude one particular event.

This is one of those “sounds really easy, and of course the product should do this out of the box” questions that SCOM has never really had a great answer for.

If we take a look at the view it is populated by the following criteria:

And if we dig into the corresponding rule that collects the events we find a wildcard regex-style collection rule targeted at the ADFS log:

Since the collection rule is part of a sealed MP the best we could do at the rule level is to shut off this collection rule, and create a new collection rule with a modified wildcard expression such that it would collect everything the old rule did with the exception of the event ID the customer doesn’t like.

The problem with this solution is it isn’t particularly efficient/self-service friendly. If next week the customer realizes there is an additional event they want excluded the AD team has to contact the SCOM team and request further modifications.

In an ideal world the exclusion would be possible at the view level, but if you ever dig into modifying the classic OpsMgr views you will find that you can use wildcards to perform exclusions for some fields, like Event Source:

The same is not true for event IDs, where wildcard exclusions are not allowed:

I briefly toyed with the idea of making modifications to the MP at the XML level to allow exclusions as I have occasionally done in the past to hack a subscription into meeting a customer need, but in this case such a solution doesn’t really fit. The customer needed something that was easy for them to change as they gradually winnow down the list of events they see to only the ones they care about.

They needed something that was extremely easy to edit.

Enter PowerShell and the SCOM SDK.

The first solution I put together for them to test was the following:

PowerShell Grid Widget

The widget used a Where-Object {$_.Number -ne 31552 -and $_.PublisherName -eq "Health Service Modules"} filter. I used a SCOM publisher name since I didn’t have any ADFS events in my test environment and I wanted something that would let me confirm the exclusion was working as expected:

Everything looked good: the event I wanted excluded was dealt with properly. (The Description dataObject is commented out in the code for this screenshot to make it easier to view. With Description uncommented, each event takes up more lines of screen real estate. I recommend creating two views, one with Description commented out and one where it is uncommented, so customers can easily toggle between views.)

And if we remove the $_.Number -ne 31552 condition, I get the results below, with the event present:

In theory this should be all we needed, but when my customer tested out the script nothing happened. After a little bit of head scratching it became apparent what the problem was.

We were calling Get-SCOMEvent | Where-Object

which means we were telling the OpsMgr SDK to please go retrieve every single event in the OpsDB, and then once you are done with that we are going to pipe the results to a Where-Object and tell you what we really need.

In my relatively small test environment this wasn’t that big of an ask and the results returned quickly.

In my customer’s environment, with thousands of servers and friendly event-generating MPs like the Exchange 2010 MP, getting every event in the OpsDB was basically a great way to enter an endless loop of dashboard timeouts with nothing ever being displayed.

So we needed to filter things down a bit up front, before piping to the Where-Object.

If you search the blogs you will find that Stefan Stranger has a nice post describing how to deal with this issue when calling the Get-SCOMAlert cmdlet with a Where-Object. Basically you use Get-SCOMAlert -criteria and then pipe to a Where-Object if still needed.

Unfortunately, Get-SCOMEvent doesn’t have a -criteria parameter because that would make things too easy and intuitive.

It does, however, have a -rule parameter which looked promising:

First I tried passing it a rule Name, followed by a second try with a rule GUID for an event collection rule I was interested in. In both cases I got a nice red error message:

While a little cryptic, it is saying that I am passing a parameter of type string, and it wants a special SCOM-specific rule type.

To give it what it wants, we need to first retrieve the rule object using the Get-SCOMRule cmdlet and then pass it to Get-SCOMEvent as a variable:

$rule = Get-SCOMRule -DisplayName "Operations Manager Data Access Service Event Collector Rule"

$rule = Get-SCOMRule -DisplayName "Operations Manager Data Access Service Event Collector Rule"

Get-SCOMEvent -Rule $rule

So our final script would look something like this. (I have added some additional filtering in case you just want events from the past hour. *Keep in mind this date/time filtering doesn’t increase the efficiency of the script, since it is applied in the Where-Object after the events have already been retrieved; the only thing making this script more efficient is that we are first only pulling back events collected by a specific rule.*)

$rule = Get-SCOMRule -DisplayName "Operations Manager Data Access Service Event Collector Rule"

$DateNow = Get-Date

#Modify the .AddMinutes below to determine how far back to pull events

$DateAgo = $DateNow.AddMinutes(-60)

#$_.Number -ne (not equals) is used to indicate the event number that you want to exclude from the view

#Results are sorted on TimeGenerated so the newest events appear first

$eventView = Get-SCOMEvent -Rule $rule | Where-Object {$_.Number -ne 17 -and $_.TimeGenerated -ge $DateAgo -and $_.TimeGenerated -le $DateNow} | Select-Object Id, MonitoringObjectDisplayName, Number, TimeGenerated, PublisherName, Description | Sort-Object TimeGenerated -Descending

foreach ($object in $eventView){

     $dataObject = $ScriptContext.CreateInstance("xsd://OpsConfig!sample/dashboard")

     $dataObject["Id"] = [String]($object.Id)

     $dataObject["Event Number"] = [Int]($object.Number)

     $dataObject["Source"] = [String]($object.MonitoringObjectDisplayName)

     $dataObject["Time Created"] = [String]($object.TimeGenerated)

     $dataObject["Event Source"] = [String]($object.PublisherName)

     $dataObject["Description"] = [String]($object.Description)

     $ScriptContext.ReturnCollection.Add($dataObject)

}
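Since the whole point was to give the customer something easy to edit as more noisy events turn up, one option is to keep the excluded event numbers in a single array. The following is only a minimal sketch under stated assumptions, not part of the original widget: `$ExcludedEventIds` and `Select-WantedEvent` are hypothetical names, and the sample events are stand-in objects so the filter can be tried without a management group. `-notcontains` is used rather than `-notin` so that it should also run on PowerShell 2.0.

```powershell
# Hypothetical list of event numbers to hide; edit this one line to exclude more.
$ExcludedEventIds = @(17, 31552)

# Hypothetical helper: passes through only events whose Number is not excluded
# and whose TimeGenerated falls within the last $Minutes minutes.
function Select-WantedEvent {
    param(
        [Parameter(ValueFromPipeline = $true)] $InputEvent,
        [int[]] $Exclude = @(),
        [int] $Minutes = 60
    )
    begin { $cutoff = (Get-Date).AddMinutes(-$Minutes) }
    process {
        if (($Exclude -notcontains $InputEvent.Number) -and ($InputEvent.TimeGenerated -ge $cutoff)) {
            $InputEvent
        }
    }
}

# Stand-in events (not real SCOM objects) used only to exercise the filter.
$now = Get-Date
$sampleEvents = @(
    (New-Object PSObject -Property @{ Number = 17;    TimeGenerated = $now.AddMinutes(-5) }),
    (New-Object PSObject -Property @{ Number = 364;   TimeGenerated = $now.AddMinutes(-10) }),
    (New-Object PSObject -Property @{ Number = 364;   TimeGenerated = $now.AddMinutes(-90) }),
    (New-Object PSObject -Property @{ Number = 31552; TimeGenerated = $now.AddMinutes(-1) })
)

# Only the recent event 364 should survive the filter.
$kept = @($sampleEvents | Select-WantedEvent -Exclude $ExcludedEventIds -Minutes 60)
$kept | ForEach-Object { "Kept event {0}" -f $_.Number }
```

In the widget itself, the idea would be to pipe `Get-SCOMEvent -Rule $rule` into the helper in place of the inline Where-Object. As with the original script, the filtering still happens after the events for that rule have been retrieved.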

And then the ADFS code would look like this, though event 17 was not the event they wanted to exclude:

$rule = Get-SCOMRule -DisplayName "Federation server events collection"

$DateNow = Get-Date

#Modify the .AddMinutes below to determine how far back to pull events

$DateAgo = $DateNow.AddMinutes(-60)

#$_.Number -ne (not equals) is used to indicate the event number that you want to exclude from the view

#Results are sorted on TimeGenerated so the newest events appear first

$eventView = Get-SCOMEvent -Rule $rule | Where-Object {$_.Number -ne 17 -and $_.TimeGenerated -ge $DateAgo -and $_.TimeGenerated -le $DateNow} | Select-Object Id, MonitoringObjectDisplayName, Number, TimeGenerated, PublisherName, Description | Sort-Object TimeGenerated -Descending

foreach ($object in $eventView){

     $dataObject = $ScriptContext.CreateInstance("xsd://OpsConfig!sample/dashboard")

     $dataObject["Id"] = [String]($object.Id)

     $dataObject["Event Number"] = [Int]($object.Number)

     $dataObject["Source"] = [String]($object.MonitoringObjectDisplayName)

     $dataObject["Time Created"] = [String]($object.TimeGenerated)

     $dataObject["Event Source"] = [String]($object.PublisherName)

     $dataObject["Description"] = [String]($object.Description)

     $ScriptContext.ReturnCollection.Add($dataObject)

}

Hopefully this helps save a little bit of time for anyone else who comes across a question like this one.

Best Practices: How to load test your website? Part I

Earlier this week I received a question about using SCOM to do large-scale load testing on a website. While SCOM can be a great resource, with synthetic transactions to simulate a finite number of user transactions, and APM can give you code-level instrumentation of how your .NET app is performing, it isn’t really designed as a large-scale load test rig. You could certainly set up a large number of synthetic transactions executed from anywhere you have a SCOM agent, but when you start talking about the need to simulate 10,000+ transactions against a single site, you are entering territory where SCOM isn’t the right tool for the job.

If you want to do some basic load testing, you can do this natively inside Visual Studio Online:

https://www.visualstudio.com/en-us/products/what-is-visual-studio-online-vs.aspx

First enter the url of the site you want to load test & give your test a name.

Next pick which Microsoft Azure Datacenter you want your test to be executed from:

Select the number of test users. Simple tests are capped at a max of 200 users (You get 20,000 free user minutes per month)
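To get a rough feel for how far the free allowance goes, the arithmetic can be sketched in PowerShell. This assumes user minutes are consumed simply as users multiplied by minutes, which is my reading rather than something stated here (the service may apply minimums or rounding), and the 5-minute duration is a hypothetical value.

```powershell
$freeUserMinutes = 20000   # monthly allowance quoted above
$users           = 200     # simple-test cap quoted above
$runMinutes      = 5       # hypothetical test duration

# Assumed cost model: users x minutes per run.
$costPerRun   = $users * $runMinutes
$runsPerMonth = [math]::Floor($freeUserMinutes / $costPerRun)

"{0} user minutes per run, about {1} runs per month" -f $costPerRun, $runsPerMonth
```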

Select how long you want the test to run for:

Set the Think-time

Select a browser distribution

Then just click Test now

It will take a few minutes to acquire the necessary resources and configure agents

Results will appear as below:

For more advanced load testing check out the Part II post.

Best Practices: Agent Remediation Tool

This is a proof of concept script consisting of a mix of PowerShell with some .NET for a GUI that can serve as an automated playbook for agent remediation.

Typically I prefer to remediate agents via the SCOM console, but there are instances where an agent is locked down such that remote management is not possible, and the SCOM team may not have access to remote into a server and fix an agent. This script empowers non-SCOM sysadmins, DBAs, et cetera to perform basic troubleshooting on their agents without the fear of accidentally deleting the wrong thing.

-The script must be run in PowerShell as an administrator

-This script is compatible with PowerShell 2.0, 3.0 & 4.0

*There are some dependencies on the .NET Framework. It is designed for .NET 3.5, but in testing it does work with .NET 2.0, though it will throw some fun red errors for certain non critical display elements which will not be able to load.

This script is designed for SCOM 2012 R2 Agents

Functionality:

1. Restart SCOM Agent (This restarts the Microsoft Monitoring Agent)

2. Flush SCOM Agent Cache (This stops the SCOM agent, Clears the Health Service State Agent Caches, Starts the agent and rebuilds the Agent Cache)

3. Uninstall SCOM Agent (This queries WMI to determine appropriate GUID that is associated with the SCOM agent installation and then passes this GUID to an automated uninstall.)

4. Install SCOM Agent (This is a placeholder for either manual agent install instructions, or it can be adapted to call a function to kick off a command-line based agent install assuming agent media is on an accessible UNC file share)

*It appears I neglected to include the link to the script; it can be downloaded here: TechNet Gallery*

Best Practices: SCOM Health Check Script/Report OpsConfig ed v1.0

This weekend I came across a fantastic SCOM HealthCheck Script/Report written by Tim Culham of http://www.culham.net

I would strongly encourage you to visit the site and check out his original script as he did all the heavy lifting.

http://www.culham.net/powershell/scom-2012-scom-2012-r2-daily-check-powershell-script-html-report/

I decided to extend/tweak his script a bit by adding in a number of the more in depth SQL Queries that I frequently ask customers to run when troubleshooting performance issues with the OpsDB and DW. Many of the queries are modified versions of the KH Useful SQL Queries, though there are a few that might be new to all of you. This sacrifices some of the speed and elegance of Tim’s script, but the information that you get back is invaluable.

This script should be run as administrator from a SCOM Management Server by an account that has permissions to connect and read from the Ops & DW DBs. You can just run the script without inputting any parameters. It will open the report upon script completion. (My version can take anywhere from 30 seconds to 10 minutes to run depending on the size/performance of your environment)

*At times this script is running queries directly against the OpsDB -while this is a completely common practice for troubleshooting and diagnosing issues it is also technically not supported. The script is provided AS-IS without warranty of any kind*

My version of the script can be downloaded here:

https://gallery.technet.microsoft.com/SCOM-Health-Check-fd2272ec

What this version of the script will give you:(Some of these are just features which are carried over from the original, many are added)

01. Version/Service Pack/Edition of SQL for each SCOM DB Server
02. Disk Space Info for Ops DB, DW DB, and associated Temp DB’s
03. Database Backup Status for all DB’s except Temp.
04. Top 25 Largest Tables for Ops DB and DW DB
05. Number of Events Generated Per Day (Ops DB)
06. Top 10 Event Generating Computers (Ops DB)
07. Top 25 Events by Publisher (Ops DB)
08. Number of Perf Insertions Per Day (Ops DB)
09. Top 25 Perf Insertions by Object/Counter Name (Ops DB)
10. Top 25 Alerts by Alert Count
11. Alerts with a Repeat Count higher than 200
12. Stale State Change Data
13. Top 25 Monitors Changing State in the last 7 Days
14. Top 25 Monitors Changing State By Object
15. Ops DB Grooming History
16. Snapshot of DW Staging Tables
17. DW Grooming Retention
18. Management Server checks (Works well on prem, seems to have some issues with gateways due to remote calls-if you see some errors flash by have no fear though I wouldn’t necessarily trust the results coming back from a Gateway server in the report depending on firewall settings)
19. Daily KPI
20. MP’s Modified in the Last 24 hours
21. Overrides in Default MP Check
22. Uninitialized Agents
23. Agent Stats (Healthy, Warning, Critical, Uninitialized, Total)
24. Agent Pending Management Summary
25. Alert Summary
26. Servers in Maintenance Mode

Report Output: (Only grabbing a screenshot of the first few pages as you get the basic idea)

Talks: Tech Ed Europe 2014 Keynote

Also be sure to check out other talks as they are uploaded as well as the Live Stream:

http://channel9.msdn.com/Events/TechEd/Europe/2014

Troubleshooting: SCOM reports yield weird data/what the heck does 9.221E+07 mean?

Eventually when running a report in SCOM you are going to end up with a report like the one below.

At first glance everything looks okay. But then you start looking at the data that was returned and it can sometimes be a little confusing.

Usually the questions I get from customers range from “I think this report is broken” to “what the heck does 9.221E+07 mean?”

Fear not, reporting is not broken and 9.221E+07 is not nearly as confusing as it may seem. Basically, the values in the dataset have so many digits that, in order to display them in a meaningful way, the system presents them using a shorthand commonly known as scientific notation. All you need to understand is that +07 indicates the number of places the decimal point needs to be moved to the right to display the full number.

So 9.221E+07 = 92210000

And if we look at the top of the chart we will note that the particular performance counter that we are reporting on is being returned in Bytes so we are dealing with:

92210000 Bytes

For those of you who like me are not particularly mathematically inclined and prefer to leave conversions to someone else I recommend using the wonderful built-in functionality of PowerShell.

If you enter 9.221E+07 and hit enter it will automatically output the full value for you:

If the original unit–in this case Bytes–is not your unit of choice and you want to know what the value is in MB  just enter the value in scientific notation form and then divide by 1 MB:

9.221E+07  / 1MB

Same goes for GB

9.221E+07  / 1GB
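If you find yourself doing this often, the conversions above can be wrapped in a small helper. This is just a sketch: `Convert-ReportValue` is a hypothetical name, and the rounding to two and three decimal places is my own choice for readability.

```powershell
# Value copied from the report, in scientific notation (bytes).
$reportValue = 9.221E+07

# Hypothetical helper: returns the value in bytes, MB and GB.
function Convert-ReportValue {
    param([double] $Bytes)
    New-Object PSObject -Property @{
        Bytes = [long] $Bytes
        MB    = [math]::Round($Bytes / 1MB, 2)
        GB    = [math]::Round($Bytes / 1GB, 3)
    }
}

$result = Convert-ReportValue -Bytes $reportValue
$result | Format-List Bytes, MB, GB
```

For 9.221E+07 this works out to 92210000 bytes, roughly 87.94 MB or 0.086 GB. Note that PowerShell's 1MB and 1GB multipliers are binary (1,048,576 and 1,073,741,824 bytes).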

System Center Technical Preview Released: What’s New in SCOM

For those of you with MSDN accounts System Center Technical Preview is now out.

I haven’t had a huge amount of time to dig in to see what’s new, but here are a few things that have caught my eye so far with the Preview for SCOM:

Open PowerShell Task

What is interesting about this task is that it isn’t simply launching an administrative PowerShell prompt on your local system. It is contextual to whatever server you have selected in the windows computer view and thus takes care of the fun of invoking a remote session for you in a single click.

(I had originally thought this was exclusive to the MPs in the Technical Preview, but @StanZhelyazkov kindly pointed out that this was added in the latest version of the Windows OS MPs that came out on 8/27/2014: http://www.microsoft.com/en-us/download/details.aspx?id=9296)

I was hoping for some new Dashboards, but it looks like for now we have all the same Dashboards from 2012 R2 plus those that were added in UR2. The only change I have noticed so far is that some of the names of the dashboards have been modified.

There are some new Management Pack Wizards. The TFS wizard may have already existed, as I haven’t imported that MP before, but I believe the Unix/Linux Service Monitoring Wizard is new:

New Monitors: (Two new Unix/Linux Script Monitors)

New Task Option:(Run a UNIX/Linux Script)

Best Practices: Building a home System Center Test environment Part V

Part V (Installing Operations Manager 2012 R2) To start at the beginning check out: Part I

Install the SCOM pre-reqs:

Report Viewer

http://www.microsoft.com/en-us/download/details.aspx?id=35747

Launch PowerShell as an admin

Add-WindowsFeature NET-WCF-HTTP-Activation45,Web-Static-Content,Web-Default-Doc,Web-Dir-Browsing,Web-Http-Errors,Web-Http-Logging,Web-Request-Monitor,Web-Filtering,Web-Stat-Compression,Web-Mgmt-Console,Web-Metabase,Web-Asp-Net,Web-Windows-Auth -Restart

Download SCOM 2012 R2 Media and Mount the media

Launch Setup

Click Install

Select All features and click Install

Click Next

Since we are installing on a DC, setup will warn you that this is not recommended. If this were anything other than a stand-alone test environment you should never do this.

Click Next

Create the first Management server in a new management group (Choose a Management group name)

Agree to the license terms — Click Next

Enter localhost or FQDN for the servername

Enter localhost or FQDN in the servername field

Click Next

Click Next

Click Next

Best practice is to use a separate account for each of these, but for the sake of these instructions one account will suffice.

You will receive this prompt. Again, using a domain admin account is not recommended, but this is an isolated test environment, so for simplicity I am making an exception.

Click Next

Select On (recommended) and click Next

Click Install

Click Close

Best Practices: Building a home System Center Test environment Part IV

Part IV (Installing SQL)

At this point, depending on which System Center product you are installing, you may need to spin up more VMs. With SCOM you can create an all-in-one domain controller, SQL, and SCOM Management Server, though ultimately it is better to create a distributed environment with each of these items on a different VM. To save a bit on instruction writing I am going to create an all-in-one SCOM environment, but from this point it is easy to take what you have already learned to create a distributed environment. For excellent documentation on how to spin up test environments for each System Center product once you have Hyper-V and a domain controller in place, I recommend taking a look at Kevin Holman’s quickstart guides:

http://blogs.technet.com/b/kevinholman/archive/2013/10/18/opsmgr-2012-r2-quickstart-deployment-guide.aspx

http://blogs.technet.com/b/kevinholman/archive/2013/10/30/configmgr-2012-r2-quickstart-deployment-guide.aspx

http://blogs.technet.com/b/kevinholman/archive/2013/10/18/orchestrator-2012-r2-quickstart-deployment-guide.aspx

https://blogs.technet.com/b/kevinholman/archive/2013/10/18/service-manager-2012-r2-quickstart-deployment-guide.aspx

http://blogs.technet.com/b/kevinholman/archive/2013/10/18/app-controller-2012-r2-quickstart-deployment-guide.aspx

http://blogs.technet.com/b/kevinholman/archive/2013/11/07/dpm-2012-r2-quickstart-deployment-guide.aspx

https://blogs.technet.com/b/kevinholman/archive/2013/10/18/scvmm-2012-r2-quickstart-deployment-guide.aspx

Since screenshots can sometimes be a little bit more helpful I will walk you through a slightly more visual guide to installing SQL and then SCOM in Part V

Log into the Domain Controller with your sysctr account

Download SQL 2012 enterprise edition with SP1 x64

Right click and mount the media

Run setup

Select New SQL Server stand-alone

Click OK

Click Next

Click I Accept & Next

Click Next

Click Next

SQL Server All Features with Defaults– Next