Troubleshooting: SQL MP 6.5.1 with SCOM 2007 R2 (SQL 2012 MP Imports Fail)

So on 6/30 a new SQL MP was released. It has some great new features and fixes:

http://www.microsoft.com/en-us/download/details.aspx?id=10631

It also includes some awesome new instance-level dashboards for both SQL 2008 and 2012 when used with SCOM 2012+.

Unfortunately, the MP doesn’t play nice with SCOM 2007 R2 and its predecessor, the 6.4.1 MP. When you try to import it into a 2007 R2 environment that still has the previous version of the SQL MP installed, all of the 2012 MP components fail to import.

This error is not particularly helpful, but the OpsMgr event log helps narrow things down:

So the workaround is to remove the SQL 2012 6.4.1 MPs and then reimport 6.5.1:

This should fix the issue, but ultimately you should really upgrade to 2012 R2. SCOM 2007 R2 fell out of mainstream support on 7/8/2014, so any new MPs going forward will likely use the 2.0 schema and be completely incompatible.

Troubleshooting: The installed version of SQL Server is Not Supported (SCOM 2012)

A while back I rebuilt one of my test environments. Post rebuild, something very strange happened: I could not for the life of me get SCOM reporting to install. All the initial pre-req checks would pass and everything else would install just fine, but I would keep hitting this error.

If you mouse over the little red X you get the following:

If you consult the install log files in %userprofile%\AppData\Local\SCOM\LOGS you will find:

Searching for the error online returns a number of posts which, while well meaning, offer solutions that are ultimately not very helpful.

I then spun up a brand new all-in-one test environment just to narrow things down, and found that once again the error was present even though the installed version of SQL was a supported version.

After more troubleshooting than I would like to admit, this left me with one option: there was something wrong with the SQL media I was using. At first glance it looks just like any other SQL media I have downloaded from MSDN:

But then I looked at the entire name of the media file:

Somehow, in a moment of test-environment-building delirium, I had downloaded an x86 copy of SQL 2012 Enterprise. Apparently one of the little-known side effects of accidentally installing 32-bit SQL on a 64-bit operating system is that you will get an SRS Couldn’t Check Version Exception, while everything else installs and works just fine.
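
A quick sanity check on the media name before installing would have saved me the trouble. Here is a minimal Python sketch of that check. It assumes the file name carries an "x86" or "x64" tag, and the file names used are hypothetical examples, not the actual MSDN names.

```python
import re
from typing import Optional


def media_architecture(filename: str) -> Optional[str]:
    """Return 'x64', 'x86', or None if the name has no architecture tag.

    Assumption: the tag appears as its own token, separated from the rest
    of the name by characters such as '_', '-', '.', or spaces.
    """
    tokens = re.split(r"[^a-z0-9]+", filename.lower())
    if "x64" in tokens:
        return "x64"
    if "x86" in tokens:
        return "x86"
    return None


# Hypothetical file names, for illustration only.
for name in (
    "en_sql_server_2012_enterprise_edition_x86_dvd.iso",
    "en_sql_server_2012_enterprise_edition_x64_dvd.iso",
):
    print(name, "->", media_architecture(name))
```

If the function returns "x86" for media headed to a 64-bit server, that is the cue to go back and download the 64-bit copy.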

I have come across a few instances of other people reporting this problem on the forums, but never actually arriving at a solution. Hopefully this post will be of some use. Once 64-bit SQL was installed on 64-bit Windows Server 2012, everything installed fine, as it always had in the past.

 

Troubleshooting: Server 2012 Error Copying File to Folder

I have noticed that Server 2012 has a bug where copying large files across an RDP session fails:

Anything over a few gigabytes and I get an “Error Copying File or Folder.” Copying large files over RDP has always been a little shaky, and eventually Microsoft will have a fix, but in the meantime you can get around the problem by copying the files using UNC paths.

In Windows Explorer type \\servername\c$ and then copy your files from there. \\Ipaddress\c$ should work as well; you just have to make sure Windows Firewall allows file and printer sharing between the two systems.
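
If you want to script the copy, the only tricky part is turning a local path into its administrative-share equivalent. Here is a minimal Python sketch of that mapping. The helper name and the example path are hypothetical, and it assumes the default drive-letter shares (c$, d$ and so on) are enabled on the target.

```python
def to_admin_share(host: str, local_path: str) -> str:
    """Map a drive-letter path such as C:\\Temp\\big.iso to \\\\host\\c$\\Temp\\big.iso.

    Assumption: the remote machine exposes the default administrative
    share for that drive (for example c$).
    """
    drive, sep, rest = local_path.partition(":")
    if sep != ":" or len(drive) != 1 or not drive.isalpha():
        raise ValueError(f"expected a drive-letter path, got {local_path!r}")
    # Normalise separators and drop the leading slash after the drive.
    rest = rest.replace("/", "\\").lstrip("\\")
    unc_root = f"\\\\{host}\\{drive.lower()}$"
    return f"{unc_root}\\{rest}" if rest else unc_root


# Hypothetical host and path, for illustration only.
print(to_admin_share("servername", r"C:\Temp\big.iso"))
```

The resulting string can be pasted into Windows Explorer or handed to whatever copy tool you prefer. An IP address works in place of the host name.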

Troubleshooting: WSUS with Server 2012

A few weeks ago I had a request to set up a new WSUS server running on Server 2012. The setup was easy, but once I had turned it over to our client systems group they were trying to figure out how to have it run on port 80 for testing purposes. By default WSUS on Server 2012 uses port 8530 for Windows Updates. They quickly discovered that modifying the bindings in IIS won’t work in this case.

To modify WSUS to use port 80 the wsusutil tool is the preferred method.

The tool is located in c:\Program Files\Update Services\Tools

The command to change the WSUS website to use port 80 is: wsusutil usecustomwebsite false
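
If you would rather drive this from a script, here is a minimal Python sketch that assembles the command from the folder and arguments above. The wsusutil.exe file name and the helper name are assumptions on my part, and the sketch only builds the argument list; nothing runs until you hand it to something like subprocess.run on the WSUS server itself.

```python
from pathlib import PureWindowsPath

# Folder taken from the post; adjust if WSUS is installed elsewhere.
WSUS_TOOLS = PureWindowsPath(r"C:\Program Files\Update Services\Tools")


def wsus_port80_command(tools_dir: PureWindowsPath = WSUS_TOOLS) -> list:
    """Build the argument list for 'wsusutil usecustomwebsite false'.

    Assumption: the tool's file name is wsusutil.exe.
    """
    return [str(tools_dir / "wsusutil.exe"), "usecustomwebsite", "false"]


command = wsus_port80_command()
print(command)

# To actually run it on the WSUS server (not executed here):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids quoting problems with the space in "Program Files".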

Troubleshooting: SCOM Agent Healthy, but availability report for server shows monitoring unavailable

This was definitely an odd one. I noticed that one of our systems was showing a healthy SCOM agent, yet if you ran an availability report against the Windows computer object it would show monitoring as unavailable. After confirming that the data warehouse was not running behind, I found that this was actually happening with more than one of our servers.

Running an availability report would look as follows:

Brody Kilpatrick has a nice post on his blog explaining one of the possible causes and solutions, which involves running some unsupported scripts against the data warehouse. I highly recommend reading his post, and all credit for this solution must go to him. With that said, I found that the SQL queries he posted had issues that caused them to fail, at least in my environment. (Brody responded that he is updating the queries, so it is likely that by the time you read this they will be fixed.) There were also some slight discrepancies between the results of his queries and mine, so I opted to use his work as a template and modify things ever so slightly so that it would work in my environment: OpsMgr 2012 SP1 with the data warehouse on a dedicated Server 2008 R2 box running SQL 2008 R2.

First, on your data warehouse server, run the following query:
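
What follows is not the original query, only a rough sketch of the kind of check described here: list the rows in HealthServiceOutage whose EndDateTime is NULL. To keep it runnable anywhere, the Python below uses an in-memory SQLite database as a stand-in for the data warehouse, with a cut-down table containing only the columns named in this post and made-up sample rows. On the real server you would run the equivalent SELECT in Management Studio.

```python
import sqlite3

# Stand-in for the data warehouse: an in-memory SQLite database with a
# cut-down HealthServiceOutage table (only columns mentioned in the post).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE HealthServiceOutage (
        HealthServiceOutageRowId INTEGER PRIMARY KEY,
        StartDateTime TEXT NOT NULL,
        EndDateTime TEXT            -- NULL means the outage was never closed
    )
    """
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO HealthServiceOutage VALUES (?, ?, ?)",
    [
        (101, "2013-05-01 08:00:00", "2013-05-01 09:30:00"),  # closed outage
        (102, "2013-06-12 14:15:00", None),                   # still open
    ],
)

OPEN_OUTAGES = """
    SELECT HealthServiceOutageRowId, StartDateTime, EndDateTime
    FROM HealthServiceOutage
    WHERE EndDateTime IS NULL
    ORDER BY StartDateTime
"""

open_outages = conn.execute(OPEN_OUTAGES).fetchall()
for row in open_outages:
    print(row)
```

Only the row with no EndDateTime comes back, which is the pattern to look for in your own results.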

If nothing is returned, that is fantastic, and you aren’t experiencing the problem this post will solve. If you do get results they will look something like this:

An EndDateTime of NULL is not necessarily indicative of a problem. In some cases it was just a server that had been shut down for a period of time but had not been removed from SCOM. However, some of these NULLs were for the servers that were showing healthy SCOM agents with availability reporting showing monitoring unavailable.

As useful as HealthServiceOutageRowId is, it can be helpful to actually know the name of the associated system. Run the following query to join in Name and DisplayName:
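
Again as a sketch rather than the original query: the idea is to join each open outage row to the table or view that holds Name and DisplayName. In the Python below, the lookup object name (vManagedEntity) and the join key (ManagedEntityRowId) are assumptions about the data warehouse schema, the sample rows are made up, and SQLite stands in for SQL Server so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE HealthServiceOutage (
        HealthServiceOutageRowId INTEGER PRIMARY KEY,
        ManagedEntityRowId INTEGER NOT NULL,   -- assumed join key
        StartDateTime TEXT NOT NULL,
        EndDateTime TEXT
    );
    CREATE TABLE vManagedEntity (              -- assumed lookup table/view
        ManagedEntityRowId INTEGER PRIMARY KEY,
        Name TEXT,
        DisplayName TEXT
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO vManagedEntity VALUES
        (1, 'hs-0001', 'web01.example.local'),
        (2, 'hs-0002', 'dc01.example.local');
    INSERT INTO HealthServiceOutage VALUES
        (101, 1, '2013-05-01 08:00:00', '2013-05-01 09:30:00'),
        (102, 2, '2013-06-12 14:15:00', NULL);
    """
)

OPEN_OUTAGES_WITH_NAMES = """
    SELECT o.HealthServiceOutageRowId, o.StartDateTime, o.EndDateTime,
           e.Name, e.DisplayName
    FROM HealthServiceOutage AS o
    JOIN vManagedEntity AS e
      ON e.ManagedEntityRowId = o.ManagedEntityRowId
    WHERE o.EndDateTime IS NULL
"""

named_outages = conn.execute(OPEN_OUTAGES_WITH_NAMES).fetchall()
for row in named_outages:
    print(row)
```

Keeping DisplayName as the last column matches the layout described next, where the right-most column carries the FQDN.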

Your results should look like this with the right-most DisplayName column providing the FQDN of the affected system:

At this point Brody’s post recommends confirming that the systems are all experiencing the problem, backing up your data warehouse, and, at your own risk, modifying the values of the EndDateTime column via custom SQL. I tend to be a little risk averse, at least in my production environments, so the first thing I tried once I had narrowed down the issue was to uninstall the SCOM agent from one of the misbehaving systems and then immediately reinstall it. For that system this resolved the issue immediately, with proper availability monitoring returning post reinstall:

However, one of my affected servers was a domain controller with a manually installed agent. I had no way of uninstalling and reinstalling the agent without bugging our domain administrator.

So for this case I backed up the data warehouse and then did the following. (Again, you could do this via raw SQL, but sometimes I think it is better to have a clear understanding of what you are doing to a database rather than just copying some code someone else wrote.)

Please keep in mind this solution is not supported by Microsoft:

Right click the dbo.HealthServiceOutage table:

Select Edit Top 200 Rows:

In the right-hand properties box, hit the + sign next to Top Specification and increase the Expression value so that it includes the HealthServiceOutageRowId of the system you want to fix:

At the bottom of your query you will see “query changed”; right-click and select Execute SQL:

Scroll down to the HealthServiceOutageRowId that matches your affected server. The EndDateTime should show NULL. Copy the value from StartDateTime, paste it into the box for EndDateTime, and close out of the editor.

And then for good measure, run the earlier query that joins in Name and DisplayName again to confirm that your modification worked; the server should no longer be returned.

 

So two fixes for this issue:

Recommended fix: Reinstall the SCOM agent.

Optional, unsupported fix (back up your data warehouse first): Modify the EndDateTime value from NULL to match the StartDateTime, either via a Management Studio edit or via a SQL query.
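
For the SQL route, here is a minimal sketch of what such an update could look like. It is not Brody’s script, and it is just as unsupported as the manual edit. The Python below uses an in-memory SQLite table as a stand-in for dbo.HealthServiceOutage with made-up rows. The helper name is hypothetical, and the extra "EndDateTime IS NULL" guard is my own addition so that a closed outage can never be overwritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE HealthServiceOutage (
        HealthServiceOutageRowId INTEGER PRIMARY KEY,
        StartDateTime TEXT NOT NULL,
        EndDateTime TEXT
    )
    """
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO HealthServiceOutage VALUES (?, ?, ?)",
    [
        (101, "2013-05-01 08:00:00", "2013-05-01 09:30:00"),
        (102, "2013-06-12 14:15:00", None),
    ],
)


def close_outage(db: sqlite3.Connection, row_id: int) -> int:
    """Set EndDateTime to StartDateTime for one open outage row.

    Returns the number of rows changed (0 if the row is missing or
    already has an EndDateTime).
    """
    cur = db.execute(
        "UPDATE HealthServiceOutage "
        "SET EndDateTime = StartDateTime "
        "WHERE HealthServiceOutageRowId = ? AND EndDateTime IS NULL",
        (row_id,),
    )
    db.commit()
    return cur.rowcount


changed = close_outage(conn, 102)
print("rows changed:", changed)
```

Targeting a single HealthServiceOutageRowId at a time keeps the change as narrow as the manual edit, and the returned row count gives you a quick confirmation that exactly one row was touched.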

Just to reiterate, if you opt to use this post as a solution, read Brody’s post as well. He found the solution and presents a much deeper understanding of how availability is actually calculated, and the extra info is extremely useful. His method of fixing this via SQL rather than a manual edit in Management Studio is also far more scalable if you happen to have this problem on more than a handful of servers.

The contents of this site are provided “AS IS” with no warranties, or rights conferred. Example code could harm your environment, and is not intended for production use. Content represents point in time snapshots of information and may no longer be accurate. (I work @ MSFT. Thoughts and opinions are my own.)