Over the past few months I have had a few alerts related to a failure of WMI on servers. The Product Knowledge within SCOM recommends the following:
Unfortunately, I haven’t had much luck with the recommended fixes. In the past, with other systems, I used to just reboot the server in question, but I hate having to rely on a reboot to fix a problem, as it’s not a particularly good long-term solution.
When I try running winmgmt /verifyrepository I get a failure message:
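Since this check is worth repeating before and after any attempted fix, here is a minimal Python sketch that wraps the same command and captures its exit code and output. The function name verify_wmi_repository and its runner parameter are illustrative choices of mine, not part of any tool. It assumes winmgmt is on the PATH and that the script is run from an elevated prompt.

```python
import subprocess


def verify_wmi_repository(runner=subprocess.run):
    """Run `winmgmt /verifyrepository` and return (exit_code, output).

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without a Windows host (an assumption made
    for this sketch, not a winmgmt feature).
    """
    result = runner(
        ["winmgmt", "/verifyrepository"],
        capture_output=True,
        text=True,
    )
    # Combine stdout and stderr so a failure message is not lost,
    # whichever stream it was written to.
    output = (result.stdout or "") + (result.stderr or "")
    return result.returncode, output.strip()
```

Calling verify_wmi_repository() on the affected server and logging the returned pair with a timestamp gives a simple before-and-after record for each fix attempted.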
If I try searching for anything that might be hogging all the threads, nothing obvious stands out.
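One way to make that search less of an eyeball exercise is to rank processes by thread count. The sketch below is a rough illustration rather than a definitive method: it asks PowerShell's Get-Process for each process's thread count as CSV (which does not go through WMI, so it should still work while WMI is misbehaving) and sorts the result in Python. The names PS_COMMAND, collect_process_csv and top_thread_users are my own, and treating a blank thread count as zero is an assumption.

```python
import csv
import io
import subprocess

# PowerShell pipeline emitting Name, Id and thread count as CSV.
PS_COMMAND = (
    "Get-Process | "
    "Select-Object Name, Id, @{n='Threads';e={$_.Threads.Count}} | "
    "ConvertTo-Csv -NoTypeInformation"
)


def collect_process_csv():
    """Return the CSV text produced by PS_COMMAND (Windows only)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def top_thread_users(csv_text, limit=10):
    """Parse the CSV and return the `limit` processes with the most threads."""
    rows = csv.DictReader(io.StringIO(csv_text))
    procs = [
        # A blank Threads value is treated as 0 (assumption for this sketch).
        (row["Name"], int(row["Id"]), int(row["Threads"] or 0))
        for row in rows
    ]
    return sorted(procs, key=lambda p: p[2], reverse=True)[:limit]
```

On the server itself, top_thread_users(collect_process_csv()) returns a list of (name, process ID, thread count) tuples with the heaviest thread users first.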
If I run the handy WMI diagnosis tool, I get more or less the same result, along with some useful information indicating that, apart from the thread creation issue, all seems to be well.
I am 99% certain that rebooting the system would resolve the issue, but my guess is that this would only be a temporary fix. The particular system I am having the problem on now happens to be of the mission-critical, cannot-reboot-under-any-circumstances variety: no reboot without change management and a team of skilled surgeons on hand to bring it back to life should it decide to crash post-reboot.
In the interest of a long-term solution, I am going to try installing the hotfixes that make WMI more robust, as recommended by Marnix Wolf on his excellent blog on OpsMgr.
I will continue to update this post with any further info related to WMI troubleshooting that I come across in the future.