This week is a training week, which means I have tiny windows of time to catch up on some blogging.
I have had this question a few times over the years. It seems like it should have a straightforward answer, but if there is one, I have not been able to find it.
When customers have asked this in the past I usually refer them to the following three posts:
These do an excellent job in different ways of getting at the question of what agents are greyed out and when did heartbeats stop coming in.
Unfortunately, these do nothing to address the first part of the question, they want all agents, those that have stopped heart beating and also those that haven’t.
This is a little bit more tricky. It is easy enough to get a list of all agents, a list of grey agents, and to query for when health service heartbeat failures occur. But there is nothing easily accessible via the SDK or via the DW that (at least that I am aware of) allows us to capture a timestamp for when a non-grey agents last heartbeat came in.
So my natural question to my customer is why do you need the healthy agents heartbeat timestamp? The answer was basically that they want to feed that data into other systems in their org and they don’t want to deal with two different lists/files. They want one file, but at the end of the day they don’t actually need an exact timestamp for last heartbeat of a healthy agent.
This makes things a lot easier and lends itself to a relatively simple potential solution:
Basically this returns each agent in your management group. If the Agent is greyed out we use the AvailabilityLastModified property to pull an approximate timestamp. If the agent is still heartbeating as determined by the IsAvailable property then the AvailabilityLastModified property isn’t going to contain useful information, so in this case we substitute the current date/time for that field indicating that we have had a successful heartbeat within the past 5 minutes.
I said “approximate timestamp” when referring to agents with an IsAvailable value of false (greyed out agent) in that while in many cases AvailabilityLastModified should correspond to a when a heartbeat failure occurs flipping the agent from healthy to critical. If for some reason the agent was already in a critical state, but was still heartbeating the AvailabilityLastModified property would only be capturing when the agent went into the critical state, not the moment of last heartbeat. If you need a more or less exact moment of last heartbeat report I suggest using one of the links above. But if you need a quick PowerShell report to feed into other systems to help prioritize agent remediation the above script or some modified form of it might be mildly useful.