David Hallberg

Betrayed by an Old Friend

I had a client complain to me recently that I was reporting a false positive, an alert that was triggered even though there was, in fact, no problem or issue with his Millennium® EMR system. I decided to figure out why there was a discrepancy between my measurements and the client’s quality-assurance tool, which happened to be sysmon.

Let me begin by saying that I love sysmon. Before sysmon was developed in the late 1990s, Millennium had so many tools and utilities, each providing a single view of one piece of information, that it was difficult to keep track of them all. At one point there were more than 200 build or monitoring tools, even though Millennium had only about 40 actual application executables that were used in various configurations for its 60-plus solutions. Multi-node clients kept asking their Millennium middleware team to figure out where performance bottlenecks were occurring. Believe me, this sleuthing was extremely labor-intensive. Users had to have mon_ss (a Millennium monitoring tool used to watch for proprietary middleware queuing) and joumon (a Millennium monitoring tool used to watch for persistent queuing in the proprietary middleware) up in a different window for each application node. Then they would have to try to show someone else when one or more of these screens had a “blip” or negative performance indicator. It was painful even on the good days.

So those managing Millennium were understandably ecstatic when Cerner developed a single utility that showed, on a single screen, what was happening in Millennium’s persistent and non-persistent queuing mechanisms across all application nodes. Sysmon also provided the date and time when something bad happened.

Sysmon proved to be reliable, and I became a proselytizer for this great monitoring tool. While I was at Cerner, I taught many clients how to use sysmon and its various switches. I also used it to tune a four-node client with ConnectionBias (a load-balancing mechanism for Millennium), moving the transaction workload from one application node to another. I was able to show enough value with this one tool that some clients actually chose not to purchase BMC Patrol (later called BMC Performance Manager and now called BMC ProactiveNet Performance Management). Sysmon had some limitations: piping or moving its output to a file for additional processing was difficult and unreliable, and it had no mechanism to redirect its output to a common data format such as CSV (comma-separated values), tab-delimited or regular flat files. Yet it gave the most accurate view of a multi-node application configuration.

So why was my data conflicting with sysmon? It appears that with 2007.02 some of the newer Millennium services do not register with the old middleware in a way that allows sysmon to see their queuing. With the client I referred to earlier, I found that the SenSage add-on called P2Sentinel did not publish its queuing in a way that sysmon could see. The toolset I used, however, did see the problem.

That’s not the only problem. Another client reported that one of the Java services introduced with 2007.02 was crashing several hundred times a day. I ran sysmon to see if it would show these crashes, and it did not. I am not exaggerating when I say that I was crushed. Sysmon was the one tool I absolutely trusted to let me know in a timely manner when Millennium was failing, typically 15 to 45 minutes before BMC did. Now I find it is not reliable!

If, as it appears, the newer programs will not share or publish their information the way the old programs did, Millennium clients must now find other tools. None of the options, however, shows them this information in a single view. Sadly, we all have to go back to running multiple windows for each application node when we are using command-line tools.

A source of truth that you can trust and believe in is difficult to find. It is even more difficult to watch it stop being a source of truth.