David Hallberg

What Is Measured Improves

I just came across the following quote by famed management consultant Peter Drucker: “What’s measured improves.” It occurred to me that this was the crux of some recent conversations I have had with Millennium clients. I keep being asked why Millennium’s message logs are important. Why would a CIO, CTO, CMIO, director, or project manager care about an application’s log files?

Because what is measured improves.

Millennium has more than 25 distinct solution areas with more than 60 distinct, sellable solutions or applications. How do you know whether these solutions are being delivered with high quality? How do you know whether the technologies that support them have been deployed and are working as designed? Look at the message logs.

The IT department is often charged with quality and performance improvement as well as patient safety. So how do you detect and categorize these quality and performance problems in Millennium? Look at the message logs.

I realize that this simple answer gets a bit more complicated by the fact that each application node can have between 283 and 538 log files. How can anyone figure out what to do with all of these files? Most people see the number and simply give up before they start. I admit it is a daunting task, especially if you do not have system management tools to help. However, the following steps will help you sort through the hundreds of files to identify the message logs that point to quality and performance issues with your system.

Start with the log files that wrap in less than 10 days, and put those that wrap in less than one day at the top of your list. Each log file records the date/time stamps of its oldest and newest records and the number of times it has wrapped. To avoid running out of disk space on the application nodes, Millennium by default limits each log file to 4,096 records, so the 4,097th record overwrites the first. This means that if a log file is wrapping, you are losing information you may need to determine whether you have a problem.
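
The triage rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming you have already pulled each file's oldest and newest time stamps and wrap count into a record; the LogSummary fields, the triage function, and the sample file names are hypothetical, not Millennium's actual header layout. Once a fixed-size log has wrapped, the span between its oldest and newest records approximates how long one full pass through the file takes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogSummary:
    # Hypothetical header fields; not Millennium's actual layout.
    name: str
    oldest: datetime    # time stamp of the oldest record still in the file
    newest: datetime    # time stamp of the newest record
    wrap_count: int     # number of times the file has wrapped


def time_span(log: LogSummary) -> timedelta:
    # After a fixed-size log (4,096 records by default) has wrapped, the span
    # between its oldest and newest records approximates one full wrap cycle.
    return log.newest - log.oldest


def triage(logs, watch_days=10, urgent_days=1):
    """Split wrapping logs into (urgent, watch) name lists, fastest first."""
    wrapping = sorted((log for log in logs if log.wrap_count > 0), key=time_span)
    urgent_limit = timedelta(days=urgent_days)
    watch_limit = timedelta(days=watch_days)
    urgent = [log.name for log in wrapping if time_span(log) < urgent_limit]
    watch = [log.name for log in wrapping
             if urgent_limit <= time_span(log) < watch_limit]
    return urgent, watch


# Illustrative values only.
logs = [
    LogSummary("log_a", datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 20), 14),
    LogSummary("log_b", datetime(2024, 1, 1), datetime(2024, 1, 6), 2),
    LogSummary("log_c", datetime(2023, 12, 1), datetime(2024, 1, 6), 0),
]
urgent, watch = triage(logs)
print("urgent:", urgent)  # urgent: ['log_a']
print("watch:", watch)    # watch: ['log_b']
```

Files that have never wrapped (log_c here) are left out, since they have not yet lost any records.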

You can get this information from the message controls in Panther or Olympus, from the ls -l command on AIX and HP-UX, or from the dir command on VMS. Panther and Olympus report wrap times directly. If you don’t have these tools, you will need to derive the wrap times yourself: run ls -l (or dir) on the $cer_log directory, wait one day, and run the command again. Save both outputs, import them into Excel, and look for the files that changed. Now you know which files are wrapping quickly.
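
If you would rather script the comparison than do it in Excel, the sketch below diffs two saved ls -l snapshots taken a day apart. It assumes the common long-listing layout (permissions, links, owner, group, size, month, day, time or year, name) used on AIX and HP-UX; it does not handle VMS dir output, and the file names and owners in the sample are hypothetical.

```python
def parse_ls(text: str) -> dict:
    """Map file name -> (size, modification time) from saved `ls -l` output.

    Assumes the usual layout:
    permissions links owner group size month day time-or-year name
    """
    files = {}
    for line in text.splitlines():
        fields = line.split(maxsplit=8)
        # Skip the "total" line, blank lines, and anything that is not a
        # regular file (directories start with "d", symlinks with "l").
        if len(fields) < 9 or not fields[0].startswith("-"):
            continue
        files[fields[8]] = (fields[4], " ".join(fields[5:8]))
    return files


def changed_files(day1: str, day2: str) -> list:
    """Names of files that are new or whose size or modification time changed."""
    before, after = parse_ls(day1), parse_ls(day2)
    return sorted(name for name, stat in after.items() if before.get(name) != stat)


# Hypothetical snapshots of the $cer_log directory, one day apart.
day1 = """total 1280
-rw-r--r--   1 owner    group     524288 Jan  5 09:14 log_a
-rw-r--r--   1 owner    group     131072 Dec 20 17:02 log_b
"""
day2 = """total 1280
-rw-r--r--   1 owner    group     524288 Jan  6 08:55 log_a
-rw-r--r--   1 owner    group     131072 Dec 20 17:02 log_b
"""
print(changed_files(day1, day2))  # ['log_a']
```

Comparing both size and modification time catches a file that grew as well as one that was rewritten in place at the same size.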

Next, look for Errors and Warnings (the Level column shows 0 for an Error and 1 for a Warning). They indicate that all or part of a transaction was not written to the database. Because that data never reached the database, the database tools cannot see it; these events are generally visible only in the log files. If a log file is wrapping quickly and its entries are at a Level greater than 2, you need to reduce the loglevel in SCP and cycle the server, so that those messages, which are neither Errors nor Warnings, stop pushing the Errors and Warnings out of the log.
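
A rough way to apply both checks at once is to tally the Level values per file and pair them with the list of fast-wrapping files. This is a sketch under stated assumptions: the (file_name, level) record format, the function names, and the sample data are hypothetical, and it assumes a lower loglevel setting in SCP is what stops the Level > 2 messages from being written.

```python
from collections import Counter, defaultdict

ERROR, WARNING = 0, 1  # Level column values described above


def summarize(records):
    """Tally levels per file. `records` is an iterable of (file_name, level)
    pairs, a hypothetical format for entries already extracted from the logs."""
    counts = defaultdict(Counter)
    for name, level in records:
        if level == ERROR:
            counts[name]["errors"] += 1
        elif level == WARNING:
            counts[name]["warnings"] += 1
        elif level > 2:
            counts[name]["verbose"] += 1
    return counts


def recommend(counts, wrapping):
    """Suggest next steps per file; `wrapping` is the set of fast-wrapping logs."""
    actions = {}
    for name, tally in counts.items():
        steps = []
        if tally["errors"] or tally["warnings"]:
            steps.append("investigate errors/warnings")
        if name in wrapping and tally["verbose"]:
            # Assumption: reducing the loglevel suppresses Level > 2 messages.
            steps.append("reduce loglevel in SCP, then cycle the server")
        if steps:
            actions[name] = steps
    return actions


# Hypothetical sample data.
records = [("log_a", 3), ("log_a", 4), ("log_a", 0), ("log_b", 1), ("log_c", 3)]
actions = recommend(summarize(records), wrapping={"log_a", "log_c"})
for name, steps in sorted(actions.items()):
    print(name, steps)
```

A file can need both steps: log_a has an Error to investigate and is also wrapping on verbose messages.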

Do not accept the Service Request (SR) response you will get when you log this with Cerner. They will tell you to add the Event to a file called messages.suppress and cycle the server. This stops the messages from being written to the log files, but it does not fix the problem. It’s a little like turning up the radio when your car starts making a strange sound. You might not hear the noise, but you haven’t eliminated the problem. Likewise, suppressing messages declutters the log files, but you are only hiding errors or events, not fixing them, and ultimately putting your clinicians and organization at risk. (See our earlier blog post titled “When No News Is Bad News.”)

Once your staff has gathered the information, you need to create and execute an action plan to resolve the issues. Unfortunately, this problem solving is iterative. By the time the first round of issues is resolved, you will probably have completed more build in Production and installed more Millennium code, so the process starts again. I have found that if you start this process in the non-Production domains, you can generally resolve the issues before they reach Production. The Production message log audits then become much easier and faster.
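
To keep track of the loop from one audit round to the next, it can help to diff each round's findings against the previous one. The sketch below assumes each finding is recorded as a hypothetical (log_file, event) pair; the function name and sample events are illustrative only.

```python
def compare_rounds(previous: set, current: set) -> dict:
    """Compare two audit rounds. Each set holds hypothetical
    (log_file, event) findings gathered in one round."""
    return {
        "new": sorted(current - previous),
        "resolved": sorted(previous - current),
        "persisting": sorted(previous & current),
    }


# Hypothetical findings from two consecutive rounds.
round_1 = {("log_a", "event_x"), ("log_b", "event_y")}
round_2 = {("log_b", "event_y"), ("log_c", "event_z")}
result = compare_rounds(round_1, round_2)
for key, findings in result.items():
    print(key, findings)
```

The same comparison works across domains: pass the non-Production findings as previous and the Production findings as current, and "new" lists the problems that reached Production without being caught earlier.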

Once you create this quality loop, clinician and business office complaints will diminish, response times will become more consistent, and the general acceptance of the Millennium system will be higher because people will trust the information it holds.

Prognosis: By using message log audits to measure Millennium’s performance, you uncover problem areas and improve the system. Peter Drucker would not be surprised.