Diagnostics of your operating system will help determine if your Millennium application nodes have performance or capacity issues.
The easiest place to begin your analysis is with Millennium's Lights On Network, which monitors how much capacity is being used on the application nodes. From the System Management tab, you can see your Peak CPU, Physical Memory, Disk I/O (input/output) Rates, and Run Queue. Of these items, the Disk I/O Rate and Physical Memory make for nice dials, but you generally can’t do too much with them. Which metrics can you use?
- Disk Busy shows what percentage of the disk’s throughput was used. For instance, a drive that is 30 percent busy is a happy drive, but a drive that is 100 percent busy cannot keep up with its workload. This metric tells your system manager when to spread the work to other drives.
- Paging In (your OS might call it Paging) gives a better indication of your physical memory. If you are paging in a lot, you do not have enough physical memory to keep up with your workload. To reduce the paging in, your system administrator needs to tune the node/lpar/vpar.
- Run Queue shows whether you are running out of CPU. If your Run Queue is 2x your physical CPUs, I recommend that you evaluate whether you can move any work off of that node/lpar/vpar. Other people are comfortable with stressing the CPUs more. In one extreme case, I was told it was OK to have a Run Queue of more than 100 on an 8- or 16-CPU node. I disagree. If you are experiencing a Run Queue of more than 4x your physical CPUs, you have exceeded your CPU capacity and need to tune.
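As a rough illustration of the Run Queue rule of thumb above, here is a minimal sketch. The function name and the status labels are hypothetical and are not part of Lights On Network or any Millennium tool; the 2x and 4x thresholds are simply the ones I describe above.

```python
def classify_run_queue(run_queue: float, physical_cpus: int) -> str:
    """Apply the 2x / 4x rule of thumb to a single Run Queue reading.

    Hypothetical helper for illustration only; the labels are made up.
    """
    if physical_cpus <= 0:
        raise ValueError("physical_cpus must be a positive integer")

    ratio = run_queue / physical_cpus
    if ratio > 4:
        # More than 4x the physical CPUs: CPU capacity is exceeded.
        return "over capacity"
    if ratio >= 2:
        # At 2x or above: evaluate whether work can move off this node.
        return "evaluate"
    return "ok"


if __name__ == "__main__":
    # The extreme case from the text: a Run Queue of 100 on a 16-CPU node.
    print(classify_run_queue(100, 16))
    # A Run Queue of 16 on an 8-CPU node sits right at the 2x mark.
    print(classify_run_queue(16, 8))
```

You would feed it the peak Run Queue for a node and that node's physical CPU count, and treat the result as a prompt for a closer look rather than a verdict.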
One gap in the Lights On Network is the absence of OS statistics for your database node. Without these stats, your database may be having issues but you won’t have any visibility into them. You can gain visibility by requesting the raw data from your outsourcer, downloading the necessary analysis tool and then exporting the data to an Excel file to audit it and track trends. Here’s what you need:
- For AIX and Linux, the raw data is nmon; get the nmon_analyzer.zip from http://www.ibm.com/developerworks/wikis/display/WikiPtype/nmonanalyser.
- For VMS, the raw data is T4; get the TLViz analyzer from http://h71000.www7.hp.com/openvms/products/t4/index.html.
- For HP-UX, the raw data is Glance/GlancePlus; get the analyzer from http://docs.hp.com/en/1219/tuningwp.html (an old document; let me know if you have a newer reference for this toolset).
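Once the data is exported, trend tracking can be as simple as pulling the daily peak for one metric. The sketch below assumes a hypothetical CSV export with `timestamp` and `run_queue` columns and made-up sample values; the actual layout produced by the nmon analyzer, TLViz or Glance will differ, so adjust the column names to match your export.

```python
import csv
import io


def daily_peaks(csv_text: str,
                value_column: str = "run_queue",
                time_column: str = "timestamp") -> dict:
    """Return the highest value seen per day for one metric.

    Assumes timestamps start with an ISO date (YYYY-MM-DD); both column
    names are hypothetical and should be changed to match your export.
    """
    peaks = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = row[time_column][:10]          # keep only the date part
        value = float(row[value_column])
        peaks[day] = max(peaks.get(day, value), value)
    return dict(sorted(peaks.items()))


# Made-up sample data, for illustration only.
SAMPLE = """timestamp,run_queue
2024-01-01 09:00,6
2024-01-01 14:00,18
2024-01-02 09:00,9
2024-01-02 14:00,35
"""

if __name__ == "__main__":
    for day, peak in daily_peaks(SAMPLE).items():
        print(day, peak)
```

Comparing these daily peaks week over week gives you the same kind of trend line you would build by hand in Excel.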
If you discover OS capacity issues, you will have to work with your system administrators to correct them. I caution you to use tact when talking with them. Even though the truth is in the data, I have found that many system administrators grow defensive when they’re shown the data and can be reluctant to make any changes. If they are not persuaded by the data you show them, they can turn to one of the methodologies published by IBM, HP or Red Hat for maximizing throughput.