David Fletcher

Easy Script Analysis—Part 4: The Cost Basis of Scripts

For the past few weeks, Rob has covered how to collect script performance information and create spreadsheets for analysis. This week I want to take a step back and talk about the formula used to calculate the cost basis of scripts. The equations presented in Parts 2 and 3 of this blog were:

(BUFFER_GETS * DISK_READS + BUFFER_GETS * CPU_TIME + DISK_READS * CPU_TIME) / EXECUTIONS
and

(BUFFER_GETS * DISK_READS + BUFFER_GETS * CPU_TIME + DISK_READS * CPU_TIME) * EXECUTIONS
The basis for the equations is to take each of the metrics and multiply them by each of the other metrics, thus creating a “cost vector.” After further consideration (and a reminder from the Pythagorean theorem), however, we determined that a more proper form of the cost basis for scripts should be:

where BG is BUFFER_GETS, DR is DISK_READS, and CT is CPU_TIME.

Because there are three factors in our equation, the result is the length of the hypotenuse in a three-dimensional space and can be considered the magnitude of the Cost Basis vector. Figure 1 depicts a visual representation of the Cost Basis vector.

Figure 1—Cost Basis Vector

where BGf is the cost factor for BUFFER_GETS, DRf is the cost factor for DISK_READS, and CTf is the cost factor for CPU_TIME.

Determining the appropriate cost factors is dependent on the hardware configuration at your site, but if you relate everything back to CPU_TIME (CTf = 1), all you need to do is determine the average time (measured in microseconds) consumed by the other operations. For both disk reads and buffer gets, two factors influence performance: seek time and transfer rate.

1. For disk I/O, the transfer rate becomes inconsequential when compared to seek time, so for a good general starting point, use the following equation to determine the cost factor for disk reads:

2. For buffer gets, it is the seek time that becomes inconsequential when compared to the transfer rate, so use the following equation to determine the cost factor for buffer gets:

You will notice that the cost factor for disk reads is much larger than the cost factor for buffer gets. This is to be expected because it takes much longer to get data from disk than from memory.

In Excel, if you insert a row at the top of the spreadsheet and put the cost factor values in the first row above the appropriate columns, the Cost Basis equation will look something like this:

=Sqrt(($J$1*J3)^2 + ($K$1*K3)^2 + ($P$1*P3)^2)
I realize that this blog entry has a lot of mathematical equations, so if you have any questions or want to discuss these equations further, please feel free to add a comment below.

Prognosis: An accurate cost basis for each script can be calculated.

Next week: How to automate script analysis.