mhz - calculate clock rate and megahertz                                        
                                                                                
Usage: mhz [-c]                                                                 
                                                                                
*******************************************************************             
                                                                                
PA RISC timings (by Stan Sieler, sieler@gmail.com)                              
                                                                                
   3000 A Class (A440)         54 Mhz, 18.36 nanosec clock (#1)                 
      (equivalent 9000        440 Mhz,  2.27 nanosec clock)                     
   3000 968                    64 Mhz, 15.73 nanosec clock                      
   3000 928                    48 Mhz, 21.03 nanosec clock                      
   3000 918                    34 Mhz, 29.26 nanosec clock (#2)                 
                                                                                
   9000 rp2430                649 Mhz, 1.54 nanosec clock (32 or 64 bit!)       
   9000 K220 (859)            120 Mhz, 8.35 nanosec clock                       
                                                                                
#1: first reports: mhz: should take approximately 297 seconds                   
   (A400 is SERIES e3000/A400-100-11, clock of 440 reduced                      
   to claimed 110, actual 55)                                                   
                                                                                
#2: first reports: mhz: should take approximately 30 seconds                    
   (clock of 64 reduced to claimed 34)                                          
                                                                                
*****************************************************************               
*****************************************************************               
                                                                                
mhz version: v 1.5 1997/06/14 03:27:23                                          
mhz author: Larry McVoy                                                         
                                                                                
*****************************************************************               
                                                                                
Caveat emptor and other warnings                                                
                                                                                
This code must be compiled with the optimizer enabled!  Without
it, many compilers make poor use of the registers and your inner
loops end up using stack variables, which is SLOW.
                                                                                
Also, it is sensitive to other processor load.  When running                    
mhz with "rtprio" (real-time priority), I have never had mhz                    
make a mistake on my machine.  At other times mhz has been                      
wrong about 10% of the time.                                                    
                                                                                
If there is too much noise/error in the data, then this program                 
will usually return a clock speed that is too high.                             
                                                                                
*****************************************************************               
                                                                                
Constraints                                                                     
                                                                                
mhz.c is meant to be platform independent ANSI/C code, and it                   
has as little platform dependent code as possible.                              
                                                                                
This version of mhz is designed to eliminate the variable                       
instruction counts used by different compilers on different                     
architectures and instruction sets.  It is also structured to                   
be tightly interlocked so processors with super-scalar elements                 
or dynamic instruction reorder buffers cannot overlap the
execution of the expressions.                                                   
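
As a minimal sketch of what "tightly interlocked" means here (this
is not one of the loops from mhz.c, and the name chain_shr_add is
hypothetical): each operation consumes the result of the one before
it, so the operations form a single serial dependency chain.

```c
#include <assert.h>

/* Hypothetical illustration, not taken from mhz.c: every operation
 * depends on the value produced by the operation before it, so the
 * shifts and adds cannot be executed side by side. */
unsigned long chain_shr_add(unsigned long a, unsigned long n)
{
    unsigned long i;

    assert(n > 0);          /* a zero-iteration run would time nothing */
    for (i = 0; i < n; i++) {
        a = (a >> 1) + a;   /* the add needs the shift result */
        a = (a >> 1) + a;   /* this shift needs the previous add */
    }
    return a;
}
```

Whether a given compiler keeps this as separate shift and add
instructions is not guaranteed, which is one reason the text above
avoids assuming any particular instruction count.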
                                                                                
We have to try to make sure that the code in the various
inner loops does not fall out of the on-chip instruction cache                  
and that the inner loop variables fit inside the register set.                  
The i386 only has six addressable registers, so we had to make                  
sure that the inner loop procedures had fewer variables so they                 
would not spill onto the stack.                                                 
                                                                                
*****************************************************************               
                                                                                
Algorithm                                                                       
                                                                                
We can compute the CPU cycle time if we can get the compiler                    
to generate (at least) two instruction sequences inside loops                   
where the inner loop instruction counts are relatively prime.                   
We have several different loops to increase the chance that                     
two of them will be relatively prime on any given architecture.                 
                                                                                
This technique makes no assumptions about the cost of any single                
instruction or the number of instructions used to implement a                   
given expression.  We just hope that the compiler gets at least                 
two inner loop instruction sequences with lengths that are                      
relatively prime.  The "relatively prime" makes the greatest                    
common divisor method work.  If all the instruction sequences
have a common factor (e.g. 2), then the apparent CPU speed will                 
be off by that common factor.  Also, if there is too much                       
variability in the data so there is no apparent least common                    
multiple within the error bounds set in multiple_approx, then                   
we simply return the maximum clock rate found in the loops.                     
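
Seen in terms of loop durations, the check is whether every measured
time is approximately a whole multiple of one clock period.  The
following is a minimal sketch of such a search under stated
assumptions; it is not the multiple_approx routine from mhz.c.  The
name approx_period, the relative tolerance tol, and the search limit
max_k are all hypothetical.  It returns 0.0 when nothing fits and
leaves the fallback described above to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the multiple_approx from mhz.c.
 * Tries periods p = shortest/k for k = 1..max_k and returns the first
 * (largest) p such that every duration lies within tol*p of a whole
 * multiple of p.  Returns 0.0 if no candidate fits. */
double approx_period(const double *dur, size_t n, double tol, int max_k)
{
    double shortest;
    size_t i;
    int k;

    assert(n > 0 && tol > 0.0 && max_k > 0);

    shortest = dur[0];
    for (i = 1; i < n; i++)
        if (dur[i] < shortest)
            shortest = dur[i];

    for (k = 1; k <= max_k; k++) {
        double p = shortest / k;
        int fits = 1;

        for (i = 0; i < n; i++) {
            long m = (long)(dur[i] / p + 0.5);    /* nearest cycle count */
            double err = dur[i] - (double)m * p;  /* distance from m*p */

            if (err < 0.0)
                err = -err;
            if (m < 1 || err > tol * p) {
                fits = 0;
                break;
            }
        }
        if (fits)
            return p;
    }
    return 0.0;   /* no common divisor within the tolerance */
}
```

With durations in nanoseconds, a nonzero result p corresponds to a
clock rate of 1000/p Mhz.  The tolerance is scaled by p so that very
small candidate periods do not fit everything trivially.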
                                                                                
The processor's clock period is the greatest common divisor
of the execution times of the various loops, and the clock
speed is its reciprocal.  For example, suppose we are trying
to compute the clock speed for a 120Mhz processor (8.33ns
per cycle), and we have two loops:
  SHR           --- two cycles to shift right
  SHR;ADD       --- three cycles to SHR and add
then the expression duration will be:
  SHR           16.7ns (2 cycles/SHR)
  SHR;ADD       25.0ns (3 cycles/SHR;ADD)
so the greatest common divisor is 8.33ns and the clock speed
is 120Mhz.  Aside from extraneous variability added by poor
benchmarking hygiene, this method should always work when we
are able to get loops with cycle counts that are relatively
prime.

Suppose we are unlucky, and our two loops do not have
relatively prime instruction counts.  Suppose our two
loops are:
  SHR           16.7ns (2 cycles/SHR)
  SHR;ADD;SUB   33.3ns (4 cycles/SHR;ADD;SUB)
then the greatest common divisor will be 16.7ns, so the clock
speed will appear to be 60Mhz.
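
The effect of a shared factor can be sketched with exact cycle
counts.  This is only an illustration of the arithmetic above, not
code from mhz.c; the helper names gcd_u and apparent_mhz are
hypothetical, and real measurements are noisy times rather than
known integer counts.

```c
#include <assert.h>

/* Euclid's algorithm on two cycle counts. */
unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: the clock rate the GCD method would report
 * for two loops with the given cycle counts.  The result is the true
 * rate divided by the common factor of the two counts. */
double apparent_mhz(double true_mhz, unsigned cycles_a, unsigned cycles_b)
{
    unsigned g = gcd_u(cycles_a, cycles_b);

    assert(g > 0);   /* both counts zero would be meaningless */
    return true_mhz / g;
}
```

For the two cases above, apparent_mhz(120.0, 2, 3) gives 120 and
apparent_mhz(120.0, 2, 4) gives 60.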
                                                                                
The loops provided so far should have at least two relatively                   
prime loops on all tested architectures.                                        
                                                                                
*****************************************************************               
                                                                                
Copyright (c) 1994 Larry McVoy.  Distributed under the FSF GPL with the
additional restriction that results may be published only if
(1) the benchmark is unmodified, and                                            
(2) the version in the sccsid below is included in the report.                  
Support for this development by Silicon Graphics is gratefully acknowledged.    
Support for this development by Hewlett Packard is gratefully acknowledged.     
Support for this development by Sun Microsystems is gratefully acknowledged.    
                                                                                
*****************************************************************