Fast Inverse Square Root

by James Buckland

float Q_rsqrt( float number )
        long i;
        float x2, y;
        const float threehalfs = 1.5F;

        x2 = number * 0.5F;
        y  = number;
        i  = * ( long * ) &y;                       // evil floating point bit level hacking
        i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
        y  = * ( float * ) &i;
        y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//      y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

        return y;

The Fast Inverse Square Root is a bit-level hack for calculating Q_\text{rsqrt}(x) = x^{-1/2} at massively efficient speeds. The problem it is designed to solve is as follows: modern physics engines calculate (among other things) the reflections of light off an object; this reflection takes the form of a vector normal to the surface. This vector normalization requires, among its many steps, the calculation of \hat{v} = v / \sqrt{v_1^2 + v_2^2 + v_3^2}. In floating-point arithmetic, addition and multiplication are easy, but fast square roots and fast division produce a bottleneck of sorts; thus, the need arises for a solution on the processor-level.

Even to the casual programmer, it should be evident that the power — and weirdness — of  Q_\text{rsqrt}(x) lies in its use of the ‘magic number’ 0x5f3759df, a hexadecimal constant which, in context, is able to calculate y = y^{-1/2} with only a few quick steps, in contrast to the time-consuming (if simple) process of calculating it by normal means. A caveat: this magic number produces a value which is generally close to the accurate value.

For a diligent explanation of the machinery behind 0x5f3759df, read Christian Hansen’s lengthy post on the matter; in short, it exploits an underlying feature of the machinery behind floating-point– and long- type numbers on the processor level. Writing y = x^{-1/2} as \log_2 y = - \frac{1}{2} \log_2 x allows us to manipulate the numbers along a logarithmic path; at small scales, this path is mostly straight, and can be approximated by y = x + \sigma, where \sigma, in hexadecimal, ends up being 0x5f3759df itself. This method ends up being so accurate and so close to the ideal number that, for a sufficient level of precision, Newton’s Method needs to be run only once in order to refine the value to within a reasonable amount.

In order to fix a problem at the bit-level scale, a hack had to be found at the bit-level scale. This is called bit manipulation; it is incredible and can seem like magic to even an experienced programmer.