Back in June, Gustavo Niemeyer posted the following question on his Labix.org blog:
Assume uf is an unsigned integer with 64 bits that holds the IEEE-754 representation for a binary floating point number of that size.
How can you tell if uf represents an integer number?
I can't talk for you, but I write business applications. I just don't have the background to quickly knock out an answer for a question like this. Fortunately Gustavo posted the logic for the answer. I thought it would be fun to better understand the question and break down his answer. I am going to work with a 32 bit number just to make everything easier.
How does the IEEE-754 standard store a floating point number in a binary format? To start I found these two pages:
http://steve.hollasch.net/cgindex/coding/ieeefloat.htmlhttp://class.ece.iastate.edu/arun/CprE281_F05/ieee754/ie5.htmlThe IEEE-754 specification represents a floating point number in base 2 scientific notation using a special binary format. If you don't know what I mean by base 2 then look at my post on Understanding Type In Go (
https://www.ardanlabs.com/blog/2013/07/understanding-type-in-go.html).
Scientific notation is an efficient way of writing very large or small numbers. It works by using a decimal format with a multiplier. Here are some examples:
Base 10 Number | Scientific Notation | Calculation | Coefficient | Base | Exponent | Mantissa |
---|
700 | 7e+2 | 7 * 102 | 7 | 10 | 2 | 0 |
4,900,000,000 | 4.9e+9 | 4.9 * 109 | 4.9 | 10 | 9 | .9 |
5362.63 | 5.36263e+3 | 5.36263 * 103 | 5.36263 | 10 | 3 | .36263 |
-0.00345 | 3.45e-3 | 3.45 * 10-3 | 3.45 | 10 | -3 | .45 |
0.085 | 1.36e-4 | 1.36 * 2-4 | 1.36 | 2 | -4 | .36 |
In normal scientific notation form there is always just one digit on the left side of the decimal point. For base 10 numbers that digit must be between 1 through 9 and for base 2 numbers that digit can only be 1.
The entire number in scientific notation is called the Coefficient. The Mantissa is all the numbers to the right of the decimal point. These terms are important so take the time to study and understand the chart above.
How we move the decimal point to that first position determines the value of the Exponent. If we have to move the decimal point to the left, the Exponent is a positive number, to the right, it is a negative number. Look at the chart above and see the Exponent value for each example.
The Base and the Exponent work together in the notation. The exponent determines the "Power Of" calculation we need to perform on the base. In the first example the number 7 is multiplied by 10 (The Base) to the power of 2 (The Exponent) to get back to the original base 10 number 700. We moved the decimal point to the left two places to convert 700 to 7.00, which made the Exponent +2 and created the notation of 7e+2.
The IEEE-754 standard does not store base 10 scientific notation numbers but base 2 scientific notation numbers. The last example in the chart above represents the base 10 number 0.085 in base 2 scientific notation. Let's learn how that notation is calculated.
Base 10 Number | Scientific Notation | Calculation | Coefficient | Base | Exponent | Mantissa |
---|
0.085 | 1.36e-4 | 1.36 * 2-4 | 1.36 | 2 | -4 | .36 |
We need to divide the base 10 number (0.085) by some power of two so we get a 1 + Fraction value. What do I mean by a 1 + Fraction value? We need a number that looks like the Coefficient in the example, 1 + .36. The IEEE-754 standard requires that we have a "1." in the Coefficient. This allows us to only have to store the Mantissa and give us an extra bit of precision.
If we use brute force you will see when we finally get the 1 + Fraction value for 0.085:
0.085 / 2
-1 = 0.17
0.085 / 2
-2 = 0.34
0.085 / 2
-3 = 0.68
0.085 / 2
-4 = 1.36 ** We found the 1 + Fraction
An exponent of -4 gives us the 1 + Fraction value we need. Now we have everything we need to store the base 10 number 0.085 in IEEE-754 format.
Let's look at how the bits are laid out in the IEEE-754 format.
Precision | Sign | Exponent | Fraction Bits | Bias |
---|
Single (32 Bits) | 1 [31] | 8 [30-23] | 23 [22-00] | 127 |
Double (64 Bits) | 1 [63] | 11 [62-52] | 52 [51-00] | 1023 |
The bits are broken into three parts. There is a bit reserved for the sign, bits for the exponent and bits that are called fraction bits. The fractions bits are where we store the mantissa as a binary fraction.
If we store the value of 0.085 using Single Precision (a 32 bit number), the bit pattern in IEEE-754 would look like this:
Sign | Exponent (123) | Fraction Bits (.36) |
---|
0 | 0111 1011 | 010 1110 0001 0100 0111 1011 |
The Sign bit, the leftmost bit, determines if the number is positive or negative. If the bit is set to 1 then the number is negative else it is positive.
The next 8 bits represent the Exponent. In our example, the base 10 number 0.085 is converted to 1.36 * 2
-4 using base 2 scientific notation. Therefore the value of the exponent is -4. In order to be able to represent negative numbers, there is a Bias value. The Bias value for our 32 bit representation is 127. To represent the number -4, we need to find the number, that when subtract against the Bias, gives us -4. In our case that number is 123. If you look at the bit pattern for the Exponent you will see it represents the number 123 in binary.
The remaining 23 bits are the Fraction bits. To calculate the bit pattern for the fraction bits, we need to calculate and sum binary fractions until we get the value of the Mantissa, or a value that is as close as possible. Remember, we only need to store the Mantissa because we always assume that the "1." value exists.
To understand how binary fractions are calculated, look at the following chart. Each bit position from left to right represents a fractional value:
Binary | Fraction | Decimal | Power |
---|
0.1 | 1/2 | 0.5 | 2-1 |
0.01 | 1/4 | 0.25 | 2-2 |
0.001 | 1/8 | 0.125 | 2-3 |
We need to set the correct number of fraction bits that add up to or gets us close enough to the mantissa. This is why we can sometimes lose precision.
010 1110 0001 0100 0111 1011 = (0.36) | Bit | Value | Fraction | Decimal | Total |
---|
2 | 4 | 1/4 | 0.25 | 0.25 |
4 | 16 | 1/16 | 0.0625 | 0.3125 |
5 | 32 | 1/32 | 0.03125 | 0.34375 |
6 | 64 | 1/64 | 0.015625 | 0.359375 |
11 | 2048 | 1/2048 | 0.00048828125 | 0.35986328125 |
13 | 8192 | 1/8192 | 0.0001220703125 | 0.3599853515625 |
17 | 131072 | 1/131072 | 0.00000762939453 | 0.35999298095703 |
18 | 262144 | 1/262144 | 0.00000381469727 | 0.3599967956543 |
19 | 524288 | 1/524288 | 0.00000190734863 | 0.35999870300293 |
20 | 1048576 | 1/1048576 | 0.00000095367432 | 0.35999965667725 |
22 | 4194304 | 1/4194304 | 0.00000023841858 | 0.35999989509583 |
23 | 8388608 | 1/8388608 | 0.00000011920929 | 0.36000001430512 |
You can see that setting these 12 bits get us to the value of 0.36 plus some extra fractions.
Let's sum up what we now know about the IEEE-754 format:
1. Any base 10 number to be stored is converted to base 2 scientific notation.
2. The base 2 scientific notation value we use must follow the 1 + Fraction format.
3. There are three distinct sections in the format.
4. The Sign bit determines if the number is positive or negative.
5. The Exponent bits represent a number that needs to be subtracted against the Bias.
6. The Fraction bits represent the Mantissa using binary fraction summation.
Let's prove that our analysis is correct about the IEEE-754 format. We should be able to store the number 0.85 and display bit patterns and values for each section that match everything we have seen.
The following code stores the number 0.085 and displays the IEEE-754 binary representation:
package main
import (
"fmt"
"math"
)
func main() {
var number float32 = 0.085
fmt.Printf("Starting Number: %f\n\n", number)
// Float32bits returns the IEEE 754 binary representation
bits := math.Float32bits(number)
binary := fmt.Sprintf("%.32b", bits)
fmt.Printf("Bit Pattern: %s | %s %s | %s %s %s %s %s %s\n\n",
binary[0:1],
binary[1:5], binary[5:9],
binary[9:12], binary[12:16], binary[16:20],
binary[20:24], binary[24:28], binary[28:32])
bias := 127
sign := bits & (1 << 31)
exponentRaw := int(bits >> 23)
exponent := exponentRaw - bias
var mantissa float64
for index, bit := range binary[9:32] {
if bit == 49 {
position := index + 1
bitValue := math.Pow(2, float64(position))
fractional := 1 / bitValue
mantissa = mantissa + fractional
}
}
value := (1 + mantissa) * math.Pow(2, float64(exponent))
fmt.Printf("Sign: %d Exponent: %d (%d) Mantissa: %f Value: %f\n\n",
sign,
exponentRaw,
exponent,
mantissa,
value)
}
When we run the program we get the following output:
Starting Number: 0.085000
Bit Pattern: 0 | 0111 1011 | 010 1110 0001 0100 0111 1011
Sign: 0 Exponent: 123 (-4) Mantissa: 0.360000 Value: 0.085000
If you compare the displayed bit pattern with our example above, you will see that it matches. Everything we learned about IEEE-754 is true.
Now we should be able to answer Gustavo's question. How can we tell if the value being stored is an integer? Here is a function, thanks to Gustavo's code, that tests if the IEEE-754 stored value is an integer:
func IsInt(bits uint32, bias int) {
exponent := int(bits >> 23) - bias - 23
coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
intTest := (coefficient & (1 << uint32(-exponent) - 1))
fmt.Printf("\nExponent: %d Coefficient: %d IntTest: %d\n",
exponent,
coefficient,
intTest)
if exponent < -23 {
fmt.Printf("NOT INTEGER\n")
return
}
if exponent < 0 && intTest != 0 {
fmt.Printf("NOT INTEGER\n")
return
}
fmt.Printf("INTEGER\n")
}
So how does this function work?
Let's start with the first condition which tests if the Exponent is less than -23. If we use the number 1 as our test number, the exponent will be 127 which is the same as the Bias. This means when we subtract the Exponent against the Bias we will get zero.
Starting Number: 1.000000
Bit Pattern: 0 | 0111 1111 | 000 0000 0000 0000 0000 0000
Sign: 0 Exponent: 127 (0) Mantissa: 0.000000 Value: 1.000000
Exponent: -23 Coefficient: 8388608 IntTest: 0
INTEGER
The test function adds an extra subtraction of 23, which represents the starting bit position for the Exponent in the IEEE-754 format. That is why you see -23 for the Exponent value coming from the test function.
Precision | Sign | Exponent | Fraction Bits | Bias |
---|
Single (32 Bits) | 1 [31] | 8 [30-23] | 23 [22-00] | 127 |
This subtraction is required to help with the second test. So any value that is less than -23 must be less than one (1) and therefore not an integer.
To understand how the second test works, let's use an integer value. This time we will set the number to 234523 in the code and run the program again.
Starting Number: 234523.000000
Bit Pattern: 0 | 1001 0000 | 110 0101 0000 0110 1100 0000
Sign: 0 Exponent: 144 (17) Mantissa: 0.789268 Value: 234523.000000
Exponent: -6 Coefficient: 15009472 IntTest: 0
INTEGER
The second test looks for two conditions to identify if the number is not an integer. This requires the use of bitwise mathematics. Let's look at the math we are performing in the function:
exponent := int(bits >> 23) - bias - 23
coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
intTest := coefficient & ((1 << uint32(-exponent)) - 1)
The coefficient calculation is adding the 1 + to the Mantissa so we have the base 2 Coefficient value.
When we look at the first part of the coefficient calculation we see the following bit patterns:
coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
Bits: 01001000011001010000011011000000
(1 << 23) - 1: 00000000011111111111111111111111
bits & ((1 << 23) - 1): 00000000011001010000011011000000
The first part of the coefficient calculation removes the bits for the Sign and Exponent from the entire IEEE-754 bit pattern.
The second part of the coefficient calculation adds the "1 +" into the binary bit pattern:
coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
bits & ((1 << 23) - 1): 00000000011001010000011011000000
(1 << 23): 00000000100000000000000000000000
coefficient: 00000000111001010000011011000000
Now that the coefficient bit pattern is set, we can calculate the intTest value:
exponent := int(bits >> 23) - bias - 23
intTest := (coefficient & ((1 << uint32(-exponent)) - 1))
exponent: (144 - 127 - 23) = -6
1 << uint32(-exponent): 000000
(1 << uint32(-exponent)) - 1: 111111
coefficient: 00000000111001010000011011000000
1 << uint32(-exponent)) - 1: 00000000000000000000000000111111
intTest: 00000000000000000000000000000000
The value of the exponent we calculate in the test function is used to determine the number of bits we will compare against the Coefficient. In this case the exponent value is -6. That is calculated by subtracting the stored Exponent value (144) against the Bias (127) and then against the starting bit position for the Exponent (23). This gives us a bit pattern of 6 ones (1's). The final operation takes those 6 bits and AND's them against the rightmost 6 bits of the Coefficient to get the intTest value.
The second test is looking for an exponent value that is less than zero (0) and an intTest value that is NOT zero (0). This would indicate the number being stored is not an integer. In our example with 234523, the Exponent is less than zero (0), but the value of intTest is zero (0). We have an integer.
I have included the sample code in the Go playground so you can play with it.
[http://play.golang.org/p/3xraud43pi](/broken-link)
If it wasn't for Gustavo's code I could never have identified the solution. Here is a link to his solution:
http://bazaar.launchpad.net/~niemeyer/strepr/trunk/view/6/strepr.go#L160Here is a copy of the code that you can copy and paste:
package main
import (
"fmt"
"math"
)
func main() {
var number float32 = 234523
fmt.Printf("Starting Number: %f\n\n", number)
// Float32bits returns the IEEE 754 binary representation
bits := math.Float32bits(number)
binary := fmt.Sprintf("%.32b", bits)
fmt.Printf("Bit Pattern: %s | %s %s | %s %s %s %s %s %s\n\n",
binary[0:1],
binary[1:5], binary[5:9],
binary[9:12], binary[12:16], binary[16:20],
binary[20:24], binary[24:28], binary[28:32])
bias := 127
sign := bits & (1 << 31)
exponentRaw := int(bits >> 23)
exponent := exponentRaw - bias
var mantissa float64
for index, bit := range binary[9:32] {
if bit == 49 {
position := index + 1
bitValue := math.Pow(2, float64(position))
fractional := 1 / bitValue
mantissa = mantissa + fractional
}
}
value := (1 + mantissa) * math.Pow(2, float64(exponent))
fmt.Printf("Sign: %d Exponent: %d (%d) Mantissa: %f Value: %f\n\n",
sign,
exponentRaw,
exponent,
mantissa,
value)
IsInt(bits, bias)
}
func IsInt(bits uint32, bias int) {
exponent := int(bits>>23) - bias - 23
coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
intTest := coefficient & ((1 << uint32(-exponent)) - 1)
ShowBits(bits, bias, exponent)
fmt.Printf("\nExp: %d Frac: %d IntTest: %d\n",
exponent,
coefficient,
intTest)
if exponent < -23 {
fmt.Printf("NOT INTEGER\n")
return
}
if exponent < 0 && intTest != 0 {
fmt.Printf("NOT INTEGER\n")
return
}
fmt.Printf("INTEGER\n")
}
func ShowBits(bits uint32, bias int, exponent int) {
value := (1 << 23) - 1
value2 := (bits & ((1 << 23) - 1))
value3 := (1 << 23)
coefficient := (bits & ((1 << 23) - 1)) | (1 << 23)
fmt.Printf("Bits:\t\t\t%.32b\n", bits)
fmt.Printf("(1 << 23) - 1:\t\t%.32b\n", value)
fmt.Printf("bits & ((1 << 23) - 1):\t\t%.32b\n\n", value2)
fmt.Printf("bits & ((1 << 23) - 1):\t\t%.32b\n", value2)
fmt.Printf("(1 << 23):\t\t\t%.32b\n", value3)
fmt.Printf("coefficient:\t\t\t%.32b\n\n", coefficient)
value5 := 1 << uint32(-exponent)
value6 := (1 << uint32(-exponent)) - 1
inTest := (coefficient & ((1 << uint32(-exponent)) - 1))
fmt.Printf("1 << uint32(-exponent):\t\t%.32b\n", value5)
fmt.Printf("(1 << uint32(-exponent)) - 1:\t%.32b\n\n", value6)
fmt.Printf("coefficient:\t\t\t%.32b\n", coefficient)
fmt.Printf("(1 << uint32(-exponent)) - 1:\t%.32b\n", value6)
fmt.Printf("intTest:\t\t\t%.32b\n", inTest)
}
I want to thank Gustavo for posting the question and giving me something to really fight through to understand.