Suppose you have the following list of numbers in Python and you would like to compute the sum. You use the
sum() function and expect it to return 0.3. Yet, when you run the code the console returns a value very slightly above 0.3:
numbers = [0.1, 0.1, 0.1] sum(numbers) 0.30000000000000004
You can round this number of course, but it begs the question as to why the correct sum was not returned in the first place. Enter the IEEE 754 floating point standard.
Floating Point Storage
double type is a 64 binary digit (bit) numerical storage standard that includes 1 sign bit (determines if number is positive or negative), a 53 bit significand (only 52 are stored for non-zero values), and an 11 bit exponent.
An 11 bit exponent means the smallest positive number that can be stored is 2-1022. Additionally, the largest rounding error possible in this standard is 2-52 called machine epsilon. Because this is a binary representation that means numbers that can be represented exactly in base 10 must be approximated when converting to binary.
Going back to our example above, 0.1 is a value that must be rounded for storage in this format. This is because 0.1 in binary is infinite:
There are methods to store values exactly but this comes at the speed of computation. What if we want to keep the speed of 64 bit computation but reduce our error, specifically for large number series?
Enter Kahan’s Summation Algorithm. Developed by William Kahan, this summation methodology allows for more accurate summation using the
double storage format. Here is a simple Python implementation:
def kahan_sum(x): sum = 0.0 c = 0.0 for i in x: y = i - c t = sum + y c = t - sum - y sum = t return sum
Okay, so this looks pretty simple. But what do each of the pieces mean? The first two lines establish a function in Python while setting the starting sum and starting error to 0.0:
def kahan_sum(x): sum = 0.0 c = 0.0
The next few lines are the for loop that iterates over each number in the list. First, any error is subtracted from the previous iteration.
y = i - c
Second, the new number is added to the running total minus any error.
t = sum + y
Third, error from this new addition is determined and the new total is assigned. This repeats until there are no more numbers.
c = t - sum - y sum = t
A Practical Example
Okay, so the code is pretty simple but how does this work in practice? Suppose we have a list of two numbers:
The sum and error terms are set to 0.0 when the algorithm is started. The first step of each iteration is to take the current value and subtract any error from the previous iteration. Because the starting error is 0.0, we subtract 0.0 from the first value.
1.0 - 0.0 = 1.0
Next we add the result of the previous operation to the total. Again, the initial total is 0.0 so we just add 0.0 to the value from Step 1 (1.0). Oh no! The computer had to make a rounding error. In this case, the computer was off by 0.1. We can handle this error in the next steps.
0.0 + 1.0 ~ 1.1
In this step we determine the error from Step 2. We take the sum from Step 2 (1.1), subtract the total (0.0), and subtract the total from Step 1 (1.0). This leaves us with the approximate error.
1.1 - 0.0 - 1.0 ~ 0.1
Finally, we record the current total for the next iteration!
Now we repeat Steps 1, 2, 3, and 4 for each additional number. The difference this time is that we have non-zero values for the error and total terms. First, we subtract the error term from the last iteration to the new value:
1.0 - 0.1 = 0.9
Next, add the new value to the previous total:
1.1 + 0.9 = 2.0
Next, take the sum from the last step and subtract the previous iteration’s total and the value from the first step to estimate any error. In this case there is no error so we record a value of 0.0 for the error going into the next iteration:
2.0 - 1.1 - 0.9 = 0.0
Finally, return the sum. We can see that even though the computer made an error of 0.1, the algorithm corrected itself and returned the correct value:
Kahan’s method of summation strikes a balance between the speed of floating point arithmetic and accuracy. Hopefully this walkthrough makes the algorithm more approachable.