| // Copyright 2009 The Go Authors. All rights reserved. |
| // Use of this source code is governed by a BSD-style |
| // license that can be found in the LICENSE file. |
| |
| /* |
| |
| Multi-precision division. Here be dragons. |
| |
| Given u and v, where u is n+m digits, and v is n digits (with no leading zeros), |
| the goal is to return quo, rem such that u = quo*v + rem, where 0 ≤ rem < v. |
| That is, quo = ⌊u/v⌋ where ⌊x⌋ denotes the floor (truncation to integer) of x, |
| and rem = u - quo·v. |
| |
| |
| Long Division |
| |
| Division in a computer proceeds the same as long division in elementary school, |
| but computers are not as good as schoolchildren at following vague directions, |
| so we have to be much more precise about the actual steps and what can happen. |
| |
| We work from most to least significant digit of the quotient, doing: |
| |
| • Guess a digit q, the number of v to subtract from the current |
| section of u to zero out the topmost digit. |
| • Check the guess by multiplying q·v and comparing it against |
| the current section of u, adjusting the guess as needed. |
| • Subtract q·v from the current section of u. |
| • Add q to the corresponding section of the result quo. |
| |
| When all digits have been processed, the final remainder is left in u |
| and returned as rem. |
| |
| For example, here is a sketch of dividing 5 digits by 3 digits (n=3, m=2). |
| |
| q₂ q₁ q₀ |
| _________________ |
| v₂ v₁ v₀ ) u₄ u₃ u₂ u₁ u₀ |
| ↓ ↓ ↓ | | |
| [u₄ u₃ u₂]| | |
| - [ q₂·v ]| | |
| ----------- ↓ | |
| [ rem | u₁]| |
| - [ q₁·v ]| |
| ----------- ↓ |
| [ rem | u₀] |
| - [ q₀·v ] |
| ------------ |
| [ rem ] |
| |
| Instead of creating new storage for the remainders and copying digits from u |
| as indicated by the arrows, we use u's storage directly as both the source |
| and destination of the subtractions, so that the remainders overwrite |
| successive overlapping sections of u as the division proceeds, using a slice |
| of u to identify the current section. This avoids all the copying as well as |
| shifting of remainders. |
| |
| Division of u with n+m digits by v with n digits (in base B) can in general |
| produce at most m+1 digits, because: |
| |
| • u < B^(n+m) [B^(n+m) has n+m+1 digits] |
| • v ≥ B^(n-1) [B^(n-1) is the smallest n-digit number] |
| • u/v < B^(n+m) / B^(n-1) [divide bounds for u, v] |
| • u/v < B^(m+1) [simplify] |
| |
| The first step is special: it takes the top n digits of u and divides them by |
| the n digits of v, producing the first quotient digit and an n-digit remainder. |
| In the example, q₂ = ⌊u₄u₃u₂ / v⌋. |
| |
| The first step divides n digits by n digits to ensure that it produces only a |
| single digit. |
| |
| Each subsequent step appends the next digit from u to the remainder and divides |
| those n+1 digits by the n digits of v, producing another quotient digit and a |
| new n-digit remainder. |
| |
| Subsequent steps divide n+1 digits by n digits, an operation that in general |
| might produce two digits. However, as used in the algorithm, that division is |
| guaranteed to produce only a single digit. The dividend is of the form |
| rem·B + d, where rem is a remainder from the previous step and d is a single |
| digit, so: |
| |
| • rem ≤ v - 1 [rem is a remainder from dividing by v] |
| • rem·B ≤ v·B - B [multiply by B] |
| • d ≤ B - 1 [d is a single digit] |
| • rem·B + d ≤ v·B - 1 [add] |
| • rem·B + d < v·B [change ≤ to <] |
| • (rem·B + d)/v < B [divide by v] |
| |
| |
| Guess and Check |
| |
| At each step we need to divide n+1 digits by n digits, but this is for the |
| implementation of division by n digits, so we can't just invoke a division |
| routine: we _are_ the division routine. Instead, we guess at the answer and |
| then check it using multiplication. If the guess is wrong, we correct it. |
| |
| How can this guessing possibly be efficient? It turns out that the following |
| statement (let's call it the Good Guess Guarantee) is true. |
| |
| If |
| |
| • q = ⌊u/v⌋ where u is n+1 digits and v is n digits, |
| • q < B, and |
| • the topmost digit of v = vₙ₋₁ ≥ B/2, |
| |
| then q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ satisfies q ≤ q̂ ≤ q+2. (Proof below.) |
| |
| That is, if we know the answer has only a single digit and we guess an answer |
| by ignoring the bottom n-1 digits of u and v, using a 2-by-1-digit division, |
| then that guess is at least as large as the correct answer. It is also not |
| too much larger: it is off by at most two from the correct answer. |
| |
| Note that in the first step of the overall division, which is an n-by-n-digit |
| division, the 2-by-1 guess uses an implicit uₙ = 0. |
| |
| Note that using a 2-by-1-digit division here does not mean calling ourselves |
| recursively. Instead, we use an efficient direct hardware implementation of |
| that operation. |
| |
| Note that because q is u/v rounded down, q·v must not exceed u: u ≥ q·v. |
| If a guess q̂ is too big, it will not satisfy this test. Viewed a different way, |
| the remainder r̂ for a given q̂ is u - q̂·v, which must be positive. If it is |
| negative, then the guess q̂ is too big. |
| |
| This gives us a way to compute q. First compute q̂ with 2-by-1-digit division. |
| Then, while u < q̂·v, decrement q̂; this loop executes at most twice, because |
| q̂ ≤ q+2. |
| |
| |
| Scaling Inputs |
| |
| The Good Guess Guarantee requires that the top digit of v (vₙ₋₁) be at least B/2. |
| For example in base 10, ⌊172/19⌋ = 9, but ⌊18/1⌋ = 18: the guess is wildly off |
| because the first digit 1 is smaller than B/2 = 5. |
| |
| We can ensure that v has a large top digit by multiplying both u and v by the |
| right amount. Continuing the example, if we multiply both 172 and 19 by 3, we |
| now have ⌊516/57⌋, the leading digit of v is now ≥ 5, and sure enough |
| ⌊51/5⌋ = 10 is much closer to the correct answer 9. It would be easier here |
| to multiply by 4, because that can be done with a shift. Specifically, we can |
| always count the number of leading zeros i in the first digit of v and then |
| shift both u and v left by i bits. |
| |
| Having scaled u and v, the value ⌊u/v⌋ is unchanged, but the remainder will |
| be scaled: 172 mod 19 is 1, but 516 mod 57 is 3. We have to divide the remainder |
| by the scaling factor (shifting right i bits) when we finish. |
| |
| Note that these shifts happen before and after the entire division algorithm, |
| not at each step in the per-digit iteration. |
| |
| Note the effect of scaling inputs on the size of the possible quotient. |
| In the scaled u/v, u can gain a digit from scaling; v never does, because we |
| pick the scaling factor to make v's top digit larger but without overflowing. |
| If u and v have n+m and n digits after scaling, then: |
| |
| • u < B^(n+m) [B^(n+m) has n+m+1 digits] |
| • v ≥ B^n / 2 [vₙ₋₁ ≥ B/2, so vₙ₋₁·B^(n-1) ≥ B^n/2] |
| • u/v < B^(n+m) / (B^n / 2) [divide bounds for u, v] |
| • u/v < 2 B^m [simplify] |
| |
| The quotient can still have m+1 significant digits, but if so the top digit |
| must be a 1. This provides a different way to handle the first digit of the |
| result: compare the top n digits of u against v and fill in either a 0 or a 1. |
| |
| |
| Refining Guesses |
| |
| Before we check whether u < q̂·v, we can adjust our guess to change it from |
| q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ into the refined guess ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋. |
| Although not mentioned above, the Good Guess Guarantee also promises that this |
| 3-by-2-digit division guess is more precise and at most one away from the real |
| answer q. The improvement from the 2-by-1 to the 3-by-2 guess can also be done |
| without n-digit math. |
| |
| If we have a guess q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ and we want to see if it also equal to |
| ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, we can use the same check we would for the full division: |
| if uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂, then the guess is too large and should be reduced. |
| |
| Checking uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ < 0, |
| and |
| |
| uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ = (uₙuₙ₋₁·B + uₙ₋₂) - q̂·(vₙ₋₁·B + vₙ₋₂) |
| [splitting off the bottom digit] |
| = (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂ |
| [regrouping] |
| |
| The expression (uₙuₙ₋₁ - q̂·vₙ₋₁) is the remainder of uₙuₙ₋₁ / vₙ₋₁. |
| If the initial guess returns both q̂ and its remainder r̂, then checking |
| whether uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as checking r̂·B + uₙ₋₂ < q̂·vₙ₋₂. |
| |
| If we find that r̂·B + uₙ₋₂ < q̂·vₙ₋₂, then we can adjust the guess by |
| decrementing q̂ and adding vₙ₋₁ to r̂. We repeat until r̂·B + uₙ₋₂ ≥ q̂·vₙ₋₂. |
| (As before, this fixup is only needed at most twice.) |
| |
| Now that q̂ = ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, as mentioned above it is at most one |
| away from the correct q, and we've avoided doing any n-digit math. |
| (If we need the new remainder, it can be computed as r̂·B + uₙ₋₂ - q̂·vₙ₋₂.) |
| |
| The final check u < q̂·v and the possible fixup must be done at full precision. |
| For random inputs, a fixup at this step is exceedingly rare: the 3-by-2 guess |
| is not often wrong at all. But still we must do the check. Note that since the |
| 3-by-2 guess is off by at most 1, it can be convenient to perform the final |
| u < q̂·v as part of the computation of the remainder r = u - q̂·v. If the |
| subtraction underflows, decremeting q̂ and adding one v back to r is enough to |
| arrive at the final q, r. |
| |
| That's the entirety of long division: scale the inputs, and then loop over |
| each output position, guessing, checking, and correcting the next output digit. |
| |
| For a 2n-digit number divided by an n-digit number (the worst size-n case for |
| division complexity), this algorithm uses n+1 iterations, each of which must do |
| at least the 1-by-n-digit multiplication q̂·v. That's O(n) iterations of |
| O(n) time each, so O(n²) time overall. |
| |
| |
| Recursive Division |
| |
| For very large inputs, it is possible to improve on the O(n²) algorithm. |
| Let's call a group of n/2 real digits a (very) “wide digit”. We can run the |
| standard long division algorithm explained above over the wide digits instead of |
| the actual digits. This will result in many fewer steps, but the math involved in |
| each step is more work. |
| |
| Where basic long division uses a 2-by-1-digit division to guess the initial q̂, |
| the new algorithm must use a 2-by-1-wide-digit division, which is of course |
| really an n-by-n/2-digit division. That's OK: if we implement n-digit division |
| in terms of n/2-digit division, the recursion will terminate when the divisor |
| becomes small enough to handle with standard long division or even with the |
| 2-by-1 hardware instruction. |
| |
| For example, here is a sketch of dividing 10 digits by 4, proceeding with |
| wide digits corresponding to two regular digits. The first step, still special, |
| must leave off a (regular) digit, dividing 5 by 4 and producing a 4-digit |
| remainder less than v. The middle steps divide 6 digits by 4, guaranteed to |
| produce two output digits each (one wide digit) with 4-digit remainders. |
| The final step must use what it has: the 4-digit remainder plus one more, |
| 5 digits to divide by 4. |
| |
| q₆ q₅ q₄ q₃ q₂ q₁ q₀ |
| _______________________________ |
| v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀ |
| ↓ ↓ ↓ ↓ ↓ | | | | | |
| [u₉ u₈ u₇ u₆ u₅]| | | | | |
| - [ q₆q₅·v ]| | | | | |
| ----------------- ↓ ↓ | | | |
| [ rem |u₄ u₃]| | | |
| - [ q₄q₃·v ]| | | |
| -------------------- ↓ ↓ | |
| [ rem |u₂ u₁]| |
| - [ q₂q₁·v ]| |
| -------------------- ↓ |
| [ rem |u₀] |
| - [ q₀·v ] |
| ------------------ |
| [ rem ] |
| |
| An alternative would be to look ahead to how well n/2 divides into n+m and |
| adjust the first step to use fewer digits as needed, making the first step |
| more special to make the last step not special at all. For example, using the |
| same input, we could choose to use only 4 digits in the first step, leaving |
| a full wide digit for the last step: |
| |
| q₆ q₅ q₄ q₃ q₂ q₁ q₀ |
| _______________________________ |
| v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀ |
| ↓ ↓ ↓ ↓ | | | | | | |
| [u₉ u₈ u₇ u₆]| | | | | | |
| - [ q₆·v ]| | | | | | |
| -------------- ↓ ↓ | | | | |
| [ rem |u₅ u₄]| | | | |
| - [ q₅q₄·v ]| | | | |
| -------------------- ↓ ↓ | | |
| [ rem |u₃ u₂]| | |
| - [ q₃q₂·v ]| | |
| -------------------- ↓ ↓ |
| [ rem |u₁ u₀] |
| - [ q₁q₀·v ] |
| --------------------- |
| [ rem ] |
| |
| Today, the code in divRecursiveStep works like the first example. Perhaps in |
| the future we will make it work like the alternative, to avoid a special case |
| in the final iteration. |
| |
| Either way, each step is a 3-by-2-wide-digit division approximated first by |
| a 2-by-1-wide-digit division, just as we did for regular digits in long division. |
| Because the actual answer we want is a 3-by-2-wide-digit division, instead of |
| multiplying q̂·v directly during the fixup, we can use the quick refinement |
| from long division (an n/2-by-n/2 multiply) to correct q to its actual value |
| and also compute the remainder (as mentioned above), and then stop after that, |
| never doing a full n-by-n multiply. |
| |
| Instead of using an n-by-n/2-digit division to produce n/2 digits, we can add |
| (not discard) one more real digit, doing an (n+1)-by-(n/2+1)-digit division that |
| produces n/2+1 digits. That single extra digit tightens the Good Guess Guarantee |
| to q ≤ q̂ ≤ q+1 and lets us drop long division's special treatment of the first |
| digit. These benefits are discussed more after the Good Guess Guarantee proof |
| below. |
| |
| |
| How Fast is Recursive Division? |
| |
| For a 2n-by-n-digit division, this algorithm runs a 4-by-2 long division over |
| wide digits, producing two wide digits plus a possible leading regular digit 1, |
| which can be handled without a recursive call. That is, the algorithm uses two |
| full iterations, each using an n-by-n/2-digit division and an n/2-by-n/2-digit |
| multiplication, along with a few n-digit additions and subtractions. The standard |
| n-by-n-digit multiplication algorithm requires O(n²) time, making the overall |
| algorithm require time T(n) where |
| |
| T(n) = 2T(n/2) + O(n) + O(n²) |
| |
| which, by the Bentley-Haken-Saxe theorem, ends up reducing to T(n) = O(n²). |
| This is not an improvement over regular long division. |
| |
| When the number of digits n becomes large enough, Karatsuba's algorithm for |
| multiplication can be used instead, which takes O(n^log₂3) = O(n^1.6) time. |
| (Karatsuba multiplication is implemented in func karatsuba in nat.go.) |
| That makes the overall recursive division algorithm take O(n^1.6) time as well, |
| which is an improvement, but again only for large enough numbers. |
| |
| It is not critical to make sure that every recursion does only two recursive |
| calls. While in general the number of recursive calls can change the time |
| analysis, in this case doing three calls does not change the analysis: |
| |
| T(n) = 3T(n/2) + O(n) + O(n^log₂3) |
| |
| ends up being T(n) = O(n^log₂3). Because the Karatsuba multiplication taking |
| time O(n^log₂3) is itself doing 3 half-sized recursions, doing three for the |
| division does not hurt the asymptotic performance. Of course, it is likely |
| still faster in practice to do two. |
| |
| |
| Proof of the Good Guess Guarantee |
| |
| Given numbers x, y, let us break them into the quotients and remainders when |
| divided by some scaling factor S, with the added constraints that the quotient |
| x/y and the high part of y are both less than some limit T, and that the high |
| part of y is at least half as big as T. |
| |
| x₁ = ⌊x/S⌋ y₁ = ⌊y/S⌋ |
| x₀ = x mod S y₀ = y mod S |
| |
| x = x₁·S + x₀ 0 ≤ x₀ < S x/y < T |
| y = y₁·S + y₀ 0 ≤ y₀ < S T/2 ≤ y₁ < T |
| |
| And consider the two truncated quotients: |
| |
| q = ⌊x/y⌋ |
| q̂ = ⌊x₁/y₁⌋ |
| |
| We will prove that q ≤ q̂ ≤ q+2. |
| |
| The guarantee makes no real demands on the scaling factor S: it is simply the |
| magnitude of the digits cut from both x and y to produce x₁ and y₁. |
| The guarantee makes only limited demands on T: it must be large enough to hold |
| the quotient x/y, and y₁ must have roughly the same size. |
| |
| To apply to the earlier discussion of 2-by-1 guesses in long division, |
| we would choose: |
| |
| S = Bⁿ⁻¹ |
| T = B |
| x = u |
| x₁ = uₙuₙ₋₁ |
| x₀ = uₙ₋₂...u₀ |
| y = v |
| y₁ = vₙ₋₁ |
| y₀ = vₙ₋₂...u₀ |
| |
| These simpler variables avoid repeating those longer expressions in the proof. |
| |
| Note also that, by definition, truncating division ⌊x/y⌋ satisfies |
| |
| x/y - 1 < ⌊x/y⌋ ≤ x/y. |
| |
| This fact will be used a few times in the proofs. |
| |
| Proof that q ≤ q̂: |
| |
| q̂·y₁ = ⌊x₁/y₁⌋·y₁ [by definition, q̂ = ⌊x₁/y₁⌋] |
| > (x₁/y₁ - 1)·y₁ [x₁/y₁ - 1 < ⌊x₁/y₁⌋] |
| = x₁ - y₁ [distribute y₁] |
| |
| So q̂·y₁ > x₁ - y₁. |
| Since q̂·y₁ is an integer, q̂·y₁ ≥ x₁ - y₁ + 1. |
| |
| q̂ - q = q̂ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋] |
| ≥ q̂ - x/y [⌊x/y⌋ < x/y] |
| = (1/y)·(q̂·y - x) [factor out 1/y] |
| ≥ (1/y)·(q̂·y₁·S - x) [y = y₁·S + y₀ ≥ y₁·S] |
| ≥ (1/y)·((x₁ - y₁ + 1)·S - x) [above: q̂·y₁ ≥ x₁ - y₁ + 1] |
| = (1/y)·(x₁·S - y₁·S + S - x) [distribute S] |
| = (1/y)·(S - x₀ - y₁·S) [-x = -x₁·S - x₀] |
| > -y₁·S / y [x₀ < S, so S - x₀ < 0; drop it] |
| ≥ -1 [y₁·S ≤ y] |
| |
| So q̂ - q > -1. |
| Since q̂ - q is an integer, q̂ - q ≥ 0, or equivalently q ≤ q̂. |
| |
| Proof that q̂ ≤ q+2: |
| |
| x₁/y₁ - x/y = x₁·S/y₁·S - x/y [multiply left term by S/S] |
| ≤ x/y₁·S - x/y [x₁S ≤ x] |
| = (x/y)·(y/y₁·S - 1) [factor out x/y] |
| = (x/y)·((y - y₁·S)/y₁·S) [move -1 into y/y₁·S fraction] |
| = (x/y)·(y₀/y₁·S) [y - y₁·S = y₀] |
| = (x/y)·(1/y₁)·(y₀/S) [factor out 1/y₁] |
| < (x/y)·(1/y₁) [y₀ < S, so y₀/S < 1] |
| ≤ (x/y)·(2/T) [y₁ ≥ T/2, so 1/y₁ ≤ 2/T] |
| < T·(2/T) [x/y < T] |
| = 2 [T·(2/T) = 2] |
| |
| So x₁/y₁ - x/y < 2. |
| |
| q̂ - q = ⌊x₁/y₁⌋ - q [by definition, q̂ = ⌊x₁/y₁⌋] |
| = ⌊x₁/y₁⌋ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋] |
| ≤ x₁/y₁ - ⌊x/y⌋ [⌊x₁/y₁⌋ ≤ x₁/y₁] |
| < x₁/y₁ - (x/y - 1) [⌊x/y⌋ > x/y - 1] |
| = (x₁/y₁ - x/y) + 1 [regrouping] |
| < 2 + 1 [above: x₁/y₁ - x/y < 2] |
| = 3 |
| |
| So q̂ - q < 3. |
| Since q̂ - q is an integer, q̂ - q ≤ 2. |
| |
| Note that when x/y < T/2, the bounds tighten to x₁/y₁ - x/y < 1 and therefore |
| q̂ - q ≤ 1. |
| |
| Note also that in the general case 2n-by-n division where we don't know that |
| x/y < T, we do know that x/y < 2T, yielding the bound q̂ - q ≤ 4. So we could |
| remove the special case first step of long division as long as we allow the |
| first fixup loop to run up to four times. (Using a simple comparison to decide |
| whether the first digit is 0 or 1 is still more efficient, though.) |
| |
| Finally, note that when dividing three leading base-B digits by two (scaled), |
| we have T = B² and x/y < B = T/B, a much tighter bound than x/y < T. |
| This in turn yields the much tighter bound x₁/y₁ - x/y < 2/B. This means that |
| ⌊x₁/y₁⌋ and ⌊x/y⌋ can only differ when x/y is less than 2/B greater than an |
| integer. For random x and y, the chance of this is 2/B, or, for large B, |
| approximately zero. This means that after we produce the 3-by-2 guess in the |
| long division algorithm, the fixup loop essentially never runs. |
| |
| In the recursive algorithm, the extra digit in (2·⌊n/2⌋+1)-by-(⌊n/2⌋+1)-digit |
| division has exactly the same effect: the probability of needing a fixup is the |
| same 2/B. Even better, we can allow the general case x/y < 2T and the fixup |
| probability only grows to 4/B, still essentially zero. |
| |
| |
| References |
| |
| There are no great references for implementing long division; thus this comment. |
| Here are some notes about what to expect from the obvious references. |
| |
| Knuth Volume 2 (Seminumerical Algorithms) section 4.3.1 is the usual canonical |
| reference for long division, but that entire series is highly compressed, never |
| repeating a necessary fact and leaving important insights to the exercises. |
| For example, no rationale whatsoever is given for the calculation that extends |
| q̂ from a 2-by-1 to a 3-by-2 guess, nor why it reduces the error bound. |
| The proof that the calculation even has the desired effect is left to exercises. |
| The solutions to those exercises provided at the back of the book are entirely |
| calculations, still with no explanation as to what is going on or how you would |
| arrive at the idea of doing those exact calculations. Nowhere is it mentioned |
| that this test extends the 2-by-1 guess into a 3-by-2 guess. The proof of the |
| Good Guess Guarantee is only for the 2-by-1 guess and argues by contradiction, |
| making it difficult to understand how modifications like adding another digit |
| or adjusting the quotient range affects the overall bound. |
| |
| All that said, Knuth remains the canonical reference. It is dense but packed |
| full of information and references, and the proofs are simpler than many other |
| presentations. The proofs above are reworkings of Knuth's to remove the |
| arguments by contradiction and add explanations or steps that Knuth omitted. |
| But beware of errors in older printings. Take the published errata with you. |
| |
| Brinch Hansen's “Multiple-length Division Revisited: a Tour of the Minefield” |
| starts with a blunt critique of Knuth's presentation (among others) and then |
| presents a more detailed and easier to follow treatment of long division, |
| including an implementation in Pascal. But the algorithm and implementation |
| work entirely in terms of 3-by-2 division, which is much less useful on modern |
| hardware than an algorithm using 2-by-1 division. The proofs are a bit too |
| focused on digit counting and seem needlessly complex, especially compared to |
| the ones given above. |
| |
| Burnikel and Ziegler's “Fast Recursive Division” introduced the key insight of |
| implementing division by an n-digit divisor using recursive calls to division |
| by an n/2-digit divisor, relying on Karatsuba multiplication to yield a |
| sub-quadratic run time. However, the presentation decisions are made almost |
| entirely for the purpose of simplifying the run-time analysis, rather than |
| simplifying the presentation. Instead of a single algorithm that loops over |
| quotient digits, the paper presents two mutually-recursive algorithms, for |
| 2n-by-n and 3n-by-2n. The paper also does not present any general (n+m)-by-n |
| algorithm. |
| |
| The proofs in the paper are remarkably complex, especially considering that |
| the algorithm is at its core just long division on wide digits, so that the |
| usual long division proofs apply essentially unaltered. |
| */ |
| |
| package big |
| |
| import "math/bits" |
| |
| // div returns q, r such that q = ⌊u/v⌋ and r = u%v = u - q·v. |
| // It uses z and z2 as the storage for q and r. |
| func (z nat) div(z2, u, v nat) (q, r nat) { |
| if len(v) == 0 { |
| panic("division by zero") |
| } |
| |
| if u.cmp(v) < 0 { |
| q = z[:0] |
| r = z2.set(u) |
| return |
| } |
| |
| if len(v) == 1 { |
| // Short division: long optimized for a single-word divisor. |
| // In that case, the 2-by-1 guess is all we need at each step. |
| var r2 Word |
| q, r2 = z.divW(u, v[0]) |
| r = z2.setWord(r2) |
| return |
| } |
| |
| q, r = z.divLarge(z2, u, v) |
| return |
| } |
| |
| // divW returns q, r such that q = ⌊x/y⌋ and r = x%y = x - q·y. |
| // It uses z as the storage for q. |
| // Note that y is a single digit (Word), not a big number. |
| func (z nat) divW(x nat, y Word) (q nat, r Word) { |
| m := len(x) |
| switch { |
| case y == 0: |
| panic("division by zero") |
| case y == 1: |
| q = z.set(x) // result is x |
| return |
| case m == 0: |
| q = z[:0] // result is 0 |
| return |
| } |
| // m > 0 |
| z = z.make(m) |
| r = divWVW(z, 0, x, y) |
| q = z.norm() |
| return |
| } |
| |
| // modW returns x % d. |
| func (x nat) modW(d Word) (r Word) { |
| // TODO(agl): we don't actually need to store the q value. |
| var q nat |
| q = q.make(len(x)) |
| return divWVW(q, 0, x, d) |
| } |
| |
| // divWVW overwrites z with ⌊x/y⌋, returning the remainder r. |
| // The caller must ensure that len(z) = len(x). |
| func divWVW(z []Word, xn Word, x []Word, y Word) (r Word) { |
| r = xn |
| if len(x) == 1 { |
| qq, rr := bits.Div(uint(r), uint(x[0]), uint(y)) |
| z[0] = Word(qq) |
| return Word(rr) |
| } |
| rec := reciprocalWord(y) |
| for i := len(z) - 1; i >= 0; i-- { |
| z[i], r = divWW(r, x[i], y, rec) |
| } |
| return r |
| } |
| |
| // div returns q, r such that q = ⌊uIn/vIn⌋ and r = uIn%vIn = uIn - q·vIn. |
| // It uses z and u as the storage for q and r. |
| // The caller must ensure that len(vIn) ≥ 2 (use divW otherwise) |
| // and that len(uIn) ≥ len(vIn) (the answer is 0, uIn otherwise). |
| func (z nat) divLarge(u, uIn, vIn nat) (q, r nat) { |
| n := len(vIn) |
| m := len(uIn) - n |
| |
| // Scale the inputs so vIn's top bit is 1 (see “Scaling Inputs” above). |
| // vIn is treated as a read-only input (it may be in use by another |
| // goroutine), so we must make a copy. |
| // uIn is copied to u. |
| shift := nlz(vIn[n-1]) |
| vp := getNat(n) |
| v := *vp |
| shlVU(v, vIn, shift) |
| u = u.make(len(uIn) + 1) |
| u[len(uIn)] = shlVU(u[0:len(uIn)], uIn, shift) |
| |
| // The caller should not pass aliased z and u, since those are |
| // the two different outputs, but correct just in case. |
| if alias(z, u) { |
| z = nil |
| } |
| q = z.make(m + 1) |
| |
| // Use basic or recursive long division depending on size. |
| if n < divRecursiveThreshold { |
| q.divBasic(u, v) |
| } else { |
| q.divRecursive(u, v) |
| } |
| putNat(vp) |
| |
| q = q.norm() |
| |
| // Undo scaling of remainder. |
| shrVU(u, u, shift) |
| r = u.norm() |
| |
| return q, r |
| } |
| |
| // divBasic implements long division as described above. |
| // It overwrites q with ⌊u/v⌋ and overwrites u with the remainder r. |
| // q must be large enough to hold ⌊u/v⌋. |
| func (q nat) divBasic(u, v nat) { |
| n := len(v) |
| m := len(u) - n |
| |
| qhatvp := getNat(n + 1) |
| qhatv := *qhatvp |
| |
| // Set up for divWW below, precomputing reciprocal argument. |
| vn1 := v[n-1] |
| rec := reciprocalWord(vn1) |
| |
| // Compute each digit of quotient. |
| for j := m; j >= 0; j-- { |
| // Compute the 2-by-1 guess q̂. |
| // The first iteration must invent a leading 0 for u. |
| qhat := Word(_M) |
| var ujn Word |
| if j+n < len(u) { |
| ujn = u[j+n] |
| } |
| |
| // ujn ≤ vn1, or else q̂ would be more than one digit. |
| // For ujn == vn1, we set q̂ to the max digit M above. |
| // Otherwise, we compute the 2-by-1 guess. |
| if ujn != vn1 { |
| var rhat Word |
| qhat, rhat = divWW(ujn, u[j+n-1], vn1, rec) |
| |
| // Refine q̂ to a 3-by-2 guess. See “Refining Guesses” above. |
| vn2 := v[n-2] |
| x1, x2 := mulWW(qhat, vn2) |
| ujn2 := u[j+n-2] |
| for greaterThan(x1, x2, rhat, ujn2) { // x1x2 > r̂ u[j+n-2] |
| qhat-- |
| prevRhat := rhat |
| rhat += vn1 |
| // If r̂ overflows, then |
| // r̂ u[j+n-2]v[n-1] is now definitely > x1 x2. |
| if rhat < prevRhat { |
| break |
| } |
| // TODO(rsc): No need for a full mulWW. |
| // x2 += vn2; if x2 overflows, x1++ |
| x1, x2 = mulWW(qhat, vn2) |
| } |
| } |
| |
| // Compute q̂·v. |
| qhatv[n] = mulAddVWW(qhatv[0:n], v, qhat, 0) |
| qhl := len(qhatv) |
| if j+qhl > len(u) && qhatv[n] == 0 { |
| qhl-- |
| } |
| |
| // Subtract q̂·v from the current section of u. |
| // If it underflows, q̂·v > u, which we fix up |
| // by decrementing q̂ and adding v back. |
| c := subVV(u[j:j+qhl], u[j:], qhatv) |
| if c != 0 { |
| c := addVV(u[j:j+n], u[j:], v) |
| // If n == qhl, the carry from subVV and the carry from addVV |
| // cancel out and don't affect u[j+n]. |
| if n < qhl { |
| u[j+n] += c |
| } |
| qhat-- |
| } |
| |
| // Save quotient digit. |
| // Caller may know the top digit is zero and not leave room for it. |
| if j == m && m == len(q) && qhat == 0 { |
| continue |
| } |
| q[j] = qhat |
| } |
| |
| putNat(qhatvp) |
| } |
| |
| // greaterThan reports whether the two digit numbers x1 x2 > y1 y2. |
| // TODO(rsc): In contradiction to most of this file, x1 is the high |
| // digit and x2 is the low digit. This should be fixed. |
| func greaterThan(x1, x2, y1, y2 Word) bool { |
| return x1 > y1 || x1 == y1 && x2 > y2 |
| } |
| |
| // divRecursiveThreshold is the number of divisor digits |
| // at which point divRecursive is faster than divBasic. |
| const divRecursiveThreshold = 100 |
| |
| // divRecursive implements recursive division as described above. |
| // It overwrites z with ⌊u/v⌋ and overwrites u with the remainder r. |
| // z must be large enough to hold ⌊u/v⌋. |
| // This function is just for allocating and freeing temporaries |
| // around divRecursiveStep, the real implementation. |
| func (z nat) divRecursive(u, v nat) { |
| // Recursion depth is (much) less than 2 log₂(len(v)). |
| // Allocate a slice of temporaries to be reused across recursion, |
| // plus one extra temporary not live across the recursion. |
| recDepth := 2 * bits.Len(uint(len(v))) |
| tmp := getNat(3 * len(v)) |
| temps := make([]*nat, recDepth) |
| |
| z.clear() |
| z.divRecursiveStep(u, v, 0, tmp, temps) |
| |
| // Free temporaries. |
| for _, n := range temps { |
| if n != nil { |
| putNat(n) |
| } |
| } |
| putNat(tmp) |
| } |
| |
| // divRecursiveStep is the actual implementation of recursive division. |
| // It adds ⌊u/v⌋ to z and overwrites u with the remainder r. |
| // z must be large enough to hold ⌊u/v⌋. |
| // It uses temps[depth] (allocating if needed) as a temporary live across |
| // the recursive call. It also uses tmp, but not live across the recursion. |
| func (z nat) divRecursiveStep(u, v nat, depth int, tmp *nat, temps []*nat) { |
| // u is a subsection of the original and may have leading zeros. |
| // TODO(rsc): The v = v.norm() is useless and should be removed. |
| // We know (and require) that v's top digit is ≥ B/2. |
| u = u.norm() |
| v = v.norm() |
| if len(u) == 0 { |
| z.clear() |
| return |
| } |
| |
| // Fall back to basic division if the problem is now small enough. |
| n := len(v) |
| if n < divRecursiveThreshold { |
| z.divBasic(u, v) |
| return |
| } |
| |
| // Nothing to do if u is shorter than v (implies u < v). |
| m := len(u) - n |
| if m < 0 { |
| return |
| } |
| |
| // We consider B digits in a row as a single wide digit. |
| // (See “Recursive Division” above.) |
| // |
| // TODO(rsc): rename B to Wide, to avoid confusion with _B, |
| // which is something entirely different. |
| // TODO(rsc): Look into whether using ⌈n/2⌉ is better than ⌊n/2⌋. |
| B := n / 2 |
| |
| // Allocate a nat for qhat below. |
| if temps[depth] == nil { |
| temps[depth] = getNat(n) // TODO(rsc): Can be just B+1. |
| } else { |
| *temps[depth] = temps[depth].make(B + 1) |
| } |
| |
| // Compute each wide digit of the quotient. |
| // |
| // TODO(rsc): Change the loop to be |
| // for j := (m+B-1)/B*B; j > 0; j -= B { |
| // which will make the final step a regular step, letting us |
| // delete what amounts to an extra copy of the loop body below. |
| j := m |
| for j > B { |
| // Divide u[j-B:j+n] (3 wide digits) by v (2 wide digits). |
| // First make the 2-by-1-wide-digit guess using a recursive call. |
| // Then extend the guess to the full 3-by-2 (see “Refining Guesses”). |
| // |
| // For the 2-by-1-wide-digit guess, instead of doing 2B-by-B-digit, |
| // we use a (2B+1)-by-(B+1) digit, which handles the possibility that |
| // the result has an extra leading 1 digit as well as guaranteeing |
| // that the computed q̂ will be off by at most 1 instead of 2. |
| |
| // s is the number of digits to drop from the 3B- and 2B-digit chunks. |
| // We drop B-1 to be left with 2B+1 and B+1. |
| s := (B - 1) |
| |
| // uu is the up-to-3B-digit section of u we are working on. |
| uu := u[j-B:] |
| |
| // Compute the 2-by-1 guess q̂, leaving r̂ in uu[s:B+n]. |
| qhat := *temps[depth] |
| qhat.clear() |
| qhat.divRecursiveStep(uu[s:B+n], v[s:], depth+1, tmp, temps) |
| qhat = qhat.norm() |
| |
| // Extend to a 3-by-2 quotient and remainder. |
| // Because divRecursiveStep overwrote the top part of uu with |
| // the remainder r̂, the full uu already contains the equivalent |
| // of r̂·B + uₙ₋₂ from the “Refining Guesses” discussion. |
| // Subtracting q̂·vₙ₋₂ from it will compute the full-length remainder. |
| // If that subtraction underflows, q̂·v > u, which we fix up |
| // by decrementing q̂ and adding v back, same as in long division. |
| |
| // TODO(rsc): Instead of subtract and fix-up, this code is computing |
| // q̂·vₙ₋₂ and decrementing q̂ until that product is ≤ u. |
| // But we can do the subtraction directly, as in the comment above |
| // and in long division, because we know that q̂ is wrong by at most one. |
| qhatv := tmp.make(3 * n) |
| qhatv.clear() |
| qhatv = qhatv.mul(qhat, v[:s]) |
| for i := 0; i < 2; i++ { |
| e := qhatv.cmp(uu.norm()) |
| if e <= 0 { |
| break |
| } |
| subVW(qhat, qhat, 1) |
| c := subVV(qhatv[:s], qhatv[:s], v[:s]) |
| if len(qhatv) > s { |
| subVW(qhatv[s:], qhatv[s:], c) |
| } |
| addAt(uu[s:], v[s:], 0) |
| } |
| if qhatv.cmp(uu.norm()) > 0 { |
| panic("impossible") |
| } |
| c := subVV(uu[:len(qhatv)], uu[:len(qhatv)], qhatv) |
| if c > 0 { |
| subVW(uu[len(qhatv):], uu[len(qhatv):], c) |
| } |
| addAt(z, qhat, j-B) |
| j -= B |
| } |
| |
| // TODO(rsc): Rewrite loop as described above and delete all this code. |
| |
| // Now u < (v<<B), compute lower bits in the same way. |
| // Choose shift = B-1 again. |
| s := B - 1 |
| qhat := *temps[depth] |
| qhat.clear() |
| qhat.divRecursiveStep(u[s:].norm(), v[s:], depth+1, tmp, temps) |
| qhat = qhat.norm() |
| qhatv := tmp.make(3 * n) |
| qhatv.clear() |
| qhatv = qhatv.mul(qhat, v[:s]) |
| // Set the correct remainder as before. |
| for i := 0; i < 2; i++ { |
| if e := qhatv.cmp(u.norm()); e > 0 { |
| subVW(qhat, qhat, 1) |
| c := subVV(qhatv[:s], qhatv[:s], v[:s]) |
| if len(qhatv) > s { |
| subVW(qhatv[s:], qhatv[s:], c) |
| } |
| addAt(u[s:], v[s:], 0) |
| } |
| } |
| if qhatv.cmp(u.norm()) > 0 { |
| panic("impossible") |
| } |
| c := subVV(u[0:len(qhatv)], u[0:len(qhatv)], qhatv) |
| if c > 0 { |
| c = subVW(u[len(qhatv):], u[len(qhatv):], c) |
| } |
| if c > 0 { |
| panic("impossible") |
| } |
| |
| // Done! |
| addAt(z, qhat.norm(), 0) |
| } |