Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 1 | // Copyright 2010 The Go Authors. All rights reserved. |
| 2 | // Use of this source code is governed by a BSD-style |
| 3 | // license that can be found in the LICENSE file. |
| 4 | |
Rob Pike | 8bca148 | 2014-08-12 17:04:45 -0700 | [diff] [blame] | 5 | #include "textflag.h" |
Keith Randall | 1f79663 | 2013-08-12 10:25:18 -0700 | [diff] [blame] | 6 | |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 7 | // The method is based on a paper by Naoki Shibata: "Efficient evaluation |
| 8 | // methods of elementary functions suitable for SIMD computation", Proc. |
| 9 | // of International Supercomputing Conference 2010 (ISC'10), pp. 25 -- 32 |
| 10 | // (May 2010). The paper is available at |
| 11 | // http://www.springerlink.com/content/340228x165742104/ |
| 12 | // |
| 13 | // The original code and the constants below are from the author's |
| 14 | // implementation available at http://freshmeat.net/projects/sleef. |
| 15 | // The README file says, "The software is in public domain. |
| 16 | // You can use the software without any obligation." |
| 17 | // |
| 18 | // This code is a simplified version of the original. |
| 19 | |
| 20 | #define LN2 0.6931471805599453094172321214581766 // log_e(2) |
| 21 | #define LOG2E 1.4426950408889634073599246810018920 // 1/LN2 |
| 22 | #define LN2U 0.69314718055966295651160180568695068359375 // upper half LN2 |
| 23 | #define LN2L 0.28235290563031577122588448175013436025525412068e-12 // lower half LN2 |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 24 | #define T0 1.0 |
| 25 | #define T1 0.5 |
| 26 | #define T2 1.6666666666666666667e-1 |
| 27 | #define T3 4.1666666666666666667e-2 |
| 28 | #define T4 8.3333333333333333333e-3 |
| 29 | #define T5 1.3888888888888888889e-3 |
| 30 | #define T6 1.9841269841269841270e-4 |
| 31 | #define T7 2.4801587301587301587e-5 |
| 32 | #define PosInf 0x7FF0000000000000 |
| 33 | #define NegInf 0xFFF0000000000000 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 34 | |
| 35 | // func Exp(x float64) float64 |
Keith Randall | 1f79663 | 2013-08-12 10:25:18 -0700 | [diff] [blame] | 36 | TEXT ·Exp(SB),NOSPLIT,$0 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 37 | // test bits for not-finite |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 38 | MOVQ x+0(FP), BX |
| 39 | MOVQ $~(1<<63), AX // sign bit mask |
| 40 | MOVQ BX, DX |
| 41 | ANDQ AX, DX |
| 42 | MOVQ $PosInf, AX |
| 43 | CMPQ AX, DX |
| 44 | JLE notFinite |
| 45 | MOVQ BX, X0 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 46 | MOVSD $LOG2E, X1 |
| 47 | MULSD X0, X1 |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 48 | CVTSD2SL X1, BX // BX = exponent |
| 49 | CVTSL2SD BX, X1 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 50 | MOVSD $LN2U, X2 |
| 51 | MULSD X1, X2 |
| 52 | SUBSD X2, X0 |
| 53 | MOVSD $LN2L, X2 |
| 54 | MULSD X1, X2 |
| 55 | SUBSD X2, X0 |
| 56 | // reduce argument |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 57 | MULSD $0.0625, X0 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 58 | // Taylor series evaluation |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 59 | MOVSD $T7, X1 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 60 | MULSD X0, X1 |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 61 | ADDSD $T6, X1 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 62 | MULSD X0, X1 |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 63 | ADDSD $T5, X1 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 64 | MULSD X0, X1 |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 65 | ADDSD $T4, X1 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 66 | MULSD X0, X1 |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 67 | ADDSD $T3, X1 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 68 | MULSD X0, X1 |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 69 | ADDSD $T2, X1 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 70 | MULSD X0, X1 |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 71 | ADDSD $T1, X1 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 72 | MULSD X0, X1 |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 73 | ADDSD $T0, X1 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 74 | MULSD X1, X0 |
| 75 | MOVSD $2.0, X1 |
| 76 | ADDSD X0, X1 |
| 77 | MULSD X1, X0 |
| 78 | MOVSD $2.0, X1 |
| 79 | ADDSD X0, X1 |
| 80 | MULSD X1, X0 |
| 81 | MOVSD $2.0, X1 |
| 82 | ADDSD X0, X1 |
| 83 | MULSD X1, X0 |
| 84 | MOVSD $2.0, X1 |
| 85 | ADDSD X0, X1 |
| 86 | MULSD X1, X0 |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 87 | ADDSD $1.0, X0 |
| 88 | // return fr * 2**exponent |
Eoghan Sherry | 976e457 | 2010-12-15 13:20:52 -0500 | [diff] [blame] | 89 | MOVL $0x3FF, AX // bias |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 90 | ADDL AX, BX |
| 91 | JLE underflow |
| 92 | CMPL BX, $0x7FF |
| 93 | JGE overflow |
| 94 | MOVL $52, CX |
| 95 | SHLQ CX, BX |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 96 | MOVQ BX, X1 |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 97 | MULSD X1, X0 |
Russ Cox | 07720b6 | 2013-03-22 12:57:55 -0400 | [diff] [blame] | 98 | MOVSD X0, ret+8(FP) |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 99 | RET |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 100 | notFinite: |
| 101 | // test bits for -Inf |
| 102 | MOVQ $NegInf, AX |
| 103 | CMPQ AX, BX |
| 104 | JNE notNegInf |
| 105 | // -Inf, return 0 |
| 106 | underflow: // return 0 |
| 107 | MOVQ $0, AX |
Russ Cox | 07720b6 | 2013-03-22 12:57:55 -0400 | [diff] [blame] | 108 | MOVQ AX, ret+8(FP) |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 109 | RET |
Charles L. Dorian | c8c2bdb | 2010-08-06 16:50:48 -0700 | [diff] [blame] | 110 | overflow: // return +Inf |
| 111 | MOVQ $PosInf, BX |
| 112 | notNegInf: // NaN or +Inf, return x |
Russ Cox | 07720b6 | 2013-03-22 12:57:55 -0400 | [diff] [blame] | 113 | MOVQ BX, ret+8(FP) |
Charles L. Dorian | a0117ba | 2010-06-30 14:44:27 -0700 | [diff] [blame] | 114 | RET |