design: add 71953-go-dynamic-tls.md
For golang/go#71953
Change-Id: Ie6f3641dbea4a4e0993289f8db25bbe6c228724c
GitHub-Last-Rev: ec997ee87ce120f515f3f737993968f8d2548a4b
GitHub-Pull-Request: golang/proposal#56
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/644615
Commit-Queue: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
diff --git a/design/71953-go-dynamic-tls.md b/design/71953-go-dynamic-tls.md
new file mode 100644
index 0000000..aa70fe4
--- /dev/null
+++ b/design/71953-go-dynamic-tls.md
@@ -0,0 +1,190 @@
+# Proposal: Go general dynamic TLS
+
+Author: Alexander Musman (Advanced Software Technology Lab, Huawei)
+
+Last updated: 2025-01-28
+
+Discussion at [golang.org/issue/71953](https://github.com/golang/go/issues/71953).
+
+## Abstract
+
+The Go runtime currently relies on Thread Local Storage (TLS) to preserve
+goroutine state when interacting with C code,
+but lacks support for the
+general dynamic [TLS model](https://uclibc.org/docs/tls.pdf).
+This limitation hinders the use of certain C libraries,
+such as Musl,
+and restricts loading of Go shared libraries without `LD_PRELOAD`.
+We propose extending the Go assembler and linker to support
+the general dynamic TLS model,
+focusing initially on the Arm64 architecture
+on Linux systems.
+This enhancement will enable seamless interoperability with
+a wider range of C libraries
+and improve the flexibility of deploying Go `c-shared` libraries.
+
+## Background
+
+The current Go runtime leverages a Thread Local Storage (TLS) variable
+for preserving the current goroutine (`g`)
+when interacting with C code.
+This is particularly relevant in scenarios such as
+CGO interactions
+and certain runtime functions like
+race detection,
+where the code switches to C.
+To facilitate this,
+Go uses the `runtime.save_g` function
+to store the goroutine in the `runtime·tls_g` TLSBSS variable.
+The `runtime.load_g` function then retrieves it,
+typically upon returning from C code execution.
+The Go assembler and linker currently support two TLS access models:
+_initial exec_
+and _local exec_.
+The _local exec_ model is predominantly utilized,
+especially in build modes like `exe`,
+and is natively supported by the Go linker.
+Conversely, the _initial exec_ model requires external linkers
+like `bfd-ld`, `lld`, or `gold`
+for support.
+While the absence of a dynamic TLS model is generally benign with
+glibc,
+owing to its adaptable TLS allocation scheme,
+this shortcoming becomes problematic with the Musl C library.
+Musl's more rigid TLS allocation exposes this limitation,
+as highlighted in issue
+[golang.org/issue/54805](https://github.com/golang/go/issues/54805).
+
+## Proposal
+
+Introduce general dynamic TLS (Thread Local Storage) support in the Go
+assembler/linker,
+and update the runtime assembly—
+currently the sole user of TLS variables—
+to accommodate this model.
+Activate this feature in the assembler
+with the explicit option `-tls=GD`,
+while keeping `-tls=IE` as the default for `shared` mode.
+Additionally,
+pass `-D=TLS_GD` to enable architecture-specific
+macro expansion in the runtime's assembly
+when the general dynamic model is employed.
+The linker support will depend on external linking,
+consistent with the existing initial exec TLS approach.
+
+The `cmd/go` command will enable the general dynamic TLS model by default
+in scenarios that require it,
+based on the combination of `GOOS`/`GOARCH`
+and `buildmode`.
+Initially,
+this model will be supported by the Arm64 architecture on Linux systems,
+specifically for `buildmode=c-shared` and `buildmode=c-archive`.
+
+## Rationale
+
+To enable loading a Go `c-shared` module without relying on `LD_PRELOAD`,
+it is essential to support the _general dynamic_ model.
+Since the variable resides within the same runtime package as its users,
+any relaxation of a _general dynamic_ variable reference to _local dynamic_
+is automatically identified and executed by the external linker.
+While one could avoid using the `-D` flag by generating the save/restore
+of the return address directly in the assembler
+(when lowering the MOV instruction),
+this approach seems less convenient:
+it would not explicitly show the clobbered register in the assembly code.
+Another consideration would be to modify the runtime functions
+that interact with TLS variables to have a stack frame.
+However,
+this option is not ideal
+because these functions are sometimes executed in performance-critical paths,
+such as during race detection.
+
+## Compatibility
+
+There is no change in exported APIs.
+The build modes affected are `c-shared` and `c-archive`.
+Archives built with `c-archive` may be used in a `c-shared` library,
+which in turn might be loaded without `LD_PRELOAD`.
+The assembler needs to support a new flag `-tls=`,
+which allows the TLS model to be chosen explicitly.
+This flag will be passed by `cmd/go` and will also be useful
+for testing the TLS lowering.
+A new relocation type `R_ARM64_TLS_GD` would be needed in objabi,
+along with potentially other architecture-specific relocation types.
+
+## Implementation
+
+A prototype implementation is complete and has been tested
+with Musl C
+for arm64
+Linux
+(please see [review 644975](https://go-review.googlesource.com/c/go/+/644975)).
+
+### Changes to `cmd/go` for Supported Platforms
+For compatible GOOS/GOARCH combinations and applicable build modes,
+the following flags are passed to the assembler:
+```
+-tls=GD -D=TLS_GD
+```
+These flags allow conditional use of a register to retain
+the return address across calls,
+as detailed below for arm64.
+
+### Modifications in the Runtime for arm64 Assembly
+In assembly code,
+specifically for arm64,
+we propose updating references to the thread-local variable
+in `runtime·save_g`/`runtime·load_g`:
+```
+LOAD_TLS_G_R0 // get the offset of tls_g from the thread pointer
+MRS TPIDR_EL0, R27 // get the thread pointer into R27
+MOVD g, (R0)(R27) // use the address in R0+R27
+```
+The TLS usage occurs in frameless functions,
+so we ensure return addresses are preserved across any sequence
+involving calls by
+using a macro definition as follows:
+```
+#ifdef TLS_GD
+ #define LOAD_TLS_G_R0 \
+ MOVD LR, R25 \
+ MOVD runtime·tls_g(SB), R0 \
+ MOVD R25, LR
+#else
+ #define LOAD_TLS_G_R0 \
+ MOVD runtime·tls_g(SB), R0
+#endif
+```
+
+### Assembler Flag Additions and Instruction Lowering
+We introduce a `-tls=[IE,LE,GD]` flag in the asm tool.
+A new `MOVD` instruction variant, `C_TLS_GD`, is defined,
+which lowers to the following four-instruction sequence
+using a new `R_ARM64_TLS_GD` relocation type:
+```
+ADRP var, R0 // Address of the GOT entry
+LDR [R0], R27 // Load stub from GOT
+ADD #0, R0, R0 // Argument to call
+BLR (R27) // Call, R0 returns offset from TP to variable
+```
+The `C_TLS_GD` variant would be used for `TLSBSS` symbols
+only when the `-tls=GD` flag is passed to the assembler.
+The default in `shared` mode remains `C_TLS_IE`.
+
+### Linker Enhancements for New Relocation Support
+The linker will support the `R_ARM64_TLS_GD` relocation type,
+which the assembler attaches
+to the start of the sequence.
+For the referenced TLS symbol, the linker translates it
+into the following ELF relocations:
+```
+ADRP var, R0 // R_AARCH64_TLSDESC_ADR_PAGE21
+LDR [R0], R27 // R_AARCH64_TLSDESC_LD64_LO12_NC
+ADD #0, R0, R0 // R_AARCH64_TLSDESC_ADD_LO12_NC
+BLR (R27) // R_AARCH64_TLSDESC_CALL
+```
+In PIE mode, while `TLS_IE` is optimized to `TLS_LE`
+(allowing internal linking),
+a similar optimization for `TLS_GD` isn't supported
+as `-tls=GD` isn't passed to the assembler in this mode.
+