ogle/doc: revive ptrace-nptl.txt.

This file was originally in the debug/proc package in the main Go
repository, but that package was removed in July 2011, before the Go
1.0 release.

diff --git a/doc/ptrace-nptl.txt b/doc/ptrace-nptl.txt
new file mode 100644
index 0000000..81f6a10
--- /dev/null
+++ b/doc/ptrace-nptl.txt
@@ -0,0 +1,128 @@
+ptrace and NPTL, the missing manpage
+== Signals ==
+A signal sent to a ptrace'd process or thread causes only the thread
+that receives it to stop and report to the tracing process.
+Use tgkill to target a signal (for example, SIGSTOP) at a particular
+thread.  If you use kill, the signal could be delivered to another
+thread in the same process.
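+A minimal sketch of targeting a single thread, assuming the tracer
+already knows the thread group ID and the thread ID.  stop_thread is
+a hypothetical helper name; it goes through syscall(2) because older
+glibc versions do not provide a tgkill wrapper:
+```c
+#define _GNU_SOURCE
+#include <signal.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+/* Hypothetical helper: send SIGSTOP to exactly one thread (tid) of
+ * thread group tgid.  Returns 0 on success, -1 with errno set. */
+static int stop_thread(pid_t tgid, pid_t tid)
+{
+    return (int)syscall(SYS_tgkill, tgid, tid, SIGSTOP);
+}
+```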
+Note that SIGSTOP does not behave as usual when a process is being
+traced.  Usually, a SIGSTOP sent to any thread in a thread group
+will stop all threads in the thread group.  When a thread is traced,
+however, a SIGSTOP affects only the receiving thread (and any other
+threads in the thread group that are not traced).
+SIGKILL behaves like it does for non-traced processes.  It affects all
+threads in the process and terminates them without the WIFSTOPPED
+notification generated by other signals.  However, if
+PTRACE_O_TRACEEXIT is set, the tracing process will still receive
+PTRACE_EVENT_EXIT events before receiving WIFSIGNALED events.
+See "Following thread death" for a caveat regarding signal delivery to
+zombie threads.
+== Waiting on threads ==
+Cloned threads in ptrace'd processes are treated similarly to cloned
+threads in your own process.  Thus, you must use the __WALL option in
+order to receive notifications from threads created by the child
+process.  Similarly, the __WCLONE option will wait only on
+notifications from threads created by the child process and *not* on
+notifications from the initial child thread.
+Even when waiting on a specific thread's PID using waitpid or similar,
+__WALL or __WCLONE is necessary or waitpid will return ECHILD.
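+As a rough illustration, a wait on one specific traced thread might
+look like the following (wait_for_thread is a hypothetical name, not
+an existing API):
+```c
+#define _GNU_SOURCE
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+/* Hypothetical helper: block until thread tid reports a status.
+ * __WALL covers both the initial thread and cloned threads; without
+ * it (or __WCLONE), waiting on a cloned thread fails with ECHILD. */
+static int wait_for_thread(pid_t tid, int *status)
+{
+    pid_t r;
+    do {
+        r = waitpid(tid, status, __WALL);
+    } while (r == -1 && errno == EINTR);
+    return r == tid ? 0 : -1;
+}
+```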
+== Attaching to existing threads ==
+libthread_db (which gdb uses) attaches to existing threads by pulling
+the pthread data structures out of the traced process.  A much
+easier way is to traverse the /proc/PID/task directory, though it's
+unclear how the semantics of these two approaches differ.
+Unfortunately, if the main thread has exited (but the overall process
+has not), it sticks around as a zombie process.  This zombie will
+appear in the /proc/PID/task directory, but trying to attach to it
+will yield EPERM.  In this case, the third field of the
+/proc/PID/task/PID/stat file will be "Z".  Attempting to open the stat
+file is also a convenient way to detect races between listing the task
+directory and the thread exiting.  Incidentally, gdb will simply
+fail to attach to a process whose main thread is a zombie.
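+The zombie check can be sketched as follows.  thread_state is a
+hypothetical helper; it locates the state character after the last
+")" in the stat line, on the assumption that the second field (the
+command name) may itself contain spaces:
+```c
+#include <stdio.h>
+#include <string.h>
+#include <sys/types.h>
+
+/* Hypothetical helper: return the state character (third field) of
+ * /proc/PID/task/TID/stat, or 0 if the file cannot be opened or
+ * parsed, which usually means the thread has already exited. */
+static char thread_state(pid_t pid, pid_t tid)
+{
+    char path[64], line[512];
+    snprintf(path, sizeof path, "/proc/%d/task/%d/stat",
+             (int)pid, (int)tid);
+    FILE *f = fopen(path, "r");
+    if (f == NULL)
+        return 0;
+    char *ok = fgets(line, sizeof line, f);
+    fclose(f);
+    if (ok == NULL)
+        return 0;
+    /* Format is "pid (comm) state ..."; skip past the last ')'. */
+    char *p = strrchr(line, ')');
+    if (p == NULL || p[1] != ' ' || p[2] == '\0')
+        return 0;
+    return p[2];
+}
+```
+A result of 'Z' means the thread is a zombie and should be skipped
+rather than attached to.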
+Because new threads may be created while the debugger is in the
+process of attaching to existing threads, the debugger must repeatedly
+re-list the task directory until it has attached to (and thus stopped)
+every thread listed.
+In order to follow new threads created by existing threads,
+PTRACE_O_TRACECLONE must be set on each thread attached to.
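+The re-listing loop might be sketched like this.  attach_all and the
+seen array are hypothetical; the sketch assumes that the first wait
+status after PTRACE_ATTACH is the attach stop and ignores other
+signals that could arrive first, which a real debugger must handle:
+```c
+#define _GNU_SOURCE
+#include <dirent.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/ptrace.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+/* Hypothetical helper: attach to every thread of pid, re-listing
+ * /proc/PID/task until a full pass attaches to nothing new.  Attached
+ * thread IDs are stored in seen[0..max).  Returns the number of
+ * attached threads, or -1 on error. */
+static int attach_all(pid_t pid, pid_t *seen, int max)
+{
+    char path[64];
+    int n = 0, found_new;
+    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
+    do {
+        DIR *d = opendir(path);
+        if (d == NULL)
+            return -1;
+        found_new = 0;
+        struct dirent *e;
+        while ((e = readdir(d)) != NULL) {
+            pid_t tid = (pid_t)atoi(e->d_name);
+            int known = 0, status;
+            if (tid <= 0)
+                continue;               /* "." and ".." */
+            for (int i = 0; i < n; i++)
+                if (seen[i] == tid)
+                    known = 1;
+            if (known)
+                continue;
+            if (n == max) {
+                closedir(d);
+                return -1;
+            }
+            /* Fails for threads that exited or are zombies. */
+            if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) == -1)
+                continue;
+            if (waitpid(tid, &status, __WALL) == -1)
+                continue;
+            /* Follow threads this thread creates later on. */
+            ptrace(PTRACE_SETOPTIONS, tid, NULL,
+                   (void *)(long)PTRACE_O_TRACECLONE);
+            seen[n++] = tid;
+            found_new = 1;
+        }
+        closedir(d);
+    } while (found_new);
+    return n;
+}
+```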
+== Following new threads ==
+With the child process stopped, use PTRACE_SETOPTIONS to set the
+PTRACE_O_TRACECLONE option.  This option is per-thread, and thus must
+be set on each existing thread individually.  When an existing thread
+with PTRACE_O_TRACECLONE set spawns a new thread, the existing thread
+will stop with (SIGTRAP | PTRACE_EVENT_CLONE << 8) and the PID of the
+new thread can be retrieved with PTRACE_GETEVENTMSG on the creating
+thread.  At this time, the new thread will exist, but will initially
+be stopped with a SIGSTOP.  The new thread will automatically be
+traced and will inherit the PTRACE_O_TRACECLONE option from its
+parent.  The tracing process should wait on the new thread to receive
+the SIGSTOP notification.
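+A sketch of recognizing the clone event from a wait status and
+fetching the new thread's ID (cloned_tid is a hypothetical helper
+name):
+```c
+#define _GNU_SOURCE
+#include <signal.h>
+#include <sys/ptrace.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+/* Hypothetical helper: if status (from waiting on tid) is a
+ * PTRACE_EVENT_CLONE stop, return the new thread's ID; otherwise
+ * return -1. */
+static pid_t cloned_tid(pid_t tid, int status)
+{
+    unsigned long msg;
+    if (!WIFSTOPPED(status))
+        return -1;
+    if ((status >> 8) != (SIGTRAP | (PTRACE_EVENT_CLONE << 8)))
+        return -1;
+    if (ptrace(PTRACE_GETEVENTMSG, tid, NULL, &msg) == -1)
+        return -1;
+    return (pid_t)msg;
+}
+```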
+When using waitpid(-1, ...), don't rely on the parent thread reporting
+a SIGTRAP before receiving the SIGSTOP from the new child thread.
+Without PTRACE_O_TRACECLONE, newly cloned threads will not be
+ptrace'd.  As a result, signals received by new threads will be
+handled in the usual way, which may affect the parent and in turn
+appear to the tracing process, but attributed to the parent (possibly
+in unexpected ways).
+== Following thread death ==
+If any thread with the PTRACE_O_TRACEEXIT option set exits (either by
+returning or pthread_exit'ing), the tracing process will receive an
+immediate PTRACE_EVENT_EXIT.  At this point, the thread will still
+exist.  The exit status, encoded as for wait, can be queried using
+PTRACE_GETEVENTMSG on the exiting thread's PID.  The thread should be
+continued so it can actually exit, after which its wait behavior is
+the same as for a thread without the PTRACE_O_TRACEEXIT option.
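+Handling the exit event might look like this sketch, where
+handle_exit_event is a hypothetical helper name:
+```c
+#define _GNU_SOURCE
+#include <signal.h>
+#include <sys/ptrace.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+/* Hypothetical helper: if status (from waiting on tid) is a
+ * PTRACE_EVENT_EXIT stop, store the pending exit status (encoded as
+ * for wait) in *exit_status and continue the thread so that it can
+ * finish exiting.  Returns 1 if the event was handled, 0 if status
+ * was some other event, -1 on error. */
+static int handle_exit_event(pid_t tid, int status, int *exit_status)
+{
+    unsigned long msg;
+    if (!WIFSTOPPED(status))
+        return 0;
+    if ((status >> 8) != (SIGTRAP | (PTRACE_EVENT_EXIT << 8)))
+        return 0;
+    if (ptrace(PTRACE_GETEVENTMSG, tid, NULL, &msg) == -1)
+        return -1;
+    *exit_status = (int)msg;
+    if (ptrace(PTRACE_CONT, tid, NULL, NULL) == -1)
+        return -1;
+    return 1;
+}
+```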
+If a non-main thread exits (either by returning or pthread_exit'ing),
+its corresponding process will also exit, producing a WIFEXITED event
+(after the process is continued from a possible PTRACE_EVENT_EXIT
+event).  It is *not* necessary for another thread to pthread_join for
+this to happen.
+If the main thread exits by returning, then all threads will exit,
+first generating a PTRACE_EVENT_EXIT event for each thread if
+appropriate, then producing a WIFEXITED event for each thread.
+If the main thread exits using pthread_exit, then it enters a
+non-waitable zombie state.  It will still produce an immediate
+PTRACE_EVENT_EXIT event, but the WIFEXITED event will be delayed
+until the entire process exits.  This state exists so that shells
+don't think the process is done until all of the threads have exited.
+Unfortunately, signals cannot be delivered to non-waitable zombies.
+Most notably, SIGSTOP cannot be delivered; as a result, when you
+broadcast SIGSTOP to all of the threads, you must not wait for
+non-waitable zombies to stop.  Furthermore, any ptrace command on a
+non-waitable zombie, including PTRACE_DETACH, will return ESRCH.
+== Multi-threaded debuggers ==
+If the debugger itself is multi-threaded, ptrace calls must come from
+the same thread that originally attached to the remote thread.  The
+kernel simply compares the PID of the caller of ptrace against the
+tracer PID of the process passed to ptrace.  Because each debugger
+thread has a different PID, calling ptrace from a different thread
+might as well be calling it from a different process and the kernel
+will return ESRCH.
+wait, on the other hand, does not have this restriction.  Any debugger
+thread can wait on any thread in the attached process.