From c39df5fa37b0623589508c95515b4aa1531c524e Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 7 Apr 2014 15:38:29 -0700 Subject: exit: call disassociate_ctty() before exit_task_namespaces() Commit 8aac62706ada ("move exit_task_namespaces() outside of exit_notify()") breaks pppd and the exiting service crashes the kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 IP: ppp_register_channel+0x13/0x20 [ppp_generic] Call Trace: ppp_asynctty_open+0x12b/0x170 [ppp_async] tty_ldisc_open.isra.2+0x27/0x60 tty_ldisc_hangup+0x1e3/0x220 __tty_hangup+0x2c4/0x440 disassociate_ctty+0x61/0x270 do_exit+0x7f2/0xa50 ppp_register_channel() needs ->net_ns and current->nsproxy == NULL. Move disassociate_ctty() before exit_task_namespaces(), it doesn't make sense to delay it after perf_event_exit_task() or cgroup_exit(). This also allows to use task_work_add() inside the (nontrivial) code paths in disassociate_ctty(). Investigated by Peter Hurley. Signed-off-by: Oleg Nesterov Reported-by: Sree Harsha Totakura Cc: Peter Hurley Cc: Sree Harsha Totakura Cc: "Eric W. Biederman" Cc: Jeff Dike Cc: Ingo Molnar Cc: Andrey Vagin Cc: Al Viro Cc: [v3.10+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 6480d1c..11f9e39 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -784,6 +784,8 @@ void do_exit(long code) exit_shm(tsk); exit_files(tsk); exit_fs(tsk); + if (group_dead) + disassociate_ctty(1); exit_task_namespaces(tsk); exit_task_work(tsk); check_stack_usage(); @@ -799,13 +801,9 @@ void do_exit(long code) cgroup_exit(tsk); - if (group_dead) - disassociate_ctty(1); - module_put(task_thread_info(tsk)->exec_domain->module); proc_exit_connector(tsk); - /* * FIXME: do that only when needed, using sched_exit tracepoint */ -- cgit v1.1 From 4bcb8232cf4eb061b086c10f56b6808adcdb5a93 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 7 Apr 2014 15:38:30 -0700 Subject: exit: move check_stack_usage() to the end of do_exit() It is not clear why check_stack_usage() is called so early and thus it never checks the stack usage in, say, exit_notify() or flush_ptrace_hw_breakpoint() or other functions which are only called by do_exit(). Move the callsite down to the last preempt_disable/schedule. Signed-off-by: Oleg Nesterov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 11f9e39..171c9a9d7 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -788,7 +788,6 @@ void do_exit(long code) disassociate_ctty(1); exit_task_namespaces(tsk); exit_task_work(tsk); - check_stack_usage(); exit_thread(); /* @@ -842,6 +841,7 @@ void do_exit(long code) validate_creds_for_do_exit(tsk); + check_stack_usage(); preempt_disable(); if (tsk->nr_dirtied) __this_cpu_add(dirty_throttle_leaks, tsk->nr_dirtied); -- cgit v1.1 From ef9823939e5acd5d323ff61fbc427ef998dd203e Mon Sep 17 00:00:00 2001 From: Guillaume Morin Date: Mon, 7 Apr 2014 15:38:31 -0700 Subject: kernel/exit.c: call proc_exit_connector() after exit_state is set The process events connector delivers a notification when a process exits. This is really convenient for a process that spawns and wants to monitor its children through an epoll-able() interface. Unfortunately, there is a small window between when the event is delivered and the child become wait()-able. This is creates a race if the parent wants to make sure that it knows about the exit, e.g pid_t pid = fork(); if (pid > 0) { register_interest_for_pid(pid); if (waitpid(pid, NULL, WNOHANG) > 0) { /* We might have raced with exit() */ } return; } /* Child */ execve(...) register_interest_for_pid() would be telling the the connector socket reader to pay attention to events related to pid. Though this is not a bug, I think it would make the connector a bit more usable if this race was closed by simply moving the call to proc_exit_connector() from just before exit_notify() to right after. Oleg said: : Even with this patch the code above is still "racy" if the child is : multi-threaded. Plus it should obviously filter-out subthreads. And : afaics there is no way to make it reliable, even if you change the code : above so that waitpid() is called only after the last thread exits WNOHANG : still can fail. Signed-off-by: Guillaume Morin Cc: Matt Helsley Cc: Oleg Nesterov Cc: David S. Miller Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 171c9a9d7..decf648 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -802,13 +802,13 @@ void do_exit(long code) module_put(task_thread_info(tsk)->exec_domain->module); - proc_exit_connector(tsk); /* * FIXME: do that only when needed, using sched_exit tracepoint */ flush_ptrace_hw_breakpoint(tsk); exit_notify(tsk, group_dead); + proc_exit_connector(tsk); #ifdef CONFIG_NUMA task_lock(tsk); mpol_put(tsk->mempolicy); -- cgit v1.1 From dfccbb5e49a621c1b21a62527d61fc4305617aca Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 7 Apr 2014 15:38:41 -0700 Subject: wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and drops tasklist_lock. If this task is not the natural child and it is traced, we change its state back to EXIT_ZOMBIE for ->real_parent. The last transition is racy, this is even documented in 50b8d257486a "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race". wait_consider_task() tries to detect this transition and clear ->notask_error but we can't rely on ptrace_reparented(), debugger can exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE. And there is another problem which were missed before: this transition can also race with reparent_leader() which doesn't reset >exit_signal if EXIT_DEAD, assuming that this task must be reaped by someone else. So the tracee can be re-parented with ->exit_signal != SIGCHLD, and if /sbin/init doesn't use __WALL it becomes unreapable. Change reparent_leader() to update ->exit_signal even if EXIT_DEAD. Note: this is the simple temporary hack for -stable, it doesn't try to solve all problems, it will be reverted by the next changes. Signed-off-by: Oleg Nesterov Reported-by: Jan Kratochvil Reported-by: Michal Schmidt Tested-by: Michal Schmidt Cc: Al Viro Cc: Lennart Poettering Cc: Roland McGrath Cc: Tejun Heo Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index decf648..e354cbb 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -560,9 +560,6 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p, struct list_head *dead) { list_move_tail(&p->sibling, &p->real_parent->children); - - if (p->exit_state == EXIT_DEAD) - return; /* * If this is a threaded reparent there is no need to * notify anyone anything has happened. @@ -570,9 +567,19 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p, if (same_thread_group(p->real_parent, father)) return; - /* We don't want people slaying init. */ + /* + * We don't want people slaying init. + * + * Note: we do this even if it is EXIT_DEAD, wait_task_zombie() + * can change ->exit_state to EXIT_ZOMBIE. If this is the final + * state, do_notify_parent() was already called and ->exit_signal + * doesn't matter. + */ p->exit_signal = SIGCHLD; + if (p->exit_state == EXIT_DEAD) + return; + /* If it has exited notify the new parent about this child's death. */ if (!p->ptrace && p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) { -- cgit v1.1 From abd50b39e783e1b6c75c7534c37f1eb2d94a89cd Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 7 Apr 2014 15:38:42 -0700 Subject: wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD->EXIT_ZOMBIE transition wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and drops tasklist_lock. If this task is not the natural child and it is traced, we change its state back to EXIT_ZOMBIE for ->real_parent. The last transition is racy, this is even documented in 50b8d257486a "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race". wait_consider_task() tries to detect this transition and clear ->notask_error but we can't rely on ptrace_reparented(), debugger can exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE. And there is another problem which were missed before: this transition can also race with reparent_leader() which doesn't reset >exit_signal if EXIT_DEAD, assuming that this task must be reaped by someone else. So the tracee can be re-parented with ->exit_signal != SIGCHLD, and if /sbin/init doesn't use __WALL it becomes unreapable. This was fixed by the previous commit, but it was the temporary hack. 1. Add the new exit_state, EXIT_TRACE. It means that the task is the traced zombie, debugger is going to detach and notify its natural parent. This new state is actually EXIT_ZOMBIE | EXIT_DEAD. This way we can avoid the changes in proc/kgdb code, get_task_state() still reports "X (dead)" in this case. Note: with or without this change userspace can see Z -> X -> Z transition. Not really bad, but probably makes sense to fix. 2. Change wait_task_zombie() to use EXIT_TRACE instead of EXIT_DEAD if we need to notify the ->real_parent. 3. Revert the previous hack in reparent_leader(), now that EXIT_DEAD is always the final state we can safely ignore such a task. 4. Change wait_consider_task() to check EXIT_TRACE separately and kill the racy and no longer needed ptrace_reparented() case. If ptrace == T an EXIT_TRACE thread should be simply ignored, the owner of this state is going to ptrace_unlink() this task. We can pretend that it was already removed from ->ptraced list. Otherwise we should skip this thread too but clear ->notask_error, we must be the natural parent and debugger is going to untrace and notify us. IOW, this doesn't differ from "EXIT_ZOMBIE && p->ptrace" even if the task was already untraced. Signed-off-by: Oleg Nesterov Reported-by: Jan Kratochvil Reported-by: Michal Schmidt Tested-by: Michal Schmidt Cc: Al Viro Cc: Lennart Poettering Cc: Roland McGrath Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 50 +++++++++++++++++++++----------------------------- 1 file changed, 21 insertions(+), 29 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index e354cbb..022a0ff 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -560,6 +560,9 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p, struct list_head *dead) { list_move_tail(&p->sibling, &p->real_parent->children); + + if (p->exit_state == EXIT_DEAD) + return; /* * If this is a threaded reparent there is no need to * notify anyone anything has happened. @@ -567,19 +570,9 @@ static void reparent_leader(struct task_struct *father, struct task_struct *p, if (same_thread_group(p->real_parent, father)) return; - /* - * We don't want people slaying init. - * - * Note: we do this even if it is EXIT_DEAD, wait_task_zombie() - * can change ->exit_state to EXIT_ZOMBIE. If this is the final - * state, do_notify_parent() was already called and ->exit_signal - * doesn't matter. - */ + /* We don't want people slaying init. */ p->exit_signal = SIGCHLD; - if (p->exit_state == EXIT_DEAD) - return; - /* If it has exited notify the new parent about this child's death. */ if (!p->ptrace && p->exit_state == EXIT_ZOMBIE && thread_group_empty(p)) { @@ -1043,17 +1036,13 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p) return wait_noreap_copyout(wo, p, pid, uid, why, status); } + traced = ptrace_reparented(p); /* - * Try to move the task's state to DEAD - * only one thread is allowed to do this: + * Move the task's state to DEAD/TRACE, only one thread can do this. */ - state = xchg(&p->exit_state, EXIT_DEAD); - if (state != EXIT_ZOMBIE) { - BUG_ON(state != EXIT_DEAD); + state = traced ? EXIT_TRACE : EXIT_DEAD; + if (cmpxchg(&p->exit_state, EXIT_ZOMBIE, state) != EXIT_ZOMBIE) return 0; - } - - traced = ptrace_reparented(p); /* * It can be ptraced but not reparented, check * thread_group_leader() to filter out sub-threads. @@ -1114,7 +1103,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p) /* * Now we are sure this task is interesting, and no other - * thread can reap it because we set its state to EXIT_DEAD. + * thread can reap it because we its state == DEAD/TRACE. */ read_unlock(&tasklist_lock); @@ -1159,14 +1148,14 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p) * If this is not a sub-thread, notify the parent. * If parent wants a zombie, don't release it now. */ + state = EXIT_DEAD; if (thread_group_leader(p) && - !do_notify_parent(p, p->exit_signal)) { - p->exit_state = EXIT_ZOMBIE; - p = NULL; - } + !do_notify_parent(p, p->exit_signal)) + state = EXIT_ZOMBIE; + p->exit_state = state; write_unlock_irq(&tasklist_lock); } - if (p != NULL) + if (state == EXIT_DEAD) release_task(p); return retval; @@ -1362,12 +1351,15 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace, } /* dead body doesn't have much to contribute */ - if (unlikely(p->exit_state == EXIT_DEAD)) { + if (unlikely(p->exit_state == EXIT_DEAD)) + return 0; + + if (unlikely(p->exit_state == EXIT_TRACE)) { /* - * But do not ignore this task until the tracer does - * wait_task_zombie()->do_notify_parent(). + * ptrace == 0 means we are the natural parent. In this case + * we should clear notask_error, debugger will notify us. */ - if (likely(!ptrace) && unlikely(ptrace_reparented(p))) + if (likely(!ptrace)) wo->notask_error = 0; return 0; } -- cgit v1.1 From b436069059fede30ca31d4bf439cc86436ff5b1d Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 7 Apr 2014 15:38:43 -0700 Subject: wait: use EXIT_TRACE only if thread_group_leader(zombie) wait_task_zombie() always uses EXIT_TRACE/ptrace_unlink() if ptrace_reparented(). This is suboptimal and a bit confusing: we do not need do_notify_parent(p) if !thread_group_leader(p) and in this case we also do not need ptrace_unlink(), we can rely on ptrace_release_task(). Change wait_task_zombie() to check thread_group_leader() along with ptrace_reparented() and simplify the final p->exit_state transition. Signed-off-by: Oleg Nesterov Tested-by: Michal Schmidt Cc: Jan Kratochvil Cc: Al Viro Cc: Lennart Poettering Cc: Roland McGrath Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 022a0ff..4773ed9 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1040,7 +1040,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p) /* * Move the task's state to DEAD/TRACE, only one thread can do this. */ - state = traced ? EXIT_TRACE : EXIT_DEAD; + state = traced && thread_group_leader(p) ? EXIT_TRACE : EXIT_DEAD; if (cmpxchg(&p->exit_state, EXIT_ZOMBIE, state) != EXIT_ZOMBIE) return 0; /* @@ -1140,18 +1140,15 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p) if (!retval) retval = pid; - if (traced) { + if (state == EXIT_TRACE) { write_lock_irq(&tasklist_lock); /* We dropped tasklist, ptracer could die and untrace */ ptrace_unlink(p); - /* - * If this is not a sub-thread, notify the parent. - * If parent wants a zombie, don't release it now. - */ - state = EXIT_DEAD; - if (thread_group_leader(p) && - !do_notify_parent(p, p->exit_signal)) - state = EXIT_ZOMBIE; + + /* If parent wants a zombie, don't release it now */ + state = EXIT_ZOMBIE; + if (do_notify_parent(p, p->exit_signal)) + state = EXIT_DEAD; p->exit_state = state; write_unlock_irq(&tasklist_lock); } -- cgit v1.1 From b3ab03160dfaf8ab78d476b670de319f4c1a5685 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 7 Apr 2014 15:38:45 -0700 Subject: wait: completely ignore the EXIT_DEAD tasks Now that EXIT_DEAD is the terminal state it doesn't make sense to call eligible_child() or security_task_wait() if the task is really dead. Signed-off-by: Oleg Nesterov Tested-by: Michal Schmidt Cc: Jan Kratochvil Cc: Al Viro Cc: Lennart Poettering Cc: Roland McGrath Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 4773ed9..33cf8db 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1329,7 +1329,12 @@ static int wait_task_continued(struct wait_opts *wo, struct task_struct *p) static int wait_consider_task(struct wait_opts *wo, int ptrace, struct task_struct *p) { - int ret = eligible_child(wo, p); + int ret; + + if (unlikely(p->exit_state == EXIT_DEAD)) + return 0; + + ret = eligible_child(wo, p); if (!ret) return ret; @@ -1347,10 +1352,6 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace, return 0; } - /* dead body doesn't have much to contribute */ - if (unlikely(p->exit_state == EXIT_DEAD)) - return 0; - if (unlikely(p->exit_state == EXIT_TRACE)) { /* * ptrace == 0 means we are the natural parent. In this case -- cgit v1.1 From 377d75dafa07ee0da64223c9169f4e17b26c2b9a Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 7 Apr 2014 15:38:47 -0700 Subject: wait: WSTOPPED|WCONTINUED hangs if a zombie child is traced by real_parent "A zombie is only visible to its ptracer" logic in wait_consider_task() is very wrong. Trivial test-case: #include #include #include #include int main(void) { int child = fork(); if (!child) { assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0); return 0x23; } assert(waitid(P_ALL, child, NULL, WEXITED | WNOWAIT) == 0); assert(waitid(P_ALL, 0, NULL, WSTOPPED) == -1); return 0; } it hangs in waitpid(WSTOPPED) despite the fact it has a single zombie child. This is because wait_consider_task(ptrace => 0) sees p->ptrace and cleares ->notask_error assuming that the debugger should detach and notify us. Change wait_consider_task(ptrace => 0) to pretend that ptrace == T if the child is traced by us. This really simplifies the logic and allows us to do more fixes, see the next changes. This also hides the unwanted group stop state automatically, we can remove another ptrace_reparented() check. Unfortunately, this adds the following behavioural changes: 1. Before this patch wait(WEXITED | __WNOTHREAD) does not reap a natural child if it is traced by the caller's sub-thread. Hopefully nobody will ever notice this change, and I think that nobody should rely on this behaviour anyway. 2. SIGNAL_STOP_CONTINUED is no longer hidden from debugger if it is real parent. While this change comes as a side effect, I think it is good by itself. The group continued state can not be consumed by another process in this case, it doesn't depend on ptrace, it doesn't make sense to hide it from real parent. Perhaps we should add the thread_group_leader() check before wait_task_continued()? May be, but this shouldn't depend on ptrace_reparented(). Signed-off-by: Oleg Nesterov Cc: Al Viro Cc: Jan Kratochvil Cc: Lennart Poettering Cc: Michal Schmidt Cc: Roland McGrath Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 33cf8db..92d38d4 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1362,6 +1362,22 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace, return 0; } + if (likely(!ptrace) && unlikely(p->ptrace)) { + /* + * If it is traced by its real parent's group, just pretend + * the caller is ptrace_do_wait() and reap this child if it + * is zombie. + * + * This also hides group stop state from real parent; otherwise + * a single stop can be reported twice as group and ptrace stop. + * If a ptracer wants to distinguish these two events for its + * own children it should create a separate process which takes + * the role of real parent. + */ + if (!ptrace_reparented(p)) + ptrace = 1; + } + /* slay zombie? */ if (p->exit_state == EXIT_ZOMBIE) { /* @@ -1403,19 +1419,6 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace, wo->notask_error = 0; } else { /* - * If @p is ptraced by a task in its real parent's group, - * hide group stop/continued state when looking at @p as - * the real parent; otherwise, a single stop can be - * reported twice as group and ptrace stops. - * - * If a ptracer wants to distinguish the two events for its - * own children, it should create a separate process which - * takes the role of real parent. - */ - if (likely(!ptrace) && p->ptrace && !ptrace_reparented(p)) - return 0; - - /* * @p is alive and it's gonna stop, continue or exit, so * there always is something to wait for. */ -- cgit v1.1 From 7c733eb3eac0e3d091aaf37c183d2175eeebfb2b Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 7 Apr 2014 15:38:49 -0700 Subject: wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process Even if the main thread is dead the process still can stop/continue. However, if the leader is ptraced wait_consider_task(ptrace => false) always skips wait_task_stopped/wait_task_continued, so WSTOPPED or WCONTINUED can never work for the natural parent in this case. Move the "A zombie ptracee is only visible to its ptracer" check into the "if (!delay_group_leader(p))" block. ->notask_error is cleared by the "fall through" code below. This depends on the previous change, wait_task_stopped/continued must be avoided if !delay_group_leader() and the tracer is ->real_parent. Otherwise WSTOPPED|WEXITED could wrongly report "stopped" when the child is already dead (single-threaded or not). If it is traced by another task then the "stopped" state is fine until the debugger detaches and reveals a zombie state. Stupid test-case: void *tfunc(void *arg) { sleep(1); // wait for zombie leader raise(SIGSTOP); exit(0x13); return NULL; } int run_child(void) { pthread_t thread; if (!fork()) { int tracee = getppid(); assert(ptrace(PTRACE_ATTACH, tracee, 0,0) == 0); do ptrace(PTRACE_CONT, tracee, 0,0); while (wait(NULL) > 0); return 0; } sleep(1); // wait for PTRACE_ATTACH assert(pthread_create(&thread, NULL, tfunc, NULL) == 0); pthread_exit(NULL); } int main(void) { int child, stat; child = fork(); if (!child) return run_child(); assert(child == waitpid(-1, &stat, WSTOPPED)); assert(stat == 0x137f); kill(child, SIGCONT); assert(child == waitpid(-1, &stat, WCONTINUED)); assert(stat == 0xffff); assert(child == waitpid(-1, &stat, 0)); assert(stat == 0x1300); return 0; } Without this patch it hangs in waitpid(WSTOPPED), wait_task_stopped() is never called. Note: this doesn't fix all problems with a zombie delay_group_leader(), WCONTINUED | WEXITED check is not exactly right. debugger can't assume it will be notified if another thread reaps the whole thread group. Signed-off-by: Oleg Nesterov Cc: Al Viro Cc: Jan Kratochvil Cc: Lennart Poettering Cc: Michal Schmidt Cc: Roland McGrath Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/exit.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) (limited to 'kernel/exit.c') diff --git a/kernel/exit.c b/kernel/exit.c index 92d38d4..6ed6a1d 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -1380,20 +1380,16 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace, /* slay zombie? */ if (p->exit_state == EXIT_ZOMBIE) { - /* - * A zombie ptracee is only visible to its ptracer. - * Notification and reaping will be cascaded to the real - * parent when the ptracer detaches. - */ - if (likely(!ptrace) && unlikely(p->ptrace)) { - /* it will become visible, clear notask_error */ - wo->notask_error = 0; - return 0; - } - /* we don't reap group leaders with subthreads */ - if (!delay_group_leader(p)) - return wait_task_zombie(wo, p); + if (!delay_group_leader(p)) { + /* + * A zombie ptracee is only visible to its ptracer. + * Notification and reaping will be cascaded to the + * real parent when the ptracer detaches. + */ + if (unlikely(ptrace) || likely(!p->ptrace)) + return wait_task_zombie(wo, p); + } /* * Allow access to stopped/continued state via zombie by -- cgit v1.1