hqemu - HQEMU

diff options

author	Paolo Bonzini <pbonzini@redhat.com>	2015-03-10 16:45:57 +0100
committer	Stefan Hajnoczi <stefanha@redhat.com>	2015-03-12 17:41:23 +0000
commit	2120465fbb87a69bd60283ec4034a0963766a7ef (patch)
tree	83f65b5c66e8416b2b584982a6478d9bbe608967 /util/rfifolock.c
parent	2a5b58e2405e9fe42ba356b5a1b78146a4e9a659 (diff)
download	hqemu-2120465fbb87a69bd60283ec4034a0963766a7ef.zip hqemu-2120465fbb87a69bd60283ec4034a0963766a7ef.tar.gz

queue: fix QSLIST_INSERT_HEAD_ATOMIC race

There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC. Because atomic_cmpxchg returns the old value instead of a success flag, QSLIST_INSERT_HEAD_ATOMIC was checking for success by comparing against the second argument to atomic_cmpxchg. Unfortunately, this only works if the second argument is a local or thread-local variable. If it is in memory, it can be subject to common subexpression elimination (and then everything's fine) or reloaded after the atomic_cmpxchg, depending on the compiler's whims. If the latter happens, the race can happen. A thread can sneak in, doing something on elm->field.sle_next after the atomic_cmpxchg and before the comparison. This causes a wrong failure, and then two threads are using "elm" at the same time. In the case discovered by Christian, the sequence was likely something like this: thread 1 | thread 2 QSLIST_INSERT_HEAD_ATOMIC | atomic_cmpxchg succeeds | elm added to list | | steal release_pool | QSLIST_REMOVE_HEAD | elm removed from list | ... | QSLIST_INSERT_HEAD_ATOMIC | (overwrites sle_next) spurious failure | atomic_cmpxchg succeeds | elm added to list again | | steal release_pool | QSLIST_REMOVE_HEAD | elm removed again | The last three steps could be done by a third thread as well. A reproducer that failed in a matter of seconds is as follows: - the guest has 32 VCPUs on a 28 core host (hyperthreading was enabled), memory was 16G just to err on the safe side (the host has 64G, but hey at least you need no s390) - the guest has 24 null-aio virtio-blk devices using dataplane (-object iothread,id=ioN -drive if=none,id=blkN,driver=null-aio,size=500G -device virtio-blk-pci,iothread=ioN,drive=blkN) - the guest also has a single network interface. It's only doing loopback tests so slirp vs. tap and the model doesn't matter. - the guest is running fio with the following script: [global] rw=randread blocksize=16k ioengine=libaio runtime=10m buffered=0 fallocate=none time_based iodepth=32 [virtio1a] filename=/dev/block/252\:16 [virtio1b] filename=/dev/block/252\:16 ... [virtio24a] filename=/dev/block/252\:384 [virtio24b] filename=/dev/block/252\:384 [listen1] protocol=tcp ioengine=net port=12345 listen rw=read bs=4k size=1000g [connect1] protocol=tcp hostname=localhost ioengine=net port=12345 protocol=tcp rw=write startdelay=1 size=1000g ... [listen8] protocol=tcp ioengine=net port=12352 listen rw=read bs=4k size=1000g [connect8] protocol=tcp hostname=localhost ioengine=net port=12352 rw=write startdelay=1 size=1000g Moral of the story: I should refrain from writing more clever stuff. At least it looks like it is not too clever to be undebuggable. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1426002357-6889-1-git-send-email-pbonzini@redhat.com Fixes: c740ad92d0d958fa785e5d7aa1b67ecaf30a6a54 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Diffstat (limited to 'util/rfifolock.c')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: