author     kib <kib@FreeBSD.org>  2015-07-28 07:04:51 +0000
committer  kib <kib@FreeBSD.org>  2015-07-28 07:04:51 +0000
commit     45167e7aef77c4a883e69c3240cf66f6313bed84 (patch)
tree       3546ddd2119002178d1afc396e245c4d22e997ba
parent     2b8c79506d91b63c1438f6cc3cb525eba633e71b (diff)
Remove the full barrier from the amd64 atomic_load_acq_*().  The strong
ordering semantics of x86 CPUs make only a compiler barrier necessary to
provide the acquire behaviour.

The existing implementation ensured sequentially consistent semantics for
load_acq, a much stronger guarantee than required by the standard's
definition of a load acquire.  Consumers which depend on the stronger
barrier are believed to have been identified and already fixed to use the
proper operations.

Noted by:	alc (long time ago)
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
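As a rough standalone sketch of what the change amounts to (the names below
are illustrative, not the header's own macros), an amd64 acquire load is now
just a plain load followed by a compiler-only barrier; the CPU's ordering
rules already keep the load from being reordered with the loads and stores
that follow it:

/*
 * Minimal sketch (assumed names, not the kernel's actual macros) of an
 * amd64 acquire load after this change: a plain volatile load followed
 * by a compiler-only barrier.  On x86 the CPU never reorders a load with
 * the loads and stores that follow it, so no fence instruction is
 * needed; the barrier only keeps the compiler from hoisting later
 * accesses above the load.
 */
#include <stdint.h>

#define	compiler_membar()	__asm __volatile(" " : : : "memory")

static inline uint32_t
load_acq_32(volatile uint32_t *p)
{
	uint32_t res;

	res = *p;		/* plain MOV is already an acquire load */
	compiler_membar();	/* stop compiler reordering only */
	return (res);
}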
-rw-r--r--  sys/amd64/include/atomic.h  24
1 file changed, 7 insertions, 17 deletions
diff --git a/sys/amd64/include/atomic.h b/sys/amd64/include/atomic.h
index 016aa70..30f594c 100644
--- a/sys/amd64/include/atomic.h
+++ b/sys/amd64/include/atomic.h
@@ -269,13 +269,13 @@ atomic_testandset_long(volatile u_long *p, u_int v)
  * IA32 memory model, a simple store guarantees release semantics.
  *
  * However, a load may pass a store if they are performed on distinct
- * addresses, so for atomic_load_acq we introduce a Store/Load barrier
- * before the load in SMP kernels. We use "lock addl $0,mem", as
- * recommended by the AMD Software Optimization Guide, and not mfence.
- * In the kernel, we use a private per-cpu cache line as the target
- * for the locked addition, to avoid introducing false data
- * dependencies. In userspace, a word in the red zone on the stack
- * (-8(%rsp)) is utilized.
+ * addresses, so we need a Store/Load barrier for sequentially
+ * consistent fences in SMP kernels. We use "lock addl $0,mem" for a
+ * Store/Load barrier, as recommended by the AMD Software Optimization
+ * Guide, and not mfence. In the kernel, we use a private per-cpu
+ * cache line as the target for the locked addition, to avoid
+ * introducing false data dependencies. In user space, we use a word
+ * in the stack's red zone (-8(%rsp)).
  *
  * For UP kernels, however, the memory of the single processor is
  * always consistent, so we only need to stop the compiler from
@@ -319,22 +319,12 @@ __storeload_barrier(void)
 }
 #endif /* _KERNEL*/
 
-/*
- * C11-standard acq/rel semantics only apply when the variable in the
- * call is the same for acq as it is for rel. However, our previous
- * (x86) implementations provided much stronger ordering than required
- * (essentially what is called seq_cst order in C11). This
- * implementation provides the historical strong ordering since some
- * callers depend on it.
- */
-
 #define ATOMIC_LOAD(TYPE)					\
 static __inline u_##TYPE					\
 atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
 {							\
 	u_##TYPE res;					\
 							\
-	__storeload_barrier();				\
 	res = *p;					\
 	__compiler_membar();				\
 	return (res);					\
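The Store/Load barrier that remains for sequentially consistent fences
(described in the comment in the first hunk) can be sketched roughly as
follows; the function name and the dummy target word are placeholders, since
the kernel actually targets a private per-CPU cache line and user space uses
the word at -8(%rsp) in the stack's red zone:

/*
 * Rough sketch (assumed names) of the "lock addl $0,mem" Store/Load
 * barrier kept for sequentially consistent fences.  The locked
 * read-modify-write of a dummy word drains the store buffer before any
 * later load can execute; a static word stands in here for the kernel's
 * private per-CPU cache line (or the userspace red-zone word).
 */
static inline void
storeload_fence(void)
{
	static volatile int dummy;

	__asm __volatile("lock; addl $0,%0" : "+m" (dummy) : : "memory", "cc");
}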