author     kib <kib@FreeBSD.org>  2015-07-28 07:04:51 +0000
committer  kib <kib@FreeBSD.org>  2015-07-28 07:04:51 +0000
commit     45167e7aef77c4a883e69c3240cf66f6313bed84 (patch)
tree       3546ddd2119002178d1afc396e245c4d22e997ba
parent     2b8c79506d91b63c1438f6cc3cb525eba633e71b (diff)
Remove the full barrier from the amd64 atomic_load_acq_*().  The strong
ordering semantics of x86 CPUs make only a compiler barrier necessary to
provide the acquire behaviour.

The existing implementation ensured sequentially consistent semantics for
load_acq, a much stronger guarantee than required by the standard's
definition of a load acquire.  Consumers which depend on the stronger
barrier are believed to have been identified and already fixed to use the
proper operations.

Noted by:	alc (long time ago)
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
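As a rough standalone sketch of what the change amounts to (the names below
are illustrative, not the header's own macros), an amd64 acquire load is now
just a plain load followed by a compiler-only barrier; the CPU's ordering
rules already keep the load from being reordered with the loads and stores
that follow it:

/*
 * Minimal sketch (assumed names, not the kernel's actual macros) of an
 * amd64 acquire load after this change: a plain volatile load followed
 * by a compiler-only barrier.  On x86 the CPU never reorders a load with
 * the loads and stores that follow it, so no fence instruction is
 * needed; the barrier only keeps the compiler from hoisting later
 * accesses above the load.
 */
#include <stdint.h>

#define	compiler_membar()	__asm __volatile(" " : : : "memory")

static inline uint32_t
load_acq_32(volatile uint32_t *p)
{
	uint32_t res;

	res = *p;		/* plain MOV is already an acquire load */
	compiler_membar();	/* stop compiler reordering only */
	return (res);
}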
-rw-r--r--  sys/amd64/include/atomic.h  24
1 file changed, 7 insertions, 17 deletions
diff --git a/sys/amd64/include/atomic.h b/sys/amd64/include/atomic.h
index 016aa70..30f594c 100644
--- a/sys/amd64/include/atomic.h
+++ b/sys/amd64/include/atomic.h
@@ -269,13 +269,13 @@ atomic_testandset_long(volatile u_long *p, u_int v)
  * IA32 memory model, a simple store guarantees release semantics.
  *
  * However, a load may pass a store if they are performed on distinct
- * addresses, so for atomic_load_acq we introduce a Store/Load barrier
- * before the load in SMP kernels. We use "lock addl $0,mem", as
- * recommended by the AMD Software Optimization Guide, and not mfence.
- * In the kernel, we use a private per-cpu cache line as the target
- * for the locked addition, to avoid introducing false data
- * dependencies. In userspace, a word in the red zone on the stack
- * (-8(%rsp)) is utilized.
+ * addresses, so we need a Store/Load barrier for sequentially
+ * consistent fences in SMP kernels. We use "lock addl $0,mem" for a
+ * Store/Load barrier, as recommended by the AMD Software Optimization
+ * Guide, and not mfence. In the kernel, we use a private per-cpu
+ * cache line as the target for the locked addition, to avoid
+ * introducing false data dependencies. In user space, we use a word
+ * in the stack's red zone (-8(%rsp)).
  *
  * For UP kernels, however, the memory of the single processor is
  * always consistent, so we only need to stop the compiler from
@@ -319,22 +319,12 @@ __storeload_barrier(void)
 }
 #endif /* _KERNEL*/
 
-/*
- * C11-standard acq/rel semantics only apply when the variable in the
- * call is the same for acq as it is for rel. However, our previous
- * (x86) implementations provided much stronger ordering than required
- * (essentially what is called seq_cst order in C11). This
- * implementation provides the historical strong ordering since some
- * callers depend on it.
- */
-
 #define ATOMIC_LOAD(TYPE)					\
 static __inline u_##TYPE					\
 atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
 {							\
 	u_##TYPE res;					\
 							\
-	__storeload_barrier();				\
 	res = *p;					\
 	__compiler_membar();				\
 	return (res);					\
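The Store/Load barrier that remains for sequentially consistent fences
(described in the comment in the first hunk) can be sketched roughly as
follows; the function name and the dummy target word are placeholders, since
the kernel actually targets a private per-CPU cache line and user space uses
the word at -8(%rsp) in the stack's red zone:

/*
 * Rough sketch (assumed names) of the "lock addl $0,mem" Store/Load
 * barrier kept for sequentially consistent fences.  The locked
 * read-modify-write of a dummy word drains the store buffer before any
 * later load can execute; a static word stands in here for the kernel's
 * private per-CPU cache line (or the userspace red-zone word).
 */
static inline void
storeload_fence(void)
{
	static volatile int dummy;

	__asm __volatile("lock; addl $0,%0" : "+m" (dummy) : : "memory", "cc");
}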