diff options
author | David S. Miller <davem@davemloft.net> | 2014-12-11 21:15:37 -0500 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2014-12-11 21:15:37 -0500 |
commit | 697766df6b952f09b17eefda8b5ef746acb9c1eb (patch) | |
tree | a4962667802529c26231f4768c0233a15f9e9e4c /arch/alpha/include/asm/barrier.h | |
parent | c11a9009ae6a8c42a8cd69d885601e1aa6fbea04 (diff) | |
parent | 124b74c18e0e31b24638d256afee7122a994e1b3 (diff) | |
download | op-kernel-dev-697766df6b952f09b17eefda8b5ef746acb9c1eb.zip op-kernel-dev-697766df6b952f09b17eefda8b5ef746acb9c1eb.tar.gz |
Merge branch 'dma_mb'
Alexander Duyck says:
====================
arch: Add lightweight memory barriers for coherent memory access
These patches introduce two new primitives for synchronizing cache coherent
memory writes and reads. These two new primitives are:
dma_rmb()
dma_wmb()
The first patch cleans up some unnecessary overhead related to the
definition of read_barrier_depends, smp_read_barrier_depends, and comments
related to the barrier.
The second patch adds the primitives for the applicable architectures and
asm-generic.
The third patch adds the barriers to r8169 which turns out to be a good
example of where the new barriers might be useful as they have full
rmb()/wmb() barriers ordering accesses to the descriptors and the DescOwn
bit.
The fourth patch adds support for coherent_rmb() to the Intel fm10k, igb,
and ixgbe drivers. Testing with the ixgbe driver has shown a processing
time reduction of at least 7ns per 64B frame on a Core i7-4930K.
This patch series is essentially the v7 for:
v4-7: Add lightweight memory barriers for coherent memory access
v3: Add lightweight memory barriers fast_rmb() and fast_wmb()
v2: Introduce load_acquire() and store_release()
v1: Introduce read_acquire()
The key changes in this patch series versus the earlier patches are:
v7 resubmit:
- Added Acked-by: Ben Herrenschmidt from v5 to dma_rmb/wmb patch
- No code changes from previous set, still applies cleanly and builds.
v7:
- Dropped test/debug patch that was accidentally slipped in
v6:
- Replaced "memory based device I/O" with "consistent memory" in
docs
- Added reference to DMA-API.txt to explain consistent memory
v5:
- Renamed barriers dma_rmb and dma_wmb
- Undid smp_wmb changes in x86 and PowerPC
- Defined smp_rmb as __lwsync for SMP case on PowerPC
v4:
- Renamed barriers coherent_rmb and coherent_wmb
- Added smp_lwsync for use in smp_load_acquire/smp_store_release
v3:
- Moved away from acquire()/store() and instead focused on barriers
- Added cleanup of read_barrier_depends
- Added change in r8169 to fix cur_tx/DescOwn ordering
- Simplified changes to just replacing/moving barriers in r8169
- Added update to documentation with code example
v2:
- Renamed read_acquire() to be consistent with smp_load_acquire()
- Changed barrier used to be consistent with smp_load_acquire()
- Updated PowerPC code to use __lwsync based on IBM article
- Added store_release() as this is a viable use case for drivers
- Added r8169 patch which is able to fully use primitives
- Added fm10k/igb/ixgbe patch which is able to test performance
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'arch/alpha/include/asm/barrier.h')
-rw-r--r-- | arch/alpha/include/asm/barrier.h | 51 |
1 files changed, 51 insertions, 0 deletions
diff --git a/arch/alpha/include/asm/barrier.h b/arch/alpha/include/asm/barrier.h index 3832bdb..77516c8 100644 --- a/arch/alpha/include/asm/barrier.h +++ b/arch/alpha/include/asm/barrier.h @@ -7,6 +7,57 @@ #define rmb() __asm__ __volatile__("mb": : :"memory") #define wmb() __asm__ __volatile__("wmb": : :"memory") +/** + * read_barrier_depends - Flush all pending reads that subsequents reads + * depend on. + * + * No data-dependent reads from memory-like regions are ever reordered + * over this barrier. All reads preceding this primitive are guaranteed + * to access memory (but not necessarily other CPUs' caches) before any + * reads following this primitive that depend on the data return by + * any of the preceding reads. This primitive is much lighter weight than + * rmb() on most CPUs, and is never heavier weight than is + * rmb(). + * + * These ordering constraints are respected by both the local CPU + * and the compiler. + * + * Ordering is not guaranteed by anything other than these primitives, + * not even by data dependencies. See the documentation for + * memory_barrier() for examples and URLs to more information. + * + * For example, the following code would force ordering (the initial + * value of "a" is zero, "b" is one, and "p" is "&a"): + * + * <programlisting> + * CPU 0 CPU 1 + * + * b = 2; + * memory_barrier(); + * p = &b; q = p; + * read_barrier_depends(); + * d = *q; + * </programlisting> + * + * because the read of "*q" depends on the read of "p" and these + * two reads are separated by a read_barrier_depends(). However, + * the following code, with the same initial values for "a" and "b": + * + * <programlisting> + * CPU 0 CPU 1 + * + * a = 2; + * memory_barrier(); + * b = 3; y = b; + * read_barrier_depends(); + * x = a; + * </programlisting> + * + * does not enforce ordering, since there is no data dependency between + * the read of "a" and the read of "b". Therefore, on some CPUs, such + * as Alpha, "y" could be set to 3 and "x" to 0. Use rmb() + * in cases like this where there are no data dependencies. + */ #define read_barrier_depends() __asm__ __volatile__("mb": : :"memory") #ifdef CONFIG_SMP |