From e22b77611897ae7d2d5eb4de12f21d2a5fc1ccb6 Mon Sep 17 00:00:00 2001
From: vangyzen
Date: Mon, 26 Oct 2015 16:21:56 +0000
Subject: Disable SSE in libthr

Clang emits SSE instructions on amd64 in the common path of
pthread_mutex_unlock. If the thread does not otherwise use SSE,
this usage incurs a context-switch of the FPU/SSE state, which
reduces the performance of multiple real-world applications by a
non-trivial amount (3-5% in one application).

Instead of this change, I experimented with eagerly switching the
FPU state at context-switch time. This did not help. Most of the
cost seems to be in the read/write of memory--as kib@ stated--and
not in the #NM handling. I tested on machines with and without
XSAVEOPT.

One counter-argument to this change is that most applications already
use SIMD, and the number of applications and amount of SIMD usage
are only increasing. This is absolutely true. I agree that--in
general and in principle--this change is in the wrong direction.
However, there are applications that do not use enough SSE to offset
the extra context-switch cost. SSE does not provide a clear benefit
in the current libthr code with the current compiler, but it does
provide a clear loss in some cases. Therefore, disabling SSE in
libthr is a non-loss for most, and a gain for some.

I refrained from disabling SSE in libc--as was suggested--because
I can't make the above argument for libc. It provides a wide variety
of code; each case should be analyzed separately.

https://lists.freebsd.org/pipermail/freebsd-current/2015-March/055193.html

Suggestions from: dim, jmg, rpaulo
Sponsored by: Dell Inc.
---
 lib/libthr/arch/amd64/Makefile.inc | 6 ++++++
 lib/libthr/arch/i386/Makefile.inc  | 6 ++++++
 2 files changed, 12 insertions(+)

(limited to 'lib/libthr')

diff --git a/lib/libthr/arch/amd64/Makefile.inc b/lib/libthr/arch/amd64/Makefile.inc
index e6d99ec..0ae764c 100644
--- a/lib/libthr/arch/amd64/Makefile.inc
+++ b/lib/libthr/arch/amd64/Makefile.inc
@@ -1,3 +1,9 @@
 #$FreeBSD$
 
 SRCS+= pthread_md.c _umtx_op_err.S
+
+# With the current compiler and libthr code, using SSE in libthr
+# does not provide enough performance improvement to outweigh
+# the extra context switch cost. This can measurably impact
+# performance when the application also does not use enough SSE.
+CFLAGS+=${CFLAGS_NO_SIMD}
diff --git a/lib/libthr/arch/i386/Makefile.inc b/lib/libthr/arch/i386/Makefile.inc
index 01290d5..81fb6bb 100644
--- a/lib/libthr/arch/i386/Makefile.inc
+++ b/lib/libthr/arch/i386/Makefile.inc
@@ -1,3 +1,9 @@
 # $FreeBSD$
 
 SRCS+= pthread_md.c _umtx_op_err.S
+
+# With the current compiler and libthr code, using SSE in libthr
+# does not provide enough performance improvement to outweigh
+# the extra context switch cost. This can measurably impact
+# performance when the application also does not use enough SSE.
+CFLAGS+=${CFLAGS_NO_SIMD}
-- 
cgit v1.1