path: root/sys/dev/hwpmc/hwpmc_x86.c
author    bde <bde@FreeBSD.org>  2007-11-29 02:01:21 +0000
committer bde <bde@FreeBSD.org>  2007-11-29 02:01:21 +0000
commit    723157380283ac617fbbbb23fb4ae6120293007b
tree      dafc46e7794ee0a35610452328fb817732ffbd66 /sys/dev/hwpmc/hwpmc_x86.c
parent    35b85a2fdb998527546900f82b6aca6f012c6561
Don't use plain "ret" instructions at targets of jump instructions,
since the branch caches on at least Athlon XP through Athlon 64 CPUs don't understand such instructions and guarantee a cache miss taking at least 10 cycles. Use the documented workaround "ret $0" instead ("nop; ret" also works, but "ret $0" is probably faster on old CPUs).

Normal code (even asm code) doesn't branch to "ret", since there is usually some cleanup to do, but the __mcount, .mcount and .mexitcount entry points were optimized so thoroughly that they use the minimum number of instructions (3 each if profiling is not enabled), and they did exactly this.

I didn't see a significant number of cache misses for .mexitcount, but for the shared "ret" of __mcount and .mcount I observed cache misses costing 26 cycles each. For a send(2) syscall that makes about 70 function calls, these cache misses alone increased the syscall time from about 4000 cycles to about 7000 cycles. The 4000 cycles is for a profiling (GUPROF) kernel with profiling disabled; after this fix, configuring profiling costs only about 600 of those 4000 cycles, which is consistent with almost perfect branch prediction in the mcounting calls.
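As a purely illustrative sketch of the pattern described above (GNU as, AT&T syntax, i386; the entry_a/entry_b labels and the profiling_on flag are hypothetical stand-ins, not the actual FreeBSD mcount code):

            .text
            .globl  entry_a
            .globl  entry_b
    entry_a:                        # stand-in for one profiling entry
            cmpl    $0, profiling_on
            jne     1f              # profiling on: do the real work
            jmp     9f              # profiling off: jump to shared return
    entry_b:                        # second entry sharing the same return
            cmpl    $0, profiling_on
            jne     1f
    9:      # This "ret" is a jump target.  On Athlon XP through
            # Athlon 64, a plain "ret" here guarantees a branch-cache
            # miss; the documented workaround is the imm16 form, which
            # pops 0 extra stack bytes (so it behaves identically) but
            # is predicted normally.
            ret     $0              # instead of a plain "ret"
    1:
            # ... profiling work would go here ...
            ret

            .comm   profiling_on, 4 # assumed flag, for self-containment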
Diffstat (limited to 'sys/dev/hwpmc/hwpmc_x86.c')
0 files changed, 0 insertions, 0 deletions