Diffstat (limited to 'lib/libpmc/pmc.corei7.3')
-rw-r--r-- lib/libpmc/pmc.corei7.3 1581
1 files changed, 1581 insertions, 0 deletions
diff --git a/lib/libpmc/pmc.corei7.3 b/lib/libpmc/pmc.corei7.3
new file mode 100644
index 0000000..679313f
--- /dev/null
+++ b/lib/libpmc/pmc.corei7.3
@@ -0,0 +1,1581 @@
+.\" Copyright (c) 2010 Fabien Thomas. All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" This software is provided by Joseph Koshy ``as is'' and
+.\" any express or implied warranties, including, but not limited to, the
+.\" implied warranties of merchantability and fitness for a particular purpose
+.\" are disclaimed. in no event shall Joseph Koshy be liable
+.\" for any direct, indirect, incidental, special, exemplary, or consequential
+.\" damages (including, but not limited to, procurement of substitute goods
+.\" or services; loss of use, data, or profits; or business interruption)
+.\" however caused and on any theory of liability, whether in contract, strict
+.\" liability, or tort (including negligence or otherwise) arising in any way
+.\" out of the use of this software, even if advised of the possibility of
+.\" such damage.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd March 24, 2010
+.Dt PMC.COREI7 3
+.Os
+.Sh NAME
+.Nm pmc.corei7
+.Nd measurement events for
+.Tn Intel
+.Tn Core i7 and Xeon 5500
+family CPUs
+.Sh LIBRARY
+.Lb libpmc
+.Sh SYNOPSIS
+.In pmc.h
+.Sh DESCRIPTION
+.Tn Intel
+.Tn "Core i7"
+CPUs contain PMCs conforming to version 2 of the
+.Tn Intel
+performance measurement architecture.
+These CPUs may contain the following classes of PMCs:
+.Bl -tag -width "Li PMC_CLASS_IAP"
+.It Li PMC_CLASS_IAF
+Fixed-function counters that count only one hardware event per counter.
+.It Li PMC_CLASS_IAP
+Programmable counters that may be configured to count one of a defined
+set of hardware events.
+.El
+.Pp
+The number of PMCs available in each class and their widths need to be
+determined at run time by calling
+.Xr pmc_cpuinfo 3 .
+.Pp
+Intel Core i7 and Xeon 5500 PMCs are documented in
+.Rs
+.%B "Intel(R) 64 and IA-32 Architectures Software Developer's Manual"
+.%T "Volume 3B: System Programming Guide, Part 2"
+.%N "Order Number: 253669-033US"
+.%D December 2009
+.%Q "Intel Corporation"
+.Re
+.Ss COREI7 AND XEON 5500 FIXED FUNCTION PMCS
+These PMCs and their supported events are documented in
+.Xr pmc.iaf 3 .
+Not all CPUs in this family implement fixed-function counters.
+.Ss COREI7 AND XEON 5500 PROGRAMMABLE PMCS
+The programmable PMCs support the following capabilities:
+.Bl -column "PMC_CAP_INTERRUPT" "Support"
+.It Em Capability Ta Em Support
+.It PMC_CAP_CASCADE Ta \&No
+.It PMC_CAP_EDGE Ta Yes
+.It PMC_CAP_INTERRUPT Ta Yes
+.It PMC_CAP_INVERT Ta Yes
+.It PMC_CAP_READ Ta Yes
+.It PMC_CAP_PRECISE Ta \&No
+.It PMC_CAP_SYSTEM Ta Yes
+.It PMC_CAP_TAGGING Ta \&No
+.It PMC_CAP_THRESHOLD Ta Yes
+.It PMC_CAP_USER Ta Yes
+.It PMC_CAP_WRITE Ta Yes
+.El
+.Ss Event Qualifiers
+Event specifiers for these PMCs support the following common
+qualifiers:
+.Bl -tag -width indent
+.It Li rsp= Ns Ar value
+Configure the Off-core Response bits.
+.Bl -tag -width indent
+.It Li DMND_DATA_RD
+Counts the number of demand and DCU prefetch data reads of full
+and partial cachelines as well as demand data page table entry
+cacheline reads. Does not count L2 data read prefetches or
+instruction fetches.
+.It Li DMND_RFO
+Counts the number of demand and DCU prefetch reads for ownership
+(RFO) requests generated by a write to data cacheline. Does not
+count L2 RFO.
+.It Li DMND_IFETCH
+Counts the number of demand and DCU prefetch instruction cacheline
+reads. Does not count L2 code read prefetches.
+.It Li WB
+Counts the number of writeback (modified to exclusive) transactions.
+.It Li PF_DATA_RD
+Counts the number of data cacheline reads generated by L2 prefetchers.
+.It Li PF_RFO
+Counts the number of RFO requests generated by L2 prefetchers.
+.It Li PF_IFETCH
+Counts the number of code reads generated by L2 prefetchers.
+.It Li OTHER
+Counts one of the following transaction types, including L3 invalidate,
+I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences,
+lock, unlock, split lock.
+.It Li UNCORE_HIT
+L3 Hit: local or remote home requests that hit L3 cache in the uncore
+with no coherency actions required (snooping).
+.It Li OTHER_CORE_HIT_SNP
+L3 Hit: local or remote home requests that hit L3 cache in the uncore
+and were serviced by another core with a cross core snoop where no modified
+copies were found (clean).
+.It Li OTHER_CORE_HITM
+L3 Hit: local or remote home requests that hit L3 cache in the uncore
+and were serviced by another core with a cross core snoop where modified
+copies were found (HITM).
+.It Li REMOTE_CACHE_FWD
+L3 Miss: local homed requests that missed the L3 cache and were serviced
+by forwarded data following a cross package snoop where no modified
+copies were found.
+Remote home requests are not counted.
+.It Li REMOTE_DRAM
+L3 Miss: remote home requests that missed the L3 cache and were serviced
+by remote DRAM.
+.It Li LOCAL_DRAM
+L3 Miss: local home requests that missed the L3 cache and were serviced
+by local DRAM.
+.It Li NON_DRAM
+Non-DRAM requests that were serviced by IOH.
+.El
+.It Li cmask= Ns Ar value
+Configure the PMC to increment only if the number of configured
+events measured in a cycle is greater than or equal to
+.Ar value .
+.It Li edge
+Configure the PMC to count the number of de-asserted to asserted
+transitions of the conditions expressed by the other qualifiers.
+If specified, the counter will increment only once whenever a
+condition becomes true, irrespective of the number of clocks during
+which the condition remains true.
+.It Li inv
+Invert the sense of comparison when the
+.Dq Li cmask
+qualifier is present, making the counter increment when the number of
+events per cycle is less than the value specified by the
+.Dq Li cmask
+qualifier.
+.It Li os
+Configure the PMC to count events happening at processor privilege
+level 0.
+.It Li usr
+Configure the PMC to count events occurring at privilege levels 1, 2
+or 3.
+.El
+.Pp
+If neither the
+.Dq Li os
+nor the
+.Dq Li usr
+qualifier is specified, the default is to enable both.
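The qualifiers above correspond to fields of the architectural event-select register (IA32_PERFEVTSELx). The following is a minimal sketch of that mapping, not the code libpmc or the kernel driver actually uses: the names `struct evt_quals`, `evtsel_encode` and `evtsel_increment` are hypothetical, and the enable and interrupt bits, which are assumed to be set elsewhere when a counter is started, are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the qualifiers described above. */
struct evt_quals {
	unsigned cmask;		/* 0 means "no threshold" */
	bool edge;
	bool inv;
	bool os;
	bool usr;
};

/* Bit positions in the architectural IA32_PERFEVTSELx layout. */
#define EVTSEL_USR	(1u << 16)	/* count at privilege levels 1-3 */
#define EVTSEL_OS	(1u << 17)	/* count at privilege level 0 */
#define EVTSEL_EDGE	(1u << 18)	/* count de-asserted to asserted edges */
#define EVTSEL_INV	(1u << 23)	/* invert the cmask comparison */

/* Pack an event code, a unit mask and the qualifiers into one value. */
static uint32_t
evtsel_encode(unsigned event, unsigned umask, struct evt_quals q)
{
	uint32_t v;

	assert(event <= 0xFF && umask <= 0xFF && q.cmask <= 0xFF);

	/* If neither os nor usr is given, both are enabled. */
	if (!q.os && !q.usr)
		q.os = q.usr = true;

	v = (uint32_t)event | ((uint32_t)umask << 8) |
	    ((uint32_t)q.cmask << 24);
	if (q.usr)
		v |= EVTSEL_USR;
	if (q.os)
		v |= EVTSEL_OS;
	if (q.edge)
		v |= EVTSEL_EDGE;
	if (q.inv)
		v |= EVTSEL_INV;
	return (v);
}

/*
 * Amount added to the counter in a cycle in which n events occurred.
 * Models only cmask and inv; edge detection needs state from the
 * previous cycle and is not modelled here.
 */
static unsigned
evtsel_increment(unsigned n, struct evt_quals q)
{
	if (q.cmask == 0)
		return (n);
	return (q.inv ? n < q.cmask : n >= q.cmask);
}
```

Under these assumptions, event 0EH with umask 01H, `inv` and `cmask=1` encodes to 0x0183010E, and `evtsel_increment` returns 1 only for cycles in which no event occurred, which is how a per-event count is turned into a stalled-cycle count.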
+.Ss Event Specifiers (Programmable PMCs)
+Core i7 and Xeon 5500 programmable PMCs support the following events:
+.Bl -tag -width indent
+.It Li SB_DRAIN.ANY
+.Pq Event 04H , Umask 07H
+Counts the number of store buffer drains.
+.It Li STORE_BLOCKS.AT_RET
+.Pq Event 06H , Umask 04H
+Counts number of loads delayed with at-Retirement block code. The following
+loads need to be executed at retirement and wait for all senior stores on
+the same thread to be drained: load splitting across 4K boundary (page
+split), load accessing uncacheable (UC or USWC) memory, load lock, and load
+with page table in UC or USWC memory region.
+.It Li STORE_BLOCKS.L1D_BLOCK
+.Pq Event 06H , Umask 08H
+Counts cacheable loads delayed with L1D block code.
+.It Li PARTIAL_ADDRESS_ALIAS
+.Pq Event 07H , Umask 01H
+Counts false dependencies due to partial address aliasing.
+.It Li DTLB_LOAD_MISSES.ANY
+.Pq Event 08H , Umask 01H
+Counts all load misses that cause a page walk.
+.It Li DTLB_LOAD_MISSES.WALK_COMPLETED
+.Pq Event 08H , Umask 02H
+Counts number of completed page walks due to load miss in the STLB.
+.It Li DTLB_LOAD_MISSES.STLB_HIT
+.Pq Event 08H , Umask 10H
+Number of cache load STLB hits.
+.It Li DTLB_LOAD_MISSES.PDE_MISS
+.Pq Event 08H , Umask 20H
+Number of DTLB cache load misses where the low part of the linear to
+physical address translation was missed.
+.It Li DTLB_LOAD_MISSES.PDP_MISS
+.Pq Event 08H , Umask 40H
+Number of DTLB cache load misses where the high part of the linear to
+physical address translation was missed.
+.It Li DTLB_LOAD_MISSES.LARGE_WALK_COMPLETED
+.Pq Event 08H , Umask 80H
+Counts number of completed large page walks due to load miss in the STLB.
+.It Li MEM_INST_RETIRED.LOADS
+.Pq Event 0BH , Umask 01H
+Counts the number of instructions with an architecturally-visible load
+retired on the architected path.
+Used in conjunction with the ld_lat facility.
+.It Li MEM_INST_RETIRED.STORES
+.Pq Event 0BH , Umask 02H
+Counts the number of instructions with an architecturally-visible store
+retired on the architected path.
+Used in conjunction with the ld_lat facility.
+.It Li MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD
+.Pq Event 0BH , Umask 10H
+Counts the number of instructions exceeding the latency specified with the
+ld_lat facility.
+Used in conjunction with the ld_lat facility.
+.It Li MEM_STORE_RETIRED.DTLB_MISS
+.Pq Event 0CH , Umask 01H
+The event counts the number of retired stores that missed the DTLB. The DTLB
+miss is not counted if the store operation causes a fault. Does not count
+prefetches. Counts both primary and secondary misses to the TLB.
+.It Li UOPS_ISSUED.ANY
+.Pq Event 0EH , Umask 01H
+Counts the number of Uops issued by the Register Allocation Table to the
+Reservation Station, i.e. the UOPs issued from the front end to the back
+end.
+.It Li UOPS_ISSUED.STALLED_CYCLES
+.Pq Event 0EH , Umask 01H
+Counts the number of cycles in which no Uops were issued by the Register
+Allocation Table to the Reservation Station, i.e. no UOPs were issued from
+the front end to the back end.
+Set the inv qualifier and cmask=1.
+.It Li UOPS_ISSUED.FUSED
+.Pq Event 0EH , Umask 02H
+Counts the number of fused Uops that were issued from the Register
+Allocation Table to the Reservation Station.
+.It Li MEM_UNCORE_RETIRED.L3_DATA_MISS_UNKNOWN
+.Pq Event 0FH , Umask 01H
+Counts number of memory load instructions retired where the memory reference
+missed L3 and data source is unknown.
+Available only for CPUID signature 06_2EH
+.It Li MEM_UNCORE_RETIRED.OTHER_CORE_L2_HITM
+.Pq Event 0FH , Umask 02H
+Counts number of memory load instructions retired where the memory reference
+hit modified data in a sibling core residing on the same socket.
+.It Li MEM_UNCORE_RETIRED.REMOTE_CACHE_LOCAL_HOME_HIT
+.Pq Event 0FH , Umask 08H
+Counts number of memory load instructions retired where the memory reference
+missed the L1, L2 and L3 caches and HIT in a remote socket's cache. Only
+counts locally homed lines.
+.It Li MEM_UNCORE_RETIRED.REMOTE_DRAM
+.Pq Event 0FH , Umask 10H
+Counts number of memory load instructions retired where the memory reference
+missed the L1, L2 and L3 caches and was remotely homed. This includes both
+DRAM access and HITM in a remote socket's cache for remotely homed lines.
+.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM
+.Pq Event 0FH , Umask 20H
+Counts number of memory load instructions retired where the memory reference
+missed the L1, L2 and L3 caches and required a local socket memory
+reference. This includes locally homed cachelines that were in a modified
+state in another socket.
+.It Li MEM_UNCORE_RETIRED.UNCACHEABLE
+.Pq Event 0FH , Umask 80H
+Counts number of memory load instructions retired where the memory reference
+missed the L1, L2 and L3 caches in order to perform I/O.
+Available only for CPUID signature 06_2EH.
+.It Li FP_COMP_OPS_EXE.X87
+.Pq Event 10H , Umask 01H
+Counts the number of FP Computational Uops Executed. The number of FADDs,
+FSUBs, FCOMs, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTs, integer
+DIVs, and IDIVs. This event does not distinguish an FADD used in the middle
+of a transcendental flow from a separate FADD instruction.
+.It Li FP_COMP_OPS_EXE.MMX
+.Pq Event 10H , Umask 02H
+Counts number of MMX Uops executed.
+.It Li FP_COMP_OPS_EXE.SSE_FP
+.Pq Event 10H , Umask 04H
+Counts number of SSE and SSE2 FP uops executed.
+.It Li FP_COMP_OPS_EXE.SSE2_INTEGER
+.Pq Event 10H , Umask 08H
+Counts number of SSE2 integer uops executed.
+.It Li FP_COMP_OPS_EXE.SSE_FP_PACKED
+.Pq Event 10H , Umask 10H
+Counts number of SSE FP packed uops executed.
+.It Li FP_COMP_OPS_EXE.SSE_FP_SCALAR
+.Pq Event 10H , Umask 20H
+Counts number of SSE FP scalar uops executed.
+.It Li FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION
+.Pq Event 10H , Umask 40H
+Counts number of SSE* FP single precision uops executed.
+.It Li FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION
+.Pq Event 10H , Umask 80H
+Counts number of SSE* FP double precision uops executed.
+.It Li SIMD_INT_128.PACKED_MPY
+.Pq Event 12H , Umask 01H
+Counts number of 128 bit SIMD integer multiply operations.
+.It Li SIMD_INT_128.PACKED_SHIFT
+.Pq Event 12H , Umask 02H
+Counts number of 128 bit SIMD integer shift operations.
+.It Li SIMD_INT_128.PACK
+.Pq Event 12H , Umask 04H
+Counts number of 128 bit SIMD integer pack operations.
+.It Li SIMD_INT_128.UNPACK
+.Pq Event 12H , Umask 08H
+Counts number of 128 bit SIMD integer unpack operations.
+.It Li SIMD_INT_128.PACKED_LOGICAL
+.Pq Event 12H , Umask 10H
+Counts number of 128 bit SIMD integer logical operations.
+.It Li SIMD_INT_128.PACKED_ARITH
+.Pq Event 12H , Umask 20H
+Counts number of 128 bit SIMD integer arithmetic operations.
+.It Li SIMD_INT_128.SHUFFLE_MOVE
+.Pq Event 12H , Umask 40H
+Counts number of 128 bit SIMD integer shuffle and move operations.
+.It Li LOAD_DISPATCH.RS
+.Pq Event 13H , Umask 01H
+Counts number of loads dispatched from the Reservation Station that bypass
+the Memory Order Buffer.
+.It Li LOAD_DISPATCH.RS_DELAYED
+.Pq Event 13H , Umask 02H
+Counts the number of delayed RS dispatches at the stage latch. If an RS
+dispatch cannot bypass to LB, it has another chance to dispatch from the
+one-cycle delayed staging latch before it is written into the LB.
+.It Li LOAD_DISPATCH.MOB
+.Pq Event 13H , Umask 04H
+Counts the number of loads dispatched from the Reservation Station to the
+Memory Order Buffer.
+.It Li LOAD_DISPATCH.ANY
+.Pq Event 13H , Umask 07H
+Counts all loads dispatched from the Reservation Station.
+.It Li ARITH.CYCLES_DIV_BUSY
+.Pq Event 14H , Umask 01H
+Counts the number of cycles the divider is busy executing divide or square
+root operations. The divide can be integer, X87 or Streaming SIMD Extensions
+(SSE). The square root operation can be either X87 or SSE.
+Set the edge and inv qualifiers together with cmask=1 to count the number
+of divides.
+Count may be incorrect when SMT is on.
+.It Li ARITH.MUL
+.Pq Event 14H , Umask 02H
+Counts the number of multiply operations executed. This includes integer as
+well as floating point multiply operations but excludes DPPS mul and MPSAD.
+Count may be incorrect when SMT is on.
+.It Li INST_QUEUE_WRITES
+.Pq Event 17H , Umask 01H
+Counts the number of instructions written into the instruction queue every
+cycle.
+.It Li INST_DECODED.DEC0
+.Pq Event 18H , Umask 01H
+Counts number of instructions that require decoder 0 to be decoded. Usually,
+this means that the instruction maps to more than 1 uop.
+.It Li TWO_UOP_INSTS_DECODED
+.Pq Event 19H , Umask 01H
+Counts decoded instructions that generate two uops.
+.It Li INST_QUEUE_WRITE_CYCLES
+.Pq Event 1EH , Umask 01H
+This event counts the number of cycles during which instructions are written
+to the instruction queue. Dividing this counter by the number of
+instructions written to the instruction queue (INST_QUEUE_WRITES) yields the
+average number of instructions decoded each cycle. If this number is less
+than four and the pipe stalls, this indicates that the decoder is failing to
+decode enough instructions per cycle to sustain the 4-wide pipeline.
+If SSE* instructions that are 6 bytes or longer arrive one after another,
+then front end throughput may limit execution speed.
+.It Li LSD_OVERFLOW
+.Pq Event 20H , Umask 01H
+Counts number of loops that cannot stream from the instruction queue.
+.It Li L2_RQSTS.LD_HIT
+.Pq Event 24H , Umask 01H
+Counts number of loads that hit the L2 cache. L2 loads include both L1D
+demand misses as well as L1D prefetches. L2 loads can be rejected for
+various reasons. Only non rejected loads are counted.
+.It Li L2_RQSTS.LD_MISS
+.Pq Event 24H , Umask 02H
+Counts the number of loads that miss the L2 cache. L2 loads include both L1D
+demand misses as well as L1D prefetches.
+.It Li L2_RQSTS.LOADS
+.Pq Event 24H , Umask 03H
+Counts all L2 load requests. L2 loads include both L1D demand misses as well
+as L1D prefetches.
+.It Li L2_RQSTS.RFO_HIT
+.Pq Event 24H , Umask 04H
+Counts the number of store RFO requests that hit the L2 cache. L2 RFO
+requests include both L1D demand RFO misses as well as L1D RFO prefetches.
+Count includes WC memory requests, where the data is not fetched but the
+permission to write the line is required.
+.It Li L2_RQSTS.RFO_MISS
+.Pq Event 24H , Umask 08H
+Counts the number of store RFO requests that miss the L2 cache. L2 RFO
+requests include both L1D demand RFO misses as well as L1D RFO prefetches.
+.It Li L2_RQSTS.RFOS
+.Pq Event 24H , Umask 0CH
+Counts all L2 store RFO requests. L2 RFO requests include both L1D demand
+RFO misses as well as L1D RFO prefetches.
+.It Li L2_RQSTS.IFETCH_HIT
+.Pq Event 24H , Umask 10H
+Counts number of instruction fetches that hit the L2 cache. L2 instruction
+fetches include both L1I demand misses as well as L1I instruction
+prefetches.
+.It Li L2_RQSTS.IFETCH_MISS
+.Pq Event 24H , Umask 20H
+Counts number of instruction fetches that miss the L2 cache. L2 instruction
+fetches include both L1I demand misses as well as L1I instruction
+prefetches.
+.It Li L2_RQSTS.IFETCHES
+.Pq Event 24H , Umask 30H
+Counts all instruction fetches. L2 instruction fetches include both L1I
+demand misses as well as L1I instruction prefetches.
+.It Li L2_RQSTS.PREFETCH_HIT
+.Pq Event 24H , Umask 40H
+Counts L2 prefetch hits for both code and data.
+.It Li L2_RQSTS.PREFETCH_MISS
+.Pq Event 24H , Umask 80H
+Counts L2 prefetch misses for both code and data.
+.It Li L2_RQSTS.PREFETCHES
+.Pq Event 24H , Umask C0H
+Counts all L2 prefetches for both code and data.
+.It Li L2_RQSTS.MISS
+.Pq Event 24H , Umask AAH
+Counts all L2 misses for both code and data.
+.It Li L2_RQSTS.REFERENCES
+.Pq Event 24H , Umask FFH
+Counts all L2 requests for both code and data.
+.It Li L2_DATA_RQSTS.DEMAND.I_STATE
+.Pq Event 26H , Umask 01H
+Counts number of L2 data demand loads where the cache line to be loaded is
+in the I (invalid) state, i.e. a cache miss. L2 demand loads are both L1D
+demand misses and L1D prefetches.
+.It Li L2_DATA_RQSTS.DEMAND.S_STATE
+.Pq Event 26H , Umask 02H
+Counts number of L2 data demand loads where the cache line to be loaded is
+in the S (shared) state. L2 demand loads are both L1D demand misses and L1D
+prefetches.
+.It Li L2_DATA_RQSTS.DEMAND.E_STATE
+.Pq Event 26H , Umask 04H
+Counts number of L2 data demand loads where the cache line to be loaded is
+in the E (exclusive) state. L2 demand loads are both L1D demand misses and
+L1D prefetches.
+.It Li L2_DATA_RQSTS.DEMAND.M_STATE
+.Pq Event 26H , Umask 08H
+Counts number of L2 data demand loads where the cache line to be loaded is
+in the M (modified) state. L2 demand loads are both L1D demand misses and
+L1D prefetches.
+.It Li L2_DATA_RQSTS.DEMAND.MESI
+.Pq Event 26H , Umask 0FH
+Counts all L2 data demand requests. L2 demand loads are both L1D demand
+misses and L1D prefetches.
+.It Li L2_DATA_RQSTS.PREFETCH.I_STATE
+.Pq Event 26H , Umask 10H
+Counts number of L2 prefetch data loads where the cache line to be loaded is
+in the I (invalid) state, i.e. a cache miss.
+.It Li L2_DATA_RQSTS.PREFETCH.S_STATE
+.Pq Event 26H , Umask 20H
+Counts number of L2 prefetch data loads where the cache line to be loaded is
+in the S (shared) state. A prefetch RFO will miss on an S state line, while
+a prefetch read will hit on an S state line.
+.It Li L2_DATA_RQSTS.PREFETCH.E_STATE
+.Pq Event 26H , Umask 40H
+Counts number of L2 prefetch data loads where the cache line to be loaded is
+in the E (exclusive) state.
+.It Li L2_DATA_RQSTS.PREFETCH.M_STATE
+.Pq Event 26H , Umask 80H
+Counts number of L2 prefetch data loads where the cache line to be loaded is
+in the M (modified) state.
+.It Li L2_DATA_RQSTS.PREFETCH.MESI
+.Pq Event 26H , Umask F0H
+Counts all L2 prefetch requests.
+.It Li L2_DATA_RQSTS.ANY
+.Pq Event 26H , Umask FFH
+Counts all L2 data requests.
+.It Li L2_WRITE.RFO.I_STATE
+.Pq Event 27H , Umask 01H
+Counts number of L2 demand store RFO requests where the cache line to be
+loaded is in the I (invalid) state, i.e. a cache miss. The L1D prefetcher
+does not issue a RFO prefetch.
+This is a demand RFO request.
+.It Li L2_WRITE.RFO.S_STATE
+.Pq Event 27H , Umask 02H
+Counts number of L2 store RFO requests where the cache line to be loaded is
+in the S (shared) state. The L1D prefetcher does not issue a RFO prefetch.
+This is a demand RFO request.
+.It Li L2_WRITE.RFO.M_STATE
+.Pq Event 27H , Umask 08H
+Counts number of L2 store RFO requests where the cache line to be loaded is
+in the M (modified) state. The L1D prefetcher does not issue a RFO prefetch.
+This is a demand RFO request
+.It Li L2_WRITE.RFO.HIT
+.Pq Event 27H , Umask 0EH
+Counts number of L2 store RFO requests where the cache line to be loaded is
+in either the S, E or M states. The L1D prefetcher does not issue a RFO
+prefetch.
+This is a demand RFO request
+.It Li L2_WRITE.RFO.MESI
+.Pq Event 27H , Umask 0FH
+Counts all L2 store RFO requests. The L1D prefetcher does not issue a RFO
+prefetch.
+This is a demand RFO request.
+.It Li L2_WRITE.LOCK.I_STATE
+.Pq Event 27H , Umask 10H
+Counts number of L2 demand lock RFO requests where the cache line to be
+loaded is in the I (invalid) state, i.e. a cache miss.
+.It Li L2_WRITE.LOCK.S_STATE
+.Pq Event 27H , Umask 20H
+Counts number of L2 lock RFO requests where the cache line to be loaded is
+in the S (shared) state.
+.It Li L2_WRITE.LOCK.E_STATE
+.Pq Event 27H , Umask 40H
+Counts number of L2 demand lock RFO requests where the cache line to be
+loaded is in the E (exclusive) state.
+.It Li L2_WRITE.LOCK.M_STATE
+.Pq Event 27H , Umask 80H
+Counts number of L2 demand lock RFO requests where the cache line to be
+loaded is in the M (modified) state.
+.It Li L2_WRITE.LOCK.HIT
+.Pq Event 27H , Umask E0H
+Counts number of L2 demand lock RFO requests where the cache line to be
+loaded is in either the S, E, or M state.
+.It Li L2_WRITE.LOCK.MESI
+.Pq Event 27H , Umask F0H
+Counts all L2 demand lock RFO requests.
+.It Li L1D_WB_L2.I_STATE
+.Pq Event 28H , Umask 01H
+Counts number of L1 writebacks to the L2 where the cache line to be written
+is in the I (invalid) state, i.e. a cache miss.
+.It Li L1D_WB_L2.S_STATE
+.Pq Event 28H , Umask 02H
+Counts number of L1 writebacks to the L2 where the cache line to be written
+is in the S (shared) state.
+.It Li L1D_WB_L2.E_STATE
+.Pq Event 28H , Umask 04H
+Counts number of L1 writebacks to the L2 where the cache line to be written
+is in the E (exclusive) state.
+.It Li L1D_WB_L2.M_STATE
+.Pq Event 28H , Umask 08H
+Counts number of L1 writebacks to the L2 where the cache line to be written
+is in the M (modified) state.
+.It Li L1D_WB_L2.MESI
+.Pq Event 28H , Umask 0FH
+Counts all L1 writebacks to the L2.
+.It Li L3_LAT_CACHE.REFERENCE
+.Pq Event 2EH , Umask 4FH
+This event counts requests originating from the core that reference a cache
+line in the last level cache. The event count includes speculative traffic
+but excludes cache line fills due to a L2 hardware-prefetch. Because cache
+hierarchy, cache sizes and other implementation-specific characteristics
+differ between processors, comparing values to estimate performance
+differences is not recommended.
+See Table A-1.
+.It Li L3_LAT_CACHE.MISS
+.Pq Event 2EH , Umask 41H
+This event counts each cache miss condition for references to the last level
+cache. The event count may include speculative traffic but excludes cache
+line fills due to L2 hardware-prefetches. Because cache hierarchy, cache
+sizes and other implementation-specific characteristics differ between
+processors, comparing values to estimate performance differences is not
+recommended.
+See Table A-1.
+.It Li CPU_CLK_UNHALTED.THREAD_P
+.Pq Event 3CH , Umask 00H
+Counts the number of thread cycles while the thread is not in a halt state.
+The thread enters the halt state when it is running the HLT instruction. The
+core frequency may change from time to time due to power or thermal
+throttling.
+See Table A-1.
+.It Li CPU_CLK_UNHALTED.REF_P
+.Pq Event 3CH , Umask 01H
+Increments at the frequency of TSC when not halted.
+See Table A-1.
+.It Li L1D_CACHE_LD.I_STATE
+.Pq Event 40H , Umask 01H
+Counts L1 data cache read requests where the cache line to be loaded is in
+the I (invalid) state, i.e. the read request missed the cache.
+Counter 0, 1 only
+.It Li L1D_CACHE_LD.S_STATE
+.Pq Event 40H , Umask 02H
+Counts L1 data cache read requests where the cache line to be loaded is in
+the S (shared) state.
+Counter 0, 1 only
+.It Li L1D_CACHE_LD.E_STATE
+.Pq Event 40H , Umask 04H
+Counts L1 data cache read requests where the cache line to be loaded is in
+the E (exclusive) state.
+Counter 0, 1 only
+.It Li L1D_CACHE_LD.M_STATE
+.Pq Event 40H , Umask 08H
+Counts L1 data cache read requests where the cache line to be loaded is in
+the M (modified) state.
+Counter 0, 1 only
+.It Li L1D_CACHE_LD.MESI
+.Pq Event 40H , Umask 0FH
+Counts L1 data cache read requests.
+Counter 0, 1 only
+.It Li L1D_CACHE_ST.S_STATE
+.Pq Event 41H , Umask 02H
+Counts L1 data cache store RFO requests where the cache line to be loaded is
+in the S (shared) state.
+Counter 0, 1 only
+.It Li L1D_CACHE_ST.E_STATE
+.Pq Event 41H , Umask 04H
+Counts L1 data cache store RFO requests where the cache line to be loaded is
+in the E (exclusive) state.
+Counter 0, 1 only
+.It Li L1D_CACHE_ST.M_STATE
+.Pq Event 41H , Umask 08H
+Counts L1 data cache store RFO requests where cache line to be loaded is in
+the M (modified) state.
+Counter 0, 1 only
+.It Li L1D_CACHE_LOCK.HIT
+.Pq Event 42H , Umask 01H
+Counts retired load locks that hit in the L1 data cache or hit in an already
+allocated fill buffer. The lock portion of the load lock transaction must
+hit in the L1D.
+The initial load will pull the lock into the L1 data cache. Counter 0, 1
+only
+.It Li L1D_CACHE_LOCK.S_STATE
+.Pq Event 42H , Umask 02H
+Counts L1 data cache retired load locks that hit the target cache line in
+the shared state.
+Counter 0, 1 only
+.It Li L1D_CACHE_LOCK.E_STATE
+.Pq Event 42H , Umask 04H
+Counts L1 data cache retired load locks that hit the target cache line in
+the exclusive state.
+Counter 0, 1 only
+.It Li L1D_CACHE_LOCK.M_STATE
+.Pq Event 42H , Umask 08H
+Counts L1 data cache retired load locks that hit the target cache line in
+the modified state.
+Counter 0, 1 only
+.It Li L1D_ALL_REF.ANY
+.Pq Event 43H , Umask 01H
+Counts all references (uncached, speculated and retired) to the L1 data
+cache, including all loads and stores with any memory types. The event
+counts memory accesses only when they are actually performed. For example, a
+load blocked by unknown store address and later performed is only counted
+once.
+The event does not include non-memory accesses, such as I/O accesses.
+Counter 0, 1 only
+.It Li L1D_ALL_REF.CACHEABLE
+.Pq Event 43H , Umask 02H
+Counts all data reads and writes (speculated and retired) from cacheable
+memory, including locked operations.
+Counter 0, 1 only
+.It Li L1D_PEND_MISS.LOAD_BUFFERS_FULL
+.Pq Event 48H , Umask 02H
+Counts cycles of L1 data cache load fill buffers full.
+Counter 0, 1 only
+.It Li DTLB_MISSES.ANY
+.Pq Event 49H , Umask 01H
+Counts the number of misses in the STLB which cause a page walk.
+.It Li DTLB_MISSES.WALK_COMPLETED
+.Pq Event 49H , Umask 02H
+Counts number of misses in the STLB which resulted in a completed page walk.
+.It Li DTLB_MISSES.STLB_HIT
+.Pq Event 49H , Umask 10H
+Counts the number of DTLB first level misses that hit in the second level
+TLB. This event is only relevant if the core contains multiple DTLB levels.
+.It Li LOAD_HIT_PRE
+.Pq Event 4CH , Umask 01H
+Counts load operations sent to the L1 data cache while a previous SSE
+prefetch instruction to the same cache line has started prefetching but has
+not yet finished.
+.It Li L1D_PREFETCH.REQUESTS
+.Pq Event 4EH , Umask 01H
+Counts number of hardware prefetch requests dispatched out of the prefetch
+FIFO.
+.It Li L1D_PREFETCH.MISS
+.Pq Event 4EH , Umask 02H
+Counts number of hardware prefetch requests that miss the L1D. There are two
+prefetchers in the L1D: a streamer, which predicts that lines sequentially
+after this one should be fetched, and the IP prefetcher, which remembers
+access patterns for the current instruction. The streamer prefetcher stops
+on an L1D hit, while the IP prefetcher does not.
+.It Li L1D_PREFETCH.TRIGGERS
+.Pq Event 4EH , Umask 04H
+Counts number of prefetch requests triggered by the Finite State Machine and
+pushed into the prefetch FIFO. Some of the prefetch requests are dropped due
+to overwrites or competition between the IP index prefetcher and streamer
+prefetcher. The prefetch FIFO contains 4 entries.
+.It Li L1D.REPL
+.Pq Event 51H , Umask 01H
+Counts the number of lines brought into the L1 data cache.
+Counter 0, 1 only
+.It Li L1D.M_REPL
+.Pq Event 51H , Umask 02H
+Counts the number of modified lines brought into the L1 data cache.
+Counter 0, 1 only
+.It Li L1D.M_EVICT
+.Pq Event 51H , Umask 04H
+Counts the number of modified lines evicted from the L1 data cache due to
+replacement.
+Counter 0, 1 only
+.It Li L1D.M_SNOOP_EVICT
+.Pq Event 51H , Umask 08H
+Counts the number of modified lines evicted from the L1 data cache due to
+snoop HITM intervention.
+Counter 0, 1 only
+.It Li L1D_CACHE_PREFETCH_LOCK_FB_HIT
+.Pq Event 52H , Umask 01H
+Counts the number of cacheable load lock speculated instructions accepted
+into the fill buffer.
+.It Li L1D_CACHE_LOCK_FB_HIT
+.Pq Event 53H , Umask 01H
+Counts the number of cacheable load lock speculated or retired instructions
+accepted into the fill buffer.
+.It Li CACHE_LOCK_CYCLES.L1D_L2
+.Pq Event 63H , Umask 01H
+Cycle count during which the L1D and L2 are locked. A lock is asserted when
+there is a locked memory access, due to uncacheable memory, a locked
+operation that spans two cache lines, or a page walk from an uncacheable
+page table.
+Counter 0, 1 only. L1D and L2 locks have a very high performance penalty and
+it is highly recommended to avoid such accesses.
+.It Li CACHE_LOCK_CYCLES.L1D
+.Pq Event 63H , Umask 02H
+Counts the number of cycles during which a cacheline in the L1 data cache
+unit is locked.
+Counter 0, 1 only.
+.It Li IO_TRANSACTIONS
+.Pq Event 6CH , Umask 01H
+Counts the number of completed I/O transactions.
+.It Li L1I.HITS
+.Pq Event 80H , Umask 01H
+Counts all instruction fetches that hit the L1 instruction cache.
+.It Li L1I.MISSES
+.Pq Event 80H , Umask 02H
+Counts all instruction fetches that miss the L1I cache. This includes
+instruction cache misses, streaming buffer misses, victim cache misses and
+uncacheable fetches. An instruction fetch miss is counted only once and not
+once for every cycle it is outstanding.
+.It Li L1I.READS
+.Pq Event 80H , Umask 03H
+Counts all instruction fetches, including uncacheable fetches that bypass
+the L1I.
+.It Li L1I.CYCLES_STALLED
+.Pq Event 80H , Umask 04H
+Cycle counts for which an instruction fetch stalls due to a L1I cache miss,
+ITLB miss or ITLB fault.
+.It Li LARGE_ITLB.HIT
+.Pq Event 82H , Umask 01H
+Counts number of large ITLB hits.
+.It Li ITLB_MISSES.ANY
+.Pq Event 85H , Umask 01H
+Counts the number of misses in all levels of the ITLB which cause a page
+walk.
+.It Li ITLB_MISSES.WALK_COMPLETED
+.Pq Event 85H , Umask 02H
+Counts number of misses in all levels of the ITLB which resulted in a
+completed page walk.
+.It Li ILD_STALL.LCP
+.Pq Event 87H , Umask 01H
+Cycles Instruction Length Decoder stalls due to length changing prefixes:
+66, 67 or REX.W (for EM64T) instructions which change the length of the
+decoded instruction.
+.It Li ILD_STALL.MRU
+.Pq Event 87H , Umask 02H
+Instruction Length Decoder stall cycles due to Branch Prediction Unit (BPU)
+Most Recently Used (MRU) bypass.
+.It Li ILD_STALL.IQ_FULL
+.Pq Event 87H , Umask 04H
+Stall cycles due to a full instruction queue.
+.It Li ILD_STALL.REGEN
+.Pq Event 87H , Umask 08H
+Counts the number of regen stalls.
+.It Li ILD_STALL.ANY
+.Pq Event 87H , Umask 0FH
+Counts any cycles the Instruction Length Decoder is stalled.
+.It Li BR_INST_EXEC.COND
+.Pq Event 88H , Umask 01H
+Counts the number of conditional near branch instructions executed, but not
+necessarily retired.
+.It Li BR_INST_EXEC.DIRECT
+.Pq Event 88H , Umask 02H
+Counts all unconditional near branch instructions excluding calls and
+indirect branches.
+.It Li BR_INST_EXEC.INDIRECT_NON_CALL
+.Pq Event 88H , Umask 04H
+Counts the number of executed indirect near branch instructions that are not
+calls.
+.It Li BR_INST_EXEC.NON_CALLS
+.Pq Event 88H , Umask 07H
+Counts all non call near branch instructions executed, but not necessarily
+retired.
+.It Li BR_INST_EXEC.RETURN_NEAR
+.Pq Event 88H , Umask 08H
+Counts indirect near branches that have a return mnemonic.
+.It Li BR_INST_EXEC.DIRECT_NEAR_CALL
+.Pq Event 88H , Umask 10H
+Counts unconditional near call branch instructions, excluding non call
+branch, executed.
+.It Li BR_INST_EXEC.INDIRECT_NEAR_CALL
+.Pq Event 88H , Umask 20H
+Counts indirect near calls, including both register and memory indirect,
+executed.
+.It Li BR_INST_EXEC.NEAR_CALLS
+.Pq Event 88H , Umask 30H
+Counts all near call branches executed, but not necessarily retired.
+.It Li BR_INST_EXEC.TAKEN
+.Pq Event 88H , Umask 40H
+Counts taken near branches executed, but not necessarily retired.
+.It Li BR_INST_EXEC.ANY
+.Pq Event 88H , Umask 7FH
+Counts all near executed branches (not necessarily retired). This includes
+only instructions and not micro-op branches. Frequent branching is not
+necessarily a major performance issue. However frequent branch
+mispredictions may be a problem.
+.It Li BR_MISP_EXEC.COND
+.Pq Event 89H , Umask 01H
+Counts the number of mispredicted conditional near branch instructions
+executed, but not necessarily retired.
+.It Li BR_MISP_EXEC.DIRECT
+.Pq Event 89H , Umask 02H
+Counts mispredicted macro unconditional near branch instructions, excluding
+calls and indirect branches (should always be 0).
+.It Li BR_MISP_EXEC.INDIRECT_NON_CALL
+.Pq Event 89H , Umask 04H
+Counts the number of executed mispredicted indirect near branch instructions
+that are not calls.
+.It Li BR_MISP_EXEC.NON_CALLS
+.Pq Event 89H , Umask 07H
+Counts mispredicted non call near branches executed, but not necessarily
+retired.
+.It Li BR_MISP_EXEC.RETURN_NEAR
+.Pq Event 89H , Umask 08H
+Counts mispredicted indirect branches that have a near return mnemonic.
+.It Li BR_MISP_EXEC.DIRECT_NEAR_CALL
+.Pq Event 89H , Umask 10H
+Counts mispredicted non-indirect near calls executed (should always be 0).
+.It Li BR_MISP_EXEC.INDIRECT_NEAR_CALL
+.Pq Event 89H , Umask 20H
+Counts mispredicted indirect near calls executed, including both register
+and memory indirect.
+.It Li BR_MISP_EXEC.NEAR_CALLS
+.Pq Event 89H , Umask 30H
+Counts all mispredicted near call branches executed, but not necessarily
+retired.
+.It Li BR_MISP_EXEC.TAKEN
+.Pq Event 89H , Umask 40H
+Counts executed mispredicted near branches that are taken, but not
+necessarily retired.
+.It Li BR_MISP_EXEC.ANY
+.Pq Event 89H , Umask 7FH
+Counts the number of mispredicted near branch instructions that were
+executed, but not necessarily retired.
+.It Li RESOURCE_STALLS.ANY
+.Pq Event A2H , Umask 01H
+Counts the number of Allocator resource related stalls. Includes stalls on
+register renaming buffer entries and memory buffer entries. In addition to
+resource related stalls, this event counts some other events, including
+stalls arising during branch misprediction recovery (for example, when
+retirement of the mispredicted branch is delayed) and stalls arising while
+the store buffer is draining because of synchronizing operations.
+Does not include stalls due to SuperQ (off core) queue full, too many cache
+misses, etc.
+.It Li RESOURCE_STALLS.LOAD
+.Pq Event A2H , Umask 02H
+Counts the cycles of stall due to lack of load buffer for load operation.
+.It Li RESOURCE_STALLS.RS_FULL
+.Pq Event A2H , Umask 04H
+This event counts the number of cycles when the number of instructions in
+the pipeline waiting for execution reaches the limit the processor can
+handle. A high count of this event indicates that there are long latency
+operations in the pipe (possibly load and store operations that miss the L2
+cache, or instructions dependent upon instructions further down the pipeline
+that have yet to retire).
+When RS is full, new instructions can not enter the reservation station and
+start execution.
+.It Li RESOURCE_STALLS.STORE
+.Pq Event A2H , Umask 08H
+This event counts the number of cycles that a resource related stall will
+occur due to the number of store instructions reaching the limit of the
+pipeline (i.e., all store buffers are used). The stall ends when a store
+instruction commits its data to the cache or memory.
+.It Li RESOURCE_STALLS.ROB_FULL
+.Pq Event A2H , Umask 10H
+Counts the cycles of stall due to the re-order buffer being full.
+.It Li RESOURCE_STALLS.FPCW
+.Pq Event A2H , Umask 20H
+Counts the number of cycles while execution was stalled due to writing the
+floating-point unit (FPU) control word.
+.It Li RESOURCE_STALLS.MXCSR
+.Pq Event A2H , Umask 40H
+Stalls due to the MXCSR register rename occurring too close to a previous
+MXCSR rename. The MXCSR provides control and status for the MMX registers.
+.It Li RESOURCE_STALLS.OTHER
+.Pq Event A2H , Umask 80H
+Counts the number of cycles while execution was stalled due to other
+resource issues.
+.It Li MACRO_INSTS.FUSIONS_DECODED
+.Pq Event A6H , Umask 01H
+Counts the number of instructions decoded that are macro-fused but not
+necessarily executed or retired.
+.It Li BACLEAR_FORCE_IQ
+.Pq Event A7H , Umask 01H
+Counts number of times a BACLEAR was forced by the Instruction Queue. The IQ
+is also responsible for providing conditional branch prediction direction
+based on a static scheme and dynamic data provided by the L2 Branch
+Prediction Unit. If the conditional branch target is not found in the Target
+Array and the IQ predicts that the branch is taken, then the IQ will force
+the Branch Address Calculator to issue a BACLEAR. Each BACLEAR asserted by
+the BAC generates approximately an 8 cycle bubble in the instruction fetch
+pipeline.
+.It Li LSD.UOPS
+.Pq Event A8H , Umask 01H
+Counts the number of micro-ops delivered by the loop stream detector.
+Use cmask=1 and invert to count cycles.
+.It Li ITLB_FLUSH
+.Pq Event AEH , Umask 01H
+Counts the number of ITLB flushes.
+.It Li OFFCORE_REQUESTS.L1D_WRITEBACK
+.Pq Event B0H , Umask 40H
+Counts number of L1D writebacks to the uncore.
+.It Li UOPS_EXECUTED.PORT0
+.Pq Event B1H , Umask 01H
+Counts number of Uops executed that were issued on port 0. Port 0 handles
+integer arithmetic, SIMD and FP add Uops.
+.It Li UOPS_EXECUTED.PORT1
+.Pq Event B1H , Umask 02H
+Counts number of Uops executed that were issued on port 1. Port 1 handles
+integer arithmetic, SIMD, integer shift, FP multiply and FP divide Uops.
+.It Li UOPS_EXECUTED.PORT2_CORE
+.Pq Event B1H , Umask 04H
+Counts number of Uops executed that were issued on port 2. Port 2 handles
+the load Uops. This is a core count only and can not be collected per
+thread.
+.It Li UOPS_EXECUTED.PORT3_CORE
+.Pq Event B1H , Umask 08H
+Counts number of Uops executed that were issued on port 3. Port 3 handles
+store Uops. This is a core count only and can not be collected per thread.
+.It Li UOPS_EXECUTED.PORT4_CORE
+.Pq Event B1H , Umask 10H
+Counts number of Uops executed that were issued on port 4. Port 4 handles
+the value to be stored for the store Uops issued on port 3. This is a core
+count only and can not be collected per thread.
+.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5
+.Pq Event B1H , Umask 1FH
+Counts cycles when the Uops executed were issued from any ports except port
+5. Use Cmask=1 for active cycles; Cmask=0 for weighted cycles.
+Use Cmask=1, Invert=1 to count P0-4 stalled cycles.
+Use Cmask=1, Edge=1, Invert=1 to count P0-4 stalls.
+.It Li UOPS_EXECUTED.PORT5
+.Pq Event B1H , Umask 20H
+Counts number of Uops executed that were issued on port 5.
+.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES
+.Pq Event B1H , Umask 3FH
+Counts cycles when the Uops are executing. Use Cmask=1 for active cycles;
+Cmask=0 for weighted cycles.
+Use Cmask=1, Invert=1 to count P0-4 stalled cycles.
+Use Cmask=1, Edge=1, Invert=1 to count P0-4 stalls.
+.It Li UOPS_EXECUTED.PORT015
+.Pq Event B1H , Umask 40H
+Counts number of Uops executed that were issued on port 0, 1, or 5.
+Use cmask=1, invert=1 to count stall cycles.
+.It Li UOPS_EXECUTED.PORT234
+.Pq Event B1H , Umask 80H
+Counts number of Uops executed that were issued on port 2, 3, or 4.
+.It Li OFFCORE_REQUESTS_SQ_FULL
+.Pq Event B2H , Umask 01H
+Counts number of cycles the SQ is full to handle off-core requests.
+.It Li OFF_CORE_RESPONSE_0
+.Pq Event B7H , Umask 01H
+See Section 30.6.1.3, Off-core Response Performance Monitoring in the
+Processor Core.
+Requires programming MSR 01A6H.
+.It Li SNOOP_RESPONSE.HIT
+.Pq Event B8H , Umask 01H
+Counts HIT snoop response sent by this thread in response to a snoop
+request.
+.It Li SNOOP_RESPONSE.HITE
+.Pq Event B8H , Umask 02H
+Counts HIT E snoop response sent by this thread in response to a snoop
+request.
+.It Li SNOOP_RESPONSE.HITM
+.Pq Event B8H , Umask 04H
+Counts HIT M snoop response sent by this thread in response to a snoop
+request.
+.It Li OFF_CORE_RESPONSE_1
+.Pq Event BBH , Umask 01H
+See Section 30.6.1.3, Off-core Response Performance Monitoring in the
+Processor Core.
+Requires programming MSR 01A7H.
+.It Li INST_RETIRED.ANY_P
+.Pq Event C0H , Umask 01H
+See Table A-1
+Notes: INST_RETIRED.ANY is counted by a designated fixed counter.
+INST_RETIRED.ANY_P is counted by a programmable counter and is an
+architectural performance event. Event is supported if CPUID.A.EBX[1] = 0.
+Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not
+count as retired instructions.
+.It Li INST_RETIRED.X87
+.Pq Event C0H , Umask 02H
+Counts the number of floating point computational operations retired:
+floating point computational operations executed by the assist handler and
+sub-operations of complex floating point instructions like transcendental
+instructions.
+.It Li INST_RETIRED.MMX
+.Pq Event C0H , Umask 04H
+Counts the number of MMX instructions retired.
+.It Li UOPS_RETIRED.ANY
+.Pq Event C2H , Umask 01H
+Counts the number of micro-ops retired (macro-fused=1, micro-fused=2,
+others=1; maximum count of 8 per cycle). Most instructions are composed of
+one or two micro-ops. Some instructions are decoded into longer sequences
+such as repeat instructions, floating point transcendental instructions, and
+assists.
+Use cmask=1 and invert to count active cycles or stalled cycles.
+.It Li UOPS_RETIRED.RETIRE_SLOTS
+.Pq Event C2H , Umask 02H
+Counts the number of retirement slots used each cycle.
+.It Li UOPS_RETIRED.MACRO_FUSED
+.Pq Event C2H , Umask 04H
+Counts number of macro-fused uops retired.
+.It Li MACHINE_CLEARS.CYCLES
+.Pq Event C3H , Umask 01H
+Counts the cycles machine clear is asserted.
+.It Li MACHINE_CLEARS.MEM_ORDER
+.Pq Event C3H , Umask 02H
+Counts the number of machine clears due to memory order conflicts.
+.It Li MACHINE_CLEARS.SMC
+.Pq Event C3H , Umask 04H
+Counts the number of times that a program writes to a code section.
+Self-modifying code causes a severe penalty in all Intel 64 and IA-32
+processors. The modified cache line is written back to the L2 and L3 caches.
+.It Li BR_INST_RETIRED.ALL_BRANCHES
+.Pq Event C4H , Umask 00H
+See Table A-1
+.It Li BR_INST_RETIRED.CONDITIONAL
+.Pq Event C4H , Umask 01H
+Counts the number of conditional branch instructions retired.
+.It Li BR_INST_RETIRED.NEAR_CALL
+.Pq Event C4H , Umask 02H
+Counts the number of direct and indirect near unconditional calls retired.
+.It Li BR_INST_RETIRED.ALL_BRANCHES
+.Pq Event C4H , Umask 04H
+Counts the number of branch instructions retired.
+.It Li BR_MISP_RETIRED.ALL_BRANCHES
+.Pq Event C5H , Umask 00H
+See Table A-1
+.It Li BR_MISP_RETIRED.NEAR_CALL
+.Pq Event C5H , Umask 02H
+Counts mispredicted direct and indirect near unconditional retired calls.
+.It Li SSEX_UOPS_RETIRED.PACKED_SINGLE
+.Pq Event C7H , Umask 01H
+Counts SIMD packed single-precision floating point Uops retired.
+.It Li SSEX_UOPS_RETIRED.SCALAR_SINGLE
+.Pq Event C7H , Umask 02H
+Counts SIMD scalar single-precision floating point Uops retired.
+.It Li SSEX_UOPS_RETIRED.PACKED_DOUBLE
+.Pq Event C7H , Umask 04H
+Counts SIMD packed double-precision floating point Uops retired.
+.It Li SSEX_UOPS_RETIRED.SCALAR_DOUBLE
+.Pq Event C7H , Umask 08H
+Counts SIMD scalar double-precision floating point Uops retired.
+.It Li SSEX_UOPS_RETIRED.VECTOR_INTEGER
+.Pq Event C7H , Umask 10H
+Counts 128-bit SIMD vector integer Uops retired.
+.It Li ITLB_MISS_RETIRED
+.Pq Event C8H , Umask 20H
+Counts the number of retired instructions that missed the ITLB when the
+instruction was fetched.
+.It Li MEM_LOAD_RETIRED.L1D_HIT
+.Pq Event CBH , Umask 01H
+Counts number of retired loads that hit the L1 data cache.
+.It Li MEM_LOAD_RETIRED.L2_HIT
+.Pq Event CBH , Umask 02H
+Counts number of retired loads that hit the L2 data cache.
+.It Li MEM_LOAD_RETIRED.L3_UNSHARED_HIT
+.Pq Event CBH , Umask 04H
+Counts number of retired loads that hit their own, unshared lines in the L3
+cache.
+.It Li MEM_LOAD_RETIRED.OTHER_CORE_L2_HIT_HITM
+.Pq Event CBH , Umask 08H
+Counts number of retired loads that hit in a sibling core's L2 (on die
+core). Since the L3 is inclusive of all cores on the package, this is an L3
+hit. This counts both clean or modified hits.
+.It Li MEM_LOAD_RETIRED.L3_MISS
+.Pq Event CBH , Umask 10H
+Counts number of retired loads that miss the L3 cache. The load was
+satisfied by a remote socket, local memory or an IOH.
+.It Li MEM_LOAD_RETIRED.HIT_LFB
+.Pq Event CBH , Umask 40H
+Counts number of retired loads that miss the L1D and the address is located
+in an allocated line fill buffer and will soon be committed to cache. This
+is counting secondary L1D misses.
+.It Li MEM_LOAD_RETIRED.DTLB_MISS
+.Pq Event CBH , Umask 80H
+Counts the number of retired loads that missed the DTLB. The DTLB miss is
+not counted if the load operation causes a fault. This event counts loads
+from cacheable memory only. The event does not count loads by software
+prefetches. Counts both primary and secondary misses to the TLB.
+.It Li FP_MMX_TRANS.TO_FP
+.Pq Event CCH , Umask 01H
+Counts the first floating-point instruction following any MMX instruction.
+You can use this event to estimate the penalties for the transitions between
+floating-point and MMX technology states.
+.It Li FP_MMX_TRANS.TO_MMX
+.Pq Event CCH , Umask 02H
+Counts the first MMX instruction following a floating-point instruction. You
+can use this event to estimate the penalties for the transitions between
+floating-point and MMX technology states.
+.It Li FP_MMX_TRANS.ANY
+.Pq Event CCH , Umask 03H
+Counts all transitions from floating point to MMX instructions and from MMX
+instructions to floating point instructions. You can use this event to
+estimate the penalties for the transitions between floating-point and MMX
+technology states.
+.It Li MACRO_INSTS.DECODED
+.Pq Event D0H , Umask 01H
+Counts the number of instructions decoded (but not necessarily executed or
+retired).
+.It Li UOPS_DECODED.MS
+.Pq Event D1H , Umask 02H
+Counts the number of Uops decoded by the Microcode Sequencer, MS. The MS
+delivers uops when the instruction is more than 4 uops long or a microcode
+assist is occurring.
+.It Li UOPS_DECODED.ESP_FOLDING
+.Pq Event D1H , Umask 04H
+Counts number of stack pointer (ESP) instructions decoded: push, pop, call,
+ret, etc. ESP instructions do not generate a Uop to increment or decrement
+ESP. Instead, they update an ESP_Offset register that keeps track of the
+delta to the current value of the ESP register.
+.It Li UOPS_DECODED.ESP_SYNC
+.Pq Event D1H , Umask 08H
+Counts number of stack pointer (ESP) sync operations where an ESP
+instruction is corrected by adding the ESP offset register to the current
+value of the ESP register.
+.It Li RAT_STALLS.FLAGS
+.Pq Event D2H , Umask 01H
+Counts the number of cycles during which execution stalled due to several
+reasons, one of which is a partial flag register stall. A partial register
+stall may occur when two conditions are met: 1) an instruction modifies
+some, but not all, of the flags in the flag register and 2) the next
+instruction, which depends on flags, depends on flags that were not modified
+by this instruction.
+.It Li RAT_STALLS.REGISTERS
+.Pq Event D2H , Umask 02H
+This event counts the number of cycles instruction execution latency became
+longer than the defined latency because the instruction used a register that
+was partially written by previous instruction.
+.It Li RAT_STALLS.ROB_READ_PORT
+.Pq Event D2H , Umask 04H
+Counts the number of cycles when ROB read port stalls occurred, which did
+not allow new micro-ops to enter the out-of-order pipeline. Note that, at
+this stage in the pipeline, additional stalls may occur at the same cycle
+and prevent the stalled micro-ops from entering the pipe. In such a case,
+micro-ops retry entering the execution pipe in the next cycle and the
+ROB-read port stall is counted again.
+.It Li RAT_STALLS.SCOREBOARD
+.Pq Event D2H , Umask 08H
+Counts the cycles where we stall due to microarchitecturally required
+serialization. Microcode scoreboarding stalls.
+.It Li RAT_STALLS.ANY
+.Pq Event D2H , Umask 0FH
+Counts all Register Allocation Table stall cycles due to: cycles when ROB
+read port stalls occurred, which did not allow new micro-ops to enter the
+execution pipe; cycles when partial register stalls occurred; cycles when
+flag stalls occurred; and cycles when floating-point unit (FPU) status word
+stalls occurred. To count each of these conditions separately use the events:
+RAT_STALLS.ROB_READ_PORT, RAT_STALLS.PARTIAL, RAT_STALLS.FLAGS, and
+RAT_STALLS.FPSW.
+.It Li SEG_RENAME_STALLS
+.Pq Event D4H , Umask 01H
+Counts the number of stall cycles due to the lack of renaming resources for
+the ES, DS, FS, and GS segment registers. If a segment is renamed but not
+retired and a second update to the same segment occurs, a stall occurs in
+the front-end of the pipeline until the renamed segment retires.
+.It Li ES_REG_RENAMES
+.Pq Event D5H , Umask 01H
+Counts the number of times the ES segment register is renamed.
+.It Li UOP_UNFUSION
+.Pq Event DBH , Umask 01H
+Counts unfusion events due to floating point exception to a fused uop.
+.It Li BR_INST_DECODED
+.Pq Event E0H , Umask 01H
+Counts the number of branch instructions decoded.
+.It Li BPU_MISSED_CALL_RET
+.Pq Event E5H , Umask 01H
+Counts number of times the Branch Prediction Unit missed predicting a call
+or return branch.
+.It Li BACLEAR.CLEAR
+.Pq Event E6H , Umask 01H
+Counts the number of times the front end is resteered, mainly when the
+Branch Prediction Unit cannot provide a correct prediction and this is
+corrected by the Branch Address Calculator at the front end. This can occur
+if the code has many branches such that they cannot be consumed by the BPU.
+Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble
+in the instruction fetch pipeline. The effect on total execution time
+depends on the surrounding code.
+.It Li BACLEAR.BAD_TARGET
+.Pq Event E6H , Umask 02H
+Counts number of Branch Address Calculator clears (BACLEAR) asserted due to
+conditional branch instructions in which there was a target hit but the
+direction was wrong. Each BACLEAR asserted by the BAC generates
+approximately an 8 cycle bubble in the instruction fetch pipeline.
+.It Li BPU_CLEARS.EARLY
+.Pq Event E8H , Umask 01H
+Counts early (normal) Branch Prediction Unit clears: BPU predicted a taken
+branch after incorrectly assuming that it was not taken.
+The BPU clear leads to a 2 cycle bubble in the Front End.
+.It Li BPU_CLEARS.LATE
+.Pq Event E8H , Umask 02H
+Counts late Branch Prediction Unit clears due to Most Recently Used
+conflicts. The BPU clear leads to a 3 cycle bubble in the Front End.
+.It Li BPU_CLEARS.ANY
+.Pq Event E8H , Umask 03H
+Counts all BPU clears.
+.It Li L2_TRANSACTIONS.LOAD
+.Pq Event F0H , Umask 01H
+Counts L2 load operations due to HW prefetch or demand loads.
+.It Li L2_TRANSACTIONS.RFO
+.Pq Event F0H , Umask 02H
+Counts L2 RFO operations due to HW prefetch or demand RFOs.
+.It Li L2_TRANSACTIONS.IFETCH
+.Pq Event F0H , Umask 04H
+Counts L2 instruction fetch operations due to HW prefetch or demand ifetch.
+.It Li L2_TRANSACTIONS.PREFETCH
+.Pq Event F0H , Umask 08H
+Counts L2 prefetch operations.
+.It Li L2_TRANSACTIONS.L1D_WB
+.Pq Event F0H , Umask 10H
+Counts L1D writeback operations to the L2.
+.It Li L2_TRANSACTIONS.FILL
+.Pq Event F0H , Umask 20H
+Counts L2 cache line fill operations due to load, RFO, L1D writeback or
+prefetch.
+.It Li L2_TRANSACTIONS.WB
+.Pq Event F0H , Umask 40H
+Counts L2 writeback operations to the L3.
+.It Li L2_TRANSACTIONS.ANY
+.Pq Event F0H , Umask 80H
+Counts all L2 cache operations.
+.It Li L2_LINES_IN.S_STATE
+.Pq Event F1H , Umask 02H
+Counts the number of cache lines allocated in the L2 cache in the S (shared)
+state.
+.It Li L2_LINES_IN.E_STATE
+.Pq Event F1H , Umask 04H
+Counts the number of cache lines allocated in the L2 cache in the E
+(exclusive) state.
+.It Li L2_LINES_IN.ANY
+.Pq Event F1H , Umask 07H
+Counts the number of cache lines allocated in the L2 cache.
+.It Li L2_LINES_OUT.DEMAND_CLEAN
+.Pq Event F2H , Umask 01H
+Counts L2 clean cache lines evicted by a demand request.
+.It Li L2_LINES_OUT.DEMAND_DIRTY
+.Pq Event F2H , Umask 02H
+Counts L2 dirty (modified) cache lines evicted by a demand request.
+.It Li L2_LINES_OUT.PREFETCH_CLEAN
+.Pq Event F2H , Umask 04H
+Counts L2 clean cache line evicted by a prefetch request.
+.It Li L2_LINES_OUT.PREFETCH_DIRTY
+.Pq Event F2H , Umask 08H
+Counts L2 modified cache line evicted by a prefetch request.
+.It Li L2_LINES_OUT.ANY
+.Pq Event F2H , Umask 0FH
+Counts all L2 cache lines evicted for any reason.
+.It Li SQ_MISC.SPLIT_LOCK
+.Pq Event F4H , Umask 10H
+Counts the number of SQ lock splits across a cache line.
+.It Li SQ_FULL_STALL_CYCLES
+.Pq Event F6H , Umask 01H
+Counts cycles the Super Queue is full. Neither of the threads on this core
+will be able to access the uncore.
+.It Li FP_ASSIST.ALL
+.Pq Event F7H , Umask 01H
+Counts the number of floating point operations executed that required
+micro-code assist intervention. Assists are required in the following cases:
+SSE instructions (denormal input when the DAZ flag is off, or underflow
+result when the FTZ flag is off) and x87 instructions (NaN or denormal
+loaded to a register or used as input from memory, division by 0, or
+underflow output).
+.It Li FP_ASSIST.OUTPUT
+.Pq Event F7H , Umask 02H
+Counts number of floating point micro-code assist when the output value
+(destination register) is invalid.
+.It Li FP_ASSIST.INPUT
+.Pq Event F7H , Umask 04H
+Counts number of floating point micro-code assist when the input value (one
+of the source operands to an FP instruction) is invalid.
+.It Li SIMD_INT_64.PACKED_MPY
+.Pq Event FDH , Umask 01H
+Counts number of SIMD integer 64 bit packed multiply operations.
+.It Li SIMD_INT_64.PACKED_SHIFT
+.Pq Event FDH , Umask 02H
+Counts number of SIMD integer 64 bit packed shift operations.
+.It Li SIMD_INT_64.PACK
+.Pq Event FDH , Umask 04H
+Counts number of SIMD integer 64 bit pack operations.
+.It Li SIMD_INT_64.UNPACK
+.Pq Event FDH , Umask 08H
+Counts number of SIMD integer 64 bit unpack operations.
+.It Li SIMD_INT_64.PACKED_LOGICAL
+.Pq Event FDH , Umask 10H
+Counts number of SIMD integer 64 bit logical operations.
+.It Li SIMD_INT_64.PACKED_ARITH
+.Pq Event FDH , Umask 20H
+Counts number of SIMD integer 64 bit arithmetic operations.
+.It Li SIMD_INT_64.SHUFFLE_MOVE
+.Pq Event FDH , Umask 40H
+Counts number of SIMD integer 64 bit shift or move operations.
+.El
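+.Pp
+Several entries above suggest combining an event with the cmask, invert and
+edge qualifiers.
+The sketch below shows how such a combination maps onto the architectural
+IA32_PERFEVTSELx bit layout.
+It is an illustration only and does not call
+.Lb libpmc ;
+the names
+.Li perfevtsel_encode
+and
+.Li EVTSEL_*
+are hypothetical and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Control bits of an IA32_PERFEVTSELx value (architectural layout). */
#define EVTSEL_USR   (1u << 16)  /* count while in user mode */
#define EVTSEL_OS    (1u << 17)  /* count while in kernel mode */
#define EVTSEL_EDGE  (1u << 18)  /* count transitions, not cycles */
#define EVTSEL_EN    (1u << 22)  /* enable the counter */
#define EVTSEL_INV   (1u << 23)  /* invert the cmask comparison */
#define EVTSEL_FLAGS \
    (EVTSEL_USR | EVTSEL_OS | EVTSEL_EDGE | EVTSEL_EN | EVTSEL_INV)

/*
 * Hypothetical helper: pack an event number, unit mask, counter mask
 * and control flags into a 32-bit event-select value.
 * Event select occupies bits 0-7, umask bits 8-15, cmask bits 24-31.
 */
static uint32_t
perfevtsel_encode(uint8_t event, uint8_t umask, uint8_t cmask,
    uint32_t flags)
{
    /* Reject bits this sketch does not know about. */
    assert((flags & ~EVTSEL_FLAGS) == 0);

    return ((uint32_t)event | ((uint32_t)umask << 8) | flags |
        ((uint32_t)cmask << 24));
}
```

+.Pp
+For example, UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5 (Event B1H, Umask
+1FH) with cmask=1 and invert set would be encoded by passing 0xB1, 0x1F, 1
+and the USR, OS, EN and INV flags.
+Adding the EDGE flag turns the stalled-cycle count into a count of stall
+occurrences.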
+.Ss Event Specifiers (Programmable PMCs)
+Core i7 and Xeon 5500 programmable PMCs support the following events, which
+were listed in the June 2009 document and removed in December 2009:
+.Bl -tag -width indent
+.It Li SB_FORWARD.ANY
+.Pq Event 02H , Umask 01H
+Counts the number of store forwards.
+.It Li LOAD_BLOCK.STD
+.Pq Event 03H , Umask 01H
+Counts the number of loads blocked by a preceding store with unknown data.
+.It Li LOAD_BLOCK.ADDRESS_OFFSET
+.Pq Event 03H , Umask 04H
+Counts the number of loads blocked by a preceding store address.
+.It Li LOAD_BLOCK.ADDRESS_OFFSET
+.Pq Event 01H , Umask 04H
+Counts the cycles of store buffer drains.
+.It Li MISALIGN_MEM_REF.LOAD
+.Pq Event 05H , Umask 01H
+Counts the number of misaligned load references.
+.It Li MISALIGN_MEM_REF.STORE
+.Pq Event 05H , Umask 02H
+Counts the number of misaligned store references.
+.It Li MISALIGN_MEM_REF.ANY
+.Pq Event 05H , Umask 03H
+Counts the number of misaligned memory references.
+.It Li STORE_BLOCKS.NOT_STA
+.Pq Event 06H , Umask 01H
+This event counts the number of load operations delayed by preceding
+stores whose addresses are known but whose data is unknown, and by preceding
+stores that conflict with the load but only partially overlap it.
+.It Li STORE_BLOCKS.STA
+.Pq Event 06H , Umask 02H
+This event counts load operations delayed by preceding stores whose
+addresses are unknown (STA block).
+.It Li STORE_BLOCKS.ANY
+.Pq Event 06H , Umask 0FH
+Counts all loads delayed due to store blocks.
+.It Li MEMORY_DISAMBIGURATION.RESET
+.Pq Event 09H , Umask 01H
+Counts memory disambiguation reset cycles.
+.It Li MEMORY_DISAMBIGURATION.SUCCESS
+.Pq Event 09H , Umask 02H
+Counts the number of loads for which memory disambiguation succeeded.
+.It Li MEMORY_DISAMBIGURATION.WATCHDOG
+.Pq Event 09H , Umask 04H
+Counts the number of times the memory disambiguation watchdog kicked in.
+.It Li MEMORY_DISAMBIGURATION.WATCH_CYCLES
+.Pq Event 09H , Umask 08H
+Counts the cycles that the memory disambiguation watchdog is active.
+Set invert=1, cmask=1.
+.It Li HW_INT.RCV
+.Pq Event 1DH , Umask 01H
+Number of interrupts received.
+.It Li HW_INT.CYCLES_MASKED
+.Pq Event 1DH , Umask 02H
+Number of cycles interrupts are masked.
+.It Li HW_INT.CYCLES_PENDING_AND_MASKED
+.Pq Event 1DH , Umask 04H
+Number of cycles interrupts are pending and masked.
+.It Li HW_INT.CYCLES_PENDING_AND_MASKED
+.Pq Event 04H , Umask 04H
+Counts number of L2 store RFO requests where the cache line to be loaded is
+in the E (exclusive) state. The L1D prefetcher does not issue a RFO
+prefetch.
+This is a demand RFO request
+.It Li HW_INT.CYCLES_PENDING_AND_MASKED
+.Pq Event 27H , Umask 04H
+LONGEST_LAT_CACHE.MISS
+.It Li UOPS_DECODED.DEC0
+.Pq Event 3DH , Umask 01H
+Counts micro-ops decoded by decoder 0.
+.It Li UOPS_DECODED.DEC0
+.Pq Event 01H , Umask 01H
+Counts L1 data cache store RFO requests where the cache line to be loaded is
+in the I state.
+Counter 0, 1 only
+.It Li L1D_CACHE_ST.MESI
+.Pq Event 41H , Umask 0FH
+Counts L1 data cache store RFO requests.
+Counter 0, 1 only.
+.It Li DTLB_MISSES.PDE_MISS
+.Pq Event 49H , Umask 20H
+Number of DTLB cache misses where the low part of the linear to physical
+address translation was missed.
+.It Li DTLB_MISSES.PDP_MISS
+.Pq Event 49H , Umask 40H
+Number of DTLB misses where the high part of the linear to physical address
+translation was missed.
+.It Li DTLB_MISSES.LARGE_WALK_COMPLETED
+.Pq Event 49H , Umask 80H
+Counts number of completed large page walks due to misses in the STLB.
+.It Li SSE_MEM_EXEC.NTA
+.Pq Event 4BH , Umask 01H
+Counts number of SSE NTA prefetch/weakly-ordered instructions which missed
+the L1 data cache.
+.It Li SSE_MEM_EXEC.STREAMING_STORES
+.Pq Event 4BH , Umask 08H
+Counts number of SSE non-temporal stores.
+.It Li SFENCE_CYCLES
+.Pq Event 4DH , Umask 01H
+Counts store fence cycles.
+.It Li EPT.EPDE_MISS
+.Pq Event 4FH , Umask 02H
+Counts Extended Page Directory Entry misses. The Extended Page Directory
+cache is used by Virtual Machine operating systems while the guest operating
+systems use the standard TLB caches.
+.It Li EPT.EPDPE_HIT
+.Pq Event 4FH , Umask 04H
+Counts Extended Page Directory Pointer Entry hits.
+.It Li EPT.EPDPE_MISS
+.Pq Event 4FH , Umask 08H
+Counts Extended Page Directory Pointer Entry misses.
+.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA
+.Pq Event 60H , Umask 01H
+Counts weighted cycles of offcore demand data read requests. Does not
+include L2 prefetch requests.
+Counter 0 only.
+.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE
+.Pq Event 60H , Umask 02H
+Counts weighted cycles of offcore demand code read requests. Does not
+include L2 prefetch requests.
+Counter 0 only.
+.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.RFO
+.Pq Event 60H , Umask 04H
+Counts weighted cycles of offcore demand RFO requests. Does not include L2
+prefetch requests.
+Counter 0 only.
+.It Li OFFCORE_REQUESTS_OUTSTANDING.ANY.READ
+.Pq Event 60H , Umask 08H
+Counts weighted cycles of offcore read requests of any kind. Includes L2
+prefetch requests.
+Counter 0 only.
+.It Li IFU_IVC.FULL
+.Pq Event 81H , Umask 01H
+Instruction Fetch Unit victim cache full.
+.It Li IFU_IVC.L1I_EVICTION
+.Pq Event 81H , Umask 02H
+L1 Instruction cache evictions.
+.It Li L1I_OPPORTUNISTIC_HITS
+.Pq Event 83H , Umask 01H
+Opportunistic hits in streaming.
+.It Li ITLB_MISSES.WALK_CYCLES
+.Pq Event 85H , Umask 04H
+Counts ITLB miss page walk cycles.
+.It Li ITLB_MISSES.PMH_BUSY_CYCLES
+.Pq Event 85H , Umask 04H
+Counts PMH busy cycles.
+.It Li ITLB_MISSES.STLB_HIT
+.Pq Event 85H , Umask 10H
+Counts the number of ITLB misses that hit in the second level TLB.
+.It Li ITLB_MISSES.PDE_MISS
+.Pq Event 85H , Umask 20H
+Number of ITLB misses where the low part of the linear to physical address
+translation was missed.
+.It Li ITLB_MISSES.PDP_MISS
+.Pq Event 85H , Umask 40H
+Number of ITLB misses where the high part of the linear to physical address
+translation was missed.
+.It Li ITLB_MISSES.LARGE_WALK_COMPLETED
+.Pq Event 85H , Umask 80H
+Counts number of completed large page walks due to misses in the STLB.
+.It Li ITLB_MISSES.LARGE_WALK_COMPLETED
+.Pq Event 01H , Umask 80H
+Counts number of offcore demand data read requests. Does not count L2
+prefetch requests.
+.It Li OFFCORE_REQUESTS.DEMAND.READ_CODE
+.Pq Event B0H , Umask 02H
+Counts number of offcore demand code read requests. Does not count L2
+prefetch requests.
+.It Li OFFCORE_REQUESTS.DEMAND.RFO
+.Pq Event B0H , Umask 04H
+Counts number of offcore demand RFO requests. Does not count L2 prefetch
+requests.
+.It Li OFFCORE_REQUESTS.ANY.READ
+.Pq Event B0H , Umask 08H
+Counts number of offcore read requests. Includes L2 prefetch requests.
+.It Li OFFCORE_REQUESTS.ANY.RFO
+.Pq Event B0H , Umask 10H
+Counts number of offcore RFO requests. Includes L2 prefetch requests.
+.It Li OFFCORE_REQUESTS.UNCACHED_MEM
+.Pq Event B0H , Umask 20H
+Counts number of offcore uncached memory requests.
+.It Li OFFCORE_REQUESTS.ANY
+.Pq Event B0H , Umask 80H
+Counts all offcore requests.
+.It Li SNOOPQ_REQUESTS_OUTSTANDING.DATA
+.Pq Event B3H , Umask 01H
+Counts weighted cycles of snoopq requests for data. Counter 0 only.
+Use cmask=1 to count cycles not empty.
+.It Li SNOOPQ_REQUESTS_OUTSTANDING.INVALIDATE
+.Pq Event B3H , Umask 02H
+Counts weighted cycles of snoopq invalidate requests. Counter 0 only.
+Use cmask=1 to count cycles not empty.
+.It Li SNOOPQ_REQUESTS_OUTSTANDING.CODE
+.Pq Event B3H , Umask 04H
+Counts weighted cycles of snoopq requests for code. Counter 0 only.
+Use cmask=1 to count cycles not empty.
+.It Li SNOOPQ_REQUESTS_OUTSTANDING.CODE
+.Pq Event BAH , Umask 04H
+Counts number of TPR reads
+.It Li PIC_ACCESSES.TPR_WRITES
+.Pq Event BAH , Umask 02H
+Counts number of TPR writes
+.It Li MACHINE_CLEARS.FUSION_ASSIST
+.Pq Event C3H , Umask 10H
+Counts the number of macro-fusion assists
+.It Li BOGUS_BR
+.Pq Event E4H , Umask 01H
+Counts the number of bogus branches.
+.It Li L2_HW_PREFETCH.HIT
+.Pq Event F3H , Umask 01H
+Counts L2 HW prefetcher detector hits.
+.It Li L2_HW_PREFETCH.ALLOC
+.Pq Event F3H , Umask 02H
+Counts L2 HW prefetcher allocations.
+.It Li L2_HW_PREFETCH.DATA_TRIGGER
+.Pq Event F3H , Umask 04H
+Counts the times the L2 HW data prefetcher was triggered.
+.It Li L2_HW_PREFETCH.CODE_TRIGGER
+.Pq Event F3H , Umask 08H
+Counts the times the L2 HW code prefetcher was triggered.
+.It Li L2_HW_PREFETCH.DCA_TRIGGER
+.Pq Event F3H , Umask 10H
+Counts the times the L2 HW DCA prefetcher was triggered.
+.It Li L2_HW_PREFETCH.KICK_START
+.Pq Event F3H , Umask 20H
+Counts the times the L2 HW prefetcher was kick started.
+.It Li SQ_MISC.PROMOTION
+.Pq Event F4H , Umask 01H
+Counts the number of L2 secondary misses that hit the Super Queue.
+.It Li SQ_MISC.PROMOTION_POST_GO
+.Pq Event F4H , Umask 02H
+Counts the number of L2 secondary misses during the Super Queue filling L2.
+.It Li SQ_MISC.LRU_HINTS
+.Pq Event F4H , Umask 04H
+Counts number of Super Queue LRU hints sent to L3.
+.It Li SQ_MISC.FILL_DROPPED
+.Pq Event F4H , Umask 08H
+Counts the number of SQ L2 fills dropped due to L2 busy.
+.It Li SEGMENT_REG_LOADS
+.Pq Event F8H , Umask 01H
+Counts number of segment register loads.
+.El
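+.Pp
+The
+.Dq weighted cycles
+events above are most useful when combined with a matching request count.
+The sketch below assumes that a weighted count adds the number of
+outstanding requests on every cycle, so that dividing it by the number of
+requests approximates the average time a request stays outstanding, and
+dividing it by the non-empty cycles (the same event with cmask=1)
+approximates the average queue depth.
+The function names are hypothetical, and the counter values are expected to
+come from separate measurements of the same workload.

```c
#include <stdint.h>

/*
 * Average number of cycles a request stays outstanding.
 * weighted_cycles: e.g. OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE
 * requests:        e.g. OFFCORE_REQUESTS.DEMAND.READ_CODE
 * Returns 0.0 when no requests were observed.
 */
static double
avg_latency_cycles(uint64_t weighted_cycles, uint64_t requests)
{
    if (requests == 0)
        return (0.0);
    return ((double)weighted_cycles / (double)requests);
}

/*
 * Average number of requests in flight while the queue is not empty.
 * busy_cycles: the weighted event re-measured with cmask=1.
 * Returns 0.0 when the queue was never occupied.
 */
static double
avg_occupancy(uint64_t weighted_cycles, uint64_t busy_cycles)
{
    if (busy_cycles == 0)
        return (0.0);
    return ((double)weighted_cycles / (double)busy_cycles);
}
```

+.Pp
+Both ratios are estimates: the two counters in each ratio are sampled
+independently, so they are only meaningful over a long enough run.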
+.Sh SEE ALSO
+.Xr pmc 3 ,
+.Xr pmc.atom 3 ,
+.Xr pmc.core 3 ,
+.Xr pmc.iaf 3 ,
+.Xr pmc.ucf 3 ,
+.Xr pmc.k7 3 ,
+.Xr pmc.k8 3 ,
+.Xr pmc.p4 3 ,
+.Xr pmc.p5 3 ,
+.Xr pmc.p6 3 ,
+.Xr pmc.corei7uc 3 ,
+.Xr pmc.westmere 3 ,
+.Xr pmc.westmereuc 3 ,
+.Xr pmc.tsc 3 ,
+.Xr pmc_cpuinfo 3 ,
+.Xr pmclog 3 ,
+.Xr hwpmc 4
+.Sh HISTORY
+The
+.Nm pmc
+library first appeared in
+.Fx 6.0 .
+.Sh AUTHORS
+The
+.Lb libpmc
+library was written by
+.An "Joseph Koshy"
+.Aq jkoshy@FreeBSD.org .