1 files changed, 1398 insertions, 0 deletions
diff --git a/lib/libpmc/pmc.westmere.3 b/lib/libpmc/pmc.westmere.3
new file mode 100644
index 0000000..3668460
--- /dev/null
+++ b/lib/libpmc/pmc.westmere.3
@@ -0,0 +1,1398 @@
+.\" Copyright (c) 2010 Fabien Thomas.  All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd February 25, 2012
+.Dt PMC.WESTMERE 3
+.Os
+.Sh NAME
+.Nm pmc.westmere
+.Nd measurement events for
+.Tn Intel
+.Tn Westmere
+family CPUs
+.Sh LIBRARY
+.Lb libpmc
+.Sh SYNOPSIS
+.In pmc.h
+.Sh DESCRIPTION
+.Tn Intel
+.Tn "Westmere"
+CPUs contain PMCs conforming to version 2 of the
+.Tn Intel
+performance measurement architecture.
+These CPUs may contain up to three classes of PMCs:
+.Bl -tag -width "Li PMC_CLASS_IAP"
+.It Li PMC_CLASS_IAF
+Fixed-function counters that count only one hardware event per counter.
+.It Li PMC_CLASS_IAP
+Programmable counters that may be configured to count one of a defined
+set of hardware events.
+.El
+.Pp
+The number of PMCs available in each class and their widths need to be
+determined at run time by calling
+.Xr pmc_cpuinfo 3 .
+.Pp
+Intel Westmere PMCs are documented in
+.Rs
+.%B "Intel(R) 64 and IA-32 Architectures Software Developes Manual"
+.%T "Volume 3B: System Programming Guide, Part 2"
+.%N "Order Number: 253669-033US"
+.%D December 2009
+.%Q "Intel Corporation"
+.Re
+.Ss WESTMERE FIXED FUNCTION PMCS
+These PMCs and their supported events are documented in
+.Xr pmc.iaf 3 .
+.Ss WESTMERE PROGRAMMABLE PMCS
+The programmable PMCs support the following capabilities:
+.Bl -column "PMC_CAP_INTERRUPT" "Support"
+.It Em Capability Ta Em Support
+.It PMC_CAP_CASCADE Ta \&No
+.It PMC_CAP_EDGE Ta Yes
+.It PMC_CAP_INTERRUPT Ta Yes
+.It PMC_CAP_INVERT Ta Yes
+.It PMC_CAP_READ Ta Yes
+.It PMC_CAP_PRECISE Ta \&No
+.It PMC_CAP_SYSTEM Ta Yes
+.It PMC_CAP_TAGGING Ta \&No
+.It PMC_CAP_THRESHOLD Ta Yes
+.It PMC_CAP_USER Ta Yes
+.It PMC_CAP_WRITE Ta Yes
+.El
+.Ss Event Qualifiers
+Event specifiers for these PMCs support the following common
+qualifiers:
+.Bl -tag -width indent
+.It Li rsp= Ns Ar value
+Configure the Off-core Response bits.
+.Bl -tag -width indent
+.It Li DMND_DATA_RD
+Counts the number of demand and DCU prefetch data reads of full
+and partial cachelines as well as demand data page table entry
+cacheline reads.
+Does not count L2 data read prefetches or
+instruction fetches.
+.It Li DMND_RFO
+Counts the number of demand and DCU prefetch reads for ownership
+(RFO) requests generated by a write to data cacheline.
+Does not count L2 RFO.
+.It Li DMND_IFETCH
+Counts the number of demand and DCU prefetch instruction cacheline
+reads.
+Does not count L2 code read prefetches.
+WB
+Counts the number of writeback (modified to exclusive) transactions.
+.It Li PF_DATA_RD
+Counts the number of data cacheline reads generated by L2 prefetchers.
+.It Li PF_RFO
+Counts the number of RFO requests generated by L2 prefetchers.
+.It Li PF_IFETCH
+Counts the number of code reads generated by L2 prefetchers.
+.It Li OTHER
+Counts one of the following transaction types, including L3 invalidate,
+I/O, full or partial writes, WC or non-temporal stores, CLFLUSH, Fences,
+lock, unlock, split lock.
+.It Li UNCORE_HIT
+L3 Hit: local or remote home requests that hit L3 cache in the uncore
+with no coherency actions required (snooping).
+.It Li OTHER_CORE_HIT_SNP
+L3 Hit: local or remote home requests that hit L3 cache in the uncore
+and was serviced by another core with a cross core snoop where no modified
+copies were found (clean).
+.It Li OTHER_CORE_HITM
+L3 Hit: local or remote home requests that hit L3 cache in the uncore
+and was serviced by another core with a cross core snoop where modified
+copies were found (HITM).
+.It Li REMOTE_CACHE_FWD
+L3 Miss: local homed requests that missed the L3 cache and was serviced
+by forwarded data following a cross package snoop where no modified
+copies found. (Remote home requests are not counted)
+.It Li REMOTE_DRAM
+L3 Miss: remote home requests that missed the L3 cache and were serviced
+by remote DRAM.
+.It Li LOCAL_DRAM
+L3 Miss: local home requests that missed the L3 cache and were serviced
+by local DRAM.
+.It Li NON_DRAM
+Non-DRAM requests that were serviced by IOH.
+.El
+.It Li cmask= Ns Ar value
+Configure the PMC to increment only if the number of configured
+events measured in a cycle is greater than or equal to
+.Ar value .
+.It Li edge
+Configure the PMC to count the number of de-asserted to asserted
+transitions of the conditions expressed by the other qualifiers.
+If specified, the counter will increment only once whenever a
+condition becomes true, irrespective of the number of clocks during
+which the condition remains true.
+.It Li inv
+Invert the sense of comparison when the
+.Dq Li cmask
+qualifier is present, making the counter increment when the number of
+events per cycle is less than the value specified by the
+.Dq Li cmask
+qualifier.
+.It Li os
+Configure the PMC to count events happening at processor privilege
+level 0.
+.It Li usr
+Configure the PMC to count events occurring at privilege levels 1, 2
+or 3.
+.El
+.Pp
+If neither of the
+.Dq Li os
+or
+.Dq Li usr
+qualifiers are specified, the default is to enable both.
+.Ss Event Specifiers (Programmable PMCs)
+Westmere programmable PMCs support the following events:
+.Bl -tag -width indent
+.It Li LOAD_BLOCK.OVERLAP_STORE
+.Pq Event 03H , Umask 02H
+Loads that partially overlap an earlier store
+.It Li SB_DRAIN.ANY
+.Pq Event 04H , Umask 07H
+All Store buffer stall cycles
+.It Li MISALIGN_MEMORY.STORE
+.Pq Event 05H , Umask 02H
+All store referenced with misaligned address
+.It Li STORE_BLOCKS.AT_RET
+.Pq Event 06H , Umask 04H
+Counts number of loads delayed with at-Retirement block code.
+The following
+loads need to be executed at retirement and wait for all senior stores on
+the same thread to be drained: load splitting across 4K boundary (page
+split), load accessing uncacheable (UC or USWC) memory, load lock, and load
+with page table in UC or USWC memory region.
+.It Li STORE_BLOCKS.L1D_BLOCK
+.Pq Event 06H , Umask 08H
+Cacheable loads delayed with L1D block code
+.It Li PARTIAL_ADDRESS_ALIAS
+.Pq Event 07H , Umask 01H
+Counts false dependency due to partial address aliasing
+.It Li DTLB_LOAD_MISSES.ANY
+.Pq Event 08H , Umask 01H
+Counts all load misses that cause a page walk
+.It Li DTLB_LOAD_MISSES.WALK_COMPLETED
+.Pq Event 08H , Umask 02H
+Counts number of completed page walks due to load miss in the STLB.
+.It Li DTLB_LOAD_MISSES.WALK_CYCLES
+.Pq Event 08H , Umask 04H
+Cycles PMH is busy with a page walk due to a load miss in the STLB.
+.It Li DTLB_LOAD_MISSES.STLB_HIT
+.Pq Event 08H , Umask 10H
+Number of cache load STLB hits
+.It Li DTLB_LOAD_MISSES.PDE_MISS
+.Pq Event 08H , Umask 20H
+Number of DTLB cache load misses where the low part of the linear to
+physical address translation was missed.
+.It Li MEM_INST_RETIRED.LOADS
+.Pq Event 0BH , Umask 01H
+Counts the number of instructions with an architecturally-visible store
+retired on the architected path.
+In conjunction with ld_lat facility
+.It Li MEM_INST_RETIRED.STORES
+.Pq Event 0BH , Umask 02H
+Counts the number of instructions with an architecturally-visible store
+retired on the architected path.
+In conjunction with ld_lat facility
+.It Li MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD
+.Pq Event 0BH , Umask 10H
+Counts the number of instructions exceeding the latency specified with
+ld_lat facility.
+In conjunction with ld_lat facility
+.It Li MEM_STORE_RETIRED.DTLB_MISS
+.Pq Event 0CH , Umask 01H
+The event counts the number of retired stores that missed the DTLB.
+The DTLB miss is not counted if the store operation causes a fault.
+Does not counter prefetches.
+Counts both primary and secondary misses to the TLB
+.It Li UOPS_ISSUED.ANY
+.Pq Event 0EH , Umask 01H
+Counts the number of Uops issued by the Register Allocation Table to the
+Reservation Station, i.e. the UOPs issued from the front end to the back
+end.
+.It Li UOPS_ISSUED.STALLED_CYCLES
+.Pq Event 0EH , Umask 01H
+Counts the number of cycles no Uops issued by the Register Allocation Table
+to the Reservation Station, i.e. the UOPs issued from the front end to the
+back end.
+set invert=1, cmask = 1
+.It Li UOPS_ISSUED.FUSED
+.Pq Event 0EH , Umask 02H
+Counts the number of fused Uops that were issued from the Register
+Allocation Table to the Reservation Station.
+.It Li MEM_UNCORE_RETIRED.LOCAL_HITM
+.Pq Event 0FH , Umask 02H
+Load instructions retired that HIT modified data in sibling core (Precise
+Event)
+.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM_AND_REMOTE_CACHE_HIT
+.Pq Event 0FH , Umask 08H
+Load instructions retired local dram and remote cache HIT data sources
+(Precise Event)
+.It Li MEM_UNCORE_RETIRED.LOCAL_DRAM
+.Pq Event 0FH , Umask 10H
+Load instructions retired with a data source of local DRAM or locally homed
+remote cache HITM (Precise Event)
+.It Li MEM_UNCORE_RETIRED.REMOTE_DRAM
+.Pq Event 0FH , Umask 20H
+Load instructions retired remote DRAM and remote home-remote cache HITM
+(Precise Event)
+.It Li MEM_UNCORE_RETIRED.UNCACHEABLE
+.Pq Event 0FH , Umask 80H
+Load instructions retired I/O (Precise Event)
+.It Li FP_COMP_OPS_EXE.X87
+.Pq Event 10H , Umask 01H
+Counts the number of FP Computational Uops Executed.
+The number of FADD,
+FSUB, FCOM, FMULs, integer MULsand IMULs, FDIVs, FPREMs, FSQRTS, integer
+DIVs, and IDIVs.
+This event does not distinguish an FADD used in the middle
+of a transcendental flow from a separate FADD instruction.
+.It Li FP_COMP_OPS_EXE.MMX
+.Pq Event 10H , Umask 02H
+Counts number of MMX Uops executed.
+.It Li FP_COMP_OPS_EXE.SSE_FP
+.Pq Event 10H , Umask 04H
+Counts number of SSE and SSE2 FP uops executed.
+.It Li FP_COMP_OPS_EXE.SSE2_INTEGER
+.Pq Event 10H , Umask 08H
+Counts number of SSE2 integer uops executed.
+.It Li FP_COMP_OPS_EXE.SSE_FP_PACKED
+.Pq Event 10H , Umask 10H
+Counts number of SSE FP packed uops executed.
+.It Li FP_COMP_OPS_EXE.SSE_FP_SCALAR
+.Pq Event 10H , Umask 20H
+Counts number of SSE FP scalar uops executed.
+.It Li FP_COMP_OPS_EXE.SSE_SINGLE_PRECISION
+.Pq Event 10H , Umask 40H
+Counts number of SSE* FP single precision uops executed.
+.It Li FP_COMP_OPS_EXE.SSE_DOUBLE_PRECISION
+.Pq Event 10H , Umask 80H
+Counts number of SSE* FP double precision uops executed.
+.It Li SIMD_INT_128.PACKED_MPY
+.Pq Event 12H , Umask 01H
+Counts number of 128 bit SIMD integer multiply operations.
+.It Li SIMD_INT_128.PACKED_SHIFT
+.Pq Event 12H , Umask 02H
+Counts number of 128 bit SIMD integer shift operations.
+.It Li SIMD_INT_128.PACK
+.Pq Event 12H , Umask 04H
+Counts number of 128 bit SIMD integer pack operations.
+.It Li SIMD_INT_128.UNPACK
+.Pq Event 12H , Umask 08H
+Counts number of 128 bit SIMD integer unpack operations.
+.It Li SIMD_INT_128.PACKED_LOGICAL
+.Pq Event 12H , Umask 10H
+Counts number of 128 bit SIMD integer logical operations.
+.It Li SIMD_INT_128.PACKED_ARITH
+.Pq Event 12H , Umask 20H
+Counts number of 128 bit SIMD integer arithmetic operations.
+.It Li SIMD_INT_128.SHUFFLE_MOVE
+.Pq Event 12H , Umask 40H
+Counts number of 128 bit SIMD integer shuffle and move operations.
+.It Li LOAD_DISPATCH.RS
+.Pq Event 13H , Umask 01H
+Counts number of loads dispatched from the Reservation Station that bypass
+the Memory Order Buffer.
+.It Li LOAD_DISPATCH.RS_DELAYED
+.Pq Event 13H , Umask 02H
+Counts the number of delayed RS dispatches at the stage latch.
+If an RS dispatch can not bypass to LB, it has another chance to dispatch
+from the one-cycle delayed staging latch before it is written into the LB.
+.It Li LOAD_DISPATCH.MOB
+.Pq Event 13H , Umask 04H
+Counts the number of loads dispatched from the Reservation Station to the
+Memory Order Buffer.
+.It Li LOAD_DISPATCH.ANY
+.Pq Event 13H , Umask 07H
+Counts all loads dispatched from the Reservation Station.
+.It Li ARITH.CYCLES_DIV_BUSY
+.Pq Event 14H , Umask 01H
+Counts the number of cycles the divider is busy executing divide or square
+root operations.
+The divide can be integer, X87 or Streaming SIMD Extensions (SSE).
+The square root operation can be either X87 or SSE.
+Set 'edge =1, invert=1, cmask=1' to count the number of divides.
+Count may be incorrect When SMT is on
+.It Li ARITH.MUL
+.Pq Event 14H , Umask 02H
+Counts the number of multiply operations executed.
+This includes integer as
+well as floating point multiply operations but excludes DPPS mul and MPSAD.
+Count may be incorrect When SMT is on
+.It Li INST_QUEUE_WRITES
+.Pq Event 17H , Umask 01H
+Counts the number of instructions written into the instruction queue every
+cycle.
+.It Li INST_DECODED.DEC0
+.Pq Event 18H , Umask 01H
+Counts number of instructions that require decoder 0 to be decoded.
+Usually, this means that the instruction maps to more than 1 uop
+.It Li TWO_UOP_INSTS_DECODED
+.Pq Event 19H , Umask 01H
+An instruction that generates two uops was decoded
+.It Li INST_QUEUE_WRITE_CYCLES
+.Pq Event 1EH , Umask 01H
+This event counts the number of cycles during which instructions are written
+to the instruction queue.
+Dividing this counter by the number of
+instructions written to the instruction queue (INST_QUEUE_WRITES) yields the
+average number of instructions decoded each cycle.
+If this number is less
+than four and the pipe stalls, this indicates that the decoder is failing to
+decode enough instructions per cycle to sustain the 4-wide pipeline.
+If SSE* instructions that are 6 bytes or longer arrive one after another,
+then front end throughput may limit execution speed.
+In such case,
+.It Li LSD_OVERFLOW
+.Pq Event 20H , Umask 01H
+Number of loops that can not stream from the instruction queue.
+.It Li L2_RQSTS.LD_HIT
+.Pq Event 24H , Umask 01H
+Counts number of loads that hit the L2 cache.
+L2 loads include both L1D demand misses as well as L1D prefetches.
+L2 loads can be rejected for various reasons.
+Only non rejected loads are counted.
+.It Li L2_RQSTS.LD_MISS
+.Pq Event 24H , Umask 02H
+Counts the number of loads that miss the L2 cache.
+L2 loads include both L1D demand misses as well as L1D prefetches.
+.It Li L2_RQSTS.LOADS
+.Pq Event 24H , Umask 03H
+Counts all L2 load requests.
+L2 loads include both L1D demand misses as well as L1D prefetches.
+.It Li L2_RQSTS.RFO_HIT
+.Pq Event 24H , Umask 04H
+Counts the number of store RFO requests that hit the L2 cache.
+L2 RFO requests include both L1D demand RFO misses as well as L1D RFO
+prefetches.
+Count includes WC memory requests, where the data is not fetched but the
+permission to write the line is required.
+.It Li L2_RQSTS.RFO_MISS
+.Pq Event 24H , Umask 08H
+Counts the number of store RFO requests that miss the L2 cache.
+L2 RFO requests include both L1D demand RFO misses as well as L1D RFO
+prefetches.
+.It Li L2_RQSTS.RFOS
+.Pq Event 24H , Umask 0CH
+Counts all L2 store RFO requests.
+L2 RFO requests include both L1D demand
+RFO misses as well as L1D RFO prefetches.
+.It Li L2_RQSTS.IFETCH_HIT
+.Pq Event 24H , Umask 10H
+Counts number of instruction fetches that hit the L2 cache.
+L2 instruction fetches include both L1I demand misses as well as L1I
+instruction prefetches.
+.It Li L2_RQSTS.IFETCH_MISS
+.Pq Event 24H , Umask 20H
+Counts number of instruction fetches that miss the L2 cache.
+L2 instruction fetches include both L1I demand misses as well as L1I
+instruction prefetches.
+.It Li L2_RQSTS.IFETCHES
+.Pq Event 24H , Umask 30H
+Counts all instruction fetches.
+L2 instruction fetches include both L1I
+demand misses as well as L1I instruction prefetches.
+.It Li L2_RQSTS.PREFETCH_HIT
+.Pq Event 24H , Umask 40H
+Counts L2 prefetch hits for both code and data.
+.It Li L2_RQSTS.PREFETCH_MISS
+.Pq Event 24H , Umask 80H
+Counts L2 prefetch misses for both code and data.
+.It Li L2_RQSTS.PREFETCHES
+.Pq Event 24H , Umask C0H
+Counts all L2 prefetches for both code and data.
+.It Li L2_RQSTS.MISS
+.Pq Event 24H , Umask AAH
+Counts all L2 misses for both code and data.
+.It Li L2_RQSTS.REFERENCES
+.Pq Event 24H , Umask FFH
+Counts all L2 requests for both code and data.
+.It Li L2_DATA_RQSTS.DEMAND.I_STATE
+.Pq Event 26H , Umask 01H
+Counts number of L2 data demand loads where the cache line to be loaded is
+in the I (invalid) state, i.e. a cache miss.
+L2 demand loads are both L1D demand misses and L1D prefetches.
+.It Li L2_DATA_RQSTS.DEMAND.S_STATE
+.Pq Event 26H , Umask 02H
+Counts number of L2 data demand loads where the cache line to be loaded is
+in the S (shared) state.
+L2 demand loads are both L1D demand misses and L1D
+prefetches.
+.It Li L2_DATA_RQSTS.DEMAND.E_STATE
+.Pq Event 26H , Umask 04H
+Counts number of L2 data demand loads where the cache line to be loaded is
+in the E (exclusive) state.
+L2 demand loads are both L1D demand misses and
+L1D prefetches.
+.It Li L2_DATA_RQSTS.DEMAND.M_STATE
+.Pq Event 26H , Umask 08H
+Counts number of L2 data demand loads where the cache line to be loaded is
+in the M (modified) state.
+L2 demand loads are both L1D demand misses and
+L1D prefetches.
+.It Li L2_DATA_RQSTS.DEMAND.MESI
+.Pq Event 26H , Umask 0FH
+Counts all L2 data demand requests.
+L2 demand loads are both L1D demand
+misses and L1D prefetches.
+.It Li L2_DATA_RQSTS.PREFETCH.I_STATE
+.Pq Event 26H , Umask 10H
+Counts number of L2 prefetch data loads where the cache line to be loaded is
+in the I (invalid) state, i.e. a cache miss.
+.It Li L2_DATA_RQSTS.PREFETCH.S_STATE
+.Pq Event 26H , Umask 20H
+Counts number of L2 prefetch data loads where the cache line to be loaded is
+in the S (shared) state.
+A prefetch RFO will miss on an S state line, while
+a prefetch read will hit on an S state line.
+.It Li L2_DATA_RQSTS.PREFETCH.E_STATE
+.Pq Event 26H , Umask 40H
+Counts number of L2 prefetch data loads where the cache line to be loaded is
+in the E (exclusive) state.
+.It Li L2_DATA_RQSTS.PREFETCH.M_STATE
+.Pq Event 26H , Umask 80H
+Counts number of L2 prefetch data loads where the cache line to be loaded is
+in the M (modified) state.
+.It Li L2_DATA_RQSTS.PREFETCH.MESI
+.Pq Event 26H , Umask F0H
+Counts all L2 prefetch requests.
+.It Li L2_DATA_RQSTS.ANY
+.Pq Event 26H , Umask FFH
+Counts all L2 data requests.
+.It Li L2_WRITE.RFO.I_STATE
+.Pq Event 27H , Umask 01H
+Counts number of L2 demand store RFO requests where the cache line to be
+loaded is in the I (invalid) state, i.e, a cache miss.
+The L1D prefetcher
+does not issue a RFO prefetch.
+This is a demand RFO request
+.It Li L2_WRITE.RFO.S_STATE
+.Pq Event 27H , Umask 02H
+Counts number of L2 store RFO requests where the cache line to be loaded is
+in the S (shared) state.
+The L1D prefetcher does not issue a RFO prefetch.
+This is a demand RFO request.
+.It Li L2_WRITE.RFO.M_STATE
+.Pq Event 27H , Umask 08H
+Counts number of L2 store RFO requests where the cache line to be loaded is
+in the M (modified) state.
+The L1D prefetcher does not issue a RFO prefetch.
+This is a demand RFO request.
+.It Li L2_WRITE.RFO.HIT
+.Pq Event 27H , Umask 0EH
+Counts number of L2 store RFO requests where the cache line to be loaded is
+in either the S, E or M states.
+The L1D prefetcher does not issue a RFO
+prefetch.
+This is a demand RFO request
+.It Li L2_WRITE.RFO.MESI
+.Pq Event 27H , Umask 0FH
+Counts all L2 store RFO requests.The L1D prefetcher does not issue a RFO
+prefetch.
+This is a demand RFO request.
+.It Li L2_WRITE.LOCK.I_STATE
+.Pq Event 27H , Umask 10H
+Counts number of L2 demand lock RFO requests where the cache line to be
+loaded is in the I (invalid) state, i.e. a cache miss.
+.It Li L2_WRITE.LOCK.S_STATE
+.Pq Event 27H , Umask 20H
+Counts number of L2 lock RFO requests where the cache line to be loaded is
+in the S (shared) state.
+.It Li L2_WRITE.LOCK.E_STATE
+.Pq Event 27H , Umask 40H
+Counts number of L2 demand lock RFO requests where the cache line to be
+loaded is in the E (exclusive) state.
+.It Li L2_WRITE.LOCK.M_STATE
+.Pq Event 27H , Umask 80H
+Counts number of L2 demand lock RFO requests where the cache line to be
+loaded is in the M (modified) state.
+.It Li L2_WRITE.LOCK.HIT
+.Pq Event 27H , Umask E0H
+Counts number of L2 demand lock RFO requests where the cache line to be
+loaded is in either the S, E, or M state.
+.It Li L2_WRITE.LOCK.MESI
+.Pq Event 27H , Umask F0H
+Counts all L2 demand lock RFO requests.
+.It Li L1D_WB_L2.I_STATE
+.Pq Event 28H , Umask 01H
+Counts number of L1 writebacks to the L2 where the cache line to be written
+is in the I (invalid) state, i.e. a cache miss.
+.It Li L1D_WB_L2.S_STATE
+.Pq Event 28H , Umask 02H
+Counts number of L1 writebacks to the L2 where the cache line to be written
+is in the S state.
+.It Li L1D_WB_L2.E_STATE
+.Pq Event 28H , Umask 04H
+Counts number of L1 writebacks to the L2 where the cache line to be written
+is in the E (exclusive) state.
+.It Li L1D_WB_L2.M_STATE
+.Pq Event 28H , Umask 08H
+Counts number of L1 writebacks to the L2 where the cache line to be written
+is in the M (modified) state.
+.It Li L1D_WB_L2.MESI
+.Pq Event 28H , Umask 0FH
+Counts all L1 writebacks to the L2.
+.It Li L3_LAT_CACHE.REFERENCE
+.Pq Event 2EH , Umask 02H
+Counts uncore Last Level Cache references.
+Because cache hierarchy, cache
+sizes and other implementation-specific characteristics; value comparison to
+estimate performance differences is not recommended.
+See Table A-1.
+.It Li L3_LAT_CACHE.MISS
+.Pq Event 2EH , Umask 01H
+Counts uncore Last Level Cache misses.
+Because cache hierarchy, cache sizes
+and other implementation-specific characteristics; value comparison to
+estimate performance differences is not recommended.
+See Table A-1.
+.It Li CPU_CLK_UNHALTED.THREAD_P
+.Pq Event 3CH , Umask 00H
+Counts the number of thread cycles while the thread is not in a halt state.
+The thread enters the halt state when it is running the HLT instruction.
+The core frequency may change from time to time due to power or thermal
+throttling.
+see Table A-1
+.It Li CPU_CLK_UNHALTED.REF_P
+.Pq Event 3CH , Umask 01H
+Increments at the frequency of TSC when not halted.
+see Table A-1
+.It Li DTLB_MISSES.ANY
+.Pq Event 49H , Umask 01H
+Counts the number of misses in the STLB which causes a page walk.
+.It Li DTLB_MISSES.WALK_COMPLETED
+.Pq Event 49H , Umask 02H
+Counts number of misses in the STLB which resulted in a completed page walk.
+.It Li DTLB_MISSES.WALK_CYCLES
+.Pq Event 49H , Umask 04H
+Counts cycles of page walk due to misses in the STLB.
+.It Li DTLB_MISSES.STLB_HIT
+.Pq Event 49H , Umask 10H
+Counts the number of DTLB first level misses that hit in the second level
+TLB.
+This event is only relevant if the core contains multiple DTLB levels.
+.It Li DTLB_MISSES.LARGE_WALK_COMPLETED
+.Pq Event 49H , Umask 80H
+Counts number of completed large page walks due to misses in the STLB.
+.It Li LOAD_HIT_PRE
+.Pq Event 4CH , Umask 01H
+Counts load operations sent to the L1 data cache while a previous SSE
+prefetch instruction to the same cache line has started prefetching but has
+not yet finished.
+.It Li L1D_PREFETCH.REQUESTS
+.Pq Event 4EH , Umask 01H
+Counts number of hardware prefetch requests dispatched out of the prefetch
+FIFO.
+.It Li L1D_PREFETCH.MISS
+.Pq Event 4EH , Umask 02H
+Counts number of hardware prefetch requests that miss the L1D.
+There are two
+prefetchers in the L1D.
+A streamer, which predicts lines sequentially after
+this one should be fetched, and the IP prefetcher that remembers access
+patterns for the current instruction.
+The streamer prefetcher stops on an
+L1D hit, while the IP prefetcher does not.
+.It Li L1D_PREFETCH.TRIGGERS
+.Pq Event 4EH , Umask 04H
+Counts number of prefetch requests triggered by the Finite State Machine and
+pushed into the prefetch FIFO.
+Some of the prefetch requests are dropped due
+to overwrites or competition between the IP index prefetcher and streamer
+prefetcher.
+The prefetch FIFO contains 4 entries.
+.It Li EPT.WALK_CYCLES
+.Pq Event 4FH , Umask 10H
+Counts Extended Page walk cycles.
+.It Li L1D.REPL
+.Pq Event 51H , Umask 01H
+Counts the number of lines brought into the L1 data cache.
+Counter 0, 1 only.
+.It Li L1D.M_REPL
+.Pq Event 51H , Umask 02H
+Counts the number of modified lines brought into the L1 data cache.
+Counter 0, 1 only.
+.It Li L1D.M_EVICT
+.Pq Event 51H , Umask 04H
+Counts the number of modified lines evicted from the L1 data cache due to
+replacement.
+Counter 0, 1 only.
+.It Li L1D.M_SNOOP_EVICT
+.Pq Event 51H , Umask 08H
+Counts the number of modified lines evicted from the L1 data cache due to
+snoop HITM intervention.
+Counter 0, 1 only
+.It Li L1D_CACHE_PREFETCH_LOCK_FB_HIT
+.Pq Event 52H , Umask 01H
+Counts the number of cacheable load lock speculated instructions accepted
+into the fill buffer.
+.It Li L1D_CACHE_LOCK_FB_HIT
+.Pq Event 53H , Umask 01H
+Counts the number of cacheable load lock speculated or retired instructions
+accepted into the fill buffer.
+.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA
+.Pq Event 60H , Umask 01H
+Counts weighted cycles of offcore demand data read requests.
+Does not include L2 prefetch requests.
+Counter 0.
+.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE
+.Pq Event 60H , Umask 02H
+Counts weighted cycles of offcore demand code read requests.
+Does not include L2 prefetch requests.
+Counter 0.
+.It Li OFFCORE_REQUESTS_OUTSTANDING.DEMAND.RFO
+.Pq Event 60H , Umask 04H
+Counts weighted cycles of offcore demand RFO requests.
+Does not include L2 prefetch requests.
+Counter 0.
+.It Li OFFCORE_REQUESTS_OUTSTANDING.ANY.READ
+.Pq Event 60H , Umask 08H
+Counts weighted cycles of offcore read requests of any kind.
+Include L2 prefetch requests.
+Counter 0.
+.It Li CACHE_LOCK_CYCLES.L1D_L2
+.Pq Event 63H , Umask 01H
+Cycle count during which the L1D and L2 are locked.
+A lock is asserted when
+there is a locked memory access, due to uncacheable memory, a locked
+operation that spans two cache lines, or a page walk from an uncacheable
+page table.
+Counter 0, 1 only.
+L1D and L2 locks have a very high performance penalty and
+it is highly recommended to avoid such accesses.
+.It Li CACHE_LOCK_CYCLES.L1D
+.Pq Event 63H , Umask 02H
+Counts the number of cycles that cacheline in the L1 data cache unit is
+locked.
+Counter 0, 1 only.
+.It Li IO_TRANSACTIONS
+.Pq Event 6CH , Umask 01H
+Counts the number of completed I/O transactions.
+.It Li L1I.HITS
+.Pq Event 80H , Umask 01H
+Counts all instruction fetches that hit the L1 instruction cache.
+.It Li L1I.MISSES
+.Pq Event 80H , Umask 02H
+Counts all instruction fetches that miss the L1I cache.
+This includes
+instruction cache misses, streaming buffer misses, victim cache misses and
+uncacheable fetches.
+An instruction fetch miss is counted only once and not
+once for every cycle it is outstanding.
+.It Li L1I.READS
+.Pq Event 80H , Umask 03H
+Counts all instruction fetches, including uncacheable fetches that bypass
+the L1I.
+.It Li L1I.CYCLES_STALLED
+.Pq Event 80H , Umask 04H
+Cycle counts for which an instruction fetch stalls due to a L1I cache miss,
+ITLB miss or ITLB fault.
+.It Li LARGE_ITLB.HIT
+.Pq Event 82H , Umask 01H
+Counts number of large ITLB hits.
+.It Li ITLB_MISSES.ANY
+.Pq Event 85H , Umask 01H
+Counts the number of misses in all levels of the ITLB which causes a page
+walk.
+.It Li ITLB_MISSES.WALK_COMPLETED
+.Pq Event 85H , Umask 02H
+Counts number of misses in all levels of the ITLB which resulted in a
+completed page walk.
+.It Li ITLB_MISSES.WALK_CYCLES
+.Pq Event 85H , Umask 04H
+Counts ITLB miss page walk cycles.
+.It Li ITLB_MISSES.LARGE_WALK_COMPLETED
+.Pq Event 85H , Umask 80H
+Counts number of completed large page walks due to misses in the STLB.
+.It Li ILD_STALL.LCP
+.Pq Event 87H , Umask 01H
+Cycles Instruction Length Decoder stalls due to length changing prefixes:
+66, 67 or REX.W (for EM64T) instructions which change the length of the
+decoded instruction.
+.It Li ILD_STALL.MRU
+.Pq Event 87H , Umask 02H
+Instruction Length Decoder stall cycles due to Brand Prediction Unit (PBU)
+Most Recently Used (MRU) bypass.
+.It Li ILD_STALL.IQ_FULL
+.Pq Event 87H , Umask 04H
+Stall cycles due to a full instruction queue.
+.It Li ILD_STALL.REGEN
+.Pq Event 87H , Umask 08H
+Counts the number of regen stalls.
+.It Li ILD_STALL.ANY
+.Pq Event 87H , Umask 0FH
+Counts any cycles the Instruction Length Decoder is stalled.
+.It Li BR_INST_EXEC.COND
+.Pq Event 88H , Umask 01H
+Counts the number of conditional near branch instructions executed, but not
+necessarily retired.
+.It Li BR_INST_EXEC.DIRECT
+.Pq Event 88H , Umask 02H
+Counts all unconditional near branch instructions excluding calls and
+indirect branches.
+.It Li BR_INST_EXEC.INDIRECT_NON_CALL
+.Pq Event 88H , Umask 04H
+Counts the number of executed indirect near branch instructions that are not
+calls.
+.It Li BR_INST_EXEC.NON_CALLS
+.Pq Event 88H , Umask 07H
+Counts all non call near branch instructions executed, but not necessarily
+retired.
+.It Li BR_INST_EXEC.RETURN_NEAR
+.Pq Event 88H , Umask 08H
+Counts indirect near branches that have a return mnemonic.
+.It Li BR_INST_EXEC.DIRECT_NEAR_CALL
+.Pq Event 88H , Umask 10H
+Counts unconditional near call branch instructions, excluding non call
+branch, executed.
+.It Li BR_INST_EXEC.INDIRECT_NEAR_CALL
+.Pq Event 88H , Umask 20H
+Counts indirect near calls, including both register and memory indirect,
+executed.
+.It Li BR_INST_EXEC.NEAR_CALLS
+.Pq Event 88H , Umask 30H
+Counts all near call branches executed, but not necessarily retired.
+.It Li BR_INST_EXEC.TAKEN
+.Pq Event 88H , Umask 40H
+Counts taken near branches executed, but not necessarily retired.
+.It Li BR_INST_EXEC.ANY
+.Pq Event 88H , Umask 7FH
+Counts all near executed branches (not necessarily retired).
+This includes only instructions and not micro-op branches.
+Frequent branching is not necessarily a major performance issue.
+However frequent branch mispredictions may be a problem.
+.It Li BR_MISP_EXEC.COND
+.Pq Event 89H , Umask 01H
+Counts the number of mispredicted conditional near branch instructions
+executed, but not necessarily retired.
+.It Li BR_MISP_EXEC.DIRECT
+.Pq Event 89H , Umask 02H
+Counts mispredicted macro unconditional near branch instructions, excluding
+calls and indirect branches (should always be 0).
+.It Li BR_MISP_EXEC.INDIRECT_NON_CALL
+.Pq Event 89H , Umask 04H
+Counts the number of executed mispredicted indirect near branch instructions
+that are not calls.
+.It Li BR_MISP_EXEC.NON_CALLS
+.Pq Event 89H , Umask 07H
+Counts mispredicted non call near branches executed, but not necessarily
+retired.
+.It Li BR_MISP_EXEC.RETURN_NEAR
+.Pq Event 89H , Umask 08H
+Counts mispredicted indirect branches that have a rear return mnemonic.
+.It Li BR_MISP_EXEC.DIRECT_NEAR_CALL
+.Pq Event 89H , Umask 10H
+Counts mispredicted non-indirect near calls executed, (should always be 0).
+.It Li BR_MISP_EXEC.INDIRECT_NEAR_CALL
+.Pq Event 89H , Umask 20H
+Counts mispredicted indirect near calls executed, including both register
+and memory indirect.
+.It Li BR_MISP_EXEC.NEAR_CALLS
+.Pq Event 89H , Umask 30H
+Counts all mispredicted near call branches executed, but not necessarily
+retired.
+.It Li BR_MISP_EXEC.TAKEN
+.Pq Event 89H , Umask 40H
+Counts executed mispredicted near branches that are taken, but not
+necessarily retired.
+.It Li BR_MISP_EXEC.ANY
+.Pq Event 89H , Umask 7FH
+Counts the number of mispredicted near branch instructions that were
+executed, but not necessarily retired.
+.It Li RESOURCE_STALLS.ANY
+.Pq Event A2H , Umask 01H
+Counts the number of Allocator resource related stalls.
+Includes register renaming buffer entries, memory buffer entries.
+In addition to resource related stalls, this event counts some other events.
+Includes stalls arising
+during branch misprediction recovery, such as if retirement of the
+mispredicted branch is delayed and stalls arising while store buffer is
+draining from synchronizing operations.
+Does not include stalls due to SuperQ (off core) queue full, too many cache
+misses, etc.
+.It Li RESOURCE_STALLS.LOAD
+.Pq Event A2H , Umask 02H
+Counts the cycles of stall due to lack of load buffer for load operation.
+.It Li RESOURCE_STALLS.RS_FULL
+.Pq Event A2H , Umask 04H
+This event counts the number of cycles when the number of instructions in
+the pipeline waiting for execution reaches the limit the processor can
+handle.
+A high count of this event indicates that there are long latency
+operations in the pipe (possibly load and store operations that miss the L2
+cache, or instructions dependent upon instructions further down the pipeline
+that have yet to retire.
+When RS is full, new instructions can not enter the reservation station and
+start execution.
+.It Li RESOURCE_STALLS.STORE
+.Pq Event A2H , Umask 08H
+This event counts the number of cycles that a resource related stall will
+occur due to the number of store instructions reaching the limit of the
+pipeline, (i.e. all store buffers are used).
+The stall ends when a store
+instruction commits its data to the cache or memory.
+.It Li RESOURCE_STALLS.ROB_FULL
+.Pq Event A2H , Umask 10H
+Counts the cycles of stall due to re- order buffer full.
+.It Li RESOURCE_STALLS.FPCW
+.Pq Event A2H , Umask 20H
+Counts the number of cycles while execution was stalled due to writing the
+floating-point unit (FPU) control word.
+.It Li RESOURCE_STALLS.MXCSR
+.Pq Event A2H , Umask 40H
+Stalls due to the MXCSR register rename occurring to close to a previous
+MXCSR rename.
+The MXCSR provides control and status for the MMX registers.
+.It Li RESOURCE_STALLS.OTHER
+.Pq Event A2H , Umask 80H
+Counts the number of cycles while execution was stalled due to other
+resource issues.
+.It Li MACRO_INSTS.FUSIONS_DECODED
+.Pq Event A6H , Umask 01H
+Counts the number of instructions decoded that are macro-fused but not
+necessarily executed or retired.
+.It Li BACLEAR_FORCE_IQ
+.Pq Event A7H , Umask 01H
+Counts number of times a BACLEAR was forced by the Instruction Queue.
+The IQ is also responsible for providing conditional branch prediction
+direction based on a static scheme and dynamic data provided by the L2
+Branch Prediction Unit.
+If the conditional branch target is not found in the Target
+Array and the IQ predicts that the branch is taken, then the IQ will force
+the Branch Address Calculator to issue a BACLEAR.
+Each BACLEAR asserted by
+the BAC generates approximately an 8 cycle bubble in the instruction fetch
+pipeline.
+.It Li LSD.UOPS
+.Pq Event A8H , Umask 01H
+Counts the number of micro-ops delivered by loop stream detector
+Use cmask=1 and invert to count cycles
+.It Li ITLB_FLUSH
+.Pq Event AEH , Umask 01H
+Counts the number of ITLB flushes
+.It Li OFFCORE_REQUESTS.DEMAND.READ_DATA
+.Pq Event B0H , Umask 01H
+Counts number of offcore demand data read requests.
+Does not count L2 prefetch requests.
+.It Li OFFCORE_REQUESTS.DEMAND.READ_CODE
+.Pq Event B0H , Umask 02H
+Counts number of offcore demand code read requests.
+Does not count L2 prefetch requests.
+.It Li OFFCORE_REQUESTS.DEMAND.RFO
+.Pq Event B0H , Umask 04H
+Counts number of offcore demand RFO requests.
+Does not count L2 prefetch requests.
+.It Li OFFCORE_REQUESTS.ANY.READ
+.Pq Event B0H , Umask 08H
+Counts number of offcore read requests.
+Includes L2 prefetch requests.
+.It Li OFFCORE_REQUESTS.ANY.RFO
+.Pq Event 80H , Umask 10H
+Counts number of offcore RFO requests.
+Includes L2 prefetch requests.
+.It Li OFFCORE_REQUESTS.L1D_WRITEBACK
+.Pq Event B0H , Umask 40H
+Counts number of L1D writebacks to the uncore.
+.It Li OFFCORE_REQUESTS.ANY
+.Pq Event B0H , Umask 80H
+Counts all offcore requests.
+.It Li UOPS_EXECUTED.PORT0
+.Pq Event B1H , Umask 01H
+Counts number of Uops executed that were issued on port 0.
+Port 0 handles integer arithmetic, SIMD and FP add Uops.
+.It Li UOPS_EXECUTED.PORT1
+.Pq Event B1H , Umask 02H
+Counts number of Uops executed that were issued on port 1.
+Port 1 handles integer arithmetic, SIMD, integer shift, FP multiply and
+FP divide Uops.
+.It Li UOPS_EXECUTED.PORT2_CORE
+.Pq Event B1H , Umask 04H
+Counts number of Uops executed that were issued on port 2.
+Port 2 handles the load Uops.
+This is a core count only and can not be collected per
+thread.
+.It Li UOPS_EXECUTED.PORT3_CORE
+.Pq Event B1H , Umask 08H
+Counts number of Uops executed that were issued on port 3.
+Port 3 handles store Uops.
+This is a core count only and can not be collected per thread.
+.It Li UOPS_EXECUTED.PORT4_CORE
+.Pq Event B1H , Umask 10H
+Counts number of Uops executed that where issued on port 4.
+Port 4 handles the value to be stored for the store Uops issued on port 3.
+This is a core count only and can not be collected per thread.
+.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES_NO_PORT5
+.Pq Event B1H , Umask 1FH
+Counts number of cycles there are one or more uops being executed and were
+issued on ports 0-4.
+This is a core count only and can not be collected per thread.
+.It Li UOPS_EXECUTED.PORT5
+.Pq Event B1H , Umask 20H
+Counts number of Uops executed that where issued on port 5.
+.It Li UOPS_EXECUTED.CORE_ACTIVE_CYCLES
+.Pq Event B1H , Umask 3FH
+Counts number of cycles there are one or more uops being executed on any
+ports.
+This is a core count only and can not be collected per thread.
+.It Li UOPS_EXECUTED.PORT015
+.Pq Event B1H , Umask 40H
+Counts number of Uops executed that where issued on port 0, 1, or 5.
+Use cmask=1, invert=1 to count stall cycles.
+.It Li UOPS_EXECUTED.PORT234
+.Pq Event B1H , Umask 80H
+Counts number of Uops executed that where issued on port 2, 3, or 4.
+.It Li OFFCORE_REQUESTS_SQ_FULL
+.Pq Event B2H , Umask 01H
+Counts number of cycles the SQ is full to handle off-core requests.
+.It Li SNOOPQ_REQUESTS_OUTSTANDING.DATA
+.Pq Event B3H , Umask 01H
+Counts weighted cycles of snoopq requests for data.
+Counter 0 only
+Use cmask=1 to count cycles not empty.
+.It Li SNOOPQ_REQUESTS_OUTSTANDING.INVALIDATE
+.Pq Event B3H , Umask 02H
+Counts weighted cycles of snoopq invalidate requests.
+Counter 0 only.
+Use cmask=1 to count cycles not empty.
+.It Li SNOOPQ_REQUESTS_OUTSTANDING.CODE
+.Pq Event B3H , Umask 04H
+Counts weighted cycles of snoopq requests for code.
+Counter 0 only.
+Use cmask=1 to count cycles not empty.
+.It Li SNOOPQ_REQUESTS.CODE
+.Pq Event B4H , Umask 01H
+Counts the number of snoop code requests.
+.It Li SNOOPQ_REQUESTS.DATA
+.Pq Event B4H , Umask 02H
+Counts the number of snoop data requests.
+.It Li SNOOPQ_REQUESTS.INVALIDATE
+.Pq Event B4H , Umask 04H
+Counts the number of snoop invalidate requests
+.It Li OFF_CORE_RESPONSE_0
+.Pq Event B7H , Umask 01H
+see Section 30.6.1.3, Off-core Response Performance Monitoring in the
+Processor Core.
+Requires programming MSR 01A6H.
+.It Li SNOOP_RESPONSE.HIT
+.Pq Event B8H , Umask 01H
+Counts HIT snoop response sent by this thread in response to a snoop
+request.
+.It Li SNOOP_RESPONSE.HITE
+.Pq Event B8H , Umask 02H
+Counts HIT E snoop response sent by this thread in response to a snoop
+request.
+.It Li SNOOP_RESPONSE.HITM
+.Pq Event B8H , Umask 04H
+Counts HIT M snoop response sent by this thread in response to a snoop
+request.
+.It Li OFF_CORE_RESPONSE_1
+.Pq Event BBH , Umask 01H
+see Section 30.6.1.3, Off-core Response Performance Monitoring in the
+Processor Core.
+Use MSR 01A7H.
+.It Li INST_RETIRED.ANY_P
+.Pq Event C0H , Umask 01H
+See Table A-1
+Notes: INST_RETIRED.ANY is counted by a designated fixed counter.
+INST_RETIRED.ANY_P is counted by a programmable counter and is an
+architectural performance event.
+Event is supported if CPUID.A.EBX[1] = 0.
+Counting: Faulting executions of GETSEC/VM entry/VM Exit/MWait will not
+count as retired instructions.
+.It Li INST_RETIRED.X87
+.Pq Event C0H , Umask 02H
+Counts the number of floating point computational operations retired
+floating point computational operations executed by the assist handler and
+sub-operations of complex floating point instructions like transcendental
+instructions.
+.It Li INST_RETIRED.MMX
+.Pq Event C0H , Umask 04H
+Counts the number of retired: MMX instructions.
+.It Li UOPS_RETIRED.ANY
+.Pq Event C2H , Umask 01H
+Counts the number of micro-ops retired, (macro-fused=1, micro- fused=2,
+others=1; maximum count of 8 per cycle).
+Most instructions are composed of one or two micro-ops.
+Some instructions are decoded into longer sequences
+such as repeat instructions, floating point transcendental instructions, and
+assists.
+Use cmask=1 and invert to count active cycles or stalled cycles
+.It Li UOPS_RETIRED.RETIRE_SLOTS
+.Pq Event C2H , Umask 02H
+Counts the number of retirement slots used each cycle
+.It Li UOPS_RETIRED.MACRO_FUSED
+.Pq Event C2H , Umask 04H
+Counts number of macro-fused uops retired.
+.It Li MACHINE_CLEARS.CYCLES
+.Pq Event C3H , Umask 01H
+Counts the cycles machine clear is asserted.
+.It Li MACHINE_CLEARS.MEM_ORDER
+.Pq Event C3H , Umask 02H
+Counts the number of machine clears due to memory order conflicts.
+.It Li MACHINE_CLEARS.SMC
+.Pq Event C3H , Umask 04H
+Counts the number of times that a program writes to a code section.
+Self-modifying code causes a sever penalty in all Intel 64 and IA-32
+processors.
+The modified cache line is written back to the L2 and L3caches.
+.It Li BR_INST_RETIRED.ANY_P
+.Pq Event C4H , Umask 00H
+See Table A-1.
+.It Li BR_INST_RETIRED.CONDITIONAL
+.Pq Event C4H , Umask 01H
+Counts the number of conditional branch instructions retired.
+.It Li BR_INST_RETIRED.NEAR_CALL
+.Pq Event C4H , Umask 02H
+Counts the number of direct & indirect near unconditional calls retired.
+.It Li BR_INST_RETIRED.ALL_BRANCHES
+.Pq Event C4H , Umask 04H
+Counts the number of branch instructions retired.
+.It Li BR_MISP_RETIRED.ANY_P
+.Pq Event C5H , Umask 00H
+See Table A-1.
+.It Li BR_MISP_RETIRED.CONDITIONAL
+.Pq Event C5H , Umask 01H
+Counts mispredicted conditional retired calls.
+.It Li BR_MISP_RETIRED.NEAR_CALL
+.Pq Event C5H , Umask 02H
+Counts mispredicted direct & indirect near unconditional retired calls.
+.It Li BR_MISP_RETIRED.ALL_BRANCHES
+.Pq Event C5H , Umask 04H
+Counts all mispredicted retired calls.
+.It Li SSEX_UOPS_RETIRED.PACKED_SINGLE
+.Pq Event C7H , Umask 01H
+Counts SIMD packed single-precision floating point Uops retired.
+.It Li SSEX_UOPS_RETIRED.SCALAR_SINGLE
+.Pq Event C7H , Umask 02H
+Counts SIMD calar single-precision floating point Uops retired.
+.It Li SSEX_UOPS_RETIRED.PACKED_DOUBLE
+.Pq Event C7H , Umask 04H
+Counts SIMD packed double- precision floating point Uops retired.
+.It Li SSEX_UOPS_RETIRED.SCALAR_DOUBLE
+.Pq Event C7H , Umask 08H
+Counts SIMD scalar double-precision floating point Uops retired.
+.It Li SSEX_UOPS_RETIRED.VECTOR_INTEGER
+.Pq Event C7H , Umask 10H
+Counts 128-bit SIMD vector integer Uops retired.
+.It Li ITLB_MISS_RETIRED
+.Pq Event C8H , Umask 20H
+Counts the number of retired instructions that missed the ITLB when the
+instruction was fetched.
+.It Li MEM_LOAD_RETIRED.L1D_HIT
+.Pq Event CBH , Umask 01H
+Counts number of retired loads that hit the L1 data cache.
+.It Li MEM_LOAD_RETIRED.L2_HIT
+.Pq Event CBH , Umask 02H
+Counts number of retired loads that hit the L2 data cache.
+.It Li MEM_LOAD_RETIRED.L3_UNSHARED_HIT
+.Pq Event CBH , Umask 04H
+Counts number of retired loads that hit their own, unshared lines in the L3
+cache.
+.It Li MEM_LOAD_RETIRED.OTHER_CORE_L2_HIT_HITM
+.Pq Event CBH , Umask 08H
+Counts number of retired loads that hit in a sibling core's L2 (on die
+core).
+Since the L3 is inclusive of all cores on the package, this is an L3 hit.
+This counts both clean or modified hits.
+.It Li MEM_LOAD_RETIRED.L3_MISS
+.Pq Event CBH , Umask 10H
+Counts number of retired loads that miss the L3 cache.
+The load was satisfied by a remote socket, local memory or an IOH.
+.It Li MEM_LOAD_RETIRED.HIT_LFB
+.Pq Event CBH , Umask 40H
+Counts number of retired loads that miss the L1D and the address is located
+in an allocated line fill buffer and will soon be committed to cache.
+This is counting secondary L1D misses.
+.It Li MEM_LOAD_RETIRED.DTLB_MISS
+.Pq Event CBH , Umask 80H
+Counts the number of retired loads that missed the DTLB.
+The DTLB miss is not counted if the load operation causes a fault.
+This event counts loads from cacheable memory only.
+The event does not count loads by software prefetches.
+Counts both primary and secondary misses to the TLB.
+.It Li FP_MMX_TRANS.TO_FP
+.Pq Event CCH , Umask 01H
+Counts the first floating-point instruction following any MMX instruction.
+You can use this event to estimate the penalties for the transitions between
+floating-point and MMX technology states.
+.It Li FP_MMX_TRANS.TO_MMX
+.Pq Event CCH , Umask 02H
+Counts the first MMX instruction following a floating-point instruction.
+You can use this event to estimate the penalties for the transitions between
+floating-point and MMX technology states.
+.It Li FP_MMX_TRANS.ANY
+.Pq Event CCH , Umask 03H
+Counts all transitions from floating point to MMX instructions and from MMX
+instructions to floating point instructions.
+You can use this event to estimate the penalties for the transitions between
+floating-point and MMX technology states.
+.It Li MACRO_INSTS.DECODED
+.Pq Event D0H , Umask 01H
+Counts the number of instructions decoded, (but not necessarily executed or
+retired).
+.It Li UOPS_DECODED.STALL_CYCLES
+.Pq Event D1H , Umask 01H
+Counts the cycles of decoder stalls.
+.It Li UOPS_DECODED.MS
+.Pq Event D1H , Umask 02H
+Counts the number of Uops decoded by the Microcode Sequencer, MS.
+The MS delivers uops when the instruction is more than 4 uops long or a
+microcode assist is occurring.
+.It Li UOPS_DECODED.ESP_FOLDING
+.Pq Event D1H , Umask 04H
+Counts number of stack pointer (ESP) instructions decoded: push , pop , call
+, ret, etc. ESP instructions do not generate a Uop to increment or decrement
+ESP.
+Instead, they update an ESP_Offset register that keeps track of the
+delta to the current value of the ESP register.
+.It Li UOPS_DECODED.ESP_SYNC
+.Pq Event D1H , Umask 08H
+Counts number of stack pointer (ESP) sync operations where an ESP
+instruction is corrected by adding the ESP offset register to the current
+value of the ESP register.
+.It Li RAT_STALLS.FLAGS
+.Pq Event D2H , Umask 01H
+Counts the number of cycles during which execution stalled due to several
+reasons, one of which is a partial flag register stall.
+A partial register
+stall may occur when two conditions are met: 1) an instruction modifies
+some, but not all, of the flags in the flag register and 2) the next
+instruction, which depends on flags, depends on flags that were not modified
+by this instruction.
+.It Li RAT_STALLS.REGISTERS
+.Pq Event D2H , Umask 02H
+This event counts the number of cycles instruction execution latency became
+longer than the defined latency because the instruction used a register that
+was partially written by previous instruction.
+.It Li RAT_STALLS.ROB_READ_PORT
+.Pq Event D2H , Umask 04H
+Counts the number of cycles when ROB read port stalls occurred, which did
+not allow new micro-ops to enter the out-of-order pipeline.
+Note that, at
+this stage in the pipeline, additional stalls may occur at the same cycle
+and prevent the stalled micro-ops from entering the pipe.
+In such a case,
+micro-ops retry entering the execution pipe in the next cycle and the
+ROB-read port stall is counted again.
+.It Li RAT_STALLS.SCOREBOARD
+.Pq Event D2H , Umask 08H
+Counts the cycles where we stall due to microarchitecturally required
+serialization.
+Microcode scoreboarding stalls.
+.It Li RAT_STALLS.ANY
+.Pq Event D2H , Umask 0FH
+Counts all Register Allocation Table stall cycles due to: Cycles when ROB
+read port stalls occurred, which did not allow new micro-ops to enter the
+execution pipe.
+Cycles when partial register stalls occurred Cycles when
+flag stalls occurred Cycles floating-point unit (FPU) status word stalls
+occurred.
+To count each of these conditions separately use the events:
+RAT_STALLS.ROB_READ_PORT, RAT_STALLS.PARTIAL, RAT_STALLS.FLAGS, and
+RAT_STALLS.FPSW.
+.It Li SEG_RENAME_STALLS
+.Pq Event D4H , Umask 01H
+Counts the number of stall cycles due to the lack of renaming resources for
+the ES, DS, FS, and GS segment registers.
+If a segment is renamed but not
+retired and a second update to the same segment occurs, a stall occurs in
+the front- end of the pipeline until the renamed segment retires.
+.It Li ES_REG_RENAMES
+.Pq Event D5H , Umask 01H
+Counts the number of times the ES segment register is renamed.
+.It Li UOP_UNFUSION
+.Pq Event DBH , Umask 01H
+Counts unfusion events due to floating point exception to a fused uop.
+.It Li BR_INST_DECODED
+.Pq Event E0H , Umask 01H
+Counts the number of branch instructions decoded.
+.It Li BPU_MISSED_CALL_RET
+.Pq Event E5H , Umask 01H
+Counts number of times the Branch Prediction Unit missed predicting a call
+or return branch.
+.It Li BACLEAR.CLEAR
+.Pq Event E6H , Umask 01H
+Counts the number of times the front end is resteered, mainly when the
+Branch Prediction Unit cannot provide a correct prediction and this is
+corrected by the Branch Address Calculator at the front end.
+This can occur
+if the code has many branches such that they cannot be consumed by the BPU.
+Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble
+in the instruction fetch pipeline.
+The effect on total execution time depends on the surrounding code.
+.It Li BACLEAR.BAD_TARGET
+.Pq Event E6H , Umask 02H
+Counts number of Branch Address Calculator clears (BACLEAR) asserted due to
+conditional branch instructions in which there was a target hit but the
+direction was wrong.
+Each BACLEAR asserted by the BAC generates
+approximately an 8 cycle bubble in the instruction fetch pipeline.
+.It Li BPU_CLEARS.EARLY
+.Pq Event E8H , Umask 01H
+Counts early (normal) Branch Prediction Unit clears: BPU predicted a taken
+branch after incorrectly assuming that it was not taken.
+The BPU clear leads to 2 cycle bubble in the Front End.
+.It Li BPU_CLEARS.LATE
+.Pq Event E8H , Umask 02H
+Counts late Branch Prediction Unit clears due to Most Recently Used
+conflicts.
+The PBU clear leads to a 3 cycle bubble in the Front End.
+.It Li THREAD_ACTIVE
+.Pq Event ECH , Umask 01H
+Counts cycles threads are active.
+.It Li L2_TRANSACTIONS.LOAD
+.Pq Event F0H , Umask 01H
+Counts L2 load operations due to HW prefetch or demand loads.
+.It Li L2_TRANSACTIONS.RFO
+.Pq Event F0H , Umask 02H
+Counts L2 RFO operations due to HW prefetch or demand RFOs.
+.It Li L2_TRANSACTIONS.IFETCH
+.Pq Event F0H , Umask 04H
+Counts L2 instruction fetch operations due to HW prefetch or demand ifetch.
+.It Li L2_TRANSACTIONS.PREFETCH
+.Pq Event F0H , Umask 08H
+Counts L2 prefetch operations.
+.It Li L2_TRANSACTIONS.L1D_WB
+.Pq Event F0H , Umask 10H
+Counts L1D writeback operations to the L2.
+.It Li L2_TRANSACTIONS.FILL
+.Pq Event F0H , Umask 20H
+Counts L2 cache line fill operations due to load, RFO, L1D writeback or
+prefetch.
+.It Li L2_TRANSACTIONS.WB
+.Pq Event F0H , Umask 40H
+Counts L2 writeback operations to the L3.
+.It Li L2_TRANSACTIONS.ANY
+.Pq Event F0H , Umask 80H
+Counts all L2 cache operations.
+.It Li L2_LINES_IN.S_STATE
+.Pq Event F1H , Umask 02H
+Counts the number of cache lines allocated in the L2 cache in the S (shared)
+state.
+.It Li L2_LINES_IN.E_STATE
+.Pq Event F1H , Umask 04H
+Counts the number of cache lines allocated in the L2 cache in the E
+(exclusive) state.
+.It Li L2_LINES_IN.ANY
+.Pq Event F1H , Umask 07H
+Counts the number of cache lines allocated in the L2 cache.
+.It Li L2_LINES_OUT.DEMAND_CLEAN
+.Pq Event F2H , Umask 01H
+Counts L2 clean cache lines evicted by a demand request.
+.It Li L2_LINES_OUT.DEMAND_DIRTY
+.Pq Event F2H , Umask 02H
+Counts L2 dirty (modified) cache lines evicted by a demand request.
+.It Li L2_LINES_OUT.PREFETCH_CLEAN
+.Pq Event F2H , Umask 04H
+Counts L2 clean cache line evicted by a prefetch request.
+.It Li L2_LINES_OUT.PREFETCH_DIRTY
+.Pq Event F2H , Umask 08H
+Counts L2 modified cache line evicted by a prefetch request.
+.It Li L2_LINES_OUT.ANY
+.Pq Event F2H , Umask 0FH
+Counts all L2 cache lines evicted for any reason.
+.It Li SQ_MISC.LRU_HINTS
+.Pq Event F4H , Umask 04H
+Counts number of Super Queue LRU hints sent to L3.
+.It Li SQ_MISC.SPLIT_LOCK
+.Pq Event F4H , Umask 10H
+Counts the number of SQ lock splits across a cache line.
+.It Li SQ_FULL_STALL_CYCLES
+.Pq Event F6H , Umask 01H
+Counts cycles the Super Queue is full.
+Neither of the threads on this core will be able to access the uncore.
+.It Li FP_ASSIST.ALL
+.Pq Event F7H , Umask 01H
+Counts the number of floating point operations executed that required
+micro-code assist intervention.
+Assists are required in the following cases:
+SSE instructions, (Denormal input when the DAZ flag is off or Underflow
+result when the FTZ flag is off): x87 instructions, (NaN or denormal are
+loaded to a register or used as input from memory, Division by 0 or
+Underflow output).
+.It Li FP_ASSIST.OUTPUT
+.Pq Event F7H , Umask 02H
+Counts number of floating point micro-code assist when the output value
+(destination register) is invalid.
+.It Li FP_ASSIST.INPUT
+.Pq Event F7H , Umask 04H
+Counts number of floating point micro-code assist when the input value (one
+of the source operands to an FP instruction) is invalid.
+.It Li SIMD_INT_64.PACKED_MPY
+.Pq Event FDH , Umask 01H
+Counts number of SID integer 64 bit packed multiply operations.
+.It Li SIMD_INT_64.PACKED_SHIFT
+.Pq Event FDH , Umask 02H
+Counts number of SID integer 64 bit packed shift operations.
+.It Li SIMD_INT_64.PACK
+.Pq Event FDH , Umask 04H
+Counts number of SID integer 64 bit pack operations.
+.It Li SIMD_INT_64.UNPACK
+.Pq Event FDH , Umask 08H
+Counts number of SID integer 64 bit unpack operations.
+.It Li SIMD_INT_64.PACKED_LOGICAL
+.Pq Event FDH , Umask 10H
+Counts number of SID integer 64 bit logical operations.
+.It Li SIMD_INT_64.PACKED_ARITH
+.Pq Event FDH , Umask 20H
+Counts number of SID integer 64 bit arithmetic operations.
+.It Li SIMD_INT_64.SHUFFLE_MOVE
+.Pq Event FDH , Umask 40H
+Counts number of SID integer 64 bit shift or move operations.
+.El
+.Sh SEE ALSO
+.Xr pmc 3 ,
+.Xr pmc.atom 3 ,
+.Xr pmc.core 3 ,
+.Xr pmc.corei7 3 ,
+.Xr pmc.corei7uc 3 ,
+.Xr pmc.iaf 3 ,
+.Xr pmc.k7 3 ,
+.Xr pmc.k8 3 ,
+.Xr pmc.p4 3 ,
+.Xr pmc.p5 3 ,
+.Xr pmc.p6 3 ,
+.Xr pmc.soft 3 ,
+.Xr pmc.tsc 3 ,
+.Xr pmc.ucf 3 ,
+.Xr pmc.westmereuc 3 ,
+.Xr pmc_cpuinfo 3 ,
+.Xr pmclog 3 ,
+.Xr hwpmc 4
+.Sh HISTORY
+The
+.Nm pmc
+library first appeared in
+.Fx 6.0 .
+.Sh AUTHORS
+The
+.Lb libpmc
+library was written by
+.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org .