hqemu - HQEMU

	Commit message	Author	Age	Files	Lines
*	implementing victim TLB for QEMU system emulated TLB	Xin Tong	2014-09-01	1	-1/+30
	QEMU system mode page table walks are expensive. Measured by running qemu-system-x86_64 in system mode under Intel PIN, a TLB miss followed by a walk of the 4-level page table in a guest Linux OS takes ~450 x86 instructions on average.
	The QEMU system mode TLB is implemented as a direct-mapped hash table, a structure that suffers from conflict misses. Increasing the associativity of the TLB may not solve the conflict misses, as all the ways may have to be walked serially.
	A victim TLB holds translations evicted from the primary TLB upon replacement. It sits between the main TLB and its refill path and has greater associativity (fully associative in this patch). A victim TLB lookup takes longer than a main TLB lookup, but it is likely cheaper than a full page table walk. The memory translation path changes as follows:
	Before victim TLB:
	1. Inline TLB lookup.
	2. Exit code cache on TLB miss.
	3. Check for unaligned and IO accesses.
	4. TLB refill.
	5. Do the memory access.
	6. Return to code cache.
	After victim TLB:
	1. Inline TLB lookup.
	2. Exit code cache on TLB miss.
	3. Check for unaligned and IO accesses.
	4. Victim TLB lookup.
	5. If the victim TLB misses, TLB refill.
	6. Do the memory access.
	7. Return to code cache.
	The advantage is that a victim TLB adds associativity to a direct-mapped TLB, and thus potentially saves page table walks, while still keeping the time taken to flush within reasonable limits. However, placing a victim TLB before the refill path lengthens the TLB refill path, since the victim TLB is consulted before every refill. The performance results demonstrate that the pros outweigh the cons.
	Performance results for the SPECINT2006 train datasets, a kernel boot, and the qemu configure script, taken on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine, are shown in the Google Doc linked below. https://docs.google.com/spreadsheets/d/1eiItzekZwNQOal_h-5iJmC4tMDi051m9qidi5_nwvH4/edit?usp=sharing
	In summary, the victim TLB improves the performance of qemu-system-x86_64 by 11% on average across SPECINT2006, the kernel boot, and the qemu configure script, with the highest improvement of 26% in 456.hmmer. The victim TLB does not degrade performance in any of the measured benchmarks. Furthermore, the implemented victim TLB is architecture independent and is expected to benefit other architectures in QEMU as well. Although there are measurement fluctuations, the performance improvement is significant and well outside the range of noise.
	Signed-off-by: Xin Tong <trent.tong@gmail.com> Message-id: 1407202523-23553-1-git-send-email-trent.tong@gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
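The lookup order described in the victim TLB commit can be sketched in a few lines of C. This is not the code from the patch: the type and function names (`SoftTlb`, `tlb_translate`, `page_walk`), the table sizes, and the round-robin replacement policy are all assumptions made for illustration, and the page table walk is replaced by a trivial stand-in that only counts how often it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS   12          /* 4 KiB pages (assumption) */
#define TLB_SIZE    256         /* direct-mapped entries (assumption) */
#define VTLB_SIZE   8           /* fully associative victim entries (assumption) */
#define INVALID_TAG UINT64_MAX  /* marks an empty entry */

typedef struct {
    uint64_t vpage;             /* virtual page number, INVALID_TAG if empty */
    uint64_t ppage;             /* physical page number */
} TlbEntry;

typedef struct {
    TlbEntry main[TLB_SIZE];
    TlbEntry victim[VTLB_SIZE];
    unsigned victim_next;       /* round-robin replacement cursor */
    unsigned walks;             /* number of simulated page table walks */
} SoftTlb;

static void tlb_reset(SoftTlb *t)
{
    memset(t, 0xff, sizeof(*t));    /* every tag becomes INVALID_TAG */
    t->victim_next = 0;
    t->walks = 0;
}

/* Hypothetical stand-in for the expensive guest page table walk. */
static uint64_t page_walk(SoftTlb *t, uint64_t vpage)
{
    t->walks++;
    return vpage + 0x1000;
}

/* Refill the main TLB; a valid entry being replaced moves to the victim TLB. */
static void tlb_install(SoftTlb *t, TlbEntry e)
{
    TlbEntry *slot = &t->main[e.vpage % TLB_SIZE];
    if (slot->vpage != INVALID_TAG) {
        t->victim[t->victim_next] = *slot;
        t->victim_next = (t->victim_next + 1) % VTLB_SIZE;
    }
    *slot = e;
}

static uint64_t tlb_translate(SoftTlb *t, uint64_t vaddr)
{
    uint64_t vpage = vaddr >> PAGE_BITS;
    uint64_t offset = vaddr & ((1u << PAGE_BITS) - 1);
    TlbEntry *slot = &t->main[vpage % TLB_SIZE];

    assert(vpage != INVALID_TAG);   /* a shifted address never equals the sentinel */

    if (slot->vpage != vpage) {     /* main TLB miss */
        bool hit = false;
        for (size_t i = 0; i < VTLB_SIZE; i++) {
            if (t->victim[i].vpage == vpage) {
                /* Victim hit: swap the entry back into the main TLB. */
                TlbEntry tmp = *slot;
                *slot = t->victim[i];
                t->victim[i] = tmp;
                hit = true;
                break;
            }
        }
        if (!hit) {                 /* victim miss: fall back to the walk */
            TlbEntry e = { vpage, page_walk(t, vpage) };
            tlb_install(t, e);
        }
    }
    return (slot->ppage << PAGE_BITS) | offset;
}
```

Two addresses whose page numbers differ by `TLB_SIZE` map to the same main slot. In a plain direct-mapped TLB, alternating between them would trigger a walk on every access; in this sketch each page is walked once and later accesses are served by swapping with the victim TLB.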
*	softmmu: introduce cpu_ldst.h	Paolo Bonzini	2014-06-05	1	-0/+1
	This will collect all load and store helpers soon. For now it is just a replacement for softmmu_exec.h, which this patch stops including directly, but we also include it where this will be necessary in order to simplify the next patch. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	softmmu: move softmmu_template.h out of include/	Paolo Bonzini	2014-06-05	1	-8/+8
	It is only included in cputlb.c now. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	softmmu: commonize helper definitions	Paolo Bonzini	2014-06-05	1	-2/+16
	They do not need to be in op_helper.c. Because cputlb.c now includes softmmu_template.h twice for each size, io_readX must be elided the second time through. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	cputlb: Fix regression with TCG interpreter (bug 1310324)	Stefan Weil	2014-06-05	1	-2/+4
	Commit 0f842f8a246f2b5b51a11c13f933bf7a90ae8e96 replaced GETPC_EXT(), which was derived from GETPC(), by GETRA_EXT() without fixing cputlb.c. A later patch replaced GETRA_EXT() by GETRA() in exec/softmmu_template.h, which is included in cputlb.c. The TCG interpreter failed because the values returned by GETRA() were no longer explicitly set to 0. The redefinition of GETRA() introduced here fixes this. In addition, GETPC_ADJ, which is also used in exec/softmmu_template.h, is set to 0. Both changes reduce the compiled code size for cputlb.c by more than 100 bytes, so the normal TCG without interpreter also profits from the reduced code size and slightly faster code. Cc: qemu-stable@nongnu.org Reported-by: Giovanni Mascellani <gio@debian.org> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	cputlb: Change tlb_set_page() argument to CPUState	Andreas Färber	2014-03-13	1	-2/+2
	Signed-off-by: Andreas Färber <afaerber@suse.de>
*	cputlb: Change tlb_flush() argument to CPUState	Andreas Färber	2014-03-13	1	-3/+3
	Signed-off-by: Andreas Färber <afaerber@suse.de>
*	cputlb: Change tlb_flush_page() argument to CPUState	Andreas Färber	2014-03-13	1	-2/+2
	Signed-off-by: Andreas Färber <afaerber@suse.de>
*	exec: Change cpu_abort() argument to CPUState	Andreas Färber	2014-03-13	1	-1/+1
	Signed-off-by: Andreas Färber <afaerber@suse.de>
*	exec: Change memory_region_section_get_iotlb() argument to CPUState	Andreas Färber	2014-03-13	1	-1/+1
	It no longer needs CPUArchState since moving watchpoints to CPUState. Signed-off-by: Andreas Färber <afaerber@suse.de>
*	cputlb: Change tlb_unprotect_code_phys() argument to CPUState	Andreas Färber	2014-03-13	1	-1/+1
	Note that the argument is unused. Signed-off-by: Andreas Färber <afaerber@suse.de>
*	translate-all: Change tb_flush_jmp_cache() argument to CPUState	Andreas Färber	2014-03-13	1	-1/+1
	Signed-off-by: Andreas Färber <afaerber@suse.de>
*	cpu: Move tb_jmp_cache field from CPU_COMMON to CPUState	Andreas Färber	2014-03-13	1	-1/+1
	Clear it on reset. Signed-off-by: Andreas Färber <afaerber@suse.de>
*	cpu: Add per-cpu address space	Edgar E. Iglesias	2014-02-11	1	-3/+4
	Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
*	exec: Make iotlb_to_region input an AS	Edgar E. Iglesias	2014-02-11	1	-1/+1
	Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
*	memory: split cpu_physical_memory_* functions to its own include	Juan Quintela	2014-01-13	1	-0/+1
	All the functions that use ram_addr_t should be here. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
*	memory: make cpu_physical_memory_reset_dirty() take a length parameter	Juan Quintela	2014-01-13	1	-2/+1
	We have an end parameter in all the callers, and this makes it consistent with the rest of the cpu_physical_memory_* functions, which also take a length parameter. While here, move the start/end calculation to tlb_reset_dirty_range_all(), as we don't need it here anymore. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
*	memory: s/dirty/clean/ in cpu_physical_memory_is_dirty()	Juan Quintela	2014-01-13	1	-1/+2
	All uses except one really want the other meaning. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
*	memory: cpu_physical_memory_mask_dirty_range() always clears a single flag	Juan Quintela	2014-01-13	1	-2/+2
	Document it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com>
*	memory: create function to set a single dirty bit	Juan Quintela	2014-01-13	1	-1/+1
	Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Orit Wasserman <owasserm@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
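The dirty-tracking commits in the memory: series above share a few ideas: a range is described by start and length rather than start and end, a reset clears exactly one flag, and callers usually ask whether a page is clean rather than dirty. A minimal sketch of those ideas follows. It is not QEMU's implementation: the `demo_` names, the flag values, the byte-per-page table, and the rule that a page counts as clean when any flag is clear are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_BITS 12   /* 4 KiB pages (assumption) */
#define DEMO_PAGES     16   /* size of the toy RAM (assumption) */

/* Hypothetical per-page dirty flags, one bit per client. */
enum {
    DEMO_FLAG_VGA       = 1,
    DEMO_FLAG_CODE      = 2,
    DEMO_FLAG_MIGRATION = 4,
    DEMO_FLAG_ALL       = 7
};

static uint8_t demo_dirty[DEMO_PAGES];

/* Set one or more dirty flags on the page containing addr. */
static void demo_set_dirty_flag(uint64_t addr, int flag)
{
    assert((addr >> DEMO_PAGE_BITS) < DEMO_PAGES);
    demo_dirty[addr >> DEMO_PAGE_BITS] |= (uint8_t)flag;
}

/* Clear a single flag on every page touched by [start, start + length). */
static void demo_reset_dirty(uint64_t start, uint64_t length, int flag)
{
    if (length == 0) {
        return;
    }
    uint64_t first = start >> DEMO_PAGE_BITS;
    uint64_t last = (start + length - 1) >> DEMO_PAGE_BITS;
    assert(last < DEMO_PAGES);
    for (uint64_t p = first; p <= last; p++) {
        demo_dirty[p] &= (uint8_t)~flag;
    }
}

/* In this sketch a page is clean as soon as any of its flags is clear. */
static bool demo_is_clean(uint64_t addr)
{
    assert((addr >> DEMO_PAGE_BITS) < DEMO_PAGES);
    return (demo_dirty[addr >> DEMO_PAGE_BITS] & DEMO_FLAG_ALL) != DEMO_FLAG_ALL;
}
```

A caller that only has `start` and `end` passes `end - start` as the length, which keeps the conversion in one place at the call site.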
*	cputlb: Tidy memset() of arrays	Richard Henderson	2013-12-23	1	-1/+1
	Don't duplicate the array length computation in the memset() when plain sizeof() can produce the correct results. Signed-off-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Andreas Färber <afaerber@suse.de>
*	cputlb: Use memset() when flushing entries	Richard Henderson	2013-12-23	1	-17/+2
	The size of tlb_table is 4k on a 64-bit host. For overwriting memory at this size, cacheline tricks can help. Signed-off-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Andreas Färber <afaerber@suse.de>
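The two memset() commits above can be illustrated together: a per-entry flush loop and a single `memset()` over the whole array leave the table in the same state, and `sizeof()` applied to the array already yields the full byte count. The sketch below uses a hypothetical `DemoEnv` structure with assumed dimensions and field names; it is not the structure layout used by QEMU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NB_MMU_MODES 3      /* assumption */
#define CPU_TLB_SIZE 256    /* assumption */

/* Hypothetical TLB entry: four 64-bit fields, so there is no padding. */
typedef struct {
    uint64_t addr_read;
    uint64_t addr_write;
    uint64_t addr_code;
    uint64_t addend;
} DemoTLBEntry;

typedef struct {
    DemoTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];
} DemoEnv;

/* Flush by writing an invalid value (-1) into every field of every entry. */
static void flush_loop(DemoEnv *env)
{
    for (int mode = 0; mode < NB_MMU_MODES; mode++) {
        for (int i = 0; i < CPU_TLB_SIZE; i++) {
            DemoTLBEntry *e = &env->tlb_table[mode][i];
            e->addr_read = (uint64_t)-1;
            e->addr_write = (uint64_t)-1;
            e->addr_code = (uint64_t)-1;
            e->addend = (uint64_t)-1;
        }
    }
}

/* Same effect with one call; sizeof() of the array is the full byte count. */
static void flush_memset(DemoEnv *env)
{
    assert(sizeof(env->tlb_table) ==
           NB_MMU_MODES * CPU_TLB_SIZE * sizeof(DemoTLBEntry));
    memset(env->tlb_table, -1, sizeof(env->tlb_table));
}
```

The equivalence relies on the invalid value being all-ones in every byte; a sentinel with a different bit pattern would still need the loop.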
*	cputlb: Remove dead function tlb_update_dirty()	liguang	2013-10-07	1	-15/+0
	Signed-off-by: liguang <lig.fnst@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andreas Färber <afaerber@suse.de>
*	cpu: Use QTAILQ for CPU list	Andreas Färber	2013-09-03	1	-1/+1
	Introduce CPU_FOREACH(), CPU_FOREACH_SAFE() and CPU_NEXT() shorthand macros. Signed-off-by: Andreas Färber <afaerber@suse.de>
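As a rough illustration of what a shorthand iteration macro over a tail queue looks like, the sketch below wraps the BSD `TAILQ_FOREACH` macro from `<sys/queue.h>`. QEMU's QTAILQ macros are its own variant of that family; everything named `Demo` or `DEMO_` here is hypothetical and is not the definition added by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical CPU object linked into a tail queue. */
typedef struct DemoCPU {
    int cpu_index;
    TAILQ_ENTRY(DemoCPU) node;
} DemoCPU;

/* Declares struct DemoCPUList as the queue head type. */
TAILQ_HEAD(DemoCPUList, DemoCPU);

/* Shorthand so callers do not repeat the link field name. */
#define DEMO_CPU_FOREACH(cpu, list) TAILQ_FOREACH(cpu, list, node)

/* Append a CPU to the end of the list. */
static void demo_add_cpu(struct DemoCPUList *list, DemoCPU *cpu)
{
    assert(cpu != NULL);
    TAILQ_INSERT_TAIL(list, cpu, node);
}

/* Walk the list with the shorthand macro and count its elements. */
static int demo_count_cpus(struct DemoCPUList *list)
{
    DemoCPU *cpu;
    int n = 0;

    DEMO_CPU_FOREACH(cpu, list) {
        n++;
    }
    return n;
}
```

Compared with a hand-rolled `next` pointer, the queue macros keep insertion at the tail O(1) and hide the link field behind one name.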
*	cpu: Make first_cpu and next_cpu CPUState	Andreas Färber	2013-07-09	1	-1/+3
	Move next_cpu from CPU_COMMON to CPUState. Move first_cpu variable to qom/cpu.h. gdbstub needs to use CPUState::env_ptr for now. cpu_copy() no longer needs to save and restore cpu_next. Acked-by: Paolo Bonzini <pbonzini@redhat.com> [AF: Rebased, simplified cpu_copy()] Signed-off-by: Andreas Färber <afaerber@suse.de>
*	memory: return MemoryRegion from qemu_ram_addr_from_host	Paolo Bonzini	2013-07-04	1	-1/+1
	It will be needed in the next patch. Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	exec: move qemu_ram_addr_from_host_nofail to cputlb.c	Paolo Bonzini	2013-07-04	1	-0/+11
	After the next patch it would not be used elsewhere anyway. Also, the _nofail and the standard versions of this function return different things, which is confusing. Removing the function from the public headers limits the confusion. Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	cpu: Turn cpu_unassigned_access() into a CPUState hook	Andreas Färber	2013-06-28	1	-6/+9
	Use it for all targets, but be careful not to pass an invalid CPUState. cpu_single_env can be NULL, e.g. on Xen. Signed-off-by: Andreas Färber <afaerber@suse.de>
*	exec: Resolve subpages in one step except for IOTLB fills	Jan Kiszka	2013-06-20	1	-2/+2
	Except for the case of setting the IOTLB entry in TCG mode, we can avoid the subpage dispatching handlers and do the resolution directly on address_space_lookup_region. An IOTLB entry describes a full page, not only the region that the first access to a sub-divided page may return. This patch therefore introduces a special translation function, address_space_translate_for_iotlb, that avoids the subpage resolutions. In contrast, callers of the existing address_space_translate service will now always receive the terminal memory region section. This will be important for breaking the BQL and for enabling unaligned memory regions. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	cputlb: fix debug logs	Hervé Poussineau	2013-06-14	1	-2/+2
	The 'pd' variable was removed in 06ef3525e1f271b6a842781a05eace5cf63b95c2. Signed-off-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
*	memory: add address_space_translate	Paolo Bonzini	2013-05-29	1	-9/+11
	Using phys_page_find to translate an AddressSpace to a MemoryRegionSection is unwieldy. It requires passing the page index rather than the address, and memory_region_section_addr has to be called afterwards. Replace memory_region_section_addr with a function that does all of it: call phys_page_find, compute the offset within the region, and check how big the current mapping is. This way, a large flat region can be written with a single lookup rather than a page at a time. address_space_translate will also provide a single point where IOMMU forwarding is implemented. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
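The three steps named in the address_space_translate commit (find the section, compute the offset within it, and clamp the access length to what the mapping covers) can be sketched as one function. The version below is a simplified stand-in and not QEMU's implementation: `DemoSection` and `demo_translate` are hypothetical names, and a linear scan over a small array replaces the phys_page_find lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat description of one mapped region. */
typedef struct {
    uint64_t base;      /* first guest address covered */
    uint64_t size;      /* number of bytes covered */
    const char *name;
} DemoSection;

/*
 * Translate addr in one step:
 *   - return the section containing addr, or NULL if nothing is mapped,
 *   - store the offset within that section in *xlat,
 *   - clamp *plen to the bytes remaining in the section.
 */
static const DemoSection *demo_translate(const DemoSection *map, size_t n,
                                         uint64_t addr, uint64_t *xlat,
                                         uint64_t *plen)
{
    assert(xlat != NULL && plen != NULL);

    for (size_t i = 0; i < n; i++) {
        const DemoSection *s = &map[i];

        if (addr >= s->base && addr - s->base < s->size) {
            uint64_t offset = addr - s->base;
            uint64_t remaining = s->size - offset;

            *xlat = offset;
            if (*plen > remaining) {
                *plen = remaining;
            }
            return s;
        }
    }
    return NULL;
}
```

Because the length is clamped to the section rather than to a page, a caller copying into a large flat region can loop on the returned length and finish in one iteration instead of one per page.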
*	cputlb: simplify tlb_set_page	Paolo Bonzini	2013-05-29	1	-8/+5
	The same "if" condition is repeated twice. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	cpu: Move current_tb field to CPUState	Andreas Färber	2013-02-16	1	-2/+4
	Explicitly NULL it on CPU reset since it was located before breakpoints. Change vapic_report_tpr_access() argument to CPUState. This also resolves the use of void* for cpu.h independence. Change vAPIC patch_instruction() argument to X86CPU. Signed-off-by: Andreas Färber <afaerber@suse.de>
*	exec: move include files to include/exec/	Paolo Bonzini	2012-12-19	1	-9/+9
	Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	Rename target_phys_addr_t to hwaddr	Avi Kivity	2012-10-23	1	-2/+2
	target_phys_addr_t is unwieldy, violates the C standard (_t suffixes are reserved) and its purpose doesn't match the name (most target_phys_addr_t addresses are not target specific). Replace it with a finger-friendly, standards-conformant hwaddr. Outstanding patchsets can be fixed up with the command git rebase -i --exec 'find -name "*.[ch]" | xargs sed -i s/target_phys_addr_t/hwaddr/g' origin Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
*	memory: per-AddressSpace dispatch	Avi Kivity	2012-10-22	1	-1/+2
	Currently we use a global radix tree to dispatch memory access. This only works with a single address space; to support multiple address spaces we make the radix tree a member of AddressSpace (via an intermediate structure AddressSpaceDispatch to avoid exposing too many internals). A side effect is that address_space_io also gains a dispatch table. When we remove all the pre-memory-API I/O registrations, we can use that for dispatching I/O and get rid of the original I/O dispatch. Signed-off-by: Avi Kivity <avi@redhat.com>
*	memory: rename 'exec-obsolete.h'	Avi Kivity	2012-10-15	1	-2/+1
	exec-obsolete.h used to hold pre-memory-API functions that were used from device code prior to the transition to the memory API. Now that the transition is complete, the name no longer describes the file. The functions still need to be merged better into the memory core, but there's no danger of anyone using them. Reviewed-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
*	Remove unused CONFIG_TCG_PASS_AREG0 and dead code	Blue Swirl	2012-09-15	1	-5/+0
	Now that CONFIG_TCG_PASS_AREG0 is enabled for all targets, remove dead code and support for the !CONFIG_TCG_PASS_AREG0 case. Remove dyngen-exec.h and all references to it. Although hw/spapr_hcall.c includes it, that file does not seem to use it. Remove unused HELPER_CFLAGS. Signed-off-by: Blue Swirl <blauwirbel@gmail.com> Reviewed-by: Richard Henderson <rth@twiddle.net>
*	cputlb.c: Fix out of date comment	Peter Maydell	2012-08-15	1	-1/+3
	The comment about the return address from get_page_addr_code() was well out of date as phys_ram_base has not existed for some time. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
*	cputlb: fix watchpoints handling	Max Filippov	2012-05-12	1	-2/+2
	Cleanup commit e554861766d9ae84dd5720baa4869f4ed711506f changed the code_address calculation in the tlb_set_page function for accesses to a page with a watchpoint. This caused a QEMU segfault in the xtensa test_break unit test. Fix it by moving the code_address assignment above the memory_region_section_get_iotlb call. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
*	cputlb: prepare private memory API for public consumption	Blue Swirl	2012-05-01	1	-5/+7
	Fold is_ram_rom() and is_ram_rom_romd() into callers. Change is_romd() and section_addr() to take MemoryRegion instead of MemoryRegionSection for consistency, and use the memory_region_ prefix. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
*	cputlb: move TLB handling to a separate file	Blue Swirl	2012-05-01	1	-0/+362
	Move TLB handling and softmmu code load helpers to cputlb.c, compile only for softmmu targets. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>