1 files changed, 699 insertions, 0 deletions
diff --git a/src/qemu-tech.texi b/src/qemu-tech.texi
new file mode 100644
index 0000000..022017d
--- /dev/null
+++ b/src/qemu-tech.texi
@@ -0,0 +1,699 @@
+\input texinfo @c -*- texinfo -*-
+@c %**start of header
+@setfilename qemu-tech.info
+
+@documentlanguage en
+@documentencoding UTF-8
+
+@settitle QEMU Internals
+@exampleindent 0
+@paragraphindent 0
+@c %**end of header
+
+@ifinfo
+@direntry
+* QEMU Internals: (qemu-tech).   The QEMU Emulator Internals.
+@end direntry
+@end ifinfo
+
+@iftex
+@titlepage
+@sp 7
+@center @titlefont{QEMU Internals}
+@sp 3
+@end titlepage
+@end iftex
+
+@ifnottex
+@node Top
+@top
+
+@menu
+* Introduction::
+* QEMU Internals::
+* Regression Tests::
+* Index::
+@end menu
+@end ifnottex
+
+@contents
+
+@node Introduction
+@chapter Introduction
+
+@menu
+* intro_features::         Features
+* intro_x86_emulation::    x86 and x86-64 emulation
+* intro_arm_emulation::    ARM emulation
+* intro_mips_emulation::   MIPS emulation
+* intro_ppc_emulation::    PowerPC emulation
+* intro_sparc_emulation::  Sparc32 and Sparc64 emulation
+* intro_xtensa_emulation:: Xtensa emulation
+* intro_other_emulation::  Other CPU emulation
+@end menu
+
+@node intro_features
+@section Features
+
+QEMU is a FAST! processor emulator using a portable dynamic
+translator.
+
+QEMU has two operating modes:
+
+@itemize @minus
+
+@item
+Full system emulation. In this mode (full platform virtualization),
+QEMU emulates a full system (usually a PC), including a processor and
+various peripherals. It can be used to launch several different
+Operating Systems at once without rebooting the host machine or to
+debug system code.
+
+@item
+User mode emulation. In this mode (application level virtualization),
+QEMU can launch processes compiled for one CPU on another CPU, however
+the Operating Systems must match. This can be used for example to ease
+cross-compilation and cross-debugging.
+@end itemize
+
+As QEMU requires no host kernel driver to run, it is very safe and
+easy to use.
+
+QEMU generic features:
+
+@itemize
+
+@item User space only or full system emulation.
+
+@item Using dynamic translation to native code for reasonable speed.
+
+@item
+Working on x86, x86_64 and PowerPC32/64 hosts. Being tested on ARM,
+HPPA, Sparc32 and Sparc64. Previous versions had some support for
+Alpha and S390 hosts, but TCG (see below) doesn't support those yet.
+
+@item Self-modifying code support.
+
+@item Precise exceptions support.
+
+@item
+Floating point library supporting both full software emulation and
+native host FPU instructions.
+
+@end itemize
+
+QEMU user mode emulation features:
+@itemize
+@item Generic Linux system call converter, including most ioctls.
+
+@item clone() emulation using native CPU clone() to use Linux scheduler for threads.
+
+@item Accurate signal handling by remapping host signals to target signals.
+@end itemize
+
+Linux user emulator (Linux host only) can be used to launch the Wine
+Windows API emulator (@url{http://www.winehq.org}). A BSD user emulator for BSD
+hosts is under development. It would also be possible to develop a
+similar user emulator for Solaris.
+
+QEMU full system emulation features:
+@itemize
+@item
+QEMU uses a full software MMU for maximum portability.
+
+@item
+QEMU can optionally use an in-kernel accelerator, like kvm. The accelerators 
+execute some of the guest code natively, while
+continuing to emulate the rest of the machine.
+
+@item
+Various hardware devices can be emulated and in some cases, host
+devices (e.g. serial and parallel ports, USB, drives) can be used
+transparently by the guest Operating System. Host device passthrough
+can be used for talking to external physical peripherals (e.g. a
+webcam, modem or tape drive).
+
+@item
+Symmetric multiprocessing (SMP) even on a host with a single CPU. On a
+SMP host system, QEMU can use only one CPU fully due to difficulty in
+implementing atomic memory accesses efficiently.
+
+@end itemize
+
+@node intro_x86_emulation
+@section x86 and x86-64 emulation
+
+QEMU x86 target features:
+
+@itemize
+
+@item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation.
+LDT/GDT and IDT are emulated. VM86 mode is also supported to run
+DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3,
+and SSE4 as well as x86-64 SVM.
+
+@item Support of host page sizes bigger than 4KB in user mode emulation.
+
+@item QEMU can emulate itself on x86.
+
+@item An extensive Linux x86 CPU test program is included @file{tests/test-i386}.
+It can be used to test other x86 virtual CPUs.
+
+@end itemize
+
+Current QEMU limitations:
+
+@itemize
+
+@item Limited x86-64 support.
+
+@item IPC syscalls are missing.
+
+@item The x86 segment limits and access rights are not tested at every
+memory access (yet). Hopefully, very few OSes seem to rely on that for
+normal use.
+
+@end itemize
+
+@node intro_arm_emulation
+@section ARM emulation
+
+@itemize
+
+@item Full ARM 7 user emulation.
+
+@item NWFPE FPU support included in user Linux emulation.
+
+@item Can run most ARM Linux binaries.
+
+@end itemize
+
+@node intro_mips_emulation
+@section MIPS emulation
+
+@itemize
+
+@item The system emulation allows full MIPS32/MIPS64 Release 2 emulation,
+including privileged instructions, FPU and MMU, in both little and big
+endian modes.
+
+@item The Linux userland emulation can run many 32 bit MIPS Linux binaries.
+
+@end itemize
+
+Current QEMU limitations:
+
+@itemize
+
+@item Self-modifying code is not always handled correctly.
+
+@item 64 bit userland emulation is not implemented.
+
+@item The system emulation is not complete enough to run real firmware.
+
+@item The watchpoint debug facility is not implemented.
+
+@end itemize
+
+@node intro_ppc_emulation
+@section PowerPC emulation
+
+@itemize
+
+@item Full PowerPC 32 bit emulation, including privileged instructions,
+FPU and MMU.
+
+@item Can run most PowerPC Linux binaries.
+
+@end itemize
+
+@node intro_sparc_emulation
+@section Sparc32 and Sparc64 emulation
+
+@itemize
+
+@item Full SPARC V8 emulation, including privileged
+instructions, FPU and MMU. SPARC V9 emulation includes most privileged
+and VIS instructions, FPU and I/D MMU. Alignment is fully enforced.
+
+@item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and
+some 64-bit SPARC Linux binaries.
+
+@end itemize
+
+Current QEMU limitations:
+
+@itemize
+
+@item IPC syscalls are missing.
+
+@item Floating point exception support is buggy.
+
+@item Atomic instructions are not correctly implemented.
+
+@item There are still some problems with Sparc64 emulators.
+
+@end itemize
+
+@node intro_xtensa_emulation
+@section Xtensa emulation
+
+@itemize
+
+@item Core Xtensa ISA emulation, including most options: code density,
+loop, extended L32R, 16- and 32-bit multiplication, 32-bit division,
+MAC16, miscellaneous operations, boolean, FP coprocessor, coprocessor
+context, debug, multiprocessor synchronization,
+conditional store, exceptions, relocatable vectors, unaligned exception,
+interrupts (including high priority and timer), hardware alignment,
+region protection, region translation, MMU, windowed registers, thread
+pointer, processor ID.
+
+@item Not implemented options: data/instruction cache (including cache
+prefetch and locking), XLMI, processor interface. Also options not
+covered by the core ISA (e.g. FLIX, wide branches) are not implemented.
+
+@item Can run most Xtensa Linux binaries.
+
+@item New core configuration that requires no additional instructions
+may be created from overlay with minimal amount of hand-written code.
+
+@end itemize
+
+@node intro_other_emulation
+@section Other CPU emulation
+
+In addition to the above, QEMU supports emulation of other CPUs with
+varying levels of success. These are:
+
+@itemize
+
+@item
+Alpha
+@item
+CRIS
+@item
+M68k
+@item
+SH4
+@end itemize
+
+@node QEMU Internals
+@chapter QEMU Internals
+
+@menu
+* QEMU compared to other emulators::
+* Portable dynamic translation::
+* Condition code optimisations::
+* CPU state optimisations::
+* Translation cache::
+* Direct block chaining::
+* Self-modifying code and translated code invalidation::
+* Exception support::
+* MMU emulation::
+* Device emulation::
+* Hardware interrupts::
+* User emulation specific details::
+* Bibliography::
+@end menu
+
+@node QEMU compared to other emulators
+@section QEMU compared to other emulators
+
+Like bochs [1], QEMU emulates an x86 CPU. But QEMU is much faster than
+bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC
+emulation while QEMU can emulate several processors.
+
+Like Valgrind [2], QEMU does user space emulation and dynamic
+translation. Valgrind is mainly a memory debugger while QEMU has no
+support for it (QEMU could be used to detect out of bound memory
+accesses as Valgrind, but it has no support to track uninitialised data
+as Valgrind does). The Valgrind dynamic translator generates better code
+than QEMU (in particular it does register allocation) but it is closely
+tied to an x86 host and target and has no support for precise exceptions
+and system emulation.
+
+EM86 [3] is the closest project to user space QEMU (and QEMU still uses
+some of its code, in particular the ELF file loader). EM86 was limited
+to an alpha host and used a proprietary and slow interpreter (the
+interpreter part of the FX!32 Digital Win32 code translator [4]).
+
+TWIN from Willows Software was a Windows API emulator like Wine. It is less
+accurate than Wine but includes a protected mode x86 interpreter to launch
+x86 Windows executables. Such an approach has greater potential because most
+of the Windows API is executed natively but it is far more difficult to
+develop because all the data structures and function parameters exchanged
+between the API and the x86 code must be converted.
+
+User mode Linux [5] was the only solution before QEMU to launch a
+Linux kernel as a process while not needing any host kernel
+patches. However, user mode Linux requires heavy kernel patches while
+QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is
+slower.
+
+The Plex86 [6] PC virtualizer is done in the same spirit as the now
+obsolete qemu-fast system emulator. It requires a patched Linux kernel
+to work (you cannot launch the same kernel on your PC), but the
+patches are really small. As it is a PC virtualizer (no emulation is
+done except for some privileged instructions), it has the potential of
+being faster than QEMU. The downside is that a complicated (and
+potentially unsafe) host kernel patch is needed.
+
+The commercial PC Virtualizers (VMWare [7], VirtualPC [8]) are faster
+than QEMU (without virtualization), but they all need specific, proprietary
+and potentially unsafe host drivers. Moreover, they are unable to
+provide cycle exact simulation as an emulator can.
+
+VirtualBox [9], Xen [10] and KVM [11] are based on QEMU. QEMU-SystemC
+[12] uses QEMU to simulate a system where some hardware devices are
+developed in SystemC.
+
+@node Portable dynamic translation
+@section Portable dynamic translation
+
+QEMU is a dynamic translator. When it first encounters a piece of code,
+it converts it to the host instruction set. Usually dynamic translators
+are very complicated and highly CPU dependent. QEMU uses some tricks
+which make it relatively easily portable and simple while achieving good
+performances.
+
+After the release of version 0.9.1, QEMU switched to a new method of
+generating code, Tiny Code Generator or TCG. TCG relaxes the
+dependency on the exact version of the compiler used. The basic idea
+is to split every target instruction into a couple of RISC-like TCG
+ops (see @code{target-i386/translate.c}). Some optimizations can be
+performed at this stage, including liveness analysis and trivial
+constant expression evaluation. TCG ops are then implemented in the
+host CPU back end, also known as TCG target (see
+@code{tcg/i386/tcg-target.c}). For more information, please take a
+look at @code{tcg/README}.
+
+@node Condition code optimisations
+@section Condition code optimisations
+
+Lazy evaluation of CPU condition codes (@code{EFLAGS} register on x86)
+is important for CPUs where every instruction sets the condition
+codes. It tends to be less important on conventional RISC systems
+where condition codes are only updated when explicitly requested. On
+Sparc64, costly update of both 32 and 64 bit condition codes can be
+avoided with lazy evaluation.
+
+Instead of computing the condition codes after each x86 instruction,
+QEMU just stores one operand (called @code{CC_SRC}), the result
+(called @code{CC_DST}) and the type of operation (called
+@code{CC_OP}). When the condition codes are needed, the condition
+codes can be calculated using this information. In addition, an
+optimized calculation can be performed for some instruction types like
+conditional branches.
+
+@code{CC_OP} is almost never explicitly set in the generated code
+because it is known at translation time.
+
+The lazy condition code evaluation is used on x86, m68k, cris and
+Sparc. ARM uses a simplified variant for the N and Z flags.
+
+@node CPU state optimisations
+@section CPU state optimisations
+
+The target CPUs have many internal states which change the way it
+evaluates instructions. In order to achieve a good speed, the
+translation phase considers that some state information of the virtual
+CPU cannot change in it. The state is recorded in the Translation
+Block (TB). If the state changes (e.g. privilege level), a new TB will
+be generated and the previous TB won't be used anymore until the state
+matches the state recorded in the previous TB. For example, if the SS,
+DS and ES segments have a zero base, then the translator does not even
+generate an addition for the segment base.
+
+[The FPU stack pointer register is not handled that way yet].
+
+@node Translation cache
+@section Translation cache
+
+A 32 MByte cache holds the most recently used translations. For
+simplicity, it is completely flushed when it is full. A translation unit
+contains just a single basic block (a block of x86 instructions
+terminated by a jump or by a virtual CPU state change which the
+translator cannot deduce statically).
+
+@node Direct block chaining
+@section Direct block chaining
+
+After each translated basic block is executed, QEMU uses the simulated
+Program Counter (PC) and other cpu state information (such as the CS
+segment base value) to find the next basic block.
+
+In order to accelerate the most common cases where the new simulated PC
+is known, QEMU can patch a basic block so that it jumps directly to the
+next one.
+
+The most portable code uses an indirect jump. An indirect jump makes
+it easier to make the jump target modification atomic. On some host
+architectures (such as x86 or PowerPC), the @code{JUMP} opcode is
+directly patched so that the block chaining has no overhead.
+
+@node Self-modifying code and translated code invalidation
+@section Self-modifying code and translated code invalidation
+
+Self-modifying code is a special challenge in x86 emulation because no
+instruction cache invalidation is signaled by the application when code
+is modified.
+
+When translated code is generated for a basic block, the corresponding
+host page is write protected if it is not already read-only. Then, if
+a write access is done to the page, Linux raises a SEGV signal. QEMU
+then invalidates all the translated code in the page and enables write
+accesses to the page.
+
+Correct translated code invalidation is done efficiently by maintaining
+a linked list of every translated block contained in a given page. Other
+linked lists are also maintained to undo direct block chaining.
+
+On RISC targets, correctly written software uses memory barriers and
+cache flushes, so some of the protection above would not be
+necessary. However, QEMU still requires that the generated code always
+matches the target instructions in memory in order to handle
+exceptions correctly.
+
+@node Exception support
+@section Exception support
+
+longjmp() is used when an exception such as division by zero is
+encountered.
+
+The host SIGSEGV and SIGBUS signal handlers are used to get invalid
+memory accesses. The simulated program counter is found by
+retranslating the corresponding basic block and by looking where the
+host program counter was at the exception point.
+
+The virtual CPU cannot retrieve the exact @code{EFLAGS} register because
+in some cases it is not computed because of condition code
+optimisations. It is not a big concern because the emulated code can
+still be restarted in any cases.
+
+@node MMU emulation
+@section MMU emulation
+
+For system emulation QEMU supports a soft MMU. In that mode, the MMU
+virtual to physical address translation is done at every memory
+access. QEMU uses an address translation cache to speed up the
+translation.
+
+In order to avoid flushing the translated code each time the MMU
+mappings change, QEMU uses a physically indexed translation cache. It
+means that each basic block is indexed with its physical address.
+
+When MMU mappings change, only the chaining of the basic blocks is
+reset (i.e. a basic block can no longer jump directly to another one).
+
+@node Device emulation
+@section Device emulation
+
+Systems emulated by QEMU are organized by boards. At initialization
+phase, each board instantiates a number of CPUs, devices, RAM and
+ROM. Each device in turn can assign I/O ports or memory areas (for
+MMIO) to its handlers. When the emulation starts, an access to the
+ports or MMIO memory areas assigned to the device causes the
+corresponding handler to be called.
+
+RAM and ROM are handled more optimally, only the offset to the host
+memory needs to be added to the guest address.
+
+The video RAM of VGA and other display cards is special: it can be
+read or written directly like RAM, but write accesses cause the memory
+to be marked with VGA_DIRTY flag as well.
+
+QEMU supports some device classes like serial and parallel ports, USB,
+drives and network devices, by providing APIs for easier connection to
+the generic, higher level implementations. The API hides the
+implementation details from the devices, like native device use or
+advanced block device formats like QCOW.
+
+Usually the devices implement a reset method and register support for
+saving and loading of the device state. The devices can also use
+timers, especially together with the use of bottom halves (BHs).
+
+@node Hardware interrupts
+@section Hardware interrupts
+
+In order to be faster, QEMU does not check at every basic block if a
+hardware interrupt is pending. Instead, the user must asynchronously
+call a specific function to tell that an interrupt is pending. This
+function resets the chaining of the currently executing basic
+block. It ensures that the execution will return soon in the main loop
+of the CPU emulator. Then the main loop can test if the interrupt is
+pending and handle it.
+
+@node User emulation specific details
+@section User emulation specific details
+
+@subsection Linux system call translation
+
+QEMU includes a generic system call translator for Linux. It means that
+the parameters of the system calls can be converted to fix the
+endianness and 32/64 bit issues. The IOCTLs are converted with a generic
+type description system (see @file{ioctls.h} and @file{thunk.c}).
+
+QEMU supports host CPUs which have pages bigger than 4KB. It records all
+the mappings the process does and try to emulated the @code{mmap()}
+system calls in cases where the host @code{mmap()} call would fail
+because of bad page alignment.
+
+@subsection Linux signals
+
+Normal and real-time signals are queued along with their information
+(@code{siginfo_t}) as it is done in the Linux kernel. Then an interrupt
+request is done to the virtual CPU. When it is interrupted, one queued
+signal is handled by generating a stack frame in the virtual CPU as the
+Linux kernel does. The @code{sigreturn()} system call is emulated to return
+from the virtual signal handler.
+
+Some signals (such as SIGALRM) directly come from the host. Other
+signals are synthesized from the virtual CPU exceptions such as SIGFPE
+when a division by zero is done (see @code{main.c:cpu_loop()}).
+
+The blocked signal mask is still handled by the host Linux kernel so
+that most signal system calls can be redirected directly to the host
+Linux kernel. Only the @code{sigaction()} and @code{sigreturn()} system
+calls need to be fully emulated (see @file{signal.c}).
+
+@subsection clone() system call and threads
+
+The Linux clone() system call is usually used to create a thread. QEMU
+uses the host clone() system call so that real host threads are created
+for each emulated thread. One virtual CPU instance is created for each
+thread.
+
+The virtual x86 CPU atomic operations are emulated with a global lock so
+that their semantic is preserved.
+
+Note that currently there are still some locking issues in QEMU. In
+particular, the translated cache flush is not protected yet against
+reentrancy.
+
+@subsection Self-virtualization
+
+QEMU was conceived so that ultimately it can emulate itself. Although
+it is not very useful, it is an important test to show the power of the
+emulator.
+
+Achieving self-virtualization is not easy because there may be address
+space conflicts. QEMU user emulators solve this problem by being an
+executable ELF shared object as the ld-linux.so ELF interpreter. That
+way, it can be relocated at load time.
+
+@node Bibliography
+@section Bibliography
+
+@table @asis
+
+@item [1]
+@url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project,
+by Kevin Lawton et al.
+
+@item [2]
+@url{http://www.valgrind.org/}, Valgrind, an open-source memory debugger
+for GNU/Linux.
+
+@item [3]
+@url{http://ftp.dreamtime.org/pub/linux/Linux-Alpha/em86/v0.2/docs/em86.html},
+the EM86 x86 emulator on Alpha-Linux.
+
+@item [4]
+@url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf},
+DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton
+Chernoff and Ray Hookway.
+
+@item [5]
+@url{http://user-mode-linux.sourceforge.net/},
+The User-mode Linux Kernel.
+
+@item [6]
+@url{http://www.plex86.org/},
+The new Plex86 project.
+
+@item [7]
+@url{http://www.vmware.com/},
+The VMWare PC virtualizer.
+
+@item [8]
+@url{https://www.microsoft.com/download/details.aspx?id=3702},
+The VirtualPC PC virtualizer.
+
+@item [9]
+@url{http://virtualbox.org/},
+The VirtualBox PC virtualizer.
+
+@item [10]
+@url{http://www.xen.org/},
+The Xen hypervisor.
+
+@item [11]
+@url{http://www.linux-kvm.org/},
+Kernel Based Virtual Machine (KVM).
+
+@item [12]
+@url{http://www.greensocs.com/projects/QEMUSystemC},
+QEMU-SystemC, a hardware co-simulator.
+
+@end table
+
+@node Regression Tests
+@chapter Regression Tests
+
+In the directory @file{tests/}, various interesting testing programs
+are available. They are used for regression testing.
+
+@menu
+* test-i386::
+* linux-test::
+@end menu
+
+@node test-i386
+@section @file{test-i386}
+
+This program executes most of the 16 bit and 32 bit x86 instructions and
+generates a text output. It can be compared with the output obtained with
+a real CPU or another emulator. The target @code{make test} runs this
+program and a @code{diff} on the generated output.
+
+The Linux system call @code{modify_ldt()} is used to create x86 selectors
+to test some 16 bit addressing and 32 bit with segmentation cases.
+
+The Linux system call @code{vm86()} is used to test vm86 emulation.
+
+Various exceptions are raised to test most of the x86 user space
+exception reporting.
+
+@node linux-test
+@section @file{linux-test}
+
+This program tests various Linux system calls. It is used to verify
+that the system call parameters are correctly converted between target
+and host CPUs.
+
+@node Index
+@chapter Index
+@printindex cp
+
+@bye