diff options
author | Tony Luck <tony.luck@intel.com> | 2006-06-21 14:50:10 -0700 |
---|---|---|
committer | Tony Luck <tony.luck@intel.com> | 2006-06-21 14:50:10 -0700 |
commit | 1323523f505606cfd24af6122369afddefc3b09d (patch) | |
tree | a3238a27220dd91ec0918478683e59e48605865f /Documentation | |
parent | 9ba89334552b96e2127dcafb1c46ce255ecf2667 (diff) | |
parent | 32e62c636a728cb39c0b3bd191286f2ca65d4028 (diff) | |
download | op-kernel-dev-1323523f505606cfd24af6122369afddefc3b09d.zip op-kernel-dev-1323523f505606cfd24af6122369afddefc3b09d.tar.gz |
Pull rework-memory-attribute-aliasing into release branch
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ia64/aliasing.txt | 208 |
1 files changed, 208 insertions, 0 deletions
diff --git a/Documentation/ia64/aliasing.txt b/Documentation/ia64/aliasing.txt new file mode 100644 index 0000000..38f9a52 --- /dev/null +++ b/Documentation/ia64/aliasing.txt @@ -0,0 +1,208 @@ + MEMORY ATTRIBUTE ALIASING ON IA-64 + + Bjorn Helgaas + <bjorn.helgaas@hp.com> + May 4, 2006 + + +MEMORY ATTRIBUTES + + Itanium supports several attributes for virtual memory references. + The attribute is part of the virtual translation, i.e., it is + contained in the TLB entry. The ones of most interest to the Linux + kernel are: + + WB Write-back (cacheable) + UC Uncacheable + WC Write-coalescing + + System memory typically uses the WB attribute. The UC attribute is + used for memory-mapped I/O devices. The WC attribute is uncacheable + like UC is, but writes may be delayed and combined to increase + performance for things like frame buffers. + + The Itanium architecture requires that we avoid accessing the same + page with both a cacheable mapping and an uncacheable mapping[1]. + + The design of the chipset determines which attributes are supported + on which regions of the address space. For example, some chipsets + support either WB or UC access to main memory, while others support + only WB access. + +MEMORY MAP + + Platform firmware describes the physical memory map and the + supported attributes for each region. At boot-time, the kernel uses + the EFI GetMemoryMap() interface. ACPI can also describe memory + devices and the attributes they support, but Linux/ia64 currently + doesn't use this information. + + The kernel uses the efi_memmap table returned from GetMemoryMap() to + learn the attributes supported by each region of physical address + space. Unfortunately, this table does not completely describe the + address space because some machines omit some or all of the MMIO + regions from the map. + + The kernel maintains another table, kern_memmap, which describes the + memory Linux is actually using and the attribute for each region. + This contains only system memory; it does not contain MMIO space. + + The kern_memmap table typically contains only a subset of the system + memory described by the efi_memmap. Linux/ia64 can't use all memory + in the system because of constraints imposed by the identity mapping + scheme. + + The efi_memmap table is preserved unmodified because the original + boot-time information is required for kexec. + +KERNEL IDENTITY MAPPINGS + + Linux/ia64 identity mappings are done with large pages, currently + either 16MB or 64MB, referred to as "granules." Cacheable mappings + are speculative[2], so the processor can read any location in the + page at any time, independent of the programmer's intentions. This + means that to avoid attribute aliasing, Linux can create a cacheable + identity mapping only when the entire granule supports cacheable + access. + + Therefore, kern_memmap contains only full granule-sized regions that + can referenced safely by an identity mapping. + + Uncacheable mappings are not speculative, so the processor will + generate UC accesses only to locations explicitly referenced by + software. This allows UC identity mappings to cover granules that + are only partially populated, or populated with a combination of UC + and WB regions. + +USER MAPPINGS + + User mappings are typically done with 16K or 64K pages. The smaller + page size allows more flexibility because only 16K or 64K has to be + homogeneous with respect to memory attributes. + +POTENTIAL ATTRIBUTE ALIASING CASES + + There are several ways the kernel creates new mappings: + + mmap of /dev/mem + + This uses remap_pfn_range(), which creates user mappings. These + mappings may be either WB or UC. If the region being mapped + happens to be in kern_memmap, meaning that it may also be mapped + by a kernel identity mapping, the user mapping must use the same + attribute as the kernel mapping. + + If the region is not in kern_memmap, the user mapping should use + an attribute reported as being supported in the EFI memory map. + + Since the EFI memory map does not describe MMIO on some + machines, this should use an uncacheable mapping as a fallback. + + mmap of /sys/class/pci_bus/.../legacy_mem + + This is very similar to mmap of /dev/mem, except that legacy_mem + only allows mmap of the one megabyte "legacy MMIO" area for a + specific PCI bus. Typically this is the first megabyte of + physical address space, but it may be different on machines with + several VGA devices. + + "X" uses this to access VGA frame buffers. Using legacy_mem + rather than /dev/mem allows multiple instances of X to talk to + different VGA cards. + + The /dev/mem mmap constraints apply. + + However, since this is for mapping legacy MMIO space, WB access + does not make sense. This matters on machines without legacy + VGA support: these machines may have WB memory for the entire + first megabyte (or even the entire first granule). + + On these machines, we could mmap legacy_mem as WB, which would + be safe in terms of attribute aliasing, but X has no way of + knowing that it is accessing regular memory, not a frame buffer, + so the kernel should fail the mmap rather than doing it with WB. + + read/write of /dev/mem + + This uses copy_from_user(), which implicitly uses a kernel + identity mapping. This is obviously safe for things in + kern_memmap. + + There may be corner cases of things that are not in kern_memmap, + but could be accessed this way. For example, registers in MMIO + space are not in kern_memmap, but could be accessed with a UC + mapping. This would not cause attribute aliasing. But + registers typically can be accessed only with four-byte or + eight-byte accesses, and the copy_from_user() path doesn't allow + any control over the access size, so this would be dangerous. + + ioremap() + + This returns a kernel identity mapping for use inside the + kernel. + + If the region is in kern_memmap, we should use the attribute + specified there. Otherwise, if the EFI memory map reports that + the entire granule supports WB, we should use that (granules + that are partially reserved or occupied by firmware do not appear + in kern_memmap). Otherwise, we should use a UC mapping. + +PAST PROBLEM CASES + + mmap of various MMIO regions from /dev/mem by "X" on Intel platforms + + The EFI memory map may not report these MMIO regions. + + These must be allowed so that X will work. This means that + when the EFI memory map is incomplete, every /dev/mem mmap must + succeed. It may create either WB or UC user mappings, depending + on whether the region is in kern_memmap or the EFI memory map. + + mmap of 0x0-0xA0000 /dev/mem by "hwinfo" on HP sx1000 with VGA enabled + + See https://bugzilla.novell.com/show_bug.cgi?id=140858. + + The EFI memory map reports the following attributes: + 0x00000-0x9FFFF WB only + 0xA0000-0xBFFFF UC only (VGA frame buffer) + 0xC0000-0xFFFFF WB only + + This mmap is done with user pages, not kernel identity mappings, + so it is safe to use WB mappings. + + The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000, + which will use a granule-sized UC mapping covering 0-0xFFFFF. This + granule covers some WB-only memory, but since UC is non-speculative, + the processor will never generate an uncacheable reference to the + WB-only areas unless the driver explicitly touches them. + + mmap of 0x0-0xFFFFF legacy_mem by "X" + + If the EFI memory map reports this entire range as WB, there + is no VGA MMIO hole, and the mmap should fail or be done with + a WB mapping. + + There's no easy way for X to determine whether the 0xA0000-0xBFFFF + region is a frame buffer or just memory, so I think it's best to + just fail this mmap request rather than using a WB mapping. As + far as I know, there's no need to map legacy_mem with WB + mappings. + + Otherwise, a UC mapping of the entire region is probably safe. + The VGA hole means the region will not be in kern_memmap. The + HP sx1000 chipset doesn't support UC access to the memory surrounding + the VGA hole, but X doesn't need that area anyway and should not + reference it. + + mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled + + The EFI memory map reports the following attributes: + 0x00000-0xFFFFF WB only (no VGA MMIO hole) + + This is a special case of the previous case, and the mmap should + fail for the same reason as above. + +NOTES + + [1] SDM rev 2.2, vol 2, sec 4.4.1. + [2] SDM rev 2.2, vol 2, sec 4.4.6. |