summaryrefslogtreecommitdiffstats
path: root/sys/ia64
Commit message (Collapse)AuthorAgeFilesLines
* In pmap_set_pte(), make sure to enforce ordering by inserting a memorymarcel2014-01-201-2/+4
| | | | | | | | | | | | | | | fence. Under system load, the CPU has been found to change the order by which the stores are made visible. When the tag is made visible before the other TLB values, other CPUs may use the invalid TLB values and do bad things. While here (i.e. not a fix) don't return errors from pmap_remove_vhpt() to callers of pmap_remove_pte(). Those callers don't check the return value and as such don't do what is needed to keep a consistent state. More importantly, pmap_remove_vhpt() can't really have an error without it indicating something unintended. Using KASSERT is therefore better. PR: 182999, 183227
* Enable vt. This brings VGA-based console and terminal support tomarcel2014-01-191-1/+4
| | | | | | | | | ia64 for the very first time. Only 9 years in the making... Note that the vt/vga driver does not actually make sure there's VGA hardware at the standard/legacy VGA I/O port and memory I/O addresses. This can cause machine checks if the H/W does not have a VGA controller.
* In the nested TLB fault handler, for a direct-mapped address, makemarcel2014-01-151-1/+1
| | | | | | | | | | | | | | | sure to clear the lower 12 bits. We're adding the translation attributes to the physical address and non-zero bits in the first 12 bits would give us something unexpected, including invalid bit values. Those trigger nested general protection faults. We do not have to clear the region bits, because they are ignored anyway, so we can replace an existing dep instruction with the one we need. This fixes GP faults for the swapper thread, as it's the only thread that has a direct-mapped stack. Since the bug is in the nested TLB fault handler, the frequency of hitting the GP is in the order of hours/days under load.
* Implement atomic_swap_<type>.marcel2014-01-011-0/+28
| | | | | | | The operation was documented and implemented partially (both from a type and architecture perspective) on 2013-08-21 and got used in ZFS with revision 260150 (zfeature.c) and since ZFS is supported on ia64, the lack of having atomic_swap became problem.
* Add a virt_foreach() that does the same as what phys_foreach() does andmarcel2013-12-281-8/+94
| | | | | | | | | | | change virt_size(), virt_dumphdrs() and virt_dumpdata() into its callback functions. In virt_foreach() we iterate over all the virtual memory regions that we want in the minidump. For now, just start with the PBVM (= kernel text and data plus preloaded modules). The core file this produces can already be used to work out the libkvm changes that need to be made to support it. In parallel, we can flesh out the in-kernel bits to dump more of what we need in a minidump without changing the core file structure.
* Add the scaffolding for minidumps. They're just like physical dumps,marcel2013-12-271-21/+65
| | | | | | except the chunks aren't physical memory regions but virtual memory regions. In both cases, the core file is an ELF file and flags in the header allow libkvm to distinguish one from the other.
* Allow pmap_remove_pages() to be called for physical maps notmarcel2013-12-121-6/+4
| | | | | | associated with the current thread. Obtained from: alc@
* Make process descriptors standard part of the kernel. rwhod(8) alreadypjd2013-11-301-1/+0
| | | | | | | | requires process descriptors to work and having PROCDESC in GENERIC seems not enough, especially that we hope to have more and more consumers in the base. MFC after: 3 days
* Don't enable interrupts before we call sched_throw(). Interruptsmarcel2013-11-101-2/+0
| | | | are expected to be disabled by virtue of md_spinlock_count=1.
* As of r257209, all architectures have defined VM_KMEM_SIZE_SCALE. In otheralc2013-11-081-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | words, every architecture is now auto-sizing the kmem arena. This revision changes kmeminit() so that the definition of VM_KMEM_SIZE_SCALE becomes mandatory and the definition of VM_KMEM_SIZE becomes optional. Replace or eliminate all existing definitions of VM_KMEM_SIZE. With auto-sizing enabled, VM_KMEM_SIZE effectively became an alternate spelling for VM_KMEM_SIZE_MIN on most architectures. Use VM_KMEM_SIZE_MIN for clarity. Change kmeminit() so that the effect of defining VM_KMEM_SIZE is similar to that of setting the tunable vm.kmem_size. Whereas the macros VM_KMEM_SIZE_{MAX,MIN,SCALE} have had the same effect as the tunables vm.kmem_size_{max,min,scale}, the effects of VM_KMEM_SIZE and vm.kmem_size have been distinct. In particular, whereas VM_KMEM_SIZE was overridden by VM_KMEM_SIZE_{MAX,MIN,SCALE} and vm.kmem_size_{max,min,scale}, vm.kmem_size was not. Remedy this inconsistency. Now, VM_KMEM_SIZE can be used to set the size of the kmem arena at compile-time without that value being overridden by auto-sizing. Update the nearby comments to reflect the kmem submap being replaced by the kmem arena. Stop duplicating the auto-sizing formula in every machine- dependent vmparam.h and place it in kmeminit() where auto-sizing takes place. Reviewed by: kib (an earlier version) Sponsored by: EMC / Isilon Storage Division
* Use LOG2_ID_PAGE_SIZE again for the identity mapping in regions 6 & 7.marcel2013-11-013-4/+9
| | | | | | Make the default translation size the same as the PBVM page size to avoid inserting overlapping translations in the TC. That generally is very bad.
* The PAL_PTCE_INFO function returns the counts and strides of themarcel2013-11-011-10/+10
| | | | | | | | | | | | | | outer and inner loop as 32-bit integers mux'd in 64-bit return values. Change our data types for the count and stride to match and simplify/adjust accordingly. Note that with this change the defaults of the ptc.e parameters have changed. Since all hardware is supposed to support the PAL call, there should be no impact. Even ski is unaffected, because the TC is re-initialized without considering the virtual address. So, as long as we call ptc.e at least once, we're good. That's what the new defaults do. Most processor implementations need but a single ptc.e to purge the entire TC anyway...
* Purge the translation cache of APs before we unleash them. To thatmarcel2013-10-313-18/+6
| | | | | | | | end, make pmap_invalidate_all() global and have it only handle the local CPU -- i.e. no rendezvous. We do not use pmap_invalidate_all other than during initialization. Note that the BSP already purges its TC -- it was missing for APs only. Nonetheless, this so far seems to eliminate random problems.
* Respect the kern.smp.disabled tunable. When we're scanning the MADT inmarcel2013-10-311-0/+5
| | | | | | | | ia64_probe_sapics(), we also create PCPU structures for any Local SAPICs we encounter. When SMP is disabled, this leaves us with partially setup PCPU structures, which typically results in panics when we're iterating over CPUs. When SMP is disabled, we now prevent the creation of the PCPU structures.
* Add bus_dmamap_load_ma() function to load map with the array ofkib2013-10-271-0/+11
| | | | | | | | | | vm_pages. Provide trivial implementation which forwards the load to _bus_dmamap_load_phys() page by page. Right now all architectures use bus_dmamap_load_ma_triv(). Tested by: pho (as part of the functional patch) Sponsored by: The FreeBSD Foundation MFC after: 1 month
* The pmap function pmap_clear_reference() is no longer used. Remove it.alc2013-09-201-31/+0
| | | | | | | | | pmap_clear_reference() has had exactly one caller in the kernel for several years, more precisely, since FreeBSD 8. Now, that call no longer exists. Approved by: re (kib) Sponsored by: EMC / Isilon Storage Division
* Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping usejhb2013-09-091-2/+2
| | | | | | | | | | | | | an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib)
* On those machines, where sf_bufs do not represent any real object, makeglebius2013-09-062-22/+12
| | | | | | | | | sf_buf_alloc()/sf_buf_free() inlines, to save two calls to an absolutely empty functions. Reviewed by: alc, kib, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix
* Significantly reduce the cost, i.e., run time, of calls to madvise(...,alc2013-08-291-0/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | MADV_DONTNEED) and madvise(..., MADV_FREE). Specifically, introduce a new pmap function, pmap_advise(), that operates on a range of virtual addresses within the specified pmap, allowing for a more efficient implementation of MADV_DONTNEED and MADV_FREE. Previously, the implementation of MADV_DONTNEED and MADV_FREE relied on per-page pmap operations, such as pmap_clear_reference(). Intuitively, the problem with this implementation is that the pmap-level locks are acquired and released and the page table traversed repeatedly, once for each resident page in the range that was specified to madvise(2). A more subtle flaw with the previous implementation is that pmap_clear_reference() would clear the reference bit on all mappings to the specified page, not just the mapping in the range specified to madvise(2). Since our malloc(3) makes heavy use of madvise(2), this change can have a measureable impact. For example, the system time for completing a parallel "buildworld" on a 6-core amd64 machine was reduced by about 1.5% to 2.0%. Note: This change only contains pmap_advise() implementations for a subset of our supported architectures. I will commit implementations for the remaining architectures after further testing. For now, a stub function is sufficient because of the advisory nature of pmap_advise(). Discussed with: jeff, jhb, kib Tested by: pho (i386), marcel (ia64) Sponsored by: EMC / Isilon Storage Division
* Revert r254501. Instead, reuse the type stability of the struct pmapkib2013-08-221-2/+2
| | | | | | | | | | | which is the part of struct vmspace, allocated from UMA_ZONE_NOFREE zone. Initialize the pmap lock in the vmspace zone init function, and remove pmap lock initialization and destruction from pmap_pinit() and pmap_release(). Suggested and reviewed by: alc (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation
* Add process descriptors support to the GENERIC kernel. It is already beingpjd2013-08-181-2/+3
| | | | | | | | | used by the tools in base systems and with sandboxing more and more tools the usage should only increase. Submitted by: Mariusz Zaborski <oshogbo@FreeBSD.org> Sponsored by: Google Summer of Code 2013 MFC after: 1 month
* Tidy up global locks for ACPICA. There is no functional change.jkim2013-08-131-3/+3
|
* The soft and hard busy mechanism rely on the vm object lock to work.attilio2013-08-091-12/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl
* follow up to r254051avg2013-08-091-2/+1
| | | | | | | | - update powerpc/GENERIC64 as well, suggested by mdf - update comments so that they make sense after the change, suggested by jhb X-MFC after: never (change specific to head)
* enable KDB_TRACE in GENERICsavg2013-08-071-1/+1
| | | | | | | KDB_TRACE is not an alternative to DDB/etc, they are complementary. So I do not see any reason to not enable KDB_TRACE by default. X-MFC after: never (change specific to head)
* Replace kernel virtual address space allocation with vmem. This providesjeff2013-08-071-1/+2
| | | | | | | | | | | | | transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division
* Back out r253779 & r253786.obrien2013-07-311-1/+0
|
* Decouple yarrow from random(4) device.obrien2013-07-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Make Yarrow an optional kernel component -- enabled by "YARROW_RNG" option. The files sha2.c, hash.c, randomdev_soft.c and yarrow.c comprise yarrow. * random(4) device doesn't really depend on rijndael-*. Yarrow, however, does. * Add random_adaptors.[ch] which is basically a store of random_adaptor's. random_adaptor is basically an adapter that plugs in to random(4). random_adaptor can only be plugged in to random(4) very early in bootup. Unplugging random_adaptor from random(4) is not supported, and is probably a bad idea anyway, due to potential loss of entropy pools. We currently have 3 random_adaptors: + yarrow + rdrand (ivy.c) + nehemeiah * Remove platform dependent logic from probe.c, and move it into corresponding registration routines of each random_adaptor provider. probe.c doesn't do anything other than picking a specific random_adaptor from a list of registered ones. * If the kernel doesn't have any random_adaptor adapters present then the creation of /dev/random is postponed until next random_adaptor is kldload'ed. * Fix randomdev_soft.c to refer to its own random_adaptor, instead of a system wide one. Submitted by: arthurmesh@gmail.com, obrien Obtained from: Juniper Networks Reviewed by: obrien
* Revert r253748,253749avg2013-07-281-2/+2
| | | | | | This WIP should not have been committed yet. Pointyhat to: avg
* put contents of cpu.h under _KERNELavg2013-07-281-2/+2
| | | | | | no userland-serviceable parts inside MFC after: 20 days
* In pci_cfgregread() and pci_cfgregwrite(), multiplex the domain andmarcel2013-07-231-2/+2
| | | | | | | | | | | | | | | bus number into the bus argument. The bus number occupies the least significant 8 bits. The PCI domain occupies the most significant 24 bits. On the Altix 350, the PCI domain is a required parameter, but changing the prototype of the pci_cfgreg*() functions to include a separate domain argument has wide-spread consequences across the supported architectures. We'd be changing a known interface. Multiplexing is an acceptable kluge to give us what we need with manageable impact. Note that the PCI bus number fits in 8 bits, so the multiplexing of the domain is a backward compatible change.
* In ia64_mca_init(), don't limit the allocation of the info block tomarcel2013-07-231-2/+2
| | | | | | | | | | | | | | | | fall within the first 256MB of memory. The origin/reason for that limitation is not known, but it's not believed to be required for proper initialization. What is known is that the Altix 350 does not have physical memory at that address (by virtue of the address space bits). Keep the boundary at 256MB so that the info block will be covered by a single direct-mapped translation. While here, change the flags to M_NOWAIT to eliminate confusion. It does not change the behaviour of contigmalloc(). What is does is makes the flags argument explicitly say what the actual behaviour is.
* In pmap_mapdev(), if the physical memory range is not covered by an EFImarcel2013-07-231-1/+1
| | | | | | | | memory descriptor, don't return NULL as the virtual address, return the direct-mapped uncacheable virtual address for it. At first, this was needed only for the Altix 350, but now even some high-end HP machines have devices mapped to physical addresses that aren't covered by the EFI memory map.
* Fix issues with zeroing and fetching the counters, on x86 and ppc64.kib2013-07-011-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issues were noted by Bruce Evans and are present on all architectures. On i386, a counter fetch should use atomic read of 64bit value, otherwise carry from the increment on other CPU could be lost for the given fetch, making error of 2^32. If 64bit read (cmpxchg8b) is not available on the machine, it cannot be SMP and it is enough to disable preemption around read to avoid the split read. On x86 the counter increment is not atomic on purpose, which makes it possible for the store of the incremented result to override just zeroed per-cpu slot. The effect would be a counter going off by arbitrary value after zeroing. Perform the counter zeroing on the same processor which does the increments, making the operations mutually exclusive. On i386, same as for the fetching, if the cmpxchg8b is not available, machine is not SMP and we disable preemption for zeroing. PowerPC64 is treated the same as amd64. For other architectures, the changes made to allow the compilation to succeed, without fixing the issues with zeroing or fetching. It should be possible to handle them by using the 64bit loads and stores atomic WRT preemption (assuming the architectures also converted from using critical sections to proper asm). If architecture does not provide the facility, using global (spin) mutex would be non-optimal but working solution. Noted by: bde Sponsored by: The FreeBSD Foundation
* Move definitions required by userland applications out of acpica_machdep.h.jkim2013-06-271-7/+2
|
* Driver 'aacraid' added. Supports Adaptec by PMC RAID controller families ↵achim2013-05-241-0/+1
| | | | | | Series 6, 7, 8 and upcoming products. Older Adaptec RAID controller families are supported by the 'aac' driver. Approved by: scottl (mentor)
* o Relax locking assertions for vm_page_find_least()attilio2013-05-211-1/+2
| | | | | | | | | | | | o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any o Introduce VM_OBJECT_LOCK_DOWNGRADE() which is basically a downgrade operation on the per-object rwlock o Use all the mechanisms above to make vm_map_pmap_enter() to work mostl of the times only with readlocks. Sponsored by: EMC / Isilon storage division Reviewed by: alc
* Tidy up some CVS workarounds.peter2013-05-122-2/+0
|
* Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h inattilio2013-05-072-7/+4
| | | | | | | | | order to match the MAXCPU concept. The change should also be useful for consolidation and consistency. Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: alc
* Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE'trasz2013-04-121-1/+1
| | | | | | | | | | | and kern.cam.ctl.disable tunable; those were introduced as a workaround to make it possible to boot GENERIC on low memory machines. With ctl(4) being built as a module and automatically loaded by ctladm(8), this makes CTL work out of the box. Reviewed by: ken Sponsored by: FreeBSD Foundation
* Merge from projects/counters: counter(9).glebius2013-04-081-0/+54
| | | | | | | | | | | | | Introduce counter(9) API, that implements fast and raceless counters, provided (but not limited to) for gathering of statistical data. See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html for more details. In collaboration with: kib Reviewed by: luigi Tested by: ae, ray Sponsored by: Nginx, Inc.
* Merge from projects/counters:glebius2013-04-081-1/+2
| | | | | | | Pad struct pcpu so that its size is denominator of PAGE_SIZE. This is done to reduce memory waste in UMA_PCPU_ZONE zones. Sponsored by: Nginx, Inc.
* Remove all legacy ATA code parts, not used since options ATA_CAM enabled inmav2013-04-041-1/+0
| | | | | | | | | most kernels before FreeBSD 9.0. Remove such modules and respective kernel options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam. Remove the atacontrol utility and some man pages. Remove useless now options ATA_CAM. No objections: current@, stable@ MFC after: never
* Implement the concept of the unmapped VMIO buffers, i.e. buffers whichkib2013-03-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks
* Add pmap function pmap_copy_pages(), which copies the content of thekib2013-03-141-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures were the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stab is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks
* MFCattilio2013-03-022-18/+11
|\
| * MFcalloutng:mav2013-02-281-15/+7
| | | | | | | | | | | | | | Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.
| * MFcalloutng:davide2013-02-281-3/+4
| | | | | | | | | | | | | | | | | | | | | | When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to extimate furter sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements will be committed in the next days. The commmit is not targeted for MFC.
* | MFCattilio2013-02-262-8/+9
|\ \ | |/
| * kernacc() expects all KVAs to be covered in the kernel map. With themarcel2013-02-252-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | introduction of the PBVM, this stopped being the case. Redefine the VM parameters so that the PBVM is included in the kernel map. In particular this introduces VM_INIT_KERNEL_ADDRESS to point to the base of region 5 now that VM_MIN_KERNEL_ADDRESS points to the base of region 4 to include the PBVM. While here define KERNBASE to the actual link address of the kernel as is intended. PR: 169926
OpenPOWER on IntegriCloud