summaryrefslogtreecommitdiffstats
path: root/sys/ia64
Commit message (Collapse)AuthorAgeFilesLines
* Merge from projects/counters: counter(9).glebius2013-04-081-0/+54
| | | | | | | | | | | | | Introduce counter(9) API, that implements fast and raceless counters, provided (but not limited to) for gathering of statistical data. See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html for more details. In collaboration with: kib Reviewed by: luigi Tested by: ae, ray Sponsored by: Nginx, Inc.
* Merge from projects/counters:glebius2013-04-081-1/+2
| | | | | | | Pad struct pcpu so that its size is denominator of PAGE_SIZE. This is done to reduce memory waste in UMA_PCPU_ZONE zones. Sponsored by: Nginx, Inc.
* Remove all legacy ATA code parts, not used since options ATA_CAM enabled inmav2013-04-041-1/+0
| | | | | | | | | most kernels before FreeBSD 9.0. Remove such modules and respective kernel options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam. Remove the atacontrol utility and some man pages. Remove useless now options ATA_CAM. No objections: current@, stable@ MFC after: never
* Implement the concept of the unmapped VMIO buffers, i.e. buffers whichkib2013-03-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks
* Add pmap function pmap_copy_pages(), which copies the content of thekib2013-03-141-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures were the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stab is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks
* MFCattilio2013-03-022-18/+11
|\
| * MFcalloutng:mav2013-02-281-15/+7
| | | | | | | | | | | | | | Switch eventtimers(9) from using struct bintime to sbintime_t. Even before this not a single driver really supported full dynamic range of struct bintime even in theory, not speaking about practical inexpediency. This change legitimates the status quo and cleans up the code.
| * MFcalloutng:davide2013-02-281-3/+4
| | | | | | | | | | | | | | | | | | | | | | When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to extimate furter sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements will be committed in the next days. The commmit is not targeted for MFC.
* | MFCattilio2013-02-262-8/+9
|\ \ | |/
| * kernacc() expects all KVAs to be covered in the kernel map. With themarcel2013-02-252-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | introduction of the PBVM, this stopped being the case. Redefine the VM parameters so that the PBVM is included in the kernel map. In particular this introduces VM_INIT_KERNEL_ADDRESS to point to the base of region 5 now that VM_MIN_KERNEL_ADDRESS points to the base of region 4 to include the PBVM. While here define KERNBASE to the actual link address of the kernel as is intended. PR: 169926
* | MFCattilio2013-02-241-1/+1
|\ \ | |/
| * Enable PREEMPTION by default now that PR 147501 has been fixed.marcel2013-02-231-1/+1
| |
* | Hide the details for the assertion for VM_OBJECT_LOCK operations.attilio2013-02-211-5/+5
| | | | | | | | | | | | | | | | Rename current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into VM_OBJECT_ASSERT_WLOCKED(foo) Sponsored by: EMC / Isilon storage division Requested by: alc
* | Fix other architectures and ZFS.attilio2013-02-211-0/+1
| | | | | | | | Sponsored by: EMC / Isilon storage division
* | Switch vm_object lock to be a rwlock.attilio2013-02-201-5/+5
|/ | | | | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h
* Close a race relating to setting the PCPU pointer (r13). Register r13marcel2013-02-171-5/+5
| | | | | | | | | | | | | | | | | | | | points to the TLS in user space and points to the PCPU structure in the kernel. The race is the result of having the exception handler on the one hand and the RPC system call entry on the other. The EPC syscall path is non-atomic in that interrupts are enabled while the two stacks are switched. The register stack is switched last as that is the stack used to determine whether we're going back to user space by the exception handler. If we go back to user space, we restore r13, otherwise we leave r13 alone. The EPC syscall path however set r13 to the PCPU structure *before* switching the register stack, which means that there was a window in which the exception handler would restore r13 when it was already pointing to the PCPU structure. This is fatal when the exception happened on CPU x, but left from the exception on anotehr CPU. In that case r13 would point to the PCPU of the CPU the thread was running on. This immediately results in getting the wrong value for curthread. The fix is to make sure we assign r13 *after* we set ar.bspstore to point to the kernel register stack for the thread.
* Return EFAULT when the address is not a kernel virtual address.marcel2013-02-161-0/+2
|
* Eliminate the PC_CURTHREAD symbol and load the current thread'smarcel2013-02-123-35/+33
| | | | | | | | | | | | | | | | | thread structure pointer atomically from r13 (the pcpu pointer) for the current CPU/core. Add a CTASSERT in machdep.c to make sure that pc_curthread is in fact the first field in struct pcpu. The only non-atomic operations left were those related to process- space operations, such as casuword, subyte, suword16, fubyte, fuword16, copyin, copyout and their variations. The casuword function has been re-structured more complete than the others. This way we have an example of a better bundling without introducing a lot of risk when we get it wrong. The other functions can be rebundled in separate commits and with the appropriate testing.
* Eliminate padding by moving 'narg' next to 'code'. Both are 32-bitmarcel2013-02-121-1/+1
| | | | | entities in the syscall_args structure that otherwise has 64-bit only fields.
* Reform the busdma API so that new types may be added without modifyingkib2013-02-121-244/+224
| | | | | | | | | | | | | | | | | | | | | every architecture's busdma_machdep.c. It is done by unifying the bus_dmamap_load_buffer() routines so that they may be called from MI code. The MD busdma is then given a chance to do any final processing in the complete() callback. The cam changes unify the bus_dmamap_load* handling in cam drivers. The arm and mips implementations are updated to track virtual addresses for sync(). Previously this was done in a type specific way. Now it is done in a generic way by recording the list of virtuals in the map. Submitted by: jeff (sponsored by EMC/Isilon) Reviewed by: kan (previous version), scottl, mjacob (isp(4), no objections for target mode changes) Discussed with: ian (arm changes) Tested by: marius (sparc64), mips (jmallet), isci(4) on x86 (jharris), amd64 (Fabian Keil <freebsd-listen@fabiankeil.de>)
* Now that we actually use more memory descriptors, make sure to dumpmarcel2013-02-121-1/+4
| | | | them as well.
* Remove firewire devices missed in r244992.hrs2013-01-041-5/+0
|
* Enable the UFS quotas for big-iron GENERIC kernels.kib2013-01-031-0/+1
| | | | | Discussed with: mckusick MFC after: 2 weeks
* As discussed on -current last October, remove the firewire drivers fromdes2013-01-031-1/+0
| | | | GENERIC.
* Flip the semantic of M_NOWAIT to only require the allocation to notkib2012-11-141-6/+2
| | | | | | | | | | | | | | | | | | | | sleep, and perform the page allocations with VM_ALLOC_SYSTEM class. Previously, the allocation was also allowed to completely drain the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions. Allow the caller of malloc* to request the 'deep drain' semantic by providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM allocation class. Centralize the translation of the M_* malloc(9) flags in the single inline function malloc2vm_flags(). Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com> Reviewed by: alc, mdf (previous version) Tested by: pho (previous version) MFC after: 2 weeks
* Rework the known rwlock to benefit about staying on their ownattilio2012-11-031-10/+1
| | | | | | | cache line in order to avoid manual frobbing but using struct rwlock_padalign. Reviewed by: alc, jimharris
* Fix compilation on ia64 when page size is configured for 16KB.kib2012-10-282-21/+37
| | | | Reviewed by: alc, marcel
* Port the new PV entry allocator from amd64/i386. This allocator has twoalc2012-10-262-233/+445
| | | | | | | | | | | advantages. First, PV entries are roughly half the size. Second, this allocator doesn't access the paging queues, and thus it allows for the removal of the page queues lock from this pmap. Replace all uses of the page queues lock by a R/W lock that is private to this pmap. Tested by: marcel
* Eliminate a stale comment. It describes another use case for the pmap inalc2012-09-281-7/+0
| | | | Mach that doesn't exist in FreeBSD.
* userret() already checks for td_locks when INVARIANTS is enabled, soattilio2012-09-082-2/+0
| | | | | | | there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week
* Use pmap_kextract(x) rather than pmap_extract(kernel_pmap, x). Themarcel2012-08-181-1/+1
| | | | | former knows about all the special mappings, like PBVM. The kernel text and data are in the PBVM.
* Remove support for SKI: HP's Itanium simulator. It's pretty much notmarcel2012-08-184-460/+0
| | | | | | | | | used, serves very little value given that FreeBSD runs on real H/W for a long time. Note that SKI is open-source (see http://ski.sourceforge.net), so if there's interest and value again, then this code can be revived. Discussed with: jhb
* Add locking for sscdisk(4) and mark it MPSAFE. Since this driver justjhb2012-08-161-13/+19
| | | | | | | | makes calls out to the emulator, the locking is fairly simple. A global mutex protects the list of ssc disks, and each ssc disk has a mutex to protect it's bioq. Approved by: marcel
* After the PHYS_TO_VM_PAGE() function was de-inlined, the main reasonkib2012-08-051-0/+1
| | | | | | | | | | | | | to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks
* Move PCPU initialization to a new function called cpu_pcpu_setup().marcel2012-07-083-4/+13
| | | | | This makes it easier to add additional CPU or platform information to the per-CPU structure without duplicated code.
* Unleash the APs at SI_SUB_KICK_SCHEDULER so that we have them allmarcel2012-07-081-2/+1
| | | | | | | | up and running to service interrupts. This is especially important when the firmware has bound interrupts to CPUs, like for the SGI Altix 350. We wake up APs at SI_SUB_CPU time and they sit and spin until we unleash them, so there's nothing fundamentally different from a MD perspective.
* Implement ia64_physmem_alloc() and use it consistently to get memorymarcel2012-07-074-80/+82
| | | | | | | | | | | | | before VM has been initialized. This includes: 1. Replacing pmap_steal_memory(), 2. Replace the handcrafted logic to allocate a naturally aligned VHPT, 3. Properly allocate the DPCPU for the BSP. Ad 3: Appending the DPCPU to kernend worked as long as we wouldn't cross into the next PBVM page. If we were to cross into the next page, then there wouldn't be a PTE entry on the page table for it and we would end up with a MCA following a page fault. As such, this commit fixes MCAs occasionally seen.
* Hide the creation of phys_avail behind an API to make it easier to do itmarcel2012-07-075-153/+269
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | correctly. We now iterate the EFI memory descriptors once and collect all the information in a single pass. This includes: 1. The I/O port base address, 2. The PAL memory region. Have the physmem API track this. 3. Memory descriptors of memory we can't use, like bad memory, runtime services code & data, etc. Have the physmem API track these. 4. memory descriptors of memory we can use or re-use, such as free memory, boot time services code & data, loader code & data, etc. These are added by the physmem API. Since the PBVM page table and pages are in memory described as loader data, inform the physmem API of chunks that need to be delated from the available physical memory. While here, remove Maxmem and replace it with the better named paddr_max. Maxmem was defined as physmem, which is generally wrong. Now, paddr_max is properly defined as the largesty physical address. The upshot of all this is that: 1. We properly determine realmem. 2. We maximize physmem by re-using memory where possible. 3. We remove complexity from ia64_init() in machdep.c. 4. Remove confusion about realmem, physmem & Maxmem. The new ia64_physmem_alloc() is to replace pmap_steal_memory() in pmap.c, as well as replace the handcrafted allocation of the VHPT for the BSP in pmap_bootstrap() in pmap.c. This is step 2 and addresses the manipulation of phys_avail after it is being created.
* Make the wchar_t type machine dependent.andrew2012-06-242-6/+4
| | | | | | | | | | | | | | This is required for ARM EABI. Section 7.1.1 of the Procedure Call for the ARM Architecture (AAPCS) defines wchar_t as either an unsigned int or an unsigned short with the former preferred. Because of this requirement we need to move the definition of __wchar_t to a machine dependent header. It also cleans up the macros defining the limits of wchar_t by defining __WCHAR_MIN and __WCHAR_MAX in the same machine dependent header then using them to define WCHAR_MIN and WCHAR_MAX respectively. Discussed with: bde
* Implement mechanism to export some kernel timekeeping data tokib2012-06-221-0/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs neccessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month
* Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer tokib2012-06-221-0/+1
| | | | | | timekeeping information. MFC after: 1 week
* The page flag PGA_WRITEABLE is set and cleared exclusively by the pmapalc2012-06-161-0/+1
| | | | | | | | | | | | | | | | layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer. Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer. As an added bonus, tidy up some nearby comments concerning page flags. Reviewed by: kib MFC after: 6 weeks
* Improve style(9) in the previous commit.jkim2012-06-011-2/+2
|
* Call AcpiLeaveSleepStatePrep() in interrupt disabled contextiwasaki2012-06-011-0/+7
| | | | | | | | | | | | | | | | | | (described in ACPICA source code). - Move intr_disable() and intr_restore() from acpi_wakeup.c to acpi.c and call AcpiLeaveSleepStatePrep() in interrupt disabled context. - Add acpi_wakeup_machdep() to execute wakeup MD procedures and call it twice in interrupt disabled/enabled context (ia64 version is just dummy). - Rename wakeup_cpus variable in acpi_sleep_machdep() to suspcpus in order to be shared by acpi_sleep_machdep() and acpi_wakeup_machdep(). - Move identity mapping related code to acpi_install_wakeup_handler() (i386 version) for preparation of x86/acpica/acpi_wakeup.c (MFC candidate). Reviewed by: jkim@ MFC after: 2 days
* pmap_alloc_vhpt() doesn't need the pages that it allocates to be mappedalc2012-06-011-6/+8
| | | | | | | into the kernel map, so vm_page_alloc_contig() can be used in place of contigmalloc(). Reviewed by: marcel
* MFp4 bz_ipv6_fast:bz2012-05-241-0/+4
| | | | | | | | | | | | | | | | | | | | in_cksum.h required ip.h to be included for struct ip. To be able to use some general checksum functions like in_addword() in a non-IPv4 context, limit the (also exported to user space) IPv4 specific functions to the times, when the ip.h header is present and IPVERSION is defined (to 4). We should consider more general checksum (updating) functions to also allow easier incremental checksum updates in the L3/4 stack and firewalls, as well as ponder further requirements by certain NIC drivers needing slightly different pseudo values in offloading cases. Thinking in terms of a better "library". Sponsored by: The FreeBSD Foundation Sponsored by: iXsystems Reviewed by: gnn (as part of the whole) MFC After: 3 days
* Don't assume we have legacy PICs (i.e. 8259A in cascade) at the legacymarcel2012-05-041-9/+0
| | | | | | | | I/O port addresses. Even if we do, this is hardly the place to mask interrupts. It's not clear that this was at all needed. The code came with CVS revision 1.2 of nexus.c when interrupt support was first added. What is known is that ia64 has always been designed around the IOSAPIC, and that doing I/O like this prevents Altix from booting.
* Add a convenience macro for the returns_twice attribute, and apply it todim2012-04-291-2/+2
| | | | | | | the prototypes of the appropriate functions (getcontext, savectx, setjmp, sigsetjmp and vfork). MFC after: 2 weeks
* Remove pty(4) from our kernel configurations.ed2012-03-212-2/+0
| | | | | | | | | | | As of FreeBSD 8, this driver should not be used. Applications that use posix_openpt(2) and openpty(3) use the pts(4) that is built into the kernel unconditionally. If it turns out high profile depend on the pty(4) module anyway, I'd rather get those fixed. So please report any issues to me. The pty(4) module is still available as a kernel module of course, so a simple `kldload pty' can be used to run old-style pseudo-terminals.
* Copy i386 specialreg.h to x86 and merge with amd64 specialreg.h. Replacetijl2012-03-191-1/+1
| | | | amd64/i386/pc98 specialreg.h with stubs.
OpenPOWER on IntegriCloud