summaryrefslogtreecommitdiffstats
path: root/sys/ia64/ia64/machdep.c
Commit message (Collapse)AuthorAgeFilesLines
* Repair ia64 build after r278347 - remove const from set_mcontextpeter2015-02-081-1/+1
|
* Make sure all memory updates are made visible before we let gomarcel2014-09-221-0/+2
| | | | | | | | | | | | | | of the thread in cpu_switch(). It's otherwise possible that on another CPU the thread continues from stale context data. Note that this is prominent on newer CPUs, like the Montecito, that really take advantage of the weak memory ordering. First generation Itanium 2 is not that aggressive and does not need this. This is a direct commit to stable/10. Approved by: re@ (gjb)
* Fix the PCPU access macros. It was found that the PCPU pointer, whenmarcel2014-09-061-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | held in register r13, is used outside the bounds of critical_enter() and critical_exit() by virtue of optimizations performed by the compiler. The net effect being that address computations of fields in the PCPU structure could be relative to the PCPU structure of the CPU on which the address computation was performed and not related to the CPU that executes the actual load or store operation. The typical failure mode being that the per-CPU cache of UMA got corrupted due to accesses from other CPUs. Adding more volatile decorating to the register expression does not help. The thinking being that volatile is assumed to work on memory references and not register references. Thus, the fix is to perform the address computation using a volatile inline assembly statement. Additionally, since the reference is fundamentally non-atomic on ia64 by virtue of have a distinct address computation followed by the actual load or store operation, it is required to wrap the entire PCPU access in a critical section. With PCPU_GET and friends requiring curthread now that they're in a critical section, low-level use of these macros in functions like cpu_switch() is not possible anymore. Consequently, a second order set of changes is needed to avoid using PCPU_GET and friends where curthread is either not set yet, or in the process of being changed. In those cases, explicit dereferencing of pcpup is needed. In those cases it is also possible to do that. This is a direct commit to stable/10. Approved by: re@ (marius)
* MFC r263815, r263872:emaste2014-08-211-1/+1
| | | | | | | | | | | | | | Move ia64 efi.h to sys in preparation for amd64 UEFI support Prototypes specific to ia64 have been left in this file for now, under __ia64__, rather than moving them to a new header under sys/ia64. I anticipate that (some of) the corresponding functions will be shared by the amd64, arm64, i386, and ia64 architectures, and we can adjust this as EFI support on other than ia64 continues to develop. Fix missed efi.h header change in r263815 Sponsored by: The FreeBSD Foundation
* MFC r263380 & r268185: Add KTR events for the PMAP interface functions.marcel2014-07-021-2/+3
|
* MFC r263323: Fix and improve exception tracing.marcel2014-07-021-3/+8
|
* MFC r263254: Move the implementation of kdb_cpu_trap() from <machine/kdb.h>marcel2014-07-021-0/+13
| | | | to machdep.c.
* MFC r257487:marcel2014-02-161-2/+2
| | | | Use LOG2_ID_PAGE_SIZE again for the identity mapping in regions 6 & 7.
* MFCattilio2013-03-021-3/+4
|\
| * MFcalloutng:davide2013-02-281-3/+4
| | | | | | | | | | | | | | | | | | | | | | When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to extimate furter sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements will be committed in the next days. The commmit is not targeted for MFC.
* | Fix other architectures and ZFS.attilio2013-02-211-0/+1
|/ | | | Sponsored by: EMC / Isilon storage division
* Eliminate the PC_CURTHREAD symbol and load the current thread'smarcel2013-02-121-0/+16
| | | | | | | | | | | | | | | | | thread structure pointer atomically from r13 (the pcpu pointer) for the current CPU/core. Add a CTASSERT in machdep.c to make sure that pc_curthread is in fact the first field in struct pcpu. The only non-atomic operations left were those related to process- space operations, such as casuword, subyte, suword16, fubyte, fuword16, copyin, copyout and their variations. The casuword function has been re-structured more complete than the others. This way we have an example of a better bundling without introducing a lot of risk when we get it wrong. The other functions can be rebundled in separate commits and with the appropriate testing.
* Move PCPU initialization to a new function called cpu_pcpu_setup().marcel2012-07-081-1/+9
| | | | | This makes it easier to add additional CPU or platform information to the per-CPU structure without duplicated code.
* Implement ia64_physmem_alloc() and use it consistently to get memorymarcel2012-07-071-19/+9
| | | | | | | | | | | | | before VM has been initialized. This includes: 1. Replacing pmap_steal_memory(), 2. Replace the handcrafted logic to allocate a naturally aligned VHPT, 3. Properly allocate the DPCPU for the BSP. Ad 3: Appending the DPCPU to kernend worked as long as we wouldn't cross into the next PBVM page. If we were to cross into the next page, then there wouldn't be a PTE entry on the page table for it and we would end up with a MCA following a page fault. As such, this commit fixes MCAs occasionally seen.
* Hide the creation of phys_avail behind an API to make it easier to do itmarcel2012-07-071-142/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | correctly. We now iterate the EFI memory descriptors once and collect all the information in a single pass. This includes: 1. The I/O port base address, 2. The PAL memory region. Have the physmem API track this. 3. Memory descriptors of memory we can't use, like bad memory, runtime services code & data, etc. Have the physmem API track these. 4. memory descriptors of memory we can use or re-use, such as free memory, boot time services code & data, loader code & data, etc. These are added by the physmem API. Since the PBVM page table and pages are in memory described as loader data, inform the physmem API of chunks that need to be delated from the available physical memory. While here, remove Maxmem and replace it with the better named paddr_max. Maxmem was defined as physmem, which is generally wrong. Now, paddr_max is properly defined as the largesty physical address. The upshot of all this is that: 1. We properly determine realmem. 2. We maximize physmem by re-using memory where possible. 3. We remove complexity from ia64_init() in machdep.c. 4. Remove confusion about realmem, physmem & Maxmem. The new ia64_physmem_alloc() is to replace pmap_steal_memory() in pmap.c, as well as replace the handcrafted allocation of the VHPT for the BSP in pmap_bootstrap() in pmap.c. This is step 2 and addresses the manipulation of phys_avail after it is being created.
* Correct capitalization of "Hz" in user-visible text (manpages, printf(),gavin2012-02-281-1/+1
| | | | | | etc). MFC after: 3 days
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.ed2011-11-071-2/+2
| | | | | | The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
* In order to maximize the re-usability of kernel code in user space thiskmacy2011-09-161-2/+2
| | | | | | | | | | | | | patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
* Switch to the event timers infrastructure. This includes:marcel2011-06-251-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o Setting td_intr_frame to the XIVs trap frame because it's referenced by the ET event handler. o Signal EOI to the CPU before calling the registered XIV handlers. This prevents lost ITC interrupts, which cause starvation in one-shot mode. o Adding support for IPI_HARDCLOCK with corresponding per-CPU counters. o Have the APs call cpu_initclocks() so as to limited the scattering of clock related initialization. cpu_initclocks() calls the <self>_bsp() or <self>_ap() version accordingly. o Uncomment the ET clock handling in cpu_idle(). o Update the DDB 'show pcpu' output for the new MD fields. o Entirely rewritten ia64_ih_clock(). Note that we don't create as many clock XIVs as we have CPUs, as is done on PowerPC. It doesn't scale. We can only have 240 XIVs and we can have more CPUs than that. There's a single intrcnt index for the cumulative clock ticks and we keep per CPU counts in the PCPU stats structure. o Register the ITC by hooking SI_SUB_CONFIGURE (2nd order). Open issues: o Clock interrupts can still be lost. Some tweaking is still necessary. Thanks to: mav@ for his support, feedback and explanations. ET stats while committing: eris% sysctl machdep.cpu | grep nclks machdep.cpu.0.nclks: 24007 machdep.cpu.1.nclks: 22895 machdep.cpu.2.nclks: 13523 machdep.cpu.3.nclks: 9342 machdep.cpu.4.nclks: 9103 machdep.cpu.5.nclks: 9298 machdep.cpu.6.nclks: 10039 machdep.cpu.7.nclks: 9479 eris% vmstat -i | grep clock clock 108599 50
* Unblock the outgoing thread after we performed pmap_switch() tomarcel2011-06-231-2/+2
| | | | | | | | | switch the region registers. pmap_switch() returns the pmap for which the region register are currently programmed, which needs to be re-programmed on the CPU the ougoing thread gets switched in. This change does not noticibly change anything or fix known bugs, but does give me a warm fuzzy feeling by being more correct.
* Add the model number for the Montvale processor (marketed as Itanium 2 9100).marcel2011-06-111-0/+3
| | | | At this time we're missing just one: Tukwila (Itanium 2 9300).
* Call set_cputicker() to have the time counter use the ITC register.marcel2011-06-071-0/+2
| | | | Note that the ITC frequency is fixed.
* Improve cpu_idle():marcel2011-06-061-4/+29
| | | | | | | | | | o cpu_idle_hook is expected to be called with interrupts disabled and re-enables interrupts on return. o sync with x86: don't idle when the CPU has runnable tasks o have callers of ia64_call_pal_static() disable interrupts and re-enable interrupts. o add, but compile-out, support for idle mode. This will be enabled at some later time, after proper testing.
* On multi-core, multi-threaded PPC systems, it is important that the threadsnwhitehorn2011-05-311-1/+1
| | | | | | | | | | | | be brought up in the order they are enumerated in the device tree (in particular, that thread 0 on each core be brought up first). The SLIST through which we loop to start the CPUs has all of its entries added with SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration and so AP startup would always fail in such situations (causing a machine check or RTAS failure). Fix this by changing the SLIST into an STAILQ, and inserting new CPUs at the end. Reviewed by: jhb
* Stop linking against a direct-mapped virtual address and insteadmarcel2011-04-301-46/+47
| | | | | | | | | | | use the PBVM. This eliminates the implied hardcoding of the physical address at which the kernel needs to be loaded. Using the PBVM makes it possible to load the kernel irrespective of the physical memory organization and allows us to replicate kernel text on NUMA machines. While here, reduce the direct-mapped page size to the kernel's page size so that we can support memory attributes better.
* Fix switching to physical mode as part of calling into EFI runtimemarcel2011-03-211-18/+18
| | | | | | | | | | | | | | | | | services or PAL procedures. The new implementation is based on specific functions that are known to be called in certain scenarios only. This in particular fixes the PAL call to obtain information about translation registers. In general, the new implementation does not bank on virtual addresses being direct-mapped and will work when the kernel uses PBVM. When new scenarios need to be supported, new functions are added if the existing functions cannot be changed to handle the new scenario. If a single generic implementation is possible, it will become clear in due time. While here, change bootinfo to a pointer type in anticipation of future development.
* Use VM_MAXUSER_ADDRESS rather than VM_MAX_ADDRESS when we talk aboutmarcel2011-03-181-3/+3
| | | | | the bounds of user space. Redefine VM_MAX_ADDRESS as ~0UL, even though it's not used anywhere in the source tree.
* Mostly revert r219468, as I had misremembered the C standard regardingmdf2011-03-111-1/+1
| | | | | | the size of an extern array. Keep one change from strncpy to strlcpy.
* Use MAXPATHLEN rather than the size of an extern array when copying themdf2011-03-101-1/+1
| | | | | | kernel name. Also consistenly use strlcpy(). Suggested by: Warner Losh
* Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.pluknet2011-01-211-3/+2
| | | | | | | Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe
* Remove unused includes of <sys/mutex.h> and <machine/mutex.h>.jhb2010-11-091-1/+0
|
* Adjust the order of operations in spinlock_enter() and spinlock_exit() tojhb2010-11-051-4/+10
| | | | | | | | | | | | | | | | | | work properly with single-stepping in a kernel debugger. Specifically, these routines have always disabled interrupts before increasing the nesting count and restored the prior state of interrupts after decreasing the nesting count to avoid problems with a nested interrupt not disabling interrupts when acquiring a spin lock. However, trap interrupts for single-stepping can still occur even when interrupts are disabled. Now the saved state of interrupts is not saved in the thread until after interrupts have been disabled and the nesting count has been increased. Similarly, the saved state from the thread cannot be read once the nesting count has been decreased to zero. To fix this, use temporary variables to store interrupt state and shuffle it between the thread's MD area and the appropriate registers. In cooperation with: bde MFC after: 1 month
* Move prototypes for kern_sigtimedwait() and kern_sigprocmask() tojhb2010-06-301-0/+1
| | | | <sys/syscallsubr.h> where all other kern_<syscall> prototypes live.
* Fix the ia64 build.nwhitehorn2010-03-261-1/+1
| | | | Pointy hat to: me
* Change the arguments of exec_setregs() so that it receives a pointernwhitehorn2010-03-251-3/+3
| | | | | | | | to the image_params struct instead of several members of that struct individually. This makes it easier to expand its arguments in the future without touching all platforms. Reviewed by: jhb
* Don't check for boot_verbose in the environment. The loader doesmarcel2010-03-201-11/+1
| | | | that already and sets RB_VERBOSE. The loader has always done it.
* Revamp the interrupt code based on the previous commit:marcel2010-03-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o Introduce XIV, eXternal Interrupt Vector, to differentiate from the interrupts vectors that are offsets in the IVT (Interrupt Vector Table). There's a vector for external interrupts, which are based on the XIVs. o Keep track of allocated and reserved XIVs so that we can assign XIVs without hardcoding anything. When XIVs are allocated, an interrupt handler and a class is specified for the XIV. Classes are: 1. architecture-defined: XIV 15 is returned when no external interrupt are pending, 2. platform-defined: SAL reports which XIV is used to wakeup an AP (typically 0xFF, but it's 0x12 for the Altix 350). 3. inter-processor interrupts: allocated for SMP support and non-redirectable. 4. device interrupts (i.e. IRQs): allocated when devices are discovered and are redirectable. o Rewrite the central interrupt handler to call the per-XIV interrupt handler and rename it to ia64_handle_intr(). Move the per-XIV handler implementation to the file where we have the XIV allocation/reservation. Clock interrupt handling is moved to clock.c. IPI handling is moved to mp_machdep.c. o Drop support for the Intel 8259A because it was broken. When XIV 0 is received, the CPU should initiate an INTA cycle to obtain the interrupt vector of the 8259-based interrupt. In these cases the interrupt controller we should be talking to WRT to masking on signalling EOI is the 8259 and not the I/O SAPIC. This requires adriver for the Intel 8259A which isn't available for ia64. Thus stop pretending to support ExtINTs and instead panic() so that if we come across hardware that has an Intel 8259A, so have something real to work with. o With XIVs for IPIs dynamically allocatedi and also based on priority, define the IPI_* symbols as variables rather than constants. The variable holds the XIV allocated for the IPI. o IPI_STOP_HARD delivers a NMI if possible. Otherwise the XIV assigned to IPI_STOP is delivered.
* Have cpu_throw() loop on blocked_lock as well. This bug has existedmarcel2010-03-151-8/+17
| | | | | | | | | a long time and has gone unnoticed just as long, because I kept using sched_4bsd (due to sched_ule not working with preemption), but GENERIC had sched_ule by default -- including SMP. While here, remove unused inclusion of <machine/clock.h>, remove totally bogus inclusion of <i386/include/specialreg.h>.
* Provide groundwork for 32-bit binary compatibility on non-x86 platforms,nwhitehorn2010-03-111-3/+3
| | | | | | | | | for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb
* Some code churn:marcel2010-02-141-1/+9
| | | | | | | | | | | | | | | | | | | | | o Eliminate IA64_PHYS_TO_RR6 and change all places where the macro is used by calling either bus_space_map() or pmap_mapdev(). o Implement bus_space_map() in terms of pmap_mapdev() and implement bus_space_unmap() in terms of pmap_unmapdev(). o Have ia64_pib hold the uncached virtual address of the processor interrupt block throughout the kernel's life and access the elements of the PIB through this structure pointer. This is a non-functional change with the exception of using ia64_ld1() and ia64_st8() to write to the PIB. We were still using assignments, for which the compiler generates semaphore reads -- which cause undefined behaviour for uncacheable memory. Note also that the memory barriers in ipi_send() are critical for proper functioning. With all the mapping of uncached memory done by pmap_mapdev(), we can keep track of the translations and wire them in the CPU. This then eliminates the need to reserve a whole region for uncached I/O and it eliminates translation traps for device I/O accesses.
* In cpu_switch(), use an atomic operation to set the td_lockmarcel2010-01-271-1/+1
| | | | | | of the old thread to the mutex that's passed. Pointed out by: attilio, jhb
* Remove cpu_boot() and call efi_reset_system() directly frommarcel2010-01-231-8/+1
| | | | cpu_reset().
* Revamp bus_space access functions:marcel2009-12-301-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | o Optimize for memory mapped I/O by making all I/O port acceses function calls and marking the test for the IA64_BUS_SPACE_IO tag with __predict_false(). Implement the I/O port access functions in a new file, called bus_machdep.c. o Change the bus_space_handle_t for memory mapped I/O to the virtual address rather than the physical address. This eliminates the PA->VA translation for every I/O access. The handle for I/O port access is still the port number. o Move inb(), outb(), inw(), outw(), inl(), outl(), and their string variants from cpufunc.h and define them in bus.h. On ia64 these are not CPU functions at all. In bus.h they are merely aliases for the new I/O port access functions defined in bus_machdep.h. o Handle the ACPI resource bug in nexus_set_resource(). There we can do it once so that we don't have to worry about it whenever we need to write to an I/O port that is really a memory mapped address. The upshot of this change is that the KBI is better defined and that I/O port access always involves a function call, allowing us to change the actual implementation without breaking the KBI. For memory mapped I/O the virtual address is abstracted, so that we can change the VA->PA mapping in the kernel without causing an KBI breakage. The exception at this time is for bus_space_map() and bus_space_unmap(). MFC after: 1 week.
* Export the bus, cpu and itc frequencies under the hw.freq sysctl node.marcel2009-12-231-22/+37
| | | | | | | | | | | | The frequencies are in MHz (i.e. a value of 1000 represents 1GHz). The frequencies are rounded to the nearest whole MHz. While here, rename and re-type bus_frequency, processor_frequency and itc_frequency to bus_freq, cpu_freq and itc_freq and make them static. As unsigned integers, the hw.freq.cpu sysctl can more easily be made generic (across all architectures) making porting easier. MFC after: 3 days
* Define struct pcpu_md as the only MD field of struct pcpu (pc_acpi_idmarcel2009-12-071-24/+27
| | | | | | | | | | excluded, as it's used by MI code) and mode the sysctl variables from pcpu_stats to pcpu_md. Adjust all references accordingly. While nearby, change the PCPU sysctl tree so that they match the CPU device sysctl tree -- they are now children of a static node called "machdep.cpu" and are named only with their cpu ID.
* Make sure bus space accesses use unorder memory loads and stores.marcel2009-12-031-2/+2
| | | | | | | | | | | | Memory accesses are posted in program order by virtue of the uncacheable memory attribute. Since GCC, by default, adds acquire and release semantics to volatile memory loads and stores, we need to use inline assembly to guarantee it. With inline assembly, we don't need volatile pointers anymore. Itanium does not support semaphore instructions to uncacheable memory.
* Eliminate teh use of MAXCPU in static arrays of interrupt counters bymarcel2009-11-281-4/+64
| | | | | | | | adding statistics counters to the PCPU structure. Export the counters through sysctl by giving each PCPU structure its own sysctl context. While here, fix cnt.v_intr by not just having it count clock interrupts, but every interrupt and add more counters for each interrupt source.
* Reimplement the lazy FP context switching:marcel2009-10-311-75/+0
| | | | | | | | | | | | | | | | | | | o Move all code into a single file for easier maintenance. o Use a single global lock to avoid having to handle either multiple locks or race conditions. o Make sure to disable the high FP registers after saving or dropping them. o use msleep() to wait for the other CPU to save the high FP registers. This change fixes the high FP inconsistency panics. A single global lock typically serializes too much, which may be noticable when a lot of threads use the high FP registers, but in that case it's probably better to switch the high FP context synchronuously. Put differently: cpu_switch() should switch the high FP registers if the incoming and outgoing threads both use the high FP registers.
* In r197963, a race with thread being selected for signal deliverykib2009-10-271-7/+1
| | | | | | | | | | | | | while in kernel mode, and later changing signal mask to block the signal, was fixed for sigprocmask(2) and ptread_exit(3). The same race exists for sigreturn(2), setcontext(2) and swapcontext(2) syscalls. Use kern_sigprocmask() instead of direct manipulation of td_sigmask to reschedule newly blocked signals, closing the race. Reviewed by: davidxu Tested by: pho MFC after: 1 month
* Decouple ACPI CPU Ids from FreeBSD's cpuid. The ACPI Ids can bemarcel2009-08-161-1/+5
| | | | | | sparse, which causes a kernel assert. Approved by: re (kensmith)
OpenPOWER on IntegriCloud