path: root/sys/ia64
Commit message [author, age, files, lines]
...
* Introduce IA64_ID_PAGE_{MASK|SHIFT|SIZE} and LOG2_ID_PAGE_SIZE. The [marcel, 2003-09-09, 6 files, -8/+26]
| latter is a kernel option for IA64_ID_PAGE_SHIFT, which in turn determines IA64_ID_PAGE_MASK and IA64_ID_PAGE_SIZE. The constants are used instead of the literal hardcoding (in its various forms) of the size of the direct mappings created in region 6 and 7. The default and probably only workable size is still 256M, but for kicks we use 128M for LINT.
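The relationship between the shift, size and mask described above can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the assumption that the mask is the size minus one is the usual convention, not something the commit message states. A shift of 28 corresponds to the 256M default and 27 to the 128M used for LINT.

```c
#include <stdint.h>

/*
 * Hypothetical helpers. The kernel uses preprocessor constants
 * (IA64_ID_PAGE_SHIFT, IA64_ID_PAGE_SIZE, IA64_ID_PAGE_MASK); their
 * exact definitions are not shown in the commit log.
 */

/* Size in bytes of one direct-mapped page for a given shift. */
static inline uint64_t
id_page_size(unsigned shift)
{
	return ((uint64_t)1 << shift);
}

/* Mask selecting the offset within such a page (assumed: size - 1). */
static inline uint64_t
id_page_mask(unsigned shift)
{
	return (id_page_size(shift) - 1);
}
```

Deriving both values from the single shift is what allows one kernel option to control all three constants.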
* Introduce a new pmap function, pmap_extract_and_hold(). This function [alc, 2003-09-08, 1 file, -0/+24]
| atomically extracts and holds the physical page that is associated with the given pmap and virtual address. Such a function is needed to make the memory mapping optimizations used by, for example, pipes and raw disk I/O MP-safe. Reviewed by: tegge
* Take the support for the 8139C+/8169/8169S/8110S chips out of the [wpaul, 2003-09-08, 1 file, -0/+1]
| rl(4) driver and put it in a new re(4) driver. The re(4) driver shares the if_rlreg.h file with rl(4) but is a separate module. (Ultimately I may change this. For now, it's convenient.) rl(4) has been modified so that it will never attach to an 8139C+ chip, leaving it to re(4) instead. Only re(4) has the PCI IDs to match the 8169/8169S/8110S gigE chips. if_re.c contains the same basic code that was originally bolted onto if_rl.c, with the following updates:
| - Added support for jumbo frames. Currently, there seems to be a limit of approximately 6200 bytes for jumbo frames on transmit. (This was determined via experimentation.) The 8169S/8110S chips apparently are limited to 7.5K frames on transmit. This may require some more work, though the framework to handle jumbo frames on RX is in place: the re_rxeof() routine will gather up frames that span multiple 2K clusters into a single mbuf list.
| - Fixed bug in re_txeof(): if we reap some of the TX buffers, but there are still some pending, re-arm the timer before exiting re_txeof() so that another timeout interrupt will be generated, just in case re_start() doesn't do it for us.
| - Handle the 'link state changed' interrupt
| - Fix a detach bug. If re(4) is loaded as a module, and you do tcpdump -i re0, then you do 'kldunload if_re,' the system will panic after a few seconds. This happens because ether_ifdetach() ends up calling the BPF detach code, which notices the interface is in promiscuous mode and tries to switch promisc mode off while detaching the BPF listener. This ultimately results in a call to re_ioctl() (due to SIOCSIFFLAGS), which in turn calls re_init() to handle the IFF_PROMISC flag change. Unfortunately, calling re_init() here turns the chip back on and restarts the 1-second timeout loop that drives re_tick(). By the time the timeout fires, if_re.ko has been unloaded, which results in a call to invalid code and blows up the system. To fix this, I cleared the IFF_UP flag before calling ether_ifdetach(), which stops the ioctl routine from trying to reset the chip.
| - Modified comments in re_rxeof() relating to the difference in RX descriptor status bit layout between the 8139C+ and the gigE chips. The layout is different because the frame length field was expanded from 12 bits to 13, and they got rid of one of the status bits to make room.
| - Add diagnostic code (re_diag()) to test for the case where a user has installed a broken 32-bit 8169 PCI NIC in a 64-bit slot. Some NICs have the REQ64# and ACK64# lines connected even though the board is 32-bit only (in this case, they should be pulled high). This fools the chip into doing 64-bit DMA transfers even though there is no 64-bit data path. To detect this, re_diag() puts the chip into digital loopback mode and sets the receiver to promiscuous mode, then initiates a single 64-byte packet transmission. The frame is echoed back to the host, and if the frame contents are intact, we know DMA is working correctly, otherwise we complain loudly on the console and abort the device attach. (At the moment, I don't know of any way to work around the problem other than physically modifying the board, so until/unless I can think of a software workaround, this will have to do.)
| - Created re(4) man page
| - Modified rlphy.c to allow re(4) to attach as well as rl(4). Note that this code works for the sample 8169/Marvell 88E1000 NIC that I have, but probably won't work for the 8169S/8110S chips. RealTek has sent me some sample NICs, but they haven't arrived yet. I will probably need to add an rlgphy driver to handle the on-board PHY in the 8169S/8110S (it needs special DSP initialization).
* Untangle the code in this file to improve understandability. Both [marcel, 2003-09-07, 1 file, -159/+155]
| ia64_count_cpus() and ia64_probe_sapics() called a single function to do the actual work. The difference in behaviour was handled in that function and was further complicated by adding bootverbose related code. As such, even the simplest of changes was hard to comprehend. Untangling has been done by increasing code duplication and using a more naive style of coding. FWIW, the object file is slightly smaller than before, so things aren't as bad as it may seem. Triggered by: a simple fix on the P4 branch that never got merged.
* MFamd64/i386 [alc, 2003-09-07, 1 file, -12/+19]
| Add necessary page locking to pmap_mincore().
* MFp4: Revamped GENERIC (and hints). This is so much more pleasant [marcel, 2003-09-07, 2 files, -118/+84]
| to look at...
* Replace sio(4) with uart(4). Remove the sio(4) hints and only add [marcel, 2003-09-07, 2 files, -9/+5]
| those hints used by uart(4) for the determination of the serial console in the absence of the HCDP table.
* Fix a place where I forgot to change the code that checks whether [marcel, 2003-09-05, 4 files, -21/+9]
| we return to kernel or userland. This triggered a panic in a KSE application when TDF_USTATCLOCK was set in the case userland was interrupted, but we never called ast() on our way out. As such, we called ast() at some other time. Unfortunately, TDF_USTATCLOCK handling assumes running in the interrupt thread. This was not the case anymore. To avoid making the same mistake later, interrupt() now returns to its caller whether we interrupted userland or not. This avoids that we have to duplicate the check in assembly, where it's bound to fall off the scope. Now we simply check the return value and call ast() if appropriate. Run into this: davidxu
* Use pmap_steal_memory() for the msgbuf instead of trying to squeeze [marcel, 2003-09-01, 1 file, -25/+2]
| it in the last chunk (phys_avail block). The last chunk very often is not larger than one or two pages, resulting in a msgbuf that's too small to hold a complete verbose boot. Note that pmap_steal_memory() will bzero the memory it "allocates". Consequently, ia64 will never preserve previous msgbufs. This is not a noticeable difference in practice. If the msgbuf could be reused, it was invariably too small to have anything preserved anyway.
* Use direct mapped KVA for the sf_buf allocator, as made possible [marcel, 2003-09-01, 1 file, -15/+8]
| by the previous commit. While here, fix a typo, reformat comments and fix a long line. Tested with: ftpd
* Migrate the sf_buf allocator that is used by sendfile(2) and zero-copy [alc, 2003-08-29, 1 file, -0/+101]
| sockets into machine-dependent files. The rationale for this migration is illustrated by the modified amd64 allocator. It uses the amd64's direct map to avoid ephemeral mappings in the kernel's address space. On an SMP, the ephemeral mappings result in an IPI for TLB shootdown for each transmitted page. Yuck. Maintainers of other 64-bit platforms with direct maps should be able to use the amd64 allocator as a reference implementation.
* Minor style cleanups. [njl, 2003-08-28, 3 files, -13/+6]
|
* Change LOG2_PAGE_SIZE from 14 to 15 bits. This will cause the CTASSERT [marcel, 2003-08-25, 1 file, -1/+1]
| in vm_page.h to be reached and thus slightly increases the overall coverage of LINT on ia64.
* Add the bits for a LINT kernel. It has been verified to compile. We [marcel, 2003-08-23, 2 files, -0/+51]
| may need to polish this.
* Remove PAGE_SIZE_4K, PAGE_SIZE_8K and PAGE_SIZE_16K and replace them [marcel, 2003-08-23, 1 file, -16/+4]
| with LOG2_PAGE_SIZE. A single option is better to LINT than multiple mutually exclusive ones.
* Remove unused inclusion of opt_acpi.h [marcel, 2003-08-23, 1 file, -1/+0]
|
* Regen. [jhb, 2003-08-21, 3 files, -5/+5]
|
* Swap sigaction/sigreturn since they are in the wrong order. [jhb, 2003-08-21, 1 file, -2/+2]
| Noticed indirectly by: peter
* Undo the mistake made in revision 1.77 of trap.c and which was the [marcel, 2003-08-20, 2 files, -25/+15]
| ultimate trigger for the follow-up fixes in revisions 1.78, 1.80, 1.81 and 1.82 of trap.c. I was simply too pre-occupied with the gateway page and how it blurs kernel space with user space and vice versa that I couldn't see that it was all a load of bollocks. It's not the IP address that matters, it's the privilege level that counts. We never run in user space with lifted permissions and we sure can not run in kernel space without it. Sure, the gateway page is the exception, but not if you look at the privilege level. It's user space if you run with user permissions and kernel space otherwise. So, we're back to looking at the privilege level like it should be. There's no other way. Pointy hat: marcel
* Fixup the ELF branding information to point to the new home of rtld. [gordon, 2003-08-17, 3 files, -3/+3]
|
* In vm_thread_swap{in|out}(), remove the alpha specific conditional [marcel, 2003-08-16, 1 file, -0/+10]
| compilation and replace it with a call to cpu_thread_swap{in|out}(). This allows us to add similar code on ia64 without cluttering the code even more.
* Further cleanup <machine/cpu.h> and <machine/md_var.h>: move the MI [marcel, 2003-08-16, 6 files, -56/+30]
| prototypes of cpu_halt(), cpu_reset() and swi_vm() from md_var.h to cpu.h. This affects db_command.c and kern_shutdown.c.
| ia64: move all MD prototypes from cpu.h to md_var.h. This affects madt.c, interrupt.c and mp_machdep.c. Remove is_physical_memory(). It's not used (vm_machdep.c).
| alpha: the MD prototypes have been left in cpu.h with a comment that they should be there. Moving them is left for later. It was expected that the impact would be significant enough to be done in a separate commit.
| powerpc: MD prototypes left in cpu.h. Comment added.
| Suggested by: bde Tested with: make universe (pc98 incomplete)
* Fix a range check bug. Don't left-shift the integer argument 'data'. [marcel, 2003-08-16, 1 file, -12/+7]
| Sign extension happens after the shift, not before so that boundary cases like 0x40000000 will not be caught properly. Instead, right shift ndirty. It is guaranteed to be a multiple of 8. While here, do some manual code motion and code commoning. Range check bug pointed out by: iedowse
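The class of bug described above can be illustrated with a small sketch. The function names and the exact form of the comparison are assumptions made for illustration; the commit log does not show the original check. The 32-bit shift is modelled with uint32_t so that the wrap-around is well defined in standard C.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Buggy form (hypothetical reconstruction): scale the 32-bit argument
 * to a byte count by shifting left, then widen. The shift happens in
 * 32 bits, so the high bits are lost before sign extension takes place.
 */
static bool
in_range_shift_left(int32_t data, uint64_t ndirty)
{
	uint32_t wrapped = (uint32_t)data << 3;	/* high bits are discarded */
	int64_t bytes = (int64_t)(int32_t)wrapped;	/* sign-extend afterwards */

	return (bytes >= 0 && (uint64_t)bytes < ndirty);
}

/*
 * Fixed form: leave 'data' alone and scale ndirty down instead.
 * This is exact because ndirty is a multiple of 8.
 */
static bool
in_range_shift_right(int32_t data, uint64_t ndirty)
{
	return (data >= 0 && (uint64_t)data < (ndirty >> 3));
}
```

With data = 0x40000000 and ndirty = 512, the left-shift variant wraps to 0 and wrongly accepts the value, while the right-shift variant compares against 64 and rejects it.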
* Fix the generation of coredumps. We did not take the dirty registers [marcel, 2003-08-15, 1 file, -1/+38]
| that were on the kernel stack into account. For now we write them out to the register stack of the process before creating the dump. This however is not the final solution. The problem is that we may invalidate the coredump by overwriting vital information due to an invalid backing store pointer. Instead we need to write the dirty registers to an unused region of VM which will result in a separate segment in the coredump. For now we can at least get to all the registers from a coredump.
* Add an instruction group break after the move to application register [marcel, 2003-08-15, 1 file, -2/+2]
| and the move to control register to avoid dependency violations when these functions are used. Note that explicit data and instruction serialization also need to be in a subsequent instruction group. This too requires that we have an igrp break here.
* Introduce two machine specific ptrace(2) requests: PT_GETKSTACK and [marcel, 2003-08-15, 2 files, -2/+75]
| PT_SETKSTACK. These requests allow the tracing process to access the dirty registers of the traced process that are on the kernel stack. Note that there's currently no way to access the rnat register for those dirty registers that are not (yet) covered by a nat collection point. The interface for this is still being slept on. Also note that implied by these requests is the division of work: The tracing process has to keep track of where registers are spilled and is responsible to figure out where the NaT bits of the stacked registers are at any time during the execution of the traced process. The kernel provides the interfaces but will not abstract the fact that the register stack can be split. This model does not follow the approach taken in Linux where PT_PEEK and PT_POKE deal with this automagically.
* Don't use VM_MIN_KERNEL_ADDRESS to check if the faulting address is [marcel, 2003-08-13, 1 file, -2/+2]
| in user space or kernel space. VM_MIN_KERNEL_ADDRESS starts after the gateway page, which means that improper memory accesses to the gateway page while in user mode would panic the kernel. Use VM_MAX_ADDRESS instead. It ends before the gateway page. The difference between VM_MIN_KERNEL_ADDRESS and VM_MAX_ADDRESS is exactly the gateway page.
* Put an instruction group break between the move to ar.rnat and the [marcel, 2003-08-13, 1 file, -0/+1]
| move to ar.rsc. The RSE must be in enforced lazy mode when writing to RSE modifiable registers. In this case we restore the RSE NaT collection register ar.rnat. I have seen 2 general exception faults on pluto1 now that indicate that the move to ar.rsc has already happened prior to the move to ar.rnat, meaning that the RSE is not in enforced lazy mode anymore. The ia64 dependency and instruction ordering rules seem to allow having both registers written to in the same instruction group, provided ar.rsc is written to later than ar.rnat (based on the ordering semantics). It appears that we may be pushing our luck. For now, put them in separate cycles (by means of the instruction group break). If we ever get a general exception fault on the move to ar.rnat again, we have definite proof that something else is fishy.
* Expand inline the relevant parts of src/COPYRIGHT for Matt Dillon's [imp, 2003-08-12, 2 files, -4/+48]
| copyrighted files. Approved by: Matt Dillon
* Extend identifycpu(): [marcel, 2003-08-12, 1 file, -17/+36]
| o Differentiate between CPU family and CPU model. There are multiple Itanium 2 models and it's nice to differentiate between them.
| o Separately export the CPU family and CPU model with sysctl.
| o Merced is the only model in the Itanium family.
| o Add Madison to the Itanium 2 family. We already knew about McKinley.
| o Print the CPU family between parentheses, like we do with the i386 CPU class.
| My prototype now identifies itself as:
|   CPU: Merced (800.03-Mhz Itanium)
| pluto1 and pluto2 will eventually identify themselves as:
|   CPU: McKinley (900.00-Mhz Itanium 2)
* Cleanup prototypes in cpu.h, including fswintrberr and any references [marcel, 2003-08-12, 4 files, -47/+8]
| to it. Sort the remaining prototypes in cpu.h. No functional change.
* Cleanup and style(9) fixes. No functional change. [marcel, 2003-08-11, 1 file, -7/+4]
|
* o move cpu_reset() from vm_machdep.c to machdep.c. [marcel, 2003-08-10, 2 files, -79/+68]
| o reorder cpu_boot(), cpu_halt() and identifycpu().
| No functional change.
* Now that we can ignore up to 8KB of dirty registers, remove the RSE [marcel, 2003-08-10, 1 file, -45/+30]
| magic from exec_setregs(). In set_mcontext() we now also don't have to worry that we entered the kernel with more than 512 bytes of dirty registers on the kernel stack. Note that we cannot make any assumptions anymore WRT to NaT collection points in exec_setregs(), so we have to deal with them now.
* MFi386 1.422 & 1.423: lock page queues in pmap_insert_entry(). [marcel, 2003-08-08, 1 file, -0/+2]
|
* Consistently use the BSD u_int and u_short instead of the SYSV uint and [jhb, 2003-08-07, 2 files, -2/+2]
| ushort. In most of these files, there was a mixture of both styles and this change just makes them self-consistent. Requested by: bde (kern_ktrace.c)
* Better define the flags in the mcontext_t and properly set the flags [marcel, 2003-08-07, 2 files, -35/+121]
| when we create contexts. The meaning of the flags is documented in <machine/ucontext.h>. I only list them here to help browsing the commit logs:
|   _MC_FLAGS_ASYNC_CONTEXT
|   _MC_FLAGS_HIGHFP_VALID
|   _MC_FLAGS_KSE_SET_MBOX
|   _MC_FLAGS_RETURN_VALID
|   _MC_FLAGS_SCRATCH_VALID
| Yes, _MC_FLAGS_KSE_SET_MBOX is a hack and I'm proud of it :-)
* o Fix cut-n-paste whitespace corruption in previous commit [marcel, 2003-08-07, 1 file, -5/+12]
| o For trap-based upcalls the argument (the kse_mailbox) to the UTS must be written onto the kernel stack, not the user stack. While here, deal with the fact that we may be at a NaT collection point.
* In cpu_set_upcall_kse(), create the upcall according to the entry [marcel, 2003-08-06, 1 file, -12/+19]
| path into the kernel. Normally it's due to a syscall, but one can also be created as the result of a clock interrupt (for example). This now even more looks like exec_setregs(). While here, add an assert that we don't expect more than 8KB of dirty registers on the kernel stack.
* o In revision 1.45 of exception.S we changed exception_restore to [marcel, 2003-08-06, 2 files, -4/+4]
| unconditionally restore ar.k7 (kernel memory stack) and ar.k6 (kernel register stack). I don't know what I was smoking then, but if you unconditionally restore ar.k6, you also want to compute its value unconditionally. By having the computation predicated and dependent on whether we return to user mode, we would end up writing junk (= invalid value for ar.bspstore) if we would return to kernel mode. But the whole point of the unconditional restoration was that there is a grey area where we still need to have ar.k6 restored. If we restore with a junk value, we would end up wedging the machine on the next interrupt. So, unconditionally calculate the value we unconditionally write to ar.k6.
| o The previous braino was found while making the following change: We used to clear the lower 9 bits of the value we write to ar.k6. The meaning being that we know that the kernel register stack is at least 512 byte aligned and simply clearing the lower 9 bits allows us to return to a context of which we don't have dirty registers on the kernel stack, even though the context that entered the kernel does have dirty registers on the kernel stack. By masking-off the lower bits, we correctly obtain the base of the register stack without having to worry that we didn't actually reach the base while unwinding it. The change is to mask off the lower 13 bits, knowing that the kernel register stack is always 8KB aligned. The advantage is that we don't have to worry anymore if there's more than 512 bytes of dirty registers on the kernel stack. A situation that frequently occurs. In exec_setregs() in machdep.c:1.147 or older, we had to deal with that situation by copying the active portion of the register stack down in multiples of 512 bytes. Now that we mask off the lower 13 bits we don't have to do that at all. Contemporary IPF processors have a register file that can hold up to 96 stacked registers (=784 bytes [incl. 2 NaT collections]). With no indication that register files grow beyond a couple of hundred registers, we should not have to worry about it anymore... and yes, 640KB is enough for everybody :-) This change helps setcontext(2) and cpu_set_upcall_kse() in that they can return to completely different contexts without having to mess with the kernel stack. Of course exec_setregs() doesn't need to do that anymore as well.
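The masking arithmetic in this message can be checked with a short sketch. The helper names and the example address are hypothetical; only the bit counts (9 and 13), the 8-byte register size and the figure of 96 registers plus 2 NaT collections come from the message itself.

```c
#include <stdint.h>

/* Old scheme: clear the low 9 bits, assuming 512-byte alignment. */
static inline uint64_t
rstack_base_512(uint64_t bsp)
{
	return (bsp & ~(uint64_t)0x1ff);
}

/* New scheme: clear the low 13 bits, relying on 8KB alignment. */
static inline uint64_t
rstack_base_8k(uint64_t bsp)
{
	return (bsp & ~(uint64_t)0x1fff);
}

/* Bytes occupied by stacked registers plus NaT collections, 8 bytes each. */
static inline uint64_t
stacked_bytes(unsigned nregs, unsigned nat_collections)
{
	return (8ULL * (nregs + nat_collections));
}
```

For 96 registers and 2 NaT collections this gives 8 * 98 = 784 bytes. With that much dirty state above an 8KB-aligned base, the 9-bit mask lands 512 bytes past the base, whereas the 13-bit mask recovers the base itself.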
* o Put the syscall return registers in the context. Not only do we [marcel, 2003-08-05, 1 file, -6/+50]
| need this for swapcontext(), KSE upcalls initiated from ast() also need to save them so that we properly return the syscall results after having had a context switch. Note that we don't use r11 in the kernel. However, the runtime specification has defined r8-r11 as return registers, so we put r11 in the context as well. I think deischen@ was trying to tell me that we should save the return registers before. I just wasn't ready for it :-)
| o The EPC syscall code has 2 return registers and 2 frame markers to save. The first (rp/pfs) belongs to the syscall stub itself. The second (iip/cfm) belongs to the caller of the syscall stub. We want to put the second in the context (note that iip and cfm relate to interrupts. They are only being misused by the syscall code, but are not part of a regular context). This way, when the context is switched to again, we return to the caller of setcontext(2) as one would expect.
| o Deal with dirty registers on the kernel stack. The getcontext() syscall will flush the RSE, so we don't expect any dirty registers in that case. However, in thread_userret() we also need to save the context in certain cases. When that happens, we are sure that there are dirty registers on the kernel stack. This implementation simply copies the registers, one at a time, from the kernel stack to the user stack. NAT collections are not dealt with. Hence we don't preserve NaT bits. A better solution needs to be found at some later time. We also don't deal with this in all cases in set_mcontext. No temporary solution is implemented because it's not a showstopper. The problem is that we need to ignore the dirty registers and we automatically do that for at most 62 registers. When there are more than 62 dirty registers we have a memory "leak".
| This commit is fundamental for KSE support.
* Fix logic bug in the previous commit. Any region less than 5 is a [marcel, 2003-08-04, 1 file, -1/+1]
| user space region. Hence, we need to test if 5 is greater than the region; not greater equal. This bug caused us to call ast() while interrupting kernel mode.
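The off-by-one described above can be sketched in C. The function names are hypothetical, and the sketch assumes the ia64 convention that the region number is held in the top three bits of a 64-bit virtual address. It illustrates only the comparison, not the actual assembly that was fixed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Region number of a virtual address (assumed: bits 63..61). */
static inline unsigned
va_region(uint64_t va)
{
	return ((unsigned)(va >> 61));
}

/* Buggy test: "5 >= region" also treats region 5 as user space. */
static inline bool
is_user_region_buggy(uint64_t va)
{
	return (5 >= va_region(va));
}

/* Fixed test: only regions 0 through 4 are user space. */
static inline bool
is_user_region(uint64_t va)
{
	return (5 > va_region(va));
}
```

The two tests differ only for addresses in region 5, which is exactly the case in which ast() ended up being called while interrupting kernel mode.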
* - Since td_critnest is now initialized in MI code, it doesn't have to be [jhb, 2003-08-04, 2 files, -13/+0]
| set in cpu_critical_fork_exit() anymore.
| - As far as I can tell, cpu_thread_link() has never been used, not even when it was originally added, so remove it.
* Cleanup the clock code. This includes: [marcel, 2003-08-04, 8 files, -459/+174]
| o Remove alpha specific timer code (mc146818A) and compiled-out calibration of said timer.
| o Remove i386 inherited timer code (i8253) and related acquire and release functions.
| o Move sysbeep() from clock.c to machdep.c and have it return ENODEV. Console beeps should be implemented using ACPI or if no such device is described, using the sound driver.
| o Move the sysctls related to adjkerntz, disable_rtc_set and wall_cmos_clock from machdep.c to clock.c, where the variables are.
| o Don't hardcode a hz value of 1024 in cpu_initclocks() and don't bother faking a stathz that's 1/8 of that. Keep it simple: hz defaults to HZ and stathz equals hz. This is also how it's done for sparc64.
| o Keep a per-CPU ITC counter (pc_clock) and adjustment (pc_clockadj) to calculate ITC skew and corrections. On average, we adjust the ITC match register once every ~1500 interrupts for a duration of 2 consecutive interrupts. This is to correct the non-deterministic behaviour of the ITC interrupt (there's a delay between the match and the raising of the interrupt).
| o Add 4 debugging sysctls to monitor clock behaviour. Those are debug.clock_adjust_edges, debug.clock_adjust_excess, debug.clock_adjust_lost and debug.clock_adjust_ticks. The first counts the individual adjustment cycles (when the skew first crosses the threshold), the second counts the number of times the adjustment was excessive (any non-zero value is to be considered a bug), the third counts lost clock interrupts and the last counts the number of interrupts for which we applied an adjustment (debug.clock_adjust_ticks / debug.clock_adjust_edges gives the average duration of an individual adjustment -- should be ~2).
| While here, remove some nearby (trivial) left-overs from alpha and other cleanups.
* Fix handling of external interrupts: we weren't calling ast() when [marcel, 2003-08-04, 2 files, -14/+51]
| interrupting user mode. The net effect of this bug is that a clock interrupt does not cause rescheduling and processes are not preempted. It only takes a "while (1);" to render the machine useless. This bug was introduced by the context changes and EPC syscall code. Handling of ASTs was moved to C for clarity and ease of maintenance, but was not added for the external interrupt case. This needs to be revisited. We now have calls to do_ast() in trap(), break_syscall() and ivt_External_Interrupt(). A single call in exception_restore covers these 3 places without duplication. This is where we handled ASTs prior to the overhaul, except that the meat has been moved to do_ast(), a C function. This was the goal to begin with. Pointy hat: marcel
* Style sync. [obrien, 2003-08-03, 1 file, -8/+9]
|
* Don't use uint64_t. Use unsigned long instead. One is supposed to use [marcel, 2003-08-02, 1 file, -2/+2]
| ucontext_t without having to include headers other than <ucontext.h>.
* Write the preserved registers to (and read them from) struct reg and [marcel, 2003-08-01, 1 file, -4/+4]
| struct fpreg.
* Make sure that when the PV ENTRY zone is created in pmap, that it's [bmilekic, 2003-07-31, 1 file, -2/+2]
| created not only with UMA_ZONE_VM but also with UMA_ZONE_NOFREE. In the i386 case in particular, the pmap code would hook a special page allocation routine that allocated from kernel_map and not kmem_map, and so when/if the pageout daemon drained the zones, it could actually push out slabs from the PV ENTRY zone but call UMA's default page_free, which resulted in pages allocated from kernel_map being freed to kmem_map; bad. kmem_free() ignores the return value of the vm_map_delete and just returns. I'm not sure what the exact repercussions could be, but it doesn't look good. In the PAE case on i386, we also set-up a zone in pmap, so be conservative for now and make that zone also ZONE_NOFREE and ZONE_VM. Do this for the pmap zones for the other archs too, although in some cases it may not be entirely necessary. We'd rather be safe than sorry at this point. Perhaps all UMA_ZONE_VM zones should by default be also UMA_ZONE_NOFREE? May fix some of silby's crashes on the PV ENTRY zone.
* Deal with 'options KSTACK_PAGES' being a global option. [peter, 2003-07-31, 3 files, -0/+6]
|