path: root/sys/amd64
Commit message [Author | Age | Files | Lines]
* In the robust futexes list head, futex_offset shall be signed, and glibc actually supplies negative offsets. Change l_ulong to l_long.
  Submitted by: dchagin
  [kib | 2008-11-16 | 1 file | -1/+1]
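The following minimal userland sketch shows why the field must be signed. It assumes, as on linux32/amd64, that the 32-bit futex_offset is added to the list entry address using 64-bit arithmetic. The typedefs mirror the linux32 widths. futex_uaddr_signed() and futex_uaddr_unsigned() are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/*
 * Simplified model, not the actual linux32 headers: on linux32 both
 * l_long and l_ulong are 32 bits wide, while the amd64 kernel forms
 * the futex user address in 64-bit arithmetic.
 */
typedef int32_t  l_long;
typedef uint32_t l_ulong;

/* Signed offset: a negative value is sign-extended before the add. */
static uint64_t
futex_uaddr_signed(uint64_t entry, l_long futex_offset)
{
	return (entry + (uint64_t)(int64_t)futex_offset);
}

/* Unsigned offset: -8 arrives as 0xfffffff8 and is zero-extended. */
static uint64_t
futex_uaddr_unsigned(uint64_t entry, l_ulong futex_offset)
{
	return (entry + (uint64_t)futex_offset);
}
```

With an entry at 0x1000 and an offset of -8, the signed version yields 0xff8. The unsigned version yields 0x100000ff8, far past the intended futex word.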
* Add ale(4), a driver for Atheros AR8121/AR8113/AR8114 PCIe Ethernet controller. The controller is also known as L1E (AR8121) and L2E (AR8113/AR8114). Unlike its predecessor, the Attansic L1, the AR8121/AR8113/AR8114 uses completely different Rx logic, so it requires a separate driver. A datasheet for the AR81xx is not available to open source driver writers, but it shares a large part of its Tx and PHY logic with the L1. I still don't understand the meaning of some registers and some MAC statistics counters, but the driver seems to have no critical performance or stability issues.
  The AR81xx requires a copy operation to pass received frames to the upper stack, so ale(4) consumes more CPU cycles than drivers for other controllers. Workarounds for a couple of known silicon bugs add further CPU cycles. However, with a fast CPU you can still saturate the link.
  Currently ale(4) supports the following hardware features:
  - MSI.
  - TCP segmentation offload.
  - Hardware VLAN tag insertion/stripping with checksum offload.
  - Tx TCP/UDP checksum offload and Rx IP/TCP/UDP checksum offload.
  - Tx/Rx interrupt moderation.
  - Hardware statistics counters.
  - Jumbo frames.
  - WOL.
  AR81xx PCIe Ethernet controllers are mainly found on the ASUS EeePC and the P5Q series of ASUS motherboards.
  Special thanks to Jeremy Chadwick, who sent the hardware to me. Without his donation, writing a driver for the AR81xx would never have been possible. Big thanks to everyone who reported feedback or tested patches.
  HW donated by: koitsu
  Tested by: bsam, Joao Barros <joao.barros <> gmail DOT com>, Jan Henrik Sylvester <me <> janh DOT de>, Ivan Brawley <ivan <> brawley DOT id DOT au>, CURRENT ML
  [yongari | 2008-11-12 | 1 file | -0/+1]
* Several cleanups related to pipe(2).
  - Use `fildes[2]' instead of `*fildes' to make it clearer that pipe(2) fills an array with two descriptors.
  - Remove EFAULT from the manual page. Because of the current calling convention, pipe(2) raises a segmentation fault when an invalid address is passed.
  - Introduce kern_pipe() to make it easier for binary emulations to implement pipe(2).
  - Make Linux binary emulation use kern_pipe(), which means we don't have to recover td_retval after calling the FreeBSD system call.
  Approved by: rdivacky
  Discussed on: arch
  [ed | 2008-11-11 | 1 file | -18/+5]
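The `fildes[2]' wording matches how pipe(2) is called from userland. This minimal sketch of standard POSIX usage writes into one end and reads back from the other. pipe_roundtrip() is a hypothetical helper.

```c
#include <string.h>
#include <unistd.h>

/*
 * pipe(2) fills a two-element array: fildes[0] is the read end and
 * fildes[1] is the write end. Returns the number of bytes read back,
 * or -1 on error. Assumes msg is small enough to fit in the pipe buffer.
 */
static ssize_t
pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
	int fildes[2];
	ssize_t n;

	if (pipe(fildes) == -1)
		return (-1);
	n = write(fildes[1], msg, strlen(msg));
	if (n != -1)
		n = read(fildes[0], out, outlen);
	close(fildes[0]);
	close(fildes[1]);
	return (n);
}
```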
* - Separate PMC class dependent code from other kinds of machine dependencies. A 'struct pmc_classdep' structure describes operations on PMCs; 'struct pmc_mdep' contains one or more 'struct pmc_classdep' structures depending on the CPU in question. Inside PMC class dependent code, row indices are relative to the PMCs supported by the PMC class; MI code in "hwpmc_mod.c" translates global row indices before invoking class dependent operations.
  - Augment the OP_GETCPUINFO request with the number of PMCs present in a PMC class.
  - Move code common to Intel CPUs to file "hwpmc_intel.c".
  - Move TSC handling to file "hwpmc_tsc.c".
  [jkoshy | 2008-11-09 | 1 file | -3/+32]
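The following sketch shows the row-index translation described in the first item. The struct and field names are hypothetical simplifications, not the real hwpmc definitions. The sketch only shows how a global row index maps to a class-relative one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: each PMC class owns a contiguous range of global
 * row indices, and MI code maps a global row index to a class plus a
 * class-relative index before calling the class-dependent operation.
 */
struct pmc_classdep_sketch {
	const char	*pcd_name;	/* e.g. "TSC" */
	int		 pcd_ri;	/* first global row index of this class */
	int		 pcd_num;	/* number of PMCs in this class */
};

struct pmc_mdep_sketch {
	int					 pmd_nclass;
	const struct pmc_classdep_sketch	*pmd_classdep;
};

/*
 * Returns the owning class and stores the class-relative index in
 * *adjri, or returns NULL if no class claims the global index ri.
 */
static const struct pmc_classdep_sketch *
pmc_ri_to_classdep(const struct pmc_mdep_sketch *md, int ri, int *adjri)
{
	int n;

	assert(md != NULL && adjri != NULL);
	for (n = 0; n < md->pmd_nclass; n++) {
		const struct pmc_classdep_sketch *pcd = &md->pmd_classdep[n];

		if (ri >= pcd->pcd_ri && ri < pcd->pcd_ri + pcd->pcd_num) {
			*adjri = ri - pcd->pcd_ri;
			return (pcd);
		}
	}
	return (NULL);
}
```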
* Regenerate system call tables for r184789. [ed | 2008-11-09 | 3 files | -5/+11]
* Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.
  Looking at our source code history, it seems the uname(), getdomainname() and setdomainname() system calls were deprecated sometime after FreeBSD 1.1, but they were never phased out properly. Because we don't have a COMPAT_FREEBSD1, just use COMPAT_FREEBSD4.
  Also fix the Linuxolator to build without the setdomainname() routine by making it call userland_sysctl on kern.domainname. Also change setdomainname()'s implementation to use this approach, because it duplicates code in sysctl_domainname().
  I wasn't able to keep these three routines working in our COMPAT_FREEBSD32, because that would require yet another keyword for syscalls.master (COMPAT4+NOPROTO). Because these routines are probably unused already, this won't be a problem in practice. If it turns out to be a problem, we'll just restore this functionality.
  Reviewed by: rdivacky, kib
  [ed | 2008-11-09 | 1 file | -1/+1]
* Revert r184136. Instead, push the check for crashdumpmap overflow into the MD i386 and amd64 dump code.
  Requested by: jhb
  Retested by: pho
  MFC after: 3 days (+ 176304 + 184136)
  [kib | 2008-10-31 | 2 files | -2/+2]
* Fix r184323: set stathz to be the same as lapic_timer_hz when lapic_timer_hz is less than 128. Remove extra {} to match existing style.
  [sobomax | 2008-10-27 | 1 file | -4/+3]
* Fix a division-by-zero panic when kern.hz is less than 32.
  MFC after: 1 day
  [sobomax | 2008-10-26 | 1 file | -1/+5]
* Simplify the AMD64_CPU_MODEL() and AMD64_CPU_FAMILY() macros, as the base family should be at least 0xf00 for all supported platforms.
  [jkim | 2008-10-22 | 1 file | -4/+2]
* Add AMD Family 0Fh, Model 6Bh, Stepping 2 to the list of invariant TSCs and fix the i386 test.
  [jkim | 2008-10-22 | 1 file | -2/+7]
* Set kern.timecounter.invariant_tsc to 1 for AMD CPU family 10h and higher even if the BIOS does not advertise it.
  [jkim | 2008-10-22 | 2 files | -1/+20]
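The three entries above can be illustrated together in one sketch. It assumes the standard CPUID leaf 1 %eax field layout and the invariant-TSC bit in CPUID 0x80000007 %edx. The macro bodies are one possible way to write the simplification. amd_tsc_invariant() is a hypothetical helper that combines only the rules named in these entries; it is not the kernel's actual check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1 %eax field masks. */
#define	CPUID_STEPPING		0x0000000f
#define	CPUID_MODEL		0x000000f0
#define	CPUID_FAMILY		0x00000f00
#define	CPUID_EXT_MODEL		0x000f0000
#define	CPUID_EXT_FAMILY	0x0ff00000

/*
 * Every amd64-capable AMD CPU reports base family 0xf, so the extended
 * fields always apply and no conditional is needed.
 */
#define	AMD64_CPU_FAMILY(id)	((((id) & CPUID_FAMILY) >> 8) + \
				 (((id) & CPUID_EXT_FAMILY) >> 20))
#define	AMD64_CPU_MODEL(id)	((((id) & CPUID_MODEL) >> 4) | \
				 (((id) & CPUID_EXT_MODEL) >> 12))

/* Invariant TSC flag: CPUID 0x80000007 %edx bit 8. */
#define	AMDPM_TSC_INVARIANT	0x00000100

static bool
amd_tsc_invariant(uint32_t cpu_id, uint32_t amd_pminfo)
{
	assert((cpu_id & CPUID_FAMILY) == CPUID_FAMILY);
	if (amd_pminfo & AMDPM_TSC_INVARIANT)
		return (true);		/* advertised via CPUID */
	if (AMD64_CPU_FAMILY(cpu_id) >= 0x10)
		return (true);		/* family 10h and higher */
	return (AMD64_CPU_FAMILY(cpu_id) == 0xf &&
	    AMD64_CPU_MODEL(cpu_id) == 0x6b &&
	    (cpu_id & CPUID_STEPPING) == 2);
}
```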
* Turn off CPU frequency change notifiers when the TSC is P-state invariant or when invariance is forced by setting the 'kern.timecounter.invariant_tsc' tunable to non-zero.
  [jkim | 2008-10-21 | 4 files | -8/+30]
* Detect Advanced Power Management Information for AMD CPUs. [jkim | 2008-10-21 | 4 files | -0/+19]
* Correctly fill siginfo for the signals delivered by linux tkill/tgkill. It is required for async cancellation to work.
  Fix a PROC_LOCK leak in linux_tgkill when a signal delivery attempt is made to a non-Linux process.
  Do not call em_find(p, ...) with p unlocked.
  Move code common to linux_tkill() and linux_tgkill() into linux_do_tkill().
  Change the linux siginfo_t definition to match the actual Linux one. Extend uid fields from 2 to 4 bytes. The extension does not change the structure layout and is binary compatible with the previous definition, because i386 is little endian and each uid field has 2 bytes of padding after it.
  Reported by: Nicolas Joly <njoly pasteur fr>
  Submitted by: dchagin
  MFC after: 1 month
  [kib | 2008-10-19 | 2 files | -15/+18]
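The binary-compatibility argument for widening the uid fields can be checked with simplified stand-in structs. These are not the real linux siginfo_t, and all names are hypothetical. On a little-endian host, a consumer that still uses the 16-bit layout reads the low half of the new 32-bit field. That low half is the full uid as long as the uid fits in 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified stand-ins for one siginfo member before and after the
 * change. The 16-bit uid was followed by 2 bytes of padding, so
 * widening it to 32 bits keeps every offset and the total size.
 */
struct kill_old {
	int32_t		pid;
	uint16_t	uid;
	uint16_t	pad;	/* implicit padding, made explicit here */
};

struct kill_new {
	int32_t		pid;
	uint32_t	uid;
};

/* What an old-layout consumer sees when handed a new-layout record. */
static uint16_t
old_view_of_uid(uint32_t uid)
{
	struct kill_new n = { .pid = 1, .uid = uid };
	struct kill_old o;

	assert(sizeof(o) == sizeof(n));
	memcpy(&o, &n, sizeof(o));
	return (o.uid);
}

/* True on little-endian hosts such as i386 and amd64. */
static int
host_is_little_endian(void)
{
	const uint16_t one = 1;

	return (*(const unsigned char *)&one == 1);
}
```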
* Set PCB_32BIT and clear PCB_GS32BIT for linux32 binaries.
  Tested by: dchagin
  MFC after: 3 days
  [kib | 2008-10-18 | 1 file | -1/+2]
* Make robust futexes work on linux32/amd64. Use PTRIN to read user-mode pointers. Change the types used in the structure definitions to properly sized architecture-specific types.
  Submitted by: dchagin
  MFC after: 1 week
  [kib | 2008-10-14 | 1 file | -0/+11]
* If the current thread has the trap bit set (i.e. a debugger had single-stepped the process to the system call), we need to clear the trap flag from the new frame. Otherwise, the new thread will receive a (likely unexpected) SIGTRAP when it executes the first instruction after returning to userland.
  [davidxu | 2008-10-05 | 1 file | -0/+8]
* - Add a driver for the Attansic L2 Fast Ethernet controller found on the Asus EeePC and some Asus mainboards.
  Reviewed by: yongari, rpaulo, jhb
  Tested by: many
  Approved by: kib (mentor)
  MFC after: 1 week
  [stas | 2008-10-03 | 1 file | -0/+1]
* Collect N identical (or nearly identical) mkdumpheader() implementations into one, as threatened in the comment. Textdump magic can be passed in.
  [peter | 2008-10-01 | 2 files | -44/+2]
* Bump MAXCPU to 32 now that 32-CPU x86 systems exist.
  Tested by: rwatson, mdtansca
  Approved by: peter
  [jhb | 2008-10-01 | 1 file | -1/+1]
* Remove ipi_all() and ipi_self(). The former has never been used, and the latter is only used in ia64 and powerpc code that no longer serves a real purpose after bring-up and can be removed as well. Note that architectures like sun4u provide no native means for a CPU to send an IPI to itself in the first place.
  Suggested by: jhb
  Reviewed by: arch, grehan, jhb
  [marius | 2008-09-28 | 2 files | -32/+0]
* Replace all calls to minor() with dev2unit().
  After I removed all the unit2minor()/minor2unit() calls from the kernel yesterday, I realised calling minor() everywhere is quite confusing. Character devices now only have the ability to store a unit number, not a minor number. Remove the confusion by using dev2unit() everywhere.
  This commit could also be considered a bug fix. A lot of drivers call minor() when they should actually be calling dev2unit(). In -CURRENT this isn't a problem, but it turns out we never had any problem reports related to that issue in the past. I suspect not many people connect more than 256 pieces of the same hardware.
  Reviewed by: kib
  [ed | 2008-09-27 | 1 file | -4/+4]
* Change the static struct sysentvec and struct Elf_Brandinfo initializers to the C99 style. At the very least, this makes sysent definitions easier to read and makes it easier to search for the actual instances of sigcode etc.
  Explicitly initialize sysentvec.sv_maxssiz, which was missed in most sysvecs.
  No objection from: jhb
  MFC after: 1 month
  [kib | 2008-09-24 | 2 files | -93/+94]
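The following sketch contrasts the two styles. It uses a hypothetical cut-down stand-in for struct sysentvec rather than the real definition. With positional initializers, a forgotten member such as sv_maxssiz silently stays NULL. The C99 form names every member, which also makes `.sv_sigcode' easy to grep for.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct sysentvec. */
struct sysentvec_sketch {
	int		 sv_size;	/* number of syscall entries */
	const char	*sv_name;
	unsigned long	*sv_maxssiz;	/* maximum stack size limit */
	char		*sv_sigcode;
};

static unsigned long maxssiz_sketch = 64UL * 1024 * 1024;
static char sigcode_sketch[] = "sigcode";

/* Pre-C99 style: values are matched to members by position only. */
static struct sysentvec_sketch positional_sv = {
	42,
	"Positional",
	NULL,			/* easy to leave a member like this unset */
	sigcode_sketch
};

/* C99 designated initializers: each value names its member. */
static struct sysentvec_sketch designated_sv = {
	.sv_size	= 42,
	.sv_name	= "Designated",
	.sv_maxssiz	= &maxssiz_sketch,
	.sv_sigcode	= sigcode_sketch,
};
```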
* - Recognize XSAVE and OSXSAVE extended processor features.
  Approved by: kib (mentor)
  MFC after: 1 month
  [stas | 2008-09-18 | 1 file | -2/+2]
* Correct a callchain capture bug on the i386.
  On the i386 architecture, the processor only saves the current value of `%esp' on the stack if a privilege switch is necessary when entering the interrupt handler. Thus, `frame->tf_esp' is only valid for an entry from user mode. For interrupts taken in kernel mode, we need to determine the top-of-stack for the interrupted kernel procedure by adding the appropriate offset to the current frame pointer.
  Reported by: kris, Fabien Thomas
  Tested by: Fabien Thomas <fabien.thomas at netasq dot com>
  [jkoshy | 2008-09-15 | 1 file | -2/+3]
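The following sketch shows the top-of-stack computation. It uses a simplified i386-style frame that keeps only the CPU-pushed tail; this is a hypothetical layout, not the full struct trapframe. For a kernel-mode interrupt, the address where tf_esp would have been stored is itself the interrupted stack pointer. interrupted_sp() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Trimmed-down i386-style trap frame (hypothetical). tf_esp and tf_ss
 * are pushed by hardware only when the interrupt causes a privilege
 * switch, i.e. on entry from user mode.
 */
struct i386_frame_sketch {
	uint32_t	tf_eip;
	uint32_t	tf_cs;
	uint32_t	tf_eflags;
	uint32_t	tf_esp;		/* valid for user-mode entry only */
	uint32_t	tf_ss;		/* valid for user-mode entry only */
};

/*
 * Stack pointer of the interrupted code. For a kernel-mode interrupt
 * nothing was pushed past tf_eflags, so the interrupted procedure's
 * stack begins exactly where tf_esp would have been stored.
 */
static uintptr_t
interrupted_sp(const struct i386_frame_sketch *frame, int from_user)
{
	if (from_user)
		return (frame->tf_esp);
	return ((uintptr_t)frame + offsetof(struct i386_frame_sketch, tf_esp));
}
```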
* Add a 'hw.pci.mcfg' tunable. It can be set to 0 to disable memory-mapped PCI config access.
  [jhb | 2008-09-11 | 1 file | -0/+6]
* Update the comments above the 0xcf9 register reset attempt to match the code. We only attempt a single reset using this method (a "hard" reset), and we use two writes to ensure there is a 0 -> 1 transition in bit 2 to force a reset.
  MFC after: 1 week
  [jhb | 2008-09-11 | 1 file | -4/+7]
* Some K8 chipsets don't expose all of the PCI devices on bus 0 via PCIe memory-mapped config access. Add a workaround for these systems by checking the first function of each slot on bus 0 using both the memory-mapped config access and the older type 1 I/O port config access. If we find a slot that is only visible via type 1 I/O port config access, we flag that slot. Future PCI config transactions to flagged slots on bus 0 use type 1 I/O port config access rather than memory-mapped config access.
  [jhb | 2008-09-10 | 1 file | -14/+48]
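The following sketch models the bus 0 workaround. It uses hypothetical probe callbacks and a made-up slot layout for demonstration, and it is not the actual MD PCI code. A slot is flagged only when the memory-mapped probe sees nothing but the type 1 probe finds a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Each probe returns the vendor/device ID dword of function 0 in the
 * given bus 0 slot, or 0xffffffff if nothing answers.
 */
typedef uint32_t (*cfg_probe_fn)(int slot);

static uint32_t type1_only_slots;	/* bit N set: use type 1 for slot N */

static void
pcie_check_bus0(cfg_probe_fn mmcfg_read, cfg_probe_fn type1_read)
{
	int slot;

	type1_only_slots = 0;
	for (slot = 0; slot < 32; slot++) {
		if (mmcfg_read(slot) == 0xffffffff &&
		    type1_read(slot) != 0xffffffff)
			type1_only_slots |= 1u << slot;
	}
}

/* Later config accesses to bus 0 consult the flag for the slot. */
static bool
use_type1(int bus, int slot)
{
	assert(slot >= 0 && slot < 32);
	return (bus == 0 && (type1_only_slots & (1u << slot)) != 0);
}

/* Made-up layout: slot 3 is visible only through type 1 access. */
static uint32_t
demo_mmcfg_read(int slot)
{
	return (slot == 0 ? 0x12345678 : 0xffffffff);
}

static uint32_t
demo_type1_read(int slot)
{
	return (slot == 0 || slot == 3 ? 0x12345678 : 0xffffffff);
}
```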
* pcb_gs32p should be a per-CPU pointer, not a per-thread one. It is the location in the GDT where the segment descriptor from pcb_gs32sd is copied, and that location is in the GDT local to the CPU.
  Noted and reviewed by: peter
  MFC after: 1 week
  [kib | 2008-09-08 | 7 files | -8/+8]
* Provide private per-CPU GDTs on amd64. This is required at least for the linux PCB_GS32BIT to work.
  Noted by: nox
  Reviewed by: peter
  MFC after: 1 week
  [kib | 2008-09-08 | 2 files | -5/+13]
* In linux_set_thread_area(), mark the pcb as PCB_GS32BIT. This was missed when r180992 was committed.
  Reviewed by: peter
  MFC after: 1 week
  [kib | 2008-09-08 | 1 file | -1/+1]
* Fix inconsistencies in the comments.
  MFC after: 1 week
  [kib | 2008-09-08 | 2 files | -3/+3]
* Segment registers are stored in the uc_mcontext member of struct l_ucontext. To restore the registers' contents, the trampoline needs to dereference uc_mcontext instead of taking some undefined values from l_ucontext.
  Submitted by: Dmitry Chagin <dchagin@>
  MFC after: 1 week
  [kib | 2008-09-07 | 2 files | -4/+4]
* - When executing FreeBSD/amd64 binaries from FreeBSD/i386 or Linux/i386 processes, clear the PCB_32BIT and PCB_GS32BIT bits [1].
  - Reread the fs and gs bases from the MSRs unconditionally, rather than trusting the values in pcb_fsbase and pcb_gsbase, since usermode may reload segment registers and invalidate the cache [2].
  Both problems resulted in a wrong fs base, causing the wrong TLS pointer to be dereferenced in usermode.
  Reported and tested by: Vyacheslav Bocharov <adeepv at gmail com> [1]
  Reported by: Bernd Walter <ticso at cicely7 cicely de>, Artem Belevich <fbsdlist at src cx> [2]
  Reviewed by: peter
  MFC after: 3 days
  [kib | 2008-09-02 | 3 files | -2/+20]
* Move empty filter handling to MI source.
  MFC after: 3 days
  [jkim | 2008-08-26 | 1 file | -4/+0]
* Fix a typo in copyrights. [jkim | 2008-08-25 | 2 files | -2/+2]
* Adjust the handling of the various timer frequencies when using the lapic timer. Previously, the various divisors were fixed, which meant that while they gave a somewhat reasonable stathz, etc. at hz=1000, they went off the rails with any other hz value. With these changes, we now pick a lapic timer hz based on the value of hz. If hz >= 1500, the lapic timer runs at hz. If 1500 > hz >= 750, we run the lapic timer at hz * 2. If hz < 750, we run it at hz * 4. We compute a divider at runtime to make stathz run as close to 128 as we can, since stathz really wants to run at something close to that frequency. Profiling just runs on every clock tick.
  Some examples:
  With hz = 100, the lapic timer now runs at 400 instead of 2000. stathz will be 133, and profhz = 400.
  With hz = 1000 (default), the lapic timer is still at 2000 (as it is now), stathz is at 133 (as it is now), and profhz will be 2000 (previously 666).
  MFC after: 2 weeks
  [jhb | 2008-08-23 | 1 file | -10/+17]
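The following function sketches the selection rules above; it is not the lapic.c source. It also folds in the later fixes listed above in this log: stathz follows lapic_timer_hz below 128, which avoids the division by zero when hz is below 32. Choosing the largest integer divider that keeps stathz at or above 128 reproduces both examples in this entry. The struct and function names are hypothetical.

```c
#include <assert.h>

struct clock_rates {
	int	lapic_timer_hz;
	int	stathz;
	int	profhz;
};

static struct clock_rates
lapic_rates_for_hz(int hz)
{
	struct clock_rates r;

	assert(hz > 0);
	if (hz >= 1500)
		r.lapic_timer_hz = hz;
	else if (hz >= 750)
		r.lapic_timer_hz = hz * 2;
	else
		r.lapic_timer_hz = hz * 4;

	/*
	 * Divide down to a stathz close to 128. Below 128 the divider
	 * would be 0, so stathz simply follows the timer instead.
	 */
	if (r.lapic_timer_hz < 128)
		r.stathz = r.lapic_timer_hz;
	else
		r.stathz = r.lapic_timer_hz / (r.lapic_timer_hz / 128);
	r.profhz = r.lapic_timer_hz;	/* profiling runs on every tick */
	return (r);
}
```

For hz = 100 this gives 400 / (400 / 128) = 400 / 3 = 133. For hz = 1000 it gives 2000 / 15 = 133.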
* Extend the support for PCI-e memory mapped configuration space access:
  - Rename pciereg_cfgopen() to pcie_cfgregopen() and expose it to the rest of the kernel. It now also accepts parameters via function arguments rather than global variables.
  - Add a notion of minimum and maximum bus numbers and reject requests for an out-of-range bus.
  - Add more range checks on slot/func/reg/bytes parameters to the cfg reg read/write routines. Don't panic on any invalid parameters, just fail the request (writes do nothing, reads return -1). This matches the behavior of the other cfg mechanisms.
  - Port the memory mapped configuration space access to amd64. On amd64 we simply use the direct map (via pmap_mapdev()) for the memory mapped window.
  - During acpi_attach(), just after loading the ACPI tables, check for an MCFG table. If it exists, call pcie_cfgregopen() on each subtable (memory mapped window). For now we only support windows for domain 0 that start with bus 0. This removes the need for more chipset-specific quirks in the MD code.
  - Remove the chipset-specific quirks for the Intel 5000P/V/Z chipsets, since these machines should all have MCFG tables via ACPI.
  - Update pci_cfgregopen() to DTRT if ACPI had invoked pcie_cfgregopen() earlier.
  MFC after: 2 weeks
  [jhb | 2008-08-22 | 2 files | -3/+138]
* Integrate the new MPSAFE TTY layer into the FreeBSD operating system.
  For the last half year I've been working on a replacement TTY layer for the FreeBSD kernel. The new TTY layer was designed to improve the following:
  - Improved driver model: The old TTY layer has a driver model that is not abstract enough to be friendly to use. A good example is the output path, where the device drivers directly access the output buffers. This means that an in-kernel PPP implementation must always convert network buffers into TTY buffers. If a PPP implementation were built on top of the new TTY layer (it still needs a hooks layer, though), it could hand the data directly to the TTY driver.
  - Improved hotplugging: With the old TTY layer, it isn't entirely safe to destroy TTYs from the system. This implementation has a two-step destruction design, where the driver first abandons the TTY. After all threads have left the TTY, the TTY layer calls a routine in the driver, which can be used to free resources (unit numbers, etc). The pts(4) driver also implements this feature, which means posix_openpt() will now return PTYs that are created on the fly.
  - Improved performance: One of the major improvements is the per-TTY mutex, which is expected to improve scalability compared to the old Giant locking. Another change is the unbuffered copying to userspace, which is used on both TTY device nodes and PTY masters.
  Upgrading should be quite straightforward. Unlike previous versions, existing kernel configuration files do not need to be changed, except when they reference device drivers that are listed in UPDATING.
  Obtained from: //depot/projects/mpsafetty/...
  Approved by: philip (ex-mentor)
  Discussed: on the lists, at BSDCan, at the DevSummit
  Sponsored by: Snow B.V., the Netherlands
  dcons(4) fixed by: kan
  [ed | 2008-08-20 | 1 file | -1/+1]
* Export 'struct pcpu' to userland without requiring _KERNEL. A few ports already define _KERNEL to get to this, and I'm about to add hooks to libkvm to access per-CPU data.
  MFC after: 1 week
  [jhb | 2008-08-19 | 1 file | -2/+2]
* Correctly check unsignedness of all BPF_LD|BPF_IND instructions. This is roughly from sys/net/bpf_filter.c r1.12 and r1.14.
  [jkim | 2008-08-18 | 2 files | -38/+70]
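The following sketch illustrates the unsigned-overflow hazard for an indirect load at X + k. The safe check orders its comparisons so that no subtraction can wrap. The function names are hypothetical, and the sketch models the check in C rather than the code the JIT emits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Overflow-safe bounds check for an indirect load: read `size` bytes
 * at X + k from a packet of `buflen` bytes, all as unsigned 32-bit
 * values. Each subtraction is guarded by the comparison before it.
 */
static bool
bpf_ind_in_bounds(uint32_t x, uint32_t k, uint32_t size, uint32_t buflen)
{
	return (k <= buflen && x <= buflen - k && size <= buflen - k - x);
}

/* The naive form: X + k + size can wrap around and pass the test. */
static bool
bpf_ind_naive(uint32_t x, uint32_t k, uint32_t size, uint32_t buflen)
{
	return (x + k + size <= buflen);
}
```

For example, with X = 0xffffffff, k = 2, and a 4-byte load, the naive sum wraps to 5 and passes against a 64-byte packet, while the safe check rejects the load.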
* - Make these files compilable in userland.
  - Update copyrights and fix style(9).
  [jkim | 2008-08-18 | 1 file | -4/+28]
* The doreti_iret_fault code is always called with the gs base MSR containing the kernel gs base, because %rip is adjusted only on a kernel-mode trap caused by iretq execution. On the other hand, the stack contains the (hardware part of the) trap frame from usermode. As a consequence, checking the frame mode and doing swapgs causes the kernel to enter trap() with the usermode gs base.
  Remove the mode check and the conditional swapgs; we already have the right gs base in the MSR.
  Submitted by: Nate Eldredge <neldredge math ucsd edu>
  MFC after: 3 days
  [kib | 2008-08-18 | 1 file | -6/+3]
* Commit step 1 of the vimage project (network stack virtualization work done by Marko Zec (zec@)). This is the first in a series of commits over the course of the next few weeks.
  Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only.
  We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again.
  Obtained from: //depot/projects/vimage-commit2/...
  Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help)
  Sponsored by: NLnet Foundation, The FreeBSD Foundation
  X-MFC after: never
  V_Commit_Message_Reviewed_By: more people than the patch
  [bz | 2008-08-17 | 2 files | -2/+4]
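The step 1 pattern takes only a few lines to show. The variable name is hypothetical. While the V_ macros map straight back to the globals, the renamed code compiles to exactly what it did before.

```c
/* A hypothetical global that will later become per-virtual-stack. */
static int tcp_sendspace_sketch = 32768;

/* Step 1: the V_ name is just an alias for the global (a NOP change). */
#define	V_tcp_sendspace_sketch	tcp_sendspace_sketch

static int
get_sendspace(void)
{
	return (V_tcp_sendspace_sketch);
}
```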
* Use int32_t/int16_t instead of int/short as sys/net/bpf_filter.c does. [jkim | 2008-08-13 | 1 file | -4/+4]
* - Remove unnecessary jump instruction(s) when offset(s) is/are zero(s).
  - Consistently use conditional jumps for unsigned integers.
  [jkim | 2008-08-13 | 2 files | -78/+84]
* Update copyrights and fix style(9). [jkim | 2008-08-12 | 2 files | -10/+10]
* Replace all stack usages with registers and remove unused macros. [jkim | 2008-08-12 | 2 files | -86/+81]
* Decode some more "exotic" instructions, including: fxsave, fxrstor, ldmxcsr, stmxcsr, clflush, lfence, mfence, sfence, syscall, sysret, sysenter, sysexit, pause, monitor, mwait, and swapgs (amd64 only).
  MFC after: 1 week
  [jhb | 2008-08-11 | 1 file | -9/+68]