summaryrefslogtreecommitdiffstats
path: root/sys/kern/imgact_elf.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(),rwatson2005-09-191-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment). Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout. With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant. NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable. NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change. NB: uprintf()/tprintf() sleeping is generally a bad ideas, as is having non-MPSAFE tty code. MFC after: 1 week
* Improve the MP safeness associated with the creation of symboliccsjp2005-09-151-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | links and the execution of ELF binaries. Two problems were found: 1) The link path wasn't tagged as being MP safe and thus was not properly protected. 2) The ELF interpreter vnode wasnt being locked in namei(9) and thus was insufficiently protected. This commit makes the following changes: -Sets the MPSAFE flag in NDINIT for symbolic link paths -Sets the MPSAFE flag in NDINIT and introduce a vfslocked variable which will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been picked up. -Drop in an assertion into vfs_lookup which ensures that if the MPSAFE flag is NOT set, that we have picked up giant. If not panic (if WITNESS compiled into the kernel). This should help us find conditions where vnode operations are in-sufficiently protected. This is a RELENG_6 candidate. Discussed with: jeff MFC after: 4 days
* Jumbo-commit to enhance 32 bit application support on 64 bit kernels.peter2005-06-301-10/+40
| | | | | | | | | | | | | | | | | | | | | | | | This is good enough to be able to run a RELENG_4 gdb binary against a RELENG_4 application, along with various other tools (eg: 4.x gcore). We use this at work. ia32_reg.[ch]: handle the 32 bit register file format, used by ptrace, procfs and core dumps. procfs_*regs.c: vary the format of proc/XXX/*regs depending on the client and target application. procfs_map.c: Don't print a 64 bit value to 32 bit consumers, or their sscanf fails. They expect an unsigned long. imgact_elf.c: produce a valid 32 bit coredump for 32 bit apps. sys_process.c: handle 32 bit consumers debugging 32 bit targets. Note that 64 bit consumers can still debug 32 bit targets. IA64 has got stubs for ia32_reg.c. Known limitations: a 5.x/6.x gdb uses get/setcontext(), which isn't implemented in the 32/64 wrapper yet. We also make a tiny patch to gdb pacify it over conflicting formats of ld-elf.so.1. Approved by: re
* Don't set the default of kern.fallback_elf_brand to FreeBSD for arm, ascognet2005-05-241-4/+0
| | | | binutils now do the job for us
* - Neither of our image formats require Giant now that the vm and vfs havejeff2005-05-031-2/+0
| | | | been locked.
* Remove GIANT_REQUIRED from elfN_load_section().alc2005-04-031-2/+0
|
* o Split out kernel part of execve(2) syscall into two parts: one thatsobomax2005-01-291-8/+3
| | | | | | | | | | | copies arguments into the kernel space and one that operates completely in the kernel space; o use kernel-only version of execve(2) to kill another stackgap in linuxlator/i386. Obtained from: DragonFlyBSD (partially) MFC after: 2 weeks
* Don't use VOP_GETVOBJECT, use vp->v_object directly.phk2005-01-251-1/+1
|
* On arm, set the default elf brand to FreeBSD, until the binutils do it for us.cognet2004-09-231-0/+4
|
* Add __elfN(dump_thread). This function is called from __elfN(coredump)marcel2004-08-111-2/+5
| | | | | | | | | to allow dumping per-thread machine specific notes. On ia64 we use this function to flush the dirty registers onto the backingstore before we write out the PRSTATUS notes. Tested on: alpha, amd64, i386, ia64 & sparc64 Not tested on: arm, powerpc
* Make sure that AT_PHDR has a useful value even for static programs.dfr2004-08-081-0/+11
|
* After maintaining previous behaviour in writing out the core notes, it'smarcel2004-07-181-8/+5
| | | | | | | | | | | | | | | | | time now to break with the past: do not write the PID in the first note. Rationale: 1. [impact of the breakage] Process IDs in core files serve no immediate purpose to the debugger itself. They are only useful to relate a core file to a process. This can provide context to the person looking at the core file, provided one keeps track of this. Overall, not having the PID in the core file is only in very rare occasions unfortunate. 2. [reason of the breakage] Having one PRSTATUS note contain the PID, while all others contain the LWPID of the corresponding kernel thread creates an irregularity for the debugger that cannot easily be worked around. This is caused by libthread_db correlating user thread IDs to kernel thread (aka LWP) IDs and thus aware of the actual LWPIDs. Update comments accordingly.
* Allocate TIDs in thread_init() and deallocate them in thread_fini().marcel2004-06-261-33/+31
| | | | | | | | | | | | | | | | | | | | | | | | The overhead of unconditionally allocating TIDs (and likewise, unconditionally deallocating them), is amortized across multiple thread creations by the way UMA makes it possible to have type-stable storage. Previously the cost was kept down by having threads created as part of a fork operation use the process' PID as the TID. While this had some nice properties, it also introduced complexity in the way TIDs were allocated. Most importantly, by using the type-stable storage that UMA gives us this was also unnecessary. This change affects how core dumps are created and in particular how the PRSTATUS notes are dumped. Since we don't have a thread with a TID equalling the PID, we now need a different way to preserve the old and previous behavior. We do this by having the given thread (i.e. the thread passed to the core dump code in td) dump it's state first and fill in pr_pid with the actual PID. All other threads will have pr_pid contain their TIDs. The upshot of all this is that the debugger will now likely select the right LWP (=TID) as the initial thread. Credits to: julian@ for spotting how we can utilize UMA. Thanks to: all who provided julian@ with test results.
* Change the types of vn_rdwr_inchunks()'s len and aresid arguments totjr2004-06-051-1/+1
| | | | | | size_t and size_t *, respectively. Update callers for the new interface. This is a better fix for overflows that occurred when dumping segments larger than 2GB to core files.
* Back out workaround for vn_rdwr_inchunks()'s INT_MAX length limitationtjr2004-06-051-21/+8
| | | | after discussions with bde; vn_rdwr_inchunks() itself should be fixed.
* Write segments to core dump files in maximally-sized chunks that neithertjr2004-06-041-8/+21
| | | | | | | exceed vn_rdwr_inchunks()'s INT_MAX length limitation nor span a block boundary. This fixes dumping segments larger than 2GB. PR: 67546
* Utilize sf_buf_alloc() rather than pmap_qenter() (and sometimesalc2004-04-231-10/+1
| | | | | kmem_alloc_wait()) for mapping the image header. On all machines with a direct virtual-to-physical mapping and SMP/HTT i386s, this is a clear win.
* Do not assume that the initial thread (i.e. the thread with the IDmarcel2004-04-081-6/+9
| | | | | | | | equal to the process ID) is still present when we dump a core. It already may have been destroyed. In that case we would end up dereferencing a NULL pointer, so specifically test for that as well. Reported & tested by: Dan Nelson <dnelson@allantgroup.com>
* Create NT_PRSTATUS and NT_FPREGSET notes for each and every threadmarcel2004-04-031-50/+81
| | | | | | | | | | | | | | | | | | | | in the process. This is required for proper debugging of corefiles created by 1:1 or M:N threaded processes. Add an XXX comment where we should actually call a function that dumps MD specific notes. An example of a MD specific note is the NT_PRXFPREG note for SSE registers. Since BFD creates non-annotated pseudo-sections for the first PRSTATUS and FPREGSET notes (non-annotated in the sense that the name of the section does not contain the pid/tid), make sure those sections describe the initial thread of the process (i.e. the thread which tid equals the pid). This is not strictly necessary, but makes sure that tools that use the non-annotated section names will not change behaviour due to this change. The practical upshot of this all is that one can see the threads in the debugger when looking at a corefile. For 1:1 threading this means that *all* threads are visible.
* Verify more bits of the ELF header: the program header tablenectar2004-03-181-6/+6
| | | | | | | | | | | | | entry size and the ELF version. Also, avoid a potential integer overflow when determining whether the ELF header fits entirely within the first page. Reviewed by: jdp A panic when attempting to execute an ELF binary with a bogus program header table entry size was Reported by: Christer Öberg <christer.oberg@texonet.com>
* Locking for the per-process resource limits structure.jhb2004-02-041-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that is is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock. - The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it. - Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything. - All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process. - dosetrlimit() was renamed to kern_setrlimit() to match existing style of other similar syscall helper functions. - The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't used the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead. - The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead. - The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctl's for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant. - The p_rlimit macro no longer exists. Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
* Add an additional field to the elf brandinfo structure to supportpeter2003-12-231-11/+16
| | | | | quicker exec-time replacement of the elf interpreter on an emulation environment where an entire /compat/* tree isn't really warranted.
* Add sysentvec->sv_fixlimits() hook so that we can catch cases on 64 bitpeter2003-09-251-1/+8
| | | | | | | | | | | | | | | | | | | | | systems where the data/stack/etc limits are too big for a 32 bit process. Move the 5 or so identical instances of ELF_RTLD_ADDR() into imgact_elf.c. Supply an ia32_fixlimits function. Export the clip/default values to sysctl under the compat.ia32 heirarchy. Have mmap(0, ...) respect the current p->p_limits[RLIMIT_DATA].rlim_max value rather than the sysctl tweakable variable. This allows mmap to place mappings at sensible locations when limits have been reduced. Have the imgact_elf.c ld-elf.so.1 placement algorithm use the same method as mmap(0, ...) now does. Note that we cannot remove all references to the sysctl tweakable maxdsiz etc variables because /etc/login.conf specifies a datasize of 'unlimited'. And that causes exec etc to fail since it can no longer find space to mmap things.
* Use __FBSDID().obrien2003-06-111-2/+3
|
* Fix ia32 compat on ia64. Recent ia64 MD changes caused the garbage onmarcel2003-05-311-5/+4
| | | | | | | | | | | the stack to be changed in a way incompatible with elf32_map_insert() where we used data_buf without initializing it for when the partial mapping resulting in a misaligned image (typical when the page size implied by the image is not the same as the page size in use by the kernel). Since data_buf is passed by reference to vm_map_find(), the compiler cannot warn about it. While here, move all local variables to the top of the function.
* Back out M_* changes, per decision of the TRB.imp2003-02-191-5/+5
| | | | Approved by: trb
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.alfred2003-01-211-5/+5
| | | | Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* - Provide backwards compatibility for kern.fallback_elf_brand.jake2003-01-051-8/+5
| | | | | - Use the generic elf type macros in imgact_elf.h instead of ifdefing the entire contents of the header.
* Improve the way that an elf image activator for an alternate word size isjake2003-01-041-27/+23
| | | | | | | | | | | included in the kernel. Include imgact_elf.c in conf/files, instead of both imgact_elf32.c and imgact_elf64.c, which will use the default word size for an architecture as defined in machine/elf.h. Architectures that wish to build an additional image activator for an alternate word size can include either imgact_elf32.c or imgact_elf64.c in files.${ARCH}, which allows it to be dependent on MD options instead of solely on architecture. Glanced at by: peter
* Fix multiple registration of the elf_legacy_coredump sysctl variable.marcel2002-12-211-3/+5
| | | | | | | | | | | | | | | | | | | | The duplication is caused by the fact that imgact_elf.c is included by both imgact_elf32.c and imgact_elf64.c and both are compiled by default on ia64. Consequently, we have two seperate copies of the elf_legacy_coredump variable due to them being declared static, and two entries for the same sysctl in the linker set, both referencing the unique copy of the elf_legacy_coredump variable. Since the second sysctl cannot be registered, one of the elf_legacy_coredump variables can not be tuned (if ordering still holds, it's the ELF64 related one). The only solution is to create two different sysctl variables, just like the elf<32|64>_trace sysctl variables. This unfortunately is an (user) interface change, but unavoidable. Thus, on ELF32 platforms the sysctl variable is called elf32_legacy_coredump and on ELF64 platforms it is called elf64_legacy_coredump. Platforms that have both ELF formats have both sysctl variables. These variables should probably be retired sooner rather than later.
* Change the way ELF coredumps are handled. Instead of unconditionallydillon2002-12-161-11/+31
| | | | | | | | | | | | | | | | | | | skipping read-only pages, which can result in valuable non-text-related data not getting dumped, the ELF loader and the dynamic loader now mark read-only text pages NOCORE and the coredump code only checks (primarily) for complete inaccessibility of the page or NOCORE being set. Certain applications which map large amounts of read-only data will produce much larger cores. A new sysctl has been added, debug.elf_legacy_coredump, which will revert to the old behavior. This commit represents collaborative work by all parties involved. The PR contains a program demonstrating the problem. PR: kern/45994 Submitted by: "Peter Edwards" <pmedwards@eircom.net>, Archie Cobbs <archie@dellroad.org> Reviewed by: jdp, dillon MFC after: 7 days
* Assign value of NULL to imgp->execlabel when imgp is initializedrwatson2002-11-081-0/+1
| | | | | | | | in the ELF code. Missed in earlier merge from the MAC tree. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Remove reference to struct execve_args from struct imgact, whichrwatson2002-11-051-1/+2
| | | | | | | | | | | | | | | | | describes an image activation instance. Instead, make use of the existing fname structure entry, and introduce two new entries, userspace_argv, and userspace_envv. With the addition of mac_execve(), this divorces the image structure from the specifics of the execve() system call, removes a redundant pointer, etc. No semantic change from current behavior, but it means that the structure doesn't depend on syscalls.master-generated includes. There seems to be some redundant initialization of imgact entries, which I have maintained, but which could probably use some cleaning up at some point. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Handle binaries with arbitrary number PT_LOAD sections, not onlykan2002-10-231-14/+19
| | | | | | | | | ones with one text and one data section. The text and data rlimit checks still needs to be fixed to properly accout for additional sections. Reviewed by: peter (slightly different patch version)
* Use strlcpy() instead of strncpy() to copy NUL terminated stringsrobert2002-10-171-2/+2
| | | | for safety and consistency.
* Use the fields in the sysentvec and in the vm map header in place of thejake2002-09-211-2/+1
| | | | | | | | constants VM_MIN_ADDRESS, VM_MAXUSER_ADDRESS, USRSTACK and PS_STRINGS. This is mainly so that they can be variable even for the native abi, based on different machine types. Get stack protections from the sysentvec too. This makes it trivial to map the stack non-executable for certain abis, on machines that support it.
* Do not blow up when we walk off the end of the brands list.peter2002-09-081-1/+3
| | | | Found by: kris, jake
* Alright, fix the problems with the elf loader for the Alpha. It turnsdillon2002-09-041-8/+18
| | | | | | | | | | | | | | | | out that there is no easy way to discern the difference between a text segment and a data segment through the read-only OR execute attribute in the elf segment header, so revert the algorithm to what it was before. Neither can we account for multiple data load segments in the vmspace structure (at least not without more work), due to assumptions obreak() makes in regards to the data start and data size fields. Retain RLIMIT_VMEM checking by using a local variable to track the total bytes of data being loaded. Reviewed by: peter X-MFC after: ASAP
* Make the text segment locating heuristics from rev 1.121 more reliablepeter2002-09-031-15/+10
| | | | | | | | so that it works on the Alpha. This defines the segment that the entry point exists in as 'text' and any others (usually one) as data. Submitted by: tmm Tested on: i386, alpha
* Grammer cleanupdillon2002-09-021-2/+2
|
* Moved elf brand identification into a function. Fully identify thejake2002-09-021-105/+75
| | | | | | | | brand early in the process of loading an elf file, so that we can identify the sysentvec, and so that we do not continue if we do not have a brand (and thus a sysentvec). Use the values in the sysentvec for the page size and vm ranges unconditionally, since they are all filled in now.
* Fixed more indentation bugs.jake2002-09-021-3/+3
|
* Implement data, text, and vmem limit checking in the elf loader and svr4dillon2002-08-301-10/+33
| | | | | | | compat code. Clean up accounting for multiple segments. Part 1/2. Submitted by: Andrey Alekseyev <uitm@zenon.net> (with some modifications) MFC after: 3 days
* Fixed most indentation bugs.jake2002-08-251-46/+34
|
* Fixed placement of operators. Wrapped long lines.jake2002-08-251-4/+8
|
* Fixed white space around operators, casts and reserved words.jake2002-08-241-23/+22
| | | | Reviewed by: md5
* return x; -> return (x);jake2002-08-241-32/+32
| | | | | | return(x); -> return (x); Reviewed by: md5
* In order to better support flexible and extensible access control,rwatson2002-08-151-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
* - Hold the vnode lock throughout execve.jeff2002-08-131-10/+3
| | | | | - Set VV_TEXT in the top level execve code. - Fixup the image activators to deal with the newly locked vnode.
* - Replace v_flag with v_iflag and v_vflagjeff2002-08-041-5/+7
| | | | | | | | | | | | | | | - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
OpenPOWER on IntegriCloud