Diffstat (limited to 'share/FAQ/Text/kernel-debug.FAQ')
-rw-r--r-- | share/FAQ/Text/kernel-debug.FAQ | 417 |
1 files changed, 0 insertions, 417 deletions
diff --git a/share/FAQ/Text/kernel-debug.FAQ b/share/FAQ/Text/kernel-debug.FAQ
deleted file mode 100644
index 304bcd3..0000000
--- a/share/FAQ/Text/kernel-debug.FAQ
+++ /dev/null
@@ -1,417 +0,0 @@
-# Hello emacs, this is -*- indented-text -*-
-
-                Kernel debugging FAQ for FreeBSD
-
-$Id: kernel-debug.FAQ,v 1.3 1995/07/30 12:53:39 joerg Exp $
-
-
-*** Debugging a kernel crash dump with kgdb ***
-
-  [In the following, the term ``kgdb'' refers to gdb run in `kernel
-  debug mode'.  This can be accomplished by either starting the gdb
-  with the option ``-k'', or by linking and starting it under the
-  name ``kgdb''.  This is not done by default, however.]
-
-  Here are some instructions for getting kernel debugging working on a
-  crash dump.  They assume that you have enough swap space for a crash
-  dump.  If you happen to have multiple swap partitions with the first
-  one being too small to keep the dump, you can configure your kernel
-  to use an alternate dump device (in the ``config kernel'' line), or
-  you can specify it using the dumpon(8) command.  Dumps to non-swap
-  devices (e.g. tapes) are currently not supported.
-
-  Configure your kernel using config -g
-
-  Either use the dumpon(8) command to tell the kernel where to dump
-  to (note that this will have to be done after configuring the
-  partition in question as swap space via swapon(8)).  This is
-  normally arranged via sysconfig and /etc/rc.  Alternatively, you can
-  hard-code the dump device via the `dump' clause in the `config' line
-  of your kernel config file.
-
-  When the kernel has been built, make a copy of it, say kernel.debug,
-  and then run strip -x on the original.  Install the original as
-  normal.  You may also install the unstripped kernel, but symtab
-  lookup time for some programs might drastically increase, and since
-  the whole kernel is loaded entirely at boot time and cannot be
-  swapped out later, you're going to waste several megabytes of
-  physical RAM.
-
-  If you are testing a new kernel (e.g.
-  by typing the new kernel's name at the boot prompt), but need to boot
-  a different one in order to get your system up & running again, boot
-  it only into single-user mode (the -s flag at the boot prompt), and
-  then perform the following steps:
-
-     fsck -p
-     mount -a -t ufs       # so your file system for /var/crash is writable
-     savecore -N /kernel.panicked /var/crash
-     exit                  # ...to multi-user
-
-  This instructs savecore to use another kernel for symbol name
-  extraction; it would default to the currently running kernel
-  otherwise.
-
-  Now, after a crash dump, go to /sys/compile/WHATEVER and run
-  kgdb.  From kgdb do:
-
-     symbol-file kernel.debug
-     exec-file /var/crash/system.0
-     core-file /var/crash/ram.0
-
-  and voila, you can debug the crash dump using the kernel sources
-  just like you can for any other program.
-
-  If your kernel panicked due to a trap (perhaps the most common case
-  for getting a core dump), the following trick might help you.  Examine
-  the stack (`where') and look for the stack frame in the function
-  trap().  Go `up' to that frame, and then type:
-
-     frame frame->tf_ebp frame->tf_eip
-
-  This will tell kgdb to go to the stack frame explicitly named by a
-  frame pointer and instruction pointer, which is the location where
-  the trap occurred.  There are still some bugs in kgdb (you can go
-  `up' from there, but not `down'; the stack trace will still remain
-  as it was before going there), but generally this method will lead
-  you much closer to the failing piece of code.
-
-  Here's a script log of a kgdb session illustrating the above.  Long
-  lines have been folded to improve readability, and the lines are
-  numbered for reference.  Despite this, it's a real-world error
-  trace taken during the development of the pcvt console driver.
-
-   1:Script started on Fri Dec 30 23:15:22 1994
-   2:uriah # cd /sys/compile/URIAH
-   3:uriah # kgdb kernel /var/crash/vmcore.1
-   4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel...done.
-   5:IdlePTD 1f3000
-   6:panic: because you said to!
-   7:current pcb at 1e3f70
-   8:Reading in symbols for ../../i386/i386/machdep.c...done.
-   9:(kgdb) where
-  10:#0 boot (arghowto=256) (../../i386/i386/machdep.c line 767)
-  11:#1 0xf0115159 in panic ()
-  12:#2 0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698)
-  13:#3 0xf010185e in db_fncall ()
-  14:#4 0xf0101586 in db_command (-266509132, -266509516, -267381073)
-  15:#5 0xf0101711 in db_command_loop ()
-  16:#6 0xf01040a0 in db_trap ()
-  17:#7 0xf0192976 in kdb_trap (12, 0, -272630436, -266743723)
-  18:#8 0xf019d2eb in trap_fatal (...)
-  19:#9 0xf019ce60 in trap_pfault (...)
-  20:#10 0xf019cb2f in trap (...)
-  21:#11 0xf01932a1 in exception:calltrap ()
-  22:#12 0xf0191503 in cnopen (...)
-  23:#13 0xf0132c34 in spec_open ()
-  24:#14 0xf012d014 in vn_open ()
-  25:#15 0xf012a183 in open ()
-  26:#16 0xf019d4eb in syscall (...)
-  27:(kgdb) up 10
-  28:Reading in symbols for ../../i386/i386/trap.c...done.
-  29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\
-  30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\
-  31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\
-  32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\
-  33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\
-  34:ss = -266427884}) (../../i386/i386/trap.c line 283)
-  35:283             (void) trap_pfault(&frame, FALSE);
-  36:(kgdb) frame frame->tf_ebp frame->tf_eip
-  37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done.
-  38:#0 0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\
-  39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403)
-  40:403     return ((*linesw[tp->t_line].l_open)(dev, tp));
-  41:(kgdb) list
-  42:398
-  43:399     tp->t_state |= TS_CARR_ON;
-  44:400     tp->t_cflag |= CLOCAL;  /* cannot be a modem (:-) */
-  45:401
-  46:402 #if PCVT_NETBSD || (PCVT_FREEBSD >= 200)
-  47:403     return ((*linesw[tp->t_line].l_open)(dev, tp));
-  48:404 #else
-  49:405     return ((*linesw[tp->t_line].l_open)(dev, tp, flag));
-  50:406 #endif /* PCVT_NETBSD || (PCVT_FREEBSD >= 200) */
-  51:407 }
-  52:(kgdb) print tp
-  53:Reading in symbols for ../../i386/i386/cons.c...done.
-  54:$1 = (struct tty *) 0x1bae
-  55:(kgdb) print tp->t_line
-  56:$2 = 1767990816
-  57:(kgdb) up
-  58:#1 0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\
-  59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126)
-  60:        return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p));
-  61:(kgdb) up
-  62:#2 0xf0132c34 in spec_open ()
-  63:(kgdb) up
-  64:#3 0xf012d014 in vn_open ()
-  65:(kgdb) up
-  66:#4 0xf012a183 in open ()
-  67:(kgdb) up
-  68:#5 0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\
-  69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\
-  70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \
-  71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \
-  72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673)
-  73:673     error = (*callp->sy_call)(p, args, rval);
-  74:(kgdb) up
-  75:Initial frame selected; you cannot go up.
-  76:(kgdb) quit
-  77:uriah # exit
-  78:exit
-  79:
-  80:Script done on Fri Dec 30 23:18:04 1994
-
-  Comments to the above script:
-
-  line 6:  this is a dump taken from within DDB (see below), hence the
-           panic comment ``because you said to!'', and a rather long
-           stack trace; the initial reason for going into DDB has been
-           a page fault trap though
-
-  line 20: the location of function ``trap()'' in the stack trace
-
-  line 36: force usage of a new stack frame, kgdb responds and displays
-           the source line where the trap happened; from looking at the
-           code, there's a high probability that either the pointer
-           access for ``tp'' was messed up, or the array access was
-           out of bounds
-
-  line 52: the pointer looks suspicious, but happens to be a valid
-           address...
-
-  line 56: ... but obviously points to garbage, so we have found our
-           error, sigh!  [For those unfamiliar with that particular piece
-           of code: tp->t_line refers to the line discipline of the
-           console device here, which must be a rather small integer
-           number.]
-
-
-
-*** Post-mortem analysis of a dump ***
-
-  What to do if a kernel dumped core but you didn't expect it, and it's
-  therefore not compiled using config -g?
-
-  Not everything is lost here.  Don't panic. :-)
-
-  Of course, you still need to enable crash dumps in the first place.
-  See above for the options you have for doing this.  (Crash dumps are
-  disabled in the default kernels for safety reasons, to avoid them
-  trying to dump e.g. during system installation where there's no
-  FreeBSD partition at all and valuable data on the disk could be
-  destroyed.)
-
-  Go to your kernel compile directory, and edit the line containing
-  COPTFLAGS?=-O.  Add the `-g' option there (but DON'T change the
-  level of optimization).  If you do already know roughly the
-  probable location of the failing piece of code (e.g., the `pcvt'
-  driver in the example above), remove all the object files for this
-  code.  Rebuild the kernel.
-  Due to the time stamp change on the Makefile, some other object
-  files will be rebuilt as well, e.g. trap.o.  With a bit of luck, the
-  added -g option won't change anything in the generated code, so
-  you'll finally get a new kernel with the same code as the faulting
-  one, plus some debugging symbols.
-  You should at least verify the old and new sizes with the `size'
-  command; if they mismatch, you probably need to give up here.
-
-  Go and examine the dump as described above.  The debugging symbols
-  might be incomplete for some places (as can be seen in the stack trace
-  in the example above: some functions are displayed without line
-  numbers and argument lists).  If you need more debugging symbols,
-  remove the appropriate object files and repeat the kgdb session until
-  you know enough.
-
-  None of this is guaranteed to work, but it will most likely do fine.
-
-
-
-*** On-line kernel debugging using DDB ***
-
-  While kgdb as an offline debugger provides a very high level of user
-  interface (e.g. it can look up source files, display C structures
-  etc.), there are some things it cannot do.  The most important ones
-  are breakpointing and single-stepping kernel code.
-
-  If you need to do low-level debugging on your kernel, there's an
-  on-line debugger available called DDB.  It allows you to set
-  breakpoints, single-step kernel functions, examine and change kernel
-  variables etc.  It cannot, however, access kernel source files, and
-  it only has access to the global and static symbols, but not to the
-  full debug information (including type and line number information)
-  like kgdb.
-
-  To configure your kernel to include DDB, add the option line
-
-     options DDB
-
-  to your config file, and rebuild.
-
-  (Note that if you have an older version of the boot blocks, your
-  debugger symbols might not be loaded at all.  Update the boot
-  blocks; the recent ones load the DDB symbols automagically.)
-
-  Once your DDB kernel is running, there are several ways to enter
-  DDB.  The first (and earliest) way is to set the boot flag `-d'
-  (right at the boot prompt).  The kernel will start up in debug mode
-  and enter DDB prior to any device probing.  Hence you are able to
-  even debug the device probe/attach functions.
-
-  The second scenario is a hot-key on the keyboard, usually
-  Ctrl-Alt-ESC.  (For syscons, this can be remapped, and some of the
-  distributed maps do this, so watch out.)  There's an option
-  available for a COMCONSOLE kernel (``options BREAK_TO_DEBUGGER'')
-  that allows the use of a serial line BREAK on the console line to
-  enter DDB.
-
-  The third way is that any panic condition will branch to DDB if the
-  kernel is configured to use it.  (Thus it is not wise to configure a
-  kernel with DDB for a machine running unattended.)
-
-
-  The DDB commands roughly resemble some gdb commands.  The first you
-  probably need is to set a breakpoint:
-
-     b function-name
-     b address
-
-  Numbers are taken as hexadecimal by default, but to make them
-  distinct from symbol names, hex numbers starting with the letters
-  `a' - `f' need to be preceded with `0x' (for other numbers, this is
-  optional).  Simple expressions are allowed, e.g.
-  ``function-name + 0x103''.
-
-  To continue the operation of an interrupted kernel, simply type
-
-     c
-
-  To get a stack trace, use
-
-     trace
-
-  Note that when entering DDB via a hot-key, the kernel is currently
-  servicing an interrupt, so the stack trace might not be of much use
-  to you.
-
-  If you want to remove a breakpoint, use
-
-     del
-     del address-expression
-
-  The first form will be accepted immediately after a breakpoint hit,
-  and deletes the current breakpoint.
-  The second form can remove any breakpoint, but you need to specify
-  the exact address, as it can be obtained from
-
-     show b
-
-  To single-step the kernel, try
-
-     s
-
-  This will step into functions, but you can make DDB trace them until
-  the matching return statement is reached by
-
-     n
-
-  NOTE: this is different from gdb's ``next'' statement; it's like
-  gdb's ``finish''.
-
-  To examine data from memory, use e.g.
-
-     x/wx 0xf0133fe0,40
-     x/hd db_symtab_space
-     x/bc termbuf,10
-     x/s stringbuf
-
-  for word/halfword/byte access, and hexadecimal/decimal/character/
-  string display.  The number after the comma is the object count.
-  To display the next 0x10 items, simply use
-
-     x ,10
-
-  Similarly, use
-
-     x/ia foofunc,10
-
-  to disassemble the first 0x10 instructions of foofunc, and display
-  them along with their offset from the beginning of foofunc.
-
-  To modify the memory, use the write command:
-
-     w/b termbuf 0xa 0xb 0
-     w/w 0xf0010030 0 0
-
-  The command modifier (b/h/w) specifies the size of the data to be
-  written, the first following expression is the address to write to,
-  the remainder is interpreted as data to write to successive memory
-  locations.
-
-  If you need to know the current registers, use
-
-     show reg
-
-  Alternatively, you can display a single register value by e.g.
-
-     print $eax
-
-  and modify it by
-
-     set $eax new-value
-
-
-  Should you need to call some kernel functions from DDB, simply
-  say
-
-     call func(arg1, arg2, ...)
-
-  The return value will be printed.
-
-  For a ps-style summary of all running processes, use
-
-     ps
-
-
-
-  Well, you've now examined why your kernel failed, and you wish to
-  reboot.  Remember that, depending on the severity of previous
-  malfunctioning, not all parts of the kernel might still be working
-  as expected.
-  Perform one of the following actions to shut down and reboot your
-  system:
-
-
-     call diediedie()
-
-  (must usually be followed by another ``c[ontinue]'' statement),
-  will cause your kernel to dump core and reboot, so you can later
-  analyze the core on a higher level with kgdb.
-
-  There's now an alias for this: ``panic''.
-
-     call boot(0)
-
-  might be a good way to cleanly shut down the running system, sync()
-  all disks, and finally reboot.  As long as the disk and file system
-  interfaces of the kernel are not damaged, this might be a good way
-  for an almost clean shutdown.
-
-     call cpu_reset()
-
-  ...is the final way out of the disaster, almost the same as hitting
-  the Big Red Button.
-
-
-
-*** What to do if I want to debug a console driver? ***
-
-  Since you need a console driver to run DDB on, things are more
-  complicated if the console driver itself is flaky.  You might
-  remember the ``options COMCONSOLE'' line, and hook up a standard
-  terminal onto your first serial port.  DDB works on any configured
-  console driver, so of course it also works on a COMCONSOLE.
-
-
-
-  Paul Richards, FreeBSD core team member.  (paul@FreeBSD.org)
-  J"org Wunsch (joerg@FreeBSD.org)
-
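
A note on reading the kgdb session log in the deleted FAQ: the trap frame (log lines 29-34) prints its fields as signed decimal integers, while the stack trace shows addresses in hex, so it is not obvious that `frame frame->tf_ebp frame->tf_eip` lands on the address shown in log line 38. The following is a minimal Python sketch, not part of the original file, that reinterprets those values as unsigned 32-bit addresses. The helper name `to_unsigned32` is hypothetical; the two values are copied from the log.

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


# Values copied from the trap frame in the session log (log lines 29-34).
trap_frame = {
    "tf_ebp": -272630356,
    "tf_eip": -266672343,
}

for name, value in trap_frame.items():
    # Print each field as a zero-padded 32-bit hex address.
    print(f"{name} = {to_unsigned32(value):#010x}")
```

The tf_eip value comes out as 0xf01ae729, which is the address kgdb reports for pcopen in log line 38.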