author | joerg <joerg@FreeBSD.org> | 1995-01-02 12:01:59 +0000 |
---|---|---|
committer | joerg <joerg@FreeBSD.org> | 1995-01-02 12:01:59 +0000 |
commit | 1402539403319b578486831b5f7ae7c84ab6b8f1 (patch) | |
tree | e8678c332a0b384a59af000ac6d8829ce6bbfa03 | |
parent | 1e1f81d37c95856f99694cf1942bd4f407d67f6f (diff) | |
Heavily re-worked.
Updated to 2.0 .
Included sections about how to use DDB, post-mortem analysis of
a kernel crash where you didn't anticipate it and therefore
didn't config -g it. Added a real-world example of a kgdb session.
-rw-r--r-- | share/FAQ/kernel-debug.FAQ | 424 |
1 files changed, 404 insertions, 20 deletions
diff --git a/share/FAQ/kernel-debug.FAQ b/share/FAQ/kernel-debug.FAQ
index 0d098c3..3221f82 100644
--- a/share/FAQ/kernel-debug.FAQ
+++ b/share/FAQ/kernel-debug.FAQ
@@ -1,33 +1,417 @@
-                Kernel debugging FAQ
-             for FreeBSD 1.1.5.1 and below
+                Kernel debugging FAQ for FreeBSD

-Last modified: $Id: kernel-debug.FAQ,v 1.1 1994/09/11 10:56:06 jkh Exp $
+Last modified: $Id: kernel-debug.FAQ,v 1.2 1994/10/03 03:19:41 gclarkii Exp $

-Here are some instructions for getting kernel debugging working on
-a crash dump, it assumes that you have enough swap space for a crash
-dump.

-*** Start ***
+*** Debugging a kernel crash dump with kgdb ***

-Config you're kernel using config -g
+  Here are some instructions for getting kernel debugging working on a
+  crash dump; it assumes that you have enough swap space for a crash
+  dump.  If you happen to have multiple swap partitions with the first
+  one being too small to keep the dump, you can configure your kernel to
+  use an alternate dump device (in the ``kernel'' line).  Dumps to non-
+  swap devices (e.g. tapes) are currently not supported.

-Remove ${STRIP} -x $@; from the Makefile for the kernel so it doesn't
-get stripped.
+  Config your kernel using config -g

-When the kernel's been built make a copy of it, say 386BSD.debug, and
-then run strip -x on the original. Install the original as normal.
+  Remember that you need to specify ``options DODUMP'' in your config
+  file in order to get kernel core dumps.

-Now, after a crash dump, go to /sys/compile/WHATEVER and run kgdb. From kgdb
-do:
+  When the kernel's been built make a copy of it, say kernel.debug, and
+  then run strip -x on the original.  Install the original as normal.
+  You may also install the unstripped kernel, but symtab lookup time
+  for some programs might drastically increase.

-symbol-file 386BSD.debug
-exec-file /var/crash/system.0
-core-file /var/crash/ram.0
+  If you are testing a new kernel (e.g. by typing the new kernel's
+  name at the boot prompt), but need to boot a different one in order
+  to get your system up & running again, boot it only into single-
+  user state (the -s flag at the boot prompt), and then perform the
+  following steps:

-and viola, you can debug the crash dump using the kernel sources just like
-you can for any other program.
+     fsck -p
+     mount -a -t ufs       # so your file system for /var/crash is writable
+     savecore -N /kernel.panicked /var/crash
+     exit                  # ...to multi-user

+  This instructs savecore to use another kernel for symbol name
+  extraction; it would default to the currently running kernel
+  otherwise.

+  Now, after a crash dump, go to /sys/compile/WHATEVER and run
+  kgdb.  From kgdb do:

-                Paul Richards, FreeBSD core team member.

+     symbol-file kernel.debug
+     exec-file /var/crash/system.0
+     core-file /var/crash/ram.0
+
+  and voila, you can debug the crash dump using the kernel sources
+  just like you can for any other program.
+
+  If your kernel panicked due to a trap (perhaps the most common case
+  for getting a core dump), the following trick might help you.  Examine
+  the stack (`where') and look for the stack frame in the function
+  trap().  Go `up' to that frame, and then type:
+
+     frame frame->tf_ebp frame->tf_eip
+
+  This will tell kgdb to go to the stack frame explicitly named by a
+  frame pointer and instruction pointer, which is the location where
+  the trap occurred.  There are still some bugs in kgdb (you can go
+  `up' from there, but not `down'; the stack trace will still remain
+  as it was before going to here), but generally this method will lead
+  you much closer to the failing piece of code.
+
+  Here's a script log of a kgdb session illustrating the above.  Long
+  lines have been folded to improve readability, and the lines are
+  numbered for reference.  Despite this, it's a real-world error
+  trace taken during the development of the pcvt console driver.
+
+  1:Script started on Fri Dec 30 23:15:22 1994
+  2:uriah # cd /sys/compile/URIAH
+  3:uriah # kgdb kernel /var/crash/vmcore.1
+  4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel...done.
+  5:IdlePTD 1f3000
+  6:panic: because you said to!
+  7:current pcb at 1e3f70
+  8:Reading in symbols for ../../i386/i386/machdep.c...done.
+  9:(kgdb) where
+ 10:#0  boot (arghowto=256) (../../i386/i386/machdep.c line 767)
+ 11:#1  0xf0115159 in panic ()
+ 12:#2  0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698)
+ 13:#3  0xf010185e in db_fncall ()
+ 14:#4  0xf0101586 in db_command (-266509132, -266509516, -267381073)
+ 15:#5  0xf0101711 in db_command_loop ()
+ 16:#6  0xf01040a0 in db_trap ()
+ 17:#7  0xf0192976 in kdb_trap (12, 0, -272630436, -266743723)
+ 18:#8  0xf019d2eb in trap_fatal (...)
+ 19:#9  0xf019ce60 in trap_pfault (...)
+ 20:#10 0xf019cb2f in trap (...)
+ 21:#11 0xf01932a1 in exception:calltrap ()
+ 22:#12 0xf0191503 in cnopen (...)
+ 23:#13 0xf0132c34 in spec_open ()
+ 24:#14 0xf012d014 in vn_open ()
+ 25:#15 0xf012a183 in open ()
+ 26:#16 0xf019d4eb in syscall (...)
+ 27:(kgdb) up 10
+ 28:Reading in symbols for ../../i386/i386/trap.c...done.
+ 29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\
+ 30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\
+ 31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\
+ 32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\
+ 33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\
+ 34:ss = -266427884}) (../../i386/i386/trap.c line 283)
+ 35:283             (void) trap_pfault(&frame, FALSE);
+ 36:(kgdb) frame frame->tf_ebp frame->tf_eip
+ 37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done.
+ 38:#0  0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\
+ 39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403)
+ 40:403             return ((*linesw[tp->t_line].l_open)(dev, tp));
+ 41:(kgdb) list
+ 42:398
+ 43:399             tp->t_state |= TS_CARR_ON;
+ 44:400             tp->t_cflag |= CLOCAL;  /* cannot be a modem (:-) */
+ 45:401
+ 46:402     #if PCVT_NETBSD || (PCVT_FREEBSD >= 200)
+ 47:403             return ((*linesw[tp->t_line].l_open)(dev, tp));
+ 48:404     #else
+ 49:405             return ((*linesw[tp->t_line].l_open)(dev, tp, flag));
+ 50:406     #endif /* PCVT_NETBSD || (PCVT_FREEBSD >= 200) */
+ 51:407     }
+ 52:(kgdb) print tp
+ 53:Reading in symbols for ../../i386/i386/cons.c...done.
+ 54:$1 = (struct tty *) 0x1bae
+ 55:(kgdb) print tp->t_line
+ 56:$2 = 1767990816
+ 57:(kgdb) up
+ 58:#1  0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\
+ 59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126)
+ 60:        return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p));
+ 61:(kgdb) up
+ 62:#2  0xf0132c34 in spec_open ()
+ 63:(kgdb) up
+ 64:#3  0xf012d014 in vn_open ()
+ 65:(kgdb) up
+ 66:#4  0xf012a183 in open ()
+ 67:(kgdb) up
+ 68:#5  0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\
+ 69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\
+ 70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \
+ 71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \
+ 72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673)
+ 73:673             error = (*callp->sy_call)(p, args, rval);
+ 74:(kgdb) up
+ 75:Initial frame selected; you cannot go up.
+ 76:(kgdb) quit
+ 77:uriah # exit
+ 78:exit
+ 79:
+ 80:Script done on Fri Dec 30 23:18:04 1994
+
+  Comments to the above script:
+
+  line 6:  this is a dump taken from within DDB (see below), hence the
+           panic comment ``because you said to!'', and a rather long
+           stack trace; the initial reason for going into DDB has been
+           a page fault trap though
+
+  line 20: the location of function ``trap()'' in the stack trace
+
+  line 36: force usage of a new stack frame, kgdb responds and displays
+           the source line where the trap happened; from looking at the
+           code, there's a high probability that either the pointer
+           access for ``tp'' was messed up, or the array access was
+           out of bounds
+
+  line 52: the pointer looks suspicious, but happens to be a valid
+           address...
+
+  line 56: ... but obviously points to garbage, so we have found our
+           error, sigh!  [For those unfamiliar with that particular piece
+           of code: tp->t_line refers to the line discipline of the
+           console device here, which must be a rather small integer
+           number.]
+
+
+
+*** Post-mortem analysis of a dump ***
+
+  What to do if a kernel dumped core but you didn't expect it, and it's
+  therefore not compiled using config -g?
+
+  Not everything is lost here.  Don't panic. :-)
+
+  Of course, you still need to configure all your kernels with the
+  DODUMP option being set, otherwise you won't get a core dump at all.
+  (This is for safety reasons in the default kernels, to avoid them
+  trying to dump e.g. during system installation where there's no
+  FreeBSD partition at all and valuable data on the disk could be
+  destroyed.)
+
+  Go to your kernel compile directory, and edit the line containing
+  COPTFLAGS?=-O.  Add the `-g' option there (but DON'T change anything
+  on the level of optimization).  If you do already know roughly the
+  probable location of the failing piece of code (e.g., the `pcvt'
+  driver in the example above), remove all the object files for this
+  code.  Rebuild the kernel.  Due to the time stamp change on the
+  Makefile, there will be some other object files rebuilt, e.g.
+  trap.o.  With a bit of luck, the added -g option won't change
+  anything for the generated code, so you'll finally get a new kernel
+  with similar code to the faulting one but some debugging symbols.
+  You should at least verify the old and new sizes with the `size'
+  command; if they mismatch, you probably need to give up here.
+
+  Go and examine the dump as described above.  The debugging symbols
+  might be incomplete for some places (as can be seen in the stack trace
+  in the example above: some functions are displayed without line
+  numbers and argument lists).  If you need more debugging symbols,
+  remove the appropriate object files and repeat the kgdb session until
+  you know enough.
+
+  All this is not guaranteed to work, but will most likely work fine.
+
+
+
+*** On-line kernel debugging using DDB ***
+
+  While kgdb as an offline debugger provides a very high level of user
+  interface (e.g. it can look up source files, display C structures
+  etc.), there are some things it cannot do.  The most important ones
+  are breakpointing and single-stepping kernel code.
+
+  If you need to do low-level debugging on your kernel, there's an on-
+  line debugger available called DDB.  It allows you to set breakpoints,
+  single-step kernel functions, examine and change kernel variables
+  etc.  It cannot, however, access kernel source files, and it
+  only has access to the global and static symbols, but not to the
+  full debug information (including type and line number information)
+  like kgdb.
+
+  To configure your kernel to include DDB, add the option lines
+
+     options DDB
+     options "SYMTAB_SPACE=XXXX"
+
+  to your config file, and rebuild.  XXXX is the amount of space to be
+  reserved in a global array DDB examines to find its symbols at run
+  time.  It must be large enough to hold all symbols, but not so
+  large that it wastes space.  100000 bytes is a good first
+  bet for a ``normal'' kernel.  The link stage will tell you about the
+  usage of the symtab space; you'll see something like:
+
+     dbsym: need 98765; avail 100000
+
+  If the amount of allocated space has been too small, the above
+  message is accompanied by the following error message:
+
+     not enough room in db_symtab array
+
+  and the link stage fails.  You then need to increase the number,
+  reconfig and recompile.  If your config(8) has been compiled to not
+  remove the old compile directory before continuing (this is a
+  compile-time option [CONFIG_DONT_CLOBBER]), you need to remove
+  db_aout.o prior to recompilation; this is the only file being
+  affected by the SYMTAB_SPACE option.
+
+
+  Once your DDB kernel is running, there are several ways to enter
+  DDB.  The first (and earliest) way is to set the boot flag `-d'
+  (right at the boot prompt).  The kernel will start up in debug mode
+  and enter DDB prior to any device probing.  Hence you are able to
+  even debug the device probe/attach functions.
+
+  The second scenario is a hot-key on the keyboard, usually Ctrl-Alt-
+  ESC.  (For syscons, this can be remapped, and some of the
+  distributed maps do this, so watch out.)  There are patches
+  available for a COMCONSOLE kernel, ask me (joerg@FreeBSD.org) for
+  them.
+
+  The third way is that any panic condition will branch to DDB if the
+  kernel is configured to use it.  (Thus it is not wise to configure a
+  kernel with DDB for a machine running unattended.)
+
+
+  The DDB commands roughly resemble some gdb commands.  The first you
+  probably need is to set a breakpoint:
+
+     b function-name
+     b address
+
+  Numbers are taken as hexadecimal by default, but to make them distinct
+  from symbol names, hex numbers starting with the letters `a' - `f'
+  need to be preceded with `0x' (for other numbers, this is optional).
+  Simple expressions are allowed, e.g. ``function-name + 0x103''.
+
+  To continue the operation of an interrupted kernel, simply type
+
+     c
+
+  To get a stack trace, use
+
+     trace
+
+  Note that when entering DDB via a hot-key, the kernel is currently
+  servicing an interrupt, so the stack trace might not be of much use
+  to you.
+
+  If you want to remove a breakpoint, use
+
+     del
+     del address-expression
+
+  The first form will be accepted immediately after a breakpoint hit,
+  and deletes the current breakpoint.  The second form can remove any
+  breakpoint, but you need to specify the exact address, as it can be
+  obtained from
+
+     show b
+
+  To single-step the kernel, try
+
+     s
+
+  This will step into functions, but you can make DDB trace them until
+  the matching return statement is reached by
+
+     n
+
+  NOTE: this is different from gdb's ``next'' statement, it's like
+  gdb's ``finish''.
+
+  To examine data from memory, use e.g.
+
+     x/wx 0xf0133fe0,40
+     x/hd db_symtab_space
+     x/bc termbuf,10
+     x/s stringbuf
+
+  for word/halfword/byte access, and hexadecimal/decimal/character/
+  string display.  The number after the comma is the object count.
+  To display the next 0x10 items, simply use
+
+     x ,10
+
+  Similarly, use
+
+     x/ia foofunc,10
+
+  to disassemble the first 0x10 instructions of foofunc, and display
+  them along with their offset from the beginning of foofunc.
+
+  To modify the memory, use the write command:
+
+     w/b termbuf 0xa 0xb 0
+     w/w 0xf0010030 0 0
+
+  The command modifier (b/h/w) specifies the size of the data to be
+  written, the first following expression is the address to write to,
+  the remainder is interpreted as data to write to successive memory
+  locations.
+
+  If you need to know the current registers, use
+
+     show reg
+
+  Alternatively, you can display a single register value by e.g.
+
+     print $eax
+
+  and modify it by
+
+     set $eax new-value
+
+
+  Should you need to call some kernel functions from DDB, simply
+  say
+
+     call func(arg1, arg2, ...)
+
+  The return value will be printed.
+
+  For a ps-style summary of all running processes, use
+
+     ps
+
+
+
+  Well, you've now examined why your kernel failed, and you wish to
+  reboot.  Remember that, depending on the severity of previous
+  malfunctioning, not all parts of the kernel might still be working
+  as expected.  Perform one of the following actions to shut down and
+  reboot your system:
+
+
+     call diediedie()
+
+  (must usually be followed by another ``c[ontinue]'' statement),
+  will cause your kernel to dump core and reboot, so you can later
+  analyze the core on a higher level with kgdb.
+
+
+     call boot(0)
+
+  might be a good way to cleanly shut down the running system, sync()
+  all disks, and finally reboot.  As long as the disk and file system
+  interfaces of the kernel are not damaged, this might be a good way
+  for an almost clean shutdown.
+
+
+     call cpu_reset()
+
+  ...is the final way out of the disaster, almost similar to hitting
+  the Big Red Button.
+
+
+
+*** What to do if I want to debug a console driver? ***
+
+  Since you need a console driver to run DDB on, things are more
+  complicated if the console driver itself is flakey.  You might
+  remember the ``options COMCONSOLE'' line, and hook up a standard
+  terminal onto your first serial port.  DDB works on any configured
+  console driver, of course it also works on a COMCONSOLE.
+
+
+
+                Paul Richards, FreeBSD core team member. (paul@FreeBSD.org)
+                J"org Wunsch (joerg@FreeBSD.org)
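
Note on the trap frame values in the kgdb session log: kgdb prints the register fields of the trap frame as signed decimal integers, which makes them hard to compare with the hexadecimal addresses in the stack trace. The following is a small illustrative sketch in Python, not part of the commit; the helper name to_u32_hex is made up here, and it assumes the 32-bit i386 registers of the kernels discussed in the FAQ.

```python
def to_u32_hex(value: int) -> str:
    """Reinterpret a signed 32-bit integer as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF maps negative values to their two's-complement
    # bit pattern, which is how the register actually holds the address.
    return f"0x{value & 0xFFFFFFFF:08x}"


# tf_eip as printed in frame #10 of the session log
print(to_u32_hex(-266672343))  # prints 0xf01ae729
```

The result, 0xf01ae729, is the address kgdb reports for pcopen right after the `frame frame->tf_ebp frame->tf_eip` command in the log, which confirms that the command jumped to the faulting instruction.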
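
Note on DDB number syntax: the FAQ text in the diff states that DDB reads numbers as hexadecimal by default, and that a hex number beginning with a letter `a' - `f' needs a `0x' prefix so it is not taken for a symbol name. The sketch below models only that stated rule in Python. It is not DDB's actual lexer, and the function name parse_ddb_number is hypothetical.

```python
import re
from typing import Optional

_HEX_DIGITS = re.compile(r"[0-9a-fA-F]+")


def parse_ddb_number(token: str) -> Optional[int]:
    """Return the value of token under the rule described in the FAQ,
    or None if the token would be read as a symbol name instead."""
    if token[:2].lower() == "0x":
        digits = token[2:]          # explicit prefix: always a number
    elif token[:1].isdigit():
        digits = token              # leading digit: hex without prefix
    else:
        return None                 # leading letter: treated as a symbol
    if not _HEX_DIGITS.fullmatch(digits):
        return None                 # not a well-formed hex number
    return int(digits, 16)


print(parse_ddb_number("40"))          # the count in "x/wx 0xf0133fe0,40"
print(parse_ddb_number("f0133fe0"))    # no prefix, leading letter
print(parse_ddb_number("0xf0133fe0"))  # same digits with the prefix
```

Under this model the count 40 in `x/wx 0xf0133fe0,40` means 64 objects, and `f0133fe0` without the prefix is not read as an address at all.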
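
Note on sizing SYMTAB_SPACE: the FAQ text in the diff quotes the link-stage message `dbsym: need 98765; avail 100000`. A minimal Python sketch for pulling the two numbers out of a build log line and computing the remaining headroom follows. The function names are hypothetical, and the exact comparison dbsym performs is not given in the FAQ, so the sketch only reports the difference.

```python
import re
from typing import Optional, Tuple

_DBSYM = re.compile(r"dbsym: need (\d+); avail (\d+)")


def parse_dbsym(line: str) -> Optional[Tuple[int, int]]:
    """Extract (need, avail) in bytes from a dbsym message, or None."""
    match = _DBSYM.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def headroom(need: int, avail: int) -> int:
    """Bytes left in the symbol table array; negative means too small."""
    return avail - need


need, avail = parse_dbsym("dbsym: need 98765; avail 100000")
print(headroom(need, avail))  # prints 1235
```

A negative headroom corresponds to the case where the FAQ says SYMTAB_SPACE must be increased before reconfiguring and recompiling.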