Diffstat (limited to 'share/doc/handbook/kerneldebug.sgml')
-rw-r--r-- | share/doc/handbook/kerneldebug.sgml | 525 |
1 files changed, 0 insertions, 525 deletions
diff --git a/share/doc/handbook/kerneldebug.sgml b/share/doc/handbook/kerneldebug.sgml deleted file mode 100644 index dd617a4..0000000 --- a/share/doc/handbook/kerneldebug.sgml +++ /dev/null @@ -1,525 +0,0 @@ -<!-- $Id: kerneldebug.sgml,v 1.13 1997/03/18 00:42:36 joerg Exp $ --> -<!-- The FreeBSD Documentation Project --> - -<chapt><heading>Kernel Debugging<label id="kerneldebug"></heading> - -<p><em>Contributed by &a.paul; and &a.joerg;</em> - -<sect><heading>Debugging a kernel crash dump with kgdb</heading> - - <p>Here are some instructions for getting kernel debugging - working on a crash dump, it assumes that you have enough swap - space for a crash dump. If you have multiple swap - partitions and the first one is too small to hold the dump, - you can configure your kernel to use an alternate dump device - (in the <tt>config kernel</tt> line), or - you can specify an alternate using the dumpon(8) command. - Dumps to non-swap devices, - tapes for example, are currently not supported. Config your - kernel using <tt>config -g</tt>. - See <ref id="kernelconfig" name="Kernel Configuration"> for - details on configuring the FreeBSD kernel. - - Use the <tt>dumpon(8)</tt> command to tell the kernel where to dump - to (note that this will have to be done after configuring the - partition in question as swap space via <tt>swapon(8)</tt>). This is - normally arranged via <tt>/etc/sysconfig</tt> and <tt>/etc/rc</tt>. - Alternatively, you can - hard-code the dump device via the `dump' clause in the `config' line - of your kernel config file. This is deprecated, use only if you - want a crash dump from a kernel that crashes during booting. - - <em><bf>Note:</bf> In the following, the term `<tt>kgdb</tt>' refers - to <tt>gdb</tt> run in `kernel debug mode'. This can be accomplished by - either starting the <tt>gdb</tt> with the option <tt>-k</tt>, or by linking - and starting it under the name <tt>kgdb</tt>. 
This is not being - done by default, however, and the idea is basically deprecated since - the GNU folks do not love it if their tools behave differently when - called by another name. This feature might as well be discontinued - in further releases.</em> - - When the kernel has been built make a copy of it, say - <tt>kernel.debug</tt>, and then run <tt>strip -d</tt> on the - original. Install the original as normal. You may also install - the unstripped kernel, but symbol table lookup time for some - programs will drastically increase, and since - the whole kernel is loaded entirely at boot time and cannot be - swapped out later, several megabytes of - physical memory will be wasted. - - If you are testing a new kernel, for example by typing the new - kernel's name at the boot prompt, but need to boot a different - one in order to get your system up and running again, boot it - only into single user state using the <tt>-s</tt> flag at the - boot prompt, and then perform the following steps: -<tscreen><verb> - fsck -p - mount -a -t ufs # so your file system for /var/crash is writable - savecore -N /kernel.panicked /var/crash - exit # ...to multi-user -</verb></tscreen> - This instructs <tt>savecore(8)</tt> to use another kernel for symbol name - extraction. It would otherwise default to the currently running kernel - and most likely not do anything at all since the crash dump and the - kernel symbols differ. - - Now, after a crash dump, go to <tt>/sys/compile/WHATEVER</tt> and run - <tt>kgdb</tt>. From <tt>kgdb</tt> do: -<tscreen><verb> - symbol-file kernel.debug - exec-file /var/crash/kernel.0 - core-file /var/crash/vmcore.0 -</verb></tscreen> - and voila, you can debug the crash dump using the kernel sources - just like you can for any other program. - - Here is a script log of a <tt>kgdb</tt> session illustrating the - procedure. Long - lines have been folded to improve readability, and the lines are - numbered for reference. 
Despite this, it is a real-world error - trace taken during the development of the pcvt console driver. -<tscreen><verb> - 1:Script started on Fri Dec 30 23:15:22 1994 - 2:uriah # cd /sys/compile/URIAH - 3:uriah # kgdb kernel /var/crash/vmcore.1 - 4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel...done. - 5:IdlePTD 1f3000 - 6:panic: because you said to! - 7:current pcb at 1e3f70 - 8:Reading in symbols for ../../i386/i386/machdep.c...done. - 9:(kgdb) where - 10:#0 boot (arghowto=256) (../../i386/i386/machdep.c line 767) - 11:#1 0xf0115159 in panic () - 12:#2 0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698) - 13:#3 0xf010185e in db_fncall () - 14:#4 0xf0101586 in db_command (-266509132, -266509516, -267381073) - 15:#5 0xf0101711 in db_command_loop () - 16:#6 0xf01040a0 in db_trap () - 17:#7 0xf0192976 in kdb_trap (12, 0, -272630436, -266743723) - 18:#8 0xf019d2eb in trap_fatal (...) - 19:#9 0xf019ce60 in trap_pfault (...) - 20:#10 0xf019cb2f in trap (...) - 21:#11 0xf01932a1 in exception:calltrap () - 22:#12 0xf0191503 in cnopen (...) - 23:#13 0xf0132c34 in spec_open () - 24:#14 0xf012d014 in vn_open () - 25:#15 0xf012a183 in open () - 26:#16 0xf019d4eb in syscall (...) - 27:(kgdb) up 10 - 28:Reading in symbols for ../../i386/i386/trap.c...done. - 29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\ - 30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\ - 31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\ - 32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\ - 33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\ - 34:ss = -266427884}) (../../i386/i386/trap.c line 283) - 35:283 (void) trap_pfault(&frame, FALSE); - 36:(kgdb) frame frame->tf_ebp frame->tf_eip - 37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done. 
- 38:#0 0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\ - 39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403) - 40:403 return ((*linesw[tp->t_line].l_open)(dev, tp)); - 41:(kgdb) list - 42:398 - 43:399 tp->t_state |= TS_CARR_ON; - 44:400 tp->t_cflag |= CLOCAL; /* cannot be a modem (:-) */ - 45:401 - 46:402 #if PCVT_NETBSD || (PCVT_FREEBSD >= 200) - 47:403 return ((*linesw[tp->t_line].l_open)(dev, tp)); - 48:404 #else - 49:405 return ((*linesw[tp->t_line].l_open)(dev, tp, flag)); - 50:406 #endif /* PCVT_NETBSD || (PCVT_FREEBSD >= 200) */ - 51:407 } - 52:(kgdb) print tp - 53:Reading in symbols for ../../i386/i386/cons.c...done. - 54:$1 = (struct tty *) 0x1bae - 55:(kgdb) print tp->t_line - 56:$2 = 1767990816 - 57:(kgdb) up - 58:#1 0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\ - 59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126) - 60: return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p)); - 61:(kgdb) up - 62:#2 0xf0132c34 in spec_open () - 63:(kgdb) up - 64:#3 0xf012d014 in vn_open () - 65:(kgdb) up - 66:#4 0xf012a183 in open () - 67:(kgdb) up - 68:#5 0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\ - 69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\ - 70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \ - 71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \ - 72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673) - 73:673 error = (*callp->sy_call)(p, args, rval); - 74:(kgdb) up - 75:Initial frame selected; you cannot go up. - 76:(kgdb) quit - 77:uriah # exit - 78:exit - 79: - 80:Script done on Fri Dec 30 23:18:04 1994 -</verb></tscreen> - Comments to the above script: - -<descrip> -<tag/line 6:/ This is a dump taken from within DDB (see below), hence the - panic comment ``because you said to!'', and a rather long - stack trace; the initial reason for going into DDB has been - a page fault trap though. 
-<tag/line 20:/ This is the location of function <tt>trap()</tt> - in the stack trace. -<tag/line 36:/ Force usage of a new stack frame; this is no longer - necessary now. The stack frames are supposed to point to - the right locations now, even in case of a trap. - (I do not have a new core dump handy <g>, my kernel - did not panic for ia rather long time.) - From looking at the code in source line 403, - there is a high probability that either the pointer - access for ``tp'' was messed up, or the array access was - out of bounds. -<tag/line 52:/ The pointer looks suspicious, but happens to be a valid - address. -<tag/line 56:/ However, it obviously points to garbage, so we have found our - error! (For those unfamiliar with that particular piece - of code: <tt>tp->t_line</tt> refers to the line discipline - of the console device here, which must be a rather small integer - number.) -</descrip> - - -<sect><heading>Post-mortem analysis of a dump</heading> - -<p>What do you do if a kernel dumped core but you did not expect - it, and it is therefore not compiled using <tt>config -g</tt>? - Not everything is lost here. Do not panic! - - Of course, you still need to enable crash dumps. See above - on the options you have to specify in order to do this. - - Go to your kernel compile directory, and edit the line - containing <tt>COPTFLAGS?=-O</tt>. Add the <tt>-g</tt> option - there (but <em>do not</em> change anything on the level of - optimization). If you do already know roughly the probable - location of the failing piece of code (e.g., the <tt>pcvt</tt> - driver in the example above), remove all the object files for - this code. Rebuild the kernel. Due to the time stamp change on - the Makefile, there will be some other object files rebuild, - for example <tt>trap.o</tt>. 
With a bit of luck, the added - <tt>-g</tt> option will not change anything for the generated - code, so you will finally get a new kernel with similar code to - the faulting one but some debugging symbols. You should at - least verify the old and new sizes with the <tt>size(1)</tt> command. If - there is a mismatch, you probably need to give up here. - - Go and examine the dump as described above. The debugging - symbols might be incomplete for some places, as can be seen in - the stack trace in the example above where some functions are - displayed without line numbers and argument lists. If you need - more debugging symbols, remove the appropriate object files and - repeat the <tt>kgdb</tt> session until you know enough. - - All this is not guaranteed to work, but it will do it fine in - most cases. - -<sect><heading>On-line kernel debugging using DDB</heading> - -<p>While <tt>kgdb</tt> as an offline debugger provides a very - high level of user interface, there are some things it cannot do. - The most important ones being breakpointing and single-stepping - kernel code. - - If you need to do low-level debugging on your kernel, there is - an on-line debugger available called DDB. It allows to - setting breakpoints, single-steping kernel functions, examining - and changing kernel variables, etc. However, it cannot not - access kernel source files, and only has access to the global - and static symbols, not to the full debug information like - <tt>kgdb</tt>. - - To configure your kernel to include DDB, add the option line -<tscreen><verb> - options DDB -</verb></tscreen> - to your config file, and rebuild. (See <ref id="kernelconfig" - name="Kernel Configuration"> for details on configuring the - FreeBSD kernel. Note that if you have an older version of the - boot blocks, your debugger symbols might not be loaded at all. - Update the boot blocks, the recent ones do load the DDB symbols - automagically.) 
- - Once your DDB kernel is running, there are several ways to - enter DDB. The first, and earliest way is to type the boot - flag <tt>-d</tt> right at the boot prompt. The kernel will - start up in debug mode and enter DDB prior to any device - probing. Hence you are able to even debug the device - probe/attach functions. - - The second scenario is a hot-key on the keyboard, usually - Ctrl-Alt-ESC. For syscons, this can be remapped, and some of - the distributed maps do this, so watch out. - There is an option - available for serial consoles - that allows the use of a serial line BREAK on the console line to - enter DDB (``<tt>options BREAK_TO_DEBUGGER</tt>'' - in the kernel config file). It is not the default since there are a lot of - crappy serial adapters around that gratuitously generate a - BREAK condition for example when pulling the cable. - - The third way is that any panic condition will branch to DDB if - the kernel is configured to use it. - For this reason, it is not wise to - configure a kernel with DDB for a machine running unattended. - - The DDB commands roughly resemble some <tt>gdb</tt> commands. The first you - probably need is to set a breakpoint: -<tscreen><verb> - b function-name - b address -</verb></tscreen> - - Numbers are taken hexadecimal by default, but to make them - distinct from symbol names, hexadecimal numbers starting with the - letters <tt>a</tt>-<tt>f</tt> need to be preceded with - <tt>0x</tt> (for other numbers, this is optional). Simple - expressions are allowed, for example: <tt>function-name + 0x103</tt>. - - To continue the operation of an interrupted kernel, simply type -<tscreen><verb> - c -</verb></tscreen> - To get a stack trace, use -<tscreen><verb> - trace -</verb></tscreen> - Note that when entering DDB via a hot-key, the kernel is currently - servicing an interrupt, so the stack trace might be not of much use - for you. 
- - If you want to remove a breakpoint, use -<tscreen><verb> - del - del address-expression -</verb></tscreen> - The first form will be accepted immediately after a breakpoint hit, - and deletes the current breakpoint. The second form can remove any - breakpoint, but you need to specify the exact address, as it can be - obtained from -<tscreen><verb> - show b -</verb></tscreen> - To single-step the kernel, try -<tscreen><verb> - s -</verb></tscreen> - This will step into functions, but you can make DDB trace them until - the matching return statement is reached by -<tscreen><verb> - n -</verb></tscreen> - <bf>Note:</bf> this is different from <tt>gdb</tt>'s `next' statement, it is like - <tt>gdb</tt>'s `finish'. - - To examine data from memory, use (for example): -<tscreen><verb> - x/wx 0xf0133fe0,40 - x/hd db_symtab_space - x/bc termbuf,10 - x/s stringbuf -</verb></tscreen> - for word/halfword/byte access, and hexadecimal/decimal/character/ - string display. The number after the comma is the object count. - To display the next 0x10 items, simply use -<tscreen><verb> - x ,10 -</verb></tscreen> - Similarly, use -<tscreen><verb> - x/ia foofunc,10 -</verb></tscreen> - to disassemble the first 0x10 instructions of <tt>foofunc</tt>, and display - them along with their offset from the beginning of <tt>foofunc</tt>. - - To modify the memory, use the write command: -<tscreen><verb> - w/b termbuf 0xa 0xb 0 - w/w 0xf0010030 0 0 -</verb></tscreen> - The command modifier (<tt>b</tt>/<tt>h</tt>/<tt>w</tt>) - specifies the size of the data to be written, the first - following expression is the address to write to, the remainder - is interpreted as data to write to successive memory locations. - - If you need to know the current registers, use -<tscreen><verb> - show reg -</verb></tscreen> - Alternatively, you can display a single register value by e.g. 
-<tscreen><verb> - p $eax -</verb></tscreen> - and modify it by -<tscreen><verb> - set $eax new-value -</verb></tscreen> - - Should you need to call some kernel functions from DDB, simply - say -<tscreen><verb> - call func(arg1, arg2, ...) -</verb></tscreen> - The return value will be printed. - - For a <tt>ps(1)</tt> style summary of all running processes, use -<tscreen><verb> - ps -</verb></tscreen> - - Now you have now examined why your kernel failed, and you wish to - reboot. Remember that, depending on the severity of previous - malfunctioning, not all parts of the kernel might still be working - as expected. Perform one of the following actions to shut down and - reboot your system: -<tscreen><verb> - call diediedie() -</verb></tscreen> - - will cause your kernel to dump core and reboot, so you can - later analyze the core on a higher level with kgdb. This - command usually must be followed by another - `<tt>continue</tt>' statement. - There is now an alias for this: `<tt>panic</tt>'. - -<tscreen><verb> - call boot(0) -</verb></tscreen> - might be a good way to cleanly shut down the running system, <tt>sync()</tt> - all disks, and finally reboot. As long as the disk and file system - interfaces of the kernel are not damaged, this might be a good way - for an almost clean shutdown. - -<tscreen><verb> - call cpu_reset() -</verb></tscreen> - is the final way out of disaster and almost the same as hitting - the Big Red Button. - - If you need a short command summary, simply type -<tscreen><verb> - help -</verb></tscreen> - However, it is highly recommended to have a printed copy of the - <tt>ddb(4)</tt> manual page ready for a debugging session. - Remember that it is hard to read the on-line manual while - single-stepping the kernel. - -<sect><heading>On-line kernel debugging using remote GDB</heading> - -<p>This feature is supported since FreeBSD 2.2, and it's actually - a very neat one. - - GDB used to support <em/remote debugging/ for a long time - already. 
This is done using a very simple protocol along a - serial line. Obviously, and opposed to the other methods - described above, you need two machines for doing this. One is - the host providing the debugging environment, including all - the sources, and a copy of the kernel binary with all the - symbols in it, and the other one is the target machine that - simply runs a similar copy of the very same kernel (but stripped - off the debugging information). - - You should configure the kernel in question with <tt>config -g</tt>, - include <em/DDB/ into the configuration, and compile it as usual. - This gives a large blurb of a binary, due - to the debugging information. Copy this kernel to the target - machine, strip the debugging symbols off with <tt>strip -x</tt>, - and boot it using the <tt/-d/ boot option. Connect the first - serial line of the target machine to any serial line of the - debugging host. Now, on the debugging machine, go to the compile - directory of the target kernel, and start gdb: -<tscreen><verb> -% gdb -k kernel -GDB is free software and you are welcome to distribute copies of it - under certain conditions; type "show copying" to see the conditions. -There is absolutely no warranty for GDB; type "show warranty" for details. -GDB 4.16 (i386-unknown-freebsd), -Copyright 1996 Free Software Foundation, Inc... -(kgdb) -</verb></tscreen> - - Initialize the remote debugging session (assuming the first serial - port is being used) by: -<tscreen><verb> -(kgdb) target remote /dev/cuaa0 -</verb></tscreen> - - Now, on the target host (that entered DDB right before even starting - the device probe), type: -<tscreen><verb> -Debugger("Boot flags requested debugger") -Stopped at Debugger+0x35: movb $0, edata+0x51bc -db> gdb -</verb></tscreen> - - DDB will respond with: -<tscreen><verb> -Next trap will enter GDB remote protocol mode -</verb></tscreen> - - Every time you type ``gdb'', the mode will be toggled between - remote GDB and local DDB. 
In order to force a next trap - immediately, simply type ``s'' (step). Your hosting GDB will - now gain control over the target kernel: -<tscreen><verb> -Remote debugging using /dev/cuaa0 -Debugger (msg=0xf01b0383 "Boot flags requested debugger") - at ../../i386/i386/db_interface.c:257 -(kgdb) -</verb></tscreen> - - You can use this session almost as any other GDB session, including - full access to the source, running it in gud-mode inside an Emacs - window (which gives you an automatic source code display in another - Emacs window) etc. - -<p>Remote GDB can also be used to debug LKMs. First build the LKM - with debugging symbols: -<tscreen><verb> -# cd /usr/src/lkm/linux -# make clean; make COPTS=-g -</verb></tscreen> - - Then install this version of the module on the target machine, load it - and use <tt>modstat</tt> to find out where it was loaded: -<tscreen><verb> -# linux -# modstat -Type Id Off Loadaddr Size Info Rev Module Name -EXEC 0 4 f5109000 001c f510f010 1 linux_mod -</verb></tscreen> - - Take the load address of the module and add 0x20 (probably to account - for the a.out header). This is the address that the module code was - relocated to. Use the <tt>add-symbol-file</tt> command in GDB to tell the - debugger about the module: -<tscreen><verb> -(kgdb) add-symbol-file /usr/src/lkm/linux/linux_mod.o 0xf5109020 -add symbol table from file "/usr/src/lkm/linux/linux_mod.o" at -text_addr = 0xf5109020? -(y or n) y -(kgdb) -</verb></tscreen> - - You now have access to all the symbols in the LKM. - -<sect><heading>Debugging a console driver</heading> - -<p>Since you need a console driver to run DDB on, things are more - complicated if the console driver itself is failing. You might - remember the use of a serial console (either with modified boot - blocks, or by specifying <tt><bf>-h</bf></tt> at the <tt>Boot:</tt> - prompt), and hook up a standard - terminal onto your first serial port. 
DDB works on any configured - console driver, of course also on a serial console. - -