Diffstat (limited to 'share/doc/handbook/kerneldebug.sgml')
-rw-r--r-- share/doc/handbook/kerneldebug.sgml | 525
1 files changed, 0 insertions, 525 deletions
diff --git a/share/doc/handbook/kerneldebug.sgml b/share/doc/handbook/kerneldebug.sgml
deleted file mode 100644
index dd617a4..0000000
--- a/share/doc/handbook/kerneldebug.sgml
+++ /dev/null
@@ -1,525 +0,0 @@
-<!-- $Id: kerneldebug.sgml,v 1.13 1997/03/18 00:42:36 joerg Exp $ -->
-<!-- The FreeBSD Documentation Project -->
-
-<chapt><heading>Kernel Debugging<label id="kerneldebug"></heading>
-
-<p><em>Contributed by &a.paul; and &a.joerg;</em>
-
-<sect><heading>Debugging a kernel crash dump with kgdb</heading>
-
- <p>Here are some instructions for getting kernel debugging
- working on a crash dump. They assume that you have enough swap
- space for a crash dump. If you have multiple swap
- partitions and the first one is too small to hold the dump,
- you can configure your kernel to use an alternate dump device
- (in the <tt>config kernel</tt> line), or
- you can specify an alternate using the <tt>dumpon(8)</tt> command.
- Dumps to non-swap devices,
- tapes for example, are currently not supported. Configure your
- kernel using <tt>config -g</tt>.
- See <ref id="kernelconfig" name="Kernel Configuration"> for
- details on configuring the FreeBSD kernel.
-
- Use the <tt>dumpon(8)</tt> command to tell the kernel where to dump
- to (note that this will have to be done after configuring the
- partition in question as swap space via <tt>swapon(8)</tt>). This is
- normally arranged via <tt>/etc/sysconfig</tt> and <tt>/etc/rc</tt>.
- Alternatively, you can
- hard-code the dump device via the `dump' clause in the `config' line
- of your kernel config file. This is deprecated; use it only if you
- want a crash dump from a kernel that crashes during booting.
-
- <em><bf>Note:</bf> In the following, the term `<tt>kgdb</tt>' refers
- to <tt>gdb</tt> run in `kernel debug mode'. This can be accomplished by
- either starting the <tt>gdb</tt> with the option <tt>-k</tt>, or by linking
- and starting it under the name <tt>kgdb</tt>. This is not
- done by default, however, and the idea is basically deprecated since
- the GNU folks do not love it if their tools behave differently when
- called by another name. This feature may well be discontinued
- in future releases.</em>
-
- When the kernel has been built, make a copy of it, say
- <tt>kernel.debug</tt>, and then run <tt>strip -d</tt> on the
- original. Install the original as normal. You may also install
- the unstripped kernel, but symbol table lookup time for some
- programs will drastically increase, and since
- the whole kernel is loaded entirely at boot time and cannot be
- swapped out later, several megabytes of
- physical memory will be wasted.
-
- If you are testing a new kernel, for example by typing the new
- kernel's name at the boot prompt, but need to boot a different
- one in order to get your system up and running again, boot it
- only into single user state using the <tt>-s</tt> flag at the
- boot prompt, and then perform the following steps:
-<tscreen><verb>
- fsck -p
- mount -a -t ufs # so your file system for /var/crash is writable
- savecore -N /kernel.panicked /var/crash
- exit # ...to multi-user
-</verb></tscreen>
- This instructs <tt>savecore(8)</tt> to use another kernel for symbol name
- extraction. It would otherwise default to the currently running kernel
- and most likely not do anything at all since the crash dump and the
- kernel symbols differ.
-
- Now, after a crash dump, go to <tt>/sys/compile/WHATEVER</tt> and run
- <tt>kgdb</tt>. From <tt>kgdb</tt> do:
-<tscreen><verb>
- symbol-file kernel.debug
- exec-file /var/crash/kernel.0
- core-file /var/crash/vmcore.0
-</verb></tscreen>
- and voila, you can debug the crash dump using the kernel sources
- just like you can for any other program.
-
- Here is a script log of a <tt>kgdb</tt> session illustrating the
- procedure. Long
- lines have been folded to improve readability, and the lines are
- numbered for reference. Despite this, it is a real-world error
- trace taken during the development of the pcvt console driver.
-<tscreen><verb>
- 1:Script started on Fri Dec 30 23:15:22 1994
- 2:uriah # cd /sys/compile/URIAH
- 3:uriah # kgdb kernel /var/crash/vmcore.1
- 4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel...done.
- 5:IdlePTD 1f3000
- 6:panic: because you said to!
- 7:current pcb at 1e3f70
- 8:Reading in symbols for ../../i386/i386/machdep.c...done.
- 9:(kgdb) where
- 10:#0 boot (arghowto=256) (../../i386/i386/machdep.c line 767)
- 11:#1 0xf0115159 in panic ()
- 12:#2 0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698)
- 13:#3 0xf010185e in db_fncall ()
- 14:#4 0xf0101586 in db_command (-266509132, -266509516, -267381073)
- 15:#5 0xf0101711 in db_command_loop ()
- 16:#6 0xf01040a0 in db_trap ()
- 17:#7 0xf0192976 in kdb_trap (12, 0, -272630436, -266743723)
- 18:#8 0xf019d2eb in trap_fatal (...)
- 19:#9 0xf019ce60 in trap_pfault (...)
- 20:#10 0xf019cb2f in trap (...)
- 21:#11 0xf01932a1 in exception:calltrap ()
- 22:#12 0xf0191503 in cnopen (...)
- 23:#13 0xf0132c34 in spec_open ()
- 24:#14 0xf012d014 in vn_open ()
- 25:#15 0xf012a183 in open ()
- 26:#16 0xf019d4eb in syscall (...)
- 27:(kgdb) up 10
- 28:Reading in symbols for ../../i386/i386/trap.c...done.
- 29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\
- 30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\
- 31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\
- 32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\
- 33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\
- 34:ss = -266427884}) (../../i386/i386/trap.c line 283)
- 35:283 (void) trap_pfault(&amp;frame, FALSE);
- 36:(kgdb) frame frame->tf_ebp frame->tf_eip
- 37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done.
- 38:#0 0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\
- 39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403)
- 40:403 return ((*linesw[tp->t_line].l_open)(dev, tp));
- 41:(kgdb) list
- 42:398
- 43:399 tp->t_state |= TS_CARR_ON;
- 44:400 tp->t_cflag |= CLOCAL; /* cannot be a modem (:-) */
- 45:401
- 46:402 #if PCVT_NETBSD || (PCVT_FREEBSD >= 200)
- 47:403 return ((*linesw[tp->t_line].l_open)(dev, tp));
- 48:404 #else
- 49:405 return ((*linesw[tp->t_line].l_open)(dev, tp, flag));
- 50:406 #endif /* PCVT_NETBSD || (PCVT_FREEBSD >= 200) */
- 51:407 }
- 52:(kgdb) print tp
- 53:Reading in symbols for ../../i386/i386/cons.c...done.
- 54:$1 = (struct tty *) 0x1bae
- 55:(kgdb) print tp->t_line
- 56:$2 = 1767990816
- 57:(kgdb) up
- 58:#1 0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\
- 59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126)
- 60: return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p));
- 61:(kgdb) up
- 62:#2 0xf0132c34 in spec_open ()
- 63:(kgdb) up
- 64:#3 0xf012d014 in vn_open ()
- 65:(kgdb) up
- 66:#4 0xf012a183 in open ()
- 67:(kgdb) up
- 68:#5 0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\
- 69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\
- 70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \
- 71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \
- 72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673)
- 73:673 error = (*callp->sy_call)(p, args, rval);
- 74:(kgdb) up
- 75:Initial frame selected; you cannot go up.
- 76:(kgdb) quit
- 77:uriah # exit
- 78:exit
- 79:
- 80:Script done on Fri Dec 30 23:18:04 1994
-</verb></tscreen>
- Comments to the above script:
-
-<descrip>
-<tag/line 6:/ This is a dump taken from within DDB (see below), hence the
- panic comment ``because you said to!'', and a rather long
- stack trace; the initial reason for going into DDB was
- a page fault trap, though.
-<tag/line 20:/ This is the location of function <tt>trap()</tt>
- in the stack trace.
-<tag/line 36:/ Force usage of a new stack frame; this is no longer
- necessary. The stack frames are supposed to point to
- the right locations now, even in case of a trap.
- (I do not have a new core dump handy &lt;g&gt;, my kernel
- has not panicked for a rather long time.)
- From looking at the code in source line 403,
- there is a high probability that either the pointer
- access for ``tp'' was messed up, or the array access was
- out of bounds.
-<tag/line 52:/ The pointer looks suspicious, but happens to be a valid
- address.
-<tag/line 56:/ However, it obviously points to garbage, so we have found our
- error! (For those unfamiliar with that particular piece
- of code: <tt>tp-&gt;t_line</tt> refers to the line discipline
- of the console device here, which must be a rather small integer
- number.)
-</descrip>
-
-
-<sect><heading>Post-mortem analysis of a dump</heading>
-
-<p>What do you do if a kernel dumped core but you did not expect
- it, and it is therefore not compiled using <tt>config -g</tt>?
- Not everything is lost here. Do not panic!
-
- Of course, you still need to enable crash dumps. See above
- for the options you have to specify in order to do this.
-
- Go to your kernel compile directory, and edit the line
- containing <tt>COPTFLAGS?=-O</tt>. Add the <tt>-g</tt> option
- there (but <em>do not</em> change anything on the level of
- optimization). If you do already know roughly the probable
- location of the failing piece of code (e.g., the <tt>pcvt</tt>
- driver in the example above), remove all the object files for
- this code. Rebuild the kernel. Due to the time stamp change on
- the Makefile, some other object files will be rebuilt as well,
- for example <tt>trap.o</tt>. With a bit of luck, the added
- <tt>-g</tt> option will not change anything for the generated
- code, so you will finally get a new kernel with similar code to
- the faulting one but some debugging symbols. You should at
- least verify the old and new sizes with the <tt>size(1)</tt> command. If
- there is a mismatch, you probably need to give up here.
-
- Go and examine the dump as described above. The debugging
- symbols might be incomplete for some places, as can be seen in
- the stack trace in the example above where some functions are
- displayed without line numbers and argument lists. If you need
- more debugging symbols, remove the appropriate object files and
- repeat the <tt>kgdb</tt> session until you know enough.
-
- None of this is guaranteed to work, but it will work fine in
- most cases.
-
-<sect><heading>On-line kernel debugging using DDB</heading>
-
-<p>While <tt>kgdb</tt> as an offline debugger provides a very
- high level of user interface, there are some things it cannot do.
- The most important ones are setting breakpoints and single-stepping
- kernel code.
-
- If you need to do low-level debugging on your kernel, there is
- an on-line debugger available called DDB. It allows
- setting breakpoints, single-stepping kernel functions, examining
- and changing kernel variables, etc. However, it cannot
- access kernel source files, and only has access to the global
- and static symbols, not to the full debug information like
- <tt>kgdb</tt>.
-
- To configure your kernel to include DDB, add the option line
-<tscreen><verb>
- options DDB
-</verb></tscreen>
- to your config file, and rebuild. (See <ref id="kernelconfig"
- name="Kernel Configuration"> for details on configuring the
- FreeBSD kernel. Note that if you have an older version of the
- boot blocks, your debugger symbols might not be loaded at all.
- Update the boot blocks; the recent ones do load the DDB symbols
- automagically.)
-
- Once your DDB kernel is running, there are several ways to
- enter DDB. The first, and earliest, way is to type the boot
- flag <tt>-d</tt> right at the boot prompt. The kernel will
- start up in debug mode and enter DDB prior to any device
- probing. Hence you are even able to debug the device
- probe/attach functions.
-
- The second way is a hot-key on the keyboard, usually
- Ctrl-Alt-ESC. For syscons, this can be remapped, and some of
- the distributed maps do this, so watch out.
- There is an option
- available for serial consoles
- that allows the use of a serial line BREAK on the console line to
- enter DDB (``<tt>options BREAK_TO_DEBUGGER</tt>''
- in the kernel config file). It is not the default since there are a lot of
- crappy serial adapters around that gratuitously generate a
- BREAK condition, for example when pulling the cable.
-
- The third way is that any panic condition will branch to DDB if
- the kernel is configured to use it.
- For this reason, it is not wise to
- configure a kernel with DDB for a machine running unattended.
-
- The DDB commands roughly resemble some <tt>gdb</tt> commands. The first thing you
- probably need to do is set a breakpoint:
-<tscreen><verb>
- b function-name
- b address
-</verb></tscreen>
-
- Numbers are taken as hexadecimal by default, but to make them
- distinct from symbol names, hexadecimal numbers starting with the
- letters <tt>a</tt>-<tt>f</tt> need to be preceded with
- <tt>0x</tt> (for other numbers, this is optional). Simple
- expressions are allowed, for example: <tt>function-name + 0x103</tt>.
-
- To continue the operation of an interrupted kernel, simply type
-<tscreen><verb>
- c
-</verb></tscreen>
- To get a stack trace, use
-<tscreen><verb>
- trace
-</verb></tscreen>
- Note that when entering DDB via a hot-key, the kernel is currently
- servicing an interrupt, so the stack trace might not be of much use
- to you.
-
- If you want to remove a breakpoint, use
-<tscreen><verb>
- del
- del address-expression
-</verb></tscreen>
- The first form will be accepted immediately after a breakpoint hit,
- and deletes the current breakpoint. The second form can remove any
- breakpoint, but you need to specify the exact address, which can be
- obtained from
-<tscreen><verb>
- show b
-</verb></tscreen>
- To single-step the kernel, try
-<tscreen><verb>
- s
-</verb></tscreen>
- This will step into functions, but you can make DDB trace them until
- the matching return statement is reached by
-<tscreen><verb>
- n
-</verb></tscreen>
- <bf>Note:</bf> this is different from <tt>gdb</tt>'s `next' statement; it is like
- <tt>gdb</tt>'s `finish'.
-
- To examine data from memory, use (for example):
-<tscreen><verb>
- x/wx 0xf0133fe0,40
- x/hd db_symtab_space
- x/bc termbuf,10
- x/s stringbuf
-</verb></tscreen>
- for word/halfword/byte access, and hexadecimal/decimal/character/
- string display. The number after the comma is the object count.
- To display the next 0x10 items, simply use
-<tscreen><verb>
- x ,10
-</verb></tscreen>
- Similarly, use
-<tscreen><verb>
- x/ia foofunc,10
-</verb></tscreen>
- to disassemble the first 0x10 instructions of <tt>foofunc</tt>, and display
- them along with their offset from the beginning of <tt>foofunc</tt>.
-
- To modify the memory, use the write command:
-<tscreen><verb>
- w/b termbuf 0xa 0xb 0
- w/w 0xf0010030 0 0
-</verb></tscreen>
- The command modifier (<tt>b</tt>/<tt>h</tt>/<tt>w</tt>)
- specifies the size of the data to be written, the first
- following expression is the address to write to, the remainder
- is interpreted as data to write to successive memory locations.
-
- If you need to know the current registers, use
-<tscreen><verb>
- show reg
-</verb></tscreen>
- Alternatively, you can display a single register value by e.g.
-<tscreen><verb>
- p $eax
-</verb></tscreen>
- and modify it by
-<tscreen><verb>
- set $eax new-value
-</verb></tscreen>
-
- Should you need to call some kernel functions from DDB, simply
- say
-<tscreen><verb>
- call func(arg1, arg2, ...)
-</verb></tscreen>
- The return value will be printed.
-
- For a <tt>ps(1)</tt> style summary of all running processes, use
-<tscreen><verb>
- ps
-</verb></tscreen>
-
- Now you have examined why your kernel failed, and you wish to
- reboot. Remember that, depending on the severity of previous
- malfunctioning, not all parts of the kernel might still be working
- as expected. Perform one of the following actions to shut down and
- reboot your system:
-<tscreen><verb>
- call diediedie()
-</verb></tscreen>
-
- will cause your kernel to dump core and reboot, so you can
- later analyze the core on a higher level with kgdb. This
- command usually must be followed by another
- `<tt>continue</tt>' statement.
- There is now an alias for this: `<tt>panic</tt>'.
-
-<tscreen><verb>
- call boot(0)
-</verb></tscreen>
- might be a good way to cleanly shut down the running system, <tt>sync()</tt>
- all disks, and finally reboot. As long as the disk and file system
- interfaces of the kernel are not damaged, this might be a good way
- for an almost clean shutdown.
-
-<tscreen><verb>
- call cpu_reset()
-</verb></tscreen>
- is the final way out of disaster and almost the same as hitting
- the Big Red Button.
-
- If you need a short command summary, simply type
-<tscreen><verb>
- help
-</verb></tscreen>
- However, it is highly recommended to have a printed copy of the
- <tt>ddb(4)</tt> manual page ready for a debugging session.
- Remember that it is hard to read the on-line manual while
- single-stepping the kernel.
-
-<sect><heading>On-line kernel debugging using remote GDB</heading>
-
-<p>This feature has been supported since FreeBSD 2.2, and it's actually
- a very neat one.
-
- GDB has supported <em/remote debugging/ for a long time
- already. This is done using a very simple protocol along a
- serial line. Obviously, and as opposed to the other methods
- described above, you need two machines for doing this. One is
- the host providing the debugging environment, including all
- the sources, and a copy of the kernel binary with all the
- symbols in it, and the other one is the target machine that
- simply runs a similar copy of the very same kernel (but stripped
- of the debugging information).
-
- You should configure the kernel in question with <tt>config -g</tt>,
- include <em/DDB/ in the configuration, and compile it as usual.
- This gives a large blurb of a binary, due
- to the debugging information. Copy this kernel to the target
- machine, strip the debugging symbols off with <tt>strip -x</tt>,
- and boot it using the <tt/-d/ boot option. Connect the first
- serial line of the target machine to any serial line of the
- debugging host. Now, on the debugging machine, go to the compile
- directory of the target kernel, and start gdb:
-<tscreen><verb>
-% gdb -k kernel
-GDB is free software and you are welcome to distribute copies of it
- under certain conditions; type "show copying" to see the conditions.
-There is absolutely no warranty for GDB; type "show warranty" for details.
-GDB 4.16 (i386-unknown-freebsd),
-Copyright 1996 Free Software Foundation, Inc...
-(kgdb)
-</verb></tscreen>
-
- Initialize the remote debugging session (assuming the first serial
- port is being used) by:
-<tscreen><verb>
-(kgdb) target remote /dev/cuaa0
-</verb></tscreen>
-
- Now, on the target host (which entered DDB even before starting
- the device probe), type:
-<tscreen><verb>
-Debugger("Boot flags requested debugger")
-Stopped at Debugger+0x35: movb $0, edata+0x51bc
-db> gdb
-</verb></tscreen>
-
- DDB will respond with:
-<tscreen><verb>
-Next trap will enter GDB remote protocol mode
-</verb></tscreen>
-
- Every time you type ``gdb'', the mode will be toggled between
- remote GDB and local DDB. In order to force the next trap
- immediately, simply type ``s'' (step). Your hosting GDB will
- now gain control over the target kernel:
-<tscreen><verb>
-Remote debugging using /dev/cuaa0
-Debugger (msg=0xf01b0383 "Boot flags requested debugger")
- at ../../i386/i386/db_interface.c:257
-(kgdb)
-</verb></tscreen>
-
- You can use this session almost like any other GDB session, including
- full access to the source, running it in gud-mode inside an Emacs
- window (which gives you an automatic source code display in another
- Emacs window) etc.
-
-<p>Remote GDB can also be used to debug LKMs. First build the LKM
- with debugging symbols:
-<tscreen><verb>
-# cd /usr/src/lkm/linux
-# make clean; make COPTS=-g
-</verb></tscreen>
-
- Then install this version of the module on the target machine, load it
- and use <tt>modstat</tt> to find out where it was loaded:
-<tscreen><verb>
-# linux
-# modstat
-Type Id Off Loadaddr Size Info Rev Module Name
-EXEC 0 4 f5109000 001c f510f010 1 linux_mod
-</verb></tscreen>
-
- Take the load address of the module and add 0x20 (probably to account
- for the a.out header). This is the address that the module code was
- relocated to. Use the <tt>add-symbol-file</tt> command in GDB to tell the
- debugger about the module:
-<tscreen><verb>
-(kgdb) add-symbol-file /usr/src/lkm/linux/linux_mod.o 0xf5109020
-add symbol table from file "/usr/src/lkm/linux/linux_mod.o" at
-text_addr = 0xf5109020?
-(y or n) y
-(kgdb)
-</verb></tscreen>
-
- You now have access to all the symbols in the LKM.
-
-<sect><heading>Debugging a console driver</heading>
-
-<p>Since you need a console driver to run DDB on, things are more
- complicated if the console driver itself is failing. You might
- remember that you can use a serial console (either with modified boot
- blocks, or by specifying <tt><bf>-h</bf></tt> at the <tt>Boot:</tt>
- prompt), and hook up a standard
- terminal to your first serial port. DDB works on any configured
- console driver, including, of course, a serial console.
-
-