path: root/sys/net/bpf.c
Commit message | Author | Age | Files | Lines
* Merge more of currently non-functional (i.e. resolving to | zec | 2008-11-26 | 1 | -0/+7
| whitespace) macros from p4/vimage branch.
|
| Do a better job at enclosing all instantiations of globals
| scheduled for virtualization in #ifdef VIMAGE_GLOBALS blocks.
|
| De-virtualize and mark as const saorder_state_alive and
| saorder_state_any arrays from ipsec code, given that they are never
| updated at runtime, so virtualizing them would be pointless.
|
| Reviewed by: bz, julian
| Approved by: julian (mentor)
| Obtained from: //depot/projects/vimage-commit2/...
| X-MFC after: never
| Sponsored by: NLnet Foundation, The FreeBSD Foundation
* Retire the MALLOC and FREE macros. They are an abomination unto style(9). | des | 2008-10-23 | 1 | -1/+1
|
| MFC after: 3 months
* Make bpf_maxinsns visible from ng_bpf.c. | jkim | 2008-08-29 | 1 | -1/+1
|
| Pass me the pointyhat, please.
* Change bpf(4) to use the cdevpriv API. | ed | 2008-08-13 | 1 | -80/+62
|
| Right now the bpf(4) driver uses the cloning API to generate
| /dev/bpf%u. When an application such as tcpdump needs a BPF, it opens
| /dev/bpf0, /dev/bpf1, etc. until it opens the first available device
| node.
|
| We used this approach, because our devfs implementation didn't allow
| per-descriptor data. Now that we can, make it use
| devfs_get_cdevpriv() to obtain the private data. To remain compatible
| with the existing implementation, add a symlink from /dev/bpf0 to
| /dev/bpf.
|
| I've already changed libpcap to compile with HAVE_CLONING_BPF, which
| makes it use /dev/bpf. There may be other applications in the base
| system (dhclient) that use the loop to obtain a valid bpf.
|
| Discussed on: src-committers
| Approved by: csjp
* Annotate why we do not call BPF_CHECK_DIRECTION() in this tapping routine. | csjp | 2008-08-01 | 1 | -0/+6
|
| There is no way for the caller to tell us which direction this packet
| is going. With the bpf_mtap{2} routines, we can check the interface
| pointer.
|
| MFC after: 2 weeks
* Allow injecting big packets via bpf(4) up to min(MTU, 16K-byte). | jkim | 2008-07-14 | 1 | -3/+9
|
| MFC after: 1 week
* Add a new ioctl for changing the read filter (BIOCSETFNR). This is | dwmalone | 2008-07-07 | 1 | -4/+8
| just like BIOCSETF but it doesn't drop all the packets buffered on the
| descriptor and reset the statistics.
|
| Also, when setting the write filter, don't drop packets waiting to be
| read or reset the statistics.
|
| PR: 118486
| Submitted by: Matthew Luckie <mluckie@cs.waikato.ac.nz>
| MFC after: 1 month
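The difference between the two read-filter ioctls described above can be sketched in a few lines of portable C. This is a simplified model rather than the kernel code: `struct desc`, its fields, and `set_read_filter()` are hypothetical names, and the `flush` argument stands in for the choice between BIOCSETF (flush) and BIOCSETFNR (no flush).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a bpf descriptor; all names are illustrative. */
struct desc {
	int	 read_filter;	/* identifies the installed read filter */
	unsigned recv_count;	/* packets seen so far */
	unsigned drop_count;	/* packets dropped so far */
	size_t	 buffered;	/* bytes waiting to be read */
};

/*
 * Install a read filter.  flush == true models BIOCSETF, which discards
 * buffered packets and resets the counters; flush == false models
 * BIOCSETFNR, which only swaps the filter.
 */
static void
set_read_filter(struct desc *d, int filter, bool flush)
{
	assert(d != NULL);
	d->read_filter = filter;
	if (flush) {
		d->buffered = 0;
		d->recv_count = 0;
		d->drop_count = 0;
	}
}
```

A capture tool that wants to tighten its filter mid-stream without losing already-buffered packets would take the non-flushing path.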
* Make sure we are clearing the ZBUF_FLAG_IMMUTABLE any time a free buffer | csjp | 2008-07-05 | 1 | -0/+25
| is reclaimed by the kernel. This fixes a bug that resulted in the
| kernel overwriting packet data while user-space was still processing
| it when zerocopy is enabled (or a panic if invariants was enabled).
|
| Discussed with: rwatson
* Set D_TRACKCLOSE to avoid a race in devfs that could lead to orphaned bpf | jhb | 2008-05-09 | 1 | -0/+1
| devices never getting fully closed.
|
| MFC after: 3 days
* Check packet directions more properly instead of just checking received | jkim | 2008-04-28 | 1 | -5/+5
| interface is null.
|
| PR: kern/123138
| Submitted by: Dmitry (hanabana at mail dot ru)
| MFC after: 1 week
* Revert the previous commit and use M_PROMISC flag instead. | jkim | 2008-04-15 | 1 | -8/+17
|
| It is safer because it will never be used for outgoing packets.
* Remove M_SKIP_FIREWALL abuse and add more appropriate check. | jkim | 2008-04-15 | 1 | -20/+11
|
| Pointyhat to: jkim
| Reported by: Eugene Grosbein (eugen at kuzbass dot ru)
| MFC after: 3 days
* Maintain and observe a ZBUF_FLAG_IMMUTABLE flag on zero-copy BPF | rwatson | 2008-04-07 | 1 | -10/+50
| buffer kernel descriptors, which is used to allow the buffer
| currently in the BPF "store" position to be assigned to userspace
| when it fills, even if userspace hasn't acknowledged the buffer
| in the "hold" position yet. To implement this, notify the buffer
| model when a buffer becomes full, and check that the store buffer
| is writable, not just for it being full, before trying to append
| new packet data. Shared memory buffers will be assigned to
| userspace at most once per fill, be it in the store or in the
| hold position.
|
| This removes the restriction that at most one shared memory buffer
| can be owned by userspace, reducing the chances that userspace will
| need to call select() after acknowledging one buffer in order to
| wait for the next buffer when under high load. This more fully
| realizes the goal of zero system calls in order to process a
| high-speed packet stream from BPF.
|
| Update bpf.4 to reflect that both buffers may be owned by userspace
| at once; caution against assuming this.
* Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. | ru | 2008-03-25 | 1 | -7/+4
|
| Removed dead code that assumed that M_TRYWAIT can return NULL; it's
| not true since the advent of MBUMA.
|
| Reviewed by: arch
|
| There are ongoing disputes as to whether we want to switch to
| directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
* Check for a NULL free buffer pointer in BPF before invoking | rwatson | 2008-03-25 | 1 | -1/+1
| bpf_canfreebuf() in order to avoid potentially calling a
| non-inlinable but trivial function in zero-copy buffer mode for
| every packet received when we couldn't free the buffer anyway.
|
| MFC after: 4 months
* Introduce support for zero-copy BPF buffering, which reduces the | csjp | 2008-03-24 | 1 | -109/+313
| overhead of packet capture by allowing a user process to directly
| "loan" buffer memory to the kernel rather than using read(2) to
| explicitly copy data from kernel address space.
|
| The user process will issue new BPF ioctls to set the shared memory
| buffer mode and provide pointers to buffers and their size. The
| kernel then wires and maps the pages into kernel address space using
| sf_buf(9), which on supporting architectures will use the direct map
| region. The current "buffered" access mode remains the default, and
| support for zero-copy buffers must, for the time being, be explicitly
| enabled using a sysctl for the kernel to accept requests to use it.
|
| The kernel and user process synchronize use of the buffers with
| atomic operations, avoiding the need for system calls under load; the
| user process may use select()/poll()/kqueue() to manage blocking
| while waiting for network data if the user process is able to consume
| data faster than the kernel generates it. Patches to libpcap are
| available to allow libpcap applications to transparently take
| advantage of this support. Detailed information on the new API may be
| found in bpf(4), including specific atomic operations and memory
| barriers required to synchronize buffer use safely.
|
| These changes modify the base BPF implementation to (roughly)
| abstract the current buffer model, allowing the new shared memory
| model to be added, and add new monitoring statistics for netstat to
| print. The implementation, with the exception of some monitoring
| changes that break the netstat monitoring ABI for BPF, will be MFC'd.
|
| Zerocopy bpf buffers are still considered experimental and are
| disabled by default. To experiment with this new facility, adjust the
| net.bpf.zerocopy_enable sysctl variable to 1.
|
| Changes to libpcap will be made available as a patch for the time
| being, and further refinements to the implementation are expected.
|
| Sponsored by: Seccuris Inc.
| In collaboration with: rwatson
| Tested by: pwood, gallatin
| MFC after: 4 months [1]
|
| [1] Certain portions will probably not be MFCed, specifically things
|     that can break the monitoring ABI.
* In keeping with style(9)'s recommendations on macros, use a ';' | rwatson | 2008-03-16 | 1 | -1/+1
| after each SYSINIT() macro invocation. This makes a number of
| lightweight C parsers much happier with the FreeBSD kernel source,
| including cflow's prcc and lxr.
|
| MFC after: 1 month
| Discussed with: imp, rink
* Add comment that bpfread() has multi-threading issues. | rwatson | 2008-02-02 | 1 | -1/+4
|
| Fix minor white space nit.
* Use __FBSDID() in the kernel BPF implementation. | rwatson | 2007-12-25 | 1 | -2/+3
|
| MFC after: 3 days
* Remove trailing whitespace from lines in BPF. | rwatson | 2007-12-23 | 1 | -3/+3
|
| MFC after: 3 days
* Merge first in a series of TrustedBSD MAC Framework KPI changes | rwatson | 2007-10-24 | 1 | -8/+8
| from Mac OS X Leopard--rationalize naming for entry points to
| the following general forms:
|
|   mac_<object>_<method/action>
|   mac_<object>_check_<method/action>
|
| The previous naming scheme was inconsistent and mostly
| reversed from the new scheme. Also, make object types more
| consistent and remove spaces from object types that contain
| multiple parts ("posix_sem" -> "posixsem") to make mechanical
| parsing easier. Introduce a new "netinet" object type for
| certain IPv4/IPv6-related methods. Also simplify, slightly,
| some entry point names.
|
| All MAC policy modules will need to be recompiled, and modules
| not updated as part of this commit will need to be modified to
| conform to the new KPI.
|
| Sponsored by: SPARTA (original patches against Mac OS X)
| Obtained from: TrustedBSD Project, Apple Computer
* Make sure that we refresh the PID on read(2) and write(2) operations. | csjp | 2007-10-12 | 1 | -0/+2
|
| This fixes the process portion of the bpf(4) stats if the peer forks
| into the background after it's opened the descriptor. This bug
| results in the following behavior for netstat -B:
|
|   # netstat -B
|     Pid  Netif   Flags      Recv      Drop     Match Sblen Hblen Command
|   netstat: kern.proc.pid failed: No such process
|   78023    em0  p--s--   2237404     43119   2237404 13986     0 ??????
|
| MFC after: 1 week
* Check for multicast destination on bpf injected packets and update the M_*CAST | thompsa | 2007-09-10 | 1 | -4/+19
| flags; the absence of these flags causes problems in other areas
| such as bridging which expect them to be correct.
|
| At the moment only Ethernet DLTs are checked.
|
| Reviewed by: bms, csjp, sam
| Approved by: re (bmah)
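For Ethernet, the destination check described above comes down to inspecting the first six bytes of the frame. The following is a minimal sketch, not the committed code: `PKT_BCAST` and `PKT_MCAST` are hypothetical stand-ins for the kernel's M_BCAST and M_MCAST mbuf flags, and `ether_cast_flags()` is an illustrative helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	ETHER_ADDR_LEN	6

/* Hypothetical stand-ins for the kernel's M_BCAST / M_MCAST flags. */
enum { PKT_BCAST = 0x1, PKT_MCAST = 0x2 };

/*
 * Classify a frame by its destination MAC address, which occupies the
 * first ETHER_ADDR_LEN bytes of an Ethernet frame.
 */
static int
ether_cast_flags(const uint8_t *frame, size_t len)
{
	static const uint8_t bcast[ETHER_ADDR_LEN] =
	    { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	assert(frame != NULL);
	if (len < ETHER_ADDR_LEN)
		return (0);		/* too short to carry an address */
	/* The group bit is the least significant bit of the first octet. */
	if ((frame[0] & 0x01) == 0)
		return (0);		/* unicast */
	if (memcmp(frame, bcast, ETHER_ADDR_LEN) == 0)
		return (PKT_BCAST);
	return (PKT_MCAST);
}
```

Broadcast is tested after the group bit because the all-ones address also has that bit set.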
* Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which | rwatson | 2007-08-06 | 1 | -10/+2
| previously conditionally acquired Giant based on debug.mpsafenet.
| As that has now been removed, they are no longer required. Removing
| them significantly simplifies error-handling in the socket layer,
| eliminating quite a bit of unwinding of locking in error cases.
|
| While here clean up the now unneeded opt_net.h, which previously
| was used for the NET_WITH_GIANT kernel option. Clean up some related
| gotos for consistency.
|
| Reviewed by: bz, csjp
| Tested by: kris
| Approved by: re (kensmith)
* Replace references to NET_CALLOUT_MPSAFE with CALLOUT_MPSAFE, and remove | rwatson | 2007-07-28 | 1 | -1/+1
| definition of NET_CALLOUT_MPSAFE, which is no longer required now that
| debug.mpsafenet has been removed.
|
| The once over: bz
| Approved by: re (kensmith)
* Silence some gcc 4 warnings. It is expected that the bpf_movein() routine | csjp | 2007-06-17 | 1 | -0/+2
| will initialize the header length and re-initialize the mbuf pointer
| to reference the mbuf that is allocated after moving user supplied
| packet data in.
* - Conditionally pickup Giant around the network interface | csjp | 2007-06-15 | 1 | -3/+4
|   ioctl routines if we are running with !mpsafenet
| - Change un-conditional Giant acquisition around ifpromisc to occur
|   only if we are running with !mpsafenet
|
| With these locking bits in place, we can now remove the Giant
| requirement from BPF, so drop the D_NEEDGIANT device flag. This
| change removes Giant acquisitions around BPF device handlers (read,
| write, ioctl etc).
|
| MFC after: 1 month
| Discussed with: rwatson
* Add three new ioctl(2) commands for bpf(4). | jkim | 2007-02-26 | 1 | -31/+88
|
| - BIOCGDIRECTION and BIOCSDIRECTION get or set the setting determining
|   whether incoming, outgoing, or all packets on the interface should
|   be returned by BPF. Set to BPF_D_IN to see only incoming packets on
|   the interface. Set to BPF_D_INOUT to see packets originating locally
|   and remotely on the interface. Set to BPF_D_OUT to see only outgoing
|   packets on the interface. This setting is initialized to BPF_D_INOUT
|   by default. BIOCGSEESENT and BIOCSSEESENT are obsoleted by these but
|   kept for backward compatibility.
|
| - BIOCFEEDBACK sets packet feedback mode. This allows injected packets
|   to be fed back as input to the interface when output via the
|   interface is successful. When BPF_D_INOUT direction is set, injected
|   outgoing packet is not returned by BPF to avoid duplication. This
|   flag is initialized to zero by default.
|
| Note that libpcap has been modified to support BPF_D_OUT direction for
| pcap_setdirection(3) and PCAP_D_OUT direction is functional now.
|
| Reviewed by: rwatson
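The three direction settings described above amount to a small predicate applied to each packet. A minimal sketch in portable C follows; `enum tap_dir` and `dir_wants()` are hypothetical names standing in for BPF_D_IN, BPF_D_INOUT, BPF_D_OUT and the kernel's own check, and the feedback-mode special case is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for BPF_D_IN, BPF_D_INOUT and BPF_D_OUT. */
enum tap_dir { DIR_IN, DIR_INOUT, DIR_OUT };

/*
 * Decide whether a descriptor with direction setting d wants a packet.
 * outgoing is true when the packet is leaving through the interface.
 */
static bool
dir_wants(enum tap_dir d, bool outgoing)
{
	switch (d) {
	case DIR_IN:
		return (!outgoing);
	case DIR_OUT:
		return (outgoing);
	case DIR_INOUT:
		return (true);
	}
	assert(!"unknown direction setting");
	return (false);
}
```

A freshly opened descriptor corresponds to `DIR_INOUT`, matching the default stated in the commit message.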
* Remove slightly dubious comment; add descriptive strings for several | rwatson | 2007-01-28 | 1 | -5/+2
| sysctls.
|
| MFC after: 3 days
* Sweep kernel replacing suser(9) calls with priv(9) calls, assigning | rwatson | 2006-11-06 | 1 | -1/+2
| specific privilege names to a broad range of privileges. These may
| require some future tweaking.
|
| Sponsored by: nCircle Network Security, Inc.
| Obtained from: TrustedBSD Project
| Discussed on: arch@
| Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
|   Alex Lyashkov <umka at sevcity dot net>,
|   Skip Ford <skip dot ford at verizon dot net>,
|   Antoine Brodin <antoine dot brodin at laposte dot net>
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h | rwatson | 2006-10-22 | 1 | -1/+2
| begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
| contains the userspace and user<->kernel API and definitions, with all
| in-kernel interfaces moved to mac_framework.h, which is now included
| across most of the kernel instead.
|
| This change is the first step in a larger cleanup and sweep of MAC
| Framework interfaces in the kernel, and will not be MFC'd.
|
| Obtained from: TrustedBSD Project
| Sponsored by: SPARTA
* Since bpf_allocbufs() uses malloc() with M_WAITOK, don't check return | rwatson | 2006-08-09 | 1 | -16/+9
| values for NULL or return an error state. Assert that all three bpf
| buffer pointers are NULL before starting.
|
| MFC after: 1 week
* add support for 802.11 packet injection via bpf | sam | 2006-07-26 | 1 | -0/+31
|
| Together with: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
| Reviewed by: arch@
| MFC after: 1 month
* Rather than calling microtime() in catchpacket(), make catchpacket() | dwmalone | 2006-07-24 | 1 | -6/+30
| take a timeval indicating when the packet was captured. Move
| microtime() to the calling functions and grab the timestamp as soon
| as we know that we're going to call catchpacket at least once.
|
| This means that we call microtime() once per matched packet, as
| opposed to once per matched packet per bpf listener. It also means
| that we return the same timestamp to all bpf listeners, rather than
| slightly different ones.
|
| It would be more accurate to call microtime() even earlier for all
| packets, as you have to grab (1+#listener) locks before you can
| determine if the packet will be logged. You could always grab a
| timestamp before the locks, but microtime() can be costly, so this
| didn't seem like a good idea.
|
| (I guess most ethernet interfaces will have a bpf listener these
| days because of dhclient. That means that we could be doing two bpf
| locks on most packets going through the interface.)
|
| PR: 71711
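The "one timestamp per matched packet" idea above can be sketched in userland C. This is a model under stated assumptions rather than the kernel code: `struct listener`, `deliver()` and `tap()` are hypothetical stand-ins for a bpf descriptor, catchpacket() and the tapping routine, gettimeofday(2) takes the place of microtime(), and `clock_reads` exists only to make the saving observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

static int clock_reads;		/* counts clock reads, illustration only */

struct listener {
	bool		matches;	/* stand-in for the filter result */
	struct timeval	stamp;		/* timestamp delivered with the packet */
	int		delivered;	/* packets delivered to this listener */
};

/* Stand-in for catchpacket(): the caller supplies the timestamp. */
static void
deliver(struct listener *l, const struct timeval *tv)
{
	l->stamp = *tv;
	l->delivered++;
}

/* Stand-in for the tap routine: read the clock lazily, at most once. */
static void
tap(struct listener *ls, size_t n)
{
	struct timeval tv;
	bool have_tv = false;
	size_t i;

	assert(ls != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!ls[i].matches)
			continue;
		if (!have_tv) {
			/* First match: take the timestamp now and reuse it. */
			gettimeofday(&tv, NULL);
			clock_reads++;
			have_tv = true;
		}
		deliver(&ls[i], &tv);
	}
}
```

If no listener matches, the clock is never read at all, and every listener that does match sees an identical timestamp.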
* Adjust descriptor locking to tell the kqueue subsystem that our descriptor is | csjp | 2006-07-03 | 1 | -3/+1
| already locked. The reason to do this is to avoid two lock+unlock
| operations in a row. We need the lock here to serialize access to
| bd_pid for stats collection purposes.
|
| Drop the locks altogether on detach, as they will be picked up by
| knlist_remove.
|
| This should fix a failed locking assertion when kqueue is being used
| with bpf descriptors.
|
| Discussed with: jmg
* Since we are doing some bpf(4) clean up, change a couple of function prototypes | csjp | 2006-06-15 | 1 | -142/+48
| to be consistent. Also, ANSI'fy function definitions. There is no
| functional change here.
* If bpf(4) has not been compiled into the kernel, initialize the bpf interface | csjp | 2006-06-14 | 1 | -0/+5
| pointer to a zeroed, statically allocated bpf_if structure. This way
| the LIST_EMPTY() macro will always return true. This allows us to
| remove the additional unconditional memory reference for each packet
| in the fast path.
|
| Discussed with: sam
* Fix the following bpf(4) race condition which can result in a panic: | csjp | 2006-06-02 | 1 | -71/+24
|
|   (1) bpf peer attaches to interface netif0
|   (2) Packet is received by netif0
|   (3) ifp->if_bpf pointer is checked and handed off to bpf
|   (4) bpf peer detaches from netif0 resulting in ifp->if_bpf being
|       initialized to NULL.
|   (5) ifp->if_bpf is dereferenced by bpf machinery
|   (6) Kaboom
|
| This race condition likely explains the various different kernel
| panics reported around sending SIGINT to tcpdump or dhclient
| processes. But really this race can result in kernel panics anywhere
| you have frequent bpf attach and detach operations with high packet
| per second load.
|
| Summary of changes:
|
| - Remove the bpf interface's "driverp" member
| - When we attach bpf interfaces, we now set the ifp->if_bpf member to
|   the bpf interface structure. Once this is done, ifp->if_bpf should
|   never be NULL. [1]
| - Introduce bpf_peers_present function, an inline operation which
|   will do a lockless read of the bpf peer list associated with the
|   interface. It should be noted that the bpf code will pickup the
|   bpf_interface lock before adding or removing bpf peers. This should
|   serialize the access to the bpf descriptor list, removing the race.
| - Expose the bpf_if structure in bpf.h so that the bpf_peers_present
|   function can use it. This also removes the struct bpf_if; hack that
|   was there.
| - Adjust all consumers of the raw if_bpf structure to use
|   bpf_peers_present
|
| Now what happens is:
|
|   (1) Packet is received by netif0
|   (2) Check to see if bpf descriptor list is empty
|   (3) Pickup the bpf interface lock
|   (4) Hand packet off to process
|
| From the attach/detach side:
|
|   (1) Pickup the bpf interface lock
|   (2) Add/remove from bpf descriptor list
|
| Now that we are storing the bpf interface structure with the ifnet,
| there is no need to walk the bpf interface list to locate the correct
| bpf interface. We now simply look up the interface, and initialize
| the pointer. This has a nice side effect of changing a bpf interface
| attach operation from O(N) (where N is the number of bpf interfaces),
| to O(1).
|
| [1] From now on, we can no longer check ifp->if_bpf to tell us
|     whether or not we have any bpf peers that might be interested in
|     receiving packets.
|
| In collaboration with: sam@
| MFC after: 1 month
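The post-fix packet path (lockless emptiness check, then lock, then walk the list) can be modelled in portable C11. This is a sketch under stated assumptions and not the kernel implementation: `struct tap_if`, `peers_present()` and the other names are hypothetical, a spinning `atomic_flag` stands in for the bpf interface lock, and a hand-rolled singly linked list stands in for the descriptor list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct peer {
	struct peer *next;
	int packets;			/* packets handed to this peer */
};

struct tap_if {				/* stand-in for struct bpf_if */
	atomic_flag lock;		/* stand-in for the interface lock */
	_Atomic(struct peer *) head;	/* descriptor list, NULL when empty */
};

static void
tap_lock(struct tap_if *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;			/* spin; the kernel would use a mutex */
}

static void
tap_unlock(struct tap_if *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Lockless emptiness check, in the spirit of bpf_peers_present(). */
static bool
peers_present(struct tap_if *t)
{
	return (atomic_load_explicit(&t->head, memory_order_relaxed) != NULL);
}

/* Attach and detach only ever modify the list under the lock. */
static void
peer_attach(struct tap_if *t, struct peer *p)
{
	assert(p != NULL);
	tap_lock(t);
	p->next = atomic_load_explicit(&t->head, memory_order_relaxed);
	atomic_store_explicit(&t->head, p, memory_order_relaxed);
	tap_unlock(t);
}

static void
peer_detach(struct tap_if *t, struct peer *p)
{
	struct peer *cur;

	tap_lock(t);
	cur = atomic_load_explicit(&t->head, memory_order_relaxed);
	if (cur == p) {
		atomic_store_explicit(&t->head, p->next, memory_order_relaxed);
	} else {
		while (cur != NULL && cur->next != p)
			cur = cur->next;
		if (cur != NULL)
			cur->next = p->next;
	}
	tap_unlock(t);
}

/* Packet path: skip the lock entirely when nobody is listening. */
static void
tap_packet(struct tap_if *t)
{
	struct peer *p;

	if (!peers_present(t))
		return;
	tap_lock(t);
	for (p = atomic_load_explicit(&t->head, memory_order_relaxed);
	    p != NULL; p = p->next)
		p->packets++;
	tap_unlock(t);
}
```

The unlocked check may race with a concurrent attach or detach, but that is harmless in this model: the list is only walked after the lock is held, and the `tap_if` itself is never freed or set to NULL.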
* Fix -Wundef warnings. | ru | 2006-05-30 | 1 | -7/+7
* Pickup locks for the BPF interface structure. It's quite possible that | csjp | 2006-05-07 | 1 | -0/+2
| bpf(4) descriptors can be added and removed on this interface while we
| are processing stats.
|
| MFC after: 2 weeks
* Add BPF Just-In-Time compiler support for ng_bpf(4). | jkim | 2005-12-07 | 1 | -6/+0
|
| The sysctl is changed from net.bpf.jitter.enable to
| net.bpf_jitter.enable and this controls both bpf(4) and ng_bpf(4) now.
* Add experimental BPF Just-In-Time compiler for amd64 and i386. | jkim | 2005-12-06 | 1 | -3/+54
|
| Use the following kernel configuration option to enable:
|
|   options BPF_JITTER
|
| If you want to use bpf_filter() instead (e. g., debugging), do:
|
|   sysctl net.bpf.jitter.enable=0
|
| to turn it off.
|
| Currently BIOCSETWF and bpf_mtap2() are unsupported, and bpf_mtap() is
| partially supported because 1) no need, 2) avoid expensive
| m_copydata(9).
|
| Obtained from: WinPcap 3.1 (for i386)
* Protect PID initializations for statistics by the bpf descriptor | csjp | 2005-10-04 | 1 | -2/+6
| locks. Also while we are here, protect the bpf descriptor during
| knlist_remove{add} operations.
|
| Discussed with: rwatson
* Undo a tad little optimization to bpf_mtap() introduced in rev. 1.95 | andre | 2005-09-14 | 1 | -4/+0
| which broke the correct handling of the BIOCGSEESENT flag in the bpf
| listener.
|
| PR: kern/56441
| Submitted by: <vys at renet.ru>
| MFC after: 3 days
* Instead of caching the PID which opened the bpf descriptor, continuously | csjp | 2005-09-05 | 1 | -2/+12
| refresh the PID which has the descriptor open. The PID is refreshed in
| various operations like ioctl(2), kevent(2) or poll(2). This produces
| more accurate information about current bpf consumers. While we are
| here remove the bd_pcomm member of the bpf stats structure because now
| that we have an accurate PID we can look up the process name via the
| kern.proc.pid sysctl variable. This is the trick that NetBSD decided
| to use to deal with this issue.
|
| Special care needs to be taken when MFC'ing this change, as we have
| made a change to the bpf stats structure. What will end up happening
| is we will leave the pcomm structure but just mark it as being
| un-used. This way we keep the ABI intact.
|
| MFC after: 1 month
| Discussed with: Rui Paulo < rpaulo at NetBSD dot org >
* Introduce two new ioctl(2) commands, BIOCLOCK and BIOCSETWF. These commands | csjp | 2005-08-22 | 1 | -23/+81
| enhance the security of bpf(4) by further relinquishing the privilege
| of the bpf(4) consumer (assuming the ioctl commands are being
| implemented).
|
| Once BIOCLOCK is executed, the device becomes locked which prevents
| the execution of ioctl(2) commands which can change the underlying
| parameters of the bpf(4) device. An example might be the setting of
| bpf(4) filter programs or attaching to different network interfaces.
|
| BIOCSETWF can be used to set write filters for outgoing packets.
|
| Currently if a bpf(4) consumer is compromised, the bpf(4) descriptor
| can essentially be used as a raw socket, regardless of consumer's UID.
| Write filters give users the ability to constrain which packets can be
| sent through the bpf(4) descriptor.
|
| These features are currently implemented by a couple programs which
| came from OpenBSD, such as the new dhclient and pflogd.
|
| -Modify bpf_setf(9) to accept a "cmd" parameter. This will be used to
|  specify whether a read or write filter is to be set.
| -Add a bpf(4) filter program as a parameter to bpf_movein(9) as we
|  will run the filter program on the mbuf data once we move the packet
|  in from user-space.
| -Rather than execute two uiomove operations, (one for the link header
|  and the other for the packet data), execute one and manually copy the
|  link header into the sockaddr structure via bcopy.
| -Restructure bpf_setf to compensate for write filters, as well as
|  read.
| -Adjust bpf(4) stats structures to include a bd_locked member.
|
| It should be noted that the FreeBSD and OpenBSD implementations differ
| a bit in the sense that we unconditionally enforce the lock, where
| OpenBSD enforces it only if the calling credential is not root.
|
| Idea from: OpenBSD
| Reviewed by: mlaier
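The locking behaviour described above can be modelled as a gate at the top of the ioctl dispatcher. The sketch below is illustrative only: the command codes are hypothetical (the real BIOC* values come from the bpf header), which commands remain permitted after locking is an assumption made for the example, and returning EPERM for a rejected command is likewise an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command codes standing in for the real BIOC* ioctls. */
enum cmd { CMD_GET_STATS, CMD_GET_BLEN, CMD_SET_FILTER, CMD_SET_IF, CMD_LOCK };

struct desc {
	bool	locked;		/* set once by CMD_LOCK, never cleared */
	int	filter_id;	/* identifies the installed read filter */
	int	if_index;	/* identifies the attached interface */
};

/* Read-only queries (and re-locking) stay usable on a locked descriptor. */
static bool
cmd_allowed_when_locked(enum cmd c)
{
	return (c == CMD_GET_STATS || c == CMD_GET_BLEN || c == CMD_LOCK);
}

/* Returns 0 on success or an errno value; the check ignores who is calling. */
static int
desc_ioctl(struct desc *d, enum cmd c, int arg)
{
	assert(d != NULL);
	if (d->locked && !cmd_allowed_when_locked(c))
		return (EPERM);
	switch (c) {
	case CMD_LOCK:
		d->locked = true;
		break;
	case CMD_SET_FILTER:
		d->filter_id = arg;
		break;
	case CMD_SET_IF:
		d->if_index = arg;
		break;
	default:
		break;		/* queries change nothing in this model */
	}
	return (0);
}
```

A privilege-separated daemon would configure the interface and filters first, issue the lock command, and only then drop privileges, so that a later compromise cannot reconfigure the descriptor.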
* Add missing braces around bpf_filter which were missed when I | csjp | 2005-08-18 | 1 | -2/+4
| merged the bpfstat code.
|
| Pointed out by: iedowse
| Pointy hat to: csjp
| MFC after: 3 days
* Merge the dev_clone and dev_clone_cred event handlers into a single | rwatson | 2005-08-08 | 1 | -2/+3
| event handler, dev_clone, which accepts a credential argument.
| Implementors of the event can ignore it if they're not interested,
| and most do. This avoids having multiple event handler types and
| fall-back/precedence logic in devfs.
|
| This changes the kernel API for /dev cloning, and may affect third
| party packages containing cloning kernel modules.
|
| Requested by: phk
| MFC after: 3 days
* Rather than hold a mutex over calls to SYSCTL_OUT allocate a | csjp | 2005-07-26 | 1 | -12/+14
| temporary buffer then pass the array to user-space once we have
| dropped the lock.
|
| While we are here, drop an assertion which could result in a kernel
| panic under certain race conditions.
|
| Pointed out by: rwatson
* Introduce new sysctl variable: net.bpf.stats. This sysctl variable can | csjp | 2005-07-24 | 1 | -14/+91
| be used to pass statistics regarding dropped, matched and received
| packet counts from the kernel to user-space. While we are here
| introduce a new counter for filtered or matched packets. We currently
| keep track of packets received or dropped by the bpf device, but not
| how many packets actually matched the bpf filter.
|
| -Introduce net.bpf.stats sysctl OID
| -Move sysctl variables after the function prototypes so we can
|  reference bpf_stats_sysctl(9) without build errors.
| -Introduce bpf descriptor counter which is used mainly for sizing
|  of the xbpf_d array.
| -Introduce a xbpf_d structure which will act as an external
|  representation of the bpf_d structure.
| -Add the following members to the bpfd structure:
|
|    bd_fcount - Number of packets which matched bpf filter
|    bd_pid    - PID which opened the bpf device
|    bd_pcomm  - Process name which opened the device.
|
| It should be noted that it's possible that the process which opened
| the device could be long gone at the time of stats collection. An
| example might be a process that opens the bpf device forks then exits
| leaving the child process with the bpf fd.
|
| Reviewed by: mdodd