summaryrefslogtreecommitdiffstats
path: root/sys/netinet/ip_output.c
Commit message (Collapse)AuthorAgeFilesLines
* Avoid dereferencing a null pointer in ro_rt.luigi2002-07-121-2/+3
| | | | | | | | | This was always broken in HEAD (the offending statement was introduced in rev. 1.123 for HEAD, while RELENG_4 included this fix (in rev. 1.99.2.12 for RELENG_4) and I inadvertently deleted it in 1.99.2.30. So I am also restoring these two lines in RELENG_4 now. We might need another few things from 1.99.2.30.
* Warning fixes for 64 bits platforms. With this last fix,mux2002-06-271-1/+1
| | | | | | I can build a GENERIC sparc64 kernel with -Werror. Reviewed by: luigi
* At long last, commit the zero copy sockets code.ken2002-06-261-2/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes. ti.4: Update the ti(4) man page to include information on the TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include information about the new character device interface and the associated ioctls. man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links. jumbo.9: New man page describing the jumbo buffer allocator interface and operation. zero_copy.9: New man page describing the general characteristics of the zero copy send and receive code, and what an application author should do to take advantage of the zero copy functionality. NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS, TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT. conf/files: Add uipc_jumbo.c and uipc_cow.c. conf/options: Add the 5 options mentioned above. kern_subr.c: Receive side zero copy implementation. This takes "disposable" pages attached to an mbuf, gives them to a user process, and then recycles the user's page. This is only active when ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl variable is set to 1. uipc_cow.c: Send side zero copy functions. Takes a page written by the user and maps it copy on write and assigns it kernel virtual address space. Removes copy on write mapping once the buffer has been freed by the network stack. uipc_jumbo.c: Jumbo disposable page allocator code. This allocates (optionally) disposable pages for network drivers that want to give the user the option of doing zero copy receive. uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to sosend() -- pages get mapped into the kernel instead of getting copied if they meet size and alignment restrictions. uipc_syscalls.c:Un-staticize some of the sf* functions so that they can be used elsewhere. (uipc_cow.c) if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails. The ti(4) driver and the wi(4) driver, at least, call this with a mutex held. This causes witness warnings for 'ifconfig -a' with a wi(4) or ti(4) board in the system. (I've only verified for ti(4)). ip_output.c: Fragment large datagrams so that each segment contains a multiple of PAGE_SIZE amount of data plus headers. This allows the receiver to potentially do page flipping on receives. if_ti.c: Add zero copy receive support to the ti(4) driver. If TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer allocator for jumbo receive buffers. Add a new character device interface for the ti(4) driver for the new debugging interface. This allows (a patched version of) gdb to talk to the Tigon board and debug the firmware. There are also a few additional debugging ioctls available through this interface. Add header splitting support to the ti(4) driver. Tweak some of the default interrupt coalescing parameters to more useful defaults. Add hooks for supporting transmit flow control, but leave it turned off with a comment describing why it is turned off. if_tireg.h: Change the firmware rev to 12.4.11, since we're really at 12.4.11 plus fixes from 12.4.13. Add defines needed for debugging. Remove the ti_stats structure, it is now defined in sys/tiio.h. ti_fw.h: 12.4.11 firmware. ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation properly. (This firmware is the same as what was in the tree previously, with the addition of header splitting support.) sys/jumbo.h: Jumbo buffer allocator interface. sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that the payload buffer can be thrown away / flipped to a userland process. socketvar.h: Add prototype for socow_setup. tiio.h: ioctl interface to the character portion of the ti(4) driver, plus associated structure/type definitions. uio.h: Change prototype for uiomoveco() so that we'll know whether the source page is disposable. ufs_readwrite.c:Update for new prototype of uiomoveco(). vm_fault.c: In vm_fault(), check to see whether we need to do a page based copy on write fault. vm_object.c: Add a new function, vm_object_allocate_wait(). This does the same thing that vm_object allocate does, except that it gives the caller the opportunity to specify whether it should wait on the uma_zalloc() of the object structre. This allows vm objects to be allocated while holding a mutex. (Without generating WITNESS warnings.) vm_object_allocate() is implemented as a call to vm_object_allocate_wait() with the malloc flag set to M_WAITOK. vm_object.h: Add prototype for vm_object_allocate_wait(). vm_page.c: Add page-based copy on write setup, clear and fault routines. vm_page.h: Add page based COW function prototypes and variable in the vm_page structure. Many thanks to Drew Gallatin, who wrote the zero copy send and receive code, and to all the other folks who have tested and reviewed this code over the years.
* fix bad indentation and whitespace resulting from cut&pasteluigi2002-06-231-19/+19
|
* Remove (almost all) global variables that were used to holdluigi2002-06-221-93/+125
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | packet forwarding state ("annotations") during ip processing. The code is considerably cleaner now. The variables removed by this change are: ip_divert_cookie used by divert sockets ip_fw_fwd_addr used for transparent ip redirection last_pkt used by dynamic pipes in dummynet Removal of the first two has been done by carrying the annotations into volatile structs prepended to the mbuf chains, and adding appropriate code to add/remove annotations in the routines which make use of them, i.e. ip_input(), ip_output(), tcp_input(), bdg_forward(), ether_demux(), ether_output_frame(), div_output(). On passing, remove a bug in divert handling of fragmented packet. Now it is the fragment at offset 0 which sets the divert status of the whole packet, whereas formerly it was the last incoming fragment to decide. Removal of last_pkt required a change in the interface of ip_fw_chk() and dummynet_io(). On passing, use the same mechanism for dummynet annotations and for divert/forward annotations. option IPFIREWALL_FORWARD is effectively useless, the code to implement it is very small and is now in by default to avoid the obfuscation of conditionally compiled code. NOTES: * there is at least one global variable left, sro_fwd, in ip_output(). I am not sure if/how this can be removed. * I have deliberately avoided gratuitous style changes in this commit to avoid cluttering the diffs. Minor stule cleanup will likely be necessary * this commit only focused on the IP layer. I am sure there is a number of global variables used in the TCP and maybe UDP stack. * despite the number of files touched, there are absolutely no API's or data structures changed by this commit (except the interfaces of ip_fw_chk() and dummynet_io(), which are internal anyways), so an MFC is quite safe and unintrusive (and desirable, given the improved readability of the code). MFC after: 10 days
* - Change the newly turned INVARIANTS #ifdef blocks (they were changed fromarr2002-05-211-15/+11
| | | | | DIAGNOSTIC yesterday) into KASSERT()'s as these help to increase code readability.
* - Turn a few DIAGNOSTIC into INVARIANTS since they are really sanityarr2002-05-201-3/+3
| | | | checks.
* Cleanup the interface to ip_fw_chk, two of the input argumentsluigi2002-05-091-14/+3
| | | | | | | | | | | | | | | were totally useless and have been removed. ip_input.c, ip_output.c: Properly initialize the "ip" pointer in case the firewall does an m_pullup() on the packet. Remove some debugging code forgotten long ago. ip_fw.[ch], bridge.c: Prepare the grounds for matching MAC header fields in bridged packets, so we can have 'etherfw' functionality without a lot of kernel and userland bloat.
* Change the suser() API to take advantage of td_ucred as well as do ajhb2002-04-011-1/+1
| | | | | | | | | | | | general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
* Prevent icmp_reflect() from calling ip_output() with a NULL routeru2002-03-221-6/+4
| | | | | | | | | | | | | | pointer which will then result in the allocated route's reference count never being decremented. Just flood ping the localhost and watch refcnt of the 127.0.0.1 route with netstat(1). Submitted by: jayanth Back out ip_output.c,v 1.143 and ip_mroute.c,v 1.69 that allowed ip_output() to be called with a NULL route pointer. The previous paragraph shows why this was a bad idea in the first place. MFC after: 0 days
* Remove __P.alfred2002-03-191-7/+7
|
* o Move NTOHL() and associated macros into <sys/param.h>. These aremike2002-02-181-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | deprecated in favor of the POSIX-defined lowercase variants. o Change all occurrences of NTOHL() and associated marcros in the source tree to use the lowercase function variants. o Add missing license bits to sparc64's <machine/endian.h>. Approved by: jake o Clean up <machine/endian.h> files. o Remove unused __uint16_swap_uint32() from i386's <machine/endian.h>. o Remove prototypes for non-existent bswapXX() functions. o Include <machine/endian.h> in <arpa/inet.h> to define the POSIX-required ntohl() family of functions. o Do similar things to expose the ntohl() family in libstand, <netinet/in.h>, and <sys/param.h>. o Prepend underscores to the ntohl() family to help deal with complexities associated with having MD (asm and inline) versions, and having to prevent exposure of these functions in other headers that happen to make use of endian-specific defines. o Create weak aliases to the canonical function name to help deal with third-party software forgetting to include an appropriate header. o Remove some now unneeded pollution from <sys/types.h>. o Add missing <arpa/inet.h> includes in userland. Tested on: alpha, i386 Reviewed by: bde, jake, tmm
* Moved the 127/8 check below so that IPF redirects have a chance of working.ru2002-02-151-10/+10
| | | | MFC after: 1 day
* - Check the address family of the destination cached in a PCB.ume2002-01-211-1/+5
| | | | | | | | - Clear the cached destination before getting another cached route. Otherwise, garbage in the padding space (which might be filled in if it was used for IPv4) could annoy rtalloc. Obtained from: KAME
* RFC1122 requires that addresses of the form { 127, <any> } MUST NOTru2002-01-211-1/+11
| | | | | | | | appear outside a host. PR: 30792, 33996 Obtained from: ip_input.c MFC after: 1 week
* Pre-calculate the checksum for multicast packets sourced on afenner2002-01-051-0/+12
| | | | | | multicast router. This is overkill; it should be possible to delay to hardware interfaces and only pre-calculate when forwarding to a tunnel.
* Fix ipfw fwd so that it acts as the docs sayjulian2001-12-281-5/+11
| | | | | | | when forwarding an incoming packet to another machine. Obtained from: Vicor Production tree MFC after: 3 weeks
* Don't try to free a NULL route when doing IPFIREWALL_FORWARD.yar2001-12-191-1/+2
| | | | | | | An old route will be NULL at that point if a packet were initially routed to an interface (using the IP_ROUTETOIF flag.) Submitted by: Igor Timkin <ivt@gamma.ru>
* whitespace and style fixes recovered from -stable.jlemon2001-12-141-33/+35
|
* Allow for ip_output() to be called with a NULL route pointer.ru2001-12-011-4/+6
| | | | This fixes a panic I introduced yesterday in ip_icmp.c,v 1.64.
* MFS: sync the ipfw/dummynet/bridge code with the one recently mergedluigi2001-11-041-3/+3
| | | | into stable (mostly , but not only, formatting and comments changes).
* Fix a (long standing?) bug in ip_output(): if ip_insertoptions() iswpaul2001-10-301-1/+1
| | | | | | | | | | called and ip_output() encounters an error and bails (i.e. host unreachable), we will leak an mbuf. This is because the code calls m_freem(m0) after jumping to the bad: label at the end of the function, when it should be calling m_freem(m). (m0 is the original mbuf list _without_ the options mbuf prepended.) Obtained from: NetBSD
* When dropping a packet because there is no room in the queue (which itselfjlemon2001-10-301-0/+1
| | | | | | is somewhat bogus), update the statistics to indicate something was dropped. PR: 13740
* Make it so dummynet and bridge can be loaded as modules.ps2001-10-051-9/+2
| | | | Submitted by: billf
* Add a hash table that contains the list of internet addresses, and usejlemon2001-09-291-1/+2
| | | | | this in place of the in_ifaddr list when appropriate. This improves performance on hosts which have a large number of IP aliases.
* Centralize satosin(), sintosa() and ifatoia() macros in <netinet/in.h>jlemon2001-09-291-3/+1
| | | | Remove local definitions.
* Two main changes here:luigi2001-09-271-2/+2
| | | | | | | | | | | | | | | + implement "limit" rules, which permit to limit the number of sessions between certain host pairs (according to masks). These are a special type of stateful rules, which might be of interest in some cases. See the ipfw manpage for details. + merge the list pointers and ipfw rule descriptors in the kernel, so the code is smaller, faster and more readable. This patch basically consists in replacing "foo->rule->bar" with "rule->bar" all over the place. I have been willing to do this for ages! MFC after: 1 week
* Make faith loadable, unloadable, and clonable.brooks2001-09-251-10/+0
|
* KSE Milestone 2julian2001-09-121-3/+3
| | | | | | | | | | | | | | Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
* Wrap array accesses in macros, which also happen to be lvalues:jlemon2001-09-061-1/+1
| | | | | | | ifnet_addrs[i - 1] -> ifaddr_byindex(i) ifindex2ifnet[i] -> ifnet_byindex(i) This is intended to ease the conversion to SMPng.
* MFS: Avoid dropping fragments in the absence of an interface address.dcs2001-08-031-3/+5
| | | | | | Noticed by: fenner Submitted by: iedowse Not committed to current by: iedowse ;-)
* Avoid a NULL pointer derefence introduced in rev. 1.129.ru2001-07-231-24/+21
| | | | | | Problem noticed by: bde, gcc(1) Panic caught by: mjacob Patch tested by: mjacob
* Backout non-functional changes from revision 1.128.ru2001-07-191-13/+9
| | | | Not objected to by: dcs
* Skip the route checking in the case of multicast packets with knowndcs2001-07-171-9/+22
| | | | | | | interfaces. Reviewed by: people at that channel Approved by: silence on -net
* Sync with recent KAME.ume2001-06-111-134/+179
| | | | | | | | | | | | | | | | | | This work was based on kame-20010528-freebsd43-snap.tgz and some critical problem after the snap was out were fixed. There are many many changes since last KAME merge. TODO: - The definitions of SADB_* in sys/net/pfkeyv2.h are still different from RFC2407/IANA assignment because of binary compatibility issue. It should be fixed under 5-CURRENT. - ip6po_m member of struct ip6_pktopts is no longer used. But, it is still there because of binary compatibility issue. It should be removed under 5-CURRENT. Reviewed by: itojun Obtained from: KAME MFC after: 3 weeks
* Add ``options RANDOM_IP_ID'' which randomizes the ID field of IP packets.kris2001-06-011-0/+5
| | | | | | | | | This closes a minor information leak which allows a remote observer to determine the rate at which the machine is generating packets, since the default behaviour is to increment a counter for each packet sent. Reviewed by: -net Obtained from: OpenBSD
* RFC768 (UDP) requires that "if the computed checksum is zero, itru2001-03-131-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | is transmitted as all ones". This got broken after introduction of delayed checksums as follows. Some guys (including Jonathan) think that it is allowed to transmit all ones in place of a zero checksum for TCP the same way as for UDP. (The discussion still takes place on -net.) Thus, the 0 -> 0xffff checksum fixup was first moved from udp_output() (see udp_usrreq.c, 1.64 -> 1.65) to in_cksum_skip() (see sys/i386/i386/in_cksum.c, 1.17 -> 1.18, INVERT expression). Besides that I disagree that it is valid for TCP, there was no real problem until in_cksum.c,v 1.20, where the in_cksum() was made just a special version of in_cksum_skip(). The side effect was that now every incoming IP datagram failed to pass the checksum test (in_cksum() returned 0xffff when it should actually return zero). It was fixed next day in revision 1.21, by removing the INVERT expression. The latter also broke the 0 -> 0xffff fixup for UDP checksums. Before this change: : tcpdump: listening on lo0 : 127.0.0.1.33005 > 127.0.0.1.33006: udp 0 (ttl 64, id 1) : 4500 001c 0001 0000 4011 7cce 7f00 0001 : 7f00 0001 80ed 80ee 0008 0000 After this change: : tcpdump: listening on lo0 : 127.0.0.1.33005 > 127.0.0.1.33006: udp 0 (ttl 64, id 1) : 4500 001c 0001 0000 4011 7cce 7f00 0001 : 7f00 0001 80ed 80ee 0008 ffff
* In ip_output(), initialise `ia' in the case where the packet hasiedowse2001-03-111-0/+1
| | | | | | | | | come from a dummynet pipe. Without this, the code which increments the per-ifaddr stats can dereference an uninitialised pointer. This should make dummynet usable again. Reported by: "Dmitry A. Yanko" <fm@astral.ntu-kpi.kiev.ua> Reviewed by: luigi, joe
* Remove conditionals for vax support.asmodai2001-02-261-5/+0
| | | | | | | People who care much about this are welcomed to try 2.11BSD. :) Noticed by: luigi Reviewed by: jesper
* Another round of the <sys/queue.h> FOREACH transmogriffer.phk2001-02-041-4/+2
| | | | | Created with: sed(1) Reviewed by: md5(1)
* Mechanical change to use <sys/queue.h> macro API instead ofphk2001-02-041-3/+3
| | | | | | | fondling implementation details. Created with: sed(1) Reviewed by: md5(1)
* MFS: bridge/ipfw/dummynet fixes (bridge.c will be committed separately)luigi2001-02-021-4/+16
|
* Pass up errors returned by dummynet. The same should be done withluigi2001-01-251-3/+3
| | | | divert.
* * Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.bmilekic2000-12-211-1/+1
| | | | | | | | | | | | | | | | | | This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
* It's no longer true that "nobody uses ia beyond here"; it's nowjoe2000-11-011-1/+1
| | | | | | used to keep address based if_data statistics in. Submitted by: ru
* Move suser() and suser_xxx() prototypes and a related #define fromphk2000-10-291-1/+0
| | | | | | | | | <sys/proc.h> to <sys/systm.h>. Correctly document the #includes needed in the manpage. Add one now needed #include of <sys/systm.h>. Remove the consequent 48 unused #includes of <sys/proc.h>.
* Count per-address statistics for IP fragments.joe2000-10-291-2/+6
| | | | | Requested by: ru Obtained from: BSD/OS
* Save a few CPU cycles in IP fragmentation code.ru2000-10-201-3/+1
|
* Augment the 'ifaddr' structure with a 'struct if_data' to keepjoe2000-10-191-0/+7
| | | | | | | | | | | statistics on a per network address basis. Teach the IPv4 and IPv6 input/output routines to log packets/bytes against the network address connected to the flow. Teach netstat to display the per-address stats for IP protocols when 'netstat -i' is evoked, instead of displaying the per-interface stats.
* Follow BSD/OS and NetBSD, keep the ip_id field in network order all the time.ru2000-09-141-9/+1
| | | | Requested by: wollman
OpenPOWER on IntegriCloud