path: root/sys/netinet
Commit message | Author | Age | Files | Lines
...
* Commit pf version 3.5 and link additional files to the kernel build.mlaier2004-06-161-0/+14
|
| Version 3.5 brings:
|  - Atomic commits of ruleset changes (reduce the chance of ending up in an
|    inconsistent state).
|  - A 30% reduction in the size of state table entries.
|  - Source-tracking (limit number of clients and states per client).
|  - Sticky-address (the flexibility of round-robin with the benefits of
|    source-hash).
|  - Significant improvements to interface handling.
|  - and many more ...
* Prepare for pf 3.5 import:mlaier2004-06-161-0/+2
|
| - Remove pflog and pfsync modules. Things will change in such a fashion that
|   there will be one module with pf+pflog that can be loaded into GENERIC
|   without problems (which is what most people want). pfsync is no longer
|   possible as a module.
| - Add multicast address for in-kernel multicast pfsync protocol. Protocol
|   glue will follow once the import is done.
| - Add one more mbuf tag
* o connect(2): if there is no route to the destinationmaxim2004-06-161-3/+1
| | | | | | | | do not pick up the first local ip address for the source ip address, return ENETUNREACH instead. Submitted by: Gleb Smirnoff Reviewed by: -current (silence)
* Fix build for IPSEC && !INET6bms2004-06-162-6/+12
| | | | | PR: kern/66125 Submitted by: Cyrille Lefevre
* Reverse a patch which has no effect on -CURRENT and should probably bebms2004-06-161-7/+1
| | | | | | | applied directly to -STABLE. Noticed by: iedowse Pointy hat to: bms
* In ip_forward(), when calculating the MTU in effect for an IPSEC transportbms2004-06-161-0/+2
| | | | | | | | mode tunnel, take the per-route MTU into account, *if* and *only if* it is non-zero (as found in struct rt_metrics/rt_metrics_lite). PR: kern/42727 Obtained from: NetBSD (ip_input.c rev 1.151)
* In ip_forward(), set m->m_pkthdr.len correctly such that the mbuf chainbms2004-06-161-0/+1
| | | | | | | is sane, and ipsec4_getpolicybyaddr() will therefore complete. PR: kern/42727 Obtained from: KAME (kame/freebsd4/sys/netinet/ip_input.c rev 1.42)
* Disconnect a temporarily-connected UDP socket in out-of-mbufs case. Thisbms2004-06-161-1/+7
|
| fixes the problem of UDP sockets getting wedged in a connected state (and
| bound to their destination) under heavy load.
|
| Temporary bind/connect should probably be deleted in future as an
| optimization, as described in "A Faster UDP" [Partridge/Pink 1993].
|
| Notes:
|  - INP_LOCK() is already held in udp_output(). The connection is in effect
|    happening at a layer lower than the socket layer, therefore in theory
|    socket locking should not be needed.
|  - Inlining the in_pcbdisconnect() operation buys us nothing (in the case
|    of the current state of the code), as laddr is not part of the
|    inpcb hash or the udbinfo hash. Therefore there should be no need
|    to rehash after restoring laddr in the error case (this was a
|    concern of the original author of the patch).
|
| PR: kern/41765
| Requested by: gnn
| Submitted by: Jinmei Tatuya (with cleanups)
| Tested by: spray(8)
* Convert GIANT_REQUIRED to NET_ASSERT_GIANT for socket access.rwatson2004-06-161-1/+1
|
* Grab the socket buffer send or receive mutex when performing arwatson2004-06-152-2/+8
| | | | | | read-modify-write on the sb_state field. This commit catches only the "easy" ones where it doesn't interact with as yet unmerged locking.
* The socket field so_state is used to hold a variety of socket relatedrwatson2004-06-144-15/+15
|
| flags relating to several aspects of socket functionality. This change
| breaks out several bits relating to send and receive operation into a
| new per-socket buffer field, sb_state, in order to facilitate locking.
| This is required because, in order to provide more granular locking of
| sockets, different state fields have different locking properties. The
| following fields are moved to sb_state:
|
|   SS_CANTRCVMORE   (so_state)
|   SS_CANTSENDMORE  (so_state)
|   SS_RCVATMARK     (so_state)
|
| Rename respectively to:
|
|   SBS_CANTRCVMORE  (so_rcv.sb_state)
|   SBS_CANTSENDMORE (so_snd.sb_state)
|   SBS_RCVATMARK    (so_rcv.sb_state)
|
| This facilitates locking by isolating fields to be located with other
| identically locked fields, and permits greater granularity in socket
| locking by avoiding storing fields with different locking semantics in
| the same short (avoiding locking conflicts). In the future, we may
| wish to coalesce sb_state and sb_flags; for the time being I leave
| them separate and there is no additional memory overhead due to the
| packing/alignment of shorts in the socket buffer structure.
* Link ALTQ to the build and break with ABI for struct ifnet. Please recompilemlaier2004-06-132-0/+13
| | | | | | | | | | | | your (network) modules as well as any userland that might make sense of sizeof(struct ifnet). This does not change the queueing yet. These changes will follow in a separate commit. Same with the driver changes, which need case by case evaluation. __FreeBSD_version bump will follow. Tested-by: (i386)LINT
* Add a new driver to support IP over firewire. This driver is intended todfr2004-06-131-1/+2
| | | | | | | | conform to the rfc2734 and rfc3146 standard for IP over firewire and should eventually supersede the fwe driver. Right now the broadcast channel number is hardwired and we don't support MCAP for multicast channel allocation - more infrastructure is required in the firewire code itself to fix these problems.
* Socket MAC labels so_label and so_peerlabel are now protected byrwatson2004-06-135-1/+12
|
| SOCK_LOCK(so):
|
| - Hold socket lock over calls to MAC entry points reading or
|   manipulating socket labels.
| - Assert socket lock in MAC entry point implementations.
| - When externalizing the socket label, first make a thread-local
|   copy while holding the socket lock, then release the socket lock
|   to externalize to userspace.
* Extend coverage of SOCK_LOCK(so) to include so_count, the socketrwatson2004-06-123-4/+7
|
| reference count:
|
| - Assert SOCK_LOCK(so) macros that directly manipulate so_count:
|   soref(), sorele().
| - Assert SOCK_LOCK(so) in macros/functions that rely on the state of
|   so_count: sofree(), sotryfree().
| - Acquire SOCK_LOCK(so) before calling these functions or macros in
|   various contexts in the stack, both at the socket and protocol
|   layers.
| - In some cases, perform soisdisconnected() before sotryfree(), as
|   this could result in frobbing of a non-present socket if
|   sotryfree() actually frees the socket.
| - Note that sofree()/sotryfree() will release the socket lock even if
|   they don't free the socket.
|
| Submitted by: sam
| Sponsored by: FreeBSD Foundation
| Obtained from: BSD/OS
* Modify ip fw so that whenever UID or GID constraints exist in acsjp2004-06-111-30/+77
| | | | | | | | | | | | | | | | | | | ruleset, the pcb is looked up once per ipfw_chk() activation. This is done by extracting the required information out of the PCB and caching it to the ipfw_chk() stack. This should greatly reduce PCB looking contention and speed up the processing of UID/GID based firewall rules (especially with large UID/GID rulesets). Some very basic benchmarks were taken which compare the number of in_pcblookup_hash(9) activations to the number of firewall rules containing UID/GID based constraints before and after this patch. The results can be viewed here: o http://people.freebsd.org/~csjp/ip_fw_pcb.png Reviewed by: andre, luigi, rwatson Approved by: bmilekic (mentor)
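The once-per-activation caching described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the committed ipfw code: `struct cred_cache`, `fake_pcb_lookup()`, `rule_uid_matches()` and the returned uid/gid values are hypothetical names and data invented for the example.

```c
#include <assert.h>

/* Hypothetical per-activation cache, imagined to live on the caller's stack. */
struct cred_cache {
    int looked_up;      /* non-zero once the lookup has been attempted */
    int found;          /* non-zero if a PCB was found */
    unsigned uid;
    unsigned gid;
};

/* Counts how often the (expensive) lookup runs; illustration only. */
int pcb_lookups;

/* Hypothetical stand-in for the PCB lookup; returns fixed credentials. */
int
fake_pcb_lookup(unsigned *uid, unsigned *gid)
{
    pcb_lookups++;
    *uid = 1001;
    *gid = 100;
    return 1;
}

/* Evaluate one UID rule, doing the lookup only on first use. */
int
rule_uid_matches(struct cred_cache *c, unsigned want_uid)
{
    assert(c != 0);
    if (!c->looked_up) {
        c->found = fake_pcb_lookup(&c->uid, &c->gid);
        c->looked_up = 1;
    }
    return c->found && c->uid == want_uid;
}
```

With a zero-initialized cache, any number of UID rules evaluated in one pass trigger a single lookup, which is the effect the commit message describes.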
* Remove unneeded Giant acquisition in divert_packet(), which isrwatson2004-06-111-15/+0
| | | | | | left over from debug.mpsafenet affecting only the forwarding plane. Giant is now acquired in the ithread/netisr or in the system call code.
* Lock down parallel router_info list for tracking multicast IGMPrwatson2004-06-111-1/+27
|
| versions of various routers seen:
|
| - Introduce igmp_mtx.
| - Protect global variable 'router_info_head' and list fields
|   in struct router_info with this mutex, as well as
|   igmp_timers_are_running.
| - find_rti() asserts that the caller acquires igmp_mtx.
| - Annotate a failure to check the return value of
|   MALLOC(..., M_NOWAIT).
* init_tables() must be run after sys/net/route.c:route_init().ru2004-06-101-1/+4
|
* Introduce a new feature to IPFW2: lookup tables. These are usefulru2004-06-094-1/+354
| | | | | | | for handling large sparse address sets. Initial implementation by Vsevolod Lobko <seva@ip.net.ua>, refined by me. MFC after: 1 week
* do not send icmp response if the original packet is encrypted.ume2004-06-071-0/+3
| | | | | Obtained from: KAME MFC after: 1 week
* Move the locking of the pcb into raw_output(). Organize code sobmilekic2004-06-031-10/+14
| | | | | | | | that m_prepend() is not called with possibility to wait while the pcb lock is held. What still needs revisiting is whether the ripcbinfo lock is really required here. Discussed with: rwatson
* add missing #include <sys/module.h>phk2004-05-303-0/+3
|
* Add some missing <sys/module.h> includes which are masked by thephk2004-05-301-0/+1
| | | | one on death-row in <sys/kernel.h>
* Add a super-user check to ipfw_ctl() to make sure that the callingcsjp2004-05-251-0/+4
| | | | | | | | | process is a non-prison root. The security.jail.allow_raw_sockets sysctl variable is disabled by default, however if the user enables raw sockets in prisons, prison-root should not be able to interact with firewall rule sets. Approved by: rwatson, bmilekic (mentor)
* When checking for possible port theft, skip over a TCP inpcbyar2004-05-201-7/+3
| | | | | | | | | | | | | | unless it's in the closed or listening state (remote address == INADDR_ANY). If a TCP inpcb is in any other state, it's impossible to steal its local port or use it for port theft. And if there are both closed/listening and connected TCP inpcbs on the same localIP:port couple, the call to in_pcblookup_local() will find the former due to the design of that function. No objections raised in: -net, -arch MFC after: 1 month
* o Calculate a number of bytes to copy (cnt) correctly:maxim2004-05-111-1/+1
|
| +----+-+-+-+-+----+----+- - - - - - - - - - - - -+----+
| |    | |C| | |    |    |                         |    |
| | IP |N|O|L|P|    | IP |                         | IP |
| | #1 |O|D|E|T|    | #2 |                         | #n |
| |    |P|E|N|R|    |    |                         |    |
| +----+-+-+-+-+----+----+- - - - - - - - - - - - -+----+
|              ^    ^<---- cnt - (IPOPT_MINOFF - 1) ---->|
|              |    |
| src          |    +-- cp[IPOPT_OFF + 1] + sizeof(struct in_addr)
|              |
| dst          +-- cp[IPOPT_OFF + 1]
|
| PR: kern/66386
| Submitted by: Andrei Iltchenko
| MFC after: 3 weeks
* o IFNAMSIZ does include the trailing \0.maxim2004-05-071-1/+1
| | | | | | Approved by: andre o Document net.inet.icmp.reply_src.
* Provide the sysctl net.inet.ip.process_options to control the processingandre2004-05-063-2/+24
|
| of IP options.
|
|  net.inet.ip.process_options=0  Ignore IP options and pass packets unmodified.
|  net.inet.ip.process_options=1  Process all IP options (default).
|  net.inet.ip.process_options=2  Reject all packets with IP options with ICMP
|                                 filter prohibited message.
|
| This sysctl affects packets destined for the local host as well as those
| only transiting through the host (routing).
|
| IP options do not have any legitimate purpose anymore and are only used
| to circumvent firewalls or to exploit certain behaviours or bugs in TCP/IP
| stacks.
|
| Reviewed by: sam (mentor)
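The three sysctl settings listed in this commit map naturally onto a small decision function. The sketch below only illustrates that mapping for a packet that carries IP options; the enum, the function name `ipopt_decide()`, and the choice to treat unknown values like the default are assumptions, not the kernel's implementation.

```c
#include <assert.h>

/* Hypothetical action names for a packet that carries IP options. */
enum opt_action {
    OPT_PASS_UNMODIFIED,    /* process_options=0: ignore the options */
    OPT_PROCESS,            /* process_options=1: process them (default) */
    OPT_REJECT              /* process_options=2: reject, ICMP filter prohibited */
};

/*
 * Map the value of net.inet.ip.process_options to an action.
 * Assumption: any value other than 0 or 2 behaves like the default (1).
 */
enum opt_action
ipopt_decide(int process_options)
{
    switch (process_options) {
    case 0:
        return OPT_PASS_UNMODIFIED;
    case 2:
        return OPT_REJECT;
    default:
        return OPT_PROCESS;
    }
}
```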
* Switch to using the inpcb MAC label instead of socket MAC label whenrwatson2004-05-046-8/+20
|
| labeling new mbufs created from sockets/inpcbs in IPv4. This helps avoid
| the need for socket layer locking in the lower level network paths
| where inpcb locks are already frequently held where needed. In
| particular:
|
| - Use the inpcb for label instead of socket in raw_append().
| - Use the inpcb for label instead of socket in tcp_output().
| - Use the inpcb for label instead of socket in tcp_respond().
| - Use the inpcb for label instead of socket in tcp_twrespond().
| - Use the inpcb for label instead of socket in syncache_respond().
|
| While here, modify tcp_respond() to avoid assigning NULL to a stack
| variable and centralize assertions about the inpcb when inp is
| assigned.
|
| Obtained from: TrustedBSD Project
| Sponsored by: DARPA, McAfee Research
* Assert inpcb lock in udp_append().rwatson2004-05-041-0/+2
| | | | | Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
* Assert the inpcb lock on 'last' in udp_append(), since it's alwaysrwatson2004-05-041-0/+2
| | | | | | | called with it, and also requires it. Obtained from: TrustedBSD Project Sponsored by: DARPA, McAfee Research
* o Fix misindentation in the previous commit.maxim2004-05-031-8/+7
|
* Back out a change that slipped into the previous commit for which otherandre2004-05-031-10/+2
| | | | | | supporting parts have not yet been committed. Remove premature IP options ignoring option.
* Optimize IP fastforwarding some more:andre2004-05-031-95/+114
|
| o New function ip_findroute() to reduce code duplication for the
|   route lookup cases. (luigi)
|
| o Store ip_len in host byte order on the stack instead of using
|   it via indirection from the mbuf. This allows deferring the host
|   byte conversion to a later point and makes a quicker fallback to
|   normal ip_input() processing. (luigi)
|
| o Check if route is dampened with RTF_REJECT flag and drop packet
|   already here when ARP is unable to resolve destination address.
|   An ICMP unreachable is sent to inform the sender.
|
| o Check if interface output queue is full and drop packet already
|   here. No ICMP notification is sent because signalling source
|   quench is deprecated.
|
| o Check if media_state is down (used for ethernet type interfaces)
|   and drop the packet already here. An ICMP unreachable is sent to
|   inform the sender.
|
| o Do not account sent packets to the interface address counters.
|   They are only for packets with that 'ia' as source address.
|
| o Update and clarify some comments.
|
| Submitted by: luigi (most of it)
* Rename m_claim_next_hop() to m_claim_next(), as suggested by Max Laier.darrenr2004-05-024-4/+4
|
* oops, I forgot this file in a prior commit (change was still sitting here,darrenr2004-05-022-2/+2
| | | | | | | | uncommitted): Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra arg (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work on mbufs and tags.
* Rename ip_claim_next_hop() to m_claim_next_hop(), give it an extra argdarrenr2004-05-023-18/+2
| | | | | (the type of tag to claim) and push it out of ip_var.h into mbuf.h alongside all of the other macros that work on mbufs and tags.
* Give jail(8) the feature to allow raw sockets from within abmilekic2004-04-261-2/+31
| | | | | | | | | | | | | | | | | | | | | jail, which is less restrictive but allows for more flexible jail usage (for those who are willing to make the sacrifice). The default is off, but allowing raw sockets within jails can now be accomplished by tuning security.jail.allow_raw_sockets to 1. Turning this on will allow you to use things like ping(8) or traceroute(8) from within a jail. The patch being committed is not identical to the patch in the PR. The committed version is more friendly to APIs which pjd is working on, so it should integrate into his work quite nicely. This change has also been presented and addressed on the freebsd-hackers mailing list. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/65800
* Tighten up reset handling in order to make reset attacks as difficult assilby2004-04-263-0/+21
|
| possible while maintaining compatibility with the widest range of TCP stacks.
|
| The algorithm is as follows:
|
| ---
| For connections in the ESTABLISHED state, only resets with
| sequence numbers exactly matching last_ack_sent will cause a reset,
| all other segments will be silently dropped.
|
| For connections in all other states, a reset anywhere in the window
| will cause the connection to be reset. All other segments will be
| silently dropped.
| ---
|
| The necessity of accepting all in-window resets was discovered
| by jayanth and jlemon, both of whom have seen TCP stacks that
| will respond to FIN-ACK packets with resets not meeting the
| strict last_ack_sent check.
|
| Idea by: Darren Reed
| Reviewed by: truckman, jlemon, others(?)
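The two-case reset rule in this commit can be written as a small predicate. This is a sketch under stated assumptions rather than the committed tcp_input() code: the function name, the two-value state enum, and the definition of "in the window" as the half-open range starting at last_ack_sent and spanning rcv_wnd bytes are all choices made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Only the distinction the commit message draws is modelled here. */
enum conn_state { CONN_ESTABLISHED, CONN_OTHER };

/*
 * Return 1 if a segment with the RST flag should reset the connection,
 * 0 if it should be silently dropped.
 */
int
rst_resets_connection(enum conn_state state, uint32_t seg_seq,
    uint32_t last_ack_sent, uint32_t rcv_wnd)
{
    /* Unsigned subtraction keeps the comparison correct across wraparound. */
    int in_window = (uint32_t)(seg_seq - last_ack_sent) < rcv_wnd;

    if (state == CONN_ESTABLISHED)
        return seg_seq == last_ack_sent;    /* exact match only */
    return in_window;                       /* any in-window RST */
}
```

The stricter ESTABLISHED case means a blind attacker has to guess one exact sequence number rather than any value inside the receive window.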
* Another small set of changes to reduce diffs with the new arp code.luigi2004-04-251-31/+18
|
* remove a stale comment on the behaviour of arpresolveluigi2004-04-251-10/+0
|
* Start the arp timer at init time.luigi2004-04-251-10/+1
| | | | It runs so rarely that it makes no sense to wait until the first request.
* This commit does two things:luigi2004-04-251-7/+28
|
| 1. rt_check() cleanup:
|    rt_check() is only necessary for some address families to gain access
|    to the corresponding arp entry, so call it only in/near the *resolve()
|    routines where it is actually used -- at the moment this is
|    arpresolve(), nd6_storelladdr() (the call is embedded here),
|    and atmresolve() (the call is just before atmresolve to reduce
|    the number of changes).
|    This change will make it a lot easier to decouple the arp table
|    from the routing table.
|
|    There is an extra call to rt_check() in if_iso88025subr.c to
|    determine the routing info length. I have left it alone for
|    the time being.
|
|    The interface of arpresolve() and nd6_storelladdr() now changes slightly:
|     + the 'rtentry' parameter (really a hint from the upper level layer)
|       is now passed unchanged from *_output(), so it becomes the route
|       to the final destination and not to the gateway.
|     + the routines will return 0 if resolution is possible, non-zero
|       otherwise.
|     + arpresolve() returns EWOULDBLOCK in case the mbuf is being held
|       waiting for an arp reply -- in this case the error code is masked
|       in the caller so the upper layer protocol will not see a failure.
|
| 2. arpcom untangling
|    Where possible, use 'struct ifnet' instead of 'struct arpcom' variables,
|    and use the IFP2AC macro to access arpcom fields.
|    This mostly affects the netatalk code.
|
| === Detailed changes: ===
| net/if_arcsubr.c       rt_check() cleanup, remove a useless variable
| net/if_atmsubr.c       rt_check() cleanup
| net/if_ethersubr.c     rt_check() cleanup, arpcom untangling
| net/if_fddisubr.c      rt_check() cleanup, arpcom untangling
| net/if_iso88025subr.c  rt_check() cleanup
| netatalk/aarp.c        arpcom untangling, remove a block of duplicated code
| netatalk/at_extern.h   arpcom untangling
| netinet/if_ether.c     rt_check() cleanup (change arpresolve)
| netinet6/nd6.c         rt_check() cleanup (change nd6_storelladdr)
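The new return convention for the *resolve() routines (0 when resolution is possible, EWOULDBLOCK when the mbuf is held, which the caller masks) can be sketched like this. The helper name `finish_resolve()` and its `send_now` out-parameter are hypothetical; the real callers are the *_output() routines, whose code is not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/*
 * Interpret the result of a resolve step on behalf of an output routine.
 * Sets *send_now to 1 only when the link-layer address was resolved.
 * Returns the error the upper layer should see: EWOULDBLOCK is masked to 0
 * because the packet is merely queued until the reply arrives.
 */
int
finish_resolve(int resolve_error, int *send_now)
{
    assert(send_now != 0);
    *send_now = (resolve_error == 0);
    if (resolve_error == EWOULDBLOCK)
        return 0;               /* held for later, not a failure */
    return resolve_error;       /* 0 on success, other errors pass through */
}
```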
* Wrap two long lines in the previous commit.silby2004-04-231-2/+4
|
* Correct an edge case in tcp_mss() where the cached path MTUandre2004-04-232-4/+4
| | | | | | | | | | | | from tcp_hostcache would have overridden a (now) lower MTU of an interface or route that changed since first PMTU discovery. The bug would have caused TCP to redo the PMTU discovery when not strictly necessary. Make a comment about already pre-initialized default values more clear. Reviewed by: sam
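The edge case in this commit comes down to never letting a cached path MTU exceed the current MTU of the outgoing interface or route. The sketch below assumes that "0" means "no cached value" and uses a hypothetical helper name; it is not the tcp_mss() code itself.

```c
#include <assert.h>

/*
 * Pick the MTU to base the MSS on.
 * cached_mtu: value remembered from earlier path MTU discovery, 0 if none
 *             (assumption for this sketch).
 * link_mtu:   current MTU of the outgoing interface or route, non-zero.
 */
unsigned long
effective_path_mtu(unsigned long cached_mtu, unsigned long link_mtu)
{
    assert(link_mtu != 0);
    if (cached_mtu == 0)
        return link_mtu;
    /* A stale, larger cached value must not override a lower link MTU. */
    return cached_mtu < link_mtu ? cached_mtu : link_mtu;
}
```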
* Add the option versrcreach to verify that a valid route to theandre2004-04-232-7/+32
|
| source address of a packet exists in the routing table. The
| default route is ignored because it would match everything and
| render the check pointless.
|
| This option is very useful for routers with a complete view of
| the Internet (BGP) in the routing table to reject packets with
| spoofed or unrouteable source addresses.
|
| Example:
|
|  ipfw add 1000 deny ip from any to any not versrcreach
|
| also known in Cisco-speak as:
|
|  ip verify unicast source reachable-via any
|
| Reviewed by: luigi
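The check behind versrcreach can be illustrated with a toy routing table: a source address counts as reachable only if some route other than the default route covers it. The flat array, `struct route_entry`, and `src_reachable()` are assumptions made for this sketch; the kernel consults its real routing table instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy route: IPv4 prefix in host byte order plus prefix length (0..32). */
struct route_entry {
    uint32_t prefix;
    int prefix_len;
};

/* Return 1 if a non-default route covers src, 0 otherwise. */
int
src_reachable(const struct route_entry *tbl, size_t n, uint32_t src)
{
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t mask;

        assert(tbl[i].prefix_len >= 0 && tbl[i].prefix_len <= 32);
        if (tbl[i].prefix_len == 0)
            continue;   /* default route matches everything: ignore it */
        mask = (uint32_t)(0xFFFFFFFFu << (32 - tbl[i].prefix_len));
        if ((src & mask) == (tbl[i].prefix & mask))
            return 1;
    }
    return 0;
}
```

With a table holding a default route, 10.0.0.0/8 and 192.168.1.0/24, a packet from 10.1.2.3 passes the check while one from 8.8.8.8 does not, since only the ignored default route would match it.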
* Fix a potential race when purging expired hostcache entries.andre2004-04-231-3/+3
| | | | Spotted by: luigi
* Take out an unneeded variable I forgot to remove in the last commit,silby2004-04-221-2/+3
| | | | and make two small whitespace fixes so that diffs vs rev 1.142 are minimal.
* Simplify random port allocation, and add net.inet.ip.portrange.randomized,silby2004-04-221-27/+13
| | | | | | which can be used to turn off randomized port allocation if so desired. Requested by: alfred