summaryrefslogtreecommitdiffstats
path: root/sys/netipx
Commit message (Collapse)AuthorAgeFilesLines
* Use queue(9) instead of hand-crafted link lists for the global IPXrwatson2009-06-245-51/+53
| | | | | | | address list (ipx_ifaddr -> ipx_ifaddrhead), and generally adopt the naming and usage conventions found in netinet. MFC after: 6 weeks
* Rework locking and reference counting in ipx_control to be consistent withrwatson2009-06-241-58/+60
| | | | | | the model used in in_control(). MFC after: 6 weeks
* Modify most routines returning 'struct ifaddr *' to return referencesrwatson2009-06-231-7/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rather than pointers, requiring callers to properly dispose of those references. The following routines now return references: ifaddr_byindex ifa_ifwithaddr ifa_ifwithbroadaddr ifa_ifwithdstaddr ifa_ifwithnet ifaof_ifpforaddr ifa_ifwithroute ifa_ifwithroute_fib rt_getifa rt_getifa_fib IFP_TO_IA ip_rtaddr in6_ifawithifp in6ifa_ifpforlinklocal in6ifa_ifpwithaddr in6_ifadd carp_iamatch6 ip6_getdstifaddr Remove unused macro which didn't have required referencing: IFP_TO_IA6 This closes many small races in which changes to interface or address lists while an ifaddr was in use could lead to use of freed memory (etc). In a few cases, add missing if_addr_list locking required to safely acquire references. Because of a lack of deep copying support, we accept a race in which an in6_ifaddr pointed to by mbuf tags and extracted with ip6_getdstifaddr() doesn't hold a reference while in transmit. Once we have mbuf tag deep copy support, this can be fixed. Reviewed by: bz Obtained from: Apple, Inc. (portions) MFC after: 6 weeks (portions)
* Include sys/lock.h before sys/rwlock.h. If anything used to bring it for uscognet2009-06-231-0/+1
| | | | before, it does not anymore.
* Add a new function, ifa_ifwithaddr_check(), which rather than returningrwatson2009-06-221-1/+1
| | | | | | | | | | a pointer to an ifaddr matching the passed socket address, returns a boolean indicating whether one was present. In the (near) future, ifa_ifwithaddr() will return a referenced ifaddr rather than a raw ifaddr pointer, and the new wrapper will allow callers that care only about the boolean condition to avoid having to free that reference. MFC after: 3 weeks
* Add ipx_ifaddr locking to ipx_control(), which should close mostrwatson2009-06-211-27/+70
| | | | | | | | | | | | | | | remaining potential races in ifconfig's management of IPX addresses. This is largely accomplished by dropping a global write lock for the IPX address list over the body of in_control(), although there are some places we bump the refcount on an ifaddr of interest while calling out to the routing code or link layer code, which might require revisiting. Annotate one as a potential race if two simultaneous delete ioctls are issued for the same IPX addresses at once. MFC after: 3 weeks
* Introduce basic locking of global IPX address list 'ipx_ifaddr' usingrwatson2009-06-215-13/+61
| | | | | | | | | a new rwlock, ipx_ifaddr_rw, wrapped with macros. This locking is necessary but not sufficient, in isolation, to satisfy the stability requirements of a fully parallel IPX input path during interface reconfiguration. MFC after: 3 weeks
* In ipx_control(), lock if_addr_mtx when adding/removing addresses fromrwatson2009-06-211-3/+5
| | | | | | | interface address lists, and don't add an address until it's fully initialized. MFC after: 3 weeks
* Clean up common ifaddr management:rwatson2009-06-211-3/+2
| | | | | | | | | | | | | | - Unify reference count and lock initialization in a single function, ifa_init(). - Move tear-down from a macro (IFAFREE) to a function ifa_free(). - Move reference count bump from a macro (IFAREF) to a function ifa_ref(). - Instead of using a u_int protected by a mutex to refcount(9) for reference count management. The ifa_mtx is now used for exactly one ioctl, and possibly should be removed. MFC after: 3 weeks
* Minor style cleanups.rwatson2009-06-211-24/+25
| | | | MFC after: 3 days
* Remove unuxed ipx_zerohost.rwatson2009-06-212-2/+0
| | | | MFC after: 3 days
* Update copyright on netipx.rwatson2009-06-211-1/+1
|
* Remove historical support for capturing IPX packets in the output pathrwatson2009-06-214-67/+0
| | | | | | | | | | | | | | | | | | | using raw IPX sockets. While functional, this support is disabled using a flag that can't be changed from userspace, and google reveals no documentation or use of that flag anywhere. This eliminates a potential lock order reversal and code reentrance issue in which the output path reentered the input path in IPX. An alternative to removal would be to use the netisr, as a comment I added in 2005 suggests. While this change is fairly straight-forward, the lack of any consumers or the easy possibility of consumers (kernel modification and recompile required) suggests that this is simply an unused feature. Update README to remove this TODO, and a TODO regarding IPX/IP encapsulation which was also removed a few years ago. MFC after: 1 week
* Implement socket delivery MAC checks for IPX/SPX.rwatson2009-06-202-0/+11
| | | | | Obtained from: TrustedBSD Project MFC after: 3 days
* Rework SPX segment reassembly, which was originally based on our TCPrwatson2009-06-204-57/+55
| | | | | | | | | | reassembly but failed to be modernized over time: - Use queue(9). - Specifically allocate queue entries of type M_SPXREASSQ to point at member mbufs, rather than casting mbuf data to 'spx_q'. - Maintain the mbuf pointer as part of the queue entry so that we can later free the mbuf without using dtom().
* Invoke the MAC Framework's mac_socket_create_mbuf() entry point whenrwatson2009-06-201-0/+6
| | | | | | generating IPX output for SPX sockets. Obtained from: TrustedBSD Project
* Invoke the MAC Framework's mac_socket_create_mbuf() entry point whenrwatson2009-06-201-0/+5
| | | | | | generating IPX output for raw and datagram IPX sockets. Obtained from: TrustedBSD Project
* Put the variable declarations for TCPDEBUG under #ifdef INET as well.bz2009-06-101-0/+2
| | | | | | The implementation already has this right. Reviewed by: rwatson
* Reimplement the netisr framework in order to support parallel netisrrwatson2009-06-011-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | threads: - Support up to one netisr thread per CPU, each processings its own workstream, or set of per-protocol queues. Threads may be bound to specific CPUs, or allowed to migrate, based on a global policy. In the future it would be desirable to support topology-centric policies, such as "one netisr per package". - Allow each protocol to advertise an ordering policy, which can currently be one of: NETISR_POLICY_SOURCE: packets must maintain ordering with respect to an implicit or explicit source (such as an interface or socket). NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work, as well as allowing protocols to provide a flow generation function for mbufs without flow identifers (m2flow). Falls back on NETISR_POLICY_SOURCE if now flow ID is available. NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for each packet handled by netisr (m2cpuid). - Provide utility functions for querying the number of workstreams being used, as well as a mapping function from workstream to CPU ID, which protocols may use in work placement decisions. - Add explicit interfaces to get and set per-protocol queue limits, and get and clear drop counters, which query data or apply changes across all workstreams. - Add a more extensible netisr registration interface, in which protocols declare 'struct netisr_handler' structures for each registered NETISR_ type. These include name, handler function, optional mbuf to flow ID function, optional mbuf to CPU ID function, queue limit, and ordering policy. Padding is present to allow these to be expanded in the future. If no queue limit is declared, then a default is used. - Queue limits are now per-workstream, and raised from the previous IFQ_MAXLEN default of 50 to 256. - All protocols are updated to use the new registration interface, and with the exception of netnatm, default queue limits. Most protocols register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use NETISR_POLICY_FLOW, and will therefore take advantage of driver- generated flow IDs if present. - Formalize a non-packet based interface between interface polling and the netisr, rather than having polling pretend to be two protocols. Provide two explicit hooks in the netisr worker for start and end events for runs: netisr_poll() and netisr_pollmore(), as well as a function, netisr_sched_poll(), to allow the polling code to schedule netisr execution. DEVICE_POLLING still embeds single-netisr assumptions in its implementation, so for now if it is compiled into the kernel, a single and un-bound netisr thread is enforced regardless of tunable configuration. In the default configuration, the new netisr implementation maintains the same basic assumptions as the previous implementation: a single, un-bound worker thread processes all deferred work, and direct dispatch is enabled by default wherever possible. Performance measurement shows a marginal performance improvement over the old implementation due to the use of batched dequeue. An rmlock is used to synchronize use and registration/unregistration using the framework; currently, synchronized use is disabled (replicating current netisr policy) due to a measurable 3%-6% hit in ping-pong micro-benchmarking. It will be enabled once further rmlock optimization has taken place. However, in practice, netisrs are rarely registered or unregistered at runtime. A new man page for netisr will follow, but since one doesn't currently exist, it hasn't been updated. This change is not appropriate for MFC, although the polling shutdown handler should be merged to 7-STABLE. Bump __FreeBSD_version. Reviewed by: bz
* Staticize spx_remque() now that it's only used from spx_reass.c.rwatson2009-05-252-2/+1
|
* Add missing call to ipx_pcbdetach() during SPX socket tear-down: notrwatson2009-05-251-0/+1
| | | | | | | harmful in practice if running without INVARIANTS, but will panic with KASSERT enabled when SPX sockets are closed. MFC after: 3 days
* Eliminate use of dtom() in spx_output() by fixing up tracking of therwatson2009-05-251-7/+10
| | | | | | containing mbuf for 'si' in local variable 'm'. MFC after: 1 month
* Prefer NULL to 0 for pointer assignments.rwatson2009-05-251-2/+2
| | | | MFC after: 1 month
* Rather than store a skeleton IPX header in an mbuf hung off the SPXrwatson2009-05-252-12/+15
| | | | | | | | | | PCB, simply embed it in the PCB, avoiding additional memory overhead, memory allocation overhead, and removing one of the few remaining uses of dtom() in the network stack. Restore misplaced spx_ctlinput() from an earlier commit. MFC after: 1 month
* Pull SPX reassembly queue init and flush into spx_reass.c.rwatson2009-05-253-12/+34
| | | | MFC after: 1 month
* Prefer m_nextpkt to m_act when iterating mbuf queues.rwatson2009-05-251-1/+1
| | | | MFC after: 1 month
* Complete move of SPX reassembly from spx_usrreq.c to spx_reass.c.rwatson2009-05-253-2061/+19
| | | | MFC after: 1 month
* Copy spx_usrreq.c to spx_reass.c in order to apply similar file layoutrwatson2009-05-251-0/+2132
| | | | | | | changes to IPX/SPX that were applied to TCP/IP in the creation of tcp_reass.c. MFC after: 1 month
* Make the SPX code use its own copies of insque()/remque().ed2009-04-261-3/+22
| | | | | | Instead of using the antique insque()/remque() functions from sys/queue.h, make this code use its own versions. Eventually the code should just use the regular TAILQ/LIST macros.
* Change if_output to take a struct route as its fourth argument in orderkmacy2009-04-161-1/+1
| | | | | | to allow passing a cached struct llentry * down to L2 Reviewed by: rwatson
* Add missing "goto set_head" for SO_IPX_CHECKSUM; otherwise we fall throughrwatson2008-12-111-0/+1
| | | | | | | | to the SO_HEADERS_ON_OUTPUT case and set that instead. MFC after: 1 week Found with: Coverity Prevent(tm) Coverity ID: 3988
* Retire the MALLOC and FREE macros. They are an abomination unto style(9).des2008-10-232-6/+6
| | | | MFC after: 3 months
* Remove the suser(9) interface from the kernel. It has been replaced fromattilio2008-09-171-4/+11
| | | | | | | | | | | | | | | | | years by the priv_check(9) interface and just very few places are left. Note that compatibility stub with older FreeBSD version (all above the 8 limit though) are left in order to reduce diffs against old versions. It is responsibility of the maintainers for any module, if they think it is the case, to axe out such cases. This patch breaks KPI so __FreeBSD_version will be bumped into a later commit. This patch needs to be credited 50-50 with rwatson@ as he found time to explain me how the priv_check() works in detail and to review patches. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> Reviewed by: rwatson
* Begin the sysctl descriptions with a capital letter.trhodes2008-07-252-6/+6
| | | | Make some slight wording tweaks.
* Document a few sysctls.trhodes2008-07-202-6/+6
| | | | Reviewed by: rwatson
* Remove NETISR_MPSAFE, which allows specific netisr handlers to be directlyrwatson2008-07-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | dispatched without Giant, and add NETISR_FORCEQUEUE, which allows specific netisr handlers to always be dispatched via a queue (deferred). Mark the usb and if_ppp netisr handlers as NETISR_FORCEQUEUE, and explicitly acquire Giant in those handlers. Previously, any netisr handler not marked NETISR_MPSAFE would necessarily run deferred and with Giant acquired. This change removes Giant scaffolding from the netisr infrastructure, but NETISR_FORCEQUEUE allows non-MPSAFE handlers to continue to force deferred dispatch so as to avoid lock order reversals between their acqusition of Giant and any calling context. It is likely we will be able to remove NETISR_FORCEQUEUE once IFF_NEEDSGIANT is removed, as non-MPSAFE usb and if_ppp drivers will no longer be supported. Reviewed by: bz MFC after: 1 month X-MFC note: We can't remove NETISR_MPSAFE from stable/7 for KPI reasons, but the rest can go back.
* Rather than m_free(dtom(si)) in spx_reass(), return (1) which causes therwatson2008-05-291-6/+3
| | | | | | caller to free the mbuf without using dtom(). MFC after: 3 days
* Correct minor comment typos, make white space use before block commentsrwatson2008-05-291-6/+19
| | | | | | more consistent. MFC after: 3 days
* Avoid unnecessary one use of dtom(9) in spx_input().rwatson2008-05-261-1/+1
| | | | MFC after: 3 days
* Add code to allow the system to handle multiple routing tables.julian2008-05-091-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x) Currently the only protocol that can make use of the multiple tables is IPv4 Similar functionality exists in OpenBSD and Linux. From my notes: ----- One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address. Constraints: ------------ I want to make some form of this available in the 6.x tree (and by extension 7.x) , but FreeBSD in general needs it so I might as well do it in -current and back port the portions I need. One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing". One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as a EXTENSION to the existing ABIs in 6.x so that third party applications do not need to be recompiled in timespan of the branch. This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit. Implementation method, Compatible version. (part 1) ------------------------------- For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x. (also in Perforce though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPV4 to use them. I have not done the changes for ipv6 simply because I do not need it, and I do not have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it. Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs. To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family. The basic change in the ABI compatible version of the change is to extent that array to be a 2 dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X] when for all protocol families except ipv4 Y is always 0. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one dimensional array that existed before. The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row. In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later. One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically). You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it. This brings us as to how the correct FIB is selected for an outgoing IPV4 packet. Firstly, all packets have a FIB associated with them. if nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways. Packets fall into one of a number of classes. 1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility call setfib that acts a bit like nice.. setfib -3 ping target.example.com # will use fib 3 for ping. It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands. 2/ packets received on an interface for forwarding. By default these packets would use table 0, (or possibly a number settable in a sysctl(not yet)). but prior to routing the firewall can inspect them (see below). (possibly in the future you may be able to associate a FIB with packets received on an interface.. An ifconfig arg, but not yet.) 3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with it on a packet by packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would over-ride a fib associated by a more default source. (such as cases 1 or 2). 4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib. 5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being reponded to. 6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect withthe proces that set up the tunnel. thus setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1. Routing messages would be associated with their process, and thus select one FIB or another. messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib. (not yet implemented) In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB. In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process. Early testing experience: ------------------------- Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks. For example, It can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done. Testing during the generating of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routes accordingly. ipfw has grown 2 new keywords: setfib N ip from anay to any count ip from any to any fib N In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required. SCTP has interestingly enough built in support for this, called VRFs in Cisco parlance. it will be interesting to see how that handles it when it suddenly actually does something. Where to next: -------------------- After committing the ABI compatible version and MFCing it, I'd like to proceed in a forward direction in -current. this will result in some roto-tilling in the routing code. Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code. My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it. When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry. Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already. This work was sponsored by Ironport Systems/Cisco Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco
* Make tcpstates[] static, and make sure TCPSTATES is defined beforedes2007-07-302-1/+2
| | | | | | | | | <netinet/tcp_fsm.h> is included into any compilation unit that needs tcpstates[]. Also remove incorrect extern declarations and TCPDEBUG conditionals. This allows kernels both with and without TCPDEBUG to build, and unbreaks the tinderbox. Approved by: re (rwatson)
* Include priv.h to pick up suser(9) definitions, missed in an earlierrwatson2007-06-131-0/+1
| | | | | | commit. Warnings spotted by: kris
* Remove IPX over IP tunneling support, which allows IPX routing over IPrwatson2007-06-135-569/+1
| | | | | | | | | | tunnels, and was not MPSAFE. The code can be easily restored in the event that someone with an IPX over IP tunnel configuration can work with me to test patches. This removes one of five remaining consumers of NET_NEEDS_GIANT. Approved by: re (kensmith)
* Use ANSI C function declarations throughout netipx.rwatson2007-05-1112-162/+88
| | | | Remove 'register' use.
* Reduce network stack oddness: implement .pru_sockaddr and .pru_peeraddrrwatson2007-05-113-6/+6
| | | | | | | | protocol entry points using functions named proto_getsockaddr and proto_getpeeraddr rather than proto_setsockaddr and proto_setpeeraddr. While it's true that sockaddrs are allocated and set, the net effect is to retrieve (get) the socket address or peer address from a socket, not set it, so align names to that intent.
* Build ipx_ip.c only if options IPXIP is defined. No functional change.rwatson2007-02-261-2/+0
|
* Fix a likely bug by adding what appears to be a missing break statementrwatson2007-02-261-0/+1
| | | | | in the IPX over IP configuration ioctl: when changing the flags on a tunnel interface, return the generated error rather than always EINVAL.
* Further style(9) for ipx_ip.rwatson2007-02-252-9/+7
|
* Improve ipx_ip.c's approximation of style(9).rwatson2007-02-251-97/+82
|
* Factor out UCB and my copyrights from copyrights of Mike Mitchell;rwatson2007-01-0819-20/+501
| | | | | | | the former use a three-clause BSD license (per UCB authorization letter), whereas he uses a four-clause BSD license. MFC after: 3 days
OpenPOWER on IntegriCloud