path: root/sys/net
Commit message  Author  Age  Files  Lines
* Retire the IF_ADDR_LOCK() and IF_ADDR_UNLOCK() compat macros from HEAD.jhb2012-03-191-3/+0
| | | | | The new [RW]LOCK macros are merged back to 8.x so should be suitable for new code in HEAD even if it is to be MFC'd.
* Hide kernel option ROUTETABLES evaluations in the implementationbz2012-03-182-21/+18
| | | | | | | | | | | | | | | | rather than the header file. With this also move RT_MAXFIBS and RT_NUMFIBS into the implementation to avoid further usage in other code. rt_numfibs is all that should be needed. This allows users to change the number of FIBs from 1..RT_MAXFIBS (16) dynamically using the tunable without the need to change the kernel config for the maximum anymore. This means that the multi-FIB feature is now fully available with GENERIC kernels. The kernel option ROUTETABLES can still be used to set the default number of FIBs in absence of the tunable. Ok.ed by: julian, hrs, melifaro MFC after: 2 weeks
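The range check implied by "1..RT_MAXFIBS(16)" can be sketched in userland C as below. This is only an illustration under assumptions: `clamp_numfibs` and `SKETCH_MAXFIBS` are hypothetical names, and the choice to clamp out-of-range values (rather than reject them) is a guess, not something the commit message states.

```c
#include <assert.h>

/* Upper bound quoted in the commit message; the macro name is hypothetical. */
#define SKETCH_MAXFIBS 16

/*
 * Map a requested FIB count (e.g. read from a loader tunable) into the
 * supported range 1..SKETCH_MAXFIBS. Clamping is an assumption here.
 */
static unsigned int
clamp_numfibs(unsigned int requested)
{
	unsigned int n = requested;

	if (n == 0)
		n = 1;			/* at least one routing table */
	if (n > SKETCH_MAXFIBS)
		n = SKETCH_MAXFIBS;	/* never exceed the compiled-in maximum */
	assert(n >= 1 && n <= SKETCH_MAXFIBS);
	return (n);
}
```

With this shape, a kernel option would only need to supply the default passed in when no tunable is set.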
* - remove an extra parenthesis in a closing brace;luigi2012-03-111-1/+6
| | | | | | | - add the macro NETMAP_RING_FIRST_RESERVED() which returns the index of the first non-released buffer in the ring (this is useful for code that retains buffers for some time instead of processing them immediately)
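What such a macro computes can be sketched as follows. This is a simplified userland model, not the real netmap structure: `struct ring_sketch`, its field names, and `first_reserved` are assumptions chosen to match the description (a current position plus a count of buffers already processed but not yet released).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a netmap ring. */
struct ring_sketch {
	uint32_t num_slots;	/* total slots in the circular ring */
	uint32_t cur;		/* next slot the client will process */
	uint32_t reserved;	/* slots before cur still held by the client */
};

/* Index of the oldest slot that was processed but not yet released. */
static uint32_t
first_reserved(const struct ring_sketch *r)
{
	assert(r->num_slots > 0 && r->cur < r->num_slots &&
	    r->reserved <= r->num_slots);
	/* Step back 'reserved' slots from cur, wrapping around the ring. */
	return (r->cur >= r->reserved ?
	    r->cur - r->reserved :
	    r->cur + r->num_slots - r->reserved);
}
```

A client that holds on to buffers would start walking from this index when it is finally ready to release them.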
* Move the vlan buffer space into the union which also fixes an unused variablethompsa2012-03-071-2/+2
| | | | | | warning with !INET & !INET6. Spotted by: pluknet
* Add the ability to set which packet layers are used for the load balance hashthompsa2012-03-063-15/+82
| | | | calculation.
* Properly restore curvnet context when returning early fromzec2012-03-041-1/+4
| | | | | | | | | | ether_input_internal(). This change only affects options VIMAGE kernel builds. PR: kern/165643 Submitted by: Vijay Singh MFC after: 3 days
* o) Add COMPAT_FREEBSD32 support for MIPS kernels using the n64 ABI with ↵jmallett2012-03-031-7/+7
| | | | | | | | | | | | | | | | | | | | | userlands using the o32 ABI. This mostly follows nwhitehorn's lead in implementing COMPAT_FREEBSD32 on powerpc64. o) Add a new type to the freebsd32 compat layer, time32_t, which is time_t in the 32-bit ABI being used. Since the MIPS port is relatively-new, even the 32-bit ABIs use a 64-bit time_t. o) Because time{spec,val}32 has the same size and layout as time{spec,val} on MIPS with 32-bit compatibility, then, disable some code which assumes otherwise wrongly when built for MIPS. A more general macro to check in this case would seem like a good idea eventually. If someone adds support for using n32 userland with n64 kernels on MIPS, then they will have to add a variety of flags related to each piece of the ABI that can vary. That's probably the right time to generalize further. o) Add MIPS to the list of architectures which use PAD64_REQUIRED in the freebsd32 compat code. Probably this should be generalized at some point. Reviewed by: gonzo
* Use a more appropriate default for the maximum number of addresses in thethompsa2012-02-291-2/+2
| | | | | | | bridge forwarding table. PR: docs/164564 Discussed with: brueffer
* A bunch of netmap fixes:luigi2012-02-272-69/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | USERSPACE: 1. add support for devices with different number of rx and tx queues; 2. add better support for zero-copy operation, adding an extra field to the netmap ring to indicate how many buffers we have already processed but not yet released (with help from Eddie Kohler); 3. The two changes above unfortunately require an API change, so while at it add a version field and some spares to the ioctl() argument to help detect mismatches. 4. update the manual page for the two changes above; 5. update sample applications in tools/tools/netmap KERNEL: 1. simplify the internal structures moving the global wait queues to the 'struct netmap_adapter'; 2. simplify the functions that map kring<->nic ring indexes 3. normalize device-specific code, helps maintenance; 4. start exploring the impact of micro-optimizations (prefetch etc.) in the ixgbe driver. Using 'legacy' descriptors on the tx ring and prefetching slots gives about 20% speedup at 900 MHz. Another 7-10% would come from removing the explicit calls to bus_dmamap* in the core (they are effectively NOPs in this case, but it takes an expensive load of the per-buffer dma maps to figure out that they are all NULL). Rx performance not investigated. I am postponing the MFC so I can import a few more improvements before merging.
* Only look for a usable MAC address for the bridge ID from ports within ourthompsa2012-02-241-20/+30
| | | | | | | | | bridge, this allows us to have more than one independent bridge in the same STP domain. PR: kern/164369 Submitted by: Nikos Vassiliadis (earlier version) MFC after: 2 weeks
* Add a sysctl/tunable default value for the use_flowid sysctl in r232008.thompsa2012-02-231-1/+6
|
* Indicate this function decrements the timer as well as testing for expiry.thompsa2012-02-231-11/+11
|
* When using flowtable llentrys can outlive the interface with which they're ↵kmacy2012-02-231-3/+1
| | | | | | | | | | | associated, at which point the lle_tbl pointer points to freed memory and the llt_free pointer is no longer valid. Move the free pointer into the llentry itself and update the initialization sites. MFC after: 2 weeks
* Now that network interfaces advertise if they support linkstate notificationsthompsa2012-02-231-3/+5
| | | | we do not need to perform a media ioctl every 15 seconds.
* bstp_input() always consumes the packet so remove the mbuf handling dancethompsa2012-02-233-9/+6
| | | | | | around it. Obtained from: OpenBSD (r1.37)
* Using the flowid in the mbuf assumes the network card is giving a good hash forthompsa2012-02-223-2/+18
| | | | | | | | | the traffic flow, this may not be the case giving poor traffic distribution. Add a sysctl which allows us to fall back to our own flow hash code. PR: kern/164901 Submitted by: Eugene Grosbein MFC after: 1 week
* Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:bz2012-02-174-70/+122
| | | | | | | | | | | | Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity. This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat. Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days
* Change some headers such that lang/gcc* ports no longer patch them.tijl2012-02-141-1/+1
| | | | | | | The lang/gcc* ports patch headers where they think something is non-standard. These patched headers override the system headers which means you have to rebuild these ports whenever you do installworld to make sure they contain the latest changes.
* Introduce a new NET_RT_IFLISTL API to query the address list. It worksbz2012-02-112-44/+257
| | | | | | | | | | | on extended and extensible structs if_msghdrl and ifa_msghdrl. This will allow us to extend both the msghdrl structs and eventually if_data in the future without breaking the ABI. Bump __FreeBSD_version to allow ports to more easily detect the new API. Reviewed by: glebius, brooks MFC after: 3 days
* Backout changes from r228571. Remove if_data from struct ifa_msghdr again.bz2012-02-112-6/+0
| | | | | | | While this breaks carp on HEAD temporarily, it restores the upgrade path from stable, and head before 20111215. Reviewed by: glebius, brooks
* g/c last bit of old ipv6 prefix management.pluknet2012-02-082-17/+1
| | | | | Reviewed by: bz Obtained from: NetBSD, net/if.h, rev 1.80
* - change the buffer size from a constant to aluigi2012-02-082-13/+1
| | | | | | | | | | | | | TUNABLE variable (hw.netmap.buf_size) so we can experiment with values different from 2048 which may give better cache performance. - rearrange the memory allocation code so it will be easier to replace it with a different implementation. The current code relies on a single large contiguous chunk of memory obtained through contigmalloc. The new implementation (not committed yet) uses multiple smaller chunks which are easier to fit in a fragmented address space.
* Allow to set if_bridge(4) sysctls from /boot/loader.conf.pjd2012-02-071-0/+7
| | | | MFC after: 3 days
* Fix typo in r231010.glebius2012-02-051-1/+1
| | | | Submitted by: linimon
* Better comment for ifa_init(), ifa_ref(), ifa_free().glebius2012-02-051-1/+1
|
* In ifa_init() initialize if_data.ifi_datalen. This would beglebius2012-02-051-0/+1
| | | | | | required after upcoming changes from bz@. Discussed with: bz
* A flowtable entry can continue referencing an llentry indefinitely if the ↵kmacy2012-01-262-1/+4
| | | | | | | | | | | entry is repeatedly referenced within its timeout window. This change clears the LLE_VALID flag when an llentry is removed from an interface's hash table and adds an extra check to the flowtable code for the LLE_VALID flag in llentry to avoid retaining and using a stale reference. Reviewed by: qingli@ MFC after: 2 weeks
* Replace random ARIN direct assignment legacy IPs with proper RFC 5735bz2012-01-241-4/+4
| | | | | | TEST-NET1 block for use in documentation and example code addresses. MFC after: 3 days
* - Fix trivial typoeadler2012-01-144-4/+4
| | | | | Approved by: nwhitehorn MFC after: 3 days
* Clarify throughout the vlan(4) code the difference between a "tag" (therwatson2012-01-122-54/+61
| | | | | | | | | | | | | 802.1q-defined 16-bit VID, CFI, and PCP field in host byte order) and a VLAN ID (VID). Tags go in packets. VIDs identify VLANs. No functional change is intended, so this should be safe to MFC. Further cleanup with functional changes will be committed separately (for example, renaming vlan_tag/vlan_tag_p, which modify the KPI and KBI). Reviewed by: bz Sponsored by: ADARA Networks, Inc. MFC after: 3 days
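The tag/VID distinction can be made concrete with a small sketch of the 16-bit 802.1Q tag layout (3-bit PCP, 1-bit CFI, 12-bit VID). The helper names below are hypothetical and are not the identifiers used in the vlan(4) code; the tag value is assumed to already be in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 0-11 of the tag: the VLAN ID, which identifies the VLAN. */
static uint16_t
tag_vid(uint16_t tag)
{
	return (tag & 0x0fff);
}

/* Bit 12 of the tag: the CFI bit. */
static uint8_t
tag_cfi(uint16_t tag)
{
	return ((tag >> 12) & 0x1);
}

/* Bits 13-15 of the tag: the priority code point. */
static uint8_t
tag_pcp(uint16_t tag)
{
	return ((tag >> 13) & 0x7);
}

/* Assemble a full tag (what goes in the packet) from its three fields. */
static uint16_t
make_tag(uint16_t vid, uint8_t pcp, uint8_t cfi)
{
	assert(vid <= 0x0fff && pcp <= 7 && cfi <= 1);
	return ((uint16_t)((pcp << 13) | (cfi << 12) | vid));
}
```

Two packets with different priorities therefore carry different tags while still belonging to the same VID.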
* Consumers of bpfdetach() expect it to remove all bpf_if structs from thelstewart2012-01-101-22/+31
| | | | | | | | | | | | | | | | | | | | | | | bpf_iflist list which reference the specified ifnet. The existing implementation only removes the first matching bpf_if found in the list, effectively leaking list entries if an ifnet has been bpfattach()ed multiple times with different DLTs. Fix the leak by performing the detach logic in a loop, stopping when all bpf_if structs referencing the specified ifnet have been detached and removed from the bpf_iflist list. Whilst here, also: - Remove the unnecessary "bp->bif_ifp == NULL" check, as a bpf_if should never exist in the list with a NULL ifnet pointer. - Except when INVARIANTS is in the kernel config, silently ignore the case where no bpf_if referencing the specified ifnet is found, as it is harmless and does not require user attention. Reviewed by: csjp MFC after: 1 week
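The leak and its fix can be illustrated with a userland sketch. Everything here is a stand-in: `struct bif_sketch`, `attach_sketch`, and `detach_all` are hypothetical names, a plain singly linked list replaces the kernel list, and a single pass over the list is used instead of the repeated find-and-detach loop the commit describes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bpf_if list entry. */
struct bif_sketch {
	const void *ifp;		/* interface this entry refers to */
	int dlt;			/* data link type of this attachment */
	struct bif_sketch *next;
};

/* Prepend one attachment; the same ifp may be attached with several DLTs. */
static struct bif_sketch *
attach_sketch(struct bif_sketch *head, const void *ifp, int dlt)
{
	struct bif_sketch *bp = malloc(sizeof(*bp));

	if (bp == NULL)
		abort();
	bp->ifp = ifp;
	bp->dlt = dlt;
	bp->next = head;
	return (bp);
}

/*
 * Remove every entry referencing ifp, not just the first match.
 * Returns the number of entries removed; zero is not treated as an error.
 */
static int
detach_all(struct bif_sketch **headp, const void *ifp)
{
	struct bif_sketch **pp = headp;
	int ndetached = 0;

	assert(headp != NULL);
	while (*pp != NULL) {
		struct bif_sketch *bp = *pp;

		if (bp->ifp == ifp) {
			*pp = bp->next;	/* unlink, then keep scanning */
			free(bp);
			ndetached++;
		} else {
			pp = &bp->next;
		}
	}
	return (ndetached);
}
```

Stopping after the first match would leave the second attachment of the same interface on the list, which is the leak described above.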
* Convert the per-interface address list lock from a mutex to a reader/writerjhb2012-01-091-11/+10
| | | | | | lock. Reviewed by: bz
* Copy ifa->if_data to ifam->ifam_data. This was forgotten in r228571.glebius2012-01-081-0/+1
| | | | Submitted by: bz
* Move arprequest() declaration to if_ether.h.glebius2012-01-081-3/+0
|
* Since r228571 CARP is no longer an interface.glebius2012-01-061-8/+0
|
* Convert all users of IF_ADDR_LOCK to use new locking macros that specifyjhb2012-01-052-63/+63
| | | | | | | either a read lock or write lock. Reviewed by: bz MFC after: 2 weeks
* Add new variants of the IF_ADDR_*LOCK*() macros used for protectingjhb2012-01-051-2/+8
| | | | | | | | | interface address lists that distinguish read locks from write locks. To preserve the KPI, the previous operations are mapped to the write lock macros. The lock is still kept as a mutex for now. Reviewed by: bz MFC after: 2 weeks
* Refine last comment.rwatson2012-01-051-1/+1
| | | | | | Submitted by: joeld Sponsored by: ADARA Networks, Inc. MFC after: 3 days
* Add comment to the VLAN code about its integration with VIMAGE: we see whatrwatson2012-01-051-0/+7
| | | | | | | | | | the code is doing, we recognise the legitimacy of its goal, but we're not quite sure it's going about it the right way. More pondering is clearly required. Sponsored by: ADARA Networks, Inc. Discussed with: bz MFC after: 3 days
* Revert r228986 until it can be reworked to avoid panicking the kernel when thelstewart2011-12-312-193/+82
| | | | | | | same interface is attached multiple times with different DLTs, as is done in net80211 for example. Reported by: adrian
* - Introduce the net.bpf.tscfg sysctl tree and associated code so as to make onelstewart2011-12-302-82/+193
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | aspect of time stamp configuration per interface rather than per BPF descriptor. Prior to this, the order in which BPF devices were opened and the per descriptor time stamp configuration settings could cause non-deterministic and unintended behaviour with respect to time stamping. With the new scheme, a BPF attached interface's tscfg sysctl entry can be set to "default", "none", "fast", "normal" or "external". Setting "default" means use the system default option (set with the net.bpf.tscfg.default sysctl), "none" means do not generate time stamps for tapped packets, "fast" means generate time stamps for tapped packets using a hz granularity system clock read, "normal" means generate time stamps for tapped packets using a full timecounter granularity system clock read and "external" (currently unimplemented) means use the time stamp provided with the packet from an underlying source. - Utilise the recently introduced sysclock_getsnapshot() and sysclock_snap2bintime() KPIs to ensure the system clock is only read once per packet, regardless of the number of BPF descriptors and time stamp formats requested. Use the per BPF attached interface time stamp configuration to control if sysclock_getsnapshot() is called and whether the system clock read is fast or normal. The per BPF descriptor time stamp configuration is then used to control how the system clock snapshot is converted to a bintime by sysclock_snap2bintime(). - Remove all FAST related BPF descriptor flag variants. Performing a "fast" read of the system clock is now controlled per BPF attached interface using the net.bpf.tscfg sysctl tree. - Update the bpf.4 man page. Committed on behalf of Julien Ridoux and Darryl Veitch from the University of Melbourne, Australia, as part of the FreeBSD Foundation funded "Feed-Forward Clock Synchronization Algorithms" project. 
For more information, see http://www.synclab.org/radclock/ In collaboration with: Julien Ridoux (jridoux at unimelb edu au)
* Update if_obytes and if_omcast after successful transmit.yongari2011-12-291-4/+8
| | | | | | | | | | While I'm here update if_oerrors if parent interface of vlan is not up and running. Previously it updated collision counter and it was confusing to interpret. PR: kern/163478 Reviewed by: glebius, jhb Tested by: Joe Holden < lists <> rewt dot org dot uk >
* Provide ABI compatibility shim to enable configuring of addressesglebius2011-12-211-0/+8
| | | | | | with ifconfig(8) prior to r228571. Requested by: brooks
* Restore a feature that was present in 5.x and 6.x, and was cleared inglebius2011-12-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | 7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP preemption, while it is running its bulk update. However, reimplement the feature in more elegant manner, that is partially inspired by newer OpenBSD: - Rename term "suppression" to "demotion", to match with OpenBSD. - Keep a global demotion factor, that can be raised by several conditions, for now these are: - interface goes down - carp(4) has problems with ip_output() or ip6_output() - pfsync performs bulk update - Unlike in OpenBSD the demotion factor isn't a counter, but is actual value added to advskew. The adjustment values for particular error conditions are also configurable, and their defaults are maximum advskew value, so a single failure bumps demotion to maximum. This is for POLA compatibility, and should satisfy most users. - Demotion factor is a writable sysctl, so user can do foot shooting, if he desires to.
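The "value added to advskew" behaviour can be sketched as a saturating addition. This is an illustration only: `effective_advskew` is a hypothetical name, and the cap of 240 in `SKETCH_MAXSKEW` is an assumption standing in for whatever the maximum advskew value is in the implementation.

```c
#include <assert.h>

/* Assumed maximum advertisement skew; not taken from the commit message. */
#define SKETCH_MAXSKEW 240

/*
 * Combine the configured advskew with the global demotion factor.
 * The demotion factor is added directly and the result is kept within
 * 0..SKETCH_MAXSKEW, so a penalty equal to the maximum pins it there.
 */
static int
effective_advskew(int advskew, int demotion)
{
	int skew;

	assert(advskew >= 0 && advskew <= SKETCH_MAXSKEW);
	skew = advskew + demotion;
	if (skew < 0)
		skew = 0;
	if (skew > SKETCH_MAXSKEW)
		skew = SKETCH_MAXSKEW;
	return (skew);
}
```

With default penalties equal to the maximum, one failing condition is already enough to demote a host fully, which matches the behaviour described above.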
* A major overhaul of the CARP implementation. The ip_carp.c was startedglebius2011-12-166-9/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from scratch, copying needed functionality from the old implementation on demand, with a thorough review of all code. The main change is that interface layer has been removed from the CARP. Now redundant addresses are configured exactly on the interfaces they run on. The CARP configuration itself is, as before, configured and read via SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or SIOCAIFADDR_IN6 may now be configured to a particular virtual host id, which makes the prefix redundant. ifconfig(8) semantics has been changed too: now one doesn't need to clone carpXX interface, he/she should directly configure a vhid on an Ethernet interface. To supply vhid data from the kernel to an application the getifaddrs(3) function had been changed to pass ifam_data with each address. [1] The new implementation definitely closes all PRs related to carp(4) being an interface, and may close several others. It also allows to run a single redundant IP per interface. Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for idea on using ifam_data and for several rounds of reviewing! PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448 Reviewed by: bz Submitted by: bz [1]
* Simplify rtrequest(RTM_ADD): ifa can't be NULL after rt_getifa_fib().glebius2011-12-151-9/+4
|
* Remove the unused if_free_type() function.brooks2011-12-092-22/+2
| | | | X-MFC after: never
* 1. Fix the handling of link reset while in netmap mode.luigi2011-12-051-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A link reset now is completely transparent for the netmap client: even if the NIC resets its own ring (e.g. restarting from 0), the client will not see any change in the current rx/tx positions, because the driver will keep track of the offset between the two. 2. make the device-specific code more uniform across different drivers There were some inconsistencies in the implementation of the netmap support routines, now drivers have been aligned to a common code structure. 3. import netmap support for ixgbe. This is implemented as a very small patch for ixgbe.c (233 lines, 11 chunks, mostly comments: in total the patch has only 54 lines of new code), as most of the code is in an external file sys/dev/netmap/ixgbe_netmap.h, following some initial comments from Jack Vogel about making changes less intrusive. (Note, I have emailed Jack multiple times asking if he had comments on this structure of the code; I got no reply so I assume he is fine with it). Support for other drivers (em, lem, re, igb) will come later. "ixgbe" is now the reference driver for netmap support. Both the external file (sys/dev/netmap/ixgbe_netmap.h) and the device-specific patches (in sys/dev/ixgbe/ixgbe.c) are heavily commented and should serve as a reference for other device drivers. Tested on i386 and amd64 with the pkt-gen program in tools/tools/netmap, the sender does 14.88 Mpps at 1050 MHz and 14.2 Mpps at 900 MHz on an i7-860 with 4 cores and 82599 card. Haven't tried yet more aggressive optimizations such as adding 'prefetch' instructions in the time-critical parts of the code.
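The offset bookkeeping described in item 1 can be sketched as modular index translation. This is a toy model under assumptions: `struct kring_sketch`, `nic_offset`, and the three helpers are hypothetical names, and the real drivers may track the offset differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-ring state kept by the driver. */
struct kring_sketch {
	uint32_t num_slots;	/* ring size, same for client and NIC views */
	uint32_t nic_offset;	/* nic index = (client index + offset) mod size */
};

/* Translate a client-visible slot index into the NIC's ring index. */
static uint32_t
client_to_nic(const struct kring_sketch *k, uint32_t ci)
{
	assert(ci < k->num_slots);
	return ((ci + k->nic_offset) % k->num_slots);
}

/* Translate a NIC ring index back into the client-visible index. */
static uint32_t
nic_to_client(const struct kring_sketch *k, uint32_t ni)
{
	assert(ni < k->num_slots);
	return ((ni + k->num_slots - k->nic_offset) % k->num_slots);
}

/*
 * After a link reset the NIC restarts at slot 0 while the client is still
 * at client_cur. Recompute the offset so client_cur maps to NIC slot 0 and
 * the client's position does not appear to move.
 */
static void
on_nic_reset(struct kring_sketch *k, uint32_t client_cur)
{
	assert(client_cur < k->num_slots);
	k->nic_offset = (k->num_slots - client_cur) % k->num_slots;
}
```

The client keeps using its own indexes throughout; only the driver-side translation changes across the reset.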
* Revert r227778 in preparation for committing reworked patches in its place.lstewart2011-11-292-165/+12
|
* Change the if_vlan driver to use if_transmit for forwarding packets to thejhb2011-11-281-83/+79
| | | | | | | | | parent interface. This avoids the overhead of queueing a packet to an IFQ only to immediately dequeue it again. Suggested by: np Reviewed by: brooks MFC after: 1 month