path: root/net/ceph
Commit message (Author; Age; Files; Lines)
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next  (Linus Torvalds; 2012-07-24; 1 file; -10/+4)
|\    Pull networking changes from David S. Miller:
| |   1) Remove the ipv4 routing cache. Now lookups go directly into the FIB trie and use prebuilt routes cached there. No more garbage collection, no more rDOS attacks on the routing cache. Instead we now get predictable and consistent performance, no matter what the pattern of traffic we service. This has been almost 2 years in the making. Special thanks to Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who have helped along the way. I'm sure that with a change of this magnitude there will be some kind of fallout, but such things ought to be simple to fix at this point. Luckily I'm not European so I'll be around all of August to fix things :-) The major stages of this work here are each fronted by a forced merge commit whose commit message contains a top-level description of the motivations and implementation issues.
| |   2) Pre-demux of established ipv4 TCP sockets, saves a route demux on input.
| |   3) TCP SYN/ACK performance tweaks from Eric Dumazet.
| |   4) Add namespace support for netfilter L4 conntrack helpers, from Gao Feng.
| |   5) Add config mechanism for Energy Efficient Ethernet to ethtool, from Yuval Mintz.
| |   6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet.
| |   7) Support for connection tracker helpers in userspace, from Pablo Neira Ayuso.
| |   8) Allow userspace driven TX load balancing functions in TEAM driver, from Jiri Pirko.
| |   9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with embedded gotos.
| |   10) TCP Small Queues, essentially minimize the amount of TCP data queued up in the packet scheduler layer. Whereas the existing BQL (Byte Queue Limits) limits the pkt_sched --> netdevice queuing levels, this controls the TCP --> pkt_sched queueing levels. From Eric Dumazet.
| |   11) Reduce the number of get_page/put_page ops done on SKB fragments, from Alexander Duyck.
| |   12) Implement protection against blind resets in TCP (RFC 5961), from Eric Dumazet.
| |   13) Support the client side of TCP Fast Open, basically the ability to send data in the SYN exchange, from Yuchung Cheng. Basically, the sender queues up data with a sendmsg() call using MSG_FASTOPEN, then they do the connect() which emits the queued up fastopen data.
| |   14) Avoid all the problems we get into in TCP when timers or PMTU events hit a locked socket. The TCP Small Queues changes added a tcp_release_cb() that allows us to queue work up to the release_sock() caller, and that's what we use here too. From Eric Dumazet.
| |   15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin.
| |   * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits)
| |     genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
| |     r8169: revert "add byte queue limit support".
| |     ipv4: Change rt->rt_iif encoding.
| |     net: Make skb->skb_iif always track skb->dev
| |     ipv4: Prepare for change of rt->rt_iif encoding.
| |     ipv4: Remove all RTCF_DIRECTSRC handliing.
| |     ipv4: Really ignore ICMP address requests/replies.
| |     decnet: Don't set RTCF_DIRECTSRC.
| |     net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.
| |     ipv4: Remove redundant assignment
| |     rds: set correct msg_namelen
| |     openvswitch: potential NULL deref in sample()
| |     tcp: dont drop MTU reduction indications
| |     bnx2x: Add new 57840 device IDs
| |     tcp: avoid oops in tcp_metrics and reset tcpm_stamp
| |     niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
| |     niu: Fix to check for dma mapping errors.
| |     net: Fix references to out-of-scope variables in put_cmsg_compat()
| |     net: ethernet: davinci_emac: add pm_runtime support
| |     net: ethernet: davinci_emac: Remove unnecessary #include
| |     ...
| * net: Fix non-kernel-doc comments with kernel-doc start marker  (Ben Hutchings; 2012-07-10; 1 file; -10/+4)
| |   Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| |   Signed-off-by: David S. Miller <davem@davemloft.net>
* | libceph: fix messenger retry  (Sage Weil; 2012-07-17; 1 file; -6/+6)
|/  In ancient times, the messenger could both initiate and accept connections. An artifact of that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones.
|   Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retry with the same connect_seq as last time. This bug is pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io.
|   Signed-off-by: Sage Weil <sage@inktank.com>
* libceph: flush msgr queue during mon_client shutdown  (Sage Weil; 2012-06-20; 2 files; -7/+8)
|   We need to flush the msgr workqueue during mon_client shutdown to ensure that any work affecting our embedded ceph_connection is finished so that we can be safely destroyed.
|   Previously, we were flushing the work queue after osd_client shutdown and before mon_client shutdown to ensure that any osd connection refs to authorizers are flushed. Remove the redundant flush, and document in the comment that the mon_client flush is needed to cover that case as well.
|   Signed-off-by: Sage Weil <sage@inktank.com>
|   Reviewed-by: Alex Elder <elder@inktank.com>
|   (cherry picked from commit f3dea7edd3d449fe7a6d402c1ce56a294b985261)
* rbd: Clear ceph_msg->bio_iter for retransmitted message  (Yan, Zheng; 2012-06-20; 1 file; -0/+4)
|   The bug can cause a NULL pointer dereference in write_partial_msg_pages().
|   Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
|   Reviewed-by: Alex Elder <elder@inktank.com>
|   (cherry picked from commit 43643528cce60ca184fe8197efa8e8da7c89a037)
* libceph: use con get/put ops from osd_client  (Sage Weil; 2012-06-20; 1 file; -4/+4)
|   There were a few direct calls to ceph_con_{get,put}() instead of the con ops from osd_client.c. This is a bug since those ops aren't defined to be ceph_con_get/put.
|   This breaks refcounting on the ceph_osd structs that contain the ceph_connections, and could lead to all manner of strangeness.
|   The purpose of the ->get and ->put methods in a ceph connection is to allow the connection to indicate it has a reference to something external to the messaging system, *not* to indicate something external has a reference to the connection.
|   [elder@inktank.com: added that last sentence]
|   Signed-off-by: Sage Weil <sage@newdream.net>
|   Reviewed-by: Alex Elder <elder@inktank.com>
|   (cherry picked from commit 0d47766f14211a73eaf54cab234db134ece79f49)
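The ownership rule stated in that commit (the connection's ->get/->put pin the object that contains it) can be illustrated with a small userspace sketch. Everything here is hypothetical: struct owner stands in for the containing object (struct ceph_osd in this case), and struct conn/struct conn_ops are simplified stand-ins, not the real libceph definitions. Messaging code takes and drops references only through con->ops, so the owner's count stays balanced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the object that embeds the connection. */
struct owner {
	int refcount;
};

struct conn;

/* Callbacks through which a connection pins the object that contains it. */
struct conn_ops {
	struct conn *(*get)(struct conn *con);
	void (*put)(struct conn *con);
};

struct conn {
	const struct conn_ops *ops;
	struct owner *owner;
};

static struct conn *owner_get(struct conn *con)
{
	con->owner->refcount++;
	return con;
}

static void owner_put(struct conn *con)
{
	assert(con->owner->refcount > 0);
	con->owner->refcount--;
}

static const struct conn_ops owner_conn_ops = {
	.get = owner_get,
	.put = owner_put,
};

/* Messaging-layer helpers go through con->ops rather than a generic
 * connection-level get/put, so the owner's refcount is what changes. */
static void msgr_hold(struct conn *con)    { con->ops->get(con); }
static void msgr_release(struct conn *con) { con->ops->put(con); }
```

In this model, bypassing con->ops with a connection-private counter would leave owner->refcount untouched, which is the kind of imbalance the commit describes.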
* libceph: osd_client: don't drop reply reference too early  (Alex Elder; 2012-06-20; 1 file; -2/+2)
|   In ceph_osdc_release_request(), a reference to the r_reply message is dropped. But just after that, that same message is revoked if it was in use to receive an incoming reply. Reorder these so we are sure we hold a reference until we're actually done with the message.
|   Signed-off-by: Alex Elder <elder@inktank.com>
|   Reviewed-by: Sage Weil <sage@inktank.com>
|   (cherry picked from commit ab8cb34a4b2f60281a4b18b1f1ad23bc2313d91b)
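A minimal sketch of the reordering described in that commit, assuming a simplified refcounted message. The names struct msg, msg_revoke(), msg_put() and release_request() are hypothetical and are not the osd_client.c code; the sketch only shows that the message is revoked while a reference is still held, and the reference is dropped last.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified message with a reference count. */
struct msg {
	int refcount;
	bool receiving;   /* true while the messenger may fill it in */
};

static void msg_put(struct msg *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0)
		free(m);
}

/* Stop the messenger from using the message for an incoming reply.
 * This touches *m, so the caller must still hold a reference. */
static void msg_revoke(struct msg *m)
{
	m->receiving = false;
}

struct request {
	struct msg *reply;
};

/* Corrected order: finish using the message, then drop the reference. */
static void release_request(struct request *req)
{
	if (req->reply) {
		msg_revoke(req->reply);
		msg_put(req->reply);
		req->reply = NULL;
	}
}
```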
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client  (Linus Torvalds; 2012-05-30; 7 files; -276/+235)
|\    Pull ceph updates from Sage Weil:
| |   "There are some updates and cleanups to the CRUSH placement code, a bug fix with incremental maps, several cleanups and fixes from Josh Durgin in the RBD block device code, a series of cleanups and bug fixes from Alex Elder in the messenger code, and some miscellaneous bounds checking and gfp cleanups/fixes."
| |   Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the networking people preferring "unsigned int" over just "unsigned".
| |   * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits)
| |     libceph: fix pg_temp updates
| |     libceph: avoid unregistering osd request when not registered
| |     ceph: add auth buf in prepare_write_connect()
| |     ceph: rename prepare_connect_authorizer()
| |     ceph: return pointer from prepare_connect_authorizer()
| |     ceph: use info returned by get_authorizer
| |     ceph: have get_authorizer methods return pointers
| |     ceph: ensure auth ops are defined before use
| |     ceph: messenger: reduce args to create_authorizer
| |     ceph: define ceph_auth_handshake type
| |     ceph: messenger: check return from get_authorizer
| |     ceph: messenger: rework prepare_connect_authorizer()
| |     ceph: messenger: check prepare_write_connect() result
| |     ceph: don't set WRITE_PENDING too early
| |     ceph: drop msgr argument from prepare_write_connect()
| |     ceph: messenger: send banner in process_connect()
| |     ceph: messenger: reset connection kvec caller
| |     libceph: don't reset kvec in prepare_write_banner()
| |     ceph: ignore preferred_osd field
| |     ceph: fully initialize new layout
| |     ...
| * libceph: fix pg_temp updates  (Sage Weil; 2012-05-21; 1 file; -1/+5)
| |   Usually, we are adding pg_temp entries or removing them. Occasionally they update. In that case, osdmap_apply_incremental() was failing because the rbtree entry already exists. Fix by removing the existing entry before inserting a new one.
| |   Fixes http://tracker.newdream.net/issues/2446
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| |   Reviewed-by: Alex Elder <elder@inktank.com>
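The failure mode in that commit is generic to any keyed container whose insert helper refuses duplicate keys. The sketch below is an assumption-laden model, not the osdmap.c code: a sorted singly linked list stands in for the kernel rbtree, and struct pg_entry, pg_insert(), pg_erase() and pg_apply_update() are hypothetical names. It shows why erasing first lets an update (not just an add) succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical pg_temp mapping entry: placement group id -> OSD
 * (reduced to a single int). A sorted list stands in for an rbtree. */
struct pg_entry {
	unsigned int pgid;
	int osd;
	struct pg_entry *next;
};

/* Like a typical rbtree insert helper: refuse duplicate keys. */
static int pg_insert(struct pg_entry **head, struct pg_entry *ent)
{
	struct pg_entry **p = head;

	while (*p && (*p)->pgid < ent->pgid)
		p = &(*p)->next;
	if (*p && (*p)->pgid == ent->pgid)
		return -EEXIST;
	ent->next = *p;
	*p = ent;
	return 0;
}

static void pg_erase(struct pg_entry **head, unsigned int pgid)
{
	struct pg_entry **p = head;

	while (*p && (*p)->pgid != pgid)
		p = &(*p)->next;
	if (*p) {
		struct pg_entry *victim = *p;

		*p = victim->next;
		free(victim);
	}
}

/* Apply one incremental pg_temp change: drop any existing mapping first
 * so that updating an already-present pgid does not hit -EEXIST. */
static int pg_apply_update(struct pg_entry **head, unsigned int pgid, int osd)
{
	struct pg_entry *e = malloc(sizeof(*e));
	int ret;

	if (!e)
		return -ENOMEM;
	e->pgid = pgid;
	e->osd = osd;
	e->next = NULL;
	pg_erase(head, pgid);
	ret = pg_insert(head, e);
	assert(ret == 0);   /* cannot collide after the erase */
	return ret;
}
```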
| * libceph: avoid unregistering osd request when not registered  (Sage Weil; 2012-05-18; 1 file; -0/+6)
| |   There is a race between two __unregister_request() callers: the reply path and ceph_osdc_wait_request(). If we get a reply *and* the timeout expires at roughly the same time, both callers will try to unregister the request, and the second one will do bad things.
| |   Simply check if the request is already unregistered; if so, return immediately and do nothing.
| |   Fixes http://tracker.newdream.net/issues/2420
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| |   Reviewed-by: Alex Elder <elder@inktank.com>
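The fix amounts to making unregistration idempotent. Here is a minimal single-threaded sketch of that idea, assuming an intrusive list where an unlinked node points at itself (the kernel's list_del_init()/list_empty() convention). struct req, register_req() and unregister_req() are hypothetical, and the locking that would serialize the reply and timeout paths is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request with an intrusive doubly linked list node. */
struct req {
	struct req *prev, *next;
	int refcount;
};

static void node_init(struct req *r) { r->prev = r->next = r; }
static bool node_linked(const struct req *r) { return r->next != r; }

static void register_req(struct req *head, struct req *r)
{
	r->next = head->next;
	r->prev = head;
	head->next->prev = r;
	head->next = r;
	r->refcount++;          /* the registry holds a reference */
}

/* May be called from both the reply path and the timeout path; whichever
 * caller arrives second finds the request already unlinked and returns. */
static void unregister_req(struct req *r)
{
	if (!node_linked(r))
		return;
	r->prev->next = r->next;
	r->next->prev = r->prev;
	node_init(r);
	assert(r->refcount > 0);
	r->refcount--;          /* drop the registry's reference exactly once */
}
```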
| * ceph: add auth buf in prepare_write_connect()  (Alex Elder; 2012-05-18; 1 file; -9/+9)
| |   Move the addition of the authorizer buffer to a connection's out_kvec out of get_connect_authorizer() and into its caller. This way, the caller--prepare_write_connect()--can avoid adding the connect header to out_kvec before it has been fully initialized.
| |   Prior to this patch, it was possible for a connect header to be sent over the wire before the authorizer protocol or buffer length fields were initialized. An authorizer buffer associated with that header could also be queued to send only after the connection header that describes it was on the wire.
| |   Fixes http://tracker.newdream.net/issues/2424
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: rename prepare_connect_authorizer()  (Alex Elder; 2012-05-17; 1 file; -6/+7)
| |   Change the name of prepare_connect_authorizer(). The next patch is going to make this function no longer add anything to the connection's out_kvec, so it will no longer fit the pattern of the rest of the prepare_connect_*() functions.
| |   In addition, pass the address of a variable that will hold the authorization protocol to use. Move the assignment of that to the connection's out_connect structure into prepare_write_connect().
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: return pointer from prepare_connect_authorizer()  (Alex Elder; 2012-05-17; 1 file; -9/+9)
| |   Change prepare_connect_authorizer() so it returns a pointer (or pointer-coded error).
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: use info returned by get_authorizer  (Alex Elder; 2012-05-17; 2 files; -14/+8)
| |   Rather than passing a bunch of arguments to be filled in with the content of the ceph_auth_handshake buffer now returned by the get_authorizer method, just use the returned information in the caller, and drop the unnecessary arguments.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: have get_authorizer methods return pointers  (Alex Elder; 2012-05-17; 2 files; -11/+16)
| |   Have the get_authorizer auth_client method return a ceph_auth_handshake pointer rather than an integer, pointer-encoding any returned error value. This is to pave the way for making use of the returned value in an upcoming patch.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: ensure auth ops are defined before use  (Alex Elder; 2012-05-17; 1 file; -5/+10)
| |   In the create_authorizer method for both the mds and osd clients, the auth_client->ops pointer is blindly dereferenced. There is no obvious guarantee that this pointer has been assigned. And furthermore, even if the ops pointer is non-null there is definitely no guarantee that the create_authorizer or destroy_authorizer methods are defined.
| |   Add checks in both routines to make sure they are defined (non-null) before use. Add similar checks in a few other spots in these files while we're at it.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: messenger: reduce args to create_authorizer  (Alex Elder; 2012-05-17; 3 files; -32/+26)
| |   Make use of the new ceph_auth_handshake structure in order to reduce the number of arguments passed to the create_authorizer method in ceph_auth_client_ops. Use a local variable of that type as a shorthand in the get_authorizer method definitions.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: define ceph_auth_handshake type  (Alex Elder; 2012-05-17; 1 file; -16/+16)
| |   The definitions for the ceph_mds_session and ceph_osd both contain five fields related only to "authorizers." Encapsulate those fields into their own struct type, allowing for better isolation in some upcoming patches.
| |   Fix the #includes in "linux/ceph/osd_client.h" to spell out their full canonical paths.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: messenger: check return from get_authorizer  (Alex Elder; 2012-05-17; 1 file; -3/+7)
| |   In prepare_connect_authorizer(), a connection's get_authorizer method is called but its return value is ignored. This function can return an error, so check for it and return it if that ever occurs.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: messenger: rework prepare_connect_authorizer()  (Alex Elder; 2012-05-17; 1 file; -9/+19)
| |   Change prepare_connect_authorizer() so it returns without dropping the connection mutex if the connection has no get_authorizer method.
| |   Use the symbolic CEPH_AUTH_UNKNOWN instead of 0 when assigning authorization protocols.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: messenger: check prepare_write_connect() result  (Alex Elder; 2012-05-17; 1 file; -4/+12)
| |   prepare_write_connect() can return an error, but only one of its callers checks for it. All the rest are in functions that already return errors, so it should be fine to return the error if one gets returned.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: don't set WRITE_PENDING too early  (Alex Elder; 2012-05-17; 1 file; -1/+5)
| |   prepare_write_connect() prepares a connect message, then sets WRITE_PENDING on the connection. Then *after* this, it calls prepare_connect_authorizer(), which updates the content of the connection buffer already queued for sending. It's also possible it will result in prepare_write_connect() returning -EAGAIN despite the WRITE_PENDING bit getting set.
| |   Fix this by preparing the connect authorizer first, setting the WRITE_PENDING bit only after that is done.
| |   Partially addresses http://tracker.newdream.net/issues/2424
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: drop msgr argument from prepare_write_connect()  (Alex Elder; 2012-05-17; 1 file; -14/+11)
| |   In all cases, the value passed as the msgr argument to prepare_write_connect() is just con->msgr. Just get the msgr value from the ceph connection and drop the unneeded argument.
| |   The only msgr passed to prepare_write_banner() is also therefore just the one from con->msgr, so change that function to drop the msgr argument as well.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: messenger: send banner in process_connect()  (Alex Elder; 2012-05-17; 1 file; -9/+7)
| |   prepare_write_connect() has an argument indicating whether a banner should be sent out before sending out a connection message. It's only ever set in one of its callers, so move the code that arranges to send the banner into that caller and drop the "include_banner" argument from prepare_write_connect().
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: messenger: reset connection kvec caller  (Alex Elder; 2012-05-17; 1 file; -1/+5)
| |   Reset a connection's kvec fields in the caller rather than in prepare_write_connect(). This ends up repeating a few lines of code, but it improves the separation between distinct operations on the connection, which we can take advantage of later.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * libceph: don't reset kvec in prepare_write_banner()  (Alex Elder; 2012-05-17; 1 file; -3/+1)
| |   Move the kvec reset for a connection out of prepare_write_banner() and into its only caller.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: messenger: change read_partial() to take "end" arg  (Alex Elder; 2012-05-14; 1 file; -22/+38)
| |   Make the second argument to read_partial() be the ending input byte position rather than the beginning offset it now represents. This amounts to moving the addition "to + size" into the caller.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
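A userspace sketch of the "end" calling convention, under stated assumptions: struct conn and fake_recv() are hypothetical, with fake_recv() simulating a socket that delivers at most chunk bytes per call. read_partial() keeps receiving until the connection's running input position reaches the absolute position end, and the caller computes end = start + size. It is modeled on the description above, not copied from messenger.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical connection: input comes from src, at most chunk bytes
 * per receive call, to simulate short socket reads. */
struct conn {
	int in_base_pos;   /* bytes of the current message consumed so far */
	const char *src;
	int src_len, src_off, chunk;
};

static int fake_recv(struct conn *con, char *buf, int len)
{
	int n = len < con->chunk ? len : con->chunk;

	if (n > con->src_len - con->src_off)
		n = con->src_len - con->src_off;
	memcpy(buf, con->src + con->src_off, n);
	con->src_off += n;
	return n;          /* 0 means no more data right now */
}

/* Read into object (size bytes) until in_base_pos reaches end.
 * Returns 1 when the object is complete, <= 0 if input ran dry. */
static int read_partial(struct conn *con, int end, int size, void *object)
{
	assert(end - con->in_base_pos <= size);
	while (con->in_base_pos < end) {
		int left = end - con->in_base_pos;
		int have = size - left;
		int ret = fake_recv(con, (char *)object + have, left);

		if (ret <= 0)
			return ret;
		con->in_base_pos += ret;
	}
	return 1;
}
```

Usage: a caller reading a header and then a footer passes end = sizeof(hdr) for the first call and adds sizeof(ftr) to end for the second, which is the "to + size" addition moved into the caller.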
| * ceph: messenger: update "to" in read_partial() caller  (Alex Elder; 2012-05-14; 1 file; -15/+16)
| |   read_partial() always increases whatever "to" value is supplied by adding the requested size to it, and that's the only thing it does with that pointed-to value.
| |   Do that pointer advance in the caller (and then only when the updated value will be subsequently used), and change the "to" parameter to be an in-only and non-pointer value.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: messenger: use read_partial() in read_partial_message()  (Alex Elder; 2012-05-14; 1 file; -20/+10)
| |   There are two blocks of code in read_partial_message()--those that read the header and footer of the message--that can be replaced by a call to read_partial(). Do that.
| |   Signed-off-by: Alex Elder <elder@inktank.com>
| |   Reviewed-by: Sage Weil <sage@inktank.com>
| * ceph: osd_client: fix endianness bug in osd_req_encode_op()  (Alex Elder; 2012-05-14; 1 file; -1/+1)
| |   From Al Viro <viro@zeniv.linux.org.uk>
| |   Al Viro noticed that we were using a non-cpu-encoded value in a switch statement in osd_req_encode_op(). The result would clearly not work correctly on a big-endian machine.
| |   Signed-off-by: Alex Elder <elder@dreamhost.com>
| * crush: warn on do_rule failure  (Sage Weil; 2012-05-07; 1 file; -4/+11)
| |   If we get an error code from crush_do_rule(), print an error to the console.
| |   Reviewed-by: Alex Elder <elder@inktank.com>
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| * crush: fix memory leak when destroying tree buckets  (Sage Weil; 2012-05-07; 1 file; -0/+2)
| |   Reflects ceph.git commit 46d63d98434b3bc9dad2fc9ab23cbaedc3bcb0e4.
| |   Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
| |   Reviewed-by: Alex Elder <elder@inktank.com>
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| * crush: fix tree node weight lookup  (Sage Weil; 2012-05-07; 1 file; -3/+1)
| |   Fix the node weight lookup for tree buckets by using a correct accessor.
| |   Reflects ceph.git commit d287ade5bcbdca82a3aef145b92924cf1e856733.
| |   Reviewed-by: Alex Elder <elder@inktank.com>
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| * crush: remove parent maps  (Sage Weil; 2012-05-07; 2 files; -32/+0)
| |   These were used for the ill-fated forcefeed feature. Remove them.
| |   Reflects ceph.git commit ebdf80edfecfbd5a842b71fbe5732857994380c1.
| |   Reviewed-by: Alex Elder <elder@inktank.com>
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| * crush: remove forcefeed functionality  (Sage Weil; 2012-05-07; 2 files; -48/+2)
| |   Remove forcefeed functionality from CRUSH. This is an ugly misfeature that is mostly useless and unused. Remove it.
| |   Reflects ceph.git commit ed974b5000f2851207d860a651809af4a1867942.
| |   Reviewed-by: Alex Elder <elder@inktank.com>
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| |   Conflicts:
| |     net/ceph/crush/mapper.c
| * crush: use a temporary variable to simplify crush_do_rule  (Sage Weil; 2012-05-07; 1 file; -9/+11)
| |   Use a temporary variable here to avoid repeated array lookups and clean up the code a bit.
| |   This reflects ceph.git commit 6b5be27634ad307b471a5bf0db85c4f5c834885f.
| |   Reviewed-by: Alex Elder <elder@inktank.com>
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| * crush: be more tolerant of nonsensical crush maps  (Sage Weil; 2012-05-07; 1 file; -9/+24)
| |   If we get a map that doesn't make sense, error out or ignore the badness instead of BUGging out.
| |   This reflects the ceph.git commits 9895f0bff7dc68e9b49b572613d242315fb11b6c and 8ded26472058d5205803f244c2f33cb6cb10de79.
| |   Reviewed-by: Alex Elder <elder@inktank.com>
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| * crush: adjust local retry threshold  (Sage Weil; 2012-05-07; 1 file; -1/+1)
| |   This small adjustment reflects a change that was made in ceph.git commit af6a9f30696c900a2a8bd7ae24e8ed15fb4964bb, about 6 months ago.
| |   An N-1 search is not exhaustive. Fixed ceph.git bug #1594.
| |   Reviewed-by: Alex Elder <elder@inktank.com>
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| * crush: clean up types, const-ness  (Sage Weil; 2012-05-07; 2 files; -19/+20)
| |   Move various types from int -> __u32 (or similar), and add const as appropriate.
| |   This reflects changes that have been present in the userland implementation for some time.
| |   Reviewed-by: Alex Elder <elder@inktank.com>
| |   Signed-off-by: Sage Weil <sage@inktank.com>
| * ceph: drop support for preferred_osd pgs  (Sage Weil; 2012-05-07; 1 file; -37/+10)
| |   This was an ill-conceived feature that has been removed from Ceph. Do this gracefully:
| |    - reject attempts to specify a preferred_osd via the ioctl
| |    - stop exposing this information via virtual xattrs
| |    - always fill in -1 for requests, in case we talk to an older server
| |    - don't calculate preferred_osd placements/pgids
| |   Reviewed-by: Alex Elder <elder@inktank.com>
| |   Signed-off-by: Sage Weil <sage@inktank.com>
* | crush: include header for global symbols  (hartleys; 2012-04-27; 1 file; -0/+1)
| |   Include the header to pick up the definitions of the global symbols. Quiets the following sparse warnings:
| |     warning: symbol 'crush_find_rule' was not declared. Should it be static?
| |     warning: symbol 'crush_do_rule' was not declared. Should it be static?
| |   Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
| |   Cc: Sage Weil <sage@newdream.net>
| |   Cc: "David S. Miller" <davem@davemloft.net>
| |   Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: cleanup unsigned to unsigned int  (Eric Dumazet; 2012-04-15; 9 files; -36/+36)
|/  Use of "unsigned int" is preferred to bare "unsigned" in net tree.
|   Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
|   Signed-off-by: David S. Miller <davem@davemloft.net>
* libceph: isolate kmap() call in write_partial_msg_pages()  (Alex Elder; 2012-03-22; 1 file; -11/+2)
|   In write_partial_msg_pages(), every case now does an identical call to kmap(page). Instead, just call it once inside the CRC-computing block where it's needed. Move the definition of kaddr inside that block, and make it a (char *) to ensure portable pointer arithmetic.
|   We still don't kunmap() it until after the sendpage() call, in case that also ends up needing to use the mapping.
|   Signed-off-by: Alex Elder <elder@dreamhost.com>
|   Reviewed-by: Sage Weil <sage@newdream.net>
* libceph: rename "page_shift" variable to something sensible  (Alex Elder; 2012-03-22; 1 file; -5/+6)
|   In write_partial_msg_pages() there is a local variable used to track the starting offset within a bio segment to use. Its name, "page_shift", defies the Linux convention of using that name for log-base-2(page size).
|   Since it's only used in the bio case, rename it "bio_offset". Use it along with the page_pos field to compute the memory offset when computing CRCs in that function. This makes the bio case match the others more closely.
|   Signed-off-by: Alex Elder <elder@dreamhost.com>
|   Reviewed-by: Sage Weil <sage@newdream.net>
* libceph: get rid of zero_page_address  (Alex Elder; 2012-03-22; 1 file; -9/+2)
|   There's not a lot of benefit to zero_page_address, which basically holds a mapping of the zero page through the life of the messenger module. Even with our own mapping, the sendpage interface where it's used may need to kmap() it again. It's almost certain to be in low memory anyway.
|   So stop treating the zero page specially in write_partial_msg_pages() and just get rid of zero_page_address entirely.
|   Signed-off-by: Alex Elder <elder@dreamhost.com>
|   Reviewed-by: Sage Weil <sage@newdream.net>
* libceph: only call kernel_sendpage() via helper  (Alex Elder; 2012-03-22; 1 file; -6/+2)
|   Make ceph_tcp_sendpage() be the only place kernel_sendpage() is used, by using this helper in write_partial_msg_pages().
|   Signed-off-by: Alex Elder <elder@dreamhost.com>
|   Reviewed-by: Sage Weil <sage@newdream.net>
* libceph: use kernel_sendpage() for sending zeroes  (Alex Elder; 2012-03-22; 1 file; -5/+15)
|   If a message queued for send gets revoked, zeroes are sent over the wire instead of any unsent data. This is done by constructing a message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg(). Since we are already working with a page in this case we can use the sendpage interface instead.
|   Create a new ceph_tcp_sendpage() helper that sets up flags to match the way ceph_tcp_sendmsg() does now.
|   Signed-off-by: Alex Elder <elder@dreamhost.com>
|   Reviewed-by: Sage Weil <sage@newdream.net>
* libceph: fix inverted crc option logic  (Alex Elder; 2012-03-22; 1 file; -10/+10)
|   CRCs are computed for all messages between ceph entities. The CRC computation for the data portion of message can optionally be disabled using the "nocrc" (common) ceph option. The default is for CRC computation for the data portion to be enabled.
|   Unfortunately, the code that implements this feature interprets the feature flag wrong, meaning that by default the CRCs have *not* been computed (or checked) for the data portion of messages unless the "nocrc" option was supplied.
|   Fix this, in write_partial_msg_pages() and read_partial_message(). Also change the flag variable in write_partial_msg_pages() to be "no_datacrc" to match the usage elsewhere in the file.
|   This fixes http://tracker.newdream.net/issues/2064
|   Signed-off-by: Alex Elder <elder@dreamhost.com>
|   Reviewed-by: Sage Weil <sage@newdream.net>
* libceph: some simple changes  (Alex Elder; 2012-03-22; 1 file; -4/+9)
|   Nothing too big here.
|    - define the size of the buffer used for consuming ignored incoming data using a symbolic constant
|    - simplify the condition determining whether to unmap the page in write_partial_msg_pages(): do it for crc but not if the page is the zero page
|   Signed-off-by: Alex Elder <elder@dreamhost.com>
|   Signed-off-by: Sage Weil <sage@newdream.net>
* libceph: small refactor in write_partial_kvec()  (Alex Elder; 2012-03-22; 1 file; -11/+12)
|   Make a small change in the code that counts down kvecs consumed by a ceph_tcp_sendmsg() call. Same functionality, just blocked out a little differently.
|   Signed-off-by: Alex Elder <elder@dreamhost.com>
|   Signed-off-by: Sage Weil <sage@newdream.net>