Currently all the work directed toward the lustre upstream client is tracked
at the following link:

https://jira.hpdd.intel.com/browse/LU-9679

Under this ticket you will see the following work items that need to be
addressed:

******************************************************************************
* libcfs cleanup
*
* https://jira.hpdd.intel.com/browse/LU-9859
*
* Track all the cleanups and simplification of the libcfs module. Remove
* functions the kernel provides. Possibly integrate some of the functionality
* into the kernel proper.
*
******************************************************************************

https://jira.hpdd.intel.com/browse/LU-100086

LNET_MINOR conflicts with USERIO_MINOR

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8130

Fix and simplify libcfs hash handling

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8703

The current way we handle SMP is wrong. Platforms like ARM and KNL can have
core and NUMA setups with things like NUMA nodes with no cores. We need to
handle such cases. This work also greatly simplified the lustre SMP code.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9019

Replace libcfs time API with standard kernel APIs. Also migrate away from
jiffies. We found jiffies can vary between nodes, which can lead to corner
cases that break the file system because nodes behave inconsistently. So move
to time64_t and ktime_t as much as possible.

******************************************************************************
* Proper IB support for ko2iblnd
******************************************************************************

https://jira.hpdd.intel.com/browse/LU-9179

Poor performance for the ko2iblnd driver. This is related to many of the
patches below that are missing from the linux client.
------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9886

Crash in upstream kiblnd_handle_early_rxs()

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10394 / LU-10526 / LU-10089

Default to using MEM_REG

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10459

throttle tx based on queue depth

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9943

correct WR fast reg accounting

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10291

remove concurrent_sends tunable

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10213

calculate qp max_send_wrs properly

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9810

use less CQ entries for each connection

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10129 / LU-9180

rework map_on_demand behavior

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10129

query device capabilities

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10015

fix race at kiblnd_connect_peer

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9983

allow for discontiguous fragments

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9500

Don't Page Align remote_addr with FastReg

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9448

handle empty CPTs

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9507

Don't Assert On Reconnect with MultiQP

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9472

Fix FastReg map/unmap for MLX5

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9425

Turn on 2 sges by default

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8943

Enable Multiple OPA Endpoints between Nodes

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-5718

multiple sges for work request

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9094

kill timedout txs from ibp_tx_queue

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9094

reconnect peer for REJ_INVALID_SERVICE_ID

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8752

Stop MLX5 triggering a dump_cqe

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8874

Move ko2iblnd to latest RDMA changes

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8875 / LU-8874

Change to new RDMA done callback mechanism

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9164 / LU-8874

Incorporate RDMA map/unmap APIs into ko2iblnd

******************************************************************************
* sysfs/debugfs fixes
*
* https://jira.hpdd.intel.com/browse/LU-8066
*
* The original migration to sysfs was done in haste without properly working
* utilities to test the changes. This covers the work to restore the proper
* behavior. Huge project to make this right.
*
******************************************************************************

https://jira.hpdd.intel.com/browse/LU-9431

The function class_process_proc_param was used for our mass updates of proc
tunables. It didn't work with sysfs and it was just ugly so it was removed. In
the process the ability to mass update thousands of clients was lost. This
work restores this in a sane way.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9091

One of the major requests from users is the ability to pass parameters into a
sysfs file in various units. For example we can set max_pages_per_rpc, but the
right value varies between platforms due to different page sizes. So you can
set this like max_pages_per_rpc=16MiB. The original code to handle this was
written before the string helpers were created, so the code doesn't follow
that format, but it would be easy to move to. Currently the string helpers do
the reverse of what we need, changing bytes to a string. We need to change a
string to bytes.

******************************************************************************
* Proper user land to kernel space interface for Lustre
*
* https://jira.hpdd.intel.com/browse/LU-9680
*
******************************************************************************

https://jira.hpdd.intel.com/browse/LU-8915

Don't use linux list structure as user land arguments for lnet selftest.
This code is pretty poor quality and really needs to be reworked.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8834

The lustre ioctl LL_IOC_FUTIMES_3 is very generic.
Need to either work with other file systems with similar functionality and
make a common syscall interface, or rework our server code to automagically
do it for us.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-6202

Clean up ioctl handling. We have many obsolete ioctls. Also the way we do
ioctls can be changed over to netlink. This also has the benefit of working
better with HPC systems that do IO forwarding. Such systems don't like ioctls
very well.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9667

More cleanups by making our utilities use sysfs instead of ioctls for LNet.
Also it has been requested to move the remaining ioctls to the netlink API.

******************************************************************************
* Misc
******************************************************************************

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9855

Clean up obdclass preprocessor code. One of the major eyesores is the various
pointer redirections and macros used by the obdclass. This makes the code very
difficult to understand. Al Viro requested that this be cleaned up before we
leave staging.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9633

Migrate to sphinx kernel-doc style comments. Add documents in Documentation.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-6142

Possible remaining coding style fixes. Remove dead code. Enforce kernel code
style. Other minor misc cleanups...

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8837

Separate client/server functionality. Functions only used by server can be
removed from client.
Most of this has been done but we need an inspection of the code to make sure.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-8964

Lustre client readahead/writeback control needs to better fit what the kernel
provides. This is currently being explored. We could end up replacing the CLIO
readahead abstraction with the kernel proper version.

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9862

Patch that landed for LU-7890 leads to static checker errors

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-9868

dcache/namei fixes for lustre

------------------------------------------------------------------------------

https://jira.hpdd.intel.com/browse/LU-10467

use standard linux wait_event macros, work by Neil Brown

------------------------------------------------------------------------------

Please send any patches to Greg Kroah-Hartman, Andreas Dilger, James Simmons
and Oleg Drokin.
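
Illustration for the LU-9091 item above: the string-to-bytes direction it asks
for could look roughly like the userspace sketch below. This is only a sketch
under stated assumptions, not the Lustre or kernel implementation. The name
parse_size_to_bytes is hypothetical, and the accepted syntax (a plain decimal
number, or a number followed by KiB, MiB, GiB, TiB, PiB or EiB) is an
assumption made for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing kernel or Lustre function):
 * convert a string such as "16MiB" or "4096" into a byte count.
 * Returns 0 on success, -EINVAL on bad syntax, -ERANGE on overflow.
 */
static int parse_size_to_bytes(const char *str, uint64_t *bytes)
{
	static const char prefixes[] = "KMGTPE";
	unsigned long long val;
	unsigned int shift = 0;
	char *end;

	/* Require a leading digit so signs and whitespace are rejected. */
	if (!str || !bytes || !isdigit((unsigned char)*str))
		return -EINVAL;

	errno = 0;
	val = strtoull(str, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	if (*end != '\0') {
		/* Assumed syntax: one binary prefix letter followed by "iB". */
		const char *p = strchr(prefixes, toupper((unsigned char)*end));

		if (!p || strcmp(end + 1, "iB") != 0)
			return -EINVAL;
		/* K -> 2^10, M -> 2^20, ... E -> 2^60 */
		shift = 10 * (unsigned int)(p - prefixes + 1);
	}

	/* Refuse values that would not fit in 64 bits after scaling. */
	if (val > (UINT64_MAX >> shift))
		return -ERANGE;

	*bytes = (uint64_t)val << shift;
	return 0;
}
```

With such a helper, a value like max_pages_per_rpc=16MiB would first become a
byte count and could then be divided by the platform page size, so the same
setting maps to a different number of pages on 4KiB-page and 64KiB-page
systems.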