| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI update from James Bottomley:
"This is a much shorter set of patches that were on the go but didn't
make it in to the early pull request for the merge window. It's
really a set of bug fixes plus some final cleanup work on the new tag
queue API"
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
storvsc: ring buffer failures may result in I/O freeze
ipr: set scsi_level correctly for disk arrays
ipr: add support for async scanning to speed up boot
scsi_debug: fix missing "break;" in SDEBUG_UA_CAPACITY_CHANGED case
scsi_debug: take sdebug_host_list_lock when changing capacity
scsi_debug: improve driver description in Kconfig
scsi_debug: fix compare and write errors
qla2xxx: fix race in handling rport deletion during recovery causes panic
scsi: blacklist RSOC for Microsoft iSCSI target devices
scsi: fix random memory corruption with scsi-mq + T10 PI
Revert "[SCSI] mpt3sas: Remove phys on topology change"
Revert "[SCSI] mpt2sas: Remove phys on topology change."
esas2r: Correct typos of "validate" in a comment
fc: FCP_PTA_SIMPLE is 0
ibmvfc: remove unused tag variable
scsi: remove MSG_*_TAG defines
scsi: remove scsi_set_tag_type
scsi: remove scsi_get_tag_type
scsi: never drop to untagged mode during queue ramp down
scsi: remove ->change_queue_type method
|
| |\ |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
For SPI drivers use the message definitions from scsi.h, and for target
drivers introduce a new TCM_*_TAG namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Since we got rid of ordered tag support in 2010 the prime use case of
switching on and off ordered tags has been obsolete. The other function
of enabling/disabling tagging entirely has only been correctly implemented
by the 53c700 driver and isn't generally useful.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
"The highlights this merge window include:
- Allow target fabric drivers to function as built-in. (Roland)
- Fix tcm_loop multi-TPG endpoint nexus bug. (Hannes)
- Move per device config_item_type into se_subsystem_api, allowing
configfs attributes to be defined at module_init time. (Jerome +
nab)
- Convert existing IBLOCK/FILEIO/RAMDISK/PSCSI/TCMU drivers to use
external configfs attributes. (nab)
- A number of iser-target fixes related to active session + network
portal shutdown stability during extended stress testing. (Sagi +
Slava)
- Dynamic allocation of T10-PI contexts for iser-target, fixing a
potentially bogus iscsi_np->tpg_np pointer reference in >= v3.14
code. (Sagi)
- iser-target performance + scalability improvements. (Sagi)
- Fixes for SPC-4 Persistent Reservation AllRegistrants spec
compliance. (Ilias + James + nab)
- Avoid potential short kern_sendmsg() in iscsi-target for now until
Al's conversion to use msghdr iteration is merged post -rc1.
(Viro)
Also, Sagi has requested a number of iser-target patches (9) that
address stability issues he's encountered during extended stress
testing be considered for v3.10.y + v3.14.y code. Given the amount of
LOC involved, it will certainly require extra backporting effort.
Apologies in advance to Greg-KH & Co on this. Sagi and I will be
working post-merge to ensure they each get applied correctly"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (53 commits)
target: Allow AllRegistrants to re-RESERVE existing reservation
uapi/linux/target_core_user.h: fix headers_install.sh badness
iscsi-target: Fail connection on short sendmsg writes
iscsi-target: nullify session in failed login sequence
target: Avoid dropping AllRegistrants reservation during unregister
target: Fix R_HOLDER bit usage for AllRegistrants
iscsi-target: Drop left-over bogus iscsi_np->tpg_np
iser-target: Fix wc->wr_id cast warning
iser-target: Remove code duplication
iser-target: Adjust log levels and prettify some prints
iser-target: Use debug_level parameter to control logging level
iser-target: Fix logout sequence
iser-target: Don't wait for session commands from completion context
iser-target: Reduce CQ lock contention by batch polling
iser-target: Introduce isert_poll_budget
iser-target: Remove an atomic operation from the IO path
iser-target: Remove redundant call to isert_conn_terminate
iser-target: Use single CQ for TX and RX
iser-target: Centralize completion elements to a context
iser-target: Cast wr_id with uintptr_t instead of unsinged long
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
CC [M] drivers/infiniband/ulp/isert/ib_isert.o
drivers/infiniband/ulp/isert/ib_isert.c: In function ‘isert_cq_comp_err’:
drivers/infiniband/ulp/isert/ib_isert.c:1979:42: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
- Fall-through in switch case instead in do_control_comp.
- Move rkey invalidation to a function.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
debug_level 1 (warn): Include warning messages.
debug_level 2 (info): Include relevant info for control plane.
debug_level 3 (debug): Include relevant info in the IO path.
Also, added/removed some logging messages.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Personal preference, easier control of the log level with
a single modparam which can be changed dynamically. Allows
better saparation of control and IO plains.
Replaced throughout ib_isert.c:
s/pr_debug/isert_dbg/g
s/pr_info/isert_info/g
s/pr_warn/isert_warn/g
s/pr_err/isert_err/g
Plus nit checkpatch warning change.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We don't want to wait for conn_logout_comp from isert_comp_wq
context as this blocks further completions from being processed.
Instead we wait for it conditionally (if logout response was
actually posted) in wait_conn. This wait should normally happen
immediately as it occurs after we consumed all the completions
(including flush errors) and conn_logout_comp should have been
completed.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Might result in a deadlock where completion context waits for
session commands release where the later might need a final
completion for it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In order to reduce the contention on CQ locking (present
in some LLDDs) we poll in batches of 16 work completion items.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In case the CQ is packed with completions, we can't just
hog the CPU forever. Poll until a sufficient budget (currently
hard-coded to 64k completions) and if budget is exhausted, bailout
and give a chance to other threads.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In order to know that we consumed all the connection completions
we maintain atomic post_send_buf_count for each IO post send. But
we can know that if we post a "beacon" (zero length RECV work request)
after we move the QP into error state and the target does not serve
any new IO. When we consume it, we know we finished all the connection
completion and we can go ahead and destroy stuff.
In error completion handler we now just need to check for ISERT_BEACON_WRID
to arrive and then wait for session commands to cleanup and complete
conn_wait_comp_err.
We reserve another CQ and QP entries to fit the zero length post recv.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We are calling session reinstatement, wait_conn will start
connection termination.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Using TX and RX CQs attached to the same vector might
create a throttling effect coming from the serial processing
of a work-queue. Use one CQ instead, it will do better in interrupt
processing and it provides a simpler code. Also, We get rid of
redundant isert_rx_wq.
Next we can remove the atomic post_send_buf_count from the IO path.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
A pre-step before going to a single CQ.
Also this makes the code a little more simple to
read.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Nit, uintptr_t is designed for pointer casting, use it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
As a pre-step to a single CQ, we unite the error completion
handlers to a single handler.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It is disabled at the moment, we will get that back
in once the target is more stable.
This reverts commit 95b60f0
"Add support for completion interrupt coalescing"
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently we have no way to tell that the target stack is in shutdown
sequence. In case we have open connections, the initiator immediately
attempts to reconnect in a DDOS attack style, so we may end up
terminating the iser enabled network portal while it's np_accept_list
still have pending connections.
The workaround is simply release all the connections in the list.
A proper fix will be to start shutdown sequence by shutting the
network portal to avoid initiator immediate reconnect attempts.
But the temporary work around seems to work at this point, so I think
we can do this for now...
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
iSER will report supported protection operations based on
the tpg attribute t10_pi settings and HCA PI offload capabilities.
If the HCA does not support PI offload or tpg attribute t10_pi is
not set, we fall to SW PI mode.
In order to do that, we move iscsit_get_sup_prot_ops after connection
tpg assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fallback to software mode DIF if HCA does not support
PI (without crashing obviously). It is still possible to
run with backend protection and an unprotected frontend,
so looking at the command prot_op is not enough. Check
device PI capability on a per-IO basis (isert_prot_cmd
inline static) to determine if we need to handle protection
information.
Trace:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa037f8b1>] isert_reg_sig_mr+0x351/0x3b0 [ib_isert]
Call Trace:
[<ffffffff812b003a>] ? swiotlb_map_sg_attrs+0x7a/0x130
[<ffffffffa038184d>] isert_reg_rdma+0x2fd/0x370 [ib_isert]
[<ffffffff8108f2ec>] ? idle_balance+0x6c/0x2c0
[<ffffffffa0382b68>] isert_put_datain+0x68/0x210 [ib_isert]
[<ffffffffa02acf5b>] lio_queue_data_in+0x2b/0x30 [iscsi_target_mod]
[<ffffffffa02306eb>] target_complete_ok_work+0x21b/0x310 [target_core_mod]
[<ffffffff8106ece2>] process_one_work+0x182/0x3b0
[<ffffffff8106fda0>] worker_thread+0x120/0x3c0
[<ffffffff8106fc80>] ? maybe_create_worker+0x190/0x190
[<ffffffff8107594e>] kthread+0xce/0xf0
[<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8159a22c>] ret_from_fork+0x7c/0xb0
[<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch converts to allocate PI contexts dynamically in order
avoid a potentially bogus np->tpg_np and associated NULL pointer
dereference in isert_connect_request() during iser-target endpoint
shutdown with multiple network portals.
Also, there is really no need to allocate these at connection
establishment since it is not guaranteed that all the IOs on
that connection will be to a PI formatted device.
We can do it in a lazy fashion so the initial burst will have a
transient slow down, but very fast all IOs will allocate a PI
context.
Squashed:
iser-target: Centralize PI context handling code
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In situations such as bond failover, The new session establishment
implicitly invokes the termination of the old connection.
So, we don't want to wait for the old connection wait_conn to completely
terminate before we accept the new connection and post a login response.
The solution is to deffer the comp_wait completion and the conn_put to
a work so wait_conn will effectively be non-blocking (flush errors are
assumed to come very fast).
We allocate isert_release_wq with WQ_UNBOUND and WQ_UNBOUND_MAX_ACTIVE
to spread the concurrency of release works.
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The np listener cm_id will also get ADDR_CHANGE event
upcall (in case it is bound to a specific IP). Handle
it correctly by creating a new cm_id and implicitly
destroy the old one.
Since this is the second event a listener np cm_id may
encounter, we move the np cm_id event handling to a
routine.
Squashed:
iser-target: Move cma_id setup to a function
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Take isert_conn pointer from cm_id->qp->qp_context. This
will allow us to know that the cm_id context is always
the network portal. This will make the cm_id event check
(connection or network portal) more reliable.
In order to avoid a NULL dereference in cma_id->qp->qp_context
we destroy the qp after we destroy the cm_id (and make the
dereference safe). session stablishment/teardown sequences
can happen in parallel, we should take into account that
connected_handler might race with connection teardown flow.
Also, protect isert_conn->conn_device->active_qps decrement
within the error patch during QP creation failure and the
normal teardown path in isert_connect_release().
Squashed:
iser-target: Decrement completion context active_qps in error flow
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
There is no point in accepting a new CM request only
when we are completely done with the last iscsi login.
Instead we accept immediately, this will also cause the
CM connection to reach connected state and the initiator
is allowed to send the first login. We mark that we got
the initial login and let iscsi layer pick it up when it
gets there.
This reduces the parallel login sequence by a factor of
more then 4 (and more for multi-login) and also prevents
the initiator (who does all logins in parallel) from
giving up on login timeout expiration.
In order to support multiple login requests sequence (CHAP)
we call isert_rx_login_req from isert_rx_completion insead
of letting isert_get_login_rx call it.
Squashed:
iser-target: Use kref_get_unless_zero in connected_handler
iser-target: Acquire conn_mutex when changing connection state
iser-target: Reject connect request in failure path
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
ISER_CONN_UP state is not sufficient to know if
we should wait for completion of flush errors and
disconnected_handler event.
Instead, split it to 2 states:
- ISER_CONN_UP: Got to CM connected phase, This state
indicates that we need to wait for a CM disconnect
event before going to teardown.
- ISER_CONN_FULL_FEATURE: Got to full feature phase
after we posted login response, This state indicates
that we posted recv buffers and we need to wait for
flush completions before going to teardown.
Also avoid deffering disconnected handler to a work,
and handle it within disconnected handler.
More work here is needed to handle DEVICE_REMOVAL event
correctly (cleanup all resources).
Squashed:
iser-target: Don't deffer disconnected handler to a work
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Since commit 0fc4ea701fcf ("Target/iser: Don't put isert_conn inside
disconnected handler") we put the conn kref in isert_wait_conn, so we
need .wait_conn to be invoked also in the error path.
Introduce call to isert_conn_terminate (called under lock)
which transitions the connection state to TERMINATING and calls
rdma_disconnect. If the state is already teminating, just bail
out back (temination started).
Also, make sure to destroy the connection when getting a connect
error event if didn't get to connected (state UP). Same for the
handling of REJECTED and UNREACHABLE cma events.
Squashed:
iscsi-target: Add call to wait_conn in establishment error flow
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | | | |
| \ \ \ | |
| \ \ \ | |
| \ \ \ | |
| \ \ \ | |
| \ \ \ | |
| \ \ \ | |
| \ \ \ | |
| \ \ \ | |
| \ \ \ | |
| \ \ \ | |
| \ \ \ | |
|\ \ \ \ \ \ \ \ \ \
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
'srp' into for-next
|
| | | | | | | | |/ /
| | | | | | | |/| |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
In case the last argument of the connection string is processed as a
string (destination GID for example).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
* Implement the relevant invalidation functions (zap MTTs as needed)
* Implement interlocking (and rollback in the page fault handlers) for
cases of a racing notifier and fault.
* With this patch we can now enable the capability bits for supporting RC
send/receive/RDMA read/RDMA write, and UD send.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
This patch implement a page fault handler (leaving the pages pinned as
of time being). The page fault handler handles initiator and responder
page faults for UD/RC transports, for send/receive operations, as well
as RDMA read/write initiator support.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
* Refactor MR registration and cleanup, and fix reg_pages accounting.
* Create a work queue to handle page fault events in a kthread context.
* Register a fault handler to get events from the core for each QP.
The registered fault handler is empty in this patch, and only a later
patch implements it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The new function allows updating the page tables of a memory region
after it was created. This can be used to handle page faults and page
invalidations.
Since mlx5_ib_update_mtt will need to work from within page invalidation,
so it must not block on memory allocation. It employs an atomic memory
allocation mechanism that is used as a fallback when kmalloc(GFP_ATOMIC) fails.
In order to reuse code from mlx5_ib_populate_pas, the patch splits
this function and add the needed parameters.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
This patch wraps together several changes needed for on-demand paging support
in the mlx5_ib_populate_pas function, and when registering memory regions.
* Instead of accepting a UMR bit telling the function to enable all
access flags, the function now accepts the access flags themselves.
* For on-demand paging memory regions, fill the memory tables from the
correct list, and enable/disable the access flags per-page according
to whether the page is present.
* A new bit is set to enable writing of access flags when using the
firmware create_mkey command.
* Disable contig pages when on-demand paging is enabled.
In addition the patch changes the UMR code to use PTR_ALIGN instead of
our own macro.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The patch adds infrastructure to query ODP capabilities in the mlx5
driver. The code will read the capabilities from the device, and
enable only those capabilities that both the driver and the device
supports. At this point ODP is not supported, so no capability is
copied from the device, but the patch exposes the global ODP device
capability bit.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
* Add an interval tree implementation for ODP umems. Create an
interval tree for each ucontext (including a count of the number of
ODP MRs in this context, semaphore, etc.), and register ODP umems in
the interval tree.
* Add MMU notifiers handling functions, using the interval tree to
notify only the relevant umems and underlying MRs.
* Register to receive MMU notifier events from the MM subsystem upon
ODP MR registration (and unregister accordingly).
* Add a completion object to synchronize the destruction of ODP umems.
* Add mechanism to abort page faults when there's a concurrent invalidation.
The way we synchronize between concurrent invalidations and page
faults is by keeping a counter of currently running invalidations, and
a sequence number that is incremented whenever an invalidation is
caught. The page fault code checks the counter and also verifies that
the sequence number hasn't progressed before it updates the umem's
page tables. This is similar to what the kvm module does.
In order to prevent the case where we register a umem in the middle of
an ongoing notifier, we also keep a per ucontext counter of the total
number of active mmu notifiers. We only enable new umems when all the
running notifiers complete.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yuval Dagan <yuvalda@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
* Extend the umem struct to keep the ODP related data.
* Allocate and initialize the ODP related information in the umem
(page_list, dma_list) and freeing as needed in the end of the run.
* Store a reference to the process PID struct in the ucontext. Used to
safely obtain the task_struct and the mm during fault handling,
without preventing the task destruction if needed.
* Add 2 helper functions: ib_umem_odp_map_dma_pages and
ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses
of specific pages of the umem (and, currently, pin them).
* Support for page faults only - IB core will keep the reference on
the pages used and call put_page when freeing an ODP umem
area. Invalidations support will be added in a later patch.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
* Add a configuration option for enable on-demand paging support in
the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a
later patch, this configuration option will select the MMU_NOTIFIER
configuration option to enable mmu notifiers.
* Add a flag for on demand paging (ODP) support in the IB device capabilities.
* Add a flag to request ODP MR in the access flags to reg_mr.
* Fail registrations done with the ODP flag when the low-level driver
doesn't support this.
* Change the conditions in which an MR will be writable to explicitly
specify the access flags. This is to avoid making an MR writable just
because it is an ODP MR.
* Add a ODP capabilities to the extended query device verb.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Add extensible query device capabilities verb to allow adding new features.
ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to
copy capability fields to be used by both ib_uverbs_query_device and
ib_uverbs_ex_query_device.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
Add a helper function mlx5_ib_read_user_wqe to read information from
user-space owned work queues. The function will be used in a later
patch by the page-fault handling code in mlx5_ib.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
[ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n
- Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
In some drivers there's a need to read data from a user space area
that was pinned using ib_umem when running from a different process
context.
The ib_umem_copy_from function allows reading data from the physical
pages pinned in the ib_umem struct.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
In order to allow umems that do not pin memory, we need the umem to
keep track of its region's address.
This makes the offset field redundant, and so this patch removes it.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The current UMR interface doesn't allow partial updates to a memory
region's page tables. This patch changes the interface to allow that.
It also changes the way the UMR operation validates the memory
region's state. When set, IB_SEND_UMR_FAIL_IF_FREE will cause the UMR
operation to fail if the MKEY is in the free state. When it is
unchecked the operation will check that it isn't in the free state.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | |/ / /
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Since UMR code now uses its own context struct on the stack, the pas
and dma pointers for the UMR operation that remained in the mlx5_ib_mr
struct are not necessary. This patch removes them.
Fixes: a74d24168d2d ("IB/mlx5: Refactor UMR to have its own context struct")
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
For user applications that use UD QPs, always resolve destination MAC
from the GRH. This is to avoid failure due to any garbage value in
the attr->dmac.
Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | | |/ / /
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|