summaryrefslogtreecommitdiffstats
path: root/drivers/block
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | NVMe: Passthrough IOCTL for IO commandsKeith Busch2014-11-041-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NVME_IOCTL_SUBMIT_IO only works for IO commands with block data transfers and isn't usable for other NVMe commands like flush, data set management, or any sort of vendor unique command. The NVME_IOCTL_ADMIN_CMD, however, can easily be modified to accept arbitrary IO commands in addition to arbitrary admin commands without breaking backward compatibility. This patch just adds a new IOCTL to distinguish if the driver should submit the command on an IO or Admin queue. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Add revalidate_disk callbackKeith Busch2014-11-041-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a callback to revalidate the disk and change its block size and capacity if needed. Before, a user would have to remove + rescan an entire device if they changed the logical block size using an NVMe Format or other vendor specific command; now they can just run something that issues the BLKRRPART IOCTL, like # hdparm -z /dev/nvmeXnY This can also be used in response to the 1.2 Spec's Namespace Attribute Change asynchronous event. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Fix nvmeq waitqueue entry initializationKeith Busch2014-11-041-7/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to update the nvme queue's wait_queue_t entry during each initialization since the nvme_thread may be ended and restarted when the device is reset. If a device reset occurs during a large amount of buffered IO, it would take a lot longer to complete the outstanding requests due to the 1 second polling instead of waking up as completions occur. Fixes: b9afca3efb18a9b8392cb544a3e29e8b1168400c Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Translate NVMe status to errnoKeith Busch2014-11-041-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This returns a more appropriate error for the "capacity exceeded" status. In case other NVMe statuses have a better errno, this patch adds a convience function to translate an NVMe status code to an errno for IO commands, defaulting to the current -EIO. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Fix SG_IO status valuesKeith Busch2014-11-041-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've only been setting the sg_io_hdr status values on SCSI commands that require an nvme command to complete the translation. The fields in the struct are output parameters, so we have to set them, otherwise user space will see whatever was in memory from before. In the case of compat SG_IO, this would reveal kernel memory. This fixes the issue by initializing the sg_io_hdr with successful status. Signed-off-by: Keith Busch <keith.busch@intel.com> Acked-by: Vishal Verma <vishal.l.verma@linux.intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Remove duplicate compat SG_IO codeKeith Busch2014-11-042-149/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can return -ENOIOCTLCMD and the ioctl will be handled by fs/compat_ioctl.c instead. This removes a lot of duplicate code in the nvme driver. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Reference count pci deviceKeith Busch2014-11-041-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an nvme device is removed but user space has an open reference, the nvme driver would have been holding an invalid reference to its pci device. You may get a general protection fault on x86 h/w when the driver uses that reference in dma_map_sg(), as is done in nvme_map_user_pages() from the IOCTL interface. This patch fixes the fault by taking a reference on the pci device and holding it even after device removal until all opens on the nvme device are closed. Signed-off-by: Keith Busch <keith.busch@intel.com> Reported-by: Nilesh Choudhury <nilesh.choudhury@oracle.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | nvme: Replace rcu_assign_pointer() with RCU_INIT_POINTER()Andreea-Cristina Bernat2014-11-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Correctly handle IOCTL_SUBMIT_IO when cpus > online queuesSam Bradshaw2014-11-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nvme_submit_io_cmd() uses smp_processor_id() to pick an IO queue index. This patch fixes the case where there are more cpus from which the ioctl call can originate than online queues, which can happen when a device supports or was allocated fewer interrupt vectors than exist cpu cores. Thanks to Keith Busch for the implementation suggestion. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Fix filesystem sync deadlock on removalKeith Busch2014-11-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes the order of deleting the gendisks so it happens after the nvme IO queues are freed. If a device is removed while a filesystem has associated dirty data, the removal will wait on these to complete before proceeding from del_gendisk, which could have caused deadlock before. The implication of this is that an orderly removal of a responsive device won't necessarily wait for dirty data to be written, but we are not guaranteed the device is even going to respond at this point either. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Call nvme_free_queue directlyKeith Busch2014-11-041-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than relying on call_rcu, this patch directly frees the nvme_queue's memory after ensuring no readers exist. Some arch specific dma_free_coherent implementations may not be called from a call_rcu's soft interrupt context, hence the change. Signed-off-by: Keith Busch <keith.busch@intel.com> Reported-by: Matthew Minter <matthew_minter@xyratex.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Add shutdown timeout as module parameter.Dan McLeran2014-11-041-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation hard-codes the shutdown timeout to 2 seconds. Some devices take longer than this to complete a normal shutdown. Changing the shutdown timeout to a module parameter with a default timeout of 5 seconds. Signed-off-by: Dan McLeran <daniel.mcleran@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Skip orderly shutdown on failed devicesKeith Busch2014-11-041-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than skipping shutdown only for devices that have been removed, skip the orderly shutdown on failed devices to avoid the long timeout handling that inevitably happens when deleting queues on such a device. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Whitespace fixesKeith Busch2014-11-041-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixing tabs inadvertently converted to spaces. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Use pci_stop_and_remove_bus_device_locked()Keith Busch2014-11-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Race conditions are theoretically possible between the NVMe PCI device removal and the generic PCI bus rescan and device removal that can be triggered via sysfs. To avoid those race conditions make the NVMe code use pci_stop_and_remove_bus_device_locked(). Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Handling devices incapable of I/OKeith Busch2014-11-041-11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a minor refactor for handling devices that are incapable of IO. The driver previously used special error codes to know that IO queues are unavailable, but we have an online queue count now. This also fixes an issue where the driver successfully sets the queue count, but either is unable to allocate an IO queue or the device can't create one for some reason. If the driver can successfully enable the device and get responses to admin commands, the driver will bring up a character device for managment but not create block devices. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Change nvme_enable_ctrl to set EN and manage CC thru ctrl_config.Dan McLeran2014-11-041-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the behavior of nvme_enable_ctrl to set EN. Clear CC.SH for both nvme_enable_ctrl and nvme_disable_ctrl. Remove reading of the CC register and manage the state in dev->ctrl_config. Signed-off-by: Dan McLeran <daniel.mcleran@intel.com> [removed an unwanted write to CC] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Mismatched host/device page size supportKeith Busch2014-11-041-19/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for devices with max page size smaller than the host's. In the case we encounter such a host/device combination, the driver will split a page into as many PRP entries as necessary for the device's page size capabilities. If the device's reported minimum page size is greater than the host's, the driver will not attempt to enable the device and return an error instead. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | NVMe: Async event requestKeith Busch2014-11-041-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Submits NVMe asynchronous event requests, one event up to the controller maximum or number of possible different event types (8), whichever is smaller. Events successfully returned by the controller are logged. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | block: Use dma_zalloc_coherentJoe Perches2014-11-041-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the zeroing function instead of dma_alloc_coherent & memset(,0,) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | | | Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-blockLinus Torvalds2014-12-134-11/+13
|\ \ \ \ \ | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver core update from Jens Axboe: "This is the pull request for the core block IO changes for 3.19. Not a huge round this time, mostly lots of little good fixes: - Fix a bug in sysfs blktrace interface causing a NULL pointer dereference, when enabled/disabled through that API. From Arianna Avanzini. - Various updates/fixes/improvements for blk-mq: - A set of updates from Bart, mostly fixing buts in the tag handling. - Cleanup/code consolidation from Christoph. - Extend queue_rq API to be able to handle batching issues of IO requests. NVMe will utilize this shortly. From me. - A few tag and request handling updates from me. - Cleanup of the preempt handling for running queues from Paolo. - Prevent running of unmapped hardware queues from Ming Lei. - Move the kdump memory limiting check to be in the correct location, from Shaohua. - Initialize all software queues at init time from Takashi. This prevents a kobject warning when CPUs are brought online that weren't online when a queue was registered. - Single writeback fix for I_DIRTY clearing from Tejun. Queued with the core IO changes, since it's just a single fix. - Version X of the __bio_add_page() segment addition retry from Maurizio. Hope the Xth time is the charm. - Documentation fixup for IO scheduler merging from Jan. - Introduce (and use) generic IO stat accounting helpers for non-rq drivers, from Gu Zheng. - Kill off artificial limiting of max sectors in a request from Christoph" * 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits) bio: modify __bio_add_page() to accept pages that don't start a new segment blk-mq: Fix uninitialized kobject at CPU hotplugging blktrace: don't let the sysfs interface remove trace from running list blk-mq: Use all available hardware queues blk-mq: Micro-optimize bt_get() blk-mq: Fix a race between bt_clear_tag() and bt_get() blk-mq: Avoid that __bt_get_word() wraps multiple times blk-mq: Fix a use-after-free blk-mq: prevent unmapped hw queue from being scheduled blk-mq: re-check for available tags after running the hardware queue blk-mq: fix hang in bt_get() blk-mq: move the kdump check to blk_mq_alloc_tag_set blk-mq: cleanup tag free handling blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map blk: introduce generic io stat accounting help function blk-mq: handle the single queue case in blk_mq_hctx_next_cpu genhd: check for int overflow in disk_expand_part_tbl() blk-mq: add blk_mq_free_hctx_request() blk-mq: export blk_mq_free_request() blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable ...
| * | | | blk-mq: add a 'list' parameter to ->queue_rq()Jens Axboe2014-10-293-10/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we have the notion of a 'last' request in a chain, we can use this to have the hardware optimize the issuing of requests. Add a list_head parameter to queue_rq that the driver can use to temporarily store hw commands for issue when 'last' is true. If we are doing a chain of requests, pass in a NULL list for the first request to force issue of that immediately, then batch the remainder for deferred issue until the last request has been sent. Instead of adding yet another argument to the hot ->queue_rq path, encapsulate the passed arguments in a blk_mq_queue_data structure. This is passed as a constant, and has been tested as faster than passing 4 (or even 3) args through ->queue_rq. Update drivers for the new ->queue_rq() prototype. There are no functional changes in this patch for drivers - if they don't use the passed in list, then they will just queue requests individually like before. Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | block: remove artifical max_hw_sectors capChristoph Hellwig2014-10-211-1/+1
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set max_sectors to the value the drivers provides as hardware limit by default. Linux had proper I/O throttling for a long time and doesn't rely on a artifically small maximum I/O size anymore. By not limiting the I/O size by default we remove an annoying tuning step required for most Linux installation. Note that both the user, and if absolutely required the driver can still impose a limit for FS requests below max_hw_sectors_kb. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | | Merge branch 'akpm' (second patch-bomb from Andrew)Linus Torvalds2014-12-132-35/+73
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge second patchbomb from Andrew Morton: - the rest of MM - misc fs fixes - add execveat() syscall - new ratelimit feature for fault-injection - decompressor updates - ipc/ updates - fallocate feature creep - fsnotify cleanups - a few other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits) cgroups: Documentation: fix trivial typos and wrong paragraph numberings parisc: percpu: update comments referring to __get_cpu_var percpu: update local_ops.txt to reflect this_cpu operations percpu: remove __get_cpu_var and __raw_get_cpu_var macros fsnotify: remove destroy_list from fsnotify_mark fsnotify: unify inode and mount marks handling fallocate: create FAN_MODIFY and IN_MODIFY events mm/cma: make kmemleak ignore CMA regions slub: fix cpuset check in get_any_partial slab: fix cpuset check in fallback_alloc shmdt: use i_size_read() instead of ->i_size ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments ipc/msg: increase MSGMNI, remove scaling ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM ipc/sem.c: change memory barrier in sem_lock() to smp_rmb() lib/decompress.c: consistency of compress formats for kernel image decompress_bunzip2: off by one in get_next_block() usr/Kconfig: make initrd compression algorithm selection not expert fault-inject: add ratelimit option ratelimit: add initialization macro ...
| * | | | zram: use DEVICE_ATTR_[RW|RO|WO] to define zram sys device attributeGanesh Mahendran2014-12-131-17/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In current zram, we use DEVICE_ATTR() to define sys device attributes. SO, we need to set (S_IRUGO | S_IWUSR) permission and other arguments manually. Linux already provids the macro DEVICE_ATTR_[RW|RO|WO] to define sys device attribute. It is simple and readable. This patch uses kernel defined macro DEVICE_ATTR_[RW|RO|WO] to define zram device attribute. Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | mm/zram: correct ZRAM_ZERO flag bit positionMahendran Ganesh2014-12-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In struct zram_table_entry, the element *value* contains obj size and obj zram flags. Bit 0 to bit (ZRAM_FLAG_SHIFT - 1) represent obj size, and bit ZRAM_FLAG_SHIFT to the highest bit of unsigned long represent obj zram_flags. So the first zram flag(ZRAM_ZERO) should be from ZRAM_FLAG_SHIFT instead of (ZRAM_FLAG_SHIFT + 1). This patch fixes this cosmetic issue. Also fix a typo, "page in now accessed" -> "page is now accessed" Signed-off-by: Mahendran Ganesh <opensource.ganesh@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | zram: implement rw_page operation of zramkaram.lee2014-12-131-0/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements rw_page operation for zram block device. I implemented the feature in zram and tested it. Test bed was the G2, LG electronic mobile device, whtich has msm8974 processor and 2GB memory. With a memory allocation test program consuming memory, the system generates swap. Operating time of swap_write_page() was measured. -------------------------------------------------- | | operating time | improvement | | | (20 runs average) | | -------------------------------------------------- |with patch | 1061.15 us | +2.4% | -------------------------------------------------- |without patch| 1087.35 us | | -------------------------------------------------- Each test(with paged_io,with BIO) result set shows normal distribution and has equal variance. I mean the two values are valid result to compare. I can say operation with paged I/O(without BIO) is faster 2.4% with confidence level 95%. [minchan@kernel.org: make rw_page opeartion return 0] [minchan@kernel.org: rely on the bi_end_io for zram_rw_page fails] [sergey.senozhatsky@gmail.com: code cleanup] [minchan@kernel.org: add comment] Signed-off-by: karam.lee <karam.lee@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: <seungho1.park@lge.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | zram: change parameter from vaild_io_request()karam.lee2014-12-131-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes parameter of valid_io_request for common usage. The purpose of valid_io_request() is to determine if bio request is valid or not. This patch use I/O start address and size instead of a BIO parameter for common usage. Signed-off-by: karam.lee <karam.lee@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: <seungho1.park@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | zram: remove bio parameter from zram_bvec_rw()karam.lee2014-12-131-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently rw_page block device operation has been added. This patchset implements rw_page operation for zram block device and does some clean-up. This patch (of 3): Remove an unnecessary parameter(bio) from zram_bvec_rw() and zram_bvec_read(). zram_bvec_read() doesn't use a bio parameter, so remove it. zram_bvec_rw() calls a read/write operation not using bio, so a rw parameter replaces a bio parameter. Signed-off-by: karam.lee <karam.lee@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: <seungho1.park@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | sunvdc: reconnect ldc after vds service domain restartsDwight Engen2014-12-111-22/+183
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change enables the sunvdc driver to reconnect and recover if a vds service domain is disconnected or bounced. By default, it will wait indefinitely for the service domain to become available again, but will honor a non-zero vdc-timout md property if one is set. If a timeout is reached, any in-progress I/O's are completed with -EIO. Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Reviewed-by: Chris Hyser <chris.hyser@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | vio: create routines for inc,dec vio dring indexesDwight Engen2014-12-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both sunvdc and sunvnet implemented distinct functionality for incrementing and decrementing dring indexes. Create common functions for use by both from the sunvnet versions, which were chosen since they will still work correctly in case a non power of two ring size is used. Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | sunvdc: fix module unload/reloadDwight Engen2014-12-111-0/+11
|/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | Free resources allocated during port/disk probing so that the module may be successfully reloaded after unloading. Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2014-12-111-31/+43
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "virtio: virtio 1.0 support, misc patches This adds a lot of infrastructure for virtio 1.0 support. Notable missing pieces: virtio pci, virtio balloon (needs spec extension), vhost scsi. Plus, there are some minor fixes in a couple of places. Note: some net drivers are affected by these patches. David said he's fine with merging these patches through my tree. Rusty's on vacation, he acked using my tree for these, too" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (70 commits) virtio_ccw: finalize_features error handling virtio_ccw: future-proof finalize_features virtio_pci: rename virtio_pci -> virtio_pci_common virtio_pci: update file descriptions and copyright virtio_pci: split out legacy device support virtio_pci: setup config vector indirectly virtio_pci: setup vqs indirectly virtio_pci: delete vqs indirectly virtio_pci: use priv for vq notification virtio_pci: free up vq->priv virtio_pci: fix coding style for structs virtio_pci: add isr field virtio: drop legacy_only driver flag virtio_balloon: drop legacy_only driver flag virtio_ccw: rev 1 devices set VIRTIO_F_VERSION_1 virtio: allow finalize_features to fail virtio_ccw: legacy: don't negotiate rev 1/features virtio: add API to detect legacy devices virtio_console: fix sparse warnings vhost: remove unnecessary forward declarations in vhost.h ...
| * | | | virtio: drop VIRTIO_F_VERSION_1 from driversMichael S. Tsirkin2014-12-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Core activates this bit automatically now, drop it from drivers that set it explicitly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | | | virtio_blk: fix race at module removalMichael S. Tsirkin2014-12-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a device appears while module is being removed, driver will get a callback after we've given up on the major number. In theory this means this major number can get reused by something else, resulting in a conflict. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| * | | | virtio_blk: make serial attribute staticMichael S. Tsirkin2014-12-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's never declared so no need to make it extern. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| * | | | virtio_blk: v1.0 supportMichael S. Tsirkin2014-12-091-29/+41
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on patch by Cornelia Huck. Note: for consistency, and to avoid sparse errors, convert all fields, even those no longer in use for virtio v1.0. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2014-12-101-3/+3
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS changes from Al Viro: "First pile out of several (there _definitely_ will be more). Stuff in this one: - unification of d_splice_alias()/d_materialize_unique() - iov_iter rewrite - killing a bunch of ->f_path.dentry users (and f_dentry macro). Getting that completed will make life much simpler for unionmount/overlayfs, since then we'll be able to limit the places sensitive to file _dentry_ to reasonably few. Which allows to have file_inode(file) pointing to inode in a covered layer, with dentry pointing to (negative) dentry in union one. Still not complete, but much closer now. - crapectomy in lustre (dead code removal, mostly) - "let's make seq_printf return nothing" preparations - assorted cleanups and fixes There _definitely_ will be more piles" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) copy_from_iter_nocache() new helper: iov_iter_kvec() csum_and_copy_..._iter() iov_iter.c: handle ITER_KVEC directly iov_iter.c: convert copy_to_iter() to iterate_and_advance iov_iter.c: convert copy_from_iter() to iterate_and_advance iov_iter.c: get rid of bvec_copy_page_{to,from}_iter() iov_iter.c: convert iov_iter_zero() to iterate_and_advance iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds iov_iter.c: convert iov_iter_npages() to iterate_all_kinds iov_iter.c: iterate_and_advance iov_iter.c: macros for iterating over iov_iter kill f_dentry macro dcache: fix kmemcheck warning in switch_names new helper: audit_file() nfsd_vfs_write(): use file_inode() ncpfs: use file_inode() kill f_dentry uses lockd: get rid of ->f_path.dentry->d_sb ...
| * \ \ \ Merge branch 'iov_iter' into for-nextAl Viro2014-12-084-37/+34
| |\ \ \ \ | | |/ / /
| * | | | kill f_dentry usesAl Viro2014-11-191-3/+3
| | |/ / | |/| | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge remote-tracking branch 'scsi-queue/core-for-3.19' into for-linusJames Bottomley2014-12-081-2/+2
|\ \ \ \ | |_|/ / |/| | |
| * | | scsi: rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16Hannes Reinecke2014-11-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SPC-3 defines SERVICE ACTION IN(12) and SERVICE ACTION IN(16). So rename SERVICE_ACTION_IN to SERVICE_ACTION_IN_16 to be consistent with SPC and to allow for better distinction. Signed-off-by: Hannes Reinecke <hare@suse.de> Tested-by: Robert Elliott <elliott@hp.com> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | | | zram: avoid kunmap_atomic() of a NULL pointerWeijie Yang2014-11-131-1/+2
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zram could kunmap_atomic() a NULL pointer in a rare situation: a zram page becomes a full-zeroed page after a partial write io. The current code doesn't handle this case and performs kunmap_atomic() on a NULL pointer, which panics the kernel. This patch fixes this issue. Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang.kh@gmail.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2014-11-031-16/+19
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "There is a GFP flag fix from Mike Christie, an error code fix from Jan, and fixes for two unnecessary allocations (kmalloc and workqueue) from Ilya. All are well tested. Ilya has one other fix on the way but it didn't get tested in time" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: eliminate unnecessary allocation in process_one_ticket() rbd: Fix error recovery in rbd_obj_read_sync() libceph: use memalloc flags for net IO rbd: use a single workqueue for all devices
| * | rbd: Fix error recovery in rbd_obj_read_sync()Jan Kara2014-10-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we fail to allocate page vector in rbd_obj_read_sync() we just basically ignore the problem and continue which will result in an oops later. Fix the problem by returning proper error. CC: Yehuda Sadeh <yehuda@inktank.com> CC: Sage Weil <sage@inktank.com> CC: ceph-devel@vger.kernel.org CC: stable@vger.kernel.org Coverity-id: 1226882 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
| * | rbd: use a single workqueue for all devicesIlya Dryomov2014-10-301-15/+18
| |/ | | | | | | | | | | | | | | | | Using one queue per device doesn't make much sense given that our workfn processes "devices" and not "requests". Switch to a single workqueue for all devices. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds2014-10-311-9/+0
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull sparc update from David Miller: "Two changes: 1) It makes no sense to execute a VTOC partition table request in the Sun virtual block device driver and fail to load if it doesn't succeed because a) we don't use the result at all and b) it won't succeed if there is an EFI partition on the disk, for example. We read the partition table via the normal means in the block layer anyways, so this is really completely useless, so just remove it. From Dwight Engen. 2) Hook up new bpf system call" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sunvdc: don't call VD_OP_GET_VTOC sparc: Hook up bpf system call.
| * | sunvdc: don't call VD_OP_GET_VTOCDwight Engen2014-10-311-9/+0
| |/ | | | | | | | | | | | | | | | | | | | | | | The VD_OP_GET_VTOC operation will succeed only if the vdisk backend has a VTOC label, otherwise it will fail. In particular, it will return error 48 (ENOTSUP) if the disk has an EFI label. VTOC disk labels are already handled by directly reading the disk in block/partitions/sun.c (enabled by CONFIG_SUN_PARTITION which defaults to y on SPARC). Since port->label is unused in the driver, remove the call and the field. Signed-off-by: Dwight Engen <dwight.engen@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'akpm' (incoming from Andrew Morton)Linus Torvalds2014-10-291-4/+6
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "21 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits) mm/balloon_compaction: fix deflation when compaction is disabled sh: fix sh770x SCIF memory regions zram: avoid NULL pointer access in concurrent situation mm/slab_common: don't check for duplicate cache names ocfs2: fix d_splice_alias() return code checking mm: rmap: split out page_remove_file_rmap() mm: memcontrol: fix missed end-writeback page accounting mm: page-writeback: inline account_page_dirtied() into single caller lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}() drivers/rtc/rtc-bq32k.c: fix register value memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node() drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock kernel/kmod: fix use-after-free of the sub_info structure drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc mm, thp: fix collapsing of hugepages on madvise drivers: of: add return value to of_reserved_mem_device_init() mm: free compound page with correct order gcov: add ARM64 to GCOV_PROFILE_ALL fsnotify: next_i is freed during fsnotify_unmount_inodes. mm/compaction.c: avoid premature range skip in isolate_migratepages_range ...
| * | zram: avoid NULL pointer access in concurrent situationWeijie Yang2014-10-291-4/+6
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a rare NULL pointer bug in mem_used_total_show() and mem_used_max_store() in concurrent situation, like this: zram is not initialized, process A is a mem_used_total reader which runs periodically, while process B try to init zram. process A process B access meta, get a NULL value init zram, done init_done() is true access meta->mem_pool, get a NULL pointer BUG This patch fixes this issue. Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud