summaryrefslogtreecommitdiffstats
path: root/sys/geom/geom_dev.c
Commit message (Collapse)AuthorAgeFilesLines
* MFC Alexander Motin's GEOM direct dispatch work:scottl2014-01-071-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | r256603: Introduce new function devstat_end_transaction_bio_bt(), adding new argument to specify present time. Use this function to move binuptime() out of lock, substantially reducing lock congestion when slow timecounter is used. r256606: Move g_io_deliver() out of the lock, as required for direct dispatch. Move g_destroy_bio() out too to reduce lock scope even more. r256607: Fix passing uninitialized bio_resid argument to g_trace(). r256610: Add unmapped I/O support to GEOM RAID. r256830: Restore BIO_UNMAPPED and BIO_TRANSIENT_MAPPING in biodonne() when unmapping temporary mapped buffer. That fixes double unmap if biodone() called twice for the same BIO (but with different done methods). r256880: Merge GEOM direct dispatch changes from the projects/camlock branch. When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. r259247: Fix bug introduced at r256607. We have to recalculate bp_resid here since sizes of original and completed requests may differ due to end of media. Testing of the stable/10 merge was done by Netflix, but all of the credit goes to Alexander and iX Systems. Submitted by: mav Sponsored by: iX Systems
* Change the way that unmapped I/O capability is advertised.ken2013-08-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous method was to set the D_UNMAPPED_IO flag in the cdevsw for the driver. The problem with this is that in many cases (e.g. sa(4)) there may be some instances of the driver that can handle unmapped I/O and some that can't. The isp(4) driver can handle unmapped I/O, but the esp(4) driver currently cannot. The cdevsw is shared among all driver instances. So instead of setting a flag on the cdevsw, set a flag on the cdev. This allows drivers to indicate support for unmapped I/O on a per-instance basis. sys/conf.h: Remove the D_UNMAPPED_IO cdevsw flag and replace it with an SI_UNMAPPED cdev flag. kern_physio.c: Look at the cdev SI_UNMAPPED flag to determine whether or not a particular driver can handle unmapped I/O. geom_dev.c: Set the SI_UNMAPPED flag for all GEOM cdevs. Since GEOM will create a temporary mapping when needed, setting SI_UNMAPPED unconditionally will work. Remove the D_UNMAPPED_IO flag. nvme_ns.c: Set the SI_UNMAPPED flag on cdevs created here if NVME_UNMAPPED_BIO_SUPPORT is enabled. vfs_aio.c: In aio_qphysio(), check the SI_UNMAPPED flag on a cdev instead of the D_UNMAPPED_IO flag on the cdevsw. sys/param.h: Bump __FreeBSD_version to 1000045 for the switch from setting the D_UNMAPPED_IO flag in the cdevsw to setting SI_UNMAPPED in the cdev. Reviewed by: kib, jimharris MFC after: 1 week Sponsored by: Spectra Logic
* Added a sysctl (kern.geom.dev.delete_max_sectors) to control the maximumsmh2013-04-261-5/+22
| | | | | | | | | | | | | | | | | | | | | | size of a delete request sent to the providing device performed by g_dev_ioctl. This allows the kernel and apps via ioctl e.g. newfs -E to request large LBA deletes which siginificantly improves performance. Previously this was hard coded to 65536 sectors, the new default is 262144 which doubles the throughput of deletes on commonly available SSD's. In tests on a Intel 520 120GB FW: 400i disk it improved the delete throughput from 1.6GB/s to over 2.6GB/s on a full disk delete such as that done via newfs -E For some SSD's where delete time is pretty much constant, no matter what the request, setting this to 0 will provide significantly better throughput e.g. Samsung 840 240GB FW DXT07B0Q @ 262144 = 79G/s, @ 0 = 2259G/s Reviewed by: mav Approved by: pjd (mentor) MFC after: 2 weeks
* Make it possible to submit FLUSH bios through geom_dev strategy. Thistrasz2013-04-061-1/+2
| | | | | | is required for CTL to work with device-backed LUNs. Reviewed by: mav
* Do not pass unmapped buffers to drivers that cannot handle themkan2013-03-261-1/+1
| | | | | | | | | | | | | | In physio, check if device can handle unmapped IO and pass an appropriately mapped buffer to the driver strategy routine. The only driver in the tree that can handle unmapped buffers is one exposed by GEOM, so mark it as such with the new flag in the driver cdevsw structure. This fixes insta-panics on hosts, running dconschat, as /dev/fwmem is an example of the driver that makes use of physio routine, but bypasses the g_down thread, where the buffer gets mapped normally. Discussed with: kib (earlier version)
* Fix long known deadlock between geom dev destruction and d_close() call.mav2013-03-241-71/+129
| | | | | | | | Use destroy_dev_sched_cb() to not wait for device destruction while holding GEOM topology lock (that actually caused deadlock). Use request counting protected by mutex to properly wait for outstanding requests completion in cases of device closing and geom destruction. Unlike r227009, this code does not block taskqueue thread for indefinite time, waiting for completion.
* - Don't pass geom and provider names as format strings.jh2012-11-201-1/+1
| | | | | | | - Add __printflike() attributes. - Remove an extra argument for the g_new_geomf() call in swapongeom_ev(). Reviewed by: pjd
* Provide a device name in the sysctl tree for programs to query thealfred2012-11-011-3/+3
| | | | | | | | | state of crashdump target devices. This will be used to add a "-l" (ell) flag to dumpon(8) to list the currently configured dumpdev. Reviewed by: phk
* Remove unneeded G_PF_CANDELETE flag.ed2012-08-281-4/+0
| | | | | This flag is only used by GEOM so it can be propagated to the character device's SI_CANDELETE. Unfortunately, SI_CANDELETE seems to do nothing.
* Implement media change notification for DA and CD removable media devices.mav2012-07-291-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It includes three parts: 1) Modifications to CAM to detect media media changes and report them to disk(9) layer. For modern SATA (and potentially UAS) devices it utilizes Asynchronous Notification mechanism to receive events from hardware. Active polling with TEST UNIT READY commands with 3 seconds period is used for incapable hardware. After that both CD and DA drivers work the same way, detecting two conditions: "NOT READY: Medium not present" after medium was detected previously, and "UNIT ATTENTION: Not ready to ready change, medium may have changed". First one reported to disk(9) as media removal, second as media insert/change. To reliably receive second event new AC_UNIT_ATTENTION async added to make UAs broadcasted to all periphs by generic error handling code in cam_periph_error(). 2) Modifications to GEOM core to handle media remove and change events. Media removal handled by spoiling all consumers attached to the provider. Media change event also schedules provider retaste after spoiling to probe new media. New flag G_CF_ORPHAN was added to consumers to reflect that consumer is in process of destruction. It allows retaste to create new geom instance of the same class, while previous one is still dying. 3) Modifications to some GEOM classes: DEV -- to report media change events to devd; VFS -- to handle spoiling same as orphan to prevent accessing replaced media. PART class already handles spoiling alike to orphan. Reviewed by: silence on geom@ and scsi@ Tested by: avg Sponsored by: iXsystems, Inc. / PC-BSD MFC after: 2 months
* Fix typo in the comment.trasz2012-07-061-1/+1
|
* Temporary revert r227009 to fix freeze on UP systems without PREEMPTION.mav2011-11-141-27/+12
| | | | | | | | | | | Before r215687, if some withered geom or provider could not be destroyed, g_event thread went to sleep for 0.1s before retrying. After that change it is just restarting immediately. r227009 made orphaned (withered) provider to not detach immediately, but only after context switch. That made loop inside g_event thread infinite on UP systems without PREEMPTION. To address original problem with possible dead lock addressed by r227009 we have to fix r215687 change first, that needs some time to think and test.
* Make orphan() method in geom_dev asynchronous using destroy_dev_sched_cb()mav2011-11-011-12/+27
| | | | | | | instead of destroy_dev(). It moves device destruction waiting out of the topology lock and so fixes dead lock between orphanization and closing. Real provider and geom destruction called from swi context after device destroyed as callback of the destroy_dev_sched_cb().
* Plumb device physical path reporting from CAM devices, through GEOM andgibbs2011-06-141-1/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DEVFS, and make it accessible via the diskinfo utility. Extend GEOM's generic attribute query mechanism into generic disk consumers. sys/geom/geom_disk.c: sys/geom/geom_disk.h: sys/cam/scsi/scsi_da.c: sys/cam/ata/ata_da.c: - Allow disk providers to implement a new method which can override the default BIO_GETATTR response, d_getattr(struct bio *). This function returns -1 if not handled, otherwise it returns 0 or an errno to be passed to g_io_deliver(). sys/cam/scsi/scsi_da.c: sys/cam/ata/ata_da.c: - Don't copy the serial number to dp->d_ident anymore, as the CAM XPT is now responsible for returning this information via d_getattr()->(a)dagetattr()->xpt_getatr(). sys/geom/geom_dev.c: - Implement a new ioctl, DIOCGPHYSPATH, which returns the GEOM attribute "GEOM::physpath", if possible. If the attribute request returns a zero-length string, ENOENT is returned. usr.sbin/diskinfo/diskinfo.c: - If the DIOCGPHYSPATH ioctl is successful, report physical path data when diskinfo is executed with the '-v' option. Submitted by: will Reviewed by: gibbs Sponsored by: Spectra Logic Corporation Add generic attribute change notification support to GEOM. sys/sys/geom/geom.h: Add a new attrchanged method field to both g_class and g_geom. sys/sys/geom/geom.h: sys/geom/geom_event.c: - Provide the g_attr_changed() function that providers can use to advertise attribute changes. - Perform delivery of attribute change notifications from a thread context via the standard GEOM event mechanism. sys/geom/geom_subr.c: Inherit the attrchanged method from class to geom (class instance). sys/geom/geom_disk.c: Provide disk_attr_changed() to provide g_attr_changed() access to consumers of the disk API. sys/cam/scsi/scsi_pass.c: sys/cam/scsi/scsi_da.c: sys/geom/geom_dev.c: sys/geom/geom_disk.c: Use attribute changed events to track updates to physical path information. sys/cam/scsi/scsi_da.c: Add AC_ADVINFO_CHANGED to the registered asynchronous CAM events for this driver. When this event occurs, and the updated buffer type references our physical path attribute, emit a GEOM attribute changed event via the disk_attr_changed() API. sys/cam/scsi/scsi_pass.c: Add AC_ADVINFO_CHANGED to the registered asynchronous CAM events for this driver. When this event occurs, update the physical patch devfs alias for this pass instance. Submitted by: gibbs Sponsored by: Spectra Logic Corporation
* Use make_dev_alias_p() added in r221397 to create alias dev entry.mav2011-05-031-1/+2
| | | | It removes panic in case if alias name is already busy for some reason.
* - Add shim to simplify migration to the CAM-based ATA. For each new adaXmav2011-04-261-2/+26
| | | | | | | | | device in /dev/ create symbolic link with adY name, trying to mimic old ATA numbering. Imitation is not complete, but should be enough in most cases to mount file systems without touching /etc/fstab. - To know what behavior to mimic, restore ATA_STATIC_ID option in cases where it was present before. - Add some more details to UPDATING.
* MFgraid/head r217827:mav2011-03-241-2/+5
| | | | | | | Change BIO_GETATTR("GEOM::kerneldump") API to make set_dumper() called by consumer (geom_dev) instead of provider (geom_disk). This allows any geom insert it's code into the dump call chain, implementing more sophisticated functionality then just disk partitioning.
* Use make_dev_p(9) with the MAKEDEV_CHECKNAME flag instead of make_dev(9)jh2010-10-191-2/+10
| | | | | | | | | | and print a diagnostic if the call fails. This avoids a panic when a device with an invalid name is attempted to be registered. For example the label class gets device names from untrusted input. Reviewed by: freebsd-geom
* fix a few cases where a string is passed via format argument instead ofavg2010-06-111-1/+1
| | | | | | | | | | via %s Most of the cases looked harmless, but this is done for the sake of correctness. In one case it even allowed to drop an intermediate buffer. Found by: clang MFC after: 2 week
* Add BIO_DELETE support to ada(4):mav2009-12-281-2/+2
| | | | | | | | | | | | | | | | | | | - For SSDs use TRIM feature of DATA SET MANAGEMENT command, as defined by ACS-2 specification working draft. - For CompactFlash use CFA ERASE command, same as ad(4) does. With this patch, `newfs -E /dev/ada1` was able to restore write speed of my heavily weared OCZ Vertex SSD (firmware 1.4) up to the initial level for the most part of it's capacity. Previous 1.3 firmware, even reportiong TRIM capabilty bit set, was not working, reporting ABORT error for every DSM command. I have no idea whether it is normal, but for some reason it takes 200ms to handle any TRIM command on this drive, that was making delete extremely slow. But TRIM command is able to accept long list of LBAs and the length of that list seems doesn't affect it's execution time. Implemented request clusting algorithm allowed me to rise delete rate up to reasonable numbers, when many parallel DELETE requests running.
* Add two disk ioctls, giving user-level tools information about disk/arraymav2009-12-241-1/+6
| | | | stripe (optimal access block) size and offset.
* Do not check proper request alignment here in geom_dev in production.mav2009-09-081-2/+2
| | | | | | | It will be checked any way later by g_io_check() in g_io_schedule_down(). It is only needed here to not trigger panic from additional check, when INVARIANTS enabled. So cover it with #ifdef INVARIANTS. It saves two 64bit divisions per request.
* Revert revisions 188839 and 188868. Use of the ioctl in geom_dev.cmarcel2009-07-081-12/+0
| | | | | | | | | | | | is invalid because the ioctl happens without prior open. The ioctl got introduced to provide backward compatibility for extended partitions, but it ended up not being used because it didn't work as expected. Since there are no consumers of the ioctl and the implementation is broken, the best fix is to remove the code entirely. Spotted by: phk Approved by: re (kensmith)
* Provide compatibility symlink for logical partitions:marcel2009-02-201-0/+12
| | | | | | | | | | 1. Extend geom_dev by having it create the symlink (i.e. call make_dev_alias) based on the DIOCGPROVIDERALIAS ioctl. In this way the functionaility is generic and thus usable by any geom/provider. 2. Have g_part handle said ioctl through the devalias method, so that it's under control of the scheme itself. By design the alias will not be created for newly added partitions.
* Remove unused unrhdr from GEOM character device module.ed2009-01-241-17/+1
| | | | | | Now that make_dev() doesn't require unit numbers to be unique, there is no need to use an unrhdr here to generate the numbers. Remove the entire init-routine, because it is optional.
* Remove unit2minor() use from kernel code.ed2008-09-261-1/+1
| | | | | | | | | | | | | | | When I changed kern_conf.c three months ago I made device unit numbers equal to (unneeded) device minor numbers. We used to require bitshifting, because there were eight bits in the middle that were reserved for a device major number. Not very long after I turned dev2unit(), minor(), unit2minor() and minor2unit() into macro's. The unit2minor() and minor2unit() macro's were no-ops. We'd better not remove these four macro's from the kernel, because there is a lot of (external) code that may still depend on them. For now it's harmless to remove all invocations of unit2minor() and minor2unit(). Reviewed by: kib
* - Add a new ioctl for getting the provider name of a geom provider.lulf2008-09-071-0/+7
| | | | | | | | - Add a routine for looking up a device and checking if it is a valid geom provider given a partial or full path to its device node. Reviewed by: phk Approved by: pjd (mentor)
* Remove the distinction between device minor and unit numbers.ed2008-05-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Even though we got rid of device major numbers some time ago, device drivers still need to provide unique device minor numbers to make_dev(). These numbers are only used inside the kernel. They are not related to device major and minor numbers which are visible in devfs. These are actually based on the inode number of the device. It would eventually be nice to remove minor numbers entirely, but we don't want to be too agressive here. Because the 8-15 bits of the device number field (si_drv0) are still reserved for the major number, there is no 1:1 mapping of the device minor and unit numbers. Because this is now unused, remove the restrictions on these numbers. The MAXMAJOR definition was actually used for two purposes. It was used to convert both the userspace and kernelspace device numbers to their major/minor pair, which is why it is now named UMINORMASK. minor2unit() and unit2minor() have now become useless. Both minor() and dev2unit() now serve the same purpose. We should eventually remove some of them, at least turning them into macro's. If devfs would become completely minor number unaware, we could consider using si_drv0 directly, just like si_drv1 and si_drv2. Approved by: philip (mentor)
* Chop DIOCGDELETE from userland up in 1024 sector chunks to give geom_diskphk2007-12-161-2/+18
| | | | | | | or any other bio chopping geom a reasonable size of work. Check for delivered signals between chunks, because the request size and service time is unbounded.
* Don't limit BIO_DELETE requests to MAXPHYS, they perform no dataphk2007-12-161-2/+1
| | | | transfers, so they are not subject to the VM system limitation.
* Implement three new ioctls that can be used with GEOM provider:pjd2007-05-051-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DIOCGFLUSH - Flush write cache (sends BIO_FLUSH). DIOCGDELETE - Delete data (mark as unused) (sends BIO_DELETE). DIOCGIDENT - Get provider's uniqe and fixed identifier (asks for GEOM::ident attribute). First two are self-explanatory, but the last one might not be. Here are properties of provider's ident: - ident value is preserved between reboots, - provider can be detached/attached and ident is preserved, - provider's name can change - ident can't, - ident value should not be based on on-disk metadata; in other words copying whole data from one disk to another should not yield the same ident for the other disk, - there could be more than one provider with the same ident, but only if they point at exactly the same physical storage, this is the case for multipathing for example, - GEOM classes that consumes single providers and provide single providers, like geli, gbde, should just attach class name to the ident of the underlying provider, - ident is an ASCII string (is printable), - ident is optional and applications can't relay on its presence. The main purpose for this is that application and remember provider's ident and once it tries to open provider by its name again, it may compare idents to be sure this is the right provider. If it is not (idents don't match), then it can open provider by its ident. OK'ed by: phk
* make_dev(9) can be (and is) called without Giant, so there is no need tokris2007-03-261-8/+0
| | | | | | drop the topology lock and acquire Giant around this call. Reviewed by: phk
* Use pause() rather than tsleep() on stack variables and function pointers.jhb2007-02-271-3/+3
|
* Use tsleep() rather than msleep() with a NULL mtx parameter.jhb2007-02-231-1/+1
|
* In g_dev_strategy(), when failing an IO request with EINVAL due tosimon2006-06-181-0/+1
| | | | | | | | | | | | | | offset or request size which is not a multiple of the sector size, make sure that the bio is set to indicate that no data has actually been transferred. The result of this is that the file offset is no longer incremented for these requests. The fact that the file offset was incremented broke fdisk(8)'s probing of sector size for non-512 byte sector sizes. Reviewed by: phk, cperciva Submitted by: mdodd MFC after: 2 weeks
* Avoid null pointer dereference.phk2005-03-181-3/+2
|
* Add placeholder mutex argument to new_unrhdr().phk2005-03-071-2/+1
|
* Pass the file->flags down to geom ioctl handlers.phk2004-12-121-1/+1
| | | | | | | | Reject certain ioctls if write permission is not indicated. Bump geom API version. Reported by: Ruben de Groot <mail25@bzerk.org>
* Don't set si_bsize_phys, nobody cares.phk2004-10-291-2/+0
|
* Give dev_strategy() an explict cdev argument in preparation for removingphk2004-10-291-2/+2
| | | | | | | | | | | buf->b-dev. Put a bio between the buf passed to dev_strategy() and the device driver strategy routine in order to not clobber fields in the buf. Assert copyright on vfs_bio.c and update copyright message to canonical text. There is no legal difference between John Dysons two-clause abbreviated BSD license and the canonical text.
* Use unit number allocation functions for GEOM minor numbers.phk2004-10-251-3/+18
|
* Retire si_stripesize and si_stripeoffset they will not be needed in cdevphk2004-10-251-2/+0
| | | | in the future.
* Don't call g_waitidle(), it happens automagically now.phk2004-10-231-3/+0
|
* Deny invalid I/O requests which comes from userland here, because laterpjd2004-09-271-0/+6
| | | | | | | we'll get a panic. MT5 candidate. Reviewed by: phk
* Assert topology is held in g_dev_getprovider().phk2004-09-241-3/+5
| | | | | Don't call devsw(). It is not necessary, and we do not need to hold dev_lock to compare the devsw pointer to our own since we do not dereference it.
* Tag all geom classes in the tree with a version number.phk2004-08-081-0/+1
|
* Use default method initialization on geoms.phk2004-08-081-1/+1
|
* Duplicate the securelevel check from spec_vnops.c here.phk2004-06-191-0/+11
|
* Reduce the thaumaturgical level of root filesystem mounts: Instead of usingphk2004-06-171-43/+0
| | | | | | | an otherwise redundant clone routine in geom_disk.c, mount a temporary DEVFS and do a proper lookup. Submitted by: thomas
* Second half of the dev_t cleanup.phk2004-06-171-1/+1
| | | | | | | | | | | The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.
OpenPOWER on IntegriCloud