summaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* [SCSI] libfc: eliminate disc->eventJoe Eykholt2009-09-101-1/+0
| | | | | | | | | | | There was no need to have the discovery status stored in struct fc_disc. Change fc_disc_done() to take the discovery status as an argument and just pass it on to the discovery callback. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: don't create dummy (rogue) remote portsJoe Eykholt2009-09-101-21/+19
| | | | | | | | | | | | | | | | | | Don't create a "dummy" remote port to go with fc_rport_priv. Make the rport truly optional by allocating fc_rport_priv separately and not requiring a dummy rport to be there if we haven't yet done fc_remote_port_add(). The fc_rport_libfc_priv remains as a structure attached to the rport for I/O purposes. Be sure to hold references on rdata when the lock is dropped in fc_rport_work(). Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: rename rport event CREATED to READYJoe Eykholt2009-09-101-1/+1
| | | | | | | | | | | | Remote ports will become READY more than once after ADISC is implemented in a later patch. The event callback that has been called "CREATED" will mean "READY". Rename it now in preparation for those changes. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: make rport structure optionalJoe Eykholt2009-09-101-13/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | Allow a struct fc_rport_priv to have no fc_rport associated with it. This sets up to remove the need for "rogue" rports. Add a few fields to fc_rport_priv that are needed before the fc_rport is created. These are the ids, maxframe_size, classes, and rport pointer. Remove the macro PRIV_TO_RPORT(). Just use rdata->rport where appropriate. To take the place of the get_device()/put_device ops that were used to hold both the rport and rdata, add a reference count to rdata structures using kref. When kref_get decrements the refcount to zero, a new template function releasing the rdata should be called. This will take care of freeing the rdata and releasing the hold on the rport (for now). After subsequent patches make the rport truly optional, this release function will simply free the rdata. Remove the simple inline function fc_rport_set_name(), which becomes semanticly ambiguous otherwise. The caller will set the port_name and node_name in the rdata->Ids, which will later be copied to the rport when it its created. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: change elsct to use FC_ID instead of rdataJoe Eykholt2009-09-102-22/+6
| | | | | | | | | | | | | | | | | tt.elsct_send is used by both FCP and by the rport state machine. After further patches, these two modules will use different structures for the remote port. So, change elsct_send to use the FC_ID instead of the fc_rport_priv as its argument. It currently only uses the FC_ID anyway. For CT requests the destination FC_ID is still implicitly 0xfffffc. After further patches the did arg on CT requests will be used to specify the FC_ID being inquired about for GPN_ID or other queries. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: make fc_rport_priv the primary rport interface.Joe Eykholt2009-09-102-14/+19
| | | | | | | | | | | | | | | | | The rport and discovery modules deal with remote ports before fc_remote_port_add() can be done, because the full set of rport identifiers is not known at early stages. In preparation for splitting the fc_rport/fc_rport_priv allocation, make fc_rport_priv the primary interface for the remote port and discovery engines. The FCP / SCSI layers still deal with fc_rport and fc_rport_libfc_priv, however. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: fix RPORT_TO_PRIV and PRIV_TO_RPORT() macros.Joe Eykholt2009-09-101-2/+2
| | | | | | | | | | | | These macros introduce extra undesirable semicolons that keep them from being used in expressions, and they don't protect against being passed an expression. Add parens and remove the semicolons. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: change interface for rport_createJoe Eykholt2009-09-101-3/+2
| | | | | | | | | | | | | The interface for lport->tt.rport_create() takes a fc_disc_port arg, which is unnatural for most calls. The only reason for this was to avoid passing in the local port as an argument, but otherwise added to complexity. Simplify by just using lport and fc_rport_identifiers. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: prepare to split off struct fc_rport_priv from fc_rport_libfc_privJoe Eykholt2009-09-101-1/+9
| | | | | | | | | | | | | | | While the I/O and LLD interfaces use fc_rport_libfc_priv, the disc and rport interfaces will use fc_rport_priv, which will be separately allocated. Change the disc and rport usage of fc_rport_libfc_priv to fc_rport_priv. Use #define temporarily to make both names equivalent until a subsequent patch splits them. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] fcoe, libfc: fully makes use of per cpu exch pool and then removes ↵Vasu Dev2009-09-051-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | em_lock 1. Updates fcoe_rcv() to queue incoming frames to the fcoe per cpu thread on which this frame's exch was originated and simply use current cpu for request exch not originated by initiator. It is redundant to add this code under CONFIG_SMP, so removes CONFIG_SMP uses around this code. 2. Updates fc_exch_em_alloc, fc_exch_delete, fc_exch_find to use per cpu exch pools, here fc_exch_delete is rename of older fc_exch_mgr_delete_ep since ep/exch are now deleted in pools of EM and so brief new name is sufficient and better name. Updates these functions to map exch id to their index into exch pool using fc_cpu_mask, fc_cpu_order and EM min_xid. This mapping is as per detailed explanation about this in last patch and basically this is just as lower fc_cpu_mask bits of exch id as cpu number and upper bit sum of EM min_xid and exch index in pool. Uses pool next_index to keep track of exch allocation from pool along with pool_max_index as upper bound of exches array in pool. 3. Adds exch pool ptr to fc_exch to free exch to its pool in fc_exch_delete. 4. Updates fc_exch_mgr_reset to reset all exch pools of an EM, this required adding fc_exch_pool_reset func to reset exches in pool and then have fc_exch_mgr_reset call fc_exch_pool_reset for each pool within each EM for a lport. 5. Removes no longer needed exches array, em_lock, next_xid, and total_exches from struct fc_exch_mgr, these are not needed after use of per cpu exch pool, also removes not used max_read, last_read from struct fc_exch_mgr. 6. Updates locking notes for exch pool lock with fc_exch lock and uses pool lock in exch allocation, lookup and reset. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] fcoe, libfc: adds per cpu exch pool within exchange manager(EM)Vasu Dev2009-09-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds per cpu exch pool for these reasons:- 1. Currently an EM instance is shared across all cpus to manage all exches for all cpus. This required em_lock across all cpus for an exch alloc, free, lookup and reset each frame and that made em_lock expensive, so instead having per cpu exch pool with their own per cpu pool lock will likely reduce locking contention in fast path for an exch alloc, free and lookup. 2. Per cpu exch pool will likely improve cache hit ratio since all frames of an exch will be processed on the same cpu on which exch originated. This patch is only prep work to help in keeping complexity of next patch low, so this patch only sets up per cpu exch pool and related helper funcs to be used by next patch. The next patch fully makes use of per cpu exch pool in all code paths ie. tx, rx and reset. Divides per EM exch id range equally across all cpus to setup per cpu exch pool. This division is such that lower bits of exch id carries cpu number info on which exch originated, later a simple bitwise AND operation on exch id of incoming frame with fc_cpu_mask retrieves cpu number info to direct all frames to same cpu on which exch originated. This required a global fc_cpu_mask and fc_cpu_order initialized to max possible cpus number nr_cpu_ids rounded up to 2's power, this will be used in mapping exch id and exch ptr array index in pool during exch allocation, find or reset code paths. Adds a check in fc_exch_mgr_alloc() to ensure specified min_xid lower bits are zero since these bits are used to carry cpu info. Adds and initializes struct fc_exch_pool with all required fields to manage exches in pool. Allocates per cpu struct fc_exch_pool with memory for exches array for range of exches per pool. The exches array memory is followed by struct fc_exch_pool. Adds fc_exch_ptr_get/set() helper functions to get/set exch ptr in pool exches array at specified array index. Increases default FCOE_MAX_XID to 0x0FFF from 0x07EF, so that more exches are available per cpu after above described exch id range division across all cpus to each pool. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] iscsi_tcp: add new conn error to indicate tcp conn closedMike Christie2009-09-051-0/+1
| | | | | | | | | | | If a target closed the connection, we will detect it in the state_changed or data_ready callout. This adds a new conn error value to use for this problem, so it is not confused with when the initiator throws a conn error and drops the connection. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] scsi_dh: add the interface scsi_dh_set_params()Chandra Seetharaman2009-08-222-0/+6
| | | | | | | | | | | | | | | | | | | | | When we moved the device handler functionality from dm layer to SCSI layer we dropped the parameter functionality. This path adds an interface to scsi dh layer to set device handler parameters. Basically, multipath layer need to create a string with all the parameters and call scsi_dh_set_params() after it called scsi_dh_attach() on a device. If a device handler provides such an interface it will handle the parameters as it expects them. Reported-by: Eddie Williams <Eddie.Williams@steeleye.com> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Tested-by: Eddie Williams <Eddie.Williams@steeleye.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] ses: add support for enclosure component hot removalJames Bottomley2009-08-221-1/+1
| | | | | | | | | | | | Right at the moment, hot removal of a device within an enclosure does nothing (because the intf_remove only copes with enclosure removal not with component removal). Fix this by adding a function to remove the component. Also needed to fix the prototype of enclosure_remove_device, since we know the device we've removed but not the internal component number Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] ses: fix hotplug with multiple devices and expandersJames Bottomley2009-08-221-1/+2
| | | | | | | | | | | | In a situation either with expanders or with multiple enclosure devices, hot add doesn't always work. This is because we try to find a single enclosure device attached to the host. Fix this by looping over all enclosure devices attached to the host and also by making the find loop recognise that the enclosure devices may be expander remote (i.e. not parented by the host). Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: Remove FC_FRAME_SG_LEN in fc_fcp_send_dataYi Zou2009-08-221-7/+0
| | | | | | | | | | | | | | | | FC_FRAME_SG_LEN is 4 which is too small when offload is enabled. Actually, the WARN_ON() in fc_fcp_send_data() should be: WARN_ON(skb_shinfo(fp_skb(fp))->nr_frags > MAX_SKB_FRAGS); But since we will not get anything more than 64K anyway, so there is no need to do this anyway here. Therefore, I am getting rid of FC_FRAME_SG_LEN here and the WARN_ON here. Signed-off-by: Yi Zou <yi.zou@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] fcoe, fnic, libfc: modifies current code paths to use EM anchor listVasu Dev2009-08-221-40/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modifies current code to use EM anchor list in EM allocation, EM free, EM reset, exch allocation and exch lookup code paths. 1. Modifies fc_exch_mgr_alloc to accept EM match function and then have allocated EM added to the lport using fc_exch_mgr_add API while also updating EM kref for newly added EM. 2. Updates fc_exch_mgr_free API to accept only lport pointer instead EM and then have this API free all EMs of the lport from EM anchor list. 3. Removes single lport pointer link from the EM, which was used in associating lport pointer in newly allocated exchange. Instead have lport pointer passed along new exchange allocation call path and then store passed lport pointer in newly allocated exchange, this will allow a single EM instance to be used across more than one lport and used in EM reset to reset only lport specific exchanges. 4. Modifies fc_exch_mgr_reset to reset all EMs from the EM anchor list of the lport, adds additional exch lport pointer (ep->lp) check for shared EM case to reset exchange specific to a lport requested reset. 5. Updates exch allocation API fc_exch_alloc to use EM anchor list and its anchor match func pointer. The fc_exch_alloc will walk the list of EMs until it finds a match, a match will be either null match func pointer or call to match function returning true value. 6. Updates fc_exch_recv to accept incoming frame on local port using only lport pointer and frame pointer without specifying EM instance of incoming frame. Instead modified fc_exch_recv to locate EM for the incoming frame by matching xid of incoming frame against a EM xid range. This change was required to use EM list in libfc Rx path and after this change the lport fc_exch_mgr pointer emp is not needed anymore, so removed emp pointer. 7. Updates fnic for removed lport emp pointer and above modified libfc APIs fc_exch_recv, fc_exch_mgr_alloc and fc_exch_mgr_free. 8. Removes exch_get and exch_put from libfc_function_template as these are no longer needed with EM anchor list and its match function use. Also removes its default function fc_exch_get. A defect this patch introduced regarding the libfc initialization order in the fnic driver was fixed by Joe Eykholt <jeykholt@cisco.com>. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: Remove the FC_EM_DBG macroRobert Love2009-08-221-6/+0
| | | | | | | | | | | | | | | | | Currently there is a 1:1 relationship between the lport and exchange manager. This macro takes an EM as an argument and determines the lport from it. However, later patches will use an EM list per lport, so we will no longer have this 1:1 relationship- this macro must change. The FC_EM_DBG macro is rarely used. There are four callers, two can use FC_LPORT_DBG instead and two can be removed since they're not necessary. This patch makes those changes and removes the macro. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] fcoe, libfc: adds exchange manager(EM) anchor list per lport and ↵Vasu Dev2009-08-221-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | related APIs Adds EM list using a anchor struct fc_exch_mgr_anchor, anchor is used to allow same EM instance sharing across more than one lport on a eth device, this implementation is per discussed design posted at http://www.open-fcoe.org/pipermail/devel/2009-June/002566.html. The shared EM is required for multiple lports on eth device when using multiple VLANs or NPIV. Adds fc_exch_mgr_add API to add a EM to the lport and fc_exch_mgr_del API to delete previously added EM. Also adds function fc_exch_mgr_destroy() to destroy allocated EM. The kref is added to the EM to keep track of EM usage count, the EM is destroyed when no longer in use upon kref reaching to zero. The caller can specify match function to fc_exch_mgr_add, this will be used in determining exchange allocation from its EM or not. Moved calling of fcoe_em_config below fcoe_libfc_config calling, so that list head lp->ema_list is initialized before configuring EM. Signed-off-by: Vasu Dev <vasu.dev@intel.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: rename rport state "NONE" to "DELETE".Joe Eykholt2009-08-221-1/+1
| | | | | | | | | | | | | | | State RPORT_ST_NONE was intented to be an invalid state (0), never used. This was a misguided attempt to be sure it was always initialized. Having an extra state meaning nothing requires switch statements to have a case covering that state. State NONE has been used instead to mean the remote port is being deleted. Changing the name to RPORT_ST_DELETE. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: rename lport NONE state to DISABLEDJoe Eykholt2009-08-221-1/+1
| | | | | | | | | | | | The state NONE was meant to be invalid, but has been used as the initial state. Rename it to be DISABLED, as more descriptive. Further patches will make it the like the RESET state, except it won't transition to FLOGI until fc_lport_fabric_login() is called. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: change debug messages to give host number.Joe Eykholt2009-08-221-16/+20
| | | | | | | | | | | | | | | | | | | | | | | libfc debug messages currently show 'lport: <fc-id>:' wher <fc-id> is the hex assigned port-id. When the lport is logged off, that will be zero, so its hard to distinguish which instance is involved. The FC-ID can change if the port is re-patched or changes VSANs. Two lports may even have the same FC-ID if connected to isolated SANs. Change the debug messages to print the SCSI host number "hostN:", which will not change for the life of the lport. Still show the FC_ID on lport messages. Also, add a macro to FC_RPORT_ID_DBG for rport debugging where there's no rdata structure involved. It takes the lport and port_id as parameters. Use this in fc_rport_recv_plogi_req() and fc_rport_recv_logo_req(). Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] libfc: remove extra semicolons from debug macrosJoe Eykholt2009-08-221-10/+10
| | | | | | | | | | This is unlikely to cause any problems, but the libfc debug macros introduce extra undesirable semicolons. Signed-off-by: Joe Eykholt <jeykholt@cisco.com> Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* [SCSI] scsi_dh: Reference count scsi_dh_attachChandra Seetharaman2009-08-221-0/+2
| | | | | | | | | | | | | | Problem reported: http://marc.info/?l=dm-devel&m=124585978305866&w=2 scsi_dh does not do a refernce count for attach/detach, and this affects the way it is supposed to work with multipath when a device is not in the dev_list of the hardware handler. This patch adds a reference count that counts each attach. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* Merge branch 'drm-fixes' of ↵Linus Torvalds2009-08-211-0/+4
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/radeon: add GET_PARAM/INFO support for Z pipes drm/radeon/kms: add r100/r200 OQ support. drm: Fix sysfs device confusion. drm/radeon/kms: implement the bo busy ioctl properly.
| * drm/radeon: add GET_PARAM/INFO support for Z pipesAlex Deucher2009-08-211-0/+2
| | | | | | | | | | | | | | Needed for occlusion queries on rv530 chips. Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * drm/radeon/kms: implement the bo busy ioctl properly.Dave Airlie2009-08-211-0/+2
| | | | | | | | | | | | | | | | | | The previous patch assumes the ioctl already existed, when it actually didn't. It also didn't return the correct error code. Signed-off-by: Dave Airlie <airlied@redhat.com>
* | Make bitmask 'and' operators return a result codeLinus Torvalds2009-08-212-20/+18
|/ | | | | | | | | | | | | When 'and'ing two bitmasks (where 'andnot' is a variation on it), some cases want to know whether the result is the empty set or not. In particular, the TLB IPI sending code wants to do cpumask operations and determine if there are any CPU's left in the final set. So this just makes the bitmask (and cpumask) functions return a boolean for whether the result has any bits set. Cc: stable@kernel.org (2.6.30, needed by TLB shootdown fix) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'drm-fixes' of ↵Linus Torvalds2009-08-191-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/kms: teardown crtc correctly when fb is destroyed. drm/kms/radeon: cleanup combios TV table like DDX. drm/radeon/kms: memset the allocated framebuffer before using it. drm/radeon/kms: although LVDS might be possible on crtc 1 don't do it. drm/radeon/kms: implement bo busy check + current domain drm/radeon/kms: cut down indirects in register accesses. drm/radeon/kms: Fix up vertical blank interrupt support. drm/radeon/kms: add rv530 R300_SU_REG_DEST + reloc for ZPASS_ADDR drm/edid: fixup detailed timings like the X server. drm/radeon/kms: Add specific rs690 authorized register table
| * drm/radeon/kms: implement bo busy check + current domainDave Airlie2009-08-171-1/+1
| | | | | | | | | | | | | | | | This implements the busy ioctl along with a current domain check. returns 0 or -EBUSY puts the current domain no matter what the answer. Signed-off-by: Dave Airlie <airlied@redhat.com>
* | mm: revert "oom: move oom_adj value"KOSAKI Motohiro2009-08-182-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to the mm_struct. It was a very good first step for sanitize OOM. However Paul Menage reported the commit makes regression to his job scheduler. Current OOM logic can kill OOM_DISABLED process. Why? His program has the code of similar to the following. ... set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */ ... if (vfork() == 0) { set_oom_adj(0); /* Invoked child can be killed */ execve("foo-bar-cmd"); } .... vfork() parent and child are shared the same mm_struct. then above set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also change oom_adj for vfork() parent. Then, vfork() parent (job scheduler) lost OOM immune and it was killed. Actually, fork-setting-exec idiom is very frequently used in userland program. We must not break this assumption. Then, this patch revert commit 2ff05b2b and related commit. Reverted commit list --------------------- - commit 2ff05b2b4e (oom: move oom_adj value from task_struct to mm_struct) - commit 4d8b9135c3 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE) - commit 8123681022 (oom: only oom kill exiting tasks with attached memory) - commit 933b787b57 (mm: copy over oom_adj value at fork time) Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-08-185-8/+13
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (60 commits) net: restore gnet_stats_basic to previous definition NETROM: Fix use of static buffer e1000e: fix use of pci_enable_pcie_error_reporting e1000e: WoL does not work on 82577/82578 with manageability enabled cnic: Fix locking in init/exit calls. cnic: Fix locking in start/stop calls. bnx2: Use mutex on slow path cnic calls. cnic: Refine registration with bnx2. cnic: Fix symbol_put_addr() panic on ia64. gre: Fix MTU calculation for bound GRE tunnels pegasus: Add new device ID. drivers/net: fixed drivers that support netpoll use ndo_start_xmit() via-velocity: Fix test of mii_status bit VELOCITY_DUPLEX_FULL rt2x00: fix memory corruption in rf cache, add a sanity check ixgbe: Fix receive on real device when VLANs are configured ixgbe: Do not return 0 in ixgbe_fcoe_ddp() upon FCP_RSP in DDP completion netxen: free napi resources during detach netxen: remove netxen workqueue ixgbe: fix issues setting rx-usecs with legacy interrupts can: fix oops caused by wrong rtnl newlink usage ...
| * | net: restore gnet_stats_basic to previous definitionEric Dumazet2009-08-175-8/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc for better SMP performance" the definition of struct gnet_stats_basic changed incompatibly, as copies of this struct are shipped to userland via netlink. Restoring old behavior is not welcome, for performance reason. Fix is to use a private structure for kernel, and teach gnet_stats_copy_basic() to convert from kernel to user land, using legacy structure (struct gnet_stats_basic) Based on a report and initial patch from Michael Spang. Reported-by: Michael Spang <mspang@csclub.uwaterloo.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | security: define round_hint_to_min in !CONFIG_SECURITYEric Paris2009-08-171-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the header files to define round_hint_to_min() and to define mmap_min_addr_handler() in the !CONFIG_SECURITY case. Built and tested with !CONFIG_SECURITY Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* | | Security/SELinux: seperate lsm specific mmap_min_addrEric Paris2009-08-172-15/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently SELinux enforcement of controls on the ability to map low memory is determined by the mmap_min_addr tunable. This patch causes SELinux to ignore the tunable and instead use a seperate Kconfig option specific to how much space the LSM should protect. The tunable will now only control the need for CAP_SYS_RAWIO and SELinux permissions will always protect the amount of low memory designated by CONFIG_LSM_MMAP_MIN_ADDR. This allows users who need to disable the mmap_min_addr controls (usual reason being they run WINE as a non-root user) to do so and still have SELinux controls preventing confined domains (like a web server) from being able to map some area of low memory. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* | | Capabilities: move cap_file_mmap to commoncap.cEric Paris2009-08-171-3/+4
| |/ |/| | | | | | | | | | | | | | | | | | | | | Currently we duplicate the mmap_min_addr test in cap_file_mmap and in security_file_mmap if !CONFIG_SECURITY. This patch moves cap_file_mmap into commoncap.c and then calls that function directly from security_file_mmap ifndef CONFIG_SECURITY like all of the other capability checks are done. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
* | Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds2009-08-131-11/+38
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_counter: Report the cloning task as parent on perf_counter_fork() perf_counter: Fix an ipi-deadlock perf: Rework/fix the whole read vs group stuff perf_counter: Fix swcounter context invariance perf report: Don't show unresolved DSOs and symbols when -S/-d is used perf tools: Add a general option to enable raw sample records perf tools: Add a per tracepoint counter attribute to get raw sample perf_counter: Provide hw_perf_counter_setup_online() APIs perf list: Fix large list output by using the pager perf_counter, x86: Fix/improve apic fallback perf record: Add missing -C option support for specifying profile cpu perf tools: Fix dso__new handle() to handle deleted DSOs perf tools: Fix fallback to cplus_demangle() when bfd_demangle() is not available perf report: Show the tid too in -D perf record: Fix .tid and .pid fill-in when synthesizing events perf_counter, x86: Fix generic cache events on P6-mobile CPUs perf_counter, x86: Fix lapic printk message
| * | perf: Rework/fix the whole read vs group stuffPeter Zijlstra2009-08-131-11/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace PERF_SAMPLE_GROUP with PERF_SAMPLE_READ and introduce PERF_FORMAT_GROUP to deal with group reads in a more generic way. This allows you to get group reads out of read() as well. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey J Ashford <cjashfor@us.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> LKML-Reference: <20090813103655.117411814@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | perf_counter: Provide hw_perf_counter_setup_online() APIsIngo Molnar2009-08-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide weak aliases for hw_perf_counter_setup_online(). This is used by the BTS patches (for v2.6.32), but it interacts with fixes so propagate this upstream. (it has no effect as of yet) Also export perf_counter_output() to architecture code. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2009-08-131-1/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Fix handling of bad requeue syscall pairing futex: Fix compat_futex to be same as futex for REQUEUE_PI locking, sched: Give waitqueue spinlocks their own lockdep classes futex: Update futex_q lock_ptr on requeue proxy lock
| * | | locking, sched: Give waitqueue spinlocks their own lockdep classesPeter Zijlstra2009-08-101-1/+8
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Give waitqueue spinlocks their own lockdep classes when they are initialised from init_waitqueue_head(). This means that struct wait_queue::func functions can operate other waitqueues. This is used by CacheFiles to catch the page from a backing fs being unlocked and to wake up another thread to take a copy of it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Takashi Iwai <tiwai@suse.de> Cc: linux-cachefs@redhat.com Cc: torvalds@osdl.org Cc: akpm@linux-foundation.org LKML-Reference: <20090810113305.17284.81508.stgit@warthog.procyon.org.uk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | NFS: Fix an O_DIRECT Oops...Trond Myklebust2009-08-121-3/+2
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can't call nfs_readdata_release()/nfs_writedata_release() without first initialising and referencing args.context. Doing so inside nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment() causes an Oops. We should rather be calling nfs_readdata_free()/nfs_writedata_free() in those cases. Looking at the O_DIRECT code, the "struct nfs_direct_req" is already referencing the nfs_open_context for us. Since the readdata and writedata structures carry a reference to that, we can simplify things by getting rid of the extra nfs_open_context references, so that we can replace all instances of nfs_readdata_release()/nfs_writedata_release(). Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds2009-08-102-7/+20
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits) perf_counter: Zero dead bytes from ftrace raw samples size alignment perf_counter: Subtract the buffer size field from the event record size perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data perf_counter: Correct PERF_SAMPLE_RAW output perf tools: callchain: Fix bad rounding of minimum rate perf_counter tools: Fix libbfd detection for systems with libz dependency perf: "Longum est iter per praecepta, breve et efficax per exempla" perf_counter: Fix a race on perf_counter_ctx perf_counter: Fix tracepoint sampling to be part of generic sampling perf_counter: Work around gcc warning by initializing tracepoint record unconditionally perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode perf tools: callchain: Fix 'perf report' display to be callchain by default perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains perf record: Fix the -A UI for empty or non-existent perf.data perf util: Fix do_read() to fail on EOF instead of busy-looping perf list: Fix the output to not include tracepoints without an id perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc) perf report: Fix per task mult-counter stat reporting ...
| * perf_counter: Zero dead bytes from ftrace raw samples size alignmentFrederic Weisbecker2009-08-101-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After aligning the ftrace raw samples, there are dead bytes storing random data from the stack. We don't want to leak these to userspace, then zero these out. Before: 0x2de88 [0x50]: event: 9 . . ... raw event: size 80 bytes . 0000: 09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff ......P........ . 0010: 68 01 00 00 68 01 00 00 2c 00 00 00 00 00 00 00 h...h...,...... . 0020: 2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00 ,...+...h...h.. . 0030: 6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00 kondemand/0.... . 0040: 68 01 00 00 40 7f 46 81 ff ff ff ff 00 10 1b 7f h...@.F........ ^ ^ ^ ^ Leak After: 0x2d318 [0x50]: event: 9 . . ... raw event: size 80 bytes . 0000: 09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff ......P........ . 0010: 68 01 00 00 68 01 00 00 68 14 00 00 00 00 00 00 h...h...h...... . 0020: 2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00 ,...+...h...h.. . 0030: 6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00 kondemand/0.... . 0040: 68 01 00 00 a0 80 46 81 ff ff ff ff 00 00 00 00 h.....F........ ^ ^ ^ ^ Fixed Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1249915116-5210-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de>
| * perf_counter: Subtract the buffer size field from the event record sizeFrederic Weisbecker2009-08-101-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We compute the perf raw sample size by aligning the raw ftrace event size plus the buffer size field itself. We do that instead of aligning only the perf raw sample size, so that we might economize some in some cases. But this buffer size field is not stored in the perf raw sample, we must then substract its size from the buffer once we computed the alignment unless we may get a useless u32 field in the buffer. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090810141129.GA5124@nowhere> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf_counter: Correct PERF_SAMPLE_RAW outputPeter Zijlstra2009-08-102-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | PERF_SAMPLE_* output switches should unconditionally output the correct format, as they are the only way to unambiguously parse the PERF_EVENT_SAMPLE data. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1249896447.17467.74.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf_counter: Fix tracepoint sampling to be part of generic samplingFrederic Weisbecker2009-08-091-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on Peter's comments, make tracepoint sampling generic just like all the other sampling bits are. This is a rename with no code changes: - PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW - struct perf_tracepoint_record to perf_raw_record We want the system in place that transport tracepoints raw samples events into the perf ring buffer to be generalized and usable by any type of counter. Reported-by; Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1249698400-5441-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Merge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2009-08-091-0/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Avoid redelivery of edge interrupt before next edge KVM: MMU: limit rmap chain length KVM: ia64: fix build failures due to ia64/unsigned long mismatches KVM: Make KVM_HPAGES_PER_HPAGE unsigned long to avoid build error on powerpc KVM: fix ack not being delivered when msi present KVM: s390: fix wait_queue handling KVM: VMX: Fix locking imbalance on emulation failure KVM: VMX: Fix locking order in handle_invalid_guest_state KVM: MMU: handle n_free_mmu_pages > n_alloc_mmu_pages in kvm_mmu_change_mmu_pages KVM: SVM: force new asid on vcpu migration KVM: x86: verify MTRR/PAT validity KVM: PIT: fix kpit_elapsed division by zero KVM: Fix KVM_GET_MSR_INDEX_LIST
| * KVM: fix ack not being delivered when msi presentMichael S. Tsirkin2009-08-051-0/+1
| | | | | | | | | | | | | | | | | | | | kvm_notify_acked_irq does not check irq type, so that it sometimes interprets msi vector as irq. As a result, ack notifiers are not called, which typially hangs the guest. The fix is to track and check irq type. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | perf_counter: Fix/complete ftrace event records samplingFrederic Weisbecker2009-08-093-36/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the kernel side support for ftrace event record sampling. A new counter sampling attribute is added: PERF_SAMPLE_TP_RECORD which requests ftrace events record sampling. In this case if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint fires, we emit the tracepoint binary record to the perfcounter event buffer, as a sample. Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf record: perf record -f -F 1 -a -e workqueue:workqueue_execution perf report -D 0x21e18 [0x48]: event: 9 . . ... raw event: size 72 bytes . 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........ . 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!...... . 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve . 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1........... . 0040: e0 b1 31 81 ff ff ff ff ....... . 0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33 The raw ftrace binary record starts at offset 0020. Translation: struct trace_entry { type = 0x2b = 43; flags = 1; preempt_count = 2; pid = 0xa = 10; tgid = 0xa = 10; } thread_comm = "events/1" thread_pid = 0xa = 10; func = 0xffffffff8131b1e0 = flush_to_ldisc() What will come next? - Userspace support ('perf trace'), 'flight data recorder' mode for perf trace, etc. - The unconditional copy from the profiling callback brings some costs however if someone wants no such sampling to occur, and needs to be fixed in the future. For that we need to have an instant access to the perf counter attribute. This is a matter of a flag to add in the struct ftrace_event. - Take care of the events recursivity! Don't ever try to record a lock event for example, it seems some locking is used in the profiling fast path and lead to a tracing recursivity. That will be fixed using raw spinlock or recursivity protection. - [...] - Profit! :-) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
OpenPOWER on IntegriCloud