summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* blk-throttle: make blk_throtl_drain() ready for hierarchyTejun Heo2013-05-141-11/+40
| | | | | | | | | | | | | | The current blk_throtl_drain() assumes that all active throtl_grps are queued on throtl_data->service_queue, which won't be true once hierarchy support is implemented. This patch makes blk_throtl_drain() perform post-order walk of the blkg hierarchy draining each associated throtl_grp, which guarantees that all bios will eventually be pushed to the top-level service_queue in throtl_data. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: dispatch from throtl_pending_timer_fn()Tejun Heo2013-05-141-25/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, blk_throtl_dispatch_work_fn() is responsible for both dispatching bio's from throtl_grp's according to their limits and then issuing the dispatched bios. This patch moves the dispatch part to throtl_pending_timer_fn() so that the work item is kicked iff there are bio's to issue. This is to avoid work item execution at each step when hierarchy support is enabled. bio's will be dispatched towards the top-level service_queue from the timers at each layer and the work item will only be used to issue the bio's which reached the top-level service_queue. While fetching bio's to issue from bio_lists[], blk_throtl_dispatch_work_fn() fetches all READs before WRITEs. While the original code also dispatched READs first, if multiple throtl_grps are dispatched on the same run, WRITEs from throtl_grp which is dispatched first would precede READs from throtl_grps which are dispatched later. While this is a behavior change, given that the previous code already prioritized READs and block layer generally prioritizes and segregates READs from WRITEs, this isn't likely to make any noticeable differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: implement dispatch loopingTejun Heo2013-05-141-26/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | throtl_select_dispatch() only dispatches throtl_quantum bios on each invocation. blk_throtl_dispatch_work_fn() in turn depends on throtl_schedule_next_dispatch() scheduling the next dispatch window immediately so that undue delays aren't incurred. This effectively chains multiple dispatch work item executions back-to-back when there are more than throtl_quantum bios to dispatch on a given tick. There is no reason to finish the current work item just to repeat it immediately. This patch makes throtl_schedule_next_dispatch() return %false without doing anything if the current dispatch window is still open and updates blk_throtl_dispatch_work_fn() repeat dispatching after cpu_relax() on %false return. This change will help implementing hierarchy support as dispatching will be done from pending_timer and immediate reschedule of timer function isn't supported and doesn't make much sense. While this patch changes how dispatch behaves when there are more than throtl_quantum bios to dispatch on a single tick, the behavior change is immaterial. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: separate out throtl_service_queue->pending_timer from ↵Tejun Heo2013-05-141-23/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | throtl_data->dispatch_work Currently, throtl_data->dispatch_work is a delayed_work item which handles both delayed dispatch and issuing bios. The two tasks will be separated to support proper hierarchy. To prepare for that, this patch separates out the timer into throtl_service_queue->pending_timer from throtl_data->dispatch_work and make the latter a work_struct. * As the timer is now per-service_queue, it's initialized and del_sync'd as its corresponding service_queue is created and destroyed. The timer, when triggered, simply schedules throtl_data->dispathc_work for execution. * throtl_schedule_delayed_work() is renamed to throtl_schedule_pending_timer() and takes @sq and @expires now. * Simiarly, throtl_schedule_next_dispatch() now takes @sq, which should be the parent_sq of the service_queue which just got a new bio or updated. As the parent_sq is always the top-level service_queue now, this doesn't change anything at this point. This patch doesn't introduce any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: set REQ_THROTTLED from throtl_charge_bio() and gate stats ↵Tejun Heo2013-05-141-5/+25
| | | | | | | | | | | | | | | | | | | | | | | | update with it With proper hierarchy support, a bio can be dispatched multiple times until it reaches the top-level service_queue and we don't want to update dispatch stats at each step. They are local stats and will be kept local. If recursive stats are necessary, they should be implemented separately and definitely not by updating counters recursively on each dispatch. This patch moves REQ_THROTTLED setting to throtl_charge_bio() and gate stats update with it so that dispatch stats are updated only on the first time the bio is charged to a throtl_grp, which will always be the throtl_grp the bio was originally queued to. This means that REQ_THROTTLED would be set even for bios which don't get throttled. As we don't want bios to leave blk-throtl with the flag set, move REQ_THROTLLED clearing to the end of blk_throtl_bio() and clear if the bio is being issued directly. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: implement sq_to_tg(), sq_to_td() and throtl_log()Tejun Heo2013-05-141-29/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that both throtl_data and throtl_grp embed throtl_service_queue, we can unify throtl_log() and throtl_log_tg(). * sq_to_tg() is added. This returns the throtl_grp a service_queue is embedded in. If the service_queue is the top-level one embedded in throtl_data, NULL is returned. * sq_to_td() is added. A service_queue is always associated with a throtl_data. This function finds the associated td and returns it. * throtl_log() is updated to take throtl_service_queue instead of throtl_data. If the service_queue is one embedded in throtl_grp, it prints the same header as throtl_log_tg() did. If it's one embedded in throtl_data, it behaves the same as before. This renders throtl_log_tg() unnecessary. Removed. This change is necessary for hierarchy support as we're gonna be using the same code paths to dispatch bios to intermediate service_queues embedded in throtl_grps and the top-level service_queue embedded in throtl_data. This patch doesn't make any behavior changes. v2: throtl_log() didn't print a space after blkg path. Updated so that it prints a space after throtl_grp path. Spotted by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: add throtl_service_queue->parent_sqTejun Heo2013-05-141-42/+39
| | | | | | | | | | | | | | | | | | | To prepare for hierarchy support, this patch adds throtl_service_queue->service_sq which points to the arent service_queue. Currently, for all service_queues embedded in throtl_grps, it points to throtl_data->service_queue. As throtl_data->service_queue doesn't have a parent its parent_sq is set to NULL. There are a number of functions which take both throtl_grp *tg and throtl_service_queue *parent_sq. With this patch, the parent service_queue can be determined from @tg and the @parent_sq arguments are removed. This patch doesn't make any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: generalize update_disptime optimization in blk_throtl_bio()Tejun Heo2013-05-141-10/+18
| | | | | | | | | | | | | | | | | | | | | | | | When blk_throtl_bio() wants to queue a bio to a tg (throtl_grp), it avoids invoking tg_update_disptime() and throtl_schedule_next_dispatch() if the tg already has bios queued in that direction. As a new bio is appeneded after the existing ones, it can't change the tg's next dispatch time or the parent's dispatch schedule. This optimization is currently open coded in blk_throtl_bio(). Whether the target biolist was occupied was recorded in a local variable and later used to skip disptime update. This patch moves generalizes it so that throtl_add_bio_tg() sets a new flag THROTL_TG_WAS_EMPTY if the biolist was empty before the new bio was added. tg_update_disptime() clears the flag automatically. blk_throtl_bio() is updated to simply test the flag before updating disptime. This patch doesn't make any functional differences now but will enable using the same optimization for recursive dispatch. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: dispatch to throtl_data->service_queue.bio_lists[]Tejun Heo2013-05-141-17/+23
| | | | | | | | | | | | | | | | | | | | | throtl_service_queues will eventually form a tree which is anchored at throtl_data->service_queue and queue bios will climb the tree to the top service_queue to be executed. This patch makes the dispatch paths in blk_throtl_dispatch_work_fn() and blk_throtl_drain() to dispatch bios to throtl_data->service_queue.bio_lists[] instead of the on-stack bio_lists. This will keep the final dispatch to the top level service_queue share the same mechanism as dispatches through the rest of the hierarchy. As bio's should be issued in a sleepable context, blk_throtl_dispatch_work_fn() transfers all dispatched bio's from the service_queue bio_lists[] into an onstack one before dropping queue_lock and issuing the bio's. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: move bio_lists[] and friends to throtl_service_queueTejun Heo2013-05-141-24/+39
| | | | | | | | | | | | | | | | throtl_service_queues will eventually form a tree which is anchored at throtl_data->service_queue and queue bios will climb the tree to the top service_queue to be executed. This patch moves bio_lists[] and nr_queued[] from throtl_grp to its service_queue to prepare for that. As currently only the throtl_data->service_queue is in use, this patch just ends up moving throtl_grp->bio_lists[] and ->nr_queued[] to throtl_grp->service_queue.bio_lists[] and ->nr_queued[] without making any functional differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: add throtl_grp->service_queueTejun Heo2013-05-141-4/+11
| | | | | | | | | | | | | | | | | Currently, there's single service_queue per queue - throtl_data->service_queue. All active throtl_grp's are queued on the queue and dispatched according to their limits. To support hierarchy, this will be expanded such that active throtl_grp's form a tree anchored at throtl_data->service_queue and chained through each intermediate throtl_grp's service_queue. This patch adds throtl_grp->service_queue to prepare for hierarchy support. The initialization function - throtl_service_queue_init() - is added and replaces the macro initializer. The newly added tg->service_queue isn't used yet. Following patches will do. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: reorganize throtl_service_queue passed around as argumentTejun Heo2013-05-141-49/+51
| | | | | | | | | | | | | | | | | | | throtl_service_queue will be the building block of hierarchy support and will form a tree. This patch updates its usages as arguments to reduce confusion. * When a service queue is used as the parent role - the host of the rbtree - use @parent_sq instead of @sq. * For functions taking both @tg and @parent_sq, reorder them so that the order is (@tg, @parent_sq) not the other way around. This makes the code follow the usual convention of specifying the primary target of the operation as the first argument. This patch doesn't make any functional differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: pass around throtl_service_queue instead of throtl_dataTejun Heo2013-05-141-25/+28
| | | | | | | | | | | | | | | | | | throtl_service_queue will be used as the basic block to implement hierarchy support. Pass around throtl_service_queue *sq instead of throtl_data *td in the following functions which will be used across multiple levels of hierarchy. * [__]throtl_enqueue/dequeue_tg() * throtl_add_bio_tg() * tg_update_disptime() * throtl_select_dispatch() Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: add backlink pointer from throtl_grp to throtl_dataTejun Heo2013-05-141-53/+53
| | | | | | | | | | | | | | | | | Add throtl_grp->td so that the td (throtl_data) a given tg (throtl_grp) belongs to can be determined, and remove @td argument from functions which take both @td and @tg as the former now can be determined from the latter. This generally simplifies the code and removes a number of cases where @td is passed as an argument without being actually used. This will also help hierarchy support implementation. While at it, in multi-line conditions, move the logical operators leading broken lines to the end of the previous line. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: simplify throtl_grp flag handlingTejun Heo2013-05-141-25/+9
| | | | | | | | | | | | blk-throttle is still using function-defining macros to define flag handling functions, which went out style at least a decade ago. Just define the flag as bitmask and use direct bit operations. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: rename throtl_rb_root to throtl_service_queueTejun Heo2013-05-141-42/+42
| | | | | | | | | | | | | | | | throtl_rb_root will be expanded to cover more roles for hierarchy support. Rename it to throtl_service_queue and make its fields more descriptive. * rb -> pending_tree * left -> first_pending * count -> nr_pending * min_disptime -> first_pending_disptime This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: remove pointless throtl_nr_queued() optimizationsTejun Heo2013-05-141-22/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | throtl_nr_queued() is used in several places to avoid performing certain operations when the throtl_data is empty. This usually is useless as those paths usually aren't traveled if there's no bio queued. * throtl_schedule_delayed_work() skips scheduling dispatch work item if @td doesn't have any bios queued; however, the only case it can be called when @td is empty is from tg_set_conf() which isn't something we should be optimizing for. * throtl_schedule_next_dispatch() takes a quick exit if @td is empty; however, right after that it triggers BUG if the service tree is empty. The two conditions are equivalent and it can just test @st->count for the quick exit. * blk_throtl_dispatch_work_fn() skips dispatch if @td is empty. This work function isn't usually invoked when @td is empty. The only possibility is from tg_set_conf() and when it happens the normal dispatching path can handle empty @td fine. No need to add special skip path. This patch removes the above three unnecessary optimizations, which leave throtl_log() call in blk_throtl_dispatch_work_fn() the only user of throtl_nr_queued(). Remove throtl_nr_queued() and open code it in throtl_log(). I don't think we need td->nr_queued[] at all. Maybe we can remove it later. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: relocate throtl_schedule_delayed_work()Tejun Heo2013-05-141-16/+13
| | | | | | | | | | Move throtl_schedule_delayed_work() above its first user so that the forward declaration can be removed. This patch is pure relocaiton. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: collapse throtl_dispatch() into the work functionTejun Heo2013-05-141-17/+9
| | | | | | | | | | | | | | | | blk-throttle is about to go through major restructuring to support hierarchy. Do cosmetic updates in preparation. * s/throtl_data->throtl_work/throtl_data->dispatch_work/ * s/blk_throtl_work()/blk_throtl_dispatch_work_fn()/ * Collapse throtl_dispatch() into blk_throtl_dispatch_work_fn() This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: remove deferred config application mechanismTejun Heo2013-05-141-54/+20
| | | | | | | | | | | | | | | | | | When bps or iops configuration changes, blk-throttle records the new configuration and sets a flag indicating that the config has changed. The flag is checked in the bio dispatch path and applied. This deferred config application was necessary due to limitations in blkcg framework, which haven't existed for quite a while now. This patch removes the deferred config application mechanism and applies new configurations directly from tg_set_conf(), which is simpler. v2: Dropped unnecessary throtl_schedule_delayed_work() call from tg_set_conf() as suggested by Vivek Goyal. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blk-throttle: remove spurious throtl_enqueue_tg() call from ↵Tejun Heo2013-05-141-3/+1
| | | | | | | | | | | | | throtl_select_dispatch() throtl_select_dispatch() calls throtl_enqueue_tg() right after tg_update_disptime(), which always calls the function anyway. The call is, while harmless, unnecessary. Remove it. This patch doesn't introduce any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blkcg: move bulk of blkcg_gq release operations to the RCU callbackTejun Heo2013-05-142-20/+18
| | | | | | | | | | | | | | | | | | | | | | Currently, when the last reference of a blkcg_gq is put, all then release operations sans the actual freeing happen directly in blkg_put(). As blkg_put() may be called under queue_lock, all pd_exit_fn()s may be too. This makes it impossible for pd_exit_fn()s to use del_timer_sync() on timers which grab the queue_lock which is an irq-safe lock due to the deadlock possibility described in the comment on top of del_timer_sync(). This can be easily avoided by perfoming the release operations in the RCU callback instead of directly from blkg_put(). This patch moves the blkcg_gq release operations to the RCU callback. As this leaves __blkg_release() with only call_rcu() invocation, blkg_rcu_free() is renamed to __blkg_release_rcu(), exported and call_rcu() invocation is now done directly from blkg_put() instead of going through __blkg_release() which is removed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blkcg: invoke blkcg_policy->pd_init() after parent is linkedTejun Heo2013-05-141-17/+22
| | | | | | | | | | | | | | | | | | | Currently, when creating a new blkcg_gq, each policy's pd_init_fn() is invoked in blkg_alloc() before the parent is linked. This makes it difficult for policies to perform initializations which are dependent on the parent. This patch moves pd_init_fn() invocations to blkg_create() after the parent blkg is linked where the new blkg is fully initialized. As this means that blkg_free() can't assume that pd's are initialized, pd_exit_fn() invocations are moved to __blkg_release(). This guarantees that pd_exit_fn() is also invoked with fully initialized blkgs with valid parent pointers. This will help implementing hierarchy support in blk-throttle. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blkcg: implement blkg_for_each_descendant_post()Tejun Heo2013-05-141-0/+14
| | | | | | | This will be used by blk-throttle hierarchy support. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blkcg: move blkg_for_each_descendant_pre() to block/blk-cgroup.hTejun Heo2013-05-142-22/+22
| | | | | | | | | blk-throttle hierarchy support will make use of it. Move blkg_for_each_descendant_pre() from block/blk-cgroup.c to block/blk-cgroup.h. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* blkcg: fix error return path in blkg_create()Tejun Heo2013-05-141-1/+1
| | | | | | | | | | | | | | In blkg_create(), after lookup of parent fails, the control jumps to error path with the error code encoded into @blkg. The error path doesn't use @blkg for the return value. It returns ERR_PTR(ret). Make lookup fail path set @ret instead of @blkg. Note that the parent lookup is guaranteed to succeed at that point and the condition check is purely for sanity and triggers WARN when fails. As such, I don't think it's necessary to mark it for -stable. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* Linux 3.10-rc1v3.10-rc1Linus Torvalds2013-05-111-2/+2
|
* Merge tag 'trace-fixes-v3.10' of ↵Linus Torvalds2013-05-116-108/+368
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing/kprobes update from Steven Rostedt: "The majority of these changes are from Masami Hiramatsu bringing kprobes up to par with the latest changes to ftrace (multi buffering and the new function probes). He also discovered and fixed some bugs in doing so. When pulling in his patches, I also found a few minor bugs as well and fixed them. This also includes a compile fix for some archs that select the ring buffer but not tracing. I based this off of the last patch you took from me that fixed the merge conflict error, as that was the commit that had all the changes I needed for this set of changes." * tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/kprobes: Support soft-mode disabling tracing/kprobes: Support ftrace_event_file base multibuffer tracing/kprobes: Pass trace_probe directly from dispatcher tracing/kprobes: Increment probe hit-count even if it is used by perf tracing/kprobes: Use bool for retprobe checker ftrace: Fix function probe when more than one probe is added ftrace: Fix the output of enabled_functions debug file ftrace: Fix locking in register_ftrace_function_probe() tracing: Add helper function trace_create_new_event() to remove duplicate code tracing: Modify soft-mode only if there's no other referrer tracing: Indicate enabled soft-mode in enable file tracing/kprobes: Fix to increment return event probe hit-count ftrace: Cleanup regex_lock and ftrace_lock around hash updating ftrace, kprobes: Fix a deadlock on ftrace_regex_lock ftrace: Have ftrace_regex_write() return either read or error tracing: Return error if register_ftrace_function_probe() fails for event_enable_func() tracing: Don't succeed if event_enable_func did not register anything ring-buffer: Select IRQ_WORK
| * tracing/kprobes: Support soft-mode disablingMasami Hiramatsu2013-05-091-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support soft-mode disabling on kprobe-based dynamic events. Soft-disabling is just ignoring recording if the soft disabled flag is set. Link: http://lkml.kernel.org/r/20130509054454.30398.7237.stgit@mhiramat-M0-7522 Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing/kprobes: Support ftrace_event_file base multibufferMasami Hiramatsu2013-05-091-36/+214
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support multi-buffer on kprobe-based dynamic events by using ftrace_event_file. Link: http://lkml.kernel.org/r/20130509054449.30398.88343.stgit@mhiramat-M0-7522 Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing/kprobes: Pass trace_probe directly from dispatcherMasami Hiramatsu2013-05-091-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the pointer of struct trace_probe directly from probe dispatcher to handlers. This removes redundant container_of macro uses. Same thing has already done in trace_uprobe. Link: http://lkml.kernel.org/r/20130509054441.30398.69112.stgit@mhiramat-M0-7522 Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing/kprobes: Increment probe hit-count even if it is used by perfMasami Hiramatsu2013-05-091-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Increment probe hit-count for profiling even if it is used by perf tool. Same thing has already done in trace_uprobe. Link: http://lkml.kernel.org/r/20130509054436.30398.21133.stgit@mhiramat-M0-7522 Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing/kprobes: Use bool for retprobe checkerMasami Hiramatsu2013-05-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use bool instead of int for kretprobe checker. Link: http://lkml.kernel.org/r/20130509054431.30398.38561.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ftrace: Fix function probe when more than one probe is addedSteven Rostedt (Red Hat)2013-05-091-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the first function probe is added and the function tracer is updated the functions are modified to call the probe. But when a second function is added, it updates the function records to have the second function also update, but it fails to update the actual function itself. This prevents the second (or third or forth and so on) probes from having their functions called. # echo vfs_symlink:enable_event:sched:sched_switch > set_ftrace_filter # echo vfs_unlink:enable_event:sched:sched_switch > set_ftrace_filter # cat trace # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | # touch /tmp/a # rm /tmp/a # cat trace # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | # ln -s /tmp/a # cat trace # tracer: nop # # entries-in-buffer/entries-written: 414/414 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | <idle>-0 [000] d..3 2847.923031: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=2786 next_prio=120 <...>-3114 [001] d..4 2847.923035: sched_switch: prev_comm=ln prev_pid=3114 prev_prio=120 prev_state=x ==> next_comm=swapper/1 next_pid=0 next_prio=120 bash-2786 [000] d..3 2847.923535: sched_switch: prev_comm=bash prev_pid=2786 prev_prio=120 prev_state=S ==> next_comm=kworker/0:1 next_pid=34 next_prio=120 kworker/0:1-34 [000] d..3 2847.923552: sched_switch: prev_comm=kworker/0:1 prev_pid=34 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120 <idle>-0 [002] d..3 2847.923554: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=2783 next_prio=120 sshd-2783 [002] d..3 2847.923660: sched_switch: prev_comm=sshd prev_pid=2783 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120 Still need to update the functions even though the probe itself does not need to be registered again when added a new probe. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ftrace: Fix the output of enabled_functions debug fileSteven Rostedt (Red Hat)2013-05-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The enabled_functions debugfs file was created to be able to see what functions have been modified from nops to calling a tracer. The current method uses the counter in the function record. As when a ftrace_ops is registered to a function, its count increases. But that doesn't mean that the function is actively being traced. /proc/sys/kernel/ftrace_enabled can be set to zero which would disable it, as well as something can go wrong and we can think its enabled when only the counter is set. The record's FTRACE_FL_ENABLED flag is set or cleared when its function is modified. That is a much more accurate way of knowing what function is enabled or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ftrace: Fix locking in register_ftrace_function_probe()Steven Rostedt (Red Hat)2013-05-091-4/+6
| | | | | | | | | | | | | | The iteration of the ftrace function list and the call to ftrace_match_record() need to be protected by the ftrace_lock. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Add helper function trace_create_new_event() to remove duplicate codeSteven Rostedt (Red Hat)2013-05-091-12/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Both __trace_add_new_event() and __trace_early_add_new_event() do basically the same thing, except that __trace_add_new_event() does a little more. Instead of having duplicate code between the two functions, add a helper function trace_create_new_event() that both can use. This will help against having bugs fixed in one function but not the other. Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Modify soft-mode only if there's no other referrerMasami Hiramatsu2013-05-092-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify soft-mode flag only if no other soft-mode referrer (currently only the ftrace triggers) by using a reference counter in each ftrace_event_file. Without this fix, adding and removing several different enable/disable_event triggers on the same event clear soft-mode bit from the ftrace_event_file. This also happens with a typo of glob on setting triggers. e.g. # echo vfs_symlink:enable_event:net:netif_rx > set_ftrace_filter # cat events/net/netif_rx/enable 0* # echo typo_func:enable_event:net:netif_rx > set_ftrace_filter # cat events/net/netif_rx/enable 0 # cat set_ftrace_filter #### all functions enabled #### vfs_symlink:enable_event:net:netif_rx:unlimited As above, we still have a trigger, but soft-mode is gone. Link: http://lkml.kernel.org/r/20130509054429.30398.7464.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: David Sharp <dhsharp@google.com> Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Indicate enabled soft-mode in enable fileMasami Hiramatsu2013-05-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Indicate enabled soft-mode event as "1*" in "enable" file for each event, because it can be soft-disabled when disable_event trigger is hit. Link: http://lkml.kernel.org/r/20130509054426.30398.28202.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing/kprobes: Fix to increment return event probe hit-countMasami Hiramatsu2013-05-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix to increment probe hit-count for function return event. Link: http://lkml.kernel.org/r/20130509054424.30398.34058.stgit@mhiramat-M0-7522 Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ftrace: Cleanup regex_lock and ftrace_lock around hash updatingMasami Hiramatsu2013-05-091-27/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cleanup regex_lock and ftrace_lock locking points around ftrace_ops hash update code. The new rule is that regex_lock protects ops->*_hash read-update-write code for each ftrace_ops. Usually, hash update is done by following sequence. 1. allocate a new local hash and copy the original hash. 2. update the local hash. 3. move(actually, copy) back the local hash to ftrace_ops. 4. update ftrace entries if needed. 5. release the local hash. This makes regex_lock protect #1-#4, and ftrace_lock to protect #3, #4 and adding and removing ftrace_ops from the ftrace_ops_list. The ftrace_lock protects #3 as well because the move functions update the entries too. Link: http://lkml.kernel.org/r/20130509054421.30398.83411.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ftrace, kprobes: Fix a deadlock on ftrace_regex_lockMasami Hiramatsu2013-05-092-21/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a deadlock on ftrace_regex_lock which happens when setting an enable_event trigger on dynamic kprobe event as below. ---- sh-2.05b# echo p vfs_symlink > kprobe_events sh-2.05b# echo vfs_symlink:enable_event:kprobes:p_vfs_symlink_0 > set_ftrace_filter ============================================= [ INFO: possible recursive locking detected ] 3.9.0+ #35 Not tainted --------------------------------------------- sh/72 is trying to acquire lock: (ftrace_regex_lock){+.+.+.}, at: [<ffffffff810ba6c1>] ftrace_set_hash+0x81/0x1f0 but task is already holding lock: (ftrace_regex_lock){+.+.+.}, at: [<ffffffff810b7cbd>] ftrace_regex_write.isra.29.part.30+0x3d/0x220 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(ftrace_regex_lock); lock(ftrace_regex_lock); *** DEADLOCK *** ---- To fix that, this introduces a finer regex_lock for each ftrace_ops. ftrace_regex_lock is too big of a lock which protects all filter/notrace_hash operations, but it doesn't need to be a global lock after supporting multiple ftrace_ops because each ftrace_ops has its own filter/notrace_hash. Link: http://lkml.kernel.org/r/20130509054417.30398.84254.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> [ Added initialization flag and automate mutex initialization for non ftrace.c ftrace_probes. ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ftrace: Have ftrace_regex_write() return either read or errorSteven Rostedt (Red Hat)2013-05-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | As ftrace_regex_write() reads the result of ftrace_process_regex() which can sometimes return a positive number, only consider a failure if the return is negative. Otherwise, it will skip possible other registered probes and by returning a positive number that wasn't read, it will confuse the user processes doing the writing. Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Return error if register_ftrace_function_probe() fails for ↵Steven Rostedt (Red Hat)2013-05-091-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | event_enable_func() register_ftrace_function_probe() returns the number of functions it registered, which can be zero, it can also return a negative number if something went wrong. But event_enable_func() only checks for the case that it didn't register anything, it needs to also check for the case that something went wrong and return that error code as well. Added some comments about the code as well, to make it more understandable. Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * tracing: Don't succeed if event_enable_func did not register anythingMasami Hiramatsu2013-05-091-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return 0 instead of the number of activated ftrace function probes if event_enable_func succeeded and return an error code if it failed or did not register any functions. But it currently returns the number of registered functions and if it didn't register anything, it returns 0, but that is considered success. This also fixes the return value. As if it succeeds, it returns the number of functions that were enabled, which is returned back to the user in ftrace_regex_write (the write() return code). If only one function is enabled, then the return code of the write is one, and this can confuse the user program in thinking it only wrote 1 byte. Link: http://lkml.kernel.org/r/20130509054413.30398.55650.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@intel.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> [ Rewrote change log to reflect that this fixes two bugs - SR ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * ring-buffer: Select IRQ_WORKSteven Rostedt (Red Hat)2013-05-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the wake up logic for waiters on the buffer has been moved from the tracing code to the ring buffer, it requires also adding IRQ_WORK as the wake up code is performed via irq_work. This fixes compile breakage when a user of the ring buffer is selected but tracing and irq_work are not. Link http://lkml.kernel.org/r/20130503115332.GT8356@rric.localhost Cc: Arnd Bergmann <arnd@arndb.de> Reported-by: Robert Richter <rric@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | Merge tag 'stable/for-linus-3.10-rc0-tag-two' of ↵Linus Torvalds2013-05-115-7/+55
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen bug-fixes from Konrad Rzeszutek Wilk: - More fixes in the vCPU PVHVM hotplug path. - Add more documentation. - Fix various ARM related issues in the Xen generic drivers. - Updates in the xen-pciback driver per Bjorn's updates. - Mask the x2APIC feature for PV guests. * tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pci: Used cached MSI-X capability offset xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK xen: clear IRQ_NOAUTOEN and IRQ_NOREQUEST xen: mask x2APIC feature in PV xen: SWIOTLB is only used on x86 xen/spinlock: Fix check from greater than to be also be greater or equal to. xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info xen/vcpu: Document the xen_vcpu_info and xen_vcpu xen/vcpu/pvhvm: Fix vcpu hotplugging hanging.
| * | xen/pci: Used cached MSI-X capability offsetBjorn Helgaas2013-05-101-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We now cache the MSI-X capability offset in the struct pci_dev, so no need to find the capability again. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASKBjorn Helgaas2013-05-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PCI_MSIX_FLAGS_BIRMASK is mis-named because the BIR mask is in the Table Offset register, not the flags ("Message Control" per spec) register. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen: clear IRQ_NOAUTOEN and IRQ_NOREQUESTJulien Grall2013-05-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reset the IRQ_NOAUTOEN and IRQ_NOREQUEST flags that are enabled by default on ARM. If IRQ_NOAUTOEN is set, __setup_irq doesn't call irq_startup, that is responsible for calling irq_unmask at startup time. As a result event channels remain masked. The clear is already made in bind_evtchn_to_irq with commit a8636c0 but was missing on all others bind_*_to_irq. Move the clear in xen_irq_info_common_init. On x86, IRQ_NOAUTOEN and IRQ_NOREQUEST are cleared by default, so this commit doesn't impact this architecture. Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Julien Grall <julien.grall@linaro.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
OpenPOWER on IntegriCloud