op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	workqueue: s/__create_workqueue()/alloc_workqueue()/, and add system workqueues	Tejun Heo	2010-06-29	2	-28/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch makes changes to make new workqueue features available to its users. * Now that workqueue is more featureful, there should be a public workqueue creation function which takes paramters to control them. Rename __create_workqueue() to alloc_workqueue() and make 0 max_active mean WQ_DFL_ACTIVE. In the long run, all create_workqueue_() will be converted over to alloc_workqueue(). To further unify access interface, rename keventd_wq to system_wq and export it. * Add system_long_wq and system_nrt_wq. The former is to host long running works separately (so that flush_scheduled_work() dosen't take so long) and the latter guarantees any queued work item is never executed in parallel by multiple CPUs. These will be used by future patches to update workqueue users. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: increase max_active of keventd and kill current_is_keventd()	Tejun Heo	2010-06-29	4	-53/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Define WQ_MAX_ACTIVE and create keventd with max_active set to half of it which means that keventd now can process upto WQ_MAX_ACTIVE / 2 - 1 works concurrently. Unless some combination can result in dependency loop longer than max_active, deadlock won't happen and thus it's unnecessary to check whether current_is_keventd() before trying to schedule a work. Kill current_is_keventd(). (Lockdep annotations are broken. We need lock_map_acquire_read_norecurse()) Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com>
*	workqueue: implement concurrency managed dynamic worker pool	Tejun Heo	2010-06-29	3	-116/+841
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of creating a worker for each cwq and putting it into the shared pool, manage per-cpu workers dynamically. Works aren't supposed to be cpu cycle hogs and maintaining just enough concurrency to prevent work processing from stalling due to lack of processing context is optimal. gcwq keeps the number of concurrent active workers to minimum but no less. As long as there's one or more running workers on the cpu, no new worker is scheduled so that works can be processed in batch as much as possible but when the last running worker blocks, gcwq immediately schedules new worker so that the cpu doesn't sit idle while there are works to be processed. gcwq always keeps at least single idle worker around. When a new worker is necessary and the worker is the last idle one, the worker assumes the role of "manager" and manages the worker pool - ie. creates another worker. Forward-progress is guaranteed by having dedicated rescue workers for workqueues which may be necessary while creating a new worker. When the manager is having problem creating a new worker, mayday timer activates and rescue workers are summoned to the cpu and execute works which might be necessary to create new workers. Trustee is expanded to serve the role of manager while a CPU is being taken down and stays down. As no new works are supposed to be queued on a dead cpu, it just needs to drain all the existing ones. Trustee continues to try to create new workers and summon rescuers as long as there are pending works. If the CPU is brought back up while the trustee is still trying to drain the gcwq from the previous offlining, the trustee will kill all idles ones and tell workers which are still busy to rebind to the cpu, and pass control over to gcwq which assumes the manager role as necessary. Concurrency managed worker pool reduces the number of workers drastically. Only workers which are necessary to keep the processing going are created and kept. Also, it reduces cache footprint by avoiding unnecessarily switching contexts between different workers. Please note that this patch does not increase max_active of any workqueue. All workqueues can still only process one work per cpu. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: implement worker_{set\|clr}_flags()	Tejun Heo	2010-06-29	1	-8/+40
\| \| \| \| \| \| \| \|	Implement worker_{set\|clr}_flags() to manipulate worker flags. These are currently simple wrappers but logics to track the current worker state and the current level of concurrency will be added. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: use shared worklist and pool all workers per cpu	Tejun Heo	2010-06-29	1	-32/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use gcwq->worklist instead of cwq->worklist and break the strict association between a cwq and its worker. All works queued on a cpu are queued on gcwq->worklist and processed by any available worker on the gcwq. As there no longer is strict association between a cwq and its worker, whether a work is executing can now only be determined by calling [__]find_worker_executing_work(). After this change, the only association between a cwq and its worker is that a cwq puts a worker into shared worker pool on creation and kills it on destruction. As all workqueues are still limited to max_active of one, this means that there are always at least as many workers as active works and thus there's no danger for deadlock. The break of strong association between cwqs and workers requires somewhat clumsy changes to current_is_keventd() and destroy_workqueue(). Dynamic worker pool management will remove both clumsy changes. current_is_keventd() won't be necessary at all as the only reason it exists is to avoid queueing a work from a work which will be allowed just fine. The clumsy part of destroy_workqueue() is added because a worker can only be destroyed while idle and there's no guarantee a worker is idle when its wq is going down. With dynamic pool management, workers are not associated with workqueues at all and only idle ones will be submitted to destroy_workqueue() so the code won't be necessary anymore. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: implement WQ_NON_REENTRANT	Tejun Heo	2010-06-29	2	-3/+30
\| \| \| \| \| \| \| \| \|	With gcwq managing all the workers and work->data pointing to the last gcwq it was on, non-reentrance can be easily implemented by checking whether the work is still running on the previous gcwq on queueing. Implement it. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: carry cpu number in work data once execution starts	Tejun Heo	2010-06-29	2	-61/+109
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To implement non-reentrant workqueue, the last gcwq a work was executed on must be reliably obtainable as long as the work structure is valid even if the previous workqueue has been destroyed. To achieve this, work->data will be overloaded to carry the last cpu number once execution starts so that the previous gcwq can be located reliably. This means that cwq can't be obtained from work after execution starts but only gcwq. Implement set_work_{cwq\|cpu}(), get_work_[g]cwq() and clear_work_data() to set work data to the cpu number when starting execution, access the overloaded work data and clear it after cancellation. queue_delayed_work_on() is updated to preserve the last cpu while in-flight in timer and other callers which depended on getting cwq from work after execution starts are converted to depend on gcwq instead. * Anton Blanchard fixed compile error on powerpc due to missing linux/threads.h include. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Anton Blanchard <anton@samba.org>
*	workqueue: add find_worker_executing_work() and track current_cwq	Tejun Heo	2010-06-29	1	-0/+56
\| \| \| \| \| \| \| \| \|	Now that all the workers are tracked by gcwq, we can find which worker is executing a work from gcwq. Implement find_worker_executing_work() and make worker track its current_cwq so that we can find things the other way around. This will be used to implement non-reentrant wqs. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: make single thread workqueue shared worker pool friendly	Tejun Heo	2010-06-29	2	-38/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reimplement st (single thread) workqueue so that it's friendly to shared worker pool. It was originally implemented by confining st workqueues to use cwq of a fixed cpu and always having a worker for the cpu. This implementation isn't very friendly to shared worker pool and suboptimal in that it ends up crossing cpu boundaries often. Reimplement st workqueue using dynamic single cpu binding and cwq->limit. WQ_SINGLE_THREAD is replaced with WQ_SINGLE_CPU. In a single cpu workqueue, at most single cwq is bound to the wq at any given time. Arbitration is done using atomic accesses to wq->single_cpu when queueing a work. Once bound, the binding stays till the workqueue is drained. Note that the binding is never broken while a workqueue is frozen. This is because idle cwqs may have works waiting in delayed_works queue while frozen. On thaw, the cwq is restarted if there are any delayed works or unbound otherwise. When combined with max_active limit of 1, single cpu workqueue has exactly the same execution properties as the original single thread workqueue while allowing sharing of per-cpu workers. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: reimplement CPU hotplugging support using trustee	Tejun Heo	2010-06-29	2	-16/+279
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reimplement CPU hotplugging support using trustee thread. On CPU down, a trustee thread is created and each step of CPU down is executed by the trustee and workqueue_cpu_callback() simply drives and waits for trustee state transitions. CPU down operation no longer waits for works to be drained but trustee sticks around till all pending works have been completed. If CPU is brought back up while works are still draining, workqueue_cpu_callback() tells trustee to step down and tell workers to rebind to the cpu. As it's difficult to tell whether cwqs are empty if it's freezing or frozen, trustee doesn't consider draining to be complete while a gcwq is freezing or frozen (tracked by new GCWQ_FREEZING flag). Also, workers which get unbound from their cpu are marked with WORKER_ROGUE. Trustee based implementation doesn't bring any new feature at this point but it will be used to manage worker pool when dynamic shared worker pool is implemented. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: implement worker states	Tejun Heo	2010-06-29	1	-41/+173
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement worker states. After created, a worker is STARTED. While a worker isn't processing a work, it's IDLE and chained on gcwq->idle_list. While processing a work, a worker is BUSY and chained on gcwq->busy_hash. Also, gcwq now counts the number of all workers and idle ones. worker_thread() is restructured to reflect state transitions. cwq->more_work is removed and waking up a worker makes it check for events. A worker is killed by setting DIE flag while it's IDLE and waking it up. This gives gcwq better visibility of what's going on and allows it to find out whether a work is executing quickly which is necessary to have multiple workers processing the same cwq. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: introduce global cwq and unify cwq locks	Tejun Heo	2010-06-29	1	-62/+98
\| \| \| \| \| \| \| \| \| \| \| \|	There is one gcwq (global cwq) per each cpu and all cwqs on an cpu point to it. A gcwq contains a lock to be used by all cwqs on the cpu and an ida to give IDs to workers belonging to the cpu. This patch introduces gcwq, moves worker_ida into gcwq and make all cwqs on the same cpu use the cpu's gcwq->lock instead of separate locks. gcwq->ida is now protected by gcwq->lock too. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: reimplement workqueue freeze using max_active	Tejun Heo	2010-06-29	3	-12/+179
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, workqueue freezing is implemented by marking the worker freezeable and calling try_to_freeze() from dispatch loop. Reimplement it using cwq->limit so that the workqueue is frozen instead of the worker. * workqueue_struct->saved_max_active is added which stores the specified max_active on initialization. * On freeze, all cwq->max_active's are quenched to zero. Freezing is complete when nr_active on all cwqs reach zero. * On thaw, all cwq->max_active's are restored to wq->saved_max_active and the worklist is repopulated. This new implementation allows having single shared pool of workers per cpu. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: implement per-cwq active work limit	Tejun Heo	2010-06-29	2	-11/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add cwq->nr_active, cwq->max_active and cwq->delayed_work. nr_active counts the number of active works per cwq. A work is active if it's flushable (colored) and is on cwq's worklist. If nr_active reaches max_active, new works are queued on cwq->delayed_work and activated later as works on the cwq complete and decrement nr_active. cwq->max_active can be specified via the new @max_active parameter to __create_workqueue() and is set to 1 for all workqueues for now. As each cwq has only single worker now, this double queueing doesn't cause any behavior difference visible to its users. This will be used to reimplement freeze/thaw and implement shared worker pool. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: reimplement work flushing using linked works	Tejun Heo	2010-06-29	2	-22/+134
\| \| \| \| \| \| \| \| \| \| \| \| \|	A work is linked to the next one by having WORK_STRUCT_LINKED bit set and these links can be chained. When a linked work is dispatched to a worker, all linked works are dispatched to the worker's newly added ->scheduled queue and processed back-to-back. Currently, as there's only single worker per cwq, having linked works doesn't make any visible behavior difference. This change is to prepare for multiple shared workers per cpu. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: introduce worker	Tejun Heo	2010-06-29	1	-61/+150
\| \| \| \| \| \| \| \| \| \| \| \| \|	Separate out worker thread related information to struct worker from struct cpu_workqueue_struct and implement helper functions to deal with the new struct worker. The only change which is visible outside is that now workqueue worker are all named "kworker/CPUID:WORKERID" where WORKERID is allocated from per-cpu ida. This is in preparation of concurrency managed workqueue where shared multiple workers would be available per cpu. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: reimplement workqueue flushing using color coded works	Tejun Heo	2010-06-29	2	-54/+322
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reimplement workqueue flushing using color coded works. wq has the current work color which is painted on the works being issued via cwqs. Flushing a workqueue is achieved by advancing the current work colors of cwqs and waiting for all the works which have any of the previous colors to drain. Currently there are 16 possible colors, one is reserved for no color and 15 colors are useable allowing 14 concurrent flushes. When color space gets full, flush attempts are batched up and processed together when color frees up, so even with many concurrent flushers, the new implementation won't build up huge queue of flushers which has to be processed one after another. Only works which are queued via __queue_work() are colored. Works which are directly put on queue using insert_work() use NO_COLOR and don't participate in workqueue flushing. Currently only works used for work-specific flush fall in this category. This new implementation leaves only cleanup_workqueue_thread() as the user of flush_cpu_workqueue(). Just make its users use flush_workqueue() and kthread_stop() directly and kill cleanup_workqueue_thread(). As workqueue flushing doesn't use barrier request anymore, the comment describing the complex synchronization around it in cleanup_workqueue_thread() is removed together with the function. This new implementation is to allow having and sharing multiple workers per cpu. Please note that one more bit is reserved for a future work flag by this patch. This is to avoid shifting bits and updating comments later. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: update cwq alignement	Tejun Heo	2010-06-29	2	-6/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	work->data field is used for two purposes. It points to cwq it's queued on and the lower bits are used for flags. Currently, two bits are reserved which is always safe as 4 byte alignment is guaranteed on every architecture. However, future changes will need more flag bits. On SMP, the percpu allocator is capable of honoring larger alignment (there are other users which depend on it) and larger alignment works just fine. On UP, percpu allocator is a thin wrapper around kzalloc/kfree() and don't honor alignment request. This patch introduces WORK_STRUCT_FLAG_BITS and implements alloc/free_cwqs() which guarantees max(1 << WORK_STRUCT_FLAG_BITS, __alignof__(unsigned long long) alignment both on SMP and UP. On SMP, simply wrapping percpu allocator is enough. On UP, extra space is allocated so that cwq can be aligned and the original pointer can be stored after it which is used in the free path. * Alignment problem on UP is reported by Michal Simek. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: Michal Simek <michal.simek@petalogix.com>
*	workqueue: kill cpu_populated_map	Tejun Heo	2010-06-29	1	-114/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Worker management is about to be overhauled. Simplify things by removing cpu_populated_map, creating workers for all possible cpus and making single threaded workqueues behave more like multi threaded ones. After this patch, all cwqs are always initialized, all workqueues are linked on the workqueues list and workers for all possibles cpus always exist. This also makes CPU hotplug support simpler - checking ->cpus_allowed before processing works in worker_thread() and flushing cwqs on CPU_POST_DEAD are enough. While at it, make get_cwq() always return the cwq for the specified cpu, add target_cwq() for cases where single thread distinction is necessary and drop all direct usage of per_cpu_ptr() on wq->cpu_wq. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: temporarily remove workqueue tracing	Tejun Heo	2010-06-29	3	-114/+3
\| \| \| \| \| \| \| \|	Strip tracing code from workqueue and remove workqueue tracing. This is temporary measure till concurrency managed workqueue is complete. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com>
*	workqueue: separate out process_one_work()	Tejun Heo	2010-06-29	1	-39/+61
\| \| \| \| \| \| \|	Separate out process_one_work() out of run_workqueue(). This patch doesn't cause any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: define masks for work flags and conditionalize STATIC flags	Tejun Heo	2010-06-29	2	-14/+27
\| \| \| \| \| \| \| \| \| \| \| \| \|	Work flags are about to see more traditional mask handling. Define WORK_STRUCT__BIT as the bit position constant and redefine WORK_STRUCT_ as bit masks. Also, make WORK_STRUCT_STATIC_* flags conditional While at it, re-define these constants as enums and use WORK_STRUCT_STATIC instead of hard-coding 2 in WORK_DATA_STATIC_INIT(). Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: merge feature parameters into flags	Tejun Heo	2010-06-29	2	-20/+22
\| \| \| \| \| \| \| \| \|	Currently, __create_workqueue_key() takes @singlethread and @freezeable paramters and store them separately in workqueue_struct. Merge them into a single flags parameter and field and use WQ_FREEZEABLE and WQ_SINGLE_THREAD. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: misc/cosmetic updates	Tejun Heo	2010-06-29	2	-47/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the following updates in preparation of concurrency managed workqueue. None of these changes causes any visible behavior difference. * Add comments and adjust indentations to data structures and several functions. * Rename wq_per_cpu() to get_cwq() and swap the position of two parameters for consistency. Convert a direct per_cpu_ptr() access to wq->cpu_wq to get_cwq(). * Add work_static() and Update set_wq_data() such that it sets the flags part to WORK_STRUCT_PENDING \| WORK_STRUCT_STATIC if static \| @extra_flags. * Move santiy check on work->entry emptiness from queue_work_on() to __queue_work() which all queueing paths share. * Make __queue_work() take @cpu and @wq instead of @cwq. * Restructure flush_work() and __create_workqueue_key() to make them easier to modify. Signed-off-by: Tejun Heo <tj@kernel.org>
*	workqueue: kill RT workqueue	Tejun Heo	2010-06-29	2	-17/+9
\| \| \| \| \| \| \|	With stop_machine() converted to use cpu_stop, RT workqueue doesn't have any user left. Kill RT workqueue support. Signed-off-by: Tejun Heo <tj@kernel.org>
*	acpi: use queue_work_on() instead of binding workqueue worker to cpu0	Tejun Heo	2010-06-29	1	-29/+11
\| \| \| \| \| \| \| \| \| \| \|	ACPI works need to be executed on cpu0 and acpi/osl.c achieves this by creating singlethread workqueue and then binding it to cpu0 from a work which is quite unorthodox. Make it create regular workqueues and use queue_work_on() instead. This is in preparation of concurrency managed workqueue and the extra workers won't be a problem after it's implemented. Signed-off-by: Tejun Heo <tj@kernel.org>
*	kthread: implement kthread_data()	Tejun Heo	2010-06-29	2	-0/+16
\| \| \| \| \| \| \| \| \|	Implement kthread_data() which takes @task pointing to a kthread and returns @data specified when creating the kthread. The caller is responsible for ensuring the validity of @task when calling this function. Signed-off-by: Tejun Heo <tj@kernel.org>
*	ivtv: use kthread_worker instead of workqueue	Tejun Heo	2010-06-29	4	-27/+24
\| \| \| \| \| \| \| \| \| \| \| \| \|	Upcoming workqueue updates will no longer guarantee fixed workqueue to worker kthread association, so giving RT priority to the irq worker won't work. Use kthread_worker which guarantees specific kthread association instead. This also makes setting the priority cleaner. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: ivtv-devel@ivtvdriver.org Cc: linux-media@vger.kernel.org
*	kthread: implement kthread_worker	Tejun Heo	2010-06-29	2	-0/+213
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement simple work processor for kthread. This is to ease using kthread. Single thread workqueue used to be used for things like this but workqueue won't guarantee fixed kthread association anymore to enable worker sharing. This can be used in cases where specific kthread association is necessary, for example, when it should have RT priority or be assigned to certain cgroup. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org>
*	Merge branch 'sched-wq' of ../wq into cmwq-base	Tejun Heo	2010-06-13	9	-81/+203
\|\
\| *	sched: add hooks for workqueue	Tejun Heo	2010-06-08	4	-3/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Concurrency managed workqueue needs to know when workers are going to sleep and waking up. Using these two hooks, cmwq keeps track of the current concurrency level and throttles execution of new works if it's too high and wakes up another worker from the sleep hook if it becomes too low. This patch introduces PF_WQ_WORKER to identify workqueue workers and adds the following two hooks. * wq_worker_waking_up(): called when a worker is woken up. * wq_worker_sleeping(): called when a worker is going to sleep and may return a pointer to a local task which should be woken up. The returned task is woken up using try_to_wake_up_local() which is simplified ttwu which is called under rq lock and can only wake up local tasks. Both hooks are currently defined as noop in kernel/workqueue_sched.h. Later cmwq implementation will replace them with proper implementation. These hooks are hard coded as they'll always be enabled. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Ingo Molnar <mingo@elte.hu>
\| *	sched: refactor try_to_wake_up()	Tejun Heo	2010-06-08	1	-34/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Factor ttwu_activate() and ttwu_woken_up() out of try_to_wake_up(). The factoring out doesn't affect try_to_wake_up() much code-generation-wise. Depending on configuration options, it ends up generating the same object code as before or slightly different one due to different register assignment. This is to help future implementation of try_to_wake_up_local(). Mike Galbraith suggested rename to ttwu_post_activation() from ttwu_woken_up() and comment update in try_to_wake_up(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Ingo Molnar <mingo@elte.hu>
\| *	sched: adjust when cpu_active and cpuset configurations are updated during ↵	Tejun Heo	2010-06-08	5	-42/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cpu on/offlining Currently, when a cpu goes down, cpu_active is cleared before CPU_DOWN_PREPARE starts and cpuset configuration is updated from a default priority cpu notifier. When a cpu is coming up, it's set before CPU_ONLINE but cpuset configuration again is updated from the same cpu notifier. For cpu notifiers, this presents an inconsistent state. Threads which a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be migrated to other cpus because the cpu is no more inactive. Fix it by updating cpu_active in the highest priority cpu notifier and cpuset configuration in the second highest when a cpu is coming up. Down path is updated similarly. This guarantees that all other cpu notifiers see consistent cpu_active and cpuset configuration. cpuset_track_online_cpus() notifier is converted to cpuset_update_active_cpus() which just updates the configuration and now called from cpuset_cpu_[in]active() notifiers registered from sched_init_smp(). If cpuset is disabled, cpuset_update_active_cpus() degenerates into partition_sched_domains() making separate notifier for !CONFIG_CPUSETS unnecessary. This problem is triggered by cmwq. During CPU_DOWN_PREPARE, hotplug callback creates a kthread and kthread_bind()s it to the target cpu, and the thread is expected to run on that cpu. * Ingo's test discovered __cpuinit/exit markups were incorrect. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Menage <menage@google.com>
\| *	sched: define and use CPU_PRI_* enums for cpu notifier priorities	Tejun Heo	2010-06-08	3	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of hardcoding priority 10 and 20 in sched and perf, collect them into CPU_PRI_* enums. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
* \|	Linux 2.6.35-rc3v2.6.35-rc3	Linus Torvalds	2010-06-11	1	-1/+1
\| \|
* \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2010-06-11	14	-28/+103
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: wimax/i2400m: fix missing endian correction read in fw loader net8139: fix a race at the end of NAPI pktgen: Fix accuracy of inter-packet delay. pkt_sched: gen_estimator: add a new lock net: deliver skbs on inactive slaves to exact matches ipv6: fix ICMP6_MIB_OUTERRORS r8169: fix mdio_read and update mdio_write according to hw specs gianfar: Revive the driver for eTSEC devices (disable timestamping) caif: fix a couple range checks phylib: Add support for the LXT973 phy. net: Print num_rx_queues imbalance warning only when there are allocated queues
\| * \	Merge branch 'wimax-2.6.35.y' of ↵	David S. Miller	2010-06-11	1	-1/+1
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax
\| \| * \|	wimax/i2400m: fix missing endian correction read in fw loader	Inaky Perez-Gonzalez	2010-06-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	i2400m_fw_hdr_check() was accessing hardware field bcf_hdr->module_type (little endian 32) without converting to host byte sex. Reported-by: Данилин Михаил <mdanilin@nsg.net.ru> Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
\| * \| \|	net8139: fix a race at the end of NAPI	Figo.zhang	2010-06-10	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fix a race at the end of NAPI complete processing, it had better do __napi_complete() first before re-enable interrupt. Signed-off-by:Figo.zhang <figo1802@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	pktgen: Fix accuracy of inter-packet delay.	Daniel Turull	2010-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch correct a bug in the delay of pktgen. It makes sure the inter-packet interval is accurate. Signed-off-by: Daniel Turull <daniel.turull@gmail.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	pkt_sched: gen_estimator: add a new lock	Eric Dumazet	2010-06-10	1	-3/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	gen_kill_estimator() / gen_new_estimator() is not always called with RTNL held. net/netfilter/xt_RATEEST.c is one user of these API that do not hold RTNL, so random corruptions can occur between "tc" and "iptables". Add a new fine grained lock instead of trying to use RTNL in netfilter. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	net: deliver skbs on inactive slaves to exact matches	John Fastabend	2010-06-10	3	-6/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the accelerated receive path for VLAN's will drop packets if the real device is an inactive slave and is not one of the special pkts tested for in skb_bond_should_drop(). This behavior is different then the non-accelerated path and for pkts over a bonded vlan. For example, vlanx -> bond0 -> ethx will be dropped in the vlan path and not delivered to any packet handlers at all. However, bond0 -> vlanx -> ethx and bond0 -> ethx will be delivered to handlers that match the exact dev, because the VLAN path checks the real_dev which is not a slave and netif_recv_skb() doesn't drop frames but only delivers them to exact matches. This patch adds a sk_buff flag which is used for tagging skbs that would previously been dropped and allows the skb to continue to skb_netif_recv(). Here we add logic to check for the deliver_no_wcard flag and if it is set only deliver to handlers that match exactly. This makes both paths above consistent and gives pkt handlers a way to identify skbs that come from inactive slaves. Without this patch in some configurations skbs will be delivered to handlers with exact matches and in others be dropped out right in the vlan path. I have tested the following 4 configurations in failover modes and load balancing modes. # bond0 -> ethx # vlanx -> bond0 -> ethx # bond0 -> vlanx -> ethx # bond0 -> ethx \| vlanx -> -- Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	ipv6: fix ICMP6_MIB_OUTERRORS	Eric Dumazet	2010-06-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit 1f8438a85366 (icmp: Account for ICMP out errors), I did a typo on IPV6 side, using ICMP6_MIB_OUTMSGS instead of ICMP6_MIB_OUTERRORS Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	r8169: fix mdio_read and update mdio_write according to hw specs	Timo Teräs	2010-06-09	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Realtek confirmed that a 20us delay is needed after mdio_read and mdio_write operations. Reduce the delay in mdio_write, and add it to mdio_read too. Also add a comment that the 20us is from hw specs. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \|	Merge branch 'num_rx_queues' of git://kernel.ubuntu.com/rtg/net-2.6	David S. Miller	2010-06-09	1	-5/+3
\| \|\ \ \
\| \| * \| \|	net: Print num_rx_queues imbalance warning only when there are allocated queues	Tim Gardner	2010-06-09	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	BugLink: http://bugs.launchpad.net/bugs/591416 There are a number of network drivers (bridge, bonding, etc) that are not yet receive multi-queue enabled and use alloc_netdev(), so don't print a num_rx_queues imbalance warning in that case. Also, only print the warning once for those drivers that _are_ multi-queue enabled. Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
\| * \| \| \|	gianfar: Revive the driver for eTSEC devices (disable timestamping)	Anton Vorontsov	2010-06-09	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit cc772ab7cdcaa24d1fae332d92a1602788644f7a ("gianfar: Add hardware RX timestamping support"), the driver no longer works on at least MPC8313ERDB and MPC8568EMDS boards (and possibly much more boards as well). That's how MPC8313 Reference Manual describes RCTRL_TS_ENABLE bit: Timestamp incoming packets as padding bytes. PAL field is set to 8 if the PAL field is programmed to less than 8. Must be set to zero if TMR_CTRL[TE]=0. I see that the commit above sets this bit, but it doesn't handle TMR_CTRL. Manfred probably had this bit set by the firmware for his boards. But obviously this isn't true for all boards in the wild. Also, I recall that Freescale BSPs were explicitly disabling the timestamping because of a performance drop. For now, the best way to deal with this is just disable the timestamping, and later we can discuss proper device tree bindings and implement enabling this feature via some property. Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	caif: fix a couple range checks	Dan Carpenter	2010-06-09	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The extra ! character means that these conditions are always false. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	phylib: Add support for the LXT973 phy.	Richard Cochran	2010-06-09	1	-1/+50
\| \|/ / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements a work around for Erratum 5, "3.3 V Fiber Speed Selection." If the hardware wiring does not respect this erratum, then fiber optic mode will not work properly. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \| \|	Merge branch 'pm-fixes' of ↵	Linus Torvalds	2010-06-11	3	-0/+8
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / x86: Save/restore MISC_ENABLE register