| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull audit updates from Eric Paris:
"So this change across a whole bunch of arches really solves one basic
problem. We want to audit when seccomp is killing a process. seccomp
hooks in before the audit syscall entry code. audit_syscall_entry
took as an argument the arch of the given syscall. Since the arch is
part of what makes a syscall number meaningful it's an important part
of the record, but it isn't available when seccomp shoots the
syscall...
For most arch's we have a better way to get the arch (syscall_get_arch)
So the solution was two fold: Implement syscall_get_arch() everywhere
there is audit which didn't have it. Use syscall_get_arch() in the
seccomp audit code. Having syscall_get_arch() everywhere meant it was
a useless flag on the stack and we could get rid of it for the typical
syscall entry.
The other changes inside the audit system aren't grand, fixed some
records that had invalid spaces. Better locking around the task comm
field. Removing some dead functions and structs. Make some things
static. Really minor stuff"
* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: rename audit_log_remove_rule to disambiguate for trees
audit: cull redundancy in audit_rule_change
audit: WARN if audit_rule_change called illegally
audit: put rule existence check in canonical order
next: openrisc: Fix build
audit: get comm using lock to avoid race in string printing
audit: remove open_arg() function that is never used
audit: correct AUDIT_GET_FEATURE return message type
audit: set nlmsg_len for multicast messages.
audit: use union for audit_field values since they are mutually exclusive
audit: invalid op= values for rules
audit: use atomic_t to simplify audit_serial()
kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
audit: reduce scope of audit_log_fcaps
audit: reduce scope of audit_net_id
audit: arm64: Remove the audit arch argument to audit_syscall_entry
arm64: audit: Add audit hook in syscall_trace_enter/exit()
audit: x86: drop arch from __audit_syscall_entry() interface
sparc: implement is_32bit_task
sparc: properly conditionalize use of TIF_32BIT
...
|
| |
| |
| |
| |
| |
| |
| |
| | |
Rename audit_log_remove_rule() to audit_tree_log_remove_rule() to avoid
confusion with watch and mark rule removal/changes.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Re-factor audit_rule_change() to reduce the amount of code redundancy and
simplify the logic.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
| |
| |
| |
| | |
Signed-off-by: Eric Paris <eparis@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Use same rule existence check order as audit_make_tree(), audit_to_watch(),
update_lsm_rule() for legibility.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When task->comm is passed directly to audit_log_untrustedstring() without
getting a copy or using the task_lock, there is a race that could happen that
would output a NULL (\0) in the output string that would effectively truncate
the rest of the report text after the comm= field in the audit, losing fields.
Use get_task_comm() to get a copy while acquiring the task_lock to prevent
this and to prevent the result from being a mixture of old and new values of
comm.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
open_arg() was added in commit 55669bfa "audit: AUDIT_PERM support"
and never used. Remove it.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When an AUDIT_GET_FEATURE message is sent from userspace to the kernel, it
should reply with a message tagged as an AUDIT_GET_FEATURE type with a struct
audit_feature. The current reply is a message tagged as an AUDIT_GET
type with a struct audit_feature.
This appears to have been a cut-and-paste-eo in commit b0fed40.
Reported-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Report:
Looking at your example code in
http://people.redhat.com/rbriggs/audit-multicast-listen/audit-multicast-listen.c,
it seems that nlmsg_len field in the received messages is supposed to
contain the length of the header + payload, but it is always set to the
size of the header only, i.e. 16. The example program works, because
the printf format specifies the minimum width, not "precision", so it
simply prints out the payload until the first zero byte. This isn't too
much of a problem, but precludes the use of recvmmsg, iiuc?
(gdb) p *(struct nlmsghdr*)nlh
$14 = {nlmsg_len = 16, nlmsg_type = 1100, nlmsg_flags = 0, nlmsg_seq = 0, nlmsg_pid = 9910}
The only time nlmsg_len would have been updated was at audit_buffer_alloc()
inside audit_log_start() and never updated after. It should arguably be done
in audit_log_vformat(), but would be more efficient in audit_log_end().
Reported-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
Since only one of val, uid, gid and lsm* are used at any given time, combine
them to reduce the size of the struct audit_field.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Various audit events dealing with adding, removing and updating rules result in
invalid values set for the op keys which result in embedded spaces in op=
values.
The invalid values are
op="add rule" set in kernel/auditfilter.c
op="remove rule" set in kernel/auditfilter.c
op="remove rule" set in kernel/audit_tree.c
op="updated rules" set in kernel/audit_watch.c
op="remove rule" set in kernel/audit_watch.c
Replace the space in the above values with an underscore character ('_').
Coded-by: Burn Alting <burn@swtf.dyndns.org>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
|
| |
| |
| |
| |
| |
| |
| | |
Since there is already a primitive to do this operation in the atomic_t, use it
to simplify audit_serial().
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use kernel.h definition.
Cc: Eric Paris <eparis@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
|
| |
| |
| |
| |
| |
| | |
audit_log_fcaps() isn't used outside kernel/audit.c. Reduce its scope.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
|
| |
| |
| |
| |
| |
| | |
audit_net_id isn't used outside kernel/audit.c. Reduce its scope.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since the arch is found locally in __audit_syscall_entry(), there is no need to
pass it in as a parameter. Delete it from the parameter list.
x86* was the only arch to call __audit_syscall_entry() directly and did so from
assembly code.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-audit@redhat.com
Signed-off-by: Eric Paris <eparis@redhat.com>
---
As this patch relies on changes in the audit tree, I think it
appropriate to send it through my tree rather than the x86 tree.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The AUDIT_SECCOMP record looks something like this:
type=SECCOMP msg=audit(1373478171.953:32775): auid=4325 uid=4325 gid=4325 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0 pid=12381 comm="test" sig=31 syscall=231 compat=0 ip=0x39ea8bca89 code=0x0
In order to determine what syscall 231 maps to, we need to have the arch= field right before it.
To see the event, compile this test.c program:
=====
int main(void)
{
return seccomp_load(seccomp_init(SCMP_ACT_KILL));
}
=====
gcc -g test.c -o test -lseccomp
After running the program, find the record by: ausearch --start recent -m SECCOMP -i
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
signed-off-by: Eric Paris <eparis@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
directly
Since every arch should have syscall_get_arch() defined, stop using the
function argument and just collect this ourselves. We do not drop the
argument as fixing some code paths (in assembly) to not pass this first
argument is non-trivial. The argument will be dropped when that is
fixed.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module fix from Rusty Russell:
"A single panic fix for a rare race, stable CC'd"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
modules, lock around setting of MODULE_STATE_UNFORMED
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
A panic was seen in the following sitation.
There are two threads running on the system. The first thread is a system
monitoring thread that is reading /proc/modules. The second thread is
loading and unloading a module (in this example I'm using my simple
dummy-module.ko). Note, in the "real world" this occurred with the qlogic
driver module.
When doing this, the following panic occurred:
------------[ cut here ]------------
kernel BUG at kernel/module.c:3739!
invalid opcode: 0000 [#1] SMP
Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module]
CPU: 37 PID: 186343 Comm: cat Tainted: GF O-------------- 3.10.0+ #7
Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000
RIP: 0010:[<ffffffff810d64c5>] [<ffffffff810d64c5>] module_flags+0xb5/0xc0
RSP: 0018:ffff88080fa7fe18 EFLAGS: 00010246
RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000
RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000
RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000
R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000
R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000
FS: 00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c
ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00
ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0
Call Trace:
[<ffffffff810d666c>] m_show+0x19c/0x1e0
[<ffffffff811e4d7e>] seq_read+0x16e/0x3b0
[<ffffffff812281ed>] proc_reg_read+0x3d/0x80
[<ffffffff811c0f2c>] vfs_read+0x9c/0x170
[<ffffffff811c1a58>] SyS_read+0x58/0xb0
[<ffffffff81605829>] system_call_fastpath+0x16/0x1b
Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
RIP [<ffffffff810d64c5>] module_flags+0xb5/0xc0
RSP <ffff88080fa7fe18>
Consider the two processes running on the system.
CPU 0 (/proc/modules reader)
CPU 1 (loading/unloading module)
CPU 0 opens /proc/modules, and starts displaying data for each module by
traversing the modules list via fs/seq_file.c:seq_open() and
fs/seq_file.c:seq_read(). For each module in the modules list, seq_read
does
op->start() <-- this is a pointer to m_start()
op->show() <- this is a pointer to m_show()
op->stop() <-- this is a pointer to m_stop()
The m_start(), m_show(), and m_stop() module functions are defined in
kernel/module.c. The m_start() and m_stop() functions acquire and release
the module_mutex respectively.
ie) When reading /proc/modules, the module_mutex is acquired and released
for each module.
m_show() is called with the module_mutex held. It accesses the module
struct data and attempts to write out module data. It is in this code
path that the above BUG_ON() warning is encountered, specifically m_show()
calls
static char *module_flags(struct module *mod, char *buf)
{
int bx = 0;
BUG_ON(mod->state == MODULE_STATE_UNFORMED);
...
The other thread, CPU 1, in unloading the module calls the syscall
delete_module() defined in kernel/module.c. The module_mutex is acquired
for a short time, and then released. free_module() is called without the
module_mutex. free_module() then sets mod->state = MODULE_STATE_UNFORMED,
also without the module_mutex. Some additional code is called and then the
module_mutex is reacquired to remove the module from the modules list:
/* Now we can delete it from the lists */
mutex_lock(&module_mutex);
stop_machine(__unlink_module, mod, NULL);
mutex_unlock(&module_mutex);
This is the sequence of events that leads to the panic.
CPU 1 is removing dummy_module via delete_module(). It acquires the
module_mutex, and then releases it. CPU 1 has NOT set dummy_module->state to
MODULE_STATE_UNFORMED yet.
CPU 0, which is reading the /proc/modules, acquires the module_mutex and
acquires a pointer to the dummy_module which is still in the modules list.
CPU 0 calls m_show for dummy_module. The check in m_show() for
MODULE_STATE_UNFORMED passed for dummy_module even though it is being
torn down.
Meanwhile CPU 1, which has been continuing to remove dummy_module without
holding the module_mutex, now calls free_module() and sets
dummy_module->state to MODULE_STATE_UNFORMED.
CPU 0 now calls module_flags() with dummy_module and ...
static char *module_flags(struct module *mod, char *buf)
{
int bx = 0;
BUG_ON(mod->state == MODULE_STATE_UNFORMED);
and BOOM.
Acquire and release the module_mutex lock around the setting of
MODULE_STATE_UNFORMED in the teardown path, which should resolve the
problem.
Testing: In the unpatched kernel I can panic the system within 1 minute by
doing
while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done
and
while (true) do cat /proc/modules; done
in separate terminals.
In the patched kernel I was able to run just over one hour without seeing
any issues. I also verified the output of panic via sysrq-c and the output
of /proc/modules looks correct for all three states for the dummy_module.
dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-)
dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE)
dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
nothing to wake up) changes the futex code to avoid taking a lock when
there are no waiters. This code has been subsequently fixed in commit
11d4616bd07f (futex: revert back to the explicit waiter counting code).
Both the original commit and the fix-up rely on get_futex_key_refs() to
always imply a barrier.
However, for private futexes, none of the cases in the switch statement
of get_futex_key_refs() would be hit and the function completes without
a memory barrier as required before checking the "waiters" in
futex_wake() -> hb_waiters_pending(). The consequence is a race with a
thread waiting on a futex on another CPU, allowing the waker thread to
read "waiters == 0" while the waiter thread to have read "futex_val ==
locked" (in kernel).
Without this fix, the problem (user space deadlocks) can be seen with
Android bionic's mutex implementation on an arm64 multi-cluster system.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Matteo Franchin <Matteo.Franchin@arm.com>
Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up)
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
"Way back, before the current percpu allocator was implemented, static
and dynamic percpu memory areas were allocated and handled separately
and had their own accessors. The distinction has been gone for many
years now; however, the now duplicate two sets of accessors remained
with the pointer based ones - this_cpu_*() - evolving various other
operations over time. During the process, we also accumulated other
inconsistent operations.
This pull request contains Christoph's patches to clean up the
duplicate accessor situation. __get_cpu_var() uses are replaced with
with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().
Unfortunately, the former sometimes is tricky thanks to C being a bit
messy with the distinction between lvalues and pointers, which led to
a rather ugly solution for cpumask_var_t involving the introduction of
this_cpu_cpumask_var_ptr().
This converts most of the uses but not all. Christoph will follow up
with the remaining conversions in this merge window and hopefully
remove the obsolete accessors"
* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
irqchip: Properly fetch the per cpu offset
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
Revert "powerpc: Replace __get_cpu_var uses"
percpu: Remove __this_cpu_ptr
clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
sparc: Replace __get_cpu_var uses
avr32: Replace __get_cpu_var with __this_cpu_write
blackfin: Replace __get_cpu_var uses
tile: Use this_cpu_ptr() for hardware counters
tile: Replace __get_cpu_var uses
powerpc: Replace __get_cpu_var uses
alpha: Replace __get_cpu_var
ia64: Replace __get_cpu_var uses
s390: cio driver &__get_cpu_var replacements
s390: Replace __get_cpu_var uses
mips: Replace __get_cpu_var uses
MIPS: Replace __get_cpu_var uses in FPU emulator.
arm: Replace __this_cpu_ptr with raw_cpu_ptr
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
__get_cpu_var can paper over differences in the definitions of
cpumask_var_t and either use the address of the cpumask variable
directly or perform a fetch of the address of the struct cpumask
allocated elsewhere. This is important particularly when using per cpu
cpumask_var_t declarations because in one case we have an offset into
a per cpu area to handle and in the other case we need to fetch a
pointer from the offset.
This patch introduces a new macro
this_cpu_cpumask_var_ptr()
that is defined where cpumask_var_t is defined and performs the proper
actions. All use cases where __get_cpu_var is used with cpumask_var_t
are converted to the use of this_cpu_cpumask_var_ptr().
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
[ARM specific]
These are generally replaced with raw_cpu_ptr. However, in
gic_get_percpu_base() we immediately dereference the pointer. This is
equivalent to a raw_cpu_read. So use that operation there.
Cc: nicolas.pitre@linaro.org
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Most of these are the uses of &__raw_get_cpu_var for address calculation.
touch_softlockup_watchdog_sync() uses __raw_get_cpu_var to write to
per cpu variables. Use __this_cpu_write instead.
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: linux-watchdog@vger.kernel.org
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Convert all uses of __get_cpu_var for address calculation to use
this_cpu_ptr instead.
[Uses of __get_cpu_var with cpumask_var_t are no longer
handled by this patch]
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Convert uses of __get_cpu_var for creating a address from a percpu
offset to this_cpu_ptr.
The two cases where get_cpu_var is used to actually access a percpu
variable are changed to use this_cpu_read/raw_cpu_read.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Replace uses of __get_cpu_var for address calculation with this_cpu_ptr.
Cc: akpm@linux-foundation.org
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Merge second patch-bomb from Andrew Morton:
- a few hotfixes
- drivers/dma updates
- MAINTAINERS updates
- Quite a lot of lib/ updates
- checkpatch updates
- binfmt updates
- autofs4
- drivers/rtc/
- various small tweaks to less used filesystems
- ipc/ updates
- kernel/watchdog.c changes
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h>
ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[]
frv: remove unused declarations of __start___ex_table and __stop___ex_table
kvm: ensure hard lockup detection is disabled by default
kernel/watchdog.c: control hard lockup detection default
staging: rtl8192u: use %*pEn to escape buffer
staging: rtl8192e: use %*pEn to escape buffer
staging: wlan-ng: use %*pEhp to print SN
lib80211: remove unused print_ssid()
wireless: hostap: proc: print properly escaped SSID
wireless: ipw2x00: print SSID via %*pE
wireless: libertas: print esaped string via %*pE
lib/vsprintf: add %*pE[achnops] format specifier
lib / string_helpers: introduce string_escape_mem()
lib / string_helpers: refactoring the test suite
lib / string_helpers: move documentation to c-file
include/linux: remove strict_strto* definitions
arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable
fs: check bh blocknr earlier when searching lru
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Consolidate the various external const and non-const declarations of
__start___param[] and __stop___param in <linux/moduleparam.h>. This
requires making a few struct kernel_param pointers in kernel/params.c
const.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In some cases we don't want hard lockup detection enabled by default.
An example is when running as a guest. Introduce
watchdog_enable_hardlockup_detector(bool)
allowing those cases to disable hard lockup detection. This must be
executed early by the boot processor from e.g. smp_prepare_boot_cpu, in
order to allow kernel command line arguments to override it, as well as
to avoid hard lockup detection being enabled before we've had a chance
to indicate that it's unwanted. In summary,
initial boot: default=enabled
smp_prepare_boot_cpu
watchdog_enable_hardlockup_detector(false): default=disabled
cmdline has 'nmi_watchdog=1': default=enabled
The running kernel still has the ability to enable/disable at any time
with /proc/sys/kernel/nmi_watchdog us usual. However even when the
default has been overridden /proc/sys/kernel/nmi_watchdog will initially
show '1'. To truly turn it on one must disable/enable it, i.e.
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 1 > /proc/sys/kernel/nmi_watchdog
This patch will be immediately useful for KVM with the next patch of this
series. Other hypervisor guest types may find it useful as well.
[akpm@linux-foundation.org: fix build]
[dzickus@redhat.com: fix compile issues on sparc]
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.
To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We have a large university system in the UK that is experiencing very long
delays modprobing the driver for a specific I/O device. The delay is from
8-10 minutes per device and there are 31 devices in the system. This 4 to
5 hour delay in starting up those I/O devices is very much a burden on the
customer.
There are two causes for requiring a restart/reload of the drivers. First
is periodic preventive maintenance (PM) and the second is if any of the
devices experience a fatal error. Both of these trigger this excessively
long delay in bringing the system back up to full capability.
The problem was tracked down to a very slow IOREMAP operation and the
excessively long ioresource lookup to insure that the user is not
attempting to ioremap RAM. These patches provide a speed up to that
function.
The modprobe time appears to be affected quite a bit by previous activity
on the ioresource list, which I suspect is due to cache preloading. While
the overall improvement is impacted by other overhead of starting the
devices, this drastically improves the modprobe time.
Also our system is considerably smaller so the percentages gained will not
be the same. Best case improvement with the modprobe on our 20 device
smallish system was from 'real 5m51.913s' to 'real 0m18.275s'.
This patch (of 2):
Since the ioremap operation is verifying that the specified address range
is NOT RAM, it will search the entire ioresource list if the condition is
true. To make matters worse, it does this one 4k page at a time. For a
128M BAR region this is 32 passes to determine the entire region does not
contain any RAM addresses.
This patch provides another resource lookup function, region_is_ram, that
searches for the entire region specified, verifying that it is completely
contained within the resource region. If it is found, then it is checked
to be RAM or not, within a single pass.
The return result reflects if it was found or not (-1), and whether it is
RAM (1) or not (0). This allows the caller to fallback to the previous
page by page search if it was not found.
[akpm@linux-foundation.org: fix spellos and typos in comment]
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Cliff Wickman <cpw@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is a cleanup. In function parse_crashkernel_suffix, the parameter
crash_base is not used. So here remove it.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In locate_mem_hole functions, a memory hole is located and added as
kexec_segment. But from the name of locate_mem_hole, it should only take
responsibility of searching a available memory hole to contain data of a
specified size.
So in this patch add a new field 'mem' into kexec_buf, then take that
kexec segment adding code out of locate_mem_hole_top_down and
locate_mem_hole_bottom_up. This make clear of the functionality of
locate_mem_hole just like it declars to do. And by this
locate_mem_hole_callback chould be used later if anyone want to locate a
memory hole for other use.
Meanwhile Vivek suggested opening code function __kexec_add_segment(),
that way we have to retreive ksegment pointer once and it is easy to read.
So just do it in this patch and remove __kexec_add_segment() since no one
use it anymore.
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Reduce boilerplate code by using __seq_open_private() instead of
seq_open() in kallsyms_open().
Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Cc: Gideon Israel Dsouza <gidisrael@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Commit 458df9fd4815 ("printk: remove separate printk_sched buffers and use
printk buf instead") hardcodes printk_deferred() to KERN_WARNING and
inserts the string "[sched_delayed] " before the actual message. However
it doesn't take into account the KERN_* prefix of the message, that now
ends up in the middle of the output:
[sched_delayed] ^a4CE: hpet increased min_delta_ns to 20115 nsec
Fix this by just getting rid of the "[sched_delayed] " scnprintf(). The
prefix is useless since 458df9fd4815 anyway since from that moment
printk_deferred() inserts the message into the kernel printk buffer
immediately. So if the message eventually gets printed to console, it is
printed in the correct order with other messages and there's no need for
any special prefix. And if the kernel crashes before the message makes it
to console, then prefix in the printk buffer doesn't make the situation
any better.
Link: http://lkml.org/lkml/2014/9/14/4
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When configuring a uniprocessor kernel, don't bother the user with an
irrelevant LOG_CPU_MAX_BUF_SHIFT question, and don't build the unused
code.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
"This patch set contains the main portion of the changes for 3.18 in
regard to the s390 architecture. It is a bit bigger than usual,
mainly because of a new driver and the vector extension patches.
The interesting bits are:
- Quite a bit of work on the tracing front. Uprobes is enabled and
the ftrace code is reworked to get some of the lost performance
back if CONFIG_FTRACE is enabled.
- To improve boot time with CONFIG_DEBIG_PAGEALLOC, support for the
IPTE range facility is added.
- The rwlock code is re-factored to improve writer fairness and to be
able to use the interlocked-access instructions.
- The kernel part for the support of the vector extension is added.
- The device driver to access the CD/DVD on the HMC is added, this
will hopefully come in handy to improve the installation process.
- Add support for control-unit initiated reconfiguration.
- The crypto device driver is enhanced to enable the additional AP
domains and to allow the new crypto hardware to be used.
- Bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
s390/ftrace: simplify enabling/disabling of ftrace_graph_caller
s390/ftrace: remove 31 bit ftrace support
s390/kdump: add support for vector extension
s390/disassembler: add vector instructions
s390: add support for vector extension
s390/zcrypt: Toleration of new crypto hardware
s390/idle: consolidate idle functions and definitions
s390/nohz: use a per-cpu flag for arch_needs_cpu
s390/vtime: do not reset idle data on CPU hotplug
s390/dasd: add support for control unit initiated reconfiguration
s390/dasd: fix infinite loop during format
s390/mm: make use of ipte range facility
s390/setup: correct 4-level kernel page table detection
s390/topology: call set_sched_topology early
s390/uprobes: architecture backend for uprobes
s390/uprobes: common library for kprobes and uprobes
s390/rwlock: use the interlocked-access facility 1 instructions
s390/rwlock: improve writer fairness
s390/rwlock: remove interrupt-enabling rwlock variant.
s390/mm: remove change bit override support
...
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Move the nohz_delay bit from the s390_idle data structure to the
per-cpu flags. Clear the nohz delay flag in __cpu_disable and
remove the cpu hotplug notifier that used to do this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|\ \ \ \ \ \
| |_|/ / / /
|/| | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 seccomp changes from Ingo Molnar:
"This tree includes x86 seccomp filter speedups and related preparatory
work, which touches core seccomp facilities as well.
The main idea is to split seccomp into two phases, to be able to enter
a simple fast path for syscalls with ptrace side effects.
There's no substantial user-visible (and ABI) effects expected from
this, except a change in how we emit a better audit record for
SECCOMP_RET_TRACE events"
* 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
x86: Split syscall_trace_enter into two phases
x86, entry: Only call user_exit if TIF_NOHZ
x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
seccomp: Document two-phase seccomp and arch-provided seccomp_data
seccomp: Allow arch code to provide seccomp_data
seccomp: Refactor the filter callback and the API
seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
populate_seccomp_data is expensive: it works by inspecting
task_pt_regs and various other bits to piece together all the
information, and it's does so in multiple partially redundant steps.
Arch-specific code in the syscall entry path can do much better.
Admittedly this adds a bit of additional room for error, but the
speedup should be worth it.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The reason I did this is to add a seccomp API that will be usable
for an x86 fast path. The x86 entry code needs to use a rather
expensive slow path for a syscall that might be visible to things
like ptrace. By splitting seccomp into two phases, we can check
whether we need the slow path and then use the fast path in if the
filter allows the syscall or just returns some errno.
As a side effect, I think the new code is much easier to understand
than the old code.
This has one user-visible effect: the audit record written for
SECCOMP_RET_TRACE is now a simple indication that SECCOMP_RET_TRACE
happened. It used to depend in a complicated way on what the tracer
did. I couldn't make much sense of it.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The secure_computing function took a syscall number parameter, but
it only paid any attention to that parameter if seccomp mode 1 was
enabled. Rather than coming up with a kludge to get the parameter
to work in mode 2, just remove the parameter.
To avoid churn in arches that don't have seccomp filters (and may
not even support syscall_get_nr right now), this leaves the
parameter in secure_computing_strict, which is now a real function.
For ARM, this is a bit ugly due to the fact that ARM conditionally
supports seccomp filters. Fixing that would probably only be a
couple of lines of code, but it should be coordinated with the audit
maintainers.
This will be a slight slowdown on some arches. The right fix is to
pass in all of seccomp_data instead of trying to make just the
syscall nr part be fast.
This is a prerequisite for making two-phase seccomp work cleanly.
Cc: Russell King <linux@arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: x86@kernel.org
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
Hansen)
- Various sched/idle refinements for better idle handling (Nicolas
Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)
- sched/numa updates and optimizations (Rik van Riel)
- sysbench speedup (Vincent Guittot)
- capacity calculation cleanups/refactoring (Vincent Guittot)
- Various cleanups to thread group iteration (Oleg Nesterov)
- Double-rq-lock removal optimization and various refactorings
(Kirill Tkhai)
- various sched/deadline fixes
... and lots of other changes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
sched/fair: Delete resched_cpu() from idle_balance()
sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
sched: Improve sysbench performance by fixing spurious active migration
sched/x86: Fix up typo in topology detection
x86, sched: Add new topology for multi-NUMA-node CPUs
sched/rt: Use resched_curr() in task_tick_rt()
sched: Use rq->rd in sched_setaffinity() under RCU read lock
sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
sched: Use dl_bw_of() under RCU read lock
sched/fair: Remove duplicate code from can_migrate_task()
sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
sched: print_rq(): Don't use tasklist_lock
sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
sched: Fix the task-group check in tg_has_rt_tasks()
sched/fair: Leverage the idle state info when choosing the "idlest" cpu
sched: Let the scheduler see CPU idle states
sched/deadline: Fix inter- exclusive cpusets migrations
sched/deadline: Clear dl_entity params when setscheduling to different class
sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
...
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
rq->rd is freed using call_rcu_sched(), so rcu_read_lock() to access it
is not enough. We should use either rcu_read_lock_sched() or preempt_disable().
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Fixes: 66339c31bc39 "sched: Use dl_bw_of() under RCU read lock"
Link: http://lkml.kernel.org/r/1412065417.20287.24.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
We already reschedule env.dst_cpu in attach_tasks()->check_preempt_curr()
if this is necessary.
Furthermore, a higher priority class task may be current on dest rq,
we shouldn't disturb it.
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140930210441.5258.55054.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
On 32 bit systems cmpxchg cannot handle 64 bit values, so
some additional magic is required to allow a 32 bit system
with CONFIG_VIRT_CPU_ACCOUNTING_GEN=y enabled to build.
Make sure the correct cmpxchg function is used when doing
an atomic swap of a cputime_t.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Cc: oleg@redhat.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux390@de.ibm.com
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/20140930155947.070cdb1f@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Since commit caeb178c60f4 ("sched/fair: Make update_sd_pick_busiest() ...")
sd_pick_busiest returns a group that can be neither imbalanced nor overloaded
but is only more loaded than others. This change has been introduced to ensure
a better load balance in system that are not overloaded but as a side effect,
it can also generate useless active migration between groups.
Let take the example of 3 tasks on a quad cores system. We will always have an
idle core so the load balance will find a busiest group (core) whenever an ILB
is triggered and it will force an active migration (once above
nr_balance_failed threshold) so the idle core becomes busy but another core
will become idle. With the next ILB, the freshly idle core will try to pull the
task of a busy CPU.
The number of spurious active migration is not so huge in quad core system
because the ILB is not triggered so much. But it becomes significant as soon as
you have more than one sched_domain level like on a dual cluster of quad cores
where the ILB is triggered every tick when you have more than 1 busy_cpu
We need to ensure that the migration generate a real improveùent and will not
only move the avg_load imbalance on another CPU.
Before caeb178c60f4f93f1b45c0bc056b5cf6d217b67f, the filtering of such use
case was ensured by the following test in f_b_g:
if ((local->idle_cpus < busiest->idle_cpus) &&
busiest->sum_nr_running <= busiest->group_weight)
This patch modified the condition to take into account situation where busiest
group is not overloaded: If the diff between the number of idle cpus in 2
groups is less than or equal to 1 and the busiest group is not overloaded,
moving a task will not improve the load balance but just move it.
A test with sysbench on a dual clusters of quad cores gives the following
results:
command: sysbench --test=cpu --num-threads=5 --max-time=5 run
The HZ is 200 which means that 1000 ticks has fired during the test.
With Mainline, perf gives the following figures:
Samples: 727 of event 'sched:sched_migrate_task'
Event count (approx.): 727
Overhead Command Shared Object Symbol
........ ............... ............. ..............
12.52% migration/1 [unknown] [.] 00000000
12.52% migration/5 [unknown] [.] 00000000
12.52% migration/7 [unknown] [.] 00000000
12.10% migration/6 [unknown] [.] 00000000
11.83% migration/0 [unknown] [.] 00000000
11.83% migration/3 [unknown] [.] 00000000
11.14% migration/4 [unknown] [.] 00000000
10.87% migration/2 [unknown] [.] 00000000
2.75% sysbench [unknown] [.] 00000000
0.83% swapper [unknown] [.] 00000000
0.55% ktps65090charge [unknown] [.] 00000000
0.41% mmcqd/1 [unknown] [.] 00000000
0.14% perf [unknown] [.] 00000000
With this patch, perf gives the following figures
Samples: 20 of event 'sched:sched_migrate_task'
Event count (approx.): 20
Overhead Command Shared Object Symbol
........ ............... ............. ..............
80.00% sysbench [unknown] [.] 00000000
10.00% swapper [unknown] [.] 00000000
5.00% ktps65090charge [unknown] [.] 00000000
5.00% migration/1 [unknown] [.] 00000000
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1412170735-5356-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|