summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | Btrfs: only drop modified extents if we logged the whole inodeJosef Bacik2013-11-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we fsync, seek and write, rename and then fsync again we will lose the modified hole extent because the rename will drop all of the modified extents since we didn't do the fast search. We need to only drop the modified extents if we didn't do the fast search and we were logging the entire inode as we don't need them anymore, otherwise this is being premature. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * | | | | | | Btrfs: make sure to copy everything if we renameJosef Bacik2013-11-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we rename a file that is already in the log and we fsync again we will lose the new name. This is because we just log the inode update and not the new ref. To fix this we just need to check if we are logging the new name of the inode and copy all the metadata instead of just updating the inode itself. With this patch my testcase now passes. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
| * | | | | | | Btrfs: don't BUG_ON() if we get an error walking backrefsJosef Bacik2013-11-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can just return false for this so we stop doing the snapshot aware defrag stuff. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
* | | | | | | | Merge tag 'xfs-for-linus-v3.13-rc1-2' of git://oss.sgi.com/xfs/xfsLinus Torvalds2013-11-225-23/+43
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull second xfs update from Ben Myers: "There are a couple of patches that I wasn't quite sure about in time for our initial 3.13 pull request, a bugfix, and an update to add Dave to MAINTAINERS: Here we have a performance fix for inode iversion, increased inode cluster size for v5 superblock filesystems, a fix for error handling in xfs_bmap_add_attrfork, and a MAINTAINERS update to add Dave" * tag 'xfs-for-linus-v3.13-rc1-2' of git://oss.sgi.com/xfs/xfs: xfs: open code inc_inode_iversion when logging an inode xfs: increase inode cluster size for v5 filesystems xfs: fix unlock in xfs_bmap_add_attrfork xfs: update maintainers
| * | | | | | | | xfs: open code inc_inode_iversion when logging an inodeDave Chinner2013-11-181-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael L Semon reported that generic/069 runtime increased on v5 superblocks by 100% compared to v4 superblocks. his perf-based analysis pointed directly at the timestamp updates being done by the write path in this workload. The append writers are doing 4-byte writes, so there are lots of timestamp updates occurring. The thing is, they aren't being triggered by timestamp changes - they are being triggered by the inode change counter needing to be updated. That is, every write(2) system call needs to bump the inode version count, and it does that through the timestamp update mechanism. Hence for v5 filesystems, test generic/069 is running 3 orders of magnitude more timestmap update transactions on v5 filesystems due to the fact it does a huge number of *4 byte* write(2) calls. This isn't a real world scenario we really need to address - anyone doing such sequential IO should be using fwrite(3), not write(2). i.e. fwrite(3) buffers the writes in userspace to minimise the number of write(2) syscalls, and the problem goes away. However, there is a small change we can make to improve the situation - removing the expensive lock operation on the change counter update. All inode version counter changes in XFS occur under the ip->i_ilock during a transaction, and therefore we don't actually need the spin lock that provides exclusive access to it through inc_inode_iversion(). Hence avoid the lock and just open code the increment ourselves when logging the inode. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | | | | xfs: increase inode cluster size for v5 filesystemsDave Chinner2013-11-183-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v5 filesystems use 512 byte inodes as a minimum, so read inodes in clusters that are effectively half the size of a v4 filesystem with 256 byte inodes. For v5 fielsystems, scale the inode cluster size with the size of the inode so that we keep a constant 32 inodes per cluster ratio for all inode IO. This only works if mkfs.xfs sets the inode alignment appropriately for larger inode clusters, so this functionality is made conditional on mkfs doing the right thing. xfs_repair needs to know about the inode alignment changes, too. Wall time: create bulkstat find+stat ls -R unlink v4 237s 161s 173s 201s 299s v5 235s 163s 205s 31s 356s patched 234s 160s 182s 29s 317s System time: create bulkstat find+stat ls -R unlink v4 2601s 2490s 1653s 1656s 2960s v5 2637s 2497s 1681s 20s 3216s patched 2613s 2451s 1658s 20s 3007s So, wall time same or down across the board, system time same or down across the board, and cache hit rates all improve except for the ls -R case which is a pure cold cache directory read workload on v5 filesystems... So, this patch removes most of the performance and CPU usage differential between v4 and v5 filesystems on traversal related workloads. Note: while this patch is currently for v5 filesystems only, there is no reason it can't be ported back to v4 filesystems. This hasn't been done here because bringing the code back to v4 requires forwards and backwards kernel compatibility testing. i.e. to deterine if older kernels(*) do the right thing with larger inode alignments but still only using 8k inode cluster sizes. None of this testing and validation on v4 filesystems has been done, so for the moment larger inode clusters is limited to v5 superblocks. (*) a current default config v4 filesystem should mount just fine on 2.6.23 (when lazy-count support was introduced), and so if we change the alignment emitted by mkfs without a feature bit then we have to make sure it works properly on all kernels since 2.6.23. And if we allow it to be changed when the lazy-count bit is not set, then it's all kernels since v2 logs were introduced that need to be tested for compatibility... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
| * | | | | | | | xfs: fix unlock in xfs_bmap_add_attrforkMark Tinguely2013-11-181-17/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs_trans_ijoin() activates the inode in a transaction and also can specify which lock to free when the transaction is committed or canceled. xfs_bmap_add_attrfork call locks and adds the lock to the transaction but also manually removes the lock. Change the routine to not add the lock to the transaction and manually remove lock on completion. While here, clean up the xfs_trans_cancel flags and goto names. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* | | | | | | | | Merge branch 'akpm' (fixes from Andrew)Linus Torvalds2013-11-211-2/+14
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge patches from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: place page->pmd_huge_pte to right union MAINTAINERS: add keyboard driver to Hyper-V file list x86, mm: do not leak page->ptl for pmd page tables ipc,shm: correct error return value in shmctl (SHM_UNLOCK) mm, mempolicy: silence gcc warning block/partitions/efi.c: fix bound check ARM: drivers/rtc/rtc-at91rm9200.c: disable interrupts at shutdown mm: hugetlbfs: fix hugetlbfs optimization kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanly ipc,shm: fix shm_file deletion races mm: thp: give transparent hugepage code a separate copy_page checkpatch: fix "Use of uninitialized value" warnings configfs: fix race between dentry put and lookup
| * | | | | | | | | configfs: fix race between dentry put and lookupJunxiao Bi2013-11-211-2/+14
| | |_|/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A race window in configfs, it starts from one dentry is UNHASHED and end before configfs_d_iput is called. In this window, if a lookup happen, since the original dentry was UNHASHED, so a new dentry will be allocated, and then in configfs_attach_attr(), sd->s_dentry will be updated to the new dentry. Then in configfs_d_iput(), BUG_ON(sd->s_dentry != dentry) will be triggered and system panic. sys_open: sys_close: ... fput dput dentry_kill __d_drop <--- dentry unhashed here, but sd->dentry still point to this dentry. lookup_real configfs_lookup configfs_attach_attr---> update sd->s_dentry to new allocated dentry here. d_kill configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry) triggered here. To fix it, change configfs_d_iput to not update sd->s_dentry if sd->s_count > 2, that means there are another dentry is using the sd beside the one that is going to be put. Use configfs_dirent_lock in configfs_attach_attr to sync with configfs_d_iput. With the following steps, you can reproduce the bug. 1. enable ocfs2, this will mount configfs at /sys/kernel/config and fill configure in it. 2. run the following script. while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done & while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done & Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | Merge git://git.infradead.org/users/eparis/auditLinus Torvalds2013-11-213-8/+12
|\ \ \ \ \ \ \ \ \ | |/ / / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull audit updates from Eric Paris: "Nothing amazing. Formatting, small bug fixes, couple of fixes where we didn't get records due to some old VFS changes, and a change to how we collect execve info..." Fixed conflict in fs/exec.c as per Eric and linux-next. * git://git.infradead.org/users/eparis/audit: (28 commits) audit: fix type of sessionid in audit_set_loginuid() audit: call audit_bprm() only once to add AUDIT_EXECVE information audit: move audit_aux_data_execve contents into audit_context union audit: remove unused envc member of audit_aux_data_execve audit: Kill the unused struct audit_aux_data_capset audit: do not reject all AUDIT_INODE filter types audit: suppress stock memalloc failure warnings since already managed audit: log the audit_names record type audit: add child record before the create to handle case where create fails audit: use given values in tty_audit enable api audit: use nlmsg_len() to get message payload length audit: use memset instead of trying to initialize field by field audit: fix info leak in AUDIT_GET requests audit: update AUDIT_INODE filter rule to comparator function audit: audit feature to set loginuid immutable audit: audit feature to only allow unsetting the loginuid audit: allow unsetting the loginuid (with priv) audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE audit: loginuid functions coding style selinux: apply selinux checks on new audit message types ...
| * | | | | | | | audit: call audit_bprm() only once to add AUDIT_EXECVE informationRichard Guy Briggs2013-11-051-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the audit_bprm() call from search_binary_handler() to exec_binprm(). This allows us to get rid of the mm member of struct audit_aux_data_execve since bprm->mm will equal current->mm. This also mitigates the issue that ->argc could be modified by the load_binary() call in search_binary_handler(). audit_bprm() was being called to add an AUDIT_EXECVE record to the audit context every time search_binary_handler() was recursively called. Only one reference is necessary. Reported-by: Oleg Nesterov <onestero@redhat.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com> --- This patch is against 3.11, but was developed on Oleg's post-3.11 patches that introduce exec_binprm().
| * | | | | | | | audit: add child record before the create to handle case where create failsJeff Layton2013-11-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically, when a syscall that creates a dentry fails, you get an audit record that looks something like this (when trying to create a file named "new" in "/tmp/tmp.SxiLnCcv63"): type=PATH msg=audit(1366128956.279:965): item=0 name="/tmp/tmp.SxiLnCcv63/new" inode=2138308 dev=fd:02 mode=040700 ouid=0 ogid=0 rdev=00:00 obj=staff_u:object_r:user_tmp_t:s15:c0.c1023 This record makes no sense since it's associating the inode information for "/tmp/tmp.SxiLnCcv63" with the path "/tmp/tmp.SxiLnCcv63/new". The recent patch I posted to fix the audit_inode call in do_last fixes this, by making it look more like this: type=PATH msg=audit(1366128765.989:13875): item=0 name="/tmp/tmp.DJ1O8V3e4f/" inode=141 dev=fd:02 mode=040700 ouid=0 ogid=0 rdev=00:00 obj=staff_u:object_r:user_tmp_t:s15:c0.c1023 While this is more correct, if the creation of the file fails, then we have no record of the filename that the user tried to create. This patch adds a call to audit_inode_child to may_create. This creates an AUDIT_TYPE_CHILD_CREATE record that will sit in place until the create succeeds. When and if the create does succeed, then this record will be updated with the correct inode info from the create. This fixes what was broken in commit bfcec708. Commit 79f6530c should also be backported to stable v3.7+. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | | | | | | | audit: allow unsetting the loginuid (with priv)Eric Paris2013-11-051-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a task has CAP_AUDIT_CONTROL allow that task to unset their loginuid. This would allow a child of that task to set their loginuid without CAP_AUDIT_CONTROL. Thus when launching a new login daemon, a priviledged helper would be able to unset the loginuid and then the daemon, which may be malicious user facing, do not need priv to function correctly. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
* | | | | | | | | Merge tag 'squashfs-updates' of ↵Linus Torvalds2013-11-2020-243/+1145
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next Pull squashfs updates from Phillip Lougher: "These patches optionally improve the multi-threading peformance of Squashfs by adding parallel decompression, and direct decompression into the page cache, eliminating an intermediate buffer (removing memcpy overhead and lock contention)" * tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next: Squashfs: Check stream is not NULL in decompressor_multi.c Squashfs: Directly decompress into the page cache for file data Squashfs: Restructure squashfs_readpage() Squashfs: Generalise paging handling in the decompressors Squashfs: add multi-threaded decompression using percpu variable squashfs: Enhance parallel I/O Squashfs: Refactor decompressor interface and code
| * | | | | | | | | Squashfs: Check stream is not NULL in decompressor_multi.cPhillip Lougher2013-11-201-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix static checker complaint that stream is not checked in squashfs_decompressor_destroy(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * | | | | | | | | Squashfs: Directly decompress into the page cache for file dataPhillip Lougher2013-11-205-1/+336
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces an implementation of squashfs_readpage_block() that directly decompresses into the page cache. This uses the previously added page handler abstraction to push down the necessary kmap_atomic/kunmap_atomic operations on the page cache buffers into the decompressors. This enables direct copying into the page cache without using the slow kmap/kunmap calls. The code detects when multiple threads are racing in squashfs_readpage() to decompress the same block, and avoids this regression by falling back to using an intermediate buffer. This patch enhances the performance of Squashfs significantly when multiple processes are accessing the filesystem simultaneously because it not only reduces memcopying, but it more importantly eliminates the lock contention on the intermediate buffer. Using single-thread decompression. dd if=file1 of=/dev/null bs=4096 & dd if=file2 of=/dev/null bs=4096 & dd if=file3 of=/dev/null bs=4096 & dd if=file4 of=/dev/null bs=4096 Before: 629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s After: 629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * | | | | | | | | Squashfs: Restructure squashfs_readpage()Phillip Lougher2013-11-204-71/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Restructure squashfs_readpage() splitting it into separate functions for datablocks, fragments and sparse blocks. Move the memcpying (from squashfs cache entry) implementation of squashfs_readpage_block into file_cache.c This allows different implementations to be supported. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * | | | | | | | | Squashfs: Generalise paging handling in the decompressorsPhillip Lougher2013-11-2013-67/+163
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Further generalise the decompressors by adding a page handler abstraction. This adds helpers to allow the decompressors to access and process the output buffers in an implementation independant manner. This allows different types of output buffer to be passed to the decompressors, with the implementation specific aspects handled at decompression time, but without the knowledge being held in the decompressor wrapper code. This will allow the decompressors to handle Squashfs cache buffers, and page cache pages. This patch adds the abstraction and an implementation for the caches. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
| * | | | | | | | | Squashfs: add multi-threaded decompression using percpu variablePhillip Lougher2013-11-203-20/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a multi-threaded decompression implementation which uses percpu variables. Using percpu variables has advantages and disadvantages over implementations which do not use percpu variables. Advantages: * the nature of percpu variables ensures decompression is load-balanced across the multiple cores. * simplicity. Disadvantages: it limits decompression to one thread per core. Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * | | | | | | | | squashfs: Enhance parallel I/OMinchan Kim2013-11-203-1/+221
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now squashfs have used for only one stream buffer for decompression so it hurts parallel read performance so this patch supports multiple decompressor to enhance performance parallel I/O. Four 1G file dd read on KVM machine which has 2 CPU and 4G memory. dd if=test/test1.dat of=/dev/null & dd if=test/test2.dat of=/dev/null & dd if=test/test3.dat of=/dev/null & dd if=test/test4.dat of=/dev/null & old : 1m39s -> new : 9s * From v1 * Change comp_strm with decomp_strm - Phillip * Change/add comments - Phillip Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
| * | | | | | | | | Squashfs: Refactor decompressor interface and codePhillip Lougher2013-11-2011-136/+216
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The decompressor interface and code was written from the point of view of single-threaded operation. In doing so it mixed a lot of single-threaded implementation specific aspects into the decompressor code and elsewhere which makes it difficult to seamlessly support multiple different decompressor implementations. This patch does the following: 1. It removes compressor_options parsing from the decompressor init() function. This allows the decompressor init() function to be dynamically called to instantiate multiple decompressors, without the compressor options needing to be read and parsed each time. 2. It moves threading and all sleeping operations out of the decompressors. In doing so, it makes the decompressors non-blocking wrappers which only deal with interfacing with the decompressor implementation. 3. It splits decompressor.[ch] into decompressor generic functions in decompressor.[ch], and moves the single threaded decompressor implementation into decompressor_single.c. The result of this patch is Squashfs should now be able to support multiple decompressors by adding new decompressor_xxx.c files with specialised implementations of the functions in decompressor_single.c Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Minchan Kim <minchan@kernel.org>
* | | | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2013-11-2012-164/+52
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs bits and pieces from Al Viro: "Assorted bits that got missed in the first pull request + fixes for a couple of coredump regressions" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fold try_to_ascend() into the sole remaining caller dcache.c: get rid of pointless macros take read_seqbegin_or_lock() and friends to seqlock.h consolidate simple ->d_delete() instances gfs2: endianness misannotations dump_emit(): use __kernel_write(), not vfs_write() dump_align(): fix the dumb braino
| * | | | | | | | | | fold try_to_ascend() into the sole remaining callerAl Viro2013-11-151-31/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There used to be a bunch of tree-walkers in dcache.c, all alike. try_to_ascend() had been introduced to abstract a piece of logics duplicated in all of them. These days all these tree-walkers are implemented via the same iterator (d_walk()), which is the only remaining caller of try_to_ascend(), so let's fold it back... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | | | dcache.c: get rid of pointless macrosAl Viro2013-11-151-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()). At this point they are actively misguiding - they imply that values are compiler constants, which is no longer true. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | | | take read_seqbegin_or_lock() and friends to seqlock.hAl Viro2013-11-151-29/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | | | consolidate simple ->d_delete() instancesAl Viro2013-11-157-78/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename simple_delete_dentry() to always_delete_dentry() and export it. Export simple_dentry_operations, while we are at it, and get rid of their duplicates Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | | | gfs2: endianness misannotationsAl Viro2013-11-153-19/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | | | dump_emit(): use __kernel_write(), not vfs_write()Al Viro2013-11-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the caller has already done file_start_write()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | | | | | dump_align(): fix the dumb brainoAl Viro2013-11-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mea culpa - original variant used 64-by-32-bit division, which got caught very late. Getting rid of that wasn't hard, but I'd managed to botch the calling conventions in process ;-/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | | | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2013-11-201-1/+1
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block IO fixes from Jens Axboe: "Normally I'd defer my initial for-linus pull request until after the merge window, but a race was uncovered in the virtio-blk conversion to blk-mq that could cause hangs. So here's a small collection of fixes for you to pull: - The fix for the virtio-blk IO hang reported by Dave Chinner, from Shaohua and myself. - Add the Insert blktrace event for blk-mq. This makes 'btt' happy when it is doing it's state transition analysis. - Ensure that blk-mq has disk/partition stats enabled by default, instead of making it opt-in. - A fix for __bio_add_page() and large sector counts" * 'for-linus' of git://git.kernel.dk/linux-block: blk-mq: add blktrace insert event trace virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations blk-mq: ensure that we set REQ_IO_STAT so diskstats work bio: fix argument of __bio_add_page() for max_sectors > 0xffff
| * | | | | | | | | | | bio: fix argument of __bio_add_page() for max_sectors > 0xffffAkinobu Mita2013-11-181-1/+1
| | |_|_|_|_|/ / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The data type of max_sectors and max_hw_sectors in queue settings are unsigned int. But these values are passed to __bio_add_page() as an argument whose data type is unsigned short. In the worst case such as max_sectors is 0x10000, bio_add_page() can't add a page and IOs can't proceed. Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2013-11-192-6/+20
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: "Mostly these are fixes for fallout due to merge window changes, as well as cures for problems that have been with us for a much longer period of time" 1) Johannes Berg noticed two major deficiencies in our genetlink registration. Some genetlink protocols we passing in constant counts for their ops array rather than something like ARRAY_SIZE(ops) or similar. Also, some genetlink protocols were using fixed IDs for their multicast groups. We have to retain these fixed IDs to keep existing userland tools working, but reserve them so that other multicast groups used by other protocols can not possibly conflict. In dealing with these two problems, we actually now use less state management for genetlink operations and multicast groups. 2) When configuring interface hardware timestamping, fix several drivers that simply do not validate that the hwtstamp_config value is one the driver actually supports. From Ben Hutchings. 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar. 4) In dev_forward_skb(), set the skb->protocol in the right order relative to skb_scrub_packet(). From Alexei Starovoitov. 5) Bridge erroneously fails to use the proper wrapper functions to make calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid. Fix from Toshiaki Makita. 6) When detaching a bridge port, make sure to flush all VLAN IDs to prevent them from leaking, also from Toshiaki Makita. 7) Put in a compromise for TCP Small Queues so that deep queued devices that delay TX reclaim non-trivially don't have such a performance decrease. One particularly problematic area is 802.11 AMPDU in wireless. From Eric Dumazet. 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts here. Fix from Eric Dumzaet, reported by Dave Jones. 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn. 10) When computing mergeable buffer sizes, virtio-net fails to take the virtio-net header into account. From Michael Dalton. 11) Fix seqlock deadlock in ip4_datagram_connect() wrt. statistic bumping, this one has been with us for a while. From Eric Dumazet. 12) Fix NULL deref in the new TIPC fragmentation handling, from Erik Hugne. 13) 6lowpan bit used for traffic classification was wrong, from Jukka Rissanen. 14) macvlan has the same issue as normal vlans did wrt. propagating LRO disabling down to the real device, fix it the same way. From Michal Kubecek. 15) CPSW driver needs to soft reset all slaves during suspend, from Daniel Mack. 16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet. 17) The xen-netfront RX buffer refill timer isn't properly scheduled on partial RX allocation success, from Ma JieYue. 18) When ipv6 ping protocol support was added, the AF_INET6 protocol initialization cleanup path on failure was borked a little. Fix from Vlad Yasevich. 19) If a socket disconnects during a read/recvmsg/recvfrom/etc that blocks we can do the wrong thing with the msg_name we write back to userspace. From Hannes Frederic Sowa. There is another fix in the works from Hannes which will prevent future problems of this nature. 20) Fix route leak in VTI tunnel transmit, from Fan Du. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits) genetlink: make multicast groups const, prevent abuse genetlink: pass family to functions using groups genetlink: add and use genl_set_err() genetlink: remove family pointer from genl_multicast_group genetlink: remove genl_unregister_mc_group() hsr: don't call genl_unregister_mc_group() quota/genetlink: use proper genetlink multicast APIs drop_monitor/genetlink: use proper genetlink multicast APIs genetlink: only pass array to genl_register_family_with_ops() tcp: don't update snd_nxt, when a socket is switched from repair mode atm: idt77252: fix dev refcnt leak xfrm: Release dst if this dst is improper for vti tunnel netlink: fix documentation typo in netlink_set_err() be2net: Delete secondary unicast MAC addresses during be_close be2net: Fix unconditional enabling of Rx interface options net, virtio_net: replace the magic value ping: prevent NULL pointer dereference on write to msg_name bnx2x: Prevent "timeout waiting for state X" bnx2x: prevent CFC attention bnx2x: Prevent panic during DMAE timeout ...
| * | | | | | | | | | | genetlink: make multicast groups const, prevent abuseJohannes Berg2013-11-191-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Register generic netlink multicast groups as an array with the family and give them contiguous group IDs. Then instead of passing the global group ID to the various functions that send messages, pass the ID relative to the family - for most families that's just 0 because the only have one group. This avoids the list_head and ID in each group, adding a new field for the mcast group ID offset to the family. At the same time, this allows us to prevent abusing groups again like the quota and dropmon code did, since we can now check that a family only uses a group it owns. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | | | genetlink: pass family to functions using groupsJohannes Berg2013-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This doesn't really change anything, but prepares for the next patch that will change the APIs to pass the group ID within the family, rather than the global group ID. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | | | quota/genetlink: use proper genetlink multicast APIsJohannes Berg2013-11-191-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The quota code is abusing the genetlink API and is using its family ID as the multicast group ID, which is invalid and may belong to somebody else (and likely will.) Make the quota code use the correct API, but since this is already used as-is by userspace, reserve a family ID for this code and also reserve that group ID to not break userspace assumptions. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | | | genetlink: only pass array to genl_register_family_with_ops()Johannes Berg2013-11-191-4/+6
| | |_|_|_|_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As suggested by David Miller, make genl_register_family_with_ops() a macro and pass only the array, evaluating ARRAY_SIZE() in the macro, this is a little safer. The openvswitch has some indirection, assing ops/n_ops directly in that code. This might ultimately just assign the pointers in the family initializations, saving the struct genl_family_and_ops and code (once mcast groups are handled differently.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | seq_file: always clear m->count when we free m->bufAl Viro2013-11-181-1/+2
| |/ / / / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once we'd freed m->buf, m->count should become zero - we have no valid contents reachable via m->buf. Reported-by: Charley (Hao Chuan) Chu <charley.chu@broadcom.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2013-11-1617-74/+358
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull CIFS fixes from Steve French: "A set of cifs fixes most important of which is Pavel's fix for some problems with handling Windows reparse points and also the security fix for setfacl over a cifs mount to Samba removing part of the ACL. Both of these fixes are for stable as well. Also added most of copychunk (copy offload) support to cifs although I expect a final patch in that series (to fix handling of larger files) in a few days (had to hold off on that in order to incorporate some additional code review feedback). Also added support for O_DIRECT on forcedirectio mounts (needed in order to run some of the server benchmarks over cifs and smb2/smb3 mounts)" * 'for-linus' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] Warn if SMB3 encryption required by server setfacl removes part of ACL when setting POSIX ACLs to Samba [CIFS] Set copychunk defaults CIFS: SMB2/SMB3 Copy offload support (refcopy) phase 1 cifs: Use data structures to compute NTLMv2 response offsets [CIFS] O_DIRECT opens should work on directio mounts cifs: don't spam the logs on unexpected lookup errors cifs: change ERRnomem error mapping from ENOMEM to EREMOTEIO CIFS: Fix symbolic links usage
| * | | | | | | | | | [CIFS] Warn if SMB3 encryption required by serverSteve French2013-11-152-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do not support SMB3 encryption yet, warn if server responds that SMB3 encryption is mandatory. Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | | | | | | | setfacl removes part of ACL when setting POSIX ACLs to SambaSteve French2013-11-151-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | setfacl over cifs mounts can remove the default ACL when setting the (non-default part of) the ACL and vice versa (we were leaving at 0 rather than setting to -1 the count field for the unaffected half of the ACL. For example notice the setfacl removed the default ACL in this sequence: steven@steven-GA-970A-DS3:~/cifs-2.6$ getfacl /mnt/test-dir ; setfacl -m default:user:test:rwx,user:test:rwx /mnt/test-dir getfacl: Removing leading '/' from absolute path names user::rwx group::r-x other::r-x default:user::rwx default:user:test:rwx default:group::r-x default:mask::rwx default:other::r-x steven@steven-GA-970A-DS3:~/cifs-2.6$ getfacl /mnt/test-dir getfacl: Removing leading '/' from absolute path names user::rwx user:test:rwx group::r-x mask::rwx other::r-x CC: Stable <stable@kernel.org> Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Jeremy Allison <jra@samba.org>
| * | | | | | | | | | [CIFS] Set copychunk defaultsSteve French2013-11-152-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch 2 of the copy chunk series (the final patch will use these to handle copies of files larger than the chunk size. We set the same defaults that Windows and Samba expect for CopyChunk. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: David Disseldorp <ddiss@samba.org>
| * | | | | | | | | | CIFS: SMB2/SMB3 Copy offload support (refcopy) phase 1Steve French2013-11-144-1/+210
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This first patch adds the ability for us to do a server side copy (ie fast copy offloaded to the server to perform, aka refcopy) "cp --reflink" of one file to another located on the same server. This is much faster than traditional copy (which requires reading and writing over the network and extra memcpys). This first version is not going to be copy files larger than about 1MB (to Samba) until I add support for multiple chunks and for autoconfiguring the chunksize. It includes: 1) processing of the ioctl 2) marshalling and sending the SMB2/SMB3 fsctl over the network 3) simple parsing of the response It does not include yet (these will be in followon patches to come soon): 1) support for multiple chunks 2) support for autoconfiguring and remembering the chunksize 3) Support for the older style copychunk which Samba 4.1 server supports (because this requires write permission on the target file, which cp does not give you, apparently per-posix). This may require a distinct tool (other than cp) and other ioctl to implement. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | | | | | | | cifs: Use data structures to compute NTLMv2 response offsetsTim Gardner2013-11-112-17/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bit of cleanup plus some gratuitous variable renaming. I think using structures instead of numeric offsets makes this code much more understandable. Also added a comment about current time range expected by the server. Acked-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | | | | | | | [CIFS] O_DIRECT opens should work on directio mountsSteve French2013-11-111-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Opens on current cifs/smb2/smb3 mounts with O_DIRECT flag fail even when caching is disabled on the mount. This was reported by those running SMB2 benchmarks who need to be able to pass O_DIRECT on many of their open calls to reduce caching effects, but would also be needed by other applications. When mounting with forcedirectio ("cache=none") cifs and smb2/smb3 do not go through the page cache and thus opens with O_DIRECT flag should work (when posix extensions are negotiated we even are able to send the flag to the server). This patch fixes that in a simple way. The 9P client has a similar situation (caching is often disabled) and takes the same approach to O_DIRECT support ie works if caching disabled, but if client caching enabled it fails with EINVAL. A followon idea for a future patch as Pavel noted, could be that files opened with O_DIRECT could cause us to change inode->i_fop on the fly from cifs_file_strict_ops to cifs_file_direct_ops which would allow us to support this on non-forcedirectio mounts (cache=strict and cache=loose) as well. Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | | | | | | | cifs: don't spam the logs on unexpected lookup errorsJeff Layton2013-11-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrey reported that he was seeing cifs.ko spam the logs with messages like this: CIFS VFS: Unexpected lookup error -26 He was listing the root directory of a server and hitting an error when trying to QUERY_PATH_INFO against hiberfil.sys and pagefile.sys. The right fix would be to switch the lookup code over to using FIND_FIRST, but until then we really don't need to report this at a level of KERN_ERR. Convert this message over to FYI level. Reported-by: "Andrey Shernyukov" <andreysh@nioch.nsc.ru> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | | | | | | | cifs: change ERRnomem error mapping from ENOMEM to EREMOTEIOJeff Layton2013-11-112-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes, the server will report an error that basically indicates that it's running out of resources. These include these under SMB1: NT_STATUS_NO_MEMORY NT_STATUS_SECTION_TOO_BIG NT_STATUS_TOO_MANY_PAGING_FILES ...and this one under SMB2: STATUS_NO_MEMORY Currently, this gets mapped to ENOMEM by the client, but that's confusing as an ENOMEM error is typically an indicator that the client is out of memory. Change these errors to instead map to EREMOTEIO to indicate that the problem is actually server-side and not on the client. Reported-by: "ISHIKAWA,chiaki" <ishikawa@yk.rim.or.jp> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | | | | | | | | CIFS: Fix symbolic links usagePavel Shilovsky2013-11-116-49/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now we treat any reparse point as a symbolic link and map it to a Unix one that is not true in a common case due to many reparse point types supported by SMB servers. Distinguish reparse point types into two groups: 1) that can be accessed directly through a reparse point (junctions, deduplicated files, NFS symlinks); 2) that need to be processed manually (Windows symbolic links, DFS); and map only Windows symbolic links to Unix ones. Cc: <stable@vger.kernel.org> Acked-by: Jeff Layton <jlayton@redhat.com> Reported-and-tested-by: Joao Correia <joaomiguelcorreia@gmail.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
* | | | | | | | | | | Merge tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2013-11-163-5/+10
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes: - Stable fix for data corruption when retransmitting O_DIRECT writes - Stable fix for a deep recursion/stack overflow bug in rpc_release_client - Stable fix for infinite looping when mounting a NFSv4.x volume - Fix a typo in the nfs mount option parser - Allow pNFS layouts to be compiled into the kernel when NFSv4.1 is * tag 'nfs-for-3.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: fix pnfs Kconfig defaults NFS: correctly report misuse of "migration" mount option. nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once SUNRPC: Avoid deep recursion in rpc_release_client SUNRPC: Fix a data corruption issue when retransmitting RPC calls
| * | | | | | | | | | | nfs: fix pnfs Kconfig defaultsChristoph Hellwig2013-11-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Defaulting to m seem to prevent building the pnfs layout modules into the kernel. Default to the value of CONFIG_NFS_V4 make sure they are built in for built-in NFSv4 support and modular for a modular NFSv4. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | | | | | | | | | NFS: correctly report misuse of "migration" mount option.NeilBrown2013-11-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current test on valid use of the "migration" mount option can never report an error as it will only do so if mnt->version !=4 && mnt->minor_version != 0 (and some other condition), but if that test would succeed, then the previous test has already gone-to out_minorversion_mismatch. So change the && to an || to get correct semantics. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
OpenPOWER on IntegriCloud