summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* ocfs2: Don't use MAXQUOTAS valueJan Kara2014-09-175-34/+42
| | | | | | | | | | | | MAXQUOTAS value defines maximum number of quota types VFS supports. This isn't necessarily the number of types ocfs2 supports and with addition of project quotas these two numbers stop matching. So make ocfs2 use its private definition. CC: Mark Fasheh <mfasheh@suse.com> CC: Joel Becker <jlbec@evilplan.org> CC: ocfs2-devel@oss.oracle.com Signed-off-by: Jan Kara <jack@suse.cz>
* reiserfs: Don't use MAXQUOTAS valueJan Kara2014-09-172-9/+12
| | | | | | | | | | | MAXQUOTAS value defines maximum number of quota types VFS supports. This isn't necessarily the number of types reiserfs supports and with addition of project quotas these two numbers stop matching. So make reiserfs use its private definition. CC: reiserfs-devel@vger.kernel.org CC: Jeff Mahoney <jeffm@suse.de> Signed-off-by: Jan Kara <jack@suse.cz>
* ext3: Don't use MAXQUOTAS valueJan Kara2014-09-172-12/+14
| | | | | | | | | | MAXQUOTAS value defines maximum number of quota types VFS supports. This isn't necessarily the number of types ext3 supports and with addition of project quotas these two numbers stop matching. So make ext3 use its private definition. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Fix race between write(2) and close(2)Jan Kara2014-09-171-1/+8
| | | | | | | | | | | | | | | | | | Currently write(2) updating i_size and close(2) of the file can race in such a way that udf_truncate_tail_extent() called from udf_file_release() sees old i_size but already new extents added by the running write call. This results in complaints like: UDF-fs: warning (device vdb2): udf_truncate_tail_extent: Too long extent after EOF in inode 877: i_size: 0 lbcount: 1073739776 extent 0+1073739776 UDF-fs: error (device vdb2): udf_truncate_tail_extent: Extent after EOF in inode 877 Fix the problem by grabbing i_mutex in udf_file_release() to be sure i_size is consistent with current state of extent list. Also avoid truncating tail extent unnecessarily when the file is still open for writing. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: saner calling conventions for udf_new_inode()Al Viro2014-09-043-43/+27
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: fix the udf_iget() vs. udf_new_inode() racesAl Viro2014-09-042-1/+13
| | | | | | | | | | | | | | | | | | | Currently udf_iget() (triggered by NFS) can race with udf_new_inode() leading to two inode structures with the same inode number: nfsd: iget_locked() creates inode nfsd: try to read from disk, block on that. udf_new_inode(): allocate inode with that inumber udf_new_inode(): insert it into icache, set it up and dirty udf_write_inode(): write inode into buffer cache nfsd: get CPU again, look into buffer cache, see nice and sane on-disk inode, set the in-core inode from it Fix the problem by putting inode into icache in locked state (I_NEW set) and unlocking it only after it's fully set up. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: merge the pieces inserting a new non-directory object into directoryAl Viro2014-09-041-69/+29
| | | | | | | | | | boilerplate code in udf_{create,mknod,symlink} taken to new helper symlink case converted to unique id calculated by udf_new_inode() - no point finding a new one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Set i_generation fieldJan Kara2014-09-042-0/+2
| | | | | | | | | Currently UDF doesn't initialize i_generation in any way and thus NFS can easily get reallocated inodes from stale file handles. Luckily UDF already has a unique object identifier associated with each inode - i_unique. Use that for initialization of i_generation. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Properly detect stale inodesJan Kara2014-09-041-2/+4
| | | | | | | | NFS can easily ask for inodes that are already deleted. Currently UDF happily returns such inodes which is a bug. Return -ESTALE if udf_read_inode() is asked to read deleted inode. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Make udf_read_inode() and udf_iget() return errorJan Kara2014-09-044-95/+96
| | | | | | | | | | Currently __udf_read_inode() wasn't returning anything and we found out whether we succeeded reading inode by checking whether inode is bad or not. udf_iget() returned NULL on failure and inode pointer otherwise. Make these two functions properly propagate errors up the call stack and use the return value in callers. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Avoid infinite loop when processing indirect ICBsJan Kara2014-09-041-14/+21
| | | | | | | | | | | | We did not implement any bound on number of indirect ICBs we follow when loading inode. Thus corrupted medium could cause kernel to go into an infinite loop, possibly causing a stack overflow. Fix the possible stack overflow by removing recursion from __udf_read_inode() and limit number of indirect ICBs we follow to avoid infinite loops. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Fold udf_fill_inode() into __udf_read_inode()Jan Kara2014-09-041-17/+5
| | | | | | | There's no good reason to separate these since udf_fill_inode() is called only from __udf_read_inode() and both do part of the same thing. Signed-off-by: Jan Kara <jack@suse.cz>
* udf: Avoid dir link count to go negativeJan Kara2014-09-041-1/+1
| | | | | | | | If we are writing back inode of unlinked directory, its link count ends up being (u16)-1. Although the inode is deleted, udf_iget() can load the inode when NFS uses stale file handle and get confused. Signed-off-by: Jan Kara <jack@suse.cz>
* Merge tag 'for-f2fs-3.17-rc4' of ↵Linus Torvalds2014-09-0318-228/+248
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs bug fixes from Jaegeuk Kim: "This series includes patches to: - fix recovery routines - fix bugs related to inline_data/xattr - fix when casting the dentry names - handle EIO or ENOMEM correctly - fix memory leak - fix lock coverage" * tag 'for-f2fs-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (28 commits) f2fs: reposition unlock_new_inode to prevent accessing invalid inode f2fs: fix wrong casting for dentry name f2fs: simplify by using a literal f2fs: truncate stale block for inline_data f2fs: use macro for code readability f2fs: introduce need_do_checkpoint for readability f2fs: fix incorrect calculation with total/free inode num f2fs: remove rename and use rename2 f2fs: skip if inline_data was converted already f2fs: remove rewrite_node_page f2fs: avoid double lock in truncate_blocks f2fs: prevent checkpoint during roll-forward f2fs: add WARN_ON in f2fs_bug_on f2fs: handle EIO not to break fs consistency f2fs: check s_dirty under cp_mutex f2fs: unlock_page when node page is redirtied out f2fs: introduce f2fs_cp_error for readability f2fs: give a chance to mount again when encountering errors f2fs: trigger release_dirty_inode in f2fs_put_super f2fs: don't skip checkpoint if there is no dirty node pages ...
| * f2fs: reposition unlock_new_inode to prevent accessing invalid inodeChao Yu2014-09-022-16/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the race condition on the inode cache, following scenario can appear: [Thread a] [Thread b] ->f2fs_mkdir ->f2fs_add_link ->__f2fs_add_link ->init_inode_metadata failed here ->gc_thread_func ->f2fs_gc ->do_garbage_collect ->gc_data_segment ->f2fs_iget ->iget_locked ->wait_on_inode ->unlock_new_inode ->move_data_page ->make_bad_inode ->iput When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated inode should be set as bad to avoid being accessed by other thread. But in above scenario, it allows f2fs to access the invalid inode before this inode was set as bad. This patch fix the potential problem, and this issue was found by code review. change log from v1: o Add condition judgment in gc_data_segment() suggested by Changman Lee. o use iget_failed to simplify code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix wrong casting for dentry nameJaegeuk Kim2014-08-291-3/+4
| | | | | | | | | | | | | | The dentry name type is unsigned char *. If we don't match this type, some character codes can be changed by signed bit. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: simplify by using a literalDan Carpenter2014-08-281-1/+1
| | | | | | | | | | | | | | | | We can make the code a bit simpler because we know that "!retry" is zero. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: truncate stale block for inline_dataJaegeuk Kim2014-08-251-8/+12
| | | | | | | | | | | | | | This verifies to truncate any allocated blocks, offset[0], by inline_data. Not figured out, but for making sure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: use macro for code readabilityChao Yu2014-08-221-11/+10
| | | | | | | | | | | | | | | | | | | | | | This patch introduces DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS macro instead of numbers in code for readability. change log from v1: o fix typo pointed out by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: introduce need_do_checkpoint for readabilityChao Yu2014-08-211-13/+21
| | | | | | | | | | | | | | | | This patch introduce need_do_checkpoint() to include numerous judgment condition for readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix incorrect calculation with total/free inode numChao Yu2014-08-212-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Theoretically, our total inodes number is the same as total node number, but there are three node ids are reserved in f2fs, they are 0, 1 (node nid), and 2 (meta nid), and they should never be used by user, so our total/free inode number calculated in ->statfs is wrong. This patch indroduces F2FS_RESERVED_NODE_NUM and then fixes this issue by recalculating total/free inode number with the macro. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove rename and use rename2Jaegeuk Kim2014-08-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Refer the following patch. commit 7177a9c4b509eb357cc450256bc3cf39f1a1e639 Author: Miklos Szeredi <mszeredi@suse.cz> Date: Wed Jul 23 15:15:30 2014 +0200 fs: call rename2 if exists Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: skip if inline_data was converted alreadyJaegeuk Kim2014-08-211-1/+5
| | | | | | | | | | | | | | This patch checks inline_data one more time under the inode page lock whether its inline_data is converted or not. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove rewrite_node_pageJaegeuk Kim2014-08-214-64/+0
| | | | | | | | | | | | | | | | | | | | I think we need to let the dirty node pages remain in the page cache instead of rewriting them in their places. So, after done with successful recovery, write_checkpoint will flush all of them through the normal write path. Through this, we can avoid potential error cases in terms of block allocation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: avoid double lock in truncate_blocksJaegeuk Kim2014-08-215-9/+12
| | | | | | | | | | | | | | | | The init_inode_metadata calls truncate_blocks when error is occurred. The callers holds f2fs_lock_op, so we should not call it again in truncate_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: prevent checkpoint during roll-forwardJaegeuk Kim2014-08-211-0/+8
| | | | | | | | | | | | | | Any checkpoint should not be done during the core roll-forward procedure. Especially, it includes error cases too. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: add WARN_ON in f2fs_bug_onJaegeuk Kim2014-08-211-1/+1
| | | | | | | | | | | | This patch adds WARN_ON when f2fs_bug_on is disable to see kernel messages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: handle EIO not to break fs consistencyJaegeuk Kim2014-08-214-15/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two rules when EIO is occurred. 1. don't write any checkpoint data to preserve the previous checkpoint 2. don't lose the cached dentry/node/meta pages So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure. Then, writing checkpoint/dentry/node blocks is not allowed. Note that, for the data pages, we can't just throw away by redirtying them. Otherwise, kworker can fall into infinite loop to flush them. (Ref. xfstests/019) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: check s_dirty under cp_mutexJaegeuk Kim2014-08-212-5/+6
| | | | | | | | | | | | | | | | | | It needs to check s_dirty under cp_mutex, since s_dirty is reset under that mutex. And previous condition was not correct, since we can omit doing checkpoint when checkpoint was done followed by all the node pages were written back. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: unlock_page when node page is redirtied outJaegeuk Kim2014-08-211-2/+5
| | | | | | | | | | | | This patch fixes missing unlock_page when a node page is redirtied out. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: introduce f2fs_cp_error for readabilityJaegeuk Kim2014-08-214-4/+9
| | | | | | | | | | | | This patch adds f2fs_cp_error for readability. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: give a chance to mount again when encountering errorsJaegeuk Kim2014-08-211-1/+12
| | | | | | | | | | | | | | This patch gives another chance to try mount process when we encounter an error. This makes an effect on the roll-forward recovery failures as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: trigger release_dirty_inode in f2fs_put_superJaegeuk Kim2014-08-213-1/+5
| | | | | | | | | | | | | | | | | | The generic_shutdown_super calls sync_filesystem, evict_inode, and then f2fs_put_super. In f2fs_evict_inode, we remain some dirty inode information so we should release them at f2fs_put_super. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: don't skip checkpoint if there is no dirty node pagesJaegeuk Kim2014-08-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the errorneous scenario. 1. write data 2. do checkpoint 3. produce some dirty node pages by the gc thread 4. write back dirty node pages 5. f2fs_put_super will skip the checkpoint, since dirty count for node pages is zero. This patch removes such the wrong condition check. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: avoid bug_on when error is occurredJaegeuk Kim2014-08-191-1/+2
| | | | | | | | | | | | During the recovery, if an error like EIO or ENOMEM, f2fs_bug_on should skip. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to recover inline_xattr/data and blocksJaegeuk Kim2014-08-193-13/+11
| | | | | | | | | | | | | | | | This patch fixes not to skip xattr recovery and inline xattr/data recovery order. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: should clear the inline_xattr flagJaegeuk Kim2014-08-191-8/+7
| | | | | | | | | | | | | | | | During the recovery, we should clear the inline_xattr flag if its xattr node block is recovered. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: clear FI_INC_LINK during the recoveryJaegeuk Kim2014-08-191-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | If an inode are fsynced multiple times with fsync & dent marks, this inode will set FI_INC_LINK at find_fsync_dnodes during the recovery. But, in recover_inode, recover_dentry doesn't clear that flag when multiple hits were occurred. So this patch removes the flag for the further consistency. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix the initial inode page for recoveryJaegeuk Kim2014-08-191-0/+2
| | | | | | | | | | | | | | | | If a new inode page is needed for recover_dentry, we should assing i_inline as zero. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: make clear on test condition and return typesJaegeuk Kim2014-08-192-6/+6
| | | | | | | | | | | | | | This patch adds a parentheses to make clear for condition check. And also it changes the return type for better meanings. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: should convert inline_data during the mkwriteJaegeuk Kim2014-08-194-13/+22
| | | | | | | | | | | | | | If mkwrite is called to an inode having inline_data, it can overwrite the data index space as NEW_ADDR. (e.g., the first 4 bytes are coincidently zero) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix typoarter972014-08-1915-30/+30
| | | | | | | | | | | | | | | | | | Fix typo and some grammatical errors. The words "filesystem" and "readahead" are being used without the space treewide. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | Merge tag 'locks-v3.17-3' of git://git.samba.org/jlayton/linuxLinus Torvalds2014-08-301-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull file locking bugfx from Jeff Layton: "Just a bugfix for a bug that crept in to v3.15. It's in a rather rare error path, and I'm not aware of anyone having hit it, but it's worth fixing for v3.17" * tag 'locks-v3.17-3' of git://git.samba.org/jlayton/linux: locks: pass correct "before" pointer to locks_unlink_lock in generic_add_lease
| * | locks: pass correct "before" pointer to locks_unlink_lock in generic_add_leaseJeff Layton2014-08-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The argument to locks_unlink_lock can't be just any pointer to a pointer. It must be a pointer to the fl_next field in the previous lock in the list. Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | | Merge branch 'akpm' (fixes from Andrew Morton)Linus Torvalds2014-08-294-94/+94
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge patches from Andrew Morton: "22 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits) kexec: purgatory: add clean-up for purgatory directory Documentation/kdump/kdump.txt: add ARM description flush_icache_range: export symbol to fix build errors tools: selftests: fix build issue with make kselftests target ocfs2: quorum: add a log for node not fenced ocfs2: o2net: set tcp user timeout to max value ocfs2: o2net: don't shutdown connection when idle timeout ocfs2: do not write error flag to user structure we cannot copy from/to x86/purgatory: use approprate -m64/-32 build flag for arch/x86/purgatory drivers/rtc/rtc-s5m.c: re-add support for devices without irq specified xattr: fix check for simultaneous glibc header inclusion kexec: remove CONFIG_KEXEC dependency on crypto kexec: create a new config option CONFIG_KEXEC_FILE for new syscall x86,mm: fix pte_special versus pte_numa hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked mm/zpool: use prefixed module loading zram: fix incorrect stat with failed_reads lib: turn CONFIG_STACKTRACE into an actual option. mm: actually clear pmd_numa before invalidating memblock, memhotplug: fix wrong type in memblock_find_in_range_node(). ...
| * | | ocfs2: quorum: add a log for node not fencedJunxiao Bi2014-08-291-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For debug use, we can see from the log whether the fence decision is made and why it is not fenced. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | ocfs2: o2net: set tcp user timeout to max valueJunxiao Bi2014-08-292-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When tcp retransmit timeout(15mins), the connection will be closed. Pending messages may be lost during this time. So we set tcp user timeout to override the retransmit timeout to the max value. This is OK for ocfs2 since we have disk heartbeat, if peer crash, the disk heartbeat will timeout and it will be evicted, if disk heartbeat not timeout and connection idle for a long time, then this means the cluster enters split-brain state, since fence can't happen, we'd better keep the connection and wait network recover. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | ocfs2: o2net: don't shutdown connection when idle timeoutJunxiao Bi2014-08-291-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch series is to fix a possible message lost bug in ocfs2 when network go bad. This bug will cause ocfs2 hung forever even network become good again. The messages may lost in this case. After the tcp connection is established between two nodes, an idle timer will be set to check its state periodically, if no messages are received during this time, idle timer will timeout, it will shutdown the connection and try to reconnect, so pending messages in tcp queues will be lost. This messages may be from dlm. Dlm may get hung in this case. This may cause the whole ocfs2 cluster hung. This is very possible to happen when network state goes bad. Do the reconnect is useless, it will fail if network state is still bad. Just waiting there for network recovering may be a good idea, it will not lost messages and some node will be fenced until cluster goes into split-brain state, for this case, Tcp user timeout is used to override the tcp retransmit timeout. It will timeout after 25 days, user should have notice this through the provided log and fix the network, if they don't, ocfs2 will fall back to original reconnect way. This patch (of 3): Some messages in the tcp queue maybe lost if we shutdown the connection and reconnect when idle timeout. If packets lost and reconnect success, then the ocfs2 cluster maybe hung. To fix this, we can leave the connection there and do the fence decision when idle timeout, if network recover before fence dicision is made, the connection survive without lost any messages. This bug can be saw when network state go bad. It may cause ocfs2 hung forever if some packets lost. With this fix, ocfs2 will recover from hung if network becomes good again. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | ocfs2: do not write error flag to user structure we cannot copy from/toBen Hutchings2014-08-291-86/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we failed to copy from the structure, writing back the flags leaks 31 bits of kernel memory (the rest of the ir_flags field). In any case, if we cannot copy from/to the structure, why should we expect putting just the flags to work? Also make sure ocfs2_info_handle_freeinode() returns the right error code if the copy_to_user() fails. Fixes: ddee5cdb70e6 ('Ocfs2: Add new OCFS2_IOC_INFO ioctl for ocfs2 v8.') Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Joel Becker <jlbec@evilplan.org> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'nfs-for-3.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2014-08-292-10/+21
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client fixes from Trond Myklebust: "Highlights: - NFSv3 stable fix for another POSIX ACL regression - NFSv4 stable fix for a regression with OPEN_DOWNGRADE - NFSv4 stable fix for bad close() behaviour when holding a delegation" * tag 'nfs-for-3.17-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv3: Fix another acl regression NFSv4: Don't clear the open state when we just did an OPEN_DOWNGRADE NFSv4: Fix problems with close in the presence of a delegation
OpenPOWER on IntegriCloud