summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* xfs: refactor xfs_trans_reserve() interfaceJie Liu2013-08-1221-197/+150
| | | | | | | | | | | | | | | | With the new xfs_trans_res structure has been introduced, the log reservation size, log count as well as log flags are pre-initialized at mount time. So it's time to refine xfs_trans_reserve() interface to be more neat. Also, introduce a new helper M_RES() to return a pointer to the mp->m_resv structure to simplify the input. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Make writeid transaction use tr_writeidJie Liu2013-08-121-1/+1
| | | | | | | | | | | | | | tr_writeid is defined at mp->m_resv structure, however, it does not really being used when it should be.. This patch changes it to tr_writeid to fetch the correct log reservation size. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Introduce tr_fsyncts to m_reservationJie Liu2013-08-122-1/+3
| | | | | | | | | | | | | | | A preparation step. For now fsync_ts transaction use the pre-calculated log reservation size of tr_swrite. This patch introduce a new item tr_fsyncts to mp->m_reservations structure so that we can fetch the log reservation value for it in a same manner to others. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Introduce a new structure to hold transaction reservation itemsJie Liu2013-08-124-87/+163
| | | | | | | | | | | | | | Introduce a new structure xfs_trans_res to hold transaction reservation item info per log ticket. We also need to improve xfs_trans_resv_calc() by initializing the log count as well as log flags for permanent log reservation. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: make struct xfs_perag kernel onlyDave Chinner2013-08-124-53/+54
| | | | | | | | | | | | | The struct xfs_perag has many kernel-only definitions in it, requiring a __KERNEL__ guard so userspace can use it to. Move it to xfs_mount.h so that it it kernel-only, and let userspace redefine it's own version of the structure containing only what it needs. This gets rid of another __KERNEL__ check in the XFS header files. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: move kernel specific type definitions to xfs.hDave Chinner2013-08-122-36/+32
| | | | | | | | | | | | xfs_types.h is shared with userspace, so having kernel specific types defined in it is problematic. Move all the kernel specific defines to xfs_linux.h so we can remove the __KERNEL__ guards from xfs_types.h. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: xfs_filestreams.h doesn't need __KERNEL__Dave Chinner2013-08-121-4/+0
| | | | | | | | | Because it is only used within the kernel. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: remove __KERNEL__ check from xfs_dir2_leaf.cDave Chinner2013-08-121-4/+0
| | | | | | | | | | | | | | It's actually an ifndef section, which means it is only included in userspace. however, it's deep within the libxfs code, so it's unlikely that the condition checked in userspace can actually occur (search an empty leaf) through the libxfs interfaces. i.e. if it can happen in usrspace, it can happen in the kernel, so remove it from userspace too.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: remove __KERNEL__ from debug codeDave Chinner2013-08-123-4/+4
| | | | | | | | | | | There is no reason the remaining kernel-only debug code needs to remain kernel-only. Kill the __KERNEL__ part of the defines, and let userspace handle the debug code appropriately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: kill __KERNEL__ check for debug code in allocation codeDave Chinner2013-08-121-3/+3
| | | | | | | | | | | | | Userspace running debug builds is relatively rare, so there's need to special case the allocation algorithm code coverage debug switch. As it is, userspace defines random numbers to 0, so invert the logic of the switch so it is effectively a no-op in userspace. This kills another couple of __KERNEL__ users. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: don't special case shared superblock mountsDave Chinner2013-08-121-7/+0
| | | | | | | | | | | | Neither kernel or userspace support shared read-only mounts, so don't bother special casing the support check to be different between kernel and userspace. The same check can be used as neither like it... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: consolidate extent swap codeDave Chinner2013-08-129-517/+436
| | | | | | | | | | | | | | | | | So we don't need xfs_dfrag.h in userspace anymore, move the extent swap ioctl structure definition to xfs_fs.h where most of the other ioctl structure definitions are. Now that we don't need separate files for extent swapping, separate the basic file descriptor checking code to xfs_ioctl.c, and the code that does the extent swap operation to xfs_bmap_util.c. This cleanly separates the user interface code from the physical mechanism used to do the extent swap. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: consolidate xfs_utils.cDave Chinner2013-08-1218-358/+285
| | | | | | | | | | | There are a few small helper functions in xfs_util, all related to xfs_inode modifications. Move them all to xfs_inode.c so all xfs_inode operations are consiolidated in the one place. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: consolidate xfs_rename.cDave Chinner2013-08-123-349/+307
| | | | | | | | | | Move the rename code to xfs_inode.c to continue consolidating all the kernel xfs_inode operations in the one place. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: kill xfs_vnodeops.[ch]Dave Chinner2013-08-1229-1991/+1931
| | | | | | | | | | | | | | | | | | | | | | | Now we have xfs_inode.c for holding kernel-only XFS inode operations, move all the inode operations from xfs_vnodeops.c to this new file as it holds another set of kernel-only inode operations. The name of this file traces back to the days of Irix and it's vnodes which we don't have anymore. Essentially this move consolidates the inode locking functions and a bunch of XFS inode operations into the one file. Eventually the high level functions will be merged into the VFS interface functions in xfs_iops.c. This leaves only internal preallocation, EOF block manipulation and hole punching functions in vnodeops.c. Move these to xfs_bmap_util.c where we are already consolidating various in-kernel physical extent manipulation and querying functions. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: fix issues that cause userspace warningsDave Chinner2013-08-126-13/+12
| | | | | | | | | | | Some of the code shared with userspace causes compilation warnings from things turned off in the kernel code, such as differences in variable signedness. Fix those issues. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: minor cleanupsDave Chinner2013-08-124-14/+4
| | | | | | | | | | These come from syncing the shared userspace and kernel code. Small whitespace and trivial cleanups. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: create xfs_bmap_util.[ch]Dave Chinner2013-08-1220-853/+957
| | | | | | | | | | | | | | | | | | | | | | | There is a bunch of code in xfs_bmap.c that is kernel specific and not shared with userspace. To minimise the difference between the kernel and userspace code, shift this unshared code to xfs_bmap_util.c, and the declarations to xfs_bmap_util.h. The biggest issue here is xfs_bmap_finish() - userspace has it's own definition of this function, and so we need to move it out of xfs_bmap.[ch]. This means several other files need to include xfs_bmap_util.h as well. It also introduces and interesting dance for the stack switching code in xfs_bmapi_allocate(). The stack switching/workqueue code is actually moved to xfs_bmap_util.c, so that userspace can simply use a #define in a header file to connect the dots without needing to know about the stack switch code at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: introduce xfs_sb.c for sharing with libxfsDave Chinner2013-08-125-797/+863
| | | | | | | | | | | | | | | | | | xfs_mount.c is shared with userspace, but the only functions that are shared are to do with physical superblock manipulations. This means that less than 25% of the xfs_mount.c code is actually shared with userspace. Move all the superblock functions to xfs_sb.c and share that instead with libxfs. Note that this will leave all the in-core transaction related superblock counter modifications in xfs_mount.c as none of that is shared with userspace. With a few more small changes, xfs_mount.h won't need to be shared with userspace anymore, either. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: split out the remote symlink handlingDave Chinner2013-08-125-214/+244
| | | | | | | | | | | | The remote symlink format definition and manipulation needs to be shared with userspace, but the in-kernel interfaces do not. Split the remote symlink format handling out into xfs_symlink_remote.[ch] fo it can easily be shared with userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: split out attribute fork truncation code into separate fileDave Chinner2013-08-124-423/+455
| | | | | | | | | | | | The attribute inactivation code is not used by userspace, so like the attribute listing, split it out into a separate file to minimise the differences between the filesystem shared with libxfs in userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: split out attribute listing code into separate fileDave Chinner2013-08-125-616/+658
| | | | | | | | | | | | The attribute listing code is not used by userspace, so like the directory readdir code, split it out into a separate file to minimise the differences between the filesystem shared with libxfs in userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: reshuffle dir2 definitions around for userspaceDave Chinner2013-08-1222-64/+83
| | | | | | | | | | | | | | Many of the definitions within xfs_dir2_priv.h are needed in userspace outside libxfs. Definitions within xfs_dir2_priv.h are wholly contained within libxfs, so we need to shuffle some of the definitions around to keep consistency across files shared between user and kernel space. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: move getdents code into it's own fileDave Chinner2013-08-127-616/+649
| | | | | | | | | | | | | | | | The directory readdir code is not used by userspace, but it is intermingled with files that are shared with userspace. This makes it difficult to compare the differences between the userspac eand kernel files are the userspace files don't have the getdents code in them. Move all the kernel getdents code to a separate file to bring the shared content between userspace and kernel files closer together. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: introduce xfs_inode_buf.c for inode buffer operationsDave Chinner2013-08-125-454/+514
| | | | | | | | | | | | | The only thing remaining in xfs_inode.[ch] are the operations that read, write or verify physical inodes in their underlying buffers. Move all this code to xfs_inode_buf.[ch] and so we can stop sharing xfs_inode.[ch] with userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: move unrelated definitions out of xfs_inode.hDave Chinner2013-08-124-38/+16
| | | | | | | Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: move inode fork definitions to a new header fileDave Chinner2013-08-125-2007/+2096
| | | | | | | | | | | | | | | | The inode fork definitions are a combination of on-disk format definition and in-memory tracking and manipulation. They are both shared with userspace, so move them all into their own file so sharing is easy to do and track. This removes all inode fork related information from xfs_inode.h. Do the same for the all the C code that currently resides in xfs_inode.c for the same reason. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: split out transaction reservation codeDave Chinner2013-08-1210-772/+851
| | | | | | | | | | | | | | | The transaction reservation size calculations is used by both kernel and userspace, but most of the transaction code in xfs_trans.c is kernel specific. Split all the transaction reservation code out into it's own files to make sharing with userspace simpler. This just leaves kernel-only definitions in xfs_trans.h, so it doesn't need to be shared with userspace anymore, either. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: sync minor header differences needed by userspace.Dave Chinner2013-08-1213-8/+42
| | | | | | | | | | | Little things like exported functions, __KERNEL__ protections, and so on that ensure user and kernel shared headers are identical. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: introduce xfs_quota_defs.hDave Chinner2013-08-122-131/+160
| | | | | | | | | | | | There are a lot of quota flag definitions that are shared by user and kernel space. Move them all to xfs_quota_defs.h so we can unshare xfs_quota.h and remove the __KERNEL__ regions from it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: introduce xfs_rtalloc_defs.hDave Chinner2013-08-124-52/+54
| | | | | | | | | | | | There are quite a few realtime device definitions shared with userspace. Move them from xfs_rtalloc.h to xfs_rt_alloc_defs.h so we don't need to share xfs_rtalloc.h with userspace anymore. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: split out on-disk transaction definitionsDave Chinner2013-08-123-207/+208
| | | | | | | | | | | | | | | | | There's a bunch of definitions in xfs_trans.h that define on-disk formats - transaction headers that get written into the log, log item type definitions, etc. Split out everything into a separate file so that all which remains in xfs_trans.h are kernel only definitions. Also, remove the duplicate magic number definitions for XFS_TRANS_MAGIC... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: separate icreate log format definitions from xfs_icreate_item.hDave Chinner2013-08-122-18/+18
| | | | | | | | | | | | | The on disk log format definitions for the icreate log item are intertwined with the kernel-only in-memory log item definitions. Separate the log format definitions out into their own header file so they can easily be shared with userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: separate dquot on disk format definitions out of xfs_quota.hDave Chinner2013-08-1229-137/+175
| | | | | | | | | | | | | The on disk format definitions of the on-disk dquot, log formats and quota off log formats are all intertwined with other definitions for quotas. Separate them out into their own header file so they can easily be shared with userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: split out EFI/EFD log item format definitionDave Chinner2013-08-122-86/+85
| | | | | | | | | | | | The EFI/EFD item format definitions are shared with userspace. Split the out of header files that contain kernel only defintions to make it simple to shared them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: split out buf log item format definitionsDave Chinner2013-08-122-97/+100
| | | | | | | | Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: split out inode log item format definitionDave Chinner2013-08-127-183/+200
| | | | | | | | | | | | The log item format definitions are shared with userspace. Split them out of header files that contain kernel only defintions to make it simple to shared them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: separate out log format definitionsDave Chinner2013-08-124-208/+207
| | | | | | | | | | | | | | The on-disk format definitions for the log are spread randoms through a couple of header files. Consolidate it all in a single file that can be shared easily with userspace. This means that xfs_log.h and xfs_log_priv.h no longer need to be shared with userspace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: WQ_NON_REENTRANT is meaningless and going awayTejun Heo2013-07-301-3/+3
| | | | | | | | | | | | | | | dbf2576e37 ("workqueue: make all workqueues non-reentrant") made WQ_NON_REENTRANT no-op and the flag is going away. Remove its usages. This patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: xfs@oss.sgi.com Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: di_flushiter considered harmfulDave Chinner2013-07-243-11/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we made all inode updates transactional, we no longer needed the log recovery detection for inodes being newer on disk than the transaction being replayed - it was redundant as replay of the log would always result in the latest version of the inode would be on disk. It was redundant, but left in place because it wasn't considered to be a problem. However, with the new "don't read inodes on create" optimisation, flushiter has come back to bite us. Essentially, the optimisation made always initialises flushiter to zero in the create transaction, and so if we then crash and run recovery and the inode already on disk has a non-zero flushiter it will skip recovery of that inode. As a result, log recovery does the wrong thing and we end up with a corrupt filesystem. Because we have to support old kernel to new kernel upgrades, we can't just get rid of the flushiter support in log recovery as we might be upgrading from a kernel that doesn't have fully transactional inode updates. Unfortunately, for v4 superblocks there is no way to guarantee that log recovery knows about this fact. We cannot add a new inode format flag to say it's a "special inode create" because it won't be understood by older kernels and so recovery could do the wrong thing on downgrade. We cannot specially detect the combination of zero mode/non-zero flushiter on disk to non-zero mode, zero flushiter in the log item during recovery because wrapping of the flushiter can result in false detection. Hence that makes this "don't use flushiter" optimisation limited to a disk format that guarantees that we don't need it. And that means the only fix here is to limit the "no read IO on create" optimisation to version 5 superblocks.... Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Start using pquotaino from the superblock.Chandra Seetharaman2013-07-225-43/+158
| | | | | | | | | | | | | Start using pquotino and define a macro to check if the superblock has pquotino. Keep backward compatibilty by alowing mount of older superblock with no separate pquota inode. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Initialize all quota inodes to be NULLFSINOChandra Seetharaman2013-07-221-0/+18
| | | | | | | | | | | | | | | | | | mkfs doesn't initialize the quota inodes to NULLFSINO as it does for the other internal inodes. This leads to two in-core values (0 and NULLFSINO) to be checked against, to make sure if a quota inode is valid. Solve that problem by initializing the in-core values of all quotaino values to NULLFSINO if they are 0 in the disk. Note that these values are not written back to on-disk superblock unless some quota is enabled on the filesystem. Even in that case sb_pquotino is written to disk only if the on-disk superblock supports pquotino Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Fix a deadlock in xfs_log_commit_cil() code pathChandra Seetharaman2013-07-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While testing and rearranging pquota/gquota code, I stumbled on a xfs_shutdown() during a mount. But the mount just hung. Debugged and found that there is a deadlock involving &log->l_cilp->xc_ctx_lock. It is in a code path where &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels down the same semaphore is being acquired in write mode causing a deadlock. This is the stack: xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode xlog_print_tic_res xfs_force_shutdown xfs_log_force_umount xlog_cil_force xlog_cil_force_lsn xlog_cil_push_foreground xlog_cil_push - tries to acquire same semaphore in write mode This patch fixes the deadlock by changing the reason code for xfs_force_shutdown in xlog_print_tic_res() to SHUTDOWN_LOG_IO_ERROR. SHUTDOWN_LOG_IO_ERROR is the right reason code to be set since we are in the log path. Thanks to Dave for suggesting this solution. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: fix assertion failure in xfs_vm_write_failed()Jie Liu2013-07-221-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In xfs_vm_write_failed(), we evaluate the block_offset of pos with PAGE_MASK which is an unsigned long. That is fine on 64-bit platforms regardless of whether the request pos is 32-bit or 64-bit. However, on 32-bit platforms the value is 0xfffff000 and so the high 32 bits in it will be masked off with (pos & PAGE_MASK) for a 64-bit pos. As a result, the evaluated block_offset is incorrect which will cause this failure ASSERT(block_offset + from == pos); and potentially pass the wrong block to xfs_vm_kill_delalloc_range(). In this case, we can get a kernel panic if CONFIG_XFS_DEBUG is enabled: XFS: Assertion failed: block_offset + from == pos, file: fs/xfs/xfs_aops.c, line: 1504 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:100! invalid opcode: 0000 [#1] SMP ........ Pid: 4057, comm: mkfs.xfs Tainted: G O 3.9.0-rc2 #1 EIP: 0060:[<f94a7e8b>] EFLAGS: 00010282 CPU: 0 EIP is at assfail+0x2b/0x30 [xfs] EAX: 00000056 EBX: f6ef28a0 ECX: 00000007 EDX: f57d22a4 ESI: 1c2fb000 EDI: 00000000 EBP: ea6b5d30 ESP: ea6b5d1c DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 094f3ff4 CR3: 2bcb4000 CR4: 000006f0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process mkfs.xfs (pid: 4057, ti=ea6b4000 task=ea5799e0 task.ti=ea6b4000) Stack: 00000000 f9525c48 f951fa80 f951f96b 000005e4 ea6b5d7c f9494b34 c19b0ea2 00000066 f3d6c620 c19b0ea2 00000000 e9a91458 00001000 00000000 00000000 00000000 c15c7e89 00000000 1c2fb000 00000000 00000000 1c2fb000 00000080 Call Trace: [<f9494b34>] xfs_vm_write_failed+0x74/0x1b0 [xfs] [<c15c7e89>] ? printk+0x4d/0x4f [<f9494d7d>] xfs_vm_write_begin+0x10d/0x170 [xfs] [<c110a34c>] generic_file_buffered_write+0xdc/0x210 [<f949b669>] xfs_file_buffered_aio_write+0xf9/0x190 [xfs] [<f949b7f3>] xfs_file_aio_write+0xf3/0x160 [xfs] [<c115e504>] do_sync_write+0x94/0xd0 [<c115ed1f>] vfs_write+0x8f/0x160 [<c115e470>] ? wait_on_retry_sync_kiocb+0x50/0x50 [<c115f017>] sys_write+0x47/0x80 [<c15d860d>] sysenter_do_call+0x12/0x28 ............. EIP: [<f94a7e8b>] assfail+0x2b/0x30 [xfs] SS:ESP 0068:ea6b5d1c ---[ end trace cdd9af4f4ecab42f ]--- Kernel panic - not syncing: Fatal exception In order to avoid this, we can evaluate the block_offset of the start of the page by using shifts rather than masks the mismatch problem. Thanks Dave Chinner for help finding and fixing this bug. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* Linux 3.11-rc1v3.11-rc1Linus Torvalds2013-07-142-1604/+883
|
* Merge branch 'slab/for-linus' of ↵Linus Torvalds2013-07-148-69/+121
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull slab update from Pekka Enberg: "Highlights: - Fix for boot-time problems on some architectures due to init_lock_keys() not respecting kmalloc_caches boundaries (Christoph Lameter) - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim) - Fix for excessive slab freelist draining (Wanpeng Li) - SLUB and SLOB cleanups and fixes (various people)" I ended up editing the branch, and this avoids two commits at the end that were immediately reverted, and I instead just applied the oneliner fix in between myself. * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux slub: Check for page NULL before doing the node_match check mm/slab: Give s_next and s_stop slab-specific names slob: Check for NULL pointer before calling ctor() slub: Make cpu partial slab support configurable slab: add kmalloc() to kernel API documentation slab: fix init_lock_keys slob: use DIV_ROUND_UP where possible slub: do not put a slab to cpu partial list when cpu_partial is 0 mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo mm/slub: Drop unnecessary nr_partials mm/slab: Fix /proc/slabinfo unwriteable for slab mm/slab: Sharing s_next and s_stop between slab and slub mm/slab: Fix drain freelist excessively slob: Rework #ifdeffery in slab.h mm, slab: moved kmem_cache_alloc_node comment to correct place
| * slub: Check for page NULL before doing the node_match checkSteven Rostedt2013-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the -rt kernel (mrg), we hit the following dump: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180 PGD a2d39067 PUD b1641067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod CPU 3 Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992 RIP: 0010:[<ffffffff811573f1>] [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180 RSP: 0018:ffff8800a9b17d70 EFLAGS: 00010213 RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000 RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500 RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500 R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000) Stack: ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011 0000000001200011 0000000001200011 0000000000000000 0000000000000000 00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd Call Trace: [<ffffffff81202e08>] ? current_has_perm+0x68/0x80 [<ffffffff81041cbd>] copy_process+0xdd/0x15b0 [<ffffffff810a2125>] ? rt_up_read+0x25/0x30 [<ffffffff8104369a>] do_fork+0x5a/0x360 [<ffffffff8107c66b>] ? migrate_enable+0xeb/0x220 [<ffffffff8100b068>] sys_clone+0x28/0x30 [<ffffffff81527423>] stub_clone+0x13/0x20 [<ffffffff81527152>] ? system_call_fastpath+0x16/0x1b Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 <48> 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2 RIP [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180 RSP <ffff8800a9b17d70> CR2: 0000000000000000 ---[ end trace 0000000000000002 ]--- Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do disable migration. But the SLUB code is relatively lockless, and the spin_locks there are raw_spin_locks (not converted to mutexes), thus I believe this bug can happen in mainline without -rt features. The -rt patch is just good at triggering mainline bugs ;-) Anyway, looking at where this crashed, it seems that the page variable can be NULL when passed to the node_match() function (which does not check if it is NULL). When this happens we get the above panic. As page is only used in slab_alloc() to check if the node matches, if it's NULL I'm assuming that we can say it doesn't and call the __slab_alloc() code. Is this a correct assumption? Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * mm/slab: Give s_next and s_stop slab-specific namesWanpeng Li2013-07-083-8/+8
| | | | | | | | | | | | | | | | Give s_next and s_stop slab-specific names instead of exporting "s_next" and "s_stop". Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slob: Check for NULL pointer before calling ctor()Steven Rostedt2013-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While doing some code inspection, I noticed that the slob constructor method can be called with a NULL pointer. If memory is tight and slob fails to allocate with slob_alloc() or slob_new_pages() it still calls the ctor() method with a NULL pointer. Looking at the first ctor() method I found, I noticed that it can not handle a NULL pointer (I'm sure others probably can't either): static void sighand_ctor(void *data) { struct sighand_struct *sighand = data; spin_lock_init(&sighand->siglock); init_waitqueue_head(&sighand->signalfd_wqh); } The solution is to only call the ctor() method if allocation succeeded. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slub: Make cpu partial slab support configurableJoonsoo Kim2013-07-072-6/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | CPU partial support can introduce level of indeterminism that is not wanted in certain context (like a realtime kernel). Make it configurable. This patch is based on Christoph Lameter's "slub: Make cpu partial slab support configurable V2". Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
OpenPOWER on IntegriCloud