summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* jbd2: Change j_state_lock to be a rwlock_tTheodore Ts'o2010-08-038-110/+114
| | | | | | | | Lockstat reports have shown that j_state_lock is a major source of lock contention, especially on systems with more than 4 CPU cores. So change it to be a read/write spinlock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stopTheodore Ts'o2010-08-024-41/+51
| | | | | | | | By using an atomic_t for t_updates and t_outstanding credits, this should allow us to not need to take transaction t_handle_lock in jbd2_journal_stop(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Add mount options in superblockTheodore Ts'o2010-08-012-8/+30
| | | | | | | Allow mount options to be stored in the superblock. Also add default mount option bits for nobarrier, block_validity, discard, and nodelalloc. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: force block allocation on quota_offDmitry Monakhov2010-08-011-1/+14
| | | | | | | | Perform full sync procedure so that any delayed allocation blocks are allocated so quota will be consistent. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: fix freeze deadlock under IOEric Sandeen2010-08-011-2/+2
| | | | | | | | | | | | | | | | | | | | | Commit 6b0310fbf087ad6 caused a regression resulting in deadlocks when freezing a filesystem which had active IO; the vfs_check_frozen level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing through. Duh. Changing the test to FREEZE_TRANS should let the normal freeze syncing get through the fs, but still block any transactions from starting once the fs is completely frozen. I tested this by running fsstress in the background while periodically snapshotting the fs and running fsck on the result. I ran into occasional deadlocks, but different ones. I think this is a fine fix for the problem at hand, and the other deadlocky things will need more investigation. Reported-by: Phillip Susi <psusi@cfl.rr.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: drop inode from orphan list if ext4_delete_inode() failsTheodore Ts'o2010-07-291-0/+1
| | | | | | | | | There were some error paths in ext4_delete_inode() which was not dropping the inode from the orphan list. This could lead to a BUG_ON on umount when the orphan list is discovered to be non-empty. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: check to make make sure bd_dev is set before dereferencing itTheodore Ts'o2010-07-271-4/+13
| | | | | | | | | There are some drivers which may not set bdev->bd_dev. So make sure it is non-NULL before dereferencing it. Google-Bug-Id: 1773557 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: Make barrier messages less scaryEric Sandeen2010-07-271-4/+4
| | | | | | | | | | Saying things like "sync failed" when a device does not support barriers makes users slightly more worried than they need to be; rather than talking about sync failures, let's just state the barrier-based facts. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: don't print scary messages for allocation failures post-abortEric Sandeen2010-07-272-11/+17
| | | | | | | | | | | | | | | | | | | I often get emails containing the "This should not happen!!" message, conveniently trimmed to remove things like: sd 0:0:0:0: [sda] Unhandled error code sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 03 13 c9 70 00 00 28 00 end_request: I/O error, dev sda, sector 51628400 Aborting journal on device dm-0-8. EXT4-fs error (device dm-0): ext4_journal_start_sb: Detected aborted journal EXT4-fs (dm-0): Remounting filesystem read-only I don't think there is any value to the verbosity if the reason is due to a filesystem abort; it just obfuscates the root cause. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: fix EFBIG edge case when writing to large non-extent fileToshiyuki Okajima2010-07-271-1/+2
| | | | | | | | | | | | | | | | | | | By running the following reproducer, we can confirm that the write system call returns with 0 when it should return the error EFBIG. #!/bin/sh /bin/dd if=/dev/zero of=./img bs=1k count=1 seek=1024k > /dev/null 2>&1 /sbin/mkfs.ext3 -Fq ./img /bin/mount -o loop -t ext4 ./img /mnt /bin/touch /mnt/file strace /bin/dd if=/dev/zero of=/mnt/file conv=notrunc bs=1k count=1 seek=$((2194719883264/1024)) 2>&1 | /bin/egrep "write.* 1024\) = " /bin/umount /mnt exit Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eric Sandeen <sandeen@redhat.com>
* ext4: fix ext4_get_blocks referencesEric Sandeen2010-07-272-9/+6
| | | | | | | | ext4_get_blocks got renamed to ext4_map_blocks, but left stale comments and a prototype littered around. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Always journal quota file modificationsJan Kara2010-07-271-14/+5
| | | | | | | | | | | | When journaled quota options are not specified, we do writes to quota files just in data=ordered mode. This actually causes warnings from JBD2 about dirty journaled buffer because ext4_getblk unconditionally treats a block allocated by it as metadata. Since quota actually is filesystem metadata, the easiest way to get rid of the warning is to always treat quota writes as metadata... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Fix potential memory leak in ext4_fill_superCyrill Gorcunov2010-07-271-3/+5
| | | | | | | | | Under heavy memory pressure we may hit out of memory situation and as result kstrdup'ed options will not be freed. Fix it. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Don't error out the fs if the user tries to make a file too bigTheodore Ts'o2010-07-271-4/+2
| | | | | | | | | | If the user attempts to make a non-extent-mapped file to be too large, return EFBIG, but don't call ext4_std_err() which will end up marking the file system as containing an error. Thanks to Toshiyuki Okajima-san at Fujitsu for pointing this out. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: allocate stripe-multiple IOs on stripe boundariesEric Sandeen2010-07-271-4/+3
| | | | | | | | | | | | | | For some reason, today mballoc only allocates IOs which are exactly stripe-sized on a stripe boundary. If you have a multiple (say, a 128k IO on a 64k stripe) you may end up unaligned. It seems to me that a simple change to align stripe-multiple IOs on stripe boundaries would be a very good idea, unless this breaks some other mballoc heuristic for some reason... Reported-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: move aio completion after unwritten extent conversionjiayingz@google.com (Jiaying Zhang)2010-07-272-6/+15
| | | | | | | | | | | | | | | This patch is to be applied upon Christoph's "direct-io: move aio_complete into ->end_io" patch. It adds iocb and result fields to struct ext4_io_end_t, so that we can call aio_complete from ext4_end_io_nolock() after the extent conversion has finished. I have verified with Christoph's aio-dio test that used to fail after a few runs on an original kernel but now succeeds on the patched kernel. See http://thread.gmane.org/gmane.comp.file-systems.ext4/19659 for details. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* direct-io: move aio_complete into ->end_ioChristoph Hellwig2010-07-276-18/+37
| | | | | | | | | | | | | | Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been commited. That means the aio_complete calls needs to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Support discard requests when running in no-journal modeJiaying Zhang2010-07-271-16/+21
| | | | | | | | Issue discard request in ext4_free_blocks() when ext4 has no journal and is mounted with discard option. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* jbd2: Remove __GFP_NOFAIL from jbd2 layerTheodore Ts'o2010-07-273-23/+57
| | | | | | | | | | | __GFP_NOFAIL is going away, so add our own retry loop. Also add jbd2__journal_start() and jbd2__journal_restart() which take a gfp mask, so that file systems can optionally (re)start transaction handles using GFP_KERNEL. If they do this, then they need to be prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready to reflect that error up to userspace. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Fix block bitmap inconsistencies after a crash when deleting filesAmir G2010-07-271-22/+13
| | | | | | | | | | | | | | | | | | | We have experienced bitmap inconsistencies after crash during file delete under heavy load. The crash is not file system related and I the following patch in ext4_free_branches() fixes the recovery problem. If the transaction is restarted and there is a crash before the new transaction is committed, then after recovery, the blocks that this indirect block points to have been freed, but the indirect block itself has not been freed and may still point to some of the free blocks (because of the ext4_forget()). So ext4_forget() should be called inside ext4_free_blocks() to avoid this problem. Signed-off-by: Amir Goldstein <amir73il@users.sf.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Remove unnecessary casts of private_dataJoe Perches2010-07-272-2/+2
| | | | | Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: fix potential NULL dereference while tracingTheodore Ts'o2010-07-272-10/+14
| | | | | | The allocation_context pointer can be NULL. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Define s_jnl_backup_type in superblockTheodore Ts'o2010-07-271-2/+2
| | | | | | | This has been in use by e2fsprogs for a while; define it to keep the super block fields in sync. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Once a day, printk file system error information to dmesgTheodore Ts'o2010-07-272-0/+62
| | | | | | | | | This allows us to grab any file system error messages by scraping /var/log/messages. This will make it easy for us to do error analysis across the very large number of machines as we deploy ext4 across the fleet. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Save error information to the superblock for analysisTheodore Ts'o2010-07-275-22/+90
| | | | | | | | Save number of file system errors, and the time function name, line number, block number, and inode number of the first and most recent errors reported on the file system in the superblock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Pass line numbers to ext4_error() and friendsTheodore Ts'o2010-07-278-80/+98
| | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Cleanup ext4_check_dir_entry so __func__ is now implicitTheodore Ts'o2010-07-273-16/+17
| | | | | | | Also start passing the line number to ext4_check_dir since we're going to need it in upcoming patch. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Pass line number to ext4_journal_abort_handle()Theodore Ts'o2010-06-293-47/+53
| | | | | | This allows the error messages to include the line number Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Enhance ext4_grp_locked_error() to take block and function numbersTheodore Ts'o2010-06-294-32/+45
| | | | | | Also use a macro definition so that __func__ and __LINE__ is implicit. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: clean up ext4_abort() so __func__ is now implicitTheodore Ts'o2010-06-293-9/+10
| | | | | | | Use a macro definition for ext4_abort() to clean up the .c files a wee bit. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Add new superblock fields reserved for the Next3 snapshot featureTheodore Ts'o2010-06-291-1/+7
| | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: update ctime when changing the file's permission by setfaclJan Kara2010-06-151-0/+1
| | | | | | | | | | | | | | | | | | | | | ext4 didn't update the ctime of the file when its permission was changed. Steps to reproduce: # touch aaa # stat -c %Z aaa 1275289822 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1275289822 <- unchanged But, according to the spec of the ctime, ext4 must update it. Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: remove vestiges of nobh supportChristoph Hellwig2010-06-144-44/+15
| | | | | | | | The nobh option was only supported for writeback mode, but given that all write paths actually create buffer heads it effectively was a no-op already. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: remove initialized but not read variablesAndi Kleen2010-06-147-50/+10
| | | | | | | | | No real bugs found, just removed some dead code. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Convert more i_flags references to use accessor functionsTheodore Ts'o2010-06-143-4/+5
| | | | | | | These changes are not ones which are likely to result in races, but they should be fixed. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Clean up s_dirt handlingTheodore Ts'o2010-06-119-16/+36
| | | | | | | | | | | | | | | | | We don't need to set s_dirt in most of the ext4 code when journaling is enabled. In ext3/4 some of the summary statistics for # of free inodes, blocks, and directories are calculated from the per-block group statistics when the file system is mounted or unmounted. As a result the superblock doesn't have to be updated, either via the journal or by setting s_dirt. There are a few exceptions, most notably when resizing the file system, where the superblock needs to be modified --- and in that case it should be done as a journalled operation if possible, and s_dirt set only in no-journal mode. This patch will optimize out some unneeded disk writes when using ext4 with a journal. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* Linux 2.6.35-rc3v2.6.35-rc3Linus Torvalds2010-06-111-1/+1
|
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2010-06-1114-28/+103
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: wimax/i2400m: fix missing endian correction read in fw loader net8139: fix a race at the end of NAPI pktgen: Fix accuracy of inter-packet delay. pkt_sched: gen_estimator: add a new lock net: deliver skbs on inactive slaves to exact matches ipv6: fix ICMP6_MIB_OUTERRORS r8169: fix mdio_read and update mdio_write according to hw specs gianfar: Revive the driver for eTSEC devices (disable timestamping) caif: fix a couple range checks phylib: Add support for the LXT973 phy. net: Print num_rx_queues imbalance warning only when there are allocated queues
| * Merge branch 'wimax-2.6.35.y' of ↵David S. Miller2010-06-111-1/+1
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax
| | * wimax/i2400m: fix missing endian correction read in fw loaderInaky Perez-Gonzalez2010-06-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | i2400m_fw_hdr_check() was accessing hardware field bcf_hdr->module_type (little endian 32) without converting to host byte sex. Reported-by: Данилин Михаил <mdanilin@nsg.net.ru> Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
| * | net8139: fix a race at the end of NAPIFigo.zhang2010-06-102-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | fix a race at the end of NAPI complete processing, it had better do __napi_complete() first before re-enable interrupt. Signed-off-by:Figo.zhang <figo1802@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | pktgen: Fix accuracy of inter-packet delay.Daniel Turull2010-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch correct a bug in the delay of pktgen. It makes sure the inter-packet interval is accurate. Signed-off-by: Daniel Turull <daniel.turull@gmail.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | pkt_sched: gen_estimator: add a new lockEric Dumazet2010-06-101-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gen_kill_estimator() / gen_new_estimator() is not always called with RTNL held. net/netfilter/xt_RATEEST.c is one user of these API that do not hold RTNL, so random corruptions can occur between "tc" and "iptables". Add a new fine grained lock instead of trying to use RTNL in netfilter. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: deliver skbs on inactive slaves to exact matchesJohn Fastabend2010-06-103-6/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the accelerated receive path for VLAN's will drop packets if the real device is an inactive slave and is not one of the special pkts tested for in skb_bond_should_drop(). This behavior is different then the non-accelerated path and for pkts over a bonded vlan. For example, vlanx -> bond0 -> ethx will be dropped in the vlan path and not delivered to any packet handlers at all. However, bond0 -> vlanx -> ethx and bond0 -> ethx will be delivered to handlers that match the exact dev, because the VLAN path checks the real_dev which is not a slave and netif_recv_skb() doesn't drop frames but only delivers them to exact matches. This patch adds a sk_buff flag which is used for tagging skbs that would previously been dropped and allows the skb to continue to skb_netif_recv(). Here we add logic to check for the deliver_no_wcard flag and if it is set only deliver to handlers that match exactly. This makes both paths above consistent and gives pkt handlers a way to identify skbs that come from inactive slaves. Without this patch in some configurations skbs will be delivered to handlers with exact matches and in others be dropped out right in the vlan path. I have tested the following 4 configurations in failover modes and load balancing modes. # bond0 -> ethx # vlanx -> bond0 -> ethx # bond0 -> vlanx -> ethx # bond0 -> ethx | vlanx -> -- Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: fix ICMP6_MIB_OUTERRORSEric Dumazet2010-06-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | In commit 1f8438a85366 (icmp: Account for ICMP out errors), I did a typo on IPV6 side, using ICMP6_MIB_OUTMSGS instead of ICMP6_MIB_OUTERRORS Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | r8169: fix mdio_read and update mdio_write according to hw specsTimo Teräs2010-06-091-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Realtek confirmed that a 20us delay is needed after mdio_read and mdio_write operations. Reduce the delay in mdio_write, and add it to mdio_read too. Also add a comment that the 20us is from hw specs. Signed-off-by: Timo Teräs <timo.teras@iki.fi> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'num_rx_queues' of git://kernel.ubuntu.com/rtg/net-2.6David S. Miller2010-06-091-5/+3
| |\ \
| | * | net: Print num_rx_queues imbalance warning only when there are allocated queuesTim Gardner2010-06-091-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/591416 There are a number of network drivers (bridge, bonding, etc) that are not yet receive multi-queue enabled and use alloc_netdev(), so don't print a num_rx_queues imbalance warning in that case. Also, only print the warning once for those drivers that _are_ multi-queue enabled. Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
| * | | gianfar: Revive the driver for eTSEC devices (disable timestamping)Anton Vorontsov2010-06-091-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit cc772ab7cdcaa24d1fae332d92a1602788644f7a ("gianfar: Add hardware RX timestamping support"), the driver no longer works on at least MPC8313ERDB and MPC8568EMDS boards (and possibly much more boards as well). That's how MPC8313 Reference Manual describes RCTRL_TS_ENABLE bit: Timestamp incoming packets as padding bytes. PAL field is set to 8 if the PAL field is programmed to less than 8. Must be set to zero if TMR_CTRL[TE]=0. I see that the commit above sets this bit, but it doesn't handle TMR_CTRL. Manfred probably had this bit set by the firmware for his boards. But obviously this isn't true for all boards in the wild. Also, I recall that Freescale BSPs were explicitly disabling the timestamping because of a performance drop. For now, the best way to deal with this is just disable the timestamping, and later we can discuss proper device tree bindings and implement enabling this feature via some property. Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | caif: fix a couple range checksDan Carpenter2010-06-092-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The extra ! character means that these conditions are always false. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud