op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	drm/i915,agp/intel: Do not clear stolen entries	Chris Wilson	2011-01-24	4	-14/+24
\| \| \| \| \| \| \| \| \| \| \| \| \|	We can only utilize the stolen portion of the GTT if we are in sole charge of the hardware. This is only true if using GEM and KMS, otherwise VESA continues to access stolen memory. Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
*	drm/i915: Recognise non-VGA display devices	Chris Wilson	2011-01-23	3	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Starting with SandyBridge (though possible with earlier hacked BIOSes), the BIOS may initialise the IGFX as secondary to a discrete GPU. Prior, it would simply disable the integrated GPU. So we adjust our PCI class mask to match any DISPLAY_CLASS device. In such a configuration, the IGFX is not a primary VGA controller and so should not take part in VGA arbitration, and the error return from vga_client_register() is expected. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org
*	drm/i915: Fix use of invalid array size for ring->sync_seqno	Chris Wilson	2011-01-23	2	-2/+2
\| \| \| \| \| \| \| \| \|	There are I915_NUM_RINGS-1 inter-ring synchronisation counters, but we were clearing I915_NUM_RINGS of them. Oops. Reported-by: Jiri Slaby <jirislaby@gmail.com> Tested-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
*	drm/i915/ringbuffer: Fix use of stale HEAD position whilst polling for space	Chris Wilson	2011-01-20	2	-17/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During suspend, Linus found that his machine would hang for 3 seconds, and identified that intel_ring_buffer_wait() was the culprit: "Because from looking at the code, I get the notion that "intel_read_status_page()" may not be exact. But what happens if that inexact value matches our cached ring->actual_head, so we never even try to read the exact case? Does it _stay_ inexact for arbitrarily long times? If so, we might wait for the ring to empty forever (well, until the timeout - the behavior I see), even though the ring really _is_ empty." As the reported HEAD position is only updated every time it crosses a 64k boundary, whilst draining the ring it is indeed likely to remain one value. If that value matches the last known HEAD position, we never read the true value from the register and so trigger a timeout. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
*	drm/i915: Don't kick-off hangcheck after a DRI interrupt	Chris Wilson	2011-01-20	1	-1/+5
\| \| \| \| \| \| \|	Hangcheck and error recovery is only used by GEM. Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
*	drm/i915: Add dependency on CONFIG_TMPFS	Chris Wilson	2011-01-20	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Without tmpfs, shmem_readpage() is not compiled in causing an OOPS as soon as we try to allocate some swappable pages for GEM. Jan 19 22:52:26 harlie kernel: Modules linked in: i915(+) drm_kms_helper cfbcopyarea video backlight cfbimgblt cfbfillrect Jan 19 22:52:26 harlie kernel: Jan 19 22:52:26 harlie kernel: Pid: 1125, comm: modprobe Not tainted 2.6.37Harlie #10 To be filled by O.E.M./To be filled by O.E.M. Jan 19 22:52:26 harlie kernel: EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 3 Jan 19 22:52:26 harlie kernel: EIP is at 0x0 Jan 19 22:52:26 harlie kernel: EAX: 00000000 EBX: f7b7d000 ECX: f3383100 EDX: f7b7d000 Jan 19 22:52:26 harlie kernel: ESI: f1456118 EDI: 00000000 EBP: f2303c98 ESP: f2303c7c Jan 19 22:52:26 harlie kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Jan 19 22:52:26 harlie kernel: Process modprobe (pid: 1125, ti=f2302000 task=f259cd80 task.ti=f2302000) Jan 19 22:52:26 harlie kernel: Stack: Jan 19 22:52:26 harlie udevd-work[1072]: '/sbin/modprobe -b pci:v00008086d00000046sv00000000sd00000000bc03sc00i00' unexpected exit with status 0x0009 Jan 19 22:52:26 harlie kernel: c1074061 000000d0 f2f42b80 00000000 000a13d2 f2d5dcc0 00000001 f2303cac Jan 19 22:52:26 harlie kernel: c107416f 00000000 000a13d2 00000000 f2303cd4 f8d620ed f2cee620 00001000 Jan 19 22:52:26 harlie kernel: 00000000 000a13d2 f1456118 f2d5dcc0 f1a40000 00001000 f2303d04 f8d637ab Jan 19 22:52:26 harlie kernel: Call Trace: Jan 19 22:52:26 harlie kernel: [<c1074061>] ? do_read_cache_page+0x71/0x160 Jan 19 22:52:26 harlie kernel: [<c107416f>] ? read_cache_page_gfp+0x1f/0x30 Jan 19 22:52:26 harlie kernel: [<f8d620ed>] ? i915_gem_object_get_pages+0xad/0x1d0 [i915] Jan 19 22:52:26 harlie kernel: [<f8d637ab>] ? i915_gem_object_bind_to_gtt+0xeb/0x2d0 [i915] Jan 19 22:52:26 harlie kernel: [<f8d65961>] ? i915_gem_object_pin+0x151/0x190 [i915] Jan 19 22:52:26 harlie kernel: [<c11e16ed>] ? drm_gem_object_init+0x3d/0x60 Jan 19 22:52:26 harlie kernel: [<f8d65aa5>] ? i915_gem_init_ringbuffer+0x105/0x1e0 [i915] Jan 19 22:52:26 harlie kernel: [<f8d571b7>] ? i915_driver_load+0x667/0x1160 [i915] Reported-by: John J. Stimson-III <john@idsfa.net> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org
*	drm/i915: Initialise ring vfuncs for old DRI paths	Chris Wilson	2011-01-20	3	-18/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We weren't setting up the vfunc table when initialising the old DRI ringbuffer, leading to such OOPSes as: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<(null)>] (null) PGD 10c441067 PUD 1185e5067 PMD 0 Oops: 0010 [#1] PREEMPT SMP last sysfs file: /sys/class/dmi/id/chassis_asset_tag CPU 3 Modules linked in: i915 drm_kms_helper drm fb fbdev i2c_algo_bit cfbcopyarea video backlight output cfbimgblt cfbfillrect autofs4 ipv6 nfs lockd fscache nfs_acl auth_rpcgss sunrpc coretemp hwmon_vid mousedev usbhid hid option usb_wwan snd_hda_codec_via asus_atk0110 atl1e usbserial snd_hda_intel snd_hda_codec firmware_class snd_hwdep snd_pcm snd_seq snd_timer snd_seq_device processor parport_pc thermal snd thermal_sys parport 8250_pnp button rng_core rtc_cmos shpchp hwmon rtc_core ehci_hcd pci_hotplug uhci_hcd soundcore tpm_tis i2c_i801 rtc_lib tpm serio_raw snd_page_alloc tpm_bios i2c_core usbcore psmouse intel_agp sg pcspkr sr_mod evdev cdrom ext3 jbd mbcache dm_mod sd_mod ata_piix libata scsi_mod unix Jan 18 15:49:29 lithui kernel: Pid: 3605, comm: Xorg Not tainted 2.6.36.2 #5 P5KPL-CM/System Product Name RIP: 0010:[<0000000000000000>] [<(null)>] (null) RSP: 0018:ffff8801150d1d40 EFLAGS: 00010202 RAX: 000000000001ffff RBX: ffff88011a011b00 RCX: 000000000001a704 RDX: ffff880118566028 RSI: ffff880118566028 RDI: ffff880117876800 RBP: ffff8801150d1d48 R08: ffff8801195fe300 R09: 00000000c0086444 R10: 0000000000000001 R11: 0000000000003206 R12: ffff880117876800 R13: ffff880118566000 R14: ffff880117876820 R15: ffff8801150d1df8 FS: 00007f1038d456e0(0000) GS:ffff880001780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001187e7000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process Xorg (pid: 3605, threadinfo ffff8801150d0000, task ffff88011b016e40) Stack: ffffffffa043b8e6 ffff8801150d1d98 ffffffffa041768b dead000000000000 <0> 0000000000000048 00007f1023f2a000 0000000000000044 0000000000000008 <0> ffff88010d26bd80 ffff880117876800 ffff8801150d1df8 ffff8801150d1ea8 Call Trace: [<ffffffffa043b8e6>] ? intel_ring_advance+0x16/0x20 [i915] [<ffffffffa041768b>] i915_irq_emit+0x15b/0x240 [i915] [<ffffffffa03ea7b1>] drm_ioctl+0x1f1/0x460 [drm] [<ffffffffa0417530>] ? i915_irq_emit+0x0/0x240 [i915] [<ffffffff810dd8f1>] ? do_sync_read+0xd1/0x120 [<ffffffff81025b1f>] ? do_page_fault+0x1df/0x3d0 [<ffffffff810ed5c7>] do_vfs_ioctl+0x97/0x550 [<ffffffff8115c2ea>] ? security_file_permission+0x7a/0x90 [<ffffffff810edb19>] sys_ioctl+0x99/0xa0 [<ffffffff810024ab>] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [<(null)>] (null) RSP <ffff8801150d1d40> CR2: 0000000000000000 Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Herbert Xu <herbert@gondor.apana.org.au> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29153 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23172 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org
*	drm/i915: make the blitter report buffer modifications to the FBC unit	Jesse Barnes	2011-01-18	2	-0/+25
\| \| \| \| \| \| \| \| \| \| \|	Without this change, blits to the front buffer won't invalidate FBC state, causing us to scan out stale data. Make sure we update these bits on every FBC enable, since they may get clobbered if we shut off the display. References: https://bugzilla.kernel.org/show_bug.cgi?id=26932 Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
*	drm/i915: set more FBC chicken bits	Jesse Barnes	2011-01-18	2	-1/+5
\| \| \| \| \| \| \| \| \|	Add a couple of missing workaround bits for ILK & SNB. These disable clock gating on a couple of units that would otherwise prevent FBC from working. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
*	staging: fix build failure in bcm driver	Andres Salomon	2011-01-17	2	-7/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While building latest Linus git, I hit the following: CC [M] drivers/staging/bcm/Qos.o drivers/staging/bcm/Qos.c: In function ‘PruneQueue’: drivers/staging/bcm/Qos.c:367: error: ‘struct netdev_queue’ has no member named ‘tx_dropped’ drivers/staging/bcm/Qos.c: In function ‘flush_all_queues’: drivers/staging/bcm/Qos.c:416: error: ‘struct netdev_queue’ has no member named ‘tx_dropped’ make[5]: * [drivers/staging/bcm/Qos.o] Error 1 make[4]: * [drivers/staging/bcm] Error 2 make[3]: * [drivers/staging] Error 2 As well as: CC [M] drivers/staging/bcm/Transmit.o drivers/staging/bcm/Transmit.c: In function ‘SetupNextSend’: drivers/staging/bcm/Transmit.c:163: error: ‘struct netdev_queue’ has no member named ‘tx_bytes’ drivers/staging/bcm/Transmit.c:164: error: ‘struct netdev_queue’ has no member named ‘tx_packets’ make[2]: * [drivers/staging/bcm/Transmit.o] Error 1 tx_dropped/tx_bytes_tx_packets were removed in commit 1ac9ad13. This patch converts bcm to use net_device_stats instead of netdev_queue. Acked-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2011-01-17	38	-144/+351
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA: Update workqueue usage RDMA/nes: Fix incorrect SFP+ link status detection on driver init RDMA/nes: Fix SFP+ link down detection issue with switch port disable RDMA/nes: Generate IB_EVENT_PORT_ERR/PORT_ACTIVE events RDMA/nes: Fix bonding on iw_nes IB/srp: Test only once whether iu allocation succeeded IB/mlx4: Handle protocol field in multicast table RDMA: Use vzalloc() to replace vmalloc()+memset(0) mlx4_{core, ib, en}: Fix driver when sizeof (phys_addr_t) > sizeof (long) IB/mthca: Fix driver when sizeof (phys_addr_t) > sizeof (long)
\| *-----.	Merge branches 'misc', 'mlx4', 'mthca', 'nes' and 'srp' into for-next	Roland Dreier	2011-01-16	20	-60/+297
\| \|\ \ \ \
\| \| \| \| \| *	IB/srp: Test only once whether iu allocation succeeded	Bart Van Assche	2011-01-13	1	-8/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge the two tests in srp_queuecommand() of whether information unit allocation succeeded into one. An intended side effect of this change is that we fix the warning: drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_queuecommand': drivers/infiniband/ulp/srp/ib_srp.c:1116: warning: 'req' may be used uninitialized in this function (seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y at least with gcc 4.4.4) Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| \| \| \| * \|	RDMA/nes: Fix incorrect SFP+ link status detection on driver init	Maciej Sosnowski	2011-01-16	1	-4/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During iw_nes initialization the link status for SFP+ PHY is always detected as "up" regardless of real state (cable either connected or disconnected). Add SFP+ PHY specific link status detection to the iw_nes initialization procedure. Use link status recheck for netdev_open to detect delayed state updates. Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| \| \| \| * \|	RDMA/nes: Fix SFP+ link down detection issue with switch port disable	Maciej Sosnowski	2011-01-16	4	-0/+100
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case of SFP+ PHY, link status check at interrupt processing can give false results. For proper link status change detection a delayed recheck is needed to give nes registers time to settle. Add a periodic link status recheck scheduled at interrupt to detect potential delayed registers state changes. Addresses: http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2117 Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| \| \| \| * \|	RDMA/nes: Generate IB_EVENT_PORT_ERR/PORT_ACTIVE events	Maciej Sosnowski	2011-01-16	4	-9/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Depending on link state change, IB_EVENT_PORT_ERR or IB_EVENT_PORT_ACTIVE should be generated when handling MAC interrupts. Plugging in a cable happens to result in series of interrupts changing driver's link state a number of times before finally staying at link up (e.g. link up, link down, link up, link down, ..., link up). To prevent sending series of redundant IB_EVENT_PORT_ACTIVE and IB_EVENT_PORT_ERR events, we use a timer to debounce them in nes_port_ibevent(). Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| \| \| \| * \|	RDMA/nes: Fix bonding on iw_nes	Maciej Sosnowski	2011-01-16	2	-6/+26
\| \| \| \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable configuring bonds on nes devices by adding missing support for master net_device to the driver. Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| \| \| * \|	IB/mthca: Fix driver when sizeof (phys_addr_t) > sizeof (long)	John L. Burr	2011-01-11	5	-6/+7
\| \| \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some systems have PCI addresses that don't fit in unsigned long (eg some 32-bit PowerPC 440 systems have 36-bit bus addresses). Fix up the driver by using phys_addr_t where appropriate, so we don't truncate any PCI resource addresses before ioremapping them. Signed-off-by: John L. Burr <jlburr@cadence.com> [ Update to apply to current driver source. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| \| * \|	IB/mlx4: Handle protocol field in multicast table	Aleksey Senin	2011-01-12	4	-21/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The newest device firmware stores IB vs. Ethernet protocol in two bits in members_count field of multicast group table (0: Infiniband, 1: Ethernet). When changing the QP members count for a multicast group, it important not to reset this information. When calling multicast attach first time, the protocol type should be specified. In this patch we always set it IB, but in the future we will handle Ethernet too. When looking for a QP, the protocol type shoud be checked too. Signed-off-by: Aleksey Senin <alekseys@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| \| * \|	mlx4_{core, ib, en}: Fix driver when sizeof (phys_addr_t) > sizeof (long)	Roland Dreier	2011-01-12	4	-6/+8
\| \| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some systems have PCI addresses that don't fit in unsigned long (eg some 32-bit PowerPC 440 systems have 36-bit bus addresses). Fix up mlx4 drivers by using phys_addr_t where appropriate, so we don't truncate any PCI resource addresses before ioremapping them. Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| * \|	RDMA: Update workqueue usage	Tejun Heo	2011-01-16	13	-50/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* ib_wq is added, which is used as the common workqueue for infiniband instead of the system workqueue. All system workqueue usages including flush_scheduled_work() callers are converted to use and flush ib_wq. * cancel_delayed_work() + flush_scheduled_work() converted to cancel_delayed_work_sync(). * qib_wq is removed and ib_wq is used instead. This is to prepare for deprecation of flush_scheduled_work(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| * \|	RDMA: Use vzalloc() to replace vmalloc()+memset(0)	Joe Perches	2011-01-12	8	-34/+15
\| \|/ \| \| \| \| \| \| \| \|	Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2011-01-17	29	-623/+2490
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits) Btrfs: forced readonly mounts on errors btrfs: Require CAP_SYS_ADMIN for filesystem rebalance Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check btrfs: Fix memory leak in btrfs_read_fs_root_no_radix() btrfs: check NULL or not btrfs: Don't pass NULL ptr to func that may deref it. btrfs: mount failure return value fix btrfs: Mem leak in btrfs_get_acl() btrfs: fix wrong free space information of btrfs btrfs: make the chunk allocator utilize the devices better btrfs: restructure find_free_dev_extent() btrfs: fix wrong calculation of stripe size btrfs: try to reclaim some space when chunk allocation fails btrfs: fix wrong data space statistics fs/btrfs: Fix build of ctree Btrfs: fix off by one while setting block groups readonly Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls Btrfs: Add readonly snapshots support Btrfs: Refactor btrfs_ioctl_snap_create() btrfs: Extract duplicate decompress code ...
\| * \|	Btrfs: forced readonly mounts on errors	liubo	2011-01-17	7	-2/+523
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch comes from "Forced readonly mounts on errors" ideas. As we know, this is the first step in being more fault tolerant of disk corruptions instead of just using BUG() statements. The major content: - add a framework for generating errors that should result in filesystems going readonly. - keep FS state in disk super block. - make sure that all of resource will be freed and released at umount time. - make sure that fter FS is forced readonly on error, there will be no more disk change before FS is corrected. For this, we should stop write operation. After this patch is applied, the conversion from BUG() to such a framework can happen incrementally. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: Require CAP_SYS_ADMIN for filesystem rebalance	Ben Hutchings	2011-01-16	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Filesystem rebalancing (BTRFS_IOC_BALANCE) affects the entire filesystem and may run uninterruptibly for a long time. This does not seem to be something that an unprivileged user should be able to do. Reported-by: Aron Xu <happyaron.xu@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	Btrfs: don't warn if we get ENOSPC in btrfs_block_rsv_check	Josef Bacik	2011-01-16	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we run low on space we could get a bunch of warnings out of btrfs_block_rsv_check, but this is mostly just called via the transaction code to see if we need to end the transaction, it expects to see failures, so let's not WARN and freak everybody out for no reason. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: Fix memory leak in btrfs_read_fs_root_no_radix()	Tsutomu Itoh	2011-01-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In btrfs_read_fs_root_no_radix(), 'root' is not freed if btrfs_search_slot() returns error. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: check NULL or not	Tsutomu Itoh	2011-01-16	3	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Should check if functions returns NULL or not. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: Don't pass NULL ptr to func that may deref it.	Jesper Juhl	2011-01-16	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hi, In fs/btrfs/inode.c::fixup_tree_root_location() we have this code: ... if (!path) { err = -ENOMEM; goto out; } ... out: btrfs_free_path(path); return err; btrfs_free_path() passes its argument on to other functions and some of them end up dereferencing the pointer. In the code above that pointer is clearly NULL, so btrfs_free_path() will eventually cause a NULL dereference. There are many ways to cut this cake (fix the bug). The one I chose was to make btrfs_free_path() deal gracefully with NULL pointers. If you disagree, feel free to come up with an alternative patch. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: mount failure return value fix	Dave Young	2011-01-16	2	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I happened to pass swap partition as root partition in cmdline, then kernel panic and tell me about "Cannot open root device". It is not correct, in fact it is a fs type mismatch instead of 'no device'. Eventually I found btrfs mounting failed with -EIO, it should be -EINVAL. The logic in init/do_mounts.c: for (p = fs_names; *p; p += strlen(p)+1) { int err = do_mount_root(name, p, flags, root_mount_data); switch (err) { case 0: goto out; case -EACCES: flags \|= MS_RDONLY; goto retry; case -EINVAL: continue; } print "Cannot open root device" panic } SO fs type after btrfs will have no chance to mount Here fix the return value as -EINVAL Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: Mem leak in btrfs_get_acl()	Jesper Juhl	2011-01-16	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It seems to me that we leak the memory allocated to 'value' in btrfs_get_acl() if the call to posix_acl_from_xattr() fails. Here's a patch that attempts to correct that problem. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: fix wrong free space information of btrfs	Miao Xie	2011-01-16	5	-7/+286
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we store data by raid profile in btrfs with two or more different size disks, df command shows there is some free space in the filesystem, but the user can not write any data in fact, df command shows the wrong free space information of btrfs. # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10 # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 28.00KB devid 1 size 5.01GB used 2.03GB path /dev/sda9 devid 2 size 10.00GB used 2.01GB path /dev/sda10 # btrfs device scan /dev/sda9 /dev/sda10 # mount /dev/sda9 /mnt # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999 (fill the filesystem) # sync # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 5.4G 62% /mnt # btrfs-show Label: none uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64 Total devices 2 FS bytes used 3.99GB devid 1 size 5.01GB used 5.01GB path /dev/sda9 devid 2 size 10.00GB used 4.99GB path /dev/sda10 It is because btrfs cannot allocate chunks when one of the pairing disks has no space, the free space on the other disks can not be used for ever, and should be subtracted from the total space, but btrfs doesn't subtract this space from the total. It is strange to the user. This patch fixes it by calcing the free space that can be used to allocate chunks. Implementation: 1. get all the devices free space, and align them by stripe length. 2. sort the devices by the free space. 3. check the free space of the devices, 3.1. if it is not zero, and then check the number of the devices that has more free space than this device, if the number of the devices is beyond the min stripe number, the free space can be used, and add into total free space. if the number of the devices is below the min stripe number, we can not use the free space, the check ends. 3.2. if the free space is zero, check the next devices, goto 3.1 This implementation is just likely fake chunk allocation. After appling this patch, df can show correct space information: # df -TH Filesystem Type Size Used Avail Use% Mounted on /dev/sda9 btrfs 17G 8.6G 0 100% /mnt Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: make the chunk allocator utilize the devices better	Miao Xie	2011-01-16	2	-103/+300
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With this patch, we change the handling method when we can not get enough free extents with default size. Implementation: 1. Look up the suitable free extent on each device and keep the search result. If not find a suitable free extent, keep the max free extent 2. If we get enough suitable free extents with default size, chunk allocation succeeds. 3. If we can not get enough free extents, but the number of the extent with default size is >= min_stripes, we just change the mapping information (reduce the number of stripes in the extent map), and chunk allocation succeeds. 4. If the number of the extent with default size is < min_stripes, sort the devices by its max free extent's size descending 5. Use the size of the max free extent on the (num_stripes - 1)th device as the stripe size to allocate the device space By this way, the chunk allocator can allocate chunks as large as possible when the devices' space is not enough and make full use of the devices. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: restructure find_free_dev_extent()	Miao Xie	2011-01-16	2	-68/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- make it return the start position and length of the max free space when it can not find a suitable free space. - make it more readability Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: fix wrong calculation of stripe size	Miao Xie	2011-01-16	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two tiny problem: - One is When we check the chunk size is greater than the max chunk size or not, we should take mirrors into account, but the original code didn't. - The other is btrfs shouldn't use the size of the residual free space as the length of of a dup chunk when doing chunk allocation. It is because the device space that a dup chunk needs is twice as large as the chunk size, if we use the size of the residual free space as the length of a dup chunk, we can not get enough free space. Fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: try to reclaim some space when chunk allocation fails	Miao Xie	2011-01-16	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We cannot write data into files when when there is tiny space in the filesystem. Reproduce steps: # mkfs.btrfs /dev/sda1 # mount /dev/sda1 /mnt # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1 # dd if=/dev/zero of=/mnt/tmpfile1 bs=4K count=99999999999999 (fill the filesystem) # umount /mnt # mount /dev/sda1 /mnt # rm -f /mnt/tmpfile0 # dd if=/dev/zero of=/mnt/tmpfile0 bs=4K count=1 (failed with nospec) But if we do the last step again, we can write data successfully. The reason of the problem is that btrfs didn't try to commit the current transaction and reclaim some space when chunk allocation failed. This patch fixes it by committing the current transaction to reclaim some space when chunk allocation fails. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	btrfs: fix wrong data space statistics	Miao Xie	2011-01-16	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Josef has implemented mixed data/metadata chunks, we must add those chunks' space just like data chunks. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	fs/btrfs: Fix build of ctree	Stefan Schmidt	2011-01-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CC [M] fs/btrfs/ctree.o In file included from fs/btrfs/ctree.c:21:0: fs/btrfs/ctree.h:1003:17: error: field <91>super_kobj<92> has incomplete type fs/btrfs/ctree.h:1074:17: error: field <91>root_kobj<92> has incomplete type make[2]: * [fs/btrfs/ctree.o] Error 1 make[1]: * [fs/btrfs] Error 2 make: *** [fs] Error 2 We need to include kobject.h here. Reported-by: Jeff Garzik <jeff@garzik.org> Fix-suggested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \|	Merge branch 'lzo-support' of git://repo.or.cz/linux-btrfs-devel into btrfs-38	Chris Mason	2011-01-16	20	-385/+1051
\| \|\ \
\| \| * \|	btrfs: Extract duplicate decompress code	Li Zefan	2010-12-22	4	-194/+115
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a common function to copy decompressed data from working buffer to bio pages. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| \| * \|	btrfs: Allow to specify compress method when defrag	Li Zefan	2010-12-22	2	-2/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update defrag ioctl, so one can choose lzo or zlib when turning on compression in defrag operation. Changelog: v1 -> v2 - Add incompability flag. - Fix to check invalid compress type. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| \| * \|	btrfs: Add lzo compression support	Li Zefan	2010-12-22	8	-8/+527
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Lzo is a much faster compression algorithm than gzib, so would allow more users to enable transparent compression, and some users can choose from compression ratio and speed for different applications Usage: # mount -t btrfs -o compress[=<zlib,lzo>] dev /mnt or # mount -t btrfs -o compress-force[=<zlib,lzo>] dev /mnt "-o compress" without argument is still allowed for compatability. Compatibility: If we mount a filesystem with lzo compression, it will not be able be mounted in old kernels. One reason is, otherwise btrfs will directly dump compressed data, which sits in inline extent, to user. Performance: The test copied a linux source tarball (~400M) from an ext4 partition to the btrfs partition, and then extracted it. (time in second) lzo zlib nocompress copy: 10.6 21.7 14.9 extract: 70.1 94.4 66.6 (data size in MB) lzo zlib nocompress copy: 185.87 108.69 394.49 extract: 193.80 132.36 381.21 Changelog: v1 -> v2: - Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig. - Add incompability flag. - Fix error handling in compress code. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| \| * \|	btrfs: Allow to add new compression algorithm	Li Zefan	2010-12-22	15	-282/+473
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make the code aware of compression type, instead of always assuming zlib compression. Also make the zlib workspace function as common code for all compression types. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| \| * \|	btrfs: Fix error handling in zlib	Li Zefan	2010-12-22	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Return failure if alloc_page() fails to allocate memory, and the upper code will just give up compression. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| \| * \|	btrfs: Fix bugs in zlib workspace	Li Zefan	2010-12-22	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Fix a race that can result in alloc_workspace > cpus. - Fix to check num_workspace after wakeup. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| * \| \|	Merge branch 'readonly-snapshots' of git://repo.or.cz/linux-btrfs-devel into ↵	Chris Mason	2011-01-16	7	-49/+195
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	btrfs-38
\| \| * \| \|	Btrfs: Add BTRFS_IOC_SUBVOL_GETFLAGS/SETFLAGS ioctls	Li Zefan	2010-12-23	2	-0/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows us to set a snapshot or a subvolume readonly or writable on the fly. Usage: Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and then call ioctl(BTRFS_IOCTL_SUBVOL_SETFLAGS); Changelog for v3: - Change to pass __u64 as ioctl parameter. Changelog for v2: - Add _GETFLAGS ioctl. - Check if the passed fd is the root of a subvolume. - Change the name from _SNAP_SETFLAGS to _SUBVOL_SETFLAGS. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| \| * \| \|	Btrfs: Add readonly snapshots support	Li Zefan	2010-12-23	7	-10/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Usage: Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and call ioctl(BTRFS_I0CTL_SNAP_CREATE_V2). Implementation: - Set readonly bit of btrfs_root_item->flags. - Add readonly checks in btrfs_permission (inode_permission), btrfs_setattr, btrfs_set/remove_xattr and some ioctls. Changelog for v3: - Eliminate btrfs_root->readonly, but check btrfs_root->root_item.flags. - Rename BTRFS_ROOT_SNAP_RDONLY to BTRFS_ROOT_SUBVOL_RDONLY. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| \| * \| \|	Btrfs: Refactor btrfs_ioctl_snap_create()	Li Zefan	2010-12-23	1	-44/+40
\| \| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Split it into two functions for two different ioctls, since they share no common code. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
\| * \| \|	Btrfs: fix off by one while setting block groups readonly	Chris Mason	2011-01-04	1	-1/+2
\| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we read in block groups, we'll set non-redundant groups readonly if we find a raid1, DUP or raid10 group. But the ro code has an off by one bug in the math around testing to make sure out accounting doesn't go wrong. Signed-off-by: Chris Mason <chris.mason@oracle.com>