summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | ipv4: fix uninit-value in ip_route_output_key_hash_rcu()Eric Dumazet2018-04-071-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot complained that res.type could be used while not initialized. Using RTN_UNSPEC as initial value seems better than using garbage. BUG: KMSAN: uninit-value in __mkroute_output net/ipv4/route.c:2200 [inline] BUG: KMSAN: uninit-value in ip_route_output_key_hash_rcu+0x31f0/0x3940 net/ipv4/route.c:2493 CPU: 1 PID: 12207 Comm: syz-executor0 Not tainted 4.16.0+ #81 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 __mkroute_output net/ipv4/route.c:2200 [inline] ip_route_output_key_hash_rcu+0x31f0/0x3940 net/ipv4/route.c:2493 ip_route_output_key_hash net/ipv4/route.c:2322 [inline] __ip_route_output_key include/net/route.h:126 [inline] ip_route_output_flow+0x1eb/0x3c0 net/ipv4/route.c:2577 raw_sendmsg+0x1861/0x3ed0 net/ipv4/raw.c:653 inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] SYSC_sendto+0x6c3/0x7e0 net/socket.c:1747 SyS_sendto+0x8a/0xb0 net/socket.c:1715 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x455259 RSP: 002b:00007fdc0625dc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007fdc0625e6d4 RCX: 0000000000455259 RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000020000080 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000004f7 R14: 00000000006fa7c8 R15: 0000000000000000 Local variable description: ----res.i.i@ip_route_output_flow Variable was created at: ip_route_output_flow+0x75/0x3c0 net/ipv4/route.c:2576 raw_sendmsg+0x1861/0x3ed0 net/ipv4/raw.c:653 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | dccp: initialize ireq->ir_markEric Dumazet2018-04-072-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported an uninit-value read of skb->mark in iptable_mangle_hook() Thanks to the nice report, I tracked the problem to dccp not caring of ireq->ir_mark for passive sessions. BUG: KMSAN: uninit-value in ipt_mangle_out net/ipv4/netfilter/iptable_mangle.c:66 [inline] BUG: KMSAN: uninit-value in iptable_mangle_hook+0x5e5/0x720 net/ipv4/netfilter/iptable_mangle.c:84 CPU: 0 PID: 5300 Comm: syz-executor3 Not tainted 4.16.0+ #81 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 ipt_mangle_out net/ipv4/netfilter/iptable_mangle.c:66 [inline] iptable_mangle_hook+0x5e5/0x720 net/ipv4/netfilter/iptable_mangle.c:84 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0x158/0x3d0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] __ip_local_out net/ipv4/ip_output.c:113 [inline] ip_local_out net/ipv4/ip_output.c:122 [inline] ip_queue_xmit+0x1d21/0x21c0 net/ipv4/ip_output.c:504 dccp_transmit_skb+0x15eb/0x1900 net/dccp/output.c:142 dccp_xmit_packet+0x814/0x9e0 net/dccp/output.c:281 dccp_write_xmit+0x20f/0x480 net/dccp/output.c:363 dccp_sendmsg+0x12ca/0x12d0 net/dccp/proto.c:818 inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmsg net/socket.c:2080 [inline] SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091 SyS_sendmsg+0x54/0x80 net/socket.c:2087 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x455259 RSP: 002b:00007f1a4473dc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f1a4473e6d4 RCX: 0000000000455259 RDX: 0000000000000000 RSI: 0000000020b76fc8 RDI: 0000000000000015 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000004f0 R14: 00000000006fa720 R15: 0000000000000000 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_save_stack mm/kmsan/kmsan.c:293 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521 ip_queue_xmit+0x1e35/0x21c0 net/ipv4/ip_output.c:502 dccp_transmit_skb+0x15eb/0x1900 net/dccp/output.c:142 dccp_xmit_packet+0x814/0x9e0 net/dccp/output.c:281 dccp_write_xmit+0x20f/0x480 net/dccp/output.c:363 dccp_sendmsg+0x12ca/0x12d0 net/dccp/proto.c:818 inet_sendmsg+0x48d/0x740 net/ipv4/af_inet.c:764 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg net/socket.c:640 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046 __sys_sendmsg net/socket.c:2080 [inline] SYSC_sendmsg+0x2a3/0x3d0 net/socket.c:2091 SyS_sendmsg+0x54/0x80 net/socket.c:2087 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_save_stack mm/kmsan/kmsan.c:293 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521 inet_csk_clone_lock+0x503/0x580 net/ipv4/inet_connection_sock.c:797 dccp_create_openreq_child+0x7f/0x890 net/dccp/minisocks.c:92 dccp_v4_request_recv_sock+0x22c/0xe90 net/dccp/ipv4.c:408 dccp_v6_request_recv_sock+0x290/0x2000 net/dccp/ipv6.c:414 dccp_check_req+0x7b9/0x8f0 net/dccp/minisocks.c:197 dccp_v4_rcv+0x12e4/0x2630 net/dccp/ipv4.c:840 ip_local_deliver_finish+0x6ed/0xd40 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:288 [inline] ip_local_deliver+0x43c/0x4e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:449 [inline] ip_rcv_finish+0x1253/0x16d0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:288 [inline] ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562 __netif_receive_skb net/core/dev.c:4627 [inline] process_backlog+0x62d/0xe20 net/core/dev.c:5307 napi_poll net/core/dev.c:5705 [inline] net_rx_action+0x7c1/0x1a70 net/core/dev.c:5771 __do_softirq+0x56d/0x93d kernel/softirq.c:285 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314 kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756 reqsk_alloc include/net/request_sock.h:88 [inline] inet_reqsk_alloc+0xc4/0x7f0 net/ipv4/tcp_input.c:6145 dccp_v4_conn_request+0x5cc/0x1770 net/dccp/ipv4.c:600 dccp_v6_conn_request+0x299/0x1880 net/dccp/ipv6.c:317 dccp_rcv_state_process+0x2ea/0x2410 net/dccp/input.c:612 dccp_v4_do_rcv+0x229/0x340 net/dccp/ipv4.c:682 dccp_v6_do_rcv+0x16d/0x1220 net/dccp/ipv6.c:578 sk_backlog_rcv include/net/sock.h:908 [inline] __sk_receive_skb+0x60e/0xf20 net/core/sock.c:513 dccp_v4_rcv+0x24d4/0x2630 net/dccp/ipv4.c:874 ip_local_deliver_finish+0x6ed/0xd40 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:288 [inline] ip_local_deliver+0x43c/0x4e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:449 [inline] ip_rcv_finish+0x1253/0x16d0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:288 [inline] ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562 __netif_receive_skb net/core/dev.c:4627 [inline] process_backlog+0x62d/0xe20 net/core/dev.c:5307 napi_poll net/core/dev.c:5705 [inline] net_rx_action+0x7c1/0x1a70 net/core/dev.c:5771 __do_softirq+0x56d/0x93d kernel/softirq.c:285 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: fix uninit-value in __hw_addr_add_ex()Eric Dumazet2018-04-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot complained : BUG: KMSAN: uninit-value in memcmp+0x119/0x180 lib/string.c:861 CPU: 0 PID: 3 Comm: kworker/0:0 Not tainted 4.16.0+ #82 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 memcmp+0x119/0x180 lib/string.c:861 __hw_addr_add_ex net/core/dev_addr_lists.c:60 [inline] __dev_mc_add+0x1c2/0x8e0 net/core/dev_addr_lists.c:670 dev_mc_add+0x6d/0x80 net/core/dev_addr_lists.c:687 igmp6_group_added+0x2db/0xa00 net/ipv6/mcast.c:662 ipv6_dev_mc_inc+0xe9e/0x1130 net/ipv6/mcast.c:914 addrconf_join_solict net/ipv6/addrconf.c:2078 [inline] addrconf_dad_begin net/ipv6/addrconf.c:3828 [inline] addrconf_dad_work+0x427/0x2150 net/ipv6/addrconf.c:3954 process_one_work+0x12c6/0x1f60 kernel/workqueue.c:2113 worker_thread+0x113c/0x24f0 kernel/workqueue.c:2247 kthread+0x539/0x720 kernel/kthread.c:239 Fixes: f001fde5eadd ("net: introduce a list of device addresses dev_addr_list (v6)") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: initialize skb->peeked when cloningEric Dumazet2018-04-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported __skb_try_recv_from_queue() was using skb->peeked while it was potentially unitialized. We need to clear it in __skb_clone() Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netlink: fix uninit-value in netlink_sendmsgEric Dumazet2018-04-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reported : BUG: KMSAN: uninit-value in ffs arch/x86/include/asm/bitops.h:432 [inline] BUG: KMSAN: uninit-value in netlink_sendmsg+0xb26/0x1310 net/netlink/af_netlink.c:1851 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net_sched: fix a missing idr_remove() in u32_delete_key()Cong Wang2018-04-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we delete a u32 key via u32_delete_key(), we forget to call idr_remove() to remove its handle from IDR. Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles") Reported-by: Marcin Kabiesz <admin@hostcenter.eu> Tested-by: Marcin Kabiesz <admin@hostcenter.eu> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net/sched: fix NULL dereference in the error path of tcf_bpf_init()Davide Caratti2018-04-051-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when tcf_bpf_init_from_ops() fails (e.g. because of program having invalid number of instructions), tcf_bpf_cfg_cleanup() calls bpf_prog_put(NULL) or bpf_prog_destroy(NULL). Unless CONFIG_BPF_SYSCALL is unset, this causes the following error: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 PGD 800000007345a067 P4D 800000007345a067 PUD 340e1067 PMD 0 Oops: 0000 [#1] SMP PTI Modules linked in: act_bpf(E) ip6table_filter ip6_tables iptable_filter binfmt_misc ext4 mbcache jbd2 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_generic pcbc snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd glue_helper cryptd joydev snd_timer snd virtio_balloon pcspkr soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm virtio_blk drm virtio_net virtio_console i2c_core crc32c_intel serio_raw virtio_pci ata_piix libata virtio_ring floppy virtio dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_bpf] CPU: 3 PID: 5654 Comm: tc Tainted: G E 4.16.0.bpf_test+ #408 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__bpf_prog_put+0xc/0xc0 RSP: 0018:ffff9594003ef728 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff9594003ef758 RCX: 0000000000000024 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044 R10: 0000000000000220 R11: ffff8a7ab9f17131 R12: 0000000000000000 R13: ffff8a7ab7c3c8e0 R14: 0000000000000001 R15: ffff8a7ab88f1054 FS: 00007fcb2f17c740(0000) GS:ffff8a7abfd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000020 CR3: 000000007c888006 CR4: 00000000001606e0 Call Trace: tcf_bpf_cfg_cleanup+0x2f/0x40 [act_bpf] tcf_bpf_cleanup+0x4c/0x70 [act_bpf] __tcf_idr_release+0x79/0x140 tcf_bpf_init+0x125/0x330 [act_bpf] tcf_action_init_1+0x2cc/0x430 ? get_page_from_freelist+0x3f0/0x11b0 tcf_action_init+0xd3/0x1b0 tc_ctl_action+0x18b/0x240 rtnetlink_rcv_msg+0x29c/0x310 ? _cond_resched+0x15/0x30 ? __kmalloc_node_track_caller+0x1b9/0x270 ? rtnl_calcit.isra.29+0x100/0x100 netlink_rcv_skb+0xd2/0x110 netlink_unicast+0x17c/0x230 netlink_sendmsg+0x2cd/0x3c0 sock_sendmsg+0x30/0x40 ___sys_sendmsg+0x27a/0x290 ? mem_cgroup_commit_charge+0x80/0x130 ? page_add_new_anon_rmap+0x73/0xc0 ? do_anonymous_page+0x2a2/0x560 ? __handle_mm_fault+0xc75/0xe20 __sys_sendmsg+0x58/0xa0 do_syscall_64+0x6e/0x1a0 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fcb2e58eba0 RSP: 002b:00007ffc93c496c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ffc93c497f0 RCX: 00007fcb2e58eba0 RDX: 0000000000000000 RSI: 00007ffc93c49740 RDI: 0000000000000003 RBP: 000000005ac6a646 R08: 0000000000000002 R09: 0000000000000000 R10: 00007ffc93c49120 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc93c49804 R14: 0000000000000001 R15: 000000000066afa0 Code: 5f 00 48 8b 43 20 48 c7 c7 70 2f 7c b8 c7 40 10 00 00 00 00 5b e9 a5 8b 61 00 0f 1f 44 00 00 0f 1f 44 00 00 41 54 55 48 89 fd 53 <48> 8b 47 20 f0 ff 08 74 05 5b 5d 41 5c c3 41 89 f4 0f 1f 44 00 RIP: __bpf_prog_put+0xc/0xc0 RSP: ffff9594003ef728 CR2: 0000000000000020 Fix it in tcf_bpf_cfg_cleanup(), ensuring that bpf_prog_{put,destroy}(f) is called only when f is not NULL. Fixes: bbc09e7842a5 ("net/sched: fix idr leak on the error path of tcf_bpf_init()") Reported-by: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net/ipv6: Increment OUTxxx counters after netfilter hookJeff Barnhill2018-04-051-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the end of ip6_forward(), IPSTATS_MIB_OUTFORWDATAGRAMS and IPSTATS_MIB_OUTOCTETS are incremented immediately before the NF_HOOK call for NFPROTO_IPV6 / NF_INET_FORWARD. As a result, these counters get incremented regardless of whether or not the netfilter hook allows the packet to continue being processed. This change increments the counters in ip6_forward_finish() so that it will not happen if the netfilter hook chooses to terminate the packet, which is similar to how IPv4 works. Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | arp: fix arp_filter on l3slave devicesMiguel Fadon Perlines2018-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arp_filter performs an ip_route_output search for arp source address and checks if output device is the same where the arp request was received, if it is not, the arp request is not answered. This route lookup is always done on main route table so l3slave devices never find the proper route and arp is not answered. Passing l3mdev_master_ifindex_rcu(dev) return value as oif fixes the lookup for l3slave devices while maintaining same behavior for non l3slave devices as this function returns 0 in that case. Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX") Signed-off-by: Miguel Fadon Perlines <mfadon@teldat.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | vti6: better validate user provided tunnel namesEric Dumazet2018-04-051-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use valid_name() to make sure user does not provide illegal device name. Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ip6_tunnel: better validate user provided tunnel namesEric Dumazet2018-04-051-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use valid_name() to make sure user does not provide illegal device name. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ip6_gre: better validate user provided tunnel namesEric Dumazet2018-04-051-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dev_valid_name() to make sure user does not provide illegal device name. syzbot caught the following bug : BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline] BUG: KASAN: stack-out-of-bounds in ip6gre_tunnel_locate+0x334/0x860 net/ipv6/ip6_gre.c:339 Write of size 20 at addr ffff8801afb9f7b8 by task syzkaller851048/4466 CPU: 1 PID: 4466 Comm: syzkaller851048 Not tainted 4.16.0+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x1b9/0x29f lib/dump_stack.c:53 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 memcpy+0x37/0x50 mm/kasan/kasan.c:303 strlcpy include/linux/string.h:300 [inline] ip6gre_tunnel_locate+0x334/0x860 net/ipv6/ip6_gre.c:339 ip6gre_tunnel_ioctl+0x69d/0x12e0 net/ipv6/ip6_gre.c:1195 dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334 dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525 sock_ioctl+0x47e/0x680 net/socket.c:1015 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 SYSC_ioctl fs/ioctl.c:708 [inline] SyS_ioctl+0x24/0x30 fs/ioctl.c:706 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipv6: sit: better validate user provided tunnel namesEric Dumazet2018-04-051-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dev_valid_name() to make sure user does not provide illegal device name. syzbot caught the following bug : BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline] BUG: KASAN: stack-out-of-bounds in ipip6_tunnel_locate+0x63b/0xaa0 net/ipv6/sit.c:254 Write of size 33 at addr ffff8801b64076d8 by task syzkaller932654/4453 CPU: 0 PID: 4453 Comm: syzkaller932654 Not tainted 4.16.0+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x1b9/0x29f lib/dump_stack.c:53 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 memcpy+0x37/0x50 mm/kasan/kasan.c:303 strlcpy include/linux/string.h:300 [inline] ipip6_tunnel_locate+0x63b/0xaa0 net/ipv6/sit.c:254 ipip6_tunnel_ioctl+0xe71/0x241b net/ipv6/sit.c:1221 dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334 dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525 sock_ioctl+0x47e/0x680 net/socket.c:1015 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 SYSC_ioctl fs/ioctl.c:708 [inline] SyS_ioctl+0x24/0x30 fs/ioctl.c:706 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ip_tunnel: better validate user provided tunnel namesEric Dumazet2018-04-051-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dev_valid_name() to make sure user does not provide illegal device name. syzbot caught the following bug : BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300 [inline] BUG: KASAN: stack-out-of-bounds in __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257 Write of size 20 at addr ffff8801ac79f810 by task syzkaller268107/4482 CPU: 0 PID: 4482 Comm: syzkaller268107 Not tainted 4.16.0+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x1b9/0x29f lib/dump_stack.c:53 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0xac/0x2f5 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 memcpy+0x37/0x50 mm/kasan/kasan.c:303 strlcpy include/linux/string.h:300 [inline] __ip_tunnel_create+0xca/0x6b0 net/ipv4/ip_tunnel.c:257 ip_tunnel_create net/ipv4/ip_tunnel.c:352 [inline] ip_tunnel_ioctl+0x818/0xd40 net/ipv4/ip_tunnel.c:861 ipip_tunnel_ioctl+0x1c5/0x420 net/ipv4/ipip.c:350 dev_ifsioc+0x43e/0xb90 net/core/dev_ioctl.c:334 dev_ioctl+0x69a/0xcc0 net/core/dev_ioctl.c:525 sock_ioctl+0x47e/0x680 net/socket.c:1015 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 SYSC_ioctl fs/ioctl.c:708 [inline] SyS_ioctl+0x24/0x30 fs/ioctl.c:706 do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: fool proof dev_valid_name()Eric Dumazet2018-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to use dev_valid_name() to validate tunnel names, so better use strnlen(name, IFNAMSIZ) than strlen(name) to make sure to not upset KASAN. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge tag 'selinux-pr-20180403' of ↵Linus Torvalds2018-04-068-27/+215
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux updates from Paul Moore: "A bigger than usual pull request for SELinux, 13 patches (lucky!) along with a scary looking diffstat. Although if you look a bit closer, excluding the usual minor tweaks/fixes, there are really only two significant changes in this pull request: the addition of proper SELinux access controls for SCTP and the encapsulation of a lot of internal SELinux state. The SCTP changes are the result of a multi-month effort (maybe even a year or longer?) between the SELinux folks and the SCTP folks to add proper SELinux controls. A special thanks go to Richard for seeing this through and keeping the effort moving forward. The state encapsulation work is a bit of janitorial work that came out of some early work on SELinux namespacing. The question of namespacing is still an open one, but I believe there is some real value in the encapsulation work so we've split that out and are now sending that up to you" * tag 'selinux-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: wrap AVC state selinux: wrap selinuxfs state selinux: fix handling of uninitialized selinux state in get_bools/classes selinux: Update SELinux SCTP documentation selinux: Fix ltp test connect-syscall failure selinux: rename the {is,set}_enforcing() functions selinux: wrap global selinux state selinux: fix typo in selinux_netlbl_sctp_sk_clone declaration selinux: Add SCTP support sctp: Add LSM hooks sctp: Add ip option support security: Add support for SCTP security hooks netlabel: If PF_INET6, check sk_buff ip header version
| * | | | sctp: Add LSM hooksRichard Haines2018-02-263-1/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add security hooks allowing security modules to exercise access control over SCTP. Signed-off-by: Richard Haines <richard_c_haines@btinternet.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
| * | | | sctp: Add ip option supportRichard Haines2018-02-265-26/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add ip option support to allow LSM security modules to utilise CIPSO/IPv4 and CALIPSO/IPv6 services. Signed-off-by: Richard Haines <richard_c_haines@btinternet.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
| * | | | netlabel: If PF_INET6, check sk_buff ip header versionRichard Haines2018-02-141-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When resolving a fallback label, check the sk_buff version as it is possible (e.g. SCTP) to have family = PF_INET6 while receiving ip_hdr(skb)->version = 4. Signed-off-by: Richard Haines <richard_c_haines@btinternet.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
* | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2018-04-063-5/+8
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - the v9fs maintainers have been missing for a long time. I've taken over v9fs patch slinging. - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (116 commits) mm,oom_reaper: check for MMF_OOM_SKIP before complaining mm/ksm: fix interaction with THP mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t headers: untangle kmemleak.h from mm.h include/linux/mmdebug.h: make VM_WARN* non-rvals mm/page_isolation.c: make start_isolate_page_range() fail if already isolated mm: change return type to vm_fault_t mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory kernel/fork.c: detect early free of a live mm mm: make counting of list_lru_one::nr_items lockless mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static block_invalidatepage(): only release page if the full page was invalidated mm: kernel-doc: add missing parameter descriptions mm/swap.c: remove @cold parameter description for release_pages() mm/nommu: remove description of alloc_vm_area zram: drop max_zpage_size and use zs_huge_class_size() zsmalloc: introduce zs_huge_class_size() mm: fix races between swapoff and flush dcache fs/direct-io.c: minor cleanups in do_blockdev_direct_IO ...
| * | | | | headers: untangle kmemleak.h from mm.hRandy Dunlap2018-04-052-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious reason. It looks like it's only a convenience, so remove kmemleak.h from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that don't already #include it. Also remove <linux/kmemleak.h> from source files that do not use it. This is tested on i386 allmodconfig and x86_64 allmodconfig. It would be good to run it through the 0day bot for other $ARCHes. I have neither the horsepower nor the storage space for the other $ARCHes. Update: This patch has been extensively build-tested by both the 0day bot & kisskb/ozlabs build farms. Both of them reported 2 build failures for which patches are included here (in v2). [ slab.h is the second most used header file after module.h; kernel.h is right there with slab.h. There could be some minor error in the counting due to some #includes having comments after them and I didn't combine all of those. ] [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr] Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org Link: http://kisskb.ellerman.id.au/kisskb/head/13396/ Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> [2 build failures] Reported-by: Fengguang Wu <fengguang.wu@intel.com> [2 build failures] Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | net/9p/client.c: fix potential refcnt problem of trans moduleChengguang Xu2018-04-051-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When specifying trans_mod multiple times in a mount, it will cause an inaccurate refcount of the trans module. Also, in the error case of option parsing, we should put the trans module if we have already got it. Link: http://lkml.kernel.org/r/1522154942-57339-1-git-send-email-cgxu519@gmx.com Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Miller <davem@davemloft.net> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | net/9p: avoid -ERESTARTSYS leak to userspaceGreg Kurz2018-04-051-2/+4
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If it was interrupted by a signal, the 9p client may need to send some more requests to the server for cleanup before returning to userspace. To avoid such a last minute request to be interrupted right away, the client memorizes if a signal is pending, clears TIF_SIGPENDING, handles the request and calls recalc_sigpending() before returning. Unfortunately, if the transmission of this cleanup request fails for any reason, the transport returns an error and the client propagates it right away, without calling recalc_sigpending(). This ends up with -ERESTARTSYS from the initially interrupted request crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup request. The specific signal handling code, which is responsible for converting -ERESTARTSYS to -EINTR is not called, and userspace receives the confusing errno value: open: Unknown error 512 (512) This is really hard to hit in real life. I discovered the issue while working on hot-unplug of a virtio-9p-pci device with an instrumented QEMU allowing to control request completion. Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy error path actually. Their code flow is a bit obscure and the best thing to do would probably be a full rewrite: to really ensure this situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can never happen. But given the general lack of interest for the 9p code, I won't risk breaking more things. So this patch simply fixes the buggy paths in both functions with a trivial label+goto. Thanks to Laurent Dufour for his help and suggestions on how to find the root cause and how to fix it. Link: http://lkml.kernel.org/r/152062809886.10599.7361006774123053312.stgit@bahia.lan Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: David Miller <davem@davemloft.net> Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2018-04-0510-90/+189
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on server xdr decoding (with an eye towards eliminating a data copy in the RDMA case). I did some refactoring of the delegation code in preparation for eliminating some delegation self-conflicts and implementing write delegations" * tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linux: (40 commits) nfsd: fix incorrect umasks sunrpc: remove incorrect HMAC request initialization NFSD: Clean up legacy NFS SYMLINK argument XDR decoders NFSD: Clean up legacy NFS WRITE argument XDR decoders nfsd: Trace NFSv4 COMPOUND execution nfsd: Add I/O trace points in the NFSv4 read proc nfsd: Add I/O trace points in the NFSv4 write path nfsd: Add "nfsd_" to trace point names nfsd: Record request byte count, not count of vectors nfsd: Fix NFSD trace points svc: Report xprt dequeue latency sunrpc: Report per-RPC execution stats sunrpc: Re-purpose trace_svc_process sunrpc: Save remote presentation address in svc_xprt for trace events sunrpc: Simplify trace_svc_recv sunrpc: Simplify do_enqueue tracing sunrpc: Move trace_svc_xprt_dequeue() sunrpc: Update show_svc_xprt_flags() to include recently added flags svc: Simplify ->xpo_secure_port sunrpc: Remove unneeded pointer dereference ...
| * | | | | sunrpc: remove incorrect HMAC request initializationEric Biggers2018-04-031-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | make_checksum_hmac_md5() is allocating an HMAC transform and doing crypto API calls in the following order: crypto_ahash_init() crypto_ahash_setkey() crypto_ahash_digest() This is wrong because it makes no sense to init() the request before a key has been set, given that the initial state depends on the key. And digest() is short for init() + update() + final(), so in this case there's no need to explicitly call init() at all. Before commit 9fa68f620041 ("crypto: hash - prevent using keyed hashes without setting key") the extra init() had no real effect, at least for the software HMAC implementation. (There are also hardware drivers that implement HMAC-MD5, and it's not immediately obvious how gracefully they handle init() before setkey().) But now the crypto API detects this incorrect initialization and returns -ENOKEY. This is breaking NFS mounts in some cases. Fix it by removing the incorrect call to crypto_ahash_init(). Reported-by: Michael Young <m.a.young@durham.ac.uk> Fixes: 9fa68f620041 ("crypto: hash - prevent using keyed hashes without setting key") Fixes: fffdaef2eb4a ("gss_krb5: Add support for rc4-hmac encryption") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | NFSD: Clean up legacy NFS SYMLINK argument XDR decodersChuck Lever2018-04-031-0/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move common code in NFSD's legacy SYMLINK decoders into a helper. The immediate benefits include: - one fewer data copies on transports that support DDP - consistent error checking across all versions - reduction of code duplication - support for both legal forms of SYMLINK requests on RDMA transports for all versions of NFS (in particular, NFSv2, for completeness) In the long term, this helper is an appropriate spot to perform a per-transport call-out to fill the pathname argument using, say, RDMA Reads. Filling the pathname in the proc function also means that eventually the incoming filehandle can be interpreted so that filesystem- specific memory can be allocated as a sink for the pathname argument, rather than using anonymous pages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | NFSD: Clean up legacy NFS WRITE argument XDR decodersChuck Lever2018-04-031-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move common code in NFSD's legacy NFS WRITE decoders into a helper. The immediate benefit is reduction of code duplication and some nice micro-optimizations (see below). In the long term, this helper can perform a per-transport call-out to fill the rq_vec (say, using RDMA Reads). The legacy WRITE decoders and procs are changed to work like NFSv4, which constructs the rq_vec just before it is about to call vfs_writev. Why? Calling a transport call-out from the proc instead of the XDR decoder means that the incoming FH can be resolved to a particular filesystem and file. This would allow pages from the backing file to be presented to the transport to be filled, rather than presenting anonymous pages and copying or flipping them into the file's page cache later. I also prefer using the pages in rq_arg.pages, instead of pulling the data pages directly out of the rqstp::rq_pages array. This is currently the way the NFSv3 write decoder works, but the other two do not seem to take this approach. Fixing this removes the only reference to rq_pages found in NFSD, eliminating an NFSD assumption about how transports use the pages in rq_pages. Lastly, avoid setting up the first element of rq_vec as a zero- length buffer. This happens with an RDMA transport when a normal Read chunk is present because the data payload is in rq_arg's page list (none of it is in the head buffer). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | svc: Report xprt dequeue latencyChuck Lever2018-04-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Record the time between when a rqstp is enqueued on a transport and when it is dequeued. This includes how long the rqstp waits on the queue and how long it takes the kernel scheduler to wake a nfsd thread to service it. The svc_xprt_dequeue trace point is altered to include the number of microseconds between xprt_enqueue and xprt_dequeue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | sunrpc: Report per-RPC execution statsChuck Lever2018-04-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a mechanism to report the server-side execution latency of each RPC. The goal is to enable user space to filter the trace record for latency outliers, build histograms, etc. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | sunrpc: Re-purpose trace_svc_processChuck Lever2018-04-031-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, trace_svc_process has two call sites: 1. Just after a call to svc_send. svc_send already invokes trace_svc_send with the same arguments just before returning 2. Just before a call to svc_drop. svc_drop already invokes trace_svc_drop with the same arguments just after it is called Therefore trace_svc_process does not provide any additional information not already provided by these other trace points. However, it would be useful to record the incoming RPC procedure. So reuse trace_svc_process for this purpose. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | sunrpc: Save remote presentation address in svc_xprt for trace eventsChuck Lever2018-04-033-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TP_printk defines a format string that is passed to user space for converting raw trace event records to something human-readable. My user space's printf (Oracle Linux 7), however, does not have a %pI format specifier. The result is that what is supposed to be an IP address in the output of "trace-cmd report" is just a string that says the field couldn't be displayed. To fix this, adopt the same approach as the client: maintain a pre- formated presentation address for occasions when %pI is not available. The location of the trace_svc_send trace point is adjusted so that rqst->rq_xprt is not NULL when the trace event is recorded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | sunrpc: Simplify trace_svc_recvChuck Lever2018-04-031-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There doesn't seem to be a lot of value in calling trace_svc_recv in the failing case. 1. There are two very common cases: one is the transport is not ready, and the other is shutdown. Neither is terribly interesting. 2. The trace record for the failing case contains nothing but the status code. Therefore the trace point call site in the error exit is removed. Since the trace point is now recording a length instead of a status, rename the status field and remove the case that records a zero XID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | sunrpc: Simplify do_enqueue tracingChuck Lever2018-04-031-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are three cases where svc_xprt_do_enqueue() returns without waking an nfsd thread: 1. There is no work to do 2. The transport is already busy 3. There are no available nfsd threads Only 3. is truly interesting. Move the trace point so it records that there was work to do and either an nfsd thread was awoken, or a free one could not found. As an additional clean up, remove a redundant comment and a couple of dprintk call sites. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | sunrpc: Move trace_svc_xprt_dequeue()Chuck Lever2018-04-031-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduce the amount of noise generated by trace_svc_xprt_dequeue by moving it to the end of svc_get_next_xprt. This generates exactly one trace event when a ready xprt is found, rather than spurious events when there is no work to do. The empty events contain no information that can't be obtained simply by tracing function calls to svc_xprt_dequeue. A small additional benefit is simplification of the svc_xprt_event trace class, which no longer has to handle the case when the @xprt parameter is NULL. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | svc: Simplify ->xpo_secure_portChuck Lever2018-04-033-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: Instead of returning a value that is used to set or clear a bit, just make ->xpo_secure_port mangle that bit, and return void. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | sunrpc: Remove unneeded pointer dereferenceChuck Lever2018-04-031-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: Noticed during code inspection that there is already a local automatic variable "xprt" so dereferencing rqst->rq_xprt again is unnecessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | svcrdma: Clean up rdma_build_arg_xdrChuck Lever2018-03-201-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: The value of the byte_count parameter is already passed to rdma_build_arg_xdr as part of the svc_rdma_op_ctxt structure. Further, without the parameter called "byte_count" there is no need to have the abbreviated "bc" automatic variable. "bc" can now be called something more intuitive. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | svcrdma: Consult max_qp_init_rd_atom when accepting connectionsChuck Lever2018-03-202-15/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The target needs to return the lesser of the client's Inbound RDMA Read Queue Depth (IRD), provided in the connection parameters, and the local device's Outbound RDMA Read Queue Depth (ORD). The latter limit is max_qp_init_rd_atom, not max_qp_rd_atom. The svcrdma_ord value caps the ORD value for iWARP transports, which do not exchange ORD/IRD values at connection time. Since no other Linux kernel RDMA-enabled storage target sees fit to provide this cap, I'm removing it here too. initiator_depth is a u8, so ensure the computed ORD value does not overflow that field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | svcrdma: Use pr_err to report Receive errorsChuck Lever2018-03-201-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: Other completion handlers use pr_err, not pr_warn. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | SUNRPC: cache: ignore timestamp written to 'flush' file.NeilBrown2018-03-191-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The interface for flushing the sunrpc auth cache was poorly designed and has caused problems a number of times. The design is that you write a timestamp, and all entries created before that time are discarded. The most obvious problem is that this is not what people actually want. They want to just flush the whole cache. The 1-second granularity can be a problem, as can the use of wall-clock time. A current problem is that code will write the current time to this file - expecting it to clear everything - and if the seconds number ticks over before this timestamp is checked, the test "then >= now" fails, and a full flush isn't forced. So lets just drop the subtleties and always flush the whole cache. The worst this could do is impose an extra cost refilling it, but that would require someone to be using non-standard tools. We still report an error if the string written is not a number, but we cause any valid number to flush the whole cache. Reported-by: "Wang, Alan 1. (NSB - CN/Hangzhou)" <alan.1.wang@nokia-sbell.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | sunrpc: Fix unaligned access on sparc64James Ettle2018-03-192-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix unaligned access in gss_{get,verify}_mic_v2() on sparc64 Signed-off-by: James Ettle <james@ettle.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | | Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-blockLinus Torvalds2018-04-051-0/+23
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer updates from Jens Axboe: "It's a pretty quiet round this time, which is nice. This contains: - series from Bart, cleaning up the way we set/test/clear atomic queue flags. - series from Bart, fixing races between gendisk and queue registration and removal. - set of bcache fixes and improvements from various folks, by way of Michael Lyle. - set of lightnvm updates from Matias, most of it being the 1.2 to 2.0 transition. - removal of unused DIO flags from Nikolay. - blk-mq/sbitmap memory ordering fixes from Omar. - divide-by-zero fix for BFQ from Paolo. - minor documentation patches from Randy. - timeout fix from Tejun. - Alpha "can't write a char atomically" fix from Mikulas. - set of NVMe fixes by way of Keith. - bsg and bsg-lib improvements from Christoph. - a few sed-opal fixes from Jonas. - cdrom check-disk-change deadlock fix from Maurizio. - various little fixes, comment fixes, etc from various folks" * tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits) blk-mq: Directly schedule q->timeout_work when aborting a request blktrace: fix comment in blktrace_api.h lightnvm: remove function name in strings lightnvm: pblk: remove some unnecessary NULL checks lightnvm: pblk: don't recover unwritten lines lightnvm: pblk: implement 2.0 support lightnvm: pblk: implement get log report chunk lightnvm: pblk: rename ppaf* to addrf* lightnvm: pblk: check for supported version lightnvm: implement get log report chunk helpers lightnvm: make address conversions depend on generic device lightnvm: add support for 2.0 address format lightnvm: normalize geometry nomenclature lightnvm: complete geo structure with maxoc* lightnvm: add shorten OCSSD version in geo lightnvm: add minor version to generic geometry lightnvm: simplify geometry structure lightnvm: pblk: refactor init/exit sequences lightnvm: Avoid validation of default op value lightnvm: centralize permission check for lightnvm ioctl ...
| * | | | | | net/utils: Introduce inet_addr_is_anySagi Grimberg2018-03-261-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Can be useful to check INET_ANY address for both ipv4/ipv6 addresses. Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2018-04-051-1/+1
|\ \ \ \ \ \ \ | |_|_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: kfifo: fix inaccurate comment tools/thermal: tmon: fix for segfault net: Spelling s/stucture/structure/ edd: don't spam log if no EDD information is present Documentation: Fix early-microcode.txt references after file rename tracing: Block comments should align the * on each line treewide: Fix typos in printk GenWQE: Fix a typo in two comments treewide: Align function definition open/close braces
| * | | | | | net: Spelling s/stucture/structure/Geert Uytterhoeven2018-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | | | | | inet: frags: fix ip6frag_low_thresh boundaryEric Dumazet2018-04-044-9/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches, since linker might place next to it a non zero value preventing a change to ip6frag_low_thresh. ip6frag_low_thresh is not used anymore in the kernel, but we do not want to prematuraly break user scripts wanting to change it. Since specifying a minimal value of 0 for proc_doulongvec_minmax() is moot, let's remove these zero values in all defrag units. Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | tipc: Fix namespace violation in tipc_sk_fill_sock_diagGhantaKrishnamurthy MohanKrishna2018-04-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To fetch UID info for socket diagnostics, we determine the namespace of user context using tipc socket instance. This may cause namespace violation, as the kernel will remap based on UID. We fix this by fetching namespace info using the calling userspace netlink socket. Fixes: c30b70deb5f4 (tipc: implement socket diagnostics for AF_TIPC) Reported-by: syzbot+326e587eff1074657718@syzkaller.appspotmail.com Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | net: avoid unneeded atomic operation in ip*_append_data()Paolo Abeni2018-04-042-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 694aba690de0 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()") and commit 1f4c6eb24029 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()"), when transmitting sub MTU datagram, an addtional, unneeded atomic operation is performed in ip*_append_data() to update wmem_alloc: in the above condition the delta is 0. The above cause small but measurable performance regression in UDP xmit tput test with packet size below MTU. This change avoids such overhead updating wmem_alloc only if wmem_alloc_delta is non zero. The error path is left intentionally unmodified: it's a slow path and simplicity is preferred to performances. Fixes: 694aba690de0 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()") Fixes: 1f4c6eb24029 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | tipc: Fix missing list initializations in struct tipc_subscriptionJon Maloy2018-04-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an item of struct tipc_subscription is created, we fail to initialize the two lists aggregated into the struct. This has so far never been a problem, since the items are just added to a root object by list_add(), which does not require the addee list to be pre-initialized. However, syzbot is provoking situations where this addition fails, whereupon the attempted removal if the item from the list causes a crash. This problem seems to always have been around, despite that the code for creating this object was rewritten in commit 242e82cc95f6 ("tipc: collapse subscription creation functions"), which is still in net-next. We fix this for that commit by initializing the two lists properly. Fixes: 242e82cc95f6 ("tipc: collapse subscription creation functions") Reported-by: syzbot+0bb443b74ce09197e970@syzkaller.appspotmail.com Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | ipv6: udp: set dst cache for a connected sk if current not validAlexey Kodanev2018-04-041-19/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new RTF_CACHE route can be created between ip6_sk_dst_lookup_flow() and ip6_dst_store() calls in udpv6_sendmsg(), when datagram sending results to ICMPV6_PKT_TOOBIG error: udp_v6_send_skb(), for example with vti6 tunnel: vti6_xmit(), get ICMPV6_PKT_TOOBIG error skb_dst_update_pmtu(), can create a RTF_CACHE clone icmpv6_send() ... udpv6_err() ip6_sk_update_pmtu() ip6_update_pmtu(), can create a RTF_CACHE clone ... ip6_datagram_dst_update() ip6_dst_store() And after commit 33c162a980fe ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update"), the UDPv6 error handler can update socket's dst cache, but it can happen before the update in the end of udpv6_sendmsg(), preventing getting the new dst cache on the next udpv6_sendmsg() calls. In order to fix it, save dst in a connected socket only if the current socket's dst cache is invalid. The previous patch prepared ip6_sk_dst_lookup_flow() to do that with the new argument, and this patch enables it in udpv6_sendmsg(). Fixes: 33c162a980fe ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update") Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
OpenPOWER on IntegriCloud