Commit Graph

948892 Commits

Author SHA1 Message Date
Jimmy Assarsson
66648766ef mm: Remove MPX leftovers
Remove MPX leftovers in generic code.

Fixes: 45fc24e89b ("x86/mpx: remove MPX from arch/x86")
Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20200402172507.2786-1-jimmyassarsson@gmail.com
2020-04-22 21:02:35 +02:00
Xi Wang
d563099e3e RDMA/hns: Support 0 hop addressing for WQE buffer
Add the zero hop addressing support by using new mtr interface for WQE
buffer and simple mtr invoking process, so WQE buffer can support hopnum
between 0 to 3.

Link: https://lore.kernel.org/r/1586779091-51410-5-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:59:54 -03:00
Xi Wang
477a0a3870 RDMA/hns: Optimize 0 hop addressing for EQE buffer
Use the new mtr interface to simple the hop 0 addressing and multihop
addressing process.

Link: https://lore.kernel.org/r/1586779091-51410-4-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:59:54 -03:00
Xi Wang
cc23267aed RDMA/hns: Optimize hns buffer allocation flow
When the value of nbufs is 1, the buffer is in direct mode, which may cause
confusion. So optimizes current codes to make it easier to maintain and
understand.

Link: https://lore.kernel.org/r/1586779091-51410-3-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:59:54 -03:00
Xi Wang
3c873161a0 RDMA/hns: Add support for addressing when hopnum is 0
Currently, WQE and EQE table have already used the mtr interface to config
and access memory by multi-hop addressing when hopnum is from 1 to 3. But
if hopnum is 0, each table need write its own but repetitive logic, and
many duplicate code exists in the mtr interfaces invoke process.

So wraps the public logic as 3 functions: hns_roce_mtr_create(),
hns_roce_mtr_destroy() and hns_roce_mtr_map() to support hopnum ranges from
0 to 3. In addition, makes the mtr interfaces easier to use.

Link: https://lore.kernel.org/r/1586779091-51410-2-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:59:53 -03:00
Leon Romanovsky
83a2670212 RDMA/core: Fix overwriting of uobj in case of error
In case of failure to get file, the uobj is overwritten and causes to
supply bad pointer as an input to uverbs_uobject_put().

  BUG: KASAN: null-ptr-deref in atomic_fetch_sub include/asm-generic/atomic-instrumented.h:199 [inline]
  BUG: KASAN: null-ptr-deref in refcount_sub_and_test include/linux/refcount.h:253 [inline]
  BUG: KASAN: null-ptr-deref in refcount_dec_and_test include/linux/refcount.h:281 [inline]
  BUG: KASAN: null-ptr-deref in kref_put include/linux/kref.h:64 [inline]
  BUG: KASAN: null-ptr-deref in uverbs_uobject_put+0x22/0x90 drivers/infiniband/core/rdma_core.c:57
  Write of size 4 at addr 0000000000000030 by task syz-executor.4/1691

  CPU: 1 PID: 1691 Comm: syz-executor.4 Not tainted 5.6.0 #17
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x94/0xce lib/dump_stack.c:118
   __kasan_report+0x10c/0x190 mm/kasan/report.c:515
   kasan_report+0x32/0x50 mm/kasan/common.c:625
   check_memory_region_inline mm/kasan/generic.c:187 [inline]
   check_memory_region+0x16d/0x1c0 mm/kasan/generic.c:193
   atomic_fetch_sub include/asm-generic/atomic-instrumented.h:199 [inline]
   refcount_sub_and_test include/linux/refcount.h:253 [inline]
   refcount_dec_and_test include/linux/refcount.h:281 [inline]
   kref_put include/linux/kref.h:64 [inline]
   uverbs_uobject_put+0x22/0x90 drivers/infiniband/core/rdma_core.c:57
   alloc_begin_fd_uobject+0x1d0/0x250 drivers/infiniband/core/rdma_core.c:486
   rdma_alloc_begin_uobject+0xa8/0xf0 drivers/infiniband/core/rdma_core.c:509
   __uobj_alloc include/rdma/uverbs_std_types.h:117 [inline]
   ib_uverbs_create_comp_channel+0x16d/0x230 drivers/infiniband/core/uverbs_cmd.c:982
   ib_uverbs_write+0xaa5/0xdf0 drivers/infiniband/core/uverbs_main.c:665
   __vfs_write+0x7c/0x100 fs/read_write.c:494
   vfs_write+0x168/0x4a0 fs/read_write.c:558
   ksys_write+0xc8/0x200 fs/read_write.c:611
   do_syscall_64+0x9c/0x390 arch/x86/entry/common.c:295
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x466479
  Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007efe9f6a7c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000466479
  RDX: 0000000000000018 RSI: 0000000020000040 RDI: 0000000000000003
  RBP: 00007efe9f6a86bc R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
  R13: 0000000000000bf2 R14: 00000000004cb80a R15: 00000000006fefc0

Fixes: 849e149063 ("RDMA/core: Do not allow alloc_commit to fail")
Link: https://lore.kernel.org/r/20200421082929.311931-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:53:42 -03:00
Leon Romanovsky
0fb00941dc RDMA/core: Prevent mixed use of FDs between shared ufiles
FDs can only be used on the ufile that created them, they cannot be mixed
to other ufiles. We are lacking a check to prevent it.

  BUG: KASAN: null-ptr-deref in atomic64_sub_and_test include/asm-generic/atomic-instrumented.h:1547 [inline]
  BUG: KASAN: null-ptr-deref in atomic_long_sub_and_test include/asm-generic/atomic-long.h:460 [inline]
  BUG: KASAN: null-ptr-deref in fput_many+0x1a/0x140 fs/file_table.c:336
  Write of size 8 at addr 0000000000000038 by task syz-executor179/284

  CPU: 0 PID: 284 Comm: syz-executor179 Not tainted 5.5.0-rc5+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x94/0xce lib/dump_stack.c:118
   __kasan_report+0x18f/0x1b7 mm/kasan/report.c:510
   kasan_report+0xe/0x20 mm/kasan/common.c:639
   check_memory_region_inline mm/kasan/generic.c:185 [inline]
   check_memory_region+0x15d/0x1b0 mm/kasan/generic.c:192
   atomic64_sub_and_test include/asm-generic/atomic-instrumented.h:1547 [inline]
   atomic_long_sub_and_test include/asm-generic/atomic-long.h:460 [inline]
   fput_many+0x1a/0x140 fs/file_table.c:336
   rdma_lookup_put_uobject+0x85/0x130 drivers/infiniband/core/rdma_core.c:692
   uobj_put_read include/rdma/uverbs_std_types.h:96 [inline]
   _ib_uverbs_lookup_comp_file drivers/infiniband/core/uverbs_cmd.c:198 [inline]
   create_cq+0x375/0xba0 drivers/infiniband/core/uverbs_cmd.c:1006
   ib_uverbs_create_cq+0x114/0x140 drivers/infiniband/core/uverbs_cmd.c:1089
   ib_uverbs_write+0xaa5/0xdf0 drivers/infiniband/core/uverbs_main.c:769
   __vfs_write+0x7c/0x100 fs/read_write.c:494
   vfs_write+0x168/0x4a0 fs/read_write.c:558
   ksys_write+0xc8/0x200 fs/read_write.c:611
   do_syscall_64+0x9c/0x390 arch/x86/entry/common.c:294
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x44ef99
  Code: 00 b8 00 01 00 00 eb e1 e8 74 1c 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c4 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007ffc0b74c028 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 00007ffc0b74c030 RCX: 000000000044ef99
  RDX: 0000000000000040 RSI: 0000000020000040 RDI: 0000000000000005
  RBP: 00007ffc0b74c038 R08: 0000000000401830 R09: 0000000000401830
  R10: 00007ffc0b74c038 R11: 0000000000000246 R12: 0000000000000000
  R13: 0000000000000000 R14: 00000000006be018 R15: 0000000000000000

Fixes: cf8966b347 ("IB/core: Add support for fd objects")
Link: https://lore.kernel.org/r/20200421082929.311931-2-leon@kernel.org
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:53:23 -03:00
Jin Yao
0e0bf1ea11 perf stat: Zero all the 'ena' and 'run' array slot stats for interval mode
As the code comments in perf_stat_process_counter() say, we calculate
counter's data every interval, and the display code shows ps->res_stats
avg value. We need to zero the stats for interval mode.

But the current code only zeros the res_stats[0], it doesn't zero the
res_stats[1] and res_stats[2], which are for ena and run of counter.

This patch zeros the whole res_stats[] for interval mode.

Fixes: 51fd2df1e8 ("perf stat: Fix interval output values")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200409070755.17261-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-22 15:51:01 -03:00
Jason Gunthorpe
39c011a538 RDMA/uverbs: Fix a race with disassociate and exit_mmap()
If uverbs_user_mmap_disassociate() is called while the mmap is
concurrently doing exit_mmap then the ordering of the
rdma_user_mmap_entry_put() is not reliable.

The put must be done before uvers_user_mmap_disassociate() returns,
otherwise there can be a use after free on the ucontext, and a left over
entry in the xarray. If the put is not done here then it is done during
rdma_umap_close() later.

Add the missing put to the error exit path.

  WARNING: CPU: 7 PID: 7111 at drivers/infiniband/core/rdma_core.c:810 uverbs_destroy_ufile_hw+0x2a5/0x340 [ib_uverbs]
  Modules linked in: bonding ipip tunnel4 geneve ip6_udp_tunnel udp_tunnel ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre mlx5_ib mlx5_core mlxfw pci_hyperv_intf act_ct nf_flow_table ptp pps_core rdma_ucm ib_uverbs ib_ipoib ib_umad 8021q garp mrp openvswitch nsh nf_conncount nfsv3 nfs_acl xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache overlay rpcrdma ib_isert iscsi_target_mod ib_iser kvm_intel ib_srpt iTCO_wdt target_core_mod iTCO_vendor_support kvm ib_srp nf_nat irqbypass crc32_pclmul crc32c_intel nf_conntrack rfkill nf_defrag_ipv6 virtio_net nf_defrag_ipv4 pcspkr ghash_clmulni_intel i2c_i801 net_failover failover i2c_core lpc_ich mfd_core rdma_cm ib_cm iw_cm button ib_core sunrpc sch_fq_codel ip_tables serio_raw [last unloaded: tunnel4]
  CPU: 7 PID: 7111 Comm: python3 Tainted: G        W         5.6.0-rc6-for-upstream-dbg-2020-03-21_06-41-26-18 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
  RIP: 0010:uverbs_destroy_ufile_hw+0x2a5/0x340 [ib_uverbs]
  Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 75 74 49 8b 84 24 08 01 00 00 48 85 c0 0f 84 13 ff ff ff 48 89 ef ff d0 e9 09 ff ff ff <0f> 0b e9 77 ff ff ff e8 0f d8 fa e0 e9 c5 fd ff ff e8 05 d8 fa e0
  RSP: 0018:ffff88840e0779a0 EFLAGS: 00010286
  RAX: dffffc0000000000 RBX: ffff8882a7721c00 RCX: 0000000000000000
  RDX: 1ffff11054ee469f RSI: ffffffff8446d7e0 RDI: ffff8882a77234f8
  RBP: ffff8882a7723400 R08: ffffed1085c0112c R09: 0000000000000001
  R10: 0000000000000001 R11: ffffed1085c0112b R12: ffff888403c30000
  R13: 0000000000000002 R14: ffff8882a7721cb0 R15: ffff8882a7721cd0
  FS:  00007f2046089700(0000) GS:ffff88842de00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f7cfe9a6e20 CR3: 000000040b8ac006 CR4: 0000000000360ee0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   ib_uverbs_remove_one+0x273/0x480 [ib_uverbs]
   ? up_write+0x15c/0x4a0
   remove_client_context+0xa6/0xf0 [ib_core]
   disable_device+0x12d/0x200 [ib_core]
   ? remove_client_context+0xf0/0xf0 [ib_core]
   ? mnt_get_count+0x1d0/0x1d0
   __ib_unregister_device+0x79/0x150 [ib_core]
   ib_unregister_device+0x21/0x30 [ib_core]
   __mlx5_ib_remove+0x91/0x110 [mlx5_ib]
   ? __mlx5_ib_remove+0x110/0x110 [mlx5_ib]
   mlx5_remove_device+0x241/0x310 [mlx5_core]
   mlx5_unregister_device+0x4d/0x1e0 [mlx5_core]
   mlx5_unload_one+0xc0/0x260 [mlx5_core]
   remove_one+0x5c/0x160 [mlx5_core]
   pci_device_remove+0xef/0x2a0
   ? pcibios_free_irq+0x10/0x10
   device_release_driver_internal+0x1d8/0x470
   unbind_store+0x152/0x200
   ? sysfs_kf_write+0x3b/0x180
   ? sysfs_file_ops+0x160/0x160
   kernfs_fop_write+0x284/0x460
   ? __sb_start_write+0x243/0x3a0
   vfs_write+0x197/0x4a0
   ksys_write+0x156/0x1e0
   ? __x64_sys_read+0xb0/0xb0
   ? do_syscall_64+0x73/0x1330
   ? do_syscall_64+0x73/0x1330
   do_syscall_64+0xe7/0x1330
   ? down_write_nested+0x3e0/0x3e0
   ? syscall_return_slowpath+0x970/0x970
   ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
   ? lockdep_hardirqs_off+0x1de/0x2d0
   ? trace_hardirqs_off_thunk+0x1a/0x1c
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7f20a3ff0cdb
  Code: 53 48 89 d5 48 89 f3 48 83 ec 18 48 89 7c 24 08 e8 5a fd ff ff 48 89 ea 41 89 c0 48 89 de 48 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 90 fd ff ff 48
  RSP: 002b:00007f2046087040 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 00007f2038016df0 RCX: 00007f20a3ff0cdb
  RDX: 000000000000000d RSI: 00007f2038016df0 RDI: 0000000000000018
  RBP: 000000000000000d R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000100 R11: 0000000000000293 R12: 00007f2046e29630
  R13: 00007f20280035a0 R14: 0000000000000018 R15: 00007f2038016df0

Fixes: c043ff2cfb ("RDMA: Connect between the mmap entry and the umap_priv structure")
Link: https://lore.kernel.org/r/20200413132136.930388-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:43:46 -03:00
Aharon Landau
2d7e3ff7b6 RDMA/mlx5: Set GRH fields in query QP on RoCE
GRH fields such as sgid_index, hop limit, et. are set in the QP context
when QP is created/modified.

Currently, when query QP is performed, we fill the GRH fields only if the
GRH bit is set in the QP context, but this bit is not set for RoCE. Adjust
the check so we will set all relevant data for the RoCE too.

Since this data is returned to userspace, the below is an ABI regression.

Fixes: d8966fcd4c ("IB/core: Use rdma_ah_attr accessor functions")
Link: https://lore.kernel.org/r/20200413132028.930109-1-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-04-22 15:43:46 -03:00
David S. Miller
b86a037385 Merge branch 'dt-bindings-net-mdio.yaml-fixes'
Florian Fainelli says:

====================
dt-bindings: net: mdio.yaml fixes

This patch series documents some common MDIO devices properties such as
resets (and delays) and broken-turn-around. The second patch also
rephrases some descriptions to be more general towards MDIO devices and
not specific towards Ethernet PHYs.

Changes in v3:

- corrected wording of 'broken-turn-around' in ethernet-phy.yaml and
  mdio.yaml, add Andrew's R-b tag to patch #3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 11:43:27 -07:00
Florian Fainelli
630c3ff8c3 dt-bindings: net: mdio: Make descriptions more general
A number of descriptions assume a PHY device, but since this binding
describes a MDIO bus which can have different kinds of MDIO devices
attached to it, rephrase some descriptions to be more general in that
regard.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 11:43:19 -07:00
Florian Fainelli
b92d905f2c dt-bindings: net: mdio: Document common properties
Some of the properties pertaining to the broken turn around or resets
were only documented in ethernet-phy.yaml while they are applicable
across all MDIO devices and not Ethernet PHYs specifically which are a
superset.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 11:43:05 -07:00
Florian Fainelli
f42ceca226 dt-bindings: net: Correct description of 'broken-turn-around'
The turn around bytes (2) are placed between the control phase of the
MDIO transaction and the data phase, correct the wording to be more
exact.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 11:43:05 -07:00
Rob Herring
e996c1fd0c dt-bindings: Re-enable core schemas for dtbs_check
In commit 2ba06cd856 ("kbuild: Always validate DT binding examples"),
the core schemas (from dtschema repo) were inadvertently disabled for
dtbs_checks. Re-enable them.

Fixes: 2ba06cd856 ("kbuild: Always validate DT binding examples")
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
2020-04-22 13:41:07 -05:00
David S. Miller
a3b6e8fd27 Merge branch 'Ocelot-MAC_ETYPE-tc-flower-key-improvements'
Vladimir Oltean says:

====================
Ocelot MAC_ETYPE tc-flower key improvements

As discussed in the comments surrounding this patch:
https://patchwork.ozlabs.org/project/netdev/patch/20200417190308.32598-1-olteanv@gmail.com/

the restrictions imposed on non-MAC_ETYPE rules were harsher than they
needed to be. IP, IPv6, ARP rules can still be added concurrently with
src_mac and dst_mac rules, as long as those MAC address rules do not ask
for an offending EtherType.

For that to actually be supported, we need to parse the EtherType from
the flower classification rule first.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 11:40:52 -07:00
Vladimir Oltean
4faa2e0643 net: mscc: ocelot: lift protocol restriction for flow_match_eth_addrs keys
An attempt was made in commit fe3490e610 ("net: mscc: ocelot: Hardware
ofload for tc flower filter") to avoid clashes between MAC_ETYPE rules
and IP rules. Because the protocol blacklist should have included
ETH_P_ALL too, it created some confusion, but now the situation should
be dealt with a bit better by the patch immediately previous to this one
("net: mscc: ocelot: refine the ocelot_ace_is_problematic_mac_etype
function").

So now we can remove that check. MAC_ETYPE rules with a protocol of
ETH_P_IP, ETH_P_IPV6, ETH_P_ARP and ETH_P_ALL _are_ supported, with some
restrictions regarding per-port exclusivity which are enforced now.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 11:40:51 -07:00
Vladimir Oltean
7dec902f4f net: mscc: ocelot: refine the ocelot_ace_is_problematic_mac_etype function
The commit mentioned below was a bit too harsh, and while it restricted
the invalid key combinations which are known to not work, such as:

tc filter add dev swp0 ingress proto ip \
      flower src_ip 192.0.2.1 action drop
tc filter add dev swp0 ingress proto all \
      flower src_mac 00:11:22:33:44:55 action drop

it also restricted some which still should work, such as:

tc filter add dev swp0 ingress proto ip \
      flower src_ip 192.0.2.1 action drop
tc filter add dev swp0 ingress proto 0x22f0 \
      flower src_mac 00:11:22:33:44:55 action drop

What actually does not match "sanely" is a MAC_ETYPE rule on frames
having an EtherType of ARP, IPv4, IPv6, in addition to SNAP and OAM
frames (which the ocelot tc-flower implementation does not parse yet, so
the function might need to be revisited again in the future).

So just make the function recognize the problematic MAC_ETYPE rules by
EtherType - thus the VCAP IS2 can be forced to match even on those
packets.

This patch makes it possible for IP rules to live on a port together
with MAC_ETYPE rules that are non-all, non-arp, non-ip and non-ipv6.

Fixes: d4d0cb741d7b ("net: mscc: ocelot: deal with problematic MAC_ETYPE VCAP IS2 rules")
Reported-by: Allan W. Nielsen <allan.nielsen@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 11:40:51 -07:00
Vladimir Oltean
86b956de11 net: mscc: ocelot: support matching on EtherType
Currently, the filter's protocol is ignored except for a few special
cases (IPv4 and IPv6).

The EtherType can be matched inside VCAP IS2 by using a MAC_ETYPE key.
So there are 2 cases in which EtherType matches are supported:

  - As part of a larger MAC_ETYPE rule, such as:

    tc filter add dev swp0 ingress protocol ip \
            flower skip_sw src_mac 42:be:24:9b:76:20 action drop

  - Standalone (matching on protocol only):

    tc filter add dev swp0 ingress protocol arp \
            flower skip_sw action drop

As before, if the protocol is not specified, is it implicitly "all" and
the EtherType mask in the MAC_ETYPE half key is set to zero.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 11:40:51 -07:00
Yuiko Oshino
63edbcceef net: phy: microchip_t1: add lan87xx_phy_init to initialize the lan87xx phy.
lan87xx_phy_init() initializes the lan87xx phy hardware
including its TC10 Wake-up and Sleep features.

Fixes: 3e50d2da58 ("Add driver for Microchip LAN87XX T1 PHYs")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
v0->v1:
    - Add more details in the commit message and source comments.
    - Update to the latest initialization sequences.
    - Add access_ereg_modify_changed().
    - Fix access_ereg() to access SMI bank correctly.
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-22 11:38:58 -07:00
Al Viro
2a89b674fd get rid of csum_partial_copy_to_user()
For historical reasons some architectures call their csum_and_copy_to_user()
csum_partial_copy_to_user() instead (and supply a macro defining the
former as the latter).  That's the last remnants of old experiment that
went nowhere; time to bury them.  Rename those to csum_and_copy_to_user()
and get rid of the macros.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-04-22 14:37:50 -04:00
Benjamin Thiel
60abfd08e8 x86/mm/mmap: Fix -Wmissing-prototypes warnings
Add includes for the prototypes of valid_phys_addr_range(),
arch_mmap_rnd() and valid_mmap_phys_addr_range() in order to fix
-Wmissing-prototypes warnings.

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200402124307.10857-1-b.thiel@posteo.de
2020-04-22 20:19:48 +02:00
Mihai Carabas
9adbf3c609 x86/microcode: Fix return value for microcode late loading
The return value from stop_machine() might not be consistent.

stop_machine_cpuslocked() returns:
- zero if all functions have returned 0.
- a non-zero value if at least one of the functions returned
a non-zero value.

There is no way to know if it is negative or positive. So make
__reload_late() return 0 on success or negative otherwise.

 [ bp: Unify ret val check and touch up. ]

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1587497318-4438-1-git-send-email-mihai.carabas@oracle.com
2020-04-22 19:55:50 +02:00
Linus Torvalds
c578ddb39e Merge tag 'linux-kselftest-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
 "This consists of fixes to runner scripts and individual test run-time
  bugs. Includes fixes to tpm2 and memfd test run-time regressions"

* tag 'linux-kselftest-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/ipc: Fix test failure seen after initial test run
  Revert "Kernel selftests: tpm2: check for tpm support"
  selftests/ftrace: Add CONFIG_SAMPLE_FTRACE_DIRECT=m kconfig
  selftests/seccomp: allow clock_nanosleep instead of nanosleep
  kselftest/runner: allow to properly deliver signals to tests
  selftests/harness: fix spelling mistake "SIGARLM" -> "SIGALRM"
  selftests: Fix memfd test run-time regression
  selftests: vm: Fix 64-bit test builds for powerpc64le
  selftests: vm: Do not override definition of ARCH
2020-04-22 10:47:49 -07:00
Hans de Goede
c03ee9af4e Bluetooth: btbcm: Add 2 missing models to subver tables
Currently the bcm_uart_subver_ and bcm_usb_subver_table-s lack entries
for the BCM4324B5 and BCM20703A1 chipsets. This makes the code use just
"BCM" as prefix for the filename to pass to request-firmware, making it
harder for users to figure out which firmware they need. This especially
is problematic with the UART attached BCM4324B5 where this leads to the
filename being just "BCM.hcd".

Add the 2 missing devices to subver tables. This has been tested on:

1. A Dell XPS15 9550 where this makes btbcm.c try to load
"BCM20703A1-0a5c-6410.hcd" before it tries to load "BCM-0a5c-6410.hcd".

2. A Thinkpad 8 where this makes btbcm.c try to load
"BCM4324B5.hcd" before it tries to load "BCM.hcd"

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-22 19:43:59 +02:00
Hans de Goede
74530a639a Bluetooth: btbcm: Try multiple Patch filenames when loading the Patch firmware
Currently the bcm_uart_subver_ and bcm_usb_subver_table-s lack entries
for various newer chipsets. This makes the code use just "BCM" as prefix
for the filename to pass to request-firmware, making it harder for users
to figure out which firmware they need. This especially a problem with
UART attached devices where this leads to the filename being "BCM.hcd".

If we add new entries to the subver-tables now, then this will change
what firmware file the kernel looks for, e.g. currently linux-firmware
contains a brcm/BCM-0bb4-0306.hcd file. If we add the info for the
BCM20703A1 to the subver table, then this will change to
brcm/BCM20703A1-0bb4-0306.hcd. This will cause the file to no longer
get loaded breaking Bluetooth for existing users, going against the
no regressions policy.

To avoid this regression make the btbcm code try multiple filenames,
first try the fullname, e.g. BCM20703A1-0bb4-0306.hcd and if that is
not found, then fallback to the name with just BCM as prefix.

This commit also adds an info message which filename was used,
this makes the output look like this for example:

[   57.387867] Bluetooth: hci0: BCM20703A1
[   57.387870] Bluetooth: hci0: BCM20703A1 (001.001.005) build 0000
[   57.389438] Bluetooth: hci0: BCM20703A1 'brcm/BCM20703A1-0a5c-6410.hcd' Patch
[   58.681769] Bluetooth: hci0: BCM20703A1 Generic USB 20Mhz fcbga_BU
[   58.681772] Bluetooth: hci0: BCM20703A1 (001.001.005) build 0481

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-22 19:43:58 +02:00
Hans de Goede
f53b975cf1 Bluetooth: btbcm: Bail sooner from btbcm_initialize() when not loading fw
If we have already loaded the firmware/patchram and btbcm_initialize()
is called to re-init the HCI after this then there is no need to get
the USB device-ids and build a firmware-filename out of these.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-22 19:43:58 +02:00
Hans de Goede
0383f16a87 Bluetooth: btbcm: Make btbcm_setup_patchram use btbcm_finalize
On UART attached devices we do:

1. btbcm_initialize()
2. Setup UART baudrate, etc.
3. btbcm_finalize()

After our previous changes we can now also use btbcm_finalize() from
the btbcm_setup_patchram() function used on USB devices without any
functional changes. This completes unifying the USB and UART paths
as much as possible.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-22 19:43:58 +02:00
Hans de Goede
2fcdd562b9 Bluetooth: btbcm: Make btbcm_initialize() print local-name on re-init too
Make btbcm_initialize() get and print the device's local-name on re-init
too, this will make us also print the local-name after loading the
Patch on UART attached devices making things more consistent.

This also removes some code duplication from btbcm_setup_patchram()
and allows more code duplication removal there in a follow-up patch.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-22 19:43:58 +02:00
Hans de Goede
0287c5d84f Bluetooth: btbcm: Fold Patch loading + applying into btbcm_initialize()
Instead of having btbcm_initialize() fill a passed in fw_name buffer
and then have its callers use that to request the firmware + load
it into the HCI, make btbcm_initialize() do this itself the first
time it is called (its get called a second time to reset the HCI
after the firmware has been loaded).

This removes some code duplication and makes it easier for further
patches in this series to try more then 1 firmware filename.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-22 19:43:58 +02:00
Hans de Goede
f8c51d28e9 Bluetooth: btbcm: Move setting of USE_BDADDR_PROPERTY quirk to hci_bcm.c
btbcm_finalize() is currently only used by UART attached BCM devices.

Move the setting of the USE_BDADDR_PROPERTY quirk, which we only want
for UART attached devices to hci_bcm in preparation for using
btbcm_finalize() for USB attached devices too.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-22 19:43:58 +02:00
Hans de Goede
3fef10ec32 Bluetooth: btbcm: Drop upper nibble version check from btbcm_initialize()
btbcm_initialize() must either return an error; or fill the passed in
fw_name, otherwise we end up passing uninitialized stack memory to
request_firmware().

Since we have a fallback hw_name of "BCM" not having a known version
in the subver field does not matter, drop the check so that we always
fill the passed in fw_name.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-04-22 19:43:58 +02:00
Naoki Kiryu
0df9433fca usb: typec: altmode: Fix typec_altmode_get_partner sometimes returning an invalid pointer
Before this commit, typec_altmode_get_partner would return a
const struct typec_altmode * pointing to address 0x08 when
to_altmode(adev)->partner was NULL.

Add a check for to_altmode(adev)->partner being NULL to fix this.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206365
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1785972
Fixes: 5f54a85db5 ("usb: typec: Make sure an alt mode exist before getting its partner")
Cc: stable@vger.kernel.org
Signed-off-by: Naoki Kiryu <naonaokiryu2@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20200422144345.43262-1-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-22 19:23:19 +02:00
Christoph Hellwig
bdf8710d69 block: move dma_pad handling from blk_rq_map_sg into the callers
There are only two callers of blk_rq_map_sg/__blk_rq_map_sg that set
the dma_pad value in the queue.  Move the handling into those callers
instead of burdening the common code, and move the ->extra_len field
from struct request to struct scsi_cmnd.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-22 10:47:39 -06:00
Christoph Hellwig
cc97923a5b block: move dma drain handling to scsi
Don't burden the common block code with with specifics of the libata DMA
draining mechanism.  Instead move most of the code to the scsi midlayer.

That also means the nr_phys_segments adjustments in the blk-mq fast path
can go away entirely, given that SCSI never looks at nr_phys_segments
after mapping the request to a scatterlist.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-22 10:47:35 -06:00
Christoph Hellwig
0475bd6c65 scsi: merge scsi_init_sgtable into scsi_init_io
scsi_init_io is the only caller of scsi_init_sgtable.  Merge the two
function to make upcoming changes a little easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-22 10:47:06 -06:00
Christoph Hellwig
89de1504d5 block: provide a blk_rq_map_sg variant that returns the last element
To be able to move some of the special purpose hacks in blk_rq_map_sg
into the callers we need a variant that returns the last mapped
S/G list element to the caller.  Add that variant as __blk_rq_map_sg
and make blk_rq_map_sg a trivial inline wrapper around it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-22 10:47:06 -06:00
Christoph Hellwig
e64a0e1692 block: remove RQF_COPY_USER
The RQF_COPY_USER is set for bio where the passthrough request mapping
helpers decided that bounce buffering is required.  It is then used to
pad scatterlist for drivers that required it.  But given that
non-passthrough requests are per definition aligned, and directly mapped
pass-through request must be aligned it is not actually required at all.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-22 10:47:06 -06:00
Chris Wilson
15501287b1 drm/i915/execlists: Drop request-before-CS assertion
When we migrated to execlists, one of the conditions we wanted to test
for was whether the breadcrumb seqno was being written before the
breadcumb interrupt was delivered. This was following on from issues
observed on previous generations which were not so strongly ordered. With
the removal of the missed interrupt detection, we have not reliable
means of detecting the out-of-order seqno/interrupt but instead tried to
assert that the relationship between the CS event interrupt and the
breadwrite should be strongly ordered. However, Icelake proves it is
possible for the HW implementation to forget about minor little details
such as write ordering and so the order between *processing* the CS
event and the breadcrumb is unreliable.

Remove the unreliable assertion, but leave a debug telltale in case we
have reason to suspect.

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1658
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422141749.28709-1-chris@chris-wilson.co.uk
2020-04-22 17:17:50 +01:00
Marc Zyngier
41ee52ecbc KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
There is no point in accessing the HW when writing to any of the
ISENABLER/ICENABLER registers from userspace, as only the guest
should be allowed to change the HW state.

Introduce new userspace-specific accessors that deal solely with
the virtual state.

Reported-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-22 17:13:30 +01:00
Marc Zyngier
9a50ebbffa KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
When a guest tries to read the active state of its interrupts,
we currently just return whatever state we have in memory. This
means that if such an interrupt lives in a List Register on another
CPU, we fail to obsertve the latest active state for this interrupt.

In order to remedy this, stop all the other vcpus so that they exit
and we can observe the most recent value for the state. This is
similar to what we are doing for the write side of the same
registers, and results in new MMIO handlers for userspace (which
do not need to stop the guest, as it is supposed to be stopped
already).

Reported-by: Julien Grall <julien@xen.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-04-22 17:13:16 +01:00
Jeremie Francois (on alpha)
e461bc9f9a scripts/config: allow colons in option strings for sed
Sed broke on some strings as it used colon as a separator.
I made it more robust by using \001, which is legit POSIX AFAIK.

E.g. ./config --set-str CONFIG_USBNET_DEVADDR "de:ad:be:ef:00:01"
failed with: sed: -e expression #1, char 55: unknown option to `s'

Signed-off-by: Jeremie Francois (on alpha) <jeremie.francois@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-04-23 01:10:16 +09:00
Adrian Hunter
1a8eb6b373 mmc: sdhci-pci: Fix eMMC driver strength for BYT-based controllers
BIOS writers have begun the practice of setting 40 ohm eMMC driver strength
even though the eMMC may not support it, on the assumption that the kernel
will validate the value against the eMMC (Extended CSD DRIVER_STRENGTH
[offset 197]) and revert to the default 50 ohm value if 40 ohm is invalid.

This is done to avoid changing the value for different boards.

Putting aside the merits of this approach, it is clear the eMMC's mask
of supported driver strengths is more reliable than the value provided
by BIOS. Add validation accordingly.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 51ced59cc0 ("mmc: sdhci-pci: Use ACPI DSM to get driver strength for some Intel devices")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200422111629.4899-1-adrian.hunter@intel.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2020-04-22 17:57:17 +02:00
Lorenzo Pieralisi
ef46738cc4 MAINTAINERS: Add Rob Herring and remove Andy Murray as PCI reviewers
Andy Murray decided to step down as PCI controller reviewer and Rob Herring
is willing to help review PCI controller patches.

Update the respective MAINTAINERS entries to reflect this change.

Link: https://lore.kernel.org/r/20200422150336.10528-1-lorenzo.pieralisi@arm.com
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rob Herring <robh@kernel.org>
Cc: Andrew Murray <amurray@thegoodpenguin.co.uk>
2020-04-22 10:53:37 -05:00
Alexey Gladkov
e61bb8b36a proc: use named enums for better readability
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22 10:51:22 -05:00
Alexey Gladkov
1c6c4d112e proc: use human-readable values for hidepid
The hidepid parameter values are becoming more and more and it becomes
difficult to remember what each new magic number means.

Backward compatibility is preserved since it is possible to specify
numerical value for the hidepid parameter. This does not break the
fsconfig since it is not possible to specify a numerical value through
it. All numeric values are converted to a string. The type
FSCONFIG_SET_BINARY cannot be used to indicate a numerical value.

Selftest has been added to verify this behavior.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22 10:51:22 -05:00
Alexey Gladkov
37e7647a72 docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22 10:51:22 -05:00
Alexey Gladkov
6814ef2d99 proc: add option to mount only a pids subset
This allows to hide all files and directories in the procfs that are not
related to tasks.

Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22 10:51:22 -05:00
Alexey Gladkov
24a71ce5c4 proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option
If "hidepid=4" mount option is set then do not instantiate pids that
we can not ptrace. "hidepid=4" means that procfs should only contain
pids that the caller can ptrace.

Signed-off-by: Djalal Harouni <tixxdz@gmail.com>
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22 10:51:21 -05:00
Alexey Gladkov
fa10fed30f proc: allow to mount many instances of proc in one pid namespace
This patch allows to have multiple procfs instances inside the
same pid namespace. The aim here is lightweight sandboxes, and to allow
that we have to modernize procfs internals.

1) The main aim of this work is to have on embedded systems one
supervisor for apps. Right now we have some lightweight sandbox support,
however if we create pid namespacess we have to manages all the
processes inside too, where our goal is to be able to run a bunch of
apps each one inside its own mount namespace without being able to
notice each other. We only want to use mount namespaces, and we want
procfs to behave more like a real mount point.

2) Linux Security Modules have multiple ptrace paths inside some
subsystems, however inside procfs, the implementation does not guarantee
that the ptrace() check which triggers the security_ptrace_check() hook
will always run. We have the 'hidepid' mount option that can be used to
force the ptrace_may_access() check inside has_pid_permissions() to run.
The problem is that 'hidepid' is per pid namespace and not attached to
the mount point, any remount or modification of 'hidepid' will propagate
to all other procfs mounts.

This also does not allow to support Yama LSM easily in desktop and user
sessions. Yama ptrace scope which restricts ptrace and some other
syscalls to be allowed only on inferiors, can be updated to have a
per-task context, where the context will be inherited during fork(),
clone() and preserved across execve(). If we support multiple private
procfs instances, then we may force the ptrace_may_access() on
/proc/<pids>/ to always run inside that new procfs instances. This will
allow to specifiy on user sessions if we should populate procfs with
pids that the user can ptrace or not.

By using Yama ptrace scope, some restricted users will only be able to see
inferiors inside /proc, they won't even be able to see their other
processes. Some software like Chromium, Firefox's crash handler, Wine
and others are already using Yama to restrict which processes can be
ptracable. With this change this will give the possibility to restrict
/proc/<pids>/ but more importantly this will give desktop users a
generic and usuable way to specifiy which users should see all processes
and which users can not.

Side notes:
* This covers the lack of seccomp where it is not able to parse
arguments, it is easy to install a seccomp filter on direct syscalls
that operate on pids, however /proc/<pid>/ is a Linux ABI using
filesystem syscalls. With this change LSMs should be able to analyze
open/read/write/close...

In the new patch set version I removed the 'newinstance' option
as suggested by Eric W. Biederman.

Selftest has been added to verify new behavior.

Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com>
Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-04-22 10:51:21 -05:00