Commit Graph

693606 Commits

Author SHA1 Message Date
Meng Xu
f12f42acdb perf/core: Fix potential double-fetch bug
While examining the kernel source code, I found a dangerous operation that
could turn into a double-fetch situation (a race condition bug), where the same
userspace memory region is fetched twice into the kernel, with sanity checks
after the first fetch but no checks after the second fetch.

  1. The first fetch happens in line 9573 get_user(size, &uattr->size).

  2. Subsequently the 'size' variable undergoes a few sanity checks and
     transformations (line 9577 to 9584).

  3. The second fetch happens in line 9610 copy_from_user(attr, uattr, size)

  4. Given that 'uattr' can be fully controlled in userspace, an attacker can
     exploit the race to overwrite 'uattr->size' with an arbitrary value (say,
     0xFFFFFFFF) after the first fetch but before the second fetch. The changed
     value will be copied to 'attr->size'.

  5. There are no further checks on 'attr->size' until the end of this function,
     and once the function returns, we lose the context to verify that 'attr->size'
     conforms to the sanity checks performed in step 2 (line 9577 to 9584).

  6. My manual analysis shows that 'attr->size' is not used elsewhere later,
     so, there is no working exploit against it right now. However, this could
     easily turn into an exploitable bug if careless developers start to use
     'attr->size' later.

To fix this, override 'attr->size' from the second fetch to the one from the
first fetch, regardless of what is actually copied in.

In this way, it is assured that 'attr->size' is consistent with the checks
performed after the first fetch.
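
The fix described above can be illustrated with a minimal userspace sketch.
Everything in it is an assumption made for illustration: struct demo_attr
stands in for the real attribute structure, the sanity checks are simplified,
and the racy_size parameter simulates a concurrent userspace writer. It is
not the kernel code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real attribute structure. */
struct demo_attr {
	uint32_t type;
	uint32_t size;
	uint64_t config;
};

/*
 * Copy an attribute block from "user" memory: fetch the size once,
 * validate it, copy the whole block, then overwrite the copied size
 * with the validated value. racy_size simulates a writer that changes
 * uattr->size between the two fetches (0 means no race).
 */
static int demo_copy_attr(struct demo_attr *uattr, struct demo_attr *attr,
			  uint32_t racy_size)
{
	uint32_t size = uattr->size;		/* first fetch */

	if (size == 0)
		size = sizeof(*attr);
	if (size > sizeof(*attr) || size < 2 * sizeof(uint32_t))
		return -1;			/* simplified sanity checks */

	if (racy_size)
		uattr->size = racy_size;	/* simulated race window */

	memset(attr, 0, sizeof(*attr));
	memcpy(attr, uattr, size);		/* second fetch */

	attr->size = size;			/* the fix: keep the checked value */
	return 0;
}
```

Without the final assignment, attr->size would hold whatever the writer
stored during the race window.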

Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: meng.xu@gatech.edu
Cc: sanidhya@gatech.edu
Cc: taesoo@gatech.edu
Link: http://lkml.kernel.org/r/1503522470-35531-1-git-send-email-meng.xu@gatech.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 13:26:22 +02:00
Dan Carpenter
eaa2f87c6b x86/ldt: Fix off by one in get_segment_base()
ldt->entries[] is allocated in alloc_ldt_struct().  It has
ldt->nr_entries elements and ldt->nr_entries is capped at LDT_ENTRIES.
So if "idx" is == ldt->nr_entries then we're reading beyond the end of
the buffer.  It seems duplicative to have two limit checks when one
would work just as well so I removed the check against LDT_ENTRIES.

The gdt_page.gdt[] array has GDT_ENTRIES entries.
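
As a sketch of the bounds check discussed above: the names below
(demo_ldt, demo_ldt_lookup) are hypothetical and this is not the code in
get_segment_base(). It only shows why the comparison has to reject
idx == nr_entries.

```c
#include <stddef.h>

/* Hypothetical table with nr_entries valid elements. */
struct demo_ldt {
	unsigned int nr_entries;
	const unsigned long *entries;
};

/* Return entry idx, or 0 when idx is out of range. */
static unsigned long demo_ldt_lookup(const struct demo_ldt *ldt,
				     unsigned int idx)
{
	/* ">=" is required: valid indexes are 0 .. nr_entries - 1. */
	if (!ldt || idx >= ldt->nr_entries)
		return 0;
	return ldt->entries[idx];
}
```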

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Fixes: d07bdfd322 ("perf/x86: Fix USER/KERNEL tagging of samples properly")
Link: http://lkml.kernel.org/r/20170818102516.gqwm4xdvvuvjw5ho@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 11:55:15 +02:00
Florian Fainelli
c7848399ec net: dsa: Don't dereference dst->cpu_dp->netdev
If we do not have a master network device attached dst->cpu_dp will be
NULL and accessing cpu_dp->netdev will create a trace similar to the one
below. The correct check is on dst->cpu_dp period.

[    1.004650] DSA: switch 0 0 parsed
[    1.008078] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[    1.016195] pgd = c0003000
[    1.018918] [00000010] *pgd=80000000004003, *pmd=00000000
[    1.024349] Internal error: Oops: 206 [#1] SMP ARM
[    1.029157] Modules linked in:
[    1.032228] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc6-00071-g45b45afab9bd-dirty #7
[    1.040772] Hardware name: Broadcom STB (Flattened Device Tree)
[    1.046704] task: ee08f840 task.stack: ee090000
[    1.051258] PC is at dsa_register_switch+0x5e0/0x9dc
[    1.056234] LR is at dsa_register_switch+0x5d0/0x9dc
[    1.061211] pc : [<c08fb28c>]    lr : [<c08fb27c>]    psr: 60000213
[    1.067491] sp : ee091d88  ip : 00000000  fp : 0000000c
[    1.072728] r10: 00000000  r9 : 00000001  r8 : ee208010
[    1.077965] r7 : ee2b57b0  r6 : ee2b5780  r5 : 00000000  r4 : ee208e0c
[    1.084506] r3 : 00000000  r2 : 00040d00  r1 : 2d1b2000  r0 : 00000016
[    1.091050] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[    1.098199] Control: 32c5387d  Table: 00003000  DAC: fffffffd
[    1.103957] Process swapper/0 (pid: 1, stack limit = 0xee090210)
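
The check described above can be sketched as follows. The structures and
the helper name are hypothetical simplifications, not the DSA data
structures. The point is only that the port pointer is tested before
anything behind it is read.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins. */
struct demo_netdev { int ifindex; };
struct demo_port   { struct demo_netdev *netdev; };
struct demo_tree   { struct demo_port *cpu_dp; };

/* Return the master device index, or -1 when no master is attached. */
static int demo_master_ifindex(const struct demo_tree *dst)
{
	if (!dst->cpu_dp)		/* check the port pointer itself first */
		return -1;
	if (!dst->cpu_dp->netdev)
		return -1;
	return dst->cpu_dp->netdev->ifindex;
}
```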

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 6d3c8c0dd8 ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 21:19:43 -07:00
Linus Torvalds
9c3a815f47 page waitqueue: always add new entries at the end
Commit 3510ca20ec ("Minor page waitqueue cleanups") made the page
queue code always add new waiters to the back of the queue, which helps
upcoming patches to batch the wakeups for some horrid loads where the
wait queues grow to thousands of entries.

However, I forgot about the nasty add_page_wait_queue() special case
code that is only used by the cachefiles code.  That one still continued
to add the new wait queue entries at the beginning of the list.

Fix it, because any sane batched wakeup will require that we don't
suddenly start getting new entries at the beginning of the list that we
already handled in a previous batch.

[ The current code always does the whole list while holding the lock, so
  wait queue ordering doesn't matter for correctness, but even then it's
  better to add later entries at the end from a fairness standpoint ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-28 16:45:40 -07:00
Roopa Prabhu
ef9a5a62c6 bridge: check for null fdb->dst before notifying switchdev drivers
Current switchdev drivers don't seem to support offloading fdb
entries pointing to the bridge device, which have fdb->dst
not set to any port. This patch adds a NULL fdb->dst check in
the switchdev notifier code.

This patch fixes the below NULL ptr dereference:
$bridge fdb add 00:02:00:00:00:33 dev br0 self

[   69.953374] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   69.954044] IP: br_switchdev_fdb_notify+0x29/0x80
[   69.954044] PGD 66527067
[   69.954044] P4D 66527067
[   69.954044] PUD 7899c067
[   69.954044] PMD 0
[   69.954044]
[   69.954044] Oops: 0000 [#1] SMP
[   69.954044] Modules linked in:
[   69.954044] CPU: 1 PID: 3074 Comm: bridge Not tainted 4.13.0-rc6+ #1
[   69.954044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[   69.954044] task: ffff88007b827140 task.stack: ffffc90001564000
[   69.954044] RIP: 0010:br_switchdev_fdb_notify+0x29/0x80
[   69.954044] RSP: 0018:ffffc90001567918 EFLAGS: 00010246
[   69.954044] RAX: 0000000000000000 RBX: ffff8800795e0880 RCX: 00000000000000c0
[   69.954044] RDX: ffffc90001567920 RSI: 000000000000001c RDI: ffff8800795d0600
[   69.954044] RBP: ffffc90001567938 R08: ffff8800795d0600 R09: 0000000000000000
[   69.954044] R10: ffffc90001567a88 R11: ffff88007b849400 R12: ffff8800795e0880
[   69.954044] R13: ffff8800795d0600 R14: ffffffff81ef8880 R15: 000000000000001c
[   69.954044] FS:  00007f93d3085700(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[   69.954044] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   69.954044] CR2: 0000000000000008 CR3: 0000000066551000 CR4: 00000000000006e0
[   69.954044] Call Trace:
[   69.954044]  fdb_notify+0x3f/0xf0
[   69.954044]  __br_fdb_add.isra.12+0x1a7/0x370
[   69.954044]  br_fdb_add+0x178/0x280
[   69.954044]  rtnl_fdb_add+0x10a/0x200
[   69.954044]  rtnetlink_rcv_msg+0x1b4/0x240
[   69.954044]  ? skb_free_head+0x21/0x40
[   69.954044]  ? rtnl_calcit.isra.18+0xf0/0xf0
[   69.954044]  netlink_rcv_skb+0xed/0x120
[   69.954044]  rtnetlink_rcv+0x15/0x20
[   69.954044]  netlink_unicast+0x180/0x200
[   69.954044]  netlink_sendmsg+0x291/0x370
[   69.954044]  ___sys_sendmsg+0x180/0x2e0
[   69.954044]  ? filemap_map_pages+0x2db/0x370
[   69.954044]  ? do_wp_page+0x11d/0x420
[   69.954044]  ? __handle_mm_fault+0x794/0xd80
[   69.954044]  ? vma_link+0xcb/0xd0
[   69.954044]  __sys_sendmsg+0x4c/0x90
[   69.954044]  SyS_sendmsg+0x12/0x20
[   69.954044]  do_syscall_64+0x63/0xe0
[   69.954044]  entry_SYSCALL64_slow_path+0x25/0x25
[   69.954044] RIP: 0033:0x7f93d2bad690
[   69.954044] RSP: 002b:00007ffc7217a638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[   69.954044] RAX: ffffffffffffffda RBX: 00007ffc72182eac RCX: 00007f93d2bad690
[   69.954044] RDX: 0000000000000000 RSI: 00007ffc7217a670 RDI: 0000000000000003
[   69.954044] RBP: 0000000059a1f7f8 R08: 0000000000000006 R09: 000000000000000a
[   69.954044] R10: 00007ffc7217a400 R11: 0000000000000246 R12: 00007ffc7217a670
[   69.954044] R13: 00007ffc72182a98 R14: 00000000006114c0 R15: 00007ffc72182aa0
[   69.954044] Code: 1f 00 66 66 66 66 90 55 48 89 e5 48 83 ec 20 f6 47 20 04 74 0a 83 fe 1c 74 09 83 fe 1d 74 2c c9 66 90 c3 48 8b 47 10 48 8d 55 e8 <48> 8b 70 08 0f b7 47 1e 48 83 c7 18 48 89 7d f0 bf 03 00 00 00
[   69.954044] RIP: br_switchdev_fdb_notify+0x29/0x80 RSP: ffffc90001567918
[   69.954044] CR2: 0000000000000008
[   69.954044] ---[ end trace 03e9eec4a82c238b ]---

Fixes: 6b26b51b1d ("net: bridge: Add support for notifying devices about FDB add/del")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 16:14:00 -07:00
Tejun Heo
b339752d05 cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of
@node.  The assumption seems that if !NUMA, there shouldn't be more than
one node and thus reporting cpu_online_mask regardless of @node is
correct.  However, that assumption was broken years ago to support
DISCONTIGMEM and whether a system has multiple nodes or not is
separately controlled by NEED_MULTIPLE_NODES.

This means that, on a system with !NUMA && NEED_MULTIPLE_NODES,
cpumask_of_node() will report cpu_online_mask for all possible nodes,
indicating that the CPUs are associated with multiple nodes which is an
impossible configuration.

This bug has been around forever but doesn't look like it has caused any
noticeable symptoms.  However, it triggers a WARN recently added to
workqueue to verify NUMA affinity configuration.

Fix it by reporting empty cpumask on non-zero nodes if !NUMA.
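
A minimal sketch of the behavior described above, assuming a 64-bit
bitmask as a stand-in for a cpumask. The function name and its signature
are hypothetical and unrelated to the real cpumask_of_node() macro.

```c
#include <stdint.h>

/*
 * Non-NUMA case: all online CPUs belong to node 0. Any other node
 * reports an empty mask, so no CPU appears to sit on several nodes.
 */
static uint64_t demo_cpumask_of_node(uint64_t online_mask, int node)
{
	return node == 0 ? online_mask : 0;
}
```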

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-28 16:13:16 -07:00
Alexey Brodkin
e8206d2baa ARCv2: SMP: Mask only private-per-core IRQ lines on boot at core intc
Recent commit a8ec3ee861 "arc: Mask individual IRQ lines during core
INTC init" breaks interrupt handling on ARCv2 SMP systems.

That commit masked all interrupts at onset, as some controllers on some
boards (customer as well as internal), would assert interrupts early
before any handlers were installed.  For SMP systems, the masking was
done at each cpu's core-intc.  Later, when the IRQ was actually
requested, it was unmasked, but only on the requesting cpu.

For "common" interrupts, which were wired up from the 2nd level IDU
intc, this was an issue as they needed to be enabled on ALL the cpus
(given that IDU IRQs are by default served Round Robin across cpus).

So fix that by NOT masking "common" interrupts at core-intc, but instead
at the 2nd level IDU intc (latter already being done in idu_of_init())

Fixes: a8ec3ee861 ("arc: Mask individual IRQ lines during core INTC init")
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
[vgupta: reworked changelog, removed the extraneous idu_irq_mask_raw()]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-28 16:11:15 -07:00
Helge Deller
79de3cbe9a fs/select: Fix memory corruption in compat_get_fd_set()
Commit 464d62421c ("select: switch compat_{get,put}_fd_set() to
compat_{get,put}_bitmap()") changed the calculation on how many bytes
need to be zeroed when userspace handed over a NULL pointer for a fdset
array in the select syscall.

The calculation was changed in compat_get_fd_set() wrongly from
	memset(fdset, 0, ((nr + 1) & ~1)*sizeof(compat_ulong_t));
to
	memset(fdset, 0, ALIGN(nr, BITS_PER_LONG));

The ALIGN(nr, BITS_PER_LONG) calculates the number of _bits_ which need
to be zeroed in the target fdset array (rounded up to the next full bits
for an unsigned long).

But the memset() call expects the number of _bytes_ to be zeroed.

This leads to clearing more memory than wanted (on the stack area or
even at kmalloc()ed memory areas) and to random kernel crashes as we
have seen them on the parisc platform.

The correct change should have been

	memset(fdset, 0, (ALIGN(nr, BITS_PER_LONG) / BITS_PER_LONG) * BYTES_PER_LONG);

which is the same as can be achieved with a call to

	zero_fd_set(nr, fdset).
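
The bits-versus-bytes mix-up above can be checked with a small userspace
sketch. The DEMO_* macros and helper names are assumptions standing in for
the kernel's ALIGN() and BITS_PER_LONG. They are not the kernel
definitions.

```c
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_ALIGN(x, a)	(((x) + (a) - 1) / (a) * (a))

/* Number of bits covered once nr is rounded up to whole longs. */
static size_t demo_fdset_bits(size_t nr)
{
	return DEMO_ALIGN(nr, DEMO_BITS_PER_LONG);
}

/* Number of bytes that memset() actually has to clear. */
static size_t demo_fdset_bytes(size_t nr)
{
	return demo_fdset_bits(nr) / DEMO_BITS_PER_LONG * sizeof(unsigned long);
}
```

Passing the bit count to memset() clears CHAR_BIT times more memory than
the set occupies, which matches the corruption described above.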

Fixes: 464d62421c ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()")
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-28 16:09:19 -07:00
Xin Long
1e2ea8ad37 ipv6: set dst.obsolete when a cached route has expired
Currently, ipv6's dst_ops->check() doesn't check for cached route
expiration, because it trusts dst_gc to clean the cached route up
when it has expired.

The problem is that dst_gc cleans up the cached route only when
its refcount is 1. Some other module (like xfrm) may keep holding
it and release it only when dst_ops->check() fails.

But without checking for the cached route expiration, .check()
may always return true. Meanwhile, without the cached route being
released, dst_gc can't delete it. As a result, this cached route
never expires.

This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc
when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK
in .check.

Note that this is even needed when ipv6 dst_gc timer is removed
one day. It would set dst.obsolete in .redirect and .update_pmtu
instead, and check for cached route expiration when getting it,
just like what ipv4 route does.
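
The interaction between the gc pass and .check described above can be
sketched like this. The structure, the function names and the two
constants are hypothetical stand-ins chosen for illustration. They are not
the kernel's dst API.

```c
#include <stddef.h>

/* Hypothetical marker values for the "obsolete" field. */
enum {
	DEMO_OBSOLETE_FORCE_CHK = -1,	/* entry is usable, but must be checked */
	DEMO_OBSOLETE_KILL = -2		/* entry has expired and must be dropped */
};

struct demo_dst {
	int obsolete;
	long expires;			/* absolute expiry time */
};

/* gc pass: mark an expired entry so that its holders notice. */
static void demo_gc(struct demo_dst *dst, long now)
{
	if (now >= dst->expires)
		dst->obsolete = DEMO_OBSOLETE_KILL;
}

/* Holder-side check: NULL tells the holder to release its reference. */
static struct demo_dst *demo_check(struct demo_dst *dst)
{
	return dst->obsolete == DEMO_OBSOLETE_FORCE_CHK ? dst : NULL;
}
```

Once the holder sees NULL and drops its reference, the gc pass is free to
delete the entry.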

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:45:04 -07:00
Wei Wang
4e587ea71b ipv6: fix sparse warning on rt6i_node
Commit c5cff8561d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
  net/ipv6/route.c:1394:30: error: incompatible types in comparison
  expression (different address spaces)
  ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
  expression (different address spaces)

This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.

Fixes: c5cff8561d ("ipv6: add rcu grace period before freeing fib6_node")
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:34:40 -07:00
Stefano Brivio
0f3086868e cxgb4: Fix stack out-of-bounds read due to wrong size to t4_record_mbox()
Passing commands for logging to t4_record_mbox() with size
MBOX_LEN, when the actual command size is smaller, causes
out-of-bounds stack accesses in t4_record_mbox() while copying
command words here:

	for (i = 0; i < size / 8; i++)
		entry->cmd[i] = be64_to_cpu(cmd[i]);

Up to 48 bytes from the stack are then leaked to debugfs.

This happens whenever we send (and log) commands described by
structs fw_sched_cmd (32 bytes leaked), fw_vi_rxmode_cmd (48),
fw_hello_cmd (48), fw_bye_cmd (48), fw_initialize_cmd (48),
fw_reset_cmd (48), fw_pfvf_cmd (32), fw_eq_eth_cmd (16),
fw_eq_ctrl_cmd (32), fw_eq_ofld_cmd (32), fw_acl_mac_cmd(16),
fw_rss_glb_config_cmd(32), fw_rss_vi_config_cmd(32),
fw_devlog_cmd(32), fw_vi_enable_cmd(48), fw_port_cmd(32),
fw_sched_cmd(32), fw_devlog_cmd(32).

The cxgb4vf driver got this right instead.

When we call t4_record_mbox() to log a command reply, a MBOX_LEN
size can be used though, as get_mbox_rpl() will fill cmd_rpl up
completely.
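
A userspace sketch of the size handling described above. DEMO_MBOX_LEN is
assumed to be 64 bytes, which is consistent with the leak sizes listed,
and the names are hypothetical. Byte-order conversion is left out to keep
the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MBOX_LEN 64	/* assumed mailbox size in bytes */

struct demo_mbox_entry {
	uint64_t cmd[DEMO_MBOX_LEN / 8];
};

/*
 * Record a command of 'size' bytes. Only size / 8 words are read from
 * the caller's buffer; the rest of the log entry stays zeroed.
 */
static void demo_record_mbox(struct demo_mbox_entry *entry,
			     const uint64_t *cmd, size_t size)
{
	size_t i;

	memset(entry, 0, sizeof(*entry));
	for (i = 0; i < size / 8 && i < DEMO_MBOX_LEN / 8; i++)
		entry->cmd[i] = cmd[i];
}
```

A caller logging a 16-byte command passes 16 here, not DEMO_MBOX_LEN, so
the 48 bytes behind the command on the stack are never read.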

Fixes: 7f080c3f2f ("cxgb4: Add support to enable logging of firmware mailbox commands")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:24:23 -07:00
Maxime Ripard
ad4540cc5a net: stmmac: sun8i: Remove the compatibles
Since the bindings have been controversial, and we follow the DT stable ABI
rule, we shouldn't let a driver with a DT binding that might change slip
through in a stable release.

Remove the compatibles to make sure the driver will not probe and no-one
will start using the binding currently implemented. This commit will
obviously need to be reverted in due time.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:22:42 -07:00
David S. Miller
c73c8a8e07 Merge branch 'nfp-flow-dissector-layer'
Pieter Jansen van Vuuren says:

====================
nfp: fix layer calculation and flow dissector use

Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. Formerly flow dissectors were referenced
without first checking that they are in use and correctly populated by TC.
Additionally this patch set fixes the incorrect use of mask field for vlan
matching.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:20:25 -07:00
Pieter Jansen van Vuuren
6afd33e438 nfp: remove incorrect mask check for vlan matching
Previously the vlan tci field was incorrectly exact matched. This patch
fixes this by using the flow dissector to populate the vlan tci field.

Fixes: 5571e8c9f2 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:20:24 -07:00
Pieter Jansen van Vuuren
74af597510 nfp: fix supported key layers calculation
Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. This patch checks that the TTL and
TOS fields are masked out before offloading. Additionally this patch
checks that MPLS packets are correctly handled, by not offloading them.

Fixes: af9d842c13 ("nfp: extend flower add flow offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:20:24 -07:00
Pieter Jansen van Vuuren
a7cd39e0c7 nfp: fix unchecked flow dissector use
Previously flow dissectors were referenced without first checking that
they are in use and correctly populated by TC. This patch fixes this by
checking each flow dissector key before referencing them.

Fixes: 5571e8c9f2 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:20:24 -07:00
David S. Miller
77146b5d79 Merge branch 'l2tp-tunnel-refs'
Guillaume Nault says:

====================
l2tp: fix some l2tp_tunnel_find() issues in l2tp_netlink

Since l2tp_tunnel_find() doesn't take a reference on the tunnel it
returns, its users are almost guaranteed to be racy.

This series defines l2tp_tunnel_get() which can be used as a safe
replacement, and converts some of l2tp_tunnel_find() users in the
l2tp_netlink module.

Other users often combine this issue with other more or less subtle
races. They will be fixed incrementally in followup series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:34:59 -07:00
Guillaume Nault
e702c1204e l2tp: hold tunnel used while creating sessions with netlink
Use l2tp_tunnel_get() to retrieve tunnel, so that it can't go away on
us. Otherwise l2tp_tunnel_destruct() might release the last reference
count concurrently, thus freeing the tunnel while we're using it.

Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:34:58 -07:00
Guillaume Nault
4e4b21da3a l2tp: hold tunnel while handling genl TUNNEL_GET commands
Use l2tp_tunnel_get() instead of l2tp_tunnel_find() so that we get
a reference on the tunnel, preventing l2tp_tunnel_destruct() from
freeing it from under us.

Also move l2tp_tunnel_get() below nlmsg_new() so that we only take
the reference when needed.

Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:34:58 -07:00
Guillaume Nault
8c0e421525 l2tp: hold tunnel while handling genl tunnel updates
We need to make sure the tunnel is not going to be destroyed by
l2tp_tunnel_destruct() concurrently.

Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:34:58 -07:00
Guillaume Nault
bb0a32ce43 l2tp: hold tunnel while processing genl delete command
l2tp_nl_cmd_tunnel_delete() needs to take a reference on the tunnel, to
prevent it from being concurrently freed by l2tp_tunnel_destruct().

Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:34:58 -07:00
Guillaume Nault
54652eb12c l2tp: hold tunnel while looking up sessions in l2tp_netlink
l2tp_tunnel_find() doesn't take a reference on the returned tunnel.
Therefore, it's unsafe to use it because the returned tunnel can go
away on us anytime.

Fix this by defining l2tp_tunnel_get(), which works like
l2tp_tunnel_find(), but takes a reference on the returned tunnel.
Caller then has to drop this reference using l2tp_tunnel_dec_refcount().
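
The get/put pattern described above can be sketched in a single-threaded
form. All names are hypothetical, and a plain int replaces the atomic
reference count and RCU handling that real code needs. The sketch only
shows why a lookup that takes a reference keeps the object alive.

```c
#include <stdlib.h>

struct demo_tunnel {
	unsigned int id;
	int refcount;
};

static int demo_freed_count;	/* counts frees so the effect is observable */

/* Allocate a tunnel holding one reference (the "registry" reference). */
static struct demo_tunnel *demo_tunnel_create(unsigned int id)
{
	struct demo_tunnel *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->id = id;
	t->refcount = 1;
	return t;
}

/* Lookup variant that takes a reference on behalf of the caller. */
static struct demo_tunnel *demo_tunnel_get(struct demo_tunnel *t)
{
	if (t)
		t->refcount++;
	return t;
}

/* Drop a reference; free the tunnel when the last one goes away. */
static void demo_tunnel_put(struct demo_tunnel *t)
{
	if (t && --t->refcount == 0) {
		free(t);
		demo_freed_count++;
	}
}
```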

As l2tp_tunnel_dec_refcount() needs to be moved to l2tp_core.h, let's
simplify the patch and not move the L2TP_REFCNT_DEBUG part. This code
has been broken (not even compiling) in May 2012 by
commit a4ca44fa57 ("net: l2tp: Standardize logging styles")
and fixed more than two years later by
commit 29abe2fda5 ("l2tp: fix missing line continuation"). So it
doesn't appear to be used by anyone.

Same thing for l2tp_tunnel_free(); instead of moving it to l2tp_core.h,
let's just simplify things and call kfree_rcu() directly in
l2tp_tunnel_dec_refcount(). Extra assertions and debugging code
provided by l2tp_tunnel_free() didn't help catching any of the
reference counting and socket handling issues found while working on
this series.

Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:34:58 -07:00
Guillaume Nault
9ee369a405 l2tp: initialise session's refcount before making it reachable
Sessions must be fully initialised before calling
l2tp_session_add_to_tunnel(). Otherwise, there's a short time frame
where partially initialised sessions can be accessed by external users.

Fixes: dbdbc73b44 ("l2tp: fix duplicate session creation")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:28:33 -07:00
Antoine Tenart
4c22868264 net: mvpp2: fix the mac address used when using PPv2.2
The mac address is only retrieved from h/w when using PPv2.1. Otherwise
the variable holding it is still checked and used if it contains a valid
value. As the variable isn't initialized to an invalid mac address
value, we end up with random mac addresses which can be the same for all
the ports handled by this PPv2 driver.

Fix this by initializing the h/w mac address variable to {0}, which is
an invalid mac address value. This way the random assignment fallback
is called and all ports end up with their own addresses.
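
Why {0} works as an "invalid" marker can be shown with a small validity
check. The helper below is a hypothetical userspace sketch that treats
all-zero and multicast addresses as invalid. It is not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* An address is usable if it is neither multicast nor all zeroes. */
static bool demo_is_valid_ether_addr(const uint8_t addr[6])
{
	static const uint8_t zero[6];
	bool multicast = addr[0] & 0x01;

	return !multicast && memcmp(addr, zero, 6) != 0;
}
```

An uninitialized buffer may pass this check by accident, while a
zero-initialized one always fails it and triggers the fallback.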

Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Fixes: 2697582144 ("net: mvpp2: handle misc PPv2.1/PPv2.2 differences")
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:24:52 -07:00
Aleksander Morgado
3b638f0f0b cdc_ncm: flag the u-blox TOBY-L4 as wwan
The u-blox TOBY-L4 is a LTE Advanced (Cat 6) module with HSPA+ and 2G
fallback.

Unlike the TOBY-L2, this module has one single USB layout and exposes
several TTYs for control and a NCM interface for data. Connecting this
module may be done just by activating the desired PDP context with
'AT+CGACT=1,<cid>' and then running DHCP on the NCM interface.

Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:24:03 -07:00
Jesper Dangaard Brouer
1e22391e8f net: missing call of trace_napi_poll in busy_poll_stop
Noticed that busy_poll_stop() also invokes the driver's napi->poll()
function pointer, but doesn't have an associated call to trace_napi_poll()
like all other call sites.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 11:22:21 -07:00
Linus Torvalds
702e97621e c6x tweaks 4.13
Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming

Pull c6x tweaks from Mark Salter.

* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
  c6x: Convert to using %pOF instead of full_name
  c6x: defconfig: Cleanup from old Kconfig options
2017-08-28 11:15:46 -07:00
Anthony Martin
3f9db52dc8 Input: synaptics - fix device info appearing different on reconnect
User-modified input settings no longer survive a suspend/resume cycle.
Starting with 4.12, the touchpad is reinitialized on every reconnect
because the hardware appears to be different. This can be reproduced
by running the following as root:

    echo -n reconnect >/sys/devices/platform/i8042/serio1/drvctl

A line like the following will show up in dmesg:

    [30378.295794] psmouse serio1: synaptics: hardware appears to be
                   different: id(149271-149271), model(114865-114865),
                   caps(d047b3-d047b1), ext(b40000-b40000).

Note the single bit difference in caps: bit 1 (SYN_CAP_MULTIFINGER).

This happens because we modify our stored copy of the device info
capabilities when we enable advanced gesture mode but this change is
not reflected in the actual hardware capabilities.

It worked in the past because synaptics_query_hardware used to modify
the stored synaptics_device_info struct instead of filling in a new
one, as it does now.

Fix it by no longer faking the SYN_CAP_MULTIFINGER bit when setting
advanced gesture mode. This necessitated a small refactoring.

Fixes: 6c53694fb2 ("Input: synaptics - split device info into a separate structure")
Signed-off-by: Anthony Martin <ality@pbrane.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2017-08-28 10:36:46 -07:00
Christoph Hellwig
35f0b6a779 libata: quirk read log on no-name M.2 SSD
Ido reported that reading the log page on his systems fails,
so quirk it as it won't support ZBC or security protocols.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-08-28 10:27:16 -07:00
Dan Williams
7a14724f54 libnvdimm: clean up command definitions
Remove the command payloads that do not have an associated libnvdimm
ioctl. I.e. remove the payloads that would only ever be carried in the
ND_CMD_CALL envelope. This prevents userspace from growing unnecessary
dependencies on this kernel header when userspace already has everything
it needs to craft and send these commands.

Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
Reported-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-08-28 08:33:20 -07:00
Bart Van Assche
1c23484c35 dm mpath: do not lock up a CPU with requeuing activity
When using the block layer in single queue mode, get_request()
returns ERR_PTR(-EAGAIN) if the queue is dying and the REQ_NOWAIT
flag has been passed to get_request(). Prevent the kernel from
reporting soft lockups in this case due to continuous requeuing
activity.

Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop")
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28 09:58:27 -04:00
Bart Van Assche
604407890e dm: fix printk() rate limiting code
Using the same rate limiting state for different kinds of messages
is wrong because this can cause a high frequency message to suppress
a report of a low frequency message. Hence use a unique rate limiting
state per message type.
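
The suppression effect described above can be reproduced with a toy rate
limiter. The structure and function below are hypothetical, and time is
passed in by the caller. They are not the kernel's ratelimit API.

```c
/* Allow at most 'burst' messages per 'interval' time units. */
struct demo_ratelimit {
	long interval;
	int burst;
	long window_start;
	int printed;
};

/* Return 1 if a message may be printed at time 'now', else 0. */
static int demo_ratelimit_ok(struct demo_ratelimit *rs, long now)
{
	if (now - rs->window_start >= rs->interval) {
		rs->window_start = now;		/* start a new window */
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return 0;
	rs->printed++;
	return 1;
}
```

With one shared state, a frequent message can use up the burst and a rare
message in the same window is dropped. With one state per message type,
the rare message still gets through.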

Fixes: 71a16736a1 ("dm: use local printk ratelimit")
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28 09:58:27 -04:00
Bart Van Assche
68515cc721 dm mpath: retry BLK_STS_RESOURCE errors
Retry requests instead of failing them if an out-of-memory error occurs
or the block driver below dm-mpath is busy.  This restores the v4.12
behavior of noretry_error(), namely that -ENOMEM results in a retry.

Fixes: 2a842acab1 ("block: introduce new block status code type")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28 09:58:26 -04:00
Bart Van Assche
54385bf75c dm: fix the second dec_pending() argument in __split_and_process_bio()
Detected by sparse.

Fixes: 4e4cbee93d ("block: switch bios to blk_status_t")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28 09:36:19 -04:00
Maxime Ripard
fe45174b72 arm: dts: sunxi: Revert EMAC changes
Since the discussion is not settled yet for the EMAC, and the release
is getting really close, let's revert the changes for now, and we'll
reintroduce them later.

Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
2017-08-28 11:11:24 +02:00
Maxime Ripard
87e1f5e8bb arm64: dts: allwinner: Revert EMAC changes
Since the discussion about the EMAC is not settled yet, and the release
is getting really close, let's revert the changes for now, and we'll
reintroduce them later.

Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
2017-08-28 11:11:20 +02:00
Maxime Ripard
8aa33ec2f4 dt-bindings: net: Revert sun8i dwmac binding
This binding still doesn't please everyone, and we're getting far too
close to the release to allow it to reach a stable version.

Let's remove it until the discussion settles down.

Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
2017-08-28 11:11:05 +02:00
Mathias Krause
931e79d7a7 xfrm_user: fix info leak in build_aevent()
The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the sa_id before filling it.
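
A userspace sketch of the pattern, under stated assumptions: `struct demo_sa_id` is a hypothetical stand-in and not the real `struct xfrm_usersa_id`, and the helper names are invented for the illustration. On common ABIs the compiler adds one padding byte after `proto`; clearing the whole struct before filling the members keeps that byte from carrying stale memory into the output buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a uapi struct with trailing padding.
 * On typical ABIs sizeof() is 8 while the members occupy only 7 bytes. */
struct demo_sa_id {
	unsigned int spi;
	unsigned short family;
	unsigned char proto;
	/* the compiler usually inserts 1 padding byte here */
};

/* Fill `out` (at least sizeof(struct demo_sa_id) bytes) without leaking
 * whatever happened to be in the padding. */
static void demo_fill_sa_id(unsigned char *out, unsigned int spi,
			    unsigned short family, unsigned char proto)
{
	struct demo_sa_id id;

	memset(&id, 0, sizeof(id));     /* clears the padding as well */
	id.spi = spi;
	id.family = family;
	id.proto = proto;
	memcpy(out, &id, sizeof(id));
}

/* Count non-zero bytes located after the last member. */
static size_t demo_nonzero_padding(const unsigned char *buf)
{
	size_t i, n = 0;

	for (i = offsetof(struct demo_sa_id, proto) + 1;
	     i < sizeof(struct demo_sa_id); i++)
		if (buf[i])
			n++;
	return n;
}
```

Strictly speaking the C standard leaves padding values unspecified after a member store, so this relies on the behavior of the compilers used in practice, as the kernel code does.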

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Fixes: d51d081d65 ("[IPSEC]: Sync series - user")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-28 10:58:02 +02:00
Mathias Krause
e3e5fc1698 xfrm_user: fix info leak in build_expire()
The memory reserved to dump the expired xfrm state includes padding
bytes in struct xfrm_user_expire added by the compiler for alignment. To
prevent the heap info leak, memset(0) the remainder of the struct.
Initializing the whole structure isn't needed as copy_to_user_state()
already takes care of clearing the padding bytes within the 'state'
member.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-28 10:58:02 +02:00
Mathias Krause
50329c8a34 xfrm_user: fix info leak in xfrm_notify_sa()
The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the whole struct before filling
it.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Fixes: 0603eac0d6 ("[IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notification")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-28 10:58:02 +02:00
Mathias Krause
5fe0d4bd8f xfrm_user: fix info leak in copy_user_offload()
The memory reserved to dump the xfrm offload state includes padding
bytes of struct xfrm_user_offload added by the compiler for alignment.
Add an explicit memset(0) before filling the buffer to avoid the heap
info leak.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: d77e38e612 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-08-28 10:58:02 +02:00
Linus Torvalds
cc4a41fe55 Linux 4.13-rc7 2017-08-27 17:20:40 -07:00
Linus Torvalds
2c25833c42 IOMMU Fixes for Linux v4.13-rc6
Another fix, this time in common IOMMU sysfs code
 
 	- In the conversion from the old iommu sysfs-code to the
 	  iommu_device_register interface, I forgot to update the
 	  release path for the struct device associated with an IOMMU.
 	  It freed the 'struct device', which was a pointer before, but
 	  is now embedded in another struct. Freeing from the middle of
 	  allocated memory had all kinds of nasty side effects when an
 	  IOMMU was unplugged. Unfortunately nobody unplugged an IOMMU
 	  until now, so this was not discovered earlier.  The fix is to
 	  make the 'struct device' a pointer again.

Merge tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fix from Joerg Roedel:
 "Another fix, this time in common IOMMU sysfs code.

  In the conversion from the old iommu sysfs-code to the
  iommu_device_register interface, I forgot to update the release path
  for the struct device associated with an IOMMU. It freed the 'struct
  device', which was a pointer before, but is now embedded in another
  struct.

  Freeing from the middle of allocated memory had all kinds of nasty
  side effects when an IOMMU was unplugged. Unfortunately nobody
  unplugged an IOMMU until now, so this was not discovered earlier. The
  fix is to make the 'struct device' a pointer again"

* tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Fix wrong freeing of iommu_device->dev
2017-08-27 17:10:34 -07:00
Linus Torvalds
80f73b2da0 char/misc fix for 4.13-rc7
Here is a single misc driver fix for 4.13-rc7.  It resolves a reported
 problem in the Android binder driver due to previous patches in 4.13-rc.
 
 It's been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
 "Here is a single misc driver fix for 4.13-rc7. It resolves a reported
  problem in the Android binder driver due to previous patches in
  4.13-rc.

  It's been in linux-next with no reported issues"

* tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  ANDROID: binder: fix proc->tsk check.
2017-08-27 17:08:37 -07:00
Linus Torvalds
c3c162635f staging/iio fixes for 4.13-rc7
Here are a few small staging driver fixes, and some more IIO driver fixes
 for 4.13-rc7.  Nothing major, just resolutions for some reported
 problems.
 
 All of these have been in linux-next with no reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/iio fixes from Greg KH:
 "Here are a few small staging driver fixes, and some more IIO driver
  fixes for 4.13-rc7. Nothing major, just resolutions for some reported
  problems.

  All of these have been in linux-next with no reported problems"

* tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  iio: magnetometer: st_magn: remove ihl property for LSM303AGR
  iio: magnetometer: st_magn: fix status register address for LSM303AGR
  iio: hid-sensor-trigger: Fix the race with user space powering up sensors
  iio: trigger: stm32-timer: fix get trigger mode
  iio: imu: adis16480: Fix acceleration scale factor for adis16480
  PATCH] iio: Fix some documentation warnings
  staging: rtl8188eu: add RNX-N150NUB support
  Revert "staging: fsl-mc: be consistent when checking strcmp() return"
  iio: adc: stm32: fix common clock rate
  iio: adc: ina219: Avoid underflow for sleeping time
  iio: trigger: stm32-timer: add enable attribute
  iio: trigger: stm32-timer: fix get/set down count direction
  iio: trigger: stm32-timer: fix write_raw return value
  iio: trigger: stm32-timer: fix quadrature mode get routine
  iio: bmp280: properly initialize device for humidity reading
2017-08-27 17:03:33 -07:00
Linus Torvalds
fff4e7a0e6 NTB bug fixes to address an incorrect ntb_mw_count reference in the NTB
transport, improperly bringing down the link if SPADs are corrupted, and
 an out-of-order issue regarding link negotiation and data passing.

Merge tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb

Pull NTB fixes from Jon Mason:
 "NTB bug fixes to address an incorrect ntb_mw_count reference in the
  NTB transport, improperly bringing down the link if SPADs are
  corrupted, and an out-of-order issue regarding link negotiation and
  data passing"

* tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb:
  ntb: ntb_test: ensure the link is up before trying to configure the mws
  ntb: transport shouldn't disable link due to bogus values in SPADs
  ntb: use correct mw_count function in ntb_tool and ntb_transport
2017-08-27 17:01:54 -07:00
Linus Torvalds
a8b169afbf Avoid page waitqueue race leaving possible page locker waiting
The "lock_page_killable()" function waits for exclusive access to the
page lock bit using the WQ_FLAG_EXCLUSIVE bit in the waitqueue entry
set.

That means that if it gets woken up, other waiters may have been
skipped.

That, in turn, means that if it sees the page being unlocked, it *must*
take that lock and return success, even if a lethal signal is also
pending.

So instead of checking for lethal signals first, we need to check for
them after we've checked the actual bit that we were waiting for.  Even
if that might then delay the killing of the process.

This matches the order of the old "wait_on_bit_lock()" infrastructure
that the page locking used to use (and is still used in a few other
areas).

Note that if we still return an error after having unsuccessfully tried
to acquire the page lock, that is ok: that means that some other thread
was able to get ahead of us and lock the page, and when that other
thread then unlocks the page, the wakeup event will be repeated.  So any
other pending waiters will now get properly woken up.
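
The ordering argued for above can be sketched as one iteration of the wait loop. This is a simplified model, not the kernel code: `exclusive_wait_step()` and `DEMO_KEEP_SLEEPING` are hypothetical names, the lock bit is a plain `bool`, and the test-and-set is not atomic here.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_KEEP_SLEEPING 1

/* One pass of an exclusive waiter after it has been woken up.
 * Returns 0 if the lock was taken, -EINTR if a fatal signal is pending
 * and the lock was not available, DEMO_KEEP_SLEEPING otherwise. */
static int exclusive_wait_step(bool *page_locked, bool fatal_signal_pending)
{
	/* Check the bit first: an exclusive wakeup may have skipped other
	 * waiters, so a free lock must be taken even if we are being killed. */
	if (!*page_locked) {
		*page_locked = true;
		return 0;
	}

	/* Only now look at the signal. Somebody else holds the lock, and
	 * their unlock will repeat the wakeup for the remaining waiters. */
	if (fatal_signal_pending)
		return -EINTR;

	return DEMO_KEEP_SLEEPING;
}
```

Checking the signal before the bit would let a killed task return an error while the lock is free, which is the case where the wakeup is lost for everyone queued behind it.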

Fixes: 6290602709 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Jan Kara <jack@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-27 16:25:09 -07:00
Linus Torvalds
3510ca20ec Minor page waitqueue cleanups
Tim Chen and Kan Liang have been battling a customer load that shows
extremely long page wakeup lists.  The cause seems to be constant NUMA
migration of a hot page that is shared across a lot of threads, but the
actual root cause for the exact behavior has not been found.

Tim has a patch that batches the wait list traversal at wakeup time, so
that we at least don't get long uninterruptible cases where we traverse
and wake up thousands of processes and get nasty latency spikes.  That
is likely 4.14 material, but we're still discussing the page waitqueue
specific parts of it.

In the meantime, I've tried to look at making the page wait queues less
expensive, and failing miserably.  If you have thousands of threads
waiting for the same page, it will be painful.  We'll need to try to
figure out the NUMA balancing issue some day, in addition to avoiding
the excessive spinlock hold times.

That said, having tried to rewrite the page wait queues, I can at least
fix up some of the braindamage in the current situation. In particular:

 (a) we don't want to continue walking the page wait list if the bit
     we're waiting for already got set again (which seems to be one of
     the patterns of the bad load).  That makes no progress and just
     causes pointless cache pollution chasing the pointers.

 (b) we don't want to put the non-locking waiters always on the front of
     the queue, and the locking waiters always on the back.  Not only is
     that unfair, it means that we wake up thousands of reading threads
     that will just end up being blocked by the writer later anyway.
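
Point (a) can be modeled roughly as follows. This is a sketch under assumptions: `struct demo_waiter` and `demo_wake_page_waiters()` are hypothetical, the queue is a plain array in arrival order, and the walk stops after the first exclusive waiter it wakes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter entry: exclusive waiters want to take the bit,
 * non-exclusive waiters only wait for it to clear. */
struct demo_waiter {
	bool exclusive;
	bool woken;
};

/* Walk the queue in order and return how many waiters were woken. */
static size_t demo_wake_page_waiters(struct demo_waiter *queue, size_t n,
				     const bool *bit_set)
{
	size_t i, woken = 0;

	for (i = 0; i < n; i++) {
		/* (a) The bit got set again: further wakeups make no
		 * progress, so stop chasing pointers. */
		if (*bit_set)
			break;

		queue[i].woken = true;
		woken++;

		/* One exclusive waiter is enough; it will take the bit. */
		if (queue[i].exclusive)
			break;
	}
	return woken;
}
```

With the waiters kept in arrival order, a walk over two readers, a locker and another reader wakes the first three entries and leaves the last reader queued behind the locker.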

Also add a comment about the layout of 'struct wait_page_key' - there is
an external user of it in the cachefiles code that means that it has to
match the layout of 'struct wait_bit_key' in the two first members.  It
so happens to match, because 'struct page *' and 'unsigned long *' end
up having the same values simply because the page flags are the first
member in struct page.

Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Christopher Lameter <cl@linux.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-27 13:55:12 -07:00
Linus Torvalds
0cc3b0ec23 Clarify (and fix) MAX_LFS_FILESIZE macros
We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
filesystems (and other IO targets) that know they are 64-bit clean and
don't have any 32-bit limits in their IO path.

It turns out that our 32-bit value for that limit was bogus.  On 32-bit,
the VM layer is limited by the page cache to only 32-bit index values,
but our logic for that was confusing and actually wrong.  We used to
define that value to

	(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)

which is actually odd in several ways: it limits the index to 31 bits,
and then it limits files so that they can't have data in that last byte
of a page that has the highest 31-bit index (ie page index 0x7fffffff).

Neither of those limitations makes sense.  The index is actually the full
32 bit unsigned value, and we can use that whole full page.  So the
maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".

However, we do want to avoid the maximum index, because we have code
that iterates over the page indexes, and we don't want that code to
overflow.  So the maximum size of a file on a 32-bit host should
actually be one page less than the full 32-bit index.

So the actual limit is ULONG_MAX << PAGE_SHIFT.  That means that we will
not actually be using the page of that last index (ULONG_MAX), but we
can grow a file up to that limit.

The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
volume.  It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
byte less), but the actual true VM limit is one page less than 16TiB.
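
The two figures can be checked with a little arithmetic. The sketch below assumes a 32-bit host with 4KiB pages (a page shift of 12); the `DEMO_*` names and helper functions are made up for the illustration and use fixed-width types so the result does not depend on the machine running it.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT    12   /* assumed: 4KiB pages */
#define DEMO_BITS_PER_LONG 32   /* assumed: 32-bit host */

/* Old definition: ((loff_t)PAGE_SIZE << (BITS_PER_LONG-1)) - 1 */
static int64_t demo_old_max_lfs_filesize(void)
{
	int64_t page_size = (int64_t)1 << DEMO_PAGE_SHIFT;

	return (page_size << (DEMO_BITS_PER_LONG - 1)) - 1;
}

/* New definition: ULONG_MAX << PAGE_SHIFT, with a 32-bit unsigned long */
static int64_t demo_new_max_lfs_filesize(void)
{
	return (int64_t)UINT32_MAX << DEMO_PAGE_SHIFT;
}
```

Under these assumptions the old value is 2^43 - 1 bytes (8TiB minus one byte) and the new value is 2^44 - 4096 bytes (16TiB minus one page).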

This was invisible until commit c2a9737f45 ("vfs,mm: fix a dead loop
in truncate_inode_pages_range()"), which started applying that
MAX_LFS_FILESIZE limit to block devices too.

NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
actually just the offset type itself (loff_t), which is signed.  But for
clarity, on 64-bit, just use the maximum signed value, and don't make
people have to count the number of 'f' characters in the hex constant.

So just use LLONG_MAX for the 64-bit case.  That was what the value had
been before too, just written out as a hex constant.

Fixes: c2a9737f45 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
Reported-and-tested-by: Doug Nazar <nazard@nazar.ca>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-27 12:12:25 -07:00
Linus Torvalds
bab9752480 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:

 - a tweak to the IBM Trackpoint driver that helps recognize
   trackpoints on newer Lenovo Carbons

 - a fix to the ALPS driver solving scroll issues on some Dells

 - yet another ACPI ID has been added to the Elan I2C touchpad driver

 - quieted diagnostic message in soc_button_array driver

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad
  Input: soc_button_array - silence -ENOENT error on Dell XPS13 9365
  Input: trackpoint - add new trackpoint firmware ID
  Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
2017-08-26 12:48:29 -07:00