Commit Graph

751331 Commits

Author SHA1 Message Date
Roman Gushchin
f1782c9bc5 dcache: account external names as indirectly reclaimable memory
I received a report about suspicious growth of unreclaimable slabs on
some machines.  I've found that it happens on machines with low memory
pressure, and these unreclaimable slabs are external names attached to
dentries.

External names are allocated using generic kmalloc() function, so they
are accounted as unreclaimable.  But they are held by dentries, which
are reclaimable, and they will be reclaimed under the memory pressure.

In particular, this breaks MemAvailable calculation, as it doesn't take
unreclaimable slabs into account.  This leads to a silly situation, when
a machine is almost idle, has no memory pressure and therefore has a big
dentry cache.  And the resulting MemAvailable is too low to start a new
workload.

To address the issue, the NR_INDIRECTLY_RECLAIMABLE_BYTES counter is
used to track the amount of memory, consumed by external names.  The
counter is increased in the dentry allocation path, if an external name
structure is allocated; and it's decreased in the dentry freeing path.

To reproduce the problem I've used the following Python script:

  import os

  for iter in range (0, 10000000):
      try:
          name = ("/some_long_name_%d" % iter) + "_" * 220
          os.stat(name)
      except Exception:
          pass

Without this patch:
  $ cat /proc/meminfo | grep MemAvailable
  MemAvailable:    7811688 kB
  $ python indirect.py
  $ cat /proc/meminfo | grep MemAvailable
  MemAvailable:    2753052 kB

With the patch:
  $ cat /proc/meminfo | grep MemAvailable
  MemAvailable:    7809516 kB
  $ python indirect.py
  $ cat /proc/meminfo | grep MemAvailable
  MemAvailable:    7749144 kB

[guro@fb.com: fix indirectly reclaimable memory accounting for CONFIG_SLOB]
  Link: http://lkml.kernel.org/r/20180312194140.19517-1-guro@fb.com
[guro@fb.com: fix indirectly reclaimable memory accounting]
  Link: http://lkml.kernel.org/r/20180313125701.7955-1-guro@fb.com
Link: http://lkml.kernel.org/r/20180305133743.12746-5-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:29 -07:00
Roman Gushchin
034ebf65c3 mm: treat indirectly reclaimable memory as available in MemAvailable
Adjust /proc/meminfo MemAvailable calculation by adding the amount of
indirectly reclaimable memory (rounded to the PAGE_SIZE).

Link: http://lkml.kernel.org/r/20180305133743.12746-4-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:29 -07:00
Roman Gushchin
eb59254608 mm: introduce NR_INDIRECTLY_RECLAIMABLE_BYTES
Patch series "indirectly reclaimable memory", v2.

This patchset introduces the concept of indirectly reclaimable memory
and applies it to fix the issue of when a big number of dentries with
external names can significantly affect the MemAvailable value.

This patch (of 3):

Introduce a concept of indirectly reclaimable memory and adds the
corresponding memory counter and /proc/vmstat item.

Indirectly reclaimable memory is any sort of memory, used by the kernel
(except of reclaimable slabs), which is actually reclaimable, i.e.  will
be released under memory pressure.

The counter is in bytes, as it's not always possible to count such
objects in pages.  The name contains BYTES by analogy to
NR_KERNEL_STACK_KB.

Link: http://lkml.kernel.org/r/20180305133743.12746-2-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:29 -07:00
Martin Schwidefsky
6a3d1e81a4 s390: correct nospec auto detection init order
With CONFIG_EXPOLINE_AUTO=y the call of spectre_v2_auto_early() via
early_initcall is done *after* the early_param functions. This
overwrites any settings done with the nobp/no_spectre_v2/spectre_v2
parameters. The code patching for the kernel is done after the
evaluation of the early parameters but before the early_initcall
is done. The end result is a kernel image that is patched correctly
but the kernel modules are not.

Make sure that the nospec auto detection function is called before the
early parameters are evaluated and before the code patching is done.

Fixes: 6e179d6412 ("s390: add automatic detection of the spectre defense")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-11 17:46:00 +02:00
Steven Rostedt (VMware)
0b3dec05db tracing: Enforce passing in filter=NULL to create_filter()
There's some inconsistency with what to set the output parameter filterp
when passing to create_filter(..., struct event_filter **filterp).

Whatever filterp points to, should be NULL when calling this function. The
create_filter() calls create_filter_start() with a pointer to a local
"filter" variable that is set to NULL. The create_filter_start() has a
WARN_ON() if the passed in pointer isn't pointing to a value set to NULL.

Ideally, create_filter() should pass the filterp variable it received to
create_filter_start() and not hide it as with a local variable, this allowed
create_filter() to fail, and not update the passed in filter, and the caller
of create_filter() then tried to free filter, which was never initialized to
anything, causing memory corruption.

Link: http://lkml.kernel.org/r/00000000000032a0c30569916870@google.com

Fixes: 80765597bc ("tracing: Rewrite filter logic to be simpler and faster")
Reported-by: syzbot+dadcc936587643d7f568@syzkaller.appspotmail.com
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11 11:31:08 -04:00
Ravi Bangoria
a64b2c01e6 trace_uprobe: Simplify probes_seq_show()
Simplify probes_seq_show() function. No change in output
before and after patch.

Link: http://lkml.kernel.org/r/20180315082756.9050-2-ravi.bangoria@linux.vnet.ibm.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11 11:31:07 -04:00
Ravi Bangoria
18d45b11d9 trace_uprobe: Use %lx to display offset
tu->offset is unsigned long, not a pointer, thus %lx should
be used to print it, not the %px.

Link: http://lkml.kernel.org/r/20180315082756.9050-1-ravi.bangoria@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 0e4d819d08 ("trace_uprobe: Display correct offset in uprobe_events")
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11 11:31:07 -04:00
Howard McLauchlan
f0a2aa5a2a tracing/uprobe: Add support for overlayfs
uprobes cannot successfully attach to binaries located in a directory
mounted with overlayfs.

To verify, create directories for mounting overlayfs
(upper,lower,work,merge), move some binary into merge/ and use readelf
to obtain some known instruction of the binary. I used /bin/true and the
entry instruction(0x13b0):

	$ mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merge
	$ cd /sys/kernel/debug/tracing
	$ echo 'p:true_entry PATH_TO_MERGE/merge/true:0x13b0' > uprobe_events
	$ echo 1 > events/uprobes/true_entry/enable

This returns 'bash: echo: write error: Input/output error' and dmesg
tells us 'event trace: Could not enable event true_entry'

This change makes create_trace_uprobe() look for the real inode of a
dentry. In the case of normal filesystems, this simplifies to just
returning the inode. In the case of overlayfs(and similar fs) we will
obtain the underlying dentry and corresponding inode, upon which uprobes
can successfully register.

Running the example above with the patch applied, we can see that the
uprobe is enabled and will output to trace as expected.

Link: http://lkml.kernel.org/r/20180410231030.2720-1-hmclauchlan@fb.com

Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11 11:31:06 -04:00
Jérémy Lefaure
0a4d0564f0 tracing: Use ARRAY_SIZE() macro instead of open coding it
It is useless to re-invent the ARRAY_SIZE macro so let's use it instead
of DATA_CNT.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)

Link: http://lkml.kernel.org/r/20171016012250.26453-1-jeremy.lefaure@lse.epita.fr

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
[ Removed useless include of kernel.h ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-04-11 11:30:33 -04:00
David S. Miller
949989b36c Merge branch 'vhost-fix-vhost_vq_access_ok-log-check'
Stefan Hajnoczi says:

====================
vhost: fix vhost_vq_access_ok() log check

v3:
 * Rebased onto net/master and resolved conflict [DaveM]

v2:
 * Rewrote the conditional to make the vq access check clearer [Linus]
 * Added Patch 2 to make the return type consistent and harder to misuse [Linus]

The first patch fixes the vhost virtqueue access check which was recently
broken.  The second patch replaces the int return type with bool to prevent
future bugs.
====================

Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:54:06 -04:00
Stefan Hajnoczi
ddd3d4081f vhost: return bool from *_access_ok() functions
Currently vhost *_access_ok() functions return int.  This is error-prone
because there are two popular conventions:

1. 0 means failure, 1 means success
2. -errno means failure, 0 means success

Although vhost mostly uses #1, it does not do so consistently.
umem_access_ok() uses #2.

This patch changes the return type from int to bool so that false means
failure and true means success.  This eliminates a potential source of
errors.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:54:06 -04:00
Stefan Hajnoczi
d14d2b7809 vhost: fix vhost_vq_access_ok() log check
Commit d65026c6c6 ("vhost: validate log
when IOTLB is enabled") introduced a regression.  The logic was
originally:

  if (vq->iotlb)
      return 1;
  return A && B;

After the patch the short-circuit logic for A was inverted:

  if (A || vq->iotlb)
      return A;
  return B;

This patch fixes the regression by rewriting the checks in the obvious
way, no longer returning A when vq->iotlb is non-NULL (which is hard to
understand).

Reported-by: syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:54:06 -04:00
Eric Auger
7ced6c98c7 vhost: Fix vhost_copy_to_user()
vhost_copy_to_user is used to copy vring used elements to userspace.
We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.

Fixes: f889491380 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:52:34 -04:00
David S. Miller
98239c9096 Merge branch 'Aquantia-atlantic-critical-fixes-04-2018'
Igor Russkikh says:

====================
Aquantia atlantic critical fixes 04/2018

Two regressions on latest 4.16 driver reported by users

Some of old FW (1.5.44) had a link management logic which prevents
driver to make clean reset. Driver of 4.16 has a full hardware reset
implemented and that broke the link and traffic on such a cards.

Second is oops on shutdown callback in case interface is already
closed or was never opened.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:41:36 -04:00
Igor Russkikh
9a11aff25f net: aquantia: oops when shutdown on already stopped device
In case netdev is closed at the moment of pci shutdown, aq_nic_stop
gets called second time. napi_disable in that case hangs indefinitely.
In other case, if device was never opened at all, we get oops because
of null pointer access.

We should invoke aq_nic_stop conditionally, only if device is running
at the moment of shutdown.

Reported-by: David Arcari <darcari@redhat.com>
Fixes: 90869ddfef ("net: aquantia: Implement pci shutdown callback")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:41:36 -04:00
Igor Russkikh
cce96d1883 net: aquantia: Regression on reset with 1.x firmware
On ASUS XG-C100C with 1.5.44 firmware a special mode called "dirty wake"
is active. With this mode when motherboard gets powered (but no poweron
happens yet), NIC automatically enables powersave link and watches
for WOL packet.
This normally allows to powerup the PC after AC power failures.

Not all motherboards or bios settings gives power to PCI slots,
so this mode is not enabled on all the hardware.

4.16 linux driver introduced full hardware reset sequence
This is required since before that we had no NIC hardware
reset implemented and there were side effects of "not clean start".

But this full reset is incompatible with "dirty wake" WOL feature
it keeps the PHY link in a special mode forever. As a consequence,
driver sees no link and no traffic.

To fix this we forcibly change FW state to idle state before doing
the full reset. This makes FW to restore link state.

Fixes: c8c82eb net: aquantia: Introduce global AQC hardware reset sequence
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:41:36 -04:00
Bassem Boubaker
53765341ee cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
The Cinterion AHS8 is a 3G device with one embedded WWAN interface
using cdc_ether as a driver.

The modem is controlled via AT commands through the exposed TTYs.

AT+CGDCONT write command can be used to activate or deactivate a WWAN
connection for a PDP context defined with the same command. UE
supports one WWAN adapter.

Signed-off-by: Bassem Boubaker <bassem.boubaker@actia.fr>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:36:38 -04:00
Tejaswi Tanikella
3f01ddb962 slip: Check if rstate is initialized before uncompressing
On receiving a packet the state index points to the rstate which must be
used to fill up IP and TCP headers. But if the state index points to a
rstate which is unitialized, i.e. filled with zeros, it gets stuck in an
infinite loop inside ip_fast_csum trying to compute the ip checsum of a
header with zero length.

89.666953:   <2> [<ffffff9dd3e94d38>] slhc_uncompress+0x464/0x468
89.666965:   <2> [<ffffff9dd3e87d88>] ppp_receive_nonmp_frame+0x3b4/0x65c
89.666978:   <2> [<ffffff9dd3e89dd4>] ppp_receive_frame+0x64/0x7e0
89.666991:   <2> [<ffffff9dd3e8a708>] ppp_input+0x104/0x198
89.667005:   <2> [<ffffff9dd3e93868>] pppopns_recv_core+0x238/0x370
89.667027:   <2> [<ffffff9dd4428fc8>] __sk_receive_skb+0xdc/0x250
89.667040:   <2> [<ffffff9dd3e939e4>] pppopns_recv+0x44/0x60
89.667053:   <2> [<ffffff9dd4426848>] __sock_queue_rcv_skb+0x16c/0x24c
89.667065:   <2> [<ffffff9dd4426954>] sock_queue_rcv_skb+0x2c/0x38
89.667085:   <2> [<ffffff9dd44f7358>] raw_rcv+0x124/0x154
89.667098:   <2> [<ffffff9dd44f7568>] raw_local_deliver+0x1e0/0x22c
89.667117:   <2> [<ffffff9dd44c8ba0>] ip_local_deliver_finish+0x70/0x24c
89.667131:   <2> [<ffffff9dd44c92f4>] ip_local_deliver+0x100/0x10c

./scripts/faddr2line vmlinux slhc_uncompress+0x464/0x468 output:
 ip_fast_csum at arch/arm64/include/asm/checksum.h:40
 (inlined by) slhc_uncompress at drivers/net/slip/slhc.c:615

Adding a variable to indicate if the current rstate is initialized. If
such a packet arrives, move to toss state.

Signed-off-by: Tejaswi Tanikella <tejaswit@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:33:46 -04:00
Phil Elwell
fed56079e7 lan78xx: Avoid spurious kevent 4 "error"
lan78xx_defer_event generates an error message whenever the work item
is already scheduled. lan78xx_open defers three events -
EVENT_STAT_UPDATE, EVENT_DEV_OPEN and EVENT_LINK_RESET. Being aware
of the likelihood (or certainty) of an error message, the DEV_OPEN
event is added to the set of pending events directly, relying on
the subsequent deferral of the EVENT_LINK_RESET call to schedule the
work.  Take the same precaution with EVENT_STAT_UPDATE to avoid a
totally unnecessary error message.

Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:32:43 -04:00
Phil Elwell
4bfc33807a lan78xx: Correctly indicate invalid OTP
lan78xx_read_otp tries to return -EINVAL in the event of invalid OTP
content, but the value gets overwritten before it is returned and the
read goes ahead anyway. Make the read conditional as it should be
and preserve the error code.

Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:30:42 -04:00
Ka-Cheong Poon
a43cced9a3 rds: MP-RDS may use an invalid c_path
rds_sendmsg() calls rds_send_mprds_hash() to find a c_path to use to
send a message.  Suppose the RDS connection is not yet up.  In
rds_send_mprds_hash(), it does

	if (conn->c_npaths == 0)
		wait_event_interruptible(conn->c_hs_waitq,
					 (conn->c_npaths != 0));

If it is interrupted before the connection is set up,
rds_send_mprds_hash() will return a non-zero hash value.  Hence
rds_sendmsg() will use a non-zero c_path to send the message.  But if
the RDS connection ends up to be non-MP capable, the message will be
lost as only the zero c_path can be used.

Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-11 10:24:01 -04:00
Desnes A. Nunes do Rosario
adf58458bc PCI: Remove messages about reassigning resources
When reassigning device resources to increase their alignment, e.g.,
because of a "pci=resource_alignment=" kernel parameter or because the
platform aligns resources to its page size, we previously emitted messages
like this:

  pci 0000:00:00.0: Disabling memory decoding and releasing memory resources
  pci 0000:00:00.0: disabling bridge mem windows

These messages don't convey any useful information, so remove them.

Fixes: 3827463769 ("powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned")
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.vnet.ibm.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2018-04-11 08:46:50 -05:00
Rafael J. Wysocki
51798deaff Merge branches 'pm-cpuidle' and 'pm-qos'
* pm-cpuidle:
  tick-sched: avoid a maybe-uninitialized warning
  cpuidle: Add definition of residency to sysfs documentation
  time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
  nohz: Avoid duplication of code related to got_idle_tick
  nohz: Gather tick_sched booleans under a common flag field
  cpuidle: menu: Avoid selecting shallow states with stopped tick
  cpuidle: menu: Refine idle state selection for running tick
  sched: idle: Select idle state before stopping the tick
  time: hrtimer: Introduce hrtimer_next_event_without()
  time: tick-sched: Split tick_nohz_stop_sched_tick()
  cpuidle: Return nohz hint from cpuidle_select()
  jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
  sched: idle: Do not stop the tick before cpuidle_idle_call()
  sched: idle: Do not stop the tick upfront in the idle loop
  time: tick-sched: Reorganize idle tick management code

* pm-qos:
  PM / QoS: mark expected switch fall-throughs
2018-04-11 13:22:46 +02:00
Helge Deller
71d577db01 parisc: Switch to generic COMPAT_BINFMT_ELF
Drop our own compat binfmt implementation in
arch/parisc/kernel/binfmt_elf32.c in favour of the generic
implementation with CONFIG_COMPAT_BINFMT_ELF.

While cleaning up the dependencies, I noticed that ELF_PLATFORM was strangely
defined: On a 32-bit kernel, it was defined to "PARISC", while when running in
compat mode on a 64-bit kernel it was defined to "PARISC32". Since it doesn't
seem to be used in glibc yet, it's now defined in both cases to "PARISC". In
any case, it can be distinguished because it's either a 32-bit or a 64-bit ELF
file.

Signed-off-by: Helge Deller <deller@gmx.de>
2018-04-11 11:40:35 +02:00
Helge Deller
2a03bb9e7a parisc: Move cache flush functions into .text.hot section
and move the disable_sr_hashing() C and assembly functions into the
.init section.

Signed-off-by: Helge Deller <deller@gmx.de>
2018-04-11 11:40:35 +02:00
Helge Deller
75abf64287 parisc/signal: Add FPE_CONDTRAP for conditional trap handling
Posix and common sense requires that SI_USER not be a signal specific
si_code. Thus add a new FPE_CONDTRAP si_code for conditional traps.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2018-04-11 11:40:35 +02:00
Joel Stanley
cb799267bb MAINTAINERS: Update ASPEED entry with details
I am interested in all ASPEED drivers, and the previous match wasn't
grabbing files in nested directories. Use N instead.

Add the arm kernel mailing list so that patches get reviewed there, and
the linux-aspeed list which exists only so I can use patchwork to track
patches.

Add Andrew as a reviewer, because he is involved in reviewing ASPEED
stuff.

Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-04-11 10:47:31 +02:00
Harald Freudenberger
af4a72276d s390/zcrypt: Support up to 256 crypto adapters.
There was an artificial restriction on the card/adapter id
to only 6 bits but all the AP commands do support adapter
ids with 8 bit. This patch removes this restriction to 64
adapters and now up to 256 adapter can get addressed.

Some of the ioctl calls work on the max number of cards
possible (which was 64). These ioctls are now deprecated
but still supported. All the defines, structs and ioctl
interface declarations have been kept for compabibility.
There are now new ioctls (and defines for these) with an
additional '2' appended which provide the extended versions
with 256 cards supported.

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-11 10:36:27 +02:00
Carlos Maiolino
8c81dd46ef Force log to disk before reading the AGF during a fstrim
Forcing the log to disk after reading the agf is wrong, we might be
calling xfs_log_force with XFS_LOG_SYNC with a metadata lock held.

This can cause a deadlock when racing a fstrim with a filesystem
shutdown.

The deadlock has been identified due a miscalculation bug in device-mapper
dm-thin, which returns lack of space to its users earlier than the device itself
really runs out of space, changing the device-mapper volume into an error state.

The problem happened while filling the filesystem with a single file,
triggering the bug in device-mapper, consequently causing an IO error
and shutting down the filesystem.

If such file is removed, and fstrim executed before the XFS finishes the
shut down process, the fstrim process will end up holding the buffer
lock, and going to sleep on the cil wait queue.

At this point, the shut down process will try to wake up all the threads
waiting on the cil wait queue, but for this, it will try to hold the
same buffer log already held my the fstrim, locking up the filesystem.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-10 22:39:04 -07:00
Matthew Wilcox
fbbb450904 Export __set_page_dirty
XFS currently contains a copy-and-paste of __set_page_dirty().  Export
it from buffer.c instead.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-04-10 22:39:01 -07:00
Dave Airlie
871e899db1 Merge branch 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux into drm-next
A few fixes for 4.17:
- Fix a potential use after free in a error case
- Fix pcie lane handling in amdgpu SI dpm
- sdma pipeline sync fix
- A few vega12 cleanups and fixes
- Misc other fixes

* 'drm-next-4.17' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: Fix memory leaks at amdgpu_init() error path
  drm/amdgpu: Fix PCIe lane width calculation
  drm/radeon: Fix PCIe lane width calculation
  drm/amdgpu/si: implement get/set pcie_lanes asic callback
  drm/amdgpu: Add support for SRBM selection v3
  Revert "drm/amdgpu: Don't change preferred domian when fallback GTT v5"
  drm/amd/powerply: fix power reading on Fiji
  drm/amd/powerplay: Enable ACG SS feature
  drm/amdgpu/sdma: fix mask in emit_pipeline_sync
  drm/amdgpu: Fix KIQ hang on bare metal for device unbind/bind back v2.
  drm/amd/pp: Clean header file in vega12_smumgr.c
  drm/amd/pp: Remove Dead functions on Vega12
  drm/amd/pp: silence a static checker warning
  drm/amdgpu: drop compute ring timeout setting for non-sriov only (v2)
  drm/amdgpu: fix typo of domain fallback
2018-04-11 08:35:41 +10:00
Dave Airlie
c975f17d70 hda_intel: Don't declare azx PM ops if VGA_SWITCHEROO configured (Lukas)
Cc: Lukas Wunner <lukas@wunner.de>
 Cc: Takashi Iwai <tiwai@suse.de>
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEfxcpfMSgdnQMs+QqlvcN/ahKBwoFAlrFI58ACgkQlvcN/ahK
 BwrWzAf+J/tz5aj+e69HF2J7nypt4GY9owspk2SCLuvTfdzQIbzfl477DXCL7IFp
 Oc1qcCEbS6OjRXb8yN/xGClvWI8fDlRapnOYBsKvjZ1WNq89mwgl2FQgPKAfjNnT
 Lg1Z9AzzTyVJcWgk9jGlswNV+Uyi2QR9Zlelg+yrTidEi4ybwLlFMV9CkmT9q9ph
 yP+uApTqe8NA5SZrpwhtvlPj8kRcJgF9Z7FCARcVq3ZXh2qZ+Uujzay+QcnP4AZK
 MSfU/TXh4xeVDc4ttYenZNus0koJugTSpZ3hELZga4mGfj4W1nkSjLh3qRTxudD0
 WpxxSjFKkKUqcsB1YXKgRehmO+AGIA==
 =acWn
 -----END PGP SIGNATURE-----

Merge tag 'drm-misc-next-fixes-2018-04-04' of git://anongit.freedesktop.org/drm/drm-misc into drm-next

hda_intel: Don't declare azx PM ops if VGA_SWITCHEROO configured (Lukas)

Cc: Lukas Wunner <lukas@wunner.de>
Cc: Takashi Iwai <tiwai@suse.de>

* tag 'drm-misc-next-fixes-2018-04-04' of git://anongit.freedesktop.org/drm/drm-misc:
  ALSA: hda - Silence PM ops build warning
2018-04-11 08:35:18 +10:00
Takashi Iwai
9e7f06c8be swiotlb: fix unexpected swiotlb_alloc_coherent failures
The code refactoring by commit 0176adb004 ("swiotlb: refactor coherent
buffer allocation") made swiotlb_alloc_buffer almost always failing due
to a thinko: namely, the function evaluates the dma_coherent_ok call
incorrectly and dealing as if it's invalid. This ends up with weird
errors like iwlwifi probe failure or amdgpu screen flickering.

This patch corrects the logic error.

Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1088658
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1088902
Fixes: 0176adb004 ("swiotlb: refactor coherent buffer allocation")
Cc: <stable@vger.kernel.org> # v4.16+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-04-10 22:30:43 +02:00
Frank Sorenson
98de9ce6f6 NFS: advance nfs_entry cookie only after decoding completes successfully
In nfs[34]_decode_dirent, the cookie is advanced as soon as it is
read, but decoding may still fail later in the function, returning
an error.  Because the cookie has been advanced, the failing entry
is not re-requested from the server, resulting in a missing directory
entry.

In addition, nfs v3 and v4 read the cookie at different locations
in the xdr_stream, so the behavior of the two can be inconsistent.

Fix these by reading the cookie into a temporary variable, and
only advancing the cookie once the entire entry has been decoded
from the xdr_stream successfully.

Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
chendt
dbc898ae10 NFSv3/acl: forget acl cache after setattr
Sync of ACL with std permissions fail,We need to forget the ACL cache after setattr.

Reproduction:
#!/bin/bash
touch testfile
cat <<EOF >testfile
#!/bin/bash
echo "Test was executed"
EOF
chmod u=rwx testfile
chmod g=rw- testfile
chmod o=r-- testfile

chacl u::r--,g::rwx,o:rw- testfile
chmod u+w testfile
ls -l testfile
chacl -l testfile

Output:
-rw-rwxrw- 1 root root 0 Mar 28 05:29 testfile
testfile [u::r--,g::rwx,o::rw-]

Signed-off-by: chendt.fnst <chendt.fnst@cn.fujitsu.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Kinglong Mee <Kinglong Mee>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
609339c123 NFSv4.1: Fix exclusive create
When we use EXCLUSIVE4_1 mode, the server returns an attribute mask where
all the bits indicate which attributes were set, and where the verifier
was stored. In order to figure out which attribute we have to resend,
we need to clear out the attributes that are set in exclcreat_bitmask.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
[Anna: Fixed typo NFS4_CREATE_EXCLUSIVE4 -> NFS4_CREATE_EXCLUSIVE]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
f6cdfa6dd6 NFSv4: Declare the size up to date after it was set.
When we've changed the file size, then ensure we declare it to be
up to date in the inode attributes.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Matthew Wilcox
aae5730e2d nfs: Use ida_simple API
Allocate the owner_id when we allocate the state and free it when we free
the state.  That lets us get rid of a gnarly ida_pre_get() / ida_get_new()
loop.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
35156bfff3 NFSv4: Fix the nfs_inode_set_delegation() arguments
Neither nfs_inode_set_delegation() nor nfs_inode_reclaim_delegation() are
generic code. They have no business delving into NFSv4 OPEN xdr structures,
so let's replace the "struct nfs_openres" parameter.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
8b06494624 NFSv4: Clean up CB_GETATTR encoding
Replace the open coded bitmap implementation with a generic one.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
8bcbe7d98c NFSv4: Don't ask for attributes when ACCESS is protected by a delegation
If we hold a delegation, then the results of the ACCESS call are protected
anyway.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
36b3743fef NFSv4: Add a helper to encode/decode struct timespec
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
40a3426c75 NFSv4: Clean up encode_attrs
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
37c88763de NFSv4; Clean up XDR encoding of type bitmap4
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
e8d8aa46be NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
85e3dd44c5 SUNRPC: Add a helper for encoding opaque data inline
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
0e779aa703 SUNRPC: Add helpers for decoding opaque and string types
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
d943f2dd8d NFSv4: Ignore change attribute invalidations if we hold a delegation
Don't bother even recording an invalid change attribute if we hold a
delegation since we already know the state of our attribute cache.
We can rely on the fact that we will pick up a copy from the server
when we return the delegation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
16e1437517 NFS: More fine grained attribute tracking
Currently, if the NFS_INO_INVALID_ATTR flag is set, for instance by
a call to nfs_post_op_update_inode_locked(), then it will not be cleared
until all the attributes have been revalidated. This means, for instance,
that NFSv4 writes will always force a full attribute revalidation.

Track the ctime, mtime, size and change attribute separately from the
other attributes so that we can have nfs_post_op_update_inode_locked()
set them correctly, and later have the cache consistency bitmask be
able to clear them.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00
Trond Myklebust
cac88f942d NFS: Don't force unnecessary cache invalidation in nfs_update_inode()
If we managed to revalidate all the attributes, then there is no reason
to mark them as invalid again. We do, however want to ensure that we
set nfsi->attrtimeo correctly.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-04-10 16:06:22 -04:00