Commit Graph

870981 Commits

Author SHA1 Message Date
Yazen Ghannam
1c9b08bac5 EDAC/amd64: Use cached data when checking for ECC
...now that the data is available earlier.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106012448.243970-5-Yazen.Ghannam@amd.com
2019-11-06 11:07:57 +01:00
Yazen Ghannam
5e4c55276a EDAC/amd64: Save max number of controllers to family type
The maximum number of memory controllers is fixed within a family/model
group. In most cases, this has been fixed at 2, but some systems may
have up to 8.

The struct amd64_family_type already contains family/model-specific
information, and this can be used rather than adding model checks to
various functions.

Create a new field in struct amd64_family_type for max_mcs.
Set this when setting other family type information, and use this when
needing the maximum number of memory controllers possible for a system.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106012448.243970-4-Yazen.Ghannam@amd.com
2019-11-06 11:07:01 +01:00
Yazen Ghannam
80355a3b2d EDAC/amd64: Gather hardware information early
Split out gathering hardware information from init_one_instance()
into a separate function hw_info_get(). This is necessary so that
the information can be cached earlier and used to check if memory is
populated and if ECC is enabled on a node.

Also, define a function hw_info_put() to back out changes made in
hw_info_get().

Check for an allocated PCI device (Function 0 for Family 17h or Function
1 for pre-Family 17h) before freeing, since hw_info_put() may be called
before PCI siblings are reserved.

Drop the family check when freeing pvt->umc. This will be NULL on
pre-Family 17h systems. However, kfree() is safe and will check for a
NULL pointer before freeing.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106012448.243970-3-Yazen.Ghannam@amd.com
2019-11-06 11:04:49 +01:00
Yazen Ghannam
38ddd4d157 EDAC/amd64: Make struct amd64_family_type global
The struct amd64_family_type doesn't change between multiple nodes and
instances of the module, so make it global.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191106012448.243970-2-Yazen.Ghannam@amd.com
2019-11-06 10:58:12 +01:00
Yazen Ghannam
466503d6b1 EDAC/amd64: Set grain per DIMM
The following commit introduced a warning on error reports without a
non-zero grain value.

  3724ace582 ("EDAC/mc: Fix grain_bits calculation")

The amd64_edac_mod module does not provide a value, so the warning will
be given on the first reported memory error.

Set the grain per DIMM to cacheline size (64 bytes). This is the current
recommendation.

Fixes: 3724ace582 ("EDAC/mc: Fix grain_bits calculation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20191022203448.13962-7-Yazen.Ghannam@amd.com
2019-10-25 15:36:36 +02:00
Markus Elfring
5bbab3cf21 EDAC/aspeed: Use devm_platform_ioremap_resource() in aspeed_probe()
Simplify this function implementation by using a known wrapper function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Joel Stanley <joel@jms.id.au>
Cc: Andrew Jeffery <andrew@aj.id.au>
Cc: James Morse <james.morse@arm.com>
Cc: kernel-janitors@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-aspeed@lists.ozlabs.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Stefan Schaeckeler <sschaeck@cisco.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/baabb9e9-a1b2-3a04-9fb6-aa632de5f722@web.de
2019-10-24 11:17:29 +02:00
Tony Luck
e80634a75a EDAC, skx: Retrieve and print retry_rd_err_log registers
Skylake logs some additional useful information in per-channel
registers in addition the the architectural status/addr/misc
logged in the machine check bank.

Pick up this information and add it to the EDAC log:

	retry_rd_err_[five 32-bit register values]

Sorry, no definitions for these registers. OEMs and DIMM vendors
will be able to use them to isolate which cells in the DIMM are
causing problems.

	correrrcnt[per rank corrected error counts]

Note that if additional errors are logged while these registers are
being read, you may see a jumble of values some from earlier errors,
others from later errors (since the registers report the most recent
logged error). The correrrcnt registers provide error counts per possible
rank. If these counts only change by one since the previous error logged
for this channel, then it is safe to assume that the registers logged
provide a coherent view of one error.

With this change EDAC logs look like this:

EDAC MC4: 1 CE memory read error on CPU_SrcID#2_MC#0_Chan#1_DIMM#0 (channel:1 slot:0 page:0x8f26018 offset:0x0 grain:32 syndrome:0x0 -  err_code:0x0101:0x0091 socket:2 imc:0 rank:0 bg:0 ba:0 row:0x1f880 col:0x200 retry_rd_err_log[0001a209 00000000 00000001 04800001 0001f880] correrrcnt[0001 0000 0000 0000 0000 0000 0000 0000])

Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-10-18 15:27:58 -07:00
Tony Luck
29b8e84fbc EDAC, skx_common: Refactor so that we initialize "dev" in result of adxl decode.
Simplifies the code a little.

Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2019-10-18 15:27:48 -07:00
Borislav Petkov
3a5e7ec903 Merge branch 'edac-urgent' into edac-for-next
Pick up urgent change into next queue.

Signed-off-by: Borislav Petkov <bp@suse.de>
2019-10-17 13:47:12 +02:00
James Morse
1e72e673b9 EDAC/ghes: Fix Use after free in ghes_edac remove path
ghes_edac models a single logical memory controller, and uses a global
ghes_init variable to ensure only the first ghes_edac_register() will
do anything.

ghes_edac is registered the first time a GHES entry in the HEST is
probed. There may be multiple entries, so subsequent attempts to
register ghes_edac are silently ignored as the work has already been
done.

When a GHES entry is unregistered, it calls ghes_edac_unregister(),
which free()s the memory behind the global variables in ghes_edac.

But there may be multiple GHES entries, the next call to
ghes_edac_unregister() will dereference the free()d memory, and attempt
to free it a second time.

This may also be triggered on a platform with one GHES entry, if the
driver is unbound/re-bound and unbound. The re-bind step will do
nothing because of ghes_init, the second unbind will then do the same
work as the first.

Doing the unregister work on the first call is unsafe, as another
CPU may be processing a notification in ghes_edac_report_mem_error(),
using the memory we are about to free.

ghes_init is already half of the reference counting. We only need
to do the register work for the first call, and the unregister work
for the last. Add the unregister check.

This means we no longer free ghes_edac's memory while there are
GHES entries that may receive a notification.

This was detected by KASAN and DEBUG_TEST_DRIVER_REMOVE.

 [ bp: merge into a single patch. ]

Fixes: 0fe5f281f7 ("EDAC, ghes: Model a single, logical memory controller")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Robert Richter <rrichter@marvell.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20191014171919.85044-2-james.morse@arm.com
Link: https://lkml.kernel.org/r/304df85b-8b56-b77e-1a11-aa23769f2e7c@huawei.com
2019-10-17 11:27:05 +02:00
Hanna Hawa
9816b4af43 EDAC/device: Rework error logging API
Make the main workhorse the "count" functions which can log a @count
of errors. Have the current APIs edac_device_handle_{ce,ue}() call
the _count() variants and this way keep the exported symbols number
unchanged.

 [ bp: Rewrite. ]

Signed-off-by: Hanna Hawa <hhhawa@amazon.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: benh@amazon.com
Cc: dwmw@amazon.co.uk
Cc: hanochu@amazon.com
Cc: James Morse <james.morse@arm.com>
Cc: jonnyc@amazon.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: ronenk@amazon.com
Cc: talel@amazon.com
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20190923191741.29322-2-hhhawa@amazon.com
2019-10-09 13:01:42 +02:00
Mauro Carvalho Chehab
f05390d30e EDAC: skx_common: get rid of unused type var
drivers/edac/skx_common.c: In function ‘skx_mce_output_error’:
	drivers/edac/skx_common.c:478:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
	  478 |  char *type, *optype;
	      |        ^~~~

Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-09-30 15:41:54 -03:00
Mauro Carvalho Chehab
323014d85d EDAC: sb_edac: get rid of unused vars
There are several vars unused on this driver, probably because
it was a modified copy of another driver. Get rid of them.

	drivers/edac/sb_edac.c: In function ‘knl_get_dimm_capacity’:
	drivers/edac/sb_edac.c:1343:16: warning: variable ‘sad_size’ set but not used [-Wunused-but-set-variable]
	 1343 |  u64 sad_base, sad_size, sad_limit = 0;
	      |                ^~~~~~~~
	drivers/edac/sb_edac.c: In function ‘sbridge_mce_output_error’:
	drivers/edac/sb_edac.c:2955:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
	 2955 |  char *type, *optype, msg[256];
	      |        ^~~~
	drivers/edac/sb_edac.c: In function ‘sbridge_unregister_mci’:
	drivers/edac/sb_edac.c:3203:22: warning: variable ‘pvt’ set but not used [-Wunused-but-set-variable]
	 3203 |  struct sbridge_pvt *pvt;
	      |                      ^~~
	At top level:
	drivers/edac/sb_edac.c:266:18: warning: ‘correrrthrsld’ defined but not used [-Wunused-const-variable=]
	  266 | static const u32 correrrthrsld[] = {
	      |                  ^~~~~~~~~~~~~
	drivers/edac/sb_edac.c:257:18: warning: ‘correrrcnt’ defined but not used [-Wunused-const-variable=]
	  257 | static const u32 correrrcnt[] = {
	      |                  ^~~~~~~~~~

Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-09-30 15:41:54 -03:00
Mauro Carvalho Chehab
bb66f86781 EDAC: i5400_edac: get rid of some unused vars
There are several temporary unused vars:

	drivers/edac/i5400_edac.c: In function ‘i5400_get_mc_regs’:
	drivers/edac/i5400_edac.c:1058:6: warning: variable ‘maxdimmperch’ set but not used [-Wunused-but-set-variable]
	 1058 |  int maxdimmperch;
	      |      ^~~~~~~~~~~~
	drivers/edac/i5400_edac.c:1057:6: warning: variable ‘maxch’ set but not used [-Wunused-but-set-variable]
	 1057 |  int maxch;
	      |      ^~~~~
	drivers/edac/i5400_edac.c: In function ‘i5400_init_dimms’:
	drivers/edac/i5400_edac.c:1174:6: warning: variable ‘max_dimms’ set but not used [-Wunused-but-set-variable]
	 1174 |  int max_dimms;
	      |      ^~~~~~~~~
	drivers/edac/i5400_edac.c:1173:14: warning: variable ‘channel_count’ set but not used [-Wunused-but-set-variable]
	 1173 |  int ndimms, channel_count;
	      |              ^~~~~~~~~~~~~

Get rid of them.

Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-09-30 15:41:54 -03:00
Mauro Carvalho Chehab
1acd05e40c EDAC: i5400_edac: print type at debug message
There are 3 types of non-recoverable errors that the MC reports:

	- Fatal;
	- Non-fatal uncorrected
	- Non-fatal correctable

While we don't add it to the log itself, it could be useful to
have this at least for debug messages.

This shuts up this warning:

	drivers/edac/i5400_edac.c: In function ‘i5400_proccess_non_recoverable_info’:
	drivers/edac/i5400_edac.c:524:8: warning: variable ‘type’ set but not used [-Wunused-but-set-variable]
	  524 |  char *type = NULL;
	      |        ^~~~

Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-09-30 15:41:54 -03:00
Mauro Carvalho Chehab
48356e0d57 EDAC: i7300_edac: fix a kernel-doc syntax
The declaration of the kerneldoc entry is wrong, causing this
warning:

	drivers/edac/i7300_edac.c:824: warning: Function parameter or member 'mir_no' not described in 'decode_mir'

Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-09-30 15:41:54 -03:00
Mauro Carvalho Chehab
9f95c8d5f8 EDAC: i7300_edac: rename a kernel-doc var description
One var was renamed, but the associated kernel-doc markup still
points to the old name.

Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-09-30 15:41:54 -03:00
Mauro Carvalho Chehab
c43fa3b11e EDAC: i5100_edac: get rid of an unused var
As reported by GCC with W=1:

	drivers/edac/i5100_edac.c:714:16: warning: variable ‘et’ set but not used [-Wunused-but-set-variable]
	  714 |  unsigned long et;
	      |                ^~

It sounds some left over from some code before the addition of
an udelay().

Acked-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
2019-09-30 15:41:54 -03:00
Linus Torvalds
54ecb8f702 Linux 5.4-rc1 2019-09-30 10:35:40 -07:00
Linus Torvalds
bb48a59135 for-5.4-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl2SDbMACgkQxWXV+ddt
 WDsUhw/9HcRsT6SlrwA2R5leHxCR5UMwT2Zmbxpfft37ANF0SC1UINHBfnmquM97
 xX6fdRSR9RUjF9DrdLPfLBnJDQ/MnHl1ruIVBFhJm6cJ9TJwf9E0TiJBQt+08JWg
 vy5hZBWvsPWWRBJ94XPMe4LtakK/isW4Cz5W9AdrC2Siqw69j6eZzms2AnIjyBjA
 BoKg4se2Ay2rMxLZWXIOj9374PU+N1cnRnqgh77ZxLku5WdCzrDfB5safE7UmoTG
 /MWJuuIgzOk0iQpQORRtEZDS1dNe5KT9m4xXkUbrZbQROwqnXrT1SVIsuqNAvlPk
 uaymR1W8nshepzpMlSxVydLv/mKWZNUGnDxOJ23ooow8Yd7ndppXEtFuGwCYqIFc
 xQqxuTLREvJ9+jpSv11bmDpk/ULRqpV+2PjUqGaWlGwFArJ+qFRLVGYx31eXmDPj
 t2mrPOcXGzY0pKtIpbkuUGleY/jeI+BNsvD4+QPs+jnp0nmfvH0/Rmp7grGqx2FI
 rQM8Gn4a5i3nuEDWLp8nN2wcKC3ePwy96Vp2tqfsl6TVTPx4EFzGLkWogHR2yiqI
 0LAj8YWFmWuChSv71wYOjX79CppjcbNwOakSwtDjV30jkwoh2f/0D3OwOpua2xe8
 75KQMaSB0kesGZz7ZkL1kMqA5m5w7MGZom6XZoBJ+bq2HPLB2jo=
 =2UM7
 -----END PGP SIGNATURE-----

Merge tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A bunch of fixes that accumulated in recent weeks, mostly material for
  stable.

  Summary:

   - fix for regression from 5.3 that prevents to use balance convert
     with single profile

   - qgroup fixes: rescan race, accounting leak with multiple writers,
     potential leak after io failure recovery

   - fix for use after free in relocation (reported by KASAN)

   - other error handling fixups"

* tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
  btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
  btrfs: Fix a regression which we can't convert to SINGLE profile
  btrfs: relocation: fix use-after-free on dead relocation roots
  Btrfs: fix race setting up and completing qgroup rescan workers
  Btrfs: fix missing error return if writeback for extent buffer never started
  btrfs: adjust dirty_metadata_bytes after writeback failure of extent buffer
  Btrfs: fix selftests failure due to uninitialized i_mode in test inodes
2019-09-30 10:25:24 -07:00
Linus Torvalds
80b29b6b8c csky-for-linus-5.4-rc1: arch/csky patches for 5.4-rc1
This round of csky subsystem just some fixups.
 
 Fixup:
  - Fixup mb() synchronization problem
  - Fixup dma_alloc_coherent with PAGE_SO attribute
  - Fixup cache_op failed when cross memory ZONEs
  - Optimize arch_sync_dma_for_cpu/device with dma_inv_range
  - Fixup ioremap function losing
  - Fixup arch_get_unmapped_area() implementation
  - Fixup defer cache flush for 610
  - Support kernel non-aligned access
  - Fixup 610 vipt cache flush mechanism
  - Fixup add zero_fp fixup perf backtrace panic
  - Move static keyword to the front of declaration
  - Fixup csky_pmu.max_period assignment
  - Use generic free_initrd_mem()
  - entry: Remove unneeded need_resched() loop
 
 CI-Tested: https://gitlab.com/c-sky/buildroot/pipelines/77689888
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE2KAv+isbWR/viAKHAXH1GYaIxXsFAl2Rfc0SHHJlbl9ndW9A
 Yy1za3kuY29tAAoJEAFx9RmGiMV7eEUQAIuuO1ym/o7zMWZsIRbISJYi6xuq752W
 RVhnPv957ktFYlqrtelDuZwkCPnK03YDTv8lPzcIhes+xYtuspN6NMum7SUWPt4c
 IbfQe4GmMwPGNrJ76NO9YcWQig8flT75wWE2CzOjhGpBfEkEHmtbOvOiZujcq4yN
 vhj8TKF9CjOSaDJNCOXinC6DeGoftRYTSRq6NMPfFfHIWqFcZIyb1Fz8tE/vFP4Y
 qEoG/ys/eu114DIZVnxr5ihmrJ4gjxmlXlnYA8WMRnRt6/oHEmmbtHLTT2eHbI2T
 u4TCssqrLoNADV7g7zyAyBnrgf4dXnJivGjc2NzVLAkgNbEJn+oGNYn+kdiRGd5X
 1CNVbzbZN0gHwLayDjJ4BWNxtpxxqmOjlDffjLVRW7dleTdfHvcxJwfhlvbQOPuq
 nMj4t1qJswn44f/bsj+F13hgXV0tWctYui/HS78xscE7t4yMNduf3JAm7TIxT0XN
 ej3102ffm4ycKZtfdwKwcACmMBJc+2QvGYmQo2L9pVLYIQe3QUWYs881V50PJXxV
 jdz4kmhpfKLie+yxmsdN1/8nTlcHF5wDiYwW9UColzfz1fCRkoPRx5tu/YI9zLoi
 K41CFEv4Z+aVzcp4HZNmOvVhGWgdmuwICMdY16wawhN6SAkvKJBFaP4g24h0J95M
 QyAPQl/VUMED
 =FLYg
 -----END PGP SIGNATURE-----

Merge tag 'csky-for-linus-5.4-rc1' of git://github.com/c-sky/csky-linux

Pull csky updates from Guo Ren:
 "This round of csky subsystem just some fixups:

   - Fix mb() synchronization problem

   - Fix dma_alloc_coherent with PAGE_SO attribute

   - Fix cache_op failed when cross memory ZONEs

   - Optimize arch_sync_dma_for_cpu/device with dma_inv_range

   - Fix ioremap function losing

   - Fix arch_get_unmapped_area() implementation

   - Fix defer cache flush for 610

   - Support kernel non-aligned access

   - Fix 610 vipt cache flush mechanism

   - Fix add zero_fp fixup perf backtrace panic

   - Move static keyword to the front of declaration

   - Fix csky_pmu.max_period assignment

   - Use generic free_initrd_mem()

   - entry: Remove unneeded need_resched() loop"

* tag 'csky-for-linus-5.4-rc1' of git://github.com/c-sky/csky-linux:
  csky: Move static keyword to the front of declaration
  csky: entry: Remove unneeded need_resched() loop
  csky: Fixup csky_pmu.max_period assignment
  csky: Fixup add zero_fp fixup perf backtrace panic
  csky: Use generic free_initrd_mem()
  csky: Fixup 610 vipt cache flush mechanism
  csky: Support kernel non-aligned access
  csky: Fixup defer cache flush for 610
  csky: Fixup arch_get_unmapped_area() implementation
  csky: Fixup ioremap function losing
  csky: Optimize arch_sync_dma_for_cpu/device with dma_inv_range
  csky/dma: Fixup cache_op failed when cross memory ZONEs
  csky: Fixup dma_alloc_coherent with PAGE_SO attribute
  csky: Fixup mb() synchronization problem
2019-09-30 10:16:17 -07:00
Linus Torvalds
cef0aa0ce8 ARM: SoC fixes
A few fixes that have trickled in through the merge window:
 
  - Video fixes for OMAP due to panel-dpi driver removal
  - Clock fixes for OMAP that broke no-idle quirks + nfsroot on DRA7
  - Fixing arch version on ASpeed ast2500
  - Two fixes for reset handling on ARM SCMI
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAl2Q+QsPHG9sb2ZAbGl4
 b20ubmV0AAoJEIwa5zzehBx36yIP/2eeqmdd97XJUiX2qGlKlDC+AtVUJnnxzhLa
 jSAIbbN74MCVH3M0BW1ECPbHHXFR54sffLFWwu8rVBr5nRdNPt0xLdXiVJcIVMfr
 by0LeMAOcw9CDxKsLqwwAagKq4HVnwqbZ+RVC3CjGz+Sp+vvSz+T/Ta6GzblASYv
 3zOs1FD+e4pwDqonyp6P5vAlBQ6qFL7AVZFPpNmXsqIzcT1bGEj/RwReErCIoevL
 7ZJr1R69D5IaEXYwWt8dT7bNwMR0gRvskrCQVCCBBcwHkO1PRd6cTXQ9EFBG8LgV
 LCM9F8Z+6QMigqvDFgSMpIz6orhKQKGpHF7K023c4DKBVqwigT/CaTOZgFr74pUJ
 Zp7s2dFcmJo1J2HlYDz3Nde4BFJXy3gNJphD7yI9xMfBNe1EXclvqtqICvnpMpgt
 thDXrReyDhMQukOBlrUcMxABP/EK97fULpC2Z2kaBq3SbnZILAOKFKtISmh5o6eI
 s4+QYETaqnWjYJE9d+YYg0VwNAifSzplSrGJVK43mGpqpQRx49cw25vL8bV/ZRgg
 HMTUX97Oho+EgDC9BWiahe0TZaWBBGJ3hWI/mrBW/dNowxoHp3l/fCcLdBPPKltS
 qo4BbbKPdrgq5o3YRBSjftdnbJ8eE9DTRs1sAsDQeNb9XJih9aiAQE3hxAA9wdJO
 vcRFcZ1Q
 =s4qF
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC fixes from Olof Johansson:
 "A few fixes that have trickled in through the merge window:

   - Video fixes for OMAP due to panel-dpi driver removal

   - Clock fixes for OMAP that broke no-idle quirks + nfsroot on DRA7

   - Fixing arch version on ASpeed ast2500

   - Two fixes for reset handling on ARM SCMI"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
  ARM: aspeed: ast2500 is ARMv6K
  reset: reset-scmi: add missing handle initialisation
  firmware: arm_scmi: reset: fix reset_state assignment in scmi_domain_reset
  bus: ti-sysc: Remove unpaired sysc_clkdm_deny_idle()
  ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
  ARM: dts: am3517-evm: Fix missing video
  ARM: dts: logicpd-torpedo-baseboard: Fix missing video
  ARM: omap2plus_defconfig: Fix missing video
  bus: ti-sysc: Fix handling of invalid clocks
  bus: ti-sysc: Fix clock handling for no-idle quirks
2019-09-30 10:04:28 -07:00
Linus Torvalds
cf4f493b10 A few more tracing fixes:
- Fixed a buffer overflow by checking nr_args correctly in probes
 
  - Fixed a warning that is reported by clang
 
  - Fixed a possible memory leak in error path of filter processing
 
  - Fixed the selftest that checks for failures, but wasn't failing
 
  - Minor clean up on call site output of a memory trace event
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXZEP5hQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhrSAQDlws8rY/vJN4tKL1YaBTRyS5OW+1B+
 LPLOxm9PBuzt0wEArVunv7iMgvRzp5spbmCqmD8Is2vSf+45KSrb10WU2wo=
 =L37R
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "A few more tracing fixes:

   - Fix a buffer overflow by checking nr_args correctly in probes

   - Fix a warning that is reported by clang

   - Fix a possible memory leak in error path of filter processing

   - Fix the selftest that checks for failures, but wasn't failing

   - Minor clean up on call site output of a memory trace event"

* tag 'trace-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  selftests/ftrace: Fix same probe error test
  mm, tracing: Print symbol name for call_site in trace events
  tracing: Have error path in predicate_parse() free its allocated memory
  tracing: Fix clang -Wint-in-bool-context warnings in IF_ASSIGN macro
  tracing/probe: Fix to check the difference of nr_args before adding probe
2019-09-30 09:29:53 -07:00
Linus Torvalds
c710364f78 MMC host:
- sdhci-pci: Add Genesys Logic GL975x support
  - sdhci-tegra: Recover loss in throughput for DMA
  - sdhci-of-esdhc: Fix DMA bug
 -----BEGIN PGP SIGNATURE-----
 
 iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAl2OW6gXHHVsZi5oYW5z
 c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjClKFRAAghXFSUzeoqP/OJbMIMwR9FY3
 Rf0Q0PE37UDdADUcDdT530L7IpRbuLkFw84269QFZHvrQ9OD4+99aJVJqFSjgkWJ
 7OmvqJghpbr1kVlGChqNlhGBDbnW6KSeS4CM8gzOBbvXAx62pqot4RynAEhNuTzp
 Ogk/dtaKqSsWD4akCZV6EbevwQCWpQ8qMOPNK14pNEN20Qv179FC3mRtARpoYQsI
 d/x8fc+M3YtDkkMszCpgiQwDleRLFheWOvz70/rLEvhsAP3s8KiSkYp8PBQJDKTF
 teYU5mVnu5zxEtH9rbmOEtgzrliIFLN0QK28IKIHWwODFwhPxPVj3CAFEdwoWohN
 dYXUQKeFvhum63imGuit92JKsN0ZO6X4jr4mOGg4ikHoOHTwMr8+4rPS3oVIKWSZ
 ZD0Piq2hnBzNZqcN8hnK2x6WbOqH1Qt7f+VZFvwEmGjOQKu2TPM1oP6/S/GGYhfm
 riOI4W3qXyDa85AOVhMGwV87/1z0DKMmqkwufoYxbAU2cm0oZrmn68PFMrwkpJiT
 CehCDvwRIr/tgLshiWYKKJQkia3Cij/TaPX/NpBSYLYqqRXjgmdPqsgqM16YA7Je
 UCsdQZbj1S/JhhSwK6gFOzfV/8DvoUl7JYL1H4Px8Mzq5pbTaLLGzW27M9OsCnkP
 3+qXlKIjg95gdWuhUtg=
 =bZWp
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull more MMC updates from Ulf Hansson:
 "A couple more updates/fixes for MMC:

   - sdhci-pci: Add Genesys Logic GL975x support

   - sdhci-tegra: Recover loss in throughput for DMA

   - sdhci-of-esdhc: Fix DMA bug"

* tag 'mmc-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: host: sdhci-pci: Add Genesys Logic GL975x support
  mmc: tegra: Implement ->set_dma_mask()
  mmc: sdhci: Let drivers define their DMA mask
  mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
  mmc: sdhci: improve ADMA error reporting
2019-09-30 09:21:53 -07:00
Krzysztof Wilczynski
9af032a301 csky: Move static keyword to the front of declaration
Move the static keyword to the front of declaration of
csky_pmu_of_device_ids, and resolve the following compiler
warning that can be seen when building with warnings
enabled (W=1):

arch/csky/kernel/perf_event.c:1340:1: warning:
  ‘static’ is not at beginning of declaration [-Wold-style-declaration]

Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
2019-09-30 11:50:49 +08:00
Valentin Schneider
a2139d3b4f csky: entry: Remove unneeded need_resched() loop
Since the enabling and disabling of IRQs within preempt_schedule_irq()
is contained in a need_resched() loop, we don't need the outer arch
code loop.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
2019-09-30 11:49:47 +08:00
Linus Torvalds
97f9a3c4ee Documentation/process update for 5.4-rc1
Here are 2 small Documentation/process/embargoed-hardware-issues.rst
 file updates that missed my previous char/misc pull request for 5.4-rc1.
 
 The first one adds an Intel representative for the process, and the
 second one cleans up the text a bit more when it comes to how the
 disclosure rules work, as it was a bit confusing to some companies.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXZCMVg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymk1QCfarO6D7Wj/eg/BPSSkP/dgaLMog8AoLBJiBmz
 2ErEIjIqV0J/e3QYud8G
 =qUtH
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull Documentation/process update from Greg KH:
 "Here are two small Documentation/process/embargoed-hardware-issues.rst
  file updates that missed my previous char/misc pull request.

  The first one adds an Intel representative for the process, and the
  second one cleans up the text a bit more when it comes to how the
  disclosure rules work, as it was a bit confusing to some companies"

* tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Documentation/process: Clarify disclosure rules
  Documentation/process: Volunteer as the ambassador for Intel
2019-09-29 19:52:52 -07:00
Linus Torvalds
1eb80d6ffb Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "A couple of misc patches"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs dynroot: switch to simple_dir_operations
  fs/handle.c - fix up kerneldoc
2019-09-29 19:42:07 -07:00
Linus Torvalds
7edee5229c 9 smb3 patches including an important patch for debugging traces with wireshark, and 3 patches for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl2Pzl0ACgkQiiy9cAdy
 T1F7aAv9EUA2vEdV+3tyKX17yGm8GBVygANsdMlGqqmRhauO0+KJnrsTR19qh9na
 oe0r6EwaS6/JwDtM/Tt0YyjyRS7GDyfT4cNNFVmrJ0fnQV11FJR0X83uzdm3HydH
 eOyKNG22TwOeFJ3kWqdvSI0AtfbmIcVoOlUAAKsAsv2ksrJIW7Q1BIgQeD8estUV
 j8VjPEIc1c/69UU/H5ktrRHMeT5PO61SV8xGM47WnYkntlFDe1E83xWGoxo996Pc
 KdGSrB1edWXK6kSlX3yQWnoo8QxcUm8IjgsudqcnOrhnro9s/cDU5ZU1RlXNQeB8
 LMtYwNA7jEu9p3TIibxOCph4gofUWNV25GbEJWOY03NxWReTvgLsMbsreul+XNv9
 fow5mvCG94SaE8xDjTvzYRBTeYoXv0WjlTTJjqAlVshirQXk7a2dEBVBipkn0Ea7
 0845c3NtR20pDGQs3vVzdStDT2MwkNUl1hN4vE1Zl0p2ClOS+eFVq9MgIEddSLi2
 Z0oJsmfg
 =o1/m
 -----END PGP SIGNATURE-----

Merge tag '5.4-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull more cifs updates from Steve French:
 "Fixes from the recent SMB3 Test events and Storage Developer
  Conference (held the last two weeks).

  Here are nine smb3 patches including an important patch for debugging
  traces with wireshark, with three patches marked for stable.

  Additional fixes from last week to better handle some newly discovered
  reparse points, and a fix the create/mkdir path for setting the mode
  more atomically (in SMB3 Create security descriptor context), and one
  for path name processing are still being tested so are not included
  here"

* tag '5.4-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix oplock handling for SMB 2.1+ protocols
  smb3: missing ACL related flags
  smb3: pass mode bits into create calls
  smb3: Add missing reparse tags
  CIFS: fix max ea value size
  fs/cifs/sess.c: Remove set but not used variable 'capabilities'
  fs/cifs/smb2pdu.c: Make SMB2_notify_init static
  smb3: fix leak in "open on server" perf counter
  smb3: allow decryption keys to be dumped by admin for debugging
2019-09-29 19:37:32 -07:00
Mao Han
3a09d8e289 csky: Fixup csky_pmu.max_period assignment
The csky_pmu.max_period has type u64, and BIT() can only return
32 bits unsigned long on C-SKY. The initialization for max_period
will be incorrect when count_width is bigger than 32.

Use BIT_ULL()

Signed-off-by: Mao Han <han_mao@c-sky.com>
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
2019-09-30 10:26:33 +08:00
Guo Ren
48ede51fd9 csky: Fixup add zero_fp fixup perf backtrace panic
We need set fp zero to let backtrace know the end. The patch fixup perf
callchain panic problem, because backtrace didn't know what is the end
of fp.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Reported-by: Mao Han <han_mao@c-sky.com>
2019-09-30 10:26:32 +08:00
Mike Rapoport
fdbdcddc2c csky: Use generic free_initrd_mem()
The csky implementation of free_initrd_mem() is an open-coded version of
free_reserved_area() without poisoning.

Remove it and make csky use the generic version of free_initrd_mem().

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
2019-09-30 10:26:24 +08:00
Linus Torvalds
3f2dc2798b Merge branch 'entropy'
Merge active entropy generation updates.

This is admittedly partly "for discussion".  We need to have a way
forward for the boot time deadlocks where user space ends up waiting for
more entropy, but no entropy is forthcoming because the system is
entirely idle just waiting for something to happen.

While this was triggered by what is arguably a user space bug with
GDM/gnome-session asking for secure randomness during early boot, when
they didn't even need any such truly secure thing, the issue ends up
being that our "getrandom()" interface is prone to that kind of
confusion, because people don't think very hard about whether they want
to block for sufficient amounts of entropy.

The approach here-in is to decide to not just passively wait for entropy
to happen, but to start actively collecting it if it is missing.  This
is not necessarily always possible, but if the architecture has a CPU
cycle counter, there is a fair amount of noise in the exact timings of
reasonably complex loads.

We may end up tweaking the load and the entropy estimates, but this
should be at least a reasonable starting point.

As part of this, we also revert the revert of the ext4 IO pattern
improvement that ended up triggering the reported lack of external
entropy.

* getrandom() active entropy waiting:
  Revert "Revert "ext4: make __ext4_get_inode_loc plug""
  random: try to actively add entropy rather than passively wait for it
2019-09-29 19:25:39 -07:00
Linus Torvalds
02f03c4206 Revert "Revert "ext4: make __ext4_get_inode_loc plug""
This reverts commit 72dbcf7215.

Instead of waiting forever for entropy that may just not happen, we now
try to actively generate entropy when required, and are thus hopefully
avoiding the problem that caused the nice ext4 IO pattern fix to be
reverted.

So revert the revert.

Cc: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-29 17:59:23 -07:00
Linus Torvalds
50ee7529ec random: try to actively add entropy rather than passively wait for it
For 5.3 we had to revert a nice ext4 IO pattern improvement, because it
caused a bootup regression due to lack of entropy at bootup together
with arguably broken user space that was asking for secure random
numbers when it really didn't need to.

See commit 72dbcf7215 (Revert "ext4: make __ext4_get_inode_loc plug").

This aims to solve the issue by actively generating entropy noise using
the CPU cycle counter when waiting for the random number generator to
initialize.  This only works when you have a high-frequency time stamp
counter available, but that's the case on all modern x86 CPU's, and on
most other modern CPU's too.

What we do is to generate jitter entropy from the CPU cycle counter
under a somewhat complex load: calling the scheduler while also
guaranteeing a certain amount of timing noise by also triggering a
timer.

I'm sure we can tweak this, and that people will want to look at other
alternatives, but there's been a number of papers written on jitter
entropy, and this should really be fairly conservative by crediting one
bit of entropy for every timer-induced jump in the cycle counter.  Not
because the timer itself would be all that unpredictable, but because
the interaction between the timer and the loop is going to be.

Even if (and perhaps particularly if) the timer actually happens on
another CPU, the cacheline interaction between the loop that reads the
cycle counter and the timer itself firing is going to add perturbations
to the cycle counter values that get mixed into the entropy pool.

As Thomas pointed out, with a modern out-of-order CPU, even quite simple
loops show a fair amount of hard-to-predict timing variability even in
the absense of external interrupts.  But this tries to take that further
by actually having a fairly complex interaction.

This is not going to solve the entropy issue for architectures that have
no CPU cycle counter, but it's not clear how (and if) that is solvable,
and the hardware in question is largely starting to be irrelevant.  And
by doing this we can at least avoid some of the even more contentious
approaches (like making the entropy waiting time out in order to avoid
the possibly unbounded waiting).

Cc: Ahmed Darwish <darwish.07@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Nicholas Mc Guire <hofrat@opentech.at>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-29 17:38:52 -07:00
Olof Johansson
9bfd7319e8 Merge tag 'fixes-5.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
Fixes for omap variants

Few fixes for ti-sysc interconnect target module driver for no-idle
quirks that caused nfsroot to fail on some dra7 boards.

And let's fixes to get LCD working again for logicpd board that got
broken a while back with removal of panel-dpi driver. We need to now
use generic CONFIG_DRM_PANEL_SIMPLE instead.

* tag 'fixes-5.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  bus: ti-sysc: Remove unpaired sysc_clkdm_deny_idle()
  ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
  ARM: dts: am3517-evm: Fix missing video
  ARM: dts: logicpd-torpedo-baseboard: Fix missing video
  ARM: omap2plus_defconfig: Fix missing video
  bus: ti-sysc: Fix handling of invalid clocks
  bus: ti-sysc: Fix clock handling for no-idle quirks

Link: https://lore.kernel.org/r/pull-1568819401-72461@atomide.com
Signed-off-by: Olof Johansson <olof@lixom.net>
2019-09-29 11:20:48 -07:00
Olof Johansson
a4207a1c5e ARM SCMI fixes for v5.4
Couple of fixes: one in scmi reset driver initialising missed scmi handle
 and an other in scmi reset API implementation fixing the assignment of
 reset state
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEunHlEgbzHrJD3ZPhAEG6vDF+4pgFAl2CJ/kACgkQAEG6vDF+
 4pjUDRAAulQC4nLnchAHr77cSVUZdzHtO3fdZy4kyTOrqSzNBOYpMruPqPVx/RrO
 eF4qj+qJh2JpwbUHybo7rURA5DEpPeWrK7xGMRX5sFi0vwwSd0xNNo942VdyBGzx
 n8WiihWDbTulbfRVp3Zbzn1Hg0esYQ3zXJM1moDgmAkjjdjcXbZGDDLLJ8DMxauM
 jDu3v+9ZRYtpoh/GMz3hs1nIBv+/lq6XyYdC9p8ad6GHsdSDwRCg6GSHRPur3Q/a
 4Z4TKvMeqLGfsuHs+ZD/Cvh6HKBSOBBqS99AUny2Y/Rn/5ZYWv69h1Clk3biJ1Cp
 hgEzHDzkPrL/jU8NgGVlRTlX7lmhDXMFpYjP92P2cZSubbIhzNlXdLcJI8VtHZj/
 KKJKwEIu0WsFPrQI509mDdk9i5Dq6Ml9sOGSqyYtknipb3yuoSKaMiUhWSUGZ7PZ
 WQPvylMCCb9xDAU/kxybvx5gaOtaeWy0exQHCMmsNprkpOHWVoCYEhjwD/qixs5I
 BjT8ZTv1pkMql9V5MnB8R0NtMZPJGZiBJH4MCZb8LkTIAAQL8eT+gVUpYUQf98Od
 p4fx09upImpMic2tPuux0KiNnlmfG0E+tu7Jg2PIRhbR9H0l4KOrrMhUcUw4Sy/m
 yWO8X/9tItNlnFb8VTSa8b0/+7ncHeUubwOh/Rw/FM8cdPUtQhQ=
 =FmDn
 -----END PGP SIGNATURE-----

Merge tag 'scmi-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes

ARM SCMI fixes for v5.4

Couple of fixes: one in scmi reset driver initialising missed scmi handle
and an other in scmi reset API implementation fixing the assignment of
reset state

* tag 'scmi-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  reset: reset-scmi: add missing handle initialisation
  firmware: arm_scmi: reset: fix reset_state assignment in scmi_domain_reset

Link: https://lore.kernel.org/r/20190918142139.GA4370@bogus
Signed-off-by: Olof Johansson <olof@lixom.net>
2019-09-29 11:20:41 -07:00
Linus Torvalds
a3c0e7b1fe libnvdimm fixes v5.4-rc1
- Complete the reworks to interoperate with powerpc dynamic huge page sizes
 
 - Fix a crash due to missed accounting for the powerpc 'struct
   page'-memmap mapping granularity.
 
 - Fix badblock initialization for volatile (DRAM emulated) pmem ranges.
 
 - Stop triggering request_key() notifications to userspace when
   NVDIMM-security is disabled / not present.
 
 - Miscellaneous small fixups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdkAprAAoJEB7SkWpmfYgCjXoQAIwJE1VzNP1V+ARxfs1rTGVz
 pbNJiBnj4gxDaCkcKoatiadRkytUxeUNEcPslEKsfoNinXYqkpjMQoWm2VpILOMU
 nY+SvIudGRnuesq2/Y+CP8zrX6rV4eBDfHK05RN/Zp1IlW7pTDItUx8mJ7glmDwG
 PW0vkvK7yZ+dRFnpQ7QFjhA0Q3oudO5YcTVBDK5YYtDGlv69xfXqc9LW8SszJ1kU
 rhCIT1kdoL5of0TIgG5pTfmggPSQ9y1xPsKjllOHNa3m50eGOkkQLELOVzQb1frW
 cjAsPLjRDSzvdHHSLyu0Is04Q5JU2CucxHl2SXGHiOt5tigH8dk5XFxWt0Pc8EXx
 acYYiBqUXC3MomSYWeLK4BdO2cRTqcPPXgJYAqXblqr+/0ys+rFepjw+j8JkiLZa
 5UCC30l1GXEpw9u6gdCMqvvHN2gHvDB0BV82Sx8wTewJpeL18wCUJoKVuFmpsHko
 p1cCe7St1TzcK3eO+xfeW1rxNrcXUpKVYXVa/WOJW0vwErqAZ6YCdNuyJHocZzXn
 vNyIQmVDOlubsgBAI2ExxeZO6xc8UIwLhLg7XEJ0mg3k6UXA8HZxH2B2THJk1BSF
 RppodkYiMknh11sqgpGp+Hz5XSEg/jvmCdL/qRDGAwhsFhFaxDH37Kg4Qncj2/dg
 uDvDHXNCjbGpzCo3tyNx
 =Z6Fa
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

More libnvdimm updates from Dan Williams:

 - Complete the reworks to interoperate with powerpc dynamic huge page
   sizes

 - Fix a crash due to missed accounting for the powerpc 'struct
   page'-memmap mapping granularity

 - Fix badblock initialization for volatile (DRAM emulated) pmem ranges

 - Stop triggering request_key() notifications to userspace when
   NVDIMM-security is disabled / not present

 - Miscellaneous small fixups

* tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm/region: Enable MAP_SYNC for volatile regions
  libnvdimm: prevent nvdimm from requesting key when security is disabled
  libnvdimm/region: Initialize bad block for volatile namespaces
  libnvdimm/nfit_test: Fix acpi_handle redefinition
  libnvdimm/altmap: Track namespace boundaries in altmap
  libnvdimm: Fix endian conversion issues 
  libnvdimm/dax: Pick the right alignment default when creating dax devices
  powerpc/book3s64: Export has_transparent_hugepage() related functions.
2019-09-29 10:33:41 -07:00
Linus Torvalds
939ca9f175 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal SoC updates from Eduardo Valentin:
 "This is a really small pull in the midst of a lot of pending patches.

  We are in the middle of restructuring how we are maintaining the
  thermal subsystem, as per discussion in our last LPC. For now, I am
  sending just some changes that were pending in my tree. Looking
  forward to get a more streamlined process in the next merge window"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
  thermal: db8500: Rewrite to be a pure OF sensor
  thermal: db8500: Use dev helper variable
  thermal: db8500: Finalize device tree conversion
  thermal: thermal_mmio: remove some dead code
2019-09-29 10:24:23 -07:00
Linus Torvalds
9ecb3e10a9 Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull  more i2c updates from Wolfram Sang:

 - make Lenovo Yoga C630 boot now that the dependencies are merged

 - restore BlockProcessCall for i801, accidently removed in this merge
   window

 - a bugfix for the riic driver

 - an improvement to the slave-eeprom driver which should have been in
   the first pull request but sadly got lost in the process

* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: slave-eeprom: Add read only mode
  i2c: i801: Bring back Block Process Call support for certain platforms
  i2c: riic: Clear NACK in tend isr
  i2c: qcom-geni: Disable DMA processing on the Lenovo Yoga C630
2019-09-29 10:20:16 -07:00
Linus Torvalds
4d2af08ed0 IOMMU Fixes for Linux v5.4-rc1
A couple of fixes for the AMD IOMMU driver have piled up:
 
 	* Some fixes for the reworked IO page-table which caused memory
 	  leaks or did not allow to downgrade mappings under some
 	  conditions.
 
 	* Locking fixes to fix a couple of possible races around
 	  accessing 'struct protection_domain'. The races got introduced
 	  when the dma-ops path became lock-less in the fast-path.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl2PrpoACgkQK/BELZcB
 GuNo6A/9EpxNUllqaPLvGYJYPN1ye2kx9QOCYZW6vo+at10X9ywf69IqYtjP9cSe
 x5uWUy0BFjBhqHvMvQ+9m6begFsue/+csUZDmeP+KvBHwNxUOxFS/fb4P0WlmmNF
 /zzsjQbt+r1FRIdYodH2CvBJKyuxNxou0W1aARvs9iggoXVG5Es+WG9+kwnixBE+
 WB1gpuX0zKWlu31z2+i+JrVtdjMqoupfR/T40C4OsMD3NjfNi0bkCqmnqJ3CpNh9
 RWPmNlnd29imPhMYQonZcUFD6Ru4NOUCfEFCjHEK/nk9kSHMYjgkKFgOzvA8h1xG
 Nkzd0dRw39UMNYzKDGHHaE/xXRJV+kOFxZBcABnxfx2r+9EgXBD36AUOsfpeOdVi
 9ab75ok7Ly+tkCgdK7sEeuDD0HJiZkUYT7BqMTdBOt64BK/GtRvepF1Zv15hG6Xn
 imlAfyE4q+avTAJkrXeIu6IgdvF4XvorsIdeF5dKjCBTdTkj8DLXq/gejAo0g1NO
 shOz9E2lde1IdeT+U580nZy9JmkKDFjyeG4QkwSz7Oln/gHIFQS1K8A4i30kGiok
 vMsJzBidtUuqRWupwymtobCAggZE86O2XLOwnxolarJAFOqg5V2j7fSyL+XxXUDC
 r85Ve/jtAhMho5594X72CumoNzzr0bDyCcGerzvT0wBRXcKLIsw=
 =xajX
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:
 "A couple of fixes for the AMD IOMMU driver have piled up:

   - Some fixes for the reworked IO page-table which caused memory leaks
     or did not allow to downgrade mappings under some conditions.

   - Locking fixes to fix a couple of possible races around accessing
     'struct protection_domain'. The races got introduced when the
     dma-ops path became lock-less in the fast-path"

* tag 'iommu-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Lock code paths traversing protection_domain->dev_list
  iommu/amd: Lock dev_data in attach/detach code paths
  iommu/amd: Check for busy devices earlier in attach_device()
  iommu/amd: Take domain->lock for complete attach/detach path
  iommu/amd: Remove amd_iommu_devtable_lock
  iommu/amd: Remove domain->updated
  iommu/amd: Wait for completion of IOTLB flush in attach_device
  iommu/amd: Unmap all L7 PTEs when downgrading page-sizes
  iommu/amd: Introduce first_pte_l7() helper
  iommu/amd: Fix downgrading default page-sizes in alloc_pte()
  iommu/amd: Fix pages leak in free_pagetable()
2019-09-29 10:00:14 -07:00
Thomas Gleixner
dc925a3606 Documentation/process: Clarify disclosure rules
The role of the contact list provided by the disclosing party and how it
affects the disclosure process and the ability to include experts into
the development process is not really well explained.

Neither is it entirely clear when the disclosing party will be informed
about the fact that a developer who is not covered by an employer NDA needs
to be brought in and disclosed.

Explain the role of the contact list and the information policy along with
an eventual conflict resolution better.

Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/alpine.DEB.2.21.1909251028390.10825@nanos.tec.linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-29 12:43:18 +02:00
Linus Torvalds
02dc96ef6c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Sanity check URB networking device parameters to avoid divide by
    zero, from Oliver Neukum.

 2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
    don't work properly. Longer term this needs a better fix tho. From
    Vijay Khemka.

 3) Small fixes to selftests (use ping when ping6 is not present, etc.)
    from David Ahern.

 4) Bring back rt_uses_gateway member of struct rtable, it's semantics
    were not well understood and trying to remove it broke things. From
    David Ahern.

 5) Move usbnet snaity checking, ignore endpoints with invalid
    wMaxPacketSize. From Bjørn Mork.

 6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.

 7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
    Alex Vesker, and Yevgeny Kliteynik

 8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.

 9) Fix crash when removing sch_cbs entry while offloading is enabled,
    from Vinicius Costa Gomes.

10) Signedness bug fixes, generally in looking at the result given by
    of_get_phy_mode() and friends. From Dan Crapenter.

11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.

12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.

13) Fix quantization code in tcp_bbr, from Kevin Yang.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
  net: tap: clean up an indentation issue
  nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
  tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
  sk_buff: drop all skb extensions on free and skb scrubbing
  tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
  mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
  Documentation: Clarify trap's description
  mlxsw: spectrum: Clear VLAN filters during port initialization
  net: ena: clean up indentation issue
  NFC: st95hf: clean up indentation issue
  net: phy: micrel: add Asym Pause workaround for KSZ9021
  net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
  ptp: correctly disable flags on old ioctls
  lib: dimlib: fix help text typos
  net: dsa: microchip: Always set regmap stride to 1
  nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
  nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
  net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
  vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
  net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
  ...
2019-09-28 17:47:33 -07:00
Linus Torvalds
edf445ad7c Merge branch 'hugepage-fallbacks' (hugepatch patches from David Rientjes)
Merge hugepage allocation updates from David Rientjes:
 "We (mostly Linus, Andrea, and myself) have been discussing offlist how
  to implement a sane default allocation strategy for hugepages on NUMA
  platforms.

  With these reverts in place, the page allocator will happily allocate
  a remote hugepage immediately rather than try to make a local hugepage
  available. This incurs a substantial performance degradation when
  memory compaction would have otherwise made a local hugepage
  available.

  This series reverts those reverts and attempts to propose a more sane
  default allocation strategy specifically for hugepages. Andrea
  acknowledges this is likely to fix the swap storms that he originally
  reported that resulted in the patches that removed __GFP_THISNODE from
  hugepage allocations.

  The immediate goal is to return 5.3 to the behavior the kernel has
  implemented over the past several years so that remote hugepages are
  not immediately allocated when local hugepages could have been made
  available because the increased access latency is untenable.

  The next goal is to introduce a sane default allocation strategy for
  hugepages allocations in general regardless of the configuration of
  the system so that we prevent thrashing of local memory when
  compaction is unlikely to succeed and can prefer remote hugepages over
  remote native pages when the local node is low on memory."

Note on timing: this reverts the hugepage VM behavior changes that got
introduced fairly late in the 5.3 cycle, and that fixed a huge
performance regression for certain loads that had been around since
4.18.

Andrea had this note:

 "The regression of 4.18 was that it was taking hours to start a VM
  where 3.10 was only taking a few seconds, I reported all the details
  on lkml when it was finally tracked down in August 2018.

     https://lore.kernel.org/linux-mm/20180820032640.9896-2-aarcange@redhat.com/

  __GFP_THISNODE in MADV_HUGEPAGE made the above enterprise vfio
  workload degrade like in the "current upstream" above. And it still
  would have been that bad as above until 5.3-rc5"

where the bad behavior ends up happening as you fill up a local node,
and without that change, you'd get into the nasty swap storm behavior
due to compaction working overtime to make room for more memory on the
nodes.

As a result 5.3 got the two performance fix reverts in rc5.

However, David Rientjes then noted that those performance fixes in turn
regressed performance for other loads - although not quite to the same
degree.  He suggested reverting the reverts and instead replacing them
with two small changes to how hugepage allocations are done (patch
descriptions rephrased by me):

 - "avoid expensive reclaim when compaction may not succeed": just admit
   that the allocation failed when you're trying to allocate a huge-page
   and compaction wasn't successful.

 - "allow hugepage fallback to remote nodes when madvised": when that
   node-local huge-page allocation failed, retry without forcing the
   local node.

but by then I judged it too late to replace the fixes for a 5.3 release.
So 5.3 was released with behavior that harked back to the pre-4.18 logic.

But now we're in the merge window for 5.4, and we can see if this
alternate model fixes not just the horrendous swap storm behavior, but
also restores the performance regression that the late reverts caused.

Fingers crossed.

* emailed patches from David Rientjes <rientjes@google.com>:
  mm, page_alloc: allow hugepage fallback to remote nodes when madvised
  mm, page_alloc: avoid expensive reclaim when compaction may not succeed
  Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""
  Revert "Revert "mm, thp: restore node-local hugepage allocations""
2019-09-28 14:26:47 -07:00
Steven Rostedt (VMware)
8ed4889eb8 selftests/ftrace: Fix same probe error test
The "same probe" selftest that tests that adding the same probe fails
doesn't add the same probe and passes, which fails the test.

Fixes: b78b94b821 ("selftests/ftrace: Update kprobe event error testcase")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-28 17:13:40 -04:00
Changbin Du
f7d6316fb4 mm, tracing: Print symbol name for call_site in trace events
To improve the readability of raw slab trace points, print the call_site ip
using '%pS'. Then we can grep events with function names.

[002] ....   808.188897: kmem_cache_free: call_site=putname+0x47/0x50 ptr=00000000cef40c80
[002] ....   808.188898: kfree: call_site=security_cred_free+0x42/0x50 ptr=0000000062400820
[002] ....   808.188904: kmem_cache_free: call_site=put_cred_rcu+0x88/0xa0 ptr=0000000058d74ef8
[002] ....   808.188913: kmem_cache_alloc: call_site=prepare_creds+0x26/0x100 ptr=0000000058d74ef8 bytes_req=168 bytes_alloc=576 gfp_flags=GFP_KERNEL
[002] ....   808.188917: kmalloc: call_site=security_prepare_creds+0x77/0xa0 ptr=0000000062400820 bytes_req=8 bytes_alloc=336 gfp_flags=GFP_KERNEL|__GFP_ZERO
[002] ....   808.188920: kmem_cache_alloc: call_site=getname_flags+0x4f/0x1e0 ptr=00000000cef40c80 bytes_req=4096 bytes_alloc=4480 gfp_flags=GFP_KERNEL
[002] ....   808.188925: kmem_cache_free: call_site=putname+0x47/0x50 ptr=00000000cef40c80
[002] ....   808.188926: kfree: call_site=security_cred_free+0x42/0x50 ptr=0000000062400820
[002] ....   808.188931: kmem_cache_free: call_site=put_cred_rcu+0x88/0xa0 ptr=0000000058d74ef8

Link: http://lkml.kernel.org/r/20190914103215.23301-1-changbin.du@gmail.com

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-28 17:13:39 -04:00
Navid Emamdoost
96c5c6e6a5 tracing: Have error path in predicate_parse() free its allocated memory
In predicate_parse, there is an error path that is not going to
out_free instead it returns directly which leads to a memory leak.

Link: http://lkml.kernel.org/r/20190920225800.3870-1-navid.emamdoost@gmail.com

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-28 17:13:39 -04:00
Nathan Chancellor
968e517093 tracing: Fix clang -Wint-in-bool-context warnings in IF_ASSIGN macro
After r372664 in clang, the IF_ASSIGN macro causes a couple hundred
warnings along the lines of:

kernel/trace/trace_output.c:1331:2: warning: converting the enum
constant to a boolean [-Wint-in-bool-context]
kernel/trace/trace.h:409:3: note: expanded from macro
'trace_assign_type'
                IF_ASSIGN(var, ent, struct ftrace_graph_ret_entry,
                ^
kernel/trace/trace.h:371:14: note: expanded from macro 'IF_ASSIGN'
                WARN_ON(id && (entry)->type != id);     \
                           ^
264 warnings generated.

This warning can catch issues with constructs like:

    if (state == A || B)

where the developer really meant:

    if (state == A || state == B)

This is currently the only occurrence of the warning in the kernel
tree across defconfig, allyesconfig, allmodconfig for arm32, arm64,
and x86_64. Add the implicit '!= 0' to the WARN_ON statement to fix
the warnings and find potential issues in the future.

Link: 28b38c277a
Link: https://github.com/ClangBuiltLinux/linux/issues/686
Link: http://lkml.kernel.org/r/20190926162258.466321-1-natechancellor@gmail.com

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-28 17:13:39 -04:00
Masami Hiramatsu
d2aea95a1a tracing/probe: Fix to check the difference of nr_args before adding probe
Steven reported that a test triggered:

==================================================================
 BUG: KASAN: slab-out-of-bounds in trace_kprobe_create+0xa9e/0xe40
 Read of size 8 at addr ffff8880c4f25a48 by task ftracetest/4798

 CPU: 2 PID: 4798 Comm: ftracetest Not tainted 5.3.0-rc6-test+ #30
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
 Call Trace:
  dump_stack+0x7c/0xc0
  ? trace_kprobe_create+0xa9e/0xe40
  print_address_description+0x6c/0x332
  ? trace_kprobe_create+0xa9e/0xe40
  ? trace_kprobe_create+0xa9e/0xe40
  __kasan_report.cold.6+0x1a/0x3b
  ? trace_kprobe_create+0xa9e/0xe40
  kasan_report+0xe/0x12
  trace_kprobe_create+0xa9e/0xe40
  ? print_kprobe_event+0x280/0x280
  ? match_held_lock+0x1b/0x240
  ? find_held_lock+0xac/0xd0
  ? fs_reclaim_release.part.112+0x5/0x20
  ? lock_downgrade+0x350/0x350
  ? kasan_unpoison_shadow+0x30/0x40
  ? __kasan_kmalloc.constprop.6+0xc1/0xd0
  ? trace_kprobe_create+0xe40/0xe40
  ? trace_kprobe_create+0xe40/0xe40
  create_or_delete_trace_kprobe+0x2e/0x60
  trace_run_command+0xc3/0xe0
  ? trace_panic_handler+0x20/0x20
  ? kasan_unpoison_shadow+0x30/0x40
  trace_parse_run_command+0xdc/0x163
  vfs_write+0xe1/0x240
  ksys_write+0xba/0x150
  ? __ia32_sys_read+0x50/0x50
  ? tracer_hardirqs_on+0x61/0x180
  ? trace_hardirqs_off_caller+0x43/0x110
  ? mark_held_locks+0x29/0xa0
  ? do_syscall_64+0x14/0x260
  do_syscall_64+0x68/0x260

Fix to check the difference of nr_args before adding probe
on existing probes. This also may set the error log index
bigger than the number of command parameters. In that case
it sets the error position is next to the last parameter.

Link: http://lkml.kernel.org/r/156966474783.3478.13217501608215769150.stgit@devnote2

Fixes: ca89bc071d ("tracing/kprobe: Add multi-probe per event support")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-09-28 17:07:53 -04:00
David Rientjes
76e654cc91 mm, page_alloc: allow hugepage fallback to remote nodes when madvised
For systems configured to always try hard to allocate transparent
hugepages (thp defrag setting of "always") or for memory that has been
explicitly madvised to MADV_HUGEPAGE, it is often better to fallback to
remote memory to allocate the hugepage if the local allocation fails
first.

The point is to allow the initial call to __alloc_pages_node() to attempt
to defragment local memory to make a hugepage available, if possible,
rather than immediately fallback to remote memory.  Local hugepages will
always have a better access latency than remote (huge)pages, so an attempt
to make a hugepage available locally is always preferred.

If memory compaction cannot be successful locally, however, it is likely
better to fallback to remote memory.  This could take on two forms: either
allow immediate fallback to remote memory or do per-zone watermark checks.
It would be possible to fallback only when per-zone watermarks fail for
order-0 memory, since that would require local reclaim for all subsequent
faults so remote huge allocation is likely better than thrashing the local
zone for large workloads.

In this case, it is assumed that because the system is configured to try
hard to allocate hugepages or the vma is advised to explicitly want to try
hard for hugepages that remote allocation is better when local allocation
and memory compaction have both failed.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-28 14:05:38 -07:00