Commit Graph

872921 Commits

Author SHA1 Message Date
Masahiro Yamada
85f0ae7e43 kbuild: update comment about KBUILD_ALLDIRS
Commit 000ec95fbe ("kbuild: pkg: rename scripts/package/Makefile to
scripts/Makefile.package") missed to update this comment.

Fixes: 000ec95fbe ("kbuild: pkg: rename scripts/package/Makefile to scripts/Makefile.package")
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-10-15 23:45:07 +09:00
Max Gurtovoy
28a4cac48c nvme-tcp: fix possible leakage during error flow
During nvme_tcp_setup_cmd_pdu error flow, one must call nvme_cleanup_cmd
since it's symmetric to nvme_setup_cmd.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-15 22:47:29 +09:00
Max Gurtovoy
5812d04c4c nvmet-loop: fix possible leakage during error flow
During nvme_loop_queue_rq error flow, one must call nvme_cleanup_cmd since
it's symmetric to nvme_setup_cmd.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2019-10-15 22:47:28 +09:00
Ville Syrjälä
3e30d70805 drm/i915: Make .modeset_calc_cdclk() mandatory
While not all platforms allow us to change the cdclk frequency
we should still verify that the fixed cdclk frequency isn't
too low. To that end let's cook up a .modeset_calc_cdclk()
implementation that only does the min_cdclk vs. actual cdclk
frequency check for such platforms.

Also we mustn't forget about double wide pipe on gen2/3 when
doing this.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190708125325.16576-11-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
2019-10-15 16:41:13 +03:00
Ville Syrjälä
131d3b1af1 drm/i915: Stop using drm_atomic_helper_check_planes()
We need to insert stuff between the plane and crtc .atomic_check()
drm_atomic_helper_check_planes() doesn't allow us to do that so
stop using it and hand roll the loops instead.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190708125325.16576-9-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
2019-10-15 16:37:32 +03:00
Ville Syrjälä
3e706dff08 drm/i915: Switch to using DP_MSA_MISC_* defines
Now that we have standard defines for the MSA MISC bits lets use
them on HSW+ where we program these directly into the TRANS_MSA_MISC
register.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190718145053.25808-7-ville.syrjala@linux.intel.com
Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
2019-10-15 16:24:59 +03:00
Gwan-gyeong Mun
0299dfa7ad drm/i915/dp: Attach HDR metadata property to DP connector
It attaches HDR metadata property to DP connector on GLK+.
It enables HDR metadata infoframe sdp on GLK+ to be used to send
HDR metadata to DP sink.

v2: Minor style fix

Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919195311.13972-9-gwan-gyeong.mun@intel.com
2019-10-15 16:24:59 +03:00
Gwan-gyeong Mun
b246cf215e drm/i915/dp: Program an Infoframe SDP Header and DB for HDR Static Metadata
Function intel_dp_setup_hdr_metadata_infoframe_sdp handles Infoframe SDP
header and data block setup for HDR Static Metadata. It enables writing of
HDR metadata infoframe SDP to panel. Support for HDR video was introduced
in DisplayPort 1.4. It implements the CTA-861-G standard for transport of
static HDR metadata. The HDR Metadata will be provided by userspace
compositors, based on blending policies and passed to the driver through
a blob property.

Because each of GEN11 and prior GEN11 have different register size for
HDR Metadata Infoframe SDP packet, it adds and uses different register
size.

Setup Infoframe SDP header and data block in function
intel_dp_setup_hdr_metadata_infoframe_sdp for HDR Static Metadata as per
dp 1.4 spec and CTA-861-F spec.
As per DP 1.4 spec, 2.2.2.5 SDP Formats. It enables Dynamic Range and
Mastering Infoframe for HDR content, which is defined in CTA-861-F spec.
According to DP 1.4 spec and CEA-861-F spec Table 5, in order to transmit
static HDR metadata, we have to use Non-audio INFOFRAME SDP v1.3.

+--------------------------------+-------------------------------+
|      [ Packet Type Value ]     |       [ Packet Type ]         |
+--------------------------------+-------------------------------+
| 80h + Non-audio INFOFRAME Type | CEA-861-F Non-audio INFOFRAME |
+--------------------------------+-------------------------------+
|      [Transmission Timing]                                     |
+----------------------------------------------------------------+
| As per CEA-861-F for INFOFRAME, including CEA-861.3 within     |
| which Dynamic Range and Mastering INFOFRAME are defined        |
+----------------------------------------------------------------+

v2: Add a missed blank line after function declaration.
v3: Remove not handled return values from
    intel_dp_setup_hdr_metadata_infoframe_sdp(). [Uma]
v9: Addressed review comments from Ville.
    - Add BUILD_BUG_ON to check a changing of struct dp_sdp size.
    - Change a passed size toward write_infoframe() for DP infoframe sdp
      packet for HDR static metadata.

Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919195311.13972-8-gwan-gyeong.mun@intel.com
2019-10-15 16:24:59 +03:00
Gwan-gyeong Mun
922430dd40 drm/i915: Add new GMP register size for GEN11
According to Bspec, GEN11 and prior GEN11 have different register size for
HDR Metadata Infoframe SDP packet. It adds new VIDEO_DIP_GMP_DATA_SIZE for
GEN11. And it makes handle different register size for
HDMI_PACKET_TYPE_GAMUT_METADATA on hsw_dip_data_size() for each GEN
platforms. It addresses Uma's review comments.

v9: Add WARN_ON() when buffer size if larger than register size. [Ville]

Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919195311.13972-7-gwan-gyeong.mun@intel.com
2019-10-15 16:24:59 +03:00
Gwan-gyeong Mun
9d1bb6f022 drm/i915/dp: Attach colorspace property
It attaches the colorspace connector property to a DisplayPort connector.
Based on colorspace change, modeset will be triggered to switch to a new
colorspace.

And in order to distinguish colorspace bwtween DP and HDMI connector, it
adds a handling of drm_mode_create_dp_colorspace_property() to
intel_attach_colorspace_property().

Based on colorspace property value create a VSC SDP packet with appropriate
colorspace. This would help to enable wider color gamut like BT2020 on a
sink device.

v9: Addressed review comments from Ville
  - Add a handling of drm_mode_create_dp_colorspace_property() to
    intel_attach_colorspace_property(). This hunk moved from the previous
    commit.

Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919195311.13972-6-gwan-gyeong.mun@intel.com
2019-10-15 16:24:59 +03:00
Gwan-gyeong Mun
0c06fa1560 drm/i915/dp: Add support of BT.2020 Colorimetry to DP MSA
When BT.2020 Colorimetry output is used for DP, we should program BT.2020
Colorimetry to MSA and VSC SDP. In order to handle colorspace of
drm_connector_state, it moves a calling of intel_ddi_set_pipe_settings()
function into intel_ddi_pre_enable_dp(). And it also rename
intel_ddi_set_pipe_settings() to intel_ddi_set_dp_msa().

As per DP 1.4a spec section 2.2.4 [MSA Data Transport]
The MSA data that the DP Source device transports for reproducing the main
video stream. Attribute data is sent once per frame during the main video
stream’s vertical blanking period.

In order to distinguish needed colorimetry for VSC SDP, it adds
intel_dp_needs_vsc_sdp function.
If the output colorspace requires vsc sdp or output format is YCbCr 4:2:0,
it uses MSA with VSC SDP.

As per DP 1.4a spec section 2.2.4.3 [MSA Field for Indication of
Color Encoding Format and Content Color Gamut] while sending
BT.2020 Colorimetry signals we should program MSA MISC1 fields which
indicate VSC SDP for the Pixel Encoding/Colorimetry Format.

v2: Remove useless parentheses
v3: Addressed review comments from Ville
    - In order to checking output format and output colorspace on
      intel_dp_needs_vsc_sdp(), it passes entire intel_crtc_state struct
      value.
    - Remove a pointless variable.
v9: Addressed review comments from Ville
    - Remove a duplicated output color space from intel_crtc_state.
    - In order to handle colorspace of drm_connector_state, it moves a
      calling of intel_ddi_set_pipe_settings() function into
      intel_ddi_pre_enable_dp().

Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919195311.13972-3-gwan-gyeong.mun@intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2019-10-15 16:24:59 +03:00
Gwan-gyeong Mun
bb71fb0072 drm/i915/dp: Extend program of VSC Header and DB for Colorimetry Format
It refactors and renames a function which handled vsc sdp header and data
block setup for supporting colorimetry format.
Function intel_dp_setup_vsc_sdp handles vsc sdp header and data block
setup for pixel encoding / colorimetry format.
In order to use colorspace information of a connector, it adds an argument
of drm_connector_state type.

Setup VSC header and data block in function intel_dp_setup_vsc_sdp for
pixel encoding / colorimetry format as per dp 1.4a spec, section 2.2.5.7.1,
table 2-119: VSC SDP Header Bytes, section 2.2.5.7.5,
table 2-120: VSC SDP Payload for DB16 through DB18.

Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919195311.13972-2-gwan-gyeong.mun@intel.com
2019-10-15 16:24:59 +03:00
Suthikulpanit, Suravee
ec21f17a94 iommu/amd: Fix incorrect PASID decoding from event log
IOMMU Event Log encodes 20-bit PASID for events:
    ILLEGAL_DEV_TABLE_ENTRY
    IO_PAGE_FAULT
    PAGE_TAB_HARDWARE_ERROR
    INVALID_DEVICE_REQUEST
as:
    PASID[15:0]  = bit 47:32
    PASID[19:16] = bit 19:16

Note that INVALID_PPR_REQUEST event has different encoding
from the rest of the events as the following:
    PASID[15:0]  = bit 31:16
    PASID[19:16] = bit 45:42

So, fixes the decoding logic.

Fixes: d64c0486ed ("iommu/amd: Update the PASID information printed to the system log")
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-10-15 14:13:31 +02:00
Geert Uytterhoeven
ec37d4e999 iommu/ipmmu-vmsa: Only call platform_get_irq() when interrupt is mandatory
As platform_get_irq() now prints an error when the interrupt does not
exist, calling it gratuitously causes scary messages like:

    ipmmu-vmsa e6740000.mmu: IRQ index 0 not found

Fix this by moving the call to platform_get_irq() down, where the
existence of the interrupt is mandatory.

Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-10-15 13:00:43 +02:00
Chris Wilson
8b390c1581 drm/i915/execlists: Clear semaphore immediately upon ELSP promotion
There is no significance to our delay before clearing the semaphore the
engine is waiting on, so release it as soon as we acknowledge the CS
update following our preemption request. This should allow the GPU to
resume work earlier, if it was stuck on the semaphore at the end of a
request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015093204.25693-1-chris@chris-wilson.co.uk
2019-10-15 11:51:13 +01:00
Chris Wilson
454a325a97 drm/i915: Remove leftover vma->obj->pages_pin_count on insert/remove
We now do the page pin count upfront in vma_get_pages/vma_put_pages, so
that we do the allocations before we enter the vm->mutex. Our vma
page references we are tracked in vma->pages_count and the extra
obj->pages_pin_count being performed later in i915_vma_insert and
i915_vma_remove is redundant, and worse throws off the shrinker's logic
on when it can free an object by unbinding it.

Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reported-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015100155.10376-1-chris@chris-wilson.co.uk
2019-10-15 11:46:52 +01:00
Chris Wilson
56184a20a8 drm/i915: Drop obj.page_pin_count after a failed vma->set_pages()
Before we attempt to set_pages on the vma, we claim a
obj.pages_pin_count for it. If we subsequently fail to set the pages on
the vma, we need to drop our pinning before returning the error.

Reported-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015093915.3995-1-chris@chris-wilson.co.uk
2019-10-15 11:46:40 +01:00
Heiko Stuebner
f9258156c7 iommu/rockchip: Don't use platform_get_irq to implicitly count irqs
Till now the Rockchip iommu driver walked through the irq list via
platform_get_irq() until it encountered an ENXIO error. With the
recent change to add a central error message, this always results
in such an error for each iommu on probe and shutdown.

To not confuse people, switch to platform_count_irqs() to get the
actual number of interrupts before walking through them.

Fixes: 7723f4c5ec ("driver core: platform: Add an error message to platform_get_irq*()")
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-10-15 12:45:16 +02:00
Sean Christopherson
7a22e03b0c x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
Check that the per-cpu cluster mask pointer has been set prior to
clearing a dying cpu's bit.  The per-cpu pointer is not set until the
target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the
teardown function, x2apic_dead_cpu(), is associated with the earlier
CPUHP_X2APIC_PREPARE.  If an error occurs before the cpu is awakened,
e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference
the NULL pointer and cause a panic.

  smpboot: do_boot_cpu failed(-22) to wakeup CPU#1
  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:x2apic_dead_cpu+0x1a/0x30
  Call Trace:
   cpuhp_invoke_callback+0x9a/0x580
   _cpu_up+0x10d/0x140
   do_cpu_up+0x69/0xb0
   smp_init+0x63/0xa9
   kernel_init_freeable+0xd7/0x229
   ? rest_init+0xa0/0xa0
   kernel_init+0xa/0x100
   ret_from_fork+0x35/0x40

Fixes: 023a611748 ("x86/apic/x2apic: Simplify cluster management")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191001205019.5789-1-sean.j.christopherson@intel.com
2019-10-15 10:57:09 +02:00
Roman Kagan
e211288b72 x86/hyperv: Make vapic support x2apic mode
Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode
when supported by the vcpus.

However, the apic access functions for Hyper-V enlightened apic assume
xapic mode only.

As a result, Linux fails to bring up secondary cpus when run as a guest
in QEMU/KVM with both hv_apic and x2apic enabled.

According to Michael Kelley, when in x2apic mode, the Hyper-V synthetic
apic MSRs behave exactly the same as the corresponding architectural
x2apic MSRs, so there's no need to override the apic accessors.  The
only exception is hv_apic_eoi_write, which benefits from lazy EOI when
available; however, its implementation works for both xapic and x2apic
modes.

Fixes: 29217a4746 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
Fixes: 6b48cb5f83 ("X86/Hyper-V: Enlighten APIC access")
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191010123258.16919-1-rkagan@virtuozzo.com
2019-10-15 10:57:09 +02:00
Joonas Lahtinen
fa41d6ee90 Merge drm/drm-next into drm-intel-next-queued
Backmerging to pull in HDR DP code:

https://lists.freedesktop.org/archives/dri-devel/2019-September/236453.html

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2019-10-15 11:18:26 +03:00
Daniel Kurtz
fadfee3f9d drm/bridge: dw-hdmi: Restore audio when setting a mode
When setting a new display mode, dw_hdmi_setup() calls
dw_hdmi_enable_video_path(), which disables all hdmi clocks, including
the audio clock.

We should only (re-)enable the audio clock if audio was already enabled
when setting the new mode.

Without this patch, on RK3288, there will be HDMI audio on some monitors
if i2s was played to headphone when the monitor was plugged.
ACER H277HU and ASUS PB278 are two of the monitors showing this issue.

Signed-off-by: Cheng-Yi Chiang <cychiang@chromium.org>
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Reviewed-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191008102145.55134-1-cychiang@chromium.org
2019-10-15 09:48:52 +02:00
Pavel Tatashin
8c551f919a arm64: hibernate: check pgd table allocation
There is a bug in create_safe_exec_page(), when page table is allocated
it is not checked that table is allocated successfully:

But it is dereferenced in: pgd_none(READ_ONCE(*pgdp)).  Check that
allocation was successful.

Fixes: 82869ac57b ("arm64: kernel: Add support for hibernate/suspend-to-disk")
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Will Deacon <will@kernel.org>
2019-10-14 17:57:29 -07:00
Julien Grall
ec52c7134b arm64: cpufeature: Treat ID_AA64ZFR0_EL1 as RAZ when SVE is not enabled
If CONFIG_ARM64_SVE=n then we fail to report ID_AA64ZFR0_EL1 as 0 when
read by userspace, despite being required by the architecture. Although
this is theoretically a change in ABI, userspace will first check for
the presence of SVE via the HWCAP or the ID_AA64PFR0_EL1.SVE field
before probing the ID_AA64ZFR0_EL1 register. Given that these are
reported correctly for this configuration, we can safely tighten up the
current behaviour.

Ensure ID_AA64ZFR0_EL1 is treated as RAZ when CONFIG_ARM64_SVE=n.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <dave.martin@arm.com>
Fixes: 06a916feca ("arm64: Expose SVE2 features for userspace")
Signed-off-by: Will Deacon <will@kernel.org>
2019-10-14 17:56:57 -07:00
David S. Miller
8c16b55bbf Merge branch 'aquantia-fixes'
Igor Russkikh says:

====================
Aquantia/Marvell AQtion atlantic driver fixes 10/2019

Here is a set of various bugfixes, to be considered for stable as well.

V2: double space removed
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14 17:01:53 -07:00
Dmitry Bogdanov
9f051db566 net: aquantia: correctly handle macvlan and multicast coexistence
macvlan and multicast handling is now mixed up.
The explicit issue is that macvlan interface gets broken (no traffic)
after clearing MULTICAST flag on the real interface.

We now do separate logic and consider both ALLMULTI and MULTICAST
flags on the device.

Fixes: 11ba961c91 ("net: aquantia: Fix IFF_ALLMULTI flag functionality")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14 17:01:53 -07:00
Dmitry Bogdanov
d08b9a0a3e net: aquantia: do not pass lro session with invalid tcp checksum
Individual descriptors on LRO TCP session should be checked
for CRC errors. It was discovered that HW recalculates
L4 checksums on LRO session and does not break it up on bad L4
csum.

Thus, driver should aggregate HW LRO L4 statuses from all individual
buffers of LRO session and drop packet if one of the buffers has bad
L4 checksum.

Fixes: f38f1ee8ae ("net: aquantia: check rx csum for all packets in LRO session")
Signed-off-by: Dmitry Bogdanov <dmitry.bogdanov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14 17:01:53 -07:00
Igor Russkikh
ed4d81c4b3 net: aquantia: when cleaning hw cache it should be toggled
>From HW specification to correctly reset HW caches (this is a required
workaround when stopping the device), register bit should actually
be toggled.

It was previosly always just set. Due to the way driver stops HW this
never actually caused any issues, but it still may, so cleaning this up.

Fixes: 7a1bb49461 ("net: aquantia: fix potential IOMMU fault after driver unbind")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14 17:01:53 -07:00
Igor Russkikh
06b0d7fe7e net: aquantia: temperature retrieval fix
Chip temperature is a two byte word, colocated internally with cable
length data. We do all readouts from HW memory by dwords, thus
we should clear extra high bytes, otherwise temperature output
gets weird as soon as we attach a cable to the NIC.

Fixes: 8f89401186 ("net: aquantia: add infrastructure to readout chip temperature")
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-14 17:01:52 -07:00
Linus Torvalds
5bc52f64e8 Merge branch 'akpm' (patches from Andrew)
Merge more fixes from Andrew Morton:
 "The usual shower of hotfixes and some followups to the recently merged
  page_owner enhancements"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once
  mm/slab.c: fix kernel-doc warning for __ksize()
  xarray.h: fix kernel-doc warning
  bitmap.h: fix kernel-doc warning and typo
  fs/fs-writeback.c: fix kernel-doc warning
  fs/libfs.c: fix kernel-doc warning
  fs/direct-io.c: fix kernel-doc warning
  mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
  mm, hugetlb: allow hugepage allocations to reclaim as needed
  lib/test_meminit: add a kmem_cache_alloc_bulk() test
  mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations
  lib/generic-radix-tree.c: add kmemleak annotations
  mm/slub: fix a deadlock in show_slab_objects()
  mm, page_owner: rename flag indicating that page is allocated
  mm, page_owner: decouple freeing stack trace from debug_pagealloc
  mm, page_owner: fix off-by-one error in __set_page_owner_handle()
2019-10-14 16:49:59 -07:00
Andy Shevchenko
75e99bf5ed gpio: lynxpoint: set default handler to be handle_bad_irq()
We switch the default handler to be handle_bad_irq() instead of
handle_simple_irq() (which was not correct anyway).

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15 01:19:05 +02:00
Andy Shevchenko
4c87540940 gpio: merrifield: Move hardware initialization to callback
The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.

Fixes: 8f86a5b4ad ("gpio: merrifield: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15 01:19:01 +02:00
Andy Shevchenko
a339120616 gpio: lynxpoint: Move hardware initialization to callback
The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.

Fixes: 7b1e889436 ("gpio: lynxpoint: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15 01:18:57 +02:00
Andy Shevchenko
a752fbb4b4 gpio: intel-mid: Move hardware initialization to callback
The driver wants to initialize related registers before IRQ chip will be added.
That's why move it to a corresponding callback. It also fixes the NULL pointer
dereference.

Fixes: 8069e69a97 ("gpio: intel-mid: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15 01:18:51 +02:00
Andy Shevchenko
9411e3aaa6 gpiolib: Initialize the hardware with a callback
After changing the drivers to use GPIO core to add an IRQ chip
it appears that some of them requires a hardware initialization
before adding the IRQ chip.

Add an optional callback ->init_hw() to allow that drivers
to initialize hardware if needed.

This change is a part of the fix NULL pointer dereference
brought to the several drivers recently.

Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15 01:18:46 +02:00
Andy Shevchenko
6658f87f21 gpio: merrifield: Restore use of irq_base
During conversion to internal IRQ chip initialization the commit
  8f86a5b4ad ("gpio: merrifield: Pass irqchip when adding gpiochip")
lost the irq_base assignment.

drivers/gpio/gpio-merrifield.c: In function ‘mrfld_gpio_probe’:
drivers/gpio/gpio-merrifield.c:405:17: warning: variable ‘irq_base’ set but not used [-Wunused-but-set-variable]

Assign the girq->first to it.

Fixes: 8f86a5b4ad ("gpio: merrifield: Pass irqchip when adding gpiochip")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-10-15 01:18:15 +02:00
Max Filippov
8b39da9851 xtensa: drop EXPORT_SYMBOL for outs*/ins*
Custom outs*/ins* implementations are long gone from the xtensa port,
remove matching EXPORT_SYMBOLs.
This fixes the following build warnings issued by modpost since commit
15bfc2348d ("modpost: check for static EXPORT_SYMBOL* functions"):

  WARNING: "insb" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "insw" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "insl" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "outsb" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "outsw" [vmlinux] is a static EXPORT_SYMBOL
  WARNING: "outsl" [vmlinux] is a static EXPORT_SYMBOL

Cc: stable@vger.kernel.org
Fixes: d38efc1f15 ("xtensa: adopt generic io routines")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2019-10-14 16:02:04 -07:00
Jane Chu
3d7fed4ad8 mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once
Mmap /dev/dax more than once, then read the poison location using
address from one of the mappings.  The other mappings due to not having
the page mapped in will cause SIGKILLs delivered to the process.
SIGKILL succeeds over SIGBUS, so user process loses the opportunity to
handle the UE.

Although one may add MAP_POPULATE to mmap(2) to work around the issue,
MAP_POPULATE makes mapping 128GB of pmem several magnitudes slower, so
isn't always an option.

Details -

  ndctl inject-error --block=10 --count=1 namespace6.0

  ./read_poison -x dax6.0 -o 5120 -m 2
  mmaped address 0x7f5bb6600000
  mmaped address 0x7f3cf3600000
  doing local read at address 0x7f3cf3601400
  Killed

Console messages in instrumented kernel -

  mce: Uncorrected hardware memory error in user-access at edbe201400
  Memory failure: tk->addr = 7f5bb6601000
  Memory failure: address edbe201: call dev_pagemap_mapping_shift
  dev_pagemap_mapping_shift: page edbe201: no PUD
  Memory failure: tk->size_shift == 0
  Memory failure: Unable to find user space address edbe201 in read_poison
  Memory failure: tk->addr = 7f3cf3601000
  Memory failure: address edbe201: call dev_pagemap_mapping_shift
  Memory failure: tk->size_shift = 21
  Memory failure: 0xedbe201: forcibly killing read_poison:22434 because of failure to unmap corrupted page
    => to deliver SIGKILL
  Memory failure: 0xedbe201: Killing read_poison:22434 due to hardware memory corruption
    => to deliver SIGBUS

Link: http://lkml.kernel.org/r/1565112345-28754-3-git-send-email-jane.chu@oracle.com
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Suggested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Randy Dunlap
87bf4f71af mm/slab.c: fix kernel-doc warning for __ksize()
Fix kernel-doc warning in mm/slab.c:

  mm/slab.c:4215: warning: Function parameter or member 'objp' not described in '__ksize'

Also add Return: documentation section for this function.

Link: http://lkml.kernel.org/r/68c9fd7d-f09e-d376-e292-c7b2bdf1774d@infradead.org
Fixes: 10d1f8cb39 ("mm/slab: refactor common ksize KASAN logic into slab_common.c")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Randy Dunlap
13bea898cd xarray.h: fix kernel-doc warning
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:

  include/linux/xarray.h:232: WARNING: Unexpected indentation.

Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org
Fixes: a3e4d3f97e ("XArray: Redesign xa_alloc API")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Randy Dunlap
2a7e582f42 bitmap.h: fix kernel-doc warning and typo
Fix kernel-doc warning in <linux/bitmap.h>:

  include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal'

Also fix small typo (bitnaps).

Link: http://lkml.kernel.org/r/0729ea7a-2c0d-b2c5-7dd3-3629ee0803e2@infradead.org
Fixes: b9fa6442f7 ("cpumask: Implement cpumask_or_equal()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Randy Dunlap
b46ec1da5e fs/fs-writeback.c: fix kernel-doc warning
Fix kernel-doc warning in fs/fs-writeback.c:

  fs/fs-writeback.c:913: warning: Excess function parameter 'nr_pages' description in 'cgroup_writeback_by_id'

Link: http://lkml.kernel.org/r/756645ac-0ce8-d47e-d30a-04d9e4923a4f@infradead.org
Fixes: d62241c7a4 ("writeback, memcg: Implement cgroup_writeback_by_id()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Randy Dunlap
8e88bfba77 fs/libfs.c: fix kernel-doc warning
Fix kernel-doc warning in fs/libfs.c:

  fs/libfs.c:496: warning: Excess function parameter 'available' description in 'simple_write_end'

Link: http://lkml.kernel.org/r/5fc9d70b-e377-0ec9-066a-970d49579041@infradead.org
Fixes: ad2a722f19 ("libfs: Open code simple_commit_write into only user")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Boaz Harrosh <boazh@netapp.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Randy Dunlap
c70d868f27 fs/direct-io.c: fix kernel-doc warning
Fix kernel-doc warning in fs/direct-io.c:

  fs/direct-io.c:258: warning: Excess function parameter 'offset' description in 'dio_complete'

Also, don't mark this function as having kernel-doc notation since it is
not exported.

Link: http://lkml.kernel.org/r/97908511-4328-4a56-17fe-f43a1d7aa470@infradead.org
Fixes: 6d544bb4d9 ("dio: centralize completion in dio_complete()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Zach Brown <zab@zabbo.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Vlastimil Babka
a2e9a5afce mm, compaction: fix wrong pfn handling in __reset_isolation_pfn()
Florian and Dave reported [1] a NULL pointer dereference in
__reset_isolation_pfn().  While the exact cause is unclear, staring at
the code revealed two bugs, which might be related.

One bug is that if zone starts in the middle of pageblock, block_page
might correspond to different pfn than block_pfn, and then the
pfn_valid_within() checks will check different pfn's than those accessed
via struct page.  This might result in acessing an unitialized page in
CONFIG_HOLES_IN_ZONE configs.

The other bug is that end_page refers to the first page of next
pageblock and not last page of current pageblock.  The online and valid
check is then wrong and with sections, the while (page < end_page) loop
might wander off actual struct page arrays.

[1] https://lore.kernel.org/linux-xfs/87o8z1fvqu.fsf@mid.deneb.enyo.de/

Link: http://lkml.kernel.org/r/20191008152915.24704-1-vbabka@suse.cz
Fixes: 6b0868c820 ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Florian Weimer <fw@deneb.enyo.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
David Rientjes
3f36d86694 mm, hugetlb: allow hugepage allocations to reclaim as needed
Commit b39d0ee263 ("mm, page_alloc: avoid expensive reclaim when
compaction may not succeed") has chnaged the allocator to bail out from
the allocator early to prevent from a potentially excessive memory
reclaim.  __GFP_RETRY_MAYFAIL is designed to retry the allocation,
reclaim and compaction loop as long as there is a reasonable chance to
make forward progress.  Neither COMPACT_SKIPPED nor COMPACT_DEFERRED at
the INIT_COMPACT_PRIORITY compaction attempt gives this feedback.

The most obvious affected subsystem is hugetlbfs which allocates huge
pages based on an admin request (or via admin configured overcommit).  I
have done a simple test which tries to allocate half of the memory for
hugetlb pages while the memory is full of a clean page cache.  This is
not an unusual situation because we try to cache as much of the memory
as possible and sysctl/sysfs interface to allocate huge pages is there
for flexibility to allocate hugetlb pages at any time.

System has 1GB of RAM and we are requesting 515MB worth of hugetlb pages
after the memory is prefilled by a clean page cache:

  root@test1:~# cat hugetlb_test.sh

  set -x
  echo 0 > /proc/sys/vm/nr_hugepages
  echo 3 > /proc/sys/vm/drop_caches
  echo 1 > /proc/sys/vm/compact_memory
  dd if=/mnt/data/file-1G of=/dev/null bs=$((4<<10))
  TS=$(date +%s)
  echo 256 > /proc/sys/vm/nr_hugepages
  cat /proc/sys/vm/nr_hugepages

The results for 2 consecutive runs on clean 5.3

  root@test1:~# sh hugetlb_test.sh
  + echo 0
  + echo 3
  + echo 1
  + dd if=/mnt/data/file-1G of=/dev/null bs=4096
  262144+0 records in
  262144+0 records out
  1073741824 bytes (1.1 GB) copied, 21.0694 s, 51.0 MB/s
  + date +%s
  + TS=1569905284
  + echo 256
  + cat /proc/sys/vm/nr_hugepages
  256
  root@test1:~# sh hugetlb_test.sh
  + echo 0
  + echo 3
  + echo 1
  + dd if=/mnt/data/file-1G of=/dev/null bs=4096
  262144+0 records in
  262144+0 records out
  1073741824 bytes (1.1 GB) copied, 21.7548 s, 49.4 MB/s
  + date +%s
  + TS=1569905311
  + echo 256
  + cat /proc/sys/vm/nr_hugepages
  256

Now with b39d0ee263 applied

  root@test1:~# sh hugetlb_test.sh
  + echo 0
  + echo 3
  + echo 1
  + dd if=/mnt/data/file-1G of=/dev/null bs=4096
  262144+0 records in
  262144+0 records out
  1073741824 bytes (1.1 GB) copied, 20.1815 s, 53.2 MB/s
  + date +%s
  + TS=1569905516
  + echo 256
  + cat /proc/sys/vm/nr_hugepages
  11
  root@test1:~# sh hugetlb_test.sh
  + echo 0
  + echo 3
  + echo 1
  + dd if=/mnt/data/file-1G of=/dev/null bs=4096
  262144+0 records in
  262144+0 records out
  1073741824 bytes (1.1 GB) copied, 21.9485 s, 48.9 MB/s
  + date +%s
  + TS=1569905541
  + echo 256
  + cat /proc/sys/vm/nr_hugepages
  12

The success rate went down by factor of 20!

Although hugetlb allocation requests might fail and it is reasonable to
expect them to under extremely fragmented memory or when the memory is
under a heavy pressure but the above situation is not that case.

Fix the regression by reverting back to the previous behavior for
__GFP_RETRY_MAYFAIL requests and disable the beail out heuristic for
those requests.

Mike said:

: hugetlbfs allocations are commonly done via sysctl/sysfs shortly after
: boot where this may not be as much of an issue.  However, I am aware of at
: least three use cases where allocations are made after the system has been
: up and running for quite some time:
:
: - DB reconfiguration.  If sysctl/sysfs fails to get required number of
:   huge pages, system is rebooted to perform allocation after boot.
:
: - VM provisioning.  If unable get required number of huge pages, fall
:   back to base pages.
:
: - An application that does not preallocate pool, but rather allocates
:   pages at fault time for optimal NUMA locality.
:
: In all cases, I would expect b39d0ee263 to cause regressions and
: noticable behavior changes.
:
: My quick/limited testing in
: https://lkml.kernel.org/r/3468b605-a3a9-6978-9699-57c52a90bd7e@oracle.com
: was insufficient.  It was also mentioned that if something like
: b39d0ee263 went forward, I would like exemptions for __GFP_RETRY_MAYFAIL
: requests as in this patch.

[mhocko@suse.com: reworded changelog]
Link: http://lkml.kernel.org/r/20191007075548.12456-1-mhocko@kernel.org
Fixes: b39d0ee263 ("mm, page_alloc: avoid expensive reclaim when compaction may not succeed")
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Alexander Potapenko
03a9349ac0 lib/test_meminit: add a kmem_cache_alloc_bulk() test
Make sure allocations from kmem_cache_alloc_bulk() and
kmem_cache_free_bulk() are properly initialized.

Link: http://lkml.kernel.org/r/20191007091605.30530-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Thibaut Sautereau <thibaut@sautereau.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Alexander Potapenko
0f181f9fbe mm/slub.c: init_on_free=1 should wipe freelist ptr for bulk allocations
slab_alloc_node() already zeroed out the freelist pointer if
init_on_free was on.  Thibaut Sautereau noticed that the same needs to
be done for kmem_cache_alloc_bulk(), which performs the allocations
separately.

kmem_cache_alloc_bulk() is currently used in two places in the kernel,
so this change is unlikely to have a major performance impact.

SLAB doesn't require a similar change, as auto-initialization makes the
allocator store the freelist pointers off-slab.

Link: http://lkml.kernel.org/r/20191007091605.30530-1-glider@google.com
Fixes: 6471384af2 ("mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Thibaut Sautereau <thibaut@sautereau.fr>
Reported-by: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Laura Abbott <labbott@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Eric Biggers
3c52b0af05 lib/generic-radix-tree.c: add kmemleak annotations
Kmemleak is falsely reporting a leak of the slab allocation in
sctp_stream_init_ext():

  BUG: memory leak
  unreferenced object 0xffff8881114f5d80 (size 96):
   comm "syz-executor934", pid 7160, jiffies 4294993058 (age 31.950s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   backtrace:
     [<00000000ce7a1326>] kmemleak_alloc_recursive  include/linux/kmemleak.h:55 [inline]
     [<00000000ce7a1326>] slab_post_alloc_hook mm/slab.h:439 [inline]
     [<00000000ce7a1326>] slab_alloc mm/slab.c:3326 [inline]
     [<00000000ce7a1326>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
     [<000000007abb7ac9>] kmalloc include/linux/slab.h:547 [inline]
     [<000000007abb7ac9>] kzalloc include/linux/slab.h:742 [inline]
     [<000000007abb7ac9>] sctp_stream_init_ext+0x2b/0xa0  net/sctp/stream.c:157
     [<0000000048ecb9c1>] sctp_sendmsg_to_asoc+0x946/0xa00  net/sctp/socket.c:1882
     [<000000004483ca2b>] sctp_sendmsg+0x2a8/0x990 net/sctp/socket.c:2102
     [...]

But it's freed later.  Kmemleak misses the allocation because its
pointer is stored in the generic radix tree sctp_stream::out, and the
generic radix tree uses raw pages which aren't tracked by kmemleak.

Fix this by adding the kmemleak hooks to the generic radix tree code.

Link: http://lkml.kernel.org/r/20191004065039.727564-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reported-by: <syzbot+7f3b6b106be8dcdcdeec@syzkaller.appspotmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:00 -07:00
Qian Cai
e4f8e513c3 mm/slub: fix a deadlock in show_slab_objects()
A long time ago we fixed a similar deadlock in show_slab_objects() [1].
However, it is apparently due to the commits like 01fb58bcba ("slab:
remove synchronous synchronize_sched() from memcg cache deactivation
path") and 03afc0e25f ("slab: get_online_mems for
kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by
just reading files in /sys/kernel/slab which will generate a lockdep
splat below.

Since the "mem_hotplug_lock" here is only to obtain a stable online node
mask while racing with NUMA node hotplug, in the worst case, the results
may me miscalculated while doing NUMA node hotplug, but they shall be
corrected by later reads of the same files.

  WARNING: possible circular locking dependency detected
  ------------------------------------------------------
  cat/5224 is trying to acquire lock:
  ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
  show_slab_objects+0x94/0x3a8

  but task is already holding lock:
  b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (kn->count#45){++++}:
         lock_acquire+0x31c/0x360
         __kernfs_remove+0x290/0x490
         kernfs_remove+0x30/0x44
         sysfs_remove_dir+0x70/0x88
         kobject_del+0x50/0xb0
         sysfs_slab_unlink+0x2c/0x38
         shutdown_cache+0xa0/0xf0
         kmemcg_cache_shutdown_fn+0x1c/0x34
         kmemcg_workfn+0x44/0x64
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #1 (slab_mutex){+.+.}:
         lock_acquire+0x31c/0x360
         __mutex_lock_common+0x16c/0xf78
         mutex_lock_nested+0x40/0x50
         memcg_create_kmem_cache+0x38/0x16c
         memcg_kmem_cache_create_func+0x3c/0x70
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #0 (mem_hotplug_lock.rw_sem){++++}:
         validate_chain+0xd10/0x2bcc
         __lock_acquire+0x7f4/0xb8c
         lock_acquire+0x31c/0x360
         get_online_mems+0x54/0x150
         show_slab_objects+0x94/0x3a8
         total_objects_show+0x28/0x34
         slab_attr_show+0x38/0x54
         sysfs_kf_seq_show+0x198/0x2d4
         kernfs_seq_show+0xa4/0xcc
         seq_read+0x30c/0x8a8
         kernfs_fop_read+0xa8/0x314
         __vfs_read+0x88/0x20c
         vfs_read+0xd8/0x10c
         ksys_read+0xb0/0x120
         __arm64_sys_read+0x54/0x88
         el0_svc_handler+0x170/0x240
         el0_svc+0x8/0xc

  other info that might help us debug this:

  Chain exists of:
    mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(kn->count#45);
                                 lock(slab_mutex);
                                 lock(kn->count#45);
    lock(mem_hotplug_lock.rw_sem);

   *** DEADLOCK ***

  3 locks held by cat/5224:
   #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
   #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
   #2: b8ff009693eee398 (kn->count#45){++++}, at:
  kernfs_seq_start+0x44/0xf0

  stack backtrace:
  Call trace:
   dump_backtrace+0x0/0x248
   show_stack+0x20/0x2c
   dump_stack+0xd0/0x140
   print_circular_bug+0x368/0x380
   check_noncircular+0x248/0x250
   validate_chain+0xd10/0x2bcc
   __lock_acquire+0x7f4/0xb8c
   lock_acquire+0x31c/0x360
   get_online_mems+0x54/0x150
   show_slab_objects+0x94/0x3a8
   total_objects_show+0x28/0x34
   slab_attr_show+0x38/0x54
   sysfs_kf_seq_show+0x198/0x2d4
   kernfs_seq_show+0xa4/0xcc
   seq_read+0x30c/0x8a8
   kernfs_fop_read+0xa8/0x314
   __vfs_read+0x88/0x20c
   vfs_read+0xd8/0x10c
   ksys_read+0xb0/0x120
   __arm64_sys_read+0x54/0x88
   el0_svc_handler+0x170/0x240
   el0_svc+0x8/0xc

I think it is important to mention that this doesn't expose the
show_slab_objects to use-after-free.  There is only a single path that
might really race here and that is the slab hotplug notifier callback
__kmem_cache_shrink (via slab_mem_going_offline_callback) but that path
doesn't really destroy kmem_cache_node data structures.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html

[akpm@linux-foundation.org: add comment explaining why we don't need mem_hotplug_lock]
Link: http://lkml.kernel.org/r/1570192309-10132-1-git-send-email-cai@lca.pw
Fixes: 01fb58bcba ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
Fixes: 03afc0e25f ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:00 -07:00