This is a partial revert of commit 338a16ba15 ("mm, thp: copying user
pages must schedule on collapse") which added a cond_resched() to
__collapse_huge_page_copy().
On x86 with CONFIG_HIGHPTE, __collapse_huge_page_copy is called in
atomic context and thus scheduling is not possible. This is only a
possible config on arm and i386.
Although need_resched has been shown to be set for over 100 jiffies
while doing the iteration in __collapse_huge_page_copy, this is better
than doing
if (in_atomic())
cond_resched()
to cover only non-CONFIG_HIGHPTE configs.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1706191341550.97821@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two fixes to remove spurious WARN_ONs from the new(ish) qedi driver.
The driver already prints a warning message, there's no need to panic
users by printing something that looks like an oops as well.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZTBWOAAoJEAVr7HOZEZN4g0gP/0DjcJmv8c72CzInKnfB+nF1
y9zrhOojrTD3OhChI0V82rxIfpFLL7ipMCmnl5zElsB0e8R+8w2EwkwO4VJz9lyu
mztjX68HxG3UmEA4wG9euixCUvK8ewt21PjbWF+voS0pj1bf1fSobl74J+5zQD2D
svoB93gqDGAIUVL388NWfwrR8Lo1Ysrc1FMtjIGQcrpx+7lVAKpN7olby7JKQpnx
2HJn64cUiPWX5qajEPnFOfpj8tNecg9Vq7+z8lIcGKXEQk0neraEejbtBWC2z9ZA
/WQJvA4Ed5BH2I4tG8Ba+J3Z5YFGMJhIdPD3P3Xs8rWAJ3Kzte2jWxP3o1T0cu0R
8W3b1EhFmyZbiAbhLf+UOlPGMY5cxej7dMzobH2h985mZAXamBnqO7yKXrGl/mHh
/ai4bC52AdJitOxObaRzyk7ilx4dfSODUfdbTD3j1gj7mIpFoDkgCqA/GrfDFk34
pzill8IiRxKntFgcZUPflBYAuvFZvGZHXvLsV6xxsYehMv7c3uxJ6/9rPowjxEAg
HYmkOH+nQgrgKo27SAviaz9Do9hfxYjJ1hP014DjsCAdtlzb05PhXAhJCAqaPzIG
UpPxpvE1QATI+ZtudLDt7Fk7uR6ggyvkgUkRoNktf4N8rhlUs3kAVQy3Mda3vVW3
97NN3G9kppiLATijLG/b
=ysyU
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two fixes to remove spurious WARN_ONs from the new(ish) qedi driver.
The driver already prints a warning message, there's no need to panic
users by printing something that looks like an oops as well"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qedi: Remove WARN_ON from clear task context.
scsi: qedi: Remove WARN_ON for untracked cleanup.
- don't allow swapon on files on the realtime device, because the swap
code will swap pages out to blocks on the data device, thereby
corrupting the filesystem
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJZSzmHAAoJEPh/dxk0SrTroa4P/02EPljuA4pOhYlTrsrKyul4
7KnVg1AFk2uYlNbEcZjKJTkhMhvCqtENorAWawixezAbSeumft24DgPVmXxXEGRx
f2ym8UiwEVSdTs2dlP/8HCgrrx3kgaF6H4tYnu4WQxkMkDfE6feTp0TcOsklW8R1
bR+V+Q9xSJ2WRji9mDBu++3jXKa1VlsOzCRDjnWI7E/ZHJ2n8y412qYxaOHPDvl2
g5AG7jOtB2D7nDEVtfuEdsuSIBHrUsZ/LWrpDlXMhTY7eJ5ipjvcs6RtMayufNdE
H5ZeA8bKIJNcpR5Y0MvAb5lQNDA5wg4MTLWfQQ7jlvnI6qaysqWR13UhbfzRBHg8
YDUUWtuyvq+2/gy94VOn82xKTerD8l+KE+pdZUU99qZDsHVZ0FZ0A2IpSA0ZRdj+
xYm2WnzIqgMp5OD0Ef+QYzMr0043eBnD1+CDnG/JbHz/S1nqI4KdzH5t2ndMg9YS
g4sl3qKEwR1ZHnECTu2Q9LWAtF5s8WBgVj3brDG9mdMZXwWYLyGKJDNZ6tsxwOzh
Z2Pp+6Gs5KRqCt5Acok84KjcS7/XVM0a4w9KOjmlZxZ1K9R5abAePGOT+GEGFP4g
qO2WOa+wHX2UlUQI+lYg60PFMCBtO41ewptx/1+ZluREyNE24aIRTQttRRdz2twA
/kF8Uf8eGzPWkyP/uCH3
=qkCp
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"I have one more bugfix for you for 4.12-rc7 to fix a disk corruption
problem:
- don't allow swapon on files on the realtime device, because the
swap code will swap pages out to blocks on the data device, thereby
corrupting the filesystem"
* tag 'xfs-4.12-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't allow bmap on rt files
Michael Chan says:
====================
bnxt_en: Error handling and netpoll fixes.
Add missing error handling and fix netpoll handling. The current code
handles RX and TX events in netpoll mode and is causing lots of warnings
and errors in the RX code path in netpoll mode. The fix is to only handle
TX events in netpoll mode.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To handle netpoll properly, the driver must only handle TX packets
during NAPI. Handling RX events cause warnings and errors in
netpoll mode. The ndo_poll_controller() method should call
napi_schedule() directly so that a NAPI weight of zero will be used
during netpoll mode.
The bnxt_en driver supports 2 ring modes: combined, and separate rx/tx.
In separate rx/tx mode, the ndo_poll_controller() method will only
process the tx rings. In combined mode, the rx and tx completion
entries are mixed in the completion ring and we need to drop the rx
entries and recycle the rx buffers.
Add a function bnxt_force_rx_discard() to handle this in netpoll mode
when we see rx entries in combined ring mode.
Reported-by: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we get a TPA_END completion to handle a completed LRO packet, it
is possible that hardware would indicate errors. The current code is
not checking for the error condition. Define the proper error bits and
the macro to check for this error and abort properly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function, skb_complete_tx_timestamp(), used to allow passing in a
NULL pointer for the time stamps, but that was changed in commit
62bccb8cdb ("net-timestamp: Make the
clone operation stand-alone from phy timestamping"), and the existing
call sites, all of which are in the dp83640 driver, were fixed up.
Even though the kernel-doc was subsequently updated in commit
7a76a021cd ("net-timestamp: Update
skb_complete_tx_timestamp comment"), still a bug fix from Manfred
Rudigier came into the driver using the old semantics. Probably
Manfred derived that patch from an older kernel version.
This fix should be applied to the stable trees as well.
Fixes: 81e8f2e930 ("net: dp83640: Fix tx timestamp overflow handling.")
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net): ipsec 2017-06-23
1) Fix xfrm garbage collecting when unregistering a netdevice.
From Hangbin Liu.
2) Fix NULL pointer derefernce when exiting a network namespace.
From Hangbin Liu.
3) Fix some error codes in pfkey to prevent a NULL pointer derefernce.
From Dan Carpenter.
4) Fix NULL pointer derefernce on allocation failure in pfkey.
From Dan Carpenter.
5) Adjust IPv6 payload_len to include extension headers. Otherwise
we corrupt the packets when doing ESP GRO on transport mode.
From Yossi Kuperman.
6) Set nhoff to the proper offset of the IPv6 nexthdr when doing ESP GRO.
From Yossi Kuperman.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory allocation size is controlled by user-space,
if it is too large just fail silently and return NULL,
not to mention there is a fallback allocation later.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our customer encountered stuck NFS writes for blocks starting at specific
offsets w.r.t. page boundary caused by networking stack sending packets via
UFO enabled device with wrong checksum. The problem can be reproduced by
composing a long UDP datagram from multiple parts using MSG_MORE flag:
sendto(sd, buff, 1000, MSG_MORE, ...);
sendto(sd, buff, 1000, MSG_MORE, ...);
sendto(sd, buff, 3000, 0, ...);
Assume this packet is to be routed via a device with MTU 1500 and
NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(),
this condition is tested (among others) to decide whether to call
ip_ufo_append_data():
((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))
At the moment, we already have skb with 1028 bytes of data which is not
marked for GSO so that the test is false (fragheaderlen is usually 20).
Thus we append second 1000 bytes to this skb without invoking UFO. Third
sendto(), however, has sufficient length to trigger the UFO path so that we
end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb()
uses udp_csum() to calculate the checksum but that assumes all fragments
have correct checksum in skb->csum which is not true for UFO fragments.
When checking against MTU, we need to add skb->len to length of new segment
if we already have a partially filled skb and fragheaderlen only if there
isn't one.
In the IPv6 case, skb can only be null if this is the first segment so that
we have to use headersize (length of the first IPv6 header) rather than
fragheaderlen (length of IPv6 header of further fragments) for skb == NULL.
Fixes: e89e9cf539 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Fixes: e4c5e13aa4 ("ipv6: Should use consistent conditional judgement for
ip6 fragment between __ip6_append_data and ip6_finish_output")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a RAID set was created on dm-raid version < 1.9.0 (old RAID
superblock format), all of the new 1.9.0 members of the superblock are
uninitialized (zero) -- including the device sectors member needed to
support shrinking.
All the other accesses to superblock fields new in 1.9.0 were reviewed
and verified to be properly guarded against invalid use. The 'sectors'
member was the only one used when the superblock version is < 1.9.
Don't access the superblock's >= 1.9.0 'sectors' member unconditionally.
Also add respective comments.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This fixes up two commits that have touched this driver. The
command status field is now a blk_status_t, so we can't check
for < 0 and we definitely can't assume it's holding -Exxxx error
values. All we care about here is whether ->status is zero or not.
Check for that, and remove the various attempts at smart error
reporting. Just log to dmesg what command failed, and the
blk_status_t value.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 2a842acab1 ("block: introduce new block status code type")
Fixes: 3f5e6a3577 ("mtip32xx: convert internal command issue to block IO path")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The current approach, which is the wholesale efi struct initialization from
a 'efi_xen' local template is not robust. Usually if new member is defined
then it is properly initialized in drivers/firmware/efi/efi.c, but not in
arch/x86/xen/efi.c.
The effect is that the Xen initialization clears any fields the generic code
might have set and the Xen code does not know about yet.
I saw this happen a few times, so let's initialize only the EFI struct members
used by Xen and maintain no local duplicate, to avoid such issues in the future.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andrew.cooper3@citrix.com
Cc: jgross@suse.com
Cc: linux-efi@vger.kernel.org
Cc: matt@codeblueprint.co.uk
Cc: stable@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1498128697-12943-3-git-send-email-daniel.kiper@oracle.com
[ Clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Do not double the offset of inline expansions when using
'perf probe' on inlined functions (Björn Töpel)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJZTCOqAAoJENZQFvNTUqpAqt8QALXG9ikevZKBw3JQ6G0hmMdJ
5NZuVF/DnQleUEYmhGGQCt2NwNOwd4wMNcDWLpOd8jAJynVFycN/TW5J3pVx2xAt
Y9JHIZVPbbb4w5fCwk8KsJkRODZTw6JSGlToJjOLeYoGJH1ZxEzDZvYKVqeqnzMz
CCiNg0TS7jCjBLcQQRGKN1sWFIWqyuc1lV13pIjxQJekKXE/CE9pNQbngVgRBDb1
fEyA42ul75EAXRnfP7VRJyvOu5HmjXjP5acTphfVzwMGpuiqIdZTcWWJkl/BngTr
7Nu7yVfuEYiQ2jC+YgSKXlbhWZIEmHdDfiUOz4vrtVi47Ozg++4ldVM1df10fOdL
8QU8CZyK4uzanYwULm4puizZb7S5mndJaxdDI4mP/jZTq4d4TykbpmaACHLwozmT
dozRObK4g329Ww+5fyINgie8aftKwaHav6mPyKJVgxGc9JfZR7UYcMm7MJ1uSCp5
BKeBiRPJCzfKZQLmQXPhudcQE21R8QYs9Jw75OHqwyzLxVcn1isLPBh+k7FVUIe4
Ig7uTOLzaneHY5J6w6DNC/+crEQkOOc0rgp0Y958J429IFWHK1zVQwU4Y/BiV4c3
9+gEoTaSNrlM0f7NxSSPp+M9G6d1TfQwRoapphC7W7ObYzBc/T+fTyLZrWSlM+Bj
VJWQDPBjztT34zEdbAJZ
=oZCf
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo-4.12-20170622' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull 'perf probe' fix from Arnaldo Carvalho de Melo:
- Do not double the offset of inline expansions when using
'perf probe' on inlined functions (Björn Töpel)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add code comment to make it clear that the fall-through is intentional
and, OR ret with its previous value to avoid overwriting it so that
callers can check the correct return value.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170622220535.GA4896@embeddedgus
[ Massage a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
The F54 driver is currently only using the first 6 bytes of F54 so there is
no need to read all 27 bytes. Some Dell systems (Dell XP13 9333 and
similar) have an issue with the touchpad or I2C bus when reading reports
larger then 16 bytes. Reads larger then 16 bytes are reported in two HID
reports. Something about the back to back reports seems to cause the next
read to report incorrect data. This results in F30 failing to load and the
click button failing to work.
Previous issues with the I2C controller or touchpad were addressed in:
commit 5b65c2a029 ("HID: rmi: check sanity of the incoming report")
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=195949
Signed-off-by: Andrew Duggan <aduggan@synaptics.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Nick Dyer <nick@shmanahar.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
A previous set of patches "cxl: Add support for Coherent Accelerator
Interface Architecture 2.0" has introduced a new support for the CAPI
cards. These patches have been tested on Simulation environment and
quite a bit of them have been tested on real hardware.
This patch brings new fixes after a series of tests carried out on new
equipment:
- Add POWER9 definition.
- Re-enable any masked interrupts when the AFU is not activated
after resetting the AFU.
- Remove the api cxl_is_psl8/9 which is no longer useful.
- Do not dump CAPI1 registers.
- Rewrite cxl_is_page_fault() function.
- Do not register slb callack on P9.
Fixes: f24be42aab ("cxl: Add psl9 specific code")
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull in the fix for shared tags, as it conflicts with the pending
changes in for-4.13/block. We already pulled in v4.12-rc5 to solve
other conflicts or get fixes that went into 4.12, so not a lot
of changes in this merge.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Emergency stacks have their thread_info mostly uninitialised, which in
particular means garbage preempt_count values.
Emergency stack code runs with interrupts disabled entirely, and is
used very rarely, so this has been unnoticed so far. It was found by a
proposed new powerpc watchdog that takes a soft-NMI directly from the
masked_interrupt handler and using the emergency stack. That crashed
at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be
garbage.
To fix this, zero the entire THREAD_SIZE allocation, and initialize
the thread_info.
Cc: stable@vger.kernel.org
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move it all into setup_64.c, use a function not a macro. Fix
crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix sparse warnings in scripts/kconfig/nconf* ('make nconfig'):
../scripts/kconfig/nconf.c:1071:32: warning: Using plain integer as NULL pointer
../scripts/kconfig/nconf.c:1238:30: warning: Using plain integer as NULL pointer
../scripts/kconfig/nconf.c:511:51: warning: Using plain integer as NULL pointer
../scripts/kconfig/nconf.c:1460:6: warning: symbol 'setup_windows' was not declared. Should it be static?
../scripts/kconfig/nconf.c:274:12: warning: symbol 'current_instructions' was not declared. Should it be static?
../scripts/kconfig/nconf.c:308:22: warning: symbol 'function_keys' was not declared. Should it be static?
../scripts/kconfig/nconf.gui.c:132:17: warning: non-ANSI function declaration of function 'set_colors'
../scripts/kconfig/nconf.gui.c:195:24: warning: Using plain integer as NULL pointer
nconf.gui.o before/after files are the same.
nconf.o before/after files are the same until the 'static' function
declarations are added.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
In commit 613f050d68 ("perf probe: Fix to probe on gcc generated
functions in modules"), the offset from symbol is, incorrectly, added
to the trace point address. This leads to incorrect probe trace points
for inlined functions and when using relative line number on symbols.
Prior this patch:
$ perf probe -m nf_nat -D in_range
p:probe/in_range nf_nat:in_range.isra.9+0
$ perf probe -m i40e -D i40e_clean_rx_irq
p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212
$ perf probe -m i40e -D i40e_clean_rx_irq:16
p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626
After:
$ perf probe -m nf_nat -D in_range
p:probe/in_range nf_nat:in_range.isra.9+0
$ perf probe -m i40e -D i40e_clean_rx_irq
p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106
$ perf probe -m i40e -D i40e_clean_rx_irq:16
p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665
Committer testing:
Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask what are
the functions that while not being explicitely marked as inline, were inlined
by the compiler:
# pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head
__ew32
e1000_regdump
e1000e_dump_ps_pages
e1000_desc_unused
e1000e_systim_to_hwtstamp
e1000e_rx_hwtstamp
e1000e_update_rdt_wa
e1000e_update_tdt_wa
e1000_put_txbuf
e1000_consume_page
Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of
them:
# perf probe -m e1000e -D e1000e_rx_hwtstamp
p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74
# perf probe -m e1000e -D e1000_consume_page
p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876
p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506
p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
Now lets concentrate on the 'e1000_consume_page' one, that was inlined twice in
e1000_clean_jumbo_rx_irq(), lets see what readelf says about the DWARF tags for
that function:
$ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
<SNIP>
<1><13e27b>: Abbrev Number: 121 (DW_TAG_subprogram)
<13e27c> DW_AT_name : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq
<13e287> DW_AT_low_pc : 0x17a30
<3><13e6ef>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
<13e6f0> DW_AT_abstract_origin: <0x13ed2c>
<13e6f4> DW_AT_low_pc : 0x17be6
<SNIP>
<1><13ed2c>: Abbrev Number: 142 (DW_TAG_subprogram)
<13ed2e> DW_AT_name : (indirect string, offset: 0xa54c3): e1000_consume_page
So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is
inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s
address, gives us the offset we should use in the probe definition:
0x17be6 - 0x17a30 = 438
but above we have 876, which is twice as much.
Lets see the second inline expansion of e1000_consume_page() in
e1000_clean_jumbo_rx_irq():
<3><13e86e>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
<13e86f> DW_AT_abstract_origin: <0x13ed2c>
<13e873> DW_AT_low_pc : 0x17d21
0x17d21 - 0x17a30 = 753
So we where adding it at twice the offset from the containing function as we
should.
And then after this patch:
# perf probe -m e1000e -D e1000e_rx_hwtstamp
p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37
# perf probe -m e1000e -D e1000_consume_page
p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438
p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753
p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353
#
Which matches the two first expansions and shows that because we were
doubling the offset it would spill over the next function:
readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
673: 0000000000017a30 1626 FUNC LOCAL DEFAULT 2 e1000_clean_jumbo_rx_irq
674: 0000000000018090 2013 FUNC LOCAL DEFAULT 2 e1000_clean_rx_irq_ps
This is the 3rd inline expansion of e1000_consume_page() in
e1000_clean_jumbo_rx_irq():
<3><13ec77>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
<13ec78> DW_AT_abstract_origin: <0x13ed2c>
<13ec7c> DW_AT_low_pc : 0x17f79
0x17f79 - 0x17a30 = 1353
So:
0x17a30 + 2 * 1353 = 0x184c2
And:
0x184c2 - 0x18090 = 1074
Which explains the bogus third expansion for e1000_consume_page() to end up at:
p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074
All fixed now :-)
[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 613f050d68 ("perf probe: Fix to probe on gcc generated functions in modules")
Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull cifs fixes from Steve French:
"Various small fixes for stable"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix some return values in case of error in 'crypt_message'
cifs: remove redundant return in cifs_creation_time_get
CIFS: Improve readdir verbosity
CIFS: check if pages is null rather than bv for a failed allocation
CIFS: Set ->should_dirty in cifs_user_readv()
MIPS:
- Fix build with KVM, DYNAMIC_DEBUG and JUMP_LABEL.
PPC:
- Fix host crashes/hangs on POWER9.
- Properly restore userspace state after KVM_RUN ioctl.
s390:
- Fix address translation in odd-ball cases (real-space designation
ASCEs).
x86:
- Fix privilege escalation in 64-bit Windows guests.
All patches are for stable and the x86 also has a CVE.
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJZS9uIAAoJEED/6hsPKofou7UH/1AopK/4WzfZqIlObxf1O2K/
iqeoHlU/7TPz3+YVN4PxCyb9KWxOR1CS6IjmrrQRnl/ncYkFwUI11zb1Dao7mvYo
L/D4XeT9rLheNATj9RPlznIAbQicN3TFWWczMzR0T2kftHHDAe0rWF1hkyS3BDyY
n6V6LbG6h6ONacUHUFfDAgRugiI1rKAjKtOeFvylIS5nIe1ez1ocULBxoXVJFxv1
0XnX/OrWDocGeope0xt6Jmjr7N5cMU0fyjJ+VM4ap8HGmovVUPeXF+cKdaOUyZyS
L+4goghsHDK8fCrtQiPhL+TqQ7El0OtzzcSScb662vT1wd7haAtrQcv96WFAVE4=
=Zhvq
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"MIPS:
- Fix build with KVM, DYNAMIC_DEBUG and JUMP_LABEL.
PPC:
- Fix host crashes/hangs on POWER9.
- Properly restore userspace state after KVM_RUN ioctl.
s390:
- Fix address translation in odd-ball cases (real-space designation
ASCEs).
x86:
- Fix privilege escalation in 64-bit Windows guests
All patches are for stable and the x86 also has a CVE"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix singlestepping over syscall
KVM: s390: gaccess: fix real-space designation asce handling for gmap shadows
KVM: MIPS: Fix maybe-uninitialized build failure
KVM: PPC: Book3S HV: Ignore timebase offset on POWER9 DD1
KVM: PPC: Book3S HV: Save/restore host values of debug registers
KVM: PPC: Book3S HV: Preserve userspace HTM state properly
KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
KVM: PPC: Book3S HV: Context-switch EBB registers properly
KVM: PPC: Book3S HV: Cope with host using large decrementer mode
- Use address passed in, rather than hard coded value
- Correct clock-names value in DT binding documentation
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJZSUUqAAoJEFGvii+H/Hdhw7oP/i2Lks1NeHv+fXR3qWfoYZc6
QAyAguc1NHRhVi76am+Qr4p5joAg8zrPeRVf8rBzIybS0d834M6ehs87rzkZPo01
3gkO9nvjWIzbXk9b3IONm8djVvTTEf8dndtfdqp2AD53sSXEm9FOavUB7zVwfOEU
Tyo/PIOb8xlhK05S+Yqq72gkCMgpQ4YXwzKQ2fvYH5RdHHq8g6hpPExDosrcH84g
u9Yd9ccNA03E+82rBj0AdNaM3ECm3lHdMLA+6BIolgGpm5PDGBERVYMibPtZCJ00
t0QbmOXeE8PA6x1hu2tHowz0MpqWdU6IplwxEu2Zd5ycQArjeSp7/zr8p5TNBEnq
zf7ADGa108hlMW4TKSe+vvsk6ya0G7Rw2QDM2qwp4ZUnU4zEScrbWEVXOX5kt8df
1oU43358rDDuCMfdHuKxyNgi6rT3b0cke//VGmPGdBlbW7rNBYoDUi3GynSXP38x
J1L6hEelszr9JDBQDw8s1bfsh0Yux3su+IXHwtBEJd8skEGdWZ9n8AUymEBwCW3o
41StxTxSutdl/fCu4pde0q2e1KoHNCFRhbxlNY7I69HnxLNzyVfWylEr/TSJb2JN
qnp1zx96VqGIp4fxSOhvcACm41XIa5+geGeeNde18B1Pc/YxFXuYc6kB7rokkEiS
GjbpslGVeZcnPCFpyB3c
=iygS
-----END PGP SIGNATURE-----
Merge tag 'mfd-fixes-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD fixes from Lee Jones:
- arizona: use address passed in, rather than hard coded value
- correct STM32 clock-names value in DT binding documentation
* tag 'mfd-fixes-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
dt-bindings: mfd: Update STM32 timers clock names
mfd: arizona: Fix typo using hard-coded register
The 8000 series adapters uses catch-all filters for encapsulated traffic
to support filtering VXLAN, NVGRE and GENEVE traffic.
This new filter functionality requires a longer MCDI command.
This patch increases the size of buffers on stack that were missed, which
fixes a kernel panic from the stack protector.
Fixes: 9b41080125 ("sfc: insert catch-all filters for encapsulated traffic")
Signed-off-by: Martin Habets <mhabets@solarflare.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Acked-by: Bert Kenward bkenward@solarflare.com
Signed-off-by: David S. Miller <davem@davemloft.net>
This structure member is hidden behind CONFIG_SYSFS, and we
get a build error when that is disabled:
drivers/net/hyperv/netvsc_drv.c: In function 'netvsc_set_channels':
drivers/net/hyperv/netvsc_drv.c:754:49: error: 'struct net_device' has no member named 'num_rx_queues'; did you mean 'num_tx_queues'?
drivers/net/hyperv/netvsc_drv.c: In function 'netvsc_set_rxfh':
drivers/net/hyperv/netvsc_drv.c:1181:25: error: 'struct net_device' has no member named 'num_rx_queues'; did you mean 'num_tx_queues'?
As the value is only set once to the argument of alloc_netdev_mq(),
we can compare against that constant directly.
Fixes: ff4a441990 ("netvsc: allow get/set of RSS indirection table")
Fixes: 2b01888d1b ("netvsc: allow more flexible setting of number of channels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The per netns loopback_dev->ip6_ptr is unregistered and set to
NULL when its mtu is set to smaller than IPV6_MIN_MTU, this
leads to that we could set rt->rt6i_idev NULL after a
rt6_uncached_list_flush_dev() and then crash after another
call.
In this case we should just bring its inet6_dev down, rather
than unregistering it, at least prior to commit 176c39af29
("netns: fix addrconf_ifdown kernel panic") we always
override the case for loopback.
Thanks a lot to Andrey for finding a reliable reproducer.
Fixes: 176c39af29 ("netns: fix addrconf_ifdown kernel panic")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hwctx's queue_num has been set prior call blk_mq_init_hctx, so no need
set it again.
Signed-off-by: weiping <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vladislav Yasevich says:
====================
macvlan: Fix some issues with changing mac addresses
There are some issues in macvlan wrt to changing it's mac address.
* An error is returned in the specified address is the same as an already
assigned address.
* In passthru mode, the mac address of the macvlan device doesn't change.
* After changing the mac address of a passthru macvlan and then removing it,
the mac address of the physical device remains changed.
This patch series attempts to resolve these issues.
V2: Address a small issue in p4 where we save the address from the lowerdev
(from girish.moodalbail@oracle.com)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Passthru macvlans directly change the mac address of the lower
level device. That's OK, but after the macvlan is deleted,
the lower device is left with changed address and one needs to
reboot to bring back the origina HW addresses.
This scenario is actually quite common with passthru macvtap devices.
This patch attempts to solve this, by storing the mac address
of the lower device in macvlan_port structure and keeping track of
it through the changes.
After this patch, any changes to the lower device mac address
done trough the macvlan device, will be reverted back. Any
changs done directly to the lower device mac address will be kept.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the port passthru boolean into flags with accesor functions.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a lower device of the passthru macvlan changes it's address,
passthru macvlan is supposed to change it's own address as well.
However, that doesn't happen correctly because the check in
macvlan_addr_busy() will catch the fact that the lower level
(port) mac address is the same as the address we are trying to
assign to the macvlan, and return an error. As a reasult,
the address of the passthru macvlan device is never changed.
The same thing happens when the user attempts to change the
mac address of the passthru macvlan.
The simple solution appers to be to not check against
the lower device in case of passthru macvlan device, since
the 2 addresses are _supposed_ to be the same.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The user currently gets an EBUSY error when attempting to set
the mac address on a macvlan device to the same value.
This should really be a no-op as nothing changes. Catch
the condition and return early.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a flag to indicate if a queue is rate-limited. Test the flag in
NAPI poll handler and avoid rescheduling the queue if true, otherwise
we risk locking up the host. The rescheduling will be done in the
timer callback function.
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are number of problems with configuration peer
network device in absence of IFLA_VETH_PEER attributes
where attributes for main network device shared with
peer.
First it is not feasible to configure both network
devices with same MAC address since this makes
communication in such configuration problematic.
This case can be reproduced with following sequence:
# ip link add address 02:11:22:33:44:55 type veth
# ip li sh
...
26: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc \
noop state DOWN mode DEFAULT qlen 1000
link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
27: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc \
noop state DOWN mode DEFAULT qlen 1000
link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
Second it is not possible to register both main and
peer network devices with same name, that happens
when name for main interface is given with IFLA_IFNAME
and same attribute reused for peer.
This case can be reproduced with following sequence:
# ip link add dev veth1a type veth
RTNETLINK answers: File exists
To fix both of the cases check if corresponding netlink
attributes are taken from peer_tb when valid or
name based on rtnl ops kind and random address is used.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cpsw driver tries to get macid for am43xx SoCs using the compatible
ti,am4372. But not all variants of am43x uses this complatible like
epos evm uses ti,am438x. So use a generic compatible ti,am43 to get
macid for all am43 based platforms.
Reviewed-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 242d3a49a2 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired,
unfortunately, as reported by jeffy, netdev_wait_allrefs()
could rebroadcast NETDEV_UNREGISTER event until all refs are
gone.
We have to add an additional check to avoid this corner case.
For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED,
for dev_change_net_namespace(), dev->reg_state is
NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED.
Fixes: 242d3a49a2 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
Reported-by: jeffy <jeffy.chen@rock-chips.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit ("net/phy: micrel: Add workaround for bad autoneg") fixes an
autoneg failure case by resetting the hardware. This turns off
intterupts. Things will work themselves out if the phy polls, as it will
figure out it's state during a poll. However if the phy uses only
intterupts, the phy will stall, since interrupts are off. This patch
fixes the issue by calling config_intr after resetting the phy.
Fixes: d2fd719bcb ("net/phy: micrel: Add workaround for bad autoneg ")
Signed-off-by: Zach Brown <zach.brown@ni.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TF is handled a bit differently for syscall and sysret, compared
to the other instructions: TF is checked after the instruction completes,
so that the OS can disable #DB at a syscall by adding TF to FMASK.
When the sysret is executed the #DB is taken "as if" the syscall insn
just completed.
KVM emulates syscall so that it can trap 32-bit syscall on Intel processors.
Fix the behavior, otherwise you could get #DB on a user stack which is not
nice. This does not affect Linux guests, as they use an IST or task gate
for #DB.
This fixes CVE-2017-7518.
Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
NPU2 requires an extra explicit flush to an active GPU PID when
sending address translation shoot downs (ATSDs) to reliably flush the
GPU TLB. This patch adds just such a flush at the end of each sequence
of ATSDs.
We can safely use PID 0 which is always reserved and active on the
GPU. PID 0 is only used for init_mm which will never be a user mm on
the GPU. To enforce this we add a check in pnv_npu2_init_context()
just in case someone tries to use PID 0 on the GPU.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
[mpe: Use true/false for bool literals]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For real-space designation asces the asce origin part is only a token.
The asce token origin must not be used to generate an effective
address for storage references. This however is erroneously done
within kvm_s390_shadow_tables().
Furthermore within the same function the wrong parts of virtual
addresses are used to generate a corresponding real address
(e.g. the region second index is used as region first index).
Both of the above can result in incorrect address translations. Only
for real space designations with a token origin of zero and addresses
below one megabyte the translation was correct.
Furthermore replace a "!asce.r" statement with a "!*fake" statement to
make it more obvious that a specific condition has nothing to do with
the architecture, but with the fake handling of real space designations.
Fixes: 3218f7094b ("s390/mm: support real-space for gmap shadows")
Cc: David Hildenbrand <david@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Current DTLB load/store miss events (0x608/0x649) only counts 4K,2M and
4M page size.
Need to extend the events to support any page size (4K/2M/4M/1G).
The complete DTLB load/store miss events are:
DTLB_LOAD_MISSES.WALK_COMPLETED 0xe08
DTLB_STORE_MISSES.WALK_COMPLETED 0xe49
Signed-off-by: Kan Liang <Kan.liang@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20170619142609.11058-1-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The i2c-imx driver incorrectly uses readb()/writeb() to read and
write to the appropriate registers when performing a repeated start.
The appropriate imx_i2c_read_reg()/imx_i2c_write_reg() functions
should be used instead. Performing a repeated start results in
a kernel panic. The platform is imx.
Signed-off-by: Michail G Etairidis <m.etairidis@beck-ipc.com>
Fixes: ce1a78840f ("i2c: imx: add DMA support for freescale i2c driver")
Fixes: 054b62d9f2 ("i2c: imx: fix the i2c bus hang issue when do repeat restart")
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
IP6CB(skb)->nhoff is the offset of the nexthdr field in an IPv6
header, unless there are extension headers present, in which case
nhoff points to the nexthdr field of the last extension header.
In non-GRO code path, nhoff is set by ipv6_rcv before any XFRM code
is executed. Conversely, in GRO code path (when esp6_offload is loaded),
nhoff is not set. The following functions fail to read the correct value
and eventually the packet is dropped:
xfrm6_transport_finish
xfrm6_tunnel_input
xfrm6_rcv_tnl
Set nhoff to the proper offset of nexthdr in esp6_gro_receive.
Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
IPv6 payload length indicates the size of the payload, including any
extension headers.
In xfrm6_transport_finish, ipv6_hdr(skb)->payload_len is set to the
payload size only, regardless of the presence of any extension headers.
After ESP GRO transport mode decapsulation, ipv6_rcv trims the packet
according to the wrong payload_len, thus corrupting the packet.
Set payload_len to account for extension headers as well.
Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Pull block fixes from Jens Axboe:
"This contains a set of fixes for xen-blkback by way of Konrad, and a
performance regression fix for blk-mq for shared tags.
The latter could account for as much as a 50x reduction in
performance, with the test case from the user with 500 name spaces. A
more realistic setup on my end with 32 drives showed a 3.5x drop. The
fix has been thoroughly tested before being committed"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: fix performance regression with shared tags
xen-blkback: don't leak stack data via response ring
xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
xen/blkback: don't free be structure too early
xen/blkback: fix disconnect while I/Os in flight