When auto-negotiation is enabled, the MAC flow control settings are
based on the flow control negotiation result, and they should be configured
after a valid link has been established. This patch adds support for updating
the flow control settings after auto-negotiation has completed.
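As a rough illustration of what "based on the flow control negotiation result" can mean, the sketch below resolves the pause mode from the local and link-partner advertisements using the usual symmetric/asymmetric pause rules. The constant and function names are assumptions made for this example, not the hns3 driver's own, and the driver's actual logic may differ.

```c
#include <stdint.h>

/* Illustrative advertisement bits; the names are assumptions for this sketch. */
#define ADV_PAUSE_SYM   0x0400u  /* symmetric pause advertised  */
#define ADV_PAUSE_ASYM  0x0800u  /* asymmetric pause advertised */

#define FC_TX 0x1u  /* MAC may send pause frames       */
#define FC_RX 0x2u  /* MAC honours received pause frames */

/* Resolve the full-duplex pause mode from both link partners' abilities. */
static unsigned int resolve_pause(uint16_t local_adv, uint16_t remote_adv)
{
	unsigned int fc = 0;

	if (local_adv & remote_adv & ADV_PAUSE_SYM) {
		/* Both ends support symmetric pause. */
		fc = FC_TX | FC_RX;
	} else if (local_adv & remote_adv & ADV_PAUSE_ASYM) {
		/* Both ends advertise asymmetric pause: the direction depends
		 * on which side also advertises symmetric pause. */
		if (local_adv & ADV_PAUSE_SYM)
			fc = FC_RX;
		else if (remote_adv & ADV_PAUSE_SYM)
			fc = FC_TX;
	}
	return fc;
}
```

The resolved value would only be written to the MAC once the link is up, since the partner's advertisement is not known before that.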
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds set_pauseparam support for ethtool cmd.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a PHY exists, we use the value of phydev.autoneg to represent the
auto-negotiation state of the hardware. Otherwise, we use the value of
mac.autoneg to represent it.
This patch fixes hclge_get_autoneg() returning a wrong auto-negotiation
state.
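The selection rule described above can be sketched as follows. The structures are hypothetical, heavily simplified stand-ins and do not match the driver's real definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct phy_sketch {
	bool autoneg;
};

struct mac_sketch {
	bool autoneg;
	struct phy_sketch *phydev;  /* NULL when no PHY is attached */
};

/* Report the PHY's state when a PHY is attached, else the MAC's own flag. */
static bool get_autoneg(const struct mac_sketch *mac)
{
	if (mac->phydev)
		return mac->phydev->autoneg;
	return mac->autoneg;
}
```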
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When checking whether auto-negotiation is on, the driver only needs to
check the value of mac.autoneg (SW) directly and does not need to
query the hardware, because this value is always kept in sync
with the auto-negotiation state of the hardware.
This patch removes the MAC auto-negotiation state query.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch deals with the vlan tag information between
sk_buff and rx/tx bd.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds offload command related to "ethtool -K".
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds vlan offload config commands and initializes
the hardware rules for handling tx/rx vlan tags.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch sets vlan masked, in order to avoid the received
packets being filtered.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add configuration for rss_size_max in hdev instead of hardcoding it.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a memory leak in the tqp change process, where
hns3_uninit_all_ring and hns3_init_all_ring
may be called many times.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies the data returned by get_rxnfc: it now returns
the current handle's rss_size rather than the total tqp number,
because tc_size has been changed to the log2 of rss_size rounded up
to a power of two.
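A minimal sketch of the tc_size computation described above, assuming rss_size is at least 1 and no larger than 2^31. The function name is made up for this example.

```c
/* Return log2 of the smallest power of two that is >= rss_size.
 * Assumes 1 <= rss_size <= 2^31 so the shift cannot overflow. */
static unsigned int tc_size_from_rss_size(unsigned int rss_size)
{
	unsigned int pow2 = 1;
	unsigned int log2 = 0;

	while (pow2 < rss_size) {
		pow2 <<= 1;
		log2++;
	}
	return log2;
}
```

For example, an rss_size of 5 rounds up to 8, so the resulting value is 3.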
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for changing the tqps number for the PF driver
by using the ethtool -L command.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for querying the tqps number for the PF driver
by using the ethtool -l command.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS currently doesn't check if the length of the control message is
large enough to hold the required data before dereferencing the control
message data. This results in the following crash:
BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013 [inline]
BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066
Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157
CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
rds_rdma_bytes net/rds/send.c:1013 [inline]
rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066
sock_sendmsg_nosec net/socket.c:628 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:638
___sys_sendmsg+0x320/0x8b0 net/socket.c:2018
__sys_sendmmsg+0x1ee/0x620 net/socket.c:2108
SYSC_sendmmsg net/socket.c:2139 [inline]
SyS_sendmmsg+0x35/0x60 net/socket.c:2134
entry_SYSCALL_64_fastpath+0x1f/0x96
RIP: 0033:0x43fe49
RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49
RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0
R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000
To fix this, we verify that the cmsg_len is large enough to hold the
data to be read, before proceeding further.
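The kind of length check described above can be sketched in userspace terms as follows. The helper name is an assumption for this example; the kernel code performs the equivalent comparison against the size of the structure it is about to read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Return true only if the control message is long enough to carry
 * `need` bytes of payload, so that CMSG_DATA() can be read safely. */
static bool cmsg_payload_fits(const struct cmsghdr *cmsg, size_t need)
{
	return cmsg->cmsg_len >= CMSG_LEN(need);
}
```

A caller would reject the message (for example with -EINVAL) when this returns false, before touching the payload.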
Reported-by: syzbot <syzkaller-bugs@googlegroups.com>
Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There was a long-standing problem on the HP Spectre X360 with Kabylake
where the front speaker output is missing in some situations. Other
products show similar behavior. The culprit
seems to be the missing COEF setup on the ALC codecs ALC225/295/299,
which are all compatible.
This patch adds the proper COEF setup (to initialize idx 0x67 / bits
0x3000) for addressing the issue.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195457
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull hwmon fix from Guenter Roeck:
"Handle errors from thermal subsystem"
* tag 'hwmon-for-linus-v4.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: Deal with errors from the thermal subsystem
Merge tag 'gpio-v4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Two fixes. They are both kind of important, so why not send a pull
request on christmas eve.
- Fix a build problem in the gpio single register created by
refactorings.
- Fix assignment of GPIO line names, something that was mangled by
another patch"
* tag 'gpio-v4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: fix "gpio-line-names" property retrieval
gpio: gpio-reg: fix build
clk_pm_runtime_put() currently uses pm_runtime_put_sync(), which
is not safe to call from clk_core_is_enabled() because that function
must be able to run in atomic context.
Use pm_runtime_put() instead, which is atomic safe.
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: 9a34b45397 ("clk: Add support for runtime PM")
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
The 'md' is allocated from 'tun_dst = ip_tun_rx_dst' and
since we've checked 'tun_dst', 'md' will never be NULL.
This patch removes the redundant NULL check on 'md' in both the ipv4 and ipv6 erspan code.
Fixes: afb4c97d90 ("ip6_gre: fix potential memory leak in ip6erspan_rcv")
Fixes: 50670b6ee9 ("ip_gre: fix potential memory leak in erspan_rcv")
Cc: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a preprocessor directive to check for CONFIG_IPV6 in the middle of
a DECLARE_EVENT_CLASS macro's arg list causes sparse to report a series
of errors:
./include/trace/events/tcp.h:68:1: error: directive in argument list
./include/trace/events/tcp.h:75:1: error: directive in argument list
./include/trace/events/tcp.h:144:1: error: directive in argument list
./include/trace/events/tcp.h:151:1: error: directive in argument list
./include/trace/events/tcp.h:216:1: error: directive in argument list
./include/trace/events/tcp.h:223:1: error: directive in argument list
./include/trace/events/tcp.h:274:1: error: directive in argument list
./include/trace/events/tcp.h:281:1: error: directive in argument list
Once sparse finds an error, it stops printing warnings for the file it
is checking. This masks any sparse warnings that would normally be
reported for the core TCP code.
Instead, handle the preprocessor conditionals in a couple of auxiliary
macros. This also has the benefit of reducing duplicate code.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dereference tp->md5sig_info in tcp_v4_destroy_sock() the same way it is
done in the adjacent call to tcp_clear_md5_list().
Resolves this sparse warning:
net/ipv4/tcp_ipv4.c:1914:17: warning: incorrect type in argument 1 (different address spaces)
net/ipv4/tcp_ipv4.c:1914:17: expected struct callback_head *head
net/ipv4/tcp_ipv4.c:1914:17: got struct callback_head [noderef] <asn:4>*<noident>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Non-functional cleanups in lan9303_csr_reg_wait():
- Change type of param 'mask' from int to u32.
- Remove param 'value' (will probably never be used).
- Reduce retries from 1000 to 25, consistent with lan9303_read_wait.
- Remove comments.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Changes v1 -> v2:
- Removed comments
Signed-off-by: David S. Miller <davem@davemloft.net>
If SNAT modifies the source address, the resulting packet might match
an IPsec policy. Reinject the packet if that's the case.
The exact same thing is already done for IPv4.
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the thermal subsystem returns -EPROBE_DEFER or any other error
when hwmon calls devm_thermal_zone_of_sensor_register(), this is
silently ignored.
I ran into this with an incorrectly defined thermal zone, making
it non-existing and thus this call failed with -EPROBE_DEFER
assuming it would appear later. The sensor was still added
which is incorrect: sensors must strictly be added after the
thermal zones, so deferred probe must be respected.
Fixes: d560168b5d ("hwmon: (core) New hwmon registration API")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
For tx clean up, we set '-1' as the budget. This means cleaning up until the
wq is empty or until (1 << 32) pkts are cleaned. Under heavy load this
runs for a long time and causes a
"watchdog: BUG: soft lockup - CPU#25 stuck for 21s!" warning.
This patch sets the wq clean up budget to 256.
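The effect of a finite budget can be sketched with a toy queue. The structure and function below are hypothetical and only model the loop bound, not the enic driver's real clean up path.

```c
/* Hypothetical work queue: only tracks how many completed
 * descriptors are waiting to be reclaimed. */
struct wq_sketch {
	unsigned int pending;
};

/* Reclaim at most `budget` descriptors; return how many were reclaimed. */
static unsigned int wq_clean(struct wq_sketch *wq, unsigned int budget)
{
	unsigned int done = 0;

	while (wq->pending > 0 && done < budget) {
		wq->pending--;  /* stand-in for freeing one tx buffer */
		done++;
	}
	return done;
}
```

With a budget of 256 each call returns after a bounded amount of work and leaves the rest for a later pass, whereas a budget of (unsigned int)-1 effectively runs until the queue is empty.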
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a group member receives a member WITHDRAW event, this might have
two reasons: either the peer member is leaving the group, or the link
to the member's node has been lost.
In the latter case we need to issue a DOWN event to the user right away,
and let function tipc_group_filter_msg() delete the member
item. However, in this case we fail to change the state of the member
item to MBR_LEAVING, so the member item is not deleted, and we have a
memory leak.
We now distinguish more clearly between the four sub-cases of a WITHDRAW event
and make sure that each case is handled correctly.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to check block for being null in both tcf_block_put and
tcf_block_put_ext.
Fixes: 343723dd51 ("net: sched: fix clsact init error path")
Reported-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 2f487712b8 ("tipc: guarantee that group broadcast doesn't
bypass group unicast") we introduced a mechanism that requires the first
(replicated) broadcast sent after a unicast to be acknowledged by all
receivers before permitting sending of the next (true) broadcast.
The counter for keeping track of the number of acknowledgements to expect
is based on the tipc_group::member_cnt variable. But this misses that
some of the known members may not be ready for reception, and will never
acknowledge the message, either because they haven't fully joined the
group or because they are leaving the group. Such members are identified
by not fulfilling the condition tested for in the function
tipc_group_is_enabled().
We now set the counter for the actual number of acks to receive at the
moment the message is sent, by just counting the number of recipients
satisfying the tipc_group_is_enabled() test.
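The counting approach can be sketched as below. The member states and the "enabled" predicate are simplified assumptions for illustration; the real tipc_group_is_enabled() test is defined by the TIPC group code and covers more states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of member states. */
enum mbr_state_sketch { SK_JOINING, SK_JOINED, SK_LEAVING };

struct member_sketch {
	enum mbr_state_sketch state;
};

/* Simplified stand-in for the "ready for reception" test. */
static bool member_is_enabled(const struct member_sketch *m)
{
	return m->state == SK_JOINED;
}

/* Count only the recipients that can actually acknowledge the message. */
static size_t count_expected_acks(const struct member_sketch *mbrs, size_t n)
{
	size_t acks = 0;

	for (size_t i = 0; i < n; i++)
		if (member_is_enabled(&mbrs[i]))
			acks++;
	return acks;
}
```

Using the raw member count instead would leave the sender waiting for acknowledgements that joining or leaving members never send.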
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ASSERT_RTNL() macro is actually an open-coded variant of WARN_ONCE() with
two exceptions. First, it prints the stack on every hit and not only
once as WARN_ONCE() does. Second, the user can disable the prints of
WARN_ONCE() by setting CONFIG_BUG to N.
The repeated stack dumps are not actually needed, because calls
without the rtnl lock are programming errors and the user can't do anything
about them except complain to the mailing list after the first occurrence
of such a failure.
A user who disabled BUG/WARN prints did so explicitly, because this option
is enabled by default in the upstream kernel and in distributions. It means
that the user doesn't want to see prints about missing locks either.
This patch replaces the open-coded variant with the already existing
macro and changes the error prints to be printed only once.
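The "print only once per call site" behavior can be sketched in userspace as follows. This is not the kernel's WARN_ONCE() implementation; the macro, counter and lock flag are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdio.h>

static unsigned int warn_count;  /* number of warnings actually printed */

/* Print a warning only the first time `cond` is true at this call site. */
#define WARN_ONCE_SKETCH(cond, msg)                     \
	do {                                            \
		static bool warned_;                    \
		if ((cond) && !warned_) {               \
			warned_ = true;                 \
			warn_count++;                   \
			fprintf(stderr, "%s\n", (msg)); \
		}                                       \
	} while (0)

static bool lock_held;  /* stand-in for "the rtnl lock is held" */

/* A function that must be called with the lock held. */
static void needs_lock(void)
{
	WARN_ONCE_SKETCH(!lock_held, "lock assertion failed");
}
```

Calling needs_lock() repeatedly without the lock produces a single message, which is the behavior change the patch describes.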
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rcu_barrier_bh() in mini_qdisc_pair_swap() waits for an
in-flight RCU callback installed by a previous mini_qdisc_pair_swap().
However, it is missing on the tp_head==NULL path, so the RCU callback
can still use miniq_old->rcu after it has been freed together
with the qdisc in qdisc_graft(). Add it on that path too.
Fixes: 46209401f8 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath ")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under some circumstances the driver will perform a PHY reset in
ksz9031_read_status() to fix the autoneg failure case (idle error count =
0xFF). When this happens, the ksz9031 will no longer detect link status changes
when connected to a Netgear 1G switch (the link can sometimes be recovered by
restarting the netdevice with "ifconfig down up"). Reproduced on a TI am572x board
equipped with a ksz9031 PHY connected to a Netgear 1G switch.
Fix the issue by reconfiguring autonegotiation after PHY reset in
ksz9031_read_status().
Fixes: d2fd719bcb ("net/phy: micrel: Add workaround for bad autoneg")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ip6gre is created using ioctl, its features, such as
scatter-gather, GSO and tx-checksumming will be turned off:
# ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1
# ethtool -k gre6 (truncated output)
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
generic-segmentation-offload: off [requested on]
But when netlink is used, they will be enabled:
# ip link add gre6 type ip6gre remote fd00::1
# ethtool -k gre6 (truncated output)
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
generic-segmentation-offload: on
This results in a loss of performance when gre6 is created via ioctl.
The issue was found with LTP/gre tests.
Fix it by moving the setup of device features to a separate function
and invoke it with ndo_init callback because both netlink and ioctl
will eventually call it via register_netdevice():
register_netdevice()
- ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init()
- ip6gre_tunnel_init_common()
- ip6gre_tnl_init_features()
The moved code also contains two minor style fixes:
* removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line.
* fixed the issue reported by checkpatch: "Unnecessary parentheses around
'nt->encap.type == TUNNEL_ENCAP_NONE'"
Fixes: ac4eb009e4 ("ip6gre: Add support for basic offloads offloads excluding GSO")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the superfluous pin setup to avoid accessing invalid I/O pin
registers. Pin configuration tends to differ between SoCs, so it is
better managed and controlled by the pinctrl driver, which MT7622
already supports.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The property "mediatek,pctl" is only required for SoCs such as MT2701 and
MT7623, so add a few words stating that condition.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure that we mark AN as enabled at boot time, rather than leaving
it disabled. This is noticeable if your SFP module is fiber, and
it supports faster speeds than 1G with 2.5G support in place.
Fixes: 9525ae8395 ("phylink: add phylink infrastructure")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When setting the ethtool settings, ensure that the validated PHY
interface mode is propagated to the current link settings, so that
2500BaseX can be selected.
Fixes: 9525ae8395 ("phylink: add phylink infrastructure")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull libnvdimm fixes from Dan Williams:
"These fixes are all tagged for -stable and have received a build
success notification from the kbuild robot.
- NVDIMM namespaces, configured to enforce 1GB alignment, fail to
initialize on platforms that mis-align the start or end of the
physical address range.
- The Linux implementation of the BTT (Block Translation Table) is
incompatible with the UEFI 2.7 definition of the BTT format. The
BTT layers a software atomic sector semantic on top of an NVDIMM
namespace. Linux needs to be compatible with the UEFI definition to
enable boot support or any pre-OS access of data on a BTT enabled
namespace.
- A fix for ACPI SMART notification events, this allows a userspace
monitor to register for health events rather than poll. This has
been broken since it was initially merged as the unit test
inadvertently worked around the problem. The urgency for fixing
this during the -rc series is driven by how expensive it is to poll
for this data (System Management Mode entry)"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, btt: Fix an incompatibility in the log layout
libnvdimm, btt: add a couple of missing kernel-doc lines
libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
libnvdimm, pfn: fix start_pad handling for aligned namespaces
acpi, nfit: fix health event notification
Pull x86 PTI preparatory patches from Thomas Gleixner:
"Todays Advent calendar window contains twentyfour easy to digest
patches. The original plan was to have twenty three matching the date,
but a late fixup made that moot.
- Move the cpu_entry_area mapping out of the fixmap into a separate
address space. That's necessary because the fixmap becomes too big
with NRCPUS=8192 and this caused already subtle and hard to
diagnose failures.
The top most patch is fresh from today and cures a brain slip of
that tall grumpy german greybeard, who ignored the intricacies of
32bit wraparounds.
- Limit the number of CPUs on 32bit to 64. That's insane big already,
but at least it's small enough to prevent address space issues with
the cpu_entry_area map, which have been observed and debugged with
the fixmap code
- A few TLB flush fixes in various places plus documentation which of
the TLB functions should be used for what.
- Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for
more than sysenter now and keeping the name makes backtraces
confusing.
- Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(),
which is only invoked on fork().
- Make vsyscall more robust.
- A few fixes and cleanups of the debug_pagetables code. Check
PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the
C89 initialization of the address hint array which already was out
of sync with the index enums.
- Move the ESPFIX init to a different place to prepare for PTI.
- Several code moves with no functional change to make PTI
integration simpler and header files less convoluted.
- Documentation fixes and clarifications"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
init: Invoke init_espfix_bsp() from mm_init()
x86/cpu_entry_area: Move it out of the fixmap
x86/cpu_entry_area: Move it to a separate unit
x86/mm: Create asm/invpcid.h
x86/mm: Put MMU to hardware ASID translation in one place
x86/mm: Remove hard-coded ASID limit checks
x86/mm: Move the CR3 construction functions to tlbflush.h
x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
x86/mm: Remove superfluous barriers
x86/mm: Use __flush_tlb_one() for kernel memory
x86/microcode: Dont abuse the TLB-flush interface
x86/uv: Use the right TLB-flush API
x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
x86/mm/64: Improve the memory map documentation
x86/ldt: Prevent LDT inheritance on exec
x86/ldt: Rework locking
arch, mm: Allow arch_dup_mmap() to fail
x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
...
The loop which populates the CPU entry area PMDs can wrap around on 32bit
machines when the number of CPUs is small.
It worked wonderful for NR_CPUS=64 for whatever reason and the moron who
wrote that code did not bother to test it with !SMP.
Check for the wraparound to fix it.
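The wraparound can be illustrated with a plain 32-bit address loop. This is a sketch under assumed names and values, not the actual setup_cpu_entry_area_ptes() code: the extra comparison against the starting address stops the loop once the increment has wrapped past the top of the address space.

```c
#include <stdint.h>

/* Count the steps a 32-bit address loop takes from `start` up to `end`.
 * Without the `addr >= start` test, an `addr += step` that wraps to a
 * small value would still satisfy `addr < end` and the loop would go on. */
static unsigned int count_steps(uint32_t start, uint32_t end, uint32_t step)
{
	unsigned int n = 0;

	for (uint32_t addr = start; addr < end && addr >= start; addr += step)
		n++;
	return n;
}
```

With a range that ends near the top of the 32-bit space, the address wraps to zero after the last valid step and the added test ends the loop there.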
Fixes: 92a0f81d89 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas "Feels stupid" Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Commit cc2b14d510 ("bpf: teach verifier to recognize zero initialized
stack") introduced a very relaxed check when comparing stacks of different
states, effectively returning a positive result in many cases where it
shouldn't.
This can create problems in cases such as the following C pseudocode:
	long var;
	long *x = bpf_map_lookup(...);
	if (!x)
		return;
	if (*x != 0xbeef)
		var = 0;
	else
		var = 1;
	/* This is the key part, calling a helper causes an explored state
	 * to be saved with the information that "var" is on the stack as
	 * STACK_ZERO, since the helper is first met by the verifier after
	 * the "var = 0" assignment. This state will however be wrongly used
	 * also for the "var = 1" case, so the verifier assumes "var" is always
	 * 0 and will replace the NULL assignment with nops, because the
	 * search pruning prevents it from exploring the faulty branch.
	 */
	bpf_ktime_get_ns();
	if (var)
		*(long *)0 = 0xbeef;
Fix the issue by making sure that the stack is fully explored before
returning a positive comparison result.
Also attach a couple tests that highlight the bad behavior. In the first
test, without this fix instructions 16 and 17 are replaced with nops
instead of being rejected by the verifier.
The second test, instead, allows a program to make a potentially illegal
read from the stack.
Fixes: cc2b14d510 ("bpf: teach verifier to recognize zero initialized stack")
Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jakub Kicinski says:
====================
Two small fixes here to listing maps and programs. The loop for showing
maps is written slightly differently to programs which was missed in JSON
output support, and output would be broken if any of the system calls
failed. The second fix covers the very unlikely case that a program or map
disappears after we get its ID: we should just skip over that object instead of failing.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
On program/map show we may get an ID of an object from GETNEXT,
but the object may disappear before we call GET_FD_BY_ID. If
that happens, ignore the object and continue.
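The skip-on-disappearance pattern can be sketched as below. The lookup function is a hypothetical simulation (it pretends that even IDs have vanished) and is not a bpftool or libbpf API; only the ENOENT handling in the loop is the point.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical lookup: returns a fake descriptor, or -1 with
 * errno = ENOENT when the object with this ID no longer exists. */
static int open_by_id_sketch(int id)
{
	if (id % 2 == 0) {  /* pretend even IDs have vanished */
		errno = ENOENT;
		return -1;
	}
	return 100 + id;
}

/* Walk a list of IDs; skip vanished objects, stop on any other error.
 * Returns the number of objects shown, or -1 on a real failure. */
static int show_all(const int *ids, size_t n)
{
	int shown = 0;

	for (size_t i = 0; i < n; i++) {
		int fd = open_by_id_sketch(ids[i]);

		if (fd < 0) {
			if (errno == ENOENT)
				continue;  /* object disappeared: ignore it */
			return -1;
		}
		shown++;  /* stand-in for printing the object */
	}
	return shown;
}
```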
Fixes: 71bb428fe2 ("tools: bpf: add bpftool")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
We can't return from the middle of do_show(), because
json_array will not be closed. Break out of the loop.
Note that the error handling after the loop depends on
errno, so no need to set err.
Fixes: 831a0aafe5 ("tools: bpftool: add JSON output for `bpftool map *` commands")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Modelled strongly upon the arm64 implementation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Merge tag 'powerpc-4.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"This is all fairly boring, except that there's two KVM fixes that
you'd normally get via Paul's kvm-ppc tree. He's away so I picked them
up. I was waiting to see if he would apply them, which is why they
have only been in my tree since today. But they were on the list for a
while and have been tested on the relevant hardware.
Of note is two fixes for KVM XIVE (Power9 interrupt controller). These
would normally go via the KVM tree but Paul is away so I've picked
them up.
Other than that, two fixes for error handling in the IMC driver, and
one for a potential oops in the BHRB code if the hardware records a
branch address that has subsequently been unmapped, and finally a
s/%p/%px/ in our oops code.
Thanks to: Anju T Sudhakar, Cédric Le Goater, Laurent Vivier, Madhavan
Srinivasan, Naveen N. Rao, Ravi Bangoria"
* tag 'powerpc-4.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix pending_pri value in kvmppc_xive_get_icp()
KVM: PPC: Book3S: fix XIVE migration of pending interrupts
powerpc/kernel: Print actual address of regs when oopsing
powerpc/perf: Fix kfree memory allocated for nest pmus
powerpc/perf/imc: Fix nest-imc cpuhotplug callback failure
powerpc/perf: Dereference BHRB entries safely
Merge tag 'for-linus-4.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"This contains two fixes for running under Xen:
- a fix avoiding resource conflicts between adding mmio areas and
memory hotplug
- a fix setting NX bits in page table entries copied from Xen when
running a PV guest"
* tag 'for-linus-4.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: Mark unallocated host memory as UNUSABLE
x86-64/Xen: eliminate W+X mappings
Merge tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Here are some XFS fixes for 4.15-rc5. Apologies for the unusually
large number of patches this late, but I wanted to make sure the
corruption fixes were really ready to go.
Changes since last update:
- Fix a locking problem during xattr block conversion that could lead
to the log checkpointing thread to try to write an incomplete
buffer to disk, which leads to a corruption shutdown
- Fix a null pointer dereference when removing delayed allocation
extents
- Remove post-eof speculative allocations when reflinking a block
past current inode size so that we don't just leave them there and
assert on inode reclaim
- Relax an assert which didn't accurately reflect the way locking
works and would trigger under heavy io load
- Avoid infinite loop when cancelling copy on write extents after a
writeback failure
- Try to avoid copy on write transaction reservation overflows when
remapping after a successful write
- Fix various problems with the copy-on-write reservation automatic
garbage collection not being cleaned up properly during a ro
remount
- Fix problems with rmap log items being processed in the wrong
order, leading to corruption shutdowns
- Fix problems with EFI recovery wherein the "remove any rmapping if
present" mechanism wasn't actually doing anything, which would lead
to corruption problems later when the extent is reallocated,
leading to multiple rmaps for the same extent"
* tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: only skip rmap owner checks for unknown-owner rmap removal
xfs: always honor OWN_UNKNOWN rmap removal requests
xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order
xfs: set cowblocks tag for direct cow writes too
xfs: remove leftover CoW reservations when remounting ro
xfs: don't be so eager to clear the cowblocks tag on truncate
xfs: track cowblocks separately in i_flags
xfs: allow CoW remap transactions to use reserve blocks
xfs: avoid infinite loop when cancelling CoW blocks after writeback failure
xfs: relax is_reflink_inode assert in xfs_reflink_find_cow_mapping
xfs: remove dest file's post-eof preallocations before reflinking
xfs: move xfs_iext_insert tracepoint to report useful information
xfs: account for null transactions in bunmapi
xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of an attribute
xfs: add the ability to join a held buffer to a defer_ops