Previously we will return directly if (!rt || !rt->fib6_nh.fib_nh_gw_family)
in function rt6_probe(), but after commit cc3a86c802
("ipv6: Change rt6_probe to take a fib6_nh"), the logic changed to
return if there is fib_nh_gw_family.
Fixes: cc3a86c802 ("ipv6: Change rt6_probe to take a fib6_nh")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kobject_init_and_add takes reference even when it fails. This has
to be given up by the caller in error handling. Otherwise memory
allocated by kobject_init_and_add is never freed. Originally found
by Syzkaller:
BUG: memory leak
unreferenced object 0xffff8880679f8b08 (size 8):
comm "netdev_register", pid 269, jiffies 4294693094 (age 12.132s)
hex dump (first 8 bytes):
72 78 2d 30 00 36 20 d4 rx-0.6 .
backtrace:
[<000000008c93818e>] __kmalloc_track_caller+0x16e/0x290
[<000000001f2e4e49>] kvasprintf+0xb1/0x140
[<000000007f313394>] kvasprintf_const+0x56/0x160
[<00000000aeca11c8>] kobject_set_name_vargs+0x5b/0x140
[<0000000073a0367c>] kobject_init_and_add+0xd8/0x170
[<0000000088838e4b>] net_rx_queue_update_kobjects+0x152/0x560
[<000000006be5f104>] netdev_register_kobject+0x210/0x380
[<00000000e31dab9d>] register_netdevice+0xa1b/0xf00
[<00000000f68b2465>] __tun_chr_ioctl+0x20d5/0x3dd0
[<000000004c50599f>] tun_chr_ioctl+0x2f/0x40
[<00000000bbd4c317>] do_vfs_ioctl+0x1c7/0x1510
[<00000000d4c59e8f>] ksys_ioctl+0x99/0xb0
[<00000000946aea81>] __x64_sys_ioctl+0x78/0xb0
[<0000000038d946e5>] do_syscall_64+0x16f/0x580
[<00000000e0aa5d8f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<00000000285b3d1a>] 0xffffffffffffffff
Cc: David Miller <davem@davemloft.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jouni Hogander <jouni.hogander@unikie.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1d4639567d ("mdio_bus: Fix PTR_ERR applied after initialization
to constant") accidentally changed a check from -ENOTSUPP to -ENOSYS,
causing failures if reset controller support is not enabled. E.g. on
r7s72100/rskrza1:
sh-eth e8203000.ethernet: MDIO init failed: -524
sh-eth: probe of e8203000.ethernet failed with error -524
Seen on r8a7740/armadillo, r7s72100/rskrza1, and r7s9210/rza2mevb.
Fixes: 1d4639567d ("mdio_bus: Fix PTR_ERR applied after initialization to constant")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 075e238d12.
Going to go with Geert's fix instead, which also has a
correct Fixes tag.
Signed-off-by: David S. Miller <davem@davemloft.net>
According to hardware user manual, bits5~7 in register
HCLGE_MISC_VECTOR_INT_STS means reset interrupts status,
but HCLGE_RESET_INT_M is defined as bits0~2 now. So it
will make hclge_reset_err_handle() read the wrong reset
interrupt status.
This patch fixes this wrong bit mask.
Fixes: 2336f19d78 ("net: hns3: check reset interrupt status when reset fails")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pm_runtime_put_autosuspend in probe will call runtime suspend to
disable clks automatically if CONFIG_PM is defined. (If CONFIG_PM
is not defined, its implementation will be empty, then runtime
suspend will not be called.)
Therefore, we can call pm_runtime_get_sync to runtime resume it
first to enable clks, which matches the runtime suspend. (Only when
CONFIG_PM is defined, otherwise pm_runtime_get_sync will also be
empty, then runtime resume will not be called.)
Then it is fine to disable clks without causing clock count mis-match.
Fixes: c43eab3edd ("net: fec: add missed clk_disable_unprepare in remove")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modifying the link settings via phylink_ethtool_ksettings_set() and
phylink_ethtool_set_pauseparam() didn't always work as intended for
PHY based setups, as calling phylink_mac_config() would result in the
unresolved configuration being committed to the MAC, rather than the
configuration with the speed and duplex setting.
This would work fine if the update caused the link to renegotiate,
but if no settings have changed, phylib won't trigger a renegotiation
cycle, and the MAC will be left incorrectly configured.
Avoid calling phylink_mac_config() unless we are using an inband mode
in phylink_ethtool_ksettings_set(), and use phy_set_asym_pause() as
introduced in 4.20 to set the PHY settings in
phylink_ethtool_set_pauseparam().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the documentation on phylink's create and destroy functions to
explicitly state that the rtnl lock must not be held while calling
these.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
During performance testing, I found that one of my r8169 NICs suffered
a major performance loss, a 8168c model.
Running netperf's TCP_STREAM test didn't return the expected
throughput of > 900 Mb/s, but rather only about 22 Mb/s. Strange
enough, running the TCP_MAERTS and UDP_STREAM tests all returned with
throughput > 900 Mb/s, as did TCP_STREAM with the other r8169 NICs I can
test (either one of 8169s, 8168e, 8168f).
Bisecting turned up commit 93681cd7d9,
"r8169: enable HW csum and TSO" as the culprit.
I added my 8168c version, RTL_GIGA_MAC_VER_22, to the code
special-casing the 8168evl as per the patch below. This fixed the
performance problem for me.
Fixes: 93681cd7d9 ("r8169: enable HW csum and TSO")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I prefer to use my personal email address for kernel related work.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Rain River <rain.1986.08.12@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The taprio qdisc allows to set mqprio setting but only once. In case
if mqprio settings are provided next time the error is returned as
it's not allowed to change traffic class mapping in-flignt and that
is normal. But if configuration is absolutely the same - no need to
return error. It allows to provide same command couple times,
changing only base time for instance, or changing only scheds maps,
but leaving mqprio setting w/o modification. It more corresponds the
message: "Changing the traffic mapping of a running schedule is not
supported", so reject mqprio if it's really changed.
Also corrected TC_BITMASK + 1 for consistency, as proposed.
Fixes: a3d43c0d56 ("taprio: Add support adding an admin schedule")
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bring back tls_sw_sendpage_locked. sk_msg redirection into a socket
with TLS_TX takes the following path:
tcp_bpf_sendmsg_redir
tcp_bpf_push_locked
tcp_bpf_push
kernel_sendpage_locked
sock->ops->sendpage_locked
Also update the flags test in tls_sw_sendpage_locked to allow flag
MSG_NO_SHARED_FRAGS. bpf_tcp_sendmsg sets this.
Link: https://lore.kernel.org/netdev/CA+FuTSdaAawmZ2N8nfDDKu3XLpXBbMtcCT0q4FntDD2gn8ASUw@mail.gmail.com/T/#t
Link: https://github.com/wdebruij/kerneltools/commits/icept.2
Fixes: 0608c69c9a ("bpf: sk_msg, sock{map|hash} redirect through ULP")
Fixes: f3de19af0f ("Revert \"net/tls: remove unused function tls_sw_sendpage_locked\"")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous commit had a bug where the last page in the memory range
could not be synced. This change fixes the behavior so that all the
required pages are synced.
Fixes: 9cfeeb576d ("gve: Fixes DMA synchronization")
Signed-off-by: Adi Suresh <adisuresh@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When CONFIG_RESET_CONTROLLER is disabled, the
devm_reset_control_get_exclusive function returns -ENOTSUPP. This is not
handled in subsequent check and then the mdio device fails to probe.
When CONFIG_RESET_CONTROLLER is enabled, its code checks in OF for reset
device, and since it is not present, returns -ENOENT. -ENOENT is handled.
Add -ENOTSUPP also.
This happened to me when upgrading kernel on Turris Omnia. You either
have to enable CONFIG_RESET_CONTROLLER or use this patch.
Signed-off-by: Marek Behún <marek.behun@nic.cz>
Fixes: 71dd6c0dff ("net: phy: add support for reset-controller")
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit eec4844fae ("proc/sysctl: add shared variables for range
check") did:
- .extra2 = &two,
+ .extra2 = SYSCTL_ONE,
here, which doesn't seem to be intentional, given the changelog.
This patch restores it to the previous, as the value of 2 still makes
sense (used in fib_multipath_hash()).
Fixes: eec4844fae ("proc/sysctl: add shared variables for range check")
Cc: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver forgets to disable the regulator in remove like what is done
in probe failure.
Add the missed call to fix it.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XDP_TX rings should not be limited by max_num_tx_rings_p_up.
To make sure total number of TX rings never exceed MAX_TX_RINGS,
add similar check in mlx4_en_alloc_tx_queue_per_tc(), where
a new value is assigned for num_up.
Fixes: 7e1dc5e926 ("net/mlx4_en: Limit the number of TX rings")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
info->options_len is 'u8' type, and when opts_len with a value >
IP_TUNNEL_OPTS_MAX, 'info->options_len = opts_len' will cast int
to u8 and set a wrong value to info->options_len.
Kernel crashed in my test when doing:
# opts="0102:80:00800022"
# for i in {1..99}; do opts="$opts,0102:80:00800022"; done
# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 ip_proto udp action tunnel_key \
set src_ip 10.0.99.192 dst_ip 10.0.99.193 \
dst_port 6081 id 11 geneve_opts $opts \
action mirred egress redirect dev geneve0
So we should do the similar check as cls_flower does, return error
when opts_len > IP_TUNNEL_OPTS_MAX in tunnel_key_copy_opts().
Fixes: 0ed5269f9e ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The helper mlxsw_sp_ipip_dev_ul_tb_id() determines the underlay VRF of a
GRE tunnel. For a tunnel without a bound device, it uses the same VRF that
the tunnel is in. However in Linux, a GRE tunnel without a bound device
uses the main VRF as the underlay. Fix the function accordingly.
mlxsw further assumed that moving a tunnel to a different VRF could cause
conflict in local tunnel endpoint address, which cannot be offloaded.
However, the only way that an underlay could be changed by moving the
tunnel device itself is if the tunnel device does not have a bound device.
But in that case the underlay is always the main VRF, so there is no
opportunity to introduce a conflict by moving such device. Thus this check
constitutes a dead code, and can be removed, which do.
Fixes: 6ddb7426a7 ("mlxsw: spectrum_router: Introduce loopback RIFs")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of errors in unlink_clip_vcc, the logging level is set to
pr_crit but failures in clip_setentry are handled by pr_err().
The patch changes the severity consistent across invocations.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool expects ETHTOOL_GRXCLSRLALL to set ethtool_rxnfc->data with the
total number of entries in the rx classifier table. Surprisingly, mlx4
is missing this part (in principle ethtool could still move forward and
try the insert).
Tested: compiled and run command:
phh13:~# ethtool -N eth1 flow-type udp4 queue 4
Added rule with ID 255
Signed-off-by: Luigi Rizzo <lrizzo@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Daniel Borkmann says:
====================
pull-request: bpf 2019-11-15
The following pull-request contains BPF updates for your *net* tree.
We've added 1 non-merge commits during the last 9 day(s) which contain
a total of 1 file changed, 3 insertions(+), 1 deletion(-).
The main changes are:
1) Fix a missing unlock of bpf_devs_lock in bpf_offload_dev_create()'s
error path, from Dan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull crypto fix from Herbert Xu:
"This reverts a number of changes to the khwrng thread which feeds the
kernel random number pool from hwrng drivers. They were trying to fix
issues with suspend-and-resume but ended up causing regressions"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
Revert "hwrng: core - Freeze khwrng thread during suspend"
This reverts commit 03a3bb7ae6 ("hwrng: core - Freeze khwrng
thread during suspend"), ff296293b3 ("random: Support freezable
kthreads in add_hwgenerator_randomness()") and 59b569480d ("random:
Use wait_event_freezable() in add_hwgenerator_randomness()").
These patches introduced regressions and we need more time to
get them ready for mainline.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Pull x86 fixes from Ingo Molnar:
"Two fixes: disable unreliable HPET on Intel Coffe Lake platforms, and
fix a lockdep splat in the resctrl code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix potential lockdep warning
x86/quirks: Disable HPET on Intel Coffe Lake platforms
Pull perf fixes from Ingo Molnar:
"Misc fixes: a handful of AUX event handling related fixes, a Sparse
fix and two ABI fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix missing static inline on perf_cgroup_switch()
perf/core: Consistently fail fork on allocation failures
perf/aux: Disallow aux_output for kernel events
perf/core: Reattach a misplaced comment
perf/aux: Fix the aux_output group inheritance fix
perf/core: Disallow uncore-cgroup events
Pull networking fixes from David Miller:
1) Fix memory leak in xfrm_state code, from Steffen Klassert.
2) Fix races between devlink reload operations and device
setup/cleanup, from Jiri Pirko.
3) Null deref in NFC code, from Stephan Gerhold.
4) Refcount fixes in SMC, from Ursula Braun.
5) Memory leak in slcan open error paths, from Jouni Hogander.
6) Fix ETS bandwidth validation in hns3, from Yonglong Liu.
7) Info leak on short USB request answers in ax88172a driver, from
Oliver Neukum.
8) Release mem region properly in ep93xx_eth, from Chuhong Yuan.
9) PTP config timestamp flags validation, from Richard Cochran.
10) Dangling pointers after SKB data realloc in seg6, from Andrea Mayer.
11) Missing free_netdev() in gemini driver, from Chuhong Yuan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
ipmr: Fix skb headroom in ipmr_get_route().
net: hns3: cleanup of stray struct hns3_link_mode_mapping
net/smc: fix fastopen for non-blocking connect()
rds: ib: update WR sizes when bringing up connection
net: gemini: add missed free_netdev
net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid
seg6: fix skb transport_header after decap_and_validate()
seg6: fix srh pointer in get_srh()
net: stmmac: Use the correct style for SPDX License Identifier
octeontx2-af: Use the correct style for SPDX License Identifier
ptp: Extend the test program to check the external time stamp flags.
mlx5: Reject requests to enable time stamping on both edges.
igb: Reject requests that fail to enable time stamping on both edges.
dp83640: Reject requests to enable time stamping on both edges.
mv88e6xxx: Reject requests to enable time stamping on both edges.
ptp: Introduce strict checking of external time stamp options.
renesas: reject unsupported external timestamp flags
mlx5: reject unsupported external timestamp flags
igb: reject unsupported external timestamp flags
dp83640: reject unsupported external timestamp flags
...
In route.c, inet_rtm_getroute_build_skb() creates an skb with no
headroom. This skb is then used by inet_rtm_getroute() which may pass
it to rt_fill_info() and, from there, to ipmr_get_route(). The later
might try to reuse this skb by cloning it and prepending an IPv4
header. But since the original skb has no headroom, skb_push() triggers
skb_under_panic():
skbuff: skb_under_panic: text:00000000ca46ad8a len:80 put:20 head:00000000cd28494e data:000000009366fd6b tail:0x3c end:0xec0 dev:veth0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:108!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 6 PID: 587 Comm: ip Not tainted 5.4.0-rc6+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
RIP: 0010:skb_panic+0xbf/0xd0
Code: 41 a2 ff 8b 4b 70 4c 8b 4d d0 48 c7 c7 20 76 f5 8b 44 8b 45 bc 48 8b 55 c0 48 8b 75 c8 41 54 41 57 41 56 41 55 e8 75 dc 7a ff <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
RSP: 0018:ffff888059ddf0b0 EFLAGS: 00010286
RAX: 0000000000000086 RBX: ffff888060a315c0 RCX: ffffffff8abe4822
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88806c9a79cc
RBP: ffff888059ddf118 R08: ffffed100d9361b1 R09: ffffed100d9361b0
R10: ffff88805c68aee3 R11: ffffed100d9361b1 R12: ffff88805d218000
R13: ffff88805c689fec R14: 000000000000003c R15: 0000000000000ec0
FS: 00007f6af184b700(0000) GS:ffff88806c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc8204a000 CR3: 0000000057b40006 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
skb_push+0x7e/0x80
ipmr_get_route+0x459/0x6fa
rt_fill_info+0x692/0x9f0
inet_rtm_getroute+0xd26/0xf20
rtnetlink_rcv_msg+0x45d/0x630
netlink_rcv_skb+0x1a5/0x220
rtnetlink_rcv+0x15/0x20
netlink_unicast+0x305/0x3a0
netlink_sendmsg+0x575/0x730
sock_sendmsg+0xb5/0xc0
___sys_sendmsg+0x497/0x4f0
__sys_sendmsg+0xcb/0x150
__x64_sys_sendmsg+0x48/0x50
do_syscall_64+0xd2/0xac0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Actually the original skb used to have enough headroom, but the
reserve_skb() call was lost with the introduction of
inet_rtm_getroute_build_skb() by commit 404eb77ea7 ("ipv4: support
sport, dport and ip_proto in RTM_GETROUTE").
We could reserve some headroom again in inet_rtm_getroute_build_skb(),
but this function shouldn't be responsible for handling the special
case of ipmr_get_route(). Let's handle that directly in
ipmr_get_route() by calling skb_realloc_headroom() instead of
skb_clone().
Fixes: 404eb77ea7 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch cleans-up the stray left over code. It has no
functionality impact.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FASTOPEN does not work with SMC-sockets. Since SMC allows fallback to
TCP native during connection start, the FASTOPEN setsockopts trigger
this fallback, if the SMC-socket is still in state SMC_INIT.
But if a FASTOPEN setsockopt is called after a non-blocking connect(),
this is broken, and fallback does not make sense.
This change complements
commit cd2063604e ("net/smc: avoid fallback in case of non-blocking connect")
and fixes the syzbot reported problem "WARNING in smc_unhash_sk".
Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com
Fixes: e1bbdd5704 ("net/smc: reduce sock_put() for fallback sockets")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently WR sizes are updated from rds_ib_sysctl_max_send_wr and
rds_ib_sysctl_max_recv_wr when a connection is shut down. As a result,
a connection being down while rds_ib_sysctl_max_send_wr or
rds_ib_sysctl_max_recv_wr are updated, will not update the sizes when
it comes back up.
Move resizing of WRs to rds_ib_setup_qp so that connections will be setup
with the most current WR sizes.
Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver forgets to free allocated netdev in remove like
what is done in probe failure.
Add the free to fix it.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This sequence of operations:
ip link set dev br0 type bridge vlan_filtering 1
bridge vlan del dev swp2 vid 1
ip link set dev br0 type bridge vlan_filtering 1
ip link set dev br0 type bridge vlan_filtering 0
apparently fails with the message:
[ 31.305716] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
[ 31.322161] sja1105 spi0.1: Couldn't determine PVID attributes (pvid 0)
[ 31.328939] sja1105 spi0.1: Failed to setup VLAN tagging for port 1: -2
[ 31.335599] ------------[ cut here ]------------
[ 31.340215] WARNING: CPU: 1 PID: 194 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4
[ 31.349981] br0: Commit of attribute (id=6) failed.
[ 31.354890] Modules linked in:
[ 31.357942] CPU: 1 PID: 194 Comm: ip Not tainted 5.4.0-rc6-01792-gf4f632e07665-dirty #2062
[ 31.366167] Hardware name: Freescale LS1021A
[ 31.370437] [<c03144dc>] (unwind_backtrace) from [<c030e184>] (show_stack+0x10/0x14)
[ 31.378153] [<c030e184>] (show_stack) from [<c11d1c1c>] (dump_stack+0xe0/0x10c)
[ 31.385437] [<c11d1c1c>] (dump_stack) from [<c034c730>] (__warn+0xf4/0x10c)
[ 31.392373] [<c034c730>] (__warn) from [<c034c7bc>] (warn_slowpath_fmt+0x74/0xb8)
[ 31.399827] [<c034c7bc>] (warn_slowpath_fmt) from [<c11ca204>] (switchdev_port_attr_set_now+0x9c/0xa4)
[ 31.409097] [<c11ca204>] (switchdev_port_attr_set_now) from [<c117036c>] (__br_vlan_filter_toggle+0x6c/0x118)
[ 31.418971] [<c117036c>] (__br_vlan_filter_toggle) from [<c115d010>] (br_changelink+0xf8/0x518)
[ 31.427637] [<c115d010>] (br_changelink) from [<c0f8e9ec>] (__rtnl_newlink+0x3f4/0x76c)
[ 31.435613] [<c0f8e9ec>] (__rtnl_newlink) from [<c0f8eda8>] (rtnl_newlink+0x44/0x60)
[ 31.443329] [<c0f8eda8>] (rtnl_newlink) from [<c0f89f20>] (rtnetlink_rcv_msg+0x2cc/0x51c)
[ 31.451477] [<c0f89f20>] (rtnetlink_rcv_msg) from [<c1008df8>] (netlink_rcv_skb+0xb8/0x110)
[ 31.459796] [<c1008df8>] (netlink_rcv_skb) from [<c1008648>] (netlink_unicast+0x17c/0x1f8)
[ 31.468026] [<c1008648>] (netlink_unicast) from [<c1008980>] (netlink_sendmsg+0x2bc/0x3b4)
[ 31.476261] [<c1008980>] (netlink_sendmsg) from [<c0f43858>] (___sys_sendmsg+0x230/0x250)
[ 31.484408] [<c0f43858>] (___sys_sendmsg) from [<c0f44c84>] (__sys_sendmsg+0x50/0x8c)
[ 31.492209] [<c0f44c84>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
[ 31.500090] Exception stack(0xedf47fa8 to 0xedf47ff0)
[ 31.505122] 7fa0: 00000002 b6f2e060 00000003 beabd6a4 00000000 00000000
[ 31.513265] 7fc0: 00000002 b6f2e060 5d6e3213 00000128 00000000 00000001 00000006 000619c4
[ 31.521405] 7fe0: 00086078 beabd658 0005edbc b6e7ce68
The reason is the implementation of br_get_pvid:
static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg)
{
if (!vg)
return 0;
smp_rmb();
return vg->pvid;
}
Since VID 0 is an invalid pvid from the bridge's point of view, let's
add this check in dsa_8021q_restore_pvid to avoid restoring a pvid that
doesn't really exist.
Fixes: 5f33183b7f ("net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Mayer says:
====================
seg6: fixes to Segment Routing in IPv6
This patchset is divided in 2 patches and it introduces some fixes
to Segment Routing in IPv6, which are:
- in function get_srh() fix the srh pointer after calling
pskb_may_pull();
- fix the skb->transport_header after calling decap_and_validate()
function;
Any comments on the patchset are welcome.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
in the receive path (more precisely in ip6_rcv_core()) the
skb->transport_header is set to skb->network_header + sizeof(*hdr). As a
consequence, after routing operations, destination input expects to find
skb->transport_header correctly set to the next protocol (or extension
header) that follows the network protocol. However, decap behaviors (DX*,
DT*) remove the outer IPv6 and SRH extension and do not set again the
skb->transport_header pointer correctly. For this reason, the patch sets
the skb->transport_header to the skb->network_header + sizeof(hdr) in each
DX* and DT* behavior.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
pskb_may_pull may change pointers in header. For this reason, it is
mandatory to reload any pointer that points into skb header.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the SPDX License Identifier style in
header files related to STMicroelectronics based Multi-Gigabit
Ethernet driver. For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used).
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects the SPDX License Identifier style in
header files related to Marvell OcteonTX2 network devices.
It uses an expilict block comment for the SPDX License
Identifier.
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge misc fixes from Andrew Morton:
"11 fixes"
MM fixes and one xz decompressor fix.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/debug.c: PageAnon() is true for PageKsm() pages
mm/debug.c: __dump_page() prints an extra line
mm/page_io.c: do not free shared swap slots
mm/memory_hotplug: fix try_offline_node()
mm,thp: recheck each page before collapsing file THP
mm: slub: really fix slab walking for init_on_free
mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations
mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
mm: mempolicy: fix the wrong return value and potential pages leak of mbind
Pull more input fixes from Dmitry Torokhov:
"A couple of fixes in driver teardown paths and another ID for
Synaptics RMI mode"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics - enable RMI mode for X1 Extreme 2nd Generation
Input: synaptics-rmi4 - destroy F54 poller workqueue when removing
Input: ff-memless - kill timer in destroy()
PageAnon() and PageKsm() use the low two bits of the page->mapping
pointer to indicate the page type. PageAnon() only checks the LSB while
PageKsm() checks the least significant 2 bits are equal to 3.
Therefore, PageAnon() is true for KSM pages. __dump_page() incorrectly
will never print "ksm" because it checks PageAnon() first. Fix this by
checking PageKsm() first.
Link: http://lkml.kernel.org/r/20191113000651.20677-1-rcampbell@nvidia.com
Fixes: 1c6fb1d89e ("mm: print more information about mapping in __dump_page")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When dumping struct page information, __dump_page() prints the page type
with a trailing blank followed by the page flags on a separate line:
anon
flags: 0x100000000090034(uptodate|lru|active|head|swapbacked)
It looks like the intent was to use pr_cont() for printing "flags:" but
pr_cont() usage is discouraged so fix this by extending the format to
include the flags into a single line:
anon flags: 0x100000000090034(uptodate|lru|active|head|swapbacked)
If the page is file backed, the name might be long so use two lines:
shmem_aops name:"dev/zero"
flags: 0x10000000008000c(uptodate|dirty|swapbacked)
Eliminate pr_conf() usage as well for appending compound_mapcount.
Link: http://lkml.kernel.org/r/20191112012608.16926-1-rcampbell@nvidia.com
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following race is observed due to which a processes faulting on a
swap entry, finds the page neither in swapcache nor swap. This causes
zram to give a zero filled page that gets mapped to the process,
resulting in a user space crash later.
Consider parent and child processes Pa and Pb sharing the same swap slot
with swap_count 2. Swap is on zram with SWP_SYNCHRONOUS_IO set.
Virtual address 'VA' of Pa and Pb points to the shared swap entry.
Pa Pb
fault on VA fault on VA
do_swap_page do_swap_page
lookup_swap_cache fails lookup_swap_cache fails
Pb scheduled out
swapin_readahead (deletes zram entry)
swap_free (makes swap_count 1)
Pb scheduled in
swap_readpage (swap_count == 1)
Takes SWP_SYNCHRONOUS_IO path
zram enrty absent
zram gives a zero filled page
Fix this by making sure that swap slot is freed only when swap count
drops down to one.
Link: http://lkml.kernel.org/r/1571743294-14285-1-git-send-email-vinmenon@codeaurora.org
Fixes: aa8d22a11d ("mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference")
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Suggested-by: Minchan Kim <minchan@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
try_offline_node() is pretty much broken right now:
- The node span is updated when onlining memory, not when adding it. We
ignore memory that was mever onlined. Bad.
- We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
trigger a kernel panic. Bad for memory that is offline but also bad
for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
first PFN of a section might contain garbage.
- Sections belonging to mixed nodes are not properly considered.
As memory blocks might belong to multiple nodes, we would have to walk
all pageblocks (or at least subsections) within present sections.
However, we don't have a way to identify whether a memmap that is not
online was initialized (relevant for ZONE_DEVICE). This makes things
more complicated.
Luckily, we can piggy pack on the node span and the nid stored in memory
blocks. Currently, the node span is grown when calling
move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
removing memory, before calling try_offline_node(). Sysfs links are
created via link_mem_sections(), e.g., during boot or when adding
memory.
If the node still spans memory or if any memory block belongs to the
nid, we don't set the node offline. As memory blocks that span multiple
nodes cannot get offlined, the nid stored in memory blocks is reliable
enough (for such online memory blocks, the node still spans the memory).
Introduce for_each_memory_block() to efficiently walk all memory blocks.
Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
when removing ZONE_DEVICE memory to fix similar issues (access of
garbage memmaps) - until we have a reliable way to identify whether
these memmaps were properly initialized. This implies later, that once
a node had ZONE_DEVICE memory, we won't be able to set a node offline -
which should be acceptable.
Since commit f1dd2cd13c ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") memory that is added is not
assoziated with a zone/node (memmap not initialized). The introducing
commit 60a5a19e74 ("memory-hotplug: remove sysfs file of node")
already missed that we could have multiple nodes for a section and that
the zone/node span is updated when onlining pages, not when adding them.
I tested this by hotplugging two DIMMs to a memory-less and cpu-less
NUMA node. The node is properly onlined when adding the DIMMs. When
removing the DIMMs, the node is properly offlined.
Masayoshi Mizuma reported:
: Without this patch, memory hotplug fails as panic:
:
: BUG: kernel NULL pointer dereference, address: 0000000000000000
: ...
: Call Trace:
: remove_memory_block_devices+0x81/0xc0
: try_remove_memory+0xb4/0x130
: __remove_memory+0xa/0x20
: acpi_memory_device_remove+0x84/0x100
: acpi_bus_trim+0x57/0x90
: acpi_bus_trim+0x2e/0x90
: acpi_device_hotplug+0x2b2/0x4d0
: acpi_hotplug_work_fn+0x1a/0x30
: process_one_work+0x171/0x380
: worker_thread+0x49/0x3f0
: kthread+0xf8/0x130
: ret_from_fork+0x35/0x40
[david@redhat.com: v3]
Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
Fixes: 60a5a19e74 ("memory-hotplug: remove sysfs file of node")
Fixes: f1dd2cd13c ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e86b
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Nayna Jain <nayna@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In collapse_file(), for !is_shmem case, current check cannot guarantee
the locked page is up-to-date. Specifically, xas_unlock_irq() should
not be called before lock_page() and get_page(); and it is necessary to
recheck PageUptodate() after locking the page.
With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text
may contain corrupted data. This is because khugepaged mistakenly
collapses some not up-to-date sub pages into a huge page, and assumes
the huge page is up-to-date. This will NOT corrupt data in the disk,
because the page is read-only and never written back. Fix this by
properly checking PageUptodate() after locking the page. This check
replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);".
Also, move PageDirty() check after locking the page. Current khugepaged
should not try to collapse dirty file THP, because it is limited to
read-only .text. The only case we hit a dirty page here is when the
page hasn't been written since write. Bail out and retry when this
happens.
syzbot reported bug on previous version of this patch.
Link: http://lkml.kernel.org/r/20191106060930.2571389-2-songliubraving@fb.com
Fixes: 99cb0dbd47 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reported-by: syzbot+efb9e48b9fbdc49bb34a@syzkaller.appspotmail.com
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 1b7e816fc8 ("mm: slub: Fix slab walking for init_on_free")
fixed one problem with the slab walking but missed a key detail: When
walking the list, the head and tail pointers need to be updated since we
end up reversing the list as a result. Without doing this, bulk free is
broken.
One way this is exposed is a NULL pointer with slub_debug=F:
=============================================================================
BUG skbuff_head_cache (Tainted: G T): Object already free
-----------------------------------------------------------------------------
INFO: Slab 0x000000000d2d2f8f objects=16 used=3 fp=0x0000000064309071 flags=0x3fff00000000201
BUG: kernel NULL pointer dereference, address: 0000000000000000
Oops: 0000 [#1] PREEMPT SMP PTI
RIP: 0010:print_trailer+0x70/0x1d5
Call Trace:
<IRQ>
free_debug_processing.cold.37+0xc9/0x149
__slab_free+0x22a/0x3d0
kmem_cache_free_bulk+0x415/0x420
__kfree_skb_flush+0x30/0x40
net_rx_action+0x2dd/0x480
__do_softirq+0xf0/0x246
irq_exit+0x93/0xb0
do_IRQ+0xa0/0x110
common_interrupt+0xf/0xf
</IRQ>
Given we're now almost identical to the existing debugging code which
correctly walks the list, combine with that.
Link: https://lkml.kernel.org/r/20191104170303.GA50361@gandi.net
Link: http://lkml.kernel.org/r/20191106222208.26815-1-labbott@redhat.com
Fixes: 1b7e816fc8 ("mm: slub: Fix slab walking for init_on_free")
Signed-off-by: Laura Abbott <labbott@redhat.com>
Reported-by: Thibaut Sautereau <thibaut.sautereau@clip-os.org>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Alexander Potapenko <glider@google.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <clipos@ssi.gouv.fr>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An exiting task might belong to an offline cgroup. In this case an
attempt to grab a cgroup reference from the task can end up with an
infinite loop in hugetlb_cgroup_charge_cgroup(), because neither the
cgroup will become online, neither the task will be migrated to a live
cgroup.
Fix this by switching over to css_tryget(). As css_tryget_online()
can't guarantee that the cgroup won't go offline, in most cases the
check doesn't make sense. In this particular case users of
hugetlb_cgroup_charge_cgroup() are not affected by this change.
A similar problem is described by commit 18fa84a2db ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").
Link: http://lkml.kernel.org/r/20191106225131.3543616-2-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've encountered a rcu stall in get_mem_cgroup_from_mm():
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
(t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
<...>
RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
<...>
__memcg_kmem_charge+0x55/0x140
__alloc_pages_nodemask+0x267/0x320
pipe_write+0x1ad/0x400
new_sync_write+0x127/0x1c0
__kernel_write+0x4f/0xf0
dump_emit+0x91/0xc0
writenote+0xa0/0xc0
elf_core_dump+0x11af/0x1430
do_coredump+0xc65/0xee0
get_signal+0x132/0x7c0
do_signal+0x36/0x640
exit_to_usermode_loop+0x61/0xd0
do_syscall_64+0xd4/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The problem is caused by an exiting task which is associated with an
offline memcg. We're iterating over and over in the do {} while
(!css_tryget_online()) loop, but obviously the memcg won't become online
and the exiting task won't be migrated to a live memcg.
Let's fix it by switching from css_tryget_online() to css_tryget().
As css_tryget_online() cannot guarantee that the memcg won't go offline,
the check is usually useless, except some rare cases when for example it
determines if something should be presented to a user.
A similar problem is described by commit 18fa84a2db ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").
Johannes:
: The bug aside, it doesn't matter whether the cgroup is online for the
: callers. It used to matter when offlining needed to evacuate all charges
: from the memcg, and so needed to prevent new ones from showing up, but we
: don't care now.
Link: http://lkml.kernel.org/r/20191106225131.3543616-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Shakeel Butt <shakeeb@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>