IEEE 802.11-2016 14.10.8.3 HWMP sequence numbering says:
If it is a target mesh STA, it shall update its own HWMP SN to
maximum (current HWMP SN, target HWMP SN in the PREQ element) + 1
immediately before it generates a PREP element in response to a
PREQ element.
Signed-off-by: Yuan-Chi Pang <fu3mo6goo@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
freq_reg_info expects to get the frequency in kHz. Instead we
accidently pass it in MHz. Thus, currently the function always
return ERR rule. Fix that.
Fixes: 50f32718e1 ("nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump command")
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[fix kHz/MHz in commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
TXOP (also known as Channel Occupancy Time) is u16 and should be
added using nla_put_u16 instead of u8, fix that.
Fixes: 50f32718e1 ("nl80211: Add wmm rule attribute to NL80211_CMD_GET_WIPHY dump command")
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
I changed the way mac80211 updates the PM state of the peer.
I forgot that we could also have multicast frames from the
peer and that those frame should of course not change the
PM state of the peer: A peer goes to power save when it
needs to scan, but it won't send the broadcast Probe Request
with the PM bit set.
This made us mark the peer as awake when it wasn't and then
Intel's firmware would fail to transmit because the peer is
asleep according to its database. The driver warned about
this and it looked like this:
WARNING: CPU: 0 PID: 184 at /usr/src/linux-4.16.14/drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1369 iwl_mvm_rx_tx_cmd+0x53b/0x860
CPU: 0 PID: 184 Comm: irq/124-iwlwifi Not tainted 4.16.14 #1
RIP: 0010:iwl_mvm_rx_tx_cmd+0x53b/0x860
Call Trace:
iwl_pcie_rx_handle+0x220/0x880
iwl_pcie_irq_handler+0x6c9/0xa20
? irq_forced_thread_fn+0x60/0x60
? irq_thread_dtor+0x90/0x90
The relevant code that spits the WARNING is:
case TX_STATUS_FAIL_DEST_PS:
/* the FW should have stopped the queue and not
* return this status
*/
WARN_ON(1);
info->flags |= IEEE80211_TX_STAT_TX_FILTERED;
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199967.
Fixes: 9fef654433 ("mac80211: always update the PM state of a peer on MGMT / DATA frames")
Cc: <stable@vger.kernel.org> #4.16+
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Make wmm_rule be part of the reg_rule structure. This simplifies the
code a lot at the cost of having bigger memory usage. However in most
cases we have only few reg_rule's and when we do have many like in
iwlwifi we do not save memory as it allocates a separate wmm_rule for
each channel anyway.
This also fixes a bug reported in various places where somewhere the
pointers were corrupted and we ended up doing a null-dereference.
Fixes: 230ebaa189 ("cfg80211: read wmm rules from regulatory database")
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
[rephrase commit message slightly]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The mod mask for VHT capabilities intends to say that you can override
the number of STBC receive streams, and it does, but only by accident.
The IEEE80211_VHT_CAP_RXSTBC_X aren't bits to be set, but values (albeit
left-shifted). ORing the bits together gets the right answer, but we
should use the _MASK macro here instead.
Signed-off-by: Danek Duvall <duvall@comfychair.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Pointer arithmetic already adjusts by the size of the struct,
so the sizeof() calculation is wrong. This is basically the
same as Colin King's patch for similar code in the iwlwifi
driver.
Fixes: 230ebaa189 ("cfg80211: read wmm rules from regulatory database")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The TXQ teardown code can reference the vif data structures that are
stored in the netdev private memory area if there are still packets on
the queue when it is being freed. Since the TXQ teardown code is run
after the netdevs are freed, this can lead to a use-after-free. Fix this
by moving the TXQ teardown code to earlier in ieee80211_unregister_hw().
Reported-by: Ben Greear <greearb@candelatech.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
One more driver is apparently broken by the recent change
to linux/platform_device.h:
net/rfkill/rfkill-gpio.c: In function 'rfkill_gpio_acpi_probe':
net/rfkill/rfkill-gpio.c:82:29: error: dereferencing pointer to incomplete type 'const struct acpi_device_id'
Include linux/mod_devicetable.h to get the definition of the
acpi_device_id structure.
Fixes: ac3167257b ("headers: separate linux/mod_devicetable.h from linux/platform_device.h")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
For a port to be able to use EEE, both the MAC and the PHY must
support EEE. A phy can be provided by both a phydev or phylink. Verify
at least one of these exist, not just phydev.
Fixes: aab9c4067d ("net: dsa: Plug in PHYLINK support")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an SMC socket is connecting it is decided whether fallback to
TCP is needed. To avoid races between connect and ioctl move the
sock lock before the use_fallback check.
Reported-by: syzbot+5b2cece1a8ecb2ca77d8@syzkaller.appspotmail.com
Reported-by: syzbot+19557374321ca3710990@syzkaller.appspotmail.com
Fixes: 1992d99882 ("net/smc: take sock lock in smc_ioctl()")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without setsockopt SO_SNDBUF and SO_RCVBUF settings, the sysctl
defaults net.ipv4.tcp_wmem and net.ipv4.tcp_rmem should be the base
for the sizes of the SMC sndbuf and rcvbuf. Any TCP buffer size
optimizations for servers should be ignored.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Invoking shutdown for a socket in state SMC_LISTEN does not make
sense. Nevertheless programs like syzbot fuzzing the kernel may
try to do this. For SMC this means a socket refcounting problem.
This patch makes sure a shutdown call for an SMC socket in state
SMC_LISTEN simply returns with -ENOTCONN.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF_RXRPC has a keepalive message generator that generates a message for a
peer ~20s after the last transmission to that peer to keep firewall ports
open. The implementation is incorrect in the following ways:
(1) It mixes up ktime_t and time64_t types.
(2) It uses ktime_get_real(), the output of which may jump forward or
backward due to adjustments to the time of day.
(3) If the current time jumps forward too much or jumps backwards, the
generator function will crank the base of the time ring round one slot
at a time (ie. a 1s period) until it catches up, spewing out VERSION
packets as it goes.
Fix the problem by:
(1) Only using time64_t. There's no need for sub-second resolution.
(2) Use ktime_get_seconds() rather than ktime_get_real() so that time
isn't perceived to go backwards.
(3) Simplifying rxrpc_peer_keepalive_worker() by splitting it into two
parts:
(a) The "worker" function that manages the buckets and the timer.
(b) The "dispatch" function that takes the pending peers and
potentially transmits a keepalive packet before putting them back
in the ring into the slot appropriate to the revised last-Tx time.
(4) Taking everything that's pending out of the ring and splicing it into
a temporary collector list for processing.
In the case that there's been a significant jump forward, the ring
gets entirely emptied and then the time base can be warped forward
before the peers are processed.
The warping can't happen if the ring isn't empty because the slot a
peer is in is keepalive-time dependent, relative to the base time.
(5) Limit the number of iterations of the bucket array when scanning it.
(6) Set the timer to skip any empty slots as there's no point waking up if
there's nothing to do yet.
This can be triggered by an incoming call from a server after a reboot with
AF_RXRPC and AFS built into the kernel causing a peer record to be set up
before userspace is started. The system clock is then adjusted by
userspace, thereby potentially causing the keepalive generator to have a
meltdown - which leads to a message like:
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:23]
...
Workqueue: krxrpcd rxrpc_peer_keepalive_worker
EIP: lock_acquire+0x69/0x80
...
Call Trace:
? rxrpc_peer_keepalive_worker+0x5e/0x350
? _raw_spin_lock_bh+0x29/0x60
? rxrpc_peer_keepalive_worker+0x5e/0x350
? rxrpc_peer_keepalive_worker+0x5e/0x350
? __lock_acquire+0x3d3/0x870
? process_one_work+0x110/0x340
? process_one_work+0x166/0x340
? process_one_work+0x110/0x340
? worker_thread+0x39/0x3c0
? kthread+0xdb/0x110
? cancel_delayed_work+0x90/0x90
? kthread_stop+0x70/0x70
? ret_from_fork+0x19/0x24
Fixes: ace45bec6d ("rxrpc: Fix firewall route keepalive")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
llc_sap_put() decreases the refcnt before deleting sap
from the global list. Therefore, there is a chance
llc_sap_find() could find a sap with zero refcnt
in this global list.
Close this race condition by checking if refcnt is zero
or not in llc_sap_find(), if it is zero then it is being
removed so we can just treat it as gone.
Reported-by: <syzbot+278893f3f7803871f7ce@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The shift of 'cwnd' with '(now - hc->tx_lsndtime) / hc->tx_rto' value
can lead to undefined behavior [1].
In order to fix this use a gradual shift of the window with a 'while'
loop, similar to what tcp_cwnd_restart() is doing.
When comparing delta and RTO there is a minor difference between TCP
and DCCP, the last one also invokes dccp_cwnd_restart() and reduces
'cwnd' if delta equals RTO. That case is preserved in this change.
[1]:
[40850.963623] UBSAN: Undefined behaviour in net/dccp/ccids/ccid2.c:237:7
[40851.043858] shift exponent 67 is too large for 32-bit type 'unsigned int'
[40851.127163] CPU: 3 PID: 15940 Comm: netstress Tainted: G W E 4.18.0-rc7.x86_64 #1
...
[40851.377176] Call Trace:
[40851.408503] dump_stack+0xf1/0x17b
[40851.451331] ? show_regs_print_info+0x5/0x5
[40851.503555] ubsan_epilogue+0x9/0x7c
[40851.548363] __ubsan_handle_shift_out_of_bounds+0x25b/0x2b4
[40851.617109] ? __ubsan_handle_load_invalid_value+0x18f/0x18f
[40851.686796] ? xfrm4_output_finish+0x80/0x80
[40851.739827] ? lock_downgrade+0x6d0/0x6d0
[40851.789744] ? xfrm4_prepare_output+0x160/0x160
[40851.845912] ? ip_queue_xmit+0x810/0x1db0
[40851.895845] ? ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40851.963530] ccid2_hc_tx_packet_sent+0xd36/0x10a0 [dccp]
[40852.029063] dccp_xmit_packet+0x1d3/0x720 [dccp]
[40852.086254] dccp_write_xmit+0x116/0x1d0 [dccp]
[40852.142412] dccp_sendmsg+0x428/0xb20 [dccp]
[40852.195454] ? inet_dccp_listen+0x200/0x200 [dccp]
[40852.254833] ? sched_clock+0x5/0x10
[40852.298508] ? sched_clock+0x5/0x10
[40852.342194] ? inet_create+0xdf0/0xdf0
[40852.388988] sock_sendmsg+0xd9/0x160
...
Fixes: 113ced1f52 ("dccp ccid-2: Perform congestion-window validation")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 9faa89d4ed ("tipc: make function tipc_net_finalize() thread
safe") tries to make it thread safe to set node address, so it uses
node_list_lock lock to serialize the whole process of setting node
address in tipc_net_finalize(). But it causes the following interrupt
unsafe locking scenario:
CPU0 CPU1
---- ----
rht_deferred_worker()
rhashtable_rehash_table()
lock(&(&ht->lock)->rlock)
tipc_nl_compat_doit()
tipc_net_finalize()
local_irq_disable();
lock(&(&tn->node_list_lock)->rlock);
tipc_sk_reinit()
rhashtable_walk_enter()
lock(&(&ht->lock)->rlock);
<Interrupt>
tipc_disc_rcv()
tipc_node_check_dest()
tipc_node_create()
lock(&(&tn->node_list_lock)->rlock);
*** DEADLOCK ***
When rhashtable_rehash_table() holds ht->lock on CPU0, it doesn't
disable BH. So if an interrupt happens after the lock, it can create
an inverse lock ordering between ht->lock and tn->node_list_lock. As
a consequence, deadlock might happen.
The reason causing the inverse lock ordering scenario above is because
the initial purpose of node_list_lock is not designed to do the
serialization of node address setting.
As cmpxchg() can guarantee CAS (compare-and-swap) process is atomic,
we use it to replace node_list_lock to ensure setting node address can
be atomically finished. It turns out the potential deadlock can be
avoided as well.
Fixes: 9faa89d4ed ("tipc: make function tipc_net_finalize() thread safe")
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <maloy@donjonn.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported that we reinitialize an active delayed
work in vsock_stream_connect():
ODEBUG: init active (active state 0) object type: timer_list hint:
delayed_work_timer_fn+0x0/0x90 kernel/workqueue.c:1414
WARNING: CPU: 1 PID: 11518 at lib/debugobjects.c:329
debug_print_object+0x16a/0x210 lib/debugobjects.c:326
The pattern is apparently wrong, we should only initialize
the dealyed work once and could repeatly schedule it. So we
have to move out the initializations to allocation side.
And to avoid confusion, we can split the shared dwork
into two, instead of re-using the same one.
Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Reported-by: <syzbot+8a9b1bd330476a4f3db6@syzkaller.appspotmail.com>
Cc: Andy king <acking@vmware.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TPACKET_V3 stores variable length frames in fixed length blocks.
Blocks must be able to store a block header, optional private space
and at least one minimum sized frame.
Frames, even for a zero snaplen packet, store metadata headers and
optional reserved space.
In the block size bounds check, ensure that the frame of the
chosen configuration fits. This includes sockaddr_ll and optional
tp_reserve.
Syzbot was able to construct a ring with insuffient room for the
sockaddr_ll in the header of a zero-length frame, triggering an
out-of-bounds write in dev_parse_header.
Convert the comparison to less than, as zero is a valid snap len.
This matches the test for minimum tp_frame_size immediately below.
Fixes: f6fb8f100b ("af-packet: TPACKET_V3 flexible buffer implementation.")
Fixes: eb73190f4f ("net/packet: refine check for priv area size")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to RFC791, 68 bytes is the minimum size of IPv4 datagram every
device must be able to forward without further fragmentation while 576
bytes is the minimum size of IPv4 datagram every device has to be able
to receive, so in ip6_tnl_xmit(), 68(IPV4_MIN_MTU) should be the right
value for the ipv4 min mtu check in ip6_tnl_xmit.
While at it, change to use max() instead of if statement.
Fixes: c9fefa0819 ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the callers of ip6_rt_copy_init()/rt6_set_from() hold refcnt
of the "from" fib6_info, so there is no need to hold fib6_metrics
refcnt again, because fib6_metrics refcnt is only released when
fib6_info is gone, that is, they have the same life time, so the
whole fib6_metrics refcnt can be removed actually.
This fixes a kmemleak warning reported by Sabrina.
Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Sabrina Dubroca <sd@queasysnail.net>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's legal to have 64 groups for netlink_sock.
As user-supplied nladdr->nl_groups is __u32, it's possible to subscribe
only to first 32 groups.
The check for correctness of .bind() userspace supplied parameter
is done by applying mask made from ngroups shift. Which broke Android
as they have 64 groups and the shift for mask resulted in an overflow.
Fixes: 61f4b23769 ("netlink: Don't shift with UB on nlk->ngroups")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Reported-and-Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-08-05
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix bpftool percpu_array dump by using correct roundup to next
multiple of 8 for the value size, from Yonghong.
2) Fix in AF_XDP's __xsk_rcv_zc() to not returning frames back to
allocator since driver will recycle frame anyway in case of an
error, from Jakub.
3) Fix up BPF test_lwt_seg6local test cases to final iproute2
syntax, from Mathieu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If a writer blocked condition is received without data, the current
consumer cursor is immediately sent. Servers could already receive this
condition in state SMC_INIT without finished tx-setup. This patch
avoids sending a consumer cursor update in this case.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If 'session' is not NULL and is not a PPP pseudo-wire, then we fail to
drop the reference taken by l2tp_session_get().
Fixes: ecd012e45a ("l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit df18b50448.
This change causes other problems and use-after-free situations as
found by syzbot.
Signed-off-by: David S. Miller <davem@davemloft.net>
There just check the user call ID isn't already in use, hence should
compare user_call_ID with xcall->user_call_ID, which is current
node's user_call_ID.
Fixes: 540b1c48c3 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Suggested-by: David Howells <dhowells@redhat.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a DSA slave network device was previously disabled, there is no need
to suspend or resume it.
Fixes: 2446254915 ("net: dsa: allow switch drivers to implement suspend/resume hooks")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
'protocol' is a user-controlled value, so sanitize it after the bounds
check to avoid using it for speculative out-of-bounds access to arrays
indexed by it.
This addresses the following accesses detected with the help of smatch:
* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
spectre issue 'nlk_cb_mutex_keys' [w]
* net/netlink/af_netlink.c:654 __netlink_create() warn: potential
spectre issue 'nlk_cb_mutex_key_strings' [w]
* net/netlink/af_netlink.c:685 netlink_create() warn: potential spectre
issue 'nl_table' [w] (local cap)
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_frag_queue() might call pskb_pull() on one skb that
is already in the fragment queue.
We need to take care of possible truesize change, or we
might have an imbalance of the netns frags memory usage.
IPv6 is immune to this bug, because RFC5722, Section 4,
amended by Errata ID 3089 states :
When reassembling an IPv6 datagram, if
one or more its constituent fragments is determined to be an
overlapping fragment, the entire datagram (and any constituent
fragments) MUST be silently discarded.
Fixes: 158f323b98 ("net: adjust skb->truesize in pskb_expand_head()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently check current frags memory usage only when
a new frag queue is created. This allows attackers to first
consume the memory budget (default : 4 MB) creating thousands
of frag queues, then sending tiny skbs to exceed high_thresh
limit by 2 to 3 order of magnitude.
Note that before commit 648700f76b ("inet: frags: use rhashtables
for reassembly units"), work queue could be starved under DOS,
getting no cpu cycles.
After commit 648700f76b, only the per frag queue timer can eventually
remove an incomplete frag queue and its skbs.
Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Peter Oskolkov <posk@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
xdp_return_buff() is used when frame has been successfully
handled (transmitted) or if an error occurred during delayed
processing and there is no way to report it back to
xdp_do_redirect().
In case of __xsk_rcv_zc() error is propagated all the way
back to the driver, so there is no need to call
xdp_return_buff(). Driver will recycle the frame anyway
after seeing that error happened.
Fixes: 173d3adb6f ("xsk: add zero-copy support for Rx")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
On i386 nlk->ngroups might be 32 or 0. Which leads to UB, resulting in
hang during boot.
Check for 0 ngroups and use (unsigned long long) as a type to shift.
Fixes: 7acf9d4237 ("netlink: Do not subscribe to non-existent groups").
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit d4ead6b34b ("net/ipv6: move metrics from dst to
rt6_info"), ipv6 metrics are shared and refcounted. rt6_set_from()
assigns the rt->from pointer and increases the refcount on from's
metrics. This reference is never released.
Introduce the fib6_metrics_release() helper and use it to release the
metrics.
Fixes: d4ead6b34b ("net/ipv6: move metrics from dst to rt6_info")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The meter code would create an entry for each new meter. However, it
would not set the meter id in the new entry, so every meter would appear
to have a meter id of zero. This commit properly sets the meter id when
adding the entry.
Fixes: 96fbc13d7e ("openvswitch: Add meter infrastructure")
Signed-off-by: Justin Pettit <jpettit@ovn.org>
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make ABI more strict about subscribing to group > ngroups.
Code doesn't check for that and it looks bogus.
(one can subscribe to non-existing group)
Still, it's possible to bind() to all possible groups with (-1)
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some very small BDPs (with just a few packets) there was a
quantization effect where the target number of packets in flight
during the super-unity-gain (1.25x) phase of gain cycling was
implicitly truncated to a number of packets no larger than the normal
unity-gain (1.0x) phase of gain cycling. This meant that in multi-flow
scenarios some flows could get stuck with a lower bandwidth, because
they did not push enough packets inflight to discover that there was
more bandwidth available. This was really only an issue in multi-flow
LAN scenarios, where RTTs and BDPs are low enough for this to be an
issue.
This fix ensures that gain cycling can raise inflight for small BDPs
by ensuring that in PROBE_BW mode target inflight values with a
super-unity gain are always greater than inflight values with a gain
<= 1. Importantly, this applies whether the inflight value is
calculated for use as a cwnd value, or as a target inflight value for
the end of the super-unity phase in bbr_is_next_cycle_phase() (both
need to be bigger to ensure we can probe with more packets in flight
reliably).
This is a candidate fix for stable releases.
Fixes: 0f8782ea14 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'family' can be a user-controlled value, so sanitize it after the bounds
check to avoid speculative out-of-bounds access.
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'call' is a user-controlled value, so sanitize the array index after the
bounds check to avoid speculating past the bounds of the 'nargs' array.
Found with the help of Smatch:
net/socket.c:2508 __do_sys_socketcall() warn: potential spectre issue
'nargs' [r] (local cap)
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-07-28
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) API fixes for libbpf's BTF mapping of map key/value types in order
to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR()
markings, from Martin.
2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached
consumer pointer of the RX queue, from Björn.
3) Fix __xdp_return() to check for NULL pointer after the rhashtable
lookup that retrieves the allocator object, from Taehee.
4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue
by 4 bytes which got removed from overall stack usage, from Wang.
5) Fix bpf_skb_load_bytes_relative() length check to use actual
packet length, from Daniel.
6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple()
handler, from Thomas.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove BUG_ON() from fib_compute_spec_dst routine and check
in_dev pointer during flowi4 data structure initialization.
fib_compute_spec_dst routine can be run concurrently with device removal
where ip_ptr net_device pointer is set to NULL. This can happen
if userspace enables pkt info on UDP rx socket and the device
is removed while traffic is flowing
Fixes: 35ebf65e85 ("ipv4: Create and use fib_compute_spec_dst() helper")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The len > skb_headlen(skb) cannot be used as a maximum upper bound
for the packet length since it does not have any relation to the full
linear packet length when filtering is used from upper layers (e.g.
in case of reuseport BPF programs) as by then skb->data, skb->len
already got mangled through __skb_pull() and others.
Fixes: 4e1ec56cdc ("bpf: add skb_load_bytes_relative helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Steffen Klassert says:
====================
pull request (net): ipsec 2018-07-27
1) Fix PMTU handling of vti6. We update the PMTU on
the xfrm dst_entry which is not cached anymore
after the flowchache removal. So update the
PMTU of the original dst_entry instead.
From Eyal Birger.
2) Fix a leak of kernel memory to userspace.
From Eric Dumazet.
3) Fix a possible dst_entry memleak in xfrm_lookup_route.
From Tommi Rantala.
4) Fix a skb leak in case we can't call nlmsg_multicast
from xfrm_nlmsg_multicast. From Florian Westphal.
5) Fix a leak of a temporary buffer in the error path of
esp6_input. From Zhen Lei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
rhashtable_lookup() can return NULL. so that NULL pointer
check routine should be added.
Fixes: 02b55e5657 ("xdp: add MEM_TYPE_ZERO_COPY")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fix dev_change_tx_queue_len so it rolls back original value
upon a failure in dev_qdisc_change_tx_queue_len.
This is already done for notifirers' failures, share the code.
In case of failure in dev_qdisc_change_tx_queue_len, some tx queues
would still be of the new length, while they should be reverted.
Currently, the revert is not done, and is marked with a TODO label
in dev_qdisc_change_tx_queue_len, and should find some nice solution
to do it.
Yet it is still better to not apply the newly requested value.
Fixes: 48bfd55e7e ("net_sched: plug in qdisc ops change_tx_queue_len")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Ran Rozenstein <ranro@mellanox.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Polling for the ingress queues relies on reading the producer/consumer
pointers of the Rx queue.
Prior this commit, a cached consumer pointer could be used, instead of
the actual consumer pointer and therefore report POLLIN prematurely.
This patch makes sure that the non-cached consumer pointer is used
instead.
Reported-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Qi Zhang <qi.z.zhang@intel.com>
Fixes: c497176cb2 ("xsk: add Rx receive functions and poll support")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fixes the following sparse warnings:
net/ipv4/igmp.c:1391:6: warning:
symbol '__ip_mc_inc_group' was not declared. Should it be static?
Fixes: 6e2059b53f ("ipv4/igmp: init group mode as INCLUDE when join source group")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>