It looks like these meant to be unreffing the
of_parse_phandle_with_args() node, since the error paths above it
don't do of_node_put. That function returns a new ref in pd_args.np,
though, not a new ref on dev->of_node. Also, it would have leaked the
ref in the success case.
Fixes "ERROR: Bad of_node_put()" on bcm2835 in the -EPROBE_DEFER case.
Fixes: aa42240ab2 (PM / Domains: Add generic OF-based PM domain look-up)
Signed-off-by: Eric Anholt <eric@anholt.net>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Kevin Hilman <khilman@linaro.org>
Cc: 3.18+ <stable@vger.kernel.org> # 3.18+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull networking fixes from David Miller:
"A lot of Thanksgiving turkey leftovers accumulated, here goes:
1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg.
2) IDs for some new iwlwifi chips, from Oren Givon.
3) Fix rtlwifi lockups on boot, from Larry Finger.
4) Fix memory leak in fm10k, from Stephen Hemminger.
5) We have a route leak in the ipv6 tunnel infrastructure, fix from
Paolo Abeni.
6) Fix buffer pointer handling in arm64 bpf JIT,f rom Zi Shen Lim.
7) Wrong lockdep annotations in tcp md5 support, fix from Eric
Dumazet.
8) Work around some middle boxes which prevent proper handling of TCP
Fast Open, from Yuchung Cheng.
9) TCP repair can do huge kmalloc() requests, build paged SKBs
instead. From Eric Dumazet.
10) Fix msg_controllen overflow in scm_detach_fds, from Daniel
Borkmann.
11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from
Nikolay Aleksandrov.
12) Fix use after free in epoll with AF_UNIX sockets, from Rainer
Weikusat.
13) Fix double free in VRF code, from Nikolay Aleksandrov.
14) Fix skb leaks on socket receive queue in tipc, from Ying Xue.
15) Fix ifup/ifdown crach in xgene driver, from Iyappan Subramanian.
16) Fix clearing of persistent array maps in bpf, from Daniel
Borkmann.
17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq
early enough. From Eric Dumazet.
18) Fix out of bounds accesses in bpf array implementation when
updating elements, from Daniel Borkmann.
19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric
Dumazet.
20) When dumping proxy neigh entries, we have to accomodate NULL
device pointers properly, from Konstantin Khlebnikov.
21) SCTP doesn't release all ipv6 socket resources properly, fix from
Eric Dumazet.
22) Prevent underflows of sch->q.qlen for multiqueue packet
schedulers, also from Eric Dumazet.
23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey
Huang and Michael Chan.
24) Don't actively scan radar channels, from Antonio Quartulli"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits)
net: phy: reset only targeted phy
bnxt_en: Setup uc_list mac filters after resetting the chip.
bnxt_en: enforce proper storing of MAC address
bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
net: lpc_eth: remove irq > NR_IRQS check from probe()
net_sched: fix qdisc_tree_decrease_qlen() races
openvswitch: fix hangup on vxlan/gre/geneve device deletion
ipv4: igmp: Allow removing groups from a removed interface
ipv6: sctp: implement sctp_v6_destroy_sock()
arm64: bpf: add 'store immediate' instruction
ipv6: kill sk_dst_lock
ipv6: sctp: add rcu protection around np->opt
net/neighbour: fix crash at dumping device-agnostic proxy entries
sctp: use GFP_USER for user-controlled kmalloc
sctp: convert sack_needed and sack_generation to bits
ipv6: add complete rcu protection around np->opt
bpf: fix allocation warnings in bpf maps and integer overflow
mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0
net: mvneta: enable setting custom TX IP checksum limit
net: mvneta: fix error path for building skb
...
Pull block fixes from Jens Axboe:
"A collection of fixes from this series. The most important here is a
regression fix for an issue that some folks would hit in blk-merge.c,
and the NVMe queue depth limit for the screwed up Apple "nvme"
controller.
In more detail, this pull request contains:
- a set of fixes for null_blk, including a fix for a few corner cases
where we could hang the device. From Arianna and Paolo.
- lightnvm:
- A build improvement from Keith.
- Update the qemu pci id detection from Matias.
- Error handling fixes for leaks and other little fixes from
Sudip and Wenwei.
- fix from Eric where BLKRRPART would not return EBUSY for whole
device mounts, only when partitions were mounted.
- fix from Jan Kara, where EOF O_DIRECT reads would return
negatively.
- remove check for rq_mergeable() when checking limits for cloned
requests. The check doesn't make any sense. It's assuming that
since NOMERGE is set on the request that we don't have to
recalculate limits since the request didn't change, but that's not
true if the request has been redirected. From Hannes.
- correctly get the bio front segment value set for single segment
bio's, fixing a BUG() in blk-merge. From Ming"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: temporary fix for Apple controller reset
null_blk: change type of completion_nsec to unsigned long
null_blk: guarantee device restart in all irq modes
null_blk: set a separate timer for each command
blk-merge: fix computing bio->bi_seg_front_size in case of single segment
direct-io: Fix negative return from dio read beyond eof
block: Always check queue limits for cloned requests
lightnvm: missing nvm_lock acquire
lightnvm: unconverted ppa returned in get_bb_tbl
lightnvm: refactor and change vendor id for qemu
lightnvm: do device max sectors boundary check first
lightnvm: fix ioctl memory leaks
lightnvm: free memory when gennvm register fails
lightnvm: Simplify config when disabled
Return EBUSY from BLKRRPART for mounted whole-dev fs
events on pids. It filters all events where only tasks with their pid in that
file exists. It also handles the sched_switch and sched_wakeup trace events
where the current task does not have its pid in the file, but the task
either being switched to or awaken does.
Unfortunately, I forgot about sched_wakeup_new and sched_waking. Both of
these tracepoints use the same class as the sched_wakeup tracepoint, and
they too should be included in what gets filtered by the set_event_pid file.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWX7XpAAoJEKKk/i67LK/8G6gH/0W9QxCt/iS+J+x3gAPPs+/9
jtwZgAOWjq+118ZpWORtgRcoH2r/sJUNwlXqhMojAHsPlwZsr6TXkWJkgyNdZZ7B
QdUtZrr+egGYvd7TE0ONi/XrLTe9VLtBQsh5pN7l9fF9TjxYUmu5V9LplH9z0RxW
Hw8EzqGzG2iZnXYCnErtu5jRLmr18f2u9aUptPAc4bYPLVUUw9M9MqRV/ZwQxsaX
1mfIoR5SVC5IWW/R07qjULlbFpvNXkVJ56HwXMVBN44mYz3eUGYBKzjyAJ0Ugymf
CNDPzh4HgVFsEqDedr0D8T5WZNJSUErdbHVSWze+CCUNfYikvU7gNmxNJ89Q3P4=
=LsnU
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"During the merge window I added a new file that is used to filter
trace events on pids. It filters all events where only tasks with
their pid in that file exists. It also handles the sched_switch and
sched_wakeup trace events where the current task does not have its pid
in the file, but the task either being switched to or awaken does.
Unfortunately, I forgot about sched_wakeup_new and sched_waking. Both
of these tracepoints use the same class as the sched_wakeup
tracepoint, and they too should be included in what gets filtered by
the set_event_pid file"
* tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter
* fix scanning in mac80211 to not actively scan radar
channels (from Antonio)
* fix uninitialized variable in remain-on-channel that
could lead to treating frame TX as remain-on-channel
and not sending the frame at all
* remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
was broken and needs more work, we'll enable it later
* fix call_rcu() induced use-after-reset/free in mesh
(that was suddenly causing issues in certain tests)
* always request block-ack window size 64 as we found
some APs will otherwise crash (really ...)
* fix P2P-Device teardown sequence to avoid restarting
with uninitialized data
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJWX2JpAAoJEGt7eEactAAd+9cQAJmn3zt0orj/sASv7BeF0h5d
sRfAhkBVOTZur8MgVj1c7fNzT3h1HYNei6c4SA2+rphy6Vbifoli1nLNloC+1Ld4
2WXllEqVe473GqVofCxZHsYZPr2Inmhj7uMiDqvoKUiSRz7phmkY9m+Vju6WZG/W
F6FrTLqFS7UDHIYNYH1DNVSScd/89Gu6pHZvEpoHkrsvt5rEEZiPAQ7sDB4MAMSm
amETtuBqgX83gHR2G4UT2Z9r8TVdzhO+s7vvdVjj0qbP6C6BaS9IUXDjmm3gOvHy
G7j9MJuUC8w/2fZ5A5/l94OuN5rF/ZFMNkn2e6OIzg0HjEZh74CeLl21CnuxdNpB
ECmDVbKoI3OVoFbhEl7P5fBokzZsqhAXpZOmbYEeFRyO6lF2Mv9uzttsF6EOmCX0
BjIoEXOWA2o6IUD/8M6NjW+/B58SDDVi9Mg6D+7Dn7rUFlQ4pddjb0m94bI8GQQU
wl7gROMvYR3tIhiMs1bLF9jJgA831WGWu9eiq8mT2kHPaEV2bFO7OK+SUxyZu1M7
UhN4eoLpU84v9QNJ34N8RCiYxEZ1e6HQxBwQn/fDIWOjOHryZoArhicFY9aOEja4
9xBI9OJhBWOL4N4AFdmTExBdYudSgCTpX+/gQ4tSfedz3lqF79y8+PILwv6E1Q6D
8pScH/4pVo4v5omGaMpA
=vui7
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2015-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A small set of fixes for 4.4:
* fix scanning in mac80211 to not actively scan radar
channels (from Antonio)
* fix uninitialized variable in remain-on-channel that
could lead to treating frame TX as remain-on-channel
and not sending the frame at all
* remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
was broken and needs more work, we'll enable it later
* fix call_rcu() induced use-after-reset/free in mesh
(that was suddenly causing issues in certain tests)
* always request block-ack window size 64 as we found
some APs will otherwise crash (really ...)
* fix P2P-Device teardown sequence to avoid restarting
with uninitialized data
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible to address another chip on same MDIO bus. The case is
correctly handled for media advertising. It is taken into account
only if mii_data->phy_id == phydev->addr. However, this condition
was missing for reset case.
Signed-off-by: Jérôme Pouiller <jezz@sysmic.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: set mac address and uc_list bug fixes.
Fix ndo_set_mac_address() for PF and VF.
Re-apply uc_list after chip reset.
v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
mc_list mac address filters. Before the patch, uc_list is not
setup again after chip reset (such as ethtool ring size change)
and macvlans don't work any more after that.
Modify bnxt_cfg_rx_mode() to return error codes appropriately so
that the init chip sequence can detect any failures.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For PF, the bp->pf.mac_addr always holds the permanent MAC
addr assigned by the HW. For VF, the bp->vf.mac_addr always
holds the administrator assigned VF MAC addr. The random
generated VF MAC addr should never get stored to bp->vf.mac_addr.
This way, when the VF wants to change the MAC address, we can tell
if the adminstrator has already set it and disallow the VF from
changing it.
v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing ndo_set_mac_address only copies the new MAC addr
and didn't set the new MAC addr to the HW. The correct way is
to delete the existing default MAC filter from HW and add
the new one. Because of RFS filters are also dependent on the
default mac filter l2 context, the driver must go thru
close_nic() to delete the default MAC and RFS filters, then
open_nic() to set the default MAC address to HW.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the driver is used on an ARM platform with SPARSE_IRQ defined,
semantics of NR_IRQS is different (minimal value of virtual irqs) and
by default it is set to 16, see arch/arm/include/asm/irq.h.
This value may be less than the actual number of virtual irqs, which
may break the driver initialization. The check removal allows to use
the driver on such a platform, and, if irq controller driver works
correctly, the check is not needed on legacy platforms.
Fixes a runtime problem:
lpc-eth 31060000.ethernet: error getting resources.
lpc_eth: lpc-eth: not found (-6).
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
devices.
One problem is that it updates sch->q.qlen and sch->qstats.drops
on the mq/mqprio root qdisc, while it should not : Daniele
reported underflows errors :
[ 681.774821] PAX: sch->q.qlen: 0 n: 1
[ 681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
[ 681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G O 4.2.6.201511282239-1-grsec #1
[ 681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
[ 681.774956] ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
[ 681.774959] ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
[ 681.774960] ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
[ 681.774962] Call Trace:
[ 681.774967] [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
[ 681.774970] [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
[ 681.774972] [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
[ 681.774976] [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
[ 681.774978] [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
[ 681.774980] [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
[ 681.774983] [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
[ 681.774985] [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
[ 681.774987] [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
[ 681.774989] [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
[ 681.774991] [<ffffffffa9089560>] ? sort_range+0x40/0x40
[ 681.774992] [<ffffffffa9085fe4>] kthread+0xe4/0x100
[ 681.774994] [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
[ 681.774995] [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.
A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.
Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal,
as soon as it meets a mq/mqprio qdisc children.
Second problem can be fixed by RCU protection.
Qdisc are already freed after RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use appropriate rcu list variants.
A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.
Reported-by: Daniele Fucini <dfucini@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.
Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc39 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The non-PCI builds of the O day test project are failing:
On Thu, 2015-12-03 at 05:02 +0800, kbuild test robot wrote:
> warning: (SCSI_MPT2SAS) selects SCSI_MPT3SAS which has unmet direct
> dependencies (SCSI_LOWLEVEL && PCI && SCSI)
The problem is that select and depend don't interact because Kconfig doesn't
have a SAT solver, so depend picks up dependencies and select does onward
selects, but select doesn't pick up dependencies. To fix this, we need to add
the correct dependencies to the MPT2SAS option like this.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: b840c3627b
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the sockets mc_list containing information about the
joined group.
If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a53 ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing a
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join and more groups.
The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 52ad353a53 "(igmp: fix the problem when mc leave group")
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.
We need to call inet6_destroy_sock() to properly release
inet6 specific fields.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2015-12-01
Here's a Bluetooth fix for the 4.4-rc series that fixes a memory leak of
the Security Manager L2CAP channel that'll happen for every LE
connection.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
aarch64 doesn't have native store immediate instruction, such operation
has to be implemented by the below instruction sequence:
Load immediate to register
Store register
Signed-off-by: Yang Shi <yang.shi@linaro.org>
CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While testing the np->opt RCU conversion, I found that UDP/IPv6 was
using a mixture of xchg() and sk_dst_lock to protect concurrent changes
to sk->sk_dst_cache, leading to possible corruptions and crashes.
ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
way to fix the mess is to remove sk_dst_lock completely, as we did for
IPv4.
__ip6_dst_store() and ip6_dst_store() share same implementation.
sk_setup_caps() being called with socket lock being held or not,
we have to use sk_dst_set() instead of __sk_dst_set()
Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
in ip6_dst_store() before the sk_setup_caps(sk, dst) call.
This is because ip6_dst_store() can be called from process context,
without any lock held.
As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
from another cpu doing a concurrent ip6_dst_store()
Doing the dst dereference before doing the install is needed to make
sure no use after free would trigger.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch completes the work I did in commit 45f6fad84c
("ipv6: add complete rcu protection around np->opt"), as I missed
sctp part.
This simply makes sure np->opt is used with proper RCU locking
and accessors.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because accounting resources for the root cgroup sometimes incurs
measureable overhead for workloads which don't care about cgroup and
often ends up calculating a number which is available elsewhere in a
slightly different form, cgroup is not in the business of providing
system-wide statistics. The pids controller which was introduced
recently was exposing "pids.current" at the root. This patch disable
accounting for root cgroup and removes the file from the root
directory.
While this is a userland visible behavior change, pids has been
available only in one version and was badly broken there, so I don't
think this will be noticeable. If it turns out to be a problem, we
can reinstate it for v1 hierarchies.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Consider the following v2 hierarchy.
P0 (+memory) --- P1 (-memory) --- A
\- B
P0 has memory enabled in its subtree_control while P1 doesn't. If
both A and B contain processes, they would belong to the memory css of
P1. Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter. IOW, enabling controllers
can cause atomic migrations into different csses.
The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses. pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.
WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
Modules linked in:
CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
...
ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
Call Trace:
[<ffffffff81551ffc>] dump_stack+0x4e/0x82
[<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
[<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
[<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
[<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
[<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
[<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
[<ffffffff81189016>] cgroup_attach_task+0x176/0x200
[<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
[<ffffffff81189684>] cgroup_procs_write+0x14/0x20
[<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
[<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
[<ffffffff81265f88>] __vfs_write+0x28/0xe0
[<ffffffff812666fc>] vfs_write+0xac/0x1a0
[<ffffffff81267019>] SyS_write+0x49/0xb0
[<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76
This patch fixes the bug by removing @css parameter from the three
migration methods, ->can_attach, ->cancel_attach() and ->attach() and
updating cgroup_taskset iteration helpers also return the destination
css in addition to the task being migrated. All controllers are
updated accordingly.
* Controllers which don't care whether there are one or multiple
target csses can be converted trivially. cpu, io, freezer, perf,
netclassid and netprio fall in this category.
* cpuset's current implementation assumes that there's single source
and destination and thus doesn't support v2 hierarchy already. The
only change made by this patchset is how that single destination css
is obtained.
* memory migration path already doesn't do anything on v2. How the
single destination css is obtained is updated and the prep stage of
mem_cgroup_can_attach() is reordered to accomodate the change.
* pids is the only controller which was affected by this bug. It now
correctly handles multi-destination migrations and no longer causes
counter underflow from incorrect accounting.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
If one or more tasks get moved into a frozen css, the frozen state is
cleared up from the destination css so that it can be reasserted once
the migrated tasks are frozen. freezer_attach() implements this in
two separate steps - clearing CGROUP_FROZEN on the target css while
processing each task and propagating the clearing upwards after the
task loop is done if necessary.
This patch merges the two steps. Propagation now takes place inside
the task loop. This simplifies the code and prepares it for the fix
of multi-destination migration.
Signed-off-by: Tejun Heo <tj@kernel.org>
Contrary to what the datasheet says, the pre divider doesn't seem to be
incremented by one in the PLL2, but just uses the value from the register,
with 0 being a bypass.
This fixes the audio playing too fast.
Since we now have the same pre-divider flags, and the only difference with
the A10 is the post-divider offset, also remove the structure to just pass
the offset as an argument.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Fixes: eb662f8547 ("clk: sunxi: pll2: Add A13 support")
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Proxy entries could have null pointer to net-device.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Fixes: 84920c1420 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Vyukov reported that the user could trigger a kernel warning by
using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that
value directly affects the value used as a kmalloc() parameter.
This patch thus switches the allocation flags from all user-controllable
kmalloc size to GFP_USER to put some more restrictions on it and also
disables the warn, as they are not necessary.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
They don't need to be any bigger than that and with this we start a new
bitfield for tracking association runtime stuff, like zero window
situation.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch addresses multiple problems :
UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.
Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())
This patch adds full RCU protection to np->opt
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For large map->value_size the user space can trigger memory allocation warnings like:
WARNING: CPU: 2 PID: 11122 at mm/page_alloc.c:2989
__alloc_pages_nodemask+0x695/0x14e0()
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82743b56>] dump_stack+0x68/0x92 lib/dump_stack.c:50
[<ffffffff81244ec9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
[<ffffffff812450f9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
[< inline >] __alloc_pages_slowpath mm/page_alloc.c:2989
[<ffffffff81554e95>] __alloc_pages_nodemask+0x695/0x14e0 mm/page_alloc.c:3235
[<ffffffff816188fe>] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
[< inline >] alloc_pages include/linux/gfp.h:451
[<ffffffff81550706>] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
[<ffffffff815a1c89>] kmalloc_order+0x19/0x60 mm/slab_common.c:1007
[<ffffffff815a1cef>] kmalloc_order_trace+0x1f/0xa0 mm/slab_common.c:1018
[< inline >] kmalloc_large include/linux/slab.h:390
[<ffffffff81627784>] __kmalloc+0x234/0x250 mm/slub.c:3525
[< inline >] kmalloc include/linux/slab.h:463
[< inline >] map_update_elem kernel/bpf/syscall.c:288
[< inline >] SYSC_bpf kernel/bpf/syscall.c:744
To avoid never succeeding kmalloc with order >= MAX_ORDER check that
elem->value_size and computed elem_size are within limits for both hash and
array type maps.
Also add __GFP_NOWARN to kmalloc(value_size | elem_size) to avoid OOM warnings.
Note kmalloc(key_size) is highly unlikely to trigger OOM, since key_size <= 512,
so keep those kmalloc-s as-is.
Large value_size can cause integer overflows in elem_size and map.pages
formulas, so check for that as well.
Fixes: aaac3ba95e ("bpf: charge user for creation of BPF maps and programs")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas says:
====================
Marvell Armada XP/370/38X Neta fixes
I'm sending v4 with corrected commit log of the last patch, in order to
avoid possible conflicts between the branches as suggested by Gregory
Clement.
Best regards,
Marcin Wojtas
Changes from v4:
* Correct commit log of patch 6/6
Changes from v2:
* Style fixes in patch updating mbus protection
* Remove redundant stable notifications except for patch 4/6
Changes from v1:
* update MBUS windows access protection register once, after whole loop
* add fixing value of MVNETA_RXQ_INTR_ENABLE_ALL_MASK
* add fixing error path for skb_build()
* add possibility of setting custom TX IP checksum limit in DT property
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The Ethernet controller found in the Armada 38x SoC's family support
TCP/IP checksumming with frame sizes larger than 1600 bytes, however
only on port 0.
This commit enables it by setting 'tx-csum-limit' to 9800B in
'ethernet@70000' node.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since Armada 38x SoC can support IP checksum for jumbo frames only on
a single port, it means that this feature should be enabled per-port,
rather than for the whole SoC.
This patch enables setting custom TX IP checksum limit by adding new
optional property to the mvneta device tree node. If not used, by
default 1600B is set for "marvell,armada-370-neta" and 9800B for other
strings, which ensures backward compatibility. Binding documentation
is updated accordingly.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the actual RX processing, there is same error path for both descriptor
ring refilling and building skb fails. This is not correct, because after
successful refill, the ring is already updated with newly allocated
buffer. Then, in case of build_skb() fail, hitherto code left the original
buffer unmapped.
This patch fixes above situation by swapping error check of skb build with
DMA-unmap of original buffer.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Simon Guinot <simon.guinot@sequanux.org>
Cc: <stable@vger.kernel.org> # v4.2+
Fixes a84e328941 ("net: mvneta: fix refilling for Rx DMA buffers")
Signed-off-by: David S. Miller <davem@davemloft.net>
A value originally defined in the driver was inappropriate. Even though
the ingress was somehow working, writing MVNETA_RXQ_INTR_ENABLE_ALL_MASK
to MVNETA_INTR_ENABLE didn't make any effect, because the bits [31:16]
are reserved and read-only.
This commit updates MVNETA_RXQ_INTR_ENABLE_ALL_MASK to be compliant with
the controller's documentation.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP network
unit")
Signed-off-by: David S. Miller <davem@davemloft.net>
MVNETA_RXQ_HW_BUF_ALLOC bit which controls enabling hardware buffer
allocation was mistakenly set as BIT(1). This commit fixes the assignment.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP network
unit")
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds missing configuration of MBUS windows access protection
in mvneta_conf_mbus_windows function - a dedicated variable for that
purpose remained there unused since v3.8 initial mvneta support. Because
of that the register contents were inherited from the bootloader.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Fixes: c5aff18204 ("net: mvneta: driver for Marvell Armada 370/XP network
unit")
Signed-off-by: David S. Miller <davem@davemloft.net>
There's one fix for the core here, we weren't reinitialising the actual
transferred length in messages when they get reused which meant that
we'd just keep adding to the length if a message is reused. This has
limited impact since it's only used in error handling cases but will
really mess anything that tries to use it up when it triggers.
As ever there's a small collection of driver specific fixes too.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWXaUPAAoJECTWi3JdVIfQLPAH/1A6xgPcS2HnBu43ezzQu9JY
gd22gYN1+klyKhOzWv86zG0vbT/sVo2YCTb8nbnGfMCc5ghwlxF1fl2mCVXRTr2F
7dP2n43TmTNsuXQjO4tv2iJTkCXXuhoEr3DTrOu3+ywm+HNn1gUPgOl0i7zZpuIJ
AcQ0uuh33k5nQW3elj/gFcdjlVdHIq5oNpP4JBJeOn+ijsd/6DzVyW+TBiGnImWX
50XCuM+gUvj762y77T3rgORkWwkmJSQIjV+TNKXD/hIDvVWQK9OvtyMpGBOTe7nH
emEiy7BJnVIdpknUUVf0ZqjNpprxpLwU+ryQhVC9orP0dL64GlsbPkFOtIgxkDs=
=NlLa
-----END PGP SIGNATURE-----
Merge tag 'spi-fix-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There's one fix for the core here, we weren't reinitialising the
actual transferred length in messages when they get reused which meant
that we'd just keep adding to the length if a message is reused. This
has limited impact since it's only used in error handling cases but
will really mess anything that tries to use it up when it triggers.
As ever there's a small collection of driver specific fixes too"
* tag 'spi-fix-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: bugfix: spi_message.transfer_length does not get reset
spi: pl022: handle EPROBE_DEFER for dma
spi: bcm63xx: use correct format string for printing a resource
spi: mediatek: single device does not require cs_gpios
spi: Add missing kerneldoc description for parameter
For cpufreq drivers which use setpolicy interface, after offline->online
the policy is set to default. This can be reproduced by setting the
default policy of intel_pstate or longrun to ondemand and then change to
"performance". After offline and online, the setpolicy will be called with
the policy=ondemand.
For drivers using governors this condition is handled by storing
last_governor, during offline and restoring during online. The same should
be done for drivers using setpolicy interface. Storing last_policy during
offline and restoring during online.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the last change here, I neglected to update the cookie in one code
path: when a mgmt-tx has no real cookie sent to userspace as it doesn't
wait for a response, but is off-channel. The original code used the SKB
pointer as the cookie and always assigned the cookie to the TX SKB in
ieee80211_start_roc_work(), but my change turned this around and made
the code rely on a valid cookie being passed in.
Unfortunately, the off-channel no-wait TX path wasn't assigning one at
all, resulting in an uninitialized stack value being used. This wasn't
handed back to userspace as a cookie (since in the no-wait case there
isn't a cookie), but it was tested for non-zero to distinguish between
mgmt-tx and off-channel.
Fix this by assigning a dummy non-zero cookie unconditionally, and get
rid of a misleading comment and some dead code while at it. I'll clean
up the ACK SKB handling separately later.
Fixes: 3b79af973c ("mac80211: stop using pointers as userspace cookies")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
DFS channels should not be actively scanned as we can't be sure
if we are allowed or not.
If the current channel is in the DFS band, active scan might be
performed after CSA, but we have no guarantee about other channels,
therefore it is safer to prevent active scanning at all.
Cc: stable@vger.kernel.org
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Interfaces are being initialized (setup) on addition,
and torn down on removal.
However, p2p device is being torn down when stopped,
resulting in the next p2p start operation being done
on uninitialized interface.
Solve it by calling ieee80211_teardown_sdata() only
on interface removal (for the non-netdev case).
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[squashed in fix to call teardown after unregister]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Sunil Goutham says:
====================
thunderx: miscellaneous fixes
This patch series contains fixes for various issues observed
with BGX and NIC drivers.
Changes from v1:
- Fixed comment syle in the first patch of the series
- Removed 'Increase transmit queue length' patch from the series,
will recheck if it's a driver or system issue and resubmit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable or disable BGX LMAC's RX/TX based on corresponding VF's
status. If otherwise, when multiple LMAC's physical link is up
then packets from all LMAC's whose corresponding VF is not yet
initialized will get forwarded to VF0. This is due to VNIC's default
configuration where CPI, RSSI e.t.c point to VF0/QSET0/RQ0.
This patch will prevent multiple copies of packets on VF0.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call netif_carrier_on() only if interface's link is up. Switching this on
upon IFF_UP by default, is causing issues with ethernet channel bonding
in LACP mode. Initial NETDEV_CHANGE notification was being skipped.
Also fixed some issues with link/speed/duplex reporting via ethtool.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Properly set CQ timer threshold and also set it to 2us.
With previous incorrect settings it was set to 0.5us which is too less.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While VNIC or BGX driver teardown, wait for already scheduled delayed work to
finish before destroying it.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As soon as we leave the spinlock after the job has been added to the job
queue, we can no longer rely on the job's data to be available.
I have seen a null-pointer dereference due to sched == NULL in
amd_sched_wakeup via amd_sched_entity_push_job and
amd_sched_ib_submit_kernel_helper. Since the latter initializes
sched_job->sched with the address of the ring scheduler, which is
guaranteed to be non-NULL, this race appears to be a likely culprit.
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/attachment.cgi?bugid=93079
Reviewed-by: Christian König <christian.koenig@amd.com>
Missing error check if the operation failed.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>