Compound page should be freed by put_page() or free_pages() with correct
order. Not doing so will cause tail pages leaked.
The compound order can be obtained by compound_order() or use
HPAGE_PMD_ORDER in our case. Some people would argue the latter is
faster but I prefer the former which is more general.
This bug was observed not just on our servers (the worst case we saw is
11G leaked on a 48G machine) but also on our workstations running Ubuntu
based distro.
$ cat /proc/vmstat | grep thp_zero_page_alloc
thp_zero_page_alloc 55
thp_zero_page_alloc_failed 0
This means there is (thp_zero_page_alloc - 1) * (2M - 4K) memory leaked.
Fixes: 97ae17497e ("thp: implement refcounting for huge zero page")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: <stable@vger.kernel.org> [3.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Following up the arm testing of gcov, turns out gcov on ARM64 works fine
as well. Only change needed is adding ARM64 to Kconfig depends.
Tested with qemu and mach-virt
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During file system stress testing on 3.10 and 3.12 based kernels, the
umount command occasionally hung in fsnotify_unmount_inodes in the
section of code:
spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
spin_unlock(&inode->i_lock);
continue;
}
As this section of code holds the global inode_sb_list_lock, eventually
the system hangs trying to acquire the lock.
Multiple crash dumps showed:
The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point
back at itself. As this is not the value of list upon entry to the
function, the kernel never exits the loop.
To help narrow down problem, the call to list_del_init in
inode_sb_list_del was changed to list_del. This poisons the pointers in
the i_sb_list and causes a kernel to panic if it transverse a freed
inode.
Subsequent stress testing paniced in fsnotify_unmount_inodes at the
bottom of the list_for_each_entry_safe loop showing next_i had become
free.
We believe the root cause of the problem is that next_i is being freed
during the window of time that the list_for_each_entry_safe loop
temporarily releases inode_sb_list_lock to call fsnotify and
fsnotify_inode_delete.
The code in fsnotify_unmount_inodes attempts to prevent the freeing of
inode and next_i by calling __iget. However, the code doesn't do the
__iget call on next_i
if i_count == 0 or
if i_state & (I_FREEING | I_WILL_FREE)
The patch addresses this issue by advancing next_i in the above two cases
until we either find a next_i which we can __iget or we reach the end of
the list. This makes the handling of next_i more closely match the
handling of the variable "inode."
The time to reproduce the hang is highly variable (from hours to days.) We
ran the stress test on a 3.10 kernel with the proposed patch for a week
without failure.
During list_for_each_entry_safe, next_i is becoming free causing
the loop to never terminate. Advance next_i in those cases where
__iget is not done.
Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ken Helias <kenhelias@firemail.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit edc2ca6124 ("mm, compaction: move pageblock checks up from
isolate_migratepages_range()") commonizes isolate_migratepages variants
and make them use isolate_migratepages_block().
isolate_migratepages_block() could stop the execution when enough pages
are isolated, but, there is no code in isolate_migratepages_range() to
handle this case. In the result, even if isolate_migratepages_block()
returns prematurely without checking all pages in the range,
isolate_migratepages_block() is called repeately on the following
pageblock and some pages in the previous range are skipped to check.
Then, CMA is failed frequently due to this fact.
To fix this problem, this patch let isolate_migratepages_range() know
the situation that enough pages are isolated and stop the isolation in
that case.
Note that isolate_migratepages() has no such problem, because, it always
stops the isolation after just one call of isolate_migratepages_block().
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit ff7ee93f47 ("cgroup/kmemleak: Annotate alloc_page() for cgroup
allocations") introduces kmemleak_alloc() for alloc_page_cgroup(), but
corresponding kmemleak_free() is missing, which makes kmemleak be
wrongly disabled after memory offlining. Log is pasted at the end of
this commit message.
This patch add kmemleak_free() into free_page_cgroup(). During page
offlining, this patch removes corresponding entries in kmemleak rbtree.
After that, the freed memory can be allocated again by other subsystems
without killing kmemleak.
bash # for x in 1 2 3 4; do echo offline > /sys/devices/system/memory/memory$x/state ; sleep 1; done ; dmesg | grep leak
Offlined Pages 32768
kmemleak: Cannot insert 0xffff880016969000 into the object search tree (overlaps existing)
CPU: 0 PID: 412 Comm: sleep Not tainted 3.17.0-rc5+ #86
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x46/0x58
create_object+0x266/0x2c0
kmemleak_alloc+0x26/0x50
kmem_cache_alloc+0xd3/0x160
__sigqueue_alloc+0x49/0xd0
__send_signal+0xcb/0x410
send_signal+0x45/0x90
__group_send_sig_info+0x13/0x20
do_notify_parent+0x1bb/0x260
do_exit+0x767/0xa40
do_group_exit+0x44/0xa0
SyS_exit_group+0x17/0x20
system_call_fastpath+0x16/0x1b
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff880016900000 (size 524288):
kmemleak: comm "swapper/0", pid 0, jiffies 4294667296
kmemleak: min_count = 0
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
log_early+0x63/0x77
kmemleak_alloc+0x4b/0x50
init_section_page_cgroup+0x7f/0xf5
page_cgroup_init+0xc5/0xd0
start_kernel+0x333/0x408
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0xf5/0xfc
Fixes: ff7ee93f47 (cgroup/kmemleak: Annotate alloc_page() for cgroup allocations)
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org> [3.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In neigh_parms_release() we loop over all entries to find the entry given in
argument and being able to remove it from the list. By using a double linked
list, we can avoid this loop.
Here are some numbers with 30 000 dummy interfaces configured:
Before the patch:
$ time rmmod dummy
real 2m0.118s
user 0m0.000s
sys 1m50.048s
After the patch:
$ time rmmod dummy
real 1m9.970s
user 0m0.000s
sys 0m47.976s
Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
napi_schedule() can be called from any context and has to mask hard
irqs.
Add a variant that can only be called from hard interrupts handlers
or when irqs are already masked.
Many NIC drivers can use it from their hard IRQ handler instead of
generic variant.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Vrabel says:
====================
xen-netback: minor cleanups
Two minor xen-netback cleanups originally from Zoltan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This flag is unnecessary, it came from some old code.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise the interrupt handler still calls napi_complete. Although it
won't schedule NAPI again as either NAPI_STATE_DISABLE or
NAPI_STATE_SCHED is set, it is just unnecessary, and it makes more
sense to do this way.
Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The WARN_ON in inet_evict_bucket can be triggered by a valid case:
inet_frag_kill and inet_evict_bucket can be running in parallel on the
same queue which means that there has been at least one more ref added
by a previous inet_frag_find call, but inet_frag_kill can delete the
timer before inet_evict_bucket which will cause the WARN_ON() there to
trigger since we'll have refcnt!=1. Now, this case is valid because the
queue is being "killed" for some reason (removed from the chain list and
its timer deleted) so it will get destroyed in the end by one of the
inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.
CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>
Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the evictor is running it adds some chosen frags to a local list to
be evicted once the chain lock has been released but at the same time
the *frag_queue can be running for some of the same queues and it
may call inet_frag_kill which will wait on the chain lock and
will then delete the queue from the wrong list since it was added in the
eviction one. The fix is simple - check if the queue has the evict flag
set under the chain lock before deleting it, this is safe because the
evict flag is set only under that lock and having the flag set also means
that the queue has been detached from the chain list, so no need to delete
it again.
An important note to make is that we're safe w.r.t refcnt because
inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
where only one of the two can succeed (or if the timer is executing -
none of them), the cases are:
1. inet_frag_kill succeeds in del_timer
- then the timer ref is removed, but inet_evict_bucket will not add
this queue to its expire list but will restart eviction in that chain
2. inet_evict_bucket succeeds in del_timer
- then the timer ref is kept until the evictor "expires" the queue, but
inet_frag_kill will remove the initial ref and will set
INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
its ref.
In the end all of the queue users will do an inet_frag_put and the one
that reaches 0 will free it. The refcount balance should be okay.
CC: Florian Westphal <fw@strlen.de>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McLean <chutzpah@gentoo.org>
Fixes: b13d3cbfb8 ("inet: frag: move eviction of queues to work queue")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Patrick McLean <chutzpah@gentoo.org>
Tested-by: Patrick McLean <chutzpah@gentoo.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a sysctl that causes an interface's optimistic addresses
to be considered equivalent to other non-deprecated addresses
for source address selection purposes. Preferred addresses
will still take precedence over optimistic addresses, subject
to other ranking in the source address selection algorithm.
This is useful where different interfaces are connected to
different networks from different ISPs (e.g., a cell network
and a home wifi network).
The current behaviour complies with RFC 3484/6724, and it
makes sense if the host has only one interface, or has
multiple interfaces on the same network (same or cooperating
administrative domain(s), but not in the multiple distinct
networks case.
For example, if a mobile device has an IPv6 address on an LTE
network and then connects to IPv6-enabled wifi, while the wifi
IPv6 address is undergoing DAD, IPv6 connections will try use
the wifi default route with the LTE IPv6 address, and will get
stuck until they time out.
Also, because optimistic nodes can receive frames, issue
an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC
flag appropriately set). A second RTM_NEWADDR is sent if DAD
completes (the address flags have changed), otherwise an
RTM_DELADDR is sent.
Also: add an entry in ip-sysctl.txt for optimistic_dad.
Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang says:
====================
r8152: support nway_reset
Fix the CHECK from checkpatch.pl and support nway_reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace tx_underun with tx_underrun for checkpatch.pl.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Remove the rcu_read_lock/unlock around rcu_access_pointer
2. Replace the rcu_dereference with rcu_access_pointer
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.
Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.
Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.
[1] Aggregation of links, per packet load balancing, fabrics not doing
deep packet inspections, alternative TCP congestion modules...
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull block layer fixes from Jens Axboe:
"A small collection of fixes for the current kernel. This contains:
- Two error handling fixes from Jan Kara. One for null_blk on
failure to add a device, and the other for the block/scsi_ioctl
SCSI_IOCTL_SEND_COMMAND fixing up the error jump point.
- A commit added in the merge window for the bio integrity bits
unfortunately disabled merging for all requests if
CONFIG_BLK_DEV_INTEGRITY wasn't set. Reverse the logic, so that
integrity checking wont disallow merges when not enabled.
- A fix from Ming Lei for merging and generating too many segments.
This caused a BUG in virtio_blk.
- Two error handling printk() fixups from Robert Elliott, improving
the information given when we rate limit.
- Error handling fixup on elevator_init() failure from Sudip
Mukherjee.
- A fix from Tony Battersby, fixing up a memory leak in the
scatterlist handling with scsi-mq"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: Fix merge logic when CONFIG_BLK_DEV_INTEGRITY is not defined
lib/scatterlist: fix memory leak with scsi-mq
block: fix wrong error return in elevator_init()
scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
null_blk: Cleanup error recovery in null_add_dev()
blk-merge: recaculate segment if it isn't less than max segments
fs: clarify rate limit suppressed buffer I/O errors
fs: merge I/O error prints into one line
Pull HID fixes from Jiri Kosina:
- workarounds for a couple of misbehaving Elan Touchscreens, by Adel
Gadllah
- fix for TransducerSerialNumber field implementation, by Jason Gerecke
- a couple of new HID usages (added by HUT), by Olivier Gay
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: input: Fix TransducerSerialNumber implementation
HID: add keyboard input assist hid usages
HID: usbhid: enable always-poll quirk for Elan Touchscreen 016f
HID: usbhid: enable always-poll quirk for Elan Touchscreen 009b
Several warnings and errors of coding style rules corrected.
Compile tested.
Signed-off-by: Roberto Medina <robertoxmed@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replaced repetive Device ID's which got added in commit b961f9a488
("cxgb4vf: Remove superfluous "idx" parameter of CH_DEVICE() macro")
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull Integrity subsystem fix from James Morris:
"These changes fix a bug in xattr handling, where the evm and ima
inode_setxattr() functions do not check for empty xattrs being passed
from userspace (leading to user-triggerable null pointer
dereferences)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
evm: check xattr value length and type in evm_inode_setxattr()
ima: check xattr value length and type in the ima_inode_setxattr()
This patch generalizes commit d6a4a10411 ("tcp: GSO should be TSQ
friendly") to protocols using skb_set_owner_w()
TCP uses its own destructor (tcp_wfree) and needs a more complex scheme
as explained in commit 6ff50cd555 ("tcp: gso: do not generate out of
order packets")
This allows UDP sockets using UFO to get proper backpressure,
thus avoiding qdisc drops and excessive cpu usage.
Here are performance test results (macvlan on vlan):
- Before
# netperf -t UDP_STREAM ...
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 60.00 144096 1224195 1258.56
212992 60.00 51 0.45
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 0.23 0.00 25.26 0.08 0.00 74.43
- After
# netperf -t UDP_STREAM ...
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 60.00 109593 0 957.20
212992 60.00 109593 957.20
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 0.18 0.00 8.38 0.02 0.00 91.43
[edumazet] Rewrote patch and changelog.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NetworkManager might want to know that it changed when the router advertisement
arrives.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware can automatically generate pause frames when the number
of free buffers drops under a certain threshold, but in order to do this,
the address of the last free buffer needs to be written to a specific
register for each RX queue.
This has to be done in 'gfar_clean_rx_ring' which is called for each
RX queue. In order not to impact performance, by adding a register write
for each incoming packet, this operation is done only when the PAUSE frame
transmission is enabled.
Whenever the link is readjusted, this capability is turned on or off.
Signed-off-by: Matei Pavaluca <matei.pavaluca@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Local flow control options needed in order to resolve the negotiation
are incorrectly calculated.
Previously 'mii_advertise_flowctrl' was called to determine the local advertising
options, but these were determined based on FLOW_CTRL_RX/TX flags which are
never set through ethtool.
The patch simply translates from ethtool flow options to mii flow options.
Signed-off-by: Pavaluca Matei <matei.pavaluca@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The phy device supports 802.3x flow control, but the specific flags are not set
in the phy initialisation code. Flow control flags need to be added to the
supported capabilities of the phydev by the driver.
This is needed in order for ethtool to work ('ethtool -A' code checks for these
flags)
Signed-off-by: Pavaluca Matei <matei.pavaluca@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Vijay Subramanian <vijaynsu@cisco.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Pull powerpc updates from Michael Ellerman:
"There's some bug fixes or cleanups to facilitate fixes, a MAINTAINERS
update, and a new syscall (bpf)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc/numa: ensure per-cpu NUMA mappings are correct on topology update
powerpc/numa: use cached value of update->cpu in update_cpu_topology
cxl: Fix PSL error due to duplicate segment table entries
powerpc/mm: Use appropriate ESID mask in copro_calculate_slb()
cxl: Refactor cxl_load_segment() and find_free_sste()
cxl: Disable secondary hash in segment table
Revert "powerpc/powernv: Fix endian bug in LPC bus debugfs accessors"
powernv: Use _GLOBAL_TOC for opal wrappers
powerpc: Wire up sys_bpf() syscall
MAINTAINERS: nx-842 driver maintainer change
powerpc/mm: Remove redundant #if case
powerpc/mm: Fix build error with hugetlfs disabled
For boards without a reset GPIO we skip the delay between enabling the
pcie_ref_clk and touching the RC registers for configuration. This hangs
the system if there isn't a proper delay to ensure the clocks are settled
in the DW PCIe core.
Also iMX6Q always needs an additional 10us delay to make sure the reset is
propagated through the core, as we don't have an explicitly controlled
reset input on this SoC.
This fixes a problem with 3fce0e882f ("PCI: imx6: Delay enabling
reference clock for SS until it stabilizes"): the kernel doesn't boot on
systems that don't pass the PCI GPIO reset in the DTB. This regression
affects mx6 nitrogen boards.
[bhelgaas: add regression info in changelog]
Fixes: 3fce0e882f ("PCI: imx6: Delay enabling reference clock for SS until it stabilizes")
Reported-by: Fabio Estevam <fabio.estevam@freescale.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Richard Zhu <richard.zhu@freescale.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
It is reported that Samsung laptops that need to poll events are broken by
the following commit:
Commit 3afcf2ece4
Subject: ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set
The behaviors of the 2 vendor firmwares are conflict:
1. Acer: OSPM shouldn't issue QR_EC unless SCI_EVT is set, firmware
automatically sets SCI_EVT as long as there is event queued up.
2. Samsung: OSPM should issue QR_EC whatever SCI_EVT is set, firmware
returns 0 when there is no event queued up.
This patch is a quick fix to distinguish the behaviors to make Acer
behavior only effective for Acer EC firmware so that the breakages on
Samsung EC firmware can be avoided.
Fixes: 3afcf2ece4 (ACPI / EC: Add support to disallow QR_EC to be issued ...)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=44161
Reported-and-tested-by: Ortwin Glück <odi@odi.ch>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: 3.17+ <stable@vger.kernel.org> # 3.17+
[ rjw : Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is reported that the following commit breaks Samsung hardware:
Commit: 558e4736f2.
Subject: ACPI / EC: Add support to disallow QR_EC to be issued before
completing previous QR_EC
Which means the Samsung behavior conflicts with the Acer behavior.
1. Samsung may behave like:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
write QR_EC
read event
[ -event 1 ] SCI_EVT clear
Without the above commit, Samsung can work:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
write QR_EC
CAN prepare next QR_EC as SCI_EVT=1
read event
[ -event 1 ] SCI_EVT clear
write QR_EC
read event
[ -event 2 ] SCI_EVT clear
With the above commit, Samsung cannot work:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
write QR_EC
read event
[ -event 1 ] SCI_EVT clear
CANNOT prepare next QR_EC as SCI_EVT=0
2. Acer may behave like:
[ +event 1 ] SCI_EVT set
[ +event 2 ]
write QR_EC
read event
[ -event 1 ] SCI_EVT clear
[ +event 2 ] SCI_EVT set
Without the above commit, Acer cannot work when there is only 1 event:
[ +event 1 ] SCI_EVT set
write QR_EC
can prepared next QR_EC as SCI_EVT=1
read event
[ -event 1 ] SCI_EVT clear
CANNOT write QR_EC as SCI_EVT=0
With the above commit, Acer can work:
[ +event 1 ] SCI_EVT set
[ +event 2 ]
write QR_EC
read event
[ -event 1 ] SCI_EVT set
can prepare next QR_EC because SCI_EVT=0
CAN write QR_EC as SCI_EVT=1
Since Acer can also work with only the following commit applied:
Commit: 3afcf2ece4
Subject: ACPI / EC: Add support to disallow QR_EC to be issued when
SCI_EVT isn't set
commit 558e4736f2 can be reverted.
Fixes: 558e4736f2 (ACPI / EC: Add support to disallow QR_EC to be issued ...)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=44161
Reported-and-tested-by: Ortwin Glück <odi@odi.ch>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: 3.17+ <stable@vger.kernel.org> # 3.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CMI8888 shows the stuttering playback when the snooping is disabled
on the audio buffer. Meanwhile, we've got reports that CORB/RIRB
doesn't work in the snooped mode. So, as a compromise, disable the
snoop only for CORB/RIRB and enable the snoop for the stream buffers.
The resultant patch became a bit ugly, unfortunately, but we still can
live with it.
Reported-and-tested-by: Geoffrey McRae <geoff@spacevs.com>
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The reported-by text says you have to ask for permission, but that
should only be if the bug was reported in private. These days the
standard is to always give reported-by credit or it's considered a bit
rude.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Replace "Disable" with "Enable", since --demangle option enables symbol
demangling, not disable it.
perf probe has --demangle and --no-demangle options, but the
command-line help (--help) shows only --demangle option. So it should
explain about --demangle.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20141027203124.21219.68278.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F dso_from
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-8-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F dso_to
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F symbol_from
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F symbol_to
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F mispredict
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F in_tx
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The branch field sorting code assumes hist_entry::branch_info is
allocated, which is wrong and following perf session ends up with report
segfault.
$ perf record ls
$ perf report -F abort
perf: Segmentation fault
Checking that hist_entry::branch_info is valid and display "N/A" string
in snprint callback if it's not.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1413468427-31049-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
After kernel 3.7 (commit b4b8f770eb),
/proc/cpuinfo replaces 'Processor' to 'model name'.
This patch makes CPUINFO_PROC to an array and provides two choices for
ARM, makes it compatible for different kernel version.
v1 -> v2: minor changes as suggested by Namhyung Kim:
- Doesn't pass @h and @evlist to __write_cpudesc;
- Coding style fix.
v2 -> v3:
- Rebase:
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/core
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1414115126-7479-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The libunwind provides two caching policy which are global and
per-thread. As perf unwinds callchains in a single thread, it'd
sufficient to use global caching.
This speeds up my perf report from 14s to 7s on a ~260MB data file.
Although the output sometimes contains a slight difference (~0.01% in
terms of number of lines printed) on callchains which were not resolved.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Arun Sharma <asharma@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1412556363-26229-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
These patches:
86a349a28b ("perf/x86/intel: Add Broadwell core support")
c46e665f03 ("perf/x86: Add INST_RETIRED.ALL workarounds")
fdda3c4aac ("perf/x86/intel: Use Broadwell cache event list for Haswell")
introduced magic constants and unexplained changes:
https://lkml.org/lkml/2014/10/28/1128https://lkml.org/lkml/2014/10/27/325https://lkml.org/lkml/2014/8/27/546https://lkml.org/lkml/2014/10/28/546
Peter Zijlstra has attempted to help out, to clean up the mess:
https://lkml.org/lkml/2014/10/28/543
But has not received helpful and constructive replies which makes
me doubt wether it can all be finished in time until v3.18 is
released.
Despite various review feedback the author (Andi Kleen) has answered
only few of the review questions and has generally been uncooperative,
only giving replies when prompted repeatedly, and only giving minimal
answers instead of constructively explaining and helping along the effort.
That kind of behavior is not acceptable.
There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
commit from Andi Kleen:
e735b9db12 ("perf/x86/intel/uncore: Add Haswell-EP uncore support")
https://lkml.org/lkml/2014/10/22/730
Which is not yet resolved. The uncore driver is independent in theory,
but the crash makes me worry about how well all these patches were
tested and makes me uneasy about the level of interminging that the
Broadwell and Haswell code has received by the commits above.
As a first step to resolve the mess revert the Broadwell client commits
back to the v3.17 version, before we run out of time and problematic
code hits a stable upstream kernel.
( If the Haswell-EP crash is not resolved via a simple fix then we'll have
to revert the Haswell-EP uncore driver as well. )
The Broadwell client series has to be submitted in a clean fashion, with
single, well documented changes per patch. If they are submitted in time
and are accepted during review then they can possibly go into v3.19 but
will need additional scrutiny due to the rocky history of this patch set.
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The commit which introduced TransducerSerialNumber (368c966) is missing
two crucial implementation details. Firstly, the commit does not set the
type/code/bit/max fields as expected later down the code which can cause
the driver to crash when a tablet with this usage is connected. Secondly,
the call to 'set_bit' causes MSC_PULSELED to be sent instead of the
expected MSC_SERIAL. This commit addreses both issues.
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Ping Cheng <pingc@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Function mp_register_gsi() returns blindly the GSI number for the ACPI
SCI interrupt. That causes a regression when the GSI for ACPI SCI is
shared with other devices.
The regression was caused by commit 84245af729 "x86, irq, ACPI:
Change __acpi_register_gsi to return IRQ number instead of GSI" and
exposed on a SuperMicro system, which shares one GSI between ACPI SCI
and PCI device, with following failure:
http://sourceforge.net/p/linux1394/mailman/linux1394-user/?viewmonth=201410
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 20 low
level)
[ 2.699224] firewire_ohci 0000:06:00.0: failed to allocate interrupt
20
Return mp_map_gsi_to_irq(gsi, 0) instead of the GSI number.
Reported-and-Tested-by: Daniel Robbins <drobbins@funtoo.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Intel MID platforms has no legacy interrupts, so no IRQ descriptors
preallocated. We need to call mp_map_gsi_to_irq() to create IRQ
descriptors for APB timers and RTC timers, otherwise it may cause
invalid memory access as:
[ 0.116839] BUG: unable to handle kernel NULL pointer dereference at
0000003a
[ 0.123803] IP: [<c1071c0e>] setup_irq+0xf/0x4d
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>