mirror of
https://github.com/torvalds/linux.git
synced 2024-12-21 10:31:54 +00:00
ac7c1c5af9
1113 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
9ec3a646fe |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only |
||
Linus Torvalds
|
e98bf5cedf |
The changes to the common clock framework for 4.0 are mostly new clock
drivers and updates to existing ones for feature enhancements and bug fixes. There is more churn than usual in the framework core due to the change to introduce per-user unique struct clk pointers in 4.0. This caused several regressions to surface, some of which were sent as fixes to 4.0. New generic clock drivers were added for GPIO- and PWM-based clock controllers. Additionally the common clk-divider code recieved several fixes to the way it rounds rates. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVNcIIAAoJEKI6nJvDJaTU3a8QAM+fjhDMY5xpI6VIbxZaA2aR VUofw9/rdAtP1UdwtlSKBvCqpwwqt/U7zlMWU9v+UvTjYdHIf9SIDQoJnd+uEtwL roz/kNeB7WOVyxwbTJ2B5fjvPSN+mq8Rm8ANDcL8ZOGxxtt2Mip1IWMAlx2XUnwG tYZhB7EfKzLHZRblOdn2Q4U/4T+KXOFTSO+Gb9o2J0I2sJLI0NRXhcl9Fcoo8KVz G0ACWa0F1WKsbqzBATnhtYiKkuC3BeiS2eMuTVTlkP+Gd6YQ2f1zWLeBfXEiPGZb q0p/qTrUFLHbRoJMMuWaUfaBxb8PeUfM6yllxrzvRxPJU25pbj8OW/O5ZAe9xP8G S17sQ2nhEoWZW9hqbuA39IcLGa6RjT+TD+z3kmXQ9ZvCVDN2Oqqb/4ZNViwAvQq7 t67EfV7hGXty3Q58tS4XE9hHfwY+9YqMDLNIS/ED+hP8rcxTmiLlAIyk+qbT3b0l Q+375Ar7iCgihPPHYxeM5Qe1+Vsfh4NjR9thdAbT245MB3f90ULb+GNP/izUDOgA c/Ot6pStVFEUxTol6RlcLb85PugzrkoBOF/8ZLySdMLhALjPwaFcQZ1sFdcKUKlE tt7sZKQgbbCfqYGS9K264uUfWbdmZh05zhtkH0xUjyQpyIcnrYQsSIIEEnlbYnPp 0D55nooSGROKeud+gyrx =2LMr -----END PGP SIGNATURE----- Merge tag 'clk-for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clock framework updates from Michael Turquette: "The changes to the common clock framework for 4.0 are mostly new clock drivers and updates to existing ones for feature enhancements and bug fixes. There is more churn than usual in the framework core due to the change to introduce per-user unique struct clk pointers in 4.0. This caused several regressions to surface, some of which were sent as fixes to 4.0. New generic clock drivers were added for GPIO- and PWM-based clock controllers. Additionally the common clk-divider code recieved several fixes to the way it rounds rates" * tag 'clk-for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (91 commits) clk: check ->determine/round_rate() return value in clk_calc_new_rates clk: at91: usb: propagate rate modification to the parent clk clk: samsung: exynos4: Disable ARMCLK down feature on Exynos4210 SoC clk: don't use __initconst for non-const arrays clk: at91: change to using endian agnositc IO clk: clk-gpio-gate: Fix active low clk: Add PWM clock driver clk: Add clock driver for mb86s7x clk: pxa: pxa3xx: add missing os timer clock clk: tegra: Use the proper parent for plld_dsi clk: tegra: Use generic tegra_osc_clk_init() on Tegra114 clk: tegra: Model oscillator as clock clk: tegra: Add peripheral registers for bank Y clk: tegra: Register the proper number of resets clk: tegra: Remove needless initializations clk: tegra: Use consistent indentation clk: tegra: Various whitespace cleanups clk: tegra: Enable HDA to HDMI clocks on Tegra124 clk: tegra: Fix a bunch of sparse warnings clk: tegra: Fix typo tabel -> table ... |
||
Linus Torvalds
|
96b90f27bc |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar: "This update has mostly fixes, but also other bits: - perf tooling fixes - PMU driver fixes - Intel Broadwell PMU driver HW-enablement for LBR callstacks - a late coming 'perf kmem' tool update that enables it to also analyze page allocation data. Note, this comes with MM tracepoint changes that we believe to not break anything: because it changes the formerly opaque 'struct page *' field that uniquely identifies pages to 'pfn' which identifies pages uniquely too, but isn't as opaque and can be used for other purposes as well" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/pt: Fix and clean up error handling in pt_event_add() perf/x86/intel: Add Broadwell support for the LBR callstack perf/x86/intel/rapl: Fix energy counter measurements but supporing per domain energy units perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events perf/x86: Fix hw_perf_event::flags collision perf probe: Fix segfault when probe with lazy_line to file perf probe: Find compilation directory path for lazy matching perf probe: Set retprobe flag when probe in address-based alternative mode perf kmem: Analyze page allocator events also tracing, mm: Record pfn instead of pointer to struct page |
||
Linus Torvalds
|
06a60deca8 |
Merge tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim: "New features: - in-memory extent_cache - fs_shutdown to test power-off-recovery - use inline_data to store symlink path - show f2fs as a non-misc filesystem Major fixes: - avoid CPU stalls on sync_dirty_dir_inodes - fix some power-off-recovery procedure - fix handling of broken symlink correctly - fix missing dot and dotdot made by sudden power cuts - handle wrong data index during roll-forward recovery - preallocate data blocks for direct_io ... and a bunch of minor bug fixes and cleanups" * tag 'for-f2fs-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (71 commits) f2fs: pass checkpoint reason on roll-forward recovery f2fs: avoid abnormal behavior on broken symlink f2fs: flush symlink path to avoid broken symlink after POR f2fs: change 0 to false for bool type f2fs: do not recover wrong data index f2fs: do not increase link count during recovery f2fs: assign parent's i_mode for empty dir f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries f2fs: fix mismatching lock and unlock pages for roll-forward recovery f2fs: fix sparse warnings f2fs: limit b_size of mapped bh in f2fs_map_bh f2fs: persist system.advise into on-disk inode f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get f2fs: preallocate fallocated blocks for direct IO f2fs: enable inline data by default f2fs: preserve extent info for extent cache f2fs: initialize extent tree with on-disk extent info of inode f2fs: introduce __{find,grab}_extent_tree f2fs: split set_data_blkaddr from f2fs_update_extent_cache f2fs: enable fast symlink by utilizing inline data ... |
||
Jaegeuk Kim
|
10027551cc |
f2fs: pass checkpoint reason on roll-forward recovery
This patch adds CP_RECOVERY to remain recovery information for checkpoint. And, it makes sure writing checkpoint in this case. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Stefan Strogin
|
99e8ea6cd2 |
mm: cma: add trace events for CMA allocations and freeings
Add trace events for cma_alloc() and cma_release(). The cma_alloc tracepoint is used both for successful and failed allocations, in case of allocation failure pfn=-1UL is stored and printed. Signed-off-by: Stefan Strogin <stefan.strogin@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mpn@google.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
David Howells
|
2b0143b5c9 |
VFS: normal filesystems (and lustre): d_inode() annotations
that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Linus Torvalds
|
1dcf58d6e6 |
Merge branch 'akpm' (patches from Andrew)
Merge first patchbomb from Andrew Morton: - arch/sh updates - ocfs2 updates - kernel/watchdog feature - about half of mm/ * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (122 commits) Documentation: update arch list in the 'memtest' entry Kconfig: memtest: update number of test patterns up to 17 arm: add support for memtest arm64: add support for memtest memtest: use phys_addr_t for physical addresses mm: move memtest under mm mm, hugetlb: abort __get_user_pages if current has been oom killed mm, mempool: do not allow atomic resizing memcg: print cgroup information when system panics due to panic_on_oom mm: numa: remove migrate_ratelimited mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE mm: split ET_DYN ASLR from mmap ASLR s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE mm: expose arch_mmap_rnd when available s390: standardize mmap_rnd() usage powerpc: standardize mmap_rnd() usage mips: extract logic for mmap_rnd() arm64: standardize mmap_rnd() usage x86: standardize mmap_rnd() usage arm: factor out mmap ASLR into mmap_rnd ... |
||
Kirill A. Shutemov
|
9823336833 |
x86: expose number of page table levels on Kconfig level
We would want to use number of page table level to define mm_struct. Let's expose it as CONFIG_PGTABLE_LEVELS. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
eeee78cf77 |
Some clean ups and small fixes, but the biggest change is the addition
of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints. Tracepoints have helper functions for the TP_printk() called __print_symbolic() and __print_flags() that lets a numeric number be displayed as a a human comprehensible text. What is placed in the TP_printk() is also shown in the tracepoint format file such that user space tools like perf and trace-cmd can parse the binary data and express the values too. Unfortunately, the way the TRACE_EVENT() macro works, anything placed in the TP_printk() will be shown pretty much exactly as is. The problem arises when enums are used. That's because unlike macros, enums will not be changed into their values by the C pre-processor. Thus, the enum string is exported to the format file, and this makes it useless for user space tools. The TRACE_DEFINE_ENUM() solves this by converting the enum strings in the TP_printk() format into their number, and that is what is shown to user space. For example, the tracepoint tlb_flush currently has this in its format file: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) After adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); Its format file will contain this: __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" }) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVLBTuAAoJEEjnJuOKh9ldjHMIALdRS755TXCZGOf0r7O2akOR wMPeum7C+ae1mH+jCsJKUC0/jUfQKaMt/UxoHlipDgcGg8kD2jtGnGCw4Xlwvdsr y4rFmcTRSl1mo0zDSsg6ujoupHlVYN0+JPjrd7S3cv/llJoY49zcanNLF7S2XLeM dZCtWRLWYpBiWO68ai6AqJTnE/eGFIqBI048qb5Eg8dbK243SSeSIf9Ywhb+VsA+ aq6F7cWI/H6j4tbeza8tAN19dcwenDro5EfCDY8ARQHJu1f6Y3+DLf2imjkd6Aiu JVAoGIjHIpI+djwCZC1u4gi4urjfOqYartrM3Q54tb3YWYqHeNqP2ASI2a4EpYk= =Ixwt -----END PGP SIGNATURE----- Merge tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Some clean ups and small fixes, but the biggest change is the addition of the TRACE_DEFINE_ENUM() macro that can be used by tracepoints. Tracepoints have helper functions for the TP_printk() called __print_symbolic() and __print_flags() that lets a numeric number be displayed as a a human comprehensible text. What is placed in the TP_printk() is also shown in the tracepoint format file such that user space tools like perf and trace-cmd can parse the binary data and express the values too. Unfortunately, the way the TRACE_EVENT() macro works, anything placed in the TP_printk() will be shown pretty much exactly as is. The problem arises when enums are used. That's because unlike macros, enums will not be changed into their values by the C pre-processor. Thus, the enum string is exported to the format file, and this makes it useless for user space tools. The TRACE_DEFINE_ENUM() solves this by converting the enum strings in the TP_printk() format into their number, and that is what is shown to user space. For example, the tracepoint tlb_flush currently has this in its format file: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) After adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); Its format file will contain this: __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" })" * tag 'trace-v4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (27 commits) tracing: Add enum_map file to show enums that have been mapped writeback: Export enums used by tracepoint to user space v4l: Export enums used by tracepoints to user space SUNRPC: Export enums in tracepoints to user space mm: tracing: Export enums in tracepoints to user space irq/tracing: Export enums in tracepoints to user space f2fs: Export the enums in the tracepoints to userspace net/9p/tracing: Export enums in tracepoints to userspace x86/tlb/trace: Export enums in used by tlb_flush tracepoint tracing/samples: Update the trace-event-sample.h with TRACE_DEFINE_ENUM() tracing: Allow for modules to convert their enums to values tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values tracing: Update trace-event-sample with TRACE_SYSTEM_VAR documentation tracing: Give system name a pointer brcmsmac: Move each system tracepoints to their own header iwlwifi: Move each system tracepoints to their own header mac80211: Move message tracepoints to their own header tracing: Add TRACE_SYSTEM_VAR to xhci-hcd tracing: Add TRACE_SYSTEM_VAR to kvm-s390 tracing: Add TRACE_SYSTEM_VAR to intel-sst ... |
||
Linus Torvalds
|
a1480a166d |
Merge branch 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo: - Hannes's patchset implements support for better error reporting introduced by the new ATA command spec. - the deperecated pci_ dma API usages have been replaced by dma_ ones. - a bunch of hardware specific updates and some cleanups. * 'for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ata: remove deprecated use of pci api ahci: st: st_configure_oob must be called after IP is clocked. ahci: st: Update the ahci_st DT documentation ahci: st: Update the DT example for how to obtain the PHY. sata_dwc_460ex: indent an if statement libata: Add tracepoints libata-eh: Set 'information' field for autosense libata: Implement support for sense data reporting libata: Implement NCQ autosense libata: use status bit definitions in ata_dump_status() ide,ata: Rename ATA_IDX to ATA_SENSE libata: whitespace fixes in ata_to_sense_error() libata: whitespace cleanup in ata_get_cmd_descript() libata: use READ_LOG_DMA_EXT libata: remove ATA_FLAG_LOWTAG sata_dwc_460ex: re-use hsdev->dev instead of dwc_dev sata_dwc_460ex: move to generic DMA driver sata_dwc_460ex: join messages back sata: xgene: add ACPI support for APM X-Gene SATA ports ata: sata_mv: add proper definitions for LP_PHY_CTL register values |
||
Namhyung Kim
|
9fdd8a875c |
tracing, mm: Record pfn instead of pointer to struct page
The struct page is opaque for userspace tools, so it'd be better to save pfn in order to identify page frames. The textual output of $debugfs/tracing/trace file remains unchanged and only raw (binary) data format is changed - but thanks to libtraceevent, userspace tools which deal with the raw data (like perf and trace-cmd) can parse the format easily. So impact on the userspace will also be minimal. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Based-on-patch-by: Joonsoo Kim <js1304@gmail.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1428298576-9785-3-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jaegeuk Kim
|
8ce67cb07d |
f2fs: add some tracepoints to debug volatile and atomic writes
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Steven Rostedt (Red Hat)
|
91df6089aa |
writeback: Export enums used by tracepoint to user space
The enums used in tracepoints for __print_symbolic() do not have their values shown in the tracepoint format files and this makes it difficult for user space tools to convert the binary values to the strings they are to represent. Add TRACE_DEFINE_ENUM(x) macros to export the enum names to their values to make the tracing output from user space tools more robust. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Cc: Dave Chinner <dchinner@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt (Red Hat)
|
43d0f71f0e |
v4l: Export enums used by tracepoints to user space
Enums used by tracepoints for __print_symbolic() are shown in the tracepoint format files with just their names and not their values. This makes it difficult for user space tools to know how to convert the binary data into their string representations. By adding the use of TRACE_DEFINE_ENUM(), the enum names will be mapped to their values and shown in the tracing file system to let tools convert the data as necessary. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt (Red Hat)
|
6ba16eefcd |
SUNRPC: Export enums in tracepoints to user space
The enums used in the tracepoints for __print_symbolic() have their names shown in the tracepoint format files. User space tools do not know how to convert those names into their values to be able to convert the binary data. Use TRACE_DEFINE_ENUM() to export the enum names to their values for userspace to do the parsing correctly. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt (Red Hat)
|
190f0b76ca |
mm: tracing: Export enums in tracepoints to user space
The enums used in tracepoints with __print_symbolic() have their names shown in the tracepoint format files and not their values. This makes it difficult for user space tools to convert the binary data to the strings as user space does not know what those enums are about. By having them use TRACE_DEFINE_ENUM(), the names of the enums will be mapped to the values and shown to user space. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt (Red Hat)
|
f0a91b3caa |
irq/tracing: Export enums in tracepoints to user space
The enums used by the softirq mapping is what is shown in the output of the __print_symbolic() and not their values, that are needed to map them to their strings. Export them to userspace with the TRACE_DEFINE_ENUM() macro so that user space tools can map the enums with their values. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt (Red Hat)
|
5511b9a471 |
f2fs: Export the enums in the tracepoints to userspace
The tracepoints that use __print_symbolic() use enums as the value to convert to strings. Unfortunately, the format files for these tracepoints show the enum name and not their value. This causes some userspace tools not to know how to convert __print_symbolic() to their strings. Add TRACE_DEFINE_ENUM() macros to export the enums used to userspace to let those tools know what those enum values are. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt (Red Hat)
|
56e1b22608 |
net/9p/tracing: Export enums in tracepoints to userspace
The tracepoints in the 9p code use a lot of enums for the __print_symbolic() function. These enums are shown in the tracepoint format files, and user space tools such as trace-cmd does not have the information to parse it. Add helper macros to export the enums with TRACE_DEFINE_ENUM(). Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Eric Van Hensbergen <ericvh@gmail.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt (Red Hat)
|
23b9766261 |
x86/tlb/trace: Export enums in used by tlb_flush tracepoint
Have the enums used in __print_symbolic() by the trace_tlb_flush() tracepoint exported to userpace such that they can be parsed by userspace tools. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Cc: Dave Hansen <dave@sr71.net> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt (Red Hat)
|
0c564a538a |
tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values
Several tracepoints use the helper functions __print_symbolic() or __print_flags() and pass in enums that do the mapping between the binary data stored and the value to print. This works well for reading the ASCII trace files, but when the data is read via userspace tools such as perf and trace-cmd, the conversion of the binary value to a human string format is lost if an enum is used, as userspace does not have access to what the ENUM is. For example, the tracepoint trace_tlb_flush() has: __print_symbolic(REC->reason, { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" }, { TLB_REMOTE_SHOOTDOWN, "remote shootdown" }, { TLB_LOCAL_SHOOTDOWN, "local shootdown" }, { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" }) Which maps the enum values to the strings they represent. But perf and trace-cmd do no know what value TLB_LOCAL_MM_SHOOTDOWN is, and would not be able to map it. With TRACE_DEFINE_ENUM(), developers can place these in the event header files and ftrace will convert the enums to their values: By adding: TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH); TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN); TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN); $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format [...] __print_symbolic(REC->reason, { 0, "flush on task switch" }, { 1, "remote shootdown" }, { 2, "local shootdown" }, { 3, "local mm shootdown" }) The above is what userspace expects to see, and tools do not need to be modified to parse them. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Cc: Guilherme Cox <cox@computer.org> Cc: Tony Luck <tony.luck@gmail.com> Cc: Xie XiuQi <xiexiuqi@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt (Red Hat)
|
acd388fd3a |
tracing: Give system name a pointer
Normally the compiler will use the same pointer for a string throughout the file. But there's no guarantee of that happening. Later changes will require that all events have the same pointer to the system string. Name the system string and have all events point to it. Testing this, it did not increases the size of the text, except for the notes section, which should not harm the real size any. Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt (Red Hat)
|
882156e040 |
tracing: Add TRACE_SYSTEM_VAR to intel-sst
New code will require TRACE_SYSTEM to be a valid C variable name, but some tracepoints have TRACE_SYSTEM with '-' and not '_', so it can not be used. Instead, add a TRACE_SYSTEM_VAR that can give the tracing infrastructure a unique name for the trace system. Link: http://lkml.kernel.org/r/20150402142831.GT6023@sirena.org.uk Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Hannes Reinecke
|
255c03d15a |
libata: Add tracepoints
Add some tracepoints for ata_qc_issue, ata_qc_complete, and ata_eh_link_autopsy. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Tejun Heo <tj@kernel.org> |
||
Scott Wood
|
bbedb17994 |
tracing: %pF is only for function pointers
Use %pS for actual addresses, otherwise you'll get bad output on arches like ppc64 where %pF expects a function descriptor. Link: http://lkml.kernel.org/r/1426130037-17956-22-git-send-email-scottwood@freescale.com Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
Steven Rostedt
|
f58078daca |
regmap: Move tracing header into drivers/base/regmap
The tracing events for regmap are confined to the regmap subsystem. It also requires accessing an internal header. Instead of including the internal header from a generic file location, move the tracing file into the regmap directory. Also rename the regmap tracing header to trace.h, as it is redundant to keep the regmap.h name when it is in the regmap directory. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Mark Brown <broonie@kernel.org> |
||
Philipp Zabel
|
c6b570d97c |
regmap: introduce regmap_name to fix syscon regmap trace events
This patch fixes a NULL pointer dereference when enabling regmap event tracing in the presence of a syscon regmap, introduced by commit |
||
Stephen Boyd
|
dfc202ead3 |
clk: Add tracepoints for hardware operations
It's useful to have tracepoints around operations that change the hardware state so that we can debug clock hardware performance and operations. Four basic types of events are supported: on/off events for enable, disable, prepare, unprepare that only record an event and a clock name, rate changing events for clk_set_{min_,max_}rate{_range}(), phase changing events for clk_set_phase() and parent changing events for clk_set_parent(). Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Michael Turquette <mturquette@linaro.org> |
||
Chao Yu
|
1ec4610c52 |
f2fs: add trace for rb-tree extent cache ops
This patch adds trace for lookup/update/shrink/destroy ops in rb-tree extent cache. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Linus Torvalds
|
038911597e |
Merge branch 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull lazytime mount option support from Al Viro: "Lazytime stuff from tytso" * 'lazytime' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext4: add optimization for the lazytime mount option vfs: add find_inode_nowait() function vfs: add support for a lazytime mount option |
||
Linus Torvalds
|
b9085bcbf5 |
Fairly small update, but there are some interesting new features.
Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: the highlights are support for GICv3 emulation and dirty page tracking s390: several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. ARM has other conflicts where functions are added in the same place by 3.19-rc and 3.20 patches. These are not large though, and entirely within KVM. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5 DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9 LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak= =7gdx -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ... |
||
Linus Torvalds
|
c7d7b98671 |
Merge tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim: "Major changes are to: - add f2fs_io_tracer and F2FS_IOC_GETVERSION - fix wrong acl assignment from parent - fix accessing wrong data blocks - fix wrong condition check for f2fs_sync_fs - align start block address for direct_io - add and refactor the readahead flows of FS metadata - refactor atomic and volatile write policies But most of patches are for clean-ups and minor bug fixes. Some of them refactor old code too" * tag 'for-f2fs-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (64 commits) f2fs: use spinlock for segmap_lock instead of rwlock f2fs: fix accessing wrong indexed data blocks f2fs: avoid variable length array f2fs: fix sparse warnings f2fs: allocate data blocks in advance for f2fs_direct_IO f2fs: introduce macros to convert bytes and blocks in f2fs f2fs: call set_buffer_new for get_block f2fs: check node page contents all the time f2fs: avoid data offset overflow when lseeking huge file f2fs: fix to use highmem for pages of newly created directory f2fs: introduce a batched trim f2fs: merge {invalidate,release}page for meta/node/data pages f2fs: show the number of writeback pages in stat f2fs: keep PagePrivate during releasepage f2fs: should fail mount when trying to recover data on read-only dev f2fs: split UMOUNT and FASTBOOT flags f2fs: avoid write_checkpoint if f2fs is mounted readonly f2fs: support norecovery mount option f2fs: fix not to drop mount options when retrying fill_super f2fs: merge flags in struct f2fs_sb_info ... |
||
Linus Torvalds
|
6bec003528 |
Merge branch 'for-3.20/bdi' of git://git.kernel.dk/linux-block
Pull backing device changes from Jens Axboe: "This contains a cleanup of how the backing device is handled, in preparation for a rework of the life time rules. In this part, the most important change is to split the unrelated nommu mmap flags from it, but also removing a backing_dev_info pointer from the address_space (and inode), and a cleanup of other various minor bits. Christoph did all the work here, I just fixed an oops with pages that have a swap backing. Arnd fixed a missing export, and Oleg killed the lustre backing_dev_info from staging. Last patch was from Al, unexporting parts that are now no longer needed outside" * 'for-3.20/bdi' of git://git.kernel.dk/linux-block: Make super_blocks and sb_lock static mtd: export new mtd_mmap_capabilities fs: make inode_to_bdi() handle NULL inode staging/lustre/llite: get rid of backing_dev_info fs: remove default_backing_dev_info fs: don't reassign dirty inodes to default_backing_dev_info nfs: don't call bdi_unregister ceph: remove call to bdi_unregister fs: remove mapping->backing_dev_info fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info nilfs2: set up s_bdi like the generic mount_bdev code block_dev: get bdev inode bdi directly from the block device block_dev: only write bdev inode on close fs: introduce f_op->mmap_capabilities for nommu mmap support fs: kill BDI_CAP_SWAP_BACKED fs: deduplicate noop_backing_dev_info |
||
Linus Torvalds
|
a26be149fa |
IOMMU Updates for Linux v3.20
This time with: * Generic page-table framework for ARM IOMMUs using the LPAE page-table format, ARM-SMMU and Renesas IPMMU make use of it already. * Break out of the IO virtual address allocator from the Intel IOMMU so that it can be used by other DMA-API implementations too. The first user will be the ARM64 common DMA-API implementation for IOMMUs * Device tree support for Renesas IPMMU * Various fixes and cleanups all over the place -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJU3MJOAAoJECvwRC2XARrjopUP+wachFx8vb00M4hlnlwL6FCn DyIFkA1n4wL0muPhjcBI+LViEXrSxjr2TYoJEaBg+fiByWWQ1Hefg+KPz331Lo1D +uo7WiOa1AB3pfkQiUN9IN6xx+o6ivhb3UQPiL4FHjggB/qz+KVxMM9nx0j8o0fQ D9q6HLFiOIsFkra3xZaSuDGvYUBpcwyfn8FP1HVfvLlg1uxIGDcUJX3qU5UBpj9q al/lPZ4A7rp+JLApV6WyouPiyVOZKikb5x920KeRNBem7a9fNBdgf+x7QbKpNXa1 5MaT5MarwGe8lJE4wtjOqRtsllhia+A1rg/6JbROPrlGetRFiuIh2sCKLvwOCko/ IjBHSutpaRT1lFoAG0TAnXQlvHRG/58XxOlP3eF613X/p8/cezuUaTyTIwZam9X3 j2GWwbUcBiHTxlu7bQDPz6a7cTf4w6wEALzYl18QrAFv+2LqlCfOo/LSlpStmjrF kRN8DYaohlTULvmFneSr8rfGsnp5yPgIPvdmqiSwTz/Ih7kYPgfLy6+v6IAHUqZj 0n9oGs8eMqVvSzM2qqmyA9WGuQZRyhNjj4iDwn/he5YMw2kqxUQYGMpLnSu0Oi48 n4PqodtVol64jKLwaHZwyU8u71iyjUC5K9TDot/I2wlSRcTELJhxGh6c1sfDLyrO u/htIszgKCgFvVrQoEZB =dwrA -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "This time with: - Generic page-table framework for ARM IOMMUs using the LPAE page-table format, ARM-SMMU and Renesas IPMMU make use of it already. - Break out the IO virtual address allocator from the Intel IOMMU so that it can be used by other DMA-API implementations too. The first user will be the ARM64 common DMA-API implementation for IOMMUs - Device tree support for Renesas IPMMU - Various fixes and cleanups all over the place" * tag 'iommu-updates-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (36 commits) iommu/amd: Convert non-returned local variable to boolean when relevant iommu: Update my email address iommu/amd: Use wait_event in put_pasid_state_wait iommu/amd: Fix amd_iommu_free_device() iommu/arm-smmu: Avoid build warning iommu/fsl: Various cleanups iommu/fsl: Use %pa to print phys_addr_t iommu/omap: Print phys_addr_t using %pa iommu: Make more drivers depend on COMPILE_TEST iommu/ipmmu-vmsa: Fix IOMMU lookup when multiple IOMMUs are registered iommu: Disable on !MMU builds iommu/fsl: Remove unused fsl_of_pamu_ids[] iommu/fsl: Fix section mismatch iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator iommu: Fix trace_map() to report original iova and original size iommu/arm-smmu: add support for iova_to_phys through ATS1PR iopoll: Introduce memory-mapped IO polling macros iommu/arm-smmu: don't touch the secure STLBIALL register iommu/arm-smmu: make use of generic LPAE allocator iommu: io-pgtable-arm: add non-secure quirk ... |
||
Linus Torvalds
|
41cbc01f6e |
The updates included in this pull request for ftrace are:
o Several clean ups to the code One such clean up was to convert to 64 bit time keeping, in the ring buffer benchmark code. o Adding of __print_array() helper macro for TRACE_EVENT() o Updating the sample/trace_events/ to add samples of different ways to make trace events. Lots of features have been added since the sample code was made, and these features are mostly unknown. Developers have been making their own hacks to do things that are already available. o Performance improvements. Most notably, I found a performance bug where a waiter that is waiting for a full page from the ring buffer will see that a full page is not available, and go to sleep. The sched event caused by it going to sleep would cause it to wake up again. It would see that there was still not a full page, and go back to sleep again, and that would wake it up again, until finally it would see a full page. This change has been marked for stable. Other improvements include removing global locks from fast paths. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJU3M+GAAoJEEjnJuOKh9ldpWQIAJTUzeVXlU0cf3bVn768VW7e XS41WHF34l1tNevmKTh6fCPiw8+U0UMGLQt5WKtyaaARsZn2MlefLVuvHPKFlK2w +qcI4OEVHH97Qgf9HWJSsYgnZaOnOE+TENqnokEgXMimRMuVcd/S4QaGxwJVDcjm iBF5j2TaG4aGbx4a3J7KueoZ3K+39r3ut15hIGi/IZBZldQ1pt26ytafD/KA3CU3 BLRM2HLttAMsV1ds0EDLgZjSGICVetFcdOmI5Gwj7Qr3KrOTRPYJMNc8NdDL7Js9 v8VhujhFGvcCrhO/IKpVvd9yluz3RCF+Z7ihc+D/+1B3Nsm0PTwN3Fl5J+f89AA= =u2Mm -----END PGP SIGNATURE----- Merge tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "The updates included in this pull request for ftrace are: o Several clean ups to the code One such clean up was to convert to 64 bit time keeping, in the ring buffer benchmark code. o Adding of __print_array() helper macro for TRACE_EVENT() o Updating the sample/trace_events/ to add samples of different ways to make trace events. Lots of features have been added since the sample code was made, and these features are mostly unknown. Developers have been making their own hacks to do things that are already available. o Performance improvements. Most notably, I found a performance bug where a waiter that is waiting for a full page from the ring buffer will see that a full page is not available, and go to sleep. The sched event caused by it going to sleep would cause it to wake up again. It would see that there was still not a full page, and go back to sleep again, and that would wake it up again, until finally it would see a full page. This change has been marked for stable. Other improvements include removing global locks from fast paths" * tag 'trace-v3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Do not wake up a splice waiter when page is not full tracing: Fix unmapping loop in tracing_mark_write tracing: Add samples of DECLARE_EVENT_CLASS() and DEFINE_EVENT() tracing: Add TRACE_EVENT_FN example tracing: Add TRACE_EVENT_CONDITION sample tracing: Update the TRACE_EVENT fields available in the sample code tracing: Separate out initializing top level dir from instances tracing: Make tracing_init_dentry_tr() static trace: Use 64-bit timekeeping tracing: Add array printing helper tracing: Remove newline from trace_printk warning banner tracing: Use IS_ERR() check for return value of tracing_init_dentry() tracing: Remove unneeded includes of debugfs.h and fs.h tracing: Remove taking of trace_types_lock in pipe files tracing: Add ref count to tracer for when they are being read by pipe |
||
Vlastimil Babka
|
99592d598e |
mm: when stealing freepages, also take pages created by splitting buddy page
When studying page stealing, I noticed some weird looking decisions in try_to_steal_freepages(). The first I assume is a bug (Patch 1), the following two patches were driven by evaluation. Testing was done with stress-highalloc of mmtests, using the mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how often page stealing occurs for individual migratetypes, and what migratetypes are used for fallbacks. Arguably, the worst case of page stealing is when UNMOVABLE allocation steals from MOVABLE pageblock. RECLAIMABLE allocation stealing from MOVABLE allocation is also not ideal, so the goal is to minimize these two cases. The evaluation of v2 wasn't always clear win and Joonsoo questioned the results. Here I used different baseline which includes RFC compaction improvements from [1]. I found that the compaction improvements reduce variability of stress-highalloc, so there's less noise in the data. First, let's look at stress-highalloc configured to do sync compaction, and how these patches reduce page stealing events during the test. First column is after fresh reboot, other two are reiterations of test without reboot. That was all accumulater over 5 re-iterations (so the benchmark was run 5x3 times with 5 fresh restarts). Baseline: 3.19-rc4 3.19-rc4 3.19-rc4 5-nothp-1 5-nothp-2 5-nothp-3 Page alloc extfrag event 10264225 8702233 10244125 Extfrag fragmenting 10263271 8701552 10243473 Extfrag fragmenting for unmovable 13595 17616 15960 Extfrag fragmenting unmovable placed with movable 7989 12193 8447 Extfrag fragmenting for reclaimable 658 1840 1817 Extfrag fragmenting reclaimable placed with movable 558 1677 1679 Extfrag fragmenting for movable 10249018 8682096 10225696 With Patch 1: 3.19-rc4 3.19-rc4 3.19-rc4 6-nothp-1 6-nothp-2 6-nothp-3 Page alloc extfrag event 11834954 9877523 9774860 Extfrag fragmenting 11833993 9876880 9774245 Extfrag fragmenting for unmovable 7342 16129 11712 Extfrag fragmenting unmovable placed with movable 4191 10547 6270 Extfrag fragmenting for reclaimable 373 1130 923 Extfrag fragmenting reclaimable placed with movable 302 906 738 Extfrag fragmenting for movable 11826278 9859621 9761610 With Patch 2: 3.19-rc4 3.19-rc4 3.19-rc4 7-nothp-1 7-nothp-2 7-nothp-3 Page alloc extfrag event 4725990 3668793 3807436 Extfrag fragmenting 4725104 3668252 3806898 Extfrag fragmenting for unmovable 6678 7974 7281 Extfrag fragmenting unmovable placed with movable 2051 3829 4017 Extfrag fragmenting for reclaimable 429 1208 1278 Extfrag fragmenting reclaimable placed with movable 369 976 1034 Extfrag fragmenting for movable 4717997 3659070 3798339 With Patch 3: 3.19-rc4 3.19-rc4 3.19-rc4 8-nothp-1 8-nothp-2 8-nothp-3 Page alloc extfrag event 5016183 4700142 3850633 Extfrag fragmenting 5015325 4699613 3850072 Extfrag fragmenting for unmovable 1312 3154 3088 Extfrag fragmenting unmovable placed with movable 1115 2777 2714 Extfrag fragmenting for reclaimable 437 1193 1097 Extfrag fragmenting reclaimable placed with movable 330 969 879 Extfrag fragmenting for movable 5013576 4695266 3845887 In v2 we've seen apparent regression with Patch 1 for unmovable events, this is now gone, suggesting it was indeed noise. Here, each patch improves the situation for unmovable events. Reclaimable is improved by patch 1 and then either the same modulo noise, or perhaps sligtly worse - a small price for unmovable improvements, IMHO. The number of movable allocations falling back to other migratetypes is most noisy, but it's reduced to half at Patch 2 nevertheless. These are least critical as compaction can move them around. If we look at success rates, the patches don't affect them, that didn't change. Baseline: 3.19-rc4 3.19-rc4 3.19-rc4 5-nothp-1 5-nothp-2 5-nothp-3 Success 1 Min 49.00 ( 0.00%) 42.00 ( 14.29%) 41.00 ( 16.33%) Success 1 Mean 51.00 ( 0.00%) 45.00 ( 11.76%) 42.60 ( 16.47%) Success 1 Max 55.00 ( 0.00%) 51.00 ( 7.27%) 46.00 ( 16.36%) Success 2 Min 53.00 ( 0.00%) 47.00 ( 11.32%) 44.00 ( 16.98%) Success 2 Mean 59.60 ( 0.00%) 50.80 ( 14.77%) 48.20 ( 19.13%) Success 2 Max 64.00 ( 0.00%) 56.00 ( 12.50%) 52.00 ( 18.75%) Success 3 Min 84.00 ( 0.00%) 82.00 ( 2.38%) 78.00 ( 7.14%) Success 3 Mean 85.60 ( 0.00%) 82.80 ( 3.27%) 79.40 ( 7.24%) Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%) Patch 1: 3.19-rc4 3.19-rc4 3.19-rc4 6-nothp-1 6-nothp-2 6-nothp-3 Success 1 Min 49.00 ( 0.00%) 44.00 ( 10.20%) 44.00 ( 10.20%) Success 1 Mean 51.80 ( 0.00%) 46.00 ( 11.20%) 45.80 ( 11.58%) Success 1 Max 54.00 ( 0.00%) 49.00 ( 9.26%) 49.00 ( 9.26%) Success 2 Min 58.00 ( 0.00%) 49.00 ( 15.52%) 48.00 ( 17.24%) Success 2 Mean 60.40 ( 0.00%) 51.80 ( 14.24%) 50.80 ( 15.89%) Success 2 Max 63.00 ( 0.00%) 54.00 ( 14.29%) 55.00 ( 12.70%) Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%) Success 3 Mean 85.00 ( 0.00%) 81.60 ( 4.00%) 79.80 ( 6.12%) Success 3 Max 86.00 ( 0.00%) 82.00 ( 4.65%) 82.00 ( 4.65%) Patch 2: 3.19-rc4 3.19-rc4 3.19-rc4 7-nothp-1 7-nothp-2 7-nothp-3 Success 1 Min 50.00 ( 0.00%) 44.00 ( 12.00%) 39.00 ( 22.00%) Success 1 Mean 52.80 ( 0.00%) 45.60 ( 13.64%) 42.40 ( 19.70%) Success 1 Max 55.00 ( 0.00%) 46.00 ( 16.36%) 47.00 ( 14.55%) Success 2 Min 52.00 ( 0.00%) 48.00 ( 7.69%) 45.00 ( 13.46%) Success 2 Mean 53.40 ( 0.00%) 49.80 ( 6.74%) 48.80 ( 8.61%) Success 2 Max 57.00 ( 0.00%) 52.00 ( 8.77%) 52.00 ( 8.77%) Success 3 Min 84.00 ( 0.00%) 81.00 ( 3.57%) 79.00 ( 5.95%) Success 3 Mean 85.00 ( 0.00%) 82.40 ( 3.06%) 79.60 ( 6.35%) Success 3 Max 86.00 ( 0.00%) 83.00 ( 3.49%) 80.00 ( 6.98%) Patch 3: 3.19-rc4 3.19-rc4 3.19-rc4 8-nothp-1 8-nothp-2 8-nothp-3 Success 1 Min 46.00 ( 0.00%) 44.00 ( 4.35%) 42.00 ( 8.70%) Success 1 Mean 50.20 ( 0.00%) 45.60 ( 9.16%) 44.00 ( 12.35%) Success 1 Max 52.00 ( 0.00%) 47.00 ( 9.62%) 47.00 ( 9.62%) Success 2 Min 53.00 ( 0.00%) 49.00 ( 7.55%) 48.00 ( 9.43%) Success 2 Mean 55.80 ( 0.00%) 50.60 ( 9.32%) 49.00 ( 12.19%) Success 2 Max 59.00 ( 0.00%) 52.00 ( 11.86%) 51.00 ( 13.56%) Success 3 Min 84.00 ( 0.00%) 80.00 ( 4.76%) 79.00 ( 5.95%) Success 3 Mean 85.40 ( 0.00%) 81.60 ( 4.45%) 80.40 ( 5.85%) Success 3 Max 87.00 ( 0.00%) 83.00 ( 4.60%) 82.00 ( 5.75%) While there's no improvement here, I consider reduced fragmentation events to be worth on its own. Patch 2 also seems to reduce scanning for free pages, and migrations in compaction, suggesting it has somewhat less work to do: Patch 1: Compaction stalls 4153 3959 3978 Compaction success 1523 1441 1446 Compaction failures 2630 2517 2531 Page migrate success 4600827 4943120 5104348 Page migrate failure 19763 16656 17806 Compaction pages isolated 9597640 10305617 |
||
Joonsoo Kim
|
24e2716f63 |
mm/compaction: add tracepoint to observe behaviour of compaction defer
Compaction deferring logic is heavy hammer that block the way to the compaction. It doesn't consider overall system state, so it could prevent user from doing compaction falsely. In other words, even if system has enough range of memory to compact, compaction would be skipped due to compaction deferring logic. This patch add new tracepoint to understand work of deferring logic. This will also help to check compaction success and fail. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Joonsoo Kim
|
837d026d56 |
mm/compaction: more trace to understand when/why compaction start/finish
It is not well analyzed that when/why compaction start/finish or not. With these new tracepoints, we can know much more about start/finish reason of compaction. I can find following bug with these tracepoint. http://www.spinics.net/lists/linux-mm/msg81582.html Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Joonsoo Kim
|
e34d85f0e3 |
mm/compaction: print current range where compaction work
It'd be useful to know current range where compaction work for detailed analysis. With it, we can know pageblock where we actually scan and isolate, and, how much pages we try in that pageblock and can guess why it doesn't become freepage with pageblock order roughly. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Joonsoo Kim
|
16c4a097a0 |
mm/compaction: enhance tracepoint output for compaction begin/end
We now have tracepoint for begin event of compaction and it prints start position of both scanners, but, tracepoint for end event of compaction doesn't print finish position of both scanners. It'd be also useful to know finish position of both scanners so this patch add it. It will help to find odd behavior or problem on compaction internal logic. And mode is added to both begin/end tracepoint output, since according to mode, compaction behavior is quite different. And lastly, status format is changed to string rather than status number for readability. [akpm@linux-foundation.org: fix sparse warning] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Joonsoo Kim
|
4645f06334 |
mm/compaction: change tracepoint format from decimal to hexadecimal
To check the range that compaction is working, tracepoint print start/end pfn of zone and start pfn of both scanner with decimal format. Since we manage all pages in order of 2 and it is well represented by hexadecimal, this patch change the tracepoint format from decimal to hexadecimal. This would improve readability. For example, it makes us easily notice whether current scanner try to compact previously attempted pageblock or not. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Jaegeuk Kim
|
29e7043f40 |
f2fs: fix sparse warnings
This patch resolves the following warnings. include/trace/events/f2fs.h:150:1: warning: expression using sizeof bool include/trace/events/f2fs.h:180:1: warning: expression using sizeof bool include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool include/trace/events/f2fs.h:150:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) include/trace/events/f2fs.h:180:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) fs/f2fs/checkpoint.c:27:19: warning: symbol 'inode_entry_slab' was not declared. Should it be static? fs/f2fs/checkpoint.c:577:15: warning: cast to restricted __le32 fs/f2fs/checkpoint.c:592:15: warning: cast to restricted __le32 fs/f2fs/trace.c:19:1: warning: symbol 'pids' was not declared. Should it be static? fs/f2fs/trace.c:21:21: warning: symbol 'last_io' was not declared. Should it be static? Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Jaegeuk Kim
|
119ee91445 |
f2fs: split UMOUNT and FASTBOOT flags
This patch adds FASTBOOT flag into checkpoint as follows. - CP_UMOUNT_FLAG is set when system is umounted. - CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries was done. So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly. Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Chao Yu
|
caf0047e7e |
f2fs: merge flags in struct f2fs_sb_info
Currently, there are several variables with Boolean type as below: struct f2fs_sb_info { ... int s_dirty; bool need_fsck; bool s_closing; ... bool por_doing; ... } For this there are some issues: 1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean type variables by compiler. 2. if we continuously add new flag into f2fs_sb_info, structure will be messed up. So in this patch, we try to: 1. switch s_dirty to Boolean type variable since it has two status 0/1. 2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag. 3. introduce an enum type which can indicate different states of sbi. 4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag to operate flags for sbi. After that, above issues will be fixed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> |
||
Linus Torvalds
|
c5ce28df0e |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller: 1) More iov_iter conversion work from Al Viro. [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was wrong, and this pull actually adds an extra commit on top of the branch I'm pulling to fix that up, so that the pre-merge state is ok. - Linus ] 2) Various optimizations to the ipv4 forwarding information base trie lookup implementation. From Alexander Duyck. 3) Remove sock_iocb altogether, from CHristoph Hellwig. 4) Allow congestion control algorithm selection via routing metrics. From Daniel Borkmann. 5) Make ipv4 uncached route list per-cpu, from Eric Dumazet. 6) Handle rfs hash collisions more gracefully, also from Eric Dumazet. 7) Add xmit_more support to r8169, e1000, and e1000e drivers. From Florian Westphal. 8) Transparent Ethernet Bridging support for GRO, from Jesse Gross. 9) Add BPF packet actions to packet scheduler, from Jiri Pirko. 10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer. 11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman Kwok. 12) More sanely handle out-of-window dupacks, which can result in serious ACK storms. From Neal Cardwell. 13) Various rhashtable bug fixes and enhancements, from Herbert Xu, Patrick McHardy, and Thomas Graf. 14) Support xmit_more in be2net, from Sathya Perla. 15) Group Policy extensions for vxlan, from Thomas Graf. 16) Remove Checksum Offload support for vxlan, from Tom Herbert. 17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From Vlad Yasevich. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits) crypto: fix af_alg_make_sg() conversion to iov_iter ipv4: Namespecify TCP PMTU mechanism i40e: Fix for stats init function call in Rx setup tcp: don't include Fast Open option in SYN-ACK on pure SYN-data openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set ipv6: Make __ipv6_select_ident static ipv6: Fix fragment id assignment on LE arches. bridge: Fix inability to add non-vlan fdb entry net: Mellanox: Delete unnecessary checks before the function call "vunmap" cxgb4: Add support in cxgb4 to get expansion rom version via ethtool ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version net: dsa: Remove redundant phy_attach() IB/mlx4: Reset flow support for IB kernel ULPs IB/mlx4: Always use the correct port for mirrored multicast attachments net/bonding: Fix potential bad memory access during bonding events tipc: remove tipc_snprintf tipc: nl compat add noop and remove legacy nl framework tipc: convert legacy nl stats show to nl compat tipc: convert legacy nl net id get to nl compat tipc: convert legacy nl net id set to nl compat ... |
||
Linus Torvalds
|
a4cbbf549a |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar: "Kernel side changes: - AMD range breakpoints support: Extend breakpoint tools and core to support address range through perf event with initial backend support for AMD extended breakpoints. The syntax is: perf record -e mem:addr/len:type For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512) perf record -e mem:0x1000/512:w - event throttling/rotating fixes - various event group handling fixes, cleanups and general paranoia code to be more robust against bugs in the future. - kernel stack overhead fixes User-visible tooling side changes: - Show precise number of samples in at the end of a 'record' session, if processing build ids, since we will then traverse the whole perf.data file and see all the PERF_RECORD_SAMPLE records, otherwise stop showing the previous off-base heuristicly counted number of "samples" (Namhyung Kim). - Support to read compressed module from build-id cache (Namhyung Kim) - Enable sampling loads and stores simultaneously in 'perf mem' (Stephane Eranian) - 'perf diff' output improvements (Namhyung Kim) - Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho de Melo) Tooling side infrastructure changes: - Cache eh/debug frame offset for dwarf unwind (Namhyung Kim) - Support parsing parameterized events (Cody P Schafer) - Add support for IP address formats in libtraceevent (David Ahern) Plus other misc fixes" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits) perf: Decouple unthrottling and rotating perf: Drop module reference on event init failure perf: Use POLLIN instead of POLL_IN for perf poll data in flag perf: Fix put_event() ctx lock perf: Fix move_group() order perf: Fix event->ctx locking perf: Add a bit of paranoia perf symbols: Convert lseek + read to pread perf tools: Use perf_data_file__fd() consistently perf symbols: Support to read compressed module from build-id cache perf evsel: Set attr.task bit for a tracking event perf header: Set header version correctly perf record: Show precise number of samples perf tools: Do not use __perf_session__process_events() directly perf callchain: Cache eh/debug frame offset for dwarf unwind perf tools: Provide stub for missing pthread_attr_setaffinity_np perf evsel: Don't rely on malloc working for sz 0 tools lib traceevent: Add support for IP address formats perf ui/tui: Show fatal error message only if exists perf tests: Fix typo in sample-parsing.c ... |
||
Linus Torvalds
|
4e02370f64 |
During testing Sedat Dilek hit a "suspicious RCU usage" splat that pointed
out a real bug. During suspend and resume the tlb_flush tracepoint is called when the CPU is going offline. As the CPU has been noted as offline, RCU is ignoring that CPU, which means that it can not use RCU protected locks. When tracepoints are activated, they require RCU locking, and if RCU is ignoring a CPU that runs a tracepoint, there is a chance that the tracepoint could cause corruption. The solution was to change the tracepoint into a TRACE_EVENT_CONDITION() which allows us to check a condition to determine if the tracepoint should be called or not. If the condition is not met, the rcu protected code will not be executed. By adding the condition "cpu_online(smp_processor_id())", this will prevent the RCU protected code from being executed if the CPU is marked offline. After adding this, another bug was discovered. As RCU checks rcu callers, if a rcu call is not done, there is no check (obviously). We found that tracepoints could be added in RCU ignored locations and not have lockdep complain until the tracepoint is activated. This missed places where tracepoints were added in places they should not have been. To fix this, code was added in 3.18 that if lockdep is enabled, any tracepoint will still call the rcu checks even if the tracepoint is not enabled. The bug here, is that the check does not take the CONDITION into account. As the condition may prevent tracepoints from being activated in RCU ignored areas (as the one patch does), we get false positives when we enable lockdep and hit a tracepoint that the condition prevents it from being called in a RCU ignored location. The fix for this is to add the CONDITION to the rcu checks, even if the tracepoint is not enabled. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJU1rQfAAoJEEjnJuOKh9ld19UH/juFLZFjpYBgtRmbCZa/54Zk i2Fa3U8jQe8MHHEYOjCLT9MQTYo/42btmJhr7kWKtIoUgDEli4lkOpbs+H0qar5y Vv9+1cLeNFQzgIE3nwV7cjAw7Jufoyzd1lstDqIQvcmzZnQ5sNyyVeigMcxGv8Ls 4FyqzG6zCVgiDL4LyYNHdNcMr6qLs3KTFDEqp+kQreeO7R1r3ZEpq3JoWaEUgoPP qrYv/rqVosLBUGA0pd7RmiGOxhjeKm15qz1GkiPeeus6DDWC6bvPC8cAc/FfkXH0 hYpoQghSZVnXGy0LzVsd44gj7tYx1FHEpYy1s8G6d5WJcNOGZ6OoZOdOZMyjPVw= =PeL2 -----END PGP SIGNATURE----- Merge tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fixes from Steven Rostedt: "During testing Sedat Dilek hit a "suspicious RCU usage" splat that pointed out a real bug. During suspend and resume the tlb_flush tracepoint is called when the CPU is going offline. As the CPU has been noted as offline, RCU is ignoring that CPU, which means that it can not use RCU protected locks. When tracepoints are activated, they require RCU locking, and if RCU is ignoring a CPU that runs a tracepoint, there is a chance that the tracepoint could cause corruption. The solution was to change the tracepoint into a TRACE_EVENT_CONDITION() which allows us to check a condition to determine if the tracepoint should be called or not. If the condition is not met, the rcu protected code will not be executed. By adding the condition "cpu_online(smp_processor_id())", this will prevent the RCU protected code from being executed if the CPU is marked offline. After adding this, another bug was discovered. As RCU checks rcu callers, if a rcu call is not done, there is no check (obviously). We found that tracepoints could be added in RCU ignored locations and not have lockdep complain until the tracepoint is activated. This missed places where tracepoints were added in places they should not have been. To fix this, code was added in 3.18 that if lockdep is enabled, any tracepoint will still call the rcu checks even if the tracepoint is not enabled. The bug here, is that the check does not take the CONDITION into account. As the condition may prevent tracepoints from being activated in RCU ignored areas (as the one patch does), we get false positives when we enable lockdep and hit a tracepoint that the condition prevents it from being called in a RCU ignored location. The fix for this is to add the CONDITION to the rcu checks, even if the tracepoint is not enabled" * tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: x86/tlb/trace: Do not trace on CPU that is offline tracing: Add condition check to RCU lockdep checks |
||
Steven Rostedt (Red Hat)
|
6c8465a82a |
x86/tlb/trace: Do not trace on CPU that is offline
When taking a CPU down for suspend and resume, a tracepoint may be called
when the CPU has been designated offline. As tracepoints require RCU for
protection, they must not be called if the current CPU is offline.
Unfortunately, trace_tlb_flush() is called in this scenario as was noted
by LOCKDEP:
...
Disabling non-boot CPUs ...
intel_pstate CPU 1 exiting
===============================
smpboot: CPU 1 didn't die...
[ INFO: suspicious RCU usage. ]
3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
-------------------------------
include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 0
no locks held by swapper/1/0.
stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011
ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600
0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78
Call Trace:
[<ffffffff817e370d>] dump_stack+0x4c/0x65
[<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0
[<ffffffff81054c4e>] play_dead_common+0xe/0x50
[<ffffffff81054ca5>] native_play_dead+0x15/0x140
[<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
[<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580
[<ffffffff81053e20>] start_secondary+0x140/0x150
intel_pstate CPU 2 exiting
...
By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the
condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected
code when the CPU is offline.
Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
Cc: stable@vger.kernel.org # 3.17+
Fixes:
|
||
Paolo Bonzini
|
f781951299 |
kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |