Currently the fs sync function (super.c:btrfs_sync_fs()) doesn't
wait for delayed work to finish before returning success to the
caller. This change fixes this, ensuring that there's no data loss
if a power failure happens right after fs sync returns success to
the caller and before the next commit happens.
Steps to reproduce the data loss issue:
$ mkfs.btrfs -f /dev/sdb3
$ mount /dev/sdb3 /mnt/btrfs
$ perl -e '$d = ("\x41" x 6001); open($f,">","/mnt/btrfs/foobar"); print $f $d; close($f);' && btrfs fi sync /mnt/btrfs
Right after the btrfs fi sync command (a second or 2 for example), power
off the machine and reboot it. The file will be empty, as it can be verified
after mounting the filesystem and through btrfs-debug-tree:
$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 7 transid 7 size 0 block group 0 mode 100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29429760 items 0 free space 3995 generation 7 owner 7
fs uuid 6192815c-af2a-4b75-b3db-a959ffb6166e
chunk uuid b529c44b-938c-4d3d-910a-013b4700bcae
uuid tree key (UUID_TREE ROOT_ITEM 0)
After this patch, the data loss no longer happens after a power failure and
btrfs-debug-tree shows:
$ btrfs-debug-tree /dev/sdb3 | egrep '\(257 INODE_ITEM 0\) itemoff' -B 3 -A 8
item 3 key (256 DIR_INDEX 2) itemoff 3751 itemsize 36
location key (257 INODE_ITEM 0) type FILE
namelen 6 datalen 0 name: foobar
item 4 key (257 INODE_ITEM 0) itemoff 3591 itemsize 160
inode generation 6 transid 6 size 6001 block group 0 mode 100644 links 1
item 5 key (257 INODE_REF 256) itemoff 3575 itemsize 16
inode ref index 2 namelen 6 name: foobar
item 6 key (257 EXTENT_DATA 0) itemoff 3522 itemsize 53
extent data disk byte 12845056 nr 8192
extent data offset 0 nr 8192 ram 8192
extent compression 0
checksum tree key (CSUM_TREE ROOT_ITEM 0)
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
In inode.c:btrfs_orphan_add() if we failed to insert the orphan
item, we would return without decrementing the orphan count that
we just incremented before attempting the insertion, leaving the
orphan inode count wrong.
In inode.c:btrfs_orphan_del(), we were decrementing the inode
orphan count if the bit BTRFS_INODE_ORPHAN_META_RESERVED was set,
which is logically wrong because it should be decremented if the
bit BTRFS_INODE_HAS_ORPHAN_ITEM was set - after all we increment
the count when we set the bit BTRFS_INODE_HAS_ORPHAN_ITEM elsewhere.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Similar to ocfs2, btrfs also supports that extents can be shared by
different inodes, and there are some userspace tools requesting
for this kind of 'space shared infomation'.[1]
ocfs2 uses flag FIEMAP_EXTENT_SHARED, so does btrfs.
[1]: http://thr3ads.net/ocfs2-devel/2010/09/489052-PATCH-3-3-shared-du-using-fiemap-to-figure-up-the-shared-extents-per-file-and-the-footprint-in
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Not used for anything, and removing it avoids caller's need to
allocate a path structure.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
We're doing a unnecessary extra lookup of the ino cache's
inode when we already have it (and holding a reference)
during the process of saving the ino cache contents to disk.
Therefore remove this extra lookup.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
While running some snashot aware defrag tests I noticed I was panicing every
once and a while in key_search. This is because of the optimization that says
if we find a key at slot 0 it will be at slot 0 all the way down the rest of the
tree. This isn't the case for btrfs_search_old_slot since it will likely replay
changes to a buffer if something has changed since we took our sequence number.
So short circuit this optimization by setting prev_cmp to -1 every time we call
key_search so we will do our normal binary search. With this patch I am no
longer seeing the panics I was seeing before. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
While looking at somebodys corruption I became completely convinced that
btrfs_split_item was broken, so I wrote this test to verify that it was working
as it was supposed to. Thankfully it appears to be working as intended, so just
add this test to make sure nobody breaks it in the future. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Remove unused eb parameter from btrfs_item_nr
Signed-off-by: Ross Kirk <ross.kirk@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
It is not necessary to store the NULL byte in a symlink inline file
extent. There's currently no code that requires the NULL byte to be
present in the extent. This change also doesn't break file format
compatibility nor the send/receive feature.
The VFS also doesn't need the NULL byte to be present in the extent,
as it reads up to inode->i_size bytes (which already excluded the NULL
byte) and sets the NULL byte for us (in fs/namei.c:page_getlink()).
So with this change we save 1 byte per symlink file extent (which is
always inlined in the btree leaf) without losing backward and forward
compatibility.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The fact that btrfs_root_refs() returned 0 for the tree_root caused
bugs in the past, therefore it is set to 1 with this patch and
(hopefully) all affected code is adapted to this change.
I verified this change by temporarily adding WARN_ON() checks
everywhere where btrfs_root_refs() is used, checking whether the
logic of the code is changed by btrfs_root_refs() returning 1
instead of 0 for root->root_key.objectid == BTRFS_ROOT_TREE_OBJECTID.
With these added checks, I ran the xfstests './check -g auto'.
The two roots chunk_root and log_root_tree that are only referenced
by the superblock and the log_roots below the log_root_tree still
have btrfs_root_refs() == 0, only the tree_root is changed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Pull x86 uaccess changes from Ingo Molnar:
"A single change that micro-optimizes __copy_*_user_inatomic(), used by
the futex code"
* 'x86-uaccess-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Add 1/2/4/8 byte optimization to 64bit __copy_{from,to}_user_inatomic
Pull x86 reboot changes from Ingo Molnar:
"Misc changes - the only one with functional impact should be commit
16c21ae5ca ("reboot: Allow specifying warm/cold reset for CF9 boot
type") which extends cold/warm reboot handling to the 0xCF9 reboot
method"
* 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/reboot: Correct pr_info() log message in the set_bios/pci/kbd_reboot()
x86/reboot: Sort reboot DMI quirks by vendor
x86/reboot: Remove the duplicate C6100 entry in the reboot quirks list
reboot: Allow specifying warm/cold reset for CF9 boot type
Pull x86 platform fixlet from Ingo Molnar:
"A single __initdata fix"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/geode: Fix incorrect placement of __initdata tag
Pull x86 mm fixlet from Ingo Molnar:
"One cleanup that documents a particular detail in init_mem_mapping()"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Add 'step_size' comments to init_mem_mapping()
Pull x86 RAS changes from Ingo Molnar:
"The biggest change adds support for Intel 'CPER' (UEFI Common Platform
Error Record) error logging, which builds upon an enhanced error
logging mechanism available on Xeon processors.
Full description is here:
http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
This change provides a module (and support code) to check for an
extended error log and prints extra details about the error on the
console"
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ACPI, x86: Fix extended error log driver to depend on CONFIG_X86_LOCAL_APIC
dmi: Avoid unaligned memory access in save_mem_devices()
Move cper.c from drivers/acpi/apei to drivers/firmware/efi
EDAC, GHES: Update ghes error record info
ACPI, APEI, CPER: Cleanup CPER memory error output format
ACPI, APEI, CPER: Enhance memory reporting capability
ACPI, APEI, CPER: Add UEFI 2.4 support for memory error
DMI: Parse memory device (type 17) in SMBIOS
ACPI, x86: Extended error log driver for x86 platform
bitops: Introduce a more generic BITMASK macro
ACPI, CPER: Update cper info
ACPI, APEI, CPER: Fix status check during error printing
Pull x86 iommu changes from Ingo Molnar:
"Make it easier to turn off the old AMD GART code"
* 'x86-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/iommu: Clean up the CONFIG_GART_IOMMU config option a bit
x86/iommu: Don't make AMD_GART depend on EXPERT and default y
Pull x86/intel-mid changes from Ingo Molnar:
"Update the 'intel mid' (mobile internet device) platform code as Intel
is rolling out more SoC designs.
This gets rid of most of the 'MRST' platform code in the process,
mostly by renaming and shuffling code around into their respective
'intel-mid' platform drivers"
* 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, intel-mid: Do not re-introduce usage of obsolete __cpuinit
intel_mid: Move platform device setups to their own platform_<device>.* files
x86: intel-mid: Add section for sfi device table
intel-mid: sfi: Allow struct devs_id.get_platform_data to be NULL
intel_mid: Moved SFI related code to sfi.c
intel_mid: Added custom handler for ipc devices
intel_mid: Added custom device_handler support
intel_mid: Refactored sfi_parse_devs() function
intel_mid: Renamed *mrst* to *intel_mid*
pci: intel_mid: Return true/false in function returning bool
intel_mid: Renamed *mrst* to *intel_mid*
mrst: Fixed indentation issues
mrst: Fixed printk/pr_* related issues
Pull x86/hyperv changes from Ingo Molnar:
"These changes enable Linux guests to boot as 'Modern VM' guest kernels
on MS-Hyperv hosts"
* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, hyperv: Move a variable to avoid an unused variable warning
x86, hyperv: Fix build error due to missing <asm/apic.h> include
x86, hyperv: Correctly guard the local APIC calibration code
x86, hyperv: Get the local APIC timer frequency from the hypervisor
Pull x86 EFI changes from Ingo Molnar:
"Main changes:
- Add support for earlyprintk=efi which uses the EFI framebuffer.
Very useful for debugging boot problems.
- EFI stub support for large memory maps (more than 128 entries)
- EFI ARM support - this was mostly done by generalizing x86 <-> ARM
platform differences, such as by moving x86 EFI code into
drivers/firmware/efi/ and sharing it with ARM.
- Documentation updates
- misc fixes"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
x86/efi: Add EFI framebuffer earlyprintk support
boot, efi: Remove redundant memset()
x86/efi: Fix config_table_type array termination
x86 efi: bugfix interrupt disabling sequence
x86: EFI stub support for large memory maps
efi: resolve warnings found on ARM compile
efi: Fix types in EFI calls to match EFI function definitions.
efi: Renames in handle_cmdline_files() to complete generalization.
efi: Generalize handle_ramdisks() and rename to handle_cmdline_files().
efi: Allow efi_free() to be called with size of 0
efi: use efi_get_memory_map() to get final map for x86
efi: generalize efi_get_memory_map()
efi: Rename __get_map() to efi_get_memory_map()
efi: Move unicode to ASCII conversion to shared function.
efi: Generalize relocate_kernel() for use by other architectures.
efi: Move relocate_kernel() to shared file.
efi: Enforce minimum alignment of 1 page on allocations.
efi: Rename memory allocation/free functions
efi: Add system table pointer argument to shared functions.
efi: Move common EFI stub code from x86 arch code to common location
...
Pull x86 cpu changes from Ingo Molnar:
"The biggest change that stands out is the increase of the
CONFIG_NR_CPUS range from 4096 to 8192 - as real hardware out there
already went beyond 4k CPUs ...
We only allow more than 512 CPUs if offstack cpumasks are enabled.
CONFIG_MAXSMP=y remains to be the 'you are nuts!' extreme testcase,
which now means a max of 8192 CPUs"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Increase max CPU count to 8192
x86/cpu: Allow higher NR_CPUS values
x86/cpu: Always print SMP information in /proc/cpuinfo
x86/cpu: Track legacy CPU model data only on 32-bit kernels
Pull x86 cleanups from Ingo Molnar:
"Two small cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, msr: Use file_inode(), not f_mapping->host
x86: mkpiggy.c: Explicitly close the output file
Pull x86 build changes from Ingo Molnar:
"Two small changes"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, defconfig: Add DEVTMPFS and DEVTMPFS_MOUNT to *86*_defconfig
x86, build: move build output statistics away from stderr
Pull x86 user access changes from Ingo Molnar:
"This tree contains two copy_[from/to]_user() build time checking
changes/enhancements from Jan Beulich.
The desired outcome is to get better compiler warnings with
CONFIG_DEBUG_STRICT_USER_COPY_CHECKS=y, to keep people from
introducing bugs such as overflows and information leaks"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Unify copy_to_user() and add size checking to it
x86: Unify copy_from_user() size checking
Pull x86/apic fix from Ingo Molnar:
"A single fix to the IO-APIC / local-APIC shutdown sequence"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Disable I/O APIC before shutdown of the local APIC
Pull timer changes from Ingo Molnar:
"Main changes in this cycle were:
- Updated full dynticks support.
- Event stream support for architected (ARM) timers.
- ARM clocksource driver updates.
- Move arm64 to using the generic sched_clock framework & resulting
cleanup in the generic sched_clock code.
- Misc fixes and cleanups"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC
clocksource: sun4i: remove IRQF_DISABLED
clocksource: sun4i: Report the minimum tick that we can program
clocksource: sun4i: Select CLKSRC_MMIO
clocksource: Provide timekeeping for efm32 SoCs
clocksource: em_sti: convert to clk_prepare/unprepare
time: Fix signedness bug in sysfs_get_uname() and its callers
timekeeping: Fix some trivial typos in comments
alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist
clocksource: arch_timer: Do not register arch_sys_counter twice
timer stats: Add a 'Collection: active/inactive' line to timer usage statistics
sched_clock: Remove sched_clock_func() hook
arch_timer: Move to generic sched_clock framework
clocksource: tcb_clksrc: Remove IRQF_DISABLED
clocksource: tcb_clksrc: Improve driver robustness
clocksource: tcb_clksrc: Replace clk_enable/disable with clk_prepare_enable/disable_unprepare
clocksource: arm_arch_timer: Use clocksource for suspend timekeeping
clocksource: dw_apb_timer_of: Mark a few more functions as __init
clocksource: Put nodes passed to CLOCKSOURCE_OF_DECLARE callbacks centrally
arm: zynq: Enable arm_global_timer
...
Pull scheduler changes from Ingo Molnar:
"The main changes in this cycle are:
- (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik
van Riel, Peter Zijlstra et al. Yay!
- optimize preemption counter handling: merge the NEED_RESCHED flag
into the preempt_count variable, by Peter Zijlstra.
- wait.h fixes and code reorganization from Peter Zijlstra
- cfs_bandwidth fixes from Ben Segall
- SMP load-balancer cleanups from Peter Zijstra
- idle balancer improvements from Jason Low
- other fixes and cleanups"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED
stop_machine: Fix race between stop_two_cpus() and stop_cpus()
sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus
sched: Fix asymmetric scheduling for POWER7
sched: Move completion code from core.c to completion.c
sched: Move wait code from core.c to wait.c
sched: Move wait.c into kernel/sched/
sched/wait: Fix __wait_event_interruptible_lock_irq_timeout()
sched: Avoid throttle_cfs_rq() racing with period_timer stopping
sched: Guarantee new group-entities always have weight
sched: Fix hrtimer_cancel()/rq->lock deadlock
sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining
sched: Fix race on toggling cfs_bandwidth_used
sched: Remove extra put_online_cpus() inside sched_setaffinity()
sched/rt: Fix task_tick_rt() comment
sched/wait: Fix build breakage
sched/wait: Introduce prepare_to_wait_event()
sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too
sched: Remove get_online_cpus() usage
sched: Fix race in migrate_swap_stop()
...
Pull perf updates from Ingo Molnar:
"As a first remark I'd like to note that the way to build perf tooling
has been simplified and sped up, in the future it should be enough for
you to build perf via:
cd tools/perf/
make install
(ie without the -j option.) The build system will figure out the
number of CPUs and will do a parallel build+install.
The various build system inefficiencies and breakages Linus reported
against the v3.12 pull request should now be resolved - please
(re-)report any remaining annoyances or bugs.
Main changes on the perf kernel side:
* Performance optimizations:
. perf ring-buffer code optimizations, by Peter Zijlstra
. perf ring-buffer code optimizations, by Oleg Nesterov
. x86 NMI call-stack processing optimizations, by Peter Zijlstra
. perf context-switch optimizations, by Peter Zijlstra
. perf sampling speedups, by Peter Zijlstra
. x86 Intel PEBS processing speedups, by Peter Zijlstra
* Enhanced hardware support:
. for Intel Ivy Bridge-EP uncore PMUs, by Zheng Yan
. for Haswell transactions, by Andi Kleen, Peter Zijlstra
* Core perf events code enhancements and fixes by Oleg Nesterov:
. for uprobes, if fork() is called with pending ret-probes
. for uprobes platform support code
* New ABI details by Andi Kleen:
. Report x86 Haswell TSX transaction abort cost as weight
Main changes on the perf tooling side (some of these tooling changes
utilize the above kernel side changes):
* 'perf report/top' enhancements:
. Convert callchain children list to rbtree, greatly reducing the
time taken for callchain processing, from Namhyung Kim.
. Add new COMM infrastructure, further improving histogram
processing, from Frédéric Weisbecker, one fix from Namhyung Kim.
. Add /proc/kcore based live-annotation improvements, including
build-id cache support, multi map 'call' instruction navigation
fixes, kcore address validation, objdump workarounds. From
Adrian Hunter.
. Show progress on histogram collapsing, that can take a long
time, from Namhyung Kim.
. Add --max-stack option to limit callchain stack scan in 'top'
and 'report', improving callchain processing when reducing the
stack depth is an option, from Waiman Long.
. Add new option --ignore-vmlinux for perf top, from Willy
Tarreau.
* 'perf trace' enhancements:
. 'perf trace' now can can use a 'perf probe' dynamic tracepoints
to hook into the userspace -> kernel pathname copy so that it
can map fds to pathnames without reading /proc/pid/fd/ symlinks.
From Arnaldo Carvalho de Melo.
. Show VFS path associated with fd in live sessions, using a
'vfs_getname' 'perf probe' created dynamic tracepoint or by
looking at /proc/pid/fd, from Arnaldo Carvalho de Melo.
. Add 'trace' beautifiers for lots of syscall arguments, from
Arnaldo Carvalho de Melo.
. Implement more compact 'trace' output by suppressing zeroed
args, from Arnaldo Carvalho de Melo.
. Show thread COMM by default in 'trace', from Arnaldo Carvalho de
Melo.
. Add option to show full timestamp in 'trace', from David Ahern.
. Add 'record' command in 'trace', to record raw_syscalls:*, from
David Ahern.
. Add summary option to dump syscall statistics in 'trace', from
David Ahern.
. Improve error messages in 'trace', providing hints about system
configuration steps needed for using it, from Ramkumar
Ramachandra.
. 'perf trace' now emits hints as to why tracing is not possible,
helping the user to setup the system to allow tracing in the
desired permission granularity, telling if the problem is due to
debugfs not being mounted or with not enough permission for
!root, /proc/sys/kernel/perf_event_paranoit value, etc. From
Arnaldo Carvalho de Melo.
* 'perf record' enhancements:
. Check maximum frequency rate for record/top, emitting better
error messages, from Jiri Olsa.
. 'perf record' code cleanups, from David Ahern.
. Improve write_output error message in 'perf record', from Adrian
Hunter.
. Allow specifying B/K/M/G unit to the --mmap-pages arguments,
from Jiri Olsa.
. Fix command line callchain attribute tests to handle the new
-g/--call-chain semantics, from Arnaldo Carvalho de Melo.
* 'perf kvm' enhancements:
. Disable live kvm command if timerfd is not supported, from David
Ahern.
. Fix detection of non-core features, from David Ahern.
* 'perf list' enhancements:
. Add usage to 'perf list', from David Ahern.
. Show error in 'perf list' if tracepoints not available, from
Pekka Enberg.
* 'perf probe' enhancements:
. Support "$vars" meta argument syntax for local variables,
allowing asking for all possible variables at a given probe
point to be collected when it hits, from Masami Hiramatsu.
* 'perf sched' enhancements:
. Address the root cause of that 'perf sched' stack initialization
build slowdown, by programmatically setting a big array after
moving the global variable back to the stack. Fix from Adrian
Hunter.
* 'perf script' enhancements:
. Set up output options for in-stream attributes, from Adrian
Hunter.
. Print addr by default for BTS in 'perf script', from Adrian
Juntmer
* 'perf stat' enhancements:
. Improved messages when doing profiling in all or a subset of
CPUs using a workload as the session delimitator, as in:
'perf stat --cpu 0,2 sleep 10s'
from Arnaldo Carvalho de Melo.
. Add units to nanosec-based counters in 'perf stat', from David
Ahern.
. Remove bogus info when using 'perf stat' -e cycles/instructions,
from Ramkumar Ramachandra.
* 'perf lock' enhancements:
. 'perf lock' fixes and cleanups, from Davidlohr Bueso.
* 'perf test' enhancements:
. Fixup PERF_SAMPLE_TRANSACTION handling in sample synthesizing
and 'perf test', from Adrian Hunter.
. Clarify the "sample parsing" test entry, from Arnaldo Carvalho
de Melo.
. Consider PERF_SAMPLE_TRANSACTION in the "sample parsing" test,
from Arnaldo Carvalho de Melo.
. Memory leak fixes in 'perf test', from Felipe Pena.
* 'perf bench' enhancements:
. Change the procps visible command-name of invididual benchmark
tests plus cleanups, from Ingo Molnar.
* Generic perf tooling infrastructure/plumbing changes:
. Separating data file properties from session, code
reorganization from Jiri Olsa.
. Fix version when building out of tree, as when using one of
these:
$ make help | grep perf
perf-tar-src-pkg - Build perf-3.12.0.tar source tarball
perf-targz-src-pkg - Build perf-3.12.0.tar.gz source tarball
perf-tarbz2-src-pkg - Build perf-3.12.0.tar.bz2 source tarball
perf-tarxz-src-pkg - Build perf-3.12.0.tar.xz source tarball
$
from David Ahern.
. Enhance option parse error message, showing just the help lines
of the options affected, from Namhyung Kim.
. libtraceevent updates from upstream trace-cmd repo, from Steven
Rostedt.
. Always use perf_evsel__set_sample_bit to set sample_type, from
Adrian Hunter.
. Memory and mmap leak fixes from Chenggang Qin.
. Assorted build fixes for from David Ahern and Jiri Olsa.
. Speed up and prettify the build system, from Ingo Molnar.
. Implement addr2line directly using libbfd, from Roberto Vitillo.
. Separate the GTK support in a separate libperf-gtk.so DSO, that
is only loaded when --gtk is specified, from Namhyung Kim.
. perf bash completion fixes and improvements from Ramkumar
Ramachandra.
. Support for Openembedded/Yocto -dbg packages, from Ricardo
Ribalda Delgado.
And lots and lots of other fixes and code reorganizations that did not
make it into the list, see the shortlog, diffstat and the Git log for
details!"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (300 commits)
uprobes: Fix the memory out of bound overwrite in copy_insn()
uprobes: Fix the wrong usage of current->utask in uprobe_copy_process()
perf tools: Remove unneeded include
perf record: Remove post_processing_offset variable
perf record: Remove advance_output function
perf record: Refactor feature handling into a separate function
perf trace: Don't relookup fields by name in each sample
perf tools: Fix version when building out of tree
perf evsel: Ditch evsel->handler.data field
uprobes: Export write_opcode() as uprobe_write_opcode()
uprobes: Introduce arch_uprobe->ixol
uprobes: Kill module_init() and module_exit()
uprobes: Move function declarations out of arch
perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support
perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxes
perf: Factor out strncpy() in perf_event_mmap_event()
tools/perf: Add required memory barriers
perf: Fix arch_perf_out_copy_user default
perf: Update a stale comment
perf: Optimize perf_output_begin() -- address calculation
...
Pull leftover IRQ fixes from Ingo Molnar:
"Two (minor) fixlets that missed v3.12"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Set the irq thread policy without checking CAP_SYS_NICE
irq: DocBook/genericirq.tmpl: Correct various typos
Pull IRQ changes from Ingo Molnar:
"The biggest change this cycle are the softirq/hardirq stack
interaction and nesting fixes, cleanups and reorganizations from
Frederic. This is the longer followup story to the softirq nesting
fix that is already upstream (commit ded7975475: "irq: Force hardirq
exit's softirq processing on its own stack")"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip: bcm2835: Convert to use IRQCHIP_DECLARE macro
powerpc: Tell about irq stack coverage
x86: Tell about irq stack coverage
irq: Optimize softirq stack selection in irq exit
irq: Justify the various softirq stack choices
irq: Improve a bit softirq debugging
irq: Optimize call to softirq on hardirq exit
irq: Consolidate do_softirq() arch overriden implementations
x86/irq: Correct comment about i8259 initialization
Pull RCU updates from Ingo Molnar:
"The main RCU changes in this cycle are:
- Idle entry/exit changes, to throttle callback execution and other
refinements to speed up kbuild, primarily to address performance
issues located by Tibor Billes.
- Grace-period related changes, primarily to aid in debugging,
inspired by an -rt debugging session.
- Code reorganization moving RCU's source files into its own
kernel/rcu/ directory.
- RCU documentation updates
- Miscellaneous fixes.
Note, the following commit:
5c889690aa mm: Place preemption point in do_mlockall() loop
is identical to the commit already in your tree via email:
22356f447c mm: Place preemption point in do_mlockall() loop
[ Your version of the changelog nicely demonstrates it how kernel oops
messages should be trimmed properly :-/ ]"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
rcu: Move RCU-related source code to kernel/rcu directory
rcu: Fix occurrence of "the the" in checklist.txt
kthread: Add pointer to vmstat-avoidance patch
rcu: Update stall-warning documentation
rcu: Consistent rcu_is_watching() naming
rcu: Change EXPORT_SYMBOL() to EXPORT_SYMBOL_GPL()
rcu: Is it safe to enter an RCU read-side critical section?
rcu: Throttle invoke_rcu_core() invocations due to non-lazy callbacks
rcu: Throttle rcu_try_advance_all_cbs() execution
rcu: Remove redundant code from rcu_cleanup_after_idle()
rcu: Fix CONFIG_RCU_NOCB_CPU_ALL panic on machines with sparse CPU mask
rcu: Avoid sparse warnings in rcu_nocb_wake trace event
rcu: Track rcu_nocb_kthread()'s sleeping and awakening
rcu: Distinguish between NOCB and non-NOCB rcu_callback trace events
rcu: Add tracing for rcuo no-CBs CPU wakeup handshake
rcu: Add tracing of normal (non-NOCB) grace-period requests
rcu: Add tracing to rcu_gp_kthread()
rcu: Flag lockless access to ->gp_flags with ACCESS_ONCE()
rcu: Prevent spurious-wakeup DoS attack on rcu_gp_kthread()
rcu: Improve grace-period start logic
...
A bit of cleanup plus some gratuitous variable renaming. I think using
structures instead of numeric offsets makes this code much more
understandable.
Also added a comment about current time range expected by
the server.
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Opens on current cifs/smb2/smb3 mounts with O_DIRECT flag fail
even when caching is disabled on the mount. This was
reported by those running SMB2 benchmarks who need to
be able to pass O_DIRECT on many of their open calls to
reduce caching effects, but would also be needed by other
applications.
When mounting with forcedirectio ("cache=none") cifs and smb2/smb3
do not go through the page cache and thus opens with O_DIRECT flag
should work (when posix extensions are negotiated we even are
able to send the flag to the server). This patch fixes that
in a simple way.
The 9P client has a similar situation (caching is often disabled)
and takes the same approach to O_DIRECT support ie works if caching
disabled, but if client caching enabled it fails with EINVAL.
A followon idea for a future patch as Pavel noted, could
be that files opened with O_DIRECT could cause us to change
inode->i_fop on the fly from
cifs_file_strict_ops
to
cifs_file_direct_ops
which would allow us to support this on non-forcedirectio mounts
(cache=strict and cache=loose) as well.
Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Andrey reported that he was seeing cifs.ko spam the logs with messages
like this:
CIFS VFS: Unexpected lookup error -26
He was listing the root directory of a server and hitting an error when
trying to QUERY_PATH_INFO against hiberfil.sys and pagefile.sys. The
right fix would be to switch the lookup code over to using FIND_FIRST,
but until then we really don't need to report this at a level of
KERN_ERR. Convert this message over to FYI level.
Reported-by: "Andrey Shernyukov" <andreysh@nioch.nsc.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Sometimes, the server will report an error that basically indicates
that it's running out of resources. These include these under SMB1:
NT_STATUS_NO_MEMORY
NT_STATUS_SECTION_TOO_BIG
NT_STATUS_TOO_MANY_PAGING_FILES
...and this one under SMB2:
STATUS_NO_MEMORY
Currently, this gets mapped to ENOMEM by the client, but that's
confusing as an ENOMEM error is typically an indicator that the
client is out of memory.
Change these errors to instead map to EREMOTEIO to indicate that
the problem is actually server-side and not on the client.
Reported-by: "ISHIKAWA,chiaki" <ishikawa@yk.rim.or.jp>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Now we treat any reparse point as a symbolic link and map it to a Unix
one that is not true in a common case due to many reparse point types
supported by SMB servers.
Distinguish reparse point types into two groups:
1) that can be accessed directly through a reparse point
(junctions, deduplicated files, NFS symlinks);
2) that need to be processed manually (Windows symbolic links, DFS);
and map only Windows symbolic links to Unix ones.
Cc: <stable@vger.kernel.org>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reported-and-tested-by: Joao Correia <joaomiguelcorreia@gmail.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Userspace uses the netdev devtype for stuff like device naming and type
detection. Be nice and set it. Remove the pointless #if/#endif around
SET_NETDEV_DEV too.
Signed-off-by: Dan Williams <dcbw@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There are several places which are missing unlocks.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add the missing unlock before return from function
wcn36xx_smd_update_proberesp_tmpl() in the error handling case.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Currently, the PLL is turned off for AR9485 when
switching to a low power state, but AR9485 has an issue
where the card will become unresponsive if left idle
for a long time without any traffic. To fix this,
force the PLL to always be on using a different initval
array, ar9485_1_1_pll_on_cdr_on_clkreq_disable_L1.
This is done for most of the AR9485 based cards
like HB125, WB225 etc. but certain models require the
feature to be turned off. Identify such cards and use
default values for them.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
[1] The gpmi uses the nand_command_lp to issue the commands to NAND chips.
The gpmi issues a DMA operation with gpmi_cmd_ctrl when it handles
a NAND_CMD_NONE control command. So when we read a page(NAND_CMD_READ0)
from the NAND, we may send two DMA operations back-to-back.
If we do not serialize the two DMA operations, we will meet a bug when
1.1) we enable CONFIG_DMA_API_DEBUG, CONFIG_DMADEVICES_DEBUG,
and CONFIG_DEBUG_SG.
1.2) Use the following commands in an UART console and a SSH console:
cmd 1: while true;do dd if=/dev/mtd0 of=/dev/null;done
cmd 1: while true;do dd if=/dev/mmcblk0 of=/dev/null;done
The kernel log shows below:
-----------------------------------------------------------------
kernel BUG at lib/scatterlist.c:28!
Unable to handle kernel NULL pointer dereference at virtual address 00000000
.........................
[<80044a0c>] (__bug+0x18/0x24) from [<80249b74>] (sg_next+0x48/0x4c)
[<80249b74>] (sg_next+0x48/0x4c) from [<80255398>] (debug_dma_unmap_sg+0x170/0x1a4)
[<80255398>] (debug_dma_unmap_sg+0x170/0x1a4) from [<8004af58>] (dma_unmap_sg+0x14/0x6c)
[<8004af58>] (dma_unmap_sg+0x14/0x6c) from [<8027e594>] (mxs_dma_tasklet+0x18/0x1c)
[<8027e594>] (mxs_dma_tasklet+0x18/0x1c) from [<8007d444>] (tasklet_action+0x114/0x164)
-----------------------------------------------------------------
1.3) Assume the two DMA operations is X (first) and Y (second).
The root cause of the bug:
Assume process P issues DMA X, and sleep on the completion
@this->dma_done. X's tasklet callback is dma_irq_callback. It firstly
wake up the process sleeping on the completion @this->dma_done,
and then trid to unmap the scatterlist S. The waked process P will
issue Y in another ARM core. Y initializes S->sg_magic to zero
with sg_init_one(), while dma_irq_callback is unmapping S at the same
time.
See the diagram:
ARM core 0 | ARM core 1
-------------------------------------------------------------
(P issues DMA X, then sleep) --> |
|
(X's tasklet wakes P) --> |
|
| <-- (P begin to issue DMA Y)
|
(X's tasklet unmap the |
scatterlist S with dma_unmap_sg) --> | <-- (Y calls sg_init_one() to init
| scatterlist S)
|
[2] This patch serialize both the X and Y in the following way:
Unmap the DMA scatterlist S firstly, and wake up the process at the end
of the DMA callback, in such a way, Y will be executed after X.
After this patch:
ARM core 0 | ARM core 1
-------------------------------------------------------------
(P issues DMA X, then sleep) --> |
|
(X's tasklet unmap the |
scatterlist S with dma_unmap_sg) --> |
|
(X's tasklet wakes P) --> |
|
| <-- (P begin to issue DMA Y)
|
| <-- (Y calls sg_init_one() to init
| scatterlist S)
|
Cc: stable@vger.kernel.org # 3.2
Signed-off-by: Huang Shijie <b32955@freescale.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
We were using it at 10 kHz, which doesn't work in machines where somehow
the max freq was auto reduced by the kernel:
[root@ssdandy ~]# perf test 19
19: Test software clock events have valid period values : FAILED!
[root@ssdandy ~]# perf test -v 19
19: Test software clock events have valid period values :
--- start ---
Couldn't open evlist: Invalid argument
---- end ----
Test software clock events have valid period values: FAILED!
[root@ssdandy ~]#
[root@ssdandy ~]# cat /proc/sys/kernel/perf_event_max_sample_rate
7000
Reducing it to 500 Hz should be good enough for this test and also
shouldn't affect what it is testing.
But warn the user if it fails, informing the knob and the freq tried.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-548rhj1uo6xbwnxa95kw3hqe@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On 64 bit systems we write past the end of the arg[] array.
Fixes: 8e84c25821 ('wcn36xx: mac80211 driver for Qualcomm WCN3660/WCN3680 hardware')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The eth_hdr is never defined in this driver but it gets compiled
without any warning/error because kernel has defined eth_hdr.
Fix it by defining our own p_ethhdr and use it instead of eth_hdr.
Cc: <stable@vger.kernel.org> # 3.7+
Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
While receiving a packet on SDIO interface, we allocate skb with
size multiple of SDIO block size. We need to resize this skb
after RX using packet length from RX header.
Cc: <stable@vger.kernel.org>
Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The routine that processes received frames was returning the RSSI value for the
signal strength; however, that value is available only for associated APs. As
a result, the strength was the absurd value of 10 dBm. As a result, scans
return incorrect values for the strength, which causes unwanted attempts to roam.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [3.1+]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The routine that processes received frames was returning the RSSI value for the
signal strength; however, that value is available only for associated APs. As
a result, the strength was the absurd value of 10 dBm. As a result, scans
return incorrect values for the strength, which causes unwanted attempts to roam.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org> [2.6.39+]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The routine that processes received frames was returning the RSSI value for the
signal strength; however, that value is available only for associated APs. As
a result, the strength was the absurd value of 10 dBm. As a result, scans
return incorrect values for the strength, which causes unwanted attempts to roam.
This patch fixes https://bugzilla.kernel.org/show_bug.cgi?id=63881.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Matthieu Baerts <matttbe@gmail.com>
Cc: Stable <stable@vger.kernel.org> [3.0 +]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
All of the rtlwifi drivers have an error in the routine that tests if
the data is "special". If it is, the subsequant transmission will be
at the lowest rate to enhance reliability. The 16-bit quantity is
big-endian, but was being extracted in native CPU mode. One of the
effects of this bug is to inhibit association under some conditions
as the TX rate is too high.
Based on suggestions by Joe Perches, the entire routine is rewritten.
One of the local headers contained duplicates of some of the ETH_P_XXX
definitions. These are deleted.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Stable <stable@vger.kernel.org> [2.6.38+]
Signed-off-by: John W. Linville <linville@tuxdriver.com>