linux

Author	SHA1	Message	Date
Eli Cohen	05bdb2ab6b	mlx5_core: Fix PowerPC support 1. Fix derivation of sub-page index from the dma address in free_4k. 2. Fix the DMA address passed to dma_unmap_page by masking it properly. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:23:52 -08:00
Eli Cohen	db81a5c374	mlx5_core: Improve debugfs readability Use strings to display transport service or state of QPs. Use numeric value for MTU of a QP. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:23:50 -08:00
Eli Cohen	bde51583f4	IB/mlx5: Add support for resize CQ Implement resize CQ which is a mandatory verb in mlx5. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:23:50 -08:00
Eli Cohen	3bdb31f688	IB/mlx5: Implement modify CQ Modify CQ is used by ULPs like IPoIB to change moderation parameters. This patch adds support in mlx5. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>	2014-01-22 23:23:49 -08:00
Linus Torvalds	0dc3fd0249	Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell. * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: module: Add missing newline in printk call. module: fix coding style export: declare ksymtab symbols module.h: Remove unnecessary semicolon params: improve standard definitions Add Documentation/module-signing.txt file	2014-01-22 22:30:15 -08:00
Linus Torvalds	84621c9b18	Merge tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from Konrad Rzeszutek Wilk: "Two major features that Xen community is excited about: The first is event channel scalability by David Vrabel - we switch over from an two-level per-cpu bitmap of events (IRQs) - to an FIFO queue with priorities. This lets us be able to handle more events, have lower latency, and better scalability. Good stuff. The other is PVH by Mukesh Rathor. In short, PV is a mode where the kernel lets the hypervisor program page-tables, segments, etc. With EPT/NPT capabilities in current processors, the overhead of doing this in an HVM (Hardware Virtual Machine) container is much lower than the hypervisor doing it for us. In short we let a PV guest run without doing page-table, segment, syscall, etc updates through the hypervisor - instead it is all done within the guest container. It is a "hybrid" PV - hence the 'PVH' name - a PV guest within an HVM container. The major benefits are less code to deal with - for example we only use one function from the the pv_mmu_ops (which has 39 function calls); faster performance for syscall (no context switches into the hypervisor); less traps on various operations; etc. It is still being baked - the ABI is not yet set in stone. But it is pretty awesome and we are excited about it. Lastly, there are some changes to ARM code - you should get a simple conflict which has been resolved in #linux-next. In short, this pull has awesome features. Features: - FIFO event channels. Key advantages: support for over 100,000 events (2^17), 16 different event priorities, improved fairness in event latency through the use of FIFOs. - Xen PVH support. "It’s a fully PV kernel mode, running with paravirtualized disk and network, paravirtualized interrupts and timers, no emulated devices of any kind (and thus no qemu), no BIOS or legacy boot — but instead of requiring PV MMU, it uses the HVM hardware extensions to virtualize the pagetables, as well as system calls and other privileged operations." (from "The Paravirtualization Spectrum, Part 2: From poles to a spectrum") Bug-fixes: - Fixes in balloon driver (refactor and make it work under ARM) - Allow xenfb to be used in HVM guests. - Allow xen_platform_pci=0 to work properly. - Refactors in event channels" * tag 'stable/for-linus-3.14-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (52 commits) xen/pvh: Set X86_CR0_WP and others in CR0 (v2) MAINTAINERS: add git repository for Xen xen/pvh: Use 'depend' instead of 'select'. xen: delete new instances of __cpuinit usage xen/fb: allow xenfb initialization for hvm guests xen/evtchn_fifo: fix error return code in evtchn_fifo_setup() xen-platform: fix error return code in platform_pci_init() xen/pvh: remove duplicated include from enlighten.c xen/pvh: Fix compile issues with xen_pvh_domain() xen: Use dev_is_pci() to check whether it is pci device xen/grant-table: Force to use v1 of grants. xen/pvh: Support ParaVirtualized Hardware extensions (v3). xen/pvh: Piggyback on PVHVM XenBus. xen/pvh: Piggyback on PVHVM for grant driver (v4) xen/grant: Implement an grant frame array struct (v3). xen/grant-table: Refactor gnttab_init xen/grants: Remove gnttab_max_grant_frames dependency on gnttab_init. xen/pvh: Piggyback on PVHVM for event channels (v2) xen/pvh: Update E820 to work with PVH (v2) xen/pvh: Secondary VCPU bringup (non-bootup CPUs) ...	2014-01-22 22:00:18 -08:00
Jiri Pirko	a9517d0f43	rtnetlink: remove ndo_get_slave No longer used API bond-specific can be removed now. This is now handled in a generic way in rtnl_link_ops. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:17 -08:00
Jiri Pirko	ba7d49b1f0	rtnetlink: provide api for getting and setting slave info Recent patch bonding: add netlink attributes to slave link dev (`1d3ee88ae0`) Introduced yet another device specific way to access slave information over rtnetlink. There is one already there for bridge. This patch introduces generic way to do this, for getting and setting info as well by extending link_ops. Later on, this new interface will be used for bridge ports as well. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:05 -08:00
Jiri Pirko	df7dbcbbaf	rtnetlink: put "BOND" into nl attribute names which are related to bonding Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:05 -08:00
FX Le Bail	7c90cc2d40	ipv6: enable anycast addresses as source addresses for datagrams This change allows to consider an anycast address valid as source address when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item. So, when sending a datagram with ancillary data, the unicast and anycast addresses are handled in the same way. - Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local on given interface or is global. - Uses it in ip6_datagram_send_ctl(). Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:05 -08:00
FX Le Bail	eb97768acb	net: update comments of "struct msghdr" with the more accurate RFC3542 ones Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 21:57:05 -08:00
Linus Torvalds	7ebd3faa9b	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First round of KVM updates for 3.14; PPC parts will come next week. Nothing major here, just bugfixes all over the place. The most interesting part is the ARM guys' virtualized interrupt controller overhaul, which lets userspace get/set the state and thus enables migration of ARM VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) kvm: make KVM_MMU_AUDIT help text more readable KVM: s390: Fix memory access error detection KVM: nVMX: Update guest activity state field on L2 exits KVM: nVMX: Fix nested_run_pending on activity state HLT KVM: nVMX: Clean up handling of VMX-related MSRs KVM: nVMX: Add tracepoints for nested_vmexit and nested_vmexit_inject KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit KVM: nVMX: Leave VMX mode on clearing of feature control MSR KVM: VMX: Fix DR6 update on #DB exception KVM: SVM: Fix reading of DR6 KVM: x86: Sync DR7 on KVM_SET_DEBUGREGS add support for Hyper-V reference time counter KVM: remove useless write to vcpu->hv_clock.tsc_timestamp KVM: x86: fix tsc catchup issue with tsc scaling KVM: x86: limit PIT timer frequency KVM: x86: handle invalid root_hpa everywhere kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub kvm: vfio: silence GCC warning KVM: ARM: Remove duplicate include arm/arm64: KVM: relax the requirements of VMA alignment for THP ...	2014-01-22 21:40:43 -08:00
Linus Torvalds	bb1281f2aa	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual rocket science stuff from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) neighbour.h: fix comment sched: Fix warning on make htmldocs caused by wait.h slab: struct kmem_cache is protected by slab_mutex doc: Fix typo in USB Gadget Documentation of/Kconfig: Spelling s/one/once/ mkregtable: Fix sscanf handling lp5523, lp8501: comment improvements thermal: rcar: comment spelling treewide: fix comments and printk msgs IXP4xx: remove '1 &&' from a condition check in ixp4xx_restart() Documentation: update /proc/uptime field description Documentation: Fix size parameter for snprintf arm: fix comment header and macro name asm-generic: uaccess: Spelling s/a ny/any/ mtd: onenand: fix comment header doc: driver-model/platform.txt: fix a typo drivers: fix typo in DEVTMPFS_MOUNT Kconfig help text doc: Fix typo (acces_process_vm -> access_process_vm) treewide: Fix typos in printk drivers/gpu/drm/qxl/Kconfig: reformat the help text ...	2014-01-22 21:21:55 -08:00
Linus Torvalds	fe41c2c018	Merge tag 'dm-3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device-mapper changes from Mike Snitzer: "A lot of attention was paid to improving the thin-provisioning target's handling of metadata operation failures and running out of space. A new 'error_if_no_space' feature was added to allow users to error IOs rather than queue them when either the data or metadata space is exhausted. Additional fixes/features include: - a few fixes to properly support thin metadata device resizing - a solution for reliably waiting for a DM device's embedded kobject to be released before destroying the device - old dm-snapshot is updated to use the dm-bufio interface to take advantage of readahead capabilities that improve snapshot activation - new dm-cache target tunables to control how quickly data is promoted to the cache (fast) device - improved write efficiency of cluster mirror target by combining userspace flush and mark requests" * tag 'dm-3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (35 commits) dm log userspace: allow mark requests to piggyback on flush requests dm space map metadata: fix bug in resizing of thin metadata dm cache: add policy name to status output dm thin: fix pool feature parsing dm sysfs: fix a module unload race dm snapshot: use dm-bufio prefetch dm snapshot: use dm-bufio dm snapshot: prepare for switch to using dm-bufio dm snapshot: use GFP_KERNEL when initializing exceptions dm cache: add block sizes and total cache blocks to status output dm btree: add dm_btree_find_lowest_key dm space map metadata: fix extending the space map dm space map common: make sure new space is used during extend dm: wait until embedded kobject is released before destroying a device dm: remove pointless kobject comparison in dm_get_from_kobject dm snapshot: call destroy_work_on_stack() to pair with INIT_WORK_ONSTACK() dm cache policy mq: introduce three promotion threshold tunables dm cache policy mq: use list_del_init instead of list_del + INIT_LIST_HEAD dm thin: fix set_pool_mode exposed pool operation races dm thin: eliminate the no_free_space flag ...	2014-01-22 20:17:48 -08:00
Neil Horman	2d36097d26	af_packet: Add Queue mapping mode to af_packet fanout operation This patch adds a queue mapping mode to the fanout operation of af_packet sockets. This allows user space af_packet users to better filter on flows ingressing and egressing via a specific hardware queue, and avoids the potential packet reordering that can occur when FANOUT_CPU is being used and irq affinity varies. Tested successfully by myself. applies to net-next Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 17:35:50 -08:00
Linus Torvalds	194e57fd18	Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This patch set is a lot of driver updates for qla4xxx, bfa, hpsa, qla2xxx. It also removes the aic7xxx_old driver (which has been deprecated for nearly a decade) and adds support for deadlines in error handling" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (75 commits) [SCSI] hpsa: allow SCSI mid layer to handle unit attention [SCSI] hpsa: do not require board "not ready" status after hard reset [SCSI] hpsa: enable unit attention reporting [SCSI] hpsa: rename scsi prefetch field [SCSI] hpsa: use workqueue instead of kernel thread for lockup detection [SCSI] ipr: increase dump size in ipr driver [SCSI] mac_scsi: Fix crash on out of memory [SCSI] st: fix enlarge_buffer [SCSI] qla1280: Annotate timer on stack so object debug does not complain [SCSI] qla4xxx: Update driver version to 5.04.00-k3 [SCSI] qla4xxx: Recreate chap data list during get chap operation [SCSI] qla4xxx: Add support for ISCSI_PARAM_LOCAL_IPADDR sysfs attr [SCSI] libiscsi: Add local_ipaddr parameter in iscsi_conn struct [SCSI] scsi_transport_iscsi: Export ISCSI_PARAM_LOCAL_IPADDR attr for iscsi_connection [SCSI] qla4xxx: Add host statistics support [SCSI] scsi_transport_iscsi: Add host statistics support [SCSI] qla4xxx: Added support for Diagnostics MBOX command [SCSI] bfa: Driver version upgrade to 3.2.23.0 [SCSI] bfa: change FC_ELS_TOV to 20sec [SCSI] bfa: Observed auto D-port mode instead of manual ...	2014-01-22 17:32:26 -08:00
Linus Torvalds	e1ba84597c	Merge tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v3.14 merge window: Resource management - Change pci_bus_region addresses to dma_addr_t (Bjorn Helgaas) - Support 64-bit AGP BARs (Bjorn Helgaas, Yinghai Lu) - Add pci_bus_address() to get bus address of a BAR (Bjorn Helgaas) - Use pci_resource_start() for CPU address of AGP BARs (Bjorn Helgaas) - Enforce bus address limits in resource allocation (Yinghai Lu) - Allocate 64-bit BARs above 4G when possible (Yinghai Lu) - Convert pcibios_resource_to_bus() to take pci_bus, not pci_dev (Yinghai Lu) PCI device hotplug - Major rescan/remove locking update (Rafael J. Wysocki) - Make ioapic builtin only (not modular) (Yinghai Lu) - Fix release/free issues (Yinghai Lu) - Clean up pciehp (Bjorn Helgaas) - Announce pciehp slot info during enumeration (Bjorn Helgaas) MSI - Add pci_msi_vec_count(), pci_msix_vec_count() (Alexander Gordeev) - Add pci_enable_msi_range(), pci_enable_msix_range() (Alexander Gordeev) - Deprecate "tri-state" interfaces: fail/success/fail+info (Alexander Gordeev) - Export MSI mode using attributes, not kobjects (Greg Kroah-Hartman) - Drop "irq" param from _restore_msi_irqs() (DuanZhenzhong) SR-IOV - Clear NumVFs when disabling SR-IOV in sriov_init() (ethan.zhao) Virtualization - Add support for save/restore of extended capabilities (Alex Williamson) - Add Virtual Channel to save/restore support (Alex Williamson) - Never treat a VF as a multifunction device (Alex Williamson) - Add pci_try_reset_function(), et al (Alex Williamson) AER - Ignore non-PCIe error sources (Betty Dall) - Support ACPI HEST error sources for domains other than 0 (Betty Dall) - Consolidate HEST error source parsers (Bjorn Helgaas) - Add a TLP header print helper (Borislav Petkov) Freescale i.MX6 - Remove unnecessary code (Fabio Estevam) - Make reset-gpio optional (Marek Vasut) - Report "link up" only after link training completes (Marek Vasut) - Start link in Gen1 before negotiating for Gen2 mode (Marek Vasut) - Fix PCIe startup code (Richard Zhu) Marvell MVEBU - Remove duplicate of_clk_get_by_name() call (Andrew Lunn) - Drop writes to bridge Secondary Status register (Jason Gunthorpe) - Obey bridge PCI_COMMAND_MEM and PCI_COMMAND_IO bits (Jason Gunthorpe) - Support a bridge with no IO port window (Jason Gunthorpe) - Use max_t() instead of max(resource_size_t,) (Jingoo Han) - Remove redundant of_match_ptr (Sachin Kamat) - Call pci_ioremap_io() at startup instead of dynamically (Thomas Petazzoni) NVIDIA Tegra - Disable Gen2 for Tegra20 and Tegra30 (Eric Brower) Renesas R-Car - Add runtime PM support (Valentine Barshak) - Fix rcar_pci_probe() return value check (Wei Yongjun) Synopsys DesignWare - Fix crash in dw_msi_teardown_irq() (Bjørn Erik Nilsen) - Remove redundant call to pci_write_config_word() (Bjørn Erik Nilsen) - Fix missing MSI IRQs (Harro Haan) - Add dw_pcie prefix before cfg_read/write (Pratyush Anand) - Fix I/O transfers by using CPU (not realio) address (Pratyush Anand) - Whitespace cleanup (Jingoo Han) EISA - Call put_device() if device_register() fails (Levente Kurusa) - Revert EISA initialization breakage ((Bjorn Helgaas) Miscellaneous - Remove unused code, including PCIe 3.0 interfaces (Stephen Hemminger) - Prevent bus conflicts while checking for bridge apertures (Bjorn Helgaas) - Stop clearing bridge Secondary Status when setting up I/O aperture (Bjorn Helgaas) - Use dev_is_pci() to identify PCI devices (Yijing Wang) - Deprecate DEFINE_PCI_DEVICE_TABLE (Joe Perches) - Update documentation 00-INDEX (Erik Ekman)" tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (119 commits) Revert "EISA: Initialize device before its resources" Revert "EISA: Log device resources in dmesg" vfio-pci: Use pci "try" reset interface PCI: Check parent kobject in pci_destroy_dev() xen/pcifront: Use global PCI rescan-remove locking powerpc/eeh: Use global PCI rescan-remove locking PCI: Fix pci_check_and_unmask_intx() comment typos PCI: Add pci_try_reset_function(), pci_try_reset_slot(), pci_try_reset_bus() MPT / PCI: Use pci_stop_and_remove_bus_device_locked() platform / x86: Use global PCI rescan-remove locking PCI: hotplug: Use global PCI rescan-remove locking pcmcia: Use global PCI rescan-remove locking ACPI / hotplug / PCI: Use global PCI rescan-remove locking ACPI / PCI: Use global PCI rescan-remove locking in PCI root hotplug PCI: Add global pci_lock_rescan_remove() PCI: Cleanup pci.h whitespace PCI: Reorder so actual code comes before stubs PCI/AER: Support ACPI HEST AER error sources for PCI domains other than 0 ACPICA: Add helper macros to extract bus/segment numbers from HEST table. PCI: Make local functions static ...	2014-01-22 16:39:28 -08:00
Linus Torvalds	60eaa0190f	Merge tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This pull request has a new feature to ftrace, namely the trace event triggers by Tom Zanussi. A trigger is a way to enable an action when an event is hit. The actions are: o trace on/off - enable or disable tracing o snapshot - save the current trace buffer in the snapshot o stacktrace - dump the current stack trace to the ringbuffer o enable/disable events - enable or disable another event Namhyung Kim added updates to the tracing uprobes code. Having the uprobes add support for fetch methods. The rest are various bug fixes with the new code, and minor ones for the old code" * tag 'trace-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (38 commits) tracing: Fix buggered tee(2) on tracing_pipe tracing: Have trace buffer point back to trace_array ftrace: Fix synchronization location disabling and freeing ftrace_ops ftrace: Have function graph only trace based on global_ops filters ftrace: Synchronize setting function_trace_op with ftrace_trace_function tracing: Show available event triggers when no trigger is set tracing: Consolidate event trigger code tracing: Fix counter for traceon/off event triggers tracing: Remove double-underscore naming in syscall trigger invocations tracing/kprobes: Add trace event trigger invocations tracing/probes: Fix build break on !CONFIG_KPROBE_EVENT tracing/uprobes: Add @+file_offset fetch method uprobes: Allocate ->utask before handler_chain() for tracing handlers tracing/uprobes: Add support for full argument access methods tracing/uprobes: Fetch args before reserving a ring buffer tracing/uprobes: Pass 'is_return' to traceprobe_parse_probe_arg() tracing/probes: Implement 'memory' fetch method for uprobes tracing/probes: Add fetch{,_size} member into deref fetch method tracing/probes: Move 'symbol' fetch method to kprobes tracing/probes: Implement 'stack' fetch method for uprobes ...	2014-01-22 16:35:21 -08:00
Miklos Szeredi	28a625cbc2	fuse: fix pipe_buf_operations Having this struct in module memory could Oops when if the module is unloaded while the buffer still persists in a pipe. Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal merge them into nosteal_pipe_buf_ops (this is the same as default_pipe_buf_ops except stealing the page from the buffer is not allowed). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org	2014-01-22 19:36:57 +01:00
Li Zhong	c04e7da013	neighbour.h: fix comment Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-01-22 10:34:15 +01:00
Masanari Iida	f434f7afa5	sched: Fix warning on make htmldocs caused by wait.h Missing "@" in include/linux/wait.h cause "make htmldocs" failed with following warning messages. Warning(/home/iida/Repo/linux-next//include/linux/wait.h:304): No description found for parameter 'cmd1' Warning(/home/iida/Repo/linux-next//include/linux/wait.h:304): No description found for parameter 'cmd2' Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2014-01-22 10:25:39 +01:00
Geert Uytterhoeven	c3d19d3c3f	drm/i915: Spelling s/auxilliary/auxiliary/ Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2014-01-22 09:58:24 +01:00
Hannes Frederic Sowa	809fa972fd	reciprocal_divide: update/correction of the algorithm Jakub Zawadzki noticed that some divisions by reciprocal_divide() were not correct [1][2], which he could also show with BPF code after divisions are transformed into reciprocal_value() for runtime invariance which can be passed to reciprocal_divide() later on; reverse in BPF dump ended up with a different, off-by-one K in some situations. This has been fixed by Eric Dumazet in commit `aee636c480` ("bpf: do not use reciprocal divide"). This follow-up patch improves reciprocal_value() and reciprocal_divide() to work in all cases by using Granlund and Montgomery method, so that also future use is safe and without any non-obvious side-effects. Known problems with the old implementation were that division by 1 always returned 0 and some off-by-ones when the dividend and divisor where very large. This seemed to not be problematic with its current users, as far as we can tell. Eric Dumazet checked for the slab usage, we cannot surely say so in the case of flex_array. Still, in order to fix that, we propose an extension from the original implementation from commit `6a2d7a955d` resp. [3][4], by using the algorithm proposed in "Division by Invariant Integers Using Multiplication" [5], Torbjörn Granlund and Peter L. Montgomery, that is, pseudocode for q = n/d where q, n, d is in u32 universe: 1) Initialization: int l = ceil(log_2 d) uword m' = floor((1<<32)((1<<l)-d)/d)+1 int sh_1 = min(l,1) int sh_2 = max(l-1,0) 2) For q = n/d, all uword: uword t = (nm')>>32 q = (t+((n-t)>>sh_1))>>sh_2 The assembler implementation from Agner Fog [6] also helped a lot while implementing. We have tested the implementation on x86_64, ppc64, i686, s390x; on x86_64/haswell we're still half the latency compared to normal divide. Joint work with Daniel Borkmann. [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c [3] https://gmplib.org/~tege/division-paper.pdf [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556 [6] http://www.agner.org/optimize/asmlib.zip Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: Jesse Gross <jesse@nicira.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:17:20 -08:00
Daniel Borkmann	89770b0a69	net: introduce reciprocal_scale helper and convert users As David Laight suggests, we shouldn't necessarily call this reciprocal_divide() when users didn't requested a reciprocal_value(); lets keep the basic idea and call it reciprocal_scale(). More background information on this topic can be found in [1]. Joint work with Hannes Frederic Sowa. [1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html Suggested-by: David Laight <david.laight@aculab.com> Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:17:20 -08:00
Daniel Borkmann	f337db64af	random32: add prandom_u32_max and convert open coded users Many functions have open coded a function that returns a random number in range [0,N-1]. Under the assumption that we have a PRNG such as taus113 with being well distributed in [0, ~0U] space, we can implement such a function as uword t = (n*m')>>32, where m' is a random number obtained from PRNG, n the right open interval border and t our resulting random number, with n,m',t in u32 universe. Lets go with Joe and simply call it prandom_u32_max(), although technically we have an right open interval endpoint, but that we have documented. Other users can further be migrated to the new prandom_u32_max() function later on; for now, we need to make sure to migrate reciprocal_divide() users for the reciprocal_divide() follow-up fixup since their function signatures are going to change. Joint work with Hannes Frederic Sowa. Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:17:20 -08:00
Dongmao Zhang	5066a4df1f	dm log userspace: allow mark requests to piggyback on flush requests In the cluster evironment, cluster write has poor performance because userspace_flush() has to contact a userspace program (cmirrord) for clear/mark/flush requests. But both mark and flush requests require cmirrord to communicate the message to all the cluster nodes for each flush call. This behaviour is really slow. To address this we now merge mark and flush requests together to reduce the kernel-userspace-kernel time. We allow a new directive, "integrated_flush" that can be used to instruct the kernel log code to combine flush and mark requests when directed by userspace. If not directed by userspace (due to an older version of the userspace code perhaps), the kernel will function as it did previously - preserving backwards compatibility. Additionally, flush requests are performed lazily when only clear requests exist. Signed-off-by: Dongmao Zhang <dmzhang@suse.com> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2014-01-21 23:46:27 -05:00
Jens Axboe	e1803a706f	Merge branch 'for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/linux-block into for-3.14/drivers	2014-01-21 20:18:54 -08:00
CaiZhiyong	bca266b388	block: remove unrelated header files and export symbol Fix up the following items: - remove unrelated header files. - export interface function. - modify function cmdline_parts_parse return value, this will make it more friendly for the caller. Signed-off-by: CaiZhiyong <caizhiyong@huawei.com> Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> CC: Brian Norris <computersforpeace@gmail.com> Cc: "Wanglin (Albert)" <albert.wanglin@hisilicon.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Karel Zak <kzak@redhat.com> Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2014-01-21 20:18:26 -08:00
Linus Torvalds	df32e43a54	Merge branch 'akpm' (incoming from Andrew) Merge first patch-bomb from Andrew Morton: - a couple of misc things - inotify/fsnotify work from Jan - ocfs2 updates (partial) - about half of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) mm/migrate: remove unused function, fail_migrate_page() mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages mm/migrate: correct failure handling if !hugepage_migration_support() mm/migrate: add comment about permanent failure path mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure mm: compaction: reset scanner positions immediately when they meet mm: compaction: do not mark unmovable pageblocks as skipped in async compaction mm: compaction: detect when scanners meet in isolate_freepages mm: compaction: reset cached scanner pfn's before reading them mm: compaction: encapsulate defer reset logic mm: compaction: trace compaction begin and end memcg, oom: lock mem_cgroup_print_oom_info sched: add tracepoints related to NUMA task migration mm: numa: do not automatically migrate KSM pages mm: numa: trace tasks that fail migration due to rate limiting mm: numa: limit scope of lock for NUMA migrate rate limiting mm: numa: make NUMA-migrate related functions static lib/show_mem.c: show num_poisoned_pages when oom mm/hwpoison: add '#' to hwpoison_inject mm/memblock: use WARN_ONCE when MAX_NUMNODES passed as input parameter ...	2014-01-21 19:05:45 -08:00
Daniel Borkmann	3769229931	net: filter: let bpf_tell_extensions return SKF_AD_MAX Michal Sekletar added in commit `ea02f9411d` ("net: introduce SO_BPF_EXTENSIONS") a facility where user space can enquire the BPF ancillary instruction set, which is imho a step into the right direction for letting user space high-level to BPF optimizers make an informed decision for possibly using these extensions. The original rationale was to return through a getsockopt(2) a bitfield of which instructions are supported and which are not, as of right now, we just return 0 to indicate a base support for SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Limitations of this approach are that this API which we need to maintain for a long time can only support a maximum of 32 extensions, and needs to be additionally maintained/updated when each new extension that comes in. I thought about this a bit more and what we can do here to overcome this is to just return SKF_AD_MAX. Since we never remove any extension since we cannot break user space and always linearly increase SKF_AD_MAX on each newly added extension, user space can make a decision on what extensions are supported in the whole set of extensions and which aren't, by just checking which of them from the whole set have an offset < SKF_AD_MAX of the underlying kernel. Since SKF_AD_MAX must be updated each time we add new ones, we don't need to introduce an additional enum and got maintenance for free. At some point in time when SO_BPF_EXTENSIONS becomes ubiquitous for most kernels, then an application can simply make use of this and easily be run on newer or older underlying kernels without needing to be recompiled, of course. Since that is for 3.14, it's not too late to do this change. Cc: Michal Sekletar <msekleta@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Michal Sekletar <msekleta@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:57:43 -08:00
wangweidong	5bc1d1b4a2	sctp: remove macros sctp_bh_[un]lock_sock Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:41:36 -08:00
wangweidong	048ed4b626	sctp: remove macros sctp_{lock\|release}_sock Redefined {lock\|release}_sock to sctp_{lock\|release}_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:41:36 -08:00
wangweidong	1b0de194f1	sctp: remove macros sctp_read_[un]lock Redefined read_[un]lock to sctp_read_[un]lock for user space friendly code which we haven't use in years, and the macros we never used, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:41 -08:00
wangweidong	387602dfdc	sctp: remove macros sctp_write_[un]_lock Redefined write_[un]lock to sctp_write_[un]lock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:41 -08:00
wangweidong	3c8e43ba9f	sctp: remove macros sctp_spin_[un]lock Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:41 -08:00
wangweidong	79b91130a2	sctp: remove macros sctp_local_bh_{disable\|enable} Redefined local_bh_{disable\|enable} to sctp_local_bh_{disable\|enable} for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:40 -08:00
wangweidong	940287ee10	sctp: remove macros sctp_spin_[un]lock_irqrestore Redefined spin_[un]lock_irqstore to sctp_spin_[un]lock_irqrestore for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:40 -08:00
Linus Torvalds	fbd918a202	Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata updates from Tejun Heo: "Support for some new embedded controllers. A couple late (<= a week) fixes have stable cc'd and one patch ("SATA: MV: Add support for the optional PHYs") got committed yesterday because otherwise the resulting kernel would fail boot on an embedded board due to interdependent changes in its platform tree. Other than that, nothing too noteworthy" * 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: SATA: MV: Add support for the optional PHYs sata-highbank: Remove unnecessary ahci_platform.h include libata: disable LPM for some WD SATA-I devices ARM: mvebu: update the SATA compatible string for Armada 370/XP ata: sata_mv: fix disk hotplug for Armada 370/XP SoCs ata: sata_mv: introduce compatible string "marvell, armada-370-sata" ata: pata_samsung_cf: Remove unused macros ata: pata_samsung_cf: Use devm_ioremap_resource() ata: pata_samsung_cf: Merge pata_samsung_cf.h into pata_samsung_cf.c ata: pata_samsung_cf: Move plat/regs-ata.h to drivers/ata drivers: ata: Mark the function as static in libahci.c drivers: ata: Mark the function ahci_init_interrupts() as static in ahci.c ahci: imx: fix the error handling in imx_ahci_probe() ahci: imx: ahci_imx_softreset() can be static ahci: imx: Add i.MX53 support ahci: imx: Pull out the clock enable/disable calls libata, dt: Document sata_rcar bindings sata_rcar: Add R-Car Gen2 SATA PHY support ahci: mcp89: enter AHCI mode under Apple BIOS emulation ata: libata-eh: Remove unnecessary snprintf arithmetic	2014-01-21 18:16:08 -08:00
Or Gerlitz	dc01e7d344	net: Add GRO support for vxlan traffic Add GRO handlers for vxlann, by using the UDP GRO infrastructure. For single TCP session that goes through vxlan tunneling I got nice improvement from 6.8Gbs to 11.5Gbs --> UDP/VXLAN GRO disabled $ netperf -H 192.168.52.147 -c -C $ netperf -t TCP_STREAM -H 192.168.52.147 -c -C MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 65536 65536 10.00 6799.75 12.54 24.79 0.604 1.195 --> UDP/VXLAN GRO enabled $ netperf -t TCP_STREAM -H 192.168.52.147 -c -C MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 65536 65536 10.00 11562.72 24.90 20.34 0.706 0.577 Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:05:04 -08:00
Or Gerlitz	b582ef0990	net: Add GRO support for UDP encapsulating protocols Add GRO handlers for protocols that do UDP encapsulation, with the intent of being able to coalesce packets which encapsulate packets belonging to the same TCP session. For GRO purposes, the destination UDP port takes the role of the ether type field in the ethernet header or the next protocol in the IP header. The UDP GRO handler will only attempt to coalesce packets whose destination port is registered to have gro handler. Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive code twice on a packet. This solves the problem of udp encapsulated packets whose inner VM packet is udp and happen to carry a port which has registered offloads. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:05:04 -08:00
Linus Torvalds	f075e0f699	Merge branch 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "The bulk of changes are cleanups and preparations for the upcoming kernfs conversion. - cgroup_event mechanism which is and will be used only by memcg is moved to memcg. - pidlist handling is updated so that it can be served by seq_file. Also, the list is not sorted if sane_behavior. cgroup documentation explicitly states that the file is not sorted but it has been for quite some time. - All cgroup file handling now happens on top of seq_file. This is to prepare for kernfs conversion. In addition, all operations are restructured so that they map 1-1 to kernfs operations. - Other cleanups and low-pri fixes" * 'for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (40 commits) cgroup: trivial style updates cgroup: remove stray references to css_id doc: cgroups: Fix typo in doc/cgroups cgroup: fix fail path in cgroup_load_subsys() cgroup: fix missing unlock on error in cgroup_load_subsys() cgroup: remove for_each_root_subsys() cgroup: implement for_each_css() cgroup: factor out cgroup_subsys_state creation into create_css() cgroup: combine css handling loops in cgroup_create() cgroup: reorder operations in cgroup_create() cgroup: make for_each_subsys() useable under cgroup_root_mutex cgroup: css iterations and css_from_dir() are safe under cgroup_mutex cgroup: unify pidlist and other file handling cgroup: replace cftype->read_seq_string() with cftype->seq_show() cgroup: attach cgroup_open_file to all cgroup files cgroup: generalize cgroup_pidlist_open_file cgroup: unify read path so that seq_file is always used cgroup: unify cgroup_write_X64() and cgroup_write_string() cgroup: remove cftype->read(), ->read_map() and ->write() hugetlb_cgroup: convert away from cftype->read() ...	2014-01-21 17:51:34 -08:00
Linus Torvalds	2182c815f3	Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw Pull GFS2 updates from Steven Whitehouse: "The main topics this time are allocation, in the form of Bob's improvements when searching resource groups and several updates to quotas which should increase scalability. The quota changes follow on from those in the last merge window, and there will likely be further work to come in this area in due course. There are also a few patches which help to improve efficiency of adding entries into directories, and clean up some of that code. One on-disk change is included this time, which is to write some additional information which should be useful to fsck and also potentially for debugging. Other than that, its just a few small random bug fixes and clean ups" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: (24 commits) GFS2: revert "GFS2: d_splice_alias() can't return error" GFS2: Small cleanup GFS2: Don't use ENOBUFS when ENOMEM is the correct error code GFS2: Fix kbuild test robot reported warning GFS2: Move quota bitmap operations under their own lock GFS2: Clean up quota slot allocation GFS2: Only run logd and quota when mounted read/write GFS2: Use RCU/hlist_bl based hash for quotas GFS2: No need to invalidate pages for a dio read GFS2: Add initialization for address space in super block GFS2: Add hints to directory leaf blocks GFS2: For exhash conversion, only one block is needed GFS2: Increase i_writecount during gfs2_setattr_chown GFS2: Remember directory insert point GFS2: Consolidate transaction blocks calculation for dir add GFS2: Add directory addition info structure GFS2: Use only a single address space for rgrps GFS2: Use range based functions for rgrp sync/invalidation GFS2: Remove test which is always true GFS2: Remove gfs2_quota_change_host structure ...	2014-01-21 17:42:00 -08:00
Vince Bridgers	2618abb73c	stmmac: Fix kernel crashes for jumbo frames These changes correct the following issues with jumbo frames on the stmmac driver: 1) The Synopsys EMAC can be configured to support different FIFO sizes at core configuration time. There's no way to query the controller and know the FIFO size, so the driver needs to get this information from the device tree in order to know how to correctly handle MTU changes and setting up dma buffers. The default max-frame-size is as currently used, which is the size of a jumbo frame. 2) The driver was enabling Jumbo frames by default, but was not allocating dma buffers of sufficient size to handle the maximum possible packet size that could be received. This led to memory corruption since DMAs were occurring beyond the extent of the allocated receive buffers for certain types of network traffic. kernel BUG at net/core/skbuff.c:126! Internal error: Oops - BUG: 0 [#1] SMP ARM Modules linked in: CPU: 0 PID: 563 Comm: sockperf Not tainted 3.13.0-rc6-01523-gf7111b9 #31 task: ef35e580 ti: ef252000 task.ti: ef252000 PC is at skb_panic+0x60/0x64 LR is at skb_panic+0x60/0x64 pc : [<c03c7c3c>] lr : [<c03c7c3c>] psr: 60000113 sp : ef253c18 ip : 60000113 fp : 00000000 r10: ef3a5400 r9 : 00000ebc r8 : ef3a546c r7 : ee59f000 r6 : ee59f084 r5 : ee59ff40 r4 : ee59f140 r3 : 000003e2 r2 : 00000007 r1 : c0b9c420 r0 : 0000007d Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c5387d Table: 2e8ac04a DAC: 00000015 Process sockperf (pid: 563, stack limit = 0xef252248) Stack: (0xef253c18 to 0xef254000) 3c00: 00000ebc ee59f000 3c20: ee59f084 ee59ff40 ee59f140 c04a9cd8 ee8c50c0 00000ebc ee59ff40 00000000 3c40: ee59f140 c02d0ef0 00000056 ef1eda80 ee8c50c0 00000ebc 22bbef29 c0318f8c 3c60: 00000056 ef3a547c ffe2c716 c02c9c90 c0ba1298 ef3a5838 ef3a5838 ef3a5400 3c80: 000020c0 ee573840 000055cb ef3f2050 c053f0e0 c0319214 22b9b085 22d92813 3ca0: 00001c80 004b8e00 ef3a5400 ee573840 ef3f2064 22d92813 ef3f2064 000055cb 3cc0: ef3f2050 c031a19c ef252000 00000000 00000000 c0561bc0 00000000 ff00ffff 3ce0: c05621c0 ef3a5400 ef3f2064 ee573840 00000020 ef3f2064 000055cb ef3f2050 3d00: c053f0e0 c031cad0 c053e740 00000e60 00000000 00000000 ee573840 ef3a5400 3d20: ef0a6e00 00000000 ef3f2064 c032507c 00010000 00000020 c0561bc0 c0561bc0 3d40: ee599850 c032799c 00000000 ee573840 c055a380 ef3a5400 00000000 ef3f2064 3d60: ef3f2050 c032799c 0101c7c0 2b6755cb c059a280 c030e4d8 000055cb ffffffff 3d80: ee574fc0 c055a380 ee574000 ee573840 00002b67 ee573840 c03fe9c4 c053fa68 3da0: c055a380 00001f6f 00000000 ee573840 c053f0e0 c0304fdc ef0a6e01 ef3f2050 3dc0: ee573858 ef031000 ee573840 c03055d8 c0ba0c40 ef000f40 00100100 c053f0dc 3de0: c053ffdc c053f0f0 00000008 00000000 ef031000 c02da948 00001140 00000000 3e00: c0563c78 ef253e5f 00000020 ee573840 00000020 c053f0f0 ef313400 ee573840 3e20: c053f0e0 00000000 00000000 c05380c0 ef313400 00001000 00000015 c02df280 3e40: ee574000 ef001e00 00000000 00001080 00000042 005cd980 ef031500 ef031500 3e60: 00000000 c02df824 ef031500 c053e390 c0541084 f00b1e00 c05925e8 c02df864 3e80: 00001f5c ef031440 c053e390 c0278524 00000002 00000000 c0b9eb48 c02df280 3ea0: ee8c7180 00000100 c0542ca8 00000015 00000040 ef031500 ef031500 ef031500 3ec0: c027803c ef252000 00000040 000000ec c05380c0 c0b9eb40 c0b9eb48 c02df940 3ee0: ef060780 ffffa4dd c0564a9c c056343c 002e80a8 00000080 ef031500 00000001 3f00: c053808c ef252000 fffec100 00000003 00000004 002e80a8 0000000c c00258f0 3f20: 002e80a8 c005e704 00000005 00000100 c05634d0 c0538080 c05333e0 00000000 3f40: 0000000a c0565580 c05380c0 ffffa4dc c05434f4 00400100 00000004 c0534cd4 3f60: 00000098 00000000 fffec100 002e80a8 00000004 002e80a8 002a20e0 c0025da8 3f80: c0534cd4 c000f020 fffec10c c053ea60 ef253fb0 c0008530 0000ffe2 b6ef67f4 3fa0: 40000010 ffffffff 00000124 c0012f3c 0000ffe2 002e80f0 0000ffe2 00004000 3fc0: becb6338 becb6334 00000004 00000124 002e80a8 00000004 002e80a8 002a20e0 3fe0: becb6300 becb62f4 002773bb b6ef67f4 40000010 ffffffff 00000000 00000000 [<c03c7c3c>] (skb_panic+0x60/0x64) from [<c02d0ef0>] (skb_put+0x4c/0x50) [<c02d0ef0>] (skb_put+0x4c/0x50) from [<c0318f8c>] (tcp_collapse+0x314/0x3ec) [<c0318f8c>] (tcp_collapse+0x314/0x3ec) from [<c0319214>] (tcp_try_rmem_schedule+0x1b0/0x3c4) [<c0319214>] (tcp_try_rmem_schedule+0x1b0/0x3c4) from [<c031a19c>] (tcp_data_queue+0x480/0xe6c) [<c031a19c>] (tcp_data_queue+0x480/0xe6c) from [<c031cad0>] (tcp_rcv_established+0x180/0x62c) [<c031cad0>] (tcp_rcv_established+0x180/0x62c) from [<c032507c>] (tcp_v4_do_rcv+0x13c/0x31c) [<c032507c>] (tcp_v4_do_rcv+0x13c/0x31c) from [<c032799c>] (tcp_v4_rcv+0x718/0x73c) [<c032799c>] (tcp_v4_rcv+0x718/0x73c) from [<c0304fdc>] (ip_local_deliver+0x98/0x274) [<c0304fdc>] (ip_local_deliver+0x98/0x274) from [<c03055d8>] (ip_rcv+0x420/0x758) [<c03055d8>] (ip_rcv+0x420/0x758) from [<c02da948>] (__netif_receive_skb_core+0x44c/0x5bc) [<c02da948>] (__netif_receive_skb_core+0x44c/0x5bc) from [<c02df280>] (netif_receive_skb+0x48/0xb4) [<c02df280>] (netif_receive_skb+0x48/0xb4) from [<c02df824>] (napi_gro_flush+0x70/0x94) [<c02df824>] (napi_gro_flush+0x70/0x94) from [<c02df864>] (napi_complete+0x1c/0x34) [<c02df864>] (napi_complete+0x1c/0x34) from [<c0278524>] (stmmac_poll+0x4e8/0x5c8) [<c0278524>] (stmmac_poll+0x4e8/0x5c8) from [<c02df940>] (net_rx_action+0xc4/0x1e4) [<c02df940>] (net_rx_action+0xc4/0x1e4) from [<c00258f0>] (__do_softirq+0x12c/0x2e8) [<c00258f0>] (__do_softirq+0x12c/0x2e8) from [<c0025da8>] (irq_exit+0x78/0xac) [<c0025da8>] (irq_exit+0x78/0xac) from [<c000f020>] (handle_IRQ+0x44/0x90) [<c000f020>] (handle_IRQ+0x44/0x90) from [<c0008530>] (gic_handle_irq+0x2c/0x5c) [<c0008530>] (gic_handle_irq+0x2c/0x5c) from [<c0012f3c>] (__irq_usr+0x3c/0x60) 3) The driver was setting the dma buffer size after allocating dma buffers, which caused a system panic when changing the MTU. BUG: Bad page state in process ifconfig pfn:2e850 page:c0b72a00 count:0 mapcount:0 mapping: (null) index:0x0 page flags: 0x200(arch_1) Modules linked in: CPU: 0 PID: 566 Comm: ifconfig Not tainted 3.13.0-rc6-01523-gf7111b9 #29 [<c001547c>] (unwind_backtrace+0x0/0xf8) from [<c00122dc>] (show_stack+0x10/0x14) [<c00122dc>] (show_stack+0x10/0x14) from [<c03c793c>] (dump_stack+0x70/0x88) [<c03c793c>] (dump_stack+0x70/0x88) from [<c00b2620>] (bad_page+0xc8/0x118) [<c00b2620>] (bad_page+0xc8/0x118) from [<c00b302c>] (get_page_from_freelist+0x744/0x870) [<c00b302c>] (get_page_from_freelist+0x744/0x870) from [<c00b40f4>] (__alloc_pages_nodemask+0x118/0x86c) [<c00b40f4>] (__alloc_pages_nodemask+0x118/0x86c) from [<c00b4858>] (__get_free_pages+0x10/0x54) [<c00b4858>] (__get_free_pages+0x10/0x54) from [<c00cba1c>] (kmalloc_order_trace+0x24/0xa0) [<c00cba1c>] (kmalloc_order_trace+0x24/0xa0) from [<c02d199c>] (__kmalloc_reserve.isra.21+0x24/0x70) [<c02d199c>] (__kmalloc_reserve.isra.21+0x24/0x70) from [<c02d240c>] (__alloc_skb+0x68/0x13c) [<c02d240c>] (__alloc_skb+0x68/0x13c) from [<c02d3930>] (__netdev_alloc_skb+0x3c/0xe8) [<c02d3930>] (__netdev_alloc_skb+0x3c/0xe8) from [<c0279378>] (stmmac_open+0x63c/0x1024) [<c0279378>] (stmmac_open+0x63c/0x1024) from [<c02e18cc>] (__dev_open+0xa0/0xfc) [<c02e18cc>] (__dev_open+0xa0/0xfc) from [<c02e1b40>] (__dev_change_flags+0x94/0x158) [<c02e1b40>] (__dev_change_flags+0x94/0x158) from [<c02e1c24>] (dev_change_flags+0x18/0x48) [<c02e1c24>] (dev_change_flags+0x18/0x48) from [<c0337bc0>] (devinet_ioctl+0x638/0x700) [<c0337bc0>] (devinet_ioctl+0x638/0x700) from [<c02c7aec>] (sock_ioctl+0x64/0x290) [<c02c7aec>] (sock_ioctl+0x64/0x290) from [<c0100890>] (do_vfs_ioctl+0x78/0x5b8) [<c0100890>] (do_vfs_ioctl+0x78/0x5b8) from [<c0100e0c>] (SyS_ioctl+0x3c/0x5c) [<c0100e0c>] (SyS_ioctl+0x3c/0x5c) from [<c000e760>] The fixes have been verified using reproducible, automated testing. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 17:05:27 -08:00
Hannes Frederic Sowa	82b276cd2b	ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow connecting and binding to them. sendmsg logic does already check msg->name for this but must trust already connected sockets which could be set up for connection to ipv4 address family. Per-socket flag ipv6only is of no use here, as it is under users control by setsockopt. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:59:19 -08:00
Joonsoo Kim	78d5506e82	mm/migrate: remove unused function, fail_migrate_page() fail_migrate_page() isn't used anywhere, so remove it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Christoph Lameter <cl@linux.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:49 -08:00
Joonsoo Kim	59c82b70dc	mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages Some part of putback_lru_pages() and putback_movable_pages() is duplicated, so it could confuse us what we should use. We can remove putback_lru_pages() since it is not really needed now. This makes us undestand and maintain the code more easily. And comment on putback_movable_pages() is stale now, so fix it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:49 -08:00
Vlastimil Babka	de6c60a6c1	mm: compaction: encapsulate defer reset logic Currently there are several functions to manipulate the deferred compaction state variables. The remaining case where the variables are touched directly is when a successful allocation occurs in direct compaction, or is expected to be successful in the future by kswapd. Here, the lowest order that is expected to fail is updated, and in the case of successful allocation, the deferred status and counter is reset completely. Create a new function compaction_defer_reset() to encapsulate this functionality and make it easier to understand the code. No functional change. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:48 -08:00
Mel Gorman	0eb927c0ab	mm: compaction: trace compaction begin and end The broad goal of the series is to improve allocation success rates for huge pages through memory compaction, while trying not to increase the compaction overhead. The original objective was to reintroduce capturing of high-order pages freed by the compaction, before they are split by concurrent activity. However, several bugs and opportunities for simple improvements were found in the current implementation, mostly through extra tracepoints (which are however too ugly for now to be considered for sending). The patches mostly deal with two mechanisms that reduce compaction overhead, which is caching the progress of migrate and free scanners, and marking pageblocks where isolation failed to be skipped during further scans. Patch 1 (from mgorman) adds tracepoints that allow calculate time spent in compaction and potentially debug scanner pfn values. Patch 2 encapsulates the some functionality for handling deferred compactions for better maintainability, without a functional change type is not determined without being actually needed. Patch 3 fixes a bug where cached scanner pfn's are sometimes reset only after they have been read to initialize a compaction run. Patch 4 fixes a bug where scanners meeting is sometimes not properly detected and can lead to multiple compaction attempts quitting early without doing any work. Patch 5 improves the chances of sync compaction to process pageblocks that async compaction has skipped due to being !MIGRATE_MOVABLE. Patch 6 improves the chances of sync direct compaction to actually do anything when called after async compaction fails during allocation slowpath. The impact of patches were validated using mmtests's stress-highalloc benchmark with mmtests's stress-highalloc benchmark on a x86_64 machine with 4GB memory. Due to instability of the results (mostly related to the bugs fixed by patches 2 and 3), 10 iterations were performed, taking min,mean,max values for success rates and mean values for time and vmstat-based metrics. First, the default GFP_HIGHUSER_MOVABLE allocations were tested with the patches stacked on top of v3.13-rc2. Patch 2 is OK to serve as baseline due to no functional changes in 1 and 2. Comments below. stress-highalloc 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-nothp 3-nothp 4-nothp 5-nothp 6-nothp Success 1 Min 9.00 ( 0.00%) 10.00 (-11.11%) 43.00 (-377.78%) 43.00 (-377.78%) 33.00 (-266.67%) Success 1 Mean 27.50 ( 0.00%) 25.30 ( 8.00%) 45.50 (-65.45%) 45.90 (-66.91%) 46.30 (-68.36%) Success 1 Max 36.00 ( 0.00%) 36.00 ( 0.00%) 47.00 (-30.56%) 48.00 (-33.33%) 52.00 (-44.44%) Success 2 Min 10.00 ( 0.00%) 8.00 ( 20.00%) 46.00 (-360.00%) 45.00 (-350.00%) 35.00 (-250.00%) Success 2 Mean 26.40 ( 0.00%) 23.50 ( 10.98%) 47.30 (-79.17%) 47.60 (-80.30%) 48.10 (-82.20%) Success 2 Max 34.00 ( 0.00%) 33.00 ( 2.94%) 48.00 (-41.18%) 50.00 (-47.06%) 54.00 (-58.82%) Success 3 Min 65.00 ( 0.00%) 63.00 ( 3.08%) 85.00 (-30.77%) 84.00 (-29.23%) 85.00 (-30.77%) Success 3 Mean 76.70 ( 0.00%) 70.50 ( 8.08%) 86.20 (-12.39%) 85.50 (-11.47%) 86.00 (-12.13%) Success 3 Max 87.00 ( 0.00%) 86.00 ( 1.15%) 88.00 ( -1.15%) 87.00 ( 0.00%) 87.00 ( 0.00%) 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-nothp 3-nothp 4-nothp 5-nothp 6-nothp User 6437.72 6459.76 5960.32 5974.55 6019.67 System 1049.65 1049.09 1029.32 1031.47 1032.31 Elapsed 1856.77 1874.48 1949.97 1994.22 1983.15 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-nothp 3-nothp 4-nothp 5-nothp 6-nothp Minor Faults 253952267 254581900 250030122 250507333 250157829 Major Faults 420 407 506 530 530 Swap Ins 4 9 9 6 6 Swap Outs 398 375 345 346 333 Direct pages scanned 197538 189017 298574 287019 299063 Kswapd pages scanned 1809843 1801308 1846674 1873184 1861089 Kswapd pages reclaimed 1806972 1798684 1844219 1870509 1858622 Direct pages reclaimed 197227 188829 298380 286822 298835 Kswapd efficiency 99% 99% 99% 99% 99% Kswapd velocity 953.382 970.449 952.243 934.569 922.286 Direct efficiency 99% 99% 99% 99% 99% Direct velocity 104.058 101.832 153.961 143.200 148.205 Percentage direct scans 9% 9% 13% 13% 13% Zone normal velocity 347.289 359.676 348.063 339.933 332.983 Zone dma32 velocity 710.151 712.605 758.140 737.835 737.507 Zone dma velocity 0.000 0.000 0.000 0.000 0.000 Page writes by reclaim 557.600 429.000 353.600 426.400 381.800 Page writes file 159 53 7 79 48 Page writes anon 398 375 345 346 333 Page reclaim immediate 825 644 411 575 420 Sector Reads 2781750 2769780 2878547 2939128 2910483 Sector Writes 12080843 12083351 12012892 12002132 12010745 Page rescued immediate 0 0 0 0 0 Slabs scanned `1575654` `1545344` 1778406 1786700 1794073 Direct inode steals 9657 10037 15795 14104 14645 Kswapd inode steals 46857 46335 50543 50716 51796 Kswapd skipped wait 0 0 0 0 0 THP fault alloc 97 91 81 71 77 THP collapse alloc 456 506 546 544 565 THP splits 6 5 5 4 4 THP fault fallback 0 1 0 0 0 THP collapse fail 14 14 12 13 12 Compaction stalls 1006 980 1537 1536 1548 Compaction success 303 284 562 559 578 Compaction failures 702 696 974 976 969 Page migrate success 1177325 1070077 3927538 3781870 3877057 Page migrate failure 0 0 0 0 0 Compaction pages isolated 2547248 2306457 8301218 8008500 8200674 Compaction migrate scanned 42290478 38832618 153961130 154143900 159141197 Compaction free scanned 89199429 79189151 356529027 351943166 356326727 Compaction cost 1566 1426 5312 5156 5294 NUMA PTE updates 0 0 0 0 0 NUMA hint faults 0 0 0 0 0 NUMA hint local faults 0 0 0 0 0 NUMA hint local percent 100 100 100 100 100 NUMA pages migrated 0 0 0 0 0 AutoNUMA cost 0 0 0 0 0 Observations: - The "Success 3" line is allocation success rate with system idle (phases 1 and 2 are with background interference). I used to get stable values around 85% with vanilla 3.11. The lower min and mean values came with 3.12. This was bisected to commit `81c0a2bb` ("mm: page_alloc: fair zone allocator policy") As explained in comment for patch 3, I don't think the commit is wrong, but that it makes the effect of compaction bugs worse. From patch 3 onwards, the results are OK and match the 3.11 results. - Patch 4 also clearly helps phases 1 and 2, and exceeds any results I've seen with 3.11 (I didn't measure it that thoroughly then, but it was never above 40%). - Compaction cost and number of scanned pages is higher, especially due to patch 4. However, keep in mind that patches 3 and 4 fix existing bugs in the current design of compaction overhead mitigation, they do not change it. If overhead is found unacceptable, then it should be decreased differently (and consistently, not due to random conditions) than the current implementation does. In contrast, patches 5 and 6 (which are not strictly bug fixes) do not increase the overhead (but also not success rates). This might be a limitation of the stress-highalloc benchmark as it's quite uniform. Another set of results is when configuring stress-highalloc t allocate with similar flags as THP uses: (GFP_HIGHUSER_MOVABLE\|__GFP_NOMEMALLOC\|__GFP_NORETRY\|__GFP_NO_KSWAPD) stress-highalloc 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-thp 3-thp 4-thp 5-thp 6-thp Success 1 Min 2.00 ( 0.00%) 7.00 (-250.00%) 18.00 (-800.00%) 19.00 (-850.00%) 26.00 (-1200.00%) Success 1 Mean 19.20 ( 0.00%) 17.80 ( 7.29%) 29.20 (-52.08%) 29.90 (-55.73%) 32.80 (-70.83%) Success 1 Max 27.00 ( 0.00%) 29.00 ( -7.41%) 35.00 (-29.63%) 36.00 (-33.33%) 37.00 (-37.04%) Success 2 Min 3.00 ( 0.00%) 8.00 (-166.67%) 21.00 (-600.00%) 21.00 (-600.00%) 32.00 (-966.67%) Success 2 Mean 19.30 ( 0.00%) 17.90 ( 7.25%) 32.20 (-66.84%) 32.60 (-68.91%) 35.70 (-84.97%) Success 2 Max 27.00 ( 0.00%) 30.00 (-11.11%) 36.00 (-33.33%) 37.00 (-37.04%) 39.00 (-44.44%) Success 3 Min 62.00 ( 0.00%) 62.00 ( 0.00%) 85.00 (-37.10%) 75.00 (-20.97%) 64.00 ( -3.23%) Success 3 Mean 66.30 ( 0.00%) 65.50 ( 1.21%) 85.60 (-29.11%) 83.40 (-25.79%) 83.50 (-25.94%) Success 3 Max 70.00 ( 0.00%) 69.00 ( 1.43%) 87.00 (-24.29%) 86.00 (-22.86%) 87.00 (-24.29%) 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-thp 3-thp 4-thp 5-thp 6-thp User 6547.93 6475.85 6265.54 6289.46 6189.96 System 1053.42 1047.28 1043.23 1042.73 1038.73 Elapsed 1835.43 1821.96 1908.67 1912.74 1956.38 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 3.13-rc2 2-thp 3-thp 4-thp 5-thp 6-thp Minor Faults 256805673 253106328 253222299 249830289 251184418 Major Faults 395 375 423 434 448 Swap Ins 12 10 10 12 9 Swap Outs 530 537 487 455 415 Direct pages scanned 71859 86046 153244 152764 190713 Kswapd pages scanned 1900994 1870240 1898012 1892864 1880520 Kswapd pages reclaimed 1897814 1867428 1894939 1890125 1877924 Direct pages reclaimed 71766 85908 153167 152643 190600 Kswapd efficiency 99% 99% 99% 99% 99% Kswapd velocity 1029.000 1067.782 1000.091 991.049 951.218 Direct efficiency 99% 99% 99% 99% 99% Direct velocity 38.897 49.127 80.747 79.983 96.468 Percentage direct scans 3% 4% 7% 7% 9% Zone normal velocity 351.377 372.494 348.910 341.689 335.310 Zone dma32 velocity 716.520 744.414 731.928 729.343 712.377 Zone dma velocity 0.000 0.000 0.000 0.000 0.000 Page writes by reclaim 669.300 604.000 545.700 538.900 429.900 Page writes file 138 66 58 83 14 Page writes anon 530 537 487 455 415 Page reclaim immediate 806 655 772 548 517 Sector Reads 2711956 2703239 2811602 2818248 2839459 Sector Writes 12163238 12018662 12038248 11954736 11994892 Page rescued immediate 0 0 0 0 0 Slabs scanned 1385088 1388364 1507968 1513292 1558656 Direct inode steals 1739 2564 4622 5496 6007 Kswapd inode steals 47461 46406 47804 48013 48466 Kswapd skipped wait 0 0 0 0 0 THP fault alloc 110 82 84 69 70 THP collapse alloc 445 482 467 462 539 THP splits 6 5 4 5 3 THP fault fallback 3 0 0 0 0 THP collapse fail 15 14 14 14 13 Compaction stalls 659 685 1033 1073 1111 Compaction success 222 225 410 427 456 Compaction failures 436 460 622 646 655 Page migrate success 446594 439978 1085640 1095062 1131716 Page migrate failure 0 0 0 0 0 Compaction pages isolated 1029475 1013490 2453074 2482698 2565400 Compaction migrate scanned `9955461` 11344259 24375202 27978356 30494204 Compaction free scanned 27715272 28544654 80150615 82898631 85756132 Compaction cost 552 555 1344 1379 1436 NUMA PTE updates 0 0 0 0 0 NUMA hint faults 0 0 0 0 0 NUMA hint local faults 0 0 0 0 0 NUMA hint local percent 100 100 100 100 100 NUMA pages migrated 0 0 0 0 0 AutoNUMA cost 0 0 0 0 0 There are some differences from the previous results for THP-like allocations: - Here, the bad result for unpatched kernel in phase 3 is much more consistent to be between 65-70% and not related to the "regression" in 3.12. Still there is the improvement from patch 4 onwards, which brings it on par with simple GFP_HIGHUSER_MOVABLE allocations. - Compaction costs have increased, but nowhere near as much as the non-THP case. Again, the patches should be worth the gained determininsm. - Patches 5 and 6 somewhat increase the number of migrate-scanned pages. This is most likely due to __GFP_NO_KSWAPD flag, which means the cached pfn's and pageblock skip bits are not reset by kswapd that often (at least in phase 3 where no concurrent activity would wake up kswapd) and the patches thus help the sync-after-async compaction. It doesn't however show that the sync compaction would help so much with success rates, which can be again seen as a limitation of the benchmark scenario. This patch (of 6): Add two tracepoints for compaction begin and end of a zone. Using this it is possible to calculate how much time a workload is spending within compaction and potentially debug problems related to cached pfns for scanning. In combination with the direct reclaim and slab trace points it should be possible to estimate most allocation-related overhead for a workload. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:48 -08:00
Mel Gorman	286549dcaf	sched: add tracepoints related to NUMA task migration This patch adds three tracepoints o trace_sched_move_numa when a task is moved to a node o trace_sched_swap_numa when a task is swapped with another task o trace_sched_stick_numa when a numa-related migration fails The tracepoints allow the NUMA scheduler activity to be monitored and the following high-level metrics can be calculated o NUMA migrated stuck nr trace_sched_stick_numa o NUMA migrated idle nr trace_sched_move_numa o NUMA migrated swapped nr trace_sched_swap_numa o NUMA local swapped trace_sched_swap_numa src_nid == dst_nid (should never happen) o NUMA remote swapped trace_sched_swap_numa src_nid != dst_nid (should == NUMA migrated swapped) o NUMA group swapped trace_sched_swap_numa src_ngid == dst_ngid Maybe a small number of these are acceptable but a high number would be a major surprise. It would be even worse if bounces are frequent. o NUMA avg task migs. Average number of migrations for tasks o NUMA stddev task mig Self-explanatory o NUMA max task migs. Maximum number of migrations for a single task In general the intent of the tracepoints is to help diagnose problems where automatic NUMA balancing appears to be doing an excessive amount of useless work. [akpm@linux-foundation.org: remove semicolon-after-if, repair coding-style] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:48 -08:00
Mel Gorman	af1839d722	mm: numa: trace tasks that fail migration due to rate limiting A low local/remote numa hinting fault ratio is potentially explained by failed migrations. This patch adds a tracepoint that fires when migration fails due to migration rate limitation. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-01-21 16:19:48 -08:00

... 119 120 121 122 123 ...

70103 Commits