linux

Author	SHA1	Message	Date
Leigh Brown	526edcf567	net: mvmdio: slight optimisation of orion_mdio_write Make only a single call to mutex_unlock in orion_mdio_write. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 18:53:36 -04:00
Leigh Brown	839f46bb4c	net: mvmdio: orion_mdio_ready: remove manual poll Replace manual poll of MVMDIO_SMI_READ_VALID with a call to orion_mdio_wait_ready. This ensures a consistent timeout, eliminates a busy loop, and allows for use of interrupts on systems that support them. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 18:53:36 -04:00
Leigh Brown	b70cd1c1a9	net: mvmdio: make orion_mdio_wait_ready consistent Amend orion_mdio_wait_ready so that the same timeout is used when polling or using wait_event_timeout. Set the timeout to 1ms. Replace udelay with usleep_range to avoid a busy loop, and set the polling interval range as 45us to 55us, so that the first sleep will be enough in almost all cases. Generate the same log message at timeout when polling or using wait_event_timeout. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 18:53:35 -04:00
Gavin Shan	87f20c26f9	net/benet: Make lancer_wait_ready() static The function needn't to be public, so to make it as static. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 17:50:17 -04:00
Gavin Shan	547e2daeba	net/benet: Remove interface type The interface type, which is being traced by "struct be_adapter:: if_type", isn't used currently. So we can remove that safely according to Sathya's comments. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 17:50:16 -04:00
Joe Perches	22ded57729	netconsole: Convert to pr_<level> Use a more current logging style. Convert printks to pr_<level>. Consolidate multiple printks into a single printk to avoid any possible dmesg interleaving. Add a default "event" msg in case the listed types are ever expanded. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 17:41:49 -04:00
Vlad Yasevich	06499098a0	bridge: pass correct vlan id to multicast code Currently multicast code attempts to extrace the vlan id from the skb even when vlan filtering is disabled. This can lead to mdb entries being created with the wrong vlan id. Pass the already extracted vlan id to the multicast filtering code to make the correct id is used in creation as well as lookup. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 17:40:08 -04:00
David S. Miller	911aeb1084	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== One patch for net/3.12 fixing an issue where devices could be in an invalid state they are removed while still attached to OVS. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 17:36:39 -04:00
Michael Drüing	706e282b69	net: x25: Fix dead URLs in Kconfig Update the URLs in the Kconfig file to the new pages at sangoma.com and cisco.com Signed-off-by: Michael Drüing <michael@drueing.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 17:35:17 -04:00
Daniel Borkmann	7d1d65cb84	net: sched: cls_bpf: add BPF-based classifier This work contains a lightweight BPF-based traffic classifier that can serve as a flexible alternative to ematch-based tree classification, i.e. now that BPF filter engine can also be JITed in the kernel. Naturally, tc actions and policies are supported as well with cls_bpf. Multiple BPF programs/filter can be attached for a class, or they can just as well be written within a single BPF program, that's really up to the user how he wishes to run/optimize the code, e.g. also for inversion of verdicts etc. The notion of a BPF program's return/exit codes is being kept as follows: 0: No match -1: Select classid given in "tc filter ..." command else: flowid, overwrite the default one As a minimal usage example with iproute2, we use a 3 band prio root qdisc on a router with sfq each as leave, and assign ssh and icmp bpf-based filters to band 1, http traffic to band 2 and the rest to band 3. For the first two bands we load the bytecode from a file, in the 2nd we load it inline as an example: echo 1 > /proc/sys/net/core/bpf_jit_enable tc qdisc del dev em1 root tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 tc qdisc add dev em1 parent 1:1 sfq perturb 16 tc qdisc add dev em1 parent 1:2 sfq perturb 16 tc qdisc add dev em1 parent 1:3 sfq perturb 16 tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1 tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1 tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2 tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3 BPF programs can be easily created and passed to tc, either as inline 'bytecode' or 'bytecode-file'. There are a couple of front-ends that can compile opcodes, for example: 1) People familiar with tcpdump-like filters: tcpdump -iem1 -ddd port 22 \| tr '\n' ',' > /etc/tc/ssh.bpf 2) People that want to low-level program their filters or use BPF extensions that lack support by libpcap's compiler: bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf ssh.ops example code: ldh [12] jne #0x800, drop ldb [23] jneq #6, drop ldh [20] jset #0x1fff, drop ldxb 4 * ([14] & 0xf) ldh [%x + 14] jeq #0x16, pass ldh [%x + 16] jne #0x16, drop pass: ret #-1 drop: ret #0 It was chosen to load bytecode into tc, since the reverse operation, tc filter list dev em1, is then able to show the exact commands again. Possible follow-up work could also include a small expression compiler for iproute2. Tested with the help of bmon. This idea came up during the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from Eric Dumazet! Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 17:33:17 -04:00
Rafał Miłecki	d549c76bcc	bgmac: separate RX descriptor setup code into a new function This cleans code a bit and will be useful when allocating buffers in other places (like RX path, to avoid skb_copy_from_linear_data_offset). Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 17:20:36 -04:00
David S. Miller	68783ec73c	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== This pull request contains the following netfilter fix: * fix --queue-bypass in xt_NFQUEUE revision 3. While adding the revision 3 of this target, the bypass flags were not correctly handled anymore, thus, breaking packet bypassing if no application is listening from userspace, patch from Holger Eitzenberger, reported by Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 16:53:44 -04:00
Deng-Cheng Zhu	7f081f1755	MIPS: Perf: Fix 74K cache map According to Software User's Manual, the event of last-level-cache read/write misses is mapped to even counters. Odd counters of that event number count miss cycles. Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com> Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/6036/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2013-10-29 21:18:23 +01:00
Linus Torvalds	7314e613d5	Fix a few incorrectly checked [io_]remap_pfn_range() calls Nico Golde reports a few straggling uses of [io_]remap_pfn_range() that really should use the vm_iomap_memory() helper. This trivially converts two of them to the helper, and comments about why the third one really needs to continue to use remap_pfn_range(), and adds the missing size check. Reported-by: Nico Golde <nico@ngolde.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org.	2013-10-29 10:21:34 -07:00
Linus Torvalds	f9ec2e6f79	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf tooling fixes from Ingo Molnar: "This contains five tooling fixes: - fix a remaining mmap2 assumption which resulted in perf top output breakage - fix mmap ring-buffer processing bug that corrupts data - fix for a severe python scripting memory leak - fix broken (and user-visible) -g option handling - fix stdio output The diffstat size is larger than what we'd like to see this late :-/" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Fixup mmap event consumption perf top: Split -G and --call-graph perf record: Split -g and --call-graph perf hists: Add color overhead for stdio output buffer perf tools: Fix up /proc/PID/maps parsing perf script python: Fix mem leak due to missing Py_DECREFs on dict entries	2013-10-29 08:36:50 -07:00
Linus Torvalds	2a999aa0a1	Kconfig: make KOBJECT_RELEASE debugging require timer debugging Without the timer debugging, the delayed kobject release will just result in undebuggable oopses if it triggers any latent bugs. That doesn't actually help debugging at all. So make DEBUG_KOBJECT_RELEASE depend on DEBUG_OBJECTS_TIMERS to avoid having people enable one without the other. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-10-29 08:33:36 -07:00
Daniel Vetter	1fbc0d789d	drm/i915: Fix the PPT fdi lane bifurcate state handling on ivb Originally I've thought that this is leftover hw state dirt from the BIOS. But after way too much helpless flailing around on my part I've noticed that the actual bug is when we change the state of an already active pipe. For example when we change the fdi lines from 2 to 3 without switching off outputs in-between we'll never see the crucial on->off transition in the ->modeset_global_resources hook the current logic relies on. Patch version 2 got this right by instead also checking whether the pipe is indeed active. But that in turn broke things when pipes have been turned off through dpms since the bifurcate enabling is done in the ->crtc_mode_set callback. To address this issues discussed with Ville in the patch review move the setting of the bifurcate bit into the ->crtc_enable hook. That way we won't wreak havoc with this state when userspace puts all other outputs into dpms off state. This also moves us forward with our overall goal to unify the modeset and dpms on paths (which we need to have to allow runtime pm in the dpms off state). Unfortunately this requires us to move the bifurcate helpers around a bit. Also update the commit message, I've misanalyzed the bug rather badly. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70507 Tested-by: Jan-Michael Brummer <jan.brummer@tabos.org> Cc: stable@vger.kernel.org Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2013-10-29 13:52:56 +01:00
Holger Eitzenberger	d954777324	netfilter: xt_NFQUEUE: fix --queue-bypass regression V3 of the NFQUEUE target ignores the --queue-bypass flag, causing packets to be dropped when the userspace listener isn't running. Regression is in since `8746ddcf12` ("netfilter: xt_NFQUEUE: introduce CPU fanout"). Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-10-29 13:05:54 +01:00
Wei Yongjun	ed87ac09d8	i40e: fix error return code in i40e_probe() Fix to return -ENOMEM in the memory alloc error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 04:29:25 -07:00
Don Skidmore	44bd741e10	ixgbevf: Add zero_base handler to network statistics This patch removes the need to keep a zero_base variable in the adapter structure. Now we just use two different macros to set the non-zero and zero base. This adds to readability and shortens some of the structure initialization under 80 columns. The gathering of status for ethtool was slightly modified to again better fit into 80 columns and become a bit more readable. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 04:22:24 -07:00
Jacob Keller	3b5dca262f	ixgbevf: add BP_EXTENDED_STATS for CONFIG_NET_RX_BUSY_POLL This patch adds the extended statistics similar to the ixgbe driver. These statistics keep track of how often the busy polling yields, as well as how many packets are cleaned or missed by the polling routine. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 04:15:11 -07:00
Jacob Keller	c777cdfa4e	ixgbevf: implement CONFIG_NET_RX_BUSY_POLL This patch enables CONFIG_NET_RX_BUSY_POLL support in the VF code. This enables sockets which have enabled the SO_BUSY_POLL socket option to use the ndo_busy_poll_recv operation which could result in lower latency, at the cost of higher CPU utilization, and increased power usage. This support is similar to how the ixgbe driver works. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 04:08:12 -07:00
Peter Zijlstra	e8a923cc1f	perf/x86: Fix NMI measurements OK, so what I'm actually seeing on my WSM is that sched/clock.c is 'broken' for the purpose we're using it for. What triggered it is that my WSM-EP is broken :-( [ 0.001000] tsc: Fast TSC calibration using PIT [ 0.002000] tsc: Detected 2533.715 MHz processor [ 0.500180] TSC synchronization [CPU#0 -> CPU#6]: [ 0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock. [ 0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed For some reason it consistently detects TSC skew, even though NHM+ should have a single clock domain for 'reasonable' systems. This marks sched_clock_stable=0, which means that we do fancy stuff to try and get a 'sane' clock. Part of this fancy stuff relies on the tick, clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until it either wakes up or gets kicked by another cpu. While this is perfectly fine for the scheduler -- it only cares about actually running stuff, and when we're running stuff we're obviously not idle. This does somewhat break down for perf which can trigger events just fine on an otherwise idle cpu. So I've got NMIs get get 'measured' as taking ~1ms, which actually don't last nearly that long: <idle>-0 [013] d.h. 886.311970: rcu_nmi_enter <-do_nmi ... <idle>-0 [013] d.h. 886.311997: perf_sample_event_took: HERE!!! : 1040990 So ftrace (which uses sched_clock(), not the fancy bits) only sees ~27us, but we measure ~1ms !! Now since all this measurement stuff lives in x86 code, we can actually fix it. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: mingo@kernel.org Cc: dave.hansen@linux.intel.com Cc: eranian@google.com Cc: Don Zickus <dzickus@redhat.com> Cc: jmario@redhat.com Cc: acme@infradead.org Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-29 12:01:20 +01:00
Peter Zijlstra	bf378d341e	perf: Fix perf ring buffer memory ordering The PPC64 people noticed a missing memory barrier and crufty old comments in the perf ring buffer code. So update all the comments and add the missing barrier. When the architecture implements local_t using atomic_long_t there will be double barriers issued; but short of introducing more conditional barrier primitives this is the best we can do. Reported-by: Victor Kaplansky <victork@il.ibm.com> Tested-by: Victor Kaplansky <victork@il.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: michael@ellerman.id.au Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: anton@samba.org Cc: benh@kernel.crashing.org Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-29 12:01:19 +01:00
Jacob Keller	08e50a20ed	ixgbevf: have clean_rx_irq return total_rx_packets cleaned Rather than return true/false indicating whether there was budget left, return the total packets cleaned. This currently has no use, but will be used in a following patch which enables CONFIG_NET_RX_BUSY_POLL support in order to track how many packets were cleaned during the busy poll as part of the extended statistics. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 04:00:21 -07:00
Jacob Keller	0868161866	ixgbevf: add ixgbevf_rx_skb This patch adds ixgbevf_rx_skb in line with how ixgbe handles the variations on how packets can be received. It will be extended in a following patch for CONFIG_NET_RX_BUSY_POLL support. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 03:53:30 -07:00
Jacob Keller	6a2aae5ae6	ixgbe: remove unnecessary duplication of PCIe bandwidth display This patch removes the unnecessary display of PCIe bandwidth twice. Since the ixgbe_check_minimum_link does a better job, and ensures accurate detection on even complex chains, this older check is no longer necessary. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 03:45:57 -07:00
Jacob Keller	9f0a433ce6	ixgbe: show <2% for encoding loss on PCIe Gen3 This patch updates the ixgbe_check_minimum_link function to correctly show that there is some minor loss of encoding, even though we don't calculate it in the max GT/s equation. It is small enough to not bother, but is better to report it than not. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 03:38:26 -07:00
Mel Gorman	0255d49184	mm: Account for a THP NUMA hinting update as one PTE update A THP PMD update is accounted for as 512 pages updated in vmstat. This is large difference when estimating the cost of automatic NUMA balancing and can be misleading when comparing results that had collapsed versus split THP. This patch addresses the accounting issue. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-10-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-29 11:38:17 +01:00
Mel Gorman	3f926ab945	mm: Close races between THP migration and PMD numa clearing THP migration uses the page lock to guard against parallel allocations but there are cases like this still open Task A Task B --------------------- --------------------- do_huge_pmd_numa_page do_huge_pmd_numa_page lock_page mpol_misplaced == -1 unlock_page goto clear_pmdnuma lock_page mpol_misplaced == 2 migrate_misplaced_transhuge pmd = pmd_mknonnuma set_pmd_at During hours of testing, one crashed with weird errors and while I have no direct evidence, I suspect something like the race above happened. This patch extends the page lock to being held until the pmd_numa is cleared to prevent migration starting in parallel while the pmd_numa is being cleared. It also flushes the old pmd entry and orders pagetable insertion before rmap insertion. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-9-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-29 11:38:05 +01:00
Mel Gorman	c61109e34f	mm: numa: Sanitize task_numa_fault() callsites There are three callers of task_numa_fault(): - do_huge_pmd_numa_page(): Accounts against the current node, not the node where the page resides, unless we migrated, in which case it accounts against the node we migrated to. - do_numa_page(): Accounts against the current node, not the node where the page resides, unless we migrated, in which case it accounts against the node we migrated to. - do_pmd_numa_page(): Accounts not at all when the page isn't migrated, otherwise accounts against the node we migrated towards. This seems wrong to me; all three sites should have the same sementaics, furthermore we should accounts against where the page really is, we already know where the task is. So modify all three sites to always account; we did after all receive the fault; and always account to where the page is after migration, regardless of success. They all still differ on when they clear the PTE/PMD; ideally that would get sorted too. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-8-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-29 11:37:52 +01:00
Mel Gorman	587fe586f4	mm: Prevent parallel splits during THP migration THP migrations are serialised by the page lock but on its own that does not prevent THP splits. If the page is split during THP migration then the pmd_same checks will prevent page table corruption but the unlock page and other fix-ups potentially will cause corruption. This patch takes the anon_vma lock to prevent parallel splits during migration. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-7-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-29 11:37:39 +01:00
Mel Gorman	42836f5f8b	mm: Wait for THP migrations to complete during NUMA hinting faults The locking for migrating THP is unusual. While normal page migration prevents parallel accesses using a migration PTE, THP migration relies on a combination of the page_table_lock, the page lock and the existance of the NUMA hinting PTE to guarantee safety but there is a bug in the scheme. If a THP page is currently being migrated and another thread traps a fault on the same page it checks if the page is misplaced. If it is not, then pmd_numa is cleared. The problem is that it checks if the page is misplaced without holding the page lock meaning that the racing thread can be migrating the THP when the second thread clears the NUMA bit and faults a stale page. This patch checks if the page is potentially being migrated and stalls using the lock_page if it is potentially being migrated before checking if the page is misplaced or not. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-6-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-29 11:37:19 +01:00
Mel Gorman	1dd49bfa34	mm: numa: Do not account for a hinting fault if we raced If another task handled a hinting fault in parallel then do not double account for it. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1381141781-10992-5-git-send-email-mgorman@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-10-29 11:37:05 +01:00
Jacob Keller	27d9ce4fd0	ixgbe: fix qv_lock_napi call in ixgbe_napi_disable_all ixgbe_napi_disable_all calls napi_disable on each queue, however the busy polling code introduced a local_bh_disable()d context around the napi_disable. The original author did not realize that napi_disable might sleep, which would cause a sleep while atomic BUG. In addition, on a single processor system, the ixgbe_qv_lock_napi loop shouldn't have to mdelay. This patch adds an ixgbe_qv_disable along with a new IXGBE_QV_STATE_DISABLED bit, which it uses to indicate to the poll and napi routines that the q_vector has been disabled. Now the ixgbe_napi_disable_all function will wait until all pending work has been finished and prevent any future work from being started. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: Alexander Duyck <alexander.duyck@intel.com> Cc: Hyong-Youb Kim <hykim@myri.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Dmitry Kravkov <dmitry@broadcom.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 03:30:08 -07:00
Jacob Keller	80c33ddd31	net: add might_sleep() call to napi_disable napi_disable uses an msleep() call to wait for outstanding napi work to be finished after setting the disable bit. It does not always sleep incase there was no outstanding work. This resulted in a rare bug in ixgbe_down operation where a napi_disable call took place inside of a local_bh_disable()d context. In order to enable easier detection of future sleep while atomic BUGs, this patch adds a might_sleep() call, so that every use of napi_disable during atomic context will be visible. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: Alexander Duyck <alexander.duyck@intel.com> Cc: Hyong-Youb Kim <hykim@myri.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Dmitry Kravkov <dmitry@broadcom.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 02:40:21 -07:00
Joseph Gasparakis	e6cd988c27	vxlan: Have the NIC drivers do less work for offloads This patch removes the burden from the NIC drivers to check if the vxlan driver is enabled in the kernel and also makes available the vxlan headrooms to them. Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2013-10-29 02:39:13 -07:00
Ingo Molnar	cd65718712	perf/urgent fixes: . Add color overhead for stdio output buffer, which fixes --stdio output being chopped up on the hot (red) entries, fix from Jiri Olsa. . Get 'perf record -g -a sleep 1' working again, removing the need for -- separating perf options from the workload, restoring ages old behaviour, fix from Jiri Olsa. More patches allowing ~/.perfconfig setting up of default callchain collecting method ("fp" or "dwarf") left for next merge window. . Fixup mmap event consumption, where we were acking the consumption by writing the tail before actually accessing the event, which could lead to using overwritten records in things like 'perf record --call-graph'. from Zhouyi Zhou. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJSbrexAAoJENZQFvNTUqpA+MIP/1dMAat/+jAobBvy9s69rJbT OV2QxqS5haA6rjbyogY7AiaCM/bSFO9dpLDF31NUTc3s6hiMsHnXnogQxvGGrBzT lzj45V9/a8XXl8Wxbpv0z4mI7ppNR3KjJdkBTsuqso5GivzMuG/8/N7ZxYB8kSlL vnWLRBKiBa/sDKIhKy5oEqmMMa+kPmwIXGZiKFxWkNMjmBtK80SB7MtPernhLNGf wj6EEulphPR1s8VHXqZIoijEvgw2vEXDTFapKd53EV+UPCII13NW7gicgPXyiWrY BIRW4Pwl98y9qHEc5gSq5Eotcl9JhIPQVegTSnvx8jJ7x0bYrxmChb/I9eAjWQcj YQAKGqUoQGu0PJv8TeIoGsxNMFxeeeMKzf275KKc5g6Les6gIHYKOmW7RPXHIIA7 DHGYOJ808jMrNFmAkKNmxQGbE7forMFj6CPDgGpGezqTUbGsszpzga2QQgI0QmrY l8aPUsirF5cZ67vPk7dnUt58FayfKMdGME2dkU352hjOL1V5EeVdxx46zZsPQNyQ evQvkGBmRm6WFi6Sybjc6i4Gkn9F1O5sNDtEDbSQ6aZH5tNnktzLoufPrWwHKV95 GzEHpCuAeZbagPF6WSxpNTb2xfScPb6PfveGm0gbypwl/KaAr47w8VCWJiuvpSOv yV83qEooKW0LUn4JXrH3 =U+a8 -----END PGP SIGNATURE----- Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: * Add color overhead for stdio output buffer, which fixes --stdio output being chopped up on the hot (red) entries, fix from Jiri Olsa. * Get 'perf record -g -a sleep 1' working again, removing the need for -- separating perf options from the workload, restoring ages old behaviour, fix from Jiri Olsa. More patches allowing ~/.perfconfig setting up of default callchain collecting method ("fp" or "dwarf") left for next merge window. * Fixup mmap event consumption, where we were acking the consumption by writing the tail before actually accessing the event, which could lead to using overwritten records in things like 'perf record --call-graph'. From Zhouyi Zhou. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2013-10-29 09:06:07 +01:00
Mathias Krause	1c5ad13f7c	net: esp{4,6}: get rid of struct esp_data struct esp_data consists of a single pointer, vanishing the need for it to be a structure. Fold the pointer into 'data' direcly, removing one level of pointer indirection. Signed-off-by: Mathias Krause <mathias.krause@secunet.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-10-29 06:39:42 +01:00
Mathias Krause	123b0d1ba0	net: esp{4,6}: remove padlen from struct esp_data The padlen member of struct esp_data is always zero. Get rid of it. Signed-off-by: Mathias Krause <mathias.krause@secunet.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2013-10-29 06:39:42 +01:00
Wei Liu	059dfa6a93	xen-netback: use jiffies_64 value to calculate credit timeout time_after_eq() only works if the delta is < MAX_ULONG/2. For a 32bit Dom0, if netfront sends packets at a very low rate, the time between subsequent calls to tx_credit_exceeded() may exceed MAX_ULONG/2 and the test for timer_after_eq() will be incorrect. Credit will not be replenished and the guest may become unable to send packets (e.g., if prior to the long gap, all credit was exhausted). Use jiffies_64 variant to mitigate this problem for 32bit Dom0. Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jason Luan <jianhai.luan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 00:24:49 -04:00
Zhi Yong Wu	cdfb97bc01	net, mc: fix the incorrect comments in two mc-related functions Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 00:19:05 -04:00
Zhi Yong Wu	ab1a2d7773	net, iovec: fix the incorrect comment in memcpy_fromiovecend() Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 00:19:04 -04:00
Zhi Yong Wu	c4e819d16c	net, datagram: fix the incorrect comment in zerocopy_sg_from_iovec() Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 00:19:04 -04:00
Zhi Yong Wu	39deb2c7db	vxlan: silence one build warning drivers/net/vxlan.c: In function ‘vxlan_sock_add’: drivers/net/vxlan.c:2298:11: warning: ‘sock’ may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/net/vxlan.c:2275:17: note: ‘sock’ was declared here LD drivers/net/built-in.o Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 00:19:04 -04:00
Hannes Frederic Sowa	daba287b29	ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK UFO as well as UDP_CORK do not respect IP_PMTUDISC_DO and IP_PMTUDISC_PROBE well enough. UFO enabled packet delivery just appends all frags to the cork and hands it over to the network card. So we just deliver non-DF udp fragments (DF-flag may get overwritten by hardware or virtual UFO enabled interface). UDP_CORK does enqueue the data until the cork is disengaged. At this point it sets the correct IP_DF and local_df flags and hands it over to ip_fragment which in this case will generate an icmp error which gets appended to the error socket queue. This is not reflected in the syscall error (of course, if UFO is enabled this also won't happen). Improve this by checking the pmtudisc flags before appending data to the socket and if we still can fit all data in one packet when IP_PMTUDISC_DO or IP_PMTUDISC_PROBE is set, only then proceed. We use (mtu-fragheaderlen) to check for the maximum length because we ensure not to generate a fragment and non-fragmented data does not need to have its length aligned on 64 bit boundaries. Also the passed in ip_options are already aligned correctly. Maybe, we can relax some other checks around ip_fragment. This needs more research. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 00:15:22 -04:00
Ben Hutchings	262e827fe7	cxgb3: Fix length calculation in write_ofld_wr() on 32-bit architectures The length calculation here is now invalid on 32-bit architectures, since sk_buff::tail is a pointer and sk_buff::transport_header is an integer offset: drivers/net/ethernet/chelsio/cxgb3/sge.c: In function 'write_ofld_wr': drivers/net/ethernet/chelsio/cxgb3/sge.c:1603:9: warning: passing argument 4 of 'make_sgl' makes integer from pointer without a cast [enabled by default] adap->pdev); ^ drivers/net/ethernet/chelsio/cxgb3/sge.c:964:28: note: expected 'unsigned int' but argument is of type 'sk_buff_data_t' static inline unsigned int make_sgl(const struct sk_buff skb, ^ Use the appropriate skb accessor functions. Compile-tested only. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: `1a37e412a0` ('net: Use 16bits for _headers fields of struct skbuff') Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 00:14:03 -04:00
Ariel Elior	826cb7b43b	bnx2x: Disable VF access on PF removal When the bnx2x driver is rmmoded, if VFs of a given PF will be assigned to a VM then that PF will be unable to call `pci_disable_sriov()'. If for that same PF there would also exist unassigned VFs in the hypervisor, the result will be that after the removal there will still be virtual PCI functions on the hypervisor. If the bnx2x module were to be re-inserted, the result will be that the VFs on the hypervisor will be re-probed directly following the PF's probe, even though that in regular loading flow sriov is only enabled once PF is loaded. The probed VF will then try to access its bar, causing a PCI error as the HW is not in a state enabling such a request. This patch adds a missing disablement procedure to the PF's removal, one that sets registers viewable to the VF to indicate that the VFs have no permission to access the bar, thus resulting in probe errors instead of PCI errors. Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 00:12:45 -04:00
Dmitry Kravkov	e3ed4eaef4	bnx2x: prevent FW assert on low mem during unload Buffers for FW statistics were allocated at an inappropriate time; In a machine where the driver encounters problems allocating all of its queues, the driver would still create FW requests for the statistics of the non-existing queues. The wrong order of memory allocation could lead to zeroed statistics messages being sent, leading to fw assert in case function 0 was down. This changes the order of allocations, guaranteeing that statistic requests will only be generated for actual queues. Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 00:12:45 -04:00
Eric Dumazet	0d08c42cf9	tcp: gso: fix truesize tracking commit `6ff50cd555` ("tcp: gso: do not generate out of order packets") had an heuristic that can trigger a warning in skb_try_coalesce(), because skb->truesize of the gso segments were exactly set to mss. This breaks the requirement that skb->truesize >= skb->len + truesizeof(struct sk_buff); It can trivially be reproduced by : ifconfig lo mtu 1500 ethtool -K lo tso off netperf As the skbs are looped into the TCP networking stack, skb_try_coalesce() warns us of these skb under-estimating their truesize. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-10-29 00:04:47 -04:00

... 2 3 4 5 6 ...

402339 Commits