Similar to the rest of the DRM UAPI - these are to be imported
unmodified into libdrm. In current form that's impossible.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Johannes Berg says:
====================
Some more work for 4.7, notably:
* completion and fixups of nla_put_64_64bit() work
* remove a/b/g/n from wext nickname to avoid confusion
with 11ac (which wouldn't even fit fully there due to
string length restrictions)
along with some other minor changes/cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The KVM_MAX_VCPUS define provides the maximum number of vCPUs per guest, and
also the upper limit for vCPU ids. This is okay for all archs except PowerPC
which can have higher ids, depending on the cpu/core/thread topology. In the
worst case (single threaded guest, host with 8 threads per core), it limits
the maximum number of vCPUS to KVM_MAX_VCPUS / 8.
This patch separates the vCPU numbering from the total number of vCPUs, with
the introduction of KVM_MAX_VCPU_ID, as the maximal valid value for vCPU ids
plus one.
The corresponding KVM_CAP_MAX_VCPU_ID allows userspace to validate vCPU ids
before passing them to KVM_CREATE_VCPU.
This patch only implements KVM_MAX_VCPU_ID with a specific value for PowerPC.
Other archs continue to return KVM_MAX_VCPUS instead.
Suggested-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is an initial implementation of a netdev driver for GTP datapath
(GTP-U) v0 and v1, according to the GSM TS 09.60 and 3GPP TS 29.060
standards. This tunneling protocol is used to prevent subscribers from
accessing mobile carrier core network infrastructure.
This implementation requires a GGSN userspace daemon that implements the
signaling protocol (GTP-C), such as OpenGGSN [1]. This userspace daemon
updates the PDP context database that represents active subscriber
sessions through a genetlink interface.
For more context on this tunneling protocol, you can check the slides
that were presented during the NetDev 1.1 [2].
Only IPv4 is supported at this time.
[1] http://git.osmocom.org/openggsn/
[2] http://www.netdevconf.org/1.1/proceedings/slides/schultz-welte-osmocom-gtp.pdf
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
glibc's net/if.h contains copies of definitions from linux/if.h and these
conflict and cause build failures if both files are included by application
source code. Changes in uapi headers, which fixed header file dependencies to
include linux/if.h when it was needed, e.g. commit 1ffad83d, made the
net/if.h and linux/if.h incompatibilities visible as build failures for
userspace applications like iproute2 and xtables-addons.
This patch fixes compile errors when glibc net/if.h is included before
linux/if.h:
./linux/if.h:99:21: error: redeclaration of enumerator ‘IFF_NOARP’
./linux/if.h:98:23: error: redeclaration of enumerator ‘IFF_RUNNING’
./linux/if.h:97:26: error: redeclaration of enumerator ‘IFF_NOTRAILERS’
./linux/if.h:96:27: error: redeclaration of enumerator ‘IFF_POINTOPOINT’
./linux/if.h:95:24: error: redeclaration of enumerator ‘IFF_LOOPBACK’
./linux/if.h:94:21: error: redeclaration of enumerator ‘IFF_DEBUG’
./linux/if.h:93:25: error: redeclaration of enumerator ‘IFF_BROADCAST’
./linux/if.h:92:19: error: redeclaration of enumerator ‘IFF_UP’
./linux/if.h:252:8: error: redefinition of ‘struct ifconf’
./linux/if.h:203:8: error: redefinition of ‘struct ifreq’
./linux/if.h:169:8: error: redefinition of ‘struct ifmap’
./linux/if.h:107:23: error: redeclaration of enumerator ‘IFF_DYNAMIC’
./linux/if.h:106:25: error: redeclaration of enumerator ‘IFF_AUTOMEDIA’
./linux/if.h:105:23: error: redeclaration of enumerator ‘IFF_PORTSEL’
./linux/if.h:104:25: error: redeclaration of enumerator ‘IFF_MULTICAST’
./linux/if.h:103:21: error: redeclaration of enumerator ‘IFF_SLAVE’
./linux/if.h:102:22: error: redeclaration of enumerator ‘IFF_MASTER’
./linux/if.h:101:24: error: redeclaration of enumerator ‘IFF_ALLMULTI’
./linux/if.h:100:23: error: redeclaration of enumerator ‘IFF_PROMISC’
The cases where linux/if.h is included before net/if.h need a similar fix in
the glibc side, or the order of include files can be changed userspace
code as a workaround.
This change was tested in x86 userspace on Debian unstable with
scripts/headers_compile_test.sh:
$ make headers_install && \
cd usr/include && ../../scripts/headers_compile_test.sh -l -k
...
cc -Wall -c -nostdinc -I /usr/lib/gcc/i586-linux-gnu/5/include -I /usr/lib/gcc/i586-linux-gnu/5/include-fixed -I . -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH/i586-linux-gnu -o /dev/null ./linux/if.h_libc_before_kernel.h
PASSED libc before kernel test: ./linux/if.h
Reported-by: Jan Engelhardt <jengelh@inai.de>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Reported-by: Stephen Hemminger <shemming@brocade.com>
Reported-by: Waldemar Brodkorb <mail@waldemar-brodkorb.de>
Cc: Gabriel Laskar <gabriel@lse.epita.fr>
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device DAX is the device-centric analogue of Filesystem DAX
(CONFIG_FS_DAX). It allows persistent memory ranges to be allocated and
mapped without need of an intervening file system. This initial
infrastructure arranges for a libnvdimm pfn-device to be represented as
a different device-type so that it can be attached to a driver other
than the pmem driver.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In netdevice.h we removed the structure in net-next that is being
changes in 'net'. In macsec.c and rtnetlink.c we have overlaps
between fixes in 'net' and the u64 attribute changes in 'net-next'.
The mlx5 conflicts have to do with vxlan support dependencies.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Check klogctl failure correctly, from Colin Ian King.
2) Prevent OOM when under memory pressure in flowcache, from Steffen
Klassert.
3) Fix info leak in llc and rtnetlink ifmap code, from Kangjie Lu.
4) Memory barrier and multicast handling fixes in bnxt_en, from Michael
Chan.
5) Endianness bug in mlx5, from Daniel Jurgens.
6) Fix disconnect handling in VSOCK, from Ian Campbell.
7) Fix locking of netdev list walking in get_bridge_ifindices(), from
Nikolay Aleksandrov.
8) Bridge multicast MLD parser can look at wrong packet offsets, fix
from Linus Lüssing.
9) Fix chip hang in qede driver, from Sudarsana Reddy Kalluru.
10) Fix missing setting of encapsulation before inner handling completes
in udp_offload code, from Jarno Rajahalme.
11) Missing rollbacks during LAG join and flood configuration failures
in mlxsw driver, from Ido Schimmel.
12) Fix error code checks in netxen driver, from Dan Carpenter.
13) Fix key size in new macsec driver, from Sabrina Dubroca.
14) Fix mlx5/VXLAN dependencies, from Arnd Bergmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
net/mlx5e: make VXLAN support conditional
Revert "net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue"
macsec: key identifier is 128 bits, not 64
Documentation/networking: more accurate LCO explanation
macvtap: segmented packet is consumed
tools: bpf_jit_disasm: check for klogctl failure
qede: uninitialized variable in qede_start_xmit()
netxen: netxen_rom_fast_read() doesn't return -1
netxen: reversed condition in netxen_nic_set_link_parameters()
netxen: fix error handling in netxen_get_flash_block()
mlxsw: spectrum: Add missing rollback in flood configuration
mlxsw: spectrum: Fix rollback order in LAG join failure
udp_offload: Set encapsulation before inner completes.
udp_tunnel: Remove redundant udp_tunnel_gro_complete().
qede: prevent chip hang when increasing channels
net: ipv6: tcp reset, icmp need to consider L3 domain
bridge: fix igmp / mld query parsing
net: bridge: fix old ioctl unlocked net device walk
VSOCK: do not disconnect socket when peer has shutdown SEND only
net/mlx4_en: Fix endianness bug in IPV6 csum calculation
...
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following large patchset contains Netfilter updates for your
net-next tree. My initial intention was to send you this in two goes but
when I looked back twice I already had this burden on top of me.
Several updates for IPVS from Marco Angaroni:
1) Allow SIP connections originating from real-servers to be load
balanced by the SIP persistence engine as is already implemented
in the other direction.
2) Release connections immediately for One-packet-scheduling (OPS)
in IPVS, instead of making it via timer and rcu callback.
3) Skip deleting conntracks for each one packet in OPS, and don't call
nf_conntrack_alter_reply() since no reply is expected.
4) Enable drop on exhaustion for OPS + SIP persistence.
Miscelaneous conntrack updates from Florian Westphal, including fix for
hash resize:
5) Move conntrack generation counter out of conntrack pernet structure
since this is only used by the init_ns to allow hash resizing.
6) Use get_random_once() from packet path to collect hash random seed
instead of our compound.
7) Don't disable BH from ____nf_conntrack_find() for statistics,
use NF_CT_STAT_INC_ATOMIC() instead.
8) Fix lookup race during conntrack hash resizing.
9) Introduce clash resolution on conntrack insertion for connectionless
protocol.
Then, Florian's netns rework to get rid of per-netns conntrack table,
thus we use one single table for them all. There was consensus on this
change during the NFWS 2015 and, on top of that, it has recently been
pointed as a source of multiple problems from unpriviledged netns:
11) Use a single conntrack hashtable for all namespaces. Include netns
in object comparisons and make it part of the hash calculation.
Adapt early_drop() to consider netns.
12) Use single expectation and NAT hashtable for all namespaces.
13) Use a single slab cache for all namespaces for conntrack objects.
14) Skip full table scanning from nf_ct_iterate_cleanup() if the pernet
conntrack counter tells us the table is empty (ie. equals zero).
Fixes for nf_tables interval set element handling, support to set
conntrack connlabels and allow set names up to 32 bytes.
15) Parse element flags from element deletion path and pass it up to the
backend set implementation.
16) Allow adjacent intervals in the rbtree set type for dynamic interval
updates.
17) Add support to set connlabel from nf_tables, from Florian Westphal.
18) Allow set names up to 32 bytes in nf_tables.
Several x_tables fixes and updates:
19) Fix incorrect use of IS_ERR_VALUE() in x_tables, original patch
from Andrzej Hajda.
And finally, miscelaneous netfilter updates such as:
20) Disable automatic helper assignment by default. Note this proc knob
was introduced by a900689264 ("netfilter: nf_ct_helper: allow to
disable automatic helper assignment") 4 years ago to start moving
towards explicit conntrack helper configuration via iptables CT
target.
21) Get rid of obsolete and inconsistent debugging instrumentation
in x_tables.
22) Remove unnecessary check for null after ip6_route_output().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux 4.6-rc7
* tag 'v4.6-rc7': (185 commits)
Linux 4.6-rc7
parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls
x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
mailmap: add John Paul Adrian Glaubitz
byteswap: try to avoid __builtin_constant_p gcc bug
lib/stackdepot: avoid to return 0 handle
mm: fix kcompactd hang during memory offlining
modpost: fix module autoloading for OF devices with generic compatible property
proc: prevent accessing /proc/<PID>/environ until it's ready
mm/zswap: provide unique zpool name
mm: thp: kvm: fix memory corruption in KVM with THP enabled
MAINTAINERS: fix Rajendra Nayak's address
mm, cma: prevent nr_isolated_* counters from going negative
mm: update min_free_kbytes from khugepaged after core initialization
huge pagecache: mmap_sem is unlocked when truncation splits pmd
rapidio/mport_cdev: fix uapi type definitions
mm: memcontrol: let v2 cgroups follow changes in system swappiness
mm: thp: correct split_huge_pages file permission
maintainers: update rmk's email address(es)
writeback: Fix performance regression in wb_over_bg_thresh()
...
The MACsec standard mentions a key identifier for each key, but
doesn't specify anything about it, so I arbitrarily chose 64 bits.
IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the
key identifier to be 128 bits (96 bits "member identifier" + 32 bits
"key number").
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On small embedded routers, one wants to control maximal amount of
memory used by fq_codel, instead of controlling number of packets or
bytes, since GRO/TSO make these not practical.
Assuming skb->truesize is accurate, we have to keep track of
skb->truesize sum for skbs in queue.
This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute.
I chose a default value of 32 MBytes, which looks reasonable even
for heavy duty usages. (Prior fq_codel users should not be hurt
when they upgrade their kernels)
Two fields are added to tc_fq_codel_qd_stats to report :
- Current memory usage
- Number of drops caused by memory limits
# tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M
..
# tc -s -d qd sh dev eth1
qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0
requeues 21705223)
rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223
maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
new_flows_len 1 old_flows_len 177
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Dave Täht <dave.taht@gmail.com>
Cc: Sebastian Möller <moeller0@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an implementation of Qualcomm's IPC router protocol, used to
communicate with service providing remote processors.
Signed-off-by: Courtney Cavin <courtney.cavin@sonymobile.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
[bjorn: Cope with 0 being a valid node id and implement RTM_NEWADDR]
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extended BPF carried over two instructions from classic to access
packet data: LD_ABS and LD_IND. They're highly optimized in JITs,
but due to their design they have to do length check for every access.
When BPF is processing 20M packets per second single LD_ABS after JIT
is consuming 3% cpu. Hence the need to optimize it further by amortizing
the cost of 'off < skb_headlen' over multiple packet accesses.
One option is to introduce two new eBPF instructions LD_ABS_DW and LD_IND_DW
with similar usage as skb_header_pointer().
The kernel part for interpreter and x64 JIT was implemented in [1], but such
new insns behave like old ld_abs and abort the program with 'return 0' if
access is beyond linear data. Such hidden control flow is hard to workaround
plus changing JITs and rolling out new llvm is incovenient.
Therefore allow cls_bpf/act_bpf program access skb->data directly:
int bpf_prog(struct __sk_buff *skb)
{
struct iphdr *ip;
if (skb->data + sizeof(struct iphdr) + ETH_HLEN > skb->data_end)
/* packet too small */
return 0;
ip = skb->data + ETH_HLEN;
/* access IP header fields with direct loads */
if (ip->version != 4 || ip->saddr == 0x7f000001)
return 1;
[...]
}
This solution avoids introduction of new instructions. llvm stays
the same and all JITs stay the same, but verifier has to work extra hard
to prove safety of the above program.
For XDP the direct store instructions can be allowed as well.
The skb->data is NET_IP_ALIGNED, so for common cases the verifier can check
the alignment. The complex packet parsers where packet pointer is adjusted
incrementally cannot be tracked for alignment, so allow byte access in such cases
and misaligned access on architectures that define efficient_unaligned_access
[1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dw
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge fixes from Andrew Morton:
"14 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
byteswap: try to avoid __builtin_constant_p gcc bug
lib/stackdepot: avoid to return 0 handle
mm: fix kcompactd hang during memory offlining
modpost: fix module autoloading for OF devices with generic compatible property
proc: prevent accessing /proc/<PID>/environ until it's ready
mm/zswap: provide unique zpool name
mm: thp: kvm: fix memory corruption in KVM with THP enabled
MAINTAINERS: fix Rajendra Nayak's address
mm, cma: prevent nr_isolated_* counters from going negative
mm: update min_free_kbytes from khugepaged after core initialization
huge pagecache: mmap_sem is unlocked when truncation splits pmd
rapidio/mport_cdev: fix uapi type definitions
mm: memcontrol: let v2 cgroups follow changes in system swappiness
mm: thp: correct split_huge_pages file permission
This is another attempt to avoid a regression in wwn_to_u64() after that
started using get_unaligned_be64(), which in turn ran into a bug on
gcc-4.9 through 6.1.
The regression got introduced due to the combination of two separate
workarounds (commits e3bde9568d: "include/linux/unaligned: force
inlining of byteswap operations" and ef3fb2422f: "scsi: fc: use
get/put_unaligned64 for wwn access") that each try to sidestep distinct
problems with gcc behavior (code growth and increased stack usage).
Unfortunately after both have been applied, a more serious gcc bug has
been uncovered, leading to incorrect object code that discards part of a
function and causes undefined behavior.
As part of this problem is how __builtin_constant_p gets evaluated on an
argument passed by reference into an inline function, this avoids the
use of __builtin_constant_p() for all architectures that set
CONFIG_ARCH_USE_BUILTIN_BSWAP. Most architectures do not set
ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably do not
suffer from the problem in the qla2xxx driver, but they might still run
into it elsewhere.
Both of the original workarounds were only merged in the 4.6 kernel, and
the bug that is fixed by this patch should only appear if both are
there, so we probably don't need to backport the fix. On the other
hand, it works by simplifying the code path and should not have any
negative effects.
[arnd@arndb.de: fix older gcc warnings]
(http://lkml.kernel.org/r/12243652.bxSxEgjgfk@wuerfel)
Link: https://lkml.org/lkml/headers/2016/4/12/1103
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
Fixes: e3bde9568d ("include/linux/unaligned: force inlining of byteswap operations")
Fixes: ef3fb2422f ("scsi: fc: use get/put_unaligned64 for wwn access")
Link: http://lkml.kernel.org/r/1780465.XdtPJpi8Tt@wuerfel
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Josh Poimboeuf <jpoimboe@redhat.com> # on gcc-5.3
Tested-by: Quinn Tran <quinn.tran@qlogic.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
Cc: Jan Hubicka <hubicka@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull asm-generic syscall fix from Arnd Bergmann:
"My last pull request for asm-generic had just one patch that added two
new system calls to asm/unistd.h, but unfortunately it turned out to
be wrong, pointing arch/tile compat mode at the native handlers rather
than the compat ones.
This was spotted by Yury Norov, who is working on ILP32 mode for
arch/arm64, which would have the same problem when merged. This fixes
the table to use the correct compat syscalls, like the other 64-bit
architectures do.
I'll try to find the time to come up with a solution that prevents
this problem from happening again, by allowing all future system calls
to just get added in a single file for use by all architectures"
* tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: use compat version for preadv2 and pwritev2
Now that all MTD drivers have moved to the mtd_ooblayout_ops model we can
safely remove the struct nand_ecclayout definition, and all the remaining
places where it was still used.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Currently, we support set names of up to 16 bytes, get this aligned
with the maximum length we can use in ipset to make it easier when
considering migration to nf_tables.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Compat architectures that does not use generic unistd (mips, s390),
declare compat version in their syscall tables for preadv2 and
pwritev2. Generic unistd syscall table should do it as well.
[arnd: this initially slipped through the review and an
incorrect patch got merged. arch/tile/ is the only architecture
that could be affected for their 32-bit compat mode, every
other architecture we support today is fine.]
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This pull request brings in DPI panel support, gamma ramp support, and
render nodes for vc4.
* tag 'drm-vc4-next-2016-05-02' of https://github.com/anholt/linux:
drm/vc4: Add missing render node support
drm/vc4: Add support for gamma ramps.
drm/vc4: Fix NULL deref in HDMI init error path
drm/vc4: Add DPI driver
drm: Add an encoder and connector type enum for DPI.
Conflicts:
net/ipv4/ip_gre.c
Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver adds support for the STM CoreSight IP block, allowing any
system compoment (HW or SW) to log and aggregate messages via a
single entity.
The CoreSight STM exposes an application defined number of channels
called stimulus port. Configuration is done using entries in sysfs
and channels made available to userspace via configfs.
Signed-off-by: Pratik Patel <pratikp@codeaurora.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Michael Williams <michael.williams@arm.com>
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In presence of inelastic flows and stress, we can call
fq_codel_drop() for every packet entering fq_codel qdisc.
fq_codel_drop() is quite expensive, as it does a linear scan
of 4 KB of memory to find a fat flow.
Once found, it drops the oldest packet of this flow.
Instead of dropping a single packet, try to drop 50% of the backlog
of this fat flow, with a configurable limit of 64 packets per round.
TCA_FQ_CODEL_DROP_BATCH_SIZE is the new attribute to make this
limit configurable.
With this strategy the 4 KB search is amortized to a single cache line
per drop [1], so fq_codel_drop() no longer appears at the top of kernel
profile in presence of few inelastic flows.
[1] Assuming a 64byte cache line, and 1024 buckets
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dave Taht <dave.taht@gmail.com>
Cc: Jonathan Morton <chromatix99@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Dave Taht
Signed-off-by: David S. Miller <davem@davemloft.net>
Add driver for the PCI Express Downstream Port Containment extended
capability. DPC is an optional capability to contain uncorrectable errors
below a port.
For more information on DPC, please see PCI Express Base Specification
Revision 4, section 7.31, or view the PCI-SIG DPC ECN here:
https://pcisig.com/sites/default/files/specification_documents/ECN_DPC_2012-02-09_finalized.pdf
When a DPC event is triggered, the hardware disables downstream links, so
the DPC driver schedules removal for all devices below this port. This may
happen concurrently with a PCIe hotplug driver if enabled. When all
downstream devices are removed and the link state transitions to disabled,
the DPC driver clears the DPC status and interrupt bits so the link may
retrain for a newly connected device.
[bhelgaas: clear (not set) DPC_CTL bits on remove, whitespace cleanup]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Lukas Wunner <lukas@wunner.de>
Add the Downstream Port Containment (PCIE_PORT_SERVICE_DPC) portdrv service
type, available if the device has the DPC extended capability.
[bhelgaas: split to separate patch, changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>