With the new y2038 safe timestamping options added, update the
documentation to reflect the changes.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add defines and docs for generic info versions.
v3:
- add docs;
- separate patch (Jiri).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-01-29
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Teach verifier dead code removal, this also allows for optimizing /
removing conditional branches around dead code and to shrink the
resulting image. Code store constrained architectures like nfp would
have hard time doing this at JIT level, from Jakub.
2) Add JMP32 instructions to BPF ISA in order to allow for optimizing
code generation for 32-bit sub-registers. Evaluation shows that this
can result in code reduction of ~5-20% compared to 64 bit-only code
generation. Also add implementation for most JITs, from Jiong.
3) Add support for __int128 types in BTF which is also needed for
vmlinux's BTF conversion to work, from Yonghong.
4) Add a new command to bpftool in order to dump a list of BPF-related
parameters from the system or for a specific network device e.g. in
terms of available prog/map types or helper functions, from Quentin.
5) Add AF_XDP sock_diag interface for querying sockets from user
space which provides information about the RX/TX/fill/completion
rings, umem, memory usage etc, from Björn.
6) Add skb context access for skb_shared_info->gso_segs field, from Eric.
7) Add support for testing flow dissector BPF programs by extending
existing BPF_PROG_TEST_RUN infrastructure, from Stanislav.
8) Split BPF kselftest's test_verifier into various subgroups of tests
in order better deal with merge conflicts in this area, from Jakub.
9) Add support for queue/stack manipulations in bpftool, from Stanislav.
10) Document BTF, from Yonghong.
11) Dump supported ELF section names in libbpf on program load
failure, from Taeung.
12) Silence a false positive compiler warning in verifier's BTF
handling, from Peter.
13) Fix help string in bpftool's feature probing, from Prashant.
14) Remove duplicate includes in BPF kselftests, from Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add initial documentation file for devlink params of mlxsw driver. Only
"fw_load_policy" is now supported.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new eBPF instruction class JMP32 uses the reserved class number 0x6.
Kernel BPF ISA documentation updated accordingly.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Switch phylib documentation to rst format.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent changes to the phylib API
- removed phy_stop_interrupts
- replaced phy_start_interrupts with phy_request_interrupt
- moved some functionality from phy_connect() and phy_disconnect()
to phy_start() and phy_stop() respectively.
Reflect these changes in the documentation.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert scaling document into reStructuredText and add reference to
scaling document into main table of contents in network documentation.
There are no semantic changes.
There are no references to "scaling.txt" file. Whole kernel tree was
checked using:
$ grep -r "scaling\.txt"
Signed-off-by: Otto Sabart <ottosabart@seberm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This patch adds a new file to add information about devlink health
mechanism.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix "reference to nonexisting document" warnings.
Fixes: b255e500c8 ("net: documentation: build a directory structure for drivers")
Signed-off-by: Otto Sabart <ottosabart@seberm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_tstamp.h is an UAPI header, so it was moved under include/uapi.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A follow-up patch will enable vetoing of FDB entries. Make it possible
to communicate details of why an FDB entry is not acceptable back to the
user.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 83c0afaec7 ("net: dsa: Add new binding implementation"), DSA is
no longer a platform device exclusively and can support registering DSA
switches from other bus drivers (PCI, USB, I2C, etc.).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix over 100 documentation warnings in snmp_counter.rst by
extending the underline string lengths and inserting a blank line
after bullet items.
Examples:
Documentation/networking/snmp_counter.rst:1: WARNING: Title overline too short.
Documentation/networking/snmp_counter.rst:14: WARNING: Bullet list ends without a blank line; unexpected unindent.
Fixes: 2b96547223 ("add document for TCP OFO, PAWS and skip ACK counters")
Fixes: 8e2ea53a83 ("add snmp counters document")
Fixes: 712ee16c23 ("add documents for snmp counters")
Fixes: 80cc49507b ("net: Add part of TCP counts explanations in snmp_counters.rst")
Fixes: b08794a922 ("documentation of some IP/ICMP snmp counters")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The changes introduced to allow rxrpc calls to be retried creates an issue
when it comes to refcounting afs_call structs. The problem is that when
rxrpc_send_data() queues the last packet for an asynchronous call, the
following sequence can occur:
(1) The notify_end_tx callback is invoked which causes the state in the
afs_call to be changed from AFS_CALL_CL_REQUESTING or
AFS_CALL_SV_REPLYING.
(2) afs_deliver_to_call() can then process event notifications from rxrpc
on the async_work queue.
(3) Delivery of events, such as an abort from the server, can cause the
afs_call state to be changed to AFS_CALL_COMPLETE on async_work.
(4) For an asynchronous call, afs_process_async_call() notes that the call
is complete and tried to clean up all the refs on async_work.
(5) rxrpc_send_data() might return the amount of data transferred
(success) or an error - which could in turn reflect a local error or a
received error.
Synchronising the clean up after rxrpc_kernel_send_data() returns an error
with the asynchronous cleanup is then tricky to get right.
Mostly revert commit c038a58ccf. The two API
functions the original commit added aren't currently used. This makes
rxrpc_kernel_send_data() always return successfully if it queued the data
it was given.
Note that this doesn't affect synchronous calls since their Rx notification
function merely pokes a wait queue and does not refcounting. The
asynchronous call notification function *has* to do refcounting and pass a
ref over the work item to avoid the need to sync the workqueue in call
cleanup.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch just adds references to offload documents into main table of
contents in network documentation.
Signed-off-by: Otto Sabart <ottosabart@seberm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The titles do not look very nice in the table of contents generated by
Sphinx.
I also think it is obvious that the documents are describing offloads
in the Linux Networking Stack.
Signed-off-by: Otto Sabart <ottosabart@seberm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This patch renames offload files. This is necessary for Sphinx.
Also update reference to checksum-offloads.rst file.
Whole kernel code was grepped for references using:
$ grep -r "\(segmentation\|checksum\)-offloads.txt" .
There should be no other references
to {segmentation,checksum}-offloads.txt files.
Signed-off-by: Otto Sabart <ottosabart@seberm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Add small number of markups which are sufficient for conversion
into reStructuredText.
Unfortunately there was necessary to restructure all sections
in checksum-offloads.txt file and create paragraphs separated
by newline. There also must not be a space at the
beginning of paragpraph.
There are no semantic changes.
Signed-off-by: Otto Sabart <ottosabart@seberm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Pull networking fixes from David Miller:
"Several fixes here. Basically split down the line between newly
introduced regressions and long existing problems:
1) Double free in tipc_enable_bearer(), from Cong Wang.
2) Many fixes to nf_conncount, from Florian Westphal.
3) op->get_regs_len() can throw an error, check it, from Yunsheng
Lin.
4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman
driver, from Scott Wood.
5) Inifnite loop in fib_empty_table(), from Yue Haibing.
6) Use after free in ax25_fillin_cb(), from Cong Wang.
7) Fix socket locking in nr_find_socket(), also from Cong Wang.
8) Fix WoL wakeup enable in r8169, from Heiner Kallweit.
9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani.
10) Fix ptr_ring wrap during queue swap, from Cong Wang.
11) Missing shutdown callback in hinic driver, from Xue Chaojing.
12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano
Brivio.
13) BPF out of bounds speculation fixes from Daniel Borkmann"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
ipv6: Consider sk_bound_dev_if when binding a socket to an address
ipv6: Fix dump of specific table with strict checking
bpf: add various test cases to selftests
bpf: prevent out of bounds speculation on pointer arithmetic
bpf: fix check_map_access smin_value test when pointer contains offset
bpf: restrict unknown scalars of mixed signed bounds for unprivileged
bpf: restrict stack pointer arithmetic for unprivileged
bpf: restrict map value pointer arithmetic for unprivileged
bpf: enable access to ax register also from verifier rewrite
bpf: move tmp variable into ax register in interpreter
bpf: move {prev_,}insn_idx into verifier env
isdn: fix kernel-infoleak in capi_unlocked_ioctl
ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
net/hamradio/6pack: use mod_timer() to rearm timers
net-next/hinic:add shutdown callback
net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT
ip: validate header length on virtual device xmit
tap: call skb_probe_transport_header after setting skb->dev
ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
net: rds: remove unnecessary NULL check
...
document on perf security, more Italian translations, more
improvements to the memory-management docs, improvements to the
pathname lookup documentation, and the usual array of smaller
fixes.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAlwmSPkPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Y9ZoH/joPnMFykOxS0SmdfI7Z+F4EiJct/ZwF9bHx
T673T0RC30IgnUXGmBl5OtktfWqVh9aGqHOGwgh65ybp2QvzemdP0k6Lu6RtwNk9
6LfkpvuUb8FzaQmCHnSMzMSDmXtZUw3Z/mOjCBcQtfGAsUULNT08xl+Dr+gwWIWt
H+gPEEP+MCXTOQO1jm2dHOHW8NGm6XOijMTpOxp/pkoEY5tUxkVB1T//8EeX7LVh
c1QHzFrufE3bmmubCLtIuyVqZbm/V5l6rHREDQ46fnH/G9fM4gojzsrAL/Y2m4bt
E4y0XJHycjLMRDimAnYhbPm1ryTFAX1lNzHP3M/EF6Heqx8YHAk=
=vtwu
-----END PGP SIGNATURE-----
Merge tag 'docs-5.0' of git://git.lwn.net/linux
Pull documentation update from Jonathan Corbet:
"A fairly normal cycle for documentation stuff. We have a new document
on perf security, more Italian translations, more improvements to the
memory-management docs, improvements to the pathname lookup
documentation, and the usual array of smaller fixes.
As is often the case, there are a few reaches outside of
Documentation/ to adjust kerneldoc comments"
* tag 'docs-5.0' of git://git.lwn.net/linux: (38 commits)
docs: improve pathname-lookup document structure
configfs: fix wrong name of struct in documentation
docs/mm-api: link slab_common.c to "The Slab Cache" section
slab: make kmem_cache_create{_usercopy} description proper kernel-doc
doc:process: add links where missing
docs/core-api: make mm-api.rst more structured
x86, boot: documentation whitespace fixup
Documentation: devres: note checking needs when converting
doc🇮🇹 add some process/* translations
doc🇮🇹 fixes in process/1.Intro
Documentation: convert path-lookup from markdown to resturctured text
Documentation/admin-guide: update admin-guide index.rst
Documentation/admin-guide: introduce perf-security.rst file
scripts/kernel-doc: Fix struct and struct field attribute processing
Documentation: dev-tools: Fix typos in index.rst
Correct gen_init_cpio tool's documentation
Document /proc/pid PID reuse behavior
Documentation: update path-lookup.md for parallel lookups
Documentation: Use "while" instead of "whilst"
dmaengine: Add mailing list address to the documentation
...
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Support for destination MAC in ipset, from Stefano Brivio.
2) Disallow all-zeroes MAC address in ipset, also from Stefano.
3) Add IPSET_CMD_GET_BYNAME and IPSET_CMD_GET_BYINDEX commands,
introduce protocol version number 7, from Jozsef Kadlecsik.
A follow up patch to fix ip_set_byindex() is also included
in this batch.
4) Honor CTA_MARK_MASK from ctnetlink, from Andreas Jaggi.
5) Statify nf_flow_table_iterate(), from Taehee Yoo.
6) Use nf_flow_table_iterate() to simplify garbage collection in
nf_flow_table logic, also from Taehee Yoo.
7) Don't use _bh variants of call_rcu(), rcu_barrier() and
synchronize_rcu_bh() in Netfilter, from Paul E. McKenney.
8) Remove NFC_* cache definition from the old caching
infrastructure.
9) Remove layer 4 port rover in NAT helpers, use random port
instead, from Florian Westphal.
10) Use strscpy() in ipset, from Qian Cai.
11) Remove NF_NAT_RANGE_PROTO_RANDOM_FULLY branch now that
random port is allocated by default, from Xiaozhou Liu.
12) Ignore NF_NAT_RANGE_PROTO_RANDOM too, from Florian Westphal.
13) Limit port allocation selection routine in NAT to avoid
softlockup splats when most ports are in use, from Florian.
14) Remove unused parameters in nf_ct_l4proto_unregister_sysctl()
from Yafang Shao.
15) Direct call to nf_nat_l4proto_unique_tuple() instead of
indirection, from Florian Westphal.
16) Several patches to remove all layer 4 NAT indirections,
remove nf_nat_l4proto struct, from Florian Westphal.
17) Fix RTP/RTCP source port translation when SNAT is in place,
from Alin Nastac.
18) Selective rule dump per chain, from Phil Sutter.
19) Revisit CLUSTERIP target, this includes a deadlock fix from
netns path, sleep in atomic, remove bogus WARN_ON_ONCE()
and disallow mismatching IP address and MAC address.
Patchset from Taehee Yoo.
20) Update UDP timeout to stream after 2 seconds, from Florian.
21) Shrink UDP established timeout to 120 seconds like TCP timewait.
22) Sysctl knobs to set GRE timeouts, from Yafang Shao.
23) Move seq_print_acct() to conntrack core file, from Florian.
24) Add enum for conntrack sysctl knobs, also from Florian.
25) Place nf_conntrack_acct, nf_conntrack_helper, nf_conntrack_events
and nf_conntrack_timestamp knobs in the core, from Florian Westphal.
As a side effect, shrink netns_ct structure by removing obsolete
sysctl anchors, also from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds two sysctl knobs for GRE:
net.netfilter.nf_conntrack_gre_timeout = 30
net.netfilter.nf_conntrack_gre_timeout_stream = 180
Update the Documentation as well.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We have no explicit signal when a UDP stream has terminated, peers just
stop sending.
For suspected stream connections a timeout of two minutes is sane to keep
NAT mapping alive a while longer.
It matches tcp conntracks 'timewait' default timeout value.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add some pointers to the definition of the CBS algorithm, and some
notes about the limits of its implementation in the i210 family of
controllers.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove skb->sp and allocate secpath storage via extension
infrastructure. This also reduces sk_buff by 8 bytes on x86_64.
Total size of allyesconfig kernel is reduced slightly, as there is
less inlined code (one conditional atomic op instead of two on
skb_clone).
No differences in throughput in following ipsec performance tests:
- transport mode with aes on 10GB link
- tunnel mode between two network namespaces with aes and null cipher
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add explainations for some general IP counters, SACK and DSACK related
counters
Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing garbage collection algorithm has a number of problems:
1. The gc algorithm will not evict PERMANENT entries as those entries
are managed by userspace, yet the existing algorithm walks the entire
hash table which means it always considers PERMANENT entries when
looking for entries to evict. In some use cases (e.g., EVPN) there
can be tens of thousands of PERMANENT entries leading to wasted
CPU cycles when gc kicks in. As an example, with 32k permanent
entries, neigh_alloc has been observed taking more than 4 msec per
invocation.
2. Currently, when the number of neighbor entries hits gc_thresh2 and
the last flush for the table was more than 5 seconds ago gc kicks in
walks the entire hash table evicting *all* entries not in PERMANENT
or REACHABLE state and not marked as externally learned. There is no
discriminator on when the neigh entry was created or if it just moved
from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).
It is possible for entries to be created or for established neighbor
entries to be moved to STALE (e.g., an external node sends an ARP
request) right before the 5 second window lapses:
-----|---------x|----------|-----
t-5 t t+5
If that happens those entries are evicted during gc causing unnecessary
thrashing on neighbor entries and userspace caches trying to track them.
Further, this contradicts the description of gc_thresh2 which says
"Entries older than 5 seconds will be cleared".
One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
whole point of having separate thresholds.
3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
when gc_thresh2 is exceeded is over kill and contributes to trashing
especially during startup.
This patch addresses these problems as follows:
1. Use of a separate list_head to track entries that can be garbage
collected along with a separate counter. PERMANENT entries are not
added to this list.
The gc_thresh parameters are only compared to the new counter, not the
total entries in the table. The forced_gc function is updated to only
walk this new gc_list looking for entries to evict.
2. Entries are added to the list head at the tail and removed from the
front.
3. Entries are only evicted if they were last updated more than 5 seconds
ago, adhering to the original intent of gc_thresh2.
4. Forced gc is stopped once the number of gc_entries drops below
gc_thresh2.
5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
when allocating a new neighbor for a PERMANENT entry. By extension this
means there are no explicit limits on the number of PERMANENT entries
that can be created, but this is no different than FIB entries or FDB
entries.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation/networking/ is full of cryptically named files with
driver documentation. This makes finding interesting information
at a glance really hard. Move all those files into a directory
called device_drivers (since not all drivers are for device) and
fix up references.
RFC v0.1 -> RFC v1:
- also add .txt suffix to the files which are missing it (Quentin)
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Henrik Austad <henrik@austad.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many drivers load the device's firmware image during the initialization
flow either from the flash or from the disk. Currently this option is not
controlled by the user and the driver decides from where to load the
firmware image.
'fw_load_policy' gives the ability to control this option which allows the
user to choose between different loading policies supported by the driver.
This parameter can be useful while testing and/or debugging the device. For
example, testing a firmware bug fix.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The #define for NETIF_F_GSO_UDP_L4 was incorrect in the
documentation, fix it by making it match the actual code.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add a short note about using IPsec Hardware Offload with
the ixgbe driver.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Whilst making an unrelated change to some Documentation, Linus sayeth:
| Afaik, even in Britain, "whilst" is unusual and considered more
| formal, and "while" is the common word.
|
| [...]
|
| Can we just admit that we work with computers, and we don't need to
| use þe eald Englisc spelling of words that most of the world never
| uses?
dictionary.com refers to the word as "Chiefly British", which is
probably an undesirable attribute for technical documentation.
Replace all occurrences under Documentation/ with "while".
Cc: David Howells <dhowells@redhat.com>
Cc: Liam Girdwood <lgirdwood@gmail.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Add explanations of some generic TCP counters, fast open
related counters and TCP abort related counters and several
examples.
Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The life-checking function, which is used by kAFS to make sure that a call
is still live in the event of a pending signal, only samples the received
packet serial number counter; it doesn't actually provoke a change in the
counter, rather relying on the server to happen to give us a packet in the
time window.
Fix this by adding a function to force a ping to be transmitted.
kAFS then keeps track of whether there's been a stall, and if so, uses the
new function to ping the server, resetting the timeout to allow the reply
to come back.
If there's a stall, a ping and the call is *still* stalled in the same
place after another period, then the call will be aborted.
Fixes: bc5e3a546d ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Fixes: f4d15fb6f9 ("rxrpc: Provide functions for allowing cleaner handling of signals")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FQ pacing guarantees that paced packets queued by one flow do not
add head-of-line blocking for other flows.
After TCP GSO conversion, increasing limit_output_bytes to 1 MB is safe,
since this maps to 16 skbs at most in qdisc or device queues.
(or slightly more if some drivers lower {gso_max_segs|size})
We still can queue at most 1 ms worth of traffic (this can be scaled
by wifi drivers if they need to)
Tested:
# ethtool -c eth0 | egrep "tx-usecs:|tx-frames:" # 40 Gbit mlx4 NIC
tx-usecs: 16
tx-frames: 16
# tc qdisc replace dev eth0 root fq
# for f in {1..10};do netperf -P0 -H lpaa24,6 -o THROUGHPUT;done
Before patch:
27711
26118
27107
27377
27712
27388
27340
27117
27278
27509
After patch:
37434
36949
36658
36998
37711
37291
37605
36659
36544
37349
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The snmp_counter.rst explains the meanings of snmp counters. It also
provides a set of experiments (only 1 for this initial patch),
combines the experiments' resutls and the snmp counters'
meanings. This is an initial path, only explains a part of IP/ICMP
counters and provide a simple ping test.
Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a sysctl raw_l3mdev_accept to control raw socket lookup in a manner
similar to use of tcp_l3mdev_accept for stream and of udp_l3mdev_accept
for datagram sockets. Have this default to enabled for reasons of
backwards compatibility. This is so as to specify the output device
with cmsg and IP_PKTINFO, but using a socket not bound to the
corresponding VRF. This allows e.g. older ping implementations to be
run with specifying the device but without executing it in the VRF.
If the option is disabled, packets received in a VRF context are only
handled by a raw socket bound to the VRF, and correspondingly packets
in the default VRF are only handled by a socket not bound to any VRF.
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the inet socket lookup to avoid packets arriving on a device
enslaved to an l3mdev from matching unbound sockets by removing the
wildcard for non sk_bound_dev_if and instead relying on check against
the secondary device index, which will be 0 when the input device is
not enslaved to an l3mdev and so match against an unbound socket and
not match when the input device is enslaved.
Change the socket binding to take the l3mdev into account to allow an
unbound socket to not conflict sockets bound to an l3mdev given the
datapath isolation now guaranteed.
Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As commit 911a91c39c ("kconfig: rename silentoldconfig to
syncconfig") announced, it is time for the removal.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch documents the tcp_fwmark_accept sysctl that was
added in 3.15.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
readability improvements for the formatted output, some LICENSES updates
including the addition of the ISC license, the removal of the unloved and
unmaintained 00-INDEX files, the deprecated APIs document from Kees, more
MM docs from Mike Rapoport, and the usual pile of typo fixes and
corrections.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbztcuAAoJEI3ONVYwIuV6nTAP/0Be+5dNPGJmSnb/RbkwBuBV
zAFVUj2sx4lZlRmWRZ0r7AOef2eSw3IvwBix/vnmllYCVahjp+BdRbhXQAijjyeb
FWWjOH50/J+BaxSthAINiLRLvuoe0D/M08OpmXQfRl5q0S8RufeV3BDtEABx9j2n
IICPGTl8LpPUgSMA4cw8zPhHdauhZpbmL2mGE9LXZ27SJT4S8lcHMwyPU1n5S+Jd
ChEz5g9dYr3GNxFp712pkI5GcVL3tP2nfoVwK7EuGf1tvSnEnn2kzac8QgMqorIh
xB2+Sh4XIUCbHYpGHpxIniD+WI4voNr/E7STQioJK5o2G4HTuxLjktvTezNF8paa
hgNHWjPQBq0OOCdM/rsffONFF2J/v/r7E3B+kaRg8pE0uZWTFaDMs6MVaL2fL4Ls
DrFhi90NJI/Fs7uB4sriiviShAhwboiSIRXJi4VlY/5oFJKHFgqes+R7miU+zTX3
2qv0k4mWZXWDV9w1piPxSCZSdRzaoYSoxEihX+tnYpCyEcYd9ovW/X1Uhl/wCWPl
Ft+Op6rkHXRXVfZzTLuF6PspZ4Udpw2PUcnA5zj5FRDDBsjSMFR31c19IFbCeiNY
kbTIcqejJG1WbVrAK4LCcFyVSGxbrr281eth4rE06cYmmsz3kJy1DB6Lhyg/2vI0
I8K9ZJ99n1RhPJIcburB
=C0wt
-----END PGP SIGNATURE-----
Merge tag 'docs-4.20' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This is a fairly typical cycle for documentation. There's some welcome
readability improvements for the formatted output, some LICENSES
updates including the addition of the ISC license, the removal of the
unloved and unmaintained 00-INDEX files, the deprecated APIs document
from Kees, more MM docs from Mike Rapoport, and the usual pile of typo
fixes and corrections"
* tag 'docs-4.20' of git://git.lwn.net/linux: (41 commits)
docs: Fix typos in histogram.rst
docs: Introduce deprecated APIs list
kernel-doc: fix declaration type determination
doc: fix a typo in adding-syscalls.rst
docs/admin-guide: memory-hotplug: remove table of contents
doc: printk-formats: Remove bogus kobject references for device nodes
Documentation: preempt-locking: Use better example
dm flakey: Document "error_writes" feature
docs/completion.txt: Fix a couple of punctuation nits
LICENSES: Add ISC license text
LICENSES: Add note to CDDL-1.0 license that it should not be used
docs/core-api: memory-hotplug: add some details about locking internals
docs/core-api: rename memory-hotplug-notifier to memory-hotplug
docs: improve readability for people with poorer eyesight
yama: clarify ptrace_scope=2 in Yama documentation
docs/vm: split memory hotplug notifier description to Documentation/core-api
docs: move memory hotplug description into admin-guide/mm
doc: Fix acronym "FEKEK" in ecryptfs
docs: fix some broken documentation references
iommu: Fix passthrough option documentation
...
Now that the documents have been updated to conform to the reStructured Text
guidelines, we can now change the file extensions and update the other
related references.
This converts all of the Intel wired LAN driver documentation to *.rst.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Added the fm10k kernel documentation, which apparently was missing.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.
Also fixed old/broken URLs to the correct or updated URL.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.
Also fixed old/broken URLs to the correct or updated URL.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.
Also fixed old/broken URLs to the correct or updated URL.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.
Also fixed old/broken URLs to the correct or updated URL.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.
Also fixed old/broken URLs to the correct or updated URL.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.
Also fixed old/broken URLs to the correct or updated URL.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.
Also fixed old/broken URLs to the correct or updated URL.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.
Also fixed old/broken URLs to the correct or updated URL.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Before making the conversion to the RST (reStructured Text) format, there
are changes needed to the documentation so that there are no build errors.
Also fixed old/broken URLs to the correct or updated URL.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Add the SPDX-Lincense-Identifier to the Intel wired Ethernet *.rst
kernel documentation.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
NAPI is enabled by default and IXGB_NAPI was removed since
commit 6d37ab282e ("ixgb: make NAPI the only option and the default")
Update the doc accordingly.
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add support for the DEC FDDIcontroller 700 (DEFZA), Digital Equipment
Corporation's first-generation FDDI network interface adapter, made for
TURBOchannel and based on a discrete version of what eventually became
Motorola's widely used CAMEL chipset.
The CAMEL chipset is present for example in the DEC FDDIcontroller
TURBOchannel, EISA and PCI adapters (DEFTA/DEFEA/DEFPA) that we support
with the `defxx' driver, however the host bus interface logic and the
firmware API are different in the DEFZA and hence a separate driver is
required.
There isn't much to say about the driver except that it works, but there
is one peculiarity to mention. The adapter implements two Tx/Rx queue
pairs.
Of these one pair is the usual network Tx/Rx queue pair, in this case
used by the adapter to exchange frames with the ring, via the RMC (Ring
Memory Controller) chip. The Tx queue is handled directly by the RMC
chip and resides in onboard packet memory. The Rx queue is maintained
via DMA in host memory by adapter's firmware copying received data
stored by the RMC in onboard packet memory.
The other pair is used to communicate SMT frames with adapter's
firmware. Any SMT frame received from the RMC via the Rx queue must be
queued back by the driver to the SMT Rx queue for the firmware to
process. Similarly the firmware uses the SMT Tx queue to supply the
driver with SMT frames that must be queued back to the Tx queue for the
RMC to send to the ring.
This solution was chosen because the designers ran out of PCB space and
could not squeeze in more logic onto the board that would be required to
handle this SMT frame traffic without the need to involve the driver, as
with the later DEFTA/DEFEA/DEFPA adapters.
Finally the driver does some Frame Control byte decoding, so to avoid
magic numbers some macros are added to <linux/if_fddi.h>.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE
notifications when a device is taken down (admin down) or deleted. IPv4
does not generate a message for routes evicted by the down or delete;
IPv6 does. A NOS at scale really needs to avoid these messages and have
IPv4 and IPv6 behave similarly, relying on userspace to handle link
notifications and evict the routes.
At this point existing user behavior needs to be preserved. Since
notifications are a global action (not per app) the only way to preserve
existing behavior and allow the messages to be skipped is to add a new
sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to
disable the notificatioons.
IPv6 route code already supports the option to skip the message (it is
used for multipath routes for example). Besides the new sysctl we need
to pass the skip_notify setting through the generic fib6_clean and
fib6_walk functions to fib6_clean_node and to set skip_notify on calls
to __ip_del_rt for the addrconf_ifdown path.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-10-08
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
BPF programs to perform lookups for sockets in a network namespace. This would
allow programs to determine early on in processing whether the stack is
expecting to receive the packet, and perform some action (eg drop,
forward somewhere) based on this information.
2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except all the data is per-cpu. The main goal of per-cpu variant is to
implement super fast counters (e.g. packet counters), which don't require
neither lookups, neither atomic operations in a fast path.
The example of these hybrid counters is in selftests/bpf/netcnt_prog.c
3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet
4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski
5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings consistent naming convention to global symbols.
6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf
7) various AF_XDP fixes from Björn and Magnus
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
bpf_asm and the other classic BPF tools support jump conditions
comparing register A to register X, in addition to comparing
register A with constant K.
Only the latter was documented in filter.txt, add two new addressing
modes that describe the former.
Signed-off-by: Arthur Fabre <arthur@arthurfabre.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fix a simple typo: Completetion -> Completion
Signed-off-by: Konrad Djimeli <kdjimeli@igalia.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds a new file to add information about configuration
parameters that are supported by bnxt_en driver via devlink.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new file to add information about some of the
generic configuration parameters set via devlink.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the epoch value to be queried on a server connection. This is in the
rxrpc header of every packet for use in routing and is derived from the
client's state. It's also not supposed to change unless the client gets
restarted.
AFS can make use of this information to deduce whether a fileserver has
been restarted because the fileserver makes client calls to the filesystem
driver's cache manager to send notifications (ie. callback breaks) about
conflicting changes from other clients. These convey the fileserver's own
epoch value back to the filesystem.
Signed-off-by: David Howells <dhowells@redhat.com>
Allow the timestamp on the sk_buff holding the first DATA packet of a reply
to be queried. This can then be used as a base for the expiry time
calculation on the callback promise duration indicated by an operation
result.
Signed-off-by: David Howells <dhowells@redhat.com>
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net'
overlapped the renaming of a netlink attribute in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Document the new pointer types in the verifier and how the pointer ID
tracking works to ensure that references which are taken are later
released.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2018-10-01
1) Make xfrmi_get_link_net() static to silence a sparse warning.
From Wei Yongjun.
2) Remove a unused esph pointer definition in esp_input().
From Haishuang Yan.
3) Allow the NIC driver to quietly refuse xfrm offload
in case it does not support it, the SA is created
without offload in this case.
From Shannon Nelson.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Update document for LRO/RSC support, and the command line info to
change the setting.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the Intel Ethernet Adaptive Virtual Function driver
(i40evf) to a new name (iavf) that is more consistent with
the ongoing maintenance of the driver as the universal VF driver
for multiple product lines.
This first patch fixes up the directory names and the .ko name,
intentionally ignoring the function names inside the driver
for now. Basically this is the simplest patch that gets
the rename done and will be followed by other patches that
rename the internal functions.
This patch also addresses a couple of string/name issues
and updates the Copyright year.
Also, made sure to add a MODULE_ALIAS to the old name.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Document is stale, let's remove it.
Remove TCP congestion document.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a respin with a wider audience (all that get_maintainer returned)
and I know this spams a *lot* of people. Not sure what would be the correct
way, so my apologies for ruining your inbox.
The 00-INDEX files are supposed to give a summary of all files present
in a directory, but these files are horribly out of date and their
usefulness is brought into question. Often a simple "ls" would reveal
the same information as the filenames are generally quite descriptive as
a short introduction to what the file covers (it should not surprise
anyone what Documentation/sched/sched-design-CFS.txt covers)
A few years back it was mentioned that these files were no longer really
needed, and they have since then grown further out of date, so perhaps
it is time to just throw them out.
A short status yields the following _outdated_ 00-INDEX files, first
counter is files listed in 00-INDEX but missing in the directory, last
is files present but not listed in 00-INDEX.
List of outdated 00-INDEX:
Documentation: (4/10)
Documentation/sysctl: (0/1)
Documentation/timers: (1/0)
Documentation/blockdev: (3/1)
Documentation/w1/slaves: (0/1)
Documentation/locking: (0/1)
Documentation/devicetree: (0/5)
Documentation/power: (1/1)
Documentation/powerpc: (0/5)
Documentation/arm: (1/0)
Documentation/x86: (0/9)
Documentation/x86/x86_64: (1/1)
Documentation/scsi: (4/4)
Documentation/filesystems: (2/9)
Documentation/filesystems/nfs: (0/2)
Documentation/cgroup-v1: (0/2)
Documentation/kbuild: (0/4)
Documentation/spi: (1/0)
Documentation/virtual/kvm: (1/0)
Documentation/scheduler: (0/2)
Documentation/fb: (0/1)
Documentation/block: (0/1)
Documentation/networking: (6/37)
Documentation/vm: (1/3)
Then there are 364 subdirectories in Documentation/ with several files that
are missing 00-INDEX alltogether (and another 120 with a single file and no
00-INDEX).
I don't really have an opinion to whether or not we /should/ have 00-INDEX,
but the above 00-INDEX should either be removed or be kept up to date. If
we should keep the files, I can try to keep them updated, but I rather not
if we just want to delete them anyway.
As a starting point, remove all index-files and references to 00-INDEX and
see where the discussion is going.
Signed-off-by: Henrik Austad <henrik@austad.us>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Just-do-it-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: [Almost everybody else]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The DPAA2 Ethernet driver supports Freescale/NXP SoCs with DPAA2
(DataPath Acceleration Architecture v2). The driver manages
network objects discovered on the fsl-mc bus.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the "offload" attribute is used to create an IPsec SA
and the .xdo_dev_state_add() fails, the SA creation fails.
However, if the "offload" attribute is used on a device that
doesn't offer it, the attribute is quietly ignored and the SA
is created without an offload.
Along the same line of that second case, it would be good to
have a way for the device to refuse to offload an SA without
failing the whole SA creation. This patch adds that feature
by allowing the driver to return -EOPNOTSUPP as a signal that
the SA may be fine, it just can't be offloaded.
This allows the user a little more flexibility in requesting
offloads and not needing to know every detail at all times about
each specific NIC when trying to create SAs.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Some of the larger changes this merge window:
- Removal of drivers for Exynos5440, a Samsung SoC that never saw
widespread use.
- Uniphier support for USB3 and SPI reset handling
- Syste control and SRAM drivers and bindings for Allwinner platforms
- Qualcomm AOSS (Always-on subsystem) reset controller drivers
- Raspberry Pi hwmon driver for voltage
- Mediatek pwrap (pmic) support for MT6797 SoC
-----BEGIN PGP SIGNATURE-----
iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAlt+MMkPHG9sb2ZAbGl4
b20ubmV0AAoJEIwa5zzehBx3pB4QAIj7iVxSKEQFz65iXLTfMJKFZ9TSvRgWSDyE
CHF+WOQGTnxkvySEHSw/SNqDM+Bas8ijR8b4vWzsXJFB+3HA0ZTGLU379/af1zCE
9k8QjyIWtRWKX9fo7qCHVXlMfxGbOdbCOsh4jnmHqEIDxCHXpIiJRfvUbKIXGpfn
tw6QpM70vm6Q6AdKwzmDbMCYnQAMWxBK/G/Q7BfRG+IYWYjFGbiWIc9BV9Ki8+nE
3235ISaTHvAHodoec8tpLxv34GsOP4RCqscGYEuCf22RYfWva4S9e4yoWT8qPoIl
IHWNsE3YWjksqpt9rj9Pie/PycthO4E4BUPMtqjMbC2OyKFgVsAcHrmToSdd+7ob
t3VNM6RVl8xyWSRlm5ioev15CCOeWRi1nUT7m3UEBWpQ6ihJVpbjf1vVxZRW/E0t
cgC+XzjSg26sWx1bSH9lGPFytOblAcZ04GG/Kpz02MmTgMiTdODFZ67AsqtdeQS7
a9wpaQ+DgTqU0VcQx8Kdq8uy9MOztkhXn5yO8fEWjpm0lPcxjhJS4EpN+Ru2T7/Z
AMuy5lRJfQzAPU9kY7TE0yZ07pgpZgh7LlWOoKtGD7UklzXVVZrVlpn7bApRN5vg
ZLze5OiEiIF5gIiRC8sIyQ9TZdvg4NqwebCqspINixqs7iIpB7TG93WQcy82osSE
TXhtx4Sy
=ZjwY
-----END PGP SIGNATURE-----
Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Olof Johansson:
"Some of the larger changes this merge window:
- Removal of drivers for Exynos5440, a Samsung SoC that never saw
widespread use.
- Uniphier support for USB3 and SPI reset handling
- Syste control and SRAM drivers and bindings for Allwinner platforms
- Qualcomm AOSS (Always-on subsystem) reset controller drivers
- Raspberry Pi hwmon driver for voltage
- Mediatek pwrap (pmic) support for MT6797 SoC"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (52 commits)
drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests
soc: fsl: cleanup Kconfig menu
soc: fsl: dpio: Convert DPIO documentation to .rst
staging: fsl-mc: Remove remaining files
staging: fsl-mc: Move DPIO from staging to drivers/soc/fsl
staging: fsl-dpaa2: eth: move generic FD defines to DPIO
soc: fsl: qe: gpio: Add qe_gpio_set_multiple
usb: host: exynos: Remove support for Exynos5440
clk: samsung: Remove support for Exynos5440
soc: sunxi: Add the A13, A23 and H3 system control compatibles
reset: uniphier: add reset control support for SPI
cpufreq: exynos: Remove support for Exynos5440
ata: ahci-platform: Remove support for Exynos5440
soc: imx6qp: Use GENPD_FLAG_ALWAYS_ON for PU errata
soc: mediatek: pwrap: add mt6351 driver for mt6797 SoCs
soc: mediatek: pwrap: add pwrap driver for mt6797 SoCs
soc: mediatek: pwrap: fix cipher init setting error
dt-bindings: pwrap: mediatek: add pwrap support for MT6797
reset: uniphier: add USB3 core reset control
dt-bindings: reset: uniphier: add USB3 core reset support
...
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree:
1) Infinite loop in IPVS when net namespace is released, from
Tan Hu.
2) Do not show negative timeouts in ip_vs_conn by using the new
jiffies_delta_to_msecs(), patches from Matteo Croce.
3) Set F_IFACE flag for linklocal addresses in ip6t_rpfilter,
from Florian Westphal.
4) Fix overflow in set size allocation, from Taehee Yoo.
5) Use netlink_dump_start() from ctnetlink to fix memleak from
the error path, again from Florian.
6) Register nfnetlink_subsys in last place, otherwise netns
init path may lose race and see net->nft uninitialized data.
This also reverts previous attempt to fix this by increase
netns refcount, patches from Florian.
7) Remove conntrack entries on layer 4 protocol tracker module
removal, from Florian.
8) Use GFP_KERNEL_ACCOUNT for xtables blob allocation, from
Michal Hocko.
9) Get tproxy documentation in sync with existing codebase,
from Mate Eckl.
10) Honor preset layer 3 protocol via ctx->family in the new nft_ct
timeout infrastructure, from Harsha Sharma.
11) Let uapi nfnetlink_osf.h compile standalone with no errors,
from Dmitry V. Levin.
12) Missing braces compilation warning in nft_tproxy, patch from
Mate Eclk.
13) Disregard bogus check to bail out on non-anonymous sets from
the dynamic set update extension.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If set cbs parameters calculated for 1000Mb, but use on 100Mb port
w/o h/w offload (for cpsw offload it doesn't matter), it works
incorrectly. According to the example and testing board, second port
is 100Mb interface. Correct them on recalculated for 100Mb interface.
It allows to use the same command for CBS software implementation for
board in example.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently, transparent proxy support has been added to nf_tables so that
this document should be updated with the new information.
- Nft commands are added as alternatives to iptables ones.
- The link for a patched iptables is removed as it is already part of
the mainline iptables implementation (and the link is dead).
- tcprdr is added as an example implementation of a transparent proxy
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Florian Westphal <fw@strlen.de>
Cc: KOVACS Krisztian <hidden@sch.bme.hu>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Máté Eckl <ecklm94@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Preventing the kernel from responding to ICMP Echo Requests messages
can be useful in several ways. The sysctl parameter
'icmp_echo_ignore_all' can be used to prevent the kernel from
responding to IPv4 ICMP echo requests. For IPv6 pings, such
a sysctl kernel parameter did not exist.
Add the ability to prevent the kernel from responding to IPv6
ICMP echo requests through the use of the following sysctl
parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all.
Update the documentation to reflect this change.
Signed-off-by: Virgile Jarry <virgile@acceis.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The BTF conflicts were simple overlapping changes.
The virtio_net conflict was an overlap of a fix of statistics counter,
happening alongisde a move over to a bonafide statistics structure
rather than counting value on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>
After IPv4 packets are forwarded, the priority of the corresponding SKB
is updated according to the TOS field of IPv4 header. This overrides any
prioritization done earlier by e.g. an skbedit action or ingress-qos-map
defined at a vlan device.
Such overriding may not always be desirable. Even if the packet ends up
being routed, which implies this is an L3 network node, an administrator
may wish to preserve whatever prioritization was done earlier on in the
pipeline.
Therefore introduce a sysctl that controls this behavior. Keep the
default value at 1 to maintain backward-compatible behavior.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add overline heading adornment to document title in order to comply
with kernel doc requirements.
Fixes: 60b9131 staging: fsl-mc: Convert documentation to rst format
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UCAN driver supports the microcontroller-based USB/CAN
adapters from Theobroma Systems. There are two form-factors
that run essentially the same firmware:
* Seal: standalone USB stick ( https://www.theobroma-systems.com/seal )
* Mule: integrated on the PCB of various System-on-Modules from
Theobroma Systems like the A31-µQ7 and the RK3399-Q7
( https://www.theobroma-systems.com/rk3399-q7 )
The USB wire protocol has been designed to be as generic and
hardware-indendent as possible in the hope of being useful for
implementation on other microcontrollers.
Signed-off-by: Martin Elshuber <martin.elshuber@theobroma-systems.com>
Signed-off-by: Jakob Unterwurzacher <jakob.unterwurzacher@theobroma-systems.com>
Signed-off-by: Philipp Tomsich <philipp.tomsich@theobroma-systems.com>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Preferred kernel docs format is now restructured text. Convert
netdev-FAQ.txt to restructured text.
- Add SPDX license identifier.
- Change file heading 'Information you need to know about netdev' to
'netdev FAQ' to better suit displayed index (in HTML).
- Change question/answer layout to suit rst. Copy format in
Documentation/bpf/bpf_devel_QA.rst
- Fix indentation of code snippets
- If multiple consecutive URLs appear put them in a list (to maintain
whitespace).
- Use uniform spelling of 'bug fix' throughout document (not bugfix or
bug-fix).
- Add double back ticks to 'net' and 'net-next' when referring to the
trees.
- Use rst references for Documentation/ links.
- Add rst label 'netdev-FAQ' for referencing by other docs files.
- Remove stale entry from Documentation/networking/00-INDEX
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves DPAA2 DPIO driver from staging to fsl/soc
Adds multiple-pin support to QE gpio driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbV51GAAoJEIbcUA77rBVUzB0P/1l1XZ14jlyIc4PI8eiEKx2i
Emet7qvEaeeoRYI06Dqtm+VkNYjO2Ev4n+XQYPTZGP3/b+cPh7CEI1N/L+ULFGop
HtD0FsOikvfql7BMHvGCCRLzFYHYjDNpg8JCB/3q+aOhI3/8HQyVIAEyggh1Ztam
NSmMQXHwdB8d1qAGcSYGttiJCIxLcDUtVEGcF6ZN6Lg3orpDHEbCceeQ10f1yayQ
PZuM+F1YFM4Lp17gt92caMSKENsN0Kyk/7lEVPHq0ANGMvVsHIVtZGJML+/ulaeI
v7FZrEicYJVu8LDkFAPeg3qK+O6WirOa9bQEctH7jia43QWZAZ9EROCkFOzlEwx6
+AmOB5BsqMTQsz7HppNOqB6v3zgK898UIYavGeud0c/SaIqAW3uVkKvHLKxXd/uY
K2eyvxcBs9ttK+qLopLWO1QzwWAvedIZFjSDCYpGcWDlhZR1lOqoC1u6wSApX/ZC
h7SGOOhjmzZBLtS89hHn7LnzN7RI6teNmC9uhdFtY+55IVfcRAzX3m2ym/TWPRc8
dQNA/vNMuXK2Hv8rtElqIEVUvWil3p86+640m1fnbkljmSqgzp8vAIAopUbhq2Qj
QytaQBwWPcIoAgKQjLMOypjyCTyNs1oFhKycGlwL4Jq5BwxWq27714fl+dSk4JMz
itj5Fz0+82WeDts7CBjM
=9CHI
-----END PGP SIGNATURE-----
Merge tag 'soc-fsl-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux into next/drivers
Various updates to soc/fsl for 4.19
Moves DPAA2 DPIO driver from staging to fsl/soc
Adds multiple-pin support to QE gpio driver
* tag 'soc-fsl-for-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/leo/linux:
soc: fsl: cleanup Kconfig menu
soc: fsl: dpio: Convert DPIO documentation to .rst
staging: fsl-mc: Remove remaining files
staging: fsl-mc: Move DPIO from staging to drivers/soc/fsl
staging: fsl-dpaa2: eth: move generic FD defines to DPIO
soc: fsl: qe: gpio: Add qe_gpio_set_multiple
Signed-off-by: Olof Johansson <olof@lixom.net>
Convert the Datapath I/O documentation to .rst format
and move to the Documation/networking/dpaa2 directory
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
Reviewed-by: Horia Geantă <horia.geanta@nxp.com>
Reviewed-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
This document describes MQPRIO and CBS Qdisc offload configuration
for cpsw driver based on examples. It potentially can be used in
audio video bridging (AVB) and time sensitive networking (TSN).
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel documentation is now restructured text. Convert the Ethernet
Bridge documentation and include it in the toplevel kernel
documentation.
- Fix heading adornments.
- Add license identifier.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel documentation is now restructured text. Convert the IP
aliasing documentation and include it in the toplevel kernel
documentation.
- Fix heading adornments.
- Correctly indent code snippets.
- Limit line length to 72 characters inline with kernel documentation
standards.
- Add license identifier.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a spelling typo in bonding.txt
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently building the net_failover docs causes a bunch of warnings to
be emitted. These warnings are all related to indentation and correctly
highlight missing '::' (for code sections). It looks, from other rst
files in Documentation, that the first column should be indented 2
spaces.
Add '::' before code snippets and indent all snippets uniformly starting
with 2 spaces.
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we have rst format docs for the failover and net_failover
modules however these docs are not linked to within the index.
Add `failover` and `net_failover` to the networking documentation index.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation/networking/e1000.rst:83: ERROR: Unexpected indentation.
Documentation/networking/e1000.rst:84: WARNING: Block quote ends without a blank line; unexpected unindent.
Documentation/networking/e1000.rst:173: WARNING: Definition list ends without a blank line; unexpected unindent.
Documentation/networking/e1000.rst:236: WARNING: Definition list ends without a blank line; unexpected unindent.
While here, fix highlights and mark a table as such.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
addr_gen_mode was introduced in without documentation, add it now.
Fixes: d35a00b8e3 ("net/ipv6: allow sysctl to change link-local address generation mode")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sock reference is lost when scrubbing the packet and that breaks
TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
performance impacts of about 50% in a single TCP stream when crossing
network namespaces.
XPS breaks because the queue mapping stored in the socket is not
available, so another random queue might be selected when the stack
needs to transmit something like a TCP ACK, or TCP Retransmissions.
That causes packet re-ordering and/or performance issues.
TSQ breaks because it orphans the packet while it is still in the
host, so packets are queued contributing to the buffer bloat problem.
Preserving the sock reference fixes both issues. The socket is
orphaned anyways in the receiving path before any relevant action
and on TX side the netfilter checks if the reference is local before
use it.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replaced strp_pause() with strp_unpause() to correct a seemingly copy
paste documentation mistake.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent patch updated e1000 docs to rst format. Docs build (`make
htmldocs`) is currently failing due to this file with error:
(SEVERE/4) Unexpected section title.
This is because a section of the file is indented 2 spaces. Build error
can be cleared by aligning the text with column 0. While we are changing
these lines we can make sure line length does not exceed 72, that
newlines following headings are uniform, and that full stops are
followed by two spaces.
Align text with column 0, limit line length to 72, ensure two spaces
follow all full stops, ensure uniform use of newlines after heading.
Fixes commit (228046e761 Documentation: e1000: Update kernel documentation)
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent patch updated e100 docs to rst format. Docs build (`make
htmldocs`) is currently failing due to this file with error:
(SEVERE/4) Unexpected section title.
This is because a section of the file is indented 2 spaces. Build error
can be cleared by aligning the text with column 0. While we are changing
these lines we can make sure line length does not exceed 72, that
newlines following headings are uniform, and that full stops are
followed by two spaces.
Align text with column 0, limit line length to 72, ensure two spaces
follow all full stops, ensure uniform use of newlines after heading.
Fixes commit (85d63445f4 Documentation: e100: Update the Intel 10/100 driver doc)
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently documentation file was converted to rst. The document title
has the incorrect heading adornment. From kernel docs:
* Please stick to this order of heading adornments:
1. ``=`` with overline for document title::
==============
Document title
==============
Add overline heading adornment to document title.
Fixes commit (228046e761 Documentation: e1000: Update kernel documentation)
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently documentation file was converted to rst. The document title
has the incorrect heading adornment. From kernel docs:
* Please stick to this order of heading adornments:
1. ``=`` with overline for document title::
==============
Document title
==============
Add overline heading adornment to document title.
Fixes commit (85d63445f4 Documentation: e100: Update the Intel 10/100 driver doc)
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As stated at:
http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#footnotes
A footnote should contain either a number, a reference or
an auto number, e. g.:
[1], [#f1] or [#].
While using [*] accidentaly works for html, it fails for other
document outputs. In particular, it causes an error with LaTeX
output, causing all books after networking to not be built.
So, replace it by a valid syntax.
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Per discussion with David at netconf 2018, let's clarify
DaveM's position of handling stable backports in netdev-FAQ.
This is important for people relying on upstream -stable
releases.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-06-05
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add a new BPF hook for sendmsg similar to existing hooks for bind and
connect: "This allows to override source IP (including the case when it's
set via cmsg(3)) and destination IP:port for unconnected UDP (slow path).
TCP and connected UDP (fast path) are not affected. This makes UDP support
complete, that is, connected UDP is handled by connect hooks, unconnected
by sendmsg ones.", from Andrey.
2) Rework of the AF_XDP API to allow extending it in future for type writer
model if necessary. In this mode a memory window is passed to hardware
and multiple frames might be filled into that window instead of just one
that is the case in the current fixed frame-size model. With the new
changes made this can be supported without having to add a new descriptor
format. Also, core bits for the zero-copy support for AF_XDP have been
merged as agreed upon, where i40e bits will be routed via Jeff later on.
Various improvements to documentation and sample programs included as
well, all from Björn and Magnus.
3) Given BPF's flexibility, a new program type has been added to implement
infrared decoders. Quote: "The kernel IR decoders support the most
widely used IR protocols, but there are many protocols which are not
supported. [...] There is a 'long tail' of unsupported IR protocols,
for which lircd is need to decode the IR. IR encoding is done in such
a way that some simple circuit can decode it; therefore, BPF is ideal.
[...] user-space can define a decoder in BPF, attach it to the rc
device through the lirc chardev.", from Sean.
4) Several improvements and fixes to BPF core, among others, dumping map
and prog IDs into fdinfo which is a straight forward way to correlate
BPF objects used by applications, removing an indirect call and therefore
retpoline in all map lookup/update/delete calls by invoking the callback
directly for 64 bit archs, adding a new bpf_skb_cgroup_id() BPF helper
for tc BPF programs to have an efficient way of looking up cgroup v2 id
for policy or other use cases. Fixes to make sure we zero tunnel/xfrm
state that hasn't been filled, to allow context access wrt pt_regs in
32 bit archs for tracing, and last but not least various test cases
for fixes that landed in bpf earlier, from Daniel.
5) Get rid of the ndo_xdp_flush API and extend the ndo_xdp_xmit with
a XDP_XMIT_FLUSH flag instead which allows to avoid one indirect
call as flushing is now merged directly into ndo_xdp_xmit(), from Jesper.
6) Add a new bpf_get_current_cgroup_id() helper that can be used in
tracing to retrieve the cgroup id from the current process in order
to allow for e.g. aggregation of container-level events, from Yonghong.
7) Two follow-up fixes for BTF to reject invalid input values and
related to that also two test cases for BPF kselftests, from Martin.
8) Various API improvements to the bpf_fib_lookup() helper, that is,
dropping MPLS bits which are not fully hashed out yet, rejecting
invalid helper flags, returning error for unsupported address
families as well as renaming flowlabel to flowinfo, from David.
9) Various fixes and improvements to sockmap BPF kselftests in particular
in proper error detection and data verification, from Prashant.
10) Two arm32 BPF JIT improvements. One is to fix imm range check with
regards to whether immediate fits into 24 bits, and a naming cleanup
to get functions related to rsh handling consistent to those handling
lsh, from Wang.
11) Two compile warning fixes in BPF, one for BTF and a false positive
to silent gcc in stack_map_get_build_id_offset(), from Arnd.
12) Add missing seg6.h header into tools include infrastructure in order
to fix compilation of BPF kselftests, from Mathieu.
13) Several formatting cleanups in the BPF UAPI helper description that
also fix an error during rst2man compilation, from Quentin.
14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is
not built into the kernel, from Yue.
15) Remove a useless double assignment in dev_map_enqueue(), from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-06-04
This series contains a smorgasbord of updates to documentation, e1000e,
igb, ixgbe, ixgbevf and i40e.
Benjamin Poirier fixes a potential kernel crash due to NULL pointer
dereference in e1000e.
Jeff updates the kernel documentation for e100 and e1000 to correct
default values and URLs which were incorrect in the documentation. Also
took the time to update these to the new reStructured text format for
kernel documentation.
Joanna Yurdal fixes a missing PTP transmit timestamp by ensuring that
TSICR gets cleared when ICR is cleared.
Sergey updates igb to reset all the transmit queues at one time so that
we only have to wait once for all the queues to be reset.
Alex fixes ixgbevf so that malicious driver detection (MDD) can co-exist
with XDP.
Emil and Tony extend the RTNL lock to ensure we get the most up-to-date
values for the bits and avoid a possible race condition when going down.
YueHaibing from Huawei introduces a helper function in ixgbe for
operation reads to simplify the code a bit more.
Daniel Borkmann adds support for XDP meta data when using build SKB
for i40e.
Shannon Nelson provides twp fixes for the IPSec code in ixgbe, first is
to make sure we do not try to offload the decryption of any incoming
packet that is destined for the management engine. The other fix is to
resolve a cast problem introduced by a sparse cleanup patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes some typos/misspelling errors in the
Documentation/networking files.
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
to an integer.
It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
while 2 enables timewait socket reuse only for sockets that we can
prove are loopback connections:
ie. bound to 'lo' interface or where one of source or destination
IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.
This enables quicker reuse of ephemeral ports for loopback connections
- where tcp_tw_reuse is 100% safe from a protocol perspective
(this assumes no artificially induced packet loss on 'lo').
This also makes estblishing many loopback connections *much* faster
(allocating ports out of the first half of the ephemeral port range
is significantly faster, then allocating from the second half)
Without this change in a 32K ephemeral port space my sample program
(it just establishes and closes [::1]:ephemeral -> [::1]:server_port
connections in a tight loop) fails after 32765 connections in 24 seconds.
With it enabled 50000 connections only take 4.7 seconds.
This is particularly problematic for IPv6 where we only have one local
address and cannot play tricks with varying source IP from 127.0.0.0/8
pool.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Wei Wang <weiwan@google.com>
Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updated the e1000.txt kernel documentation with the latest information.
Also convert the text file to reStructuredText (RST) format, since the
Linux kernel documentation now uses this format for documentation.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Over the years, several of the links have changed or are no longer valid
so update them. In addition, the default values were incorrect for a
couple of parameters.
Converted the text file to the reStructuredText (RST) format, since the
Linux kernel documentation now uses this format for documentation.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Currently, AF_XDP only supports a fixed frame-size memory scheme where
each frame is referenced via an index (idx). A user passes the frame
index to the kernel, and the kernel acts upon the data. Some NICs,
however, do not have a fixed frame-size model, instead they have a
model where a memory window is passed to the hardware and multiple
frames are filled into that window (referred to as the "type-writer"
model).
By changing the descriptor format from the current frame index
addressing scheme, AF_XDP can in the future be extended to support
these kinds of NICs.
In the index-based model, an idx refers to a frame of size
frame_size. Addressing a frame in the UMEM is done by offseting the
UMEM starting address by a global offset, idx * frame_size + offset.
Communicating via the fill- and completion-rings are done by means of
idx.
In this commit, the idx is removed in favor of an address (addr),
which is a relative address ranging over the UMEM. To convert an
idx-based address to the new addr is simply: addr = idx * frame_size +
offset.
We also stop referring to the UMEM "frame" as a frame. Instead it is
simply called a chunk.
To transfer ownership of a chunk to the kernel, the addr of the chunk
is passed in the fill-ring. Note, that the kernel will mask addr to
make it chunk aligned, so there is no need for userspace to do
that. E.g., for a chunk size of 2k, passing an addr of 2048, 2050 or
3000 to the fill-ring will refer to the same chunk.
On the completion-ring, the addr will match that of the Tx descriptor,
passed to the kernel.
Changing the descriptor format to use chunks/addr will allow for
future changes to move to a type-writer based model, where multiple
frames can reside in one chunk. In this model passing one single chunk
into the fill-ring, would potentially result in multiple Rx
descriptors.
This commit changes the uapi of AF_XDP sockets, and updates the
documentation.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch enables virtio_net to switch over to a VF datapath when STANDBY
feature is enabled and a VF netdev is present with the same MAC address.
It allows live migration of a VM with a direct attached VF without the need
to setup a bond/team between a VF and virtio net device in the guest.
It uses the API that is exported by the net_failover driver to create and
and destroy a master failover netdev. When STANDBY feature is enabled, an
additional netdev(failover netdev) is created that acts as a master device
and tracks the state of the 2 lower netdevs. The original virtio_net netdev
is marked as 'standby' netdev and a passthru device with the same MAC is
registered as 'primary' netdev.
The hypervisor needs to unplug the VF device from the guest on the source
host and reset the MAC filter of the VF to initiate failover of datapath
to virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.
This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The net_failover driver provides an automated failover mechanism via APIs
to create and destroy a failover master netdev and manages a primary and
standby slave netdevs that get registered via the generic failover
infrastructure.
The failover netdev acts a master device and controls 2 slave devices. The
original paravirtual interface gets registered as 'standby' slave netdev and
a passthru/vf device with the same MAC gets registered as 'primary' slave
netdev. Both 'standby' and 'failover' netdevs are associated with the same
'pci' device. The user accesses the network interface via 'failover' netdev.
The 'failover' netdev chooses 'primary' netdev as default for transmits when
it is available with link up and running.
This can be used by paravirtual drivers to enable an alternate low latency
datapath. It also enables hypervisor controlled live migration of a VM with
direct attached VF by failing over to the paravirtual datapath when the VF
is unplugged.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The failover module provides a generic interface for paravirtual drivers
to register a netdev and a set of ops with a failover instance. The ops
are used as event handlers that get called to handle netdev register/
unregister/link change/name change events on slave pci ethernet devices
with the same mac address as the failover netdev.
This enables paravirtual drivers to use a VF as an accelerated low latency
datapath. It also allows migration of VMs with direct attached VFs by
failing over to the paravirtual datapath when the VF is unplugged.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file
before f_count has reached 0, which is fundamentally a bad idea. It
does check 'f_count < 2', which excludes concurrent operations on the
file since they would only be possible with a shared fd table, in which
case each fdget() would take a file reference. However, it fails to
account for the fact that even with 'f_count == 1' the file can still be
linked into epoll instances. As reported by syzbot, this can trivially
be used to cause a use-after-free.
Yet, the only known user of PPPIOCDETACH is pppd versions older than
ppp-2.4.2, which was released almost 15 years ago (November 2003).
Also, PPPIOCDETACH apparently stopped working reliably at around the
same time, when the f_count check was added to the kernel, e.g. see
https://lkml.org/lkml/2002/12/31/83. Also, the current 'f_count < 2'
check makes PPPIOCDETACH only work in single-threaded applications; it
always fails if called from a multithreaded application.
All pppd versions released in the last 15 years just close() the file
descriptor instead.
Therefore, instead of hacking around this bug by exporting epoll
internals to modules, and probably missing other related bugs, just
remove the PPPIOCDETACH ioctl and see if anyone actually notices. Leave
a stub in place that prints a one-time warning and returns EINVAL.
Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
Tested-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This per netns sysctl allows for TCP SACK compression fine-tuning.
This limits number of SACK that can be compressed.
Using 0 disables SACK compression.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This per netns sysctl allows for TCP SACK compression fine-tuning.
Its default value is 1,000,000, or 1 ms to meet TSO autosizing period.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch disables RFC6675 loss detection and make sysctl
net.ipv4.tcp_recovery = 1 controls a binary choice between RACK
(1) or RFC6675 (0).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for the classic DUPACK threshold rule
(#DupThresh) in RACK.
When the number of packets SACKed is greater or equal to the
threshold, RACK sets the reordering window to zero which would
immediately mark all the unsacked packets below the highest SACKed
sequence lost. Since this approach is known to not work well with
reordering, RACK only uses it if no reordering has been observed.
The DUPACK threshold rule is a particularly useful extension to the
fast recoveries triggered by RACK reordering timer. For example
data-center transfers where the RTT is much smaller than a timer
tick, or high RTT path where the default RTT/4 may take too long.
Note that this patch differs slightly from RFC6675. RFC6675
considers a packet lost when at least #DupThresh higher-sequence
packets are SACKed.
With RACK, for connections that have seen reordering, RACK
continues to use a dynamically-adaptive time-based reordering
window to detect losses. But for connections on which we have not
yet seen reordering, this patch considers a packet lost when at
least one higher sequence packet is SACKed and the total number
of SACKed packets is at least DupThresh. For example, suppose a
connection has not seen reordering, and sends 10 packets, and
packets 3, 5, 7 are SACKed. RFC6675 considers packets 1 and 2
lost. RACK considers packets 1, 2, 4, 6 lost.
There is some small risk of spurious retransmits here due to
reordering. However, this is mostly limited to the first flight of
a connection on which the sender receives SACKs from reordering.
And RFC 6675 and FACK loss detection have a similar risk on the
first flight with reordering (it's just that the risk of spurious
retransmits from reordering was slightly narrower for those older
algorithms due to the margin of 3*MSS).
Also the minimum reordering window is reduced from 1 msec to 0
to recover quicker on short RTT transfers. Therefore RACK is more
aggressive in marking packets lost during recovery to reduce the
reordering window timeouts.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-05-17
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Provide a new BPF helper for doing a FIB and neighbor lookup
in the kernel tables from an XDP or tc BPF program. The helper
provides a fast-path for forwarding packets. The API supports
IPv4, IPv6 and MPLS protocols, but currently IPv4 and IPv6 are
implemented in this initial work, from David (Ahern).
2) Just a tiny diff but huge feature enabled for nfp driver by
extending the BPF offload beyond a pure host processing offload.
Offloaded XDP programs are allowed to set the RX queue index and
thus opening the door for defining a fully programmable RSS/n-tuple
filter replacement. Once BPF decided on a queue already, the device
data-path will skip the conventional RSS processing completely,
from Jakub.
3) The original sockmap implementation was array based similar to
devmap. However unlike devmap where an ifindex has a 1:1 mapping
into the map there are use cases with sockets that need to be
referenced using longer keys. Hence, sockhash map is added reusing
as much of the sockmap code as possible, from John.
4) Introduce BTF ID. The ID is allocatd through an IDR similar as
with BPF maps and progs. It also makes BTF accessible to user
space via BPF_BTF_GET_FD_BY_ID and adds exposure of the BTF data
through BPF_OBJ_GET_INFO_BY_FD, from Martin.
5) Enable BPF stackmap with build_id also in NMI context. Due to the
up_read() of current->mm->mmap_sem build_id cannot be parsed.
This work defers the up_read() via a per-cpu irq_work so that
at least limited support can be enabled, from Song.
6) Various BPF JIT follow-up cleanups and fixups after the LD_ABS/LD_IND
JIT conversion as well as implementation of an optimized 32/64 bit
immediate load in the arm64 JIT that allows to reduce the number of
emitted instructions; in case of tested real-world programs they
were shrinking by three percent, from Daniel.
7) Add ifindex parameter to the libbpf loader in order to enable
BPF offload support. Right now only iproute2 can load offloaded
BPF and this will also enable libbpf for direct integration into
other applications, from David (Beckett).
8) Convert the plain text documentation under Documentation/bpf/ into
RST format since this is the appropriate standard the kernel is
moving to for all documentation. Also add an overview README.rst,
from Jesper.
9) Add __printf verification attribute to the bpf_verifier_vlog()
helper. Though it uses va_list we can still allow gcc to check
the format string, from Mathieu.
10) Fix a bash reference in the BPF selftest's Makefile. The '|& ...'
is a bash 4.0+ feature which is not guaranteed to be available
when calling out to shell, therefore use a more portable variant,
from Joe.
11) Fix a 64 bit division in xdp_umem_reg() by using div_u64()
instead of relying on the gcc built-in, from Björn.
12) Fix a sock hashmap kmalloc warning reported by syzbot when an
overly large key size is used in hashmap then causing overflows
in htab->elem_size. Reject bogus attr->key_size early in the
sock_hash_alloc(), from Yonghong.
13) Ensure in BPF selftests when urandom_read is being linked that
--build-id is always enabled so that test_stacktrace_build_id[_nmi]
won't be failing, from Alexei.
14) Add bitsperlong.h as well as errno.h uapi headers into the tools
header infrastructure which point to one of the arch specific
uapi headers. This was needed in order to fix a build error on
some systems for the BPF selftests, from Sirio.
15) Allow for short options to be used in the xdp_monitor BPF sample
code. And also a bpf.h tools uapi header sync in order to fix a
selftest build failure. Both from Prashant.
16) More formally clarify the meaning of ID in the direct packet access
section of the BPF documentation, from Wang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 1386c36b30.
We don't want to encourage drivers to not report carrier status
correctly, therefore remove this commit.
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a mixed environment it may be difficult to tell if your hardware
support carrier, if it does not it can always report true. With a new
use_carrier option of 2, we can check both carrier and link status
sequentially, instead of one or the other
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For me, as a reader whose mother language isn't English, the
old words bring a little difficulty to catch the meaning, this
patch rewords the subsection in a more clarificatory way.
This patch also add blank lines as separator at two places
to improve readability.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Minor conflict, a CHECK was placed into an if() statement
in net-next, whilst a newline was added to that CHECK
call in 'net'. Thanks to Daniel for the merge resolution.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a sample application for AF_XDP sockets. The application
supports three different modes of operation: rxdrop, txonly and l2fwd.
To show-case a simple round-robin load-balancing between a set of
sockets in an xskmap, set the RR_LB compile time define option to 1 in
"xdpsock.h".
v2: The entries variable was calculated twice in {umem,xq}_nb_avail.
Co-authored-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a documentation for seg_flowlabel sysctl into
Documentation/networking/ip-sysctl.txt
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are 2 staging driver fixups for 4.17-rc3.
The first is the remaining stragglers of the irda code removal that you
pointed out during the merge window. The second is a fix for the
wilc1000 driver due to a patch that got merged in 4.17-rc1.
Both of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWuMyew8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymXxACffYtMbj0Vg5pD0yAPqRzJ2iVMVE0AnRkp4BYQ
kXgAjDeSyrdKPUwQ7Hl2
=UNuF
-----END PGP SIGNATURE-----
Merge tag 'staging-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging fixes from Greg KH:
"Here are two staging driver fixups for 4.17-rc3.
The first is the remaining stragglers of the irda code removal that
you pointed out during the merge window. The second is a fix for the
wilc1000 driver due to a patch that got merged in 4.17-rc1.
Both of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: wilc1000: fix NULL pointer exception in host_int_parse_assoc_resp_info()
staging: irda: remove remaining remants of irda code removal
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
for JIT opcode dumping; this patch is to update the doc for it.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Virtual devices such as tunnels and bonding can handle large packets.
Only segment packets when reaching a physical or loopback device.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The name of the following proc/sysctl entries were incorrectly
documented:
/proc/sys/net/ipv6/conf/<interface>/max_dst_opts_number
/proc/sys/net/ipv6/conf/<interface>/max_hbt_opts_number
/proc/sys/net/ipv6/conf/<interface>/max_dst_opts_length
/proc/sys/net/ipv6/conf/<interface>/max_hbt_length
Their name was set to the name of the symbol in the .data field of the
control table instead of their .proc name.
Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were some documentation locations that irda was mentioned, as well
as an old MAINTAINERS entry and the networking sysctl entries. Clean
these all out as this stuff really is finally gone.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The tools are located at tootls/bpf/ instead of tools/net/.
Update the filter.txt doc.
Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the big set of Staging/IIO driver patches for 4.17-rc1.
It is a lot, over 500 changes, but not huge by previous kernel release
standards. We deleted more lines than we added again (27k added vs. 91k
remvoed), thanks to finally being able to delete the IRDA drivers and
networking code.
We also deleted the ccree crypto driver, but that's coming back in
through the crypto tree to you, in a much cleaned-up form.
Added this round is at lot of "mt7621" device support, which is for an
embedded device that Neil Brown cares about, and of course a handful of
new IIO drivers as well.
And finally, the fsl-mc core code moved out of the staging tree to the
"real" part of the kernel, which is nice to see happen as well.
Full details are in the shortlog, which has all of the tiny cleanup
patches described.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWsSnAA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn60ACgxKvU/5XBP14hBkBpAcD0Q43OHe0AniEti65M
Kw03GWK3NNM3pzk49BjZ
=sj3K
-----END PGP SIGNATURE-----
Merge tag 'staging-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here is the big set of Staging/IIO driver patches for 4.17-rc1.
It is a lot, over 500 changes, but not huge by previous kernel release
standards. We deleted more lines than we added again (27k added vs.
91k remvoed), thanks to finally being able to delete the IRDA drivers
and networking code.
We also deleted the ccree crypto driver, but that's coming back in
through the crypto tree to you, in a much cleaned-up form.
Added this round is at lot of "mt7621" device support, which is for an
embedded device that Neil Brown cares about, and of course a handful
of new IIO drivers as well.
And finally, the fsl-mc core code moved out of the staging tree to the
"real" part of the kernel, which is nice to see happen as well.
Full details are in the shortlog, which has all of the tiny cleanup
patches described.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (579 commits)
staging: rtl8723bs: Remove yield call, replace with cond_resched()
staging: rtl8723bs: Replace yield() call with cond_resched()
staging: rtl8723bs: Remove unecessary newlines from 'odm.h'.
staging: rtl8723bs: Rework 'struct _ODM_Phy_Status_Info_' coding style.
staging: rtl8723bs: Rework 'struct _ODM_Per_Pkt_Info_' coding style.
staging: rtl8723bs: Replace NULL pointer comparison with '!'.
staging: rtl8723bs: Factor out rtl8723bs_recv_tasklet() sections.
staging: rtl8723bs: Fix function signature that goes over 80 characters.
staging: rtl8723bs: Fix lines too long in update_recvframe_attrib().
staging: rtl8723bs: Remove unnecessary blank lines in 'rtl8723bs_recv.c'.
staging: rtl8723bs: Change camel case to snake case in 'rtl8723bs_recv.c'.
staging: rtl8723bs: Add missing braces in else statement.
staging: rtl8723bs: Add spaces around ternary operators.
staging: rtl8723bs: Fix lines with trailing open parentheses.
staging: rtl8723bs: Remove unnecessary length #define's.
staging: rtl8723bs: Fix IEEE80211 authentication algorithm constants.
staging: rtl8723bs: Fix alignment in rtw_wx_set_auth().
staging: rtl8723bs: Remove braces from single statement conditionals.
staging: rtl8723bs: Remove unecessary braces from switch statement.
staging: rtl8723bs: Fix newlines in rtw_wx_set_auth().
...
- improve checkpatch for more precise Kconfig code checking
- clarify effective selects by grouping reverse dependencies in help
- do not write out '# CONFIG_FOO is not set' from invisible symbols
- make oldconfig as silent as it should be
- rename 'silentoldconfig' to 'syncconfig'
- add unit-test framework and several test cases
- warn unmet dependency of tristate symbols
- make unmet dependency warnings readable, removing false positives
- improve recursive include detection
- use yylineno to simplify the line number tracking
- misc cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJaw6i1AAoJED2LAQed4NsGU7oP/Rc5DJmtQXbqONvEfVskfNzL
NaQekDa4QuWRCbrAdRZM+8dlAgX76kXHd0I1LnL6XeCZ2KZ7f93zxpqBKHZAteGU
Y6b06tDXiW9qIdI0wzfKB3KdZo0Jc1LELv7SMRrZ7+wFXZKmhh5M0mVX17sKrQai
L3wPMqiu15ve2Ya9s8F8+PGFBZfqzhOBEzYij8YtTZWFWVEfoLDDD5YDUxQNcJrS
FXO+fZH5EUpoWj+JseiIPuOKASChsyeqtwqCND444IrjqDQ0TLAyJSZJhSm+6bX3
qP/yMH0K+kMMkvKp8CCnaTfwkOJ2BZo+91Ydw1mnqLdnZ8gZndnyexrBFubIv+fJ
0KxX9IyGA+RBnwArsnF1yIoryktG3xtaR5diO2p5ztd8xgptKG+PqQfJ40DHjpu4
iZNpoVPIPh669w/Dfx033t1RZVhov8Mau2dZ5RCtpvXAAS6oRe+UmaazqUGt7O2z
V8ekSNL3g7FN3YYx0awXJWzX8e3BDMOcUjRvw/SI16XBk0BdHiBMrYfnRN+e3mpy
FjhzZdXajJclNwMVbUOAPaQypvbBQJjBMy0ryz05ZyTPEsmJqM+WjkPSLDppnMYT
8L3C5KSqC7WKdf1bj55YdGWyfyU0P9bCO026IClnvZNZDI/bg+3gB9ocyRfZG0sL
0Q7GJF+BixbUUQeUcejm
=sKto
-----END PGP SIGNATURE-----
Merge tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kconfig updates from Masahiro Yamada:
- improve checkpatch for more precise Kconfig code checking
- clarify effective selects by grouping reverse dependencies in help
- do not write out '# CONFIG_FOO is not set' from invisible symbols
- make oldconfig as silent as it should be
- rename 'silentoldconfig' to 'syncconfig'
- add unit-test framework and several test cases
- warn unmet dependency of tristate symbols
- make unmet dependency warnings readable, removing false positives
- improve recursive include detection
- use yylineno to simplify the line number tracking
- misc cleanups
* tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (30 commits)
kconfig: use yylineno option instead of manual lineno increments
kconfig: detect recursive inclusion earlier
kconfig: remove duplicated file name and lineno of recursive inclusion
kconfig: do not include both curses.h and ncurses.h for nconfig
kconfig: make unmet dependency warnings readable
kconfig: warn unmet direct dependency of tristate symbols selected by y
kconfig: tests: test if recursive inclusion is detected
kconfig: tests: test if recursive dependencies are detected
kconfig: tests: test randconfig for choice in choice
kconfig: tests: test defconfig when two choices interact
kconfig: tests: check visibility of tristate choice values in y choice
kconfig: tests: check unneeded "is not set" with unmet dependency
kconfig: tests: test if new symbols in choice are asked
kconfig: tests: test automatic submenu creation
kconfig: tests: add basic choice tests
kconfig: tests: add framework for Kconfig unit testing
kbuild: add PYTHON2 and PYTHON3 variables
kconfig: remove redundant streamline_config.pl prerequisite
kconfig: rename silentoldconfig to syncconfig
kconfig: invoke oldconfig instead of silentoldconfig from local*config
...
Some users are willing to provision huge amounts of memory to be able
to perform reassembly reasonnably well under pressure.
Current memory tracking is using one atomic_t and integers.
Switch to atomic_long_t so that 64bit arches can use more than 2GB,
without any cost for 32bit arches.
Note that this patch avoids an overflow error, if high_thresh was set
to ~2GB, since this test in inet_frag_alloc() was never true :
if (... || frag_mem_limit(nf) > nf->high_thresh)
Tested:
$ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh
<frag DDOS>
$ grep FRAG /proc/net/sockstat
FRAG: inuse 14705885 memory 16000002880
$ nstat -n ; sleep 1 ; nstat | grep Reas
IpReasmReqds 3317150 0.0
IpReasmFails 3317112 0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some applications still rely on IP fragmentation, and to be fair linux
reassembly unit is not working under any serious load.
It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
A work queue is supposed to garbage collect items when host is under memory
pressure, and doing a hash rebuild, changing seed used in hash computations.
This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
occurring every 5 seconds if host is under fire.
Then there is the problem of sharing this hash table for all netns.
It is time to switch to rhashtables, and allocate one of them per netns
to speedup netns dismantle, since this is a critical metric these days.
Lookup is now using RCU. A followup patch will even remove
the refcount hold/release left from prior implementation and save
a couple of atomic operations.
Before this patch, 16 cpus (16 RX queue NIC) could not handle more
than 1 Mpps frags DDOS.
After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
of storage for the fragments (exact number depends on frags being evicted
after timeout)
$ grep FRAG /proc/net/sockstat
FRAG: inuse 1966916 memory 2140004608
A followup patch will change the limits for 64bit arches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Florian Westphal <fw@strlen.de>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clarify that when disable_ipv6 is enabled even the ipv6 routes
are deleted for the selected interface and from now it will not
be possible to add addresses/routes to that interface
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree. This batch comes with more input sanitization for xtables to
address bug reports from fuzzers, preparation works to the flowtable
infrastructure and assorted updates. In no particular order, they are:
1) Make sure userspace provides a valid standard target verdict, from
Florian Westphal.
2) Sanitize error target size, also from Florian.
3) Validate that last rule in basechain matches underflow/policy since
userspace assumes this when decoding the ruleset blob that comes
from the kernel, from Florian.
4) Consolidate hook entry checks through xt_check_table_hooks(),
patch from Florian.
5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject
very large compat offset arrays, so we have a reasonable upper limit
and fuzzers don't exercise the oom-killer. Patches from Florian.
6) Several WARN_ON checks on xtables mutex helper, from Florian.
7) xt_rateest now has a hashtable per net, from Cong Wang.
8) Consolidate counter allocation in xt_counters_alloc(), from Florian.
9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch
from Xin Long.
10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from
Felix Fietkau.
11) Consolidate code through flow_offload_fill_dir(), also from Felix.
12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward()
to remove a dependency with flowtable and ipv6.ko, from Felix.
13) Cache mtu size in flow_offload_tuple object, this is safe for
forwarding as f87c10a8aa describes, from Felix.
14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too
modular infrastructure, from Felix.
15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from
Ahmed Abdelsalam.
16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei.
17) Support for counting only to nf_conncount infrastructure, patch
from Yi-Hung Wei.
18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes
to nft_ct.
19) Use boolean as return value from ipt_ah and from IPVS too, patch
from Gustavo A. R. Silva.
20) Remove useless parameters in nfnl_acct_overquota() and
nf_conntrack_broadcast_help(), from Taehee Yoo.
21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo.
22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu.
23) Fix typo in xt_limit, from Geert Uytterhoeven.
24) Do no use VLAs in Netfilter code, again from Gustavo.
25) Use ADD_COUNTER from ebtables, from Taehee Yoo.
26) Bitshift support for CONNMARK and MARK targets, from Jack Ma.
27) Use pr_*() and add pr_fmt(), from Arushi Singhal.
28) Add synproxy support to ctnetlink.
29) ICMP type and IGMP matching support for ebtables, patches from
Matthias Schiffer.
30) Support for the revision infrastructure to ebtables, from
Bernie Harris.
31) String match support for ebtables, also from Bernie.
32) Documentation for the new flowtable infrastructure.
33) Use generic comparison functions in ebt_stp, from Joe Perches.
34) Demodularize filter chains in nftables.
35) Register conntrack hooks in case nftables NAT chain is added.
36) Merge assignments with return in a couple of spots in the
Netfilter codebase, also from Arushi.
37) Document that xtables percpu counters are stored in the same
memory area, from Ben Hutchings.
38) Revert mark_source_chains() sanity checks that break existing
rulesets, from Florian Westphal.
39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds initial documentation for the Netfilter flowtable
infrastructure.
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds a basic driver framework for the Intel(R) E800 Ethernet
Series of network devices. There is no functionality right now other than
the ability to load.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
As commit cedd55d49d ("kconfig: Remove silentoldconfig from help
and docs; fix kconfig/conf's help") mentioned, 'silentoldconfig' is a
historical misnomer. That commit removed it from help and docs since
it is an internal interface. If so, it should be allowed to rename
it to something more intuitive. 'syncconfig' is the one I came up
with because it updates the .config if necessary, then synchronize
include/generated/autoconf.h and include/config/* with it.
You should not manually invoke 'silentoldcofig'. Display warning if
used in case existing scripts are doing wrong.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Ulf Magnusson <ulfalizer@gmail.com>
Add documentation on rx path setup and cmsg interface.
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fun set of conflict resolutions here...
For the mac80211 stuff, these were fortunately just parallel
adds. Trivially resolved.
In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.
In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.
The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.
The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:
====================
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch. This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
(IB/mlx5: Fix cleanup order on unload) added to for-rc and
commit b5ca15ad7e (IB/mlx5: Add proper representors support)
add as part of the devel cycle both needed to modify the
init/de-init functions used by mlx5. To support the new
representors, the new functions added by the cleanup patch
needed to be made non-static, and the init/de-init list
added by the representors patch needed to be modified to
match the init/de-init list changes made by the cleanup
patch.
Updates:
drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
prototypes added by representors patch to reflect new function
names as changed by cleanup patch
drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
stage list to match new order from cleanup patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Net DIM is a generic algorithm, purposed for dynamically
optimizing network devices interrupt moderation. This
document describes how it works and how to use it.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SK_MEM_QUANTUM was changed from PAGE_SIZE to 4096.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The packet_mmap documentation had links to no longer existing web
sites; replace with other site which has similar example.
Support for packet mmap has been in mainline versions of libpcap
for several years.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Socket option SO_ZEROCOPY determines whether the kernel ignores or
processes flag MSG_ZEROCOPY on subsequent send calls. This to avoid
changing behavior for legacy processes.
Limiting the state change to closed sockets is annoying with passive
sockets and not necessary for correctness. Once created, zerocopy skbs
are processed based on their private state, not this socket flag.
Remove the constraint.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No one has publicly stepped up to maintain this broken codebase for
devices that no one uses anymore, so let's just drop the whole thing.
If someone really wants/needs it, we can revert this and they can fix
the code up to work properly.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As well as the basic conversion, I noticed that a lot of the
SCTP code checks gso_type without first checking skb_is_gso()
so I have added that where appropriate.
Also, document the helper.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pretty minor: just SKB_GSO_TCP -> SKB_GSO_TCPV4 and
SKB_GSO_TCP6 -> SKB_GSO_TCPV6.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some operators prefer IPv6 path selection to use a standard 5-tuple
hash rather than just an L3 hash with the flow the label. To that end
add support to IPv6 for multipath hash policy similar to bf4e0a3db9
("net: ipv4: add support for ECMP hash policy choice"). The default
is still L3 which covers source and destination addresses along with
flow label and IPv6 protocol.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP GSO skbs have a gso_size of GSO_BY_FRAGS, so any sort of
unconditionally mangling of that will result in nonsense value
and would corrupt the skb later on.
Therefore, i) add two helpers skb_increase_gso_size() and
skb_decrease_gso_size() that would throw a one time warning and
bail out for such skbs and ii) refuse and return early with an
error in those BPF helpers that are affected. We do need to bail
out as early as possible from there before any changes on the
skb have been performed.
Fixes: 6578171a7f ("bpf: add bpf_skb_change_proto helper")
Co-authored-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We want the IIO/Staging fixes in here, and to resolve a merge problem
with the move of the fsl-mc code.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the source files out of staging into their final locations:
-mc.h include file in drivers/staging/fsl-mc/include go to include/linux/fsl
-source files in drivers/staging/fsl-mc/bus go to drivers/bus/fsl-mc
-overview.rst, providing an overview of DPAA2, goes to
Documentation/networking/dpaa2/overview.rst
Update or delete other remaining staging files -- Makefile, Kconfig, TODO.
Update dpaa2_eth and dpio staging drivers.
Add integration bits for the documentation build system.
Signed-off-by: Stuart Yoder <stuyoder@gmail.com>
[rebased, add dpaa2_eth and dpio #include updates]
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
[rebased, split irqchip to separate patch]
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@nxp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Most of this is extracted from 90017accff ("sctp: Add GSO support"),
with some extra text about GSO_BY_FRAGS and the need to check for it.
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The doc originally called it SKB_GSO_REMCSUM. Fix it.
Fixes: f7a6272bf3 ("Documentation: Add documentation for TSO and GSO features")
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
UFO is deprecated except for tuntap and packet per 0c19f846d5,
("net: accept UFO datagrams from tuntap and packet"). Update UFO
docs to reflect this.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SK_MEM_QUANTUM was changed from PAGE_SIZE to 4096. And the
tcp_wmem/tcp_rmem min default values are 4096.
Fixes: bd68a2a854 ("net: set SK_MEM_QUANTUM to 4096")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2018-01-26
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) A number of extensions to tcp-bpf, from Lawrence.
- direct R or R/W access to many tcp_sock fields via bpf_sock_ops
- passing up to 3 arguments to bpf_sock_ops functions
- tcp_sock field bpf_sock_ops_cb_flags for controlling callbacks
- optionally calling bpf_sock_ops program when RTO fires
- optionally calling bpf_sock_ops program when packet is retransmitted
- optionally calling bpf_sock_ops program when TCP state changes
- access to tclass and sk_txhash
- new selftest
2) div/mod exception handling, from Daniel.
One of the ugly leftovers from the early eBPF days is that div/mod
operations based on registers have a hard-coded src_reg == 0 test
in the interpreter as well as in JIT code generators that would
return from the BPF program with exit code 0. This was basically
adopted from cBPF interpreter for historical reasons.
There are multiple reasons why this is very suboptimal and prone
to bugs. To name one: the return code mapping for such abnormal
program exit of 0 does not always match with a suitable program
type's exit code mapping. For example, '0' in tc means action 'ok'
where the packet gets passed further up the stack, which is just
undesirable for such cases (e.g. when implementing policy) and
also does not match with other program types.
After considering _four_ different ways to address the problem,
we adapt the same behavior as on some major archs like ARMv8:
X div 0 results in 0, and X mod 0 results in X. aarch64 and
aarch32 ISA do not generate any traps or otherwise aborts
of program execution for unsigned divides.
Given the options, it seems the most suitable from
all of them, also since major archs have similar schemes in
place. Given this is all in the realm of undefined behavior,
we still have the option to adapt if deemed necessary.
3) sockmap sample refactoring, from John.
4) lpm map get_next_key fixes, from Yonghong.
5) test cleanups, from Alexei and Prashant.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlpq+ZkTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRAe/sog7Dgc9mFcB/wPSu30a664/+wjUvXM7Zdw4ko/PRdS
deSRnjGj3epkHRyGJkdGSuPx9iGg3pqR8poMCZZmFUG+kGBmEcGQX+eyaR41zIUz
iyEgZSufYDjsW47eGBsNE01xQjoL1jcF9JM7NHmRrw4+2YF75cGE3BOGcmcV6Hjc
O5HDIpLmbeMHI4NcujgD4UG/VPnZQw3+oN9eyYUEbY5Aa2XQyW76DIJ3SyKsHQz0
K/s0uxAGo+Ap7xuoBUJpx6BBYoHYM171DTgXfH9pUB0MwqyDCq3hAyYGR+UEdIXb
IDhIcN/l5wFU8VICjYmSKgKyjjHqlixgoki2snmJxVWu0KeVl5LJ1Edv
=7jiC
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-4.16-20180126' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2018-01-26
this is a pull request for net-next/master consisting of 3 patches.
The first two patches target the CAN documentation. The first is by me
and fixes pointer to location of fsl,mpc5200-mscan node in the mpc5200
documentation. The second patch is by Robert Schwebel and it converts
the plain ASCII documentation to restructured text.
The third patch is by Fabrizio Castro add the r8a774[35] support to the
rcar_can dt-bindings documentation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2018-01-26
One last patch for this development cycle:
1) Add ESN support for IPSec HW offload.
From Yossef Efraim.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel documentation is now restructured text. Convert the SocketCAN
documentation and include it in the toplevel kernel documentation.
This patch doesn't do any content change.
All references to can.txt in the code are converted to can.rst.
Signed-off-by: Robert Schwebel <r.schwebel@pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
o Change process name in ps output: looks like, these days the process
is named kpktgend_<cpu>, rather than pktgen/<cpu>.
o Use pg_ctrl for start/stop as it can work well with pgset without
changes to $(PGDEV) variable.
o Clarify a bit needed $(PGDEV) definition for sample scripts and that
one needs to `source functions.sh`.
o Document how-to unset a behaviour flag, note about history expansion.
o Fix pgset spi parameter value.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we then OR this with 0x40, then the value of 6th bit (0th is first bit)
become known, so the right mask is 0xbf instead of 0xcf.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds ESN support to IPsec device offload.
Adding new xfrm device operation to synchronize device ESN.
Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
BPF alignment tests got a conflict because the registers
are output as Rn_w instead of just Rn in net-next, and
in net a fixup for a testcase prohibits logical operations
on pointers before using them.
Also, we should attempt to patch BPF call args if JIT always on is
enabled. Instead, if we fail to JIT the subprogs we should pass
an error back up and fail immediately.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kornilios Kourtis <kou@zurich.ibm.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following 'make htmldocs' complaint:
Documentation/networking/msg_zerocopy.rst:: WARNING: document isn't included in any toctree.
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2017-12-22
1) Separate ESP handling from segmentation for GRO packets.
This unifies the IPsec GSO and non GSO codepath.
2) Add asynchronous callbacks for xfrm on layer 2. This
adds the necessary infrastructure to core networking.
3) Allow to use the layer2 IPsec GSO codepath for software
crypto, all infrastructure is there now.
4) Also allow IPsec GSO with software crypto for local sockets.
5) Don't require synchronous crypto fallback on IPsec offloading,
it is not needed anymore.
6) Check for xdo_dev_state_free and only call it if implemented.
From Shannon Nelson.
7) Check for the required add and delete functions when a driver
registers xdo_dev_ops. From Shannon Nelson.
8) Define xfrmdev_ops only with offload config.
From Shannon Nelson.
9) Update the xfrm stats documentation.
From Shannon Nelson.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a couple of stats that aren't in the documentation file
and rework the top description to be a little more readable.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
- bump version strings, by Simon Wunderlich
- de-inline hash functions to save memory footprint, by Denys Vlasenko
- Add License information to various files, by Sven Eckelmann (3 patches)
- Change batman_adv.h from ISC to MIT, by Sven Eckelmann
- Improve various includes, by Sven Eckelmann (5 patches)
- Lots of kernel-doc work by Sven Eckelmann (8 patches)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlo6QfgWHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeobWfEADPEOdxWS7nW4Xkhug+7vLbcloJ
Om1VDKFD4n5NfB6e+vh8kQAnnnQ/LzFSv53giNdnjjE9IPNKxNhzBQFS95H189EP
ebP0mKOesTadkTx+MjFxenhnaTnzK0hkngdxz/frvaq+i6ECMhnq8Bw0elVG0nSg
X9ts0x6BuNIw6EjIdPP0GvfOV1DvUmdMz1YLJy4yoJ8Tm671y86y07jTJaaEN2Ex
9dp0j3hEqtrYZvgRdQ/hzYLFJ9fvcF1FyA+duufK3FNbkJtZn89zqseIuN2saqqw
QN/nDBrduzW6SR9y0JfXlatI6FN6316jskLcpqorz5/88KrwQeTOg2ZXXn0iw6l1
r0tkBP5/eEu2Dcd3WRsNtTnMZmGfc2uuqmvD1Pz0wN7RAzIYhPvs9to6TVy3mK5p
0OyFJaZU9vurtIPqVYjNtSofkUHKMOZZ1H7LocWaINGIVsREx/i0DmX//M0ZiqbB
mS4ybkn8yOYfFUizg51RYo2nKVyw/ZXNqBYBPUWio91CPj0vOUFitvfxnxNitV/m
182HVdoOnGhorYbS3J/8Su3AyEyhJTVWnmq0z3u1CuvdMhppbAzvXvmERotAJN9e
5Wp26PE2a7zb4LJhMBQ4q+RnUnZV5ADhigrIEGPOQoY5KdIrEOcW65wrgfhWYlER
NVJc2okVZKhg3o7amA==
=5JUK
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20171220' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- de-inline hash functions to save memory footprint, by Denys Vlasenko
- Add License information to various files, by Sven Eckelmann (3 patches)
- Change batman_adv.h from ISC to MIT, by Sven Eckelmann
- Improve various includes, by Sven Eckelmann (5 patches)
- Lots of kernel-doc work by Sven Eckelmann (8 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware
GRO. With this flag, we can now independently turn on or off hardware
GRO when GRO is on. Previously, drivers were using NETIF_F_GRO to
control hardware GRO and so it cannot be independently turned on or
off without affecting GRO.
Hardware GRO (just like GRO) guarantees that packets can be re-segmented
by TSO/GSO to reconstruct the original packet stream. Logically,
GRO_HW should depend on GRO since it a subset, but we will let
individual drivers enforce this dependency as they see fit.
Since NETIF_F_GRO is not propagated between upper and lower devices,
NETIF_F_GRO_HW should follow suit since it is a subset of GRO. In other
words, a lower device can independent have GRO/GRO_HW enabled or disabled
and no feature propagation is required. This will preserve the current
GRO behavior. This can be changed later if we decide to propagate GRO/
GRO_HW/RXCSUM from upper to lower devices.
Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The "Linux licensing rules" require that also the restructuredText files
are marked with the appropriate SPDX license identifier.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2017-12-15
1) Currently we can add or update socket policies, but
not clear them. Support clearing of socket policies
too. From Lorenzo Colitti.
2) Add documentation for the xfrm device offload api.
From Shannon Nelson.
3) Fix IPsec extended sequence numbers (ESN) for
IPsec offloading. From Yossef Efraim.
4) xfrm_dev_state_add function returns success even for
unsupported options, fix this to fail in such cases.
From Yossef Efraim.
5) Remove a redundant xfrm_state assignment.
From Aviv Heller.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to this patch, active Fast Open is paused on a specific
destination IP address if the previous connections to the
IP address have experienced recurring timeouts . But recent
experiments by Microsoft (https://goo.gl/cykmn7) and Mozilla
browsers indicate the isssue is often caused by broken middle-boxes
sitting close to the client. Therefore it is much better user
experience if Fast Open is disabled out-right globally to avoid
experiencing further timeouts on connections toward other
destinations.
This patch changes the destination-IP disablement to global
disablement if a connection experiencing recurring timeouts
or aborts due to timeout. Repeated incidents would still
exponentially increase the pause time, starting from an hour.
This is extremely conservative but an unfortunate compromise to
minimize bad experience due to broken middle-boxes.
Reported-by: Dragana Damjanovic <ddamjanovic@mozilla.com>
Reported-by: Patrick McManus <mcmanus@ducksong.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Schmidt says:
====================
pull-request: ieee802154-next 2017-12-04
Some update from ieee802154 to *net-next*
Jian-Hong Pan updated our docs to match the APIs in code.
Michael Hennerichs enhanced the adf7242 driver to work with adf7241
devices and reworked the IRQ and packet handling in the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add kernel-doc documentation for sfp kernel APIs, and link it into the
networking kapi documentation under "Network device support".
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add kernel-doc documentation for phylink kernel APIs, and link it into
the networking kapi documentation under "Network device support".
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is not supported anymore, devices needing a MAC address
just assign one at random, it's just a driver pecularity.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a writeup on how to use the XFRM device offload API, and
mention this new file in the index.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
There are more functions and operations which must be used or implemented
in each IEEE 802.15.4 device driver, but are not mentioned in the Device
drivers API section of Documentation/networking/ieee802154.txt. Therefore,
I want to fulfill the missed part into the documentation with this patch.
Signed-off-by: Jian-Hong Pan <starnight@g.ncu.edu.tw>
Acked-by: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
Pull networking updates from David Miller:
"Highlights:
1) Maintain the TCP retransmit queue using an rbtree, with 1GB
windows at 100Gb this really has become necessary. From Eric
Dumazet.
2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.
3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
Lunn.
4) Add meter action support to openvswitch, from Andy Zhou.
5) Add a data meta pointer for BPF accessible packets, from Daniel
Borkmann.
6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.
7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.
8) More work to move the RTNL mutex down, from Florian Westphal.
9) Add 'bpftool' utility, to help with bpf program introspection.
From Jakub Kicinski.
10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
Dangaard Brouer.
11) Support 'blocks' of transformations in the packet scheduler which
can span multiple network devices, from Jiri Pirko.
12) TC flower offload support in cxgb4, from Kumar Sanghvi.
13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
Leitner.
14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.
15) Add RED qdisc offloadability, and use it in mlxsw driver. From
Nogah Frankel.
16) eBPF based device controller for cgroup v2, from Roman Gushchin.
17) Add some fundamental tracepoints for TCP, from Song Liu.
18) Remove garbage collection from ipv6 route layer, this is a
significant accomplishment. From Wei Wang.
19) Add multicast route offload support to mlxsw, from Yotam Gigi"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
tcp: highest_sack fix
geneve: fix fill_info when link down
bpf: fix lockdep splat
net: cdc_ncm: GetNtbFormat endian fix
openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
netem: remove unnecessary 64 bit modulus
netem: use 64 bit divide by rate
tcp: Namespace-ify sysctl_tcp_default_congestion_control
net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
ipv6: set all.accept_dad to 0 by default
uapi: fix linux/tls.h userspace compilation error
usbnet: ipheth: prevent TX queue timeouts when device not ready
vhost_net: conditionally enable tx polling
uapi: fix linux/rxrpc.h userspace compilation errors
net: stmmac: fix LPI transitioning for dwmac4
atm: horizon: Fix irq release error
net-sysfs: trigger netlink notification on ifalias change via sysfs
openvswitch: Using kfree_rcu() to simplify the code
openvswitch: Make local function ovs_nsh_key_attr_size() static
openvswitch: Fix return value check in ovs_meter_cmd_features()
...
* clarify specification references for v0/v1
* add section "APN vs. Network device"
* add section "Local GTP-U entity and tunnel identification"
Signed-off-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- The old driver statement has been added to the kernel docs.
- We have a couple of new helper scripts. find-unused-docs.sh from Sayli
Karnic will point out kerneldoc comments that are not actually used in
the documentation. Jani Nikula's documentation-file-ref-check finds
references to non-existing files.
- A new ftrace document from Steve Rostedt.
- Vinod Koul converted the dmaengine docs to RST
Beyond that, it's mostly simple fixes.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaCK37AAoJEI3ONVYwIuV63nwQALeqzVwGqqTwiyRyMqgEwMQM
je/6IurEwTHtyfwtW/mztCfNid1CLTiYZg7RET3/zlHjcUI/9VlV2dbBksGFgoQo
muHGqhwTJjXYREwjK3FkzrGckRsVZKJgdzmZYgukCCY6Ir7IffwJKYaLOCZN1S/l
4nBHQpt2nITo0WhdmZjaNRKOQxMA8nN5yGpOIl0neGE6ywIUMgauCCCHhxnOPVWg
ant1HliS8WR8Tizqt9wQgLCvs5lvklsBFibZPO9LBTPG2Zy3HIO9kb+npUAh2MTl
j0Wg39zzOFvVVErqErqUIwIuQ9IrfltHrEHYYoruTvDBXBiMKIcwApF+DS+H3WSp
TnDu3Qif4llM5SZsZGvcjawXNnbck+7SYOe9cyqpylV3SWMWrEX1tbUv6zVuVk+7
fencYBvEZgkJmWbjDeO/Z4S50STxRTzIxFwZgLft7g/RiHo9HvlubjjwQTqBFjxA
fVkolN7h69MGkrD8TF19eapyujqSXaNYH0pFYo87JNOjLgYmezUHyvHd8YeZJL31
Ll0h10HqSNVzJsjFolBMgrC3CcVjsEXdBufu0yVk45sAg9ZiMYOCpwa6Rtp+tfxa
uIBf1LKzfWSa0ocKx7+sMJt0B/CXwU3AMtsbYGyDhFhR2r3cp1NWBHf5nisz9etD
2Md9RDFAMLELZurewB9Q
=H6ud
-----END PGP SIGNATURE-----
Merge tag 'docs-4.15' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A relatively calm cycle for the docs tree again.
- The old driver statement has been added to the kernel docs.
- We have a couple of new helper scripts. find-unused-docs.sh from
Sayli Karnic will point out kerneldoc comments that are not actually
used in the documentation. Jani Nikula's
documentation-file-ref-check finds references to non-existing files.
- A new ftrace document from Steve Rostedt.
- Vinod Koul converted the dmaengine docs to RST
Beyond that, it's mostly simple fixes.
This set reaches outside of Documentation/ a bit more than most. In
all cases, the changes are to comment docs, mostly from Randy, in
places where there didn't seem to be anybody better to take them"
* tag 'docs-4.15' of git://git.lwn.net/linux: (52 commits)
documentation: fb: update list of available compiled-in fonts
MAINTAINERS: update DMAengine documentation location
dmaengine: doc: ReSTize pxa_dma doc
dmaengine: doc: ReSTize dmatest doc
dmaengine: doc: ReSTize client API doc
dmaengine: doc: ReSTize provider doc
dmaengine: doc: Add ReST style dmaengine document
ftrace/docs: Add documentation on how to use ftrace from within the kernel
bug-hunting.rst: Fix an example and a typo in a Sphinx tag
scripts: Add a script to find unused documentation
samples: Convert timers to use timer_setup()
documentation: kernel-api: add more info on bitmap functions
Documentation: fix selftests related file refs
Documentation: fix ref to power basic-pm-debugging
Documentation: fix ref to trace stm content
Documentation: fix ref to coccinelle content
Documentation: fix ref to workqueue content
Documentation: fix ref to sphinx/kerneldoc.py
Documentation: fix locking rt-mutex doc refs
docs: dev-tools: correct Coccinelle version number
...
FACK loss detection has been disabled by default and the
successor RACK subsumed FACK and can handle reordering better.
This patch removes FACK to simplify TCP loss recovery.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a per-device sysctl to specify the default traffic class to use for
kernel originated IPv6 Neighbour Discovery packets.
Currently this includes:
- Router Solicitation (ICMPv6 type 133)
ndisc_send_rs() -> ndisc_send_skb() -> ip6_nd_hdr()
- Neighbour Solicitation (ICMPv6 type 135)
ndisc_send_ns() -> ndisc_send_skb() -> ip6_nd_hdr()
- Neighbour Advertisement (ICMPv6 type 136)
ndisc_send_na() -> ndisc_send_skb() -> ip6_nd_hdr()
- Redirect (ICMPv6 type 137)
ndisc_send_redirect() -> ndisc_send_skb() -> ip6_nd_hdr()
and if the kernel ever gets around to generating RA's,
it would presumably also include:
- Router Advertisement (ICMPv6 type 134)
(radvd daemon could pick up on the kernel setting and use it)
Interface drivers may examine the Traffic Class value and translate
the DiffServ Code Point into a link-layer appropriate traffic
prioritization scheme. An example of mapping IETF DSCP values to
IEEE 802.11 User Priority values can be found here:
https://tools.ietf.org/html/draft-ietf-tsvwg-ieee-802-11
The expected primary use case is to properly prioritize ND over wifi.
Testing:
jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
0
jzem22:~# echo -1 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
-bash: echo: write error: Invalid argument
jzem22:~# echo 256 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
-bash: echo: write error: Invalid argument
jzem22:~# echo 0 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# echo 255 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
255
jzem22:~# echo 34 > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# cat /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
34
jzem22:~# echo $[0xDC] > /proc/sys/net/ipv6/conf/eth0/ndisc_tclass
jzem22:~# tcpdump -v -i eth0 icmp6 and src host jzem22.pgc and dst host fe80::1
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
IP6 (class 0xdc, hlim 255, next-header ICMPv6 (58) payload length: 24)
jzem22.pgc > fe80::1: [icmp6 sum ok] ICMP6, neighbor advertisement,
length 24, tgt is jzem22.pgc, Flags [solicited]
(based on original change written by Erik Kline, with minor changes)
v2: fix 'suspicious rcu_dereference_check() usage'
by explicitly grabbing the rcu_read_lock.
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add documenation for kernel ILA. This describes ILA, features,
configuration gives some examples.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently TCP RACK loss detection does not work well if packets are
being reordered beyond its static reordering window (min_rtt/4).Under
such reordering it may falsely trigger loss recoveries and reduce TCP
throughput significantly.
This patch improves that by increasing and reducing the reordering
window based on DSACK, which is now supported in major TCP implementations.
It makes RACK's reo_wnd adaptive based on DSACK and no. of recoveries.
- If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded
by srtt), since there is possibility that spurious retransmission was
due to reordering delay longer than reo_wnd.
- Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16)
no. of successful recoveries (accounts for full DSACK-based loss
recovery undo). After that, reset it to default (min_rtt/4).
- At max, reo_wnd is incremented only once per rtt. So that the new
DSACK on which we are reacting, is due to the spurious retx (approx)
after the reo_wnd has been updated last time.
- reo_wnd is tracked in terms of steps (of min_rtt/4), rather than
absolute value to account for change in rtt.
In our internal testing, we observed significant increase in throughput,
in scenarios where reordering exceeds min_rtt/4 (previous static value).
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 8200 (IPv6) defines Hop-by-Hop options and Destination options
extension headers. Both of these carry a list of TLVs which is
only limited by the maximum length of the extension header (2048
bytes). By the spec a host must process all the TLVs in these
options, however these could be used as a fairly obvious
denial of service attack. I think this could in fact be
a significant DOS vector on the Internet, one mitigating
factor might be that many FWs drop all packets with EH (and
obviously this is only IPv6) so an Internet wide attack might not
be so effective (yet!).
By my calculation, the worse case packet with TLVs in a standard
1500 byte MTU packet that would be processed by the stack contains
1282 invidual TLVs (including pad TLVS) or 724 two byte TLVs. I
wrote a quick test program that floods a whole bunch of these
packets to a host and sure enough there is substantial time spent
in ip6_parse_tlv. These packets contain nothing but unknown TLVS
(that are ignored), TLV padding, and bogus UDP header with zero
payload length.
25.38% [kernel] [k] __fib6_clean_all
21.63% [kernel] [k] ip6_parse_tlv
4.21% [kernel] [k] __local_bh_enable_ip
2.18% [kernel] [k] ip6_pol_route.isra.39
1.98% [kernel] [k] fib6_walk_continue
1.88% [kernel] [k] _raw_write_lock_bh
1.65% [kernel] [k] dst_release
This patch adds configurable limits to Destination and Hop-by-Hop
options. There are three limits that may be set:
- Limit the number of options in a Hop-by-Hop or Destination options
extension header.
- Limit the byte length of a Hop-by-Hop or Destination options
extension header.
- Disallow unrecognized options in a Hop-by-Hop or Destination
options extension header.
The limits are set in corresponding sysctls:
ipv6.sysctl.max_dst_opts_cnt
ipv6.sysctl.max_hbh_opts_cnt
ipv6.sysctl.max_dst_opts_len
ipv6.sysctl.max_hbh_opts_len
If a max_*_opts_cnt is less than zero then unknown TLVs are disallowed.
The number of known TLVs that are allowed is the absolute value of
this number.
If a limit is exceeded when processing an extension header the packet is
dropped.
Default values are set to 8 for options counts, and set to INT_MAX
for maximum length. Note the choice to limit options to 8 is an
arbitrary guess (roughly based on the fact that the stack supports
three HBH options and just one destination option).
These limits have being proposed in draft-ietf-6man-rfc6434-bis.
Tested (by Martin Lau)
I tested out 1 thread (i.e. one raw_udp process).
I changed the net.ipv6.max_dst_(opts|hbh)_number between 8 to 2048.
With sysctls setting to 2048, the softirq% is packed to 100%.
With 8, the softirq% is almost unnoticable from mpstat.
v2;
- Code and documention cleanup.
- Change references of RFC2460 to be RFC8200.
- Add reference to RFC6434-bis where the limits will be in standard.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide a rough overview of the state of the driver. And explain that the
driver operates in two modes: bridged and port-separated.
Signed-off-by: Egil Hjelmeland <egil.hjelmeland@zenitel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is very similar to the Macvlan VEPA mode, however, there is some
difference. IPvlan uses the mac-address of the lower device, so the VEPA
mode has implications of ICMP-redirects for packets destined for its
immediate neighbors sharing same master since the packets will have same
source and dest mac. The external switch/router will send redirect msg.
Having said that, this will be useful tool in terms of debugging
since IPvlan will not switch packets within its slaves and rely completely
on the external entity as intended in 802.1Qbg.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPvlan has always operated in bridge mode. However there are scenarios
where each slave should be able to talk through the master device but
not necessarily across each other. Think of an environment where each
of a namespace is a private and independant customer. In this scenario
the machine which is hosting these namespaces neither want to tell who
their neighbor is nor the individual namespaces care to talk to neighbor
on short-circuited network path.
This patch implements the mode that is very similar to the 'private' mode
in macvlan where individual slaves can send and receive traffic through
the master device, just that they can not talk among slave devices.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two things:
1) Update examples to show usage of metric
2) Discuss reasoning for using such a high metric.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make AF_RXRPC accept MSG_WAITALL as a flag to sendmsg() to tell it to
ignore signals whilst loading up the message queue, provided progress is
being made in emptying the queue at the other side.
Progress is defined as the base of the transmit window having being
advanced within 2 RTT periods. If the period is exceeded with no progress,
sendmsg() will return anyway, indicating how much data has been copied, if
any.
Once the supplied buffer is entirely decanted, the sendmsg() will return.
Signed-off-by: David Howells <dhowells@redhat.com>
Provide a couple of functions to allow cleaner handling of signals in a
kernel service. They are:
(1) rxrpc_kernel_get_rtt()
This allows the kernel service to find out the RTT time for a call, so
as to better judge how large a timeout to employ.
Note, though, that whilst this returns a value in nanoseconds, the
timeouts can only actually be in jiffies.
(2) rxrpc_kernel_check_life()
This returns a number that is updated when ACKs are received from the
peer (notably including PING RESPONSE ACKs which we can elicit by
sending PING ACKs to see if the call still exists on the server).
The caller should compare the numbers of two calls to see if the call
is still alive.
These can be used to provide an extending timeout rather than returning
immediately in the case that a signal occurs that would otherwise abort an
RPC operation. The timeout would be extended if the server is still
responsive and the call is still apparently alive on the server.
For most operations this isn't that necessary - but for FS.StoreData it is:
OpenAFS writes the data to storage as it comes in without making a backup,
so if we immediately abort it when partially complete on a CTRL+C, say, we
have no idea of the state of the file after the abort.
Signed-off-by: David Howells <dhowells@redhat.com>
Provide support for a kernel service to make use of the service upgrade
facility. This involves:
(1) Pass an upgrade request flag to rxrpc_kernel_begin_call().
(2) Make rxrpc_kernel_recv_data() return the call's current service ID so
that the caller can detect service upgrade and see what the service
was upgraded to.
Signed-off-by: David Howells <dhowells@redhat.com>
* port authorized event for 4-way-HS offload (Avi)
* enable MFP optional for such devices (Emmanuel)
* Kees's timer setup patch for mac80211 mesh
(the part that isn't trivially scripted)
* improve VLAN vs. TXQ handling (myself)
* load regulatory database as firmware file (myself)
* with various other small improvements and cleanups
I merged net-next once in the meantime to allow Kees's
timer setup patch to go in.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlneDzEACgkQa3t4Rpy0
AB3EHBAAhQana6YiMx0Ag4ANGlll3xnxFCZlkmlBoJ/EwKgQhPonylHntuvtkXf6
kZRsOr4uA+wpN/opHLGfMJzat9uxztHVo2sT4rxVnvZq4DYcB/JdlhTMLZDsdDgm
kHRpUEKh/+2FAgq2A4VEUpVb+Mtg0dq8iJJXFw89xb3Sw5UhNA6ljWQZ4zpXuI0P
xOB8Z52LqAcMNnspP+L2TRpanu2ETLcl4Laj+cMl1Yiut2GHkclXUoGvbZ1al5SO
CYqpjVKk67ENLJMrmhQ7DVzj0rpwlV+Eh756RU9DhamPAWbxqWLWJgfuGBskRXnI
GneCUQkLZ5j1kUJjvQdXBv1UmpkCG4/3yITZX8kL3UR+AbhSCqzVQDo7it5hsWEf
XTNAlhdTDhSn7OQQ6XOxvWeydAiaaz671bhPuIvKEo9D/+7Uv0PxHmvu8QqUm0xH
Wvyh0LYRrblDz7fgEkaFctjJKYKnwviQ9O2LGx98C8NVam+Qyti2MlLA4AO5E+it
ky97W3Dh5ftjQhFD0Ip9P4+BO/9hvNELlCRWUXI197n6B0/KH7FWX1eqw/vpnKc4
w7VB/V59mB8zMmZ1QUdwT1/Ru+MD++6ds93STttZvH/0P3H0dDRGuxUK4m32YHiX
s97uSBAbBMy2UH6b8HyxjVMGWvmW3KRakBID1zv2NRSIXtyfWj4=
=gW8q
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2017-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Work continues in various areas:
* port authorized event for 4-way-HS offload (Avi)
* enable MFP optional for such devices (Emmanuel)
* Kees's timer setup patch for mac80211 mesh
(the part that isn't trivially scripted)
* improve VLAN vs. TXQ handling (myself)
* load regulatory database as firmware file (myself)
* with various other small improvements and cleanups
I merged net-next once in the meantime to allow Kees's
timer setup patch to go in.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Parsing and building C structures from a regdb is no longer needed
since the "firmware" file (regulatory.db) can be linked into the
kernel image to achieve the same effect.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
As the current regulatory database is only about 4k big, and already
difficult to extend, we decided that overall it would be better to
get rid of the complications with CRDA and load the database into the
kernel directly, but in a new format that is extensible.
The new file format can be extended since it carries a length field
on all the structs that need to be extensible.
In order to be able to request firmware when the module initializes,
move cfg80211 from subsys_initcall() to the later fs_initcall(); the
firmware loader is at the same level but linked earlier, so it can
be called from there. Otherwise, when both the firmware loader and
cfg80211 are built-in, the request will crash the kernel. We also
need to be before device_initcall() so that cfg80211 is available
for devices when they initialize.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Update Documentation/networking/netvsc.txt for TCP hash level setting
and related info.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should be "802.3ad" like everywhere else in the document.
Signed-off-by: Axel Beckert <abe@deuxchevaux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix NAPI poll list corruption in enic driver, from Christian
Lamparter.
2) Fix route use after free, from Eric Dumazet.
3) Fix regression in reuseaddr handling, from Josef Bacik.
4) Assert the size of control messages in compat handling since we copy
it in from userspace twice. From Meng Xu.
5) SMC layer bug fixes (missing RCU locking, bad refcounting, etc.)
from Ursula Braun.
6) Fix races in AF_PACKET fanout handling, from Willem de Bruijn.
7) Don't use ARRAY_SIZE on spinlock array which might have zero
entries, from Geert Uytterhoeven.
8) Fix miscomputation of checksum in ipv6 udp code, from Subash Abhinov
Kasiviswanathan.
9) Push the ipv6 header properly in ipv6 GRE tunnel driver, from Xin
Long.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
inet: fix improper empty comparison
net: use inet6_rcv_saddr to compare sockets
net: set tb->fast_sk_family
net: orphan frags on stand-alone ptype in dev_queue_xmit_nit
MAINTAINERS: update git tree locations for ieee802154 subsystem
net: prevent dst uses after free
net: phy: Fix truncation of large IRQ numbers in phy_attached_print()
net/smc: no close wait in case of process shut down
net/smc: introduce a delay
net/smc: terminate link group if out-of-sync is received
net/smc: longer delay for client link group removal
net/smc: adapt send request completion notification
net/smc: adjust net_device refcount
net/smc: take RCU read lock for routing cache lookup
net/smc: add receive timeout check
net/smc: add missing dev_put
net: stmmac: Cocci spatch "of_table"
lan78xx: Use default values loaded from EEPROM/OTP after reset
lan78xx: Allow EEPROM write for less than MAX_EEPROM_SIZE
lan78xx: Fix for eeprom read/write when device auto suspend
...
- sysctl and seccomp operation to discover available actions. (tyhicks)
- new per-filter configurable logging infrastructure and sysctl. (tyhicks)
- SECCOMP_RET_LOG to log allowed syscalls. (tyhicks)
- SECCOMP_RET_KILL_PROCESS as the new strictest possible action.
- self-tests for new behaviors.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJZxVbTAAoJEIly9N/cbcAmvIAQALR9aVQQXjma4lLhZxwTsLtG
rJm8t/o4y/2aBV8vzpFbMPT5gfN/PAkHJpCoxVPssx0k4PH2M7HjpnR6E1OC+erg
RNom3uNdNqZeFlDpdX1qriYiCTB9p6rHe0DPwgG9iGqgDxsJ+G3W+x1sMZ1C+A0M
shxA3fwt+Qpivo8Zq44xjMFjK+Zeor9V3yPc51QoZktWHlM16ID3HvHVnUtzqAUb
nTWF6ZlmZlJ/lp4Dq8/55lytVcXPo240G3H0Odai+SNFakK6p5UO//BRBV209bmb
05jpAOH6uym1sxVz00TQXCtDqOEzs2mQgomtTSShHg8SrLFX7nFkEFtAVA6tEri2
FqDYce9KX7ZtOYiq83C7pnpAFCouc0z31dQl9USHiAiexXklwBIX+OsVv98omWGi
pW43uLE2ovY0cpOsN50xI4mnxiGh6MhFcdbor2VLRJwLIFSw3XjjgNCCLyK4AJxs
N514252qi70c9cWyAHYDLy077yTVxu3JUlsVQKtRTMfoFUq6bX1jPXVXE8qkVrui
bc/Ay54pPrUwM854IpQ9ZBOuMfs6I5opocGIsBvMaND45U4o2B0ANCsxhuZ0zEtM
E55DhK5OgjukNemQmlWK2foDckYdtkJXCj2yMBNQady0Uynr2BWZ6VDBP7vFcnRB
UihRlFZRZleu8383uHsc
=sKeC
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
"Major additions:
- sysctl and seccomp operation to discover available actions
(tyhicks)
- new per-filter configurable logging infrastructure and sysctl
(tyhicks)
- SECCOMP_RET_LOG to log allowed syscalls (tyhicks)
- SECCOMP_RET_KILL_PROCESS as the new strictest possible action
- self-tests for new behaviors"
[ This is the seccomp part of the security pull request during the merge
window that was nixed due to unrelated problems - Linus ]
* tag 'seccomp-v4.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
samples: Unrename SECCOMP_RET_KILL
selftests/seccomp: Test thread vs process killing
seccomp: Implement SECCOMP_RET_KILL_PROCESS action
seccomp: Introduce SECCOMP_RET_KILL_PROCESS
seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREAD
seccomp: Action to log before allowing
seccomp: Filter flag to log all actions except SECCOMP_RET_ALLOW
seccomp: Selftest for detection of filter flag support
seccomp: Sysctl to configure actions that are allowed to be logged
seccomp: Operation for checking if an action is available
seccomp: Sysctl to display available actions
seccomp: Provide matching filter for introspection
selftests/seccomp: Refactor RET_ERRNO tests
selftests/seccomp: Add simple seccomp overhead benchmark
selftests/seccomp: Add tests for basic ptrace actions
Currently, writing into
net.ipv6.conf.all.{accept_dad,use_optimistic,optimistic_dad} has no effect.
Fix handling of these flags by:
- using the maximum of global and per-interface values for the
accept_dad flag. That is, if at least one of the two values is
non-zero, enable DAD on the interface. If at least one value is
set to 2, enable DAD and disable IPv6 operation on the interface if
MAC-based link-local address was found
- using the logical OR of global and per-interface values for the
optimistic_dad flag. If at least one of them is set to one, optimistic
duplicate address detection (RFC 4429) is enabled on the interface
- using the logical OR of global and per-interface values for the
use_optimistic flag. If at least one of them is set to one,
optimistic addresses won't be marked as deprecated during source address
selection on the interface.
While at it, as we're modifying the prototype for ipv6_use_optimistic_addr(),
drop inline, and let the compiler decide.
Fixes: 7fd2561e4e ("net: ipv6: Add a sysctl to make optimistic addresses useful candidates")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ASCII art in Documentation/networking/switchdev.txt:
Change non-ASCII "spaces" to ASCII spaces.
Change 2 erroneous '+' characters in ASCII art to '-' (at the '*'
characters below):
line 32:
+--+----+----+----+-*--+----+---+ +-----+-----+
line 41:
+--------------+---*------------+
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) Support ipv6 checksum offload in sunvnet driver, from Shannon
Nelson.
2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
Dumazet.
3) Allow generic XDP to work on virtual devices, from John Fastabend.
4) Add bpf device maps and XDP_REDIRECT, which can be used to build
arbitrary switching frameworks using XDP. From John Fastabend.
5) Remove UFO offloads from the tree, gave us little other than bugs.
6) Remove the IPSEC flow cache, from Florian Westphal.
7) Support ipv6 route offload in mlxsw driver.
8) Support VF representors in bnxt_en, from Sathya Perla.
9) Add support for forward error correction modes to ethtool, from
Vidya Sagar Ravipati.
10) Add time filter for packet scheduler action dumping, from Jamal Hadi
Salim.
11) Extend the zerocopy sendmsg() used by virtio and tap to regular
sockets via MSG_ZEROCOPY. From Willem de Bruijn.
12) Significantly rework value tracking in the BPF verifier, from Edward
Cree.
13) Add new jump instructions to eBPF, from Daniel Borkmann.
14) Rework rtnetlink plumbing so that operations can be run without
taking the RTNL semaphore. From Florian Westphal.
15) Support XDP in tap driver, from Jason Wang.
16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.
17) Add Huawei hinic ethernet driver.
18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
Delalande.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
i40e: avoid NVM acquire deadlock during NVM update
drivers: net: xgene: Remove return statement from void function
drivers: net: xgene: Configure tx/rx delay for ACPI
drivers: net: xgene: Read tx/rx delay for ACPI
rocker: fix kcalloc parameter order
rds: Fix non-atomic operation on shared flag variable
net: sched: don't use GFP_KERNEL under spin lock
vhost_net: correctly check tx avail during rx busy polling
net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
rxrpc: Make service connection lookup always check for retry
net: stmmac: Delete dead code for MDIO registration
gianfar: Fix Tx flow control deactivation
cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
cxgb4: Fix pause frame count in t4_get_port_stats
cxgb4: fix memory leak
tun: rename generic_xdp to skb_xdp
tun: reserve extra headroom only when XDP is set
net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
net: dsa: bcm_sf2: Advertise number of egress queues
...
Pull documentation updates from Jonathan Corbet:
"After a fair amount of churn in the last couple of cycles, docs are
taking it easier this time around. Lots of fixes and some new
documentation, but nothing all that radical. Perhaps the most
interesting change for many is the scripts/sphinx-pre-install tool
from Mauro; it will tell you exactly which packages you need to
install to get a working docs toolchain on your system.
There are two little patches reaching outside of Documentation/; both
just tweak kerneldoc comments to eliminate warnings and fix some
dangling doc pointers"
* 'docs-next' of git://git.lwn.net/linux: (52 commits)
Documentation/sphinx: fix kernel-doc decode for non-utf-8 locale
genalloc: Fix an incorrect kerneldoc comment
doc: Add documentation for the genalloc subsystem
assoc_array: fix path to assoc_array documentation
kernel-doc parser mishandles declarations split into lines
docs: ReSTify table of contents in core.rst
docs: process: drop git snapshots from applying-patches.rst
Documentation:input: fix typo
swap: Remove obsolete sentence
sphinx.rst: Allow Sphinx version 1.6 at the docs
docs-rst: fix verbatim font size on tables
Documentation: stable-kernel-rules: fix broken git urls
rtmutex: update rt-mutex
rtmutex: update rt-mutex-design
docs: fix minimal sphinx version in conf.py
docs: fix nested numbering in the TOC
NVMEM documentation fix: A minor typo
docs-rst: pdf: use same vertical margin on all Sphinx versions
doc: Makefile: if sphinx is not found, run a check script
docs: Fix paths in security/keys
...
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree. Basically, updates to the conntrack core, enhancements for
nf_tables, conversion of netfilter hooks from linked list to array to
improve memory locality and asorted improvements for the Netfilter
codebase. More specifically, they are:
1) Add expection to hashes after timer initialization to prevent
access from another CPU that walks on the hashes and calls
del_timer(), from Florian Westphal.
2) Don't update nf_tables chain counters from hot path, this is only
used by the x_tables compatibility layer.
3) Get rid of nested rcu_read_lock() calls from netfilter hook path.
Hooks are always guaranteed to run from rcu read side, so remove
nested rcu_read_lock() where possible. Patch from Taehee Yoo.
4) nf_tables new ruleset generation notifications include PID and name
of the process that has updated the ruleset, from Phil Sutter.
5) Use skb_header_pointer() from nft_fib, so we can reuse this code from
the nf_family netdev family. Patch from Pablo M. Bermudo.
6) Add support for nft_fib in nf_tables netdev family, also from Pablo.
7) Use deferrable workqueue for conntrack garbage collection, to reduce
power consumption, from Patch from Subash Abhinov Kasiviswanathan.
8) Add nf_ct_expect_iterate_net() helper and use it. From Florian
Westphal.
9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian.
10) Drop references on conntrack removal path when skbuffs has escaped via
nfqueue, from Florian.
11) Don't queue packets to nfqueue with dying conntrack, from Florian.
12) Constify nf_hook_ops structure, from Florian.
13) Remove neededlessly branch in nf_tables trace code, from Phil Sutter.
14) Add nla_strdup(), from Phil Sutter.
15) Rise nf_tables objects name size up to 255 chars, people want to use
DNS names, so increase this according to what RFC 1035 specifies.
Patch series from Phil Sutter.
16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook
registration on demand, suggested by Eric Dumazet, patch from Florian.
17) Remove unused variables in compat_copy_entry_from_user both in
ip_tables and arp_tables code. Patch from Taehee Yoo.
18) Constify struct nf_conntrack_l4proto, from Julia Lawall.
19) Constify nf_loginfo structure, also from Julia.
20) Use a single rb root in connlimit, from Taehee Yoo.
21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo.
22) Use audit_log() instead of open-coding it, from Geliang Tang.
23) Allow to mangle tcp options via nft_exthdr, from Florian.
24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes
a fix for a miscalculation of the minimal length.
25) Simplify branch logic in h323 helper, from Nick Desaulniers.
26) Calculate netlink attribute size for conntrack tuple at compile
time, from Florian.
27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure.
From Florian.
28) Remove holes in nf_conntrack_l4proto structure, so it becomes
smaller. From Florian.
29) Get rid of print_tuple() indirection for /proc conntrack listing.
Place all the code in net/netfilter/nf_conntrack_standalone.c.
Patch from Florian.
30) Do not built in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is
off. From Florian.
31) Constify most nf_conntrack_{l3,l4}proto helper functions, from
Florian.
32) Fix broken indentation in ebtables extensions, from Colin Ian King.
33) Fix several harmless sparse warning, from Florian.
34) Convert netfilter hook infrastructure to use array for better memory
locality, joint work done by Florian and Aaron Conole. Moreover, add
some instrumentation to debug this.
35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once
per batch, from Florian.
36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian.
37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao.
38) Remove unused code in the generic protocol tracker, from Davide
Caratti.
I think I will have material for a second Netfilter batch in my queue if
time allow to make it fit in this merge window.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation for this feature was missing from the patchset.
Copied a lot from the netdev 2.1 paper, addressing some small
interface changes since then.
Changes
v1 -> v2
- change email discussion URL format
- clarify that u32 counter is per-syscall, unsigned and
wraps after UINT_MAX calls
- describe errno on send failure specific to MSG_ZEROCOPY
- a few very minor rewordings
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two typos in the document, netvsc.txt,
regarding UDP hashing level. This patch fixes them.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RmNet driver provides a transport agnostic MAP (multiplexing and
aggregation protocol) support in embedded module. Module provides
virtual network devices which can be attached to any IP-mode
physical device. This will be used to provide all MAP functionality
on future hardware in a single consistent location.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian reported UDP xmit drops that could be root caused to the
too small neigh limit.
Current limit is 64 KB, meaning that even a single UDP socket would hit
it, since its default sk_sndbuf comes from net.core.wmem_default
(~212992 bytes on 64bit arches).
Once ARP/ND resolution is in progress, we should allow a little more
packets to be queued, at least for one producer.
Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf
limit and either block in sendmsg() or return -EAGAIN.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Explain that the patch queue in patchwork should not be touched by patch
submitters.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow a client call that failed on network error to be retried, provided
that the Tx queue still holds DATA packet 1. This allows an operation to
be submitted to another server or another address for the same server
without having to repackage and re-encrypt the data so far processed.
Two new functions are provided:
(1) rxrpc_kernel_check_call() - This is used to find out the completion
state of a call to guess whether it can be retried and whether it
should be retried.
(2) rxrpc_kernel_retry_call() - Disconnect the call from its current
connection, reset the state and submit it as a new client call to a
new address. The new address need not match the previous address.
A call may be retried even if all the data hasn't been loaded into it yet;
a partially constructed will be retained at the same point it was at when
an error condition was detected. msg_data_left() can be used to find out
how much data was packaged before the error occurred.
Signed-off-by: David Howells <dhowells@redhat.com>
Add a callback to rxrpc_kernel_send_data() so that a kernel service can get
a notification that the AF_RXRPC call has transitioned out the Tx phase and
is now waiting for a reply or a final ACK.
This is called from AF_RXRPC with the call state lock held so the
notification is guaranteed to come before any reply is passed back.
Further, modify the AFS filesystem to make use of this so that we don't have
to change the afs_call state before sending the last bit of data.
Signed-off-by: David Howells <dhowells@redhat.com>
Reflecting IPv6 Flow Label at server nodes is useful in environments
that employ multipath routing to load balance the requests. As "IPv6
Flow Label Reflection" standard draft [1] points out - ICMPv6 PTB error
messages generated in response to a downstream packets from the server
can be routed by a load balancer back to the original server without
looking at transport headers, if the server applies the flow label
reflection. This enables the Path MTU Discovery past the ECMP router in
load-balance or anycast environments where each server node is reachable
by only one path.
Introduce a sysctl to enable flow label reflection per net namespace for
all newly created sockets. Same could be earlier achieved only per
socket by setting the IPV6_FL_F_REFLECT flag for the IPV6_FLOWLABEL_MGR
socket option.
[1] https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As eBPF JIT support for arm32 was added recently with
commit 39c13c204b, it seems appropriate to
add arm32 as arch with support for eBPF JIT in bpf and sysctl docs as well.
Signed-off-by: Shubham Bansal <illusionist.neo@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update Documentation/networking/netvsc.txt for UDP hash level setting
and related info.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize hw interface as part of the nic initialization for accessing hw.
Signed-off-by: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for adding SECCOMP_RET_KILL_PROCESS, rename SECCOMP_RET_KILL
to the more accurate SECCOMP_RET_KILL_THREAD.
The existing selftest values are intentionally left as SECCOMP_RET_KILL
just to be sure we're exercising the alias.
Signed-off-by: Kees Cook <keescook@chromium.org>
The function declaration in the lastest include/net/mac802154.h has been
changed since v3.19.
ieee802154_alloc_device => ieee802154_alloc_hw
ieee802154_free_device => ieee802154_free_hw
ieee802154_register_device => ieee802154_register_hw
ieee802154_unregister_device => ieee802154_unregister_hw
However, the description in the Device drivers API section of
Documentation/networking/ieee802154.txt is still in the state of
v3.18.63.
Signed-off-by: Jian-Hong Pan <starnight@g.ncu.edu.tw>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=),
BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that
particularly *JLT/*JLE counterparts involving immediates need
to be rewritten from e.g. X < [IMM] by swapping arguments into
[IMM] > X, meaning the immediate first is required to be loaded
into a register Y := [IMM], such that then we can compare with
Y > X. Note that the destination operand is always required to
be a register.
This has the downside of having unnecessarily increased register
pressure, meaning complex program would need to spill other
registers temporarily to stack in order to obtain an unused
register for the [IMM]. Loading to registers will thus also
affect state pruning since we need to account for that register
use and potentially those registers that had to be spilled/filled
again. As a consequence slightly more stack space might have
been used due to spilling, and BPF programs are a bit longer
due to extra code involving the register load and potentially
required spill/fills.
Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=)
counterparts to the eBPF instruction set. Modifying LLVM to
remove the NegateCC() workaround in a PoC patch at [1] and
allowing it to also emit the new instructions resulted in
cilium's BPF programs that are injected into the fast-path to
have a reduced program length in the range of 2-3% (e.g.
accumulated main and tail call sections from one of the object
file reduced from 4864 to 4729 insns), reduced complexity in
the range of 10-30% (e.g. accumulated sections reduced in one
of the cases from 116432 to 88428 insns), and reduced stack
usage in the range of 1-5% (e.g. accumulated sections from one
of the object files reduced from 824 to 784b).
The modification for LLVM will be incorporated in a backwards
compatible way. Plan is for LLVM to have i) a target specific
option to offer a possibility to explicitly enable the extension
by the user (as we have with -m target specific extensions today
for various CPU insns), and ii) have the kernel checked for
presence of the extensions and enable them transparently when
the user is selecting more aggressive options such as -march=native
in a bpf target context. (Other frontends generating BPF byte
code, e.g. ply can probe the kernel directly for its code
generation.)
[1] https://github.com/borkmann/llvm/tree/bpf-insns
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also bring the eBPF documentation up to date in other ways.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- bump version strings, by Simon Wunderlich
- Remove unnecessary length qualifier, by Joe Perches
- Remove too short %pM field width, by Sven Eckelmann
- Remove return value handling from skb_put_data, by Sven Eckelmann
- Spelling fixes, by Colin Ian King
- Convert batman-adv.txt to reStructuredText, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlmB364WHHN3QHNpbW9u
d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoUL9EADc1EFFdHxbJdLbVL/FjkDkrwpL
60exCnDNLZ06g2+XhViA/Lo5VMPjJTYvNVciG4RbphtZPGwpnXLO1mM+1EntzHLl
Bkvd8pMS8SbhUIlJ4Ua8ret3GoIf2FLz3tEOr/LGO/aJ5iEOS1N9msI1nYK12E6V
4pgLJiHztoLkoEnsCaB30iF/jxlCXKE+AG2LjD65zUuG95DPR0XGGkgdP0qomlvh
kaSG7G4kVCQEsS2W+6TLqC+CKoO3uGGCF1wc4KDD3wgbUU0YCmqhzehrtstCku66
ksNavOy0e0iA3bURo6md/aWWUyOV6wK2uV6QE5ef1gStgXQKYwXa6MzaHVRu8XYZ
SrzfwKQRFDZXzouHTNJCNSeNhrPGKXPs6JSjeQDR1hzwZ0e+5xOct/mgp2VgHhqP
v4xs0ZFcjOWPZ52Yy0kY0r/f4AKTwS20DJLSQaKEST1E0m4rlzpNUxi+/4T1wUSD
LFtXjf9IonS1Weo6Ro5v6x2db4tXuX6pwmlCpfYcAdAK1FFyKIG7HHWw4UV7s85P
5nNbZP/v6K8DsGVVD3I/HEIeoZyi2DnPzYgFeV8pMJy6gYggnu/axdgmd7mgDL3J
aCEaL3rvSbkmnmq6QG/pC0VwXmTR9j945uEBjGICmPq1nzV2rvUt7KvEv1MWBH28
Qv5VAUe8XsIiwKMgFA==
=3oIU
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20170802' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich
- Remove unnecessary length qualifier, by Joe Perches
- Remove too short %pM field width, by Sven Eckelmann
- Remove return value handling from skb_put_data, by Sven Eckelmann
- Spelling fixes, by Colin Ian King
- Convert batman-adv.txt to reStructuredText, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some background documentation on netvsc device options
and limitations.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generalize strparser from more than just being used in conjunction
with read_sock. strparser will also be used in the send path with
zero proxy. The primary change is to create strp_process function
that performs the critical processing on skbs. The documentation
is also updated to reflect the new uses.
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discussion during NFWS 2017 in Faro has shown that the current
conntrack behaviour is unreasonable.
Even if conntrack module is loaded on behalf of a single net namespace,
its turned on for all namespaces, which is expensive. Commit
481fa37347 ("netfilter: conntrack: add nf_conntrack_default_on sysctl")
attempted to provide an alternative to the 'default on' behaviour by
adding a sysctl to change it.
However, as Eric points out, the sysctl only becomes available
once the module is loaded, and then its too late.
So we either have to move the sysctl to the core, or, alternatively,
change conntrack to become active only once the rule set requires this.
This does the latter, conntrack is only enabled when a rule needs it.
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Converting the freeform text to parsable reStructuredText, allows the
integration in the sphinx based documentation system of the kernel. It will
therefore be accessible as hypertext under
https://www.kernel.org/doc/html/latest/
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
revert c386578f1c ("xfrm: Let the flowcache handle its size by default.").
Once we remove flow cache, we don't have a flow cache limit anymore.
We must not allow (virtually) unlimited allocations of xfrm dst entries.
Revert back to the old xfrm dst gc limits.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Passing (void*)val instead of &val would make a pointer out of an integer
and cause sock_setsockopt to -EFAULT.
See tools/testing/selftests/networking/timestamping/timestamping.c
for a working example.
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Ahmad Fatoum <ahmad@a3f.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Those enum values don't exist anymore.
Fixes: 7e13318daa ("net: define gso types for IPx over IPv4 and IPv6")
CC: Tom Herbert <tom@herbertland.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
"Reasonably busy this cycle, but perhaps not as busy as in the 4.12
merge window:
1) Several optimizations for UDP processing under high load from
Paolo Abeni.
2) Support pacing internally in TCP when using the sch_fq packet
scheduler for this is not practical. From Eric Dumazet.
3) Support mutliple filter chains per qdisc, from Jiri Pirko.
4) Move to 1ms TCP timestamp clock, from Eric Dumazet.
5) Add batch dequeueing to vhost_net, from Jason Wang.
6) Flesh out more completely SCTP checksum offload support, from
Davide Caratti.
7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
Neira Ayuso, and Matthias Schiffer.
8) Add devlink support to nfp driver, from Simon Horman.
9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
Prabhu.
10) Add stack depth tracking to BPF verifier and use this information
in the various eBPF JITs. From Alexei Starovoitov.
11) Support XDP on qed device VFs, from Yuval Mintz.
12) Introduce BPF PROG ID for better introspection of installed BPF
programs. From Martin KaFai Lau.
13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.
14) For loads, allow narrower accesses in bpf verifier checking, from
Yonghong Song.
15) Support MIPS in the BPF selftests and samples infrastructure, the
MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
Daney.
16) Support kernel based TLS, from Dave Watson and others.
17) Remove completely DST garbage collection, from Wei Wang.
18) Allow installing TCP MD5 rules using prefixes, from Ivan
Delalande.
19) Add XDP support to Intel i40e driver, from Björn Töpel
20) Add support for TC flower offload in nfp driver, from Simon
Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
Kicinski, and Bert van Leeuwen.
21) IPSEC offloading support in mlx5, from Ilan Tayari.
22) Add HW PTP support to macb driver, from Rafal Ozieblo.
23) Networking refcount_t conversions, From Elena Reshetova.
24) Add sock_ops support to BPF, from Lawrence Brako. This is useful
for tuning the TCP sockopt settings of a group of applications,
currently via CGROUPs"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
cxgb4: Support for get_ts_info ethtool method
cxgb4: Add PTP Hardware Clock (PHC) support
cxgb4: time stamping interface for PTP
nfp: default to chained metadata prepend format
nfp: remove legacy MAC address lookup
nfp: improve order of interfaces in breakout mode
net: macb: remove extraneous return when MACB_EXT_DESC is defined
bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
bpf: fix return in load_bpf_file
mpls: fix rtm policy in mpls_getroute
net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
...
around. Highlights include:
- Conversion of a bunch of security documentation into RST
- The conversion of the remaining DocBook templates by The Amazing
Mauro Machine. We can now drop the entire DocBook build chain.
- The usual collection of fixes and minor updates.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZWkGAAAoJEI3ONVYwIuV6rf0P/0B3JTiVPKS/WUx53+jzbAi4
1BN7dmmuMxE1bWpgdEq+ac4aKxm07iAojuntuMj0qz/ZB1WARcmvEqqzI5i4wfq9
5MrLduLkyuWfr4MOPseKJ2VK83p8nkMOiO7jmnBsilu7fE4nF+5YY9j4cVaArfMy
cCQvAGjQzvej2eiWMGUSLHn4QFKh00aD7cwKyBVsJ08b27C9xL0J2LQyCDZ4yDgf
37/MH3puEd3HX/4qAwLonIxT3xrIrrbDturqLU7OSKcWTtGZNrYyTFbwR3RQtqWd
H8YZVg2Uyhzg9MYhkbQ2E5dEjUP4mkegcp6/JTINH++OOPpTbdTJgirTx7VTkSf1
+kL8t7+Ayxd0FH3+77GJ5RMj8LUK6rj5cZfU5nClFQKWXP9UL3IelQ3Nl+SpdM8v
ZAbR2KjKgH9KS6+cbIhgFYlvY+JgPkOVruwbIAc7wXVM3ibk1sWoBOFEujcbueWh
yDpQv3l1UX0CKr3jnevJoW26LtEbGFtC7gSKZ+3btyeSBpWFGlii42KNycEGwUW0
ezlwryDVHzyTUiKllNmkdK4v73mvPsZHEjgmme4afKAIiUilmcUF4XcqD86hISFT
t+UJLA/zEU+0sJe26o2nK6GNJzmo4oCtVyxfhRe26Ojs1n80xlYgnZRfuIYdd31Z
nwLBnwDCHAOyX91WXp9G
=cVjZ
-----END PGP SIGNATURE-----
Merge tag 'docs-4.13' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"There has been a fair amount of activity in the docs tree this time
around. Highlights include:
- Conversion of a bunch of security documentation into RST
- The conversion of the remaining DocBook templates by The Amazing
Mauro Machine. We can now drop the entire DocBook build chain.
- The usual collection of fixes and minor updates"
* tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
scripts/kernel-doc: handle DECLARE_HASHTABLE
Documentation: atomic_ops.txt is core-api/atomic_ops.rst
Docs: clean up some DocBook loose ends
Make the main documentation title less Geocities
Docs: Use kernel-figure in vidioc-g-selection.rst
Docs: fix table problems in ras.rst
Docs: Fix breakage with Sphinx 1.5 and upper
Docs: Include the Latex "ifthen" package
doc/kokr/howto: Only send regression fixes after -rc1
docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
doc: Document suitability of IBM Verse for kernel development
Doc: fix a markup error in coding-style.rst
docs: driver-api: i2c: remove some outdated information
Documentation: DMA API: fix a typo in a function name
Docs: Insert missing space to separate link from text
doc/ko_KR/memory-barriers: Update control-dependencies example
Documentation, kbuild: fix typo "minimun" -> "minimum"
docs: Fix some formatting issues in request-key.rst
doc: ReSTify keys-trusted-encrypted.txt
doc: ReSTify keys-request-key.txt
...
In the IPVLAN documentation there is an example command line where the
master and slave interface names are inverted.
Fix the command line and also add the optional `name' keyword to better
describe what the command is doing.
v2: added commit message
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It dates back from 2.1.16 and is obsolete since 2.1.68 when the current
rule system has been introduced.
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add documentation for the tcp ULP tls interface.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 61b905da33 ("net: Rename skb->rxhash to skb->hash")
didn't update the documentation, fix this up.
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide a control message that can be specified on the first sendmsg() of a
client call or the first sendmsg() of a service response to indicate the
total length of the data to be transmitted for that call.
Currently, because the length of the payload of an encrypted DATA packet is
encrypted in front of the data, the packet cannot be encrypted until we
know how much data it will hold.
By specifying the length at the beginning of the transmit phase, each DATA
packet length can be set before we start loading data from userspace (where
several sendmsg() calls may contribute to a particular packet).
An error will be returned if too little or too much data is presented in
the Tx phase.
Signed-off-by: David Howells <dhowells@redhat.com>
It's unused, so remove it.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update tcp.txt to fix mandatory congestion control ops and default
CCA selection. Also, fix comment in tcp.h for undo_cwnd.
Signed-off-by: Anmol Sarma <me@anmolsarma.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make it possible for a client to use AuriStor's service upgrade facility.
The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
the first sendmsg() of a call. This takes no parameters.
When recvmsg() starts returning data from the call, the service ID field in
the returned msg_name will reflect the result of the upgrade attempt. If
the upgrade was ignored, srx_service will match what was set in the
sendmsg(); if the upgrade happened the srx_service will be altered to
indicate the service the server upgraded to.
Note that:
(1) The choice of upgrade service is up to the server
(2) Further client calls to the same server that would share a connection
are blocked if an upgrade probe is in progress.
(3) This should only be used to probe the service. Clients should then
use the returned service ID in all subsequent communications with that
server (and not set the upgrade). Note that the kernel will not
retain this information should the connection expire from its cache.
(4) If a server that supports upgrading is replaced by one that doesn't,
whilst a connection is live, and if the replacement is running, say,
OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
server will not respond to packets sent to the upgraded connection.
At this point, calls will time out and the server must be reprobed.
Signed-off-by: David Howells <dhowells@redhat.com>
Implement AuriStor's service upgrade facility. There are three problems
that this is meant to deal with:
(1) Various of the standard AFS RPC calls have IPv4 addresses in their
requests and/or replies - but there's no room for including IPv6
addresses.
(2) Definition of IPv6-specific RPC operations in the standard operation
sets has not yet been achieved.
(3) One could envision the creation a new service on the same port that as
the original service. The new service could implement improved
operations - and the client could try this first, falling back to the
original service if it's not there.
Unfortunately, certain servers ignore packets addressed to a service
they don't implement and don't respond in any way - not even with an
ABORT. This means that the client must then wait for the call timeout
to occur.
What service upgrade does is to see if the connection is marked as being
'upgradeable' and if so, change the service ID in the server and thus the
request and reply formats. Note that the upgrade isn't mandatory - a
server that supports only the original call set will ignore the upgrade
request.
In the protocol, the procedure is then as follows:
(1) To request an upgrade, the first DATA packet in a new connection must
have the userStatus set to 1 (this is normally 0). The userStatus
value is normally ignored by the server.
(2) If the server doesn't support upgrading, the reply packets will
contain the same service ID as for the first request packet.
(3) If the server does support upgrading, all future reply packets on that
connection will contain the new service ID and the new service ID will
be applied to *all* further calls on that connection as well.
(4) The RPC op used to probe the upgrade must take the same request data
as the shadow call in the upgrade set (but may return a different
reply). GetCapability RPC ops were added to all standard sets for
just this purpose. Ops where the request formats differ cannot be
used for probing.
(5) The client must wait for completion of the probe before sending any
further RPC ops to the same destination. It should then use the
service ID that recvmsg() reported back in all future calls.
(6) The shadow service must have call definitions for all the operation
IDs defined by the original service.
To support service upgrading, a server should:
(1) Call bind() twice on its AF_RXRPC socket before calling listen().
Each bind() should supply a different service ID, but the transport
addresses must be the same. This allows the server to receive
requests with either service ID.
(2) Enable automatic upgrading by calling setsockopt(), specifying
RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
unsigned shorts as the argument:
unsigned short optval[2];
This specifies a pair of service IDs. They must be different and must
match the service IDs bound to the socket. Member 0 is the service ID
to upgrade from and member 1 is the service ID to upgrade to.
Signed-off-by: David Howells <dhowells@redhat.com>
Permit bind() to be called on an AF_RXRPC socket more than once (currently
maximum twice) to bind multiple listening services to it. There are some
restrictions:
(1) All bind() calls involved must have a non-zero service ID.
(2) The service IDs must all be different.
(3) The rest of the address (notably the transport part) must be the same
in all (a single UDP socket is shared).
(4) This must be done before listen() or sendmsg() is called.
This allows someone to connect to the service socket with different service
IDs and lays the foundation for service upgrading.
The service ID used by an incoming call can be extracted from the msg_name
returned by recvmsg().
Signed-off-by: David Howells <dhowells@redhat.com>
The addition of the AVF and virtchnl code to the i40evf driver
means we should update the i40evf.txt file with the most up to date
information.
It seems this file hasn't been updated in a while, so the
changes cover a little more than just AVF, but it's all only
in the i40evf.txt.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
be looped to the socket's error queue with a software timestamp even
when a hardware transmit timestamp is expected to be provided by the
driver.
Applications using this option will receive two separate messages from
the error queue, one with a software timestamp and the other with a
hardware timestamp. As the hardware timestamp is saved to the shared skb
info, which may happen before the first message with software timestamp
is received by the application, the hardware timestamp is copied to the
SCM_TIMESTAMPING control message only when the skb has no software
timestamp or it is an incoming packet.
While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
there are no other users.
CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The scm_timestamping struct may return multiple non-zero fields, e.g.
when both software and hardware RX timestamping is enabled, or when the
SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false
software timestamp is generated in the recvmsg() call in order to always
return a SCM_TIMESTAMP(NS) message.
CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
for incoming packets with hardware timestamps. It contains the index of
the real interface which received the packet and the length of the
packet at layer 2.
The index is useful with bonding, bridges and other interfaces, where
IP_PKTINFO doesn't allow applications to determine which PHC made the
timestamp. With the L2 length (and link speed) it is possible to
transpose preamble timestamps to trailer timestamps, which are used in
the NTP protocol.
While this information could be provided by two new socket options
independently from timestamping, it doesn't look like they would be very
useful. With this option any performance impact is limited to hardware
timestamping.
Use dev_get_by_napi_id() to get the device and its index. On kernels
with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero
index will be returned in the control message.
CC: Richard Cochran <richardcochran@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_csum_hwoffload_help() uses netdev features and skb->csum_not_inet to
determine if skb needs software computation of Internet Checksum or crc32c
(or nothing, if this computation can be done by the hardware). Use it in
place of skb_checksum_help() in validate_xmit_skb() to avoid corruption
of non-GSO SCTP packets having skb->ip_summed equal to CHECKSUM_PARTIAL.
While at it, remove references to skb_csum_off_chk* functions, since they
are not present anymore in Linux _ see commit cf53b1da73 ("Revert
"net: Add driver helper functions to determine checksum offloadability"").
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mauro says:
This patch series convert the remaining DocBooks to ReST.
The first version was originally
send as 3 patch series:
[PATCH 00/36] Convert DocBook documents to ReST
[PATCH 0/5] Convert more books to ReST
[PATCH 00/13] Get rid of DocBook
The lsm book was added as if it were a text file under
Documentation. The plan is to merge it with another file
under Documentation/security, after both this series and
a security Documentation patch series gets merged.
It also adjusts some Sphinx-pedantic errors/warnings on
some kernel-doc markups.
I also added some patches here to add PDF output for all
existing ReST books.
Adjusts for ReST markup and moves under keys security devel index.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Use pandoc to convert documentation to ReST by calling
Documentation/sphinx/tmplcvt script.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Use pandoc to convert documentation to ReST by calling
Documentation/sphinx/tmplcvt script.
Some manual adjustments were required due to some
conversion error on some literals.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Use pandoc to convert documentation to ReST by calling
Documentation/sphinx/tmplcvt script.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Pull networking updates from David Millar:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfitler network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
guide for user-space API documents, rather sparsely populated at the
moment, but it's a start. Markus improved the infrastructure for
converting diagrams. Mauro has converted much of the USB documentation
over to RST. Plus the usual set of fixes, improvements, and tweaks.
There's a bit more than the usual amount of reaching out of Documentation/
to fix comments elsewhere in the tree; I have acks for those where I could
get them.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZB1elAAoJEI3ONVYwIuV6wUIQAJSM/4rNdj6z+GXeWhRfbsOo
vqqVYluvXQIJaaqdsy9dgcfThhOXWYsPyVF6Xd+bDJpwF3BMZYbX1CI1Mo3kRD+7
9+Pf68cYSHRoU3l/sFI8q0zfKbHtmFteIvnRQoFtRaExqgTR8glUfxNDyN9XuNAZ
3naS4qMZivM4gjMcSpIB/wFOQpV+6qVIs6VTFLdCC8wodT3W/Wmb+bqrCVJ0twbB
t8jJeYHt2wsiTdqrKU+VilAUAZ1Lby+DNfeWrO18rC1ohktPyUzOGg8JmTKUBpVO
qj1OJwD6abuaNh/J9bXsh8u0OrVrBKWjVrhq9IFYDlm92fu3Bgr6YeoaVPEpcklt
jdlgZnWs9/oXa6d32aMc9F7mP9a0Q1qikFTYINhaHQZCb4VDRuQ9hCSuqWm5jlVy
lmVAoxLa0zSdOoXaYuO3HC99ku1cIn814CXMDz/IwKXkqUCV+zl+H3AGkvxGyQ5M
eblw2TnQnc6e1LRcxt5bgpFR1JYMbCJhu0U5XrNFueQV8ReB15dvL7h4y21dWJKF
2Sr83rwfG1rpZQiSqCjOXxIzuXbEGH3+a+zCDV5IHhQRt/VNDOt2hgmcyucSSJ5h
5GRFYgTlGvoT/6LdIT39QooHB+4tSDRtEQ6lh0q2ZtVV2rfG/I6/PR5sUbWM65SN
vAfctRm2afHLhdonSX5O
=41m+
-----END PGP SIGNATURE-----
Merge tag 'docs-4.12' of git://git.lwn.net/linux
Pull documentation update from Jonathan Corbet:
"A reasonably busy cycle for documentation this time around. There is a
new guide for user-space API documents, rather sparsely populated at
the moment, but it's a start. Markus improved the infrastructure for
converting diagrams. Mauro has converted much of the USB documentation
over to RST. Plus the usual set of fixes, improvements, and tweaks.
There's a bit more than the usual amount of reaching out of
Documentation/ to fix comments elsewhere in the tree; I have acks for
those where I could get them"
* tag 'docs-4.12' of git://git.lwn.net/linux: (74 commits)
docs: Fix a couple typos
docs: Fix a spelling error in vfio-mediated-device.txt
docs: Fix a spelling error in ioctl-number.txt
MAINTAINERS: update file entry for HSI subsystem
Documentation: allow installing man pages to a user defined directory
Doc/PM: Sync with intel_powerclamp code behavior
zr364xx.rst: usb/devices is now at /sys/kernel/debug/
usb.rst: move documentation from proc_usb_info.txt to USB ReST book
convert philips.txt to ReST and add to media docs
docs-rst: usb: update old usbfs-related documentation
arm: Documentation: update a path name
docs: process/4.Coding.rst: Fix a couple of document refs
docs-rst: fix usb cross-references
usb: gadget.h: be consistent at kernel doc macros
usb: composite.h: fix two warnings when building docs
usb: get rid of some ReST doc build errors
usb.rst: get rid of some Sphinx errors
usb/URB.txt: convert to ReST and update it
usb/persist.txt: convert to ReST and add to driver-api book
usb/hotplug.txt: convert to ReST and add to driver-api book
...
Figure 1 is full of whitespaces; fix it
Signed-off-by: Liam Beguin <lbeguin@tycoint.com>
Signed-off-by: Sylvain Lemieux <slemieux@tycoint.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Middlebox firewall issues can potentially cause server's data being
blackholed after a successful 3WHS using TFO. Following are the related
reports from Apple:
https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
Slide 31 identifies an issue where the client ACK to the server's data
sent during a TFO'd handshake is dropped.
C ---> syn-data ---> S
C <--- syn/ack ----- S
C (accept & write)
C <---- data ------- S
C ----- ACK -> X S
[retry and timeout]
https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
Slide 5 shows a similar situation that the server's data gets dropped
after 3WHS.
C ---- syn-data ---> S
C <--- syn/ack ----- S
C ---- ack --------> S
S (accept & write)
C? X <- data ------ S
[retry and timeout]
This is the worst failure b/c the client can not detect such behavior to
mitigate the situation (such as disabling TFO). Failing to proceed, the
application (e.g., SSL library) may simply timeout and retry with TFO
again, and the process repeats indefinitely.
The proposed solution is to disable active TFO globally under the
following circumstances:
1. client side TFO socket detects out of order FIN
2. client side TFO socket receives out of order RST
We disable active side TFO globally for 1hr at first. Then if it
happens again, we disable it for 2h, then 4h, 8h, ...
And we reset the timeout to 1hr if a client side TFO sockets not opened
on loopback has successfully received data segs from server.
And we examine this condition during close().
The rational behind it is that when such firewall issue happens,
application running on the client should eventually close the socket as
it is not able to get the data it is expecting. Or application running
on the server should close the socket as it is not able to receive any
response from client.
In both cases, out of order FIN or RST will get received on the client
given that the firewall will not block them as no data are in those
frames.
And we want to disable active TFO globally as it helps if the middle box
is very close to the client and most of the connections are likely to
fail.
Also, add a debug sysctl:
tcp_fastopen_blackhole_detect_timeout_sec:
the initial timeout to use when firewall blackhole issue happens.
This can be set and read.
When setting it to 0, it means to disable the active disable logic.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
update the list and remove 'in the future' statement,
since all still alive 64-bit architectures now do eBPF JIT.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no usbfs anymore. The old features are now either
exported to /dev/bus/usb or via debugfs.
Update documentation accordingly, pointing to the new
places where the character devices and usb/devices are
now placed.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
As ftp.kernel.org is closed [0], this commit fixes dead URLs in
documents to use www.kernel.org instead.
[0] https://www.kernel.org/shutting-down-ftp-services.html
Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-03-23
This series contains updates to i40e and i40e.txt documentation.
Jake provides all the changes in the series which are centered around
ntuple filter fixes and additional support. Fixed the current
implementation of .set_rxnfc, where we were not reading the mask field
for filter entries which was resulting in filters not behaving as
expected and not working correctly. When cleaning up after disabling
flow director support, ensure that the default input set is correctly
reprogrammed. Since the hardware only supports a single input set for
all flows of that type, the driver shall only allow the input set to
change if there are no other configured filters for that flow type, so
add support to detect when we can update the input set for each flow
type. Align the driver to other drivers to partition the ring_cookie
value into 8bits of VF index, along with 32bits of queue number instead
of using the user-def field. Added support to parse the user-def field
into a data structure format to allow future extensions of the user-def
filed by keeping all the code that read/writes the field into a single
location. Added support for flexible payloads passed via ethtool
user-def field. We support a single flexible word (2byte) value per
protocol type, and we handle the FLX_PIT register using a list of
flexible entries so that each flow type may be configured separately.
Enabled flow director filters for SCTPv4 packets using the ethtool
ntuple interface to enable filters. Updated the documentation on the
i40e driver to include the newly added support to ntuple filters.
Reduced complexity of a if-continue-else-break section of code by
taking advantage of using hlist_for_each_entry_continue() instead.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Certain system process significant unconnected UDP workload.
It would be preferrable to disable UDP early demux for those systems
and enable it for TCP only.
By disabling UDP demux, we see these slight gains on an ARM64 system-
782 -> 788Mbps unconnected single stream UDPv4
633 -> 654Mbps unconnected UDPv4 different sources
The performance impact can change based on CPU architecure and cache
sizes. There will not much difference seen if entire UDP hash table
is in cache.
Both sysctls are enabled by default to preserve existing behavior.
v1->v2: Change function pointer instead of adding conditional as
suggested by Stephen.
v2->v3: Read once in callers to avoid issues due to compiler
optimizations. Also update commit message with the tests.
v3->v4: Store and use read once result instead of querying pointer
again incorrectly.
v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n}
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Suggested-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Tom Herbert <tom@herbertland.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add documentation describing the drivers use of ethtool ntuple filters,
including the limitations that it has due to hardware, as well as how it
reads and parses the user-def data block.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This commit adds a new sysctl accept_ra_rt_info_min_plen that
defines the minimum acceptable prefix length of Route Information
Options. The new sysctl is intended to be used together with
accept_ra_rt_info_max_plen to configure a range of acceptable
prefix lengths. It is useful to prevent misconfigurations from
unintentionally blackholing too much of the IPv6 address space
(e.g., home routers announcing RIOs for fc00::/7, which is
incorrect).
Signed-off-by: Joel Scherpelz <jscherpelz@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
0 - layer 3 (default)
1 - layer 4
If there's an skb hash already set and it matches the chosen policy then it
will be used instead of being calculated (currently only for L4).
In L3 mode we always calculate the hash due to the ICMP error special
case, the flow dissector's field consistentification should handle the
address order thus we can remove the address reversals.
If the skb is provided we always use it for the hash calculation,
otherwise we fallback to fl4, that is if skb is NULL fl4 has to be set.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your
net-next tree. A couple of new features for nf_tables, and unsorted
cleanups and incremental updates for the Netfilter tree. More
specifically, they are:
1) Allow to check for TCP option presence via nft_exthdr, patch
from Phil Sutter.
2) Add symmetric hash support to nft_hash, from Laura Garcia Liebana.
3) Use pr_cont() in ebt_log, from Joe Perches.
4) Remove some dead code in arp_tables reported via static analysis
tool, from Colin Ian King.
5) Consolidate nf_tables expression validation, from Liping Zhang.
6) Consolidate set lookup via nft_set_lookup().
7) Remove unnecessary rcu read lock side in bridge netfilter, from
Florian Westphal.
8) Remove unused variable in nf_reject_ipv4, from Tahee Yoo.
9) Pass nft_ctx struct to object initialization indirections, from
Florian Westphal.
10) Add code to integrate conntrack helper into nf_tables, also from
Florian.
11) Allow to check if interface index or name exists via
NFTA_FIB_F_PRESENT, from Phil Sutter.
12) Simplify resolve_normal_ct(), from Florian.
13) Use per-limit spinlock in nft_limit and xt_limit, from Liping Zhang.
14) Use rwlock in nft_set_rbtree set, also from Liping Zhang.
15) One patch to remove a useless printk at netns init path in ipvs,
and several patches to document IPVS knobs.
16) Use refcount_t for reference counter in the Netfilter/IPVS code,
from Elena Reshetova.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The tcp_tw_recycle was already broken for connections
behind NAT, since the per-destination timestamp is not
monotonically increasing for multiple machines behind
a single destination address.
After the randomization of TCP timestamp offsets
in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
for each connection), the tcp_tw_recycle is broken for all
types of connections for the same reason: the timestamps
received from a single machine is not monotonically increasing,
anymore.
Remove tcp_tw_recycle, since it is not functional. Also, remove
the PAWSPassive SNMP counter since it is only used for
tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
since the strict argument is only set when tcp_tw_recycle is
enabled.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Lutz Vieweg <lvml@5t9.de>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document sysctl pmtu_disc based on commit 3654e61137 ("ipvs: add
pmtu_disc option to disable IP DF for TUN packets").
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Document sysctl sync_ports based on commit f73181c828 ("ipvs: add support
for sync threads").
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Document sysctl sync_qlen_max and sync_sock_size based on
commit 1c003b1580 ("ipvs: wakeup master thread").
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Fix sync_threshold description which should have two values. Also add
sync_refresh_period and sync_retries based on commit 749c42b620
("ipvs: reduce sync rate with time thresholds").
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Conflicts:
drivers/net/ethernet/broadcom/genet/bcmgenet.c
net/core/sock.c
Conflicts were overlapping changes in bcmgenet and the
lockdep handling of sockets.
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow TTL propagation from IP packets to MPLS packets to be
configured. Add a new optional LWT attribute, MPLS_IPTUNNEL_TTL, which
allows the TTL to be set in the resulting MPLS packet, with the value
of 0 having the semantics of enabling propagation of the TTL from the
IP header (i.e. non-zero values disable propagation).
Also allow the configuration to be overridden globally by reusing the
same sysctl to control whether the TTL is propagated from IP packets
into the MPLS header. If the per-LWT attribute is set then it
overrides the global configuration. If the TTL isn't propagated then a
default TTL value is used which can be configured via a new sysctl,
"net.mpls.default_ttl". This is kept separate from the configuration
of whether IP TTL propagation is enabled as it can be used in the
future when non-IP payloads are supported (i.e. where there is no
payload TTL that can be propagated).
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide the ability to control on a per-route basis whether the TTL
value from an MPLS packet is propagated to an IPv4/IPv6 packet when
the last label is popped as per the theoretical model in RFC 3443
through a new route attribute, RTA_TTL_PROPAGATE which can be 0 to
mean disable propagation and 1 to mean enable propagation.
In order to provide the ability to change the behaviour for packets
arriving with IPv4/IPv6 Explicit Null labels and to provide an easy
way for a user to change the behaviour for all existing routes without
having to reprogram them, a global knob is provided. This is done
through the addition of a new per-namespace sysctl,
"net.mpls.ip_ttl_propagate", which defaults to enabled. If the
per-route attribute is set (either enabled or disabled) then it
overrides the global configuration.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It wasn't clear if the 'forwarding' setting needs to be enabled on the
interface that packets are received from, or on the interface that
packets are forwarded to, or both.
In fact (according to my code reading) the setting is relevant on the
interface that packets are received from, so this change updates the doc
to say that.
Signed-off-by: Neil Jerram <neil@tigera.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix typos and add the following to the scripts/spelling.txt:
an user||a user
an userspace||a userspace
I also added "userspace" to the list since it is a common word in Linux.
I found some instances for "an userfaultfd", but I did not add it to the
list. I felt it is endless to find words that start with "user" such as
"userland" etc., so must draw a line somewhere.
Link: http://lkml.kernel.org/r/1481573103-11329-4-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Three more DocBook template files have been converted to RST; only 21 to
go. There are various build improvements and the usual array of
documentation improvements and fixes.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYriFXAAoJEI3ONVYwIuV6iTMP/iV7ownq9IK1f8askcXKM76i
NoRdj4/JywAPQ73vLhOSDVELGdVJNRBjdyOdBRzxPgsqAhFmm79lVYV2eLIffQ2k
7LcVbEQR77I+4z9SwqIVbIWNCBry7Hu8aWh7moDL3I6yeuay408yr5YW2lIlsqHZ
V/LZgkTWDe+iQPeXNA4Djzylx0lcRlAy4yMSLjN1+gb9/uBnXb9J0eGJzgfZfrL8
fiIhymg3bv8vB99l6LMR5vT343QLWXf1yS31A7rPQvwkDo6zFehUJA0XNfIsl2dw
VQYsvl9vp9wy3e6Y0qKXPn1XhAhCrm64P3crBxK31MMvcKZVCfeRSZ78wrvpvewy
MVLlXdqop1bHPHowtRfA5jwxr1NqcYp+Jg0+YGX3iXpPi1Jfk36DNUy9iWvtvIzr
lWgQcIKsdCwwYUcvPR8Kt8T/3q/AHbYlI6mimWlkmbZwncQcgCrH5xSG+c2BIPfV
fn3W6eLHBn8RyVsxlaXlA0Y9TNtI/Cm85b3Ri10pFvhl868ppWfJxXHi7UtcbU58
sQzahISCTXOH/NQwkkh7kFMtczbB43rAcChvF7EUYpazVBpJ4P4HxKFg3eIzIdc6
VlBSaMu1hxUGoYxNNYuKr/nYstuczLOKzK7q4j/JOExY3RgTWP+T3bF02wgubvoa
D/9WfScewkgCJRoA7i17
=C5nd
-----END PGP SIGNATURE-----
Merge tag 'docs-4.11' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A slightly quieter cycle for documentation this time around.
Three more DocBook template files have been converted to RST; only 21
to go. There are various build improvements and the usual array of
documentation improvements and fixes"
* tag 'docs-4.11' of git://git.lwn.net/linux: (44 commits)
docs / driver-api: Fix structure references in device_link.rst
PM / docs: Fix structure references in device.rst
Add a target to check broken external links in the Documentation
Documentation: Fix linux-api list typo
Documentation: DocBook/Makefile comment typo
Improve sparse documentation
Documentation: make Makefile.sphinx no-ops quieter
Documentation: DMA-ISA-LPC.txt
Documentation: input: fix path to input code definitions
docs: Remove the copyright year from conf.py
docs: Fix a warning in the Korean HOWTO.rst translation
PM / sleep / docs: Convert PM notifiers document to reST
PM / core / docs: Convert sleep states API document to reST
PM / core: Update kerneldoc comments in pm.h
doc-rst: Fix recursive make invocation from macros
doc-rst: Delete output of failed dot-SVG conversion
doc-rst: Break shell command sequences on failure
Documentation/sphinx: make targets independent of Sphinx work for HAVE_SPHINX=0
doc-rst: fixed cleandoc target when used with O=dir
Documentation/sphinx: prevent generation of .pyc files in the source tree
...
In order to clarify what the module actually does, and how to use it,
let's add some basic documentation to the kernel tree, together with
pointers to related specs and projects.
Signed-off-by: Harald Welte <laforge@gnumonks.org>
Acked-by: Andreas Schultz <aschultz@tpip.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next
tree, they are:
1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
sk_buff so we only access one single cacheline in the conntrack
hotpath. Patchset from Florian Westphal.
2) Don't leak pointer to internal structures when exporting x_tables
ruleset back to userspace, from Willem DeBruijn. This includes new
helper functions to copy data to userspace such as xt_data_to_user()
as well as conversions of our ip_tables, ip6_tables and arp_tables
clients to use it. Not surprinsingly, ebtables requires an ad-hoc
update. There is also a new field in x_tables extensions to indicate
the amount of bytes that we copy to userspace.
3) Add nf_log_all_netns sysctl: This new knob allows you to enable
logging via nf_log infrastructure for all existing netnamespaces.
Given the effort to provide pernet syslog has been discontinued,
let's provide a way to restore logging using netfilter kernel logging
facilities in trusted environments. Patch from Michal Kubecek.
4) Validate SCTP checksum from conntrack helper, from Davide Caratti.
5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly
a copy&paste from the original helper, from Florian Westphal.
6) Reset netfilter state when duplicating packets, also from Florian.
7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
nft_meta, from Liping Zhang.
8) Add missing code to deal with loopback packets from nft_meta when
used by the netdev family, also from Liping.
9) Several cleanups on nf_tables, one to remove unnecessary check from
the netlink control plane path to add table, set and stateful objects
and code consolidation when unregister chain hooks, from Gao Feng.
10) Fix harmless reference counter underflow in IPVS that, however,
results in problems with the introduction of the new refcount_t
type, from David Windsor.
11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
from Davide Caratti.
12) Missing documentation on nf_tables uapi header, from Liping Zhang.
13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 69b34fb996 ("netfilter: xt_LOG: add net namespace support for
xt_LOG") disabled logging packets using the LOG target from non-init
namespaces. The motivation was to prevent containers from flooding
kernel log of the host. The plan was to keep it that way until syslog
namespace implementation allows containers to log in a safe way.
However, the work on syslog namespace seems to have hit a dead end
somewhere in 2013 and there are users who want to use xt_LOG in all
network namespaces. This patch allows to do so by setting
/proc/sys/net/netfilter/nf_log_all_netns
to a nonzero value. This sysctl is only accessible from init_net so that
one cannot switch the behaviour from inside a container.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Packets arriving in a VRF currently are delivered to UDP sockets that
aren't bound to any interface. TCP defaults to not delivering packets
arriving in a VRF to unbound sockets. IP route lookup and socket
transmit both assume that unbound means using the default table and
UDP applications that haven't been changed to be aware of VRFs may not
function correctly in this case since they may not be able to handle
overlapping IP address ranges, or be able to send packets back to the
original sender if required.
So add a sysctl, udp_l3mdev_accept, to control this behaviour with it
being analgous to the existing tcp_l3mdev_accept, namely to allow a
process to have a VRF-global listen socket. Have this default to off
as this is the behaviour that users will expect, given that there is
no explicit mechanism to set unmodified VRF-unaware application into a
default VRF.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix some double words found in Documentation.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Previous patches have moved the temperature sensor code into the
Marvell PHYs. A few now dead references to NET_DSA_HWMON were left
behind. Go reap them.
Reported-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
that denotes the first unprivileged inet port in the namespace. To
disable all privileged ports set this to zero. It also checks for
overlap with the local port range. The privileged and local range may
not overlap.
The use case for this change is to allow containerized processes to bind
to priviliged ports, but prevent them from ever being allowed to modify
their container's network configuration. The latter is accomplished by
ensuring that the network namespace is not a child of the user
namespace. This modification was needed to allow the container manager
to disable a namespace's priviliged port restrictions without exposing
control of the network namespace to processes in the user namespace.
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* socket owner support for connections, so when the wifi
manager (e.g. wpa_supplicant) is killed, connections are
torn down - wpa_supplicant is critical to managing certain
operations, and can opt in to this where applicable
* minstrel & minstrel_ht updates to be more efficient (time and space)
* set wifi_acked/wifi_acked_valid for skb->destructor use in the
kernel, which was already available to userspace
* don't indicate new mesh peers that might be used if there's no
room to add them
* multicast-to-unicast support in mac80211, for better medium usage
(since unicast frames can use *much* higher rates, by ~3 orders of
magnitude)
* add API to read channel (frequency) limitations from DT
* add infrastructure to allow randomizing public action frames for
MAC address privacy (still requires driver support)
* many cleanups and small improvements/fixes across the board
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJYeKu7AAoJEGt7eEactAAdwjEP/RA4bXFMfkC7qUJ++cLrMMwY
yCvjb8+ULWL2wbCzpfY37acbGJgot3DNoQJzrO2jMQPqyM9nRlTMg5aF49cI7t62
gU6daNKJaGBe/0yeG7lTJ4n5UtVCDtN45hGc06Yert+ewb9njiJf+XYrtCWetsIJ
5bOLYQKPWOz/7UyMH7uJ25zrPFaiA3y7XnXKPEudagG/EwEq9ZuUpSSfLwEAEBPi
6i/2w4fLj32vXRsQMvQT0sU6mjd+1ub8Is7w5l2F06iWwNYPzdSM0IbU+E+ie2tk
sE6RA70c4ILrp8KisTAz2lJPa4XEpFkLhI3lzRRy8CVzjyyo/OJen92zvr2R7TVb
/uZG9qfRQ3UitQmgeKd+wS8PsbRAyWUR/xhNxD2r7zARH2vliwyneU+zEpXLeGA1
Y4PrN1+Fk45Ye4/4XSbPO4cf1MHX7qinN4rjrpsJKPwoYD/gQ1cZvef4AbaKPvq6
oCKRVrwNoUuSB8NTcMLPqze3WCfhnJyVUhCZTyzHeW4uG81qrHwrvBvM25vcWGcm
CcSWFktFIpuGML4FCU3byZfb0NkmJtpCD4n7P98WFPGjvsWIEVCMckqlC8x1F7B7
BqqjGS2mGA17Xy0uLfmN/JempesQJnZhnAnFERdyX1S1YQuKhLwEu7OsYegnStDL
Cn1wFw2/qcgeTkJfBICB
=UToW
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2017-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
For 4.11, we seem to have more than in the past few releases:
* socket owner support for connections, so when the wifi
manager (e.g. wpa_supplicant) is killed, connections are
torn down - wpa_supplicant is critical to managing certain
operations, and can opt in to this where applicable
* minstrel & minstrel_ht updates to be more efficient (time and space)
* set wifi_acked/wifi_acked_valid for skb->destructor use in the
kernel, which was already available to userspace
* don't indicate new mesh peers that might be used if there's no
room to add them
* multicast-to-unicast support in mac80211, for better medium usage
(since unicast frames can use *much* higher rates, by ~3 orders of
magnitude)
* add API to read channel (frequency) limitations from DT
* add infrastructure to allow randomizing public action frames for
MAC address privacy (still requires driver support)
* many cleanups and small improvements/fixes across the board
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thin stream DUPACK is to start fast recovery on only one DUPACK
provided the connection is a thin stream (i.e., low inflight). But
this older feature is now subsumed with RACK. If a connection
receives only a single DUPACK, RACK would arm a reordering timer
and soon starts fast recovery instead of timeout if no further
ACKs are received.
The socket option (THIN_DUPACK) is kept as a nop for compatibility.
Note that this patch does not change another thin-stream feature
which enables linear RTO. Although it might be good to generalize
that in the future (i.e., linear RTO for the first say 3 retries).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the support of RFC5827 early retransmit (i.e.,
fast recovery on small inflight with <3 dupacks) because it is
subsumed by the new RACK loss detection. More specifically when
RACK receives DUPACKs, it'll arm a reordering timer to start fast
recovery after a quarter of (min)RTT, hence it covers the early
retransmit except RACK does not limit itself to specific inflight
or dupack numbers.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
does not currently have TX_RING support. As a result an application
that wants the best perf for Tx and Rx (e.g. to handle request/response
transacations) ends up needing 2 sockets, one with *_v2 for Tx and
another with *_v3 for Rx.
This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
so that an application can use a single descriptor to get the benefits
of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
first filling up multiple frames in the Tx ring and then triggering a
transmit. This patch only support fixed size Tx frames for TPACKET_V3,
and requires that tp_next_offset must be zero.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's just an example, but lets make it look more real to don't confuse
people about possible REG_RULE usage. Channels are 20 MHz wide, so start
and end frequencies are 10 MHz away from the center one.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It's maintained by Seth Forshe for a long time now.
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Cc: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
It's another busy cycle for the docs tree, as the sphinx conversion
continues. Highlights include:
- Further work on PDF output, which remains a bit of a pain but should be
more solid now.
- Five more DocBook template files converted to Sphinx. Only 27 to go...
Lots of plain-text files have also been converted and integrated.
- Images in binary formats have been replaced with more source-friendly
versions.
- Various bits of organizational work, including the renaming of various
files discussed at the kernel summit.
- New documentation for the device_link mechanism.
...and, of course, lots of typo fixes and small updates.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYTbl7AAoJEI3ONVYwIuV63NIP/REwzThnGWFJMRSuq8Ieq2r9
sFSQsaGTGlhyKiDoEooo+SO/Za3uTonjK+e7WZg8mhdiEdamta5aociU/71C1Yy/
T9ur0FhcGblrvZ1NidSDvCLwuECZOMMei7mgLZ9a+KCpc4ANqqTVZSUm1blKcqhF
XelhVXxBa0ar35l/pVzyCxkdNXRWXv+MJZE8hp5XAdTdr11DS7UY9zrZdH31axtf
BZlbYJrvB8WPydU6myTjRpirA17Hu7uU64MsL3bNIEiRQ+nVghEzQC8uxeUCvfVx
r0H5AgGGQeir+e8GEv2T20SPZ+dumXs+y/HehKNb3jS3gV0mo+pKPeUhwLIxr+Zh
QY64gf+jYf5ISHwAJRnU0Ima72ehObzSbx9Dko10nhq2OvbR5f83gjz9t9jKYFU7
RDowICA8lwqyRbHRoVfyoW8CpVhWFpMFu3yNeJMckeTish3m7ANqzaWslbsqIP5G
zxgFMIrVVSbeae+sUeygtEJAnWI09aZ4tuaUXYtGWwu6ikC/3aV6DryP4bthG2LF
A19uV4nMrLuuh8g2wiTHHjMfjYRwvSn+f9yaolwJhwyNDXQzRPy+ZJ3W/6olOkXC
bAxTmVRCW5GA/fmSrfXmW1KbnxlWfP2C62hzZQ09UHxzTHdR97oFLDQdZhKo1uwf
pmSJR0hVeRUmA4uw6+Su
=A0EV
-----END PGP SIGNATURE-----
Merge tag 'docs-4.10' of git://git.lwn.net/linux
Pull documentation update from Jonathan Corbet:
"These are the documentation changes for 4.10.
It's another busy cycle for the docs tree, as the sphinx conversion
continues. Highlights include:
- Further work on PDF output, which remains a bit of a pain but
should be more solid now.
- Five more DocBook template files converted to Sphinx. Only 27 to
go... Lots of plain-text files have also been converted and
integrated.
- Images in binary formats have been replaced with more
source-friendly versions.
- Various bits of organizational work, including the renaming of
various files discussed at the kernel summit.
- New documentation for the device_link mechanism.
... and, of course, lots of typo fixes and small updates"
* tag 'docs-4.10' of git://git.lwn.net/linux: (193 commits)
dma-buf: Extract dma-buf.rst
Update Documentation/00-INDEX
docs: 00-INDEX: document directories/files with no docs
docs: 00-INDEX: remove non-existing entries
docs: 00-INDEX: add missing entries for documentation files/dirs
docs: 00-INDEX: consolidate process/ and admin-guide/ description
scripts: add a script to check if Documentation/00-INDEX is sane
Docs: change sh -> awk in REPORTING-BUGS
Documentation/core-api/device_link: Add initial documentation
core-api: remove an unexpected unident
ppc/idle: Add documentation for powersave=off
Doc: Correct typo, "Introdution" => "Introduction"
Documentation/atomic_ops.txt: convert to ReST markup
Documentation/local_ops.txt: convert to ReST markup
Documentation/assoc_array.txt: convert to ReST markup
docs-rst: parse-headers.pl: cleanup the documentation
docs-rst: fix media cleandocs target
docs-rst: media/Makefile: reorganize the rules
docs-rst: media: build SVG from graphviz files
docs-rst: replace bayer.png by a SVG image
...
PPPOL2TP_MSG_* and L2TP_MSG_* are duplicates, and are being used
interchangeably in the kernel, so let's standardize on L2TP_MSG_*
internally, and keep PPPOL2TP_MSG_* defined in UAPI for compatibility.
Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>
>From : Woojung Huh <woojung.huh@microchip.com>
Add functions to unregister phy fixup for modules.
int phy_unregister_fixup(const char *bus_id, u32 phy_uid, u32 phy_uid_mask)
Unregister phy fixup from phy_fixup_list per bus_id, phy_uid &
phy_uid_mask
int phy_unregister_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask)
Unregister phy fixup from phy_fixup_list.
Use it for fixup registered by phy_register_fixup_for_uid()
int phy_unregister_fixup_for_id(const char *bus_id)
Unregister phy fixup from phy_fixup_list.
Use it for fixup registered by phy_register_fixup_for_id()
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver currently always sets the PBLx8/PBLx4 bit, which means that
the pbl values configured via the pbl/txpbl/rxpbl DT properties are
always multiplied by 8/4 in the hardware.
In order to allow the DT to configure lower pbl values, while at the
same time not changing behavior of any existing device trees using the
pbl/txpbl/rxpbl settings, add a property to disable the multiplication
of the pbl by 8/4 in the hardware.
Suggested-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GMAC and newer supports independent programmable burst lengths for
DMA tx/rx. Add new optional devicetree properties representing this.
To be backwards compatible, snps,pbl will still be valid, but
snps,txpbl/snps,rxpbl will override the value in snps,pbl if set.
If the IP is synthesized to use the AXI interface, there is a register
and a matching DT property inside the optional stmmac-axi-config DT node
for controlling burst lengths, named snps,blen.
However, using this register, it is not possible to control tx and rx
independently. Also, this register is not available if the IP was
synthesized with, e.g., the AHB interface.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains a large Netfilter update for net-next,
to summarise:
1) Add support for stateful objects. This series provides a nf_tables
native alternative to the extended accounting infrastructure for
nf_tables. Two initial stateful objects are supported: counters and
quotas. Objects are identified by a user-defined name, you can fetch
and reset them anytime. You can also use a maps to allow fast lookups
using any arbitrary key combination. More info at:
http://marc.info/?l=netfilter-devel&m=148029128323837&w=2
2) On-demand registration of nf_conntrack and defrag hooks per netns.
Register nf_conntrack hooks if we have a stateful ruleset, ie.
state-based filtering or NAT. The new nf_conntrack_default_on sysctl
enables this from newly created netnamespaces. Default behaviour is not
modified. Patches from Florian Westphal.
3) Allocate 4k chunks and then use these for x_tables counter allocation
requests, this improves ruleset load time and also datapath ruleset
evaluation, patches from Florian Westphal.
4) Add support for ebpf to the existing x_tables bpf extension.
From Willem de Bruijn.
5) Update layer 4 checksum if any of the pseudoheader fields is updated.
This provides a limited form of 1:1 stateless NAT that make sense in
specific scenario, eg. load balancing.
6) Add support to flush sets in nf_tables. This series comes with a new
set->ops->deactivate_one() indirection given that we have to walk
over the list of set elements, then deactivate them one by one.
The existing set->ops->deactivate() performs an element lookup that
we don't need.
7) Two patches to avoid cloning packets, thus speed up packet forwarding
via nft_fwd from ingress. From Florian Westphal.
8) Two IPVS patches via Simon Horman: Decrement ttl in all modes to
prevent infinite loops, patch from Dwip Banerjee. And one minor
refactoring from Gao feng.
9) Revisit recent log support for nf_tables netdev families: One patch
to ensure that we correctly handle non-ethernet packets. Another
patch to add missing logger definition for netdev. Patches from
Liping Zhang.
10) Three patches for nft_fib, one to address insufficient register
initialization and another to solve incorrect (although harmless)
byteswap operation. Moreover update xt_rpfilter and nft_fib to match
lbcast packets with zeronet as source, eg. DHCP Discover packets
(0.0.0.0 -> 255.255.255.255). Also from Liping Zhang.
11) Built-in DCCP, SCTP and UDPlite conntrack and NAT support, from
Davide Caratti. While DCCP is rather hopeless lately, and UDPlite has
been broken in many-cast mode for some little time, let's give them a
chance by placing them at the same level as other existing protocols.
Thus, users don't explicitly have to modprobe support for this and
NAT rules work for them. Some people point to the lack of support in
SOHO Linux-based routers that make deployment of new protocols harder.
I guess other middleboxes outthere on the Internet are also to blame.
Anyway, let's see if this has any impact in the midrun.
12) Skip software SCTP software checksum calculation if the NIC comes
with SCTP checksum offload support. From Davide Caratti.
13) Initial core factoring to prepare conversion to hook array. Three
patches from Aaron Conole.
14) Gao Feng made a wrong conversion to switch in the xt_multiport
extension in a patch coming in the previous batch. Fix it in this
batch.
15) Get vmalloc call in sync with kmalloc flags to avoid a warning
and likely OOM killer intervention from x_tables. From Marcelo
Ricardo Leitner.
16) Update Arturo Borrero's email address in all source code headers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2016-12-03
Here's a set of Bluetooth & 802.15.4 patches for net-next (i.e. 4.10
kernel):
- Fix for a potential NULL deref in the ieee802154 netlink code
- Fix for the ED values of the at86rf2xx driver
- Documentation updates to ieee802154
- Cleanups to u8 vs __u8 usage
- Timer API usage cleanups in HCI drivers
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This switch (default on) can be used to disable automatic registration
of connection tracking functionality in newly created network
namespaces.
This means that when net namespace goes down (or the tracker protocol
module is unloaded) we *might* have to unregister the hooks.
We can either add another per-netns variable that tells if
the hooks got registered by default, or, alternatively, just call
the protocol _put() function and have the callee deal with a possible
'extra' put() operation that doesn't pair with a get() one.
This uses the latter approach, i.e. a put() without a get has no effect.
Conntrack is still enabled automatically regardless of the new sysctl
setting if the new net namespace requires connection tracking, e.g. when
NAT rules are created.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Implemented RFC7527 Enhanced DAD.
IPv6 duplicate address detection can fail if there is some temporary
loopback of Ethernet frames. RFC7527 solves this by including a random
nonce in the NS messages used for DAD, and if an NS is received with the
same nonce it is assumed to be a looped back DAD probe and is ignored.
RFC7527 is enabled by default. Can be disabled by setting both of
conf/{all,interface}/enhanced_dad to zero.
Signed-off-by: Erik Nordmark <nordmark@arista.com>
Signed-off-by: Bob Gilligan <gilligan@arista.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix english in documentation, make documentation match reality, remove
options that were removed from code.
Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Couple conflicts resolved here:
1) In the MACB driver, a bug fix to properly initialize the
RX tail pointer properly overlapped with some changes
to support variable sized rings.
2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
overlapping with a reorganization of the driver to support
ACPI, OF, as well as PCI variants of the chip.
3) In 'net' we had several probe error path bug fixes to the
stmmac driver, meanwhile a lot of this code was cleaned up
and reorganized in 'net-next'.
4) The cls_flower classifier obtained a helper function in
'net-next' called __fl_delete() and this overlapped with
Daniel Borkamann's bug fix to use RCU for object destruction
in 'net'. It also overlapped with Jiri's change to guard
the rhashtable_remove_fast() call with a check against
tc_skip_sw().
5) In mlx4, a revert bug fix in 'net' overlapped with some
unrelated changes in 'net-next'.
6) In geneve, a stale header pointer after pskb_expand_head()
bug fix in 'net' overlapped with a large reorganization of
the same code in 'net-next'. Since the 'net-next' code no
longer had the bug in question, there was nothing to do
other than to simply take the 'net-next' hunks.
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric says: "By looking at tcpdump, and TS val of xmit packets of multiple
flows, we can deduct the relative qdisc delays (think of fq pacing).
This should work even if we have one flow per remote peer."
Having random per flow (or host) offsets doesn't allow that anymore so add
a way to turn this off.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allow further breaking down the various sender
limitation. For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing small receive window. It is not possible to
tell before this patch without packet traces.
To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.
Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This updates some out of date documentation and fixes some wrong assumptions as
well as pure grammar fixes. This file needs to move towards the new kernel doc
system and getting an overhaul during this work.
Signed-off-by: Stefan Schmidt <stefan@osg.samsung.com>
Add links to the IEEE 802.3-2008 document, and the RGMII v1.3 and v2.0
revisions of the standard.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RGMII is a recurring source of pain for people with Gigabit Ethernet
hardware since it may require PHY driver and MAC driver level
configuration hints. Document what are the expectations from PHYLIB and
what options exist.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Describe that the Ethernet MAC controller is ultimately responsible for
dealing with proper pause frames/flow control advertisement and
enabling, and that it is therefore allowed to have it change
phydev->supported/advertising with SUPPORTED_Pause and
SUPPORTED_AsymPause.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the function pointers documentation which duplicates information
found in include/linux/phy.h. Maintaining documentation about two
different locations just does not work, but the code is less likely to
be outdated.
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJYHmoCAAoJEHm+PkMAQRiG7RMIAI2i7Y5hpL5yCxK5AFaL4u/G
KxXfp1B1UanUTgjOmd7zGqtDYcFX9t7GTTUFixQ7/9Opr4PD9qbnatoDGSc3xjbT
msDgA1B78F1/Q3kHWfeGq32MihQ4mj5NwUCo+igUcUvvWG7mHgzErj/Nh5RoobQX
p/izdpTbrw3GX6xXB8olbG7XWHaVye/+TT3q6+gmgm8I/QEujcLeGoycE0zlhPN8
FG/JX76At/+ZM2Py7Oxo3k+oKL9CHrtOQYDp/wN0uslV5eYvvkZz0/M1HMOGZt+c
gZU5jzM17K7C4Nzo06WAuBU9wUBGc25m+cPicLlOmljnzfU+f50SKaDjZq3p7QI=
=2KUF
-----END PGP SIGNATURE-----
Merge tag 'v4.9-rc4' into sound
Bring in -rc4 patches so I can successfully merge the sound doc changes.
This patch adds documentation for some SR-related per-interface
sysctls.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is some difference between force_igmp_version and force_mld_version.
Add document to make users aware of this.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Qualcomm QCA tagging introduced in cafdc45c9 to the
list of supported protocols.
Signed-off-by: Fabian Mewes <architekt@coding4coffee.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mostly simple overlapping changes.
For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Lots of fixes, mostly drivers as is usually the case.
1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
Khoroshilov.
2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
Pedersen.
3) Don't put aead_req crypto struct on the stack in mac80211, from
Ard Biesheuvel.
4) Several uninitialized variable warning fixes from Arnd Bergmann.
5) Fix memory leak in cxgb4, from Colin Ian King.
6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.
7) Several VRF semantic fixes from David Ahern.
8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.
9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.
10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.
11) Fix stale link state during failover in NCSCI driver, from Gavin
Shan.
12) Fix netdev lower adjacency list traversal, from Ido Schimmel.
13) Propvide proper handle when emitting notifications of filter
deletes, from Jamal Hadi Salim.
14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.
15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.
16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.
17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.
18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
Leitner.
19) Revert a netns locking change that causes regressions, from Paul
Moore.
20) Add recursion limit to GRO handling, from Sabrina Dubroca.
21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.
22) Avoid accessing stale vxlan/geneve socket in data path, from
Pravin Shelar"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
geneve: avoid using stale geneve socket.
vxlan: avoid using stale vxlan socket.
qede: Fix out-of-bound fastpath memory access
net: phy: dp83848: add dp83822 PHY support
enic: fix rq disable
tipc: fix broadcast link synchronization problem
ibmvnic: Fix missing brackets in init_sub_crq_irqs
ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
net/mlx4_en: Save slave ethtool stats command
net/mlx4_en: Fix potential deadlock in port statistics flow
net/mlx4: Fix firmware command timeout during interrupt test
net/mlx4_core: Do not access comm channel if it has not yet been initialized
net/mlx4_en: Fix panic during reboot
net/mlx4_en: Process all completions in RX rings after port goes up
net/mlx4_en: Resolve dividing by zero in 32-bit system
net/mlx4_core: Change the default value of enable_qos
net/mlx4_core: Avoid setting ports to auto when only one port type is supported
net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
...
* client FILS authentication support in mac80211 (Jouni)
* AP/VLAN multicast improvements (Michael Braun)
* config/advertising support for differing beacon intervals on
multiple virtual interfaces (Purushottam Kushwaha, myself)
* deprecate the old WDS mode for cfg80211-based drivers, the
mode is hardly usable since it doesn't support any "modern"
features like WPA encryption (2003), HT (2009) or VHT (2014),
I'm not even sure WEP (introduced in 1997) could be done.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJYEy/9AAoJEGt7eEactAAd/V0P/0FmHGS8HjlSjm+1p6sbWKbt
5v8bb3cuKHQiYiUM6euIXql2OYuOEHVAQEpNoPXN9CsfKFYgbIH6yW6d8HtKNedV
n9lmMy/U6yJX9nYt7yMIQ3kLkbEg+YU58B9Hf47waWXLLSNVumS8rfNBn43EoNQf
VKWYPWpetsCRIWJ1fnLuxvMHCtOOYCxH+491BUonof32+DKPEAsAbnszZ2ElufTR
7KNyA3K6leOtTd5Ml52dvLOGNc+h2C83VAMxiShq/6r8OnlX5tPifaubzd9n3m41
jiJJH/92ESrtF2AaWEm8slcgtcfHS/O7y/FSoV4r0PMSvPTBdjwQ9nqCsbONd831
vjj6c6YWNxgHPcISX0XcWz+FHnLJdUGaDUtHjAJYw4oH4gaRXwfSw0U+jvdlSMUf
2CBUArk5f0OEguzwa/5X4Jio3OPPIj4jY/lKplcpLOUu8K2FWTLDuIlww/FHXovs
rDzTLQeXZkx+MkszkTJN42qSEfOFly91J6OA2Wju+emBqrLIbkAGmvyLVg8U8BQd
gG7oltgmZ6Xg6fEnUQqpIDO7UJlQ+GXAU04SpNDMv1j/ueUJskxlr3hYM7E9ueQv
LJcZcVV0RAwNRw52cEsdcYCMLuSMYRrO1OHlkl0wd+x2hFrUCWnVzUgLEUhNBV+c
ICmNMr96nKhrZI217yzF
=SjfQ
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2016-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Among various cleanups and improvements, we have the following:
* client FILS authentication support in mac80211 (Jouni)
* AP/VLAN multicast improvements (Michael Braun)
* config/advertising support for differing beacon intervals on
multiple virtual interfaces (Purushottam Kushwaha, myself)
* deprecate the old WDS mode for cfg80211-based drivers, the
mode is hardly usable since it doesn't support any "modern"
features like WPA encryption (2003), HT (2009) or VHT (2014),
I'm not even sure WEP (introduced in 1997) could be done.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch renamed several files that are cross-referenced
along the Kernel documentation. Adjust the links to point to
the right places.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix compilation warning in xt_hashlimit on m68k 32-bits, from
Geert Uytterhoeven.
2) Fix wrong timeout in set elements added from packet path via
nft_dynset, from Anders K. Pedersen.
3) Remove obsolete nf_conntrack_events_retry_timeout sysctl
documentation, from Nicolas Dichtel.
4) Ensure proper initialization of log flags via xt_LOG, from
Liping Zhang.
5) Missing alias to autoload ipcomp, also from Liping Zhang.
6) Missing NFTA_HASH_OFFSET attribute validation, again from Liping.
7) Wrong integer type in the new nft_parse_u32_check() function,
from Dan Carpenter.
8) Another wrong integer type declaration in nft_exthdr_init, also
from Dan Carpenter.
9) Fix insufficient mode validation in nft_range.
10) Fix compilation warning in nft_range due to possible uninitialized
value, from Arnd Bergmann.
11) Zero nf_hook_ops allocated via xt_hook_alloc() in x_tables to
calm down kmemcheck, from Florian Westphal.
12) Schedule gc_worker() to run again if GC_MAX_EVICTS quota is reached,
from Nicolas Dichtel.
13) Fix nf_queue() after conversion to single-linked hook list, related
to incorrect bypass flag handling and incorrect hook point of
reinjection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This entry has been removed in commit 9500507c61.
Fixes: 9500507c61 ("netfilter: conntrack: remove timer from ecache extension")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
For mac80211_hwsim interfaces, suggest to use wpa_supplicant with the more
modern, netlink based driver instead of wext.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This update consists of:
- Fixes and improvements to existing tests
- Moving code from Documentation to selftests, samples, and tools.
Moves dnotify_test, prctl, ptp, vDSO, ia64, watchdog, and networking
tests from Documentation to selftests.
Moves mic/mpssd, misc-devices/mei, timers, watchdog, auxdisplay, and
blackfin examples from Documentation to samples.
Moves accounting, laptops/dslm, and pcmcia/crc32hash tools from
Documentation to tools.
Deletes BUILD_DOCSRC and its dependencies.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX/6zUAAoJEAsCRMQNDUMczIEP/0kH+yjJ3El4GYIokspR1/UU
++sy4XMzrD1UPy90v+ftcg4ss5R80r0v7EZ59k1UjDJSZ6WATHHGoZKCS2Dy3xcq
i+0vm7Bawh7YWrXD3TunwaL97lwb2DdVTSxRXuU4Hfv+oVynUfh/+ZlCH6RCM2nm
ZJE5PDYiq4nTVSRqFB2FyRE6yay5dPvpQ2ArwnSEw+ku4C+ZdGTGCWzS+aZBwZM/
ykePkGLVRXz9FsWTCmipJzYu0Z/M4xEGlfXQZiiLG2HicbJNP6AqJImbQrANm+TW
RFigYpofdhr9XG5TKTLIudaRt9qB6BE0mYEApZXH8U7NrHElfO9BBMEwzajl0V/2
q/r5iej/CJult3zsfkhdHo7GLXpOaDLyoXiUI6UTgL0XOdWLAWTqDYx4JJz9sXxp
B9dwKJeP5HLipk6FMkAHgJM90JKQFd/nLDKxeWexbMu/b/yQ2C9AR7NpdQ+c1X7I
8W8UNEi/fnK75+r4t3NfeD2/5boq/jwujSKEMDQm/3R8L8EFYYb/TRoujFn89Na3
wbZLV3hBL+KQ5lRyIx7X8RKyVJv1nlo9Wh57ItJed6zvGp5EmsI8w+DER2RfbO2c
HR2JPDKSxmU8O2WBfDW5QoiPQH8Lssd147Ir0UFE7mwBXgWWsmxJxDpufizAXwyJ
qnELJ9X3UFIdydtoObLr
=60kH
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-4.9-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
"This update consists of:
- Fixes and improvements to existing tests
- Moving code from Documentation to selftests, samples, and tools:
* Moves dnotify_test, prctl, ptp, vDSO, ia64, watchdog, and
networking tests from Documentation to selftests.
* Moves mic/mpssd, misc-devices/mei, timers, watchdog, auxdisplay,
and blackfin examples from Documentation to samples.
* Moves accounting, laptops/dslm, and pcmcia/crc32hash tools from
Documentation to tools.
* Deletes BUILD_DOCSRC and its dependencies"
* tag 'linux-kselftest-4.9-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (21 commits)
selftests/futex: Check ANSI terminal color support
Doc: update 00-INDEX files to reflect the runnable code move
samples: move blackfin gptimers-example from Documentation
tools: move pcmcia crc32hash tool from Documentation
tools: move laptops dslm tool from Documentation
tools: move accounting tool from Documentation
samples: move auxdisplay example code from Documentation
samples: move watchdog example code from Documentation
samples: move timers example code from Documentation
samples: move misc-devices/mei example code from Documentation
samples: move mic/mpssd example code from Documentation
selftests: Move networking/timestamping from Documentation
selftests: move watchdog tests from Documentation/watchdog
selftests: move ia64 tests from Documentation/ia64
selftests: move vDSO tests from Documentation/vDSO
selftests: move ptp tests from Documentation/ptp
selftests: move prctl tests from Documentation/prctl
selftests: move dnotify_test from Documentation/filesystems
selftests/timers: Add missing error code assignment before test
selftests/zram: replace ZRAM_LZ4_COMPRESS
...
Update 00-INDEX files with the current file list to reflect the runnable
code move.
Acked-by: Michal Marek <mmarek@suse.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
This is to reflect the change of FIB offload infrastructure from
switchdev objects to FIB notifier.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove networking from Documentation Makefile to move the test to
selftests. Update networking/timestamping Makefile to work under
selftests. These tests will not be run as part of selftests suite
and will not be included in install targets. They can be built and
run separately for now.
This is part of the effort to move runnable code from Documentation.
Acked-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
In a typical IPvlan L3 setup where master is in default-ns and
each slave is into different (slave) ns. In this setup egress
packet processing for traffic originating from slave-ns will
hit all NF_HOOKs in slave-ns as well as default-ns. However same
is not true for ingress processing. All these NF_HOOKs are
hit only in the slave-ns skipping them in the default-ns.
IPvlan in L3 mode is restrictive and if admins want to deploy
iptables rules in default-ns, this asymmetric data path makes it
impossible to do so.
This patch makes use of the l3_rcv() (added as part of l3mdev
enhancements) to perform input route lookup on RX packets without
changing the skb->dev and then uses nf_hook at NF_INET_LOCAL_IN
to change the skb->dev just before handing over skb to L4.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't expose skbs to in-kernel users, such as the AFS filesystem, but
instead provide a notification hook the indicates that a call needs
attention and another that indicates that there's a new call to be
collected.
This makes the following possibilities more achievable:
(1) Call refcounting can be made simpler if skbs don't hold refs to calls.
(2) skbs referring to non-data events will be able to be freed much sooner
rather than being queued for AFS to pick up as rxrpc_kernel_recv_data
will be able to consult the call state.
(3) We can shortcut the receive phase when a call is remotely aborted
because we don't have to go through all the packets to get to the one
cancelling the operation.
(4) It makes it easier to do encryption/decryption directly between AFS's
buffers and sk_buffs.
(5) Encryption/decryption can more easily be done in the AFS's thread
contexts - usually that of the userspace process that issued a syscall
- rather than in one of rxrpc's background threads on a workqueue.
(6) AFS will be able to wait synchronously on a call inside AF_RXRPC.
To make this work, the following interface function has been added:
int rxrpc_kernel_recv_data(
struct socket *sock, struct rxrpc_call *call,
void *buffer, size_t bufsize, size_t *_offset,
bool want_more, u32 *_abort_code);
This is the recvmsg equivalent. It allows the caller to find out about the
state of a specific call and to transfer received data into a buffer
piecemeal.
afs_extract_data() and rxrpc_kernel_recv_data() now do all the extraction
logic between them. They don't wait synchronously yet because the socket
lock needs to be dealt with.
Five interface functions have been removed:
rxrpc_kernel_is_data_last()
rxrpc_kernel_get_abort_code()
rxrpc_kernel_get_error_number()
rxrpc_kernel_free_skb()
rxrpc_kernel_data_consumed()
As a temporary hack, sk_buffs going to an in-kernel call are queued on the
rxrpc_call struct (->knlrecv_queue) rather than being handed over to the
in-kernel user. To process the queue internally, a temporary function,
temp_deliver_data() has been added. This will be replaced with common code
between the rxrpc_recvmsg() path and the kernel_rxrpc_recv_data() path in a
future patch.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add SWITCHDEV_OBJ_ID_PORT_MDB support to the DSA layer.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass struct socket * to more rxrpc kernel interface functions. They should
be starting from this rather than the socket pointer in the rxrpc_call
struct if they need to access the socket.
I have left:
rxrpc_kernel_is_data_last()
rxrpc_kernel_get_abort_code()
rxrpc_kernel_get_error_number()
rxrpc_kernel_free_skb()
rxrpc_kernel_data_consumed()
unmodified as they're all about to be removed (and, in any case, don't
touch the socket).
Signed-off-by: David Howells <dhowells@redhat.com>
Provide a function so that kernel users, such as AFS, can ask for the peer
address of a call:
void rxrpc_kernel_get_peer(struct rxrpc_call *call,
struct sockaddr_rxrpc *_srx);
In the future the kernel service won't get sk_buffs to look inside.
Further, this allows us to hide any canonicalisation inside AF_RXRPC for
when IPv6 support is added.
Also propagate this through to afs_find_server() and issue a warning if we
can't handle the address family yet.
Signed-off-by: David Howells <dhowells@redhat.com>
Since commit 83c0afaec7 ("net: dsa: Add new binding implementation"),
the shortcomings of the dsa platform device have been addressed, remove
that TODO item.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
port netdevs so that packets being flooded by the device won't be
flooded twice.
It works by assigning a unique identifier (the ifindex of the first
bridge port) to bridge ports sharing the same parent ID. This prevents
packets from being flooded twice by the same switch, but will flood
packets through bridge ports belonging to a different switch.
This method is problematic when stacked devices are taken into account,
such as VLANs. In such cases, a physical port netdev can have upper
devices being members in two different bridges, thus requiring two
different 'offload_fwd_mark's to be configured on the port netdev, which
is impossible.
The main problem is that packet and netdev marking is performed at the
physical netdev level, whereas flooding occurs between bridge ports,
which are not necessarily port netdevs.
Instead, packet and netdev marking should really be done in the bridge
driver with the switch driver only telling it which packets it already
forwarded. The bridge driver will mark such packets using the mark
assigned to the ingress bridge port and will prevent the packet from
being forwarded through any bridge port sharing the same mark (i.e.
having the same parent ID).
Remove the current switchdev 'offload_fwd_mark' implementation and
instead implement the proposed method. In addition, make rocker - the
sole user of the mark - use the proposed method.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the dsa_switch_driver structure contains only function pointers
as it is supposed to, rename it to the more appropriate dsa_switch_ops,
uniformly to any other operations structure in the kernel.
No functional changes here, basically just the result of something like:
s/dsa_switch_driver *drv/dsa_switch_ops *ops/g
However keep the {un,}register_switch_driver functions and their
dsa_switch_drivers list as is, since they represent the -- likely to be
deprecated soon -- legacy DSA registration framework.
In the meantime, also fix the following checks from checkpatch.pl to
make it happy with this patch:
CHECK: Comparison to NULL could be written "!ops"
#403: FILE: net/dsa/dsa.c:470:
+ if (ops == NULL) {
CHECK: Comparison to NULL could be written "ds->ops->get_strings"
#773: FILE: net/dsa/slave.c:697:
+ if (ds->ops->get_strings != NULL)
CHECK: Comparison to NULL could be written "ds->ops->get_ethtool_stats"
#824: FILE: net/dsa/slave.c:785:
+ if (ds->ops->get_ethtool_stats != NULL)
CHECK: Comparison to NULL could be written "ds->ops->get_sset_count"
#835: FILE: net/dsa/slave.c:798:
+ if (ds->ops->get_sset_count != NULL)
total: 0 errors, 0 warnings, 4 checks, 784 lines checked
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TFO_SERVER_WO_SOCKOPT2 was intended for debugging purposes during
Fast Open development. Remove this config option and also
update/clean-up the documentation of the Fast Open sysctl.
Reported-by: Piotr Jurkiewicz <piotr.jerzy.jurkiewicz@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
chronological order):
- bump version strings, by Simon Wunderlich
- kerneldoc clean up, by Sven Eckelmann
- enable RTNL automatic loading and according documentation
changes, by Sven Eckelmann (2 patches)
- fix/improve interface removal and associated locking, by
Sven Eckelmann (3 patches)
- clean up unused variables, by Linus Luessing
- implement Gateway selection code for B.A.T.M.A.N. V by
Antonio Quartulli (4 patches)
- rewrite TQ comparison by Markus Pargmann
- fix Cocinelle warnings on bool vs integers (by Fenguang Wu/Intels
kbuild test robot) and bitwise arithmetic operations (by Linus
Luessing)
- rewrite packet creation for forwarding for readability and to avoid
reference count mistakes, by Linus Luessing
- use kmem_cache for translation table, which results in more efficient
storing of translation table entries, by Sven Eckelmann
- rewrite/clarify reference handling for send_skb_unicast, by Sven
Eckelmann
- fix debug messages when updating routes, by Sven Eckelmann
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJXrY1iAAoJEKEr45hCkp6hqjsP/1GI9Mm9iHGE5s4jY9+JORkn
yR57i0l8IENLtQ2jrxu48VtyBKI5gQoitftRpAMZw5iUjgWXVTzA8/ik1Hy7VHnG
NkDRAwHG/XH0peoubQGGPNbX2pZzBDnjR3wC9/8rOk/q6VqVqcLtgyHKbJFS5hd7
dW3okXqKCZhJcTFnu95i7PZ9zTB7BrHEqcu9aDuA6VHdf4HF9ndCizP9bdnRCOVr
wR1CkCrSjt2pMqRPLDAFcHzq/Lr+4LsNwodO0zqK5yetysJNaFJ7j5nTle2REk4L
V2Wvbmyzxa8MznRphisYM+UJ12BxVjwmuilVoxgeu/FmfCpopA1L7lbWf+xxtAcP
VLegLq2BG3fqkG6Pvk8emabC6oDxZaHsFV6uC4HylzLy09mCKWuap0qtgvNFjloM
ntdYail5BFFsTq9j7KK7k4cfYikeLfmd3/j7F/ok+PJXGpAHKqOsfRABV61rxELH
era8GrQmllh1UA/KX7j6rS4DK+AjaXmh+nk6+KDrd6IKo4+hZ1dg3UHB+ytrnx+6
p0BoLgUnjBvNT44LsjtUZlt+3ILUspJWfb86kBgTFuZm8rJqulrJu6qDbmBZEayb
PPrubxjYSKxR0nMlOVTBsGmNjugQIGn0ku89HKD210YZlpfnYENxwsxtYucWK/Tm
AvwINRUXfumyJIZ385BQ
=6XUi
-----END PGP SIGNATURE-----
Merge tag 'batadv-next-for-davem-20160812' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature patchset includes the following changes (mostly
chronological order):
- bump version strings, by Simon Wunderlich
- kerneldoc clean up, by Sven Eckelmann
- enable RTNL automatic loading and according documentation
changes, by Sven Eckelmann (2 patches)
- fix/improve interface removal and associated locking, by
Sven Eckelmann (3 patches)
- clean up unused variables, by Linus Luessing
- implement Gateway selection code for B.A.T.M.A.N. V by
Antonio Quartulli (4 patches)
- rewrite TQ comparison by Markus Pargmann
- fix Cocinelle warnings on bool vs integers (by Fenguang Wu/Intels
kbuild test robot) and bitwise arithmetic operations (by Linus
Luessing)
- rewrite packet creation for forwarding for readability and to avoid
reference count mistakes, by Linus Luessing
- use kmem_cache for translation table, which results in more efficient
storing of translation table entries, by Sven Eckelmann
- rewrite/clarify reference handling for send_skb_unicast, by Sven
Eckelmann
- fix debug messages when updating routes, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a driver for the ENA family of networking devices.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The standard kernel API to add new virtual interfaces and attach other
interfaces to it is rtnl-link. batman-adv supports it since v3.10. This
functionality should be used instead of the legacy batman-adv-only sysfs
interface.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Inside the kafs filesystem it is possible to occasionally have a call
processed and terminated before we've had a chance to check whether we need
to clean up the rx queue for that call because afs_send_simple_reply() ends
the call when it is done, but this is done in a workqueue item that might
happen to run to completion before afs_deliver_to_call() completes.
Further, it is possible for rxrpc_kernel_send_data() to be called to send a
reply before the last request-phase data skb is released. The rxrpc skb
destructor is where the ACK processing is done and the call state is
advanced upon release of the last skb. ACK generation is also deferred to
a work item because it's possible that the skb destructor is not called in
a context where kernel_sendmsg() can be invoked.
To this end, the following changes are made:
(1) kernel_rxrpc_data_consumed() is added. This should be called whenever
an skb is emptied so as to crank the ACK and call states. This does
not release the skb, however. kernel_rxrpc_free_skb() must now be
called to achieve that. These together replace
rxrpc_kernel_data_delivered().
(2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().
This makes afs_deliver_to_call() easier to work as the skb can simply
be discarded unconditionally here without trying to work out what the
return value of the ->deliver() function means.
The ->deliver() functions can, via afs_data_complete(),
afs_transfer_reply() and afs_extract_data() mark that an skb has been
consumed (thereby cranking the state) without the need to
conditionally free the skb to make sure the state is correct on an
incoming call for when the call processor tries to send the reply.
(3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
has finished with a packet and MSG_PEEK isn't set.
(4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().
Because of this, we no longer need to clear the destructor and put the
call before we free the skb in cases where we don't want the ACK/call
state to be cranked.
(5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
the delivery function already), and the caller is now responsible for
producing an abort if that was the last packet.
(6) There are many bits of unmarshalling code where:
ret = afs_extract_data(call, skb, last, ...);
switch (ret) {
case 0: break;
case -EAGAIN: return 0;
default: return ret;
}
is to be found. As -EAGAIN can now be passed back to the caller, we
now just return if ret < 0:
ret = afs_extract_data(call, skb, last, ...);
if (ret < 0)
return ret;
(7) Checks for trailing data and empty final data packets has been
consolidated as afs_data_complete(). So:
if (skb->len > 0)
return -EBADMSG;
if (!last)
return 0;
becomes:
ret = afs_data_complete(call, skb, last);
if (ret < 0)
return ret;
(8) afs_transfer_reply() now checks the amount of data it has against the
amount of data desired and the amount of data in the skb and returns
an error to induce an abort if we don't get exactly what we want.
Without these changes, the following oops can occasionally be observed,
particularly if some printks are inserted into the delivery path:
general protection fault: 0000 [#1] SMP
Modules linked in: kafs(E) af_rxrpc(E) [last unloaded: af_rxrpc]
CPU: 0 PID: 1305 Comm: kworker/u8:3 Tainted: G E 4.7.0-fsdevel+ #1303
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Workqueue: kafsd afs_async_workfn [kafs]
task: ffff88040be041c0 ti: ffff88040c070000 task.ti: ffff88040c070000
RIP: 0010:[<ffffffff8108fd3c>] [<ffffffff8108fd3c>] __lock_acquire+0xcf/0x15a1
RSP: 0018:ffff88040c073bc0 EFLAGS: 00010002
RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000000 RCX: ffff88040d29a710
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040d29a710
RBP: ffff88040c073c70 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff88040be041c0 R15: ffffffff814c928f
FS: 0000000000000000(0000) GS:ffff88041fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa4595f4750 CR3: 0000000001c14000 CR4: 00000000001406f0
Stack:
0000000000000006 000000000be04930 0000000000000000 ffff880400000000
ffff880400000000 ffffffff8108f847 ffff88040be041c0 ffffffff81050446
ffff8803fc08a920 ffff8803fc08a958 ffff88040be041c0 ffff88040c073c38
Call Trace:
[<ffffffff8108f847>] ? mark_held_locks+0x5e/0x74
[<ffffffff81050446>] ? __local_bh_enable_ip+0x9b/0xa1
[<ffffffff8108f9ca>] ? trace_hardirqs_on_caller+0x16d/0x189
[<ffffffff810915f4>] lock_acquire+0x122/0x1b6
[<ffffffff810915f4>] ? lock_acquire+0x122/0x1b6
[<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
[<ffffffff81609dbf>] _raw_spin_lock_irqsave+0x35/0x49
[<ffffffff814c928f>] ? skb_dequeue+0x18/0x61
[<ffffffff814c928f>] skb_dequeue+0x18/0x61
[<ffffffffa009aa92>] afs_deliver_to_call+0x344/0x39d [kafs]
[<ffffffffa009ab37>] afs_process_async_call+0x4c/0xd5 [kafs]
[<ffffffffa0099e9c>] afs_async_workfn+0xe/0x10 [kafs]
[<ffffffff81063a3a>] process_one_work+0x29d/0x57c
[<ffffffff81064ac2>] worker_thread+0x24a/0x385
[<ffffffff81064878>] ? rescuer_thread+0x2d0/0x2d0
[<ffffffff810696f5>] kthread+0xf3/0xfb
[<ffffffff8160a6ff>] ret_from_fork+0x1f/0x40
[<ffffffff81069602>] ? kthread_create_on_node+0x1cf/0x1cf
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document the design of mprds, covering a brief description
of the motivation, data-structures and modifications to the
RDS control plane.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the documentation to describe the changes added by
commit 8ba38460f3 ("net/rds Add getsockopt support for SO_RDS_TRANSPORT")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Comments from Frank Kellerman on last doc update:
- extra whitespace in front of a neigh show command
- convert the brief link example to 'vrf red'
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update vrf documentation for changes made to 4.4 - 4.8 kernels
and iproute2 support for vrf keyword.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next,
they are:
1) Don't use userspace datatypes in bridge netfilter code, from
Tobin Harding.
2) Iterate only once over the expectation table when removing the
helper module, instead of once per-netns, from Florian Westphal.
3) Extra sanitization in xt_hook_ops_alloc() to return error in case
we ever pass zero hooks, xt_hook_ops_alloc():
4) Handle NFPROTO_INET from the logging core infrastructure, from
Liping Zhang.
5) Autoload loggers when TRACE target is used from rules, this doesn't
change the behaviour in case the user already selected nfnetlink_log
as preferred way to print tracing logs, also from Liping Zhang.
6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields
by cache lines, increases the size of entries in 11% per entry.
From Florian Westphal.
7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian.
8) Remove useless defensive check in nf_logger_find_get() from Shivani
Bhardwaj.
9) Remove zone extension as place it in the conntrack object, this is
always include in the hashing and we expect more intensive use of
zones since containers are in place. Also from Florian Westphal.
10) Owner match now works from any namespace, from Eric Bierdeman.
11) Make sure we only reply with TCP reset to TCP traffic from
nf_reject_ipv4, patch from Liping Zhang.
12) Introduce --nflog-size to indicate amount of network packet bytes
that are copied to userspace via log message, from Vishwanath Pai.
This obsoletes --nflog-range that has never worked, it was designed
to achieve this but it has never worked.
13) Introduce generic macros for nf_tables object generation masks.
14) Use generation mask in table, chain and set objects in nf_tables.
This allows fixes interferences with ongoing preparation phase of
the commit protocol and object listings going on at the same time.
This update is introduced in three patches, one per object.
15) Check if the object is active in the next generation for element
deactivation in the rbtree implementation, given that deactivation
happens from the commit phase path we have to observe the future
status of the object.
16) Support for deletion of just added elements in the hash set type.
17) Allow to resize hashtable from /proc entry, not only from the
obscure /sys entry that maps to the module parameter, from Florian
Westphal.
18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised
anymore since we tear down the ruleset whenever the netdevice
goes away.
19) Support for matching inverted set lookups, from Arturo Borrero.
20) Simplify the iptables_mangle_hook() by removing a superfluous
extra branch.
21) Introduce ether_addr_equal_masked() and use it from the netfilter
codebase, from Joe Perches.
22) Remove references to "Use netfilter MARK value as routing key"
from the Netfilter Kconfig description given that this toggle
doesn't exists already for 10 years, from Moritz Sichert.
23) Introduce generic NF_INVF() and use it from the xtables codebase,
from Joe Perches.
24) Setting logger to NONE via /proc was not working unless explicit
nul-termination was included in the string. This fixes seems to
leave the former behaviour there, so we don't break backward.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The 3.xx and 4.xx synopsys gmacs have a very similar
PCS embedded module and they share almost the same registers:
for example:
AN_Control, AN_Status, AN_Advertisement, AN_Link_Partner_Ability,
AN_Expansion, TBI_Extended_Status.
Just the RGMII/SMII Control/Status register differs.
So This patch aims to reorganize and enhance the PCS support.
It removes the existent support from the dwmac1000/dwmac4_core.c
moving basic PCS functions inside a new file called: stmmac_pcs.h.
The patch also reviews the available APIs to be better shared among
different hardware and easily enhanced to support new features.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to restrict this to module parameter.
We export a copy of the real hash size -- when user alters the value we
allocate the new table, copy entries etc before we update the real size
to the requested one.
This is also needed because the real size is used by concurrent readers
and cannot be changed without synchronizing the conntrack generation
seqcnt.
We only allow changing this value from the initial net namespace.
Tested using http-client-benchmark vs. httpterm with concurrent
while true;do
echo $RANDOM > /proc/sys/net/netfilter/nf_conntrack_buckets
done
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale :
For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.
An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats()
In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.
I also changed rate estimators to use the same infrastructure
so that they no longer need to lock root qdisc lock.
[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kevin Athey <kda@google.com>
Cc: Xiaotian Pei <xiaotian@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clarify how secure_redirects works. Mention that RFC1122 always applies.
Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
Described what the port_vlan_filtering function is supposed to
accomplish.
Fixes: fb2dabad69 ("net: dsa: support VLAN filtering switchdev attr")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We no longer have a priv_size structure member since 5feebd0a8a ("net:
dsa: Remove allocation of driver private memory")
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function has been removed in 4baee937b8 ("net: dsa: remove DSA
link polling") in favor of using the PHYLIB polling mechanism.
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
infrastructural work to allow documents to be written using restructured
text. Maybe someday, in a galaxy far far away, we'll be able to eliminate
the DocBook dependency and have a much better integrated set of kernel
docs. Someday.
Beyond that, there's a new document on security hardening from Kees, the
movement of some sample code over to samples/, a number of improvements to
the serial docs from Geert, and the usual collection of corrections, typo
fixes, etc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXPf/VAAoJEI3ONVYwIuV60pkP/3brq+CavbwptWppESoyZaf7
mpVSH7sOKicMcfHYYIXHmmg0K5gM4e22ATl39+izUCRZRwRnObXvroH++G5mARLs
MUDxLvkc/QxDDuCZnUBq5E2gPtuyYpgj1q9fMGB+70ucc/EXYp5cxUhDmbNVrpSG
KBMoZqKaW/Cf8/4fvRQG/glSR0iwyaQuvvoFAWLHgf8uWN/JPM2Cnv9V2zGQCtzP
4B4Jzayu2BGKowBd65WUYdpGnccc7OAJFSJDY/Z9x7kVxKyD+VTn7VgxGnXxs88v
uNmUEMENUpswzuoYEnDHoR0Y2o7jUi2doFKv+eacSmPaMLWL5EMDzcooZ+Vi7HWH
mvp6GtAZ5qs96OGjsi+gFIw4kY8HGdnpzs7qk/uEdAndfAif5v24YLSQRG2rUCJM
LxomnAWOJEIWGKJtuJnl16aZkgOcn6soecXw3PJmpxzhwd8BnQzwyZIdaZ98kwjA
7Enq2Mmw5NBQwGIV2ODUxzoQ3Axj7aJJsDra2n6lPGTGXONGdgNFzk/hGmtQSuIp
Aeatiy66FF0qKomzs2+EACOFP+eH/IId0yvW83Pj0o9nV25YZiPsw0Z1Tae5n3+g
zgTFycalaowIwE3YzyH6BwvnMrluiPpUTjSLsmEaviJxE7/o+zrjOvMvallUIVUn
YkJcia/DtSuc7u7LYkWe
=2O+a
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux
Pull Documentation updates from Jon Corbet:
"A bit busier this time around.
The most interesting thing (IMO) this time around is some beginning
infrastructural work to allow documents to be written using
restructured text. Maybe someday, in a galaxy far far away, we'll be
able to eliminate the DocBook dependency and have a much better
integrated set of kernel docs. Someday.
Beyond that, there's a new document on security hardening from Kees,
the movement of some sample code over to samples/, a number of
improvements to the serial docs from Geert, and the usual collection
of corrections, typo fixes, etc"
* tag 'docs-for-linus' of git://git.lwn.net/linux: (55 commits)
doc: self-protection: provide initial details
serial: doc: Use port->state instead of info
serial: doc: Always refer to tty_port->mutex
Documentation: vm: Spelling s/paltform/platform/g
Documentation/memcg: update kmem limit doc as codes behavior
docproc: print a comment about autogeneration for rst output
docproc: add support for reStructuredText format via --rst option
docproc: abstract terminating lines at first space
docproc: abstract docproc directive detection
docproc: reduce unnecessary indentation
docproc: add variables for subcommand and filename
kernel-doc: use rst C domain directives and references for types
kernel-doc: produce RestructuredText output
kernel-doc: rewrite usage description, remove duplicated comments
Doc: correct the location of sysrq.c
Documentation: fix common spelling mistakes
samples: v4l: from Documentation to samples directory
samples: connector: from Documentation to samples directory
Documentation: xillybus: fix spelling mistake
Documentation: x86: fix spelling mistakes
...
Fix description of some of the bpf_asm tool related jump instructions
and generally move them to format A <op> k.
Reported-by: Sebastian Amend <sebastian.amend@googlemail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In netdevice.h we removed the structure in net-next that is being
changes in 'net'. In macsec.c and rtnetlink.c we have overlaps
between fixes in 'net' and the u64 attribute changes in 'net-next'.
The mlx5 conflicts have to do with vxlan support dependencies.
Signed-off-by: David S. Miller <davem@davemloft.net>
In few places the term "ones-complement sum" was used but the actual
meaning is "the complement of the ones-complement sum".
Also, avoid enclosing long statements with underscore, to ease
readability.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
explain how verifier checks safety of packet access
and update email addresses.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers that use LLTX need to update trans_start of the netdev_queue.
(Most drivers don't use LLTX; stack does this update if .ndo_start_xmit
returned TX_OK).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv4/ip_gre.c
Minor conflicts between tunnel bug fixes in net and
ipv6 tunnel cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes several spelling mistakes in the Documentation/ tree, which
are caught by checkpatch.pl's spell checking.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
No more users in the tree, remove NETDEV_TX_LOCKED support.
Adds another hole in softnet_stats struct, but better than keeping
the unused collision counter around.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This document is a starting point for defining the TSO and GSO features.
The whole thing is starting to get a bit messy so I wanted to make sure we
have notes somwhere to start describing what does and doesn't work.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix typos in Documentation/networking/dsa.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Multipath route lookups should consider knowledge about next hops and not
select a hop that is known to be failed.
Example:
[h2] [h3] 15.0.0.5
| |
3| 3|
[SP1] [SP2]--+
1 2 1 2
| | /-------------+ |
| \ / |
| X |
| / \ |
| / \---------------\ |
1 2 1 2
12.0.0.2 [TOR1] 3-----------------3 [TOR2] 12.0.0.3
4 4
\ /
\ /
\ /
-------| |-----/
1 2
[TOR3]
3|
|
[h1] 12.0.0.1
host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:
root@h1:~# ip ro ls
...
12.0.0.0/24 dev swp1 proto kernel scope link src 12.0.0.1
15.0.0.0/16
nexthop via 12.0.0.2 dev swp1 weight 1
nexthop via 12.0.0.3 dev swp1 weight 1
...
If the link between tor3 and tor1 is down and the link between tor1
and tor2 then tor1 is effectively cut-off from h1. Yet the route lookups
in h1 are alternating between the 2 routes: ping 15.0.0.5 gets one and
ssh 15.0.0.5 gets the other. Connections that attempt to use the
12.0.0.2 nexthop fail since that neighbor is not reachable:
root@h1:~# ip neigh show
...
12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
12.0.0.2 dev swp1 FAILED
...
The failed path can be avoided by considering known neighbor information
when selecting next hops. If the neighbor lookup fails we have no
knowledge about the nexthop, so give it a shot. If there is an entry
then only select the nexthop if the state is sane. This is similar to
what fib_detect_death does.
To maintain backward compatibility use of the neighbor information is
based on a new sysctl, fib_multipath_use_neigh.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DSA layer doesn't care about the return code of the port_stp_update
routine, so make it void in the layer and the DSA drivers.
Replace the useless dsa_slave_stp_update function with a
dsa_slave_stp_state function used to reply to the switchdev
SWITCHDEV_ATTR_ID_PORT_STP_STATE attribute.
In the meantime, rename port_stp_update to port_stp_state_set to
explicit the state change.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add description for the missing port_vlan_prepare, port_fdb_prepare,
port_fdb_dump functions in the DSA documentation.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Bob's mesh mode rhashtable conversion, this includes
the rhashtable API change for allocation flags
* BSSID scan, connect() command reassoc support (Jouni)
* fast (optimised data only) and support for RSS in mac80211 (myself)
* various smaller changes
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJXBQ4GAAoJEGt7eEactAAdWiMP/ibaP3I79NDc0s7wCDA+KRkm
hx0Qx4a0wwm7lDFlnGBjY6yKr+XFDliCvdGX7XGpLSsTioNg7eXPpwx5FQoj6RiV
8+5RKE9fTguN9ofUzqAwHd9sVOaxvdlXbKfb/N93Gzjpw/meYk58wXdF7Almkroa
ukgJeMzIlIh+6D96zFEA+Ofzp5chwh+x2Dn0wXutEe9P9fOERA859veAvx65b+Ql
IRGTqyuY5B/wcbkr4o+DWQwgrdt7Vop9nYVPNWtMHm2JTzfuCSaQ2cD9TnVAK/bg
/vtqC46KKNLyBRGexAPqdftY9PWcfipgE+n7k+Et4iGSmNm7Z3dEyewgXmqli7XJ
X8Uiaq+N6Fpe06DVSU7aSRt8NLV64A44jXSfKRI9U2POUqKMn/PMdm8bhPW8qCdM
ra6myWpQGHWK9e0TQQdShq0NQKGxCZAiSRiiIrbbvXl1CwXxkPCG39wAC3Sh1tEN
ou4lGraeywGnTjaq+mwLEtHLoug8Y2x+Fz+Ze4Cu2enXxna9lp4lr+rFlc+2+0Er
o9oPxkTk8krZGIj9M6PNc5W+InMwchaFX3076n67hnFHzFRlOQzkfffbPYlhKJDQ
f8c9JiNZIoX/fD1TAKsrdO1+EKm/xo7w7pLgbMwQal8Jr88SkITDg0i3oXc56vNQ
ZK2gUzwvrD/jh0AUyDfN
=sj7y
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
For the 4.7 cycle, we have a number of changes:
* Bob's mesh mode rhashtable conversion, this includes
the rhashtable API change for allocation flags
* BSSID scan, connect() command reassoc support (Jouni)
* fast (optimised data only) and support for RSS in mac80211 (myself)
* various smaller changes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since there can be multiple switch ASICs on the same system we should
use the switch ID in order to differentiate between them and set the
switch name (e.g. swX) accordingly.
Also, replace the order of the "Switch ID" and "Port Netdev Naming"
sections following the above change.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not the internal flags but the radiotap flags are parsed when the monitor
injected frames are prepared for transmission. Thus the documentation
should only document these.
Reported-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Fixes: dfdfc2beb0 ("mac80211: Parse legacy and HT rate in injected frames")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add VHT radiotap parsing support to ieee80211_parse_tx_radiotap().
That capability has been tested using a d-link dir-860l rev b1 running
OpenWrt trunk and mt76 driver
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Update docs and add code snippet for using cmsg for timestamping.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update stmmac driver documentation according to new GMAC 4.x family.
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes a copy-paste-o in the BPF opcode table: "neg" takes no arguments
and thus has no addressing modes.
Signed-off-by: Dave Anderson <danderson@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Commit d67ef35fff ("clarify documentation for
net.ipv4.igmp_max_memberships") mistakenly indented a block of
documentation such that it now looks like it belongs to a specific sysctl.
Restore that block's original position.
Cc: Jeremy Eder <jeder@redhat.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename DSA port_join_bridge and port_leave_bridge routines to
respectively port_bridge_join and port_bridge_leave in order to respect
an implicit Port::Bridge namespace.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some new development in PHYLIB added new function pointers to the struct
phy_driver, document these.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS iWarp support code has become stale and non testable. As
indicated earlier, am dropping the support for it.
If new iWarp user(s) shows up in future, we can adapat the RDS IB
transprt for the special RDMA READ sink case. iWarp needs an MR
for the RDMA READ sink.
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, all ipv6 addresses are flushed when the interface is configured
down, including global, static addresses:
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 2100:1::2/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ip link set dev eth1 down
$ ip -6 addr show dev eth1
<< nothing; all addresses have been flushed>>
Add a new sysctl to make this behavior optional. The new setting defaults to
flush all addresses to maintain backwards compatibility. When the set global
addresses with no expire times are not flushed on an admin down. The sysctl
is per-interface or system-wide for all interfaces
$ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
or
$ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
Will keep addresses on eth1 on an admin down.
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 2100:1::2/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
valid_lft forever preferred_lft forever
$ ip link set dev eth1 down
$ ip -6 addr show dev eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
inet6 2100:1::2/120 scope global tentative
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
valid_lft forever preferred_lft forever
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VLAN GetNext operation is specific to some switches, and thus can be
complicated to implement for some drivers.
Remove the support for the vlan_getnext/port_pvid_get approach in favor
of the generic and simpler port_vlan_dump function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to port_fdb_dump, add a port_vlan_dump function to DSA drivers
which gets passed the switchdev VLAN object and callback.
This function, if implemented, takes precedence over the soon legacy
vlan_getnext/port_pvid_get approach.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers/devices without their own rate control algorithm can get the
information what rates they should use from either the radiotap header of
injected frames or from the rate control algorithm. But the parsing of the
legacy rate information from the radiotap header was removed in commit
e6a9854b05 ("mac80211/drivers: rewrite the rate control API").
The removal of this feature heavily reduced the usefulness of frame
injection when wanting to simulate specific transmission behavior. Having
rate parsing together with MCS rates and retry support allows a fine
grained selection of the tx behavior of injected frames for these kind of
tests.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Some DSA drivers may or may not support multiple software bridges on top
of an hardware switch.
It is more convenient for them to access the bridge's net_device for
finer configuration.
Removing the need to craft and access a bitmask also simplifies the
code.
This patch changes the signature of bridge related functions, update DSA
drivers, and removes dsa_slave_br_port_mask.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mmapped netlink has a number of unresolved issues:
- TX zerocopy support had to be disabled more than a year ago via
commit 4682a03586 ("netlink: Always copy on mmap TX.")
because the content of the mmapped area can change after netlink
attribute validation but before message processing.
- RX support was implemented mainly to speed up nfqueue dumping packet
payload to userspace. However, since commit ae08ce0021
("netfilter: nfnetlink_queue: zero copy support") we avoid one copy
with the socket-based interface too (via the skb_zerocopy helper).
The other problem is that skbs attached to mmaped netlink socket
behave different from normal skbs:
- they don't have a shinfo area, so all functions that use skb_shinfo()
(e.g. skb_clone) cannot be used.
- reserving headroom prevents userspace from seeing the content as
it expects message to start at skb->head.
See for instance
commit aa3a022094 ("netlink: not trim skb for mmaped socket when dump").
- skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we
crash because it needs the sk to check if a tx ring is attached.
Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf359
("netfilter: nfnetlink: use original skbuff when acking batches").
mmaped netlink also didn't play nicely with the skb_zerocopy helper
used by nfqueue and openvswitch. Daniel Borkmann fixed this via
commit 6bb0fef489 ("netlink, mmap: fix edge-case leakages in nf queue
zero-copy")' but at the cost of also needing to provide remaining
length to the allocation function.
nfqueue also has problems when used with mmaped rx netlink:
- mmaped netlink doesn't allow use of nfqueue batch verdict messages.
Problem is that in the mmap case, the allocation time also determines
the ordering in which the frame will be seen by userspace (A
allocating before B means that A is located in earlier ring slot,
but this also means that B might get a lower sequence number then A
since seqno is decided later. To fix this we would need to extend the
spinlocked region to also cover the allocation and message setup which
isn't desirable.
- nfqueue can now be configured to queue large (GSO) skbs to userspace.
Queing GSO packets is faster than having to force a software segmentation
in the kernel, so this is a desirable option. However, with a mmap based
ring one has to use 64kb per ring slot element, else mmap has to fall back
to the socket path (NL_MMAP_STATUS_COPY) for all large packets.
To use the mmap interface, userspace not only has to probe for mmap netlink
support, it also has to implement a recv/socket receive path in order to
handle messages that exceed the size of an rx ring element.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In certain 802.11 wireless deployments, there will be NA proxies
that use knowledge of the network to correctly answer requests.
To prevent unsolicitd advertisements on the shared medium from
being a problem, on such deployments wireless needs to drop them.
Enable this by providing an option called "drop_unsolicited_na".
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv6 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In certain 802.11 wireless deployments, there will be ARP proxies
that use knowledge of the network to correctly answer requests.
To prevent gratuitous ARP frames on the shared medium from being
a problem, on such deployments wireless needs to drop them.
Enable this by providing an option called "drop_gratuitous_arp".
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv4 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.
Additionally, enabling this option provides compliance with a SHOULD
clause of RFC 1122.
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
open-mesh.org and its subdomains can only be accessed via HTTPS. HTTP-only
requests are currently redirected automatically to HTTPS but references in
the source code should be only https.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Documentation should be kept consistent with the code:
static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
#define MAX_TCP_SYNCNT 127
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fixes and various document tweaks.
One patch reaches out of the documentation subtree to fix a comment in
init/do_mounts_rd.c. There didn't seem to be anybody more appropriate to
take that one, so I accepted it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWmmJ8AAoJEI3ONVYwIuV6uqwP/0mnqdxVWo47ohaYJP7q0Soh
ovJAbfttxKnkmOdGbWcNIJtTiw+MpdF805CYR+2treE0zvEEDodg7BhkDnmKZJ9n
F1r53JrIj769E1c5ETmWTHcBt3jjtKyQIbBmDr4YTgX91dlKF28o1bMmyDECWIcT
PktTlPUidDtffKMn3klh6baPCMrTpLJ8aLshBzUrQhrQY8lxcZKAU+98vtFzYofG
LXCSulMYXumb7XBxErTLQZhmJslD4gaDMh2xkov6ALS8XNHnfoUIFRbArAllNfTf
LQGJ6Q5qnn58UWi9F/vgDqx7+d1KIPUjBxJR9wfa0w9ggQhA9ly2BSN/fllbiSbp
yIi1JS4hwBe8H/h577BNC3xjmgVN7mazZsXlS+fg3G16gpv4JdWeRY4efjosFIzQ
EIJxB8qAovUNqw4s1mzRIJ5B9L7PEK27O6z8N27Fiw4EigtMTFAOC2/GD3ELx4iJ
p1doiSr+wjfDcFd8kdIUiDKGrTSTXwNy3hUfrhzQyaEjDTJnx3+1+ono1orSazPO
Fr2RSsC5VzX4IYSuxTMvFSKjN1Iiu8xqwq3IdclHXrBhRvwOF2wpjjQ5Guf0lHBJ
FLBahSjZqt01kmwFykxoHps+VeSwpoEen6rClBQolfmtYVDTvgRNN46AxK9jZ8T4
jZmCNNs/mYzrqo/RTnmw
=u38W
-----END PGP SIGNATURE-----
Merge tag 'docs-4.5' of git://git.lwn.net/linux
Pull documentation updates from Jon Corbet:
"A relatively boring cycle in the docs tree. There's a few kernel-doc
fixes and various document tweaks.
One patch reaches out of the documentation subtree to fix a comment in
init/do_mounts_rd.c. There didn't seem to be anybody more appropriate
to take that one, so I accepted it"
* tag 'docs-4.5' of git://git.lwn.net/linux: (29 commits)
thermal: add description for integral_cutoff unit
Documentation: update libhugetlbfs site url
Documentation: Explain pci=conf1,conf2 more verbosely
DMA-API: fix confusing sentence in Documentation/DMA-API.txt
Documentation: translations: update linux cross reference link
Documentation: fix typo in CodingStyle
init, Documentation: Remove ramdisk_blocksize mentions
Documentation-getdelays: Apply a recommendation from "checkpatch.pl" in main()
Documentation: HOWTO: update versions from 3.x to 4.x
Documentation: remove outdated references from translations
Doc: treewide: Fix grammar "a" to "an"
Documentation: cpu-hotplug: Fix sysfs mount instructions
can-doc: Add hint about getting timestamps
Fix CFQ I/O scheduler parameter name in documentation
Documentation: arm: remove dead links from Marvell Berlin docs
Documentation: HOWTO: update code cross reference link
Doc: Docbook/iio: Fix typo in iio.tmpl
DocBook: make index.html generation less verbose by default
DocBook: Cleanup: remove an unused $(call) line
DocBook: Add a help message for DOCBOOKS env var
...
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <a@unstable.cc>
Allow accepted sockets to derive their sk_bound_dev_if setting from the
l3mdev domain in which the packets originated. A sysctl setting is added
to control the behavior which is similar to sk_mark and
sysctl_tcp_fwmark_accept.
This effectively allow a process to have a "VRF-global" listen socket,
with child sockets bound to the VRF device in which the packet originated.
A similar behavior can be achieved using sk_mark, but a solution using marks
is incomplete as it does not handle duplicate addresses in different L3
domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
domain provides a complete solution.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/geneve.c
Here we had an overlapping change, where in 'net' the extraneous stats
bump was being removed whilst in 'net-next' the final argument to
udp_tunnel6_xmit_skb() was being changed.
Signed-off-by: David S. Miller <davem@davemloft.net>
As we all know, the value of pf_retrans >= max_retrans_path can
disable pf state. The variables of pf_retrans and max_retrans_path
can be changed by the userspace application.
Sometimes the user expects to disable pf state while the 2
variables are changed to enable pf state. So it is necessary to
introduce a new variable to disable pf state.
According to the suggestions from Vlad Yasevich, extra1 and extra2
are removed. The initialization of pf_enable is added.
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a hint about how to get timestamps of received
CAN frames with ioctl(2). This hint has been applied to the
former SocketCAN Documentation, but it got lost during mainlining
the first bits and pieces to linux kernel.
Signed-off-by: Stefan Tatschner <rumpelsepp@sevenbyte.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Apparently the e100.txt document contained a "License" section left
over from days of old, which does not need to be in the kernel
documentation. So clean it up..
CC: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
wait; these include some improvements to the suggestions for email clients
and patch submission.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWRge3AAoJEI3ONVYwIuV6BboP/30p1d0kGC9KWH96aVZst1uu
4VaUcOf+zKp8wQwhyHQIgmGlgD8u6fzfa8i2YIiVRzJD18Tm67KjHsbSiT4qgI84
xizcmnuRogTthtWTmGITKfgQ5OL3Z0IX/5JgIMoIdmAKSjg/3kSR2b5FvN+tj6Qh
9SQAWYIW2cvDWp9PV8gWOdA2j6EKDQ6BdwRE749LmS3MXRN9bM/KRezCyIrmCvOF
FChTzxVjKd2TNYlO9nDyzn70WhyoV/e7jn+9AOKC4XHOCrFI0KB36mE8Wv/VmUmF
+6YPLWNJrTb7J07p9fkYb0FeMzEVcZIVqtMgvtmWTrX4qat9VJI38tOcgDQKHOhj
ETf+8DDM0t4pNUw+3KDdCdu+AIZcEuuRQYc7AUC/URHZGM+LsI0hERRcNS5Z/KRv
lKyq3Y+A+/Z7b6Ia87lJavgubEdY3pagCOZXWQWvb8CwaTnOhuf4qfBK6pkqJBuN
2DfNSQrsi2t1cqKfrnGr58kkMslOKzZLXXSjgvB8IhHmqedNha9JODykq2ldnMdx
Lch1fIe8Exh+p0pY3nKUnmB69dMZnuNVnW92C4JbNr+zgU1jxQeLVer3bum3NzZw
cEASmIPb1nJnl5aJkapc/rz+tSJ9GIIgaTyarNO+cVWQ712gYU5Ot4nNj5ayKW5C
RgQOqBL+68GF1tMUB61b
=6qJ0
-----END PGP SIGNATURE-----
Merge tag '4.4-additional' of git://git.lwn.net/linux
Pull more documentation updates from Jon Corbet:
"A few more documentation patches that wandered in and have no reason
to wait; these include some improvements to the suggestions for email
clients and patch submission"
* tag '4.4-additional' of git://git.lwn.net/linux:
Documentation: Add minimal Mutt config for using Gmail
Documentation: Add note on sending files directly with Mutt
Documentation: dontdiff: remove media from dontdiff
Documentation/SubmittingPatches: discuss In-Reply-To
Remove email address from Documentation/filesystems/overlayfs.txt
can-doc: Add missing semicolon to example
The example code for CAN_BCM,
connect(s, (struct sockaddr *)&addr, sizeof(addr))
lacks a semicolon at the end of the line. This patch adds that
missing semicolon to ensure that the given code snippet actually
compiles.
Signed-off-by: Stefan Tatschner <rumpelsepp@sevenbyte.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Pull networking fixes from David Miller:
1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet.
2) Several spots need to get to the original listner for SYN-ACK
packets, most spots got this ok but some were not. Whilst covering
the remaining cases, create a helper to do this. From Eric Dumazet.
3) Missiing check of return value from alloc_netdev() in CAIF SPI code,
from Rasmus Villemoes.
4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich.
5) Use after free in mvneta driver, from Justin Maggard.
6) Fix race on dst->flags access in dst_release(), from Eric Dumazet.
7) Add missing ZLIB_INFLATE dependency for new qed driver. From Arnd
Bergmann.
8) Fix multicast getsockopt deadlock, from WANG Cong.
9) Fix deadlock in btusb, from Kuba Pawlak.
10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6
counter state. From Sabrina Dubroca.
11) Fix packet_bind() race, which can cause lost notifications, from
Francesco Ruggeri.
12) Fix MAC restoration in qlcnic driver during bonding mode changes,
from Jarod Wilson.
13) Revert bridging forward delay change which broke libvirt and other
userspace things, from Vlad Yasevich.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
Revert "bridge: Allow forward delay to be cfgd when STP enabled"
bpf_trace: Make dependent on PERF_EVENTS
qed: select ZLIB_INFLATE
net: fix a race in dst_release()
net: mvneta: Fix memory use after free.
net: Documentation: Fix default value tcp_limit_output_bytes
macvtap: Resolve possible __might_sleep warning in macvtap_do_read()
mvneta: add FIXED_PHY dependency
net: caif: check return value of alloc_netdev
net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA
drivers: net: xgene: fix RGMII 10/100Mb mode
netfilter: nft_meta: use skb_to_full_sk() helper
net_sched: em_meta: use skb_to_full_sk() helper
sched: cls_flow: use skb_to_full_sk() helper
netfilter: xt_owner: use skb_to_full_sk() helper
smack: use skb_to_full_sk() helper
net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid()
bpf: doc: correct arch list for supported eBPF JIT
dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put"
bonding: fix panic on non-ARPHRD_ETHER enslave failure
...
Commit c39c4c6abb ("tcp: double default TSQ output bytes limit")
updated default value for tcp_limit_output_bytes
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aarch64 and s390x support eBPF JIT too, correct document to reflect this and
avoid any confusion.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
some new CAN driver documentation. Beyond that, we have kernel-doc fixes,
a bit more work to support reproducible builds, and the usual collection of
small fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWO6HiAAoJEI3ONVYwIuV6ihwQAK0KC72h0706bdwDJ1p1/aJU
QLuPeiKYWgGAXq2zgOyw3Povj4bkMwkiq1IGHLyK0Id4tg3ngxOXjimk4YKrqarI
BD5HdpOm7IyQEe66ZU9b1RFDVst+bg3yp6ZIZsH5vQxl/KnyJ6AyaaDk8TPYId8S
1+CykJzxyi7GyT/jlLpHbKtBKrraoVke+cNPMAvOf0NjSyO7Ix5B+qH50sttG6Eu
9qcQ8hlKXOdZRTiGW6P+jeZNA+e5+CRpnG9VHBquHy4lI85kQThhWq41UMH690PP
eRbLipeUybb0FwW2KwuMjGKEMDkMvrGJh0TzSXX9lGHd+5/41v7zcyKh8vJcpLjh
bNQ2WOAKUBd2d15EP1MNoKXDLGJXusJczLwOjigWiSCQvgouAWwMrpWEw+Obv8Yl
rdoH1oQqDFfDnk6mnKrSaqLWGNuLxDtkEl/1P0jsGSK6lM3FDkOgTuNPYXTJJgxN
rXuGmPhyUlS2srERUeQJw2rISN0WRBvcKJGkMX6IpvrXHkItbelqK+yY1DeKPmcm
qgbIx9ZWNqtltFpG22VVByqAVwucO5Nu8cAIQ2ysJsTnKOvQCQmhu5UKTjBCkEJM
VpeMm32BfNiJFLuLTQGWBZ8bkRl2shQyXhOaR3uyqG4T+rpPD3qJi6dtFRpsAzOB
q1nZuJCpOaxJFzjSKvpJ
=emZ7
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux
Pull documentation update from Jon Corbet:
"There is a nice new document from Neil on how pathname lookups work
and some new CAN driver documentation. Beyond that, we have
kernel-doc fixes, a bit more work to support reproducible builds, and
the usual collection of small fixes"
* tag 'docs-for-linus' of git://git.lwn.net/linux: (34 commits)
Documentation: add new description of path-name lookup.
Documentation/vm/slub.txt: document slabinfo-gnuplot.sh
Doc: ABI/stable: Fix typo in ABI/stable
doc: Clarify that nmi_watchdog param is for hardlockups
Typo correction for description in gpio document.
DocBook: Fix kernel-doc to be case-insensitive for private:
kernel-docs.txt: update kernelnewbies reference
Doc:kvm: Fix typo in Doc/virtual/kvm
Documentation/Changes: Add bc in "Current Minimal Requirements" section
Documentation/email-clients.txt: remove trailing whitespace
DocBook: Use a fixed encoding for output
MAINTAINERS: The docs tree has moved
Docs/kernel-parameters: Add earlycon devicetree usage
SubmittingPatches: make Subject examples match the de facto standard
Documentation: gpio: mention that <function>-gpio has been deprecated
Documentation: cgroups: just fix a few typos
Documentation: Update kselftest.txt
Documentation: DMA API: Be more explicit that nents is always the same
Documentation: Update the default value of crashkernel low
zram: update documentation
...
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2015-10-30
1) The flow cache is limited by the flow cache limit which
depends on the number of cpus and the xfrm garbage collector
threshold which is independent of the number of cpus. This
leads to the fact that on systems with more than 16 cpus
we hit the xfrm garbage collector limit and refuse new
allocations, so new flows are dropped. On systems with 16
or less cpus, we hit the flowcache limit. In this case, we
shrink the flow cache instead of refusing new flows.
We increase the xfrm garbage collector threshold to INT_MAX
to get the same behaviour, independent of the number of cpus.
2) Fix some unaligned accesses on sparc systems.
From Sowmini Varadhan.
3) Fix some header checks in _decode_session4. We may call
pskb_may_pull with a negative value converted to unsigened
int from pskb_may_pull. This can lead to incorrect policy
lookups. We fix this by a check of the data pointer position
before we call pskb_may_pull.
4) Reload skb header pointers after calling pskb_may_pull
in _decode_session4 as this may change the pointers into
the packet.
5) Add a missing statistic counter on inner mode errors.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In certain use cases it is not always desirable for the switch device to
flood traffic to CPU port. Instead, only certain packet types (e.g.
STP, LACP) should be trapped to it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow devices supporting this feature to control the flooding of unknown
unicast traffic, by making switchdev infrastructure propagate this setting
to the switch driver.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the second half of RACK that uses the the most
recent transmit time among all delivered packets to detect losses.
tcp_rack_mark_lost() is called upon receiving a dubious ACK.
It then checks if an not-yet-sacked packet was sent at least
"reo_wnd" prior to the sent time of the most recently delivered.
If so the packet is deemed lost.
The "reo_wnd" reordering window starts with 1msec for fast loss
detection and changes to min-RTT/4 when reordering is observed.
We found 1msec accommodates well on tiny degree of reordering
(<3 pkts) on faster links. We use min-RTT instead of SRTT because
reordering is more of a path property but SRTT can be inflated by
self-inflicated congestion. The factor of 4 is borrowed from the
delayed early retransmit and seems to work reasonably well.
Since RACK is still experimental, it is now used as a supplemental
loss detection on top of existing algorithms. It is only effective
after the fast recovery starts or after the timeout occurs. The
fast recovery is still triggered by FACK and/or dupack threshold
instead of RACK.
We introduce a new sysctl net.ipv4.tcp_recovery for future
experiments of loss recoveries. For now RACK can be disabled by
setting it to 0.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kathleen Nichols' algorithm for tracking the minimum RTT of a
data stream over some measurement window. It uses constant space
and constant time per update. Yet it almost always delivers
the same minimum as an implementation that has to keep all
the data in the window. The measurement window is tunable via
sysctl.net.ipv4.tcp_min_rtt_wlen with a default value of 5 minutes.
The algorithm keeps track of the best, 2nd best & 3rd best min
values, maintaining an invariant that the measurement time of
the n'th best >= n-1'th best. It also makes sure that the three
values are widely separated in the time window since that bounds
the worse case error when that data is monotonically increasing
over the window.
Upon getting a new min, we can forget everything earlier because
it has no value - the new min is less than everything else in the
window by definition and it's the most recent. So we restart fresh
on every new min and overwrites the 2nd & 3rd choices. The same
property holds for the 2nd & 3rd best.
Therefore we have to maintain two invariants to maximize the
information in the samples, one on values (1st.v <= 2nd.v <=
3rd.v) and the other on times (now-win <=1st.t <= 2nd.t <= 3rd.t <=
now). These invariants determine the structure of the code
The RTT input to the windowed filter is the minimum RTT measured
from ACK or SACK, or as the last resort from TCP timestamps.
The accessor tcp_min_rtt() returns the minimum RTT seen in the
window. ~0U indicates it is not available. The minimum is 1usec
even if the true RTT is below that.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revert the commit e2ca690b65 ("ipv4/icmp: redirect messages
can use the ingress daddr as source"), which tried to introduce a more
suitable behaviour for ICMP redirect messages generated by VRRP routers.
However RFC 5798 section 8.1.1 states:
The IPv4 source address of an ICMP redirect should be the address
that the end-host used when making its next-hop routing decision.
while said commit used the generating packet destination
address, which do not match the above and in most cases leads to
no redirect packets to be generated.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ip commands with examples for creating VRF devics, enslaving interfaces
and dumping VRF-focused data (address, neighbors, routes).
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows configuring how the source address of ICMP
redirect messages is selected; by default the old behaviour is
retained, while setting icmp_redirects_use_orig_daddr force the
usage of the destination address of the packet that caused the
redirect.
The new behaviour fits closely the RFC 5798 section 8.1.1, and fix the
following scenario:
Two machines are set up with VRRP to act as routers out of a subnet,
they have IPs x.x.x.1/24 and x.x.x.2/24, with VRRP holding on to
x.x.x.254/24.
If a host in said subnet needs to get an ICMP redirect from the VRRP
router, i.e. to reach a destination behind a different gateway, the
source IP in the ICMP redirect is chosen as the primary IP on the
interface that the packet arrived at, i.e. x.x.x.1 or x.x.x.2.
The host will then ignore said redirect, due to RFC 1122 section 3.2.2.2,
and will continue to use the wrong next-op.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To be aligned with obj.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xfrm flowcache size is limited by the flowcache limit
(4096 * number of online cpus) and the xfrm garbage collector
threshold (2 * 32768), whatever is reached first. This means
that we can hit the garbage collector limit only on systems
with more than 16 cpus. On such systems we simply refuse
new allocations if we reach the limit, so new flows are dropped.
On syslems with 16 or less cpus, we hit the flowcache limit.
In this case, we shrink the flow cache instead of refusing new
flows.
We increase the xfrm garbage collector threshold to INT_MAX
to get the same behaviour, independent of the number of cpus.
The xfrm garbage collector threshold can still be set below
the flowcache limit to reduce the memory usage of the flowcache.
Tested-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Conflicts:
net/ipv4/arp.c
The net/ipv4/arp.c conflict was one commit adding a new
local variable while another commit was deleting one.
Signed-off-by: David S. Miller <davem@davemloft.net>
No longer need explicit modprobe's and update to use ip instead
of deprecated ifconfig command.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now, the memory allocation in prepare/commit state is done separatelly
in each driver (rocker). Introduce the similar mechanism in generic
switchdev code, in form of queue. That can be used not only for memory
allocations, but also for different items. Abort item destruction
is handled as well.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next tree
in this 4.4 development cycle, they are:
1) Schedule ICMP traffic to IPVS instances, this introduces a new schedule_icmp
proc knob to enable/disable it. By default is off to retain the old
behaviour. Patchset from Alex Gartrell.
I'm also including what Alex originally said for the record:
"The configuration of ipvs at Facebook is relatively straightforward. All
ipvs instances bgp advertise a set of VIPs and the network prefers the
nearest one or uses ECMP in the event of a tie. For the uninitiated, ECMP
deterministically and statelessly load balances by hashing the packet
(usually a 5-tuple of protocol, saddr, daddr, sport, and dport) and using
that number as an index (basic hash table type logic).
The problem is that ICMP packets (which contain really important
information like whether or not an MTU has been exceeded) will get a
different hash value and may end up at a different ipvs instance. With no
information about where to route these packets, they are dropped, creating
ICMP black holes and breaking Path MTU discovery. Suddenly, my mom's
pictures can't load and I'm fielding midday calls that I want nothing to do
with.
To address this, this patch set introduces the ability to schedule icmp
packets which is gated by a sysctl net.ipv4.vs.schedule_icmp. If set to 0,
the old behavior is maintained -- otherwise ICMP packets are scheduled."
2) Add another proc entry to ignore tunneled packets to avoid routing loops
from IPVS, also from Alex.
3) Fifteen patches from Eric Biederman to:
* Stop passing nf_hook_ops as parameter to the hook and use the state hook
object instead all around the netfilter code, so only the private data
pointer is passed to the registered hook function.
* Now that we've got state->net, propagate the netns pointer to netfilter hook
clients to avoid its computation over and over again. A good example of how
this has been simplified is the former TEE target (now nf_dup infrastructure)
since it has killed the ugly pick_net() function.
There's another round of netns updates from Eric Biederman making the line. To
avoid the patchbomb again to almost all the networking mailing list (that is 84
patches) I'd suggest we send you a pull request with no patches or let me know
if you prefer a better way.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman says:
====================
IPVS Updates for v4.4
please consider these IPVS Updates for v4.4.
The updates include the following from Alex Gartrell:
* Scheduling of ICMP
* Sysctl to ignore tunneled packets; and hence some packet-looping scenarios
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
With Linux 3.15 the infrastructure for CAN FD hardware drivers had been
introduced into the kernel. Now the M_CAN driver and the peak_usb driver
support CAN FD. Update the documentation to show the latest CAN related
configuration options of 'ip' from iproute2 and describe the CAN FD specific
options to set the data bitrate and protocol version (ISO/non-ISO).
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
This is a way to avoid nasty routing loops when multiple ipvs instances can
forward to eachother.
Signed-off-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Pull networking updates from David Miller:
"Another merge window, another set of networking changes. I've heard
rumblings that the lightweight tunnels infrastructure has been voted
networking change of the year. But what do I know?
1) Add conntrack support to openvswitch, from Joe Stringer.
2) Initial support for VRF (Virtual Routing and Forwarding), which
allows the segmentation of routing paths without using multiple
devices. There are some semantic kinks to work out still, but
this is a reasonably strong foundation. From David Ahern.
3) Remove spinlock fro act_bpf fast path, from Alexei Starovoitov.
4) Ignore route nexthops with a link down state in ipv6, just like
ipv4. From Andy Gospodarek.
5) Remove spinlock from fast path of act_gact and act_mirred, from
Eric Dumazet.
6) Document the DSA layer, from Florian Fainelli.
7) Add netconsole support to bcmgenet, systemport, and DSA. Also
from Florian Fainelli.
8) Add Mellanox Switch Driver and core infrastructure, from Jiri
Pirko.
9) Add support for "light weight tunnels", which allow for
encapsulation and decapsulation without bearing the overhead of a
full blown netdevice. From Thomas Graf, Jiri Benc, and a cast of
others.
10) Add Identifier Locator Addressing support for ipv6, from Tom
Herbert.
11) Support fragmented SKBs in iwlwifi, from Johannes Berg.
12) Allow perf PMUs to be accessed from eBPF programs, from Kaixu Xia.
13) Add BQL support to 3c59x driver, from Loganaden Velvindron.
14) Stop using a zero TX queue length to mean that a device shouldn't
have a qdisc attached, use an explicit flag instead. From Phil
Sutter.
15) Use generic geneve netdevice infrastructure in openvswitch, from
Pravin B Shelar.
16) Add infrastructure to avoid re-forwarding a packet in software
that was already forwarded by a hardware switch. From Scott
Feldman.
17) Allow AF_PACKET fanout function to be implemented in a bpf
program, from Willem de Bruijn"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1458 commits)
netfilter: nf_conntrack: make nf_ct_zone_dflt built-in
netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled
net: fec: clear receive interrupts before processing a packet
ipv6: fix exthdrs offload registration in out_rt path
xen-netback: add support for multicast control
bgmac: Update fixed_phy_register()
sock, diag: fix panic in sock_diag_put_filterinfo
flow_dissector: Use 'const' where possible.
flow_dissector: Fix function argument ordering dependency
ixgbe: Resolve "initialized field overwritten" warnings
ixgbe: Remove bimodal SR-IOV disabling
ixgbe: Add support for reporting 2.5G link speed
ixgbe: fix bounds checking in ixgbe_setup_tc for 82598
ixgbe: support for ethtool set_rxfh
ixgbe: Avoid needless PHY access on copper phys
ixgbe: cleanup to use cached mask value
ixgbe: Remove second instance of lan_id variable
ixgbe: use kzalloc for allocating one thing
flow: Move __get_hash_from_flowi{4,6} into flow_dissector.c
ixgbe: Remove unused PCI bus types
...
An SFP module may have a link up/down status pin which can be
connection to a GPIO line of the host. Add support for reading such an
GPIO in the fixed_phy driver.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Document the addition of a new sysctl variable which controls the
generation of IGMP reports for link local multicast groups in the
224.0.0.X range.
IGMP reports for local multicast groups can now be optionally
inhibited by setting the value to zero e.g.:
echo 0 > /proc/sys/net/ipv4/igmp_link_local_mcast_reports
To retain backwards compatibility the previous behaviour is retained
by default on system boot or reverted by setting the value back to
non-zero.
Signed-off-by: Philip Downey <pdowney@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a document describing the Broadcom Starfigther 2 switch hardware,
its specifics, and how the driver is implemented and its specifics.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Describe how the DSA subsystem works, its design principles,
limitations, and describe in details how to implement a DSA switch
driver.
Acked-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When TCP pacing was added back in linux-3.12, we chose
to apply a fixed ratio of 200 % against current rate,
to allow probing for optimal throughput even during
slow start phase, where cwnd can be doubled every other gRTT.
At Google, we found it was better applying a different ratio
while in Congestion Avoidance phase.
This ratio was set to 120 %.
We've used the normal tcp_in_slow_start() helper for a while,
then tuned the condition to select the conservative ratio
as soon as cwnd >= ssthresh/2 :
- After cwnd reduction, it is safer to ramp up more slowly,
as we approach optimal cwnd.
- Initial ramp up (ssthresh == INFINITY) still allows doubling
cwnd every other RTT.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-08-16
Here's what's likely the last bluetooth-next pull request for 4.3:
- 6lowpan/802.15.4 refactoring, cleanups & fixes
- Document 6lowpan netdev usage in Documentation/networking/6lowpan.txt
- Support for UART based QCA Bluetooth controllers
- Power management support for Broeadcom Bluetooth controllers
- Change LE connection initiation to always use passive scanning first
- Support for new Silicon Wave USB ID
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2015-08-17
1) Fix IPv6 ECN decapsulation for IPsec interfamily tunnels.
From Thomas Egerer.
2) Use kmemdup instead of duplicating it in xfrm_dump_sa().
From Andrzej Hajda.
3) Pass oif to the xfrm lookups so that it gets set on the flow
and the resolver routines can match based on oif.
From David Ahern.
4) Add documentation for the new xfrm garbage collector threshold.
From Alexander Duyck.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rocker driver tracks arp_tbl neighs to resolve IPv4 route nexthops. The
driver uses NETEVENT_NEIGH_UPDATE for neigh adds and updates, but there is
no event when the neigh is removed from the device (such as when the device
goes admin down). This patches hooks ndo_neigh_destroy so the driver can
know when a neigh is removed from the device. In response, the driver will
purge the neigh entry from its internal tbl.
I didn't find an in-tree users of ndo_neigh_destroy, so I'm not sure if
this ndo is vestigial or if there are out-of-tree users. In any case, it
does what I need here. An alternative design would be to generate
NETEVENT_NEIGH_UPDATE event when neigh is being destroyed, setting state to
NUD_NONE so driver knows neigh entry is dead.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few things have changed since the previous version of the vxlan
documentation was written, so update it and correct some grammar and
such while we are at it.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds documentation for xfrm4_gc_thresh and xfrm6_gc_thresh
based on the comments in commit eeb1b73378 ("xfrm: Increase the garbage
collector threshold").
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This patch adds a 6lowpan.txt into the networking documentation
directory. Currently this documentation describes how the lowpan
private data of net devices will be handled.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Suggested-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Initialize auto_flowlabels to one. This enables automatic flow labels,
individual socket may disable them using the IPV6_AUTOFLOWLABEL socket
option.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the meaning of net.ipv6.auto_flowlabels to provide a mode for
automatic flow labels generation. There are four modes:
0: flow labels are disabled
1: flow labels are enabled, sockets can opt-out
2: flow labels are allowed, sockets can opt-in
3: flow labels are enabled and enforced, no opt-out for sockets
np->autoflowlabel is initialized according to the sysctl value.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6fd99094de ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.
RFC 4861, 6.3.4. Processing Received Router Advertisements
A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
and Retrans Timer) may contain a value denoting that it is
unspecified. In such cases, the parameter should be ignored and the
host should continue using whatever value it is already using.
If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.
So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As all dwmac-* drivers have been converted to have a proper probe
function the setup callback can now be removed. Also remove the
free callback that wasn't used by any driver.
New dwmac-* drivers should implement standard probe and remove
functions to preform any needed setup and teardown.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As dwmac-* drivers that need OF match have been converted
to use their own internal OF match data structure this can
now be removed.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per RFC 6724, section 4, "Candidate Source Addresses":
It is RECOMMENDED that the candidate source addresses be the set
of unicast addresses assigned to the interface that will be used
to send to the destination (the "outgoing" interface).
Add a sysctl to enable this behaviour.
Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-07-17
This series contains updates to igb, ixgbe, ixgbevf, i40e, bnx2x,
freescale, siena and dp83640.
Jacob provides several patches to clarify the intended way to implement
both SIOCSHWTSTAMP and ethtool's get_ts_info(). It is okay to support
the specific filters in SIOCSHWTSTAMP by upscaling them to the generic
filters.
Alex Duyck provides a igb patch to pull the time stamp from the fragment
before it gets added to the skb, to avoid a possible issue in which the
fragment can possibly be less than IGB_RX_HDR_LEN due to the time stamp
being pulled after the copybreak check. Also provides a ixgbevf patch to
fold the ixgbevf_pull_tail() call into ixgbevf_add_rx_frag(), which gives
the advantage that the fragment does not have to be modified after it is
added to the skb.
Fan provides patches for ixgbe/ixgbevf to set the receive hash type
based on receive descriptor RSS type.
Todd provides a fix for igb where on check for link on any media other
than copper was not being detected since it was looking on the incorrect
PHY page (due to the page being used gets switched before the function
to check link gets executed).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Both of these fields are unused and has been unused since they
were added 3 and 5 years ago. Drop them since they are clearly
not very useful.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds some clarification about the intended way to implement
both SIOCSHWTSTAMP and ethtool's get_ts_info. The HWTSTAMP API has
several Rx filters which are very specific, as well as more general
filters. The specific filters really only exist to support some broken
hardware which can't fully implement the generic filters. This patch
adds clarification that it is okay to support the specific filters in
SIOCSHWTSTAMP by upscaling them to the generic filters. In addition,
update the header for ethtool_ts_info to specify that drivers ought to
only report the filters they support without upscaling in this manner.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Reviewed-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In f35f6c8f7 (can: update MAINTAINERS and Documentation) chapter 3.3
was removed. This patch fixes some old references to chapter 3.4 which
no longer exists.
Signed-off-by: Stefan Tatschner <stefan@sevenbyte.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.
This add the ip_nonlocal_bind sysctl under ipv6.
Testing:
Set up nonlocal binding and receive routing on a host, e.g.:
ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1
Set up routing to 2001:0:0:1::/64 on peer to go to first host
ping6 -I 2001:0:0:1::1 peer-address -- to verify
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
printk logbuf keeps various metadata and optional key=value dictionary for
structured messages, both of which are stripped when messages are handed
to regular console drivers.
It can be useful to have this metadata and dictionary available to
netconsole consumers. This obviously makes logging via netconsole more
complete and the sequence number in particular is useful in environments
where messages may be lost or reordered in transit - e.g. when netconsole
is used to collect messages in a large cluster where packets may have to
travel congested hops to reach the aggregator. The lost and reordered
messages can easily be identified and handled accordingly using the
sequence numbers.
printk recently added extended console support which can be selected by
setting CON_EXTENDED flag. From console driver side, not much changes.
The only difference is that the text passed to the write callback is
formatted the same way as /dev/kmsg.
This patch implements extended console support for netconsole which can be
enabled by either prepending "+" to a netconsole boot param entry or
echoing 1 to "extended" file in configfs. When enabled, netconsole
transmits extended log messages with headers identical to /dev/kmsg
output.
There's one complication due to message fragments. netconsole limits the
maximum message size to 1k and messages longer than that are split into
multiple fragments. As all extended console messages should carry
matching headers and be uniquely identifiable, each extended message
fragment carries full copy of the metadata and an extra header field to
identify the specific fragment. The optional header is of the form
"ncfrag=OFF/LEN" where OFF is the byte offset into the message body and
LEN is the total length.
To avoid unnecessarily making printk format extended messages, Extended
netconsole is registered with printk when the first extended netconsole is
configured.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Miller <davem@davemloft.net>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The main thing here is Ingo's big subdirectory documenting feature support
for each architecture. Beyond that, it's the usual pile of fixes, tweaks,
and small additions.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVi0g2AAoJEI3ONVYwIuV6Me4QAIfa79z05ABSjlyWaKw46plH
lULR9cyHdR59JVPHKjSOfT9/c+GOdoz6kkXQoe/TgVyj5fRB8seUW5GJXCASndkk
aVd4c6yKFH1NISXsSdVQC0JbpgAURgcSR6x59It++fG3NINvXronFTWGMBHMLKcI
A2hM2jNP914Dy5r4ipWZKzF1KxIlqK9kmLxlNoE6/LoQfBhh1dMdnyfuM11sguAy
s5pr9JeCPbWC0RE7st/qEivXF4lpj6hd3XoYfM2Y+oukj5xEPQevLTLHOgtesnx9
guUAul5Sw27n+Dx8I0Qxf1n+5SkrijoAa72g5vAxTs+ilOey67qba012NaYSy7RK
s15XOIZ/1JTS9JjkO7GR5NbG6AiIIAH5P+Y501ivCIrsWciTOgKj7cOzakIEV8/P
NX4120Lh5lbBrWeYkl8WbgMO0Me8cThbALC+rncF/wjvGyREKyxNlZ9qvBqmHYjG
5Et2DT+rANaDmmblgMK3tX/zI1g3pN51e+CRF+Hzh1jZD3MZ/i+KS4qgfGFDzMIj
uoniO5VfyD4zRbyv4Grg7XMpXiP8xFxKDypglYiXzzwlkarUgbMGOoFE7AkiPOKB
t9gLPetbDsDyU/bSpzHlfObZp+q+pCxHPhyLS7hxEi3gBxYajIMbkpHHJugnE0+H
TfkIhy6QQm1vAPTpRXaE
=ODt8
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
Pull documentation updates from Jonathan Corbet:
"The main thing here is Ingo's big subdirectory documenting feature
support for each architecture. Beyond that, it's the usual pile of
fixes, tweaks, and small additions"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (79 commits)
doc:md: fix typo in md.txt.
Documentation/mic/mpssd: don't build x86 userspace when cross compiling
Documentation/prctl: don't build tsc tests when cross compiling
Documentation/vDSO: don't build tests when cross compiling
Doc:ABI/testing: Fix typo in sysfs-bus-fcoe
Doc: Docbook: Change wikipedia's URL from http to https in scsi.tmpl
Doc: Change wikipedia's URL from http to https
Documentation/kernel-parameters: add missing pciserial to the earlyprintk
Doc:pps: Fix typo in pps.txt
kbuild : Fix documentation of INSTALL_HDR_PATH
Documentation: filesystems: updated struct file_operations documentation in vfs.txt
kbuild: edit explanation of clean-files variable
Doc: ja_JP: Fix typo in HOWTO
Move freefall program from Documentation/ to tools/
Documentation: ARM: EXYNOS: Describe boot loaders interface
Doc:nfc: Fix typo in nfc-hci.txt
vfs: Minor documentation fix
Doc: networking: txtimestamp: fix printf format warning
Documentation, intel_pstate: Improve legacy mode internal governors description
Documentation: extend use case for EXPORT_SYMBOL_GPL()
...
Recently wikipedia announced to secure access to the servers.
Now all http access re-route to https.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
We need to delete from offload the device externally learnded fdbs when any
one of these events happen:
1) Bridge ages out fdb. (When bridge is doing ageing vs. device doing
ageing. If device is doing ageing, it would send SWITCHDEV_FDB_DEL
directly).
2) STP state change flushes fdbs on port.
3) User uses sysfs interface to flush fdbs from bridge or bridge port:
echo 1 >/sys/class/net/BR_DEV/bridge/flush
echo 1 >/sys/class/net/BR_PORT/brport/flush
4) Offload driver send event SWITCHDEV_FDB_DEL to delete fdb entry.
For rocker, we can now get called to delete fdb entry in wait and nowait
contexts, so set NOWAIT flag when deleting fdb entry.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix URL (http to https) for wiki.wireshark.org.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation/networking/timestamping/txtimestamp.c: In function ‘__print_timestamp’:
Documentation/networking/timestamping/txtimestamp.c:99:3: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘int64_t’ [-Wformat=]
fprintf(stderr, " (%+ld us)", cur_ms - prev_ms);
int64_t differs per platform, so a type specifier that differs along
with it is required.
Signed-off-by: Frans Klaver <fransklaver@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Clarify in documentation and code that IPV4 FIB add operation is used for
both adding a new FIB entry to the device and for modifying an existing FIB
entry on the device.
Also, remove left-over references to ipv4_fib ops and replace with details
on SWITCHDEV_PORT_IPV4_FIB object.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-05-28
Here's a set of patches intended for 4.2. The majority of the changes
are on the 802.15.4 side of things rather than Bluetooth related:
- All sorts of cleanups & fixes to ieee802154 and related drivers
- Rework of tx power support in ieee802154 and its drivers
- Support for setting ieee802154 tx power through nl802154
- New IDs for the btusb driver
- Various cleanups & smaller fixes to btusb
- New btrtl driver for Realtec devices
- Fix suspend/resume for Realtek devices
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A long standing problem on busy servers is the tiny available TCP port
range (/proc/sys/net/ipv4/ip_local_port_range) and the default
sequential allocation of source ports in connect() system call.
If a host is having a lot of active TCP sessions, chances are
very high that all ports are in use by at least one flow,
and subsequent bind(0) attempts fail, or have to scan a big portion of
space to find a slot.
In this patch, I changed the starting point in __inet_hash_connect()
so that we try to favor even [1] ports, leaving odd ports for bind()
users.
We still perform a sequential search, so there is no guarantee, but
if connect() targets are very different, end result is we leave
more ports available to bind(), and we spread them all over the range,
lowering time for both connect() and bind() to find a slot.
This strategy only works well if /proc/sys/net/ipv4/ip_local_port_range
is even, ie if start/end values have different parity.
Therefore, default /proc/sys/net/ipv4/ip_local_port_range was changed to
32768 - 60999 (instead of 32768 - 61000)
There is no change on security aspects here, only some poor hashing
schemes could be eventually impacted by this change.
[1] : The odd/even property depends on ip_local_port_range values parity
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Update the linux-zigbee git:// repository URL.
* Remove the MLME section as the current kernel does not provide a
full 802.15.4 MLME implementation.
* The hardmac example driver 'fakehard' was removed some time ago.
* The IEEE 802.15.4 device drivers live in drivers/net/ieee802154/,
not in drivers/ieee802154/.
* The IEEE 802.15.4 MTU is 127 bytes, not 128 bytes.
* Some of the 6LoWPAN code lives in net/6lowpan/.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add the pktgen samples script pktgen_sample02_multiqueue.sh that
demonstrates generating packets on multiqueue NICs.
Specifically notice the options "-t" that specifies how many
kernel threads to activate. Also notice the flag QUEUE_MAP_CPU,
which cause the SKB TX queue to be mapped to the CPU running the
kernel thread. For best scalability people are also encourage to
map NIC IRQ /proc/irq/*/smp_affinity to CPU number.
Usage example with "-t" 4 threads and help:
./pktgen_sample02_multiqueue.sh -i eth4 -m 00:1B:21:3C:9D:F8 -t 4
Usage: ./pktgen_sample02_multiqueue.sh [-vx] -i ethX
-i : ($DEV) output interface/device (required)
-s : ($PKT_SIZE) packet size
-d : ($DEST_IP) destination IP
-m : ($DST_MAC) destination MAC-addr
-t : ($THREADS) threads to start
-c : ($SKB_CLONE) SKB clones send before alloc new SKB
-b : ($BURST) HW level bursting of SKBs
-v : ($VERBOSE) verbose
-x : ($DEBUG) debug
Removing pktgen.conf-2-1 and pktgen.conf-2-2 as these examples
should be covered now.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the first basic pktgen samples script pktgen_sample01_simple.sh,
which demonstrates the a simple use of the helper functions.
Removing pktgen.conf-1-1 as that example should be covered now.
The naming scheme pktgen_sampleNN, where NN is a number, should encourage
reading the samples in a specific order.
Script cause pktgen sending with a single thread and single interface,
and introduce flow variation via random UDP source port.
Usage example and help:
./pktgen_sample01_simple.sh -i eth4 -m 00:1B:21:3C:9D:F8 -d 192.168.8.2
Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
-i : ($DEV) output interface/device (required)
-s : ($PKT_SIZE) packet size
-d : ($DEST_IP) destination IP
-m : ($DST_MAC) destination MAC-addr
-c : ($SKB_CLONE) SKB clones send before alloc new SKB
-v : ($VERBOSE) verbose
-x : ($DEBUG) debug
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pktgen.txt documentation still claimed that adding same device to
multiple threads were not supported, but it have been since 2008 via
commit e6fce5b916 ("pktgen: multiqueue etc.").
Document this and describe the naming scheme dev@X, as the procfile name
still need to be unique.
Fixes: e6fce5b916 ("pktgen: multiqueue etc.")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pktgen.txt documentation over available config options were not complete.
Making the list complete by adding the following.
Pgcontrol commands:
reset
Device commands:
burst
queue_map_min
queue_map_max
skb_priority
tos
traffic_class
node
spi
dst6_max
dst6_min
vlan_cfi
vlan_id
vlan_p
svlan_cfi
svlan_id
svlan_p
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And cleanup some whitespaces in pktgen.txt.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work as a follow-up of commit f7b3bec6f5 ("net: allow setting ecn
via routing table") and adds RFC3168 section 6.1.1.1. fallback for outgoing
ECN connections. In other words, this work adds a retry with a non-ECN
setup SYN packet, as suggested from the RFC on the first timeout:
[...] A host that receives no reply to an ECN-setup SYN within the
normal SYN retransmission timeout interval MAY resend the SYN and
any subsequent SYN retransmissions with CWR and ECE cleared. [...]
Schematic client-side view when assuming the server is in tcp_ecn=2 mode,
that is, Linux default since 2009 via commit 255cac91c3 ("tcp: extend
ECN sysctl to allow server-side only ECN"):
1) Normal ECN-capable path:
SYN ECE CWR ----->
<----- SYN ACK ECE
ACK ----->
2) Path with broken middlebox, when client has fallback:
SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
SYN ----->
<----- SYN ACK
ACK ----->
In case we would not have the fallback implemented, the middlebox drop
point would basically end up as:
SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
SYN ECE CWR ----X crappy middlebox drops packet
(timeout, rtx)
In any case, it's rather a smaller percentage of sites where there would
occur such additional setup latency: it was found in end of 2014 that ~56%
of IPv4 and 65% of IPv6 servers of Alexa 1 million list would negotiate
ECN (aka tcp_ecn=2 default), 0.42% of these webservers will fail to connect
when trying to negotiate with ECN (tcp_ecn=1) due to timeouts, which the
fallback would mitigate with a slight latency trade-off. Recent related
paper on this topic:
Brian Trammell, Mirja Kühlewind, Damiano Boppart, Iain Learmonth,
Gorry Fairhurst, and Richard Scheffenegger:
"Enabling Internet-Wide Deployment of Explicit Congestion Notification."
Proc. PAM 2015, New York.
http://ecn.ethz.ch/ecn-pam15.pdf
Thus, when net.ipv4.tcp_ecn=1 is being set, the patch will perform RFC3168,
section 6.1.1.1. fallback on timeout. For users explicitly not wanting this
which can be in DC use case, we add a net.ipv4.tcp_ecn_fallback knob that
allows for disabling the fallback.
tp->ecn_flags are not being cleared in tcp_ecn_clear_syn() on output, but
rather we let tcp_ecn_rcv_synack() take that over on input path in case a
SYN ACK ECE was delayed. Thus a spurious SYN retransmission will not prevent
ECN being negotiated eventually in that case.
Reference: https://www.ietf.org/proceedings/92/slides/slides-92-iccrg-1.pdf
Reference: https://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
Signed-off-by: Brian Trammell <trammell@tik.ee.ethz.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Dave That <dave.taht@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Seems all we want here is to avoid endless 'goto reclassify' loop.
tc_classify_compat even resets this counter when something other
than TC_ACT_RECLASSIFY is returned, so this skb-counter doesn't
break hypothetical loops induced by something other than perpetual
TC_ACT_RECLASSIFY return values.
skb_act_clone is now identical to skb_clone, so just use that.
Tested with following (bogus) filter:
tc filter add dev eth0 parent ffff: \
protocol ip u32 match u32 0 0 police rate 10Kbit burst \
64000 mtu 1500 action reclassify
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were a few review comments on the switchdev.txt documentation that
didn't get included with the Spring Cleanup series, so include them now.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Much need updated of switchdev documentation to cover what's been
implmented to-date. There are some XXX comments in the text for
unimplemented or broken items. I'd like to keep these in there (poor-man's
TODO list) and update the document once each issue is resolved.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port key has three components - user-key, speed-part, and duplex-part.
The LSBit is for the duplex-part, next 5 bits are for the speed while the
remaining 10 bits are the user defined key bits. Get these 10 bits
from the user-space (through the SysFs interface) and use it to form the
admin port-key. Allowed range for the user-key is 0 - 1023 (10 bits). If
it is not provided then use zero for the user-key-bits (default).
It can set using following example code -
# modprobe bonding mode=4
# usr_port_key=$(( RANDOM & 0x3FF ))
# echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key
# echo +eth1 > /sys/class/net/bond0/bonding/slaves
...
# ip link set bond0 up
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
[jt: * fixed up style issues reported by checkpatch
* fixed up context from change in ad_actor_sys_prio patch]
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In an AD system, the communication between actor and partner is the
business between these two entities. In the current setup anyone on the
same L2 can "guess" the LACPDU contents and then possibly send the
spoofed LACPDUs and trick the partner causing connectivity issues for
the AD system. This patch allows to use a random mac-address obscuring
it's identity making it harder for someone in the L2 is do the same thing.
This patch allows user-space to choose the mac-address for the AD-system.
This mac-address can not be NULL or a Multicast. If the mac-address is set
from user-space; kernel will honor it and will not overwrite it. In the
absence (value from user space); the logic will default to using the
masters' mac as the mac-address for the AD-system.
It can be set using example code below -
# modprobe bonding mode=4
# sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \
$(( (RANDOM & 0xFE) | 0x02 )) \
$(( RANDOM & 0xFF )) \
$(( RANDOM & 0xFF )) \
$(( RANDOM & 0xFF )) \
$(( RANDOM & 0xFF )) \
$(( RANDOM & 0xFF )))
# echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system
# echo +eth1 > /sys/class/net/bond0/bonding/slaves
...
# ip link set bond0 up
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
[jt: fixed up style issues reported by checkpatch]
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows user to randomize the system-priority in an ad-system.
The allowed range is 1 - 0xFFFF while default value is 0xFFFF. If user
does not specify this value, the system defaults to 0xFFFF, which is
what it was before this patch.
Following example code could set the value -
# modprobe bonding mode=4
# sys_prio=$(( 1 + RANDOM + RANDOM ))
# echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio
# echo +eth1 > /sys/class/net/bond0/bonding/slaves
...
# ip link set bond0 up
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
[jt: * fixed up style issues reported by checkpatch
* changed how the default value is set in bond_check_params(), this
makes the default consistent between what gets set for a new bond
and what the default is claimed to be in the bonding options.]
Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce xmit_mode 'netif_receive' for pktgen which generates the
packets using familiar pktgen commands, but feeds them into
netif_receive_skb() instead of ndo_start_xmit().
Default mode is called 'start_xmit'.
It is designed to test netif_receive_skb and ingress qdisc
performace only. Make sure to understand how it works before
using it for other rx benchmarking.
Sample script 'pktgen.sh':
\#!/bin/bash
function pgset() {
local result
echo $1 > $PGDEV
result=`cat $PGDEV | fgrep "Result: OK:"`
if [ "$result" = "" ]; then
cat $PGDEV | fgrep Result:
fi
}
[ -z "$1" ] && echo "Usage: $0 DEV" && exit 1
ETH=$1
PGDEV=/proc/net/pktgen/kpktgend_0
pgset "rem_device_all"
pgset "add_device $ETH"
PGDEV=/proc/net/pktgen/$ETH
pgset "xmit_mode netif_receive"
pgset "pkt_size 60"
pgset "dst 198.18.0.1"
pgset "dst_mac 90:e2:ba:ff:ff:ff"
pgset "count 10000000"
pgset "burst 32"
PGDEV=/proc/net/pktgen/pgctrl
echo "Running... ctrl^C to stop"
pgset "start"
echo "Done"
cat /proc/net/pktgen/$ETH
Usage:
$ sudo ./pktgen.sh eth2
...
Result: OK: 232376(c232372+d3) usec, 10000000 (60byte,0frags)
43033682pps 20656Mb/sec (20656167360bps) errors: 10000000
Raw netif_receive_skb speed should be ~43 million packet
per second on 3.7Ghz x86 and 'perf report' should look like:
37.69% kpktgend_0 [kernel.vmlinux] [k] __netif_receive_skb_core
25.81% kpktgend_0 [kernel.vmlinux] [k] kfree_skb
7.22% kpktgend_0 [kernel.vmlinux] [k] ip_rcv
5.68% kpktgend_0 [pktgen] [k] pktgen_thread_worker
If fib_table_lookup is seen on top, it means skb was processed
by the stack. To benchmark netif_receive_skb only make sure
that 'dst_mac' of your pktgen script is different from
receiving device mac and it will be dropped by ip_rcv
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow flag NO_TIMESTAMP to turn timestamping on again, like other flags,
with a negation of the flag like !NO_TIMESTAMP.
Also document the option flag NO_TIMESTAMP.
Fixes: afb84b6261 ("pktgen: add flag NO_TIMESTAMP to disable timestamping")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current definition of struct can_frame has a 16-byte size, with 8-byte
alignment, but the 3 bytes of padding are not explicit like the similar 2 bytes
of padding of struct canfd_frame. Make it explicit so it is easier to read.
Signed-off-by: Shawn Landden <shawn@churchofgit.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for flow label manager, 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit, it does not affect receive. This range split
can be disbaled by systcl.
Background:
IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to to define new
encaps in UDP just for getting ECMP.
Unfortunately, the initial specfications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard how to this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one. Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.
This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change so we need to consider compatibility with
existing deployment. The stateful range is chosen to be the lower
values in hopes that most uses would have chosen small numbers.
Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not used.
pedit sets TC_MUNGED when packet content was altered, but all the core
does is unset MUNGED again and then set OK2MUNGE.
And the latter isn't tested anywhere. So lets remove both
TC_MUNGED and TC_OK2MUNGE.
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 567e4b7973 ("net: rfs: add hash collision detection") had one
mistake :
RPS_NO_CPU is no longer the marker for invalid cpu in set_rps_cpu()
and get_rps_cpu(), as @next_cpu was the result of an AND with
rps_cpu_mask
This bug showed up on a host with 72 cpus :
next_cpu was 0x7f, and the code was trying to access percpu data of an
non existent cpu.
In a follow up patch, we might get rid of compares against nr_cpu_ids,
if we init the tables with 0. This is silly to test for a very unlikely
condition that exists only shortly after table initialization, as
we got rid of rps_reset_sock_flow() and similar functions that were
writing this RPS_NO_CPU magic value at flow dismantle : When table is
old enough, it never contains this value anymore.
Fixes: 567e4b7973 ("net: rfs: add hash collision detection")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not allow forward labeled traffic from untrusted neighbours. This
is achieved by allowing a per-device configuration of whether MPLS
traffic input from that interface should be processed or not.
To be secure by default, the default state is changed to MPLS being
disabled on all interfaces unless explicitly enabled and no global
option is provided to change the default. Whilst this differs from
other protocols (e.g. IPv6), network operators are used to explicitly
enabling MPLS forwarding on interfaces, and with the number of links
to the MPLS core typically fairly low this doesn't present too much of
a burden on operators.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dwmac-socfpga.c conflict was a case of a bug fix overlapping
changes in net-next to handle an error pointer differently.
Signed-off-by: David S. Miller <davem@davemloft.net>
Move documentation into this century, even if this device hasn't
been available for some time.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The MTU values in the documentation do not match the source.
The source has frame limit of IXGBE_MAX_JUMBO_FRAME_SIZE (9728)
which is MTU of 9710 because of the accounting for Ethernet header
and CRC.
Also, don't refer to the obsolete ifconfig command.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ifconfig command is obsolete, best to remove all references so that
new users learn ip.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
AF_RDS, PF_RDS and SOL_RDS are available in header files,
and there is no need to get their values from /proc. Document
this correctly.
Fixes: 0c5f9b8830 ("RDS: Documentation")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CAN_RAW socket can set multiple CAN identifier specific filters that lead
to multiple filters in the af_can.c filter processing. These filters are
indenpendent from each other which leads to logical OR'ed filters when applied.
This socket option joines the given CAN filters in the way that only CAN frames
are passed to user space that matched *all* given CAN filters. The semantic for
the applied filters is therefore changed to a logical AND.
This is useful especially when the filterset is a combination of filters where
the CAN_INV_FILTER flag is set in order to notch single CAN IDs or CAN ID
ranges from the incoming traffic.
As the raw_rcv() function is executed from NET_RX softirq the introduced
variables are implemented as per-CPU variables to avoid extensive locking at
CAN frame reception time.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If vlan offloading takes place then vlan header is removed from frame
and its contents, both vlan_tci and vlan_proto, is available to user
space via TPACKET interface. However, only vlan_tci can be used in BPF
filters.
This commit introduces a new BPF extension. It makes possible to load
the value of vlan_proto (vlan TPID) to register A. Support for classic
BPF and eBPF is being added, analogous to skb->protocol.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce TP_STATUS_CSUM_VALID tp_status flag to tell the
af_packet user that at least the transport header checksum
has been already validated.
For now, the flag may be set for incoming packets only.
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a tx_maxrate attribute to the tx queue sysfs entry allowing
for max-rate limiting. Along with DCB-ETS and BQL this provides another
knob to tune queue performance. The limit units are Mbps.
By default it is disabled. To disable the rate limitation after it
has been set for a queue, it should be set to zero.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove reference to obsolete ifconfig command.
MTU can be changed with ip command instead.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Namely tcp_probe_interval to control how often to restart
a probe. And tcp_probe_threshold to control when stop the
probing in respect to the width of search range in bytes
Signed-off-by: Fan Du <fan.du@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This sysctl gives two benefits. By defaulting the table size to 0
mpls even when compiled in and enabled defaults to not forwarding
any packets. This prevents unpleasant surprises for users.
The other benefit is that as mpls labels are allocated locally a dense
table a small dense label table may be used which saves memory and
is extremely simple and efficient to implement.
This sysctl allows userspace to choose the restrictions on the label
table size userspace applications need to cope with.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
A small batch with accumulated updates in nf-next, mostly IPVS updates,
they are:
1) Add 64-bits stats counters to IPVS, from Julian Anastasov.
2) Move NETFILTER_XT_MATCH_ADDRTYPE out of NETFILTER_ADVANCED as docker
seem to require this, from Anton Blanchard.
3) Use boolean instead of numeric value in set_match_v*(), from
coccinelle via Fengguang Wu.
4) Allows rescheduling of new connections in IPVS when port reuse is
detected, from Marcelo Ricardo Leitner.
5) Add missing bits to support arptables extensions from nft_compat,
from Arturo Borrero.
Patrick is preparing a large batch to enhance the set infrastructure,
named expressions among other things, that should follow up soon after
this batch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when TCP/SCTP port reusing happens, IPVS will find the old
entry and use it for the new one, behaving like a forced persistence.
But if you consider a cluster with a heavy load of small connections,
such reuse will happen often and may lead to a not optimal load
balancing and might prevent a new node from getting a fair load.
This patch introduces a new sysctl, conn_reuse_mode, that allows
controlling how to proceed when port reuse is detected. The default
value will allow rescheduling of new connections only if the old entry
was in TIME_WAIT state for TCP or CLOSED for SCTP.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Drop the '.o' suffix so this text properly covers both the
built-in and modular cases.
'insmod pktgen' obviously won't work; the command should be modprobe.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are Robert Olsson's samples which used to be available from
<ftp://robur.slu.se/pub/Linux/net-development/pktgen-testing/examples/>
but currently are not.
Change the documentation to refer to these consistently as 'sample
scripts', matching the directory name used here.
Cc: Robert Olsson <robert@herjulf.se>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanks to Rob Jones for suggesting some of the changes.
Cc: Rob Jones <rob.jones@codethink.co.uk>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This has been updated quite a few times since 2004, and git can
keep track of the actual date for us.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Highlights this time around include:
- A thrashing of SubmittingPatches to bring it out of the "send everything
to Linus" era of kernel development.
- A new document on completions from Nicholas McGuire
- Lots of typo fixes, formatting improvements, corrections, build fixes,
and more.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU2QvPAAoJEI3ONVYwIuV6UdAP/iFa3yK0jTMFJV49K2PhVPiW
9AxlcDT3mKCmGrCxS2ST84Kyvxi+bn5k6pQbOhXqPTLRzlWj7o+9zG76yCvhI1/0
mSUc/DoEZSlxQCAi4RH6lNFBtyopwOF1Fy2hqP8Wj7fOu2OxJ+DbTFJ7Cjy2Ybnq
REFT+28mD3GwBYv1r6mirbXsxBGUWK4avqUl9TPkC1AgVoksZ/aLrDffdmixL6Ul
cNBGGn3/mQhDCd6QgrHfdSe3BmL/vpiO5nr7BnSGe/hg65tUgi5d5s9tyP1eRkYo
d+3Ityz24ISCW6kAGH9udNzdtw2QA7vxdVIcz01RgxHQcYdcTyE7ZH28+A+aEtxm
ANkOO5PvaQJrCdPATOuVvwTPmK+xjdW7TmmPiJZ3QpVuPkFlUDP+amOgOajBDQ2l
UDHE2GfrUcjUG4FSUGXLVKXVAXuqLG8DG1rMSEf1utb80jxcuhK2kIrhjfRi4gle
gL3PKd7SfhMowm3QbaMxdiy0RpNK+IlJpiFsDFWUJwQCJvCtxlwL/RalfGxitjqs
RxJo+uPxFLrmsnM8fdw5a/82R0T/nclvnzniq5PoZROOJ6VzKvwn3oS9d63bliPI
JQTfNpbfYq8sRlrPlp+XETrrcbmmYyxgqvurLobNXb74cdJHEzHjqf9OG++NKtQQ
Jre067cSnkSrHAHtxNns
=r07E
-----END PGP SIGNATURE-----
Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6
Pull documentation updates from Jonathan Corbet:
"Highlights this time around include:
- A thrashing of SubmittingPatches to bring it out of the "send
everything to Linus" era of kernel development.
- A new document on completions from Nicholas McGuire
- Lots of typo fixes, formatting improvements, corrections, build
fixes, and more"
* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (35 commits)
Documentation: Fix the wrong command `echo -1 > set_ftrace_pid` for cleaning the filter.
can-doc: Fixed a wrong filepath in can.txt
Documentation: Fix trivial typo in comment.
kgdb,docs: Fix typo and minor style issues
Documentation: add description for FTRACE probe status
doc: brief user documentation for completion
Documentation/misc-devices/mei: Fix indentation of embedded code.
Documentation/misc-devices/mei: Fix indentation of enumeration.
Documentation/misc-devices/mei: Fix spacing around parentheses.
Documentation/misc-devices/mei: Fix formatting of headings.
Documentation: devicetree: Fix double words in Doumentation/devicetree
Documentation: mm: Fix typo in vm.txt
lockstat: Add documentation on contention and contenting points
Documentation: fix blackfin gptimers-example build errors
Fixes column alignment in table of contents entry 1.9 in Documentation/filesystems/proc.txt
CodingStyle: enable emacs display of trailing whitespace
DocBook: Do not exceed argument list limit
gpio: board.txt: Fix the gpio name example
Documentation/SubmittingPatches: unify whitespace/tabs for the DCO
MAINTAINERS: Add the docs-next git tree to the maintainer entry
...
Helpers for mitigating ACK loops by rate-limiting dupacks sent in
response to incoming out-of-window packets.
This patch includes:
- rate-limiting logic
- sysctl to control how often we allow dupacks to out-of-window packets
- SNMP counter for cases where we rate-limited our dupack sending
The rate-limiting logic in this patch decides to not send dupacks in
response to out-of-window segments if (a) they are SYNs or pure ACKs
and (b) the remote endpoint is sending them faster than the configured
rate limit.
We rate-limit our responses rather than blocking them entirely or
resetting the connection, because legitimate connections can rely on
dupacks in response to some out-of-window segments. For example, zero
window probes are typically sent with a sequence number that is below
the current window, and ZWPs thus expect to thus elicit a dupack in
response.
We allow dupacks in response to TCP segments with data, because these
may be spurious retransmissions for which the remote endpoint wants to
receive DSACKs. This is safe because segments with data can't
realistically be part of ACK loops, which by their nature consist of
each side sending pure/data-less ACKs to each other.
The dupack interval is controlled by a new sysctl knob,
tcp_invalid_ratelimit, given in milliseconds, in case an administrator
needs to dial this upward in the face of a high-rate DoS attack. The
name and units are chosen to be analogous to the existing analogous
knob for ICMP, icmp_ratelimit.
The default value for tcp_invalid_ratelimit is 500ms, which allows at
most one such dupack per 500ms. This is chosen to be 2x faster than
the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
2.4). We allow the extra 2x factor because network delay variations
can cause packets sent at 1 second intervals to be compressed and
arrive much closer.
Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/vxlan.c
drivers/vhost/net.c
include/linux/if_vlan.h
net/core/dev.c
The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.
In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.
In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.
In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.
Signed-off-by: David S. Miller <davem@davemloft.net>
<linux/can/error.h> moved in the big UAPI shuffle; update the document to
note its new location.
Signed-off-by: Stefan Tatschner <stefan@sevenbyte.org>
[jc: added changelog]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Update netlink_mmap.txt wrt. commit 4682a03586
("netlink: Always copy on mmap TX.").
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
Demonstrate how SOF_TIMESTAMPING_OPT_TSONLY can be used and
test the implementation.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
timestamps, this loops timestamps on top of empty packets.
Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
cmsg reception (aside from timestamps) are no longer possible. This
works together with a follow on patch that allows administrators to
only allow tx timestamping if it does not loop payload or metadata.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Changes (rfc -> v1)
- add documentation
- remove unnecessary skb->len test (thanks to Richard Cochran)
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, flows were manipulated by userspace specifying a full,
unmasked flow key. This adds significant burden onto flow
serialization/deserialization, particularly when dumping flows.
This patch adds an alternative way to refer to flows using a
variable-length "unique flow identifier" (UFID). At flow setup time,
userspace may specify a UFID for a flow, which is stored with the flow
and inserted into a separate table for lookup, in addition to the
standard flow table. Flows created using a UFID must be fetched or
deleted using the UFID.
All flow dump operations may now be made more terse with OVS_UFID_F_*
flags. For example, the OVS_UFID_F_OMIT_KEY flag allows responses to
omit the flow key from a datapath operation if the flow has a
corresponding UFID. This significantly reduces the time spent assembling
and transacting netlink messages. With all OVS_UFID_F_OMIT_* flags
enabled, the datapath only returns the UFID and statistics for each flow
during flow dump, increasing ovs-vswitchd revalidator performance by 40%
or more.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current. This
behavior is undesirable when the user space is managing the MTU. Instead
a sysctl flag 'accept_ra_mtu' is introduced such that the user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter updates for net-next
The following patchset contains netfilter updates for net-next, just a
bunch of cleanups and small enhancement to selectively flush conntracks
in ctnetlink, more specifically the patches are:
1) Rise default number of buckets in conntrack from 16384 to 65536 in
systems with >= 4GBytes, patch from Marcelo Leitner.
2) Small refactor to save one level on indentation in xt_osf, from
Joe Perches.
3) Remove unnecessary sizeof(char) in nf_log, from Fabian Frederick.
4) Another small cleanup to remove redundant variable in nfnetlink,
from Duan Jiong.
5) Fix compilation warning in nfnetlink_cthelper on parisc, from
Chen Gang.
6) Fix wrong format in debugging for ctseqadj, from Gao feng.
7) Selective conntrack flushing through the mark for ctnetlink, patch
from Kristian Evensen.
8) Remove nf_ct_conntrack_flush_report() exported symbol now that is
not required anymore after the selective flushing patch, again from
Kristian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/xen-netfront.c
Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.
Signed-off-by: David S. Miller <davem@davemloft.net>
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update documentation to reflect the fact that
/proc/sys/net/ipv4/route/max_size is no longer used for ipv4.
Signed-off-by: Ani Sinha <ani@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A fix to ipv6 structure definitions removed the now superfluous
definition of in6_pktinfo in this file.
But, use of the glibc definition requires defining _GNU_SOURCE
(see also https://sourceware.org/bugzilla/show_bug.cgi?id=6775).
Before this change, the following would fail for me:
make
make headers_install
make M=Documentation/networking/timestamping
with
Documentation/networking/timestamping/txtimestamp.c: In function '__recv_errmsg_cmsg':
Documentation/networking/timestamping/txtimestamp.c:205:33: error: dereferencing pointer to incomplete type
Documentation/networking/timestamping/txtimestamp.c:206:23: error: dereferencing pointer to incomplete type
After this patch compilation succeeded.
Fixes: cd91cc5bdd ("doc: fix the compile error of txtimestamp.c")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vinson reported:
HOSTCC Documentation/networking/timestamping/txtimestamp
Documentation/networking/timestamping/txtimestamp.c:64:8: error:
redefinition of ‘struct in6_pktinfo’
struct in6_pktinfo {
^
In file included from /usr/include/arpa/inet.h:23:0,
from Documentation/networking/timestamping/txtimestamp.c:33:
/usr/include/netinet/in.h:456:8: note: originally defined here
struct in6_pktinfo
^
After we sync with libc header, we don't need this ugly hack any more.
Reported-by: Vinson Lee <vlee@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- altera_tse.txt was added by 04add4ab (Add Altera Ethernet (TSE)
Documentation)
- cdc_mbim.txt was added by a563babe (cdc_mbim: add driver
documentation)
- dctcp.txt was added by e3118e83 (tcp: add DCTCP congestion control
algorithm)
CC: Jonathan Corbet <corbet@lwn.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: linux-doc@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Manually bumping either nf_conntrack_buckets or nf_conntrack_max has
become a common task as our Linux servers tend to serve more and more
clients/applications, so let's adjust nf_conntrack_buckets this to a
more updated value.
Now for systems with more than 4GB of memory, nf_conntrack_buckets
becomes 65536 instead of 16384, resulting in nf_conntrack_max=256k
entries.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fix the typo, there should be "It".
On the other hand, fix whitespace errors detected by checkpatch.pl
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the erronous usage of an hexadecimal address in the
example, by replacing it with a decimal address.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Documentation:
expand explanation of timestamp counter
Test:
new: flag -I requests and prints PKTINFO
new: flag -x prints payload (possibly truncated)
fix: remove pretty print that breaks common flag '-l 1'
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.
on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.
on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. only change is to set ifindex, which is
not initialized for all error origins.
In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.
The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:
ipv4: if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6: if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {
At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Changes
v1 -> v2
large rewrite
- integrate with existing pktinfo cmsg generation code
- on ipv4: only send with new flag, to maintain legacy behavior
- on ipv6: send at most a single pktinfo cmsg
- on ipv6: initialize fields if not yet initialized
The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches
1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
(http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
cmsg from users without CAP_NET_RAW.
Signed-off-by: David S. Miller <davem@davemloft.net>
The goal of this is to provide a possibility to support various switch
chips. Drivers should implement relevant ndos to do so. Now there is
only one ndo defined:
- for getting physical switch id is in place.
Note that user can use random port netdevice to access the switch.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SOF_TIMESTAMPING_OPT_ID puts the id in ee_data, not ee_info.
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver is very similar to the macvlan driver except that it
uses L3 on the frame to determine the logical interface while
functioning as packet dispatcher. It inherits L2 of the master
device hence the packets on wire will have the same L2 for all
the packets originating from all virtual devices off of the same
master device.
This driver was developed keeping the namespace use-case in
mind. Hence most of the examples given here take that as the
base setup where main-device belongs to the default-ns and
virtual devices are assigned to the additional namespaces.
The device operates in two different modes and the difference
in these two modes in primarily in the TX side.
(a) L2 mode : In this mode, the device behaves as a L2 device.
TX processing upto L2 happens on the stack of the virtual device
associated with (namespace). Packets are switched after that
into the main device (default-ns) and queued for xmit.
RX processing is simple and all multicast, broadcast (if
applicable), and unicast belonging to the address(es) are
delivered to the virtual devices.
(b) L3 mode : In this mode, the device behaves like a L3 device.
TX processing upto L3 happens on the stack of the virtual device
associated with (namespace). Packets are switched to the
main-device (default-ns) for the L2 processing. Hence the routing
table of the default-ns will be used in this mode.
RX processins is somewhat similar to the L2 mode except that in
this mode only Unicast packets are delivered to the virtual device
while main-dev will handle all other packets.
The devices can be added using the "ip" command from the iproute2
package -
ip link add link <master> <virtual> type ipvlan mode [ l2 | l3 ]
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Tim Hockin <thockin@google.com>
Cc: Brandon Philips <brandon.philips@coreos.com>
Cc: Pavel Emelianov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently many changes have been done inside the driver
so this patch updates the driver's doc for example reviewing
information for the rx and tx processes that are managed
by napi method, adding new information for missing glue-logic files
etc.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was initially sent by Lorenzo Colitti, but was subsequently
lost in the final diff he submitted.
Signed-off-by: Loganaden Velvindron <logan@elandsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a sysctl that causes an interface's optimistic addresses
to be considered equivalent to other non-deprecated addresses
for source address selection purposes. Preferred addresses
will still take precedence over optimistic addresses, subject
to other ranking in the source address selection algorithm.
This is useful where different interfaces are connected to
different networks from different ISPs (e.g., a cell network
and a home wifi network).
The current behaviour complies with RFC 3484/6724, and it
makes sense if the host has only one interface, or has
multiple interfaces on the same network (same or cooperating
administrative domain(s), but not in the multiple distinct
networks case.
For example, if a mobile device has an IPv6 address on an LTE
network and then connects to IPv6-enabled wifi, while the wifi
IPv6 address is undergoing DAD, IPv6 connections will try use
the wifi default route with the LTE IPv6 address, and will get
stuck until they time out.
Also, because optimistic nodes can receive frames, issue
an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC
flag appropriately set). A second RTM_NEWADDR is sent if DAD
completes (the address flags have changed), otherwise an
RTM_DELADDR is sent.
Also: add an entry in ip-sysctl.txt for optimistic_dad.
Signed-off-by: Erik Kline <ek@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.
Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.
Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.
[1] Aggregation of links, per packet load balancing, fabrics not doing
deep packet inspections, alternative TCP congestion modules...
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__sk_run_filter has been renamed as __bpf_prog_run, so replace them in comments
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
"Most notable changes in here:
1) By far the biggest accomplishment, thanks to a large range of
contributors, is the addition of multi-send for transmit. This is
the result of discussions back in Chicago, and the hard work of
several individuals.
Now, when the ->ndo_start_xmit() method of a driver sees
skb->xmit_more as true, it can choose to defer the doorbell
telling the driver to start processing the new TX queue entires.
skb->xmit_more means that the generic networking is guaranteed to
call the driver immediately with another SKB to send.
There is logic added to the qdisc layer to dequeue multiple
packets at a time, and the handling mis-predicted offloads in
software is now done with no locks held.
Finally, pktgen is extended to have a "burst" parameter that can
be used to test a multi-send implementation.
Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
virtio_net
Adding support is almost trivial, so export more drivers to
support this optimization soon.
I want to thank, in no particular or implied order, Jesper
Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
David Tat, Hannes Frederic Sowa, and Rusty Russell.
2) PTP and timestamping support in bnx2x, from Michal Kalderon.
3) Allow adjusting the rx_copybreak threshold for a driver via
ethtool, and add rx_copybreak support to enic driver. From
Govindarajulu Varadarajan.
4) Significant enhancements to the generic PHY layer and the bcm7xxx
driver in particular (EEE support, auto power down, etc.) from
Florian Fainelli.
5) Allow raw buffers to be used for flow dissection, allowing drivers
to determine the optimal "linear pull" size for devices that DMA
into pools of pages. The objective is to get exactly the
necessary amount of headers into the linear SKB area pre-pulled,
but no more. The new interface drivers use is eth_get_headlen().
From WANG Cong, with driver conversions (several had their own
by-hand duplicated implementations) by Alexander Duyck and Eric
Dumazet.
6) Support checksumming more smoothly and efficiently for
encapsulations, and add "foo over UDP" facility. From Tom
Herbert.
7) Add Broadcom SF2 switch driver to DSA layer, from Florian
Fainelli.
8) eBPF now can load programs via a system call and has an extensive
testsuite. Alexei Starovoitov and Daniel Borkmann.
9) Major overhaul of the packet scheduler to use RCU in several major
areas such as the classifiers and rate estimators. From John
Fastabend.
10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
Duyck.
11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
Dumazet.
12) Add Datacenter TCP congestion control algorithm support, From
Florian Westphal.
13) Reorganize sk_buff so that __copy_skb_header() is significantly
faster. From Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
netlabel: directly return netlbl_unlabel_genl_init()
net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
net: description of dma_cookie cause make xmldocs warning
cxgb4: clean up a type issue
cxgb4: potential shift wrapping bug
i40e: skb->xmit_more support
net: fs_enet: Add NAPI TX
net: fs_enet: Remove non NAPI RX
r8169:add support for RTL8168EP
net_sched: copy exts->type in tcf_exts_change()
wimax: convert printk to pr_foo()
af_unix: remove 0 assignment on static
ipv6: Do not warn for informational ICMP messages, regardless of type.
Update Intel Ethernet Driver maintainers list
bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
tipc: fix bug in multicast congestion handling
net: better IFF_XMIT_DST_RELEASE support
net/mlx4_en: remove NETDEV_TX_BUSY
3c59x: fix bad split of cpu_to_le32(pci_map_single())
net: bcmgenet: fix Tx ring priority programming
...
- eBPF JIT compiler for arm64
- CPU suspend backend for PSCI (firmware interface) with standard idle
states defined in DT (generic idle driver to be merged via a different
tree)
- Support for CONFIG_DEBUG_SET_MODULE_RONX
- Support for unmapped cpu-release-addr (outside kernel linear mapping)
- set_arch_dma_coherent_ops() implemented and bus notifiers removed
- EFI_STUB improvements when base of DRAM is occupied
- Typos in KGDB macros
- Clean-up to (partially) allow kernel building with LLVM
- Other clean-ups (extern keyword, phys_addr_t usage)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUNB6NAAoJEGvWsS0AyF7x22sP/1qPQvFoY71fSqTZmSY+kfgW
UMXhDFZOd+khD2TPHWptbgBRDElTQjRPHyISv/8ILKwDNoMlUDLlYkp1XPLM/nlB
ea9ou2GX8iktqgM2JF5r4vk1hjH6JqEGOUHyWKZc7ibphTVm3dhg3nWL1A4peOUG
0UyX79kl8BLAaggLSUhjtUz1GMpSNlb6Pc1ForUXaPMayBlOcVoOzh1ir7b5wb3e
IvotUY1gv+opE9uK0QPr1AJSfpCogPEfQ2TSCP8MQZjxkrEz69n0HaFvdy60rwf4
DaJiqBoQ5MSP3Bw+qvoYgyz+tfiPFAvEF+O3YQ5x3LBTteoooriFYH4mL7DsicAs
2WLor/342mHykE0bOc44/gNl8B/xaZNzvO2ezLYrjVGsiY2QHTZ7fXB8arPUvQSS
RUXVfHmcv4qthZjI17rgreBKvsfeFIMighSfvMJnVhGqDSvB8abjiPwZjzqB91Bq
pu5MDitNgR3k3ctwzRaS6JtH2CluVFv97xIS4VaD/hm3JnS5NPeTXFou3Gb3lvon
d/wXOIB3vY8FDMIt+BMCQPzWiU0liZ/sN7p1bsOmkgZ1wLOZ0nmsaHF09PDRGbtA
vifopwaw9qtNlcVrTB/rDBCDaT0Ds/mTYD/a3+ch5CYUeLmQmfW/vBMfq/3gUt65
JdI/nTVXawbl2CpBWw36
=SAfQ
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- eBPF JIT compiler for arm64
- CPU suspend backend for PSCI (firmware interface) with standard idle
states defined in DT (generic idle driver to be merged via a
different tree)
- Support for CONFIG_DEBUG_SET_MODULE_RONX
- Support for unmapped cpu-release-addr (outside kernel linear mapping)
- set_arch_dma_coherent_ops() implemented and bus notifiers removed
- EFI_STUB improvements when base of DRAM is occupied
- Typos in KGDB macros
- Clean-up to (partially) allow kernel building with LLVM
- Other clean-ups (extern keyword, phys_addr_t usage)
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (51 commits)
arm64: Remove unneeded extern keyword
ARM64: make of_device_ids const
arm64: Use phys_addr_t type for physical address
aarch64: filter $x from kallsyms
arm64: Use DMA_ERROR_CODE to denote failed allocation
arm64: Fix typos in KGDB macros
arm64: insn: Add return statements after BUG_ON()
arm64: debug: don't re-enable debug exceptions on return from el1_dbg
Revert "arm64: dmi: Add SMBIOS/DMI support"
arm64: Implement set_arch_dma_coherent_ops() to replace bus notifiers
of: amba: use of_dma_configure for AMBA devices
arm64: dmi: Add SMBIOS/DMI support
arm64: Correct ftrace calls to aarch64_insn_gen_branch_imm()
arm64:mm: initialize max_mapnr using function set_max_mapnr
setup: Move unmask of async interrupts after possible earlycon setup
arm64: LLVMLinux: Fix inline arm64 assembly for use with clang
arm64: pageattr: Correctly adjust unaligned start addresses
net: bpf: arm64: fix module memory leak when JIT image build fails
arm64: add PSCI CPU_SUSPEND based cpu_suspend support
arm64: kernel: introduce cpu_init_idle CPU operation
...
Pull documentation updates from Jiri Kosina:
"Updates to kernel documentation.
I took this over (hopefully temporarily) from Randy who was not
willing to maintain it any longer. This pile mostly is a relay of
queue that Randy already had in his tree"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/doc:
Documentation: fix broken v4l-utils URL
Documentation: update include path for mpssd
Documentation: correct parameter error for dma_mapping_error
MAINTAINERS: update location of linux-doc tree
Documentation: remove networking/.gitignore
tools: add more endian.h macros
Make Documenation depend on headers_install
Docs: this_cpu_ops: remove redundant add forms
Documentation: disable vdso_test to avoid breakage with old glibc
Documentation: update vDSO makefile to build portable examples
Documentation: update .gitignore files
Documentation: support glibc versions without htole macros
v4l2-pci-skeleton: Only build if PCI is available
Documentation: fix misc. warnings
Documentation: make functions static to avoid prototype warnings
Documentation: add makefiles for more targets
Documentation: use subdir-y to avoid unnecessary built-in.o files
1/ Step down as dmaengine maintainer see commit 08223d80df "dmaengine
maintainer update"
2/ Removal of net_dma, as it has been marked 'broken' since 3.13 (commit
7787380336 "net_dma: mark broken"), without reports of performance
regression.
3/ Miscellaneous fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUKDLKAAoJEB7SkWpmfYgC7wwP/iNHqRjf1suMUTBIF3P6Hgbe
VCUwh0IkuujMPDG46WRn6cYzarRxVPLoGaLHLPszgjI6pmGPVv19wqeDOlUxtcmr
0iQWEWv/zqseaAIW+4gj/WYCyMgKil49EUBJKCZCfNmIaad+e0pr8f0uE5yOkHPM
tqWoZERu9A4dlXGr1TjeOZVzdnPrCt92MrLDN6ZZ6tMuJaEc5PauaLxKTeGy5fYj
UB+k1xJQzECbsYfpB+uCVYl5/qPO1rNyuBYS8THCsW+JYmrbbfH2kkF2lo2FaUpO
8Yd50FtzXHKWwAt7BzfIwU2M7x0wRmryrC/xsQi6M+WmVeHYvvHUIpzaA66xRZ5x
fCy3Fu8sEnmnmboAbh2v2c5uTycqRl2xPzbpLAuxglloXIxzi3ckp6ESF/Z4SldH
oxIoEievN7lah3vKgvlHZYcWDzrYr8EKf/EzFe9RqDBQDKtzDzre1H9Uivr387Vm
uFUcGHYG/GXuX47C7EUsMtaSW2UEoR2ytw/HR6CKFPTVXwAzEO6kA9vg0EqL0iIq
2wVLgavlZuwegmaUBgnr+bgVZMvVN7OU7fAIRVe5xNO6itrPKvheSlQthmRiiq9C
uzOu4PS6PexqzHUNPCcJpCsj+lawmCSrE0bxtPzTA/CQInVgWs219V9+W5Gn/0YA
EARN9k6ueX9PZPQrPQLm
=BBBv
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine
Pull dmaengine updates from Dan Williams:
"Even though this has fixes marked for -stable, given the size and the
needed conflict resolutions this is 3.18-rc1/merge-window material.
These patches have been languishing in my tree for a long while. The
fact that I do not have the time to do proper/prompt maintenance of
this tree is a primary factor in the decision to step down as
dmaengine maintainer. That and the fact that the bulk of drivers/dma/
activity is going through Vinod these days.
The net_dma removal has not been in -next. It has developed simple
conflicts against mainline and net-next (for-3.18).
Continuing thanks to Vinod for staying on top of drivers/dma/.
Summary:
1/ Step down as dmaengine maintainer see commit 08223d80df
"dmaengine maintainer update"
2/ Removal of net_dma, as it has been marked 'broken' since 3.13
(commit 7787380336 "net_dma: mark broken"), without reports of
performance regression.
3/ Miscellaneous fixes"
* tag 'dmaengine-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
net: make tcp_cleanup_rbuf private
net_dma: revert 'copied_early'
net_dma: simple removal
dmaengine maintainer update
dmatest: prevent memory leakage on error path in thread
ioat: Use time_before_jiffies()
dmaengine: fix xor sources continuation
dma: mv_xor: Rename __mv_xor_slot_cleanup() to mv_xor_slot_cleanup()
dma: mv_xor: Remove all callers of mv_xor_slot_cleanup()
dma: mv_xor: Remove unneeded mv_xor_clean_completed_slots() call
ioat: Use pci_enable_msix_exact() instead of pci_enable_msix()
drivers: dma: Include appropriate header file in dca.c
drivers: dma: Mark functions as static in dma_v3.c
dma: mv_xor: Add DMA API error checks
ioat/dca: Use dev_is_pci() to check whether it is pci device
This patch demonstrates the effect of delaying update of HW tailptr.
(based on earlier patch by Jesper)
burst=1 is the default. It sends one packet with xmit_more=false
burst=2 sends one packet with xmit_more=true and
2nd copy of the same packet with xmit_more=false
burst=3 sends two copies of the same packet with xmit_more=true and
3rd copy with xmit_more=false
Performance with ixgbe (usec 30):
burst=1 tx:9.2 Mpps
burst=2 tx:13.5 Mpps
burst=3 tx:14.5 Mpps full 10G line rate
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This work adds the DataCenter TCP (DCTCP) congestion control
algorithm [1], which has been first published at SIGCOMM 2010 [2],
resp. follow-up analysis at SIGMETRICS 2011 [3] (and also, more
recently as an informational IETF draft available at [4]).
DCTCP is an enhancement to the TCP congestion control algorithm for
data center networks. Typical data center workloads are i.e.
i) partition/aggregate (queries; bursty, delay sensitive), ii) short
messages e.g. 50KB-1MB (for coordination and control state; delay
sensitive), and iii) large flows e.g. 1MB-100MB (data update;
throughput sensitive). DCTCP has therefore been designed for such
environments to provide/achieve the following three requirements:
* High burst tolerance (incast due to partition/aggregate)
* Low latency (short flows, queries)
* High throughput (continuous data updates, large file
transfers) with commodity, shallow buffered switches
The basic idea of its design consists of two fundamentals: i) on the
switch side, packets are being marked when its internal queue
length > threshold K (K is chosen so that a large enough headroom
for marked traffic is still available in the switch queue); ii) the
sender/host side maintains a moving average of the fraction of marked
packets, so each RTT, F is being updated as follows:
F := X / Y, where X is # of marked ACKs, Y is total # of ACKs
alpha := (1 - g) * alpha + g * F, where g is a smoothing constant
The resulting alpha (iow: probability that switch queue is congested)
is then being used in order to adaptively decrease the congestion
window W:
W := (1 - (alpha / 2)) * W
The means for receiving marked packets resp. marking them on switch
side in DCTCP is the use of ECN.
RFC3168 describes a mechanism for using Explicit Congestion Notification
from the switch for early detection of congestion, rather than waiting
for segment loss to occur.
However, this method only detects the presence of congestion, not
the *extent*. In the presence of mild congestion, it reduces the TCP
congestion window too aggressively and unnecessarily affects the
throughput of long flows [4].
DCTCP, as mentioned, enhances Explicit Congestion Notification (ECN)
processing to estimate the fraction of bytes that encounter congestion,
rather than simply detecting that some congestion has occurred. DCTCP
then scales the TCP congestion window based on this estimate [4],
thus it can derive multibit feedback from the information present in
the single-bit sequence of marks in its control law. And thus act in
*proportion* to the extent of congestion, not its *presence*.
Switches therefore set the Congestion Experienced (CE) codepoint in
packets when internal queue lengths exceed threshold K. Resulting,
DCTCP delivers the same or better throughput than normal TCP, while
using 90% less buffer space.
It was found in [2] that DCTCP enables the applications to handle 10x
the current background traffic, without impacting foreground traffic.
Moreover, a 10x increase in foreground traffic did not cause any
timeouts, and thus largely eliminates TCP incast collapse problems.
The algorithm itself has already seen deployments in large production
data centers since then.
We did a long-term stress-test and analysis in a data center, short
summary of our TCP incast tests with iperf compared to cubic:
This test measured DCTCP throughput and latency and compared it with
CUBIC throughput and latency for an incast scenario. In this test, 19
senders sent at maximum rate to a single receiver. The receiver simply
ran iperf -s.
The senders ran iperf -c <receiver> -t 30. All senders started
simultaneously (using local clocks synchronized by ntp).
This test was repeated multiple times. Below shows the results from a
single test. Other tests are similar. (DCTCP results were extremely
consistent, CUBIC results show some variance induced by the TCP timeouts
that CUBIC encountered.)
For this test, we report statistics on the number of TCP timeouts,
flow throughput, and traffic latency.
1) Timeouts (total over all flows, and per flow summaries):
CUBIC DCTCP
Total 3227 25
Mean 169.842 1.316
Median 183 1
Max 207 5
Min 123 0
Stddev 28.991 1.600
Timeout data is taken by measuring the net change in netstat -s
"other TCP timeouts" reported. As a result, the timeout measurements
above are not restricted to the test traffic, and we believe that it
is likely that all of the "DCTCP timeouts" are actually timeouts for
non-test traffic. We report them nevertheless. CUBIC will also include
some non-test timeouts, but they are drawfed by bona fide test traffic
timeouts for CUBIC. Clearly DCTCP does an excellent job of preventing
TCP timeouts. DCTCP reduces timeouts by at least two orders of
magnitude and may well have eliminated them in this scenario.
2) Throughput (per flow in Mbps):
CUBIC DCTCP
Mean 521.684 521.895
Median 464 523
Max 776 527
Min 403 519
Stddev 105.891 2.601
Fairness 0.962 0.999
Throughput data was simply the average throughput for each flow
reported by iperf. By avoiding TCP timeouts, DCTCP is able to
achieve much better per-flow results. In CUBIC, many flows
experience TCP timeouts which makes flow throughput unpredictable and
unfair. DCTCP, on the other hand, provides very clean predictable
throughput without incurring TCP timeouts. Thus, the standard deviation
of CUBIC throughput is dramatically higher than the standard deviation
of DCTCP throughput.
Mean throughput is nearly identical because even though cubic flows
suffer TCP timeouts, other flows will step in and fill the unused
bandwidth. Note that this test is something of a best case scenario
for incast under CUBIC: it allows other flows to fill in for flows
experiencing a timeout. Under situations where the receiver is issuing
requests and then waiting for all flows to complete, flows cannot fill
in for timed out flows and throughput will drop dramatically.
3) Latency (in ms):
CUBIC DCTCP
Mean 4.0088 0.04219
Median 4.055 0.0395
Max 4.2 0.085
Min 3.32 0.028
Stddev 0.1666 0.01064
Latency for each protocol was computed by running "ping -i 0.2
<receiver>" from a single sender to the receiver during the incast
test. For DCTCP, "ping -Q 0x6 -i 0.2 <receiver>" was used to ensure
that traffic traversed the DCTCP queue and was not dropped when the
queue size was greater than the marking threshold. The summary
statistics above are over all ping metrics measured between the single
sender, receiver pair.
The latency results for this test show a dramatic difference between
CUBIC and DCTCP. CUBIC intentionally overflows the switch buffer
which incurs the maximum queue latency (more buffer memory will lead
to high latency.) DCTCP, on the other hand, deliberately attempts to
keep queue occupancy low. The result is a two orders of magnitude
reduction of latency with DCTCP - even with a switch with relatively
little RAM. Switches with larger amounts of RAM will incur increasing
amounts of latency for CUBIC, but not for DCTCP.
4) Convergence and stability test:
This test measured the time that DCTCP took to fairly redistribute
bandwidth when a new flow commences. It also measured DCTCP's ability
to remain stable at a fair bandwidth distribution. DCTCP is compared
with CUBIC for this test.
At the commencement of this test, a single flow is sending at maximum
rate (near 10 Gbps) to a single receiver. One second after that first
flow commences, a new flow from a distinct server begins sending to
the same receiver as the first flow. After the second flow has sent
data for 10 seconds, the second flow is terminated. The first flow
sends for an additional second. Ideally, the bandwidth would be evenly
shared as soon as the second flow starts, and recover as soon as it
stops.
The results of this test are shown below. Note that the flow bandwidth
for the two flows was measured near the same time, but not
simultaneously.
DCTCP performs nearly perfectly within the measurement limitations
of this test: bandwidth is quickly distributed fairly between the two
flows, remains stable throughout the duration of the test, and
recovers quickly. CUBIC, in contrast, is slow to divide the bandwidth
fairly, and has trouble remaining stable.
CUBIC DCTCP
Seconds Flow 1 Flow 2 Seconds Flow 1 Flow 2
0 9.93 0 0 9.92 0
0.5 9.87 0 0.5 9.86 0
1 8.73 2.25 1 6.46 4.88
1.5 7.29 2.8 1.5 4.9 4.99
2 6.96 3.1 2 4.92 4.94
2.5 6.67 3.34 2.5 4.93 5
3 6.39 3.57 3 4.92 4.99
3.5 6.24 3.75 3.5 4.94 4.74
4 6 3.94 4 5.34 4.71
4.5 5.88 4.09 4.5 4.99 4.97
5 5.27 4.98 5 4.83 5.01
5.5 4.93 5.04 5.5 4.89 4.99
6 4.9 4.99 6 4.92 5.04
6.5 4.93 5.1 6.5 4.91 4.97
7 4.28 5.8 7 4.97 4.97
7.5 4.62 4.91 7.5 4.99 4.82
8 5.05 4.45 8 5.16 4.76
8.5 5.93 4.09 8.5 4.94 4.98
9 5.73 4.2 9 4.92 5.02
9.5 5.62 4.32 9.5 4.87 5.03
10 6.12 3.2 10 4.91 5.01
10.5 6.91 3.11 10.5 4.87 5.04
11 8.48 0 11 8.49 4.94
11.5 9.87 0 11.5 9.9 0
SYN/ACK ECT test:
This test demonstrates the importance of ECT on SYN and SYN-ACK packets
by measuring the connection probability in the presence of competing
flows for a DCTCP connection attempt *without* ECT in the SYN packet.
The test was repeated five times for each number of competing flows.
Competing Flows 1 | 2 | 4 | 8 | 16
------------------------------
Mean Connection Probability 1 | 0.67 | 0.45 | 0.28 | 0
Median Connection Probability 1 | 0.65 | 0.45 | 0.25 | 0
As the number of competing flows moves beyond 1, the connection
probability drops rapidly.
Enabling DCTCP with this patch requires the following steps:
DCTCP must be running both on the sender and receiver side in your
data center, i.e.:
sysctl -w net.ipv4.tcp_congestion_control=dctcp
Also, ECN functionality must be enabled on all switches in your
data center for DCTCP to work. The default ECN marking threshold (K)
heuristic on the switch for DCTCP is e.g., 20 packets (30KB) at
1Gbps, and 65 packets (~100KB) at 10Gbps (K > 1/7 * C * RTT, [4]).
In above tests, for each switch port, traffic was segregated into two
queues. For any packet with a DSCP of 0x01 - or equivalently a TOS of
0x04 - the packet was placed into the DCTCP queue. All other packets
were placed into the default drop-tail queue. For the DCTCP queue,
RED/ECN marking was enabled, here, with a marking threshold of 75 KB.
More details however, we refer you to the paper [2] under section 3).
There are no code changes required to applications running in user
space. DCTCP has been implemented in full *isolation* of the rest of
the TCP code as its own congestion control module, so that it can run
without a need to expose code to the core of the TCP stack, and thus
nothing changes for non-DCTCP users.
Changes in the CA framework code are minimal, and DCTCP algorithm
operates on mechanisms that are already available in most Silicon.
The gain (dctcp_shift_g) is currently a fixed constant (1/16) from
the paper, but we leave the option that it can be chosen carefully
to a different value by the user.
In case DCTCP is being used and ECN support on peer site is off,
DCTCP falls back after 3WHS to operate in normal TCP Reno mode.
ss {-4,-6} -t -i diag interface:
... dctcp wscale:7,7 rto:203 rtt:2.349/0.026 mss:1448 cwnd:2054
ssthresh:1102 ce_state 0 alpha 15 ab_ecn 0 ab_tot 735584
send 10129.2Mbps pacing_rate 20254.1Mbps unacked:1822 retrans:0/15
reordering:101 rcv_space:29200
... dctcp-reno wscale:7,7 rto:201 rtt:0.711/1.327 ato:40 mss:1448
cwnd:10 ssthresh:1102 fallback_mode send 162.9Mbps pacing_rate
325.5Mbps rcv_rtt:1.5 rcv_space:29200
More information about DCTCP can be found in [1-4].
[1] http://simula.stanford.edu/~alizade/Site/DCTCP.html
[2] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf
[3] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf
[4] http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00
Joint work with Florian Westphal and Glenn Judd.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Per commit "77873803363c net_dma: mark broken" net_dma is no longer used
and there is no plan to fix it.
This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards.
Reverting the remainder of the net_dma induced changes is deferred to
subsequent patches.
Marked for stable due to Roman's report of a memory leak in
dma_pin_iovec_pages():
https://lkml.org/lkml/2014/9/3/177
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: David Whipple <whipple@securedatainnovations.ch>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: <stable@vger.kernel.org>
Reported-by: Roman Gushchin <klamm@yandex-team.ru>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
this patch adds all of eBPF verfier documentation and empty bpf_check()
The end goal for the verifier is to statically check safety of the program.
Verifier will catch:
- loops
- out of range jumps
- unreachable instructions
- invalid instructions
- uninitialized register access
- uninitialized stack access
- misaligned stack access
- out of range stack access
- invalid calling convention
More details in Documentation/networking/filter.txt
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF syscall is a multiplexor for a range of different operations on eBPF.
This patch introduces syscall with single command to create a map.
Next patch adds commands to access maps.
'maps' is a generic storage of different types for sharing data between kernel
and userspace.
Userspace example:
/* this syscall wrapper creates a map with given type and attributes
* and returns map_fd on success.
* use close(map_fd) to delete the map
*/
int bpf_create_map(enum bpf_map_type map_type, int key_size,
int value_size, int max_entries)
{
union bpf_attr attr = {
.map_type = map_type,
.key_size = key_size,
.value_size = value_size,
.max_entries = max_entries
};
return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
'union bpf_attr' is backwards compatible with future extensions.
More details in Documentation/networking/filter.txt and in manpage
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some missing files to .gitignore.
Push Documentation/.gitignore down into subdirectories.
Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Add a bunch of previously unbuilt source files to the Documentation build
machinery.
Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Change the Documentation makefiles from obj-m to subdir-y
to avoid generating unnecessary built-in.o files since nothing
in Documentation/ is ever linked in to vmlinux.
Signed-off-by: Peter Foley <pefoley2@pefoley.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Current ICMP rate limiting uses inetpeer cache, which is an RBL tree
protected by a lock, meaning that hosts can be stuck hard if all cpus
want to check ICMP limits.
When say a DNS or NTP server process is restarted, inetpeer tree grows
quick and machine comes to its knees.
iptables can not help because the bottleneck happens before ICMP
messages are even cooked and sent.
This patch adds a new global limitation, using a token bucket filter,
controlled by two new sysctl :
icmp_msgs_per_sec - INTEGER
Limit maximal number of ICMP packets sent per second from this host.
Only messages whose type matches icmp_ratemask are
controlled by this limit.
Default: 1000
icmp_msgs_burst - INTEGER
icmp_msgs_per_sec controls number of ICMP packets sent per second,
while icmp_msgs_burst controls the burst size of these packets.
Default: 50
Note that if we really want to send millions of ICMP messages per
second, we might extend idea and infra added in commit 04ca6973f7
("ip: make IP identifiers less predictable") :
add a token bucket in the ip_idents hash and no longer rely on inetpeer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
arch/mips/net/bpf_jit.c
drivers/net/can/flexcan.c
Both the flexcan and MIPS bpf_jit conflicts were cases of simple
overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit c801e3cc19 ("ipv4: Clarify in docs that accept_local requires rp_filter.").
It is not needed anymore since commit 1dced6a854 ("ipv4: Restore accept_local behaviour in fib_validate_source()").
Suggested-by: Julian Anastasov <ja@ssi.bg>
Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
add BPF_LD_IMM64 instruction to load 64-bit immediate value into a register.
All previous instructions were 8-byte. This is first 16-byte instruction.
Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction:
insn[0].code = BPF_LD | BPF_DW | BPF_IMM
insn[0].dst_reg = destination register
insn[0].imm = lower 32-bit
insn[1].code = 0
insn[1].imm = upper 32-bit
All unused fields must be zero.
Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM
which loads 32-bit immediate value into a register.
x64 JITs it as single 'movabsq %rax, imm64'
arm64 may JIT as sequence of four 'movk x0, #imm16, lsl #shift' insn
Note that old eBPF programs are binary compatible with new interpreter.
It helps eBPF programs load 64-bit constant into a register with one
instruction instead of using two registers and 4 instructions:
BPF_MOV32_IMM(R1, imm32)
BPF_ALU64_IMM(BPF_LSH, R1, 32)
BPF_MOV32_IMM(R2, imm32)
BPF_ALU64_REG(BPF_OR, R1, R2)
User space generated programs will use this instruction to load constants only.
To tell kernel that user space needs a pointer the _pseudo_ variant of
this instruction may be added later, which will use extra bits of encoding
to indicate what type of pointer user space is asking kernel to provide.
For example 'off' or 'src_reg' fields can be used for such purpose.
src_reg = 1 could mean that user space is asking kernel to validate and
load in-kernel map pointer.
src_reg = 2 could mean that user space needs readonly data section pointer
src_reg = 3 could mean that user space needs a pointer to per-cpu local data
All such future pseudo instructions will not be carrying the actual pointer
as part of the instruction, but rather will be treated as a request to kernel
to provide one. The kernel will verify the request_for_a_pointer, then
will drop _pseudo_ marking and will store actual internal pointer inside
the instruction, so the end result is the interpreter and JITs never
see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that
loads 64-bit immediate into a register. User space never operates on direct
pointers and verifier can easily recognize request_for_pointer vs other
instructions.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The JIT compiler emits A64 instructions. It supports eBPF only.
Legacy BPF is supported thanks to conversion by BPF core.
JIT is enabled in the same way as for other architectures:
echo 1 > /proc/sys/net/core/bpf_jit_enable
Or for additional compiler output:
echo 2 > /proc/sys/net/core/bpf_jit_enable
See Documentation/networking/filter.txt for more information.
The implementation passes all 57 tests in lib/test_bpf.c
on ARMv8 Foundation Model :) Also tested by Will on Juno platform.
Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
A buffer is incorrectly zeroed to the length of the pointer. If
cfg_payload_len < sizeof(void *) this can overwrites unrelated memory.
The buffer contents are never read, so no need to zero.
Fixes: 8fe2f761ca ("net-timestamp: expand documentation")
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As in IPv6 people might increase the igmp query robustness variable to
make sure unsolicited state change reports aren't lost on the network. Add
and document this new knob to igmp code.
RFCs allow tuning this parameter back to first IGMP RFC, so we also use
this setting for all counters, including source specific multicast.
Also take over sysctl value when upping the interface and don't reuse
the last one seen on the interface.
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query
robustness variable. It specifies how many retransmit of unsolicited mld
retransmit should happen. Admins might want to tune this on lossy links.
Also reset mld state on interface down/up, so we pick up new sysctl
settings during interface up event.
IPv6 certification requests this knob to be available.
I didn't make this knob netns specific, as it is mostly a setting in a
physical environment and should be per host.
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expand Documentation/networking/timestamping.txt with new
interfaces and bytestream timestamping. Also minor
cleanup of the other text.
Import txtimestamp.c test of the new features.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Missing documentation for gc_thresh2 sysctl.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds newly added FCoE files to the build but only if FCoE module is configured.
Also, updates i40e document for added FCoE support.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Jack Morgan<jack.morgan@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
clean up names related to socket filtering and bpf in the following way:
- everything that deals with sockets keeps 'sk_*' prefix
- everything that is pure BPF is changed to 'bpf_*' prefix
split 'struct sk_filter' into
struct sk_filter {
atomic_t refcnt;
struct rcu_head rcu;
struct bpf_prog *prog;
};
and
struct bpf_prog {
u32 jited:1,
len:31;
struct sock_fprog_kern *orig_prog;
unsigned int (*bpf_func)(const struct sk_buff *skb,
const struct bpf_insn *filter);
union {
struct sock_filter insns[0];
struct bpf_insn insnsi[0];
struct work_struct work;
};
};
so that 'struct bpf_prog' can be used independent of sockets and cleans up
'unattached' bpf use cases
split SK_RUN_FILTER macro into:
SK_RUN_FILTER to be used with 'struct sk_filter *' and
BPF_PROG_RUN to be used with 'struct bpf_prog *'
__sk_filter_release(struct sk_filter *) gains
__bpf_prog_release(struct bpf_prog *) helper function
also perform related renames for the functions that work
with 'struct bpf_prog *', since they're on the same lines:
sk_filter_size -> bpf_prog_size
sk_filter_select_runtime -> bpf_prog_select_runtime
sk_filter_free -> bpf_prog_free
sk_unattached_filter_create -> bpf_prog_create
sk_unattached_filter_destroy -> bpf_prog_destroy
sk_store_orig_filter -> bpf_prog_store_orig_filter
sk_release_orig_filter -> bpf_release_orig_filter
__sk_migrate_filter -> bpf_migrate_filter
__sk_prepare_filter -> bpf_prepare_filter
API for attaching classic BPF to a socket stays the same:
sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
which is used by sockets, tun, af_packet
API for 'unattached' BPF programs becomes:
bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>