linux

Author	SHA1	Message	Date
Edward Cree	42ca087fbf	sfc: add module parameter to enable MCDI logging on new functions As many issues are encountered at probe time, where MCDI logging can't be enabled through the sysfs node, this change adds a module parameter 'mcdi_logging_default', which defaults to false. When set to true, newly- probed functions will have MCDI logging enabled. The setting can subsequently be changed as normal through the sysfs node. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 13:54:51 -04:00
Edward Cree	e7fef9b45a	sfc: add sysfs entry to control MCDI tracing MCDI tracing is enabled per-function with a sysfs file /sys/class/net/<NET_DEV>/device/mcdi_logging Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 13:54:51 -04:00
Edward Cree	75aba2a52d	sfc: add tracing of MCDI commands MCDI tracing is conditional on CONFIG_SFC_MCDI_LOGGING, which is enabled by default. Each MCDI command will produce a console line like sfc dom🚌dev:fn ifname: MCDI RPC REQ: xxxxxxxx [yyyyyyyy...] where xxxxxxxx etc. are the raw MCDI payload in 32-bit hex chunks. The response will then produce a similar line with "RESP" instead of "REQ", and containing the MCDI response payload (if any). Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 13:54:51 -04:00
Sorin Dumitru	14e1d0fa97	vxlan: release lock after each bucket in vxlan_cleanup We're seeing some softlockups from this function when there are a lot fdb entries on a vxlan device. Taking the lock for each bucket instead of the whole table is enough to fix that. Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 13:33:21 -04:00
Eric Dumazet	07f4c90062	tcp/dccp: try to not exhaust ip_local_port_range in connect() A long standing problem on busy servers is the tiny available TCP port range (/proc/sys/net/ipv4/ip_local_port_range) and the default sequential allocation of source ports in connect() system call. If a host is having a lot of active TCP sessions, chances are very high that all ports are in use by at least one flow, and subsequent bind(0) attempts fail, or have to scan a big portion of space to find a slot. In this patch, I changed the starting point in __inet_hash_connect() so that we try to favor even [1] ports, leaving odd ports for bind() users. We still perform a sequential search, so there is no guarantee, but if connect() targets are very different, end result is we leave more ports available to bind(), and we spread them all over the range, lowering time for both connect() and bind() to find a slot. This strategy only works well if /proc/sys/net/ipv4/ip_local_port_range is even, ie if start/end values have different parity. Therefore, default /proc/sys/net/ipv4/ip_local_port_range was changed to 32768 - 60999 (instead of 32768 - 61000) There is no change on security aspects here, only some poor hashing schemes could be eventually impacted by this change. [1] : The odd/even property depends on ip_local_port_range values parity Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 13:30:44 -04:00
David S. Miller	837b9955b1	Merge branch 'ip_frag_next' Florian Westphal says: ==================== net: force refragmentation for DF reassembed skbs output path tests: if (skb->len > mtu) ip_fragment() This breaks connectivity in one corner case: If the skb was reassembled, but has the DF bit set and .. .. its reassembled size is <= outdev mtu .. .. we will forward a DF packet larger than what the sender transmitted on wire. If a router later in the path can't forward this packet, it will send an icmp error in response to an mtu that the original sender never exceeded. This changes ipv4 defrag/output path to a) force refragmentation for DF reassembled skbs and b) set DF bit on all fragments when refragmenting if it was set on original frags. tested via: from scapy.all import * dip="10.23.42.2" payload="A"*1400 packet=IP(dst=dip,id=12345,flags='DF')/UDP(sport=42,dport=42)/payload frags=fragment(packet,fragsize=1200) for fragment in frags: send(fragment) Without this patch, we generate fragments without df bit set based on the outgoing device mtu when fragmenting after forwarding, ie. IP (ttl 64, id 12345, offset 0, flags [+, DF], proto UDP (17), length 1204) 192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400 IP (ttl 64, id 12345, offset 1184, flags [DF], proto UDP (17), length 244) 192.168.7.1 > 10.23.42.2: ip-proto-17 on ingress will either turn into IP (ttl 63, id 12345, offset 0, flags [+], proto UDP (17), length 1396) 192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400 IP (ttl 63, id 12345, offset 1376, flags [none], proto UDP (17), length 52) (mtu 1400: We strip df and send larger fragment), or IP (ttl 63, id 12345, offset 0, flags [DF], proto UDP (17), length 1428) 192.168.7.1.42 > 10.23.42.2.42: [udp sum ok] UDP, length 1400 if mtu is 1500. And in this case things break; router with a smaller mtu will send icmp error, but original sender only sent packets <= 1204 byte. With patch, we keep intent of such fragments and will emit DF-fragments that won't exceed 1204 byte in size. Joint work with Hannes Frederic Sowa. Changes since v2: - split unrelated patches from series - rework changelog of patch #2 to better illustrate breakage ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 13:03:31 -04:00
Florian Westphal	d6b915e29f	ip_fragment: don't forward defragmented DF packet We currently always send fragments without DF bit set. Thus, given following setup: mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280 A R1 R2 B Where R1 and R2 run linux with netfilter defragmentation/conntrack enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1 will respond with icmp too big error if one of these fragments exceeded 1400 bytes. However, if R1 receives fragment sizes 1200 and 100, it would forward the reassembled packet without refragmenting, i.e. R2 will send an icmp error in response to a packet that was never sent, citing mtu that the original sender never exceeded. The other minor issue is that a refragmentation on R1 will conceal the MTU of R2-B since refragmentation does not set DF bit on the fragments. This modifies ip_fragment so that we track largest fragment size seen both for DF and non-DF packets, and set frag_max_size to the largest value. If the DF fragment size is larger or equal to the non-df one, we will consider the packet a path mtu probe: We set DF bit on the reassembled skb and also tag it with a new IPCB flag to force refragmentation even if skb fits outdev mtu. We will also set DF bit on each fragment in this case. Joint work with Hannes Frederic Sowa. Reported-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 13:03:31 -04:00
Florian Westphal	c5501eb340	net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper ip_skb_dst_mtu is small inline helper, but its called in several places. before: 17061 44 0 17105 42d1 net/ipv4/ip_output.o after: 16805 44 0 16849 41d1 net/ipv4/ip_output.o Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 13:03:30 -04:00
David S. Miller	8c0ce7705e	Merge branch 'phy_rgmii' Florian Fainelli says: ==================== net: phy: phy_interface_is_rgmii helper As you suggested, here is the helper function to avoid missing some RGMII interface checks. Had to wait for net to be merged in net-next to avoid submitting the same patch/commit. Dan, you might want to rebase your dp83867 submission to use that helper when you this patchset gets merged into net-next, thanks! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 00:27:36 -04:00
Florian Fainelli	32a641615a	net: phy: Utilize phy_interface_is_rgmii Update all open-coded tests for all 4 PHY_INTERFACE_MODE_RGMII* values to use the newly introduced helper: phy_interface_is_rgmii. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 00:27:35 -04:00
Florian Fainelli	e463d88c36	net: phy: Add phy_interface_is_rgmii helper RGMII interfaces come in 4 different flavors that the PHY library needs to care about: regular RGMII (no delays), RGMII with either RX or TX delay, and both. In order to avoid errors of checking only for one type of RGMII interface and miss the 3 others, introduce a convenience function which tests for all values. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 00:27:35 -04:00
David S. Miller	ffa915d071	ipv4: Fix fib_trie.c build, missing linux/vmalloc.h include. We used to get this indirectly I supposed, but no longer do. Either way, an explicit include should have been done in the first place. net/ipv4/fib_trie.c: In function '__node_free_rcu': >> net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration] vfree(n); ^ net/ipv4/fib_trie.c: In function 'tnode_alloc': >> net/ipv4/fib_trie.c:312:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration] return vzalloc(size); ^ >> net/ipv4/fib_trie.c:312:3: warning: return makes pointer from integer without a cast cc1: some warnings being treated as errors Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-27 00:19:03 -04:00
Eric Dumazet	d6a4e26afb	tcp: tcp_tso_autosize() minimum is one packet By making sure sk->sk_gso_max_segs minimal value is one, and sysctl_tcp_min_tso_segs minimal value is one as well, tcp_tso_autosize() will return a non zero value. We can then revert `843925f33f` ("tcp: Do not apply TSO segment limit to non-TSO packets") and save few cpu cycles in fast path. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-26 23:21:29 -04:00
Eric Dumazet	095dc8e0c3	tcp: fix/cleanup inet_ehash_locks_alloc() If tcp ehash table is constrained to a very small number of buckets (eg boot parameter thash_entries=128), then we can crash if spinlock array has more entries. While we are at it, un-inline inet_ehash_locks_alloc() and make following changes : - Budget 2 cache lines per cpu worth of 'spinlocks' - Try to kmalloc() the array to avoid extra TLB pressure. (Most servers at Google allocate 8192 bytes for this hash table) - Get rid of various #ifdef Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-26 19:48:46 -04:00
Jon Paul Maloy	f3903bcc00	tipc: fix bug in link protocol message create function In commit `dd3f9e70f5` ("tipc: add packet sequence number at instant of transmission") we made a change with the consequence that packets in the link backlog queue don't contain valid sequence numbers. However, when we create a link protocol message, we still use the sequence number of the first packet in the backlog, if there is any, as "next_sent" indicator in the message. This may entail unnecessary retransissions or stale packet transmission when there is very low traffic on the link. This commit fixes this issue by only using the current value of tipc_link::snd_nxt as indicator. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-26 19:43:03 -04:00
Eric Dumazet	05c985436d	net: fix inet_proto_csum_replace4() sparse errors make C=2 CF=-D__CHECK_ENDIAN__ net/core/utils.o ... net/core/utils.c:307:72: warning: incorrect type in argument 2 (different base types) net/core/utils.c:307:72: expected restricted __wsum [usertype] addend net/core/utils.c:307:72: got restricted __be32 [usertype] from net/core/utils.c:308:34: warning: incorrect type in argument 2 (different base types) net/core/utils.c:308:34: expected restricted __wsum [usertype] addend net/core/utils.c:308:34: got restricted __be32 [usertype] to net/core/utils.c:310:70: warning: incorrect type in argument 2 (different base types) net/core/utils.c:310:70: expected restricted __wsum [usertype] addend net/core/utils.c:310:70: got restricted __be32 [usertype] from net/core/utils.c:310:77: warning: incorrect type in argument 2 (different base types) net/core/utils.c:310:77: expected restricted __wsum [usertype] addend net/core/utils.c:310:77: got restricted __be32 [usertype] to net/core/utils.c:312:72: warning: incorrect type in argument 2 (different base types) net/core/utils.c:312:72: expected restricted __wsum [usertype] addend net/core/utils.c:312:72: got restricted __be32 [usertype] from net/core/utils.c:313:35: warning: incorrect type in argument 2 (different base types) net/core/utils.c:313:35: expected restricted __wsum [usertype] addend net/core/utils.c:313:35: got restricted __be32 [usertype] to Note we can use csum_replace4() helper Fixes: `58e3cac561` ("net: optimise inet_proto_csum_replace4()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 22:56:47 -04:00
Eric Dumazet	68319052d1	net: remove a sparse error in secure_dccpv6_sequence_number() make C=2 CF=-D__CHECK_ENDIAN__ net/core/secure_seq.o net/core/secure_seq.c:157:50: warning: restricted __be32 degrades to integer Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 22:55:37 -04:00
Wilson Kok	eb8d7baae2	bridge: skip fdb add if the port shouldn't learn Check in fdb_add_entry() if the source port should learn, similar check is used in br_fdb_update. Note that new fdb entries which are added manually or as local ones are still permitted. This patch has been tested by running traffic via a bridge port and switching the port's state, also by manually adding/removing entries from the bridge's fdb. Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 20:29:54 -04:00
Eric Dumazet	d496958145	pktgen: remove one sparse error net/core/pktgen.c:2672:43: warning: incorrect type in assignment (different base types) net/core/pktgen.c:2672:43: expected unsigned short [unsigned] [short] [usertype] <noident> net/core/pktgen.c:2672:43: got restricted __be16 [usertype] protocol Let's use proper struct ethhdr instead of hard coding everything. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 20:27:50 -04:00
Eric Dumazet	7f1598678d	ipv6: ipv6_select_ident() returns a __be32 ipv6_select_ident() returns a 32bit value in network order. Fixes: `286c2349f6` ("ipv6: Clean up ipv6_select_ident() and ip6_fragment()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 20:27:11 -04:00
David S. Miller	eedf4c66d0	Merge branch 'cpsw-cleanups' Richard Cochran says: ==================== cpsw cleanups While working on an out-of-tree customization, I noticed a few minor problems in the cpsw code. This series cleans up the issues I found. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 18:19:10 -04:00
Richard Cochran	61d22596a7	net: cpsw: remove redundant calls disabling dma interrupts. The function, cpsw_intr_disable, already calls cpdma_ctlr_int_ctrl. There is no need to disable the dma interrupts twice. This patch removes the extra calls. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 18:19:09 -04:00
Richard Cochran	071f1a960c	net: cpsw: remove redundant calls enabling dma interrupts. The function, cpsw_intr_enable, already calls cpdma_ctlr_int_ctrl. There is no need to enable the dma interrupts twice. This patch removes the extra call. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 18:19:09 -04:00
Richard Cochran	202c5919e2	net: cpsw: remove two unused global functions The funtions, cpsw_ale_flush and cpsw_ale_set_ageout, have never been used since they were first introduced. This patch removes the dead code. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 18:19:09 -04:00
Richard Cochran	26fe7eb862	net: cpsw: fix misplaced break statements. Having the breaks too far to the left makes parsing the dense switch/case block unnecessarily harder. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 18:19:09 -04:00
David S. Miller	d1a0ed795a	Merge branch 'rocker-cleanups' Simon Horman says: ==================== rocker: unused parameter and const cleanups This series provides some minor though verbose cleanup of rocker. The second patch depends on the first though it could be rebased. I had previously asked for v2 to be put on hold while some bugs I had found in the rocker driver were shaken out. That has now happened and the bugs turned out to be unrelated. Accordingly I am reposting the series. * Changes v2 -> v3 - Rebase and update for new variables and parameters that may be const * Changes v1 -> v2 - Found quite a few more variables and parameters to make const ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 18:17:09 -04:00
Simon Horman	e505464355	rocker: mark parameters and local variables as const Mark parameters and local variables as const where possible. Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 18:17:08 -04:00
Simon Horman	0985df7390	rocker: remove unused rocker_port parameter from rocker_port_kfree Remove unused rocker_port parameter from rocker_port_kfree. Also remove the rocker_port parameter from callers of rocker_port_kfree where the parameter it is now unused. Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 18:17:08 -04:00
Nicholas Mc Guire	005e8709c6	irda: use msecs_to_jiffies for conversion to jiffies API compliance scanning with coccinelle flagged: ./net/irda/timer.c:63:35-37: use of msecs_to_jiffies probably perferable Converting milliseconds to jiffies by "val * HZ / 1000" technically is not a clean solution as it does not handle all corner cases correctly. By changing the conversion to use msecs_to_jiffies(val) conversion is correct in all cases. Further the () around the arithmetic expression was dropped. Patch was compile tested for x86_64_defconfig + CONFIG_IRDA=m Patch is against 4.1-rc4 (localversion-next is -next-20150522) Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 17:46:21 -04:00
Joe Perches	d07ce242e6	neterion: s2io: Fix kernel doc formatting These two uses seem to have had carriage returns removed. Make these entries like all the others in this file. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 17:45:27 -04:00
Nicholas Mc Guire	bbfe0f37ae	irda: irda-usb: use msecs_to_jiffies for conversions API compliance scanning with coccinelle flagged: Converting milliseconds to jiffies by "val * HZ / 1000" is technically is not a clean solution as it does not handle all corner cases correctly. By changing the conversion to use msecs_to_jiffies(val) conversion is correct in all cases. in the current code: mod_timer(&self->rx_defer_timer, jiffies + (10 * HZ / 1000)); for HZ < 100 (e.g. CONFIG_HZ == 64\|32 in alpha) this effectively results in no delay at all. Patch was compile tested for x86_64_defconfig (implies CONFIG_USB_IRDA=m) Patch is against 4.1-rc4 (localversion-next is -next-20150522) Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 17:39:21 -04:00
Linus Lüssing	6ae4ae8e51	bridge: allow setting hash_max + multicast_router if interface is down Network managers like netifd (used in OpenWRT for instance) try to configure interface options after creation but before setting the interface up. Unfortunately the sysfs / bridge currently only allows to configure the hash_max and multicast_router options when the bridge interface is up. But since br_multicast_init() doesn't start any timers and only sets default values and initializes timers it should be save to reconfigure the default values after that, before things actually get active after the bridge is set up. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 17:28:01 -04:00
Florian Westphal	485fca664d	ipv6: don't increase size when refragmenting forwarded ipv6 skbs since commit `6aafeef03b` ("netfilter: push reasm skb through instead of original frag skbs") we will end up sometimes re-fragmenting skbs that we've reassembled. ipv6 defrag preserves the original skbs using the skb frag list, i.e. as long as the skb frag list is preserved there is no problem since we keep original geometry of fragments intact. However, in the rare case where the frag list is munged or skb is linearized, we might send larger fragments than what we originally received. A router in the path might then send packet-too-big errors even if sender never sent fragments exceeding the reported mtu: mtu 1500 - 1500:1400 - 1400:1280 - 1280 A R1 R2 B 1 - A sends to B, fragment size 1400 2 - R2 sends pkttoobig error for 1280 3 - A sends to B, fragment size 1280 4 - R2 sends pkttoobig error for 1280 again because it sees fragments of size 1400. make sure ip6_fragment always caps MTU at largest packet size seen when defragmented skb is forwarded. Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 17:22:23 -04:00
Shailendra Verma	376cd36dc7	atm:he - Change 1 to true for bool type variable. The variable irq_coalesce is bool type. So assign the value true instead of 1. Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 16:34:35 -04:00
Shailendra Verma	c489dbb189	net:xen-netback - Change 1 to true for bool type variable. The variable separate_tx_rx_irq is bool type so assigning true instead of 1. Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 16:34:35 -04:00
David S. Miller	c1a3403550	Merge branch 'ipv6_route_sharing' Martin KaFai Lau says: ==================== ipv6: Only create RTF_CACHE route after encountering pmtu exception v4 -> v5: - Patch 1 is new. Clean up the ipv6_select_ident() and ip6_fragment(). - Further simplify the newly added rt6_get_pcpu_route(). If there is a 'prev' after cmpxchg, return prev instead of the newly created percpu clone. v3 -> v4: - Patch 8 is new. It keeps track of the DST_NOCACHE routes in a list to handle the iface down/unregister event. - Remove rcu from the newly added rt6i_pcpu variable. It is not needed because it has already been protected by the existing reader/writer lock. - Thanks to 'Julian Anastasov <ja@ssi.bg>' for testing the FLOWI_FLAG_KNOWN_NH patches. v2 -> v3: - Patch 5 to 7 are new. They take care of cases where the daddr in skb is not the one used to do the route look-up. There is also related changes to rt6_nexthop() since v2 which is in patch 2/9. Thanks to 'Julian Anastasov <ja@ssi.bg>' for pointing it out. - Fix a few problems in __ip6_rt_update_pmtu(), like setting the expire and mtu before inserting to the tree and don't do dst_destroy() after tree insertion failure. Also update the rt6i_pmtu in fib6_add_rt2node(). Thanks to 'Steffen Klassert <steffen.klassert@secunet.com>' for pointing it out. - Merge ip6_pmtu_rt_cache_alloc() into ip6_rt_cache_alloc(). v1 -> v2: - Move the /128 route bug fixes to another series (accepted). - Create a function for checking (rt6i_flags & (RTF_NONEXTHOP \| RTF_GATEWAY)). - Avoid shuffling the skb network_header. Instead, change the function signature to take iph instead of skb. - Many Thanks to 'Hannes Frederic Sowa <hannes@stressinduktion.org>' on reviewing v1 and v2 and giving advice. --Martin ~~~ start: v1 compose message (with the out-dated parts removed) ~~~ This series is to avoid creating a RTF_CACHE route whenever we are consulting the fib6 tree with a new destination. Instead, only create RTF_CACHE route when we see a pmtu exception. Out of all ipv6 RTF_CACHE routes that are created, the percentage that has a different mtu is very small. In one of our end-user facing proxy server, only 1k out of 80k RTF_CACHE routes have a smaller MTU. For our DC traffic, there is no mtu exception. A large fib6 tree has problems like, 'ip -6 r show' takes a long time. gc may kick in too often. Also, when a service has restarted and a lot of new TCP conn requests come in, it creates pressure on the tree by inserting a lot of RTF_CACHE in a short time and it currently requires a write lock to do that. The first few patches are prep works to remove assumption that the returned rt is always RTF_CACHE. The patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception' do the lazy RTF_CACHE route creation. The following patches added percpu rt to compensate the performance loss after doing the RTF_CACHE lazy creation. Here is some numbers of the udpflood test. The udpflood has been slightly modified to have a time limit instead of count limit. A /64 via gateway route is used for the test. Each udpflood uses 10000 dst addresses. The dst addresses of different udpflood processes do not overlap with each other. 1 16M 15M 10 61M 61M 20 65M 62M 40 88M 83M ~~~ end: v1 compose message ~~~ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:35 -04:00
Martin KaFai Lau	d52d3997f8	ipv6: Create percpu rt6_info After the patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception', we need to compensate the performance hit (bouncing dst->__refcnt). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:35 -04:00
Martin KaFai Lau	83a09abd1a	ipv6: Break up ip6_rt_copy() This patch breaks up ip6_rt_copy() into ip6_rt_copy_init() and ip6_rt_cache_alloc(). In the later patch, we need to create a percpu rt6_info copy. Hence, refactor the common rt6_info init codes to ip6_rt_copy_init(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:34 -04:00
Martin KaFai Lau	8d0b94afdc	ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister This patch keeps track of the DST_NOCACHE routes in a list and replaces its dev with loopback during the iface down/unregister event. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:34 -04:00
Martin KaFai Lau	3da59bd945	ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set This patch always creates RTF_CACHE clone with DST_NOCACHE when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to the fl6->daddr. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Tested-by: Julian Anastasov <ja@ssi.bg> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:34 -04:00
Martin KaFai Lau	48e8aa6e31	ipv6: Set FLOWI_FLAG_KNOWN_NH at flowi6_flags The neighbor look-up used to depend on the rt6i_gateway (if there is a gateway) or the rt6i_dst (if it is a RTF_CACHE clone) as the nexthop address. Note that rt6i_dst is set to fl6->daddr for the RTF_CACHE clone where fl6->daddr is the one used to do the route look-up. Now, we only create RTF_CACHE clone after encountering exception. When doing the neighbor look-up with a route that is neither a gateway nor a RTF_CACHE clone, the daddr in skb will be used as the nexthop. In some cases, the daddr in skb is not the one used to do the route look-up. One example is in ip_vs_dr_xmit_v6() where the real nexthop server address is different from the one in the skb. This patch is going to follow the IPv4 approach and ask the ip6_pol_route() callers to set the FLOWI_FLAG_KNOWN_NH properly. In the next patch, ip6_pol_route() will honor the FLOWI_FLAG_KNOWN_NH and create a RTF_CACHE clone. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Tested-by: Julian Anastasov <ja@ssi.bg> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:34 -04:00
Martin KaFai Lau	b197df4f0f	ipv6: Add rt6_get_cookie() function Instead of doing the rt6->rt6i_node check whenever we need to get the route's cookie. Refactor it into rt6_get_cookie(). It is a prep work to handle FLOWI_FLAG_KNOWN_NH and also percpu rt6_info later. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:34 -04:00
Martin KaFai Lau	45e4fd2668	ipv6: Only create RTF_CACHE routes after encountering pmtu exception This patch creates a RTF_CACHE routes only after encountering a pmtu exception. After ip6_rt_update_pmtu() has inserted the RTF_CACHE route to the fib6 tree, the rt->rt6i_node->fn_sernum is bumped which will fail the ip6_dst_check() and trigger a relookup. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:33 -04:00
Martin KaFai Lau	8b9df26577	ipv6: Combine rt6_alloc_cow and rt6_alloc_clone A prep work for creating RTF_CACHE on exception only. After this patch, the same condition (rt->rt6i_flags & (RTF_NONEXTHOP \| RTF_GATEWAY)) is checked twice. This redundancy will be removed in the later patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:33 -04:00
Martin KaFai Lau	2647a9b070	ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst. Also, rt6i_gateway is always set to the nexthop while the nexthop could be a gateway or the rt6i_dst.addr. After removing the rt6i_dst and rt6i_src dependency in the last patch, we also need to stop the caller from depending on rt6i_gateway and RTF_ANYCAST. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:33 -04:00
Martin KaFai Lau	fd0273d793	ipv6: Remove external dependency on rt6i_dst and rt6i_src This patch removes the assumptions that the returned rt is always a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the destination and source address. The dst and src can be recovered from the calling site. We may consider to rename (rt6i_dst, rt6i_src) to (rt6i_key_dst, rt6i_key_src) later. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:32 -04:00
Martin KaFai Lau	286c2349f6	ipv6: Clean up ipv6_select_ident() and ip6_fragment() This patch changes the ipv6_select_ident() signature to return a fragment id instead of taking a whole frag_hdr as a param to only set the frag_hdr->identification. It also cleans up ip6_fragment() to obtain the fragment id at the beginning instead of using multiple "if" later to check fragment id has been generated or not. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 13:25:32 -04:00
Hariprasad Shenai	01b6961410	cxgb4: Add PHY firmware support for T420-BT cards Add support for flashing 10GBaseT adapter with BCM 84834 PHY and Aquantia AQ1202 PHY. Updating of the PHY firmware must happen before the INITIALIZE_CMD. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 00:17:24 -04:00
Daniel Borkmann	3b52960266	test_bpf: add more eBPF jump torture cases Add two more eBPF test cases for JITs, i.e. the second one revealed a bug in the x86_64 JIT compiler, where only an int3 filled image from the allocator was emitted and later wrongly set by the compiler as the bpf_func program code since optimization pass boundary was surpassed w/o actually emitting opcodes. Interpreter: [ 45.782892] test_bpf: #242 BPF_MAXINSNS: Very long jump backwards jited:0 11 PASS [ 45.783062] test_bpf: #243 BPF_MAXINSNS: Edge hopping nuthouse jited:0 14705 PASS After x86_64 JIT (fixed): [ 80.495638] test_bpf: #242 BPF_MAXINSNS: Very long jump backwards jited:1 6 PASS [ 80.495957] test_bpf: #243 BPF_MAXINSNS: Edge hopping nuthouse jited:1 17157 PASS Reference: http://thread.gmane.org/gmane.linux.network/364729 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 00:15:18 -04:00
David S. Miller	cff497c870	Merge branch 'amd-xgbe-next' Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver updates 2015-05-22 The following patches are included in this driver update series: - Retrieve and set an additional hardware feature setting - Fix the initial mode/speed determination when auto-negotiation is disabled - Add additional netif_dbg support to the driver This patch series is based on net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-25 00:13:58 -04:00

1 2 3 4 5 ...

520502 Commits