Commit Graph

480731 Commits

Author SHA1 Message Date
Punit Agrawal
81da57e649 PM / devfreq: exynos: Enable building exynos PPMU as module
Export symbols from the PPMU driver needed to build the exynos bus
driver as a module.

Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
2014-09-29 20:22:36 +09:00
Ãrjan Eide
bd7e927705 PM / devfreq: Export helper functions for drivers
These functions are indended for use by drivers and should be available
also when the driver is built as a module.

Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Ãrjan Eide <orjan.eide@arm.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
2014-09-29 20:22:22 +09:00
Florian Westphal
db29a9508a netfilter: conntrack: disable generic tracking for known protocols
Given following iptables ruleset:

-P FORWARD DROP
-A FORWARD -m sctp --dport 9 -j ACCEPT
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT

One would assume that this allows SCTP on port 9 and TCP on port 80.
Unfortunately, if the SCTP conntrack module is not loaded, this allows
*all* SCTP communication, to pass though, i.e. -p sctp -j ACCEPT,
which we think is a security issue.

This is because on the first SCTP packet on port 9, we create a dummy
"generic l4" conntrack entry without any port information (since
conntrack doesn't know how to extract this information).

All subsequent packets that are unknown will then be in established
state since they will fallback to proto_generic and will match the
'generic' entry.

Our originally proposed version [1] completely disabled generic protocol
tracking, but Jozsef suggests to not track protocols for which a more
suitable helper is available, hence we now mitigate the issue for in
tree known ct protocol helpers only, so that at least NAT and direction
information will still be preserved for others.

 [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html

Joint work with Daniel Borkmann.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29 12:17:49 +02:00
Paul Bolle
adad5621f3 PM / devfreq: Remove ARCH_HAS_OPP completely
The Kconfig symbol ARCH_HAS_OPP became redundant in v3.16: commit
049d595a4d ("PM / OPP: Make OPP invisible to users in Kconfig")
removed the only dependency that used it. Setting it had no effect
anymore.

So commit 78c5e0bb14 ("PM / OPP: Remove ARCH_HAS_OPP") removed it. For
some reason that commit did not remove all select statements for that
symbol. These statements are now useless. Remove one from devfreq too.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
2014-09-29 18:54:11 +09:00
Adrian Hunter
6a98f1e83a mmc: Fix incorrect warning when setting 0 Hz via debugfs
It is possible to turn off the card clock by setting
the frequency to zero via debugfs e.g.

	echo 0 > /sys/kernel/debug/mmc0/clock

However that produces an incorrect warning that is
designed to warn if the frequency is below the minimum
operating frequency.  So correct the warning.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2014-09-29 11:41:21 +02:00
Arturo Borrero
9363dc4b59 netfilter: nf_tables: store and dump set policy
We want to know in which cases the user explicitly sets the policy
options. In that case, we also want to dump back the info.

Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-09-29 11:28:03 +02:00
Adrian Hunter
6800754c36 mmc: Fix use of wrong device in mmc_gpiod_free_cd()
mmc_gpiod_free_cd() is paired with mmc_gpiod_request_cd()
and both must reference the same device which is the
actual host controller device not the mmc_host class
device.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2014-09-29 11:27:51 +02:00
Arnd Bergmann
5fef365b64 mmc: atmel-mci: fix mismatched section on atmci_cleanup_slot
As of 528bc7808f ("mmc: atmel-mci: Release mmc resources on failure in probe"),
the atmci_probe() function calls atmci_cleanup_slot in the failure path.

This causes a new warning whenever the driver is built:

WARNING: drivers/mmc/host/built-in.o(.init.text+0xa04): Section mismatch in reference from the function atmci_probe() to the function .exit.text:atmci_cleanup_slot()
The function __init atmci_probe() references
a function __exit atmci_cleanup_slot().

Gcc correctly warns about this function getting dropped in the link stage
for the built-in case, which would cause undefined behavior when this error
path is hit. The solution is to simply drop the __exit annotation.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 528bc7808f ("mmc: atmel-mci: Release mmc resources on failure in probe")
Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2014-09-29 11:27:32 +02:00
Peter Ujfalusi
04ed831f22 clk: ti: dra7-atl-clock: Mark the device as pm_runtime_irq_safe
It is safe to call the pm sync calls in interrupt context in this driver.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
2014-09-29 11:51:14 +03:00
Behan Webster
e8627a9ec3 clk: ti: LLVMLinux: Move __init outside of type definition
As written, the __init for ti_clk_get_div_table is in the middle of the return
type.

The gcc documentation indicates that section attributes should be added to the
end of the function declaration:

  extern void foobar (void) __attribute__ ((section ("bar")));

However gcc seems to be very permissive with where attributes can be placed.
clang on the other hand isn't so permissive, and fails if you put the section
definition in the middle of the return type:

drivers/clk/ti/divider.c:298:28: error: expected ';' after struct
static struct clk_div_table
                           ^
                           ;
drivers/clk/ti/divider.c:298:1: warning: 'static' ignored on this
      declaration [-Wmissing-declarations]
static struct clk_div_table
^
drivers/clk/ti/divider.c:299:9: error: type specifier missing,
      defaults to 'int' [-Werror,-Wimplicit-int]
__init *ti_clk_get_div_table(struct device_node *node)
~~~~~~  ^
drivers/clk/ti/divider.c:345:9: warning: incompatible pointer types
      returning 'struct clk_div_table *' from a function with result type 'int *' [-Wincompatible-pointer-types]
        return table;
               ^~~~~
drivers/clk/ti/divider.c:419:9: warning: incompatible pointer types
      assigning to 'const struct clk_div_table *' from 'int *' [-Wincompatible-pointer-types]
        *table = ti_clk_get_div_table(node);
               ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
3 warnings and 2 errors generated.

By convention, most of the kernel code puts section attributes between the
return type and function name. In the case where the return type is a pointer,
it's important to place the '*' on left of the __init.

This updated code works for both gcc and clang.

Signed-off-by: Behan Webster <behanw@converseincode.com>
Reviewed-by: Mark Charlebois <charlebm@gmail.com>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
2014-09-29 11:51:14 +03:00
Sebastian Andrzej Siewior
319f1276f9 clk: ti: consider the fact that of_clk_get() might return an error
I "forgot" to update the dtb and the kernel crashed:
|Unable to handle kernel NULL pointer dereference at virtual address 0000002e
|PC is at __clk_get_flags+0x4/0xc
|LR is at ti_dt_clockdomains_setup+0x70/0xe8

because I did not have the clock nodes. of_clk_get() returns an error
pointer which is not checked here.

Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Tero Kristo <t-kristo@ti.com>
2014-09-29 11:51:13 +03:00
Tero Kristo
73b5d5f711 clk: ti: dra7-atl-clock: fix a memory leak
of_clk_add_provider makes an internal copy of the parent_names property
while its called, thus it is no longer needed after this call and can
be freed.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
2014-09-29 11:51:13 +03:00
Tero Kristo
c08ee14cc6 clk: ti: change clock init to use generic of_clk_init
Previously, the TI clock driver initialized all the clocks hierarchically
under each separate clock provider node. Now, each clock that requires
IO access will instead check their parent node to find out which IO range
to use.

This patch allows the TI clock driver to use a few new features provided
by the generic of_clk_init, and also allows registration of clock nodes
outside the clock hierarchy (for example, any external clocks.)

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Cc: Mike Turquette <mturquette@linaro.org>
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Ujfalusi <peter.ujfalusi@ti.com>
Cc: Jyri Sarha <jsarha@ti.com>
Cc: Stefan Assmann <sassmann@kpanic.de>
Acked-by: Tony Lindgren <tony@atomide.com>
2014-09-29 11:51:13 +03:00
Jukka Rissanen
59790aa287 Bluetooth: 6lowpan: Make sure skb exists before accessing it
We need to make sure that the saved skb exists when
resuming or suspending a CoC channel. This can happen if
initial credits is 0 when channel is connected.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-09-29 10:10:02 +02:00
David S. Miller
842abe08aa Merge branch 'qca7000_spi'
Stefan Wahren says:

====================
add Qualcomm QCA7000 ethernet driver

This patch series adds support for the Qualcomm QCA7000 Homeplug GreenPHY.
The QCA7000 is serial-to-powerline bridge with two interfaces: UART and SPI.
These patches handles only the last one, with an Ethernet over SPI protocol
driver.

This driver based on the Qualcomm code [1], but contains a lot of changes
since last year:

* devicetree support
* DebugFS support
* ethtool support
* better error handling
* performance improvements
* code cleanup
* some bugfixes

The code has been tested only on Freescale i.MX28 boards, but should work
on other platforms.

[1] - https://github.com/IoE/qca7000

Changes in V3:
- Use ether_addr_copy instead of memcpy
- Remove qcaspi_set_mac_address
- Improve DT parsing
- replace OF_GPIO dependancy with OF
- fix compile error caused by SET_ETHTOOL_OPS
- fix possible endless loop when spi read fails
- fix DT documentation
- fix coding style
- fix sparse warnings

Changes in V2:
- replace in DT the SPI intr GPIO with pure interrupt
- make legacy mode a boolean DT property and remove it as module parameter
- make burst length a module parameter instead of DT property
- make pluggable a module parameter instead of DT property
- improve DT documentation
- replace debugFS register dump with ethtool function
- replace debugFS stats with ethtool function
- implement function to get ring parameter via ethtool
- implement function to set TX ring count via ethtool
- fix TX ring state in debugFS
- optimize tx ring flush
- add byte limit for TX ring to avoid bufferbloat
- fix TX queue full and write buffer miss counter
- fix SPI clk speed module parameter
- fix possible packet loss
- fix possible race during transmit
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:24:00 -04:00
Stefan Wahren
291ab06ecf net: qualcomm: new Ethernet over SPI driver for QCA7000
This patch adds the Ethernet over SPI driver for the
Qualcomm QCA7000 HomePlug GreenPHY.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:23:52 -04:00
Stefan Wahren
7d50df8f72 Documentation: add Device tree bindings for QCA7000
This patch adds the Device tree bindings for the
Ethernet over SPI protocol driver of the Qualcomm
QCA7000 HomePlug GreenPHY.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:23:52 -04:00
David S. Miller
a11238ec28 Merge branch 'dctcp'
Daniel Borkmann says:

====================
net: tcp: DCTCP congestion control algorithm

This patch series adds support for the DataCenter TCP (DCTCP) congestion
control algorithm. Please see individual patches for the details.

The last patch adds DCTCP as a congestion control module, and previous
ones add needed infrastructure to extend the congestion control framework.

Joint work between Florian Westphal, Daniel Borkmann and Glenn Judd.

v3 -> v2:
 - No changes anywhere, just a resend as requested by Dave
 - Added Stephen's ACK
v1 -> v2:
 - Rebased to latest net-next
 - Addressed Eric's feedback, thanks!
  - Update stale comment wrt. DCTCP ECN usage
  - Don't call INET_ECN_xmit for every packet
 - Add dctcp ss/inetdiag support to expose internal stats to userspace
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:17 -04:00
Daniel Borkmann
e3118e8359 net: tcp: add DCTCP congestion control algorithm
This work adds the DataCenter TCP (DCTCP) congestion control
algorithm [1], which has been first published at SIGCOMM 2010 [2],
resp. follow-up analysis at SIGMETRICS 2011 [3] (and also, more
recently as an informational IETF draft available at [4]).

DCTCP is an enhancement to the TCP congestion control algorithm for
data center networks. Typical data center workloads are i.e.
i) partition/aggregate (queries; bursty, delay sensitive), ii) short
messages e.g. 50KB-1MB (for coordination and control state; delay
sensitive), and iii) large flows e.g. 1MB-100MB (data update;
throughput sensitive). DCTCP has therefore been designed for such
environments to provide/achieve the following three requirements:

  * High burst tolerance (incast due to partition/aggregate)
  * Low latency (short flows, queries)
  * High throughput (continuous data updates, large file
    transfers) with commodity, shallow buffered switches

The basic idea of its design consists of two fundamentals: i) on the
switch side, packets are being marked when its internal queue
length > threshold K (K is chosen so that a large enough headroom
for marked traffic is still available in the switch queue); ii) the
sender/host side maintains a moving average of the fraction of marked
packets, so each RTT, F is being updated as follows:

 F := X / Y, where X is # of marked ACKs, Y is total # of ACKs
 alpha := (1 - g) * alpha + g * F, where g is a smoothing constant

The resulting alpha (iow: probability that switch queue is congested)
is then being used in order to adaptively decrease the congestion
window W:

 W := (1 - (alpha / 2)) * W

The means for receiving marked packets resp. marking them on switch
side in DCTCP is the use of ECN.

RFC3168 describes a mechanism for using Explicit Congestion Notification
from the switch for early detection of congestion, rather than waiting
for segment loss to occur.

However, this method only detects the presence of congestion, not
the *extent*. In the presence of mild congestion, it reduces the TCP
congestion window too aggressively and unnecessarily affects the
throughput of long flows [4].

DCTCP, as mentioned, enhances Explicit Congestion Notification (ECN)
processing to estimate the fraction of bytes that encounter congestion,
rather than simply detecting that some congestion has occurred. DCTCP
then scales the TCP congestion window based on this estimate [4],
thus it can derive multibit feedback from the information present in
the single-bit sequence of marks in its control law. And thus act in
*proportion* to the extent of congestion, not its *presence*.

Switches therefore set the Congestion Experienced (CE) codepoint in
packets when internal queue lengths exceed threshold K. Resulting,
DCTCP delivers the same or better throughput than normal TCP, while
using 90% less buffer space.

It was found in [2] that DCTCP enables the applications to handle 10x
the current background traffic, without impacting foreground traffic.
Moreover, a 10x increase in foreground traffic did not cause any
timeouts, and thus largely eliminates TCP incast collapse problems.

The algorithm itself has already seen deployments in large production
data centers since then.

We did a long-term stress-test and analysis in a data center, short
summary of our TCP incast tests with iperf compared to cubic:

This test measured DCTCP throughput and latency and compared it with
CUBIC throughput and latency for an incast scenario. In this test, 19
senders sent at maximum rate to a single receiver. The receiver simply
ran iperf -s.

The senders ran iperf -c <receiver> -t 30. All senders started
simultaneously (using local clocks synchronized by ntp).

This test was repeated multiple times. Below shows the results from a
single test. Other tests are similar. (DCTCP results were extremely
consistent, CUBIC results show some variance induced by the TCP timeouts
that CUBIC encountered.)

For this test, we report statistics on the number of TCP timeouts,
flow throughput, and traffic latency.

1) Timeouts (total over all flows, and per flow summaries):

            CUBIC            DCTCP
  Total     3227             25
  Mean       169.842          1.316
  Median     183              1
  Max        207              5
  Min        123              0
  Stddev      28.991          1.600

Timeout data is taken by measuring the net change in netstat -s
"other TCP timeouts" reported. As a result, the timeout measurements
above are not restricted to the test traffic, and we believe that it
is likely that all of the "DCTCP timeouts" are actually timeouts for
non-test traffic. We report them nevertheless. CUBIC will also include
some non-test timeouts, but they are drawfed by bona fide test traffic
timeouts for CUBIC. Clearly DCTCP does an excellent job of preventing
TCP timeouts. DCTCP reduces timeouts by at least two orders of
magnitude and may well have eliminated them in this scenario.

2) Throughput (per flow in Mbps):

            CUBIC            DCTCP
  Mean      521.684          521.895
  Median    464              523
  Max       776              527
  Min       403              519
  Stddev    105.891            2.601
  Fairness    0.962            0.999

Throughput data was simply the average throughput for each flow
reported by iperf. By avoiding TCP timeouts, DCTCP is able to
achieve much better per-flow results. In CUBIC, many flows
experience TCP timeouts which makes flow throughput unpredictable and
unfair. DCTCP, on the other hand, provides very clean predictable
throughput without incurring TCP timeouts. Thus, the standard deviation
of CUBIC throughput is dramatically higher than the standard deviation
of DCTCP throughput.

Mean throughput is nearly identical because even though cubic flows
suffer TCP timeouts, other flows will step in and fill the unused
bandwidth. Note that this test is something of a best case scenario
for incast under CUBIC: it allows other flows to fill in for flows
experiencing a timeout. Under situations where the receiver is issuing
requests and then waiting for all flows to complete, flows cannot fill
in for timed out flows and throughput will drop dramatically.

3) Latency (in ms):

            CUBIC            DCTCP
  Mean      4.0088           0.04219
  Median    4.055            0.0395
  Max       4.2              0.085
  Min       3.32             0.028
  Stddev    0.1666           0.01064

Latency for each protocol was computed by running "ping -i 0.2
<receiver>" from a single sender to the receiver during the incast
test. For DCTCP, "ping -Q 0x6 -i 0.2 <receiver>" was used to ensure
that traffic traversed the DCTCP queue and was not dropped when the
queue size was greater than the marking threshold. The summary
statistics above are over all ping metrics measured between the single
sender, receiver pair.

The latency results for this test show a dramatic difference between
CUBIC and DCTCP. CUBIC intentionally overflows the switch buffer
which incurs the maximum queue latency (more buffer memory will lead
to high latency.) DCTCP, on the other hand, deliberately attempts to
keep queue occupancy low. The result is a two orders of magnitude
reduction of latency with DCTCP - even with a switch with relatively
little RAM. Switches with larger amounts of RAM will incur increasing
amounts of latency for CUBIC, but not for DCTCP.

4) Convergence and stability test:

This test measured the time that DCTCP took to fairly redistribute
bandwidth when a new flow commences. It also measured DCTCP's ability
to remain stable at a fair bandwidth distribution. DCTCP is compared
with CUBIC for this test.

At the commencement of this test, a single flow is sending at maximum
rate (near 10 Gbps) to a single receiver. One second after that first
flow commences, a new flow from a distinct server begins sending to
the same receiver as the first flow. After the second flow has sent
data for 10 seconds, the second flow is terminated. The first flow
sends for an additional second. Ideally, the bandwidth would be evenly
shared as soon as the second flow starts, and recover as soon as it
stops.

The results of this test are shown below. Note that the flow bandwidth
for the two flows was measured near the same time, but not
simultaneously.

DCTCP performs nearly perfectly within the measurement limitations
of this test: bandwidth is quickly distributed fairly between the two
flows, remains stable throughout the duration of the test, and
recovers quickly. CUBIC, in contrast, is slow to divide the bandwidth
fairly, and has trouble remaining stable.

  CUBIC                      DCTCP

  Seconds  Flow 1  Flow 2    Seconds  Flow 1  Flow 2
   0       9.93    0          0       9.92    0
   0.5     9.87    0          0.5     9.86    0
   1       8.73    2.25       1       6.46    4.88
   1.5     7.29    2.8        1.5     4.9     4.99
   2       6.96    3.1        2       4.92    4.94
   2.5     6.67    3.34       2.5     4.93    5
   3       6.39    3.57       3       4.92    4.99
   3.5     6.24    3.75       3.5     4.94    4.74
   4       6       3.94       4       5.34    4.71
   4.5     5.88    4.09       4.5     4.99    4.97
   5       5.27    4.98       5       4.83    5.01
   5.5     4.93    5.04       5.5     4.89    4.99
   6       4.9     4.99       6       4.92    5.04
   6.5     4.93    5.1        6.5     4.91    4.97
   7       4.28    5.8        7       4.97    4.97
   7.5     4.62    4.91       7.5     4.99    4.82
   8       5.05    4.45       8       5.16    4.76
   8.5     5.93    4.09       8.5     4.94    4.98
   9       5.73    4.2        9       4.92    5.02
   9.5     5.62    4.32       9.5     4.87    5.03
  10       6.12    3.2       10       4.91    5.01
  10.5     6.91    3.11      10.5     4.87    5.04
  11       8.48    0         11       8.49    4.94
  11.5     9.87    0         11.5     9.9     0

SYN/ACK ECT test:

This test demonstrates the importance of ECT on SYN and SYN-ACK packets
by measuring the connection probability in the presence of competing
flows for a DCTCP connection attempt *without* ECT in the SYN packet.
The test was repeated five times for each number of competing flows.

              Competing Flows  1 |    2 |    4 |    8 |   16
                               ------------------------------
Mean Connection Probability    1 | 0.67 | 0.45 | 0.28 |    0
Median Connection Probability  1 | 0.65 | 0.45 | 0.25 |    0

As the number of competing flows moves beyond 1, the connection
probability drops rapidly.

Enabling DCTCP with this patch requires the following steps:

DCTCP must be running both on the sender and receiver side in your
data center, i.e.:

  sysctl -w net.ipv4.tcp_congestion_control=dctcp

Also, ECN functionality must be enabled on all switches in your
data center for DCTCP to work. The default ECN marking threshold (K)
heuristic on the switch for DCTCP is e.g., 20 packets (30KB) at
1Gbps, and 65 packets (~100KB) at 10Gbps (K > 1/7 * C * RTT, [4]).

In above tests, for each switch port, traffic was segregated into two
queues. For any packet with a DSCP of 0x01 - or equivalently a TOS of
0x04 - the packet was placed into the DCTCP queue. All other packets
were placed into the default drop-tail queue. For the DCTCP queue,
RED/ECN marking was enabled, here, with a marking threshold of 75 KB.
More details however, we refer you to the paper [2] under section 3).

There are no code changes required to applications running in user
space. DCTCP has been implemented in full *isolation* of the rest of
the TCP code as its own congestion control module, so that it can run
without a need to expose code to the core of the TCP stack, and thus
nothing changes for non-DCTCP users.

Changes in the CA framework code are minimal, and DCTCP algorithm
operates on mechanisms that are already available in most Silicon.
The gain (dctcp_shift_g) is currently a fixed constant (1/16) from
the paper, but we leave the option that it can be chosen carefully
to a different value by the user.

In case DCTCP is being used and ECN support on peer site is off,
DCTCP falls back after 3WHS to operate in normal TCP Reno mode.

ss {-4,-6} -t -i diag interface:

  ... dctcp wscale:7,7 rto:203 rtt:2.349/0.026 mss:1448 cwnd:2054
  ssthresh:1102 ce_state 0 alpha 15 ab_ecn 0 ab_tot 735584
  send 10129.2Mbps pacing_rate 20254.1Mbps unacked:1822 retrans:0/15
  reordering:101 rcv_space:29200

  ... dctcp-reno wscale:7,7 rto:201 rtt:0.711/1.327 ato:40 mss:1448
  cwnd:10 ssthresh:1102 fallback_mode send 162.9Mbps pacing_rate
  325.5Mbps rcv_rtt:1.5 rcv_space:29200

More information about DCTCP can be found in [1-4].

  [1] http://simula.stanford.edu/~alizade/Site/DCTCP.html
  [2] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp-final.pdf
  [3] http://simula.stanford.edu/~alizade/Site/DCTCP_files/dctcp_analysis-full.pdf
  [4] http://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00

Joint work with Florian Westphal and Glenn Judd.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Florian Westphal
9890092e46 net: tcp: more detailed ACK events and events for CE marked packets
DataCenter TCP (DCTCP) determines cwnd growth based on ECN information
and ACK properties, e.g. ACK that updates window is treated differently
than DUPACK.

Also DCTCP needs information whether ACK was delayed ACK. Furthermore,
DCTCP also implements a CE state machine that keeps track of CE markings
of incoming packets.

Therefore, extend the congestion control framework to provide these
event types, so that DCTCP can be properly implemented as a normal
congestion algorithm module outside of the core stack.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Florian Westphal
7354c8c389 net: tcp: split ack slow/fast events from cwnd_event
The congestion control ops "cwnd_event" currently supports
CA_EVENT_FAST_ACK and CA_EVENT_SLOW_ACK events (among others).
Both FAST and SLOW_ACK are only used by Westwood congestion
control algorithm.

This removes both flags from cwnd_event and adds a new
in_ack_event callback for this. The goal is to be able to
provide more detailed information about ACKs, such as whether
ECE flag was set, or whether the ACK resulted in a window
update.

It is required for DataCenter TCP (DCTCP) congestion control
algorithm as it makes a different choice depending on ECE being
set or not.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Daniel Borkmann
30e502a34b net: tcp: add flag for ca to indicate that ECN is required
This patch adds a flag to TCP congestion algorithms that allows
for requesting to mark IPv4/IPv6 sockets with transport as ECN
capable, that is, ECT(0), when required by a congestion algorithm.

It is currently used and needed in DataCenter TCP (DCTCP), as it
requires both peers to assert ECT on all IP packets sent - it
uses ECN feedback (i.e. CE, Congestion Encountered information)
from switches inside the data center to derive feedback to the
end hosts.

Therefore, simply add a new flag to icsk_ca_ops. Note that DCTCP's
algorithm/behaviour slightly diverges from RFC3168, therefore this
is only (!) enabled iff the assigned congestion control ops module
has requested this. By that, we can tightly couple this logic really
only to the provided congestion control ops.

Joint work with Florian Westphal and Glenn Judd.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
Florian Westphal
55d8694fa8 net: tcp: assign tcp cong_ops when tcp sk is created
Split assignment and initialization from one into two functions.

This is required by followup patches that add Datacenter TCP
(DCTCP) congestion control algorithm - we need to be able to
determine if the connection is moderated by DCTCP before the
3WHS has finished.

As we walk the available congestion control list during the
assignment, we are always guaranteed to have Reno present as
it's fixed compiled-in. Therefore, since we're doing the
early assignment, we don't have a real use for the Reno alias
tcp_init_congestion_ops anymore and can thus remove it.

Actual usage of the congestion control operations are being
made after the 3WHS has finished, in some cases however we
can access get_info() via diag if implemented, therefore we
need to zero out the private area for those modules.

Joint work with Daniel Borkmann and Glenn Judd.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:13:10 -04:00
John Fastabend
53dfd50181 net: sched: cls_rcvp, complete rcu conversion
This completes the cls_rsvp conversion to RCU safe
copy, update semantics.

As a result all cases of tcf_exts_change occur on
empty lists now.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:04:55 -04:00
Eric Dumazet
3d9a0d2f82 dql: dql_queued() should write first to reduce bus transactions
While doing high throughput test on a BQL enabled NIC,
I found a very high cost in ndo_start_xmit() when accessing BQL data.

It turned out the problem was caused by compiler trying to be
smart, but involving a bad MESI transaction :

  0.05 │  mov    0xc0(%rax),%edi    // LOAD dql->num_queued
  0.48 │  mov    %edx,0xc8(%rax)    // STORE dql->last_obj_cnt = count
 58.23 │  add    %edx,%edi
  0.58 │  cmp    %edi,0xc4(%rax)
  0.76 │  mov    %edi,0xc0(%rax)    // STORE dql->num_queued += count
  0.72 │  js     bd8

I got an incredible 10 % gain [1] by making sure cpu do not attempt
to get the cache line in Shared mode, but directly requests for
ownership.

New code :
	mov    %edx,0xc8(%rax)  // STORE dql->last_obj_cnt = count
	add    %edx,0xc0(%rax)  // RMW   dql->num_queued += count
	mov    0xc4(%rax),%ecx  // LOAD dql->adj_limit
	mov    0xc0(%rax),%edx  // LOAD dql->num_queued
	cmp    %edx,%ecx

The TX completion was running from another cpu, with high interrupts
rate.

Note that I am using barrier() as a soft hint, as mb() here could be
too heavy cost.

[1] This was a netperf TCP_STREAM with TSO disabled, but GSO enabled.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-29 00:04:55 -04:00
Aybuke Ozdemir
11889e817e staging: rtl8192u: ieee80211: Converted symbol to static.
This patch fixes this sparse warning:
drivers/staging/rtl8192u/ieee80211/ieee80211_crypt_ccmp.c:60:6: warning:
symbol 'ieee80211_ccmp_aes_encrypt' was not declared. Should it be static?

Signed-off-by: Aybuke Ozdemir <aybuke.147@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:39:27 -04:00
Georgiana Chelu
95c0bab252 staging: rtl8192u: Add blank line after variable declarations
Fix the following checkpatch.pl warning:
WARNING: Missing a blank line after declarations

Signed-off-by: Georgiana Chelu <georgiana.chelu93@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:39:27 -04:00
Mahati Chamarthy
f1cd53ecca Staging: rtl8192e: Remove unused variable
This patch removes a variable which has never been used. The following
Coccinelle semantic patch was used to make this transformation:

@@
type T;
identifier i;
constant C;
@@

- T i;
  <... when != i
- i = C;
  ...>

Signed-off-by: Mahati Chamarthy <mahati.chamarthy@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:39:27 -04:00
Esra Altintas
806031dbf1 staging: rtl8192u: Fixed trailing whitespace in copying
The following patch fixes the checkpatch.pl error:
ERROR: trailing whitespace

Signed-off-by: Esra Altintas <es.altintas@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:39:27 -04:00
Roxana Blaj
020af9a547 staging: rtl8192u: remove space before close parenthesis ")"
This fixes the checkpatch.pl error:
ERROR: space prohibited before that close parenthesis ')'

Signed-off-by: Roxana Blaj <roxanagabriela10@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:39:27 -04:00
Roxana Blaj
bbfd888d84 staging: rtl8192u: remove space before semicolon
This fixes the checkpatch.pl warning:
WARNING: space prohibited before semicolon

Signed-off-by: Roxana Blaj <roxanagabriela10@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:39:26 -04:00
Roxana Blaj
32b116edae staging: rtl8192u: add space after close brace '}'
This fixes the checkpatch.pl error:
ERROR: space required after that close brace '}'

Signed-off-by: Roxana Blaj <roxanagabriela10@gmail.com>
Acked-by: Daniel Baluta <daniel.baluta@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:36:24 -04:00
Dilek Uzulmez
8ec2f8f0b9 staging: slicoss: Fix void function return statements style warning
This fixes "void function return statements are not generally useful"
checkpatch.pl warning slicoss.c

Signed-off-by: Dilek Uzulmez <dilekuzulmez@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:36:24 -04:00
Roxana Blaj
5ec1aeb33f staging: rtl8192u: add space before the open parenthesis '('
This fixes the checkpatch.pl error:
ERROR: space required before the open parenthesis '('

Signed-off-by: Roxana Blaj <roxanagabriela10@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:36:24 -04:00
Mahati Chamarthy
6ab8609b92 Staging: rtl8192e: rtl8192e: Remove assigned unused variable
This patch removes an initialized variable which has never been used.
The following Coccinelle semantic patch was used to make this transformation:

@e@
identifier i;
position p;
type T;
@@

extern T i@p;

@@
type T;
identifier i;
constant C;
position p != e.p;
@@

- T i@p;
  <+... when != i
- i = C;
  ...+>

The braces around if and else which become unnecessary after the transformation
were also removed.

Signed-off-by: Mahati Chamarthy <mahati.chamarthy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:36:24 -04:00
Melike Yurtoglu
5a2da4abe3 staging: octeon: Fix missing blank line warning.
Fixes "Missing a blank line after declarations" checkpatch.pl warning in
ethernet-rgmii.c

Signed-off-by: Melike Yurtoglu <aysemelikeyurtoglu@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:30:54 -04:00
Dilek Uzulmez
aa66d88d0b staging: octeon: Removed unnecessary else expression.
This patch fixes "else is not generally useful after a break or return"
checkpatch.pl warning ethernet-util.h

Signed-off-by: Dilek Uzulmez <dilekuzulmez@gmail.com>
Acked-by: Daniel Baluta <daniel.baluta@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:30:54 -04:00
Esra Altintas
99f8dbc564 staging: octeon: Fix line 80 characters in ethernet.c
The following patch fixes the checpatch.pl warning:
WARNING: line over 80 characters

Signed-off-by: Esra Altintas <es.altintas@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:30:54 -04:00
Gulsah Kose
2e98f76c8f drivers: staging: gdm72xx: Removed unnecessary braces.
This patch fixes "braces {} are not necessary for single statement
blocks" checkpatch.pl warning in netlink_k.c

Signed-off-by: Gulsah Kose <gulsah.1004@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:27:35 -04:00
Gulsah Kose
93f509a881 drivers: staging: gdm72xx: Removed unnecessary else expression.
This patch fixes "else is not generally useful after a break or return"
checkpatch.pl warning in netlink_k.c

Signed-off-by: Gulsah Kose <gulsah.1004@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:27:35 -04:00
Gulsah Kose
d1fed02872 staging: gdm724x: Removed unnecessary else expression.
This patch fixes "else is not generally useful after a break or return"
checkpatch.pl warning in gdm_usb.c

Signed-off-by: Gulsah Kose <gulsah.1004@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:27:34 -04:00
Aybuke Ozdemir
a256779f7d staging: ft1000: ft1000-pcmcia: Add require space after that ','
This patch fixes checkpatch.pl error in file ft1000_hw.c
ERROR: space required after that ';' (ctx:VxV)

Signed-off-by: Aybuke Ozdemir <aybuke.147@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:26:42 -04:00
Gulsah Kose
56a28395b7 staging: ft1000: ft1000-pcmcia: Used "linux" instead of "asm".
This patch fixes "Use #include <linux/uaccess.h> instead of
<asm/uaccess.h" checkpatch.pl warning in ft1000_dnld.c

Signed-off-by: Gulsah Kose <gulsah.1004@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:26:42 -04:00
Gulsah Kose
96bcbea0d5 staging: ft1000: ft1000-pcmcia: Removed unnecessary else expression.
This patch fixes "else is not generally useful after a break or return"
checkpatch.pl warning in ft1000_dnld.c

Signed-off-by: Gulsah Kose <gulsah.1004@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:26:42 -04:00
Aybuke Ozdemir
2effbbdd95 Staging: unisys: common-spar: include: channels: Remove unnecessary semicolon
This fixes the checkpatch.pl warning:
WARNING: macros should not use a trailing semicolon.

Signed-off-by: Aybuke Ozdemir <aybuke.147@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:24:47 -04:00
Tapasweni Pathak
fd7dcd3997 staging: vt6655: Merge three lines into one
This patch merges three lines into one, removing unecessary
if check.

Signed-off-by: Tapasweni Pathak <tapaswenipathak@gmail.com>
Reviewed-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:22:21 -04:00
Tapasweni Pathak
f0cffbfe8c staging: vt6656: Merge three lines into one
This patch merges three lines into one, removing if branch

Signed-off-by: Tapasweni Pathak <tapaswenipathak@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:22:21 -04:00
Gulsah Kose
a793b2d817 staging: vt6655: Type conversion was made.
This patch fixes this sparse warning:
drivers/staging/vt6655/device_main.c:385:40: warning: mixing different enum types
drivers/staging/vt6655/device_main.c:385:40:     int enum _VIA_BB_TYPE versus
drivers/staging/vt6655/device_main.c:385:40:     int enum _VIA_PKT_TYPE

Signed-off-by: Gulsah Kose <gulsah.1004@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:22:21 -04:00
Aybuke Ozdemir
9e23c1b8c3 Staging: vt6655: Add require space before that '('
This patch fixes checkpatch.pl error in file device_main.c
ERROR: space required before the open parenthesis '('

Signed-off-by: Aybuke Ozdemir <aybuke.147@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:22:21 -04:00
Aybuke Ozdemir
6e61b441ce Staging: vt6655: Add require space after that ','
This patch fixes checkpatch.pl error in file device_main.c
ERROR: space required after that ';' (ctx:VxV)

Signed-off-by: Aybuke Ozdemir <aybuke.147@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-09-28 23:22:21 -04:00