Currently the recv ring low water mark is 1/4 the depth. Performance
measurements show that this limits iWARP throughput by flow controlling
the rds-stress senders. Setting it to 1/2 seems to max the T3
performance. I tried even higher levels but that didn't help and it
started to increase the rds thread cpu utilization.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a 64bit value that needs to be set atomically.
This is easy and quick on all 64bit archs, and can also be done
on x86/32 with set_64bit() (uses cmpxchg8b). However other
32b archs don't have this.
I actually changed this to the current state in preparation for
mainline because the old way (using a spinlock on 32b) resulted in
unsightly #ifdefs in the code. But obviously, being correct takes
precedence.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a bug where a connection was unexpectedly
not on *any* list while being destroyed. It also
cleans up some code duplication and regularizes some
function names.
* Grab appropriate lock in conn_free() and explain in comment
* Ensure via locking that a conn is never not on either
a dev's list or the nodev list
* Add rds_xx_remove_conn() to match rds_xx_add_conn()
* Make rds_xx_add_conn() return void
* Rename remove_{,nodev_}conns() to
destroy_{,nodev_}conns() and unify their implementation
in a helper function
* Document lock ordering as nodev conn_lock before
dev_conn_lock
Reported-by: Yosef Etigin <yosefe@voltaire.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rs_send_drop_to() is called during socket close. If it takes
m_rs_lock without disabling interrupts, then
rds_send_remove_from_sock() can run from the rx completion
handler and thus deadlock.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Stephen Rothwell.
> Today's linux-next build (powerpc allyesconfig) failed like this:
>
> net/rds/cong.c: In function 'rds_cong_set_bit':
> net/rds/cong.c:284: error: implicit declaration of function 'generic___set_le_bit'
> net/rds/cong.c: In function 'rds_cong_clear_bit':
> net/rds/cong.c:298: error: implicit declaration of function 'generic___clear_le_bit'
> net/rds/cong.c: In function 'rds_cong_test_bit':
> net/rds/cong.c:309: error: implicit declaration of function 'generic_test_le_bit'
Signed-off-by: David S. Miller <davem@davemloft.net>
Add RDS Kconfig and Makefile, and modify net/'s to add
us to the build.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although most of IB and iWARP are separated from each other,
there is some common code required to handle their shared
CM listen port. This code listens for CM events and then
dispatches the event to the appropriate transport, either
IB or iWARP.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support for iWARP NICs is implemented as a separate
RDS transport from IB. The code, however, is very
similar to IB (it was forked, basically.) so let's keep
it in one changeset.
The reason for this duplicationis that despite its similarity
to IB, there are a number of places where it has different
semantics. iwarp zcopy support is still under development,
and giving it its own sandbox ensures that IB code isn't
disrupted while iwarp changes. Over time these transports
will re-converge.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Header parsing, ring refill. It puts the incoming data into an
rds_incoming struct, which is passed up to rds-core.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Specific to IB is a credits-based flow control mechanism, in
addition to the expected usage of the IB API to package outgoing
data into work requests.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Registers as an RDS transport and an IB client, and uses IB CM
API to allocate ids, queue pairs, and the rest of that fun stuff.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some transports may support RDMA features. This handles the
non-transport-specific parts, like pinning user pages and
tracking mapped regions.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upon receiving a datagram from the transport, RDS parses the
headers and potentially queues an ACK.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parsing of newly-received RDS message headers (including ext.
headers) and copy-to/from-user routines.
page.c implements a per-cpu page remainder cache, to reduce the
number of allocations needed for small datagrams.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS exposes a few tunable parameters via sysctls.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A simple rds transport to handle loopback connections.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While arguably the fact that the underlying transport needs a
connection to convey RDS's datagrame reliably is not important
to rds proper, the transports implemented so far (IB and TCP)
have both been connection-oriented, and so the connection
state machine-related code is in the common rds code.
This patch also includes several work items, to handle connecting,
sending, receiving, and shutdown.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS currently generates a lot of stats that are accessible via
the rds-info utility. This code implements the support for this.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS supports multiple transports. While this initial submission
only supports Infiniband transport, this abstraction allows others
to be added. We're working on an iWARP transport, and also see
UDP over DCB as another possibility.
This code handles transport registration.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS handles per-socket congestion by updating peers with a complete
congestion map (8KB). This code keeps track of these maps for itself
and ones received from peers.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RDS's main data structure definitions and exported functions.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the RDS (Reliable Datagram Sockets) interface.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>