Commit Graph

531 Commits

Author SHA1 Message Date
Kinglong Mee
47f72efa8f NFSD: Free backchannel xprt in bc_destroy
Backchannel xprt isn't freed right now.
Free it in bc_destroy, and put the reference of THIS_MODULE.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-30 10:47:35 -04:00
Kinglong Mee
315f3812db SUNRPC: fix memory leak of peer addresses in XPRT
Creating xprt failed after xs_format_peer_addresses,
sunrpc must free those memory of peer addresses in xprt.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 21:23:39 -04:00
Trond Myklebust
2ea24497a1 SUNRPC: RPC callbacks may be split across several TCP segments
Since TCP is a stream protocol, our callback read code needs to take into
account the fact that RPC callbacks are not always confined to a single
TCP segment.
This patch adds support for multiple TCP segments by ensuring that we
only remove the rpc_rqst structure from the 'free backchannel requests'
list once the data has been completely received. We rely on the fact
that TCP data is ordered for the duration of the connection.

Reported-by: shaobingqing <shaobingqing@bwstor.com.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-11 14:01:20 -05:00
Trond Myklebust
06ea0bfe6e SUNRPC: Fix races in xs_nospace()
When a send failure occurs due to the socket being out of buffer space,
we call xs_nospace() in order to have the RPC task wait until the
socket has drained enough to make it worth while trying again.
The current patch fixes a race in which the socket is drained before
we get round to setting up the machinery in xs_nospace(), and which
is reported to cause hangs.

Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown
Fixes: a9a6b52ee1 (SUNRPC: Don't start the retransmission timer...)
Reported-by: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-11 09:34:30 -05:00
Linus Torvalds
d9894c228b Merge branch 'for-3.14' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 - Handle some loose ends from the vfs read delegation support.
   (For example nfsd can stop breaking leases on its own in a
    fewer places where it can now depend on the vfs to.)
 - Make life a little easier for NFSv4-only configurations
   (thanks to Kinglong Mee).
 - Fix some gss-proxy problems (thanks Jeff Layton).
 - miscellaneous bug fixes and cleanup

* 'for-3.14' of git://linux-nfs.org/~bfields/linux: (38 commits)
  nfsd: consider CLAIM_FH when handing out delegation
  nfsd4: fix delegation-unlink/rename race
  nfsd4: delay setting current_fh in open
  nfsd4: minor nfs4_setlease cleanup
  gss_krb5: use lcm from kernel lib
  nfsd4: decrease nfsd4_encode_fattr stack usage
  nfsd: fix encode_entryplus_baggage stack usage
  nfsd4: simplify xdr encoding of nfsv4 names
  nfsd4: encode_rdattr_error cleanup
  nfsd4: nfsd4_encode_fattr cleanup
  minor svcauth_gss.c cleanup
  nfsd4: better VERIFY comment
  nfsd4: break only delegations when appropriate
  NFSD: Fix a memory leak in nfsd4_create_session
  sunrpc: get rid of use_gssp_lock
  sunrpc: fix potential race between setting use_gss_proxy and the upcall rpc_clnt
  sunrpc: don't wait for write before allowing reads from use-gss-proxy file
  nfsd: get rid of unused function definition
  Define op_iattr for nfsd4_open instead using macro
  NFSD: fix compile warning without CONFIG_NFSD_V3
  ...
2014-01-30 10:18:43 -08:00
Linus Torvalds
2b2b15c32a NFS client updates for Linux 3.14
Highlights include:
 
 - Stable fix for an infinite loop in RPC state machine
 - Stable fix for a use after free situation in the NFSv4 trunking discovery
 - Stable fix for error handling in the NFSv4 trunking discovery
 - Stable fix for the page write update code
 - Stable fix for the NFSv4.1 mount time security negotiation
 - Stable fix for the NFSv4 open code.
 - O_DIRECT locking fixes
 - fix an Oops in the pnfs file commit code
 - RPC layer needs finer grained handling of connection errors
 - More RPC GSS upcall fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJS5ozQAAoJEGcL54qWCgDy8EIQAMKYX1E5qOal3oJCzWdHAPNz
 ZSQ7CbA3c66vgJwpxy5Mz4gEtTK1IEzfTX31gLgkCXkyw54As+0lOa/SvoXFUusN
 BdBtskkIcVjhcly56xP2dzWGMsVrS8Vt+nwhsPv1Qaor5El0zXwPv8YE5PuuxJK5
 fyQdFEsywnCHtmFdyBdzsV8qHvAA0rxZTMmd6ZDBPCi9362D+pfp/1ESVOA6O14N
 rMBAbadF0pVM1UNvcvxSQaeqwCNqg5OuYKgyy9rhlH0WiQ6ijvKPrLVwg2pKZ2hj
 DCmwEqmKNEpxIFeOvmgFs/uhOEBx2IOF58xTc0+X81q96yTVm80anG1VTNFX577U
 gO8Ts0K/gWTD8ghxz4vh4/llc4yUv8ep8zB3qdSfL8C217UJIwnshkbPct7P1DTh
 8vpWtUeVJPu6rwcxMQXy0NntNZjRo1aqrv+htvFzPAMicM2KEAp73eOjStefvtr5
 JkdbvhhOR6dLwPrUEXM5FW5ewURegLjLcEqw3tq8kMnH0nEYjWOMBaB+uT0QFXun
 EXNqCpQHmHisem/3lGU+iVPc9lPf3C6tPIgjvoSplKcah1l3phVx6a5ReL22Zx2n
 qB2ePHfqToMjMcWiW3O3sbRpaDb+Br7xI4l8F3oeicvfv7SKB8k1u/w2IIoXKFIa
 FIdD6R0UIPgdnH5c03EC
 =abfY
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - stable fix for an infinite loop in RPC state machine
   - stable fix for a use after free situation in the NFSv4 trunking discovery
   - stable fix for error handling in the NFSv4 trunking discovery
   - stable fix for the page write update code
   - stable fix for the NFSv4.1 mount time security negotiation
   - stable fix for the NFSv4 open code.
   - O_DIRECT locking fixes
   - fix an Oops in the pnfs file commit code
   - RPC layer needs finer grained handling of connection errors
   - more RPC GSS upcall fixes"

* tag 'nfs-for-3.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (30 commits)
  pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done
  pnfs: fix BUG in filelayout_recover_commit_reqs
  nfs4: fix discover_server_trunking use after free
  NFSv4.1: Handle errors correctly in nfs41_walk_client_list
  nfs: always make sure page is up-to-date before extending a write to cover the entire page
  nfs: page cache invalidation for dio
  nfs: take i_mutex during direct I/O reads
  nfs: merge nfs_direct_write into nfs_file_direct_write
  nfs: merge nfs_direct_read into nfs_file_direct_read
  nfs: increment i_dio_count for reads, too
  nfs: defer inode_dio_done call until size update is done
  nfs: fix size updates for aio writes
  nfs4.1: properly handle ENOTSUP in SECINFO_NO_NAME
  NFSv4.1: Fix a race in nfs4_write_inode
  NFSv4.1: Don't trust attributes if a pNFS LAYOUTCOMMIT is outstanding
  point to the right include file in a comment (left over from a9004abc3)
  NFS: dprintk() should not print negative fileids and inode numbers
  nfs: fix dead code of ipv6_addr_scope
  sunrpc: Fix infinite loop in RPC state machine
  SUNRPC: Add tracepoint for socket errors
  ...
2014-01-28 08:46:44 -08:00
Aruna-Hewapathirane
63862b5bef net: replace macros net_random and net_srandom with direct calls to prandom
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:15:25 -08:00
Trond Myklebust
e8353c7682 SUNRPC: Add tracepoint for socket errors
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2013-12-31 14:02:46 -05:00
Trond Myklebust
2118071d3b SUNRPC: Report connection error values to rpc_tasks on the pending queue
Currently we only report EAGAIN, which is not descriptive enough for
softconn tasks.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2013-12-31 14:01:54 -05:00
Weng Meiling
28303ca309 sunrpc: fix some typos
Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-12-10 20:37:48 -05:00
Trond Myklebust
a6b31d18b0 SUNRPC: Fix a data corruption issue when retransmitting RPC calls
The following scenario can cause silent data corruption when doing
NFS writes. It has mainly been observed when doing database writes
using O_DIRECT.

1) The RPC client uses sendpage() to do zero-copy of the page data.
2) Due to networking issues, the reply from the server is delayed,
   and so the RPC client times out.

3) The client issues a second sendpage of the page data as part of
   an RPC call retransmission.

4) The reply to the first transmission arrives from the server
   _before_ the client hardware has emptied the TCP socket send
   buffer.
5) After processing the reply, the RPC state machine rules that
   the call to be done, and triggers the completion callbacks.
6) The application notices the RPC call is done, and reuses the
   pages to store something else (e.g. a new write).

7) The client NIC drains the TCP socket send buffer. Since the
   page data has now changed, it reads a corrupted version of the
   initial RPC call, and puts it on the wire.

This patch fixes the problem in the following manner:

The ordering guarantees of TCP ensure that when the server sends a
reply, then we know that the _first_ transmission has completed. Using
zero-copy in that situation is therefore safe.
If a time out occurs, we then send the retransmission using sendmsg()
(i.e. no zero-copy), We then know that the socket contains a full copy of
the data, and so it will retransmit a faithful reproduction even if the
RPC call completes, and the application reuses the O_DIRECT buffer in
the meantime.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-11-08 17:19:15 -05:00
Trond Myklebust
a1311d87fa SUNRPC: Cleanup xs_destroy()
There is no longer any need for a separate xs_local_destroy() helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-31 09:31:17 -04:00
NeilBrown
93dc41bdc5 SUNRPC: close a rare race in xs_tcp_setup_socket.
We have one report of a crash in xs_tcp_setup_socket.
The call path to the crash is:

  xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.

The 'sock' passed to that last function is NULL.

The only way I can see this happening is a concurrent call to
xs_close:

  xs_close -> xs_reset_transport -> sock_release -> inet_release

inet_release sets:
   sock->sk = NULL;
inet_stream_connect calls
   lock_sock(sock->sk);
which gets NULL.

All calls to xs_close are protected by XPRT_LOCKED as are most
activations of the workqueue which runs xs_tcp_setup_socket.
The exception is xs_tcp_schedule_linger_timeout.

So presumably the timeout queued by the later fires exactly when some
other code runs xs_close().

To protect against this we can move the cancel_delayed_work_sync()
call from xs_destory() to xs_close().

As xs_close is never called from the worker scheduled on
->connect_worker, this can never deadlock.

Signed-off-by: NeilBrown <neilb@suse.de>
[Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-31 09:14:50 -04:00
J. Bruce Fields
e3bfab1848 sunrpc: comment typo fix
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-28 18:16:54 -04:00
Trond Myklebust
8b71798c0d SUNRPC: Only update the TCP connect cookie on a successful connect
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-01 18:22:10 -04:00
Trond Myklebust
7f260e8575 SUNRPC: Enable the keepalive option for TCP sockets
For NFSv4 we want to avoid retransmitting RPC calls unless the TCP
connection breaks. However we still want to detect TCP connection
breakage as soon as possible. Do this by setting the keepalive option
with the idle timeout and count set to the 'timeo' and 'retrans' mount
options.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-10-01 18:22:10 -04:00
Linus Torvalds
bf97293eb8 NFS client updates for Linux 3.12
Highlights include:
 
 - Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as
   lease loss due to a network partition, where doing so may result in data
   corruption. Add a kernel parameter to control choice of legacy behaviour
   or not.
 - Performance improvements when 2 processes are writing to the same file.
 - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
 - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
   NFS clients from being able to manipulate our lease and file lockingr
   state.
 - Allow sharing of RPCSEC_GSS caches between different rpc clients
 - Fix the broken NFSv4 security auto-negotiation between client and server
 - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
 - Add a tracepoint framework for debugging NFSv4 state recovery issues.
 - Add tracing to the generic NFS layer.
 - Add tracing for the SUNRPC socket connection state.
 - Clean up the rpc_pipefs mount/umount event management.
 - Merge more patches from Chuck in preparation for NFSv4 migration support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSLelVAAoJEGcL54qWCgDyo2IQAKOfRJyZVnf4ipxi3xLNl1QF
 w/70DVSIF1S1djWN7G3vgkxj/R8KCvJ8CcvkAD2BEgRDeZJ9TtyKAdM/jYLZ+W05
 7k2QKk8fkwZmc1Y2qDqFwKHzP5ZgP5L2nGx7FNhi/99wEAe47yFG3qd3rUWKrcOf
 mnd863zgGDE2Q10slhoq/bywwMJo6tKZNeaIE8kPjgFbBEh/jslpAWr8dSA4QgvJ
 nZ8VB5XU8L+XJ0GpHHdjYm9LvQ51DbQ6omOF+0P4fI093azKmf4ZsrjMDWT8+iu3
 XkXlnQmKLGTi7yB43hHtn2NiRqwGzCcZ1Amo9PpCFaHUt1RP9cc37UhG1T+x1xWJ
 STEKDbvCdQ3FU9FvbgrGEwBR0e8fNS4fZY3ToDBflIcfwre0aWs5RCodZMUD0nUI
 4wY5J9NsQR/bL+v8KeUR4V4cXK8YrgL0zB4u4WYzH5Npxr5KD0NEKDNqRPhrB9l2
 LLF9Haql8j76Ff0ek6UGFIZjDE0h6Fs71wLBpLj+ZWArOJ7vBuLMBSOVqNpld9+9
 f2fEG7qoGF4FGTY4myH/eakMPaWnk9Ol4Ls/svSIapJ9+rePD+a93e/qnmdofIMf
 4TuEYk6ERib1qXgaeDRQuCsm2YE1Co5skGMaOsRFWgReE1c12QoJQVst2nMtEKp3
 uV2w8LgX18aZOZXJVkCM
 =ZuW+
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - Fix NFSv4 recovery so that it doesn't recover lost locks in cases
     such as lease loss due to a network partition, where doing so may
     result in data corruption.  Add a kernel parameter to control
     choice of legacy behaviour or not.
   - Performance improvements when 2 processes are writing to the same
     file.
   - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
   - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
     NFS clients from being able to manipulate our lease and file
     locking state.
   - Allow sharing of RPCSEC_GSS caches between different rpc clients.
   - Fix the broken NFSv4 security auto-negotiation between client and
     server.
   - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
   - Add a tracepoint framework for debugging NFSv4 state recovery
     issues.
   - Add tracing to the generic NFS layer.
   - Add tracing for the SUNRPC socket connection state.
   - Clean up the rpc_pipefs mount/umount event management.
   - Merge more patches from Chuck in preparation for NFSv4 migration
     support"

* tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
  NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
  NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
  NFSv4: Allow security autonegotiation for submounts
  NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
  NFSv4: Fix security auto-negotiation
  NFS: Clean up nfs_parse_security_flavors()
  NFS: Clean up the auth flavour array mess
  NFSv4.1 Use MDS auth flavor for data server connection
  NFS: Don't check lock owner compatability unless file is locked (part 2)
  NFS: Don't check lock owner compatibility in writes unless file is locked
  nfs4: Map NFS4ERR_WRONG_CRED to EPERM
  nfs4.1: Add SP4_MACH_CRED write and commit support
  nfs4.1: Add SP4_MACH_CRED stateid support
  nfs4.1: Add SP4_MACH_CRED secinfo support
  nfs4.1: Add SP4_MACH_CRED cleanup support
  nfs4.1: Add state protection handler
  nfs4.1: Minimal SP4_MACH_CRED implementation
  SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
  SUNRPC: Add an identifier for struct rpc_clnt
  SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
  ...
2013-09-09 09:19:15 -07:00
Trond Myklebust
40b5ea0c25 SUNRPC: Add tracepoints to help debug socket connection issues
Add client side debugging to help trace socket connection/disconnection
and unexpected state change issues.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-09-04 12:26:31 -04:00
Eric Dumazet
64dc61306c net: add sk_stream_is_writeable() helper
Several call sites use the hardcoded following condition :

sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)

Lets use a helper because TCP_NOTSENT_LOWAT support will change this
condition for TCP sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-24 17:54:48 -07:00
Linus Torvalds
0ff08ba5d0 Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from Bruce Fields:
 "Changes this time include:

   - 4.1 enabled on the server by default: the last 4.1-specific issues
     I know of are fixed, so we're not going to find the rest of the
     bugs without more exposure.
   - Experimental support for NFSv4.2 MAC Labeling (to allow running
     selinux over NFS), from Dave Quigley.
   - Fixes for some delicate cache/upcall races that could cause rare
     server hangs; thanks to Neil Brown and Bodo Stroesser for extreme
     debugging persistence.
   - Fixes for some bugs found at the recent NFS bakeathon, mostly v4
     and v4.1-specific, but also a generic bug handling fragmented rpc
     calls"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux: (31 commits)
  nfsd4: support minorversion 1 by default
  nfsd4: allow destroy_session over destroyed session
  svcrpc: fix failures to handle -1 uid's
  sunrpc: Don't schedule an upcall on a replaced cache entry.
  net/sunrpc: xpt_auth_cache should be ignored when expired.
  sunrpc/cache: ensure items removed from cache do not have pending upcalls.
  sunrpc/cache: use cache_fresh_unlocked consistently and correctly.
  sunrpc/cache: remove races with queuing an upcall.
  nfsd4: return delegation immediately if lease fails
  nfsd4: do not throw away 4.1 lock state on last unlock
  nfsd4: delegation-based open reclaims should bypass permissions
  svcrpc: don't error out on small tcp fragment
  svcrpc: fix handling of too-short rpc's
  nfsd4: minor read_buf cleanup
  nfsd4: fix decoding of compounds across page boundaries
  nfsd4: clean up nfs4_open_delegation
  NFSD: Don't give out read delegations on creates
  nfsd4: allow client to send no cb_sec flavors
  nfsd4: fail attempts to request gss on the backchannel
  nfsd4: implement minimal SP4_MACH_CRED
  ...
2013-07-11 10:17:13 -07:00
Joe Perches
fe2c6338fd net: Convert uses of typedef ctl_table to struct ctl_table
Reduce the uses of this unnecessary typedef.

Done via perl script:

$ git grep --name-only -w ctl_table net | \
  xargs perl -p -i -e '\
	sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
        s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

Reflow the modified lines that now exceed 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-13 02:36:09 -07:00
J. Bruce Fields
2fccbd9cc0 sunrpc: server back channel needs no rpcbind method
XPRT_BOUND is set on server backchannel xprts by xs_setup_bc_tcp()
(using xprt_set_bound()), and is never cleared, so ->rpcbind() will
never need to be called.

Reported-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-05-15 09:27:02 -04:00
J. Bruce Fields
7073ea8799 SUNRPC: attempt AF_LOCAL connect on setup
In the gss-proxy case, setup time is when I know I'll have the right
namespace for the connect.

In other cases, it might be useful to get any connection errors
earlier--though actually in practice it doesn't make any difference for
rpcbind.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-04-26 11:41:25 -04:00
J. Bruce Fields
c85b03ab20 Merge Trond's nfs-for-next
Merging Trond's nfs-for-next branch, mainly to get
b7993cebb8 "SUNRPC: Allow rpc_create() to
request that TCP slots be unlimited", which a small piece of the
gss-proxy work depends on.
2013-04-26 11:37:43 -04:00
Trond Myklebust
b7993cebb8 SUNRPC: Allow rpc_create() to request that TCP slots be unlimited
This is mainly for use by NFSv4.1, where the session negotiation
ultimately wants to decide how many RPC slots we can fill.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-04-14 12:26:03 -04:00
Trond Myklebust
3ed5e2a2c3 SUNRPC: Report network/connection errors correctly for SOFTCONN rpc tasks
In the case of a SOFTCONN rpc task, we really want to ensure that it
reports errors like ENETUNREACH back to the caller. Currently, only
some of these errors are being reported back (connect errors are not),
and they are being converted by the RPC layer into EIO.

Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-25 12:04:10 -04:00
J. Bruce Fields
190b1ecf25 sunrpc: don't attempt to cancel unitialized work
As of dc107402ae "SUNRPC: make AF_LOCAL connect synchronous", we no longer initialize connect_worker in the
AF_LOCAL case, resulting in warnings like:

    WARNING: at lib/debugobjects.c:261 debug_print_object+0x8c/0xb0() Hardware name: Bochs
    ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
    Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss nfs_acl lockd sunrpc
    Pid: 4816, comm: nfsd Tainted: G        W    3.8.0-rc2-00049-gdc10740 #801
    Call Trace:
     [<ffffffff8156ec00>] ? free_obj_work+0x60/0xa0
     [<ffffffff81046aaf>] warn_slowpath_common+0x7f/0xc0
     [<ffffffff81046ba6>] warn_slowpath_fmt+0x46/0x50
     [<ffffffff8156eccc>] debug_print_object+0x8c/0xb0
     [<ffffffff81055030>] ? timer_debug_hint+0x10/0x10
     [<ffffffff8156f7e3>] debug_object_assert_init+0xe3/0x120
     [<ffffffff81057ebb>] del_timer+0x2b/0x80
     [<ffffffff8109c4e6>] ? mark_held_locks+0x86/0x110
     [<ffffffff81065a29>] try_to_grab_pending+0xd9/0x150
     [<ffffffff81065b57>] __cancel_work_timer+0x27/0xc0
     [<ffffffff81065c03>] cancel_delayed_work_sync+0x13/0x20
     [<ffffffffa0007067>] xs_destroy+0x27/0x80 [sunrpc]
     [<ffffffffa00040d8>] xprt_destroy+0x78/0xa0 [sunrpc]
     [<ffffffffa0006241>] xprt_put+0x21/0x30 [sunrpc]
     [<ffffffffa00030cf>] rpc_free_client+0x10f/0x1a0 [sunrpc]
     [<ffffffffa0002ff3>] ? rpc_free_client+0x33/0x1a0 [sunrpc]
     [<ffffffffa0002f7e>] rpc_release_client+0x6e/0xb0 [sunrpc]
     [<ffffffffa000325d>] rpc_shutdown_client+0xfd/0x1b0 [sunrpc]
     [<ffffffffa0017196>] rpcb_put_local+0x106/0x130 [sunrpc]
    ...

Acked-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-09 12:43:42 -05:00
Linus Torvalds
b6669737d3 Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J Bruce Fields:
 "Miscellaneous bugfixes, plus:

   - An overhaul of the DRC cache by Jeff Layton.  The main effect is
     just to make it larger.  This decreases the chances of intermittent
     errors especially in the UDP case.  But we'll need to watch for any
     reports of performance regressions.

   - Containerized nfsd: with some limitations, we now support
     per-container nfs-service, thanks to extensive work from Stanislav
     Kinsbursky over the last year."

Some notes about conflicts, since there were *two* non-data semantic
conflicts here:

 - idr_remove_all() had been added by a memory leak fix, but has since
   become deprecated since idr_destroy() does it for us now.

 - xs_local_connect() had been added by this branch to make AF_LOCAL
   connections be synchronous, but in the meantime Trond had changed the
   calling convention in order to avoid a RCU dereference.

There were a couple of more obvious actual source-level conflicts due to
the hlist traversal changes and one just due to code changes next to
each other, but those were trivial.

* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
  SUNRPC: make AF_LOCAL connect synchronous
  nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
  svcrpc: fix rpc server shutdown races
  svcrpc: make svc_age_temp_xprts enqueue under sv_lock
  lockd: nlmclnt_reclaim(): avoid stack overflow
  nfsd: enable NFSv4 state in containers
  nfsd: disable usermode helper client tracker in container
  nfsd: use proper net while reading "exports" file
  nfsd: containerize NFSd filesystem
  nfsd: fix comments on nfsd_cache_lookup
  SUNRPC: move cache_detail->cache_request callback call to cache_read()
  SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
  SUNRPC: rework cache upcall logic
  SUNRPC: introduce cache_detail->cache_request callback
  NFS: simplify and clean cache library
  NFS: use SUNRPC cache creation and destruction helper for DNS cache
  nfsd4: free_stid can be static
  nfsd: keep a checksum of the first 256 bytes of request
  sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
  sunrpc: fix comment in struct xdr_buf definition
  ...
2013-02-28 18:02:55 -08:00
J. Bruce Fields
dc107402ae SUNRPC: make AF_LOCAL connect synchronous
It doesn't appear that anyone actually needs to connect asynchronously.

Also, using a workqueue for the connect means we lose the namespace
information from the original process.  This is a problem since there's
no way to explicitly pass in a filesystem namespace for resolution of an
AF_LOCAL address.

Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-28 09:47:17 -08:00
Jeff Layton
5976687a2b sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to addr.h
These routines are used by server and client code, so having them in a
separate header would be best.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-05 09:41:14 -05:00
Trond Myklebust
6a24dfb645 SUNRPC: Pass pointers to struct rpc_xprt to the congestion window
Avoid access to task->tk_xprt

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-01 10:13:47 -05:00
Trond Myklebust
3dc0da278e SUNRPC: Fix an RCU dereference in xs_local_rpcbind
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-01 10:13:47 -05:00
Trond Myklebust
1b092092bf SUNRPC: Pass a pointer to struct rpc_xprt to the connect callback
Avoid another RCU dereference by passing the pointer to struct rpc_xprt
from the caller.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-01 10:13:47 -05:00
Trond Myklebust
a4f0835c60 SUNRPC: Eliminate task->tk_xprt accesses that bypass rcu_dereference()
tk_xprt is just a shortcut for tk_client->cl_xprt, however cl_xprt is
defined as an __rcu variable. Replace dereferences of tk_xprt with
non-rcu dereferences where it is safe to do so.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-01 10:13:47 -05:00
Trond Myklebust
1efc28780b SUNRPC: variable 'svsk' is unused in function bc_send_request
Silence a compile time warning.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 17:05:57 -05:00
Trond Myklebust
4a20a988f7 SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket
Silence the unnecessary warning "unhandled error (111) connecting to..."
and convert it to a dprintk for debugging purposes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 17:02:29 -05:00
Weston Andros Adamson
b8a13d039c SUNRPC: remove BUG_ON from bc_malloc
Replace BUG_ON() with WARN_ON_ONCE() and NULL return - the caller will handle
this like a memory allocation failure.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:42 -05:00
Weston Andros Adamson
1b7a181907 SUNRPC: remove BUG_ONs from *_reclassify_socket*
Replace multiple BUG_ON() calls with WARN_ON_ONCE() and early return when
sanity checking socket ownership (lock). The bind call will fail if the
socket was unsuccessfully reclassified.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04 14:43:41 -05:00
Trond Myklebust
f878b657ce SUNRPC: Get rid of the xs_error_report socket callback
Chris Perl reports that we're seeing races between the wakeup call in
xs_error_report and the connect attempts. Basically, Chris has shown
that in certain circumstances, the call to xs_error_report causes the
rpc_task that is responsible for reconnecting to wake up early, thus
triggering a disconnect and retry.

Since the sk->sk_error_report() calls in the socket layer are always
followed by a tcp_done() in the cases where we care about waking up
the rpc_tasks, just let the state_change callbacks take responsibility
for those wake ups.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 10:46:15 -04:00
Trond Myklebust
4bc1e68ed6 SUNRPC: Prevent races in xs_abort_connection()
The call to xprt_disconnect_done() that is triggered by a successful
connection reset will trigger another automatic wakeup of all tasks
on the xprt->pending rpc_wait_queue. In particular it will cause an
early wake up of the task that called xprt_connect().

All we really want to do here is clear all the socket-specific state
flags, so we split that functionality out of xs_sock_mark_closed()
into a helper that can be called by xs_abort_connection()

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 10:46:08 -04:00
Trond Myklebust
b9d2bb2ee5 Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."
This reverts commit 55420c24a0.
Now that we clear the connected flag when entering TCP_CLOSE_WAIT,
the deadlock described in this commit is no longer possible.
Instead, the resulting call to xs_tcp_shutdown() can interfere
with pending reconnection attempts.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 10:45:39 -04:00
Trond Myklebust
d0bea455dd SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT
This is needed to ensure that we call xprt_connect() upon the next
call to call_connect().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
2012-10-24 10:44:49 -04:00
Trond Myklebust
d19751e7b9 SUNRPC: Get rid of the redundant xprt->shutdown bit field
It is only set after everyone has dereferenced the transport,
and serves no useful purpose: setting it is racy, so all the
socket code, etc still needs to be able to cope with the cases
where they miss reading it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28 16:03:05 -04:00
Bryan Schumaker
84e28a307e SUNRPC: Set alloc_slot for backchannel tcp ops
f39c1bfb5a (SUNRPC: Fix a UDP transport
regression) introduced the "alloc_slot" function for xprt operations,
but never created one for the backchannel operations.  This patch fixes
a null pointer dereference when mounting NFS over v4.1.

Call Trace:
 [<ffffffffa0207957>] ? xprt_reserve+0x47/0x50 [sunrpc]
 [<ffffffffa02023a4>] call_reserve+0x34/0x60 [sunrpc]
 [<ffffffffa020e280>] __rpc_execute+0x90/0x400 [sunrpc]
 [<ffffffffa020e61a>] rpc_async_schedule+0x2a/0x40 [sunrpc]
 [<ffffffff81073589>] process_one_work+0x139/0x500
 [<ffffffff81070e70>] ? alloc_worker+0x70/0x70
 [<ffffffffa020e5f0>] ? __rpc_execute+0x400/0x400 [sunrpc]
 [<ffffffff81073d1e>] worker_thread+0x15e/0x460
 [<ffffffff8145c839>] ? preempt_schedule+0x49/0x70
 [<ffffffff81073bc0>] ? rescuer_thread+0x230/0x230
 [<ffffffff81079603>] kthread+0x93/0xa0
 [<ffffffff81465d04>] kernel_thread_helper+0x4/0x10
 [<ffffffff81079570>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff81465d00>] ? gs_change+0x13/0x13

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-25 10:33:59 -04:00
Trond Myklebust
a519fc7a70 SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT
Instead of doing a shutdown() call, we need to do an actual close().
Ditto if/when the server is sending us junk RPC headers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@vger.kernel.org
2012-09-19 18:16:10 -04:00
Trond Myklebust
f39c1bfb5a SUNRPC: Fix a UDP transport regression
Commit 43cedbf0e8 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.

Since neither the UDP or the RDMA transport mechanism use dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.

Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.

Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
2012-09-07 11:43:49 -04:00
Linus Torvalds
ac694dbdbc Merge branch 'akpm' (Andrew's patch-bomb)
Merge Andrew's second set of patches:
 - MM
 - a few random fixes
 - a couple of RTC leftovers

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  rtc/rtc-88pm80x: remove unneed devm_kfree
  rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
  mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
  tmpfs: distribute interleave better across nodes
  mm: remove redundant initialization
  mm: warn if pg_data_t isn't initialized with zero
  mips: zero out pg_data_t when it's allocated
  memcg: gix memory accounting scalability in shrink_page_list
  mm/sparse: remove index_init_lock
  mm/sparse: more checks on mem_section number
  mm/sparse: optimize sparse_index_alloc
  memcg: add mem_cgroup_from_css() helper
  memcg: further prevent OOM with too many dirty pages
  memcg: prevent OOM with too many dirty pages
  mm: mmu_notifier: fix freed page still mapped in secondary MMU
  mm: memcg: only check anon swapin page charges for swap cache
  mm: memcg: only check swap cache pages for repeated charging
  mm: memcg: split swapin charge function into private and public part
  mm: memcg: remove needless !mm fixup to init_mm when charging
  mm: memcg: remove unneeded shmem charge type
  ...
2012-07-31 19:25:39 -07:00
Linus Torvalds
6dbb35b0a7 NFS client updates for Linux 3.6
Features include:
 - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
   separate modules.
 - Fix Oopses in the NFSv4 idmapper
 - Fix a deadlock whereby rpciod tries to allocate a new socket and
   ends up recursing into the NFS code due to memory reclaim.
 - Increase the number of permitted callback connections.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQGF+vAAoJEGcL54qWCgDy6ncP/jKVl/Yjz6i1b/8QtC0EmYFb
 XNwbjZicNupVM98XPLm+sfdWxvoGiHSMbwG8t/hx5Z6CteM18adeGNO1KqP9xQyg
 scPvFmqj5VJNHsHztcHIFHFYdpsuJqRKcW3TA2l8AkNOm6uBcnXArRosYN6LrXik
 hI/jWcyv8wZuooSQrLf463JK37t/s6LuUQo6jKP4582sbANHvPdLkHFCYg3yt1Sk
 2IIo2CLLs2cJUswGJnsHT76Jxfvh21NdGtvydjVjpr4H0/LY5GykBc23AAL9TWj6
 KlegDO902hLzEE93xazF59hQZrPuwBi3quEMJ3X0mjdNQWb4G96s/t2hmwpTjWyc
 mjpRsuaGlCXDxcrI5dnU52hqKa0Ju1ipbLpghxSVfilRy998xvGp5Yc6ADU+pYnt
 uP09lamCyjFEa+gDSqGt7lv4dFmdPJjWZdkXLPv0ah5sXVMYAT8NRLUfiE1+hLLu
 zKaM84PHSEvykFeJ2WvjDTgF3L55y3x6L3LN4UkKmfO5nqQf7csPcquU1NfHh25S
 jjKGq2SKGoYG1l6FwK3hemup5cnFUEX78E2QeLrmaHg92fJDGrz587i6MPc5jrSe
 sOSX/CpuCJd4VhodIo8T00me0GZSHEU7RP2KEc518ZvrPmz13avXeLeG9nYhjuxM
 cV9Ex8zh2Y4aiUXCy3uA
 =KSix
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull second wave of NFS client updates from Trond Myklebust:

 - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
   separate modules.

 - Fix Oopses in the NFSv4 idmapper

 - Fix a deadlock whereby rpciod tries to allocate a new socket and ends
   up recursing into the NFS code due to memory reclaim.

 - Increase the number of permitted callback connections.

* tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: explicitly reject LOCK_MAND flock() requests
  nfs: increase number of permitted callback connections.
  SUNRPC: return negative value in case rpcbind client creation error
  NFS: Convert v4 into a module
  NFS: Convert v3 into a module
  NFS: Convert v2 into a module
  NFS: Keep module parameters in the generic NFS client
  NFS: Split out remaining NFS v4 inode functions
  NFS: Pass super operations and xattr handlers in the nfs_subversion
  NFS: Only initialize the ACL client in the v3 case
  NFS: Create a try_mount rpc op
  NFS: Remove the NFS v4 xdev mount function
  NFS: Add version registering framework
  NFS: Fix a number of bugs in the idmapper
  nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
  sunrpc: clarify comments on rpc_make_runnable
  pnfsblock: bail out partial page IO
2012-07-31 18:45:44 -07:00
Mel Gorman
a564b8f039 nfs: enable swap on NFS
Implement the new swapfile a_ops for NFS and hook up ->direct_IO.  This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us to
receive the packets required for the TCP connection buildup.

[jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:48 -07:00
Jeff Layton
5cf02d09b5 nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

    PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
     #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
     #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
     #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
     #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
     #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
     #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
     #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
     #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
     #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
     #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-07-30 18:55:59 -04:00
David S. Miller
60d354ebeb sunrpc: Don't do a dst_confirm() on an input routes.
xs_udp_data_ready() is operating on received packets, and tries to
do a dst_confirm() on the dst attached to the SKB.

This isn't right, dst confirmation is for output routes, not input
rights.  It's for resetting the timers on the nexthop neighbour entry
for the route, indicating that we've got good evidence that we've
successfully reached it.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 01:02:43 -07:00
J. Bruce Fields
92769108f5 sunrpc: skip portmap calls on sessions backchannel
There's obviously no point to doing portmap calls over the sessions
backchannel.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:48 -04:00
Trond Myklebust
09acfea5d8 SUNRPC: Fix a few sparse warnings
net/sunrpc/svcsock.c:412:22: warning: incorrect type in assignment
(different address spaces)
 - svc_partial_recvfrom now takes a struct kvec, so the variable
   save_iovbase needs to be an ordinary (void *)

Make a bunch of variables in net/sunrpc/xprtsock.c static

Fix a couple of "warning: symbol 'foo' was not declared. Should it be
static?" reports.

Fix a couple of conflicting function declarations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-11 19:30:02 -04:00
Andy Adamson
15a4520621 SUNRPC: add sending,pending queue and max slot to xprt stats
With static RPC slots, the xprt backlog queue stats were useful in showing
when the transport (TCP) was starved by lack of RPC slots. The new dynamic
RPC slot code, commit d9ba131d8f, always
provides an RPC slot and so only uses the xprt backlog queue when the
tcp_max_slot_table_entries value has been hit or when an allocation error
occurs. All requests are now placed on the xprt sending or pending queue which
need to be monitored for debugging.

The max_slot stat shows the maximum number of dynamic RPC slots reached which is
useful when debugging performance issues.

Add the new fields at the end of the mountstats xprt stanza so that mountstats
outputs the previous correct values and ignores the new fields. Bump
NFS_IOSTATS_VERS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-16 14:55:27 -05:00
Trond Myklebust
24ca9a8477 SUNRPC: Ensure we return EAGAIN in xs_nospace if congestion is cleared
By returning '0' instead of 'EAGAIN' when the tests in xs_nospace() fail
to find evidence of socket congestion, we are making the RPC engine believe
that the message was incorrectly sent and so it disconnects the socket
instead of just retrying.

The bug appears to have been introduced by commit
5e3771ce2d (SUNRPC: Ensure that xs_nospace
return values are propagated).

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 2.6.30]
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
2011-11-22 23:55:27 +02:00
Stanislav Kinsbursky
2aa13531bb SUNRPC: destroy freshly allocated transport in case of sockaddr init error
Otherwise we will leak xprt structure and struct net reference.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-10 14:50:07 -05:00
Trond Myklebust
d9ba131d8f SUNRPC: Support dynamic slot allocation for TCP connections
Allow the number of available slots to grow with the TCP window size.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-17 18:11:30 -04:00
Trond Myklebust
43cedbf0e8 SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot
This throttles the allocation of new slots when the socket is busy
reconnecting and/or is out of buffer space.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-17 16:01:03 -04:00
Trond Myklebust
9e00abc3c2 SUNRPC: sunrpc should not explicitly depend on NFS config options
Change explicit references to CONFIG_NFS_V4_1 to implicit ones
Get rid of the unnecessary defines in backchannel_rqst.c and
bc_svc.c: the Makefile takes care of those dependency.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 09:12:23 -04:00
Chuck Lever
176e21ee2e SUNRPC: Support for RPC over AF_LOCAL transports
TI-RPC introduces the capability of performing RPC over AF_LOCAL
sockets.  It uses this mainly for registering and unregistering
local RPC services securely with the local rpcbind, but we could
also conceivably use it as a generic upcall mechanism.

This patch provides a client-side only implementation for the moment.
We might also consider a server-side implementation to provide
AF_LOCAL access to NLM (for statd downcalls, and such like).

Autobinding is not supported on kernel AF_LOCAL transports at this
time.  Kernel ULPs must specify the pathname of the remote endpoint
when an AF_LOCAL transport is created.  rpcbind supports registering
services available via AF_LOCAL, so the kernel could handle it with
some adjustment to ->rpcbind and ->set_port.  But we don't need this
feature for doing upcalls via well-known named sockets.

This has not been tested with ULPs that move a substantial amount of
data.  Thus, I can't attest to how robust the write_space and
congestion management logic is.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-27 17:42:47 -04:00
Chuck Lever
61677eeec2 SUNRPC: Rename xs_encode_tcp_fragment_header()
Clean up: Use a more generic name for xs_encode_tcp_fragment_header();
it's appropriate to use for all stream transport types.  We're about
to add new stream transport.

Also, move it to a place where it is more easily shared amongst the
various send_request methods.  And finally, replace the "htonl" macro
invocation with its modern equivalent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-05-27 17:42:47 -04:00
Trond Myklebust
fe19a96b10 SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...
The TCP connection state code depends on the state_change() callback
being called when the SYN_SENT state is set. However the networking layer
doesn't actually call us back in that case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-05-27 17:42:00 -04:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Trond Myklebust
246408dcd5 SUNRPC: Never reuse the socket port after an xs_close()
If we call xs_close(), we're in one of two situations:
 - Autoclose, which means we don't expect to resend a request
 - bind+connect failed, which probably means the port is in use

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2011-03-22 18:42:33 -04:00
Ben Hutchings
4cea288aaf sunrpc: Propagate errors from xs_bind() through xs_create_sock()
xs_create_sock() is supposed to return a pointer or an ERR_PTR-encoded
error, but it currently returns 0 if xs_bind() fails.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org [v2.6.37]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-03-10 15:04:58 -05:00
Linus Torvalds
18bce371ae Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: (62 commits)
  nfsd4: fix callback restarting
  nfsd: break lease on unlink, link, and rename
  nfsd4: break lease on nfsd setattr
  nfsd: don't support msnfs export option
  nfsd4: initialize cb_per_client
  nfsd4: allow restarting callbacks
  nfsd4: simplify nfsd4_cb_prepare
  nfsd4: give out delegations more quickly in 4.1 case
  nfsd4: add helper function to run callbacks
  nfsd4: make sure sequence flags are set after destroy_session
  nfsd4: re-probe callback on connection loss
  nfsd4: set sequence flag when backchannel is down
  nfsd4: keep finer-grained callback status
  rpc: allow xprt_class->setup to return a preexisting xprt
  rpc: keep backchannel xprt as long as server connection
  rpc: move sk_bc_xprt to svc_xprt
  nfsd4: allow backchannel recovery
  nfsd4: support BIND_CONN_TO_SESSION
  nfsd4: modify session list under cl_lock
  Documentation: fl_mylease no longer exists
  ...

Fix up conflicts in fs/nfsd/vfs.c with the vfs-scale work.  The
vfs-scale work touched some msnfs cases, and this merge removes support
for that entirely, so the conflict was trivial to resolve.
2011-01-14 13:17:26 -08:00
J. Bruce Fields
f0418aa4b1 rpc: allow xprt_class->setup to return a preexisting xprt
This allows us to reuse the xprt associated with a server connection if
one has already been set up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
J. Bruce Fields
99de8ea962 rpc: keep backchannel xprt as long as server connection
Multiple backchannels can share the same tcp connection; from rfc 5661 section
2.10.3.1:

	A connection's association with a session is not exclusive.  A
	connection associated with the channel(s) of one session may be
	simultaneously associated with the channel(s) of other sessions
	including sessions associated with other client IDs.

However, multiple backchannels share a connection, they must all share
the same xid stream (hence the same rpc_xprt); the only way we have to
match replies with calls at the rpc layer is using the xid.

So, keep the rpc_xprt around as long as the connection lasts, in case
we're asked to use the connection as a backchannel again.

Requests to create new backchannel clients over a given server
connection should results in creating new clients that reuse the
existing rpc_xprt.

But to start, just reject attempts to associate multiple rpc_xprt's with
the same underlying bc_xprt.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
J. Bruce Fields
d75faea330 rpc: move sk_bc_xprt to svc_xprt
This seems obviously transport-level information even if it's currently
used only by the server socket code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-01-11 15:04:10 -05:00
Tejun Heo
afe2c511fb workqueue: convert cancel_rearming_delayed_work[queue]() users to cancel_delayed_work_sync()
cancel_rearming_delayed_work[queue]() has been superceded by
cancel_delayed_work_sync() quite some time ago.  Convert all the
in-kernel users.  The conversions are completely equivalent and
trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: netdev@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: xfs-masters@oss.sgi.com
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: netfilter-devel@vger.kernel.org
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
2010-12-15 10:56:11 +01:00
Linus Torvalds
4390110fef Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
  svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
  svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
  svcrpc: assume svc_delete_xprt() called only once
  svcrpc: never clear XPT_BUSY on dead xprt
  nfsd4: fix connection allocation in sequence()
  nfsd4: only require krb5 principal for NFSv4.0 callbacks
  nfsd4: move minorversion to client
  nfsd4: delay session removal till free_client
  nfsd4: separate callback change and callback probe
  nfsd4: callback program number is per-session
  nfsd4: track backchannel connections
  nfsd4: confirm only on succesful create_session
  nfsd4: make backchannel sequence number per-session
  nfsd4: use client pointer to backchannel session
  nfsd4: move callback setup into session init code
  nfsd4: don't cache seq_misordered replies
  SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
  SUNRPC: Use conventional switch statement when reclassifying sockets
  sunrpc/xprtrdma: clean up workqueue usage
  sunrpc: Turn list_for_each-s into the ..._entry-s
  ...

Fix up trivial conflicts (two different deprecation notices added in
separate branches) in Documentation/feature-removal-schedule.txt
2010-10-26 09:55:25 -07:00
Chuck Lever
9247685088 SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
The source address field in the transport's sock_xprt is initialized
ONLY IF the RPC application passed a pointer to a source address
during the call to rpc_create().  However, xs_bind() subsequently uses
the value of this field without regard to whether the source address
was initialized during transport creation or not.

So far we've been lucky: the uninitialized value of this field is
zeroes.  xs_bind(), until recently, used only the sin[6]_addr field in
this sockaddr, and all zeroes is a valid value for this: it means
ANYADDR.  This is a happy coincidence.

However, xs_bind() now wants to use the sa_family field as well, and
expects it to be initialized to something other than zero.

Therefore, the source address sockaddr field should be fully
initialized at transport create time in _every_ case, not just when
the RPC application wants to use a specific bind address.

Bruce added a workaround for this missing initialization by adjusting
commit 6bc9638a, but the "right" way to do this is to ensure that the
source address sockaddr is always correctly initialized from the
get-go.

This patch doesn't introduce a behavior change.  It's simply a
clean-up of Bruce's fix, to prevent future problems of this kind.  It
may look like overkill, but

  a) it clearly documents the default initial value of this field,

  b) it doesn't assume that the sockaddr_storage memory is first
     initialized to any particular value, and

  c) it will fail verbosely if some unknown address family is passed
     in

Originally introduced by commit d3bc9a1d.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-21 10:11:47 -04:00
Chuck Lever
4232e8634a SUNRPC: Use conventional switch statement when reclassifying sockets
Clean up.

Defensive coding: If "family" is ever something that is neither
AF_INET nor AF_INET6, xs_reclassify_socket6() is not the appropriate
default action.  Choose to do nothing in that case.

Introduced by commit 6bc9638a.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-21 10:11:46 -04:00
Pavel Emelyanov
50fa0d40a9 sunrpc: Remove dead "else" branch from bc xprt creation
Since the xprt in question is forcibly set to be bound the else
branch of this check is unneeded.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:16 -04:00
Pavel Emelyanov
8c14ff2aaf sunrpc: Remove UDP worker wrappers
Same for UDP sockets creation paths.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:15 -04:00
Pavel Emelyanov
cdd518d524 sunrpc: Remove TCP worker wrappers
The v4 and the v6 wrappers only pass the respective family
to the xs_tcp_setup_socket. This family can be taken from the
xprt's sockaddr.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:15 -04:00
Pavel Emelyanov
7dfe1fc362 sunrpc: Pass family to setup_socket calls
Now we have a single socket creation routine and can call it
directly from the setup_socket routines.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:15 -04:00
Pavel Emelyanov
6bc9638ab4 sunrpc: Merge xs_create_sock code
After xs_bind is merged it's easy to merge its callers.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
[bfields@redhat.com: fix address family initialization]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:15 -04:00
Pavel Emelyanov
beb59b6828 sunrpc: Merge the xs_bind code
There's the only difference betseen the xs_bind4 and the
xs_bind6 - the size of sockaddr structure they use.

Fortunatelly its size can be indirectly get from the transport.

Change since v1:
* use sockaddr_storage instead of sockaddr
* use rpc_set_port instead of manual port assigning

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
[bfields@redhat.com: fix address family initialization]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:15 -04:00
Pavel Emelyanov
573018c07e sunrpc: Call xs_create_sockX directly from setup_socket
Remove now unneeded wrappers that just add type and protocol
to socket creation callback.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:15 -04:00
Pavel Emelyanov
22d44a7d8a sunrpc: Factor out v6 sockets creation
Same patch for v6 protocols.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:15 -04:00
Pavel Emelyanov
22f793268d sunrpc: Factor out v4 sockets creation
The UDPv4 and TCPv4 socket creation callbacks now look very similar.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:14 -04:00
Pavel Emelyanov
b65c031061 sunrpc: Factor out udp sockets creation
Make it look like the TCP sockets creation.
Unfortunately the git diff made the patch look messy :(

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:14 -04:00
Pavel Emelyanov
58dddac9c5 sunrpc: Remove duplicate xprt/transport arguments from calls
The xs_tcp_reuse_connection takes the xprt only to pass it down
to the xs_abort_connection. The later one can get it from the given
transport itself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:14 -04:00
Pavel Emelyanov
a9f5f0f7bf sunrpc: Get xprt pointer once in xs_tcp_setup_socket
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:14 -04:00
Pavel Emelyanov
baaf4e487a sunrpc: Remove unused sock arg from xs_next_srcport
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:14 -04:00
Pavel Emelyanov
5d4ec93297 sunrpc: Remove unused sock arg from xs_get_srcport
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-19 10:48:14 -04:00
Pavel Emelyanov
14ec63c333 sunrpc: Create sockets in net namespaces
The context is already known in all the sock_create callers.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:19:00 -04:00
Pavel Emelyanov
37aa213373 sunrpc: Tag rpc_xprt with net
The net is known from the xprt_create and this tagging will also
give un the context in the conntection workers where real sockets
are created.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:58 -04:00
Pavel Emelyanov
e204e621b4 sunrpc: Factor out rpc_xprt freeing
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:53 -04:00
Pavel Emelyanov
bd1722d431 sunrpc: Factor out rpc_xprt allocation
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:52 -04:00
Eric Dumazet
f064af1e50 net: fix a lockdep splat
We have for each socket :

One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)

Possible scenarios are :

(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)
<BH>
spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...

(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)

(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)

This (C) case conflicts with (A) :

CPU1 [A]                         CPU2 [C]
read_lock(callback_lock)
<BH>                             spin_lock_bh(slock)
<wait to spin_lock(slock)>
                                 <wait to write_lock_bh(callback_lock)>

We have one problematic (C) use case in inet_csk_listen_stop() :

local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)

lockdep is not happy with this, as reported by Tetsuo Handa

It seems only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.

Thanks to Jarek for pointing a bug in my first attempt and suggesting
this solution.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-24 22:26:10 -07:00
Linus Torvalds
763008c435 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix an Oops in the NFSv4 atomic open code
  NFS: Fix the selection of security flavours in Kconfig
  NFS: fix the return value of nfs_file_fsync()
  rpcrdma: Fix SQ size calculation when memreg is FRMR
  xprtrdma: Do not truncate iova_start values in frmr registrations.
  nfs: Remove redundant NULL check upon kfree()
  nfs: Add "lookupcache" to displayed mount options
  NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
  SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)
2010-08-18 15:45:23 -07:00
Rusty Russell
9bbb9e5a33 param: use ops in struct kernel_param, rather than get and set fns directly
This is more kernel-ish, saves some space, and also allows us to
expand the ops without breaking all the callers who are happy for the
new members to be NULL.

The few places which defined their own param types are changed to the
new scheme (more which crept in recently fixed in following patches).

Since we're touching them anyway, we change get() and set() to take a
const struct kernel_param (which they really are).  This causes some
harmless warnings until we fix them (in following patches).

To reduce churn, module_param_call creates the ops struct so the callers
don't have to change (and casts the functions to reduce warnings).
The modern version which takes an ops struct is called module_param_cb.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Alessandro Rubini <rubini@ipvvis.unipv.it>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-fbdev-devel@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
2010-08-11 23:04:13 +09:30
Andy Chittenden
669502ff31 SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)
When reusing a TCP connection, ensure that it's aborted if a previous
shutdown attempt has been made on that connection so that the RPC over
TCP recovery mechanism succeeds.

Signed-off-by: Andy Chittenden <andyc.bluearc@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-08-10 10:19:53 -04:00
Trond Myklebust
b76ce56192 SUNRPC: Fix a re-entrancy bug in xs_tcp_read_calldir()
If the attempt to read the calldir fails, then instead of storing the read
bytes, we currently discard them. This leads to a garbage final result when
upon re-entry to the same routine, we read the remaining bytes.

Fixes the regression in bugzilla number 16213. Please see
    https://bugzilla.kernel.org/show_bug.cgi?id=16213

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-06-22 13:21:18 -04:00
J. Bruce Fields
0a68b0bed0 sunrpc: fix leak on error on socket xprt setup
Also collect exit code together while we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-26 08:43:50 -04:00
Linus Torvalds
f8965467f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1674 commits)
  qlcnic: adding co maintainer
  ixgbe: add support for active DA cables
  ixgbe: dcb, do not tag tc_prio_control frames
  ixgbe: fix ixgbe_tx_is_paused logic
  ixgbe: always enable vlan strip/insert when DCB is enabled
  ixgbe: remove some redundant code in setting FCoE FIP filter
  ixgbe: fix wrong offset to fc_frame_header in ixgbe_fcoe_ddp
  ixgbe: fix header len when unsplit packet overflows to data buffer
  ipv6: Never schedule DAD timer on dead address
  ipv6: Use POSTDAD state
  ipv6: Use state_lock to protect ifa state
  ipv6: Replace inet6_ifaddr->dead with state
  cxgb4: notify upper drivers if the device is already up when they load
  cxgb4: keep interrupts available when the ports are brought down
  cxgb4: fix initial addition of MAC address
  cnic: Return SPQ credit to bnx2x after ring setup and shutdown.
  cnic: Convert cnic_local_flags to atomic ops.
  can: Fix SJA1000 command register writes on SMP systems
  bridge: fix build for CONFIG_SYSFS disabled
  ARCNET: Limit com20020 PCI ID matches for SOHARD cards
  ...

Fix up various conflicts with pcmcia tree drivers/net/
{pcmcia/3c589_cs.c, wireless/orinoco/orinoco_cs.c and
wireless/orinoco/spectrum_cs.c} and feature removal
(Documentation/feature-removal-schedule.txt).

Also fix a non-content conflict due to pm_qos_requirement getting
renamed in the PM tree (now pm_qos_request) in net/mac80211/scan.c
2010-05-20 21:04:44 -07:00
Joe Perches
3fa21e07e6 net: Remove unnecessary returns from void function()s
This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
  xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:14 -07:00
Trond Myklebust
d60dbb20a7 SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqst
It seems strange to maintain stats for bytes_sent in one structure, and
bytes received in another. Try to assemble all the RPC request-related
stats in struct rpc_rqst

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:36 -04:00
Trond Myklebust
712a433866 SUNRPC: Fix xs_setup_bc_tcp()
It is a BUG for anybody to call this function without setting
args->bc_xprt. Trying to return an error value is just wrong, since the
user cannot fix this: it is a programming error, not a user error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:33 -04:00
Chuck Lever
bbc72cea58 SUNRPC: RPC metrics and RTT estimator should use same RTT value
Compute an RPC request's RTT once, and use that value both for reporting
RPC metrics, and for adjusting the RTT context used by the RPC client's RTT
estimator algorithm.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:32 -04:00
Trond Myklebust
a8ce4a8f37 SUNRPC: Fail over more quickly on connect errors
We should not allow soft tasks to wait for longer than the major timeout
period when waiting for a reconnect to occur.

Remove the field xprt->connect_timeout since it has been obsoleted by
xprt->reestablish_timeout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:30 -04:00
Trond Myklebust
0b9e794313 SUNRPC: Move the test for XPRT_CONNECTING into xprt_connect()
This fixes a bug with setting xprt->stat.connect_start.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:29 -04:00
Trond Myklebust
c9acb42ef1 SUNRPC: Fix a use after free bug with the NFSv4.1 backchannel
The ->release_request() callback was designed to allow the transport layer
to do housekeeping after the RPC call is done. It cannot be used to free
the request itself, and doing so leads to a use-after-free bug in
xprt_release().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-22 05:32:44 -04:00
Linus Torvalds
7c34691abe Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: ensure bdi_unregister is called on mount failure.
  NFS: Avoid a deadlock in nfs_release_page
  NFSv4: Don't ignore the NFS_INO_REVAL_FORCED flag in nfs_revalidate_inode()
  nfs4: Make the v4 callback service hidden
  nfs: fix unlikely memory leak
  rpc client can not deal with ENOSOCK, so translate it into ENOCONN
2010-03-18 16:50:09 -07:00
Linus Torvalds
d89b218b80 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits)
  bridge: ensure to unlock in error path in br_multicast_query().
  drivers/net/tulip/eeprom.c: fix bogus "(null)" in tulip init messages
  sky2: Avoid rtnl_unlock without rtnl_lock
  ipv6: Send netlink notification when DAD fails
  drivers/net/tg3.c: change the field used with the TG3_FLAG_10_100_ONLY constant
  ipconfig: Handle devices which take some time to come up.
  mac80211: Fix memory leak in ieee80211_if_write()
  mac80211: Fix (dynamic) power save entry
  ipw2200: use kmalloc for large local variables
  ath5k: read eeprom IQ calibration values correctly for G mode
  ath5k: fix I/Q calibration (for real)
  ath5k: fix TSF reset
  ath5k: use fixed antenna for tx descriptors
  libipw: split ieee->networks into small pieces
  mac80211: Fix sta_mtx unlocking on insert STA failure path
  rt2x00: remove KSEG1ADDR define from rt2x00soc.h
  net: add ColdFire support to the smc91x driver
  asix: fix setting mac address for AX88772
  ipv6 ip6_tunnel: eliminate unused recursion field from ip6_tnl{}.
  net: Fix dev_mc_add()
  ...
2010-03-13 14:50:18 -08:00
Joe Perches
81160e66cc net/sunrpc: Convert (void)snprintf to snprintf
(Applies on top of "Remove uses of NIPQUAD, use %pI4")

Casts to void of snprintf are most uncommon in kernel source.
9 use casts, 1301 do not.

Remove the remaining uses in net/sunrpc/

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:15:59 -08:00
Joe Perches
fc0b579168 net/sunrpc: Remove uses of NIPQUAD, use %pI4
Originally submitted Jan 1, 2010
http://patchwork.kernel.org/patch/71221/

Convert NIPQUAD to the %pI4 format extension where possible
Convert %02x%02x%02x%02x/NIPQUAD to %08x/ntohl

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:15:28 -08:00
Bian Naimeng
5fe46e9d73 rpc client can not deal with ENOSOCK, so translate it into ENOCONN
If NFSv4 client send a request before connect, or the old connection was broken
because a ETIMEOUT error catched by call_status, ->send_request will return
ENOSOCK, but rpc layer can not deal with it, so make sure ->send_request can
translate ENOSOCK into ENOCONN.

Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-08 14:05:57 -05:00
Trond Myklebust
9fcfe0c83c SUNRPC: Handle EINVAL error returns from the TCP connect operation
This can, for instance, happen if the user specifies a link local IPv6
address.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-02 13:06:21 -05:00
H Hartley Sweeten
5a51f13adf xprtsock.c: make bc_{malloc/free} static
xprtsock.c: make bc_{malloc/free} static

The server backchannel buf_alloc and buf_free methods should
be static since they are not used outside this file.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-02-10 08:30:53 -05:00
Trond Myklebust
52c9948b1f Merge branch 'nfs-for-2.6.33' 2009-12-13 13:56:27 -05:00
Chuck Lever
09a21c4102 SUNRPC: Allow RPCs to fail quickly if the server is unreachable
The kernel sometimes makes RPC calls to services that aren't running.
Because the kernel's RPC client always assumes the hard retry semantic
when reconnecting a connection-oriented RPC transport, the underlying
reconnect logic takes a long while to time out, even though the remote
may have responded immediately with ECONNREFUSED.

In certain cases, like upcalls to our local rpcbind daemon, or for NFS
mount requests, we'd like the kernel to fail immediately if the remote
service isn't reachable.  This allows another transport to be tried
immediately, or the pending request can be abandoned quickly.

Introduce a per-request flag which controls how call_transmit_status()
behaves when request transmission fails because the server cannot be
reached.

We don't want soft connection semantics to apply to other errors.  The
default case of the switch statement in call_transmit_status() no
longer falls through; the fall through code is copied to the default
case, and a "break;" is added.

The transport's connection re-establishment timeout is also ignored for
such requests.  We want the request to fail immediately, so the
reconnect delay is skipped.  Additionally, we don't want a connect
failure here to further increase the reconnect timeout value, since
this request will not be retried.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-12-03 15:58:56 -05:00
Eric W. Biederman
6d4561110a sysctl: Drop & in front of every proc_handler.
For consistency drop & in front of every proc_handler.  Explicity
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-18 08:37:40 -08:00
Eric W. Biederman
f8572d8f2a sysctl net: Remove unused binary sysctl code
Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.

In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.

Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-12 02:05:06 -08:00
Neil Brown
61d0a8e6a8 NFS/RPC: fix problems with reestablish_timeout and related code.
[[resending with correct cc:  - "vfs.kernel.org" just isn't right!]]

xprt->reestablish_timeout is used to cause TCP connection attempts to
back off if the connection fails so as not to hammer the network,
but to still allow immediate connections when there is no reason to
believe there is a problem.

It is not used for the first connection (when transport->sock is NULL)
but only on reconnects.

It is currently set:

 a/ to 0 when xs_tcp_state_change finds a state of TCP_FIN_WAIT1
    on the assumption that the client has closed the connection
    so the reconnect should be immediate when needed.
 b/ to at least XS_TCP_INIT_REEST_TO when xs_tcp_state_change
    detects TCP_CLOSING or TCP_CLOSE_WAIT on the assumption that the
    server closed the connection so a small delay at least is
    required.
 c/ as above when xs_tcp_state_change detects TCP_SYN_SENT, so that
    it is never 0 while a connection has been attempted, else
    the doubling will produce 0 and there will be no backoff.
 d/ to double is value (up to a limit) when delaying a connection,
    thus providing exponential backoff and
 e/ to XS_TCP_INIT_REEST_TO in xs_setup_tcp as simple initialisation.

So you can see it is highly dependant on xs_tcp_state_change being
called as expected.  However experimental evidence shows that
xs_tcp_state_change does not see all state changes.
("rpcdebug -m rpc trans" can help show what actually happens).

Results show:
 TCP_ESTABLISHED is reported when a connection is made.  TCP_SYN_SENT
 is never reported, so rule 'c' above is never effective.

 When the server closes the connection, TCP_CLOSE_WAIT and
 TCP_LAST_ACK *might* be reported, and TCP_CLOSE is always
 reported.  This rule 'b' above will sometimes be effective, but
 not reliably.

 When the client closes the connection, it used to result in
 TCP_FIN_WAIT1, TCP_FIN_WAIT2, TCP_CLOSE.  However since commit
 f75e674 (SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on
 reconnect) we don't see *any* events on client-close.  I think this
 is because xs_restore_old_callbacks is called to disconnect
 xs_tcp_state_change before the socket is closed.
 In any case, rule 'a' no longer applies.

So all that is left are rule d, which successfully doubles the
timeout which is never rest, and rule e which initialises the timeout.

Even if the rules worked as expected, there would be a problem because
a successful connection does not reset the timeout, so a sequence
of events where the server closes the connection (e.g. during failover
testing) will cause longer and longer timeouts with no good reason.

This patch:

 - sets reestablish_timeout to 0 in xs_close thus effecting rule 'a'
 - sets it to 0 in xs_tcp_data_ready to ensure that a successful
   connection resets the timeout
 - sets it to at least XS_TCP_INIT_REEST_TO after it is doubled,
   thus effecting rule c

I have not reimplemented rule b and the new version of rule c
seems sufficient.

I suspect other code in xs_tcp_data_ready needs to be revised as well.
For example I don't think connect_cookie is being incremented as often
as it should be.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-23 14:36:37 -04:00
Alexandros Batsakis
f300baba5a nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel
[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-13 15:46:15 -04:00
Rahul Iyer
4cfc7e6019 nfsd41: sunrpc: Added rpc server-side backchannel handling
When the call direction is a reply, copy the xid and call direction into the
req->rq_private_buf.head[0].iov_base otherwise rpc_verify_header returns
rpc_garbage.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[get rid of CONFIG_NFSD_V4_1]
[sunrpc: refactoring of svc_tcp_recvfrom]
[nfsd41: sunrpc: create common send routine for the fore and the back channels]
[nfsd41: sunrpc: Use free_page() to free server backchannel pages]
[nfsd41: sunrpc: Document server backchannel locking]
[nfsd41: sunrpc: remove bc_connect_worker()]
[nfsd41: sunrpc: Define xprt_server_backchannel()[
[nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions]
[nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()]
[nfsd41: sunrpc: Don't auto close the server backchannel connection]
[nfsd41: sunrpc: Remove unused functions]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: change bc_sock to bc_xprt]
[nfsd41: sunrpc: move struct rpc_buffer def into a common header file]
[nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex]
[removed cosmetic changes]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[sunrpc: add new xprt class for nfsv4.1 backchannel]
[sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[reverted more cosmetic leftovers]
[got rid of xprt_server_backchannel]
[separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@netapp.com>
[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <trond.myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-09-11 15:04:16 -04:00
Trond Myklebust
976a6f921c Merge branch 'patches_cel-for-2.6.32' into nfs-for-2.6.32 2009-08-10 17:45:50 -04:00
Chuck Lever
9dc3b095b7 SUNRPC: Update xprt address strings after an rpcbind completes
After a bind completes, update the transport instance's address
strings so debugging messages display the current port the transport
is connected to.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:46 -04:00
Chuck Lever
c740eff84b SUNRPC: Kill RPC_DISPLAY_ALL
At some point, I recall that rpc_pipe_fs used RPC_DISPLAY_ALL.
Currently there are no uses of RPC_DISPLAY_ALL outside the transport
modules themselves, so we can safely get rid of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:46 -04:00
Chuck Lever
fbfffbd5e7 SUNRPC: Rename sock_xprt.addr as sock_xprt.srcaddr
Clean up: Give the "addr" and "port" field less ambiguous names.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:46 -04:00
Chuck Lever
c877b849d3 SUNRPC: Use rpc_ntop() for constructing transport address strings
Clean up:  In addition to using the new generic rpc_ntop() and
rpc_get_port() functions, have the RPC client compute the presentation
address buffer sizes dynamically using kstrdup().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:36 -04:00
Chuck Lever
ba809130bc SUNRPC: Remove duplicate universal address generation
RPC universal address generation is currently done in several places:
rpcb_clnt.c, nfs4proc.c xprtsock.c, and xprtrdma.c.  Remove the
redundant cases that convert a socket address to a universal
address.  The nfs4proc.c case takes a pre-formatted presentation
address string, not a socket address, so we'll leave that one.

Because the new uaddr constructor uses the recently introduced
rpc_ntop(), it now supports proper "::" shorthanding for IPv6
addresses.  This allows the kernel to register properly formed
universal addresses with the local rpcbind service, in _all_ cases.

The kernel can now also send properly formed universal addresses in
RPCB_GETADDR requests, and support link-local properly when
encoding and decoding IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:35 -04:00
Trond Myklebust
cbf1107126 SUNRPC: convert some sysctls into module parameters
Parameters like the minimum reserved port, and the number of slot entries
should really be module parameters rather than sysctls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:06:19 -04:00
Trond Myklebust
1f84603c09 Merge branch 'devel-for-2.6.31' into for-2.6.31
Conflicts:
	fs/nfs/client.c
	fs/nfs/super.c
2009-06-18 18:13:44 -07:00
Trond Myklebust
301933a0ac Merge commit 'linux-pnfs/nfs41-for-2.6.31' into nfsv41-for-2.6.31 2009-06-17 17:59:58 -07:00
Ricardo Labiaga
0d90ba1cd4 nfs41: Backchannel callback service helper routines
Executes the backchannel task on the RPC state machine using
the existing open connection previously established by the client.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>

nfs41: Add bc_svc.o to sunrpc Makefile.

[nfs41: bc_send() does not need to be exported outside RPC module]
[nfs41: xprt_free_bc_request() need not be exported outside RPC module]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Update copyright]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:28 -07:00
Trond Myklebust
88b5ed73bc SUNRPC: Fix a missing "break" option in xs_tcp_setup_socket()
In the case of -EADDRNOTAVAIL and/or unhandled connection errors, we want
to get rid of the existing socket and retry immediately, just as the
comment says. Currently we end up sleeping for a minute, due to the missing
"break" statement.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:22:57 -07:00
Ricardo Labiaga
44b98efdd0 nfs41: New xs_tcp_read_data()
Handles RPC replies and backchannel callbacks.  Traditionally the NFS
client has expected only RPC replies on its open connections.  With
NFSv4.1, callbacks can arrive over an existing open connection.

This patch refactors the old xs_tcp_read_request() into an RPC reply handler:
xs_tcp_read_reply(), a new backchannel callback handler: xs_tcp_read_callback(),
and a common routine to read the data off the transport: xs_tcp_read_common().
The new xs_tcp_read_callback() queues callback requests onto a queue where
the callback service (a separate thread) is listening for the processing.

This patch incorporates work and suggestions from Rahul Iyer (iyer@netapp.com)
and Benny Halevy (bhalevy@panasas.com).

xs_tcp_read_callback() drops the connection when the number of expected
callbacks is exceeded.  Use xprt_force_disconnect(), ensuring tasks on
the pending queue are awaken on disconnect.

[nfs41: Keep track of RPC call/reply direction with a flag]
[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: xs_tcp_read_callback() should use xprt_force_disconnect()]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Moves embedded #ifdefs into #ifdef function blocks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 13:06:16 -07:00
Ricardo Labiaga
f4a2e418bf nfs41: Process the RPC call direction
Reading and storing the RPC direction is a three step process.

1. xs_tcp_read_calldir() reads the RPC direction, but it will not store it
in the XDR buffer since the 'struct rpc_rqst' is not yet available.

2. The 'struct rpc_rqst' is obtained during the TCP_RCV_COPY_DATA state.
This state need not necessarily be preceeded by the TCP_RCV_READ_CALLDIR.
For example, we may be reading a continuation packet to a large reply.
Therefore, we can't simply obtain the 'struct rpc_rqst' during the
TCP_RCV_READ_CALLDIR state and assume it's available during TCP_RCV_COPY_DATA.

This patch adds a new TCP_RCV_READ_CALLDIR flag to indicate the need to
read the RPC direction.  It then uses TCP_RCV_COPY_CALLDIR to indicate the
RPC direction needs to be saved after the 'struct rpc_rqst' has been allocated.

3. The 'struct rpc_rqst' is obtained by the xs_tcp_read_data() helper
functions.  xs_tcp_read_common() then saves the RPC direction in the XDR
buffer if TCP_RCV_COPY_CALLDIR is set.  This will happen when we're reading
the data immediately after the direction was read.  xs_tcp_read_common()
then clears this flag.

[was nfs41: Skip past the RPC call direction]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: Add RPC direction back into the XDR buffer]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: Don't skip past the RPC call direction]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:43:46 -07:00
Ricardo Labiaga
18dca02aeb nfs41: Add ability to read RPC call direction on TCP stream.
NFSv4.1 callbacks can arrive over an existing connection. This patch adds
the logic to read the RPC call direction (call or reply). It does this by
updating the state machine to look for the call direction invoking
xs_tcp_read_calldir(...) after reading the XID.

[nfs41: Keep track of RPC call/reply direction with a flag]

As per 11/14/08 review of RFC 53/85.

Add a new flag to track whether the incoming message is an RPC call or an
RPC reply.  TCP_RPC_REPLY is set in the 'struct sock_xprt' tcp_flags in
xs_tcp_read_calldir() if the message is an RPC reply sent on the forechannel.
It is cleared if the message is an RPC request sent on the back channel.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:43:45 -07:00
Eric Dumazet
adf30907d6 net: skb->dst accessors
Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-06-03 02:51:04 -07:00
Trond Myklebust
f75e6745aa SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on reconnect
See http://bugzilla.kernel.org/show_bug.cgi?id=13034

If the port gets into a TIME_WAIT state, then we cannot reconnect without
binding to a new port.

Tested-by: Petr Vandrovec <petr@vandrovec.name>
Tested-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-02 16:35:08 -07:00
Trond Myklebust
cc85906110 Merge branch 'devel' into for-linus 2009-04-01 13:28:15 -04:00
David S. Miller
08abe18af1 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	drivers/net/wimax/i2400m/usb-notif.c
2009-03-26 15:23:24 -07:00
Trond Myklebust
55420c24a0 SUNRPC: Ensure we close the socket on EPIPE errors too...
As long as one task is holding the socket lock, then calls to
xprt_force_disconnect(xprt) will not succeed in shutting down the socket.
In particular, this would mean that a server initiated shutdown will not
succeed until the lock is relinquished.
In order to avoid the deadlock, we should ensure that xs_tcp_send_request()
closes the socket on EPIPE errors too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-19 15:17:36 -04:00
Trond Myklebust
b61d59fffd SUNRPC: xs_tcp_connect_worker{4,6}: merge common code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-19 15:17:35 -04:00
Trond Myklebust
25fe6142a5 SUNRPC: Add a sysctl to control the duration of the socket linger timeout
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-19 15:17:34 -04:00
Trond Myklebust
7d1e8255cf SUNRPC: Add the equivalent of the linger and linger2 timeouts to RPC sockets
This fixes a regression against FreeBSD servers as reported by Tomas
Kasparek. Apparently when using RPC over a TCP socket, the FreeBSD servers
don't ever react to the client closing the socket, and so commit
e06799f958 (SUNRPC: Use shutdown() instead of
close() when disconnecting a TCP socket) causes the setup to hang forever
whenever the client attempts to close and then reconnect.

We break the deadlock by adding a 'linger2' style timeout to the socket,
after which, the client will abort the connection using a TCP 'RST'.

The default timeout is set to 15 seconds. A subsequent patch will put it
under user control by means of a systctl.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-19 15:17:34 -04:00
Trond Myklebust
5e3771ce2d SUNRPC: Ensure that xs_nospace return values are propagated
If xs_nospace() finds that the socket has disconnected, it attempts to
return ENOTCONN, however that value is then squashed by the callers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:38:01 -04:00
Trond Myklebust
8a2cec295f SUNRPC: Delay, then retry on connection errors.
Enforce the comment in xs_tcp_connect_worker4/xs_tcp_connect_worker6 that
we should delay, then retry on certain connection errors.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:38:01 -04:00
Trond Myklebust
2a4919919a SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending
While we should definitely return socket errors to the task that is
currently trying to send data, there is no need to propagate the same error
to all the other tasks on xprt->pending. Doing so actually slows down
recovery, since it causes more than one tasks to attempt socket recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:38:00 -04:00
Trond Myklebust
482f32e65d SUNRPC: Handle socket errors correctly
Ensure that we pick up and handle socket errors as they occur.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:38:00 -04:00
Trond Myklebust
c8485e4d63 SUNRPC: Handle ECONNREFUSED correctly in xprt_transmit()
If we get an ECONNREFUSED error, we currently go to sleep on the
'xprt->sending' wait queue. The problem is that no timeout is set there,
and there is nothing else that will wake the task up later.

We should deal with ECONNREFUSED in call_status, given that is where we
also deal with -EHOSTDOWN, and friends.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:59 -04:00
Trond Myklebust
40d2549db5 SUNRPC: Don't disconnect if a connection is still in progress.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:58 -04:00
Trond Myklebust
670f945731 SUNRPC: Ensure we set XPRT_CLOSING only after we've sent a tcp FIN...
...so that we can distinguish between when we need to shutdown and when we
don't. Also remove the call to xs_tcp_shutdown() from xs_tcp_connect(),
since xprt_connect() makes the same test.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:58 -04:00
Chuck Lever
fe315e76fc SUNRPC: Avoid spurious wake-up during UDP connect processing
To clear out old state, the UDP connect workers unconditionally invoke
xs_close() before proceeding with a new connect.  Nowadays this causes
a spurious wake-up of the task waiting for the connect to complete.

This is a little racey, but usually harmless.  The waiting task
immediately retries the connect via a call_bind/call_connect sequence,
which usually finds the transport already in the connected state
because the connect worker has finished in the background.

To avoid a spurious wake-up, factor the xs_close() logic that resets
the underlying socket into a helper, and have the UDP connect workers
call that helper instead of xs_close().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:21 -04:00
Trond Myklebust
01d37c428a SUNRPC: xprt_connect() don't abort the task if the transport isn't bound
If the transport isn't bound, then we should just return ENOTCONN, letting
call_connect_status() and/or call_status() deal with retrying. Currently,
we appear to abort all pending tasks with an EIO error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:09:39 -04:00
Trond Myklebust
fba91afbec SUNRPC: Fix an Oops due to socket not set up yet...
We can Oops in both xs_udp_send_request() and xs_tcp_send_request() if the
call to xs_sendpages() returns an error due to the socket not yet being
set up.
Deal with that situation by returning a new error: ENOTSOCK, so that we
know to avoid dereferencing transport->sock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:06:41 -04:00
Ilpo Järvinen
1f0fa15432 net/sunrpc/xprtsock.c: some common code found
$ diff-funcs xs_udp_write_space net/sunrpc/xprtsock.c
net/sunrpc/xprtsock.c xs_tcp_write_space
 --- net/sunrpc/xprtsock.c:xs_udp_write_space()
 +++ net/sunrpc/xprtsock.c:xs_tcp_write_space()
@@ -1,4 +1,4 @@
- * xs_udp_write_space - callback invoked when socket buffer space
+ * xs_tcp_write_space - callback invoked when socket buffer space
  *                             becomes available
  * @sk: socket whose state has changed
  *
@@ -7,12 +7,12 @@
  * progress, otherwise we'll waste resources thrashing kernel_sendmsg
  * with a bunch of small requests.
  */
-static void xs_udp_write_space(struct sock *sk)
+static void xs_tcp_write_space(struct sock *sk)
 {
 	read_lock(&sk->sk_callback_lock);

-	/* from net/core/sock.c:sock_def_write_space */
-	if (sock_writeable(sk)) {
+	/* from net/core/stream.c:sk_stream_write_space */
+	if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)) {
 		struct socket *sock;
 		struct rpc_xprt *xprt;


$ codiff net/sunrpc/xprtsock.o net/sunrpc/xprtsock.o.new
net/sunrpc/xprtsock.c:
  xs_tcp_write_space | -163
  xs_udp_write_space | -163
 2 functions changed, 326 bytes removed

net/sunrpc/xprtsock.c:
  xs_write_space | +179
 1 function changed, 179 bytes added

net/sunrpc/xprtsock.o.new:
 3 functions changed, 179 bytes added, 326 bytes removed, diff: -147

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:48:33 -08:00
David S. Miller
e0db4a786b sunrpc: Fix build warning due to typo in %pI4 format changes.
Noticed by Stephen Hemminger.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-02 23:57:06 -08:00
Harvey Harrison
21454aaad3 net: replace NIPQUAD() in net/*/
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:54:56 -07:00
David S. Miller
a1744d3bee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/p54/p54common.c
2008-10-31 00:17:34 -07:00
Harvey Harrison
5b095d9892 net: replace %p6 with %pI6
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 12:52:50 -07:00
Harvey Harrison
4b7a4274ca net: replace %#p6 format specifier with %pi6
gcc warns when using the # modifier with the %p format specifier,
so we can't use this to omit the colons when needed, introduces
%pi6 instead.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 12:50:24 -07:00
Harvey Harrison
fdb46ee752 net, misc: replace uses of NIP6_FMT with %p6
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 23:02:32 -07:00
Harvey Harrison
b071195deb net: replace all current users of NIP6_SEQFMT with %#p6
The define in kernel.h can be done away with at a later time.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 16:05:40 -07:00
Trond Myklebust
2a9e1cfa23 SUNRPC: Respond promptly to server TCP resets
If the server sends us an RST error while we're in the TCP_ESTABLISHED
state, then that will not result in a state change, and so the RPC client
ends up hanging forever (see
http://bugzilla.kernel.org/show_bug.cgi?id=11154)

We can intercept the reset by setting up an sk->sk_error_report callback,
which will then allow us to initiate a proper shutdown and retry...

We also make sure that if the send request receives an ECONNRESET, then we
shutdown too...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-28 15:21:39 -04:00
Alan Cox
113aa838ec net: Rationalise email address: Network Specific Parts
Clean up the various different email addresses of mine listed in the code
to a single current and valid address. As Dave says his network merges
for 2.6.28 are now done this seems a good point to send them in where
they won't risk disrupting real changes.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-13 19:01:08 -07:00
Chuck Lever
b22602a673 SUNRPC: Ensure all transports set rq_xtime consistently
The RPC client uses the rq_xtime field in each RPC request to determine the
round-trip time of the request.  Currently, the rq_xtime field is
initialized by each transport just before it starts enqueing a request to
be sent.  However, transports do not handle initializing this value
consistently; sometimes they don't initialize it at all.

To make the measurement of request round-trip time consistent for all
RPC client transport capabilities, pull rq_xtime initialization into the
RPC client's generic transport logic.  Now all transports will get a
standardized RTT measure automatically, from:

  xprt_transmit()

to

  xprt_complete_rqst()

This makes round-trip time calculation more accurate for the TCP transport.
The socket ->sendmsg() method can return "-EAGAIN" if the socket's output
buffer is full, so the TCP transport's ->send_request() method may call
the ->sendmsg() method repeatedly until it gets all of the request's bytes
queued in the socket's buffer.

Currently, the TCP transport sets the rq_xtime field every time through
that loop so the final value is the timestamp just before the *last* call
to the underlying socket's ->sendmsg() method.  After this patch, the
rq_xtime field contains a timestamp that reflects the time just before the
*first* call to ->sendmsg().

This is consequential under heavy workloads because large requests often
take multiple ->sendmsg() calls to get all the bytes of a request queued.
The TCP transport causes the request to sleep until the remote end of the
socket has received enough bytes to clear space in the socket's local
output buffer.  This delay can be quite significant.

The method introduced by this patch is a more accurate measure of RTT
for stream transports, since the server can cause enough back pressure
to delay (ie increase the latency of) requests from the client.

Additionally, this patch corrects the behavior of the RDMA transport, which
entirely neglected to initialize the rq_xtime field.  RPC performance
metrics for RDMA transports now display correct RPC request round trip
times.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Tom Talpey <thomas.talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:15 -04:00
Trond Myklebust
233607dbbc Merge branch 'devel' 2008-04-24 14:01:02 -04:00
Trond Myklebust
7c1d71cf56 SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests
NFSv4 requires us to ensure that we break the TCP connection before we're
allowed to retransmit a request. However in the case where we're
retransmitting several requests that have been sent on the same
connection, we need to ensure that we don't interfere with the attempt to
reconnect and/or break the connection again once it has been established.

We therefore introduce a 'connection' cookie that is bumped every time a
connection is broken. This allows requests to track if they need to force a
disconnection.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:55:12 -04:00
Trond Myklebust
06b4b681ab SUNRPC: remove XS_SENDMSG_RETRY
The condition for exiting from the loop in xs_tcp_send_request() should be
that we find we're not making progress (i.e. number of bytes sent is 0).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:55:05 -04:00
Trond Myklebust
b6ddf64ffe SUNRPC: Fix up xprt_write_space()
The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal whether
or not we have someone waiting for buffer memory. Convert the SUNRPC layer
to use the same idiom.
Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space. In
fact, the most common case will be that there is nobody waiting for buffer
space.

SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd was
limited by the application window. Ensure that we follow the same idiom as
the rest of the networking layer here too.

Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so that
write_space() doesn't keep waking things up on xprt->pending.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:52:44 -04:00
Harvey Harrison
0dc47877a3 net: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-05 20:47:47 -08:00
Trond Myklebust
ff2d7db848 SUNRPC: Ensure that we read all available tcp data
Don't stop until we run out of data, or we hit an error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-28 23:26:27 -08:00
Chuck Lever
33e01dc7f5 SUNRPC: Clean up functions that free address_strings array
Clean up: document the rule (kfree) and the exceptions
(RPC_DISPLAY_PROTO and RPC_DISPLAY_NETID) when freeing the objects in
a transport's address_strings array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:08 -05:00
Chuck Lever
b454ae9060 SUNRPC: fewer conditionals in the format_ip_address routines
Clean up: have the set up routines explicitly pass the strings to be used
for the transport name and NETID.  This removes a number of conditionals
and dependencies on rpc_xprt.prot, which is overloaded.

Tighten up type checking on the address_strings array while we're at it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:04 -05:00
Trond Myklebust
ba7392bb37 SUNRPC: Add support for per-client timeout values
In order to be able to support setting the timeo and retrans parameters on
a per-mountpoint basis, we move the rpc_timeout structure into the
rpc_clnt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:59 -05:00
Trond Myklebust
2881ae74e6 SUNRPC: Clean up the transport timeout initialisation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:58 -05:00
Trond Myklebust
663b8858dd SUNRPC: Reconnect immediately whenever the server isn't refusing it.
If we've disconnected from the server, rather than the other way round,
then it makes little sense to wait 3 seconds before reconnecting.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:27 -05:00
Trond Myklebust
62da3b2488 SUNRPC: Rename xprt_disconnect()
xprt_disconnect() should really only be called when the transport shutdown
is completed, and it is time to wake up any pending tasks. Rename it to
xprt_disconnect_done() in order to reflect the semantical change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:27 -05:00
Trond Myklebust
3ebb067d92 SUNRPC: Make call_status()/call_decode() call xprt_force_disconnect()
Move the calls to xprt_disconnect() over to xprt_force_disconnect() in
order to enable the transport layer to manage the state of the
XPRT_CONNECTED flag.
Ditto in xs_tcp_read_fraghdr().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:26 -05:00
Trond Myklebust
7272dcd31d SUNRPC: xprt_autoclose() should not call xprt_disconnect()
The transport layer should do that itself whenever appropriate.

Note that the RDMA transport already assumes that it needs to call
xprt_disconnect in xprt_rdma_close().
For TCP sockets, we want to call xprt_disconnect() only after the
connection has been closed by both ends.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:26 -05:00
Trond Myklebust
e06799f958 SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
By using shutdown() rather than close() we allow the RPC client to wait
for the TCP close handshake to complete before we start trying to reconnect
using the same port.
We use shutdown(SHUT_WR) only instead of shutting down both directions,
however we wait until the server has closed the connection on its side.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:26 -05:00
Trond Myklebust
ef80367071 SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:25 -05:00
Trond Myklebust
3b948ae5be SUNRPC: Allow the client to detect if the TCP connection is closed
Add an xprt->state bit to enable the TCP ->state_change() method to signal
whether or not the TCP connection is in the process of closing down.
This will to be used by the reconnection logic in a separate patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:25 -05:00
Trond Myklebust
67a391d72c SUNRPC: Fix TCP rebinding logic
Currently the TCP rebinding logic assumes that if we're not using a
reserved port, then we don't need to reconnect on the same port if a
disconnection event occurs. This breaks most RPC duplicate reply cache
implementations.

Also take into account the fact that xprt_min_resvport and
xprt_max_resvport may change while we're reconnecting, since the user may
change them at any time via the sysctls. Ensure that we check the port
boundaries every time we loop in xs_bind4/xs_bind6. Also ensure that if the
boundaries change, we only scan the ports a maximum of 2 times.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:25 -05:00
Trond Myklebust
66af1e5585 SUNRPC: Fix a race in xs_tcp_state_change()
When scheduling the autoclose RPC call, we want to ensure that we don't
race against the test_bit() call in xprt_clear_locked().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:24 -05:00
Herbert Xu
1781f7f580 [UDP]: Restore missing inDatagrams increments
The previous move of the the UDP inDatagrams counter caused the
counting of encapsulated packets, SUNRPC data (as opposed to call)
packets and RXRPC packets to go missing.

This patch restores all of these.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28 14:56:33 -08:00
Adrian Bunk
483066d62e SUNRPC: make sunrpc/xprtsock.c:xs_setup_{udp,tcp}() static
xs_setup_{udp,tcp}() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:50 -05:00
Linus Torvalds
f4921aff5b Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
  NFSv4: Fix a typo in nfs_inode_reclaim_delegation
  NFS: Add a boot parameter to disable 64 bit inode numbers
  NFS: nfs_refresh_inode should clear cache_validity flags on success
  NFS: Fix a connectathon regression in NFSv3 and NFSv4
  NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
  SUNRPC: Don't call xprt_release in call refresh
  SUNRPC: Don't call xprt_release() if call_allocate fails
  SUNRPC: Fix buggy UDP transmission
  [23/37] Clean up duplicate includes in
  [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
  SUNRPC: Use correct type in buffer length calculations
  SUNRPC: Fix default hostname created in rpc_create()
  nfs: add server port to rpc_pipe info file
  NFS: Get rid of some obsolete macros
  NFS: Simplify filehandle revalidation
  NFS: Ensure that nfs_link() returns a hashed dentry
  NFS: Be strict about dentry revalidation when doing exclusive create
  NFS: Don't zap the readdir caches upon error
  NFS: Remove the redundant nfs_reval_fsid()
  NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
  ...

Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c
2007-10-15 10:47:35 -07:00
John Heffner
02b3d34631 [NET] Cleanup: Use sock_owned_by_user() macro
Changes asserts in sunrpc to use sock_owned_by_user() macro instead of
referencing sock_lock.owner directly.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10 16:49:00 -07:00
Trond Myklebust
2199700f1d SUNRPC: Fix buggy UDP transmission
xs_sendpages() may return a negative result. We sure as hell don't want to
add that to the 'tk_bytes_sent' tally...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:37 -04:00
Chuck Lever
1321d8d971 SUNRPC: Fix bytes-per-op accounting for RPC over UDP
NFS performance metrics reported zero bytes sent per op when mounting with
UDP.  The UDP socket transport wasn't properly counting the number of bytes
sent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:19 -04:00
\"Talpey, Thomas\
4fa016eb24 NFS/SUNRPC: support transport protocol naming
To prepare for including non-sockets-based RPC transports, select
RPC transports by an identifier (to be used in following patches).

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:50 -04:00
\"Talpey, Thomas\
49c36fcc44 SUNRPC: rearrange RPC sockets definitions
To prepare for including non-sockets-based RPC transports, move the
sockets-dependent definitions into their own file.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:48 -04:00
\"Talpey, Thomas\
3c341b0b92 SUNRPC: rename the rpc_xprtsock_create structure
To prepare for including non-sockets-based RPC transports, change the
overly suggestive name of the transport creation arguments struct.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:45 -04:00
\"Talpey, Thomas\
bc25571e21 SUNRPC: Finish API to load RPC transport implementations dynamically
Allow RPC client transport implementations to be loaded as needed, or
as they become available from distributors or third-party vendors.

Note that we leave the IP sockets implementation in sunrpc.o
permanently, as IP functionality is always available in any
kernel that runs NFS.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:42 -04:00
\"Talpey, Thomas\
4417c8c41a SUNRPC: export per-transport rpcbind netid's
The rpcbind (v3+) netid is provided by each RPC client transport. This fixes
an omission in IPv6 rpcbind client support, and enables future extension.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:20 -04:00
Chuck Lever
756805e7a7 SUNRPC: Add support for formatted universal addresses
"Universal addresses" are a string representation of an IP address and
port.  They are described fully in RFC 3530, section 2.2.  Add support
for generating them in the RPC client's socket transport module.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2007-10-09 17:16:29 -04:00
Chuck Lever
8945ee5e27 SUNRPC: Split xs_reclassify_socket into an IPv4 and IPv6 version
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:16:26 -04:00
Chuck Lever
95392c593e SUNRPC: Add a helper for extracting the address using the correct type
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:16:24 -04:00
Chuck Lever
8f9d5b1a2e SUNRPC: Add IPv6 address support to net/sunrpc/xprtsock.c
Finalize support for setting up RPC client transports to remote RPC
services addressed via IPv6.

Based on work done by Gilles Quillard at Bull Open Source.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:16:21 -04:00
Chuck Lever
68e220bd5c SUNRPC: create connect workers for IPv6
Clone separate connect worker functions for connecting AF_INET6 sockets.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:16:18 -04:00
Chuck Lever
9c3d72de28 SUNRPC: Rename IPv4 connect workers
Prepare for introduction of IPv6 versions of same.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:16:15 -04:00
Chuck Lever
16be2d20d9 SUNRPC: Refactor a part of socket connect logic into a helper function
Finishing a socket connect is the same for IPv4 and IPv6, so split it out
into a helper.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:16:13 -04:00
Chuck Lever
90058d37c3 SUNRPC: create an IPv6-savvy mechanism for binding to a reserved port
Clone xs_bindresvport into two functions, one that can handle IPv4
addresses, and one that can handle IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:16:10 -04:00
Chuck Lever
7dc753f039 SUNRPC: Rename xs_bind() to prepare for IPv6-specific bind method
Prepare for introduction of IPv6-specific socket bind function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:16:08 -04:00
Chuck Lever
20612005c5 SUNRPC: Introduce support for setting the port number in IPv6 addresses
We could clone xs_set_port, but this is easier overall.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:16:05 -04:00
Chuck Lever
4b6473fba4 SUNRPC: add a function to format IPv6 addresses
Clone xs_format_ipv4_peer_addresses into an IPv6 version.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:59 -04:00
Chuck Lever
ba10f2c234 SUNRPC: Rename xs_format_peer_addresses
Prepare to add an IPv6 version of xs_format_peer_addresses by renaming it
to xs_format_ipv4_peer_addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:55 -04:00
Chuck Lever
fbfe3cc677 SUNRPC: Add hex-formatted address support to rpc_peeraddr2str()
Add support for the NFS client's need to export volume information
with IP addresses formatted in hex instead of decimal.

This isn't used yet, but subsequent patches (not in this series) will
change the NFS client to use this functionality.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:52 -04:00
Chuck Lever
0c43b3d81c SUNRPC: Free address buffers in a loop
Use more generic logic to free buffers holding formatted addresses.  This
makes it less likely a bug will be introduced when adding additional buffer
types in xs_format_peer_address().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:49 -04:00
Chuck Lever
bda243df2f SUNRPC: Use standard macros for printing IP addresses
include/linux/kernel.h gives us some nice macros for formatting IP
addresses.  Use them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:46 -04:00
Chuck Lever
b595bb1506 SUNRPC: Fix a signed v. unsigned comparison in net/sunrpc/xprtsock.c
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:44 -04:00
Frank van Maarseveen
d3bc9a1deb SUNRPC client: add interface for binding to a local address
In addition to binding to a local privileged port the NFS client should
allow binding to a specific local address. This is used by the server
for callbacks. The patch adds the necessary interface.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Frank van Maarseveen
96802a0951 SUNRPC: cleanup transport creation argument passing
Cleanup argument passing to functions for creating an RPC transport.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Chuck Lever
45160d6275 SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name
Clean up, for consistency.  Rename rpcb_getport as rpcb_getport_async, to
match the naming scheme of rpcb_getport_sync.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Trond Myklebust
c1384c9c4c SUNRPC: fix hang due to eventd deadlock...
Brian Behlendorf writes:

The root cause of the NFS hang we were observing appears to be a rare
deadlock between the kernel provided usermodehelper API and the linux NFS
client.  The deadlock can arise because both of these services use the
generic linux work queues.  The usermodehelper API run the specified user
application in the context of the work queue.  And NFS submits both cleanup
and reconnect work to the generic work queue for handling.  Normally this
is fine but a deadlock can result in the following situation.

  - NFS client is in a disconnected state
  - [events/0] runs a usermodehelper app with an NFS dependent operation,
    this triggers an NFS reconnect.
  - NFS reconnect happens to be submitted to [events/0] work queue.
  - Deadlock, the [events/0] work queue will never process the
    reconnect because it is blocked on the previous NFS dependent
    operation which will not complete.`

The solution is simply to run reconnect requests on rpciod.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:31 -04:00
Chuck Lever
e9b1c9c98c SUNRPC: switch socket-based RPC transports to use rpcbind
Now that we have a version of the portmapper that supports versions 3 and 4
of the rpcbind protocol, use it for new RPC client connections over
sockets.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:13 -07:00
Eric W. Biederman
0b4d414714 [PATCH] sysctl: remove insert_at_head from register_sysctl
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name.  Which is
pain for caching and the proc interface never implemented.

I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.

So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:59 -08:00
Eric W. Biederman
2b1bec5f52 [PATCH] sysctl: sunrpc: don't unnecessarily set ctl_table->de
We don't need this to prevent module unload races so remove the unnecessary
code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:55 -08:00
Eric W. Biederman
7e35280e51 [PATCH] sysctl: sunrpc: remove unnecessary insert_at_head flag
Because the sunrpc sysctls don't conflict with any other sysctls the setting
the insert at head flag to register_sysctl has no semantic meaning.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:55 -08:00
Tim Schmielau
cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Chuck Lever
46121cf7d8 SUNRPC: fix print format for tk_pid
The tk_pid field is an unsigned short.  The proper print format specifier for
that type is %5u, not %4d.

Also clean up some miscellaneous print formatting nits.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:10 -08:00
Trond Myklebust
21b4e73692 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ into merge_linus 2006-12-07 16:35:17 -05:00
Trond Myklebust
34161db6b1 Merge branch 'master' of /home/trondmy/kernel/linux-2.6/ into merge_linus
Conflicts:

	include/linux/sunrpc/xprt.h
	net/sunrpc/xprtsock.c
Fix up conflicts with the workqueue changes.
2006-12-07 15:48:15 -05:00
Peter Zijlstra
ed07536ed6 [PATCH] lockdep: annotate nfs/nfsd in-kernel sockets
Stick NFS sockets in their own class to avoid some lockdep warnings.  NFS
sockets are never exposed to user-space, and will hence not trigger certain
code paths that would otherwise pose deadlock scenarios.

[akpm@osdl.org: cleanups]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Dickson <SteveD@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Fixed patch corruption by quilt, pointed out by Peter Zijlstra ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07 08:39:30 -08:00
Chuck Lever
fbf76683ff SUNRPC: relocate the creation of socket-specific tunables
Clean-up:

The RPC client currently creates some sysctls that are specific to the
socket transport.  Move those entirely into xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:53 -05:00
Chuck Lever
282b32e17f SUNRPC: create stubs for xprtsock init and cleanup
Over time we will want to add some specific init and cleanup logic for the
xprtsock implementation.  Add stub routines for initialization and exit
processing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:53 -05:00
Chuck Lever
dd4564715e SUNRPC: Rename skb_reader_t and friends
Clean-up:  hch suggested that the RPC client shouldn't pollute the name
space used by the generic skb manipulation routines in net/core/skbuff.c.

Rename a couple of types in xdr.h to adhere to this convention.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:52 -05:00
Chuck Lever
9d29231690 SUNRPC: skb_read_bits is the same as xs_tcp_copy_data
Clean-up: eliminate xs_tcp_copy_data -- it's exactly the same logic as the
common routine skb_read_bits.  The UDP and TCP socket read code now share
the same routine for copying data into an xdr_buf.

Now that skb_read_bits() is exported, rename it to avoid confusing it with
a generic skb_* function.  As these functions are XDR-specific, they should
not have names that suggest they are of generic use.  Also rename
skb_read_and_csum_bits() to be consistent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:52 -05:00
Chuck Lever
7559c7a28f SUNRPC: Make address format buffers more generic
For now we will assume that all transports will use the address format
buffers in the rpc_xprt struct to store their addresses.  Change
rpc_peer2str() to be a generic routine to handle this, and get rid of the
print_address() op in the rpc_xprt_ops vector.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:51 -05:00
Chuck Lever
314dfd7987 SUNRPC: move saved socket callback functions to a private data structure
Move the three fields for saving socket callback functions out of the
rpc_xprt structure and into a private data structure maintained in
net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:51 -05:00
Chuck Lever
7c6e066ec2 SUNRPC: Move the UDP socket bufsize parameters to a private data structure
Move the socket-specific buffer size parameters for UDP sockets to a
private data structure maintained in net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:50 -05:00
Chuck Lever
c847546182 SUNRPC: Move rpc_xprt socket connect fields into private data structure
Move the socket-specific connection management fields out of the generic
rpc_xprt structure into a private data structure maintained in
net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:50 -05:00
Chuck Lever
e136d0926e SUNRPC: Move TCP state flags into xprtsock.c
Move "XPRT_LAST_FRAG" and friends from xprt.h into xprtsock.c, and rename
them to use the naming scheme in use in xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:50 -05:00
Chuck Lever
51971139b2 SUNRPC: Move TCP receive state variables into private data structure
Move the TCP receive state variables from the generic rpc_xprt structure to
a private structure maintained inside net/sunrpc/xprtsock.c.

Also rename a function/variable pair to refer to RPC fragment headers
instead of record markers, to be consistent with types defined in
sunrpc/*.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:49 -05:00
Chuck Lever
ee0ac0c227 SUNRPC: Remove sock and inet fields from rpc_xprt
The "sock" and "inet" fields are socket-specific.  Move them to a private
data structure maintained entirely within net/sunrpc/xprtsock.c

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:49 -05:00
Chuck Lever
ffc2e518c9 SUNRPC: Allocate a private data area for socket-specific rpc_xprt fields
When setting up a new transport instance, allocate enough memory for an
rpc_xprt and a private area.  As part of the same memory allocation, it
will be easy to find one, given a pointer to the other.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:48 -05:00
Chuck Lever
c8541ecdd5 SUNRPC: Make the transport-specific setup routine allocate rpc_xprt
Change the location where the rpc_xprt structure is allocated so each
transport implementation can allocate a private area from the same
chunk of memory.

Note also that xprt->ops->destroy, rather than xprt_destroy, is now
responsible for freeing rpc_xprt when the transport is destroyed.

Test plan:
Connectathon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:34 -05:00
Trond Myklebust
24c5684b65 SUNRPC: Clean up xs_send_pages()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06 10:46:33 -05:00
David Howells
65f27f3844 WorkStruct: Pass the work_struct pointer instead of context data
Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct.  This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function.  This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated..  This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems.  But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default.  Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:55:48 +00:00
David Howells
52bad64d95 WorkStruct: Separate delayable and non-delayable events.
Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.

The work_struct struct is huge, and this limits it's usefulness.  On a 64-bit
architecture it's nearly 100 bytes in size.  This reduces that by half for the
non-delayable type of event.

Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-22 14:54:01 +00:00
Chuck Lever
b7766da7f7 [PATCH] SUNRPC: fix a typo
Yes, this actually passed tests the way it was.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-20 10:26:39 -07:00
Alexey Dobriyan
d8ed029d60 [SUNRPC]: trivial endianness annotations
pure s/u32/__be32/

[AV: large part based on Alexey's patches]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28 18:01:21 -07:00
Linus Torvalds
9f261e0113 Merge git://git.linux-nfs.org/pub/linux/nfs-2.6
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (74 commits)
  NFS: unmark NFS direct I/O as experimental
  NFS: add comments clarifying the use of nfs_post_op_update()
  NFSv4: rpc_mkpipe creating socket inodes w/out sk buffers
  NFS: Use SEEK_END instead of hardcoded value
  NFSv4: When mounting with a port=0 argument, substitute port=2049
  NFSv4: Poll more aggressively when handling NFS4ERR_DELAY
  NFSv4: Handle the condition NFS4ERR_FILE_OPEN
  NFSv4: Retry lease recovery if it failed during a synchronous operation.
  NFS: Don't invalidate the symlink we just stuffed into the cache
  NFS: Make read() return an ESTALE if the file has been deleted
  NFSv4: It's perfectly legal for clp to be NULL here....
  NFS: nfs_lookup - don't hash dentry when optimising away the lookup
  SUNRPC: Fix Oops in pmap_getport_done
  SUNRPC: Add refcounting to the struct rpc_xprt
  SUNRPC: Clean up soft task error handling
  SUNRPC: Handle ENETUNREACH, EHOSTUNREACH and EHOSTDOWN socket errors
  SUNRPC: rpc_delay() should not clobber the rpc_task->tk_status
  Fix a referral error Oops
  NFS: NFS_ROOT should use the new rpc_create API
  NFS: Fix up compiler warnings on 64-bit platforms in client.c
  ...

Manually resolved conflict in net/sunrpc/xprtsock.c
2006-09-23 16:58:40 -07:00
Chuck Lever
ff9aa5e56d SUNRPC: Eliminate xprt_create_proto and rpc_create_client
The two function call API for creating a new RPC client is now obsolete.
Remove it.

Also, remove an unnecessary check to see whether the caller is capable of
using privileged network services.  The kernel RPC client always uses a
privileged ephemeral port by default; callers are responsible for checking
the authority of users to make use of any RPC service, or for specifying
that a nonprivileged port is acceptable.

Test plan:
Repeated runs of Connectathon locking suite.  Check network trace to ensure
correctness of NLM requests and replies.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22 23:24:51 -04:00
Chuck Lever
c4efcb1d3e SUNRPC: Use "sockaddr_storage" for storing RPC client's remote peer address
IPv6 addresses are big (128 bytes).  Now that no RPC client consumers treat
the addr field in rpc_xprt structs as an opaque, and access it only via the
API calls, we can safely widen the field in the rpc_xprt struct to
accomodate larger addresses.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22 23:24:48 -04:00
Chuck Lever
edb267a688 SUNRPC: add xprt switch API for printing formatted remote peer addresses
Add a new method to the transport switch API to provide a way to convert
the opaque contents of xprt->addr to a human-readable string.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22 23:24:47 -04:00
Chuck Lever
bbf7c1dd2a SUNRPC: Introduce transport switch callout for pluggable rpcbind
Introduce a clean transport switch API for plugging in different types of
rpcbind mechanisms.  For instance, rpcbind can cleanly replace the
existing portmapper client, or a transport can choose to implement RPC
binding any way it likes.

Test plan:
Destructive testing (unplugging the network temporarily).  Connectathon
with UDP and TCP.  NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22 23:24:44 -04:00
Chuck Lever
ec739ef03d SUNRPC: Create a helper to tell whether a transport is bound
Hide the contents and format of xprt->addr by eliminating direct uses
of the xprt->addr.sin_port field.  This change is required to support
alternate RPC host address formats (eg IPv6).

Test-plan:
Destructive testing (unplugging the network temporarily).  Repeated runs of
Connectathon locking suite with UDP and TCP.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-09-22 23:24:39 -04:00
Sridhar Samudrala
53fad3cbff [SUNRPC]: Remove the unnecessary check for highmem in xs_sendpages().
Just call kernel_sendpage() directly.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:16 -07:00
Sridhar Samudrala
e6242e928e [SUNRPC]: Update to use in-kernel sockets API.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22 14:54:06 -07:00
Trond Myklebust
e0ab53deaa RPC: Ensure that we disconnect TCP socket when client requests error out
If we're part way through transmitting a TCP request, and the client
errors, then we need to disconnect and reconnect the TCP socket in order to
avoid confusing the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 031a50c8b9ea82616abd4a4e18021a25848941ce commit)
2006-08-03 16:56:55 -04:00
Panagiotis Issaris
0da974f4f3 [NET]: Conversions from kmalloc+memset to k(z|c)alloc.
Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-07-21 14:51:30 -07:00
Chuck Lever
b85d880684 SUNRPC: select privileged port numbers at random
Make the RPC client select privileged ephemeral source ports at
random.  This improves DRC behavior on the server by using the
same port when reconnecting for the same mount point, but using
a different port for fresh mounts.

The Linux TCP implementation already does this for nonprivileged
ports.  Note that TCP sockets in TIME_WAIT will prevent quick reuse
of a random ephemeral port number by leaving the port INUSE until
the connection transitions out of TIME_WAIT.

Test plan:
Connectathon against every known server implementation using multiple
mount points.  Locking especially.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-06-09 09:34:05 -04:00
Chuck Lever
ef759a2e54 SUNRPC: introduce per-task RPC iostats
Account for various things that occur while an RPC task is executed.
Separate timers for RPC round trip and RPC execution time show how
long RPC requests wait in queue before being sent.  Eventually these
will be accumulated at xprt_release time in one place where they can
be viewed from userland.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:17 -05:00
Chuck Lever
262ca07de4 SUNRPC: add a handful of per-xprt counters
Monitor generic transport events.  Add a transport switch callout to
format transport counters for export to user-land.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20 13:44:16 -05:00
Trond Myklebust
632e3bdc50 SUNRPC: Ensure client closes the socket when server initiates a close
If the server decides to close the RPC socket, we currently don't actually
 respond until either another RPC call is scheduled, or until xprt_autoclose()
 gets called by the socket expiry timer (which may be up to 5 minutes
 later).

 This patch ensures that xprt_autoclose() is called much sooner if the
 server closes the socket.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:57 -05:00
Chuck Lever
922004120b SUNRPC: transport switch API for setting port number
At some point, transport endpoint addresses will no longer be IPv4.  To hide
 the structure of the rpc_xprt's address field from ULPs and port mappers,
 add an API for setting the port number during an RPC bind operation.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.  NFSv2/3 and NFSv4 mounting should be carefully checked.
 Probably need to rig a server where certain services aren't running, or
 that returns an error for some typical operation.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:56 -05:00
Chuck Lever
0210714834 SUNRPC: switchable buffer allocation
Add RPC client transport switch support for replacing buffer management
 on a per-transport basis.

 In the current IPv4 socket transport implementation, RPC buffers are
 allocated as needed for each RPC message that is sent.  Some transport
 implementations may choose to use pre-allocated buffers for encoding,
 sending, receiving, and unmarshalling RPC messages, however.  For
 transports capable of direct data placement, the buffers can be carved
 out of a pre-registered area of memory rather than from a slab cache.

 Test-plan:
 Millions of fsx operations.  Performance characterization with "sio" and
 "iozone".  Use oprofile and other tools to look for significant regression
 in CPU utilization.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06 14:58:55 -05:00
Trond Myklebust
b079fa7baa RPC: Do not block on skb allocation
If we get something like the following,
 [  125.300636]  [<c04086e1>] schedule_timeout+0x54/0xa5
 [  125.305931]  [<c040866e>] io_schedule_timeout+0x29/0x33
 [  125.311495]  [<c02880c4>] blk_congestion_wait+0x70/0x85
 [  125.317058]  [<c014136b>] throttle_vm_writeout+0x69/0x7d
 [  125.322720]  [<c014714d>] shrink_zone+0xe0/0xfa
 [  125.327560]  [<c01471d4>] shrink_caches+0x6d/0x6f
 [  125.332581]  [<c01472a6>] try_to_free_pages+0xd0/0x1b5
 [  125.338056]  [<c013fa4b>] __alloc_pages+0x135/0x2e8
 [  125.343258]  [<c03b74ad>] tcp_sendmsg+0xaa0/0xb78
 [  125.348281]  [<c03d4666>] inet_sendmsg+0x48/0x53
 [  125.353212]  [<c0388716>] sock_sendmsg+0xb8/0xd3
 [  125.358147]  [<c0388773>] kernel_sendmsg+0x42/0x4f
 [  125.363259]  [<c038bc00>] sock_no_sendpage+0x5e/0x77
 [  125.368556]  [<c03ee7af>] xs_tcp_send_request+0x2af/0x375
 then the socket is blocked until memory is reclaimed, and no
 progress can ever be made.

 Try to access the emergency pools by using GFP_ATOMIC.

 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-19 23:11:54 -05:00
Chuck Lever
c556b75496 SUNRPC: allow sunrpc.o to link when CONFIG_SYSCTL is disabled
The sunrpc module should build properly even when CONFIG_SYSCTL is
 disabled.

 Reported by Jan-Benedict Glaw.

 Test plan:
 Compile kernel with CONFIG_NFS as a module and built-in, and CONFIG_SYSCTL
 enabled and disabled.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-04 15:39:45 -05:00
Chuck Lever
470056c288 [PATCH] RPC: rationalize set_buffer_size
In fact, ->set_buffer_size should be completely functionless for non-UDP.

 Test-plan:
 Check socket buffer size on UDP sockets over time.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:55 -04:00
Chuck Lever
03bf4b707e [PATCH] RPC: parametrize various transport connect timeouts
Each transport implementation can now set unique bind, connect,
 reestablishment, and idle timeout values.  These are variables,
 allowing the values to be modified dynamically.  This permits
 exponential backoff of any of these values, for instance.

 As an example, we implement exponential backoff for the connection
 reestablishment timeout.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:53 -04:00
Chuck Lever
3167e12c0c [PATCH] RPC: make sure to get the same local port number when reconnecting
Implement a best practice: if the remote end drops our connection, try to
 reconnect using the same port number.  This is important because the NFS
 server's Duplicate Reply Cache often hashes on the source port number.
 If the client reuses the port number when it reconnects, the server's DRC
 will be more effective.

 Based on suggestions by Mike Eisler, Olaf Kirch, and Alexey Kuznetsky.

 Test-plan:
 Destructive testing.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:52 -04:00
Chuck Lever
529b33c6db [PATCH] RPC: allow RPC client's port range to be adjustable
Select an RPC client source port between 650 and 1023 instead of between
 1 and 800.  The old range conflicts with a number of network services.
 Provide sysctls to allow admins to select a different port range.

 Note that this doesn't affect user-level RPC library behavior, which
 still uses 1 to 800.

 Based on a suggestion by Olaf Kirch <okir@suse.de>.

 Test-plan:
 Repeated mount and unmount.  Destructive testing.  Idle timeouts.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:50 -04:00
Chuck Lever
555ee3af16 [PATCH] RPC: clean up after nocong was removed
Clean-up:  Move some macros that are specific to the Van Jacobson
 implementation into xprt.c.  Get rid of the cong_wait field in
 rpc_xprt, which is no longer used.  Get rid of xprt_clear_backlog.

 Test-plan:
 Compile with CONFIG_NFS enabled.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:48 -04:00
Chuck Lever
ed63c00370 [PATCH] RPC: remove xprt->nocong
Get rid of the "xprt->nocong" variable.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss with UDP mounts.
 Look for significant regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:47 -04:00
Chuck Lever
a58dd398f5 [PATCH] RPC: add a release_rqst callout to the RPC transport switch
The final place where congestion control state is adjusted is in
 xprt_release, where each request is finally released.  Add a callout
 there to allow transports to perform additional processing when a
 request is about to be released.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for significant
 regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:45 -04:00
Chuck Lever
1570c1e41e [PATCH] RPC: add generic interface for adjusting the congestion window
A new interface that allows transports to adjust their congestion window
 using the Van Jacobson implementation in xprt.c is provided.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for
 significant regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:43 -04:00
Chuck Lever
46c0ee8bc4 [PATCH] RPC: separate xprt_timer implementations
Allow transports to hook the retransmit timer interrupt.  Some transports
 calculate their congestion window here so that a retransmit timeout has
 immediate effect on the congestion window.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for significant
 regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:41 -04:00
Chuck Lever
49e9a89086 [PATCH] RPC: expose API for serializing access to RPC transports
The next method we abstract is the one that releases a transport,
 allowing another task to have access to the transport.

 Again, one generic version of this is provided for transports that
 don't need the RPC client to perform congestion control, and one
 version is for transports that can use the original Van Jacobson
 implementation in xprt.c.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for
 significant regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:40 -04:00
Chuck Lever
12a804698b [PATCH] RPC: expose API for serializing access to RPC transports
The next several patches introduce an API that allows transports to
 choose whether the RPC client provides congestion control or whether
 the transport itself provides it.

 The first method we abstract is the one that serializes access to the
 RPC transport to prevent the bytes from different requests from mingling
 together.  This method provides proper request serialization and the
 opportunity to prevent new requests from being started because the
 transport is congested.

 The normal situation is for the transport to handle congestion control
 itself.  Although NFS over UDP was first, it has been recognized after
 years of experience that having the transport provide congestion control
 is much better than doing it in the RPC client.  Thus TCP, and probably
 every future transport implementation, will use the default method,
 xprt_lock_write, provided in xprt.c, which does not provide any kind
 of congestion control.  UDP can continue using the xprt.c-provided
 Van Jacobson congestion avoidance implementation.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for significant
 regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:38 -04:00
Chuck Lever
fe3aca290f [PATCH] RPC: add API to set transport-specific timeouts
Prepare the way to remove the "xprt->nocong" variable by adding a callout
 to the RPC client transport switch API to handle setting RPC retransmit
 timeouts.

 Add a pair of generic helper functions that provide the ability to set a
 simple fixed timeout, or to set a timeout based on the state of a round-
 trip estimator.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for significant
 regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:36 -04:00
Chuck Lever
43118c29de [PATCH] RPC: get rid of xprt->stream
Now we can fix up the last few places that use the "xprt->stream"
 variable, and get rid of it from the rpc_xprt structure.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:35 -04:00
Chuck Lever
808012fbb2 [PATCH] RPC: skip over transport-specific heads automatically
Add a generic mechanism for skipping over transport-specific headers
 when constructing an RPC request.  This removes another "xprt->stream"
 dependency.

 Test-plan:
 Write-intensive workload on a single mount point (try both UDP and
 TCP).

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:33 -04:00
Chuck Lever
262965f53d [PATCH] RPC: separate TCP and UDP socket write paths
Split the RPC client's main socket write path into a TCP version and a UDP
 version to eliminate another dependency on the "xprt->stream" variable.

 Compiler optimization removes unneeded code from xs_sendpages, as this
 function is now called with some constant arguments.

 We can now cleanly perform transport protocol-specific return code testing
 and error recovery in each path.

 Test-plan:
 Millions of fsx operations.  Performance characterization such as
 "sio" or "iozone".  Examine oprofile results for any changes before and
 after this patch is applied.

 Version: Thu, 11 Aug 2005 16:08:46 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:31 -04:00
Chuck Lever
b0d93ad511 [PATCH] RPC: separate TCP and UDP transport connection logic
Create separate connection worker functions for managing UDP and TCP
 transport sockets.  This eliminates several dependencies on "xprt->stream".

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon with
 v2, v3, and v4.

 Version: Thu, 11 Aug 2005 16:08:18 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:29 -04:00
Chuck Lever
c7b2cae8a6 [PATCH] RPC: separate TCP and UDP write space callbacks
Split the socket write space callback function into a TCP version and UDP
 version, eliminating one dependence on the "xprt->stream" variable.

 Keep the common pieces of this path in xprt.c so other transports can use
 it too.

 Test-plan:
 Write-intensive workload on a single mount point.

 Version: Thu, 11 Aug 2005 16:07:51 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:28 -04:00
Chuck Lever
55aa4f58aa [PATCH] RPC: client-side transport switch cleanup
Clean-up: change some comments to reflect the realities of the new RPC
 transport switch mechanism.  Get rid of unused xprt_receive() prototype.

 Also, organize function prototypes in xprt.h by usage and scope.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:07:21 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:26 -04:00
Chuck Lever
44fbac2288 [PATCH] RPC: Add helper for waking tasks pending on a transport
Clean-up: remove only reference to xprt->pending from the socket transport
 implementation.  This makes a cleaner interface for other transport
 implementations as well.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:06:52 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:24 -04:00
Chuck Lever
2226feb6bc [PATCH] RPC: rename the sockstate field
Clean-up: get rid of a name reference to sockets in the generic parts of the
 RPC client by renaming the sockstate field in the rpc_xprt structure.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:05:53 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:21 -04:00
Chuck Lever
4a0f8c04f2 [PATCH] RPC: Rename sock_lock
Clean-up: replace a name reference to sockets in the generic parts of the RPC
 client by renaming sock_lock in the rpc_xprt structure.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:05:00 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:17 -04:00
Chuck Lever
b4b5cc85ed [PATCH] RPC: Reduce stack utilization in xs_sendpages
Reduce stack utilization of the RPC socket transport's send path.

 A couple of unlikely()s are added to ensure the compiler places the
 tail processing at the end of the csect.

 Test-plan:
 Millions of fsx operations.  Performance characterization such as "sio" or
 "iozone".

 Version: Thu, 11 Aug 2005 16:04:30 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:16 -04:00
Chuck Lever
9903cd1c27 [PATCH] RPC: transport switch function naming
Introduce block header comments and a function naming convention to the
 socket transport implementation.  Provide a debug setting for transports
 that is separate from RPCDBG_XPRT.  Eliminate xprt_default_timeout().

 Provide block comments for exposed interfaces in xprt.c, and eliminate
 the useless obvious comments.

 Convert printk's to dprintk's.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:04:04 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:14 -04:00
Chuck Lever
a246b0105b [PATCH] RPC: introduce client-side transport switch
Move the bulk of client-side socket-specific code into a separate source
 file, net/sunrpc/xprtsock.c.

 Test-plan:
 Millions of fsx operations.  Performance characterization such as "sio" or
 "iozone".  Destructive testing (unplugging the network temporarily, server
 reboots).  Connectathon with v2, v3, and v4.

 Version: Thu, 11 Aug 2005 16:03:38 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:12 -04:00