Commit Graph

575451 Commits

Author SHA1 Message Date
Trond Myklebust
fc7ff36747 pNFS: If we have to delay the layout callback, mark the layout for return
If the client needs to delay the layout callback, then speed up the recall
process by marking the remaining layout segments to be actively returned
by the client.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:33:04 -05:00
Trond Myklebust
0654cc726f NFSv4.1/pNFS: Add a helper to mark the layout as returned
This ensures that we don't reuse the stateid if a layout return or
implied layout return means that we've returned all layout segments.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:33:04 -05:00
Trond Myklebust
ab7d763e47 pNFS: Ensure nfs4_layoutget_prepare returns the correct error
If we're unable to perform the layoutget due to an invalid open stateid
or a bulk recall, ensure that we return the error so that the caller
can decide on an appropriate action.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:33:03 -05:00
Trond Myklebust
4d0ac22109 pNFS/flexfiles: Ensure we record layoutstats even if RPC is terminated early
Currently, we will only record the layoutstats correctly if the
RPC call successfully obtains a slot. If we exit before that
happens, then we may find ourselves starting the busy timer through
the call in ff_layout_(read|write)_prepare_layoutstats, but never stopping it.

The same thing happens if we're doing DA-DS.

The fix is to ensure that we catch these cases in the rpc_release()
callback.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:41 -05:00
Trond Myklebust
37e9ed22b1 pNFS: Add flag to track if we've called nfs4_ff_layout_stat_io_start_read/write
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:41 -05:00
Trond Myklebust
7eeea16797 pNFS/flexfiles: Fix a statistics gathering imbalance
When we replay a failed read, write or commit to the dataserver, we
need to ensure that we call ff_layout_read_prepare_v3(),
ff_layout_write_prepare_v3 or ff_layout_commit_prepare_v3() so that we
reset the statistics.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:40 -05:00
Trond Myklebust
b9fc773ef5 pNFS/flexfiles: Don't mark the entire layout as failed, when returning it
In pNFS/flexfiles, we want to return the layout without necessarily marking
it as having completely failed. We therefore move the call to
pnfs_layout_io_set_failed() out of pnfs_error_mark_layout_for_return(),
and then ensure that the pNFS/files layout calls it separately.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:40 -05:00
Trond Myklebust
2e5b29f044 pNFS/flexfiles: Don't prevent flexfiles client from retrying LAYOUTGET
Fix a bug in which flexfiles clients are falling back to I/O through the
MDS even when the FF_FLAGS_NO_IO_THRU_MDS flag is set.

The flexfiles client will always report errors through the LAYOUTRETURN
and/or LAYOUTERROR mechanisms, so it should normally be safe for it
to retry the LAYOUTGET until it fails or succeeds.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:40 -05:00
Peng Tao
141b9b59ed pnfs/flexfiles: count io stat in rpc_count_stats callback
If the client ever restarts IO due to errors, we'll end up
miscounting IO stats if we do the counting in the .rpc_done
callback. Move it to the .rpc_count_stats callback, which is only
called when releasing the RPC.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:39 -05:00
Peng Tao
c22eeb8697 pnfs/flexfiles: do not mark delay-like status as DS failure
We just need to delay and retry in these cases.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:39 -05:00
Peng Tao
7c1e6e58e2 NFS41: map NFS4ERR_LAYOUTUNAVAILABLE to ENODATA
Map it to ENODATA instead of EIO, which is a fatal error and
fails the application. We'll go inband after getting
NFS4ERR_LAYOUTUNAVAILABLE.
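
As a rough illustration of the mapping described above, here is a minimal userspace sketch. The function name and the fallback of mapping every other non-zero status to EIO are assumptions made for illustration and are not taken from the patch. The value 10059 is the NFS4ERR_LAYOUTUNAVAILABLE status code from the NFSv4.1 protocol definition.

```c
#include <errno.h>

/* Protocol status code for NFS4ERR_LAYOUTUNAVAILABLE (NFSv4.1). */
enum { SKETCH_NFS4ERR_LAYOUTUNAVAILABLE = 10059 };

/*
 * Hypothetical helper: translate an NFSv4 status into a negative errno.
 * Only the LAYOUTUNAVAILABLE case mirrors the commit message; the
 * catch-all -EIO branch is an assumption of this sketch.
 */
int sketch_map_layout_status(int nfs4_status)
{
	switch (nfs4_status) {
	case 0:
		return 0;
	case SKETCH_NFS4ERR_LAYOUTUNAVAILABLE:
		/* Non-fatal: the caller can fall back to inband IO. */
		return -ENODATA;
	default:
		return -EIO;
	}
}
```

A caller that sees -ENODATA can then retry the IO through the MDS instead of failing the application.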

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:38 -05:00
Peng Tao
d6c843b96e nfs: only remove page from mapping if launder_page fails
Instead of dropping pages whenever a write fails, only do so when
we get a fatal failure in launder_page writeback.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:38 -05:00
Peng Tao
0bcbf039f6 nfs: handle request add failure properly
When we fail to queue a read page to the IO descriptor,
we need to clean it up; otherwise it hangs around and
prevents the nfs module from being removed.

When we fail to queue a write page to the IO descriptor,
we need to clean it up and also save the failure status
to the open context. Then at file close, we can try to write
the pages back again and drop a page if it fails to write back
in .launder_page, which will be done in the next patch.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:37 -05:00
Peng Tao
2bff228857 nfs: centralize pgio error cleanup
In case we fail during setting things up for read/write IO, set
pg_error in IO descriptor and do the cleanup in nfs_pageio_add_request,
where we clean up all pages that are still hanging around on the IO
descriptor.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:37 -05:00
Peng Tao
c18b96a1b8 nfs: clean up rest of reqs when failing to add one
If we fail to set things up before sending anything over the wire,
we need to clean up the reqs that are still attached to the
IO descriptor.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:37 -05:00
Peng Tao
d600ad1f2b NFS41: pop some layoutget errors to application
For ERESTARTSYS/EIO/EROFS/ENOSPC/E2BIG in layoutget, we
should just bail out instead of hiding the error and
retrying inband IO.

Change all the call sites to pop the error all the way up.
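
The classification described above can be sketched as a small predicate. This is an illustration only: the function name is hypothetical, and ERESTARTSYS is a kernel-internal code that userspace errno.h does not provide, so the sketch defines a local stand-in with the kernel's value of 512.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal value; defined locally because userspace lacks it. */
#define SKETCH_ERESTARTSYS 512

/*
 * Hypothetical helper: return true if a (negative errno) layoutget
 * result should be passed up to the application rather than hidden
 * behind an inband IO retry. The error list comes from the commit
 * message; treating everything else as non-fatal is an assumption.
 */
bool sketch_layoutget_error_is_fatal(int err)
{
	switch (err) {
	case -SKETCH_ERESTARTSYS:
	case -EIO:
	case -EROFS:
	case -ENOSPC:
	case -E2BIG:
		return true;
	default:
		return false;
	}
}
```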

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:36 -05:00
Trond Myklebust
d0379a5d06 pNFS/flexfiles: Support server-supplied layoutstats sampling period
Some servers want to be able to control the frequency with which clients
report layoutstats, for instance, in order to monitor QoS for a particular
file or set of files. In order to support this, the flexfiles layout allows
the server to pass this info as a hint in the layout payload.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 14:32:36 -05:00
Linus Torvalds
8513342170 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
 "This fixes a bug in the algif_skcipher interface that can trigger a
  kernel WARN_ON from user-space.  It does so by using the new skcipher
  interface which unlike the previous ablkcipher does not need to create
  extra geniv objects which is what was used to trigger the WARN_ON"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: algif_skcipher - Use new skcipher interface
2015-12-28 10:44:41 -08:00
Linus Torvalds
2c7143d4f5 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key handling bugfix from James Morris:
 "Fix a race between keyctl_read() and keyctl_revoke()"

* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  KEYS: Fix race between read and revoke
2015-12-28 10:35:19 -08:00
Trond Myklebust
494f74a26c NFS: Flush reclaim writes using FLUSH_COND_STABLE
If there are already writes queued up for commit, then don't flush
just this page even if it is a reclaim issue.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 13:34:59 -05:00
Trond Myklebust
b0ac1bd2bb NFS: Background flush should not be low priority
Background flush is needed in order to satisfy the global page limits.
Don't subvert it by reducing the priority.
This should also address a write starvation issue that was reported by
Neil Brown.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 13:30:42 -05:00
Trond Myklebust
1a093ceb05 NFSv4.1/pnfs: Fixup an lo->plh_block_lgets imbalance in layoutreturn
Since commit 2d8ae84fbc, nothing is bumping lo->plh_block_lgets in the
layoutreturn path, so it should not be touched in nfs4_layoutreturn_release
either.

Fixes: 2d8ae84fbc ("NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets...")
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 13:23:36 -05:00
Trond Myklebust
762674f86d NFSv4: Don't perform cached access checks before we've OPENed the file
Donald Buczek reports that an nfs4 client incorrectly denies
execute access based on an outdated file mode (missing 'x' bit).
After the mode on the server is 'fixed' (chmod +x), further execution
attempts continue to fail, because the nfs ACCESS call updates
the access parameter but not the mode parameter or the mode in
the inode.

The root cause is ultimately that the VFS is calling may_open()
before the NFS client has a chance to OPEN the file and hence revalidate
the access and attribute caches.

Al Viro suggests:
>>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
>>> that things will be caught by server anyway?
>>
>> That can work as long as we're guaranteed that everything that calls
>> inode_permission() with MAY_OPEN on a regular file will also follow up
>> with a vfs_open() or dentry_open() on success. Is this always the
>> case?
>
> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since
> it doesn't have ->tmpfile() instance anyway)
>
> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
>
> 3) in do_last(), followed on success by vfs_open()
>
> That's all.  All calls of inode_permission() that get MAY_OPEN come from
> may_open(), and there's no other callers of that puppy.

Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
Link: http://lkml.kernel.org/r/1451046656-26319-1-git-send-email-buczek@molgen.mpg.de
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-12-28 13:22:20 -05:00
Oliver Freyermuth
f7d7f59ab1 USB: cp210x: add ID for ELV Marble Sound Board 1
Add the USB device ID for ELV Marble Sound Board 1.

Signed-off-by: Oliver Freyermuth <o.freyermuth@googlemail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
2015-12-28 19:07:35 +01:00
Pablo Neira Ayuso
5913beaf0d netfilter: nfnetlink: pass down netns pointer to commit() and abort() callbacks
Adapt callsites to avoid recurrent lookup of the netns pointer.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-12-28 18:43:15 +01:00
Pablo Neira Ayuso
7b8002a151 netfilter: nfnetlink: pass down netns pointer to call() and call_rcu()
Adapt callsites to avoid recurrent lookup of the netns pointer.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-12-28 18:41:41 +01:00
Pablo Neira Ayuso
f4c756b4ea netfilter: nf_tables: remove check against removal of inactive objects
The following sequence inside a batch, although not very useful, is
valid:

 add table foo
 ...
 delete table foo

This may be generated by some robot while applying some incremental
upgrade, so remove the defensive checks against this.

This patch keeps the check on the get/dump path for now; we have to
replace the inactive flag by introducing object generations.

Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-12-28 18:37:20 +01:00
Pablo Neira Ayuso
5ebe0b0eec netfilter: nf_tables: destroy basechain and rules on netdevice removal
If the netdevice is destroyed, the resources that are attached should
be released too as they belong to the device that is now gone.

Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-12-28 18:34:35 +01:00
Pablo Neira Ayuso
df05ef874b netfilter: nf_tables: release objects on netns destruction
We have to release the existing objects on netns removal, otherwise we
leak them. Chains are unregistered first to make sure no
packets are walking on our rules and sets anymore.

The object release happens when we unregister the family via
nft_release_afinfo(), which is called from nft_unregister_afinfo() from
the corresponding __net_exit path in every family.

Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-12-28 18:34:35 +01:00
Devesh Sharma
f41647ef06 RDMA/be2net: Remove open and close entry points
Recently Doug Ledford reported a deadlock happening
between the ocrdma-load sequence and the NetworkManager service
issuing "open" on the be2net interface.

The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.

A. be2net is sending administrative open/close event to ocrdma holding
   device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
   So sequence of locks is rtnl_lock---> device_list lock

B.  When new ocrdma roce device gets registered, infiniband stack now
    takes rtnl_lock in ib_register_device() in GID initialization routines.
    So sequence of locks in this path is device_list lock ---> rtnl_lock.

This improper locking sequence causes deadlock.
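
To make the inversion between paths A and B concrete, here is a toy lock-order tracker, loosely in the spirit of a lock validator. It is only a sketch: every name in it is hypothetical, it detects direct two-lock inversions only (not longer cycles), and it is not how the kernel or this patch handles the problem.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_LOCKS 8

/* Illustrative lock ids for the two locks named in the commit message. */
enum { SKETCH_RTNL_LOCK = 0, SKETCH_DEVICE_LIST_LOCK = 1 };

struct sketch_lock_tracker {
	/* before[a][b] is true if a was held when b was acquired. */
	bool before[SKETCH_MAX_LOCKS][SKETCH_MAX_LOCKS];
	int held[SKETCH_MAX_LOCKS];
	int nheld;
};

void sketch_tracker_init(struct sketch_lock_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Record that 'lock' is acquired while the current set is held.
 * Returns false if this contradicts an order seen earlier, or if
 * the arguments are out of range for this toy tracker.
 */
bool sketch_acquire(struct sketch_lock_tracker *t, int lock)
{
	bool ok = true;

	if (lock < 0 || lock >= SKETCH_MAX_LOCKS || t->nheld >= SKETCH_MAX_LOCKS)
		return false;

	for (int i = 0; i < t->nheld; i++) {
		int h = t->held[i];

		if (t->before[lock][h])
			ok = false;	/* opposite order was seen before */
		t->before[h][lock] = true;
	}
	t->held[t->nheld++] = lock;
	return ok;
}

/* Drop all held locks, as at the end of one code path. */
void sketch_release_all(struct sketch_lock_tracker *t)
{
	t->nheld = 0;
}
```

Replaying path A (rtnl_lock, then the device list lock) followed by path B (the device list lock, then rtnl_lock) makes the final sketch_acquire() call return false, which is the ordering conflict described above.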

In order to resolve the above deadlock condition, ocrdma introduced a
patch to stop listening to administrative open/close events generated by the
be2net driver. It now depends on the link-state-change async-event generated by
the CNA. This change leaves behind dead code which used to generate administrative
open/close events. This patch cleans up all that dead code in be2net.

Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-28 11:45:55 -05:00
Devesh Sharma
10a214dc99 RDMA/ocrdma: Depend on async link events from CNA
Recently Doug Ledford reported a deadlock happening
between the ocrdma-load sequence and the NetworkManager service
issuing "open" on the be2net interface.

The deadlock happens when any be2net hook (e.g. open/close) is called
in parallel to insmod ocrdma.ko.

A. be2net is sending administrative open/close event to ocrdma holding
   device_list_mutex. It does this from ndo_open/ndo_stop hooks of be2net.
   So sequence of locks is rtnl_lock---> device_list lock

B.  When new ocrdma roce device gets registered, infiniband stack now
    takes rtnl_lock in ib_register_device() in GID initialization routines.
    So sequence of locks in this path is device_list lock ---> rtnl_lock.

This improper locking sequence causes deadlock.

With this patch we stop using administrative open and close events
injected by the be2net driver. These events were used to dispatch PORT_ACTIVE
and PORT_ERROR events to the IB-stack. This patch implements logic
to receive async-link-events generated by the CNA whenever a link-state-change
is detected. From now on, these async-events will be used to dispatch
PORT_ACTIVE and PORT_ERROR events to the IB-stack.

Depending on async-events from CNA removes the need to hold device-list-mutex
and thus breaks the busy-wait scenario.

Reported-by: Doug Ledford <dledford@redhat.com>
CC: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-28 11:45:54 -05:00
Devesh Sharma
36ac0db0db RDMA/ocrdma: Dispatch only port event when port state changes
Dispatch only port event to IB stack when port state changes.
Don't explicitly modify qps to error. Let application listen to
port events on async event queue or let QP fail with retry-exceeded
completion error.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-28 11:45:54 -05:00
Devesh Sharma
c6002d5602 RDMA/ocrdma: Fix vlan-id assignment in qp parameters
vlan-id is wrongly set to 0 when PFC is enabled.
Set the vlan-id configured by the user in the QP parameters.
In case a vlan interface is not used, flash a warning to the
user to configure vlan and assign vlan-id as 0 in the qp params.

Fixes: dbf727de74 ('IB/core: Use GID table in AH creation and dmac resolution')
Cc: Matan Barak <matanb@mellanox.com>
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-28 11:45:54 -05:00
Joerg Roedel
a639a8eecf iommu/amd: Preallocate dma_ops apertures based on dma_mask
Preallocate between 4 and 8 apertures when a device gets its
dma_mask. With more apertures we reduce the lock contention
of the domain lock significantly.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:54 +01:00
Joerg Roedel
7b5e25b84e iommu/amd: Use trylock to acquire bitmap_lock
First search for a non-contended aperture with trylock
before spinning.
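
The idea of probing for an uncontended aperture before spinning can be sketched in userspace C. This is only an illustration under stated assumptions: the struct and function names are hypothetical, a C11 atomic_flag stands in for the kernel's bitmap_lock spinlock, and falling back to spinning on the starting aperture is an assumed policy that the commit message does not spell out.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_NR_APERTURES 4

struct sketch_aperture {
	atomic_flag lock;	/* stand-in for the per-aperture bitmap_lock */
};

void sketch_aperture_init(struct sketch_aperture *a)
{
	atomic_flag_clear(&a->lock);
}

/* Returns 1 if the lock was taken, 0 if it was already held. */
int sketch_trylock(struct sketch_aperture *a)
{
	return !atomic_flag_test_and_set_explicit(&a->lock,
						  memory_order_acquire);
}

void sketch_unlock(struct sketch_aperture *a)
{
	atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/*
 * First pass: try each aperture once, starting at 'start', and take
 * the first one that is free. Only if all are contended do we spin
 * on the starting aperture. Returns the index of the locked aperture.
 */
size_t sketch_lock_aperture(struct sketch_aperture *ap, size_t n, size_t start)
{
	size_t first = start % n;

	for (size_t i = 0; i < n; i++) {
		size_t idx = (first + i) % n;

		if (sketch_trylock(&ap[idx]))
			return idx;
	}

	while (!sketch_trylock(&ap[first]))
		;	/* busy-wait until the holder releases it */
	return first;
}
```

With aperture 0 already locked, a call starting at index 0 returns index 1 immediately instead of waiting.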

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:54 +01:00
Joerg Roedel
5f6bed5005 iommu/amd: Make dma_ops_domain->next_index percpu
Make this pointer percpu so that we start searching for new
addresses in the range where we last stopped, which has a
higher probability of still being in the cache.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:54 +01:00
Joerg Roedel
92d420ec02 iommu/amd: Relax locking in dma_ops path
Remove the long holding times of the domain->lock and rely
on the bitmap_lock instead.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:54 +01:00
Joerg Roedel
a73c156665 iommu/amd: Initialize new aperture range before making it visible
Make sure the aperture range is fully initialized before it
is visible to the address allocator.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:53 +01:00
Joerg Roedel
7bfa5bd270 iommu/amd: Build io page-tables with cmpxchg64
This allows building up the page-tables without holding any
locks. As a consequence it removes the need to pre-populate
dma_ops page-tables.
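
A lock-free table build of this kind typically follows a compare-and-swap pattern: allocate a new lower-level table, try to install it into an empty slot, and discard it if another CPU got there first. The sketch below uses C11 atomic_compare_exchange_strong in place of the kernel's cmpxchg64. The function name, the table size of 512 entries, the use of calloc/free, and storing a plain pointer in the slot (a real entry would hold a physical address plus flag bits) are all simplifying assumptions.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PTRS_PER_TABLE 512

/*
 * Return the lower-level table referenced by *slot, allocating and
 * installing one if the slot is still empty. No lock is taken: if two
 * callers race, exactly one compare-and-swap succeeds and the loser
 * frees its allocation and uses the winner's table.
 */
uint64_t *sketch_get_or_alloc_table(_Atomic uint64_t *slot)
{
	uint64_t cur = atomic_load(slot);
	uint64_t expected = 0;
	uint64_t *fresh;

	if (cur != 0)
		return (uint64_t *)(uintptr_t)cur;

	fresh = calloc(SKETCH_PTRS_PER_TABLE, sizeof(*fresh));
	if (!fresh)
		return NULL;

	if (atomic_compare_exchange_strong(slot, &expected,
					   (uint64_t)(uintptr_t)fresh))
		return fresh;

	/* Lost the race: 'expected' now holds the installed value. */
	free(fresh);
	return (uint64_t *)(uintptr_t)expected;
}
```

Because the slot is populated on demand by whichever caller needs it first, nothing has to be pre-populated ahead of time in this scheme.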

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:53 +01:00
Joerg Roedel
266a3bd28f iommu/amd: Allocate new aperture ranges in dma_ops_alloc_addresses
It really belongs there and not in __map_single.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:53 +01:00
Joerg Roedel
4eeca8c5e7 iommu/amd: Optimize dma_ops_free_addresses
Don't flush the iommu tlb when we free something behind the
current next_bit pointer. Update the next_bit pointer
instead and let the flush happen on the next wraparound in
the allocation path.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:53 +01:00
Joerg Roedel
ab7032bb9c iommu/amd: Remove need_flush from struct dma_ops_domain
The flushing of iommu tlbs is now done on a per-range basis.
So there is no need anymore for domain-wide flush tracking.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:53 +01:00
Joerg Roedel
2a87442c5b iommu/amd: Iterate over all aperture ranges in dma_ops_area_alloc
This way we don't need to care about the next_index wrapping
around in dma_ops_alloc_addresses.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:52 +01:00
Joerg Roedel
d41ab09896 iommu/amd: Flush iommu tlb in dma_ops_free_addresses
Instead of setting need_flush, do the flush directly in
dma_ops_free_addresses.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:52 +01:00
Joerg Roedel
ebaecb423b iommu/amd: Rename dma_ops_domain->next_address to next_index
It points to the next aperture index to allocate from. We
don't need the full address anymore because this is now
tracked in struct aperture_range.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:52 +01:00
Joerg Roedel
05ab49e005 iommu/amd: Remove 'start' parameter from dma_ops_area_alloc
Parameter is not needed because the value is part of the
already passed in struct dma_ops_domain.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:52 +01:00
Joerg Roedel
ccb50e03da iommu/amd: Flush iommu tlb in dma_ops_aperture_alloc()
Since the allocator wraparound happens in this function now,
flush the iommu tlb there too.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:51 +01:00
Joerg Roedel
60e6a7cb44 iommu/amd: Retry address allocation within one aperture
Instead of skipping to the next aperture, first try again in
the current one.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:51 +01:00
Joerg Roedel
ae62d49c7a iommu/amd: Move aperture_range.offset to another cache-line
Moving it before the pte_pages array puts it into the same
cache-line as the spin-lock and the bitmap array pointer.
This should save a cache-miss.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:51 +01:00
Joerg Roedel
a0f51447f4 iommu/amd: Add dma_ops_aperture_alloc() function
Make this a wrapper around iommu_ops_area_alloc() for now
and add more logic to this function later on.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2015-12-28 17:18:51 +01:00