Commit Graph

603061 Commits

Author SHA1 Message Date
Trond Myklebust
f71dfe8fc9 pNFS: Remove redundant pnfs_mark_layout_returned_if_empty()
That's already being taken care of in pnfs_layout_remove_lseg().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:41 -04:00
Trond Myklebust
d9b61708fe pNFS: Clear the layout metadata if the server changed the layout stateid
If the server changed the layout stateid's "other" field, then
we should treat the old layout as being completely gone. In that
case, we want to clear the metadata such as scheduled layoutreturns.

Do this by calling pnfs_mark_layout_stateid_invalid().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:41 -04:00
Trond Myklebust
5f46be049b pNFS: Cleanup - don't open code pnfs_mark_layout_stateid_invalid()
Ensure nfs42_layoutstat_done() layoutget don't open code layout stateid
invalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:40 -04:00
Trond Myklebust
e036f46453 NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id
When determining which layout segments to return, we do want
pnfs_mark_matching_lsegs_return to check that they match the layout
sequence id. This ensures that we don't waste time if the server
is replaying a layout recall that has already been satisfied.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:40 -04:00
Trond Myklebust
2d6cf5ab0b pNFS: Do not set plh_return_seq for non-callback related layoutreturns
In cases where we need to send a layoutreturn in order to propagate
an error, we should not tie that to a specific layout stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:39 -04:00
Trond Myklebust
e5fd1904b8 pNFS: Ensure layoutreturn acts as a completion for layout callbacks
When we return NFS_OK to the CB_LAYOUTRECALL, we are required to
send a layoutreturn that "completes" that layout recall request, using
the correct stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:39 -04:00
Trond Myklebust
793b7fe558 pNFS: Fix CB_LAYOUTRECALL stateid verification
We want to evaluate in this order:

If the client holds no layout for this inode, then return
NFS4ERR_NOMATCHING_LAYOUT; it probably forgot the layout.

If the client finds the inode among the list of layouts, but the corresponding
stateid has not yet been initialised, then return NFS4ERR_DELAY to ask the
server to retry once the outstanding LAYOUTGET is complete.

If the current layout stateid's "other" field does not match the recalled
stateid, return NFS4ERR_BAD_STATEID.

If already processing a layout recall with a newer stateid, return
NFS4ERR_OLD_STATEID. This can only happens for servers that are
non-compliant with the NFSv4.1 protocol.

If already processing a layout recall with an older stateid, return
NFS4ERR_DELAY to ask the server to retry once the outstanding
LAYOUTRETURN is complete. Again, this is technically incompliant with
the NFSv4.1 protocol.

If the current layout sequence id is newer than the recalled stateid's
sequence id, return NFS4ERR_OLD_STATEID. This too implies protocol
non-compliance.

If the current layout sequence id is older than the recalled stateid's
sequence id+1, return NFS4ERR_DELAY.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:38 -04:00
Trond Myklebust
ecebb80bf3 pNFS: Always update the layout barrier seqid on LAYOUTGET
Currently, pnfs_set_layout_stateid() will update the layout sequence
id barrier only if the stateid itself is newer than the current
layout stateid. However in a situation where multiple LAYOUTGET calls
and a LAYOUTRETURN raced, it is entirely possible for one of the
LAYOUTGET to set the current stateid to something newer than the
LAYOUTRETURN that needs to set the barrier.

The fix is to allow the "update_barrier" flag to force a check as to
whether or not the barrier needs to be updated.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:38 -04:00
Trond Myklebust
13bede18de pNFS: Always update the layout stateid if NFS_LAYOUT_INVALID_STID is set
If the layout stateid is invalid, then pnfs_set_layout_stateid() must
always initialise it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 16:16:25 -04:00
Trond Myklebust
8e0acf9046 pNFS: Clear the layout return tracking on layout reinitialisation
Ensure that we don't carry over layoutreturn info from a previous
incarnation of this layout.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 12:51:49 -04:00
Trond Myklebust
45fcc7bca7 pNFS: LAYOUTRETURN should only update the stateid if the layout is valid
If the layout was completely returned, then ignore the returned layout
stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-24 12:51:49 -04:00
Trond Myklebust
dc05973b28 Merge commit 'e7bdea7750eb'
Needed in order to work on top of pNFS changes in Linus' upstream kernel.
2016-07-24 12:51:10 -04:00
Artem Savkov
297fae4d0b Fix NULL pointer dereference in bl_free_device().
When bl_parse_deviceid() fails in bl_alloc_deviceid_node() on
blkdev_get_by_*() step we get an pnfs_block_dev struct that is
uninitialized except for bdev field which is set to whatever error
blkdev_get_by_*() returns.  bl_free_device() then tries to call
blkdev_put() if bdev is not 0 resulting in a wrong pointer dereference.

Fixing this by setting bdev in struct pnfs_block_dev only if we didn't
get an error from blkdev_get_by_*().

Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-22 15:14:21 -04:00
Kinglong Mee
c77efc1e78 nfs/blocklayout: Check max uuids and devices before decoding
Avoid nfs return uuids/devices larger than maximum.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:51:11 -04:00
Kinglong Mee
ecc2b88c4a nfs/blocklayout: Make sure calculate signature length aligned
Avoid a bad nfs server return an unaligned length of signature.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:51:11 -04:00
Christoph Hellwig
11487ddbdb nfs/blocklayout: support RH/Fedora dm-mpath device nodes
Instead of reusing the wwn-* names for multipath devices nodes RHEL and
Fedora introduce new dm-mpath-uuid-* nodes with a slightly different
naming scheme.  Try these names first to ensure we always get a
multipath-capable device if it exists.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Christoph Hellwig
d702d41ed4 nfs/blocklayout: refactor open-by-wwn
The current code works with the standard udev/systemd names, but we'll have
to add another method in the next patch.  Refactor it into a separate helper
to make room for the new variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Christoph Hellwig
0173ca0544 nfs/blocklayout: use proper fmode for opening block devices
This was fixed for the original block layout code a while ago, but also
needs to be fixed for the SCSI layout path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-07-15 15:46:08 -04:00
Linus Torvalds
e7bdea7750 NFS client bugfixes for Linux 4.7
Stable bugfixes:
 - Fix _cancel_empty_pagelist
 - Fix a double page unlock
 - Make nfs_atomic_open() call d_drop() on all ->open_context() errors.
 - Fix another OPEN_DOWNGRADE bug
 
 Other bugfixes:
 - Ensure we handle delegation errors in nfs4_proc_layoutget()
 - Layout stateids start out as being invalid
 - Add sparse lock annotations for pnfs_find_alloc_layout
 - Handle bad delegation stateids in nfs4_layoutget_handle_exception
 - Fix up O_DIRECT results
 - Fix potential use after free of state in nfs4_do_reclaim.
 - Mark the layout stateid invalid when all segments are removed
 - Don't let readdirplus revalidate an inode that was marked as stale
 - Fix potential race in nfs_fhget()
 - Fix an unused variable warning
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXdDWkAAoJENfLVL+wpUDrpsUP/1F2zu12r/Zkv3ZEFhShcpQc
 2N1TRD9X7Lruod2pUD95qqjjdw+/vu3LjcyJljrasRaJENijvZ2GQhKkB7xPODlu
 qxZcmnQsH+WmpmJKcqAByAW1czcNGMoMHnt4tV0gG21NH+XUb92fgn+aGeIJDVrK
 Hcd9d8TfnFWO70ZgTUXW/hv0CXwu4MEJhN2JfF4lolbxUkmjLHHLoxSDDm0AdXGC
 EE8f0V9/7xurvOeLe5bQOQXfZPedBydsLNXa1ZacMGKgmBUoRNxJ5yCpPUtcTVBx
 HwbiY+WDFQ7MdKTzUQqqbnrIqKw8Hu4SugIV/vHRqR+Lhc6u29YGOqdU4d2G8IKW
 Nh8MBqS+dDefCkL3TJoE7MpjhP3EOO6HXnv5FjMZLOuu2X2o+Sz3+DkhYCq6pj/g
 fFh480vZfZYaTsfDf1ttvVN8kIvQ+1Uk3LK6aC2EVwgPrv+0OIRu36F0JQQimxOp
 EbYDlhVk7mzH/ZQ31GmPPbSIk+3sm2V58lqXnUMovoqPFPiN3xZDuBcnlyFrrzaI
 NjOvsdVxkOdHWbYZyBQzj16Vo651EYbAAUEwsud70N8C3aCgkTxCZ30Q0v+KqqxU
 pP5kz3zYUdkXQeHxE6T0iXG9fGcv/nGS21hTfT01YJCK7v67K8TYRNMrOEVURVgk
 LSD/CZJXJHVJn1Vr4F7o
 =IGNO
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "Stable bugfixes:
   - Fix _cancel_empty_pagelist
   - Fix a double page unlock
   - Make nfs_atomic_open() call d_drop() on all ->open_context() errors.
   - Fix another OPEN_DOWNGRADE bug

  Other bugfixes:
   - Ensure we handle delegation errors in nfs4_proc_layoutget()
   - Layout stateids start out as being invalid
   - Add sparse lock annotations for pnfs_find_alloc_layout
   - Handle bad delegation stateids in nfs4_layoutget_handle_exception
   - Fix up O_DIRECT results
   - Fix potential use after free of state in nfs4_do_reclaim.
   - Mark the layout stateid invalid when all segments are removed
   - Don't let readdirplus revalidate an inode that was marked as stale
   - Fix potential race in nfs_fhget()
   - Fix an unused variable warning"

* tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Fix another OPEN_DOWNGRADE bug
  make nfs_atomic_open() call d_drop() on all ->open_context() errors.
  NFS: Fix an unused variable warning
  NFS: Fix potential race in nfs_fhget()
  NFS: Don't let readdirplus revalidate an inode that was marked as stale
  NFSv4.1/pnfs: Mark the layout stateid invalid when all segments are removed
  NFS: Fix a double page unlock
  pnfs_nfs: fix _cancel_empty_pagelist
  nfs4: Fix potential use after free of state in nfs4_do_reclaim.
  NFS: Fix up O_DIRECT results
  NFS/pnfs: handle bad delegation stateids in nfs4_layoutget_handle_exception
  NFSv4.1/pnfs: Add sparse lock annotations for pnfs_find_alloc_layout
  NFSv4.1/pnfs: Layout stateids start out as being invalid
  NFSv4.1/pnfs: Ensure we handle delegation errors in nfs4_proc_layoutget()
2016-06-29 15:30:26 -07:00
Linus Torvalds
89a82a9218 Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit
Pull audit fixes from Paul Moore:
 "Two small patches to fix audit problems in 4.7-rcX: the first fixes a
  potential kref leak, the second removes some header file noise.

  The first is an important bug fix that really should go in before 4.7
  is released, the second is not critical, but falls into the very-nice-
  to-have category so I'm including in the pull request.

  Both patches are straightforward, self-contained, and pass our
  testsuite without problem"

* 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit:
  audit: move audit_get_tty to reduce scope and kabi changes
  audit: move calcs after alloc and check when logging set loginuid
2016-06-29 15:18:47 -07:00
Linus Torvalds
32826ac41f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "I've been traveling so this accumulates more than week or so of bug
  fixing.  It perhaps looks a little worse than it really is.

   1) Fix deadlock in ath10k driver, from Ben Greear.

   2) Increase scan timeout in iwlwifi, from Luca Coelho.

   3) Unbreak STP by properly reinjecting STP packets back into the
      stack.  Regression fix from Ido Schimmel.

   4) Mediatek driver fixes (missing malloc failure checks, leaking of
      scratch memory, wrong indexing when mapping TX buffers, etc.) from
      John Crispin.

   5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic
      Sowa.

   6) Fix hashing of flows in UDP in the ruseport case, from Xuemin Su.

   7) Fix netlink notifications in ovs for tunnels, delete link messages
      are never emitted because of how the device registry state is
      handled.  From Nicolas Dichtel.

   8) Conntrack module leaks kmemcache on unload, from Florian Westphal.

   9) Prevent endless jump loops in nft rules, from Liping Zhang and
      Pablo Neira Ayuso.

  10) Not early enough spinlock initialization in mlx4, from Eric
      Dumazet.

  11) Bind refcount leak in act_ipt, from Cong WANG.

  12) Missing RCU locking in HTB scheduler, from Florian Westphal.

  13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU
      barrier, using heap for SG and IV, and erroneous use of async flag
      when allocating AEAD conext.)

  14) RCU handling fix in TIPC, from Ying Xue.

  15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in
      SIT driver, from Simon Horman.

  16) Socket timer deadlock fix in TIPC from Jon Paul Maloy.

  17) Fix potential deadlock in team enslave, from Ido Schimmel.

  18) Memory leak in KCM procfs handling, from Jiri Slaby.

  19) ESN generation fix in ipv4 ESP, from Herbert Xu.

  20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong
      WANG.

  21) Use after free in netem, from Eric Dumazet.

  22) Uninitialized last assert time in multicast router code, from Tom
      Goff.

  23) Skip raw sockets in sock_diag destruction broadcast, from Willem
      de Bruijn.

  24) Fix link status reporting in thunderx, from Sunil Goutham.

  25) Limit resegmentation of retransmit queue so that we do not
      retransmit too large GSO frames.  From Eric Dumazet.

  26) Delay bpf program release after grace period, from Daniel
      Borkmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits)
  openvswitch: fix conntrack netlink event delivery
  qed: Protect the doorbell BAR with the write barriers.
  neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
  e1000e: keep VLAN interfaces functional after rxvlan off
  cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
  qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
  bpf, perf: delay release of BPF prog after grace period
  net: bridge: fix vlan stats continue counter
  tcp: do not send too big packets at retransmit time
  ibmvnic: fix to use list_for_each_safe() when delete items
  net: thunderx: Fix TL4 configuration for secondary Qsets
  net: thunderx: Fix link status reporting
  net/mlx5e: Reorganize ethtool statistics
  net/mlx5e: Fix number of PFC counters reported to ethtool
  net/mlx5e: Prevent adding the same vxlan port
  net/mlx5e: Check for BlueFlame capability before allocating SQ uar
  net/mlx5e: Change enum to better reflect usage
  net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
  net/mlx5: Update command strings
  net: marvell: Add separate config ANEG function for Marvell 88E1111
  ...
2016-06-29 11:50:42 -07:00
Linus Torvalds
653c574a7d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Another two bug fixes for 4.7:

   - The revert of patch which removed boot information for systems
     using an intermediate boot kernel, e.g. the SLES12 grub setup.

   - A fix for an incorrect inline assembly constraint that causes
     broken code to be generated with gcc 4.8.5"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: fix test_fp_ctl inline assembly contraints
  Revert "s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL"
2016-06-29 11:48:05 -07:00
Linus Torvalds
00bf377d19 Pin control fixes for the v4.7 cycle:
- Driver fixes for i.MX, single register, Tegra and BayTrail.
 
 - MAINTAINERS entry for the documentation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXc3S0AAoJEEEQszewGV1zbZoP+wcffLAiH4YWO/iKPpZatSLK
 zAtFIYPvzHgITaF8VTyNKvNRiOv+11QECQAw8Eibn2hd28AUGrpPpNkeU6edCFdK
 myh/QSROM1VYTCDq76KW+zsRaCkZ9HRPj2HGz3vyqe8pStJgOYlCPY7hKTDdr3/h
 Umg01CBlzr74dMIQ51CgfiIAR7IMYBfoE2TU+/kR7C+HaLyRB7sUDSw+/HGkG+g/
 5++dwPMbk+Z7J+bDjnKlO+jNcuLrRERwoMGCd2mzu8rvmd7Yh0c0IamKbbiTfZS5
 QfT+fI6eVulcStlImAOZdLR79KB2JEdQH/1wSFupABCrucHBVxQiPSisKsa6qzw5
 cia6hW8IzLwbGfytYIX89lPYP43FLbUqfjIplCXMZaUIQcnikgmmyIi/VgZAN1AY
 Mi1SxXJ8EUtoQ/21nZpYdqIb4kRWiIqmjklSPDc/oiFnGGIuH+X5jnRka4Fws7Cz
 1ilGpccfoQL04lvaWpKQK2FnNXU9USPGXg8hV/RXuAjNddrvxa9PQU0NxrVVhR9S
 h1u2LVtjx4CELnPY9+hWUgxfXGEGbgv1V4xHOdofytBvfVhf+Qyzbg7Gn5+30s1p
 VgVliGHyA0/Up7u8E1HTSsxbioyJkdvZ1A+6ibESJnlnCL5/2oKFhMAbMTdD/QQ9
 LXmkKpXY+kWhEM/9z8MV
 =e4gh
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Here are a bunch of fixes for pin control.  Just drivers and a
  MAINTAINERS fixup:

   - Driver fixes for i.MX, single register, Tegra and BayTrail.

   - MAINTAINERS entry for the documentation"

* tag 'pinctrl-v4.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: baytrail: Fix mingled clock pins
  MAINTAINERS: belong Documentation/pinctrl.txt properly
  pinctrl: tegra: Fix build dependency
  gpio: tegra: Make lockdep class file-scoped
  pinctrl: single: Fix missing flush of posted write for a wakeirq
  pinctrl: imx: Do not treat a PIN without MUX register as an error
2016-06-29 10:05:44 -07:00
Linus Torvalds
52827f389b Merge branch 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Three fix patches.  Two are for cgroup / css init failure path.  The
  last one makes css_set_lock irq-safe as the deadline scheduler ends up
  calling put_css_set() from irq context"

* 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Disable IRQs while holding css_set_lock
  cgroup: set css->id to -1 during init
  cgroup: remove redundant cleanup in css_create
2016-06-29 10:04:42 -07:00
David S. Miller
751ad819b0 Just two small fixes
* fix mesh peer link counter, decrement wasn't always done at all
  * fix ethertype (length) for packets without RFC 1042 or bridge
    tunnel header
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJXc5wQAAoJEGt7eEactAAd2HQP/RS3lZaNn97spZp9Vb9Swqd6
 UjhT4vHL2wHPMhPals2M2ztRhL8df/55tOhSuCwUki0gTsAfP4+Jz1PZ4okFUSBn
 6EHvRtAgSAzjBElZVEsNO2sM/qm0UOnlqm4XpOXcKzpnRzDSlIa6vpnctp2MJMBI
 PB46x8BPQoi192aTzleMqgBwjrMX1ohsM6rgOIEf7rvLhftx6LRRm7/uKaiwenVT
 ZxkP6jIOs5ofkUHLHS4OPyId4yReO8oMMbRRA4fW1z2T8hME9PH3jTZL3Znaj7VF
 s2LLSqwUlv06k0WFu3SzJh750jCJ6cnJiIqggDlPChrb0/DPicqgWR7/tJAvNVfj
 WJqugeNtEVI+4ElGNxFzFAvjAKqO8fHY5U8Ko4LtZuwhjpo5GDvA+FErFESRXvCG
 c647cDMhn+6OXwpUwsggQK4c+AS4QX/PzHjmhW5tKORPXlYVXAAz+JDXn195WnzV
 w4UAO4ZWai9XY6oSc63uudaHMt8Xkmq8PRQsH1hHG20CAJ+7bGugaWVTci5fzeNh
 JC4X+Xh6t+5qq1jEWgr+KguVKXtP3pVetqGy9kYQrJov0fTe2Z6deEeY08sH1Ffz
 xdnQUsZphvqSkosTw76JtNPXz2ajxZCD3KNWjcpkL/NNs9lnIZPZc/mnzKP7ltro
 ZEYw0HbcXVt/uFhHlwdG
 =dAbs
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2016-06-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just two small fixes
 * fix mesh peer link counter, decrement wasn't always done at all
 * fix ethertype (length) for packets without RFC 1042 or bridge
   tunnel header
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 08:33:46 -04:00
Samuel Gauthier
d913d3a763 openvswitch: fix conntrack netlink event delivery
Only the first and last netlink message for a particular conntrack are
actually sent. The first message is sent through nf_conntrack_confirm when
the conntrack is committed. The last one is sent when the conntrack is
destroyed on timeout. The other conntrack state change messages are not
advertised.

When the conntrack subsystem is used from netfilter, nf_conntrack_confirm
is called for each packet, from the postrouting hook, which in turn calls
nf_ct_deliver_cached_events to send the state change netlink messages.

This commit fixes the problem by calling nf_ct_deliver_cached_events in the
non-commit case as well.

Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
CC: Joe Stringer <joestringer@nicira.com>
CC: Justin Pettit <jpettit@nicira.com>
CC: Andy Zhou <azhou@nicira.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 08:13:59 -04:00
Sudarsana Reddy Kalluru
34c7bb4705 qed: Protect the doorbell BAR with the write barriers.
SPQ doorbell is currently protected with the compilation barrier. Under the
stress scenarios, we may get into a state where (due to the weak ordering)
several ramrod doorbells were written to the BAR with an out-of-order
producer values. Need to change the barrier type to a write barrier to make
sure that the write buffer is flushed after each doorbell.

Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 08:12:45 -04:00
David Barroso
b560f03ddf neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
neigh_xmit() expects to be called inside an RCU-bh read side critical
section, and while one of its two current callers gets this right, the
other one doesn't.

More specifically, neigh_xmit() has two callers, mpls_forward() and
mpls_output(), and while both callers call neigh_xmit() under
rcu_read_lock(), this provides sufficient protection for neigh_xmit()
only in the case of mpls_forward(), as that is always called from
softirq context and therefore doesn't need explicit BH protection,
while mpls_output() can be called from process context with softirqs
enabled.

When mpls_output() is called from process context, with softirqs
enabled, we can be preempted by a softirq at any time, and RCU-bh
considers the completion of a softirq as signaling the end of any
pending read-side critical sections, so if we do get a softirq
while we are in the part of neigh_xmit() that expects to be run inside
an RCU-bh read side critical section, we can end up with an unexpected
RCU grace period running right in the middle of that critical section,
making things go boom.

This patch fixes this impedance mismatch in the callee, by making
neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
expects to be treated as an RCU-bh read side critical section, as this
seems a safer option than fixing it in the callers.

Fixes: 4fd3d7d9e8 ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso <dbarroso@fastly.com>
Signed-off-by: Lennert Buytenhek <lbuytenhek@fastly.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 07:58:28 -04:00
Jarod Wilson
889ad45666 e1000e: keep VLAN interfaces functional after rxvlan off
I've got a bug report about an e1000e interface, where a VLAN interface is
set up on top of it:

$ ip link add link ens1f0 name ens1f0.99 type vlan id 99
$ ip link set ens1f0 up
$ ip link set ens1f0.99 up
$ ip addr add 192.168.99.92 dev ens1f0.99

At this point, I can ping another host on vlan 99, ip 192.168.99.91.
However, if I do the following:

$ ethtool -K ens1f0 rxvlan off

Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on
again. I'm not sure if this is actually intended behavior, or if there's a
lack of software VLAN stripping fallback, or what, but things continue to
work if I simply don't call e1000e_vlan_strip_disable() if there are
active VLANs (plagiarizing a function from the e1000 driver here) on the
interface.

Also slipped a related-ish fix to the kerneldoc text for
e1000e_vlan_strip_disable here...

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 07:39:48 -04:00
Felix Fietkau
c041778c96 cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header
The PDU length of incoming LLC frames is set to the total skb payload size
in __ieee80211_data_to_8023() of net/wireless/util.c which incorrectly
includes the length of the IEEE 802.11 header.

The resulting LLC frame header has a too large PDU length, causing the
llc_fixup_skb() function of net/llc/llc_input.c to reject the incoming
skb, effectively breaking STP.

Solve the problem by properly substracting the IEEE 802.11 frame header size
from the PDU length, allowing the LLC processor to pick up the incoming
control messages.

Special thanks to Gerry Rozema for tracking down the regression and proposing
a suitable patch.

Fixes: 2d1c304cb2 ("cfg80211: add function for 802.3 conversion with separate output buffer")
Cc: stable@vger.kernel.org
Reported-by: Gerry Rozema <gerryr@rozeware.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-06-29 11:50:33 +02:00
Dan Carpenter
5b4d10f5e0 qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag()
There is a static checker warning here "warn: mask and shift to zero"
and the code sets "ring" to zero every time.  From looking at how
QLCNIC_FETCH_RING_ID() is used in qlcnic_83xx_process_rcv_ring() the
qlcnic_83xx_hndl() should be removed.

Fixes: 4be41e92f7 ('qlcnic: 83xx data path routines')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 05:46:16 -04:00
Daniel Borkmann
ceb5607035 bpf, perf: delay release of BPF prog after grace period
Commit dead9f29dd ("perf: Fix race in BPF program unregister") moved
destruction of BPF program from free_event_rcu() callback to __free_event(),
which is problematic if used with tail calls: if prog A is attached as
trace event directly, but at the same time present in a tail call map used
by another trace event program elsewhere, then we need to delay destruction
via RCU grace period since it can still be in use by the program doing the
tail call (the prog first needs to be dropped from the tail call map, then
trace event with prog A attached destroyed, so we get immediate destruction).

Fixes: dead9f29dd ("perf: Fix race in BPF program unregister")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Jann Horn <jann@thejh.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 05:42:55 -04:00
Nikolay Aleksandrov
565ce8f32a net: bridge: fix vlan stats continue counter
I made a dumb off-by-one mistake when I added the vlan stats counter
dumping code. The increment should happen before the check, not after
otherwise we miss one entry when we continue dumping.

Fixes: a60c090361 ("bridge: netlink: export per-vlan stats")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 05:33:35 -04:00
Eric Dumazet
a3d2e9f8eb tcp: do not send too big packets at retransmit time
Arjun reported a bug in TCP stack and bisected it to a recent commit.

In case where we process SACK, we can coalesce multiple skbs
into fat ones (tcp_shift_skb_data()), to lower write queue
overhead, because we do not expect to retransmit these packets.

However, SACK reneging can happen, forcing the sender to retransmit
all these packets. If skb->len is above 64KB, we then send buggy
IP packets that could hang TSO engine on cxgb4.

Neal suggested to use tcp_tso_autosize() instead of tp->gso_segs
so that we cook packets of optimal size vs TCP/pacing.

Thanks to Arjun for reporting the bug and running the tests !

Fixes: 10d3be5692 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Arjun V <arjun@chelsio.com>
Tested-by: Arjun V <arjun@chelsio.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 05:25:11 -04:00
Wei Yongjun
96183182ad ibmvnic: fix to use list_for_each_safe() when delete items
Since we will remove items off the list using list_del() we need
to use a safe version of the list_for_each() macro aptly named
list_for_each_safe().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 05:23:42 -04:00
David S. Miller
b2c1b30e3e Merge branch 'thunderx-fixes'
Sunil Goutham says:

====================
net: thunderx: Miscellaneous fixes

This 2 patch series fixes issues w.r.t physical link status
reporting and transmit datapath configuration for
secondary qsets.

Changes from v1:
Fixed lmac disable sequence for interfaces of type SGMII.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 05:14:19 -04:00
Sunil Goutham
3e29adba56 net: thunderx: Fix TL4 configuration for secondary Qsets
TL4 calculation for a given SQ of secondary Qsets is incorrect
and goes out of bounds and also for some SQ's TL4 chosen will
transmit data via a different BGX interface and not same as
primary Qset's interface.

This patch fixes this issue.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 05:14:13 -04:00
Sunil Goutham
3f4c68cfde net: thunderx: Fix link status reporting
Check for SMU RX local/remote faults along with SPU LINK
status. Otherwise at times link is UP at our end but DOWN
at link partner's side. Also due to an issue in BGX it's
rarely seen that initialization doesn't happen properly
and SMU RX reports faults with everything fine at SPU.
This patch tries to reinitialize LMAC to fix it.

Also fixed LMAC disable sequence to properly bring down link.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Tao Wang <tao.wang@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 05:14:13 -04:00
David S. Miller
f5074d0ce2 Merge branch 'mlx5-100G-fixes'
Saeed Mahameed says:

====================
Mellanox 100G mlx5 fixes#2 for 4.7-rc

The following series provides one-liners fixes for mlx5 driver plus one
medium patch to reorganize ethtool counters reporting.

Highlights:
	- Added MODIFY_FLOW_TABLE to command strings table
	- Add ConnectX-5 PCIe 4.0 to list of supported devices
	- Rename ASYNC_EVENTS enum
	- Enable BlueFlame only when supported by device
	- Avoid adding same vxlan port twice
	- Report the correct number of PFC counters
	- Reorganize ethtool reported counters and remove duplications
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:28:59 -04:00
Gal Pressman
bfe6d8d1d4 net/mlx5e: Reorganize ethtool statistics
Categorize and reorganize ethtool statistics counters by renaming to
"rx_*" and "tx_*" and removing redundant and duplicated counters, this
way they are easier to grasp and more user friendly.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:28:47 -04:00
Gal Pressman
ed80ec4c17 net/mlx5e: Fix number of PFC counters reported to ethtool
Number of PFC counters used to count only number of priorities with PFC
enabled, but each priority has more than one counter, hence the need to
multiply it by the number of PFC counters per priority.

Fixes: cf678570d5 ('net/mlx5e: Add per priority group to PPort counters')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:28:46 -04:00
Matthew Finlay
9ceec359e4 net/mlx5e: Prevent adding the same vxlan port
Do not allow the same vxlan udp port to be added to the device more than
once.

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:28:46 -04:00
Gal Pressman
fd4782c213 net/mlx5e: Check for BlueFlame capability before allocating SQ uar
Previous to this patch mapping was always set to write combining without
checking whether BlueFlame is supported in the device.

Fixes: 0ba422410b ('net/mlx5: Fix global UAR mapping')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:28:46 -04:00
Eli Cohen
e0f46eb9f6 net/mlx5e: Change enum to better reflect usage
Change MLX5E_STATE_ASYNC_EVENTS_ENABLE to
MLX5E_STATE_ASYNC_EVENTS_ENABLED since it represent a state and not an
operation.

Fixes: acff797cd1 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:28:46 -04:00
Majd Dibbiny
7092fe8669 net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices
Add the upcoming ConnectX-5 PCIe 4.0 device to the list of
supported devices by the mlx5 driver.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:28:46 -04:00
Eli Cohen
5be1ea899d net/mlx5: Update command strings
Add command string for MODIFY_FLOW_TABLE which is used by the driver.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:28:46 -04:00
Harini Katakam
3ec0a0f10c net: marvell: Add separate config ANEG function for Marvell 88E1111
Marvell 88E1111 currently uses the generic marvell config ANEG function.
This function has a sequence accessing Page 5 and Register 31,
both of which are not defined or reserved for this PHY.
Hence this patch adds a new config ANEG function for Marvell 88E1111
without these erroneous accesses.

Signed-off-by: Harini Katakam <harinik@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:07:36 -04:00
David S. Miller
d10f0b312a Merge branch 'batman-adv-fixes'
Sven Eckelmann says:

====================
batman-adv: Fixes for Linux 4.7

Antonio currently seems to be occupied. This is currently rather unfortunate
because there are patches waiting in the batman-adv development repository
maint(enance) branch [1] since up to 6 weeks. I am now getting asked when
these patches will hit the distribution kernels and therefore decided to
submit these patches directly to netdev.

The patch from Simon works around the problem that warnings could be triggered
in the translation table code via packets using a VLAN not configured on the
target host. This warning was replaced with a rate limited info message.

Ben Hutchings found an superfluous batadv_softif_vlan_put in the error
handling code of the translation table while he backported the "batman-adv:
Fix reference counting of vlan object for tt_local_entry" patch to the stable
kernels. He noticed correctly that this batadv_softif_vlan_put should also
have been removed by the said patch.

The most requested fix at the moment is related to a double free in the
translation table code. It is a race condition which mostly happens on systems
with multiple cores and multiple network interface attached to batman-adv. Two
Freifunk communities which were haunted by weird crashes (with backtraces
reporting problems in other parts of the kernel) were kind enough to test this
patch. They reported that there systems are now running stable after applying
this patch.

An invalid memory access was detected in the batadv_icmp_packet_rr handling
code when receiving a skbuff with fragments. The last patch is fixing a memory
leak when the interface is removed via .dellink. The code to fix it was copied
from the code handling the legacy sysfs interface to remove netdevices from a
batman-adv netdevice.

There are still 28 patches in the development tree for v4.8 but I will leave
them to Antonio because these are cleanups and features and therefore for net-
next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:01:53 -04:00
Sven Eckelmann
420cb1b764 batman-adv: Clean up untagged vlan when destroying via rtnl-link
The untagged vlan object is only destroyed when the interface is removed
via the legacy sysfs interface. But it also has to be destroyed when the
standard rtnl-link interface is used.

Fixes: 5d2c05b213 ("batman-adv: add per VLAN interface attribute framework")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:01:48 -04:00
Sven Eckelmann
3b55e44220 batman-adv: Fix ICMP RR ethernet access after skb_linearize
The skb_linearize may reallocate the skb. This makes the calculated pointer
for ethhdr invalid. But it the pointer is used later to fill in the RR
field of the batadv_icmp_packet_rr packet.

Instead re-evaluate eth_hdr after the skb_linearize+skb_cow to fix the
pointer and avoid the invalid read.

Fixes: da6b8c20a5 ("batman-adv: generalize batman-adv icmp packet handling")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 04:01:48 -04:00