Commit Graph

520225 Commits

Author SHA1 Message Date
David S. Miller
6b9107d6a1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-04-14

This series contains updates to i40e and i40evf.

Mitch provides a fix for i40e, where VFs were gone and the associated
VSI's had been removed and the rings were not stopped, which in some
circumstances cased memory corruption or DMAR errors.  So stop all the
rings associated with each VF before releasing its resources.  Also
cleaned up a poorly indented piece of code.  Fixes VF link state, where
VF devices were assuming link is up unless told otherwise, which means
that VFs instantiated on a PF with no link, would report the wrong state.

Anjali adds support to add Flow director Sideband rules for a VF from it's
PF.  Fixes a recently discovered hardware issue, where after a VFLR
hardware might be indicating to us a reset completion little too early, so
wait another 10 msec for cache to be cleaned up.

Jesse enables the user to dump the internal hardware state for better
debugging by allowing a bash script to acquire information about the
internal hardware state.  The data output to the kernel log is collected
by the script and can then be sent to Intel.  Also fixed a possible
failure path to allocate memory that was found by smatch.  Cleaned up
unused local variables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-15 17:29:00 -04:00
Eric Dumazet
074975d037 bnx2x: Fix busy_poll vs netpoll
Commit 9a2620c877 ("bnx2x: prevent WARN during driver unload")
switched the napi/busy_lock locking mechanism from spin_lock() into
spin_lock_bh(), breaking inter-operability with netconsole, as netpoll
disables interrupts prior to calling our napi mechanism.

This switches the driver into using atomic assignments instead of the
spinlock mechanisms previously employed.

Based on initial patch from Yuval Mintz & Ariel Elior

I basically added softirq starvation avoidance, and mixture
of atomic operations, plain writes and barriers.

Note this slightly reduces the overhead for this driver when no
busy_poll sockets are in use.

Fixes: 9a2620c877 ("bnx2x: prevent WARN during driver unload")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-15 17:25:02 -04:00
Linus Torvalds
6d50ff91d9 File locking related changes for v4.1 (pile #1)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVLoesAAoJEAAOaEEZVoIVJVoP/jVPt1TbEdA1wP/YO6dcrhtL
 wjaWjDLJDUZ6zhHYAYIZ0Psko1LOkZzIa31bqlGGdjBbwHAKLqipgwoIPsOmD0YZ
 Ja4apWH7zRSWdzmYNkvigMpHmYpFCLuU2y7+BzWLH/MlohdN7zWkI4RZNpVddE35
 m/MmIVlp8IPMbrPMR07/6mkKpqLY6FMxMEhEQvYfGzBiUyF6sJN9IdN7/4S4LK/7
 W6rN6ZSKiioAp4vOSaPkhOMoM+H2uiBemKl+ZnznlOkUwL9aw+R8O8+481VgwqQ5
 beDdZzOIW4UD/vA98AhXIP74z/zHLezVWDL5GNkMro7SZuUUJo0PzJDsuWbzNjYi
 nzMkbJp2SGmIKAUWqyKQZXOWft3KJ0x8eqPtd+kgYniQvjCPH7o1De8xRhQ5HNwf
 URJLkfBpJujjEewZ/gKldgcd6IJ6e5qhtg1L4ShNaV0jL6wtbDF4JLGOZkKb1wDB
 +QJ7lvsrqDo7EeEM06BzYcDF+Opp0IeXSR7RQzv5uYP2HZYR7angccuGVPJJqbbg
 IGSEhTGhokaiprvuE60SMPNBKXDW+gdhbN9z4XYfIL1brcV6dQzbm5RUR1duzSF1
 VYZxXECnDrDHXBqMUaF4su5E55Ivni4zK4B3j2WqYu72LMcjngr7cPqIL8U5KYTn
 nVC5Cz6nVZ0l7JoQ63p4
 =MPZM
 -----END PGP SIGNATURE-----

Merge tag 'locks-v4.1-1' of git://git.samba.org/jlayton/linux

Pull file locking related changes from Jeff Layton:
 "This set is mostly minor cleanups to the overhaul that went in last
  cycle.  The other noticeable items are the changes to the lm_get_owner
  and lm_put_owner prototypes, and the fact that we no longer need to
  use the i_lock to protect the i_flctx pointer"

* tag 'locks-v4.1-1' of git://git.samba.org/jlayton/linux:
  locks: use cmpxchg to assign i_flctx pointer
  locks: get rid of WE_CAN_BREAK_LSLK_NOW dead code
  locks: change lm_get_owner and lm_put_owner prototypes
  locks: don't allocate a lock context for an F_UNLCK request
  locks: Add lockdep assertion for blocked_lock_lock
  locks: remove extraneous IS_POSIX and IS_FLOCK tests
  locks: Remove unnecessary IS_POSIX test
2015-04-15 14:22:45 -07:00
Thomas Gleixner
48b63776c8 net: hip04: Make tx coalesce timer actually work
The code sets the expiry value of the timer to a relative value and
starts it with hrtimer_start_expires. That's fine, but that only works
once. The timer is started in relative mode, so the expiry value gets
overwritten with the absolut expiry time (now + expiry).

So once the timer expired, a new call to hrtimer_start_expires results
in an immidiately expired timer, because the expiry value is
already in the past.

Use the proper mechanisms to (re)start the timer in the intended way.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: dingtianhong <dingtianhong@huawei.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Zhangfei Gao <zhangfei.gao@linaro.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: netdev@vger.kernel.org
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-15 17:21:29 -04:00
Linus Torvalds
eccd02f32a crypto: fix mis-merge with the networking merge
The networking updates from David Miller removed the iocb argument from
sendmsg and recvmsg (in commit 1b78414047: "net: Remove iocb argument
from sendmsg and recvmsg"), but the crypto code had added new instances
of them.

When I pulled the crypto update, it was a silent semantic mis-merge, and
I overlooked the new warning messages in my test-build.  I try to fix
those in the merge itself, but that relies on me noticing. Oh well.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15 14:09:46 -07:00
Brian Bian
5fa0fa4b01 powercap / RAPL: Add support for Intel Skylake processors
Signed-off-by: Brian Bian <brian.bian@intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-15 23:06:16 +02:00
Borislav Petkov
64df1fdfcc cpufreq: intel_pstate: Fix an annoying !CONFIG_SMP warning
I keep seeing

  drivers/cpufreq/intel_pstate.c: In function ‘intel_pstate_init’:
  drivers/cpufreq/intel_pstate.c:1187:26: warning: initialization from incompatible pointer type
    struct cpuinfo_x86 *c = &boot_cpu_data;

when doing randconfig builds.

This is caused by the fact that when !CONFIG_SMP, asm/processor.h
defines cpu_info to boot_cpu_data and the local variable

  struct cpu_defaults *cpu_info

overshadows it leading to this unfortunate assignment in the
preprocessed source:

 struct cpu_defaults *boot_cpu_data;
 struct cpuinfo_x86 *c = &boot_cpu_data;

Rename the local variable and use static_cpu_has_safe() which alleviates
the need for defining a local cpuinfo_x86 pointer.

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-15 23:02:24 +02:00
Linus Torvalds
fa2e5c073a Merge branch 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc
Pull exec domain removal from Richard Weinberger:
 "This series removes execution domain support from Linux.

  The idea behind exec domains was to support different ABIs.  The
  feature was never complete nor stable.  Let's rip it out and make the
  kernel signal handling code less complicated"

* 'exec_domain_rip_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc: (27 commits)
  arm64: Removed unused variable
  sparc: Fix execution domain removal
  Remove rest of exec domains.
  arch: Remove exec_domain from remaining archs
  arc: Remove signal translation and exec_domain
  xtensa: Remove signal translation and exec_domain
  xtensa: Autogenerate offsets in struct thread_info
  x86: Remove signal translation and exec_domain
  unicore32: Remove signal translation and exec_domain
  um: Remove signal translation and exec_domain
  tile: Remove signal translation and exec_domain
  sparc: Remove signal translation and exec_domain
  sh: Remove signal translation and exec_domain
  s390: Remove signal translation and exec_domain
  mn10300: Remove signal translation and exec_domain
  microblaze: Remove signal translation and exec_domain
  m68k: Remove signal translation and exec_domain
  m32r: Remove signal translation and exec_domain
  m32r: Autogenerate offsets in struct thread_info
  frv: Remove signal translation and exec_domain
  ...
2015-04-15 13:53:55 -07:00
Linus Torvalds
e44740c1a9 Merge tag 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
 - hostfs saw a face lifting
 - old/broken stuff was removed (SMP, HIGHMEM, SKAS3/4)
 - random cleanups and bug fixes

* tag 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (26 commits)
  um: Print minimum physical memory requirement
  um: Move uml_postsetup in the init_thread stack
  um: add a kmsg_dumper
  x86, UML: fix integer overflow in ELF_ET_DYN_BASE
  um: hostfs: Reduce number of syscalls in readdir
  um: Remove broken highmem support
  um: Remove broken SMP support
  um: Remove SKAS3/4 support
  um: Remove ppc cruft
  um: Remove ia64 cruft
  um: Remove dead code from stacktrace
  hostfs: No need to box and later unbox the file mode
  hostfs: Use page_offset()
  hostfs: Set page flags in hostfs_readpage() correctly
  hostfs: Remove superfluous initializations in hostfs_open()
  hostfs: hostfs_open: Reset open flags upon each retry
  hostfs: Remove superfluous test in hostfs_open()
  hostfs: Report append flag in ->show_options()
  hostfs: Use __getname() in follow_link
  hostfs: Remove open coded strcpy()
  ...
2015-04-15 13:49:27 -07:00
Linus Torvalds
d613896926 This pull request includes the following UBI/UBIFS changes:
* Powercut emulation for UBI
 * A huge update to UBI Fastmap
 * Cleanups and bugfixes all over UBI and UBIFS
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJVLsBfAAoJEEtJtSqsAOnWyaYP/30jnSEu4y2agDPwdev0wnqF
 QRAAVWxLCzinBkElXJ0POvhdQwnxP37ACqa/mNVuTa5++N1eI8PPpcRu+EUrVmoP
 mSRmEBdarvNYRLmOaoBBwY4qFvmcZx9Odt2BIdXz9xV+Mk220VdRW2PBn8JfGDn4
 JJziWKmq9HUKXMQY9SVkOfT5tW+xHdjRUuNwp2S5nuV4hz2rUv0ChlYqCBqvca8/
 tfjtlCLBGGuFSXfa9wtdCTWlU2Hyh4sG9u72c23iGqHmwOc9ZIpBlgiz+nYZFXoo
 GriBDBXG5ib1p+Px+oecWPBLv/Ipgc887IJHFAz3exPmSkQQTnuCSYFsraS1zqwZ
 91kOGjcF5wY9EwBbT0a1VDdOwYT9MC5gD7G71MarDt0BWEYDvPufd6XPc5qPfP/9
 xtscNQZ2FNd4gWGRT5irBOqIERdsWt2cjgD0aK2gbV1p9meUBU6hlvMIOU1YmBKg
 TwK8o+RwhE13JWwSExqcW+FRz9Aup/W/EDRNuuUGOXFFSnSvqSwE/y3JeP7jydIf
 7J3rvrKdRWObTO/uZvtx+mwU3FgNPoITCgQQtAz7jexjksLWMxVi1gF1VYULScW/
 p5ewCFsuRQSeCL/M1cKO1trYARbe9YITewbrQKuL8wknQKGcXY0XiV0OpPlH0xXD
 LqnvNoeixq4AH7rkMQIk
 =nzVw
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.1-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:
 "This pull request includes the following UBI/UBIFS changes:

   - powercut emulation for UBI
   - a huge update to UBI Fastmap
   - cleanups and bugfixes all over UBI and UBIFS"

* tag 'upstream-4.1-rc1' of git://git.infradead.org/linux-ubifs: (50 commits)
  UBI: power cut emulation for testing
  UBIFS: fix output format of INUM_WATERMARK
  UBI: Fastmap: Fall back to scanning mode after ECC error
  UBI: Fastmap: Remove is_fm_block()
  UBI: Fastmap: Add blank line after declarations
  UBI: Fastmap: Remove else after return.
  UBI: Fastmap: Introduce may_reserve_for_fm()
  UBI: Fastmap: Introduce ubi_fastmap_init()
  UBI: Fastmap: Wire up WL accessor functions
  UBI: Add accessor functions for WL data structures
  UBI: Move fastmap specific functions out of wl.c
  UBI: Fastmap: Add new module parameter fm_debug
  UBI: Fastmap: Make self_check_eba() depend on fastmap self checking
  UBI: Fastmap: Add self check to detect absent PEBs
  UBI: Fix stale pointers in ubi->lookuptbl
  UBI: Fastmap: Enhance fastmap checking
  UBI: Add initial support for fastmap self checks
  UBI: Fastmap: Rework fastmap error paths
  UBI: Fastmap: Prepare for variable sized fastmaps
  UBI: Fastmap: Locking updates
  ...
2015-04-15 13:43:40 -07:00
Kristen Carlson Accardi
6a82ba6d4f intel_pstate: Change the setpoint for Atom params
Change the setpoint for the Baytrail and Cherrytrail CPUs.  This
will cause more aggressive pstate selection and improves
performance on a variety of workloads with little power penalty.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-04-15 22:42:32 +02:00
Doug Ledford
c1c2fef6cf Merge branches 'cve-fixup', 'ipoib', 'iser', 'misc-4.1', 'or-mlx4' and 'srp' into for-4.1 2015-04-15 16:24:49 -04:00
Scott Wood
50c6a665b3 powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
Commit dc6c9a35b6 ("mm: account pmd page tables to the process")
added a counter that is incremented whenever a PMD is allocated and
decremented whenever a PMD is freed.  For hugepages on PPC, common code
is used to allocated PMDs, but arch-specific code is used to free PMDs.

This results in kernel output such as "BUG: non-zero nr_pmds on freeing
mm: 1" when using hugepages.

Update the PPC hugepage PMD freeing code to decrement the count, just
as the above commit did for free_pmd_range().

Fixes: dc6c9a35b6 ("mm: account pmd page tables to the process")
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 4.0.x
2015-04-15 15:24:22 -05:00
Linus Torvalds
fa927894bb Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs update from Al Viro:
 "Now that net-next went in...  Here's the next big chunk - killing
  ->aio_read() and ->aio_write().

  There'll be one more pile today (direct_IO changes and
  generic_write_checks() cleanups/fixes), but I'd prefer to keep that
  one separate"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  ->aio_read and ->aio_write removed
  pcm: another weird API abuse
  infinibad: weird APIs switched to ->write_iter()
  kill do_sync_read/do_sync_write
  fuse: use iov_iter_get_pages() for non-splice path
  fuse: switch to ->read_iter/->write_iter
  switch drivers/char/mem.c to ->read_iter/->write_iter
  make new_sync_{read,write}() static
  coredump: accept any write method
  switch /dev/loop to vfs_iter_write()
  serial2002: switch to __vfs_read/__vfs_write
  ashmem: use __vfs_read()
  export __vfs_read()
  autofs: switch to __vfs_write()
  new helper: __vfs_write()
  switch hugetlbfs to ->read_iter()
  coda: switch to ->read_iter/->write_iter
  ncpfs: switch to ->read_iter/->write_iter
  net/9p: remove (now-)unused helpers
  p9_client_attach(): set fid->uid correctly
  ...
2015-04-15 13:22:56 -07:00
Honggang LI
59d2d18cc4 mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures
If CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for x86 systems and physical
memory is more than 4GB, dma_map_page may return a valid memory
address which greater than 0xffffffff. As a result, the mlx5 device page
allocator RB tree will be initialized with valid addresses greater than
0xfffffff.

However, (addr & PAGE_MASK) set the high four bytes to zeros. So, it's
impossible for the function, free_4k, to release the pages whose
addresses greater than 4GB. Memory leaks. And mlx5_ib module can't
release the pages when user try to remove the module, as a result,
system hang.

[root@rdma05 root]# dmesg  | grep addr | head
addr             = 3fe384000
addr & PAGE_MASK =  fe384000
[root@rdma05 root]# rmmod mlx5_ib   <---- hang on

---------------------- cosnole log -----------------
mlx5_ib 0000:04:00.0: irq 138 for MSI/MSI-X
  alloc irq_desc for 139 on node -1
  alloc kstat_irqs on node -1
mlx5_ib 0000:04:00.0: irq 139 for MSI/MSI-X
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
---------------------- cosnole log -----------------

Fixes: bf0bf77f65 ('mlx5: Support communicating arbitrary host page size to firmware')
Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:20:51 -04:00
Sagi Grimberg
ba943fb237 IB/iser: Rewrite bounce buffer code path
In some rare cases, IO operations may be not aligned to page
boundaries. This prevents iser from performing fast memory
registration. In order to overcome that iser uses a bounce
buffer to carry the transaction. We basically allocate a buffer
in the size of the transaction and perform a copy.

The buffer allocation using kmalloc is too restrictive since it
requires higher order (atomic) allocations for large transactions
(which may result in memory exhaustion fairly fast for some workloads).
We rewrite the bounce buffer code path to allocate scattered pages
and perform a copy between the transaction sg and the bounce sg.

Reported-by: Alex Lyakas <alex@zadarastorage.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:14 -04:00
Sagi Grimberg
4fcd1470a0 IB/iser: Bump version to 1.6
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg
ad1e567242 IB/iser: Remove code duplication for a single DMA entry
In singleton scatterlists, DMA memory registration code
is taken both for Fastreg and FMR code paths. Move it to
a function.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg
6ef8bb837d IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr
Instead of passing ib_sge as output variable, we pass the mem_reg
pointer to have the routines fill the rkey as well. This reduces
code duplication and extra assignments. This is a preparation step
to unify some registration logics together. Also, pass iser_fast_reg_mr
the fastreg descriptor directly.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg
90a6684c30 IB/iser: Modify struct iser_mem_reg members
No need to keep lkey, va, len variables, we can keep
them as struct ib_sge. This will help when we change the
memory registration logic.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg
8b95aa2c1b IB/iser: Make fastreg pool cache friendly
Memory regions are resources that are saved
in the device caches. Increase the probability for
a cache hit by adding the MRU descriptor to pool
head.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg
4dec2a27e3 IB/iser: Move PI context alloc/free to routines
Make iser_[create|destroy]_fastreg_desc shorter, more
readable and easily extendable.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg
bd8b944eee IB/iser: Move fastreg descriptor pool get/put to helper functions
Instead of open-coding connection fastreg pool get/put,
we introduce iser_reg_desc[get|put] helpers.

We aren't setting these static as this will be a per-device
routine later on. Also, cleanup iser_unreg_rdma_mem_fastreg
a bit.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg
f0e35c27a5 IB/iser: Merge build page-vec into register page-vec
No need for these two separate. Keep it in a single routine
like in the fastreg case. This will also make iser_reg_page_vec
closer to iser_fast_reg_mr arguments. This is a preparation
step for registration flow refactor.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg
b130ededff IB/iser: Get rid of struct iser_rdma_regd
This struct members other than struct iser_mem_reg are unused,
so remove it altogether.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg
6847fdeb0b IB/iser: Remove redundant assignments in iser_reg_page_vec
Buffer length was assigned twice, and no reason to set va to
io_addr and then add the offset, just set va to io_addr + offset.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:13 -04:00
Sagi Grimberg
d03e61d036 IB/iser: Move memory reg/dereg routines to iser_memory.c
As memory registration/de-registration methods, lets
move them to their natural location. While we're at it,
make iser_reg_page_vec routine static.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg
5640832590 IB/iser: Don't pass ib_device to fall_to_bounce_buff routine
No need to pass that, we can take it from the task.
In a later stage, this function will be invoked
according to a device capability.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg
e3784bd1d9 IB/iser: Remove a redundant struct iser_data_buf
No need to keep two iser_data_buf structures just in case we use
mem copy. We can avoid that just by adding a pointer to the original
sg. So keep only two iser_data_buf per command (data and protection)
and pass the relevant data_buf to bounce buffer routine.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg
ecc3993a2a IB/iser: Remove redundant cmd_data_len calculation
This code was added before we had protection data length
calculation (in iser_send_command), so we needed to calc
the sg data length from the sg itself. This is not needed
anymore.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg
a065fe6aa2 IB/iser: Fix wrong calculation of protection buffer length
This length miss-calculation may cause a silent data corruption
in the DIX case and cause the device to reference unmapped area.

Fixes: d77e65350f ('libiscsi, iser: Adjust data_length to include protection information')
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg
30bf1d58ae IB/iser: Handle fastreg/local_inv completion errors
Fast registration and local invalidate work requests can
also fail. We should call error completion handler for them.

Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Sagi Grimberg
c4de4663e0 IB/iser: Fix unload during ep_poll wrong dereference
In case the user unloaded ib_iser while ep_connect is in
progress, we need to destroy the endpoint although ep_disconnect
wasn't invoked (we detect this by the iser conn state != DOWN).
However, if we got an REJECTED/UNREACHABLE CM event we move the
connection state to DOWN which will prevent us from destroying
the endpoint in the module unload stage. Fix this by setting the
connection state to TERMINATING in iser_conn_error so we can still
destroy the endpoint at unload stage.

Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:07:12 -04:00
Doug Ledford
9f5d32af09 ib_srpt: convert printk's to pr_* functions
The driver already defined the pr_format, it just hadn't
been converted to use pr_info, pr_warn, and pr_err instead
of the equivalent printks.  Convert so that messages from
the driver are now properly tagged with their driver name
and can be more easily debugged.

In addition, a number of these printk's were not newline
terminated, so fix that at the same time.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:54 -04:00
Bart Van Assche
56b5390caf IB/srp: Use P_Key cache for P_Key lookups
This change slightly reduces the time needed to log in.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: David Dillow <dave@thedillows.org>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:54 -04:00
Sebastian Ott
cc47d369b5 infiniband/mlx4: check for mapping error
Since ib_dma_map_single can fail use ib_dma_mapping_error to check
for errors.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:39 -04:00
Selvin Xavier
d2928a8c0d MAINTAINERS: Adding list of maintainers for ocrdma
Updating the MAINTAINERS file with ocrdma maintainers and their email ids

Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:39 -04:00
Stephen Hemminger
b962dc0a48 rdma: replace deprecated ifconfig in doc
The ifconfig command has been deprecated for many years.
To encourage new users not to continue using it and learning
iproute2; the ifconfig should not be used in examples.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:39 -04:00
Sébastien Dugué
a233c4b54c ib_uverbs: Fix pages leak when using XRC SRQs
Hello,

  When an application using XRCs abruptly terminates, the mmaped pages
of the CQ buffers are leaked.

  This comes from the fact that when resources are released in
ib_uverbs_cleanup_ucontext(), we fail to release the CQs because their
refcount is not 0.

  When creating an XRC SRQ, we increment the associated CQ refcount.
This refcount is only decremented when the SRQ is released.

  Therefore we need to release the SRQs prior to the CQs to make sure
that all references to the CQs are gone before trying to release these.

Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:39 -04:00
Erez Shitrit
ca9b590caa IB/mlx4: Fix WQE LSO segment calculation
The current code decreases from the mss size (which is the gso_size
from the kernel skb) the size of the packet headers.

It shouldn't do that because the mss that comes from the stack
(e.g IPoIB) includes only the tcp payload without the headers.

The result is indication to the HW that each packet that the HW sends
is smaller than what it could be, and too many packets will be sent
for big messages.

An easy way to demonstrate one more aspect of the problem is by
configuring the ipoib mtu to be less than 2*hlen (2*56) and then
run app sending big TCP messages. This will tell the HW to send packets
with giant (negative value which under unsigned arithmetics becomes
a huge positive one) length and the QP moves to SQE state.

Fixes: b832be1e40 ('IB/mlx4: Add IPoIB LSO support')
Reported-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:19 -04:00
Erez Shitrit
0e5544d9bf IB/ipoib: Remove IPOIB_MCAST_RUN bit
After Doug Ledford's changes there is no need in that bit, it's
semantic becomes subset of the IPOIB_FLAG_OPER_UP bit.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:19 -04:00
Erez Shitrit
1e85b806f9 IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's
Whenever there is no path->ah to the destination, keep only defined
number of skb's. Otherwise there are cases that the driver can keep
infinite list of skb's.

For example, when one device want to send unicast arp to the destination,
and from some reason the SM doesn't respond, the driver currently keeps
all the skb's. If that unicast arp traffic stopped, all  these skb's
are kept by the path object till the interface is down.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:19 -04:00
Erez Shitrit
2c01073095 IB/ipoib: Handle QP in SQE state
As the result of a completion error the QP can moved to SQE state by
the hardware. Since it's not the Error state, there are no flushes
and hence the driver doesn't know about that.

The fix creates a task that after completion with error which is not a
flush tracks the QP state and if it is in SQE state moves it back to RTS.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:19 -04:00
Erez Shitrit
3fd0605caa IB/ipoib: Update broadcast record values after each successful join request
Update the cached broadcast record in the priv object after every new
join of this broadcast domain group.

These values are needed for the port configuration (MTU size) and to
all the new multicast (non-broadcast) join requests initial parameters.

For example, SM starts with 2K MTU for all the fabric, and after that it
restarts (or handover to new SM) with new port configuration of 4K MTU.
Without using the new values, the driver will keep its old configuration
of 2K and will not apply the new configuration of 4K.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Erez Shitrit
a44878d100 IB/ipoib: Use one linear skb in RX flow
The current code in the RX flow uses two sg entries for each incoming
packet, the first one was for the IB headers and the second for the rest
of the data, that causes two  dma map/unmap and two allocations, and few
more actions that were done at the data path.

Use only one linear skb on each incoming packet, for the data (IB
headers and payload), that reduces the packet processing in the
data-path (only one skb, no frags, the first frag was not used anyway,
less memory allocations) and the dma handling (only one dma map/unmap
over each incoming packet instead of two map/unmap per each incoming packet).

After commit 73d3fe6d1c ("gro: fix aggregation for skb using frag_list") from
Eric Dumazet, we will get full aggregation for large packets.

When running bandwidth tests before and after the (over the card's numa node),
using "netperf -H 1.1.1.3 -T -t TCP_STREAM", the results before are ~12Gbs before
and after ~16Gbs on my setup (Mellanox's ConnectX3).

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford
1c0453d64a IB/ipoib: drop mcast_mutex usage
We needed the mcast_mutex when we had to prevent the join completion
callback from having the value it stored in mcast->mc overwritten
by a delayed return from ib_sa_join_multicast.  By storing the return
of ib_sa_join_multicast in an intermediate variable, we prevent a
delayed return from ib_sa_join_multicast overwriting the valid
contents of mcast->mc, and we no longer need a mutex to force the
join callback to run after the return of ib_sa_join_multicast.  This
allows us to do away with the mutex entirely and protect our critical
sections with a just a spinlock instead.  This is highly desirable
as there were some places where we couldn't use a mutex because the
code was not allowed to sleep, and so we were currently using a mix
of mutex and spinlock to protect what we needed to protect.  Now we
only have a spin lock and the locking complexity is greatly reduced.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford
d2fe937ce6 IB/ipoib: deserialize multicast joins
Allow the ipoib layer to attempt to join all outstanding multicast
groups at once.  The ib_sa layer will serialize multiple attempts to
join the same group, but will process attempts to join different groups
in parallel.  Take advantage of that.

In order to make this happen, change the mcast_join_thread to loop
through all needed joins, sending a join request for each one that we
still need to join.  There are a few special cases we handle though:

1) Don't attempt to join anything but the broadcast group until the join
of the broadcast group has succeeded.
2) No longer restart the join task at the end of completion handling.
If we completed successfully, we are done.  The join task now needs kicked
either by mcast_send or mcast_restart_task or mcast_start_thread, but
should not need started anytime else except when scheduling a backoff
attempt to rejoin.
3) No longer use separate join/completion routines for regular and
sendonly joins, pass them all through the same routine and just do the
right thing based on the SENDONLY join flag.
4) Only try to join a SENDONLY join twice, then drop the packets and
quit trying.  We leave the mcast group in the list so that if we get a
new packet, all that we have to do is queue up the packet and restart
the join task and it will automatically try to join twice and then
either send or flush the queue again.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford
69911416d8 IB/ipoib: fix MCAST_FLAG_BUSY usage
Commit a9c8ba5884 ("IPoIB: Fix usage of uninitialized multicast
objects") added a new flag MCAST_JOIN_STARTED, but was not very strict
in how it was used.  We didn't always initialize the completion struct
before we set the flag, and we didn't always call complete on the
completion struct from all paths that complete it.  And when we did
complete it, sometimes we continued to touch the mcast entry after
the completion, opening us up to possible use after free issues.

This made it less than totally effective, and certainly made its use
confusing.  And in the flush function we would use the presence of this
flag to signal that we should wait on the completion struct, but we never
cleared this flag, ever.

In order to make things clearer and aid in resolving the rtnl deadlock
bug I've been chasing, I cleaned this up a bit.

 1) Remove the MCAST_JOIN_STARTED flag entirely
 2) Change MCAST_FLAG_BUSY so it now only means a join is in-flight
 3) Test mcast->mc directly to see if we have completed
    ib_sa_join_multicast (using IS_ERR_OR_NULL)
 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize
    the mcast->done completion struct
 5) Make sure that before calling complete(&mcast->done), we always clear
    the MCAST_FLAG_BUSY bit
 6) Take the mcast_mutex before we call ib_sa_multicast_join and also
    take the mutex in our join callback.  This forces
    ib_sa_multicast_join to return and set mcast->mc before we process
    the callback.  This way, our callback can safely clear mcast->mc
    if there is an error on the join and we will do the right thing as
    a result in mcast_dev_flush.
 7) Because we need the mutex to synchronize mcast->mc, we can no
    longer call mcast_sendonly_join directly from mcast_send and
    instead must add sendonly join processing to the mcast_join_task
 8) Make MCAST_RUN mean that we have a working mcast subsystem, not that
    we have a running task.  We know when we need to reschedule our
    join task thread and don't need a flag to tell us.
 9) Add a helper for rescheduling the join task thread

A number of different races are resolved with these changes.  These
races existed with the old MCAST_FLAG_BUSY usage, the
MCAST_JOIN_STARTED flag was an attempt to address them, and while it
helped, a determined effort could still trip things up.

One race looks something like this:

Thread 1                             Thread 2
ib_sa_join_multicast (as part of running restart mcast task)
  alloc member
  call callback
                                     ifconfig ib0 down
				     wait_for_completion
    callback call completes
                                     wait_for_completion in
				     mcast_dev_flush completes
				       mcast->mc is PTR_ERR_OR_NULL
				       so we skip ib_sa_leave_multicast
    return from callback
  return from ib_sa_join_multicast
set mcast->mc = return from ib_sa_multicast

We now have a permanently unbalanced join/leave issue that trips up the
refcounting in core/multicast.c

Another like this:

Thread 1                   Thread 2         Thread 3
ib_sa_multicast_join
                                            ifconfig ib0 down
					    priv->broadcast = NULL
                           join_complete
			                    wait_for_completion
			   mcast->mc is not yet set, so don't clear
return from ib_sa_join_multicast and set mcast->mc
			   complete
			   return -EAGAIN (making mcast->mc invalid)
			   		    call ib_sa_multicast_leave
					    on invalid mcast->mc, hang
					    forever

By holding the mutex around ib_sa_multicast_join and taking the mutex
early in the callback, we force mcast->mc to be valid at the time we
run the callback.  This allows us to clear mcast->mc if there is an
error and the join is going to fail.  We do this before we complete
the mcast.  In this way, mcast_dev_flush always sees consistent state
in regards to mcast->mc membership at the time that the
wait_for_completion() returns.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford
efc82eeeae IB/ipoib: No longer use flush as a parameter
Various places in the IPoIB code had a deadlock related to flushing
the ipoib workqueue.  Now that we have per device workqueues and a
specific flush workqueue, there is no longer a deadlock issue with
flushing the device specific workqueues and we can do so unilaterally.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00
Doug Ledford
0b39578bcd IB/ipoib: Use dedicated workqueues per interface
During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up.  It turns
out that this is because no matter how well we protect against races
on a single device, the fact that all devices use the same workqueue,
and flush_workqueue() flushes *everything* from that workqueue means
that we would also have to prevent all races between different devices
(for instance, ipoib_mcast_restart_task on interface ib0 can race with
ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on
the rtnl_lock).

There are several possible solutions to this problem:

Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail.  This runs the
real risk of dropping work on the floor, which can end up being its
own separate kind of deadlock.

Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail.  Again, this can
drop work on the floor.

Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface.  In addition, keep the global
workqueue, but now limit it to only flush tasks.  In this way, the
flush tasks can always flush the device specific work queues without
having deadlock issues.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:06:18 -04:00