Commit Graph

584072 Commits

Author SHA1 Message Date
Geliang Tang
5ee61e95b6 libceph: use KMEM_CACHE macro
Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:57 +01:00
Geliang Tang
99ec269779 ceph: use kmem_cache_zalloc
Use kmem_cache_zalloc() instead of kmem_cache_alloc() with flag GFP_ZERO.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:56 +01:00
Geliang Tang
03d9440676 rbd: use KMEM_CACHE macro
Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:56 +01:00
Yan, Zheng
200fd27c8f ceph: use lookup request to revalidate dentry
If dentry has no lease, ceph_d_revalidate() previously return 0.
This causes VFS to invalidate the dentry and create a new dentry
for later lookup. Invalidating a dentry also detach any underneath
mount points. So mount point inside cephfs can disapear mystically
(even the mount point is not modified by other hosts).

The fix is using lookup request to revalidate dentry without lease.
This can partly solve the mount points disapear issue (as long as
the mount point is not modified by other hosts)

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:56 +01:00
Yan, Zheng
641235d8f8 ceph: kill ceph_get_dentry_parent_inode()
use vfs helper dget_parent() instead

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:55 +01:00
Yan, Zheng
315f240880 ceph: fix security xattr deadlock
When security is enabled, security module can call filesystem's
getxattr/setxattr callbacks during d_instantiate(). For cephfs,
d_instantiate() is usually called by MDS' dispatch thread, while
handling MDS reply. If the MDS reply does not include xattrs and
corresponding caps, getxattr/setxattr need to send a new request
to MDS and waits for the reply. This makes MDS' dispatch sleep,
nobody handles later MDS replies.

The fix is make sure lookup/atomic_open reply include xattrs and
corresponding caps. So getxattr can be handled by cached xattrs.
This requires some modification to both MDS and request message.
(Client tells MDS what caps it wants; MDS encodes proper caps in
the reply)

Smack security module may call setxattr during d_instantiate().
Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
to us. So just make setxattr return error when called by MDS'
dispatch thread.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:55 +01:00
Yan, Zheng
29dccfa5af ceph: don't request vxattrs from MDS
It's uselese because MDS reply does not carry any vxattr.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:55 +01:00
Yan, Zheng
132ca7e1de ceph: fix mounting same fs multiple times
Now __ceph_open_session() only accepts closed client. An opened
client will tigger BUG_ON().

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:54 +01:00
Yan, Zheng
4531126753 ceph: remove unnecessary NULL check
If page->mapping is NULL, releasepage() callback does not get called.
Remove the unnecessary NULL check to make static code analysis tool
happy

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:54 +01:00
Yan, Zheng
a3d714c336 ceph: avoid updating directory inode's i_size accidentally
Directory inode's i_size is used by readdir cache.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:53 +01:00
Yan, Zheng
af5e5eb574 ceph: fix race during filling readdir cache
Readdir cache uses page cache to save dentry pointers. When adding
dentry pointers to middle of a page, we need to make sure the page
already exists. Otherwise the beginning part of the page will be
invalid pointers.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:53 +01:00
Ilya Dryomov
89f081730c libceph: use sizeof_footer() more
Don't open-code sizeof_footer() in read_partial_message() and
ceph_msg_revoke().  Also, after switching to sizeof_footer(), it's now
possible to use con_out_kvec_add() in prepare_write_message_footer().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2016-03-25 18:51:53 +01:00
Ilya Dryomov
34b759b4a2 ceph: kill ceph_empty_snapc
ceph_empty_snapc->num_snaps == 0 at all times.  Passing such a snapc to
ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is
equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only
for sizing the request message.

Further, in all four cases the subsequent ceph_osdc_build_request() is
passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps
and making ceph_empty_snapc entirely useless.  The two cases where it
actually mattered were removed in commits 8605609049 ("ceph: avoid
sending unnessesary FLUSHSNAP message") and 23078637e0 ("ceph: fix
queuing inode to mdsdir's snaprealm").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by:  Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:52 +01:00
Anton Protopopov
ce4355932a ceph: fix a wrong comparison
A negative value rc compared to the positive value ENOENT in the
finish_read() function.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:52 +01:00
Deepa Dinamani
8bbd47140c ceph: replace CURRENT_TIME by current_fs_time()
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:52 +01:00
Yan, Zheng
5b64640cf6 ceph: scattered page writeback
This patch makes ceph_writepages_start() try using single OSD request
to write all dirty pages within a strip unit. When a nonconsecutive
dirty page is found, ceph_writepages_start() tries starting a new write
operation to existing OSD request. If it succeeds, it uses the new
operation to writeback the dirty page.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:51 +01:00
Yan, Zheng
2c63f49a72 libceph: add helper that duplicates last extent operation
This helper duplicates last extent operation in OSD request, then
adjusts the new extent operation's offset and length. The helper
is for scatterd page writeback, which adds nonconsecutive dirty
pages to single OSD request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:43 +01:00
Ilya Dryomov
3f1af42ad0 libceph: enable large, variable-sized OSD requests
Turn r_ops into a flexible array member to enable large, consisting of
up to 16 ops, OSD requests.  The use case is scattered writeback in
cephfs and, as far as the kernel client is concerned, 16 is just a made
up number.

r_ops had size 3 for copyup+hint+write, but copyup is really a special
case - it can only happen once.  ceph_osd_request_cache is therefore
stuffed with num_ops=2 requests, anything bigger than that is allocated
with kmalloc().  req_mempool is backed by ceph_osd_request_cache, which
means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
that.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:43 +01:00
Ilya Dryomov
9e767adbd3 libceph: osdc->req_mempool should be backed by a slab pool
ceph_osd_request_cache was introduced a long time ago.  Also, osd_req
is about to get a flexible array member, which ceph_osd_request_cache
is going to be aware of.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:43 +01:00
Ilya Dryomov
ae458f5a17 libceph: make r_request msg_size calculation clearer
Although msg_size is calculated correctly, the terms are grouped in
a misleading way - snaps appears to not have room for a u32 length.
Move calculation closer to its use and regroup terms.

No functional change.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:42 +01:00
Yan, Zheng
7665d85b73 libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op
This avoids defining large array of r_reply_op_{len,result} in
in struct ceph_osd_request.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:42 +01:00
Ilya Dryomov
de2aa102ea libceph: rename ceph_osd_req_op::payload_len to indata_len
Follow userspace nomenclature on this - the next commit adds
outdata_len.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:41 +01:00
Yan, Zheng
a587d71b0a ceph: remove useless BUG_ON
ceph_osdc_start_request() never return -EOLDSNAP

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:41 +01:00
Yan, Zheng
133e91566c ceph: don't enable rbytes mount option by default
When rbytes mount option is enabled, directory size is recursive
size. Recursive size is not updated instantly. This can cause
directory size to change between successive stat(1)

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:41 +01:00
Yan, Zheng
d1eee0c0e1 ceph: encode ctime in cap message
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:40 +01:00
Ilya Dryomov
b5d91704f5 libceph: behave in mon_fault() if cur_mon < 0
This can happen if __close_session() in ceph_monc_stop() races with
a connection reset.  We need to ignore such faults, otherwise it's
likely we would take !hunting, call __schedule_delayed() and end up
with delayed_work() executing on invalid memory, among other things.

The (two!) con->private tests are useless, as nothing ever clears
con->private.  Nuke them.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:40 +01:00
Ilya Dryomov
bee3a37c47 libceph: reschedule tick in mon_fault()
Doing __schedule_delayed() in the hunting branch is pointless, as the
tick will have already been scheduled by then.

What we need to do instead is *reschedule* it in the !hunting branch,
after reopen_session() changes hunt_mult, which affects the delay.
This helps with spacing out connection attempts and avoiding things
like two back-to-back attempts followed by a longer period of waiting
around.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:40 +01:00
Ilya Dryomov
1752b50ca2 libceph: introduce and switch to reopen_session()
hunting is now set in __open_session() and cleared in finish_hunting(),
instead of all around.  The "session lost" message is printed not only
on connection resets, but also on keepalive timeouts.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:39 +01:00
Ilya Dryomov
168b9090c7 libceph: monc hunt rate is 3s with backoff up to 30s
Unless we are in the process of setting up a client (i.e. connecting to
the monitor cluster for the first time), apply a backoff: every time we
want to reopen a session, increase our timeout by a multiple (currently
2); when we complete the connection, reduce that multipler by 50%.

Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:39 +01:00
Ilya Dryomov
58d81b1294 libceph: monc ping rate is 10s
Split ping interval and ping timeout: ping interval is 10s; keepalive
timeout is 30s.

Make monc_ping_timeout a constant while at it - it's not actually
exported as a mount option (and the rest of tick-related settings won't
be either), so it's got no place in ceph_options.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:39 +01:00
Ilya Dryomov
0e04dc26cc libceph: pick a different monitor when reconnecting
Don't try to reconnect to the same monitor when we fail to establish
a session within a timeout or it's lost.

For that, pick_new_mon() needs to see the old value of cur_mon, so
don't clear it in __close_session() - all calls to __close_session()
but one are followed by __open_session() anyway.  __open_session() is
only called when a new session needs to be established, so the "already
open?" branch, which is now in the way, is simply dropped.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:38 +01:00
Ilya Dryomov
82dcabad75 libceph: revamp subs code, switch to SUBSCRIBE2 protocol
It is currently hard-coded in the mon_client that mdsmap and monmap
subs are continuous, while osdmap sub is always "onetime".  To better
handle full clusters/pools in the osd_client, we need to be able to
issue continuous osdmap subs.  Revamp subs code to allow us to specify
for each sub whether it should be continuous or not.

Although not strictly required for the above, switch to SUBSCRIBE2
protocol while at it, eliminating the ambiguity between a request for
"every map since X" and a request for "just the latest" when we don't
have a map yet (i.e. have epoch 0).  SUBSCRIBE2 feature bit is now
required - it's been supported since pre-argonaut (2010).

Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
in before we validate the epoch and successfully install the new map
can mess up mon_client sub state.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:38 +01:00
Ilya Dryomov
0f9af169a1 libceph: decouple hunting and subs management
Coupling hunting state with subscribe state is not a good idea.  Clear
hunting when we complete the authentication handshake.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:38 +01:00
Ilya Dryomov
02ac956c42 libceph: move debugfs initialization into __ceph_open_session()
Our debugfs dir name is a concatenation of cluster fsid and client
unique ID ("global_id").  It used to be the case that we learned
global_id first, nowadays we always learn fsid first - the monmap is
sent before any auth replies are.  ceph_debugfs_client_init() call in
ceph_monc_handle_map() is therefore never executed and can be removed.

Its counterpart in handle_auth_reply() doesn't really belong there
either: having to do monc->client and unlocking early to work around
lockdep is a testament to that.  Move it into __ceph_open_session(),
where it can be called unconditionally.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:37 +01:00
Linus Torvalds
3c2de27d79 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:

 - Preparations of parallel lookups (the remaining main obstacle is the
   need to move security_d_instantiate(); once that becomes safe, the
   rest will be a matter of rather short series local to fs/*.c

 - preadv2/pwritev2 series from Christoph

 - assorted fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits)
  splice: handle zero nr_pages in splice_to_pipe()
  vfs: show_vfsstat: do not ignore errors from show_devname method
  dcache.c: new helper: __d_add()
  don't bother with __d_instantiate(dentry, NULL)
  untangle fsnotify_d_instantiate() a bit
  uninline d_add()
  replace d_add_unique() with saner primitive
  quota: use lookup_one_len_unlocked()
  cifs_get_root(): use lookup_one_len_unlocked()
  nfs_lookup: don't bother with d_instantiate(dentry, NULL)
  kill dentry_unhash()
  ceph_fill_trace(): don't bother with d_instantiate(dn, NULL)
  autofs4: don't bother with d_instantiate(dentry, NULL) in ->lookup()
  configfs: move d_rehash() into configfs_create() for regular files
  ceph: don't bother with d_rehash() in splice_dentry()
  namei: teach lookup_slow() to skip revalidate
  namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()
  lookup_one_len_unlocked(): use lookup_dcache()
  namei: simplify invalidation logics in lookup_dcache()
  namei: change calling conventions for lookup_{fast,slow} and follow_managed()
  ...
2016-03-19 18:52:29 -07:00
Linus Torvalds
51b3eae8db Merge branch 'stable-4.6' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
 "A small set of patches for audit this time; just three in total and
  one is a spelling fix.

  The two patches with actual content are designed to help prevent new
  instances of auditd from displacing an existing, functioning auditd
  and to generate a log of the attempt.  Not to worry, dead/stuck auditd
  instances can still be replaced by a new instance without problem.

  Nothing controversial, and everything passes our regression suite"

* 'stable-4.6' of git://git.infradead.org/users/pcmoore/audit:
  audit: Fix typo in comment
  audit: log failed attempts to change audit_pid configuration
  audit: stop an old auditd being starved out by a new auditd
2016-03-19 17:52:49 -07:00
Linus Torvalds
de06dbfa78 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
 "Another mixture of changes this time around:

   - Split XIP linker file from main linker file to make it more
     maintainable, and various XIP fixes, and clean up a resulting
     macro.

   - Decompressor cleanups from Masahiro Yamada

   - Avoid printing an error for a missing L2 cache

   - Remove some duplicated symbols in System.map, and move
     vectors/stubs back into kernel VMA

   - Various low priority fixes from Arnd

   - Updates to allow bus match functions to return negative errno
     values, touching some drivers and the driver core.  Greg has acked
     these changes.

   - Virtualisation platform udpates form Jean-Philippe Brucker.

   - Security enhancements from Kees Cook

   - Rework some Kconfig dependencies and move PSCI idle management code
     out of arch/arm into drivers/firmware/psci.c

   - ARM DMA mapping updates, touching media, acked by Mauro.

   - Fix places in ARM code which should be using virt_to_idmap() so
     that Keystone2 can work.

   - Fix Marvell Tauros2 to work again with non-DT boots.

   - Provide a delay timer for ARM Orion platforms"

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (45 commits)
  ARM: 8546/1: dma-mapping: refactor to fix coherent+cma+gfp=0
  ARM: 8547/1: dma-mapping: store buffer information
  ARM: 8543/1: decompressor: rename suffix_y to compress-y
  ARM: 8542/1: decompressor: merge piggy.*.S and simplify Makefile
  ARM: 8541/1: decompressor: drop redundant FORCE in Makefile
  ARM: 8540/1: decompressor: use clean-files instead of extra-y to clean files
  ARM: 8539/1: decompressor: drop more unneeded assignments to "targets"
  ARM: 8538/1: decompressor: drop unneeded assignments to "targets"
  ARM: 8532/1: uncompress: mark putc as inline
  ARM: 8531/1: turn init_new_context into an inline function
  ARM: 8530/1: remove VIRT_TO_BUS
  ARM: 8537/1: drop unused DEBUG_RODATA from XIP_KERNEL
  ARM: 8536/1: mm: hide __start_rodata_section_aligned for non-debug builds
  ARM: 8535/1: mm: DEBUG_RODATA makes no sense with XIP_KERNEL
  ARM: 8534/1: virt: fix hyp-stub build for pre-ARMv7 CPUs
  ARM: make the physical-relative calculation more obvious
  ARM: 8512/1: proc-v7.S: Adjust stack address when XIP_KERNEL
  ARM: 8411/1: Add default SPARSEMEM settings
  ARM: 8503/1: clk_register_clkdev: remove format string interface
  ARM: 8529/1: remove 'i' and 'zi' targets
  ...
2016-03-19 16:31:54 -07:00
Linus Torvalds
b31a3bc3db arch/sh changes for 4.6. They include minor cleanups, a fix for a
crash that likely affects all sh models with MMU, and introduction of
 a framework for boards described by device tree, which sets the stage
 for future J2 support.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJW7IIaAAoJELcQ+SIFb8HaDMYH/1tb4FhXGJyVZed7LlK/cpRm
 wgdl30DPvi8V595GX6fnLS7XKvmekrH2LN0rUHS59H6W56SBu/BdsjH/DfrlId5I
 Dc19cZYTOFuhwKa75egyEeYrUqkdJMzyDx7maAZSN4BvQ0WL83x4UJwE24vsswW8
 h/7s5xFr9XZZ82HqJ6FDyR9beE7RePCnshNVKqTI7q15ht9SFNeJ35bUQxIwCXYl
 85eEU0EbPKF15E96eHo1YkAswyohshSMC5474wNwwhpsGmXzc71EtbPCvSGwMlxh
 1krA0kUrZAp/3S6/2qENosQqcK0mLhK2ibrNUZGgKc5J/iLwLzfCKxULVIt1IAw=
 =yg5n
 -----END PGP SIGNATURE-----

Merge tag 'tag-sh-for-4.6' of git://git.libc.org/linux-sh

Pull arch/sh updates from Rich Felker:
 "This includes minor cleanups, a fix for a crash that likely affects
  all sh models with MMU, and introduction of a framework for boards
  described by device tree, which sets the stage for future J2 support"

* tag 'tag-sh-for-4.6' of git://git.libc.org/linux-sh:
  sched/preempt, sh: kmap_coherent relies on disabled preemption
  sh: add SMP method selection to device tree pseudo-board
  sh: add device tree support and generic board using device tree
  sh: remove arch-specific localtimer and use generic one
  sh: make MMU-specific SMP code conditional on CONFIG_MMU
  sh: provide unified syscall trap compatible with all SH models
  sh: New gcc support
  sh: Disable trace for kernel uncompressing.
  sh: Use generic clkdev.h header
2016-03-19 16:09:43 -07:00
Linus Torvalds
d5e2d00898 powerpc updates for 4.6
Highlights:
  - Restructure Linux PTE on Book3S/64 to Radix format from Paul Mackerras
  - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh Kumar K.V
  - Add POWER9 cputable entry from Michael Neuling
  - FPU/Altivec/VSX save/restore optimisations from Cyril Bur
  - Add support for new ftrace ABI on ppc64le from Torsten Duwe
 
 Various cleanups & minor fixes from:
  - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy, Cyril
    Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell Currey,
    Sukadev Bhattiprolu, Suraj Jitindar Singh.
 
 General:
  - atomics: Allow architectures to define their own __atomic_op_* helpers from
    Boqun Feng
  - Implement atomic{, 64}_*_return_* variants and acquire/release/relaxed
    variants for (cmp)xchg from Boqun Feng
  - Add powernv_defconfig from Jeremy Kerr
  - Fix BUG_ON() reporting in real mode from Balbir Singh
  - Add xmon command to dump OPAL msglog from Andrew Donnellan
  - Add xmon command to dump process/task similar to ps(1) from Douglas Miller
  - Clean up memory hotplug failure paths from David Gibson
 
 pci/eeh:
  - Redesign SR-IOV on PowerNV to give absolute isolation between VFs from Wei
    Yang.
  - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan.
  - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang
  - PCI: Add pcibios_bus_add_device() weak function from Wei Yang
  - MAINTAINERS: Update EEH details and maintainership from Russell Currey
 
 cxl:
  - Support added to the CXL driver for running on both bare-metal and
    hypervisor systems, from Christophe Lombard and Frederic Barrat.
  - Ignore probes for virtual afu pci devices from Vaibhav Jain
 
 perf:
  - Export Power8 generic and cache events to sysfs from Sukadev Bhattiprolu
  - hv-24x7: Fix usage with chip events, display change in counter values,
    display domain indices in sysfs, eliminate domain suffix in event names,
    from Sukadev Bhattiprolu
 
 Freescale:
  - Updates from Scott: "Highlights include 8xx optimizations, 32-bit checksum
    optimizations, 86xx consolidation, e5500/e6500 cpu hotplug, more fman and
    other dt bits, and minor fixes/cleanup."
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW69OrAAoJEFHr6jzI4aWAe5EQAJw/hE6WBQc6a7Tj70AnXOqR
 qk/m5pZjuTwQxfBteIvHR1pE5eXdlvtAjcD254LVkFkAbIn19W/h2k0VX/nlee7P
 n/VRHRifjtGmukqHrPYJJ7ua9mNlY7pxh3leGSixBFASnSWqMxNNNziNQtSTcuCs
 TjHiw6NkZ/kzeunA4bAfE4yHVUZjmL74oiS9JbLyaVHqoW4fqWLlh26AKo2yYMZI
 qPicBBG4HBi3FGvoexnKxlJNdcV4HO7LzDjJmCSfUKYCJi+Pw19T5qmhso0q0qVz
 vHg/A8HNeG4Hn83pNVmLeQSAIQRZ3DvTtcLgbjPo+TVwm/hzrRRBWipTeOVbkLW8
 2bcOXT4t7LWUq15EAJ1LYgYZGzcLrfRfUeOcuQ1TWd3+PcfY9pE7FmizsxAAfaVe
 E9j9mpz4XnIqBtWkFHneTIHkQ5OWptyKuZJEaYH0nut4VsP0k8NarkseafGqBPu7
 5eG83gbiQbCVixfOgblV9eocJ29JcwpjPAY4CZSGJimShg909FV7WRgZgJkKWrbK
 dBRco8Jcp4VglGfo2qymv7Uj4KwQoypBREOhiKUvrAsVlDxPfx+bcskhjGu9xGDC
 xs/+nme0/lKa/wg5K4C3mQ1GAlkMWHI0ojhJjsyODbetup5UbkEu03wjAaTdO9dT
 Y6ptGm0rYAJluPNlziFj
 =qkAt
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "This was delayed a day or two by some build-breakage on old toolchains
  which we've now fixed.

  There's two PCI commits both acked by Bjorn.

  There's one commit to mm/hugepage.c which is (co)authored by Kirill.

  Highlights:
   - Restructure Linux PTE on Book3S/64 to Radix format from Paul
     Mackerras
   - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh
     Kumar K.V
   - Add POWER9 cputable entry from Michael Neuling
   - FPU/Altivec/VSX save/restore optimisations from Cyril Bur
   - Add support for new ftrace ABI on ppc64le from Torsten Duwe

  Various cleanups & minor fixes from:
   - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy,
     Cyril Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell
     Currey, Sukadev Bhattiprolu, Suraj Jitindar Singh.

  General:
   - atomics: Allow architectures to define their own __atomic_op_*
     helpers from Boqun Feng
   - Implement atomic{, 64}_*_return_* variants and acquire/release/
     relaxed variants for (cmp)xchg from Boqun Feng
   - Add powernv_defconfig from Jeremy Kerr
   - Fix BUG_ON() reporting in real mode from Balbir Singh
   - Add xmon command to dump OPAL msglog from Andrew Donnellan
   - Add xmon command to dump process/task similar to ps(1) from Douglas
     Miller
   - Clean up memory hotplug failure paths from David Gibson

  pci/eeh:
   - Redesign SR-IOV on PowerNV to give absolute isolation between VFs
     from Wei Yang.
   - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan.
   - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang
   - PCI: Add pcibios_bus_add_device() weak function from Wei Yang
   - MAINTAINERS: Update EEH details and maintainership from Russell
     Currey

  cxl:
   - Support added to the CXL driver for running on both bare-metal and
     hypervisor systems, from Christophe Lombard and Frederic Barrat.
   - Ignore probes for virtual afu pci devices from Vaibhav Jain

  perf:
   - Export Power8 generic and cache events to sysfs from Sukadev
     Bhattiprolu
   - hv-24x7: Fix usage with chip events, display change in counter
     values, display domain indices in sysfs, eliminate domain suffix in
     event names, from Sukadev Bhattiprolu

  Freescale:
   - Updates from Scott: "Highlights include 8xx optimizations, 32-bit
     checksum optimizations, 86xx consolidation, e5500/e6500 cpu
     hotplug, more fman and other dt bits, and minor fixes/cleanup"

* tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
  powerpc: Fix unrecoverable SLB miss during restore_math()
  powerpc/8xx: Fix do_mtspr_cpu6() build on older compilers
  powerpc/rcpm: Fix build break when SMP=n
  powerpc/book3e-64: Use hardcoded mttmr opcode
  powerpc/fsl/dts: Add "jedec,spi-nor" flash compatible
  powerpc/T104xRDB: add tdm riser card node to device tree
  powerpc32: PAGE_EXEC required for inittext
  powerpc/mpc85xx: Add pcsphy nodes to FManV3 device tree
  powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)
  powerpc/86xx: Introduce and use common dtsi
  powerpc/86xx: Update device tree
  powerpc/86xx: Move dts files to fsl directory
  powerpc/86xx: Switch to kconfig fragments approach
  powerpc/86xx: Update defconfigs
  powerpc/86xx: Consolidate common platform code
  powerpc32: Remove one insn in mulhdu
  powerpc32: small optimisation in flush_icache_range()
  powerpc: Simplify test in __dma_sync()
  powerpc32: move xxxxx_dcache_range() functions inline
  powerpc32: Remove clear_pages() and define clear_page() inline
  ...
2016-03-19 15:38:41 -07:00
Linus Torvalds
31e182363b DeviceTree updates for 4.6:
- New tool dtx_diff to diff DT files.
 
 - Sync kernel's dtc/libfdt to current dtc repo master.
 
 - Fix for reserved memory regions located in highmem.
 
 - Document standard unit suffixes for DT properties.
 
 - Various DT binding doc updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW7B34AAoJEPr7XbWNvGHDI4QP/0W0s3dvJ4l/TNeu/n1ZN8i7
 H4FYNcNGVEPJMaI4gIV+GOnxJ4hTnEX2ftG9MEZCywVHWNCg4IAaIW1wQ52Sufnc
 Uncc9qO0MMR5Y1Siu/DuLi+ARyPQrke3M/o5xU9XJjSdz9QpnYw1MpMnpAJANomk
 /8Wjn2jEBHDJrxmJ73nE/CAVu8iFyWHTmt5pDQBoQub2NVuAX6rNcVmpmr0PhDMd
 0CKokB+wmLHZEA2R4BBefjLwwKU1WrF/5ytXpsJ01NeZsExagJZz1fQdy3Z6o1Xx
 PRTukmmNhknatNTJOD8XmLr/SWN2CKNuJK5EOoV2opAvN/fc+mrk95DGKzjh5n64
 aCRHzZgKAOYOqdVJKHfJ9hfzgG/zdt4mt1RKhLD+6qZNoSeQSrtikN3DWibPMZQR
 uHRC9fqalx+W4cBH4jakGYrpsbQOaQFjb6vHY8V/auSB2cx97YVuKCawnSerjWZv
 1tu8dqNG0qwt9g0hgA+ycwitulUNSLSvLGp3mxBJXN2ULPHygCw53JUWFQEZhoAa
 JQqZ7lrUSE+AEp6Cwc0sNn07UMoCAoZKbTaurm2rn2RrKMfnirbf/sdZkc9Hq2pg
 TIJOJlmJmvheoCl7894iV1l18ooPqldk6h8d9fc6rngaQYkNaV0c9nlaMLTa2Rwi
 bVztrJItiymIhjIB86Iw
 =hz7v
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull DeviceTree updates from Rob Herring:

 - new tool 'dtx_diff' to diff DT files

 - sync kernel's dtc/libfdt to current dtc repo master

 - fix for reserved memory regions located in highmem

 - document standard unit suffixes for DT properties

 - various DT binding doc updates

* tag 'devicetree-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: Add vendor prefix for eGalax_eMPIA Technology Inc
  Input: ads7846: Add description how to use internal reference (ADS7846)
  ARM: realview: add EB syscon variants to bindings
  devicetree: bindings: ARM: Use "uV" for micro-volt
  serial: fsl-imx-uart: Fix typo in fsl,dte-mode description
  of: add 'const' for of_property_*_string*() parameter '*np'
  of/unittest: fix infinite loop in of_unittest_destroy_tracked_overlays()
  of: alloc anywhere from memblock if range not specified
  kbuild: Allow using host dtc instead of kernel's copy
  of: resolver: Add missing of_node_get and of_node_put
  of: Add United Radiant Technology Corporation vendor prefix
  dt/bindings: add documentation on standard property unit suffixes
  scripts/dtc: Update to upstream commit b06e55c88b9b
  ARM: boot: Add an implementation of strnlen for libfdt
  scripts/dtc: dtx_diff - add info to error message
  dtc: create tool to diff device trees
2016-03-19 15:15:07 -07:00
Linus Torvalds
1200b6809d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support more Realtek wireless chips, from Jes Sorenson.

   2) New BPF types for per-cpu hash and arrap maps, from Alexei
      Starovoitov.

   3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

   4) Allow the use of SO_REUSEPORT in order to do per-thread processing
   of incoming TCP/UDP connections.  The muxing can be done using a
   BPF program which hashes the incoming packet.  From Craig Gallek.

   5) Add a multiplexer for TCP streams, to provide a messaged based
      interface.  BPF programs can be used to determine the message
      boundaries.  From Tom Herbert.

   6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

   7) Avoid factorial complexity when taking down an inetdev interface
      with lots of configured addresses.  We were doing things like
      traversing the entire address less for each address removed, and
      flushing the entire netfilter conntrack table for every address as
      well.

   8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

   9) Allow offloading u32 classifiers to hardware, and implement for
      ixgbe, from John Fastabend.

  10) Allow configuring IRQ coalescing parameters on a per-queue basis,
      from Kan Liang.

  11) Extend ethtool so that larger link mode masks can be supported.
      From David Decotigny.

  12) Introduce devlink, which can be used to configure port link types
      (ethernet vs Infiniband, etc.), port splitting, and switch device
      level attributes as a whole.  From Jiri Pirko.

  13) Hardware offload support for flower classifiers, from Amir Vadai.

  14) Add "Local Checksum Offload".  Basically, for a tunneled packet
      the checksum of the outer header is 'constant' (because with the
      checksum field filled into the inner protocol header, the payload
      of the outer frame checksums to 'zero'), and we can take advantage
      of that in various ways.  From Edward Cree"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
  bonding: fix bond_get_stats()
  net: bcmgenet: fix dma api length mismatch
  net/mlx4_core: Fix backward compatibility on VFs
  phy: mdio-thunder: Fix some Kconfig typos
  lan78xx: add ndo_get_stats64
  lan78xx: handle statistics counter rollover
  RDS: TCP: Remove unused constant
  RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
  net: smc911x: convert pxa dma to dmaengine
  team: remove duplicate set of flag IFF_MULTICAST
  bonding: remove duplicate set of flag IFF_MULTICAST
  net: fix a comment typo
  ethernet: micrel: fix some error codes
  ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
  bpf, dst: add and use dst_tclassid helper
  bpf: make skb->tc_classid also readable
  net: mvneta: bm: clarify dependencies
  cls_bpf: reset class and reuse major in da
  ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
  ldmvsw: Add ldmvsw.c driver code
  ...
2016-03-19 10:05:34 -07:00
Linus Torvalds
6b5f04b6cf Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "cgroup changes for v4.6-rc1.  No userland visible behavior changes in
  this pull request.  I'll send out a separate pull request for the
  addition of cgroup namespace support.

   - The biggest change is the revamping of cgroup core task migration
     and controller handling logic.  There are quite a few places where
     controllers and tasks are manipulated.  Previously, many of those
     places implemented custom operations for each specific use case
     assuming specific starting conditions.  While this worked, it makes
     the code fragile and difficult to follow.

     The bulk of this pull request restructures these operations so that
     most related operations are performed through common helpers which
     implement recursive (subtrees are always processed consistently)
     and idempotent (they make cgroup hierarchy converge to the target
     state rather than performing operations assuming specific starting
     conditions).  This makes the code a lot easier to understand,
     verify and extend.

   - Implicit controller support is added.  This is primarily for using
     perf_event on the v2 hierarchy so that perf can match cgroup v2
     path without requiring the user to do anything special.  The kernel
     portion of perf_event changes is acked but userland changes are
     still pending review.

   - cgroup_no_v1= boot parameter added to ease testing cgroup v2 in
     certain environments.

   - There is a regression introduced during v4.4 devel cycle where
     attempts to migrate zombie tasks can mess up internal object
     management.  This was fixed earlier this week and included in this
     pull request w/ stable cc'd.

   - Misc non-critical fixes and improvements"

* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (44 commits)
  cgroup: avoid false positive gcc-6 warning
  cgroup: ignore css_sets associated with dead cgroups during migration
  Documentation: cgroup v2: Trivial heading correction.
  cgroup: implement cgroup_subsys->implicit_on_dfl
  cgroup: use css_set->mg_dst_cgrp for the migration target cgroup
  cgroup: make cgroup[_taskset]_migrate() take cgroup_root instead of cgroup
  cgroup: move migration destination verification out of cgroup_migrate_prepare_dst()
  cgroup: fix incorrect destination cgroup in cgroup_update_dfl_csses()
  cgroup: Trivial correction to reflect controller.
  cgroup: remove stale item in cgroup-v1 document INDEX file.
  cgroup: update css iteration in cgroup_update_dfl_csses()
  cgroup: allocate 2x cgrp_cset_links when setting up a new root
  cgroup: make cgroup_calc_subtree_ss_mask() take @this_ss_mask
  cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends
  cgroup: use cgroup_apply_enable_control() in cgroup creation path
  cgroup: combine cgroup_mutex locking and offline css draining
  cgroup: factor out cgroup_{apply|finalize}_control() from cgroup_subtree_control_write()
  cgroup: introduce cgroup_{save|propagate|restore}_control()
  cgroup: make cgroup_drain_offline() and cgroup_apply_control_{disable|enable}() recursive
  cgroup: factor out cgroup_apply_control_enable() from cgroup_subtree_control_write()
  ...
2016-03-18 20:25:49 -07:00
Eric Dumazet
fe30937b65 bonding: fix bond_get_stats()
bond_get_stats() can be called from rtnetlink (with RTNL held)
or from /proc/net/dev seq handler (with RCU held)

The logic added in commit 5f0c5f73e5 ("bonding: make global bonding
stats more reliable") kind of assumed only one cpu could run there.

If multiple threads are reading /proc/net/dev, stats can be really
messed up after a while.

A second problem is that some fields are 32bit, so we need to properly
handle the wrap around problem.

Given that RTNL is not always held, we need to use
bond_for_each_slave_rcu().

Fixes: 5f0c5f73e5 ("bonding: make global bonding stats more reliable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 23:14:15 -04:00
Eric Dumazet
eee5772322 net: bcmgenet: fix dma api length mismatch
When un-mapping skb->data in __bcmgenet_tx_reclaim(),
we must use the length that was used in original dma_map_single(),
instead of skb->len that might be bigger (includes the frags)

We simply can store skb_len into tx_cb_ptr->dma_len and use it
at unmap time.

Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 23:12:44 -04:00
Eli Cohen
76e39ccf9c net/mlx4_core: Fix backward compatibility on VFs
Commit 85743f1eb3 ("net/mlx4_core: Set UAR page size to 4KB regardless
of system page size") introduced dependency where old VF drivers without
this fix fail to load if the PF driver runs with this commit.

To resolve this add a module parameter which disables that functionality
by default.  If both the PF and VFs are running with a driver with that
commit the administrator may set the module param to true.

The module parameter is called enable_4k_uar.

Fixes: 85743f1eb3 ('net/mlx4_core: Set UAR page size to 4KB ...')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 23:09:45 -04:00
Linus Torvalds
fcab86add7 Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata updates from Tejun Heo:

 - ahci grew runtime power management support so that the controller can
   be turned off if no devices are attached.

 - sata_via isn't dead yet.  It got hotplug support and more refined
   workaround for certain WD drives.

 - Misc cleanups.  There's a merge from for-4.5-fixes to avoid confusing
   conflicts in ahci PCI ID table.

* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ata: ahci_xgene: dereferencing uninitialized pointer in probe
  AHCI: Remove obsolete Intel Lewisburg SATA RAID device IDs
  ata: sata_rcar: Use ARCH_RENESAS
  sata_via: Implement hotplug for VT6421
  sata_via: Apply WD workaround only when needed on VT6421
  ahci: Add runtime PM support for the host controller
  ahci: Add functions to manage runtime PM of AHCI ports
  ahci: Convert driver to use modern PM hooks
  ahci: Cache host controller version
  scsi: Drop runtime PM usage count after host is added
  scsi: Set request queue runtime PM status back to active on resume
  block: Add blk_set_runtime_active()
  ata: ahci_mvebu: add support for Armada 3700 variant
  libata: fix unbalanced spin_lock_irqsave/spin_unlock_irq() in ata_scsi_park_show()
  libata: support AHCI on OCTEON platform
2016-03-18 20:06:46 -07:00
Linus Torvalds
ef504fa591 Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Three trivial workqueue changes"

* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Fix comment for work_on_cpu()
  sched/core: Get rid of 'cpu' argument in wq_worker_sleeping()
  workqueue: Replace usage of init_name with dev_set_name()
2016-03-18 20:05:39 -07:00
Andreas Färber
e2ad1f976b phy: mdio-thunder: Fix some Kconfig typos
Drop two extra occurrences of "on" in option title and help text.

Fixes: 379d7ac7ca ("phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.")
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 23:02:03 -04:00
Woojung Huh
a59f8c5b04 lan78xx: add ndo_get_stats64
Add lan78xx_get_stats64 of ndo_get_stats64 to report
all statistics counters including errors from HW statistics.

Read from HW when auto suspend is disabled, use saved counter when
auto suspend is enabled because periodic call to ndo_get_stats64
prevents USB auto suspend.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 22:27:48 -04:00
Woojung Huh
20ff55655a lan78xx: handle statistics counter rollover
Update to handle statistics counter rollover.
Check statistics counter periodically and compensate it when
counter value rolls over at max (20 or 32bits).

Simple mechanism adjusts monitoring timer to allow USB auto suspend.

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-18 22:27:48 -04:00