Commit Graph

766102 Commits

Author SHA1 Message Date
Stephen Boyd
45ba387511 Merge branches 'clk-allwinner', 'clk-rockchip', 'clk-tegra', 'clk-berlin' and 'clk-qcom-mmagic' into clk-next
* clk-allwinner:
  clk: sunxi-ng: r40: export a regmap to access the GMAC register
  clk: sunxi-ng: r40: rewrite init code to a platform driver
  clk: sunxi-ng: add support for H6 PRCM CCU

* clk-rockchip:
  clk: rockchip: remove deprecated gate-clk code and dt-binding
  clk: rockchip: use match_string() helper

* clk-tegra:
  clk: tegra: Add quirk for getting CDEV1/2 clocks on Tegra20
  clk: tegra20: Correct parents of CDEV1/2 clocks
  clk: tegra20: Add DEV1/DEV2 OSC dividers

* clk-berlin:
  clk: berlin: switch to SPDX license identifier

* clk-qcom-mmagic:
  clk: qcom: mmcc-msm8996: leave all mmagic gdscs and clocks always enabled
  clk: qcom: Register the gdscs before the clocks
  clk: qcom: gdsc: Add support for ALWAYS_ON gdscs
2018-06-04 12:27:44 -07:00
Stephen Boyd
7fa50aa559 Merge branches 'clk-hisi-usb', 'clk-silent-bulk', 'clk-mtk-hdmi', 'clk-mtk-mali' and 'clk-imx6ul-ccosr' into clk-next
* clk-hisi-usb:
  clk: hisilicon: add missing usb3 clocks for Hi3798CV200 SoC

* clk-silent-bulk:
  clk: bulk: silently error out on EPROBE_DEFER

* clk-mtk-hdmi:
  clk: mediatek: correct the clocks for MT2701 HDMI PHY module

* clk-mtk-mali:
  clk: mediatek: add g3dsys support for MT2701 and MT7623
  dt-bindings: reset: mediatek: add entry for Mali-450 node to refer
  dt-bindings: clock: mediatek: add entry for Mali-450 node to refer
  dt-bindings: clock: mediatek: add g3dsys bindings

* clk-imx6ul-ccosr:
  clk: imx: Add new clo01 and clo2 controlled by CCOSR
2018-06-04 12:27:40 -07:00
Stephen Boyd
b7c82cec04 Merge branches 'clk-stm32mp1', 'clk-samsung', 'clk-uniphier-mpeg', 'clk-stratix10' and 'clk-aspeed' into clk-next
* clk-stm32mp1:
  clk: stm32mp1: Fix a memory leak in 'clk_stm32_register_gate_ops()'
  clk: stm32mp1: Add CLK_IGNORE_UNUSED to ck_sys_dbg clock
  clk: stm32mp1: remove ck_apb_dbg clock
  clk: stm32mp1: set stgen_k clock as critical
  clk: stm32mp1: add missing tzc2 clock
  clk: stm32mp1: fix SAI3 & SAI4 clocks
  clk: stm32mp1: remove unused dfsdm_src[] const
  clk: stm32mp1: add missing static

* clk-samsung:
  clk: samsung: simplify getting .drvdata

* clk-uniphier-mpeg:
  clk: uniphier: add LD11/LD20 stream demux system clock

* clk-stratix10:
  clk: socfpga: stratix10: suppress unbinding platform's clock driver
  clk: socfpga: stratix10: use platform driver APIs

* clk-aspeed:
  clk:aspeed: Fix reset bits for PCI/VGA and PECI
  clk: aspeed: Support second reset register
2018-06-04 12:27:34 -07:00
Stephen Boyd
872e47f75f Merge branches 'clk-qcom-rpmh', 'clk-npcm7xx', 'clk-of-parent-count' and 'clk-qcom-rcg-fix' into clk-next
* clk-qcom-rpmh:
  dt-bindings: clock: Introduce QCOM RPMh clock bindings

* clk-npcm7xx:
  clk: npcm7xx: fix return value check in npcm7xx_clk_init()
  clk: npcm7xx: add clock controller
  dt-binding: clk: npcm750: Add binding for Nuvoton NPCM7XX Clock

* clk-of-parent-count:
  pinctrl: sunxi: Use of_clk_get_parent_count() instead of open coding
  soc/tegra: pmc: Use of_clk_get_parent_count() instead of open coding
  soc: rockchip: power-domain: Use of_clk_get_parent_count() instead of open coding
  ARM: timer-sp: Use of_clk_get_parent_count() instead of open coding
  clk: Extract OF clock helpers in <linux/of_clk.h>

* clk-qcom-rcg-fix:
  clk: qcom: Base rcg parent rate off plan frequency
2018-06-04 12:27:29 -07:00
Stephen Boyd
43705f5294 Merge branch 'clk-actions' into clk-next
* clk-actions:
  clk: actions: Add S900 SoC clock support
  clk: actions: Add pll clock support
  clk: actions: Add composite clock support
  clk: actions: Add fixed factor clock support
  clk: actions: Add factor clock support
  clk: actions: Add divider clock support
  clk: actions: Add mux clock support
  clk: actions: Add gate clock support
  clk: actions: Add common clock driver support
  dt-bindings: clock: Add Actions S900 clock bindings
2018-06-04 12:27:02 -07:00
Stephen Boyd
101cfc9f78 Merge branches 'clk-warn', 'clk-core', 'clk-spear' and 'clk-qcom-msm8998' into clk-next
* clk-warn:
  clk: Print the clock name and warning cause

* clk-core:
  clk: Remove clk_init_cb typedef

* clk-spear:
  clk: spear: fix WDT clock definition on SPEAr600

* clk-qcom-msm8998:
  clk: qcom: Add MSM8998 Global Clock Control (GCC) driver
2018-06-04 12:26:39 -07:00
David S. Miller
4cd328f839 Merge branch 'sh_eth-fix-and-clean-up-sh_eth_soft_swap'
Sergei Shtylyov says:

====================
sh_eth: fix & clean up sh_eth_soft_swap()

Here's a set of 3 patches against DaveM's 'net-next.git' repo. First one fixes an
old buffer endiannes issue (luckily, the ARM SoCs are smart enough to not actually
care) plus couple clean ups around sh_eth_soft_swap()...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 15:23:26 -04:00
Sergei Shtylyov
1100149a23 sh_eth: use DIV_ROUND_UP() in sh_eth_soft_swap()
When initializing 'maxp' in sh_eth_soft_swap(), the buffer length needs
to be rounded  up -- that's just asking for DIV_ROUND_UP()!

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 15:23:25 -04:00
Sergei Shtylyov
bb2fa4e847 sh_eth: uninline sh_eth_soft_swap()
sh_eth_tsu_soft_swap() is called twice by the driver, remove *inline* and
move  that function  from the header to the driver itself to let gcc decide
whether to expand it inline or not...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 15:23:25 -04:00
Sergei Shtylyov
232b6743e4 sh_eth: make sh_eth_soft_swap() work on ARM
Browsing  thru the driver disassembly, I noticed that ARM gcc generated
no  code  whatsoever for sh_eth_soft_swap() while building a little-endian
kernel -- apparently __LITTLE_ENDIAN__ was not being #define'd, however
it got implicitly #define'd when building with the SH gcc (I could only
find the explicit #define __LITTLE_ENDIAN that was #include'd when building
a little-endian kernel).  Luckily, the Ether controller  only doing big-
endian DMA is encountered on the early SH771x SoCs only and all ARM SoCs
implement EDMR.DE and thus set 'sh_eth_cpu_data::hw_swap'. But anyway, we
need to fix the #ifdef inside sh_eth_soft_swap() to something that would
work on all architectures...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 15:23:24 -04:00
Trond Myklebust
3f0b3cf46e NFS: Filter cache invalidation when holding a delegation
If the client holds a delegation, then ensure we filter out attempts
to invalidate the size, owner, group owner, or mode unless we made the
change, in which case, check that NFS_INO_REVAL_FORCED is set by the
caller.
Always filter out attempts to invalidate the change attribute and
size, since we are authoritative for those.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
4ebe83af20 NFS: Ignore NFS_INO_REVAL_FORCED in nfs_check_inode_attributes()
If we hold a delegation, we should not need to call
nfs_check_inode_attributes() since we already know which attributes
are valid, and which ones may still need revalidation. The state
of the NFS_INO_REVAL_FORCED flag is therefore irrelevant.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
c80d17c55d NFS: Improve caching while holding a delegation
Make sure that the client completely ignores change attribute and size
changes on the server when it holds a delegation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
0b467264d0 NFS: Fix attribute revalidation
Don't mark attributes as invalid just because they have changed. Instead,
for the purposes of adjusting the attribute cache timeout, keep a
separate variable that tracks whether or not a change occurred.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
6a97d02dfe NFS: fix up nfs_setattr_update_inode
Always try to set the attributes, even if we don't have a valid struct
nfs_fattr.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
97c2c17af9 NFSv4: Ensure the inode is clean when we set a delegation
If there are attributes that are still invalid when we set a delegation,
then we need to set the NFS_INO_REVAL_FORCED flag.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:58 -04:00
Trond Myklebust
7c6726546c NFSv4: Ignore NFS_INO_REVAL_FORCED in nfs4_proc_access
If we hold a delegation, we don't need to care about whether or not
the inode attributes are up to date. We know we can cache the results
of this call regardless.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-04 15:03:20 -04:00
Linus Torvalds
c5e7a7ea22 swait: strengthen language to discourage use
We already earlier discouraged people from using this interface in
commit 88796e7e5c ("sched/swait: Document it clearly that the swait
facilities are special and shouldn't be used"), but I just got a pull
request with a new broken user.

So make the comment *really* clear.

The swait interfaces are bad, and should not be used unless you have
some *very* strong reasons that include tons of hard performance numbers
on just why you want to use them, and you show that you actually
understand that they aren't at all like the normal wait/wakeup
interfaces.

So far, every single user has been suspect.  The main user is KVM, which
is completely pointless (there is only ever one waiter, which avoids the
interface subtleties, but also means that having a queue instead of a
pointer is counter-productive and certainly not an "optimization").

So make the comments much stronger.

Not that anybody likely reads them anyway, but there's always some
slight hope that it will cause somebody to think twice.

I'd like to remove this interface entirely, but there is the theoretical
possibility that it's actually the right thing to use in some situation,
most likely some deep RT use.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-04 12:01:15 -07:00
Dongsheng Yang
23edca8649 rbd: flush rbd_dev->watch_dwork after watch is unregistered
There is a problem if we are going to unmap a rbd device and the
watch_dwork is going to queue delayed work for watch:

unmap Thread                    watch Thread                  timer
do_rbd_remove
  cancel_tasks_sync(rbd_dev)
                                queue_delayed_work for watch
  destroy_workqueue(rbd_dev->task_wq)
    drain_workqueue(wq)
    destroy other resources in wq
                                                              call_timer_fn
                                                                __queue_work()

Then the delayed work escape the cancel_tasks_sync() and
destroy_workqueue() and we will get an user-after-free call trace:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  Modules linked in:
  CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           OE     4.17.0-rc6+ #13
  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
  RIP: 0010:__queue_work+0x6a/0x3b0
  RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
  RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000
  RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00
  R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970
  R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007
  FS:  0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0
  Call Trace:
   <IRQ>
   ? __queue_work+0x3b0/0x3b0
   call_timer_fn+0x2d/0x130
   run_timer_softirq+0x16e/0x430
   ? tick_sched_timer+0x37/0x70
   __do_softirq+0xd2/0x280
   irq_exit+0xd5/0xe0
   smp_apic_timer_interrupt+0x6c/0x130
   apic_timer_interrupt+0xf/0x20

[ Move rbd_dev->watch_dwork cancellation so that rbd_reregister_watch()
  either bails out early because the watch is UNREGISTERED at that point
  or just gets cancelled. ]

Cc: stable@vger.kernel.org
Fixes: 99d1694310 ("rbd: retry watch re-registration periodically")
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:46:02 +02:00
Chengguang Xu
c7f04944bc ceph: update description of some mount options
Based on code, default value of rsize/wsize is 16 MB.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:46:02 +02:00
Chengguang Xu
3619aa8b74 ceph: show ino32 if the value is different with default
In current ceph_show_options(), there is no item for showing 'ino32',
so add showing mount option 'ino32' if the value is different with
default.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:46:02 +02:00
Chengguang Xu
8db0c7596f ceph: strengthen rsize/wsize/readdir_max_bytes validation
The check (intval < PAGE_SIZE) will involve type cast, so even when
specifying negative value to rsize/wsize/readdir_max_bytes, it will
pass the validation check successfully.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:46:01 +02:00
Chengguang Xu
c36ed50de2 ceph: fix alignment of rasize
On currently logic:
when I specify rasize=0~1 then it will be 4096.
when I specify rasize=2~4097 then it will be 8192.

Make it the same as rsize & wsize.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:46:01 +02:00
Luis Henriques
73fb0949cf ceph: fix use-after-free in ceph_statfs()
KASAN found an UAF in ceph_statfs.  This was a one-off bug but looking at
the code it looks like the monmap access needs to be protected as it can
be modified while we're accessing it.  Fix this by protecting the access
with the monc->mutex.

  BUG: KASAN: use-after-free in ceph_statfs+0x21d/0x2c0
  Read of size 8 at addr ffff88006844f2e0 by task trinity-c5/304

  CPU: 0 PID: 304 Comm: trinity-c5 Not tainted 4.17.0-rc6+ #172
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
  Call Trace:
   dump_stack+0xa5/0x11b
   ? show_regs_print_info+0x5/0x5
   ? kmsg_dump_rewind+0x118/0x118
   ? ceph_statfs+0x21d/0x2c0
   print_address_description+0x73/0x2b0
   ? ceph_statfs+0x21d/0x2c0
   kasan_report+0x243/0x360
   ceph_statfs+0x21d/0x2c0
   ? ceph_umount_begin+0x80/0x80
   ? kmem_cache_alloc+0xdf/0x1a0
   statfs_by_dentry+0x79/0xb0
   vfs_statfs+0x28/0x110
   user_statfs+0x8c/0xe0
   ? vfs_statfs+0x110/0x110
   ? __fdget_raw+0x10/0x10
   __se_sys_statfs+0x5d/0xa0
   ? user_statfs+0xe0/0xe0
   ? mutex_unlock+0x1d/0x40
   ? __x64_sys_statfs+0x20/0x30
   do_syscall_64+0xee/0x290
   ? syscall_return_slowpath+0x1c0/0x1c0
   ? page_fault+0x1e/0x30
   ? syscall_return_slowpath+0x13c/0x1c0
   ? prepare_exit_to_usermode+0xdb/0x140
   ? syscall_trace_enter+0x330/0x330
   ? __put_user_4+0x1c/0x30
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

  Allocated by task 130:
   __kmalloc+0x124/0x210
   ceph_monmap_decode+0x1c1/0x400
   dispatch+0x113/0xd20
   ceph_con_workfn+0xa7e/0x44e0
   process_one_work+0x5f0/0xa30
   worker_thread+0x184/0xa70
   kthread+0x1a0/0x1c0
   ret_from_fork+0x35/0x40

  Freed by task 130:
   kfree+0xb8/0x210
   dispatch+0x15a/0xd20
   ceph_con_workfn+0xa7e/0x44e0
   process_one_work+0x5f0/0xa30
   worker_thread+0x184/0xa70
   kthread+0x1a0/0x1c0
   ret_from_fork+0x35/0x40

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:46:01 +02:00
Yan, Zheng
aae1a442f8 ceph: prevent i_version from going back
inode info from non-auth can be stale.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:46:01 +02:00
Yan, Zheng
fa466743a9 ceph: fix wrong check for the case of updating link count
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:46:01 +02:00
Ilya Dryomov
a86f009f10 libceph: allocate the locator string with GFP_NOFAIL
calc_target() isn't supposed to fail with anything but POOL_DNE, in
which case we report that the pool doesn't exist and fail the request
with -ENOENT.  Doing this for -ENOMEM is at the very least confusing
and also harmful -- as the preceding requests complete, a short-lived
locator string allocation is likely to succeed after a wait.

(We used to call ceph_object_locator_to_pg() for a pi lookup.  In
theory that could fail with -ENOENT, hence the "ret != -ENOENT" warning
being removed.)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:46:00 +02:00
Ilya Dryomov
c843d13cae libceph: make abort_on_full a per-osdc setting
The intent behind making it a per-request setting was that it would be
set for writes, but not for reads.  As it is, the flag is set for all
fs/ceph requests except for pool perm check stat request (technically
a read).

ceph_osdc_abort_on_full() skips reads since the previous commit and
I don't see a use case for marking individual requests.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:46:00 +02:00
Ilya Dryomov
690f951d7e libceph: don't abort reads in ceph_osdc_abort_on_full()
Don't consider reads for aborting and use ->base_oloc instead of
->target_oloc, as done in __submit_request().

Strictly speaking, we shouldn't be aborting FULL_TRY/FULL_FORCE writes
either.  But, there is an inconsistency in FULL_TRY/FULL_FORCE handling
on the OSD side [1], so given that neither of these is used in the
kernel client, leave it for when the OSD behaviour is sorted out.

[1] http://tracker.ceph.com/issues/24339

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:59 +02:00
Ilya Dryomov
6001567c14 libceph: avoid a use-after-free during map check
Sending map check after complete_request() was called is not only
useless, but can lead to a use-after-free as req->r_kref decrement in
__complete_request() races with map check code.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:59 +02:00
Ilya Dryomov
29e878201e libceph: don't warn if req->r_abort_on_full is set
The "FULL or reached pool quota" warning is there to explain paused
requests.  No need to emit it if pausing isn't going to occur.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:58 +02:00
Ilya Dryomov
4eea0fefd7 libceph: use for_each_request() in ceph_osdc_abort_on_full()
Scanning the trees just to see if there is anything to abort is
unnecessary -- all that is needed here is to update the epoch barrier
first, before we start aborting.  Simplify and do the update inside the
loop before calling abort_request() for the first time.

The switch to for_each_request() also fixes a bug: homeless requests
weren't even considered for aborting.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:58 +02:00
Ilya Dryomov
88bc1922c2 libceph: defer __complete_request() to a workqueue
In the common case, req->r_callback is called by handle_reply() on the
ceph-msgr worker thread without any locks.  If handle_reply() fails, it
is called with both osd->lock and osdc->lock.  In the map check case,
it is called with just osdc->lock but held for write.  Finally, if the
request is aborted because of -ENOSPC or by ceph_osdc_abort_requests(),
it is called directly on the submitter's thread, again with both locks.

req->r_callback on the submitter's thread is relatively new (introduced
in 4.12) and ripe for deadlocks -- e.g. writeback worker thread waiting
on itself:

  inode_wait_for_writeback+0x26/0x40
  evict+0xb5/0x1a0
  iput+0x1d2/0x220
  ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph]
  writepages_finish+0x2d3/0x410 [ceph]
  __complete_request+0x26/0x60 [libceph]
  complete_request+0x2e/0x70 [libceph]
  __submit_request+0x256/0x330 [libceph]
  submit_request+0x2b/0x30 [libceph]
  ceph_osdc_start_request+0x25/0x40 [libceph]
  ceph_writepages_start+0xdfe/0x1320 [ceph]
  do_writepages+0x1f/0x70
  __writeback_single_inode+0x45/0x330
  writeback_sb_inodes+0x26a/0x600
  __writeback_inodes_wb+0x92/0xc0
  wb_writeback+0x274/0x330
  wb_workfn+0x2d5/0x3b0

Defer __complete_request() to a workqueue in all failure cases so it's
never on the same thread as ceph_osdc_start_request() and always called
with no locks held.

Link: http://tracker.ceph.com/issues/23978
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:58 +02:00
Ilya Dryomov
26df726bcd libceph: move more code into __complete_request()
Move req->r_completion wake up and req->r_kref decrement into
__complete_request().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-06-04 20:45:58 +02:00
Ilya Dryomov
0d09c57d08 libceph: no need to call flush_workqueue() before destruction
destroy_workqueue() drains the workqueue before proceeding with
destruction.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:57 +02:00
Yan, Zheng
a57d9064e4 ceph: flush pending works before shutdown super
Pending works hold inode references, which cause "Busy inodes after
unmount" warning.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:57 +02:00
Yan, Zheng
12b69d5f6f ceph: abort osd requests on force umount
This avoid force umount waiting on page writeback:

  io_schedule+0xd/0x30
  wait_on_page_bit_common+0xc6/0x130
  __filemap_fdatawait_range+0xbd/0x100
  filemap_fdatawait_keep_errors+0x15/0x40
  sync_inodes_sb+0x1cf/0x240
  sync_filesystem+0x52/0x90
  generic_shutdown_super+0x1d/0x110
  ceph_kill_sb+0x28/0x80 [ceph]
  deactivate_locked_super+0x35/0x60
  cleanup_mnt+0x36/0x70
  task_work_run+0x79/0xa0
  exit_to_usermode_loop+0x62/0x70
  do_syscall_64+0xdb/0xf0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  0xffffffffffffffff

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:57 +02:00
Ilya Dryomov
66850df585 libceph: introduce ceph_osdc_abort_requests()
This will be used by the filesystem for "umount -f".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:57 +02:00
Luis Henriques
8c6286f1c6 ceph: fix st_nlink stat for directories
Currently, calling stat on a cephfs directory returns 1 for st_nlink.
This behaviour has recently changed in the fuse client, as some
applications seem to expect this value to be either 0 (if it's
unlinked) or 2 + number of subdirectories.  This behaviour was changed
in the fuse client with commit 67c7e4619188 ("client: use common
interp of st_nlink for dirs").

This patch modifies the kernel client to have a similar behaviour.

Link: https://tracker.ceph.com/issues/23873
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:56 +02:00
Yan, Zheng
597817ddbb ceph: support file lock on directory
Link: http://tracker.ceph.com/issues/24028
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:56 +02:00
Ilya Dryomov
6dd4940ba5 ceph: show wsize only if non-default
This is how it was before commit 95cca2b44e ("ceph: limit osd write
size") went in.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:56 +02:00
Yan, Zheng
4985d6f9e5 ceph: handle the new nfiles/nsubdirs fields in cap message
Without these new fields, stale st_size is returned in following
case.

1. MDS modifies a directory
2. MDS issues CEPH_CAP_ANY_SHARED to client
3. The client satifies stat(2) by its cached metadata. set st_size
   to "i_files + i_subdirs".

Link: http://tracker.ceph.com/issues/23855
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:56 +02:00
Yan, Zheng
a1c6b83581 ceph: define argument structure for handle_cap_grant
The data structure includes the versioned feilds of cap message.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:56 +02:00
Yan, Zheng
2af54a72b5 ceph: update i_files/i_subdirs only when Fs cap is issued
In MDS, file/subdir counts of a directory inode are protected by
filelock. In request reply without Fs cap, nfiles/nsubdirs can be
stale.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:55 +02:00
Yan, Zheng
49a9f4f671 ceph: always get rstat from auth mds
rstat is not tracked by capability. client can't know if rstat from
non-auth mds is uptodate or not.

Link: http://tracker.ceph.com/issues/23538
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:55 +02:00
Yan, Zheng
4e9906e798 ceph: use bit flags to define vxattr attributes
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:55 +02:00
Ilya Dryomov
e5c9388399 libceph: use MSG_TRUNC for discarding received bytes
Avoid a copy into the "skip buffer".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:55 +02:00
Ilya Dryomov
d2935d6f75 libceph: get rid of more_kvec in try_write()
All gotos to "more" are conditioned on con->state == OPEN, but the only
thing "more" does is opening the socket if con->state == PREOPEN.  Kill
that label and rename "more_kvec" to "more".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-06-04 20:45:55 +02:00
Chengguang Xu
fe943d5042 libceph, rbd: add error handling for osd_req_op_cls_init()
Add proper error handling for osd_req_op_cls_init() to replace
BUG_ON statement when failing from memory allocation.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04 20:45:54 +02:00
Linus Torvalds
a31895ad7f Merge tag 'regmap-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
 "This is another quiet release for regmap, there's one minor feature
  improvement for the recently added slimbus support and a few minor
  fixes and cleanups"

* tag 'regmap-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: slimbus: allow register offsets up to 16 bits
  regmap: add missing prototype for devm_init_slimbus
  regmap: Skip clk_put for attached clocks when freeing context
  regmap: include <linux/ktime.h> from include/linux/regmap.h
2018-06-04 11:38:16 -07:00