Commit Graph

948914 Commits

Author SHA1 Message Date
Ingo Molnar
992414a18c Merge branch 'locking/nmi' into locking/core, to pick up completed topic branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-08-03 13:00:27 +02:00
Roi Dayan
4203b19c27 netfilter: flowtable: Set offload timeout when adding flow
On heavily loaded systems the GC can take time to go over all existing
conns and reset their timeout. At that time other calls like from
nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as
expired. To fix this when we set the offload bit we should also reset
the timeout instead of counting on GC to finish first iteration over
all conns before the initial timeout.

Fixes: 90964016e5 ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-03 12:37:24 +02:00
Roi Dayan
73f9407b3e netfilter: conntrack: Move nf_ct_offload_timeout to header file
To be used by callers from other modules.

[ Rename DAY to NF_CT_DAY to avoid possible symbol name pollution
  issue --Pablo ]

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-03 12:36:47 +02:00
Florian Westphal
2ef740da4f selftests: netfilter: add meta iif/oif match test
simple test case, but would have caught this:

FAIL: iifgroupcount, want "packets 2", got
table inet filter {
        counter iifgroupcount {
                packets 0 bytes 0
        }
}

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-03 12:02:22 +02:00
Colin Ian King
2c81ef286c ceph: remove redundant initialization of variable mds
The variable mds is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:28 +02:00
Xiubo Li
a7caa88f8b ceph: fix use-after-free for fsc->mdsc
If the ceph_mdsc_init() fails, it will free the mdsc already.

Reported-by: syzbot+b57f46d8d6ea51960b8c@syzkaller.appspotmail.com
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:27 +02:00
Jia Yang
8e298deb8d ceph: remove unused variables in ceph_mdsmap_decode()
Fix build warnings:

  fs/ceph/mdsmap.c: In function ‘ceph_mdsmap_decode’:
  fs/ceph/mdsmap.c:192:7: warning: variable ‘info_cv’ set but not used [-Wunused-but-set-variable]
  fs/ceph/mdsmap.c:177:7: warning: variable ‘state_seq’ set but not used [-Wunused-but-set-variable]
  fs/ceph/mdsmap.c:123:15: warning: variable ‘mdsmap_cv’ set but not used [-Wunused-but-set-variable]

Note that p is increased in ceph_decode_*.

Signed-off-by: Jia Yang <jiayang5@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:27 +02:00
Randy Dunlap
f1f565a269 ceph: delete repeated words in fs/ceph/
Drop duplicated words "down" and "the" in fs/ceph/.

[ idryomov: merge into a single patch ]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:27 +02:00
Xiubo Li
3b4168dd8b ceph: send client provided metric flags in client metadata
Send metric flags to the MDS, indicating what metrics the client
supports. Currently that consists of cap statistics, and read, write and
metadata latencies.

URL: https://tracker.ceph.com/issues/43435
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:27 +02:00
Xiubo Li
18f473b384 ceph: periodically send perf metrics to MDSes
This will send the caps/read/write/metadata metrics to any available MDS
once per second, which will be the same as the userland client.  It will
skip the MDS sessions which don't support the metric collection, as the
MDSs will close socket connections when they get an unknown type
message.

We can disable the metric sending via the disable_send_metrics module
parameter.

[ jlayton: fix up endianness bug in ceph_mdsc_send_metrics() ]

URL: https://tracker.ceph.com/issues/43215
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:26 +02:00
Xiubo Li
aaf5a47620 ceph: check the sesion state and return false in case it is closed
If the session is already in closed state, we should skip it.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:26 +02:00
Alexander A. Klimov
94f17c00d6 libceph: replace HTTP links with HTTPS ones
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
	  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

[ idryomov: Do the same for the CRUSH paper and replace
  ceph.newdream.net with ceph.io. ]

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:26 +02:00
Xu Wang
c00e4522ad ceph: remove unnecessary cast in kfree()
Remove unnecassary casts in the argument to kfree.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:26 +02:00
Jeff Layton
042f649810 libceph: just have osd_req_op_init() return a pointer
The caller can just ignore the return. No need for this wrapper that
just casts the other function to void.

[ idryomov: argument alignment ]

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:25 +02:00
Xiubo Li
d1d9655052 ceph: do not access the kiocb after aio requests
In aio case, if the completion comes very fast just before the
ceph_read_iter() returns to fs/aio.c, the kiocb will be freed in
the completion callback, then if ceph_read_iter() access again
we will potentially hit the use-after-free bug.

[ jlayton: initialize direct_lock early, and use it everywhere ]

URL: https://tracker.ceph.com/issues/45649
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:25 +02:00
Jeff Layton
585d72f33e ceph: clean up and optimize ceph_check_delayed_caps()
Make this loop look a bit more sane. Also optimize away the spinlock
release/reacquire if we can't get an inode reference.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:21 +02:00
Xiubo Li
fa99677342 ceph: fix potential mdsc use-after-free crash
Make sure the delayed work stopped before releasing the resources.

cancel_delayed_work_sync() will only guarantee that the work finishes
executing if the work is already in the ->worklist.  That means after
the cancel_delayed_work_sync() returns, it will leave the work requeued
if it was rearmed at the end. That can lead to a use after free once the
work struct is freed.

Fix it by flushing the delayed work instead of trying to cancel it, and
ensure that the work doesn't rearm if the mdsc is stopping.

URL: https://tracker.ceph.com/issues/46293
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:21 +02:00
Xiubo Li
b682c6d41b ceph: switch to WARN_ON_ONCE in encode_supported_features()
...and let the errnos bubble up to the callers.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:16 +02:00
Xiubo Li
4f1d756def ceph: add global total_caps to count the mdsc's total caps number
This will help to reduce using the global mdsc->mutex lock in many
places.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:15 +02:00
Xiubo Li
3e699bd865 ceph: add check_session_state() helper and make it global
And remove the unsed mdsc parameter to simplify the code.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:10 +02:00
Johannes Berg
5981fe5b05 mac80211: fix misplaced while instead of if
This never was intended to be a 'while' loop, it should've
just been an 'if' instead of 'while'. Fix this.

I noticed this while applying another patch from Ben that
intended to fix a busy loop at this spot.

Cc: stable@vger.kernel.org
Fixes: b16798f5b9 ("mac80211: mark station unauthorized before key removal")
Reported-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20200803110209.253009ae41ff.I3522aad099392b31d5cf2dcca34cbac7e5832dde@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 11:03:14 +02:00
Ilya Dryomov
6e6f0f0116 libceph: dump class and method names on method calls
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:03:01 +02:00
Ilya Dryomov
5133ba8f15 libceph: use target_copy() in send_linger()
Instead of copying just oloc, oid and flags, copy the entire
linger target.  This is more for consistency than anything else,
as send_linger() -> submit_request() -> __submit_request() sends
the request regardless of what calc_target() says (i.e. both on
CALC_TARGET_NO_ACTION and CALC_TARGET_NEED_RESEND).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2020-08-03 11:03:01 +02:00
Miaohe Lin
3b1648f109 nl80211: use eth_zero_addr() to clear mac address
Use eth_zero_addr() to clear mac address instead of memset().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Link: https://lore.kernel.org/r/1596273349-24333-1-git-send-email-linmiaohe@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 10:56:22 +02:00
Miaohe Lin
47d76e3190 mac80211: use eth_zero_addr() to clear mac address
Use eth_zero_addr() to clear mac address instead of memset().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Link: https://lore.kernel.org/r/1596273158-24183-1-git-send-email-linmiaohe@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 10:56:04 +02:00
John Crispin
6628d00116 mac8211: fix struct initialisation
Sparse showed up with the following error.
net/mac80211/agg-rx.c:480:43: warning: Using plain integer as NULL pointer

Fixes: 2ab4587675 (mac80211: add support for the ADDBA extension element)
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20200803084540.179908-1-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 10:55:17 +02:00
Jouni Malinen
4e56cde15f mac80211: Handle special status codes in SAE commit
SAE authentication has been extended with H2E (IEEE 802.11 REVmd) and PK
(WFA) options. Those extensions use special status code values in the
SAE commit messages (Authentication frame with transaction sequence
number 1) to identify which extension is in use. mac80211 was
interpreting those new values as the AP denying authentication and that
resulted in failure to complete SAE authentication in some cases.

Fix this by adding exceptions for the new status code values 126 and
127.

Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200731183830.18735-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 10:54:54 +02:00
Hui Wang
07c9983b56 Revert "ALSA: hda: call runtime_allow() for all hda controllers"
This reverts commit 9a6418487b ("ALSA: hda: call runtime_allow()
for all hda controllers").

The reverted patch already introduced some regressions on some
machines:
 - on gemini-lake machines, the error of "azx_get_response timeout"
   happens in the hda driver.
 - on the machines with alc662 codec, the audio jack detection doesn't
   work anymore.

Fixes: 9a6418487b ("ALSA: hda: call runtime_allow() for all hda controllers")
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208511
Cc: <stable@vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Link: https://lore.kernel.org/r/20200803064638.6139-1-hui.wang@canonical.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-08-03 09:28:41 +02:00
ChangSyun Peng
45a4d8fd6c md/raid5: Allow degraded raid6 to do rmw
Degraded raid6 always do reconstruct-write now. With raid6 xor supported,
we can do rmw in degraded raid6. This patch can reduce many read IOs to
improve performance.

If the failed disk is P, Q or the disk we want to write to, we may need to
do reconstruct-write in max degraded raid6. In this situation we can not
read enough data from handle_stripe_dirtying() so we have to set force_rcw
in handle_stripe_fill() to read all data.

Reviewed-by: Alex Wu <alexwu@synology.com>
Reviewed-by: BingJing Chang <bingjingc@synology.com>
Reviewed-by: Danny Shih <dannyshih@synology.com>
Signed-off-by: ChangSyun Peng <allenpeng@synology.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:54:31 -07:00
ChangSyun Peng
a1c6ae3d9f md/raid5: Fix Force reconstruct-write io stuck in degraded raid5
In degraded raid5, we need to read parity to do reconstruct-write when
data disks fail. However, we can not read parity from
handle_stripe_dirtying() in force reconstruct-write mode.

Reproducible Steps:

1. Create degraded raid5
mdadm -C /dev/md2 --assume-clean -l5 -n3 /dev/sda2 /dev/sdb2 missing
2. Set rmw_level to 0
echo 0 > /sys/block/md2/md/rmw_level
3. IO to raid5

Now some io may be stuck in raid5. We can use handle_stripe_fill() to read
the parity in this situation.

Cc: <stable@vger.kernel.org> # v4.4+
Reviewed-by: Alex Wu <alexwu@synology.com>
Reviewed-by: BingJing Chang <bingjingc@synology.com>
Reviewed-by: Danny Shih <dannyshih@synology.com>
Signed-off-by: ChangSyun Peng <allenpeng@synology.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:54:25 -07:00
Roger Pau Monne
f5ec672326 Revert "xen/balloon: Fix crash when ballooning on x86 32 bit PAE"
This reverts commit dfd74a1edf.

This has been fixed by commit dca4436d1c which added the out
of bounds check to __add_memory, so that trying to add blocks past
MAX_PHYSMEM_BITS will fail.

Note the check in the Xen balloon driver was bogus anyway, as it
checked the start address of the resource, but it should instead test
the end address to assert the whole resource falls below
MAX_PHYSMEM_BITS.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20200727091342.52325-4-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-03 08:16:18 +02:00
Roger Pau Monne
88a479ff6e xen/balloon: make the balloon wait interruptible
So it can be killed, or else processes can get hung indefinitely
waiting for balloon pages.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200727091342.52325-3-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-03 08:15:09 +02:00
Roger Pau Monne
1951fa33ec xen/balloon: fix accounting in alloc_xenballooned_pages error path
target_unpopulated is incremented with nr_pages at the start of the
function, but the call to free_xenballooned_pages will only subtract
pgno number of pages, and thus the rest need to be subtracted before
returning or else accounting will be skewed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200727091342.52325-2-roger.pau@citrix.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-03 08:14:01 +02:00
Connor McAdams
7fe3530427 ALSA: hda/ca0132 - Fix AE-5 microphone selection commands.
The ca0113 command had the wrong group_id, 0x48 when it should've been
0x30. The front microphone selection should now work.

Signed-off-by: Connor McAdams <conmanx360@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200803002928.8638-3-conmanx360@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-08-03 08:12:17 +02:00
Connor McAdams
cc5edb1bd3 ALSA: hda/ca0132 - Add new quirk ID for Recon3D.
Add a new quirk ID for the Recon3D, as tested by me.

Signed-off-by: Connor McAdams <conmanx360@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200803002928.8638-2-conmanx360@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-08-03 08:12:02 +02:00
Connor McAdams
a00dc409de ALSA: hda/ca0132 - Fix ZxR Headphone gain control get value.
When the ZxR headphone gain control was added, the ca0132_switch_get
function was not updated, which meant that the changes to the control
state were not saved when entering/exiting alsamixer.

Signed-off-by: Connor McAdams <conmanx360@gmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200803002928.8638-1-conmanx360@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2020-08-03 08:11:40 +02:00
Takashi Iwai
3b5d1afd1f Merge branch 'for-next' into for-linus 2020-08-03 08:10:08 +02:00
Guoqing Jiang
3a31cf3d21 raid5: don't duplicate code for different paths in handle_stripe
As we can see, R5_LOCKED is set and s.locked is increased whether
R5_ReWrite is set or not, so move it to common path.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:03:52 -07:00
Guoqing Jiang
01b5d32a57 raid5-cache: hold spinlock instead of mutex in r5c_journal_mode_show
Replace mddev_lock with spin_lock to align with other show methods in
raid5_attrs.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:03:52 -07:00
Guoqing Jiang
b3db8a2163 md: print errno in super_written
It is better to print errno instead of bi_status.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:03:51 -07:00
Guoqing Jiang
e3914d596f md/raid5: remove the redundant setting of STRIPE_HANDLE
The flag is already set before compare rcw with rmw, so it is
not necessary to do it again.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:03:51 -07:00
Sebastian Parschauer
ec164d07aa md: register new md sysfs file 'uuid' read-only
Report the UUID of the MD array in the following format:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

This is useful if you don't want to wait for udev to identify array.
And it is also easy for script to monitor it with the format.

Signed-off-by: Sebastian Parschauer <s.parschauer@gmx.de>
[Guoqing: mention the change in md.rst]
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:03:51 -07:00
Xiao Ni
d9c0fa509e md: fix max sectors calculation for super 1.0
To grow size of super 1.0 raid array, it is necessary to check the device
max usable size.

Now it uses rdev->sectors for max usable size. If one disk is 500G and the
raid device only uses the 100GB of this disk. rdev->sectors can't tell the
real max usable size. The max usable size should be

dev_size-(superblock_size+bitmap_size+badblock_size).

Also, remove unnecessary sb_start update in super_1_rdev_size_change().

Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
2020-08-02 23:03:51 -07:00
Randy Dunlap
4e722d4fe2 xen: hypercall.h: fix duplicated word
Change the repeated word "as" to "as a".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Link: https://lore.kernel.org/r/20200726001731.19540-1-rdunlap@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-03 07:48:39 +02:00
Randy Dunlap
e5a52fd2b8 xen/gntdev: gntdev.h: drop a duplicated word
Drop the repeated word "of" in a comment.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Link: https://lore.kernel.org/r/20200719003317.21454-1-rdunlap@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-03 07:46:42 +02:00
Souptick Joarder
ff669aa812 xen/privcmd: Convert get_user_pages*() to pin_user_pages*()
In 2019, we introduced pin_user_pages*() and now we are converting
get_user_pages*() to the new API as appropriate. [1] & [2] could
be referred for more information. This is case 5 as per document [1].

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
        https://lwn.net/Articles/807108/

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Paul Durrant <xadimgnik@gmail.com>
Link: https://lore.kernel.org/r/1594525195-28345-4-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-03 07:43:16 +02:00
Souptick Joarder
a0c34d2251 xen/privcmd: Mark pages as dirty
pages need to be marked as dirty before unpinned it in
unlock_pages() which was oversight. This is fixed now.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Suggested-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Paul Durrant <xadimgnik@gmail.com>
Link: https://lore.kernel.org/r/1594525195-28345-3-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-03 07:42:18 +02:00
Souptick Joarder
e398fb4bdf xen/privcmd: Corrected error handling path
Previously, if lock_pages() end up partially mapping pages, it used
to return -ERRNO due to which unlock_pages() have to go through
each pages[i] till *nr_pages* to validate them. This can be avoided
by passing correct number of partially mapped pages & -ERRNO separately,
while returning from lock_pages() due to error.

With this fix unlock_pages() doesn't need to validate pages[i] till
*nr_pages* for error scenario and few condition checks can be ignored.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Paul Durrant <xadimgnik@gmail.com>
Link: https://lore.kernel.org/r/1594525195-28345-2-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
2020-08-03 07:41:25 +02:00
Linus Torvalds
c6fe44d96f list: add "list_del_init_careful()" to go with "list_empty_careful()"
That gives us ordering guarantees around the pair.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-02 20:39:44 -07:00
Linus Torvalds
2a9127fcf2 mm: rewrite wait_on_page_bit_common() logic
It turns out that wait_on_page_bit_common() had several problems,
ranging from just unfair behavioe due to re-queueing at the end of the
wait queue when re-trying, and an outright bug that could result in
missed wakeups (but probably never happened in practice).

This rewrites the whole logic to avoid both issues, by simply moving the
logic to check (and possibly take) the bit lock into the wakeup path
instead.

That makes everything much more straightforward, and means that we never
need to re-queue the wait entry: if we get woken up, we'll be notified
through WQ_FLAG_WOKEN, and the wait queue entry will have been removed,
and everything will have been done for us.

Link: https://lore.kernel.org/lkml/CAHk-=wjJA2Z3kUFb-5s=6+n0qbTs8ELqKFt9B3pH85a8fGD73w@mail.gmail.com/
Link: https://lore.kernel.org/lkml/alpine.LSU.2.11.2007221359450.1017@eggly.anvils/
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-02 20:39:44 -07:00