Rename struct xfs_legacy_ictimestamp to struct xfs_log_legacy_timestamp
as it is a type used for logging timestamps with no relationship to the
in-core inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Rename xfs_ictimestamp_t to xfs_log_timestamp_t as it is a type used
for logging timestamps with no relationship to the in-core inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Very late in the cycle but both risky if left unfixed and more or less
obvious..
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Very late in the cycle but both risky if left unfixed and more or less
obvious.."
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails
vhost-vdpa: protect concurrent access to vhost device iotlb
Set err = -ENOMEM if dma_map_sg_attrs() fails so the function returns
an error.
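The class of bug fixed here can be sketched in user-space C. This is a
minimal sketch, not the mlx5 code: map_pages() and register_region() are
hypothetical stand-ins, and the only assumption carried over is that the
mapping call reports failure by returning 0. The point is that the failure
path must assign err before jumping to the cleanup label, otherwise the
function returns the stale 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mapping call that returns the number of
 * mapped entries, with 0 meaning failure. */
static int map_pages(int simulate_failure)
{
    return simulate_failure ? 0 : 4;
}

/* Returns 0 on success or a negative errno value on failure. */
static int register_region(int simulate_failure)
{
    int err = 0;
    int nent;
    void *scratch = malloc(64);   /* scratch space released on every path */

    if (!scratch)
        return -ENOMEM;

    nent = map_pages(simulate_failure);
    if (!nent) {
        err = -ENOMEM;            /* without this line the caller sees 0 */
        goto out_free;
    }
    assert(nent > 0);
    /* ... the mapping would be used here ... */

out_free:
    free(scratch);
    return err;
}
```

Calling register_region(1) exercises the failure path and yields -ENOMEM;
register_region(0) yields 0.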
Fixes: 94abbccdf2 ("vdpa/mlx5: Add shared memory registration code")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20210411083646.910546-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Protect vhost device iotlb by vhost_dev->mutex. Otherwise,
it might cause corruption of the list and interval tree in
struct vhost_iotlb if userspace sends the VHOST_IOTLB_MSG_V2
message concurrently.
Fixes: 4c8cf318 ("vhost: introduce vDPA-based backend")
Cc: stable@vger.kernel.org
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20210412095512.178-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Kunihiko Hayashi says:
====================
Change phy-mode to RGMII-ID to enable delay pins for RTL8211E
UniPhier PXs2, LD20, and PXs3 boards have an RTL8211E ethernet phy, and the
phy has the RX/TX delays of the RGMII interface enabled using pull-ups on
the RXDLY and TXDLY pins.
After commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e rx/tx
delay config"), the delays work correctly; however, "rgmii" means
no delay and the phy doesn't work. So the phy-mode needs to be set to
"rgmii-id" to show that RX/TX delays are enabled.
Changes since v1:
- Fix the commit message
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
UniPhier LD20 and PXs3 boards have an RTL8211E ethernet phy, and the phy has
the RX/TX delays of the RGMII interface enabled using pull-ups on the RXDLY
and TXDLY pins.
After commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e rx/tx
delay config"), the delays work correctly; however, "rgmii" means
no delay and the phy doesn't work. So the phy-mode needs to be set to
"rgmii-id" to show that RX/TX delays are enabled.
Fixes: c73730ee4c ("arm64: dts: uniphier: add AVE ethernet node")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UniPhier PXs2 boards have an RTL8211E ethernet phy, and the phy has the RX/TX
delays of the RGMII interface enabled using pull-ups on the RXDLY and TXDLY
pins.
After commit bbc4d71d63 ("net: phy: realtek: fix rtl8211e rx/tx
delay config"), the delays work correctly; however, "rgmii" means
no delay and the phy doesn't work. So the phy-mode needs to be set to
"rgmii-id" to show that RX/TX delays are enabled.
Fixes: e3cc931921 ("ARM: dts: uniphier: add AVE ethernet node")
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mohammad Athari Bin Ismail says:
====================
Enable DWMAC HW descriptor prefetch
This patch series adds a setting for HW descriptor prefetch for DWMAC
version 5.20 onwards. For Intel platforms, enable the capability by
default.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
DWMAC Core 5.20 onwards supports HW descriptor prefetching.
Additionally, it also depends on platform specific RTL configuration.
This capability can be enabled by setting DMA_Mode bit-19 (DCHE).
So, to enable this capability, the platform must set
plat->dma_cfg->dche = true and the DWMAC core version must be 5.20 onwards.
Otherwise, this capability won't be configured.
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handling comm_channel_event in mlx4_master_comm_channel uses a double
loop to determine which slaves have requested work. The search is
always started at the lowest slave. This leads to unfairness; lower VFs
tend to be prioritized over higher VFs.
The patch uses find_next_bit to determine which slaves to handle.
Fairness is implemented by always starting the search one position after
the previous start.
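The rotating start can be sketched in user-space C as follows. This is a
minimal sketch under stated assumptions, not the mlx4 code: next_set_bit()
is a hand-written stand-in for the kernel's find_next_bit(), the pending
slaves fit in one 64-bit word, and service_round() and NSLAVES are
hypothetical names. Each round visits all pending slaves beginning at the
current start and wrapping around, then moves the start forward by one.

```c
#include <assert.h>
#include <stdint.h>

#define NSLAVES 64u

/* Stand-in for find_next_bit(): index of the first set bit at or after
 * 'start', or NSLAVES if there is none. */
static unsigned int next_set_bit(uint64_t pending, unsigned int start)
{
    for (unsigned int i = start; i < NSLAVES; i++)
        if (pending & (UINT64_C(1) << i))
            return i;
    return NSLAVES;
}

/* Records in 'order' the sequence in which pending slaves are served,
 * beginning at *start and wrapping around once. Advances *start by one so
 * the next round begins one position later. Returns the number served. */
static unsigned int service_round(uint64_t pending, unsigned int *start,
                                  unsigned int order[NSLAVES])
{
    unsigned int n = 0;
    unsigned int i;

    assert(*start < NSLAVES);

    /* First pass: from the rotating start to the end of the bitmap. */
    for (i = next_set_bit(pending, *start); i < NSLAVES;
         i = next_set_bit(pending, i + 1))
        order[n++] = i;

    /* Second pass: wrap around and cover the bits below the start. */
    for (i = next_set_bit(pending, 0); i < *start;
         i = next_set_bit(pending, i + 1))
        order[n++] = i;

    *start = (*start + 1) % NSLAVES;
    return n;
}
```

With slaves 0 and 3 pending, the first round serves them as 0, 3 and the
second round as 3, 0, so the lowest slave is no longer always first.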
An MPI program has been used to measure improvements. It runs 500
ibv_reg_mr, synchronizes with all other instances and then runs 500
ibv_dereg_mr.
The results when running 500 processes; the time reported is for 500
calls:
ibv_reg_mr:
          Mod.        Org.
mlx4_1    403.356ms   424.674ms
mlx4_2    403.355ms   424.674ms
mlx4_3    403.354ms   424.674ms
mlx4_4    403.355ms   424.674ms
mlx4_5    403.357ms   424.677ms
mlx4_6    403.354ms   424.676ms
mlx4_7    403.357ms   424.675ms
mlx4_8    403.355ms   424.675ms
ibv_dereg_mr:
          Mod.        Org.
mlx4_1    116.408ms   142.818ms
mlx4_2    116.434ms   142.793ms
mlx4_3    116.488ms   143.247ms
mlx4_4    116.679ms   143.230ms
mlx4_5    112.017ms   107.204ms
mlx4_6    112.032ms   107.516ms
mlx4_7    112.083ms   184.195ms
mlx4_8    115.089ms   190.618ms
Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The problem is that bnxt_show_temp() returns long but "rc" is an int
and "len" is a u32. With ternary operations the type promotion is quite
tricky. The negative "rc" is first converted to u32 and then to long so
it ends up being a high positive value instead of a negative one as we
intended.
Fix this by removing the ternary.
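The conversion can be reproduced in a few lines of user-space C. This is a
minimal sketch with hypothetical function names, using unsigned int in
place of u32 and assuming a platform where long is wider than unsigned int
(for example LP64 Linux). In show_buggy() both arms of the conditional
operator are brought to the common type unsigned int before the result is
widened to long; in show_fixed() each return statement converts its own
operand directly.

```c
#include <assert.h>

/* Buggy shape: the int arm is converted to unsigned int first, so a
 * negative rc becomes a large positive value before it reaches long. */
static long show_buggy(int rc, unsigned int len)
{
    assert(rc <= 0);              /* rc is 0 or a negative error code */
    return rc ? rc : len;
}

/* Fixed shape: no common type is computed; int and unsigned int are each
 * converted to long on their own. */
static long show_fixed(int rc, unsigned int len)
{
    assert(rc <= 0);
    if (rc)
        return rc;
    return len;
}
```

On such a platform show_buggy(-5, 3) is positive while show_fixed(-5, 3)
is -5.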
Fixes: d69753fa1e ("bnxt_en: return proper error codes in bnxt_show_temp")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several variables are being initialized with values that are never
read and are updated later with a new value. The initializations
are redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210422120412.246291-1-colin.king@canonical.com
Use the new helper function and avoid the unnecessary second lock/unlock,
which was present in the old approach with thermal_cdev_update().
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210422153624.6074-4-lukasz.luba@arm.com
Use the new helper function and avoid the unnecessary second lock/unlock,
which was present in the old approach with thermal_cdev_update().
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210422153624.6074-3-lukasz.luba@arm.com
The tz->lock must be held while looping over the instances in that
thermal zone. This lock was missing in the governor code since the
beginning, so it's hard to point to a particular commit.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210422153624.6074-2-lukasz.luba@arm.com
The cooling device state change generates an event even when there is no
need, because the temperature is low and the device is not throttled. Avoid
unnecessarily updating the cooling device, which also means not sending
the event. The cooling device state has not changed because the temperature
is still below the first activation trip point value, so we can do this.
Add a tracking mechanism to make sure it updates cooling devices only
once - when the temperature drops below the first trip point.
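One way to picture such a tracking mechanism is a single flag that is
re-armed whenever the zone is actively throttled. The following is a
minimal user-space sketch, not the governor code: struct zone, throttle()
and update_cooling_devices() are hypothetical names, and the events counter
only stands in for the notifications the message mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct zone {
    int first_trip;        /* first activation trip point */
    bool update_cdevs;     /* true while the devices may still need a reset */
    unsigned int events;   /* counts cooling-device updates */
};

/* Hypothetical stand-in for pushing a state to the cooling devices. */
static void update_cooling_devices(struct zone *z)
{
    z->events++;
}

static void throttle(struct zone *z, int temp)
{
    assert(z != NULL);

    if (temp < z->first_trip) {
        /* Release the devices only on the first pass below the trip. */
        if (z->update_cdevs) {
            update_cooling_devices(z);
            z->update_cdevs = false;
        }
        return;
    }

    /* At or above the trip point: manage the devices and re-arm the flag. */
    z->update_cdevs = true;
    update_cooling_devices(z);
}
```

Repeated calls with a low temperature produce one update, not one per call.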
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210422114308.29684-4-lukasz.luba@arm.com
When the temperature is below the first activation trip point the cooling
devices are not checked, so they cannot maintain fresh statistics. As a
result, when the temperature crosses the first trip point, the statistics
are stale and show a state for a very long period. This affects the IPA
algorithm's calculation and leads to wrong decisions. Thus, check the
cooling devices even when the temperature is low, to refresh these
statistics.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210422114308.29684-3-lukasz.luba@arm.com
Add optional dma-coherent property to binding doc.
Found by 'make dtbs_check' on arm64/amlogic DT files.
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210421204833.18523-2-khilman@baylibre.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Take a pass at cleaning up a bunch of warnings
from 'make dtbs_check' that have crept in.
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210421204833.18523-1-khilman@baylibre.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Merge tag 'sunxi-fixes-for-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes
One fix for the MMC card detect on the Pine H64 board
* tag 'sunxi-fixes-for-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
Link: https://lore.kernel.org/r/45fc5e4d-ef48-4729-a869-79a8f288bb83.lettre@localhost
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
If a generic XDP program changes the destination MAC address from/to
multicast/broadcast, the skb->pkt_type is updated to properly handle
the packet when passed up the stack. When changing the MAC from/to
the NIC's MAC, PACKET_HOST/OTHERHOST is not updated, though, making
the behavior different from that of native XDP.
Remember the PACKET_HOST/OTHERHOST state before calling the program
in generic XDP, and update pkt_type accordingly if the destination
MAC address has changed. As eth_type_trans() assumes a default
pkt_type of PACKET_HOST, restore that before calling it.
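The idea can be modelled in user-space C. This is a minimal sketch under
stated assumptions, not the generic XDP code: the enum, classify() and
pkt_type_after_prog() are hypothetical names, and classify() only
approximates how a destination MAC maps to a packet type. The host and
multicast state of the destination address is sampled before and after the
program runs, and the type is recomputed only when one of them changed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

enum pkt_type { PKT_HOST, PKT_BROADCAST, PKT_MULTICAST, PKT_OTHERHOST };

/* The lowest bit of the first octet marks group addresses. */
static int is_multicast(const uint8_t mac[MAC_LEN])
{
    return mac[0] & 1;
}

static int is_broadcast(const uint8_t mac[MAC_LEN])
{
    static const uint8_t bcast[MAC_LEN] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    return memcmp(mac, bcast, MAC_LEN) == 0;
}

/* Approximate mapping from destination MAC to packet type. */
static enum pkt_type classify(const uint8_t dest[MAC_LEN],
                              const uint8_t dev[MAC_LEN])
{
    if (is_broadcast(dest))
        return PKT_BROADCAST;
    if (is_multicast(dest))
        return PKT_MULTICAST;
    return memcmp(dest, dev, MAC_LEN) == 0 ? PKT_HOST : PKT_OTHERHOST;
}

/* 'before' and 'after' are the destination MAC before and after the
 * program ran; 'dev' is the address of the receiving device. */
static enum pkt_type pkt_type_after_prog(enum pkt_type old_type,
                                         const uint8_t before[MAC_LEN],
                                         const uint8_t after[MAC_LEN],
                                         const uint8_t dev[MAC_LEN])
{
    int host_before = memcmp(before, dev, MAC_LEN) == 0;
    int host_after = memcmp(after, dev, MAC_LEN) == 0;

    assert(old_type <= PKT_OTHERHOST);
    if (host_before != host_after ||
        is_multicast(before) != is_multicast(after))
        return classify(after, dev);
    return old_type;
}
```

A packet addressed to another host whose destination is rewritten to the
device's own address changes from PKT_OTHERHOST to PKT_HOST in this model.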
The use case for this is when an XDP program wants to push received
packets up the stack by rewriting the MAC to the NIC's MAC, for
example by cluster nodes sharing MAC addresses.
Fixes: 2972495699 ("net: fix generic XDP to handle if eth header was mangled")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210419141559.8611-1-martin@strongswan.org
When the timeout occurs, we still have to run the subsequent steps that
release the patch request. Otherwise, the PHY would stay without a link.
Therefore, use break to stop the loop of loading firmware and
release the patch request rather than return the function directly.
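The control-flow difference can be shown with a small user-space sketch.
The names here (struct phy, load_firmware(), the fail_at parameter) are
hypothetical and only model the shape of the fix: leaving the loop with
break lets execution reach the release step, whereas an early return would
skip it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct phy {
    bool patch_requested;   /* true while the patch request is held */
    int blocks_loaded;
};

/* Loads 'nblocks' firmware blocks; block index 'fail_at' simulates a
 * timeout (pass -1 for no failure). */
static int load_firmware(struct phy *p, int nblocks, int fail_at)
{
    int ret = 0;

    assert(nblocks >= 0);
    p->patch_requested = true;          /* take the patch request */

    for (int i = 0; i < nblocks; i++) {
        if (i == fail_at) {
            ret = -ETIMEDOUT;
            break;                      /* not 'return': still release below */
        }
        p->blocks_loaded++;
    }

    p->patch_requested = false;         /* always release the patch request */
    return ret;
}
```

After a simulated timeout the function reports -ETIMEDOUT and the request
flag is cleared again.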
Fixes: 4a51b0e8a0 ("r8152: support PHY firmware for RTL8156 series")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-04-22
This series contains updates to virtchnl header file, ice, and iavf
drivers.
Vignesh adds support to warn about potentially malicious VFs; those that
are overflowing the mailbox for the ice driver.
Michal adds support for an allowlist/denylist of VF commands based on
supported capabilities for the ice driver.
Brett adds support for iavf UDP segmentation offload by adding the
capability bit to virtchnl, advertising support in the ice driver, and
enabling it in the iavf driver. He also adds a helper function for
getting the VF VSI for ice.
Colin Ian King removes an unneeded pointer assignment.
Qi enables support in the ice driver to support virtchnl requests from
the iavf to configure its own RSS input set. This includes adding new
capability bits, structures, and commands to virtchnl header file.
Haiyue enables configuring RSS flow hash via ethtool to support TCP, UDP
and SCTP protocols in iavf.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an urgent regression fix for a tpm patch set that went in this
merge window. It looks like a rebase before the original pull request
lost a tpm_try_get_ops() so we have a lock imbalance in our code which
is causing oopses. The original patch was correct on the mailing
list.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
Merge tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/tpmdd
Pull tpm fix from James Bottomley:
"This is an urgent regression fix for a tpm patch set that went in this
merge window. It looks like a rebase before the original pull request
lost a tpm_try_get_ops() so we have a lock imbalance in our code which
is causing oopses. The original patch was correct on the mailing list.
I'm sending this in agreement with Mimi (as joint maintainers of
trusted keys) because Jarkko is off communing with the Reindeer or
whatever it is Finns do when on holiday"
* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/tpmdd:
KEYS: trusted: Fix TPM reservation for seal/unseal
Upon file deletion, zero out all fields in ext4_dir_entry2 besides rec_len.
In case sensitive data is stored in filenames, this ensures no potentially
sensitive data is left in the directory entry upon deletion. Also, wipe
these fields upon moving a directory entry during the conversion to an
htree and when splitting htree nodes.
The data wiped may still exist in the journal, but there are future
commits planned to address this.
Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
Link: https://lore.kernel.org/r/20210422180834.2242353-1-leah.rumancik@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric has noticed that after the pagecache read rework, generic/418 is
occasionally failing for ext4 when blocksize < pagesize. In fact, the
pagecache rework just made a hard-to-hit race in ext4 more likely. The
problem is that since the ext4 conversion of direct IO writes to the iomap
framework (commit 378f32bab3), we update the inode size after a direct IO
write only after invalidating the page cache. Thus if a buffered read
sneaks in at an unfortunate moment like:
  CPU1 - write at offset 1k                 CPU2 - read from offset 0
  iomap_dio_rw(..., IOMAP_DIO_FORCE_WAIT);
                                            ext4_readpage();
  ext4_handle_inode_extension()
the read will zero out the tail of the page as it still sees the smaller
inode size, and thus the page cache becomes inconsistent with the on-disk
contents, with all the consequences.
Fix the problem by moving the inode size update into the end_io handler,
which gets called before the page cache is invalidated.
Reported-and-tested-by: Eric Whitney <enwlinux@gmail.com>
Fixes: 378f32bab3 ("ext4: introduce direct I/O write using iomap infrastructure")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20210415155417.4734-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There are a few warnings about empty debug macros in this driver:
drivers/net/ethernet/neterion/vxge/vxge-main.c: In function 'vxge_probe':
drivers/net/ethernet/neterion/vxge/vxge-main.c:4480:76: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
4480 | "Failed in enabling SRIOV mode: %d\n", ret);
Change them to proper 'do { } while (0)' expressions to make the
code a little more robust and avoid the warnings.
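The pattern looks roughly like this in isolation. This is a minimal sketch
with hypothetical names (demo_debug, DEMO_DEBUG, enable_feature), not the
vxge macros: when debugging is compiled out, the macro expands to an empty
do/while(0) block, which is still one complete statement and so remains
valid as the body of an if/else.

```c
#include <assert.h>
#include <stdio.h>

#ifdef DEMO_DEBUG
#define demo_debug(fmt, ...) fprintf(stderr, fmt, __VA_ARGS__)
#else
/* An empty expansion would leave 'if (ret) ;' behind, which is what
 * -Wempty-body complains about. */
#define demo_debug(fmt, ...) do { } while (0)
#endif

/* Returns 1 on success, or passes a negative error code through. */
static int enable_feature(int ret)
{
    assert(ret <= 0);     /* ret is 0 or a negative error code */

    if (ret)
        demo_debug("Failed in enabling feature: %d\n", ret);
    else
        ret = 1;
    return ret;
}
```

The if/else compiles the same way whether or not DEMO_DEBUG is defined.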
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure that the poll system call returns proper error flags when the port
is removed (nullified port ops), allowing the user side to fail properly,
without further reads or writes.
Fixes: 9a44c1cc63 ("net: Add a WWAN subsystem")
Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the sampling truncation length is invalid (zero), pass the length
of the packet. Without the fix, no payload is reported to user space
when the truncation length is zero.
Fixes: a8700c3dd0 ("netdevsim: Add dummy psample implementation")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A link time bug that I had fixed before has come back now that
another sub-module was added to the enetc driver:
ERROR: modpost: "enetc_ierb_register_pf" [drivers/net/ethernet/freescale/enetc/fsl-enetc.ko] undefined!
The problem is that the enetc Makefile is not actually used for
the ierb module if that is the only built-in driver in there
and everything else is a loadable module.
Fix it by always entering the directory this time, regardless
of which symbols are configured. This should reliably fix the
problem and prevent it from coming back another time.
Fixes: 112463ddbe ("net: dsa: felix: fix link error")
Fixes: e7d48e5fbf ("net: enetc: add a mini driver for the Integrated Endpoint Register Block")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MANA driver causes a build failure in some configurations when
it selects an unavailable symbol:
WARNING: unmet direct dependencies detected for PCI_HYPERV
Depends on [n]: PCI [=y] && X86_64 [=y] && HYPERV [=n] && PCI_MSI [=y] && PCI_MSI_IRQ_DOMAIN [=y] && SYSFS [=y]
Selected by [y]:
- MICROSOFT_MANA [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICROSOFT [=y] && PCI_MSI [=y] && X86_64 [=y]
drivers/pci/controller/pci-hyperv.c: In function 'hv_irq_unmask':
drivers/pci/controller/pci-hyperv.c:1217:9: error: implicit declaration of function 'hv_set_msi_entry_from_desc' [-Werror=implicit-function-declaration]
1217 | hv_set_msi_entry_from_desc(&params->int_entry.msi_entry, msi_desc);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
A PCI driver should never depend on a particular host bridge
implementation in the first place, but if we have this dependency
it's better to express it as a 'depends on' rather than 'select'.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing downshift params without a software reset has no effect,
so call genphy_soft_reset() after changing downshift params.
As the datasheet says:
Changes to these bits are disruptive to the normal operation therefore,
any changes to these registers must be followed by software reset
to take effect.
Fixes: 5c6bc5199b ("net: phy: marvell: add downshift support for M88E1111")
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing downshift params without a software reset has no effect,
so call genphy_soft_reset() after changing downshift params.
As the datasheet says:
Changes to these bits are disruptive to the normal operation therefore,
any changes to these registers must be followed by software reset
to take effect.
Fixes: 911af5e149 ("net: phy: marvell: fix downshift function naming")
Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new flag LANDLOCK_CREATE_RULESET_VERSION to
landlock_create_ruleset(2). This makes it possible to retrieve a Landlock ABI
version that is useful to efficiently follow a best-effort security
approach. Indeed, it would be a missed opportunity to abort the whole
sandbox building, because some features are unavailable, instead of
protecting users as much as possible with the subset of features
provided by the running kernel.
This new flag enables user space to identify the minimum set of Landlock
features supported by the running kernel without relying on a filesystem
interface (e.g. /proc/version, which might be inaccessible) nor testing
multiple syscall argument combinations (i.e. syscall bisection). New
Landlock features will be documented and tied to a minimum version
number (greater than 1). The current version will be incremented for
each new kernel release supporting new Landlock features. User space
libraries can leverage this information to seamlessly restrict processes
as much as possible while being compatible with newer APIs.
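A user-space query along these lines might look as follows. This is a
minimal sketch under stated assumptions: it calls the raw syscall because
a libc wrapper may not exist, it supplies fallback definitions for the
syscall number (444 on most architectures) and the flag value (bit 0) in
case the build host's headers predate Landlock, and the helper names
landlock_abi_version() and feature_available() are hypothetical. Both
fallback values should be checked against the installed kernel headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for build hosts whose headers predate Landlock (assumed
 * values; verify against your own kernel headers). */
#ifndef __NR_landlock_create_ruleset
#define __NR_landlock_create_ruleset 444
#endif
#ifndef LANDLOCK_CREATE_RULESET_VERSION
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif

/* Returns the Landlock ABI version (1 or greater), or 0 when Landlock is
 * unavailable for any reason (old kernel, disabled, call denied). */
static int landlock_abi_version(void)
{
    long ret = syscall(__NR_landlock_create_ruleset, NULL, 0,
                       LANDLOCK_CREATE_RULESET_VERSION);

    return ret < 0 ? 0 : (int)ret;
}

/* Best-effort check: a feature tied to 'min_abi' is usable only if the
 * running kernel reports at least that version. */
static int feature_available(int abi, int min_abi)
{
    assert(min_abi >= 1);   /* documented versions start at 1 */
    return abi >= min_abi;
}
```

A sandboxing library could call landlock_abi_version() once at startup and
then gate each optional restriction on feature_available(), falling back to
a weaker policy when the version is too low.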
This is a much lighter approach than the previous
landlock_get_features(2): the complexity is pushed to user space
libraries. This flag meets similar needs as securityfs versions:
selinux/policyvers, apparmor/features/*/version* and tomoyo/version.
Supporting this flag now will be convenient for backward compatibility.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210422154123.13086-14-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
Add a first document describing userspace API: how to define and enforce
a Landlock security policy. This is explained with a simple example.
The Landlock system calls are described with their expected behavior and
current limitations.
Another document is dedicated to kernel developers, describing guiding
principles and some important kernel structures.
This documentation can be built with the Sphinx framework.
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-13-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
Add a basic sandbox tool to launch a command which can only access a
list of file hierarchies in a read-only or read-write way.
Cc: James Morris <jmorris@namei.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-12-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
Test all Landlock system calls, ptrace hook semantics and filesystem
access-control with multiple layouts.
Test coverage for security/landlock/ is 93.6% of lines. The code not
covered only deals with internal kernel errors (e.g. memory allocation)
and race conditions.
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-11-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
These 3 system calls are designed to be used by unprivileged processes
to sandbox themselves:
* landlock_create_ruleset(2): Creates a ruleset and returns its file
descriptor.
* landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a
ruleset, identified by the dedicated file descriptor.
* landlock_restrict_self(2): Enforces a ruleset on the calling thread
and its future children (similar to seccomp). This syscall has the
same usage restrictions as seccomp(2): the caller must have the
no_new_privs attribute set or have CAP_SYS_ADMIN in the current user
namespace.
All these syscalls have a "flags" argument (not currently used) to
enable extensibility.
Here are the motivations for these new syscalls:
* A sandboxed process may not have access to file systems, including
/dev, /sys or /proc, but it should still be able to add more
restrictions to itself.
* Neither prctl(2) nor seccomp(2) (which was used in a previous version)
fit well with the current definition of a Landlock security policy.
All passed structs (attributes) are checked at build time to ensure that
they don't contain holes and that they are aligned the same way for each
architecture.
See the user and kernel documentation for more details (provided by a
following commit):
* Documentation/userspace-api/landlock.rst
* Documentation/security/landlock.rst
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Link: https://lore.kernel.org/r/20210422154123.13086-9-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
The sb_delete security hook is called when shutting down a superblock,
which may be useful to release kernel objects tied to the superblock's
lifetime (e.g. inodes).
This new hook is needed by Landlock to release (ephemerally) tagged
struct inodes. This comes from the unprivileged nature of Landlock
described in the next commit.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-7-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
Using Landlock objects and ruleset, it is possible to tag inodes
according to a process's domain. To enable an unprivileged process to
express a file hierarchy, it first needs to open a directory (or a file)
and pass this file descriptor to the kernel through
landlock_add_rule(2). When checking if a file access request is
allowed, we walk from the requested dentry to the real root, following
the different mount layers. The accesses granted by each "tagged" inode are
collected according to their rule layer level, and ANDed to create
access to the requested file hierarchy. This makes it possible to identify
a lot of files without tagging every inode or modifying the
filesystem, while still following the view and understanding the user
has of the filesystem.
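The walk can be illustrated with a toy model in user-space C. It is a
sketch under simplifying assumptions and not the Landlock implementation:
struct node stands in for a dentry with a parent pointer, mount layers are
ignored, allowed[] holds the access bits a rule grants on that node for
each policy layer (0 if the node is untagged), and a layer counts as
satisfied once a single node on the path grants the whole request. Access
is allowed only when every layer is satisfied, which is the ANDing
described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LAYERS   4
#define ACCESS_READ  1u
#define ACCESS_WRITE 2u

struct node {
    const struct node *parent;      /* NULL at the root */
    uint32_t allowed[MAX_LAYERS];   /* per-layer bits granted on this node */
};

/* Returns 1 if each of the 'nlayers' layers grants 'request' on some node
 * between 'n' and the root, 0 otherwise. */
static int access_allowed(const struct node *n, uint32_t request,
                          unsigned int nlayers)
{
    uint32_t pending;

    assert(nlayers <= MAX_LAYERS);
    pending = (1u << nlayers) - 1;  /* one bit per unsatisfied layer */

    for (; n && pending; n = n->parent)
        for (unsigned int l = 0; l < nlayers; l++)
            if ((n->allowed[l] & request) == request)
                pending &= ~(1u << l);

    return pending == 0;
}
```

Only the tagged ancestors carry rules; files beneath them are covered
without being tagged themselves.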
Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
keep the same struct inodes for the same inodes while these inodes are
in use.
This commit adds a minimal set of supported filesystem access-control
which doesn't make it possible to restrict all file-related actions. This is the
result of multiple discussions to minimize the code of Landlock to ease
review. Thanks to the Landlock design, extending this access-control
without breaking user space will not be a problem. Moreover, seccomp
filters can be used to restrict the use of syscall families which may
not be currently handled by Landlock.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: James Morris <jmorris@namei.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210422154123.13086-8-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
Move management of the superblock->sb_security blob out of the
individual security modules and into the security infrastructure.
Instead of allocating the blobs from within the modules, the modules
tell the infrastructure how much space is required, and the space is
allocated there.
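The division of labour can be sketched in user-space C. This is a minimal
sketch with hypothetical names (reserve_sb_blob(), alloc_sb_blob(),
module_sb_data()), not the LSM infrastructure itself, and it ignores
alignment: each module declares how many bytes it needs and receives an
offset, and the infrastructure performs one allocation per superblock
sized for all modules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Running total of blob space requested by all modules. */
static size_t sb_blob_size;

/* Called once per module at init time: reserves 'need' bytes and returns
 * the module's offset into the shared blob. */
static size_t reserve_sb_blob(size_t need)
{
    size_t offset = sb_blob_size;

    sb_blob_size += need;
    return offset;
}

/* Called by the infrastructure for each new superblock: one zeroed
 * allocation covers every module. */
static void *alloc_sb_blob(void)
{
    return calloc(1, sb_blob_size ? sb_blob_size : 1);
}

/* A module locates its own data by adding its reserved offset. */
static void *module_sb_data(void *blob, size_t offset)
{
    assert(blob != NULL);
    assert(offset <= sb_blob_size);
    return (char *)blob + offset;
}
```

Two modules reserving 16 and 8 bytes get offsets 0 and 16 and share a
single 24-byte allocation.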
Cc: John Johansen <john.johansen@canonical.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-6-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
Using ptrace(2) and related debug features on a target process can lead
to a privilege escalation. Indeed, ptrace(2) can be used by an attacker
to impersonate another task and to remain undetected while performing
malicious activities. Thanks to ptrace_may_access(), various parts of
the kernel can check if a tracer is more privileged than a tracee.
A landlocked process has fewer privileges than a non-landlocked process
and must then be subject to additional restrictions when manipulating
processes. To be allowed to use ptrace(2) and related syscalls on a
target process, a landlocked process must have a subset of the target
process's rules (i.e. the tracee must be in a sub-domain of the tracer).
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-5-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
A process's credentials point to a Landlock domain, which is implemented
underneath as a ruleset. In the following commits, this domain is
used to check and enforce the ptrace and filesystem security policies.
A domain is inherited from a parent to its child the same way a thread
inherits a seccomp policy.
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-4-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
A Landlock ruleset is mainly a red-black tree with Landlock rules as
nodes. This enables quick update and lookup to match a requested
access, e.g. to a file. A ruleset is usable through a dedicated file
descriptor (cf. following commit implementing syscalls) which enables a
process to create and populate a ruleset with new rules.
A domain is a ruleset tied to a set of processes. This group of rules
defines the security policy enforced on these processes and their future
children. A domain can transition to a new domain which is the
intersection of all its constraints and those of a ruleset provided by
the current process. This modification only impacts the current process.
This means that a process can only gain more constraints (i.e. lose
accesses) over time.
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20210422154123.13086-3-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>
A Landlock object makes it possible to identify a kernel object (e.g. an inode).
A Landlock rule is a set of access rights allowed on an object. Rules
are grouped in rulesets that may be tied to a set of processes (i.e.
subjects) to enforce a scoped access-control (i.e. a domain).
Because Landlock's goal is to empower any process (especially
unprivileged ones) to sandbox themselves, we cannot rely on a
system-wide object identification such as file extended attributes.
Indeed, we need innocuous, composable and modular access-controls.
The main challenge with these constraints is to identify kernel objects
while this identification is useful (i.e. when a security policy makes
use of this object). But this identification data should be freed once
no policy is using it. This ephemeral tagging should not and may not be
written in the filesystem. We then need to manage the lifetime of a
rule according to the lifetime of its objects. To avoid a global lock,
this implementation makes use of RCU and counters to safely reference
objects.
A following commit uses this generic object management for inodes.
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210422154123.13086-2-mic@digikod.net
Signed-off-by: James Morris <jamorris@linux.microsoft.com>