Commit Graph

737480 Commits

Author SHA1 Message Date
Xiao Liang
40d4071ce2 perf/x86/amd/power: Do not load AMD power module on !AMD platforms
The AMD power module can be loaded on non AMD platforms, but unload fails
with the following Oops:

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: __list_del_entry_valid+0x29/0x90
 Call Trace:
  perf_pmu_unregister+0x25/0xf0
  amd_power_pmu_exit+0x1c/0xd23 [power]
  SyS_delete_module+0x1a8/0x2b0
  ? exit_to_usermode_loop+0x8f/0xb0
  entry_SYSCALL_64_fastpath+0x20/0x83

Return -ENODEV instead of 0 from the module init function if the CPU does
not match.

Fixes: c7ab62bfbe ("perf/x86/amd/power: Add AMD accumulated power reporting mechanism")
Signed-off-by: Xiao Liang <xiliang@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180122061252.6394-1-xiliang@redhat.com
2018-01-24 13:00:35 +01:00
Marc Zyngier
c5baa1be8f irqdomain: Kill CONFIG_IRQ_DOMAIN_DEBUG
CONFIG_IRQ_DOMAIN_DEBUG is similar to CONFIG_GENERIC_IRQ_DEBUGFS,
just with less information.

Spring cleanup time.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yang Shunyong <shunyong.yang@hxt-semitech.com>
Link: https://lkml.kernel.org/r/20180117142647.23622-1-marc.zyngier@arm.com
2018-01-24 12:32:58 +01:00
Waiman Long
1df37383a8 x86/retpoline: Remove the esp/rsp thunk
It doesn't make sense to have an indirect call thunk with esp/rsp as
retpoline code won't work correctly with the stack pointer register.
Removing it will help compiler writers to catch error in case such
a thunk call is emitted incorrectly.

Fixes: 76b043848f ("x86/retpoline: Add initial retpoline support")
Suggested-by: Jeff Law <law@redhat.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Kees Cook <keescook@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Link: https://lkml.kernel.org/r/1516658974-27852-1-git-send-email-longman@redhat.com
2018-01-24 12:31:55 +01:00
Kuninori Morimoto
03bbf9f5e4 ASoC: ak4613: call dummy write for PW_MGMT1/3 when Playback
Power Down Release Command (PMVR, PMDAC, RSTN, PMDA1-PMDA6)
which are located on PW_MGMT1 / PW_MGMT3 register must be
write again after at least 5 LRCK cycle or later on each command.
Otherwise, Playback volume will be 0dB.
Basically, it should be

        1.   PowerDownRelease by Power Management1 <= call 1.x after 5LRCK
        1.x  Dummy write      to Power Management1
        2.   PowerDownRelease by Power Management3 <= call 2.x after 5LRCK
        2.x  Dummy write      to Power Management3

To avoid too many dummy write, this patch is merging these.

        1.   PowerDownRelease by Power Management1
        2.   PowerDownRelease by Power Management3   <= call after 5LRCK
        2.x  Dummy write      to Power Management1/3 <= merge dummy write

This patch adds dummy write when Playback Start timing.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-01-24 11:31:25 +00:00
Aaro Koskinen
6507831f0e MIPS: bcm47xx: enable ZBOOT support
Enable ZBOOT support. The WRT54GL router's bootloader limits kernel
size to 3 MB with the normal load address, which is a bit challenging
vmlinux size with modern Linux. A compressed kernel allows booting
much bigger kernels.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/18492/
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-01-24 11:20:29 +00:00
Luis de Bethencourt
cc4804448c MIPS: Fix trailing semicolon
The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.

Fixes: d0f0f63ac1 ("MIPS: Rewrite csum_fold to plain C.")
Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/18517/
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-01-24 11:17:20 +00:00
Kuninori Morimoto
f30a4c313e ASoC: soc-pcm: don't call flush_delayed_work() many times in soc_pcm_private_free()
commit f523acebbb ("ASoC: add Component level pcm_new/pcm_free v2")
added component level pcm_new/pcm_free, but flush_delayed_work()
on soc_pcm_private_free() is called in for_each_rtdcom() loop.
It doesn't need to be called many times.
This patch moves it out of loop.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2018-01-24 10:30:20 +00:00
Amir Goldstein
8383f17488 ovl: wire up NFS export operations
Now that NFS export operations are implemented, enable overlayfs NFS
export support if the "nfs_export" feature is enabled.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:06 +01:00
Amir Goldstein
0617015403 ovl: lookup indexed ancestor of lower dir
ovl_lookup_real() in lower layer walks back lower parents to find the
topmost indexed parent. If an indexed ancestor is found before reaching
lower layer root, ovl_lookup_real() is called recursively with upper
layer to walk back from indexed upper to the topmost connected/hashed
upper parent (or up to root).

ovl_lookup_real() in upper layer then walks forward to connect the topmost
upper overlay dir dentry and ovl_lookup_real() in lower layer continues to
walk forward to connect the decoded lower overlay dir dentry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:05 +01:00
Amir Goldstein
4b91c30a5a ovl: lookup connected ancestor of dir in inode cache
Decoding a dir file handle requires walking backward up to layer root and
for lower dir also checking the index to see if any of the parents have
been copied up.

Lookup overlay ancestor dentry in inode/dentry cache by decoded real
parents to shortcut looking up all the way back to layer root.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:05 +01:00
Amir Goldstein
7a9dadef96 ovl: hash non-indexed dir by upper inode for NFS export
Non-indexed upper dirs are encoded as upper file handles. When NFS export
is enabled, hash non-indexed directory inodes by upper inode, so we can
find them in inode cache using the decoded upper inode.

When NFS export is disabled, directories are not indexed on copy up, so
hash non-indexed directory inodes by origin inode, the same hash key
that is used before copy up.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:04 +01:00
Amir Goldstein
988925164f ovl: decode pure lower dir file handles
Similar to decoding a pure upper dir file handle, decoding a pure lower
dir file handle is implemented by looking an overlay dentry of the same
path as the pure lower path and verifying that the overlay dentry's
real lower matches the decoded real lower file handle.

Unlike the case of upper dir file handle, the lookup of overlay path by
lower real path can fail or find a mismatched overlay dentry if any of
the lower parents have been copied up and renamed. To address this case
we will need to check if any of the lower parents are indexed.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:04 +01:00
Amir Goldstein
3b0bfc6ed3 ovl: decode indexed dir file handles
Decoding an indexed dir file handle is done by looking up the file handle
in index dir by name and then decoding the upper dir from the index origin
file handle. The decoded upper path is used to lookup an overlay dentry of
the same path.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:03 +01:00
Amir Goldstein
9436a1a339 ovl: decode lower file handles of unlinked but open files
Lookup overlay inode in cache by origin inode, so we can decode a file
handle of an open file even if the index has a whiteout index entry to
mark this overlay inode was unlinked.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:03 +01:00
Amir Goldstein
f71bd9cfb6 ovl: decode indexed non-dir file handles
Decoding an indexed non-dir file handle is similar to decoding a lower
non-dir file handle, but additionally, we lookup the file handle in index
dir by name to find the real upper inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:03 +01:00
Amir Goldstein
f941866fc4 ovl: decode lower non-dir file handles
Decoding a lower non-dir file handle is done by decoding the lower dentry
from underlying lower fs, finding or allocating an overlay inode that is
hashed by the real lower inode and instantiating an overlay dentry with
that inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:02 +01:00
Amir Goldstein
03e1c584ff ovl: encode lower file handles
For indexed or lower non-dir, encode a non-connectable lower file handle
from origin inode. For indexed or lower dir, when ofs->numlower == 1,
encode a lower file handle from lower dir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:02 +01:00
Amir Goldstein
05e1f11816 ovl: copy up before encoding non-connectable dir file handle
Decoding a merge dir, whose origin's parent is under a redirected
lower dir is not always possible. As a simple aproximation, we do
not encode lower dir file handles when overlay has multiple lower
layers and origin is below the topmost lower layer.

We should later relax this condition and copy up only the parent
that is under a redirected lower.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:01 +01:00
Amir Goldstein
b305e8443f ovl: encode non-indexed upper file handles
We only need to encode origin if there is a chance that the same object was
encoded pre copy up and then we need to stay consistent with the same
encoding also after copy up.

In case a non-pure upper is not indexed, then it was copied up before NFS
export support was enabled. In that case, we don't need to worry about
staying consistent with pre copy up encoding and we encode an upper file
handle.

This mitigates the problem that with no index, we cannot find an upper
inode from origin inode, so we cannot decode a non-indexed upper from
origin file handle.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:01 +01:00
Amir Goldstein
3985b70a3e ovl: decode connected upper dir file handles
Until this change, we decoded upper file handles by instantiating an
overlay dentry from the real upper dentry. This is sufficient to handle
pure upper files, but insufficient to handle merge/impure dirs.

To that end, if decoded real upper dir is connected and hashed, we
lookup an overlay dentry with the same path as the real upper dir.
If decoded real upper is non-dir, we instantiate a disconnected overlay
dentry as before this change.

Because ovl_fh_to_dentry() returns a connected overlay dir dentry,
exportfs never needs to call get_parent() and get_name() to reconnect an
upper overlay dir. Because connectable non-dir file handles are not
supported, exportfs will not be able to use fh_to_parent() and get_name()
methods to reconnect a disconnected non-dir to its parent. Therefore, the
methods get_parent() and get_name() are implemented just to print out a
sanity warning and the method fh_to_parent() is implemented to warn the
user that using the 'subtree_check' exportfs option is not supported.

An alternative approach could have been to implement instantiating of
an overlay directory inode from origin/index and implement get_parent()
and get_name() by calling into underlying fs operations and them
instantiating the overlay parent dir.

The reasons for not choosing the get_parent() approach were:
- Obtaining a disconnected overlay dir dentry would requires a
  delicate re-factoring of ovl_lookup() to get a dentry with overlay
  parent info. It was preferred to avoid doing that re-factoring unless
  it was proven worthy.
- Going down the path of disconnected dir would mean that the (non
  trivial) code path of d_splice_alias() could be traveled and that
  meant writing more tests and introduces race cases that are very hard
  to hit on purpose. Taking the path of connecting overlay dentry by
  forward lookup is therefore the safe and boring way to avoid surprises.

The culprits of the chosen "connected overlay dentry" approach:
- We need to take special care to rename of ancestors while connecting
  the overlay dentry by real dentry path. These subtleties are usually
  handled by generic exportfs and VFS code.
- In a hypothetical workload, we could end up in a loop trying to connect,
  interrupted by rename and restarting connect forever.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:00 +01:00
Amir Goldstein
8556a4205b ovl: decode pure upper file handles
Decoding an upper file handle is done by decoding the upper dentry from
underlying upper fs, finding or allocating an overlay inode that is
hashed by the real upper inode and instantiating an overlay dentry with
that inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:26:00 +01:00
Amir Goldstein
8ed5eec9d6 ovl: encode pure upper file handles
Encode overlay file handles as struct ovl_fh containing the file handle
encoding of the real upper inode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:59 +01:00
Amir Goldstein
a01f64b5c0 ovl: document NFS export
Document NFS export design.
Followup patches will implement this design.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:59 +01:00
Miklos Szeredi
f9c34674bc vfs: factor out helpers d_instantiate_anon() and d_alloc_anon()
Those helpers are going to be used by overlayfs to implement
NFS export decode.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:59 +01:00
Amir Goldstein
c62520a83b ovl: store 'has_upper' and 'opaque' as bit flags
We need to make some room in struct ovl_entry to store information
about redirected ancestors for NFS export, so cram two booleans as
bit flags.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:58 +01:00
Amir Goldstein
aa3ff3c152 ovl: copy up of disconnected dentries
With NFS export, some operations on decoded file handles (e.g. open,
link, setattr, xattr_set) may call copy up with a disconnected non-dir.
In this case, we will copy up lower inode to index dir without
linking it to upper dir.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:58 +01:00
Amir Goldstein
829c28be9b ovl: use d_splice_alias() in place of d_add() in lookup
This is required for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:57 +01:00
Amir Goldstein
0aceb53e73 ovl: do not pass overlay dentry to ovl_get_inode()
This is needed for using ovl_get_inode() for decoding file handles
for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:57 +01:00
Amir Goldstein
91ffe7beb3 ovl: factor out ovl_get_index_fh() helper
The helper is needed to lookup an index by file handle for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:56 +01:00
Amir Goldstein
24f0b17203 ovl: whiteout orphan index entries on mount
Orphan index entries are non-dir index entries whose union nlink count
dropped to zero. With index=on, orphan index entries are removed on
mount. With NFS export feature enabled, orphan index entries are replaced
with white out index entries to block future open by handle from opening
the lower file.

When dir index has a stale 'upper' xattr, we assume that the upper dir
was removed and we treat the dir index as orphan entry that needs to be
whited out or removed.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:56 +01:00
Amir Goldstein
e7dd0e7134 ovl: whiteout index when union nlink drops to zero
With NFS export feature enabled, when overlay inode nlink drops to
zero, instead of removing the index entry, replace it with a whiteout
index entry.

This is needed for NFS export in order to prevent future open by handle
from opening the lower file directly.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:56 +01:00
Amir Goldstein
89a17556ce ovl: cleanup dir index when dir nlink drops to zero
When non-dir index union nlink drops to zero the non-dir index
is cleaned. Do the same for directory type index entries when
union directory is removed.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:55 +01:00
Amir Goldstein
016b720f55 ovl: index directories on copy up for NFS export
With the NFS export feature enabled, all dirs are indexed on copy up.
Non-dir files are copied up directly to indexdir and then hardlinked
to upper dir.

Directories are copied up to indexdir, then an index entry is created
in indexdir with 'upper' xattr pointing to the copied up dir and then
the copied up dir is moved to upper dir.

Directory index is also used for consistency verification, like
detecting multiple redirected dirs to the same lower dir on lookup.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:55 +01:00
Amir Goldstein
fbd2d2074b ovl: index all non-dir on copy up for NFS export
With the NFS export feature enabled, all non-dir are indexed on copy up.
The copy up origin inode of an indexed non-dir can be used as a unique
identifier of the overlay object.

The full index is also used for consistency verfication, like detecting
multiple non-hardlink uppers with the same 'origin' on lookup.

Directory index on copy up will be implemented by following patch.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:54 +01:00
Amir Goldstein
24b33ee104 ovl: create ovl_need_index() helper
The helper determines which lower file needs to be indexed
on copy up and before nlink changes.

For index=on, the helper evaluates to true for lower hardlinks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:54 +01:00
Amir Goldstein
9ee60ce249 ovl: cleanup temp index entries
A previous failed attempt to create or whiteout a directory index may
leave index entries named '#%x' in the index dir. Cleanup those temp
entries on mount instead of failing the mount.

In the future, we may drop 'work' dir and use 'index' dir instead.
This change is enough for cleaning up copy up leftovers 'from the future',
but it is not enough for cleaning up rmdir leftovers 'from the future'
(i.e. temp dir containing whiteouts).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:53 +01:00
Amir Goldstein
e8f9e5b780 ovl: verify directory index entries on mount
Directory index entries should have 'upper' xattr pointing to the real
upper dir. Verifying that the upper dir file handle is not stale is
expensive, so only verify stale directory index entries on mount if
NFS export feature is enabled.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:53 +01:00
Amir Goldstein
7db25d36d9 ovl: verify whiteout index entries on mount
Whiteout index entries are used as an indication that an exported
overlay file handle should be treated as stale (i.e. after unlink
of the overlay inode).

Check on mount that whiteout index entries have a name that looks like
a valid file handle and cleanup invalid index entries.

For whiteout index entries, do not check that they also have valid
origin fh and nlink xattr, because those xattr do not exist for a
whiteout index entry.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:53 +01:00
Amir Goldstein
ad1d615cec ovl: use directory index entries for consistency verification
A directory index is a directory type entry in index dir with a
"trusted.overlay.upper" xattr containing an encoded ovl_fh of the merge
directory upper dir inode.

On lookup of non-dir files, lower file is followed by origin file handle.
On lookup of dir entries, lower dir is found by name and then compared
to origin file handle. We only trust dir index if we verified that lower
dir matches origin file handle, otherwise index may be inconsistent and
we ignore it.

If we find an indexed non-upper dir or an indexed merged dir, whose
index 'upper' xattr points to a different upper dir, that means that the
lower directory may be also referenced by another upper dir via redirect,
so we fail the lookup on inconsistency error.

To be consistent with directory index entries format, the association of
index dir to upper root dir, that was stored by older kernels in
"trusted.overlay.origin" xattr is now stored in "trusted.overlay.upper"
xattr. This also serves as an indication that overlay was mounted with a
kernel that support index directory entries. For backward compatibility,
if an 'origin' xattr exists on the index dir we also verify it on mount.

Directory index entries are going to be used for NFS export.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:52 +01:00
Amir Goldstein
86eaa13046 ovl: unbless lower st_ino of unverified origin
On a malformed overlay, several redirected dirs can point to the same
dir on a lower layer. This presents a similar challenge as broken
hardlinks, because different objects in the overlay can return the same
st_ino/st_dev pair from stat(2).

For broken hardlinks, we do not provide constant st_ino on copy up to
avoid this inconsistency. When NFS export feature is enabled, apply
the same logic to files and directories with unverified lower origin.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:52 +01:00
Amir Goldstein
37b12916c0 ovl: verify stored origin fh matches lower dir
When the NFS export feature is enabled, overlayfs implicitly enables the
feature "verify_lower". When the "verify_lower" feature is enabled, a
directory inode found in lower layer by name or by redirect_dir is
verified against the file handle of the copy up origin that is stored in
the upper layer.

This introduces a change of behavior for the case of lower layer
modification while overlay is offline. A lower directory created or
moved offline under an exisitng upper directory, will not be merged with
that upper directory.

The NFS export feature should not be used after copying layers, because
the new lower directory inodes would fail verification and won't be
merged with upper directories.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:51 +01:00
Amir Goldstein
f168f1098d ovl: add support for "nfs_export" configuration
Introduce the "nfs_export" config, module and mount options.

The NFS export feature depends on the "index" feature and enables two
implicit overlayfs features: "index_all" and "verify_lower".
The "index_all" feature creates an index on copy up of every file and
directory. The "verify_lower" feature uses the full index to detect
overlay filesystems inconsistencies on lookup, like redirect from
multiple upper dirs to the same lower dir.

NFS export can be enabled for non-upper mount with no index. However,
because lower layer redirects cannot be verified with the index, enabling
NFS export support on an overlay with no upper layer requires turning off
redirect follow (e.g. "redirect_dir=nofollow").

The full index may incur some overhead on mount time, especially when
verifying that lower directory file handles are not stale.

NFS export support, full index and consistency verification will be
implemented by following patches.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-01-24 11:25:37 +01:00
Daniel Borkmann
1fc0bf49e4 Merge branch 'bpf-samples-sockmap-improvements'
John Fastabend says:

====================
The sockmap sample is pretty simple at the moment. All it does is open
a few sockets attach BPF programs/sockmaps and sends a few packets.

However, for testing and debugging I wanted to have more control over
the sendmsg format and data than provided by tools like iperf3/netperf,
etc. The reason is for testing BPF programs and stream parser it is
helpful to be able submit multiple sendmsg calls with different msg
layouts. For example lots of 1B iovs or a single large MB of data, etc.

Additionally, my current test setup requires an entire orchestration
layer (cilium) to run. As well as lighttpd and http traffic generators
or for kafka testing brokers and clients. This makes it a bit more
difficult when doing performance optimizations to incrementally test
small changes and come up with performance delta's and perf numbers.

By adding a few more options and an additional few tests the sockmap
sample program can show a more complete example and do some of the
above. Because the sample program is self contained it doesn't require
additional infrastructure to run either.

This series, although still fairly crude, does provide some nice
additions. They are

  - a new sendmsg tests with a sender and recv threads
  - a new base tests so we can get metrics/data without BPF
  - multiple GBps of throughput on base and sendmsg tests
  - automatically set rlimit and common variables

That said the UI is still primitive, more features could be added,
more tests might be useful, the reporting is bare bones, etc. But,
IMO lets push this now rather than sit on it for weeks until I get
time to do the above improvements. Additional patches can address
the other limitations/issues. Another thing I am considering is
moving this into selftests, after a few more fixes so we avoid
false failures, so that we get more sockmap testing.

v2: removed bogus file added by patch 3/7
v3: 1/7 replace goto out with returns, remove sighandler update,
    2/7 free iov in error cases
    3/7 fix bogus makefile change, bail out early on errors
v4: add Martin's "nits" and ACKs along with fixes to 2/7 iov free
    also pointed out by Martin.

Thanks Daniel and Martin for the reviews!
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-24 10:47:00 +01:00
John Fastabend
8e0ef38052 bpf: sockmap set rlimit
Avoid extra step of setting limit from cmdline and do it directly in
the program.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-24 10:46:59 +01:00
John Fastabend
ede154776c bpf: sockmap put client sockets in blocking mode
Put client sockets in blocking mode otherwise with sendmsg tests
its easy to overrun the socket buffers which results in the test
being aborted.

The original non-blocking was added to handle listen/accept with
a single thread the client/accepted sockets do not need to be
non-blocking.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-24 10:46:59 +01:00
John Fastabend
ce5373be1a bpf: sockmap sample add base test without any BPF for comparison
Add a base test that does not use BPF hooks to test baseline case.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-24 10:46:59 +01:00
John Fastabend
66fdd1a3cd bpf: sockmap sample, report bytes/sec
Report bytes/sec sent as well as total bytes. Useful to get rough
idea how different configurations and usage patterns perform with
sockmap.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-24 10:46:59 +01:00
John Fastabend
d7d6437acf bpf: sockmap sample, use fork() for send and recv
Currently for SENDMSG tests first send completes then recv runs. This
does not work well for large data sizes and/or many iterations. So
fork the recv and send handler so that we run both send and recv. In
the future we can add a parameter to do more than a single fork of
tx/rx.

With this we can get many GBps of data which helps exercise the
sockmap code.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-24 10:46:58 +01:00
John Fastabend
eaf8c6eec5 bpf: add sendmsg option for testing BPF programs
When testing BPF programs using sockmap I often want to have more
control over how sendmsg is exercised. This becomes even more useful
as new sockmap program types are added.

This adds a test type option to select type of test to run. Currently,
only "ping" and "sendmsg" are supported, but more can be added as
needed.

The new help argument gives the following,

 Usage: ./sockmap --cgroup <cgroup_path>
 options:
 --help         -h
 --cgroup       -c
 --rate         -r
 --verbose      -v
 --iov_count    -i
 --length       -l
 --test         -t

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-24 10:46:58 +01:00
John Fastabend
6627426fa2 bpf: refactor sockmap sample program update for arg parsing
sockmap sample program takes arguments from cmd line but it reads them
in using offsets into the array. Because we want to add more arguments
in the future lets do proper argument handling.

Also refactor code to pull apart sock init and ping/pong test. This
allows us to add new tests in the future.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-24 10:46:58 +01:00