When an NFC-DEP Initiator receives a frame with
an incorrect CRC or with a parity error, and the
frame is at least 4 bytes long, its supposed to
send a NACK to the Target. The Initiator can
send up to 'N(retry,nack)' consecutive NACKs
where 2 <= 'N(retry,nack)' <= 5. When the limit
is exceeded, a PROTOCOL EXCEPTION is raised.
Any other type of transmission error is to be
ignored and the Initiator should continue
waiting for a new frame. This is described
in section 14.12.5.4 of the NFC Digital Protocol
Spec.
The digital layer's NFC-DEP code doesn't implement
any of this so add it. This support diverges from
the spec in two significant ways:
a) NACKs will be sent for ANY error reported by the
driver except a timeout. This is done because
there is currently no way for the digital layer
to distinguish a CRC or parity error from any
other type of error reported by the driver.
b) All other errors will cause a PROTOCOL EXCEPTION
even frames with CRC errors that are less than 4
bytes.
The value chosen for 'N(retry,nack)' is 2.
Targets do not send NACK PDUs.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When the peer in an NFC-DEP exchange has a
packet to send that is larger than the local
maximum payload, it sets the 'MI' bit in the
'I' PDU. This indicates that NFC-DEP chaining
is to occur.
When such a PDU is received, the local side
responds with an 'ACK' PDU and this continues
until the peer sends an 'I' PDU with the 'MI'
bit cleared. This indicates that the chaining
sequence is complete and the entire packet has
been transferred.
Receiving chained PDUs is currently not supported
by the digital layer so add that support. When a
chaining sequence is initiated by the peer, the
digital layer will allocate an skb large enough
to hold 8 maximum sized frame payloads. The maximum
payload can range from 64 to 254 bytes so 8 * 254 =
2032 seems like a reasonable compromise between
potentially wasting memory and constantly reallocating
new, larger skbs.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When the NFC-DEP code is given a packet to send
that is larger than the peer's maximum payload,
its supposed to set the 'MI' bit in the 'I' PDU's
Protocol Frame Byte (PFB). Setting this bit
indicates that NFC-DEP chaining is to occur.
When NFC-DEP chaining is progress, sender 'I' PDUs
are acknowledged with 'ACK' PDUs until the last 'I'
PDU in the chain (which has the 'MI' bit cleared)
is responded to with a normal 'I' PDU. This can
occur while in Initiator mode or in Target mode.
Sender NFC-DEP chaining is currently not implemented
in the digital layer so add that support. Unfortunately,
since sending a frame may require writing the CRC to the
end of the data, the relevant data part of the original
skb must be copied for each intermediate frame.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The maximum payload for NFC-DEP exchanges (i.e., the
number of bytes between SoD and EoD) is negotiated
using the ATR_REQ, ATR_RES, and PSL_REQ commands.
The valid maximum lengths are 64, 128, 192, and 254
bytes.
Currently, NFC-DEP code assumes that both sides are
always using 254 byte maximums and ignores attempts
by the peer to change it. Instead, implement the
negotiation code, enforce the local maximum when
receiving data from the peer, and don't send payloads
that exceed the remote's maximum. The default local
maximum is 254 bytes.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
NFC-DEP DEP_REQ and DEP_RES exchanges using 'I'
and 'ACK/NACK' PDUs have a sequence number called
the Packet Number Information (PNI). The PNI
is incremented (modulo 4) after every DEP_REQ/
DEP_RES pair and should be verified by the digital
layer code. That verification isn't always done,
though, so add code to make sure that it is done.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
According to chapter 14 of the NFC-DEP Digital
Protocol Spec., the NAD byte should never be
present in DEP_REQ or DEP_RES frames. However,
this is not enforced so add that enforcement code.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When in Target mode, the Initiator specifies whether
subsequent DEP_REQ and DEP_RES frames will include
a DID byte by the value passed in the ATR_REQ. If
the DID value in the ATR_REQ is '0' then no DID
byte will be included. If the DID value is between
'1' and '14' then a DID byte containing the same
value must be included in subsequent DEP_REQ and
DEP_RES frames. Any other DID value is invalid.
This is specified in sections 14.8.1.2 and 14.8.2.2
of the NFC Digital Protocol Spec.
Checking the DID value (if it should be there at all),
is not currently supported by the digital layer's
NFC-DEP code. Add this support by remembering the
DID value in the ATR_REQ, checking the DID value of
received DEP_REQ frames (if it should be there at all),
and including the remembered DID value in DEP_RES
frames when appropriate.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When in Initiator mode, the digital layer's
NFC-DEP code always sets the Device ID (DID)
value in the ATR_REQ to '0'. This means that
subsequent DEP_REQ and DEP_RES frames must
never include a DID byte. This is specified
in sections 14.8.1.1 and 14.8.2.1 of the NFC
Digital Protocol Spec.
Currently, the digital layer's NFC-DEP code
doesn't enforce this rule so add code to ensure
that there is no DID byte in DEP_RES frames.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Rearrange some of the code in digital_in_recv_dep_res()
and digital_tg_recv_dep_req() so the initial code looks
similar. The real reason is prepare the code for some
upcoming patches that require these changes.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When digital_in_send_cmd() or digital_tg_send_cmd()
fail, they do not free the skb that was passed to
them so the routine that allocated the skb should
free it. Currently, there are several routines in
the NFC-DEP code that don't do this so make them.
Reviewed-by: Thierry Escande <thierry.escande@linux.intel.com>
Tested-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
run again, until the patches for using the physical architected
timers got accepted.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCAAGBQJUcK60AAoJEPOmecmc0R2BPqYH/202nLDZaBuY3ZaXIoRUNZx4
s5oPQWmJ9xSW4K93tnAtrQ3YP9NE4SrSSAKIszeY7sNI0XapdFrT8ARinh3bA8Ir
cCr0iXMGHwpWF9K9nfYM8z+fNIwVOa2KktbmZi9tgOvxfOFkadRXoBj6gdxwN8oO
2UTgbB7iD2F0FRTj+kmrxTML+vkW8LyladQIoyyeCwoFCkUYoPerLpuwvnPEujoO
aN3eot9w7uqP28NwsD9nC5Ns+mzICF9ts3IRCyv1zoGkXpA+7meHvNvtrOW2v3JJ
xdUh8bvCwSMrb0jh07PDfSFMkOZlfNIl8eXUZrxx66ss8VDMCXwVaxLS7llxdCk=
=gBaE
-----END PGP SIGNATURE-----
Merge tag 'v3.19-rockchip-dts3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into next/dt
Pull "temporarily disable rk3288-smp" from Heiko Stuebner:
Disable smp again on rk3288 temporarily to make Olof's boottest
run again, until the patches for using the physical architected
timers got accepted.
* tag 'v3.19-rockchip-dts3' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
ARM: dts: rockchip: temporarily disable smp on rk3288
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The dma_mask and coherent_dma_mask need to be set or DMA memory allocations
will fail with error messages like this:
ep93xx-dma ep93xx-dma-m2p: coherent DMA mask is unset
ep93xx-dma ep93xx-dma-m2m: coherent DMA mask is unset
Add the missing information to the ep93xx-dma-m2p and ep93xx-dma-m2m
devices.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Reported-by: Jeremy Moles <cubicool@gmail.com>
Tested-by: Alexander Sverdlin <subaparts@yandex.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
When injecting a floating interrupt and no CPU is idle we
kick one CPU to do an external exit. In case of I/O we
should trigger an I/O exit instead. This does not matter
for Linux guests as external and I/O interrupts are
enabled/disabled at the same time, but play safe anyway.
The same holds true for machine checks. Since there is no
special exit, just reuse the generic stop exit. The injection
code inside the VCPU loop will recheck anyway and rearm the
proper exits (e.g. control registers) if necessary.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
A couple of our interception handlers rewind the PSW to the beginning
of the instruction to run the intercepted instruction again during the
next SIE entry. This normally works fine, but there is also the
possibility that the instruction did not get run directly but via an
EXECUTE instruction.
In this case, the PSW does not point to the instruction that caused the
interception, but to the EXECUTE instruction! So we've got to rewind the
PSW to the beginning of the EXECUTE instruction instead.
This is now accomplished with a new helper function kvm_s390_rewind_psw().
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch includes two small fixes for the PFMF handler: First, the
start address for PFMF has to be masked according to the current
addressing mode, which is now done with kvm_s390_logical_to_effective().
Second, the protection exceptions have a lower priority than the
specification exceptions, so the check for low-address protection
has to be moved after the last spot where we inject a specification
exception.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch adds the ahci_st.c driver found on STMicroelectronics
stih41x consumer electronics SoC's into the STI arch section
of the maintainers file.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Acked-by: Maxime Coquelin <maxime.coquelin@st.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This option has been marked for deprecation and removal for
a little more than two years, but it's not been very clearly
signalled since it was always possible to just select it.
Make it unselectable now to signal anyone who's still using
it after all this time more clearly. They can still get it
back, but only by patching the kernel.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Replace the ARCH_NR_GPIOS-sized static array of GPIO descriptors by
dynamically-allocated arrays for each GPIO chip.
This change makes gpio_to_desc() perform in O(n) (where n is the number
of GPIO chips registered) instead of O(1), however since n is rarely
bigger than 1 or 2 no noticeable performance issue is expected.
Besides this provides more incentive for GPIO consumers to move to the
gpiod interface. One could use a O(log(n)) structure to link the GPIO
chips together, but considering the low limit of n the hypothetical
performance benefit is probably not worth the added complexity.
This patch uses kcalloc() in gpiochip_add(), which removes the ability
to add a chip before kcalloc() can operate. I am not aware of such
cases, but if someone bisects up to this patch then I will be proven
wrong...
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
gpiolib's gpiod_get_direction() function returns the EINVAL error
if .get_direction callback is not defined.
The patch implements the callback for mxs chip which is useful
for debugging.
Inspired by arch/arm/mach-at91/gpio.c
On the moment the patch is required to get the patch
"serial: mxs-auart: enable PPS support" working.
It is planned to introduce new mctrl_gpio helpers to avoid
gpiod_get_direction() function.
Signed-off-by: Janusz Uzycki <j.uzycki@elproma.com.pl>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Use dynamic allocation of GPIOs instead of looking at the gpio%u alias
in DT.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
secure_computing() is called first in syscall_trace_enter() so that
a system call will be aborted quickly without doing succeeding syscall
tracing if seccomp rules want to deny that system call.
On compat task, syscall numbers for system calls allowed in seccomp mode 1
are different from those on normal tasks, and so _NR_seccomp_xxx_32's need
to be redefined.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
SIGSYS is primarily used in secure computing to notify tracer of syscall
events. This patch allows signal handler on compat task to get correct
information with SA_SIGINFO specified when this signal is delivered.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Those values (__NR_seccomp_*) are used solely in secure_computing()
to identify mode 1 system calls. If compat system calls have different
syscall numbers, asm/seccomp.h may override them.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If tracer modifies a syscall number to -1, this traced system call should
be skipped with a return value specified in x0.
This patch implements this semantics.
Please note:
* syscall entry tracing and syscall exit tracing (ftrace tracepoint and
audit) are always executed, if enabled, even when skipping a system call
(that is, -1).
In this way, we can avoid a potential bug where audit_syscall_entry()
might be called without audit_syscall_exit() at the previous system call
being called, that would cause OOPs in audit_syscall_entry().
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[will: fixed up conflict with blr rework]
Signed-off-by: Will Deacon <will.deacon@arm.com>
This regeset is intended to be used to get and set a system call number
while tracing.
There was some discussion about possible approaches to do so:
(1) modify x8 register with ptrace(PTRACE_SETREGSET) indirectly,
and update regs->syscallno later on in syscall_trace_enter(), or
(2) define a dedicated regset for this purpose as on s390, or
(3) support ptrace(PTRACE_SET_SYSCALL) as on arch/arm
Thinking of the fact that user_pt_regs doesn't expose 'syscallno' to
tracer as well as that secure_computing() expects a changed syscall number,
especially case of -1, to be visible before this function returns in
syscall_trace_enter(), (1) doesn't work well.
We will take (2) since it looks much cleaner.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Simplify cpu_relax() to a simple barrier(). Performance wise this doesn't
seem to make any big difference anymore, since nearly all lock variants
have directed yield semantics in the meantime.
Also this makes s390 behave like all other architectures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case somebody attempted to open the device during online
processing the partition detection ioctl may have failed.
Added a retry loop to avoid not detected partitions.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix race for sleep_on requests leading to list corruption.
The SLEEP_ON_END_TAG is set during CQR clean up. Remove it from
interrupt handler to avoid the CQR from being cleared when it is
still in the device_queue.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
During device activation all paths could be lost and since the device
is not active it has no indication of this fact - hence the CQR will
time-out. The following cancelation might fail with -EINVAL because
CIO took over control and started path verification. In this case mark
the CQR as being CLEARED since it could not be running any more.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The debug_unregister() function performs also input parameter validation.
Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The common code ftrace_return_address(n), which is just a wrapper for
__builtin_return_address(n), will only work for n > 0 if CONFIG_FRAME_POINTER
is set to 'y'. Otherwise it will return 0.
Since on s390 we will never have that config option set to 'y'
ftrace_return_address() won't work at all for n > 0.
Luckily we always compile the kernel with -mkernel-backchain which
in turn means that __builtin_return_address(n) will always work.
So let ftrace_return_address(n) map to __builtin_return_address(n).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Allow to specify the Compoment ID for Call Home via the kernel
configuration. This removes the need for distribution specific
patch against the sclp_async.c source file.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
User can pass snoop option to enable/disable the snoop behavior, but
currently azx_check_snoop_available() always turns it off for some
devices. For better debuggability, change the parameter as bint, and
allow user to enable/disable forcibly the snoop when specified via the
module option.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Add a new driver_caps bit, AZX_DCAPS_SNOOP_OFF, to set the snoop off
as default. This new bit is used for the checks in
azx_check_snoop_available(). Most of case-switches are replaced with
the new dcaps in each entry.
While working on it, for avoiding to spend more bits, combine three
bits AZX_DCAPS_SNOOP_SCH, AZX_DCAPS_SNOOP_ATI and
AZX_DCAPS_SNOOP_NVIDIA bits into a flat type of two bits. This
reduces the bits usages, and assign AZX_DCAPS_OFF to this empty bit
now.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Little cleanup to distinguish each phase easily
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: modify indentation for code readability]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Two regression fixes from Ville.
* tag 'drm-intel-fixes-2014-11-27' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Cancel vdd off work before suspend
drm/i915: Ignore SURFLIVE and flip counter when the GPU gets reset
More on-disk format consolidation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
More on-disk format consolidation. A few declarations that weren't on-disk
format related move into better suitable spots.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Move the on-disk ACL format to xfs_format.h, so that repair can
use the common defintion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
More consolidatation for the on-disk format defintions. Note that the
XFS_IS_REALTIME_INODE moves to xfs_linux.h instead as it is not related
to the on disk format, but depends on a CONFIG_ option.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Here blkno is a daddr_t, which is a __s64; it's possible to hold
a value which is negative, and thus pass the (blkno >= eofs)
test. Then we try to do a xfs_perag_get() for a ridiculous
agno via xfs_daddr_to_agno(), and bad things happen when that
fails, and returns a null pag which is dereferenced shortly
thereafter.
Found via a user-supplied fuzzed image...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The expectation since the introduction the lazy superblock counters is
that the counters are synced and superblock logged appropriately as part
of the filesystem freeze sequence. This does not occur, however, due to
the logic in xfs_fs_writable() that prevents progress when the fs is in
any state other than SB_UNFROZEN.
While this is a bug, it has not been exposed to date because the last
thing XFS does during freeze is dirty the log. The log recovery process
recalculates the counters from AGI/AGF metadata to ensure everything is
correct. Therefore should a crash occur while an fs is frozen, the
subsequent log recovery puts everything back in order. See the following
commit for reference:
92821e2b [XFS] Lazy Superblock Counters
We might not always want to rely on dirtying the log on a frozen fs.
Modify xfs_log_sbcount() to proceed when the filesystem is freezing but
not once the freeze process has completed. Modify xfs_fs_writable() to
accept the minimum freeze level for which modifications should be
blocked to support various codepaths.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The error handling in xfs_qm_log_quotaoff() has a couple problems. If
xfs_trans_commit() fails, we fall through to the error block and call
xfs_trans_cancel(). This is incorrect on commit failure. If
xfs_trans_reserve() fails, we jump to the error block, cancel the tp and
restore the superblock qflags to oldsbqflag. However, oldsbqflag has
been initialized to zero and not yet updated from the original flags so
we set the flags to zero.
Fix up the error handling in xfs_qm_log_quotaoff() to not restore flags
if they haven't been modified and not cancel the tp on commit failure.
Remove the flag restore code altogether because commit error is the only
failure condition and we don't know whether the transaction made it to
disk.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
There's no need to store a full struct xfs_trans_res on the stack in
xfs_create() and copy the fields. Use a pointer to the appropriate
structures embedded in the xfs_mount.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The xfslogd workqueue is a global, single-job workqueue for buffer ioend
processing. This means we allow for a single work item at a time for all
possible XFS mounts on a system. fsstress testing in loopback XFS over
XFS configurations has reproduced xfslogd deadlocks due to the single
threaded nature of the queue and dependencies introduced between the
separate XFS instances by online discard (-o discard).
Discard over a loopback device converts the discard request to a hole
punch (fallocate) on the underlying file. Online discard requests are
issued synchronously and from xfslogd context in XFS, hence the xfslogd
workqueue is blocked in the upper fs waiting on a hole punch request to
be servied in the lower fs. If the lower fs issues I/O that depends on
xfslogd to complete, both filesystems end up hung indefinitely. This is
reproduced reliabily by generic/013 on XFS->loop->XFS test devices with
the '-o discard' mount option.
Further, docker implementations appear to use this kind of configuration
for container instance filesystems by default (container fs->dm->
loop->base fs) and therefore are subject to this deadlock when running
on XFS.
Replace the global xfslogd workqueue with a per-mount variant. This
guarantees each mount access to a single worker and prevents deadlocks
due to inter-fs dependencies introduced by discard. Since the queue is
only responsible for buffer iodone processing at this point in time,
rename xfslogd to xfs-buf.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>