Jumping to the very next instruction is not very useful:
jmp label
label:
Removing the jump.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-5-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is a preparatory patch for moving stub32_execve[at]() to this
file. It makes sense to have all execve stubs in one place, so
that they can reuse code.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Interrupt entry points are handled with the following code,
each 32-byte code block contains seven entry points:
...
[push][jump 22] // 4 bytes
[push][jump 18] // 4 bytes
[push][jump 14] // 4 bytes
[push][jump 10] // 4 bytes
[push][jump 6] // 4 bytes
[push][jump 2] // 4 bytes
[push][jump common_interrupt][padding] // 8 bytes
[push][jump]
[push][jump]
[push][jump]
[push][jump]
[push][jump]
[push][jump]
[push][jump common_interrupt][padding]
[padding_2]
common_interrupt:
And there is a table which holds pointers to every entry point,
IOW: to every push.
In cold cache, two jumps are still costlier than one, even
though we get the benefit of them residing in the same
cacheline.
This change replaces short jumps with near ones to
'common_interrupt', and pads every push+jump pair to 8 bytes. This
way, each interrupt takes only one jump.
This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before
dispatch table with ".align 8" - we do not need anything
stronger than that.
The table of entry addresses (the interrupt[] array) is no
longer necessary, the address of entries can be easily
calculated as (irq_entries_start + i*8).
text data bss dec hex filename
12546 0 0 12546 3102 entry_64.o.before
11626 0 0 11626 2d6a entry_64.o
The size decrease is because 1656 bytes of .init.rodata are
gone. That's initdata, though. The resident size does go up a
bit.
Run-tested (32 and 64 bits).
Acked-and-Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428090553-7283-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This change does two things:
Copy-pastes "retint_swapgs:" code into syscall handling code,
the copy is under "syscall_return:" label. The code is unchanged
apart from some label renames.
Removes "opportunistic sysret" code from "retint_swapgs:" code
block, since now it won't be reached by syscall return. This in
fact removes most of the code in question.
text data bss dec hex filename
12530 0 0 12530 30f2 entry_64.o.before
12562 0 0 12562 3112 entry_64.o
Run-tested.
Acked-and-Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427993219-7291-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is my sigreturn test, added mostly unchanged from its old
home. It exercises the sigreturn(2) syscall, specifically
focusing on its interactions with various IRET corner cases.
It tests for correct behavior in several areas that were
historically dangerously buggy. For example, it exercises espfix
on kernels of both bitnesses under various conditions, and it
contains testcases for several now-fixed bugs in IRET error
handling.
If you run it on older kernels without the fixes, your system will
crash. It probably won't eat your data in the process.
There is no released kernel on which the sigreturn_64 test will
pass, but it passes on tip:x86/asm.
I plan to switch to lib.mk for Linux 4.2.
I'm not using the ksft_ helpers at all yet. I can do that later.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuah.kh@samsung.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/89d10b76b92c7202d8123654dc8d36701c017b3d.1428386971.git.luto@kernel.org
[ Fixed empty format string GCC build warning in trivial_32bit_program.c ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull networking fixes from David Miller:
1) In TCP, don't register an FRTO for cumulatively ACK'd data that was
previously SACK'd, from Neal Cardwell.
2) Need to hold RNL mutex in ipv4 multicast code namespace cleanup,
from Cong WANG.
3) Similarly we have to hold RNL mutex for fib_rules_unregister(), also
from Cong WANG.
4) Revert and rework netns nsid allocation fix, from Nicolas Dichtel.
5) When we encapsulate for a tunnel device, skb->sk still points to the
user socket. So this leads to cases where we retraverse the
ipv4/ipv6 output path with skb->sk being of some other address
family (f.e. AF_PACKET). This can cause things to crash since the
ipv4 output path is dereferencing an AF_PACKET socket as if it were
an ipv4 one.
The short term fix for 'net' and -stable is to elide these socket
checks once we've entered an encapsulation sequence by testing
xmit_recursion.
Longer term we have a better solution wherein we pass the tunnel's
socket down through the output paths, but that is way too invasive
for 'net' and -stable.
From Hannes Frederic Sowa.
6) l2tp_init() failure path forgets to unregister per-net ops, from
Cong WANG.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net/mlx4_core: Fix error message deprecation for ConnectX-2 cards
net: dsa: fix filling routing table from OF description
l2tp: unregister l2tp_net_ops on failure path
mvneta: dont call mvneta_adjust_link() manually
ipv6: protect skb->sk accesses from recursive dereference inside the stack
netns: don't allocate an id for dead netns
Revert "netns: don't clear nsid too early on removal"
ip6mr: call del_timer_sync() in ip6mr_free_table()
net: move fib_rules_unregister() under rtnl lock
ipv4: take rtnl_lock and mark mrt table as freed on namespace cleanup
tcp: fix FRTO undo on cumulative ACK of SACKed range
xen-netfront: transmit fully GSO-sized packets
Commit 1daa4303b4 ("net/mlx4_core: Deprecate error message at
ConnectX-2 cards startup to debug") did the deprecation only for port 1
of the card. Need to deprecate for port 2 as well.
Fixes: 1daa4303b4 ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to description in 'include/net/dsa.h', in cascade switches
configurations where there are more than one interconnected devices,
'rtable' array in 'dsa_chip_data' structure is used to indicate which
port on this switch should be used to send packets to that are destined
for corresponding switch.
However, dsa_of_setup_routing_table() fills 'rtable' with port numbers
of the _target_ switch, but not current one.
This commit removes redundant devicetree parsing and adds needed port
number as a function argument. So dsa_of_setup_routing_table() now just
looks for target switch number by parsing parent of 'link' device node.
To remove possible misunderstandings with the way of determining target
switch number, a corresponding comment was added to the source code and
to the DSA device tree bindings documentation file.
This was tested on a custom board with two Marvell 88E6095 switches with
following corresponding routing tables: { -1, 10 } and { 8, -1 }.
Signed-off-by: Pavel Nakonechny <pavel.nakonechny@skitlab.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull input fixes from Dmitry Torokhov:
"Updates for the input subsystem - two more tweaks for ALPS driver to
work out kinks after splitting the touchpad, trackstick, and potential
external PS/2 mouse into separate input devices.
Changes to support ALPS SS4 devices (protocol V8) will be coming in
4.1..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: alps - document stick behavior for protocol V2
Input: alps - report V2 Dualpoint Stick events via the right evdev node
Input: alps - report interleaved bare PS/2 packets via dev3
mvneta_adjust_link() is a callback for of_phy_connect() and should
not be called directly. The result of calling it directly is as below:
Signed-off-by: David S. Miller <davem@davemloft.net>
We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.
ipv6 does not conform with this in three places:
1) ip6_fragment: we do consult ipv6_npinfo for frag_size
2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
loop the packet back to the local socket
3) ip6_skb_dst_mtu could query the settings from the user socket and
force a wrong MTU
Furthermore:
In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
PF_PACKET socket ontop of an IPv6-backed vxlan device.
Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take a look at the first instruction byte before optimizing the NOP -
there might be something else there already, like the ALTERNATIVE_2()
in rdtsc_barrier() which NOPs out on AMD even though we just
patched in an MFENCE.
This happens because the alternatives sees X86_FEATURE_MFENCE_RDTSC,
AMD CPUs set it, we patch in the MFENCE and right afterwards it sees
X86_FEATURE_LFENCE_RDTSC which AMD CPUs don't set and we blindly
optimize the NOP.
Checking whether at least the first byte is 0x90 prevents that.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1428181662-18020-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On failure, sys_execve() does not clobber EXTRA_REGS, so we can
just return to userpsace without saving/restoring them.
On success, ELF_PLAT_INIT() in sys_execve() clears all these
registers.
On other executable formats:
- binfmt_flat.c has similar FLAT_PLAT_INIT, but x86 (and everyone
else except sh) doesn't define it.
- binfmt_elf_fdpic.c has ELF_FDPIC_PLAT_INIT, but x86 (and most
others) doesn't define it.
- There are no such hooks in binfmt_aout.c et al. We inherit
EXTRA_REGS from the prior executable.
This inconsistency was not intended.
This change removes SAVE/RESTORE_EXTRA_REGS in stub_execve,
removes register clearing in ELF_PLAT_INIT(),
and instead simply clears them on success in stub_execve.
Run-tested.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428173719-7637-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The 'pax' argument is unnecesary. Instead, store the RAX value
directly in regs.
This pattern goes all the way back to 2.1.106pre1, when restore_sigcontext()
was changed to return an error code instead of EAX directly:
https://git.kernel.org/cgit/linux/kernel/git/history/history.git/diff/arch/i386/kernel/signal.c?id=9a8f8b7ca3f319bd668298d447bdf32730e51174
In 2007 sigaltstack syscall support was added, where the return
value of restore_sigcontext() was changed to carry the memory-copying
failure code.
But instead of putting 'ax' into regs->ax directly, it was carried
in via a pointer and then returned, where the generic syscall return
code copied it to regs->ax.
So there was never any deeper reason for this suboptimal pattern, it
was simply never noticed after being introduced.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1428152303-17154-1-git-send-email-brgerst@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Document that protocol V2 uses standard (bare) PS/2 mouse packets for the
DualPoint stick.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-By: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
On V2 devices the DualPoint Stick reports bare packets, these should be
reported via the "AlpsPS/2 ALPS DualPoint Stick" dev2 evdev node, which also
has the INPUT_PROP_POINTING_STICK propbit set.
Note that since there is no way to distinguish these packets from an external
PS/2 mouse (insofar as these laptops have an external PS/2 port) this means
that we will be reporting PS/2 mouse events via this evdev node too, as we've
been doing in kernel 3.19 and older.
This has been tested on a Dell Latitude D620 and a Dell Latitude E6400,
which both have a V2 touchpad + a DualPoint Stick which reports bare packets.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Bare packets should be reported via the same evdev device independent on
whether they are detected on the beginning of a packet or in the middle
of a packet.
This has been tested on a Dell Latitude E6400, where the DualPoint Stick
reports bare packets, which get reported via dev3 when the touchpad is
idle, and via dev2 when the touchpad and stick are used simultaneously.
This commit fixes this inconsistency by always reporting bare packets via
dev3. Note that since the come from a DualPoint Stick they really should be
reported via dev2, this gets fixed in a later commit.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Here are some small USB fixes and new device ids for 4.0-rc6. Nothing
major, some xhci fixes for reported problems, and some usb-serial device
ids.
All have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlUfxcMACgkQMUfUDdst+ykhvQCfXSkt4KzzKt0nwpNR/NwX5cvD
OQYAnjpRbyzjMiRezIwdd6HQAOUkVPof
=Sa+B
-----END PGP SIGNATURE-----
Merge tag 'usb-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes and new device ids for 4.0-rc6. Nothing
major, some xhci fixes for reported problems, and some usb-serial
device ids.
All have been in linux-next for a while"
* tag 'usb-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: ftdi_sio: Use jtag quirk for SNAP Connect E10
usb: isp1760: fix spin unlock in the error path of isp1760_udc_start
usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers
usb: xhci: handle Config Error Change (CEC) in xhci driver
USB: keyspan_pda: add new device id
USB: ftdi_sio: Added custom PID for Synapse Wireless product
Here are some staging driver fixes, well, really all just IIO driver
fixes, for 4.0-rc6. They fix issues that have been reported with these
drivers.
All of these patches have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlUfxVEACgkQMUfUDdst+yni4wCgl2FjuX1rCevXsakajPgIengK
l3sAoJDoBpymSdCWNt9nFokeMJSkT+Oe
=nRb4
-----END PGP SIGNATURE-----
Merge tag 'staging-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some staging driver fixes, well, really all just IIO driver
fixes, for 4.0-rc6. They fix issues that have been reported with
these drivers.
All of these patches have been in linux-next for a while"
* tag 'staging-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: imu: Use iio_trigger_get for indio_dev->trig assignment
iio: adc: vf610: use ADC clock within specification
iio/adc/cc10001_adc.c: Fix !HAS_IOMEM build
iio: core: Fix double free.
iio:inv-mpu6050: Fix inconsistency for the scale channel
staging: iio: dummy: Fix undefined symbol build error
iio: inv_mpu6050: Clear timestamps fifo while resetting hardware fifo
staging: iio: hmc5843: Set iio name property in sysfs
iio: bmc150: change sampling frequency
iio: fix drivers that check buffer->scan_mask
Here are 3 serial driver fixes for 4.0-rc6. They fix some reported
issues with the samsung and fsl_lpuart drivers.
All have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlUfxjQACgkQMUfUDdst+ykl3gCfXUHCu4NypB8RvqjThs6TMIgG
kpUAoLvnaNRD+HLyCMxDphxuN7DlBMVq
=9SnO
-----END PGP SIGNATURE-----
Merge tag 'tty-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are 3 serial driver fixes for 4.0-rc6. They fix some reported
issues with the samsung and fsl_lpuart drivers.
All have been in linux-next for a while"
* tag 'tty-4.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: fsl_lpuart: clear receive flag on FIFO flush
tty: serial: fsl_lpuart: specify transmit FIFO size
serial: samsung: Clear operation mode on UART shutdown
Quentin caught a corner case with the generation of instruction
padding in the ALTERNATIVE_2 macro: if len(orig_insn) <
len(alt1) < len(alt2), then not enough padding gets added and
that is not good(tm) as we could overwrite the beginning of the
next instruction.
Luckily, at the time of this writing, we don't have
ALTERNATIVE_2() invocations which have that problem and even if
we did, a simple fix would be to prepend the instructions with
enough prefixes so that that corner case doesn't happen.
However, best it would be if we fixed it properly. See below for
a simple, abstracted example of what we're doing.
So what we ended up doing is, we compute the
max(len(alt1), len(alt2)) - len(orig_insn)
and feed that value to the .skip gas directive. The max() cannot
have conditionals due to gas limitations, thus the fancy integer
math.
With this patch, all ALTERNATIVE_2 sites get padded correctly;
generating obscure test cases pass too:
#define alt_max_short(a, b) ((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))
#define gen_skip(orig, alt1, alt2, marker) \
.skip -((alt_max_short(alt1, alt2) - (orig)) > 0) * \
(alt_max_short(alt1, alt2) - (orig)),marker
.pushsection .text, "ax"
.globl main
main:
gen_skip(1, 2, 4, 0x09)
gen_skip(4, 1, 2, 0x10)
...
.popsection
Thanks to Quentin for catching it and double-checking the fix!
Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150404133443.GE21152@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull input subsystem fixes from Dmitry Torokhov:
"A fix for ALPS driver for issue introduced in the latest update and a
tweak for yet another Lenovo box in Synaptics.
There will be more ALPS tweaks coming.."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: define INPUT_PROP_ACCELEROMETER behavior
Input: synaptics - fix min-max quirk value for E440
Input: synaptics - add quirk for Thinkpad E440
Input: ALPS - fix max coordinates for v5 and v7 protocols
Input: add MT_TOOL_PALM
Pull block layer fix from Jens Axboe:
"Just one patch in this pull request, fixing a regression caused by a
'mathematically correct' change to lcm()"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix blk_stack_limits() regression due to lcm() change
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a SYSRET single-stepping fix, a dmi-scan robustization
fix, a reboot quirk and a kgdb fixlet"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kgdb/x86: Fix reporting of 'si' in kgdb on x86_64
x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set
x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk
MAINTAINERS: Change the x86 microcode loader maintainer
firmware: dmi_scan: Prevent dmi_num integer overflow
Simple bugfix for bad device tree data on the PA-Semi platform
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVHba/AAoJEMWQL496c2LNihcP/1oPJb/JrBkYMGyhldZpJED4
QSQe77rwyU5Fnc5xpIa/smbyoTZBbHrvaL/q0v/mUPNPKaiOs74JNnReXqEyj6Af
97N4Xwv9+s8o2WtJLLPsjeJ/HxEwiSm3GJmTDRybbhbDJjHY6LMe3LIwloz5gWD/
I0HJ4L7Wm4lFJNY2QMS2aM1khJzrr9uES6qkRq3NEhhU6m2ftbcCDXkBFNVi6L7g
tSzA+aDlTk73knlxdFG++Z0J06JhCO5e2tAUHvZfgp21ni/mLSk44hoVuxqacpqK
Gtsob6CcT1HMjGHCfq8SSSf93pllMOJLNCes8Y2nxBNSjF8KYucMRwJLqiVTdqBR
g6h4dMBrqAujAouYg27Db8NP2sx+xaX/4/NkhptVhNwCigZTO8eBY/yR8NfYmorR
x9ryVga7cmd3FYWNh1mRiuduGPHpPho7VdUbrVSb0QEeWYvF60TDMM/xipMdonk7
ZvjHzJarcr9WW00kKzAKSzzH6/sRa/djbkVkJJcN4KjFZsIM5OVKo3WlNbBdXtxs
G/BRu5tq/wvypaIfT6cd2WwxC0dg7RGeqRIez+ifTUot6rCZopLnZmcrOH6i3Kou
Mid64FxV65a9XZb472yQPNVr7dIEow1cB6fc0JWyUMziQSO4oLgNvgQBHAWN65aD
AxXiY5qOZOzT4XAlI9ER
=BjMX
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
Pull devicetree fix from Grant Likely:
"Simple bugfix for bad device tree data on the PA-Semi platform"
* tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux:
drivers/of: Add empty ranges quirk for PA-Semi
Pull CIFS fixes from Steve French:
"A set of small cifs fixes fixing a memory leak, kernel oops, and
infinite loop (and some spotted by Coverity)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
Fix warning
Fix another dereference before null check warning
CIFS: session servername can't be null
Fix warning on impossible comparison
Fix coverity warning
Fix dereference before null check warning
Don't ignore errors on encrypting password in SMBTcon
Fix warning on uninitialized buftype
cifs: potential memory leaks when parsing mnt opts
cifs: fix use-after-free bug in find_writable_file
cifs: smb2_clone_range() - exit on unhandled error
First, let's explain the problem.
Suppose you have an ipip interface that stands in the netns foo and its link
part in the netns bar (so the netns bar has an nsid into the netns foo).
Now, you remove the netns bar:
- the bar nsid into the netns foo is removed
- the netns exit method of ipip is called, thus our ipip iface is removed:
=> a netlink message is built in the netns foo to advertise this deletion
=> this netlink message requests an nsid for bar, thus a new nsid is
allocated for bar and never removed.
This patch adds a check in peernet2id() so that an id cannot be allocated for
a netns which is currently destroyed.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts
commit 4217291e59 ("netns: don't clear nsid too early on removal").
This is not the right fix, it introduces races.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit:
e2b32e6785 ("x86, kaslr: randomize module base load address")
made module base address randomization unconditional and didn't regard
disabled KKASLR due to CONFIG_HIBERNATION and command line option
"nokaslr". For more info see (now reverted) commit:
f47233c2d3 ("x86/mm/ASLR: Propagate base load address calculation")
In order to propagate KASLR status to kernel proper, we need a single bit
in boot_params.hdr.loadflags and we've chosen bit 1 thus leaving the
top-down allocated bits for bits supposed to be used by the bootloader.
Originally-From: Jiri Kosina <jkosina@suse.cz>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit:
4214a16b02 ("x86/asm/entry/64/compat: Use SYSRETL to return from compat mode SYSENTER")
removed the last user of ENABLE_INTERRUPTS_SYSEXIT32. Kill the
macro now too.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1428049714-829-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
SYSEXIT is scary on 64-bit kernels -- SYSEXIT must be invoked
with usergs and IRQs on. That means that we rely on STI to
correctly mask interrupts for one instruction. This is okay by
itself, but the semantics with respect to NMIs are unclear.
Avoid the whole issue by using SYSRETL instead. For background,
Intel CPUs don't allow SYSCALL from compat mode, but they do
allow SYSRETL back to compat mode. Go figure.
To avoid doing too much at once, this doesn't revamp the calling
convention. We still return with EBP, EDX, and ECX on the user
stack.
Oddly this seems to be 30 cycles or so faster. Avoiding POPFQ
and STI will account for under half of that, I think, so my best
guess is that Intel just optimizes SYSRET much better than
SYSEXIT.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/57a0bf1b5230b2716a64ebe48e9bc1110f7ab433.1428019097.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once,
and we unnecessarily cache the value in tss.sp1. We never
read the cached value.
Remove all of the caching. It serves no purpose.
Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
At Denys' request, clean up the comment describing stack padding
in the 32-bit sysenter path.
No code changes.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/41fee7bb8490ae840fe7ef2699f9c2feb932e729.1428002830.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add support for the new CLWB (cache line write back)
instruction. This instruction was announced in the document
"Intel Architecture Instruction Set Extensions Programming
Reference" with reference number 319433-022.
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
The CLWB instruction is used to write back the contents of
dirtied cache lines to memory without evicting the cache lines
from the processor's cache hierarchy. This should be used in
favor of clflushopt or clflush in cases where you require the
cache line to be written to memory but plan to access the data
again in the near future.
One of the main use cases for this is with persistent memory
where CLWB can be used with PCOMMIT to ensure that data has been
accepted to memory and is durable on the DIMM.
This function shows how to properly use CLWB/CLFLUSHOPT/CLFLUSH
and PCOMMIT with appropriate fencing:
void flush_and_commit_buffer(void *vaddr, unsigned int size)
{
void *vend = vaddr + size - 1;
for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
clwb(vaddr);
/* Flush any possible final partial cacheline */
clwb(vend);
/*
* Use SFENCE to order CLWB/CLFLUSHOPT/CLFLUSH cache flushes.
* (MFENCE via mb() also works)
*/
wmb();
/* PCOMMIT and the required SFENCE for ordering */
pcommit_sfence();
}
After this function completes the data pointed to by vaddr is
has been accepted to memory and will be durable if the vaddr
points to persistent memory.
Regarding the details of how the alternatives assembly is set
up, we need one additional byte at the beginning of the CLFLUSH
so that we can flip it into a CLFLUSHOPT by changing that byte
into a 0x66 prefix. Two options are to either insert a 1 byte
ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX. Both have no
functional effect with the plain CLFLUSH, but I've been told
that executing a CLFLUSH + prefix should be faster than
executing a CLFLUSH + NOP.
We had to hard code the assembly for CLWB because, lacking the
ability to assemble the CLWB instruction itself, the next
closest thing is to have an xsaveopt instruction with a 0x66
prefix. Unfortunately XSAVEOPT itself is also relatively new,
and isn't included by all the GCC versions that the kernel needs
to support.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1422377631-8986-3-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We need to wait for the flying timers, since we
are going to free the mrtable right after it.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have to hold rtnl lock for fib_rules_unregister()
otherwise the following race could happen:
fib_rules_unregister(): fib_nl_delrule():
... ...
... ops = lookup_rules_ops();
list_del_rcu(&ops->list);
list_for_each_entry(ops->rules) {
fib_rules_cleanup_ops(ops); ...
list_del_rcu(); list_del_rcu();
}
Note, net->rules_mod_lock is actually not needed at all,
either upper layer netns code or rtnl lock guarantees
we are safe.
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the IPv4 part for commit 905a6f96a1
(ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanup).
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull drm fixes from Dave Airlie:
"One drm core fix, one exynos regression fix, two sets of radeon fixes
(Alex was a bit behind last week), and two i915 fixes.
Nothing too serious we seem to have calmed down i915 since last week"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: fix wait in radeon_mn_invalidate_range_start
drm/radeon: add extra check in radeon_ttm_tt_unpin_userptr
drm: Exynos: Respect framebuffer pitch for FIMD/Mixer
drm/i915: Reject the colorkey ioctls for primary and cursor planes
drm/i915: Skip allocating shadow batch for 0-length batches
drm/radeon: programm the VCE fw BAR as well
drm/radeon: always dump the ring content if it's available
radeon: Do not directly dereference pointers to BIOS area.
drm/radeon/dpm: fix 120hz handling harder
drm/edid: set ELD for firmware and debugfs override EDIDs
- GICv3 ITS
- Small batch of fixes discovered while writing the kvm ITS emulation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJVHcHDAAoJEP45WPkGe8ZnrXUP/Rp67onmizG6AtVNTCB5HT0V
6HNrLS4cnlpkFpWLf75ojZKfjJ/Gi3Bo4aqie9iWbOgGuI4/vxo35FdoBP8992k2
Stm++FR1jWdxbsTZJxdIqkl2uw7enA/0wpJL2JttdHFb+7Inirh0rphEyNAw24y4
Qe1hQT+c4CzCLA6ED5bqN7z28X9RoesW6dt8zDk/08IrolVsVSt+ZEdepBBBIZYf
u3d+mOP0UohAew+g+hVLvwTplcQVQK+ZGdnzoyMf9s/3r2bAobGFZvCQxoJLm7fw
tAV1Uf/iYFkqBzWYj8VC9crxY7QK2sr68OA+QMAkJoBYW0ypBowvggx+noAd4FO4
jPKGbmEJrYxbk3woxwZKvihU1fXbsQq8OykEk/urZeZyCPIXz9TIDneiI+RGGh9a
1o6Iu605bbKttwzlHjzU8C0y34qDLyJpA+zjIQJbXWOExyHynP4jIDEkvV7PGIQn
Z88QKvkDn0TUdnPkaBUHOI1Y6SJm/ernR0FomIKWyd4innwfjyQCTRXaCryfRVWW
lLRSakFyaRWjmNX50eDRp8x8yUPJhyg4Dd25Dd4KWZv5QhD7DJctywwjDCf+l9jC
8U8BbvhbPO2pwXratGNcbjapotAHX9Y46BHr0TrI4nbcaUz5+JvPLMQE6UA/92SQ
aT8pb4m7zfS5zhZPL06D
=Q4Dt
-----END PGP SIGNATURE-----
Merge tag 'irqchip-fixes-4.0-2' of git://git.infradead.org/users/jcooper/linux
Pull irqchip fixes from Jason Cooper:
"This is the second round of fixes for irqchip. It contains some fixes
found while the arm64 guys were writing the kvm gicv3 its emulation.
GICv3 ITS:
- Small batch of fixes discovered while writing the kvm ITS emulation"
* tag 'irqchip-fixes-4.0-2' of git://git.infradead.org/users/jcooper/linux:
irqchip: gicv3-its: Use non-cacheable accesses when no shareability
irqchip: gicv3-its: Fix PROP/PEND and BASE/CBASE confusion
irqchip: gicv3-its: Fix device ID encoding
irqchip: gicv3-its: Fix encoding of collection's target redistributor
Just two small fixes for radeon, both destined for stable.
* 'drm-fixes-4.0' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix wait in radeon_mn_invalidate_range_start
drm/radeon: add extra check in radeon_ttm_tt_unpin_userptr