linux

Author	SHA1	Message	Date
Vasily Gorbik	686140a1a9	s390: introduce CPU alternatives Implement CPU alternatives, which allows to optionally patch newer instructions at runtime, based on CPU facilities availability. A new kernel boot parameter "noaltinstr" disables patching. Current implementation is derived from x86 alternatives. Although ideal instructions padding (when altinstr is longer then oldinstr) is added at compile time, and no oldinstr nops optimization has to be done at runtime. Also couple of compile time sanity checks are done: 1. oldinstr and altinstr must be <= 254 bytes long, 2. oldinstr and altinstr must not have an odd length. alternative(oldinstr, altinstr, facility); alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2); Both compile time and runtime padding consists of either 6/4/2 bytes nop or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes. .altinstructions and .altinstr_replacement sections are part of __init_begin : __init_end region and are freed after initialization. Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-10-18 14:11:29 +02:00
Sebastian Ott	0dcd91a9e6	s390/debug: only write data once debug_event_common memsets the active debug entry with zeros to prevent stale data leakage. This is overwritten with the actual debug data in the next step. Only write zeros to that part of the debug entry that's not used by new debug data. Micro benchmarks show a 2-10% reduction of cpu cycles with this approach. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-10-18 14:11:23 +02:00
Sebastian Ott	94158e544f	s390/debug: improve debug_event debug_event currently truncates the data if used with a size larger than the buf_size of the debug feature. For lots of callers of this function, wrappers have been implemented that loop until all data is handled. Move that functionality into debug_event_common and get rid of the wrappers. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-10-18 14:11:19 +02:00
Philipp Rudo	7c3eaaa391	s390/kexec: Fix checksum validation return code for kdump Before kexec boots to a crash kernel it checks whether the image in memory changed after load. This is done by the function kdump_csum_valid, which returns true, i.e. an int != 0, on success and 0 otherwise. In other words when kdump_csum_valid returns an error code it means that the validation succeeded. This is not only counterintuitive but also produces the wrong result if the kernel was build without CONFIG_CRASH_DUMP. Fix this by making kdump_csum_valid return a bool. Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-10-18 14:11:16 +02:00
Heiko Carstens	496da0d706	s390/debug: adjust coding style The debug feature code hasn't been touched in ages and the code also looks like this. Therefore clean up the code so it looks a bit more like current coding style. There is no functional change - actually I made also sure that the generated code with performance_defconfig is identical. A diff of old vs new with "objdump -d" is empty. The code is still not checkpatch clean, but that was not the goal. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-10-16 08:19:26 +02:00
Heiko Carstens	df8bbd0c98	s390/kprobes: remove KPROBE_SWAP_INST state For an unknown reason the s390 kprobes instruction replacement function modifies the kprobe_status of the current CPU to KPROBE_SWAP_INST. This was supposed to catch traps that happened during instruction patching. Such a fault is not supposed to happen, and silently discarding such a fault is certainly also not what we want. In fact s390 is the only architecture which has this odd piece of code. Just remove this and behave like all other architectures. This was pointed out by Jens Remus. Reported-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-10-12 07:16:50 +02:00
Heiko Carstens	41879ff65d	s390/mm: use memset64 instead of clear_table Use memset64 instead of the (now) open-coded variant clear_table. Performance wise there is no difference. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-10-09 11:18:06 +02:00
Martin Schwidefsky	3bdf5679c9	Merge branch 'sthyi' into features Add the store-hypervisor-information code into features using a tip branch for parallel merging into the KVM tree.	2017-10-09 11:16:49 +02:00
QingFeng Hao	3d8757b87d	s390/sthyi: add s390_sthyi system call Add a syscall of s390_sthyi to implement STHYI instruction in LPAR which reuses the implementation for KVM by Janosch Frank - commit `95ca2cb579` ("KVM: s390: Add sthyi emulation"). STHYI(Store Hypervisor Information) is an emulated z/VM instruction that provides a guest with basic information about the layers it is running on. This includes information about the cpu configuration of both the machine and the lpar, as well as their names, machine model and machine type. This information enables an application to determine the maximum capacity of CPs and IFLs available to software. For the arguments of s390_sthyi, code shall be 0 and flags is reserved for future use, info is the output argument to store the required hypervisor info. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-10-09 11:15:36 +02:00
QingFeng Hao	9fb6c9b3fe	s390/sthyi: add cache to store hypervisor info STHYI requires extensive locking in the higher hypervisors and is very computational/memory expensive. Therefore we cache the retrieved hypervisor info whose valid period is 1s with mutex to allow concurrent access. rw semaphore can't benefit here due to cache line bounce. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-10-09 11:15:35 +02:00
QingFeng Hao	b7c92f1a4e	s390/sthyi: reorganize sthyi implementation As we need to support sthyi instruction on LPAR too, move the common code to kernel part and kvm related code to intercept.c for better reuse. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-10-09 11:15:33 +02:00
Heiko Carstens	e0d281d067	s390/disassembler: add new z14 instructions Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-29 15:52:24 +02:00
Heiko Carstens	ea7c360b10	s390/disassembler: add missing z13 instructions Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-29 15:52:18 +02:00
Heiko Carstens	630f789e80	s390/disassembler: add sthyi instruction This instruction came with a z/VM extension and not with a specific machine generation. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-29 15:52:11 +02:00
Heiko Carstens	7e1263b720	s390/disassembler: remove double instructions Remove a couple of instructions that are listed twice. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-29 15:52:06 +02:00
Heiko Carstens	caefea1d3a	s390/disassembler: fix LRDFU format Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-29 15:52:03 +02:00
Heiko Carstens	5c50538752	s390/disassembler: add missing end marker for e7 table The e7 opcode table does not have an end marker. Hence when trying to find an unknown e7 instruction the code will access memory behind the table until it finds something that matches the opcode, or the kernel crashes, whatever comes first. This affects not only the in-kernel disassembler but also uprobes and kprobes which refuse to set a probe on unknown instructions, and therefore search the opcode tables to figure out if instructions are known or not. Cc: <stable@vger.kernel.org> # v3.18+ Fixes: `3585cb0280` ("s390/disassembler: add vector instructions") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-29 15:51:59 +02:00
Martin Schwidefsky	b96f7d881a	s390/spinlock: introduce spinlock wait queueing The queued spinlock code for s390 follows the principles of the common code qspinlock implementation but with a few notable differences. The format of the spinlock_t locking word differs, s390 needs to store the logical CPU number of the lock holder in the spinlock_t to be able to use the diagnose 9c directed yield hypervisor call. The inline code sequences for spin_lock and spin_unlock are nice and short. The inline portion of a spin_lock now typically looks like this: lhi %r0,0 # 0 indicates an empty lock l %r1,0x3a0 # CPU number + 1 from lowcore cs %r0,%r1,<some_lock> # lock operation jnz call_wait # on failure call wait function locked: ... call_wait: la %r2,<some_lock> brasl %r14,arch_spin_lock_wait j locked A spin_unlock is as simple as before: lhi %r0,0 sth %r0,2(%r2) # unlock operation After a CPU has queued itself it may not enable interrupts again for the arch_spin_lock_flags() variant. The arch_spin_lock_wait_flags wait function is removed. To improve performance the code implements opportunistic lock stealing. If the wait function finds a spinlock_t that indicates that the lock is free but there are queued waiters, the CPU may steal the lock up to three times without queueing itself. The lock stealing update the steal counter in the lock word to prevent more than 3 steals. The counter is reset at the time the CPU next in the queue successfully takes the lock. While the queued spinlocks improve performance in a system with dedicated CPUs, in a virtualized environment with continuously overcommitted CPUs the queued spinlocks can have a negative effect on performance. This is due to the fact that a queued CPU that is preempted by the hypervisor will block the queue at some point even without holding the lock. With the classic spinlock it does not matter if a CPU is preempted that waits for the lock. Therefore use the queued spinlock code only if the system runs with dedicated CPUs and fall back to classic spinlocks when running with shared CPUs. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:44 +02:00
Martin Schwidefsky	1887aa07b6	s390/topology: add detection of dedicated vs shared CPUs The topology information returned by STSI 15.x.x contains a flag if the CPUs of a topology-list are dedicated or shared. Make this information available if the machine provides topology information. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:43 +02:00
Heiko Carstens	1922099979	s390/cpumf: remove superfluous nr_cpumask_bits check Paul Burton reported that the nr_cpumask_bits check within cpumsf_pmu_event_init() is not necessary. Actually there is already a prior check within perf_event_alloc(). Therefore remove the check. Reported-by: Paul Burton <paul.burton@imgtec.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:43 +02:00
Alice Frosi	262832bc5a	s390/ptrace: add runtime instrumention register get/set Add runtime instrumention register get and set which allows to read and modify the runtime instrumention control block. Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:41 +02:00
Alice Frosi	bb59c2da3f	s390/runtime_instrumentation: clean up struct runtime_instr_cb Update runtime_instr_cb structure to be consistent with the runtime instrumentation documentation. Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:40 +02:00
Heiko Carstens	79962038df	s390: add support for FORTIFY_SOURCE This is the quite trivial backend for s390 which is required to enable FORTIFY_SOURCE support. See commit `6974f0c455` ("include/linux/string.h: add the option of fortified string.h functions") for more details. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:40 +02:00
Heiko Carstens	59a19ea9a0	s390: get rid of exit_thread() exit_thread() is empty now. Therefore remove it and get rid of a pointless branch. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:40 +02:00
Heiko Carstens	7b83c6297d	s390/guarded storage: simplify task exit handling Free data structures required for guarded storage from arch_release_task_struct(). This allows to simplify the code a bit, and also makes the semantics a bit easier: arch_release_task_struct() is never called from the task that is being removed. In addition this allows to get rid of exit_thread() in a later patch. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:40 +02:00
Heiko Carstens	5ef2d5231d	s390/ptrace: fix guarded storage regset handling If the guarded storage regset for current is supposed to be changed, the regset from user space is copied directly into the guarded storage control block. If then the process gets scheduled away while the control block is being copied and before the new control block has been loaded, the result is random: the process can be scheduled away due to a page fault or preemption. If that happens the already copied parts will be overwritten by save_gs_cb(), called from switch_to(). Avoid this by copying the data to a temporary buffer on the stack and do the actual update with preemption disabled. Fixes: `f5bbd72198` ("s390/ptrace: guarded storage regset for the current task") Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:40 +02:00
Heiko Carstens	fa1edf3f63	s390/guarded storage: fix possible memory corruption For PREEMPT enabled kernels the guarded storage (GS) code contains a possible use-after-free bug. If a task that makes use of GS exits, it will execute do_exit() while still enabled for preemption. That function will call exit_thread_runtime_instr() via exit_thread(). If exit_thread_gs() gets preempted after the GS control block of the task has been freed but before the pointer to it is set to NULL, then save_gs_cb(), called from switch_to(), will write to already freed memory. Avoid this and simply disable preemption while freeing the control block and setting the pointer to NULL. Fixes: `916cda1aa1` ("s390: add a system call for guarded storage") Cc: <stable@vger.kernel.org> # v4.12+ Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:39 +02:00
Heiko Carstens	8d9047f8b9	s390/runtime instrumentation: simplify task exit handling Free data structures required for runtime instrumentation from arch_release_task_struct(). This allows to simplify the code a bit, and also makes the semantics a bit easier: arch_release_task_struct() is never called from the task that is being removed. In addition this allows to get rid of exit_thread() in a later patch. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:39 +02:00
Heiko Carstens	d6e646ad7c	s390/runtime instrumention: fix possible memory corruption For PREEMPT enabled kernels the runtime instrumentation (RI) code contains a possible use-after-free bug. If a task that makes use of RI exits, it will execute do_exit() while still enabled for preemption. That function will call exit_thread_runtime_instr() via exit_thread(). If exit_thread_runtime_instr() gets preempted after the RI control block of the task has been freed but before the pointer to it is set to NULL, then save_ri_cb(), called from switch_to(), will write to already freed memory. Avoid this and simply disable preemption while freeing the control block and setting the pointer to NULL. Fixes: `e4b8b3f33f` ("s390: add support for runtime instrumentation") Cc: <stable@vger.kernel.org> # v3.7+ Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:39 +02:00
Heiko Carstens	8076428f0c	s390: convert release_thread() into a static inline function release_thread() is an empty function that gets called on every task exit. Move the function to a header file and force inlining of it, so that the compiler can optimize it away instead of generating a pointless function call. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-28 07:29:39 +02:00
Heiko Carstens	51dce3867c	s390/topology: enable / disable topology dynamically Add a new sysctl file /proc/sys/s390/topology which displays if topology is on (1) or off (0) as specified by the "topology=" kernel parameter. This allows to change topology information during runtime and configuring it via /etc/sysctl.conf instead of using the kernel line parameter. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-20 13:47:55 +02:00
Heiko Carstens	1b25fda053	s390/topology: alternative topology for topology-less machines If running on machines that do not provide topology information we currently generate a "fake" topology which defines the maximum distance between each cpu: each cpu will be put into an own drawer. Historically this used to be the best option for (virtual) machines in overcommited hypervisors. For some workloads however it is better to generate a different topology where all cpus are siblings within a package (all cpus are core siblings). This shows performance improvements of up to 10%, depending on the workload. In order to keep the current behaviour, but also allow to switch to the different core sibling topology use the existing "topology=" kernel parameter: Specifying "topology=on" on machines without topology information will generate the core siblings (fake) topology information, instead of the default topology information where all cpus have the maximum distance. On machines which provide topology information specifying "topology=on" does not have any effect. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-20 13:47:54 +02:00
Pu Hou	fc3100d64f	s390/perf: fix bug when creating per-thread event A per-thread event could not be created correctly like below: perf record --per-thread -e rB0000 -- sleep 1 Error: The sys_perf_event_open() syscall returned with 19 (No such device) for event (rB0000). /bin/dmesg may provide additional information. No CONFIG_PERF_EVENTS=y kernel support configured? This bug was introduced by: commit `c311c79799` Author: Alexey Dobriyan <adobriyan@gmail.com> Date: Mon May 8 15:56:15 2017 -0700 cpumask: make "nr_cpumask_bits" unsigned If a per-thread event is not attached to any CPU, the cpu field in struct perf_event is -1. The above commit converts the CPU number to unsigned int, which result in an illegal CPU number. Fixes: `c311c79799` ("cpumask: make "nr_cpumask_bits" unsigned") Cc: <stable@vger.kernel.org> # v4.12+ Cc: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-09-13 16:34:23 +02:00
Linus Torvalds	dd198ce714	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "Life has been busy and I have not gotten half as much done this round as I would have liked. I delayed it so that a minor conflict resolution with the mips tree could spend a little time in linux-next before I sent this pull request. This includes two long delayed user namespace changes from Kirill Tkhai. It also includes a very useful change from Serge Hallyn that allows the security capability attribute to be used inside of user namespaces. The practical effect of this is people can now untar tarballs and install rpms in user namespaces. It had been suggested to generalize this and encode some of the namespace information information in the xattr name. Upon close inspection that makes the things that should be hard easy and the things that should be easy more expensive. Then there is my bugfix/cleanup for signal injection that removes the magic encoding of the siginfo union member from the kernel internal si_code. The mips folks reported the case where I had used FPE_FIXME me is impossible so I have remove FPE_FIXME from mips, while at the same time including a return statement in that case to keep gcc from complaining about unitialized variables. I almost finished the work to get make copy_siginfo_to_user a trivial copy to user. The code is available at: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3 But I did not have time/energy to get the code posted and reviewed before the merge window opened. I was able to see that the security excuse for just copying fields that we know are initialized doesn't work in practice there are buggy initializations that don't initialize the proper fields in siginfo. So we still sometimes copy unitialized data to userspace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: Introduce v3 namespaced file capabilities mips/signal: In force_fcr31_sig return in the impossible case signal: Remove kernel interal si_code magic fcntl: Don't use ambiguous SIG_POLL si_codes prctl: Allow local CAP_SYS_ADMIN changing exe_file security: Use user_namespace::level to avoid redundant iterations in cap_capable() userns,pidns: Verify the userns for new pid namespaces signal/testing: Don't look for __SI_FAULT in userspace signal/mips: Document a conflict with SI_USER with SIGFPE signal/sparc: Document a conflict with SI_USER with SIGFPE signal/ia64: Document a conflict with SI_USER with SIGFPE signal/alpha: Document a conflict with SI_USER for SIGTRAP	2017-09-11 18:34:47 -07:00
Jan Höppner	7bf76f0169	s390/dasd: Change unsigned long long to unsigned long Unsigned long long and unsigned long were different in size for 31-bit. For 64-bit the size for both datatypes is 8 Bytes and since the support for 31-bit is long gone we can clean up a little and change everything to unsigned long. Change get_phys_clock() along the way to accept unsigned long as well so that the DASD code can be consistent. Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-08-23 13:31:51 +02:00
Heiko Carstens	e1108e8f0d	s390/smp: convert cpuhp_setup_state() return code to zero on success cpuhp_setup_state() returns a state number on CPUHP_AP_ONLINE_DYN if no error occurred. Therefore convert the return code to zero before using it as exit code for s390_smp_init(). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-08-23 13:31:49 +02:00
Heiko Carstens	673cfddf6e	s390: fix 'novx' early parameter handling Specifying the 'novx' kernel parameter always results in a warning: Malformed early option 'novx' The reason for this is that the novx early parameter handling function always returns a non-zero value which means that an error occurred. Fix this and return the correct zero value instead. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-08-23 13:31:47 +02:00
Heiko Carstens	3f4298427a	s390/vmcp: make use of contiguous memory allocator If memory is fragmented it is unlikely that large order memory allocations succeed. This has been an issue with the vmcp device driver since a long time, since it requires large physical contiguous memory ares for large responses. To hopefully resolve this issue make use of the contiguous memory allocator (cma). This patch adds a vmcp specific vmcp cma area with a default size of 4MB. The size can be changed either via the VMCP_CMA_SIZE config option at compile time or with the "vmcp_cma" kernel parameter (e.g. "vmcp_cma=16m"). For any vmcp response buffers larger than 16k memory from the cma area will be allocated. If such an allocation fails, there is a fallback to the buddy allocator. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-08-09 09:09:35 -04:00
Heiko Carstens	cd4386a931	s390/cpcmd,vmcp: avoid GFP_DMA allocations According to the CP Programming Services manual Diagnose Code 8 "Virtual Console Function" can be used in all addressing modes. Also the input and output buffers do not have a limitation which specifies they need to be below the 2GB line. This is true at least since z/VM 5.4. Therefore remove the sam31/64 instructions and allow for simple GFP_KERNEL allocations. This makes it easier to allocate a 1MB page if the user requested such a large return buffer. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-08-09 09:09:35 -04:00
Martin Schwidefsky	6997c32365	s390: add support for IBM z14 machines Add detection for machine type 0x3906 and set the ELF platform name to z14. Add the miscellaneous-instruction-extension 2 facility to the list of facilities for z14. And allow to generate code that only runs on a z14 machine. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-07-26 08:25:15 +02:00
Martin Schwidefsky	6e2ef5e4f6	s390/time: add support for the TOD clock epoch extension The TOD epoch extension adds 8 epoch bits to the TOD clock to provide a continuous clock after 2042/09/17. The store-clock-extended (STCKE) instruction will store the epoch index in the first byte of the 16 bytes stored by the instruction. The read_boot_clock64 and the read_presistent_clock64 functions need to take the additional bits into account to give the correct result after 2042/09/17. The clock-comparator register will stay 64 bit wide. The comparison of the clock-comparator with the TOD clock is limited to bytes 1 to 8 of the extended TOD format. To deal with the overflow problem due to an epoch change the clock-comparator sign control in CR0 can be used to switch the comparison of the 64-bit TOD clock with the clock-comparator to a signed comparison. The decision between the signed vs. unsigned clock-comparator comparisons is done at boot time. Only if the TOD clock is in the second half of a 142 year epoch the signed comparison is used. This solves the epoch overflow issue as long as the machine is booted at least once in an epoch. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-07-26 08:25:14 +02:00
Heiko Carstens	f1c1174fa0	s390/mm: use new mm defines instead of magic values Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-07-26 08:25:09 +02:00
Martin Schwidefsky	810fa7efe0	Merge branch 'tlb-flushing' into features Add the TLB flushing changes via a tip branch to ease merging with the KVM tree.	2017-07-26 08:23:27 +02:00
Linus Torvalds	eeb7c41d9d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Three bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: set change and reference bit on lazy key enablement s390: chp: handle CRW_ERC_INIT for channel-path status change s390/perf: fix problem state detection	2017-07-25 08:44:27 -07:00
Martin Schwidefsky	c9b5ad546e	s390/mm: tag normal pages vs pages used in page tables The ESSA instruction has a new option that allows to tag pages that are not used as a page table. Without the tag the hypervisor has to assume that any guest page could be used in a page table inside the guest. This forces the hypervisor to flush all guest TLB entries whenever a host page table entry is invalidated. With the tag the host can skip the TLB flush if the page is tagged as normal page. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-07-25 06:55:28 +02:00
Eric W. Biederman	cc731525f2	signal: Remove kernel interal si_code magic struct siginfo is a union and the kernel since 2.4 has been hiding a union tag in the high 16bits of si_code using the values: __SI_KILL __SI_TIMER __SI_POLL __SI_FAULT __SI_CHLD __SI_RT __SI_MESGQ __SI_SYS While this looks plausible on the surface, in practice this situation has not worked well. - Injected positive signals are not copied to user space properly unless they have these magic high bits set. - Injected positive signals are not reported properly by signalfd unless they have these magic high bits set. - These kernel internal values leaked to userspace via ptrace_peek_siginfo - It was possible to inject these kernel internal values and cause the the kernel to misbehave. - Kernel developers got confused and expected these kernel internal values in userspace in kernel self tests. - Kernel developers got confused and set si_code to __SI_FAULT which is SI_USER in userspace which causes userspace to think an ordinary user sent the signal and that it was not kernel generated. - The values make it impossible to reorganize the code to transform siginfo_copy_to_user into a plain copy_to_user. As si_code must be massaged before being passed to userspace. So remove these kernel internal si codes and make the kernel code simpler and more maintainable. To replace these kernel internal magic si_codes introduce the helper function siginfo_layout, that takes a signal number and an si_code and computes which union member of siginfo is being used. Have siginfo_layout return an enumeration so that gcc will have enough information to warn if a switch statement does not handle all of union members. A couple of architectures have a messed up ABI that defines signal specific duplications of SI_USER which causes more special cases in siginfo_layout than I would like. The good news is only problem architectures pay the cost. Update all of the code that used the previous magic __SI_ values to use the new SIL_ values and to call siginfo_layout to get those values. Escept where not all of the cases are handled remove the defaults in the switch statements so that if a new case is missed in the future the lack will show up at compile time. Modify the code that copies siginfo si_code to userspace to just copy the value and not cast si_code to a short first. The high bits are no longer used to hold a magic union member. Fixup the siginfo header files to stop including the __SI_ values in their constants and for the headers that were missing it to properly update the number of si_codes for each signal type. The fixes to copy_siginfo_from_user32 implementations has the interesting property that several of them perviously should never have worked as the __SI_ values they depended up where kernel internal. With that dependency gone those implementations should work much better. The idea of not passing the __SI_ values out to userspace and then not reinserting them has been tested with criu and criu worked without changes. Ref: 2.4.0-test1 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2017-07-24 14:30:28 -05:00
Christian Borntraeger	b855629b9a	s390/perf: fix problem state detection The P sample bit indicates problem state and not PER. Fixes: commit `a752598254` ("s390: rename struct psw_bits members") Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2017-07-13 11:28:29 +02:00
Xunlei Pang	203e9e4121	kexec: move vmcoreinfo out of the kernel's .bss section As Eric said, "what we need to do is move the variable vmcoreinfo_note out of the kernel's .bss section. And modify the code to regenerate and keep this information in something like the control page. Definitely something like this needs a page all to itself, and ideally far away from any other kernel data structures. I clearly was not watching closely the data someone decided to keep this silly thing in the kernel's .bss section." This patch allocates extra pages for these vmcoreinfo_XXX variables, one advantage is that it enhances some safety of vmcoreinfo, because vmcoreinfo now is kept far away from other kernel data structures. Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com Signed-off-by: Xunlei Pang <xlpang@redhat.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Reviewed-by: Juergen Gross <jgross@suse.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Young <dyoung@redhat.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-07-12 16:25:59 -07:00
Linus Torvalds	5518b69b76	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Reasonably busy this cycle, but perhaps not as busy as in the 4.12 merge window: 1) Several optimizations for UDP processing under high load from Paolo Abeni. 2) Support pacing internally in TCP when using the sch_fq packet scheduler for this is not practical. From Eric Dumazet. 3) Support mutliple filter chains per qdisc, from Jiri Pirko. 4) Move to 1ms TCP timestamp clock, from Eric Dumazet. 5) Add batch dequeueing to vhost_net, from Jason Wang. 6) Flesh out more completely SCTP checksum offload support, from Davide Caratti. 7) More plumbing of extended netlink ACKs, from David Ahern, Pablo Neira Ayuso, and Matthias Schiffer. 8) Add devlink support to nfp driver, from Simon Horman. 9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa Prabhu. 10) Add stack depth tracking to BPF verifier and use this information in the various eBPF JITs. From Alexei Starovoitov. 11) Support XDP on qed device VFs, from Yuval Mintz. 12) Introduce BPF PROG ID for better introspection of installed BPF programs. From Martin KaFai Lau. 13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann. 14) For loads, allow narrower accesses in bpf verifier checking, from Yonghong Song. 15) Support MIPS in the BPF selftests and samples infrastructure, the MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David Daney. 16) Support kernel based TLS, from Dave Watson and others. 17) Remove completely DST garbage collection, from Wei Wang. 18) Allow installing TCP MD5 rules using prefixes, from Ivan Delalande. 19) Add XDP support to Intel i40e driver, from Björn Töpel 20) Add support for TC flower offload in nfp driver, from Simon Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub Kicinski, and Bert van Leeuwen. 21) IPSEC offloading support in mlx5, from Ilan Tayari. 22) Add HW PTP support to macb driver, from Rafal Ozieblo. 23) Networking refcount_t conversions, From Elena Reshetova. 24) Add sock_ops support to BPF, from Lawrence Brako. This is useful for tuning the TCP sockopt settings of a group of applications, currently via CGROUPs" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits) net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap cxgb4: Support for get_ts_info ethtool method cxgb4: Add PTP Hardware Clock (PHC) support cxgb4: time stamping interface for PTP nfp: default to chained metadata prepend format nfp: remove legacy MAC address lookup nfp: improve order of interfaces in breakout mode net: macb: remove extraneous return when MACB_EXT_DESC is defined bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case bpf: fix return in load_bpf_file mpls: fix rtm policy in mpls_getroute net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t net, ax25: convert ax25_route.refcount from atomic_t to refcount_t net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t ...	2017-07-05 12:31:59 -07:00
Linus Torvalds	9a9594efe5	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug updates from Thomas Gleixner: "This update is primarily a cleanup of the CPU hotplug locking code. The hotplug locking mechanism is an open coded RWSEM, which allows recursive locking. The main problem with that is the recursive nature as it evades the full lockdep coverage and hides potential deadlocks. The rework replaces the open coded RWSEM with a percpu RWSEM and establishes full lockdep coverage that way. The bulk of the changes fix up recursive locking issues and address the now fully reported potential deadlocks all over the place. Some of these deadlocks have been observed in the RT tree, but on mainline the probability was low enough to hide them away." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) cpu/hotplug: Constify attribute_group structures powerpc: Only obtain cpu_hotplug_lock if called by rtasd ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init cpu/hotplug: Remove unused check_for_tasks() function perf/core: Don't release cred_guard_mutex if not taken cpuhotplug: Link lock stacks for hotplug callbacks acpi/processor: Prevent cpu hotplug deadlock sched: Provide is_percpu_thread() helper cpu/hotplug: Convert hotplug locking to percpu rwsem s390: Prevent hotplug rwsem recursion arm: Prevent hotplug rwsem recursion arm64: Prevent cpu hotplug rwsem recursion kprobes: Cure hotplug lock ordering issues jump_label: Reorder hotplug lock and jump_label_lock perf/tracing/cpuhotplug: Fix locking order ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus() PCI: Replace the racy recursion prevention PCI: Use cpu_hotplug_disable() instead of get_online_cpus() perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode() x86/perf: Drop EXPORT of perf_check_microcode ...	2017-07-03 18:08:06 -07:00

1 2 3 4 5 ...

2171 Commits