linux

Author	SHA1	Message	Date
Vikas Shivappa	a9fcf8627d	x86/intel_rdt/cqm: Add cpus file support The cpus file is extended to support resource monitoring. This is used to over-ride the RMID of the default group when running on specific CPUs. It works similar to the resource control. The "cpus" and "cpus_list" file is present in default group, ctrl_mon groups and monitor groups. Each "cpus" file or cpu_list file reads a cpumask or list showing which CPUs belong to the resource group. By default all online cpus belong to the default root group. A CPU can be present in one "ctrl_mon" and one "monitor" group simultaneously. They can be added to a resource group by writing the CPU to the file. When a CPU is added to a ctrl_mon group it is automatically removed from the previous ctrl_mon group. A CPU can be added to a monitor group only if it is present in the parent ctrl_mon group and when a CPU is added to a monitor group, it is automatically removed from the previous monitor group. When CPUs go offline, they are automatically removed from the ctrl_mon and monitor groups. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-18-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:25 +02:00
Vikas Shivappa	b09d981b3f	x86/intel_rdt: Prepare to add RDT monitor cpus file support Separate the ctrl cpus file handling from the generic cpus file handling and convert the per cpu closid from u32 to a struct which will be used later to add rmid to the same struct. Also cleanup some name space. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-17-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:25 +02:00
Vikas Shivappa	d6aaba615a	x86/intel_rdt/cqm: Add tasks file support The root directory, ctrl_mon and monitor groups are populated with a read/write file named "tasks". When read, it shows all the task IDs assigned to the resource group. Tasks can be added to groups by writing the PID to the file. A task can be present in one "ctrl_mon" group "and" one "monitor" group. IOW a PID_x can be seen in a ctrl_mon group and a monitor group at the same time. When a task is added to a ctrl_mon group, it is automatically removed from the previous ctrl_mon group where it belonged. Similarly if a task is moved to a monitor group it is removed from the previous monitor group . Also since the monitor groups can only have subset of tasks of parent ctrl_mon group, a task can be moved to a monitor group only if its already present in the parent ctrl_mon group. Task membership is indicated by a new field in the task_struct "u32 rmid" which holds the RMID for the task. RMID=0 is reserved for the default root group where the tasks belong to at mount. [tony: zero the rmid if rdtgroup was deleted when task was being moved] Signed-off-by: Tony Luck <tony.luck@linux.intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-16-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:24 +02:00
Vikas Shivappa	0734ded1ab	x86/intel_rdt: Change closid type from int to u32 OS associates a CLOSid(Class of service id) to a task by writing the high 32 bits of per CPU IA32_PQR_ASSOC MSR when a task is scheduled in. CPUID.(EAX=10H, ECX=1):EDX[15:0] enumerates the max CLOSID supported and it is zero indexed. Hence change the type to u32 from int. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-15-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:24 +02:00
Vikas Shivappa	c7d9aac613	x86/intel_rdt/cqm: Add mkdir support for RDT monitoring Resource control groups can be created using mkdir in resctrl fs(rdtgroup). In order to extend the resctrl interface to support monitoring the control groups, extend the current mkdir to support resource monitoring also. This allows the rdtgroup created under the root directory to be able to both control and monitor resources (ctrl_mon group). The ctrl_mon groups are associated with one CLOSID like the legacy rdtgroups and one RMID(Resource monitoring ID) as well. Hardware uses RMID to track the resource usage. Once either of the CLOSID or RMID are exhausted, the mkdir fails with -ENOSPC. If there are RMIDs in limbo list but not free an -EBUSY is returned. User can also monitor a subset of the ctrl_mon rdtgroup's tasks/cpus using the monitor groups. The monitor groups are created using mkdir under the "mon_groups" directory in every ctrl_mon group. [Merged Tony's code: Removed a lot of common mkdir code, a fix to handling of the list of the child rdtgroups and some cleanups in list traversal. Also the changes to have similar alloc and free for CLOS/RMID and return -EBUSY when RMIDs are in limbo and not free] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-14-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:23 +02:00
Vikas Shivappa	65b4f40305	x86/intel_rdt: Prepare for RDT monitoring mkdir support Separate the ctrl mkdir code from the rest in order to prepare for adding support for RDT monitoring mkdir support as well. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-13-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:23 +02:00
Vikas Shivappa	d4ab332010	x86/intel_rdt/cqm: Add info files for RDT monitoring Add info directory files specific to RDT monitoring. num_rmids: The number of RMIDs which are valid for the resource. mon_features: Lists the monitoring events if monitoring is enabled for the resource. max_threshold_occupancy: This is specific to llc_occupancy monitoring and is used to determine if an RMID can be reused. Provides an upper bound on the threshold and is shown to the user in bytes though the internal value will be rounded to the scaling factor supported by the h/w. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-12-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:22 +02:00
Tony luck	5dc1d5c6ba	x86/intel_rdt: Simplify info and base file lists The info directory files and base files need to be different for each resource like cache and Memory bandwidth. With in each resource, the files would be further different for monitoring and ctrl. This leads to a lot of different static array declarations given that we are adding resctrl monitoring. Simplify this to one common list of files and then declare a set of flags to choose the files based on the resource, whether it is info or base and if it is control type file. This is as a preparation to include monitoring based info and base files. No functional change. [Vikas: Extended the flags to have few bits per category like resource, info/base etc] Signed-off-by: Tony luck <tony.luck@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-11-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:22 +02:00
Vikas Shivappa	edf6fa1c4a	x86/intel_rdt/cqm: Add RMID (Resource monitoring ID) management Hardware uses RMID(Resource monitoring ID) to keep track of each of the RDT events associated with tasks. The number of RMIDs is dependent on the SKU and is enumerated via CPUID. We add support to manage the RMIDs which include managing the RMID allocation and reading LLC occupancy for an RMID. RMID allocation is managed by keeping a free list which is initialized to all available RMIDs except for RMID 0 which is always reserved for root group. RMIDs goto a limbo list once they are freed since the RMIDs are still tagged to cache lines of the tasks which were using them - thereby still having some occupancy. They continue to be in limbo list until the occupancy < threshold_occupancy. The threshold_occupancy is a user configurable value. OS uses IA32_QM_CTR MSR to read the occupancy associated with an RMID after programming the IA32_EVENTSEL MSR with the RMID. [Tony: Improved limbo search] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-10-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:21 +02:00
Vikas Shivappa	6a445edce6	x86/intel_rdt/cqm: Add RDT monitoring initialization Add common data structures for RDT resource monitoring and perform RDT monitoring related data structure initializations which include setting up the RMID(Resource monitoring ID) lists and event list which the resource supports. [ tony: some cleanup to make adding MBM easier later, remove "cqm" from some names, make some data structure local to intel_rdt_monitor.c static. Add copyright header] [ tglx: Made it readable ] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-9-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:21 +02:00
Vikas Shivappa	dd131853f3	x86/intel_rdt: Make rdt_resources_all more readable Change the format of the global rdt_resources_all. This holds all the RDT resource structure initialization values. Make this more readable by using the format: rdt_resources_all[] = { [<resource_index>] = {... } ... } Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-8-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:20 +02:00
Vikas Shivappa	1b5c0b7583	x86/intel_rdt: Cleanup namespace to support RDT monitoring Few of the data-structures have generic names although they are RDT allocation specific. Rename them to be allocation specific to accommodate RDT monitoring. E.g. s/enabled/alloc_enabled/ No functional change. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-7-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:20 +02:00
Reinette Chatre	cb2200e967	x86/intel_rdt: Mark rdt_root and closid_alloc as static Sparse reports that both of these can be static. Make it so. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Link: http://lkml.kernel.org/r/1501017287-28083-6-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:20 +02:00
Vikas Shivappa	0583020456	x86/intel_rdt: Change file names to accommodate RDT monitor code Because the "perf cqm" and resctrl code were separately added and indivdually configurable, there seem to be separate context switch code and also things on global .h which are not really needed. Move only the scheduling specific code and definitions to <asm/intel_rdt_sched.h> and the put all the other declarations to a local intel_rdt.h. h/t to Reinette Chatre for pointing out that we should separate the public interfaces used by other parts of the kernel from private objects shared between the various files comprising RDT. No functional change. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-5-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:19 +02:00
Vikas Shivappa	f01d7d51f5	x86/intel_rdt: Introduce a common compile option for RDT We currently have a CONFIG_RDT_A which is for RDT(Resource directory technology) allocation based resctrl filesystem interface. As a preparation to add support for RDT monitoring as well into the same resctrl filesystem, change the config option to be CONFIG_RDT which would include both RDT allocation and monitoring code. No functional change. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-4-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:19 +02:00
Vikas Shivappa	1640ae9471	x86/intel_rdt/cqm: Documentation for resctrl based RDT Monitoring Add a description of resctrl based RDT(resource director technology) monitoring extension and its usage. [Tony: Added descriptions for how monitoring and allocation are measured and some cleanups] Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-3-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:19 +02:00
Vikas Shivappa	c39a0e2c88	x86/perf/cqm: Wipe out perf based cqm 'perf cqm' never worked due to the incompatibility between perf infrastructure and cqm hardware support. The hardware uses RMIDs to track the llc occupancy of tasks and these RMIDs are per package. This makes monitoring a hierarchy like cgroup along with monitoring of tasks separately difficult and several patches sent to lkml to fix them were NACKed. Further more, the following issues in the current perf cqm make it almost unusable: 1. No support to monitor the same group of tasks for which we do allocation using resctrl. 2. It gives random and inaccurate data (mostly 0s) once we run out of RMIDs due to issues in Recycling. 3. Recycling results in inaccuracy of data because we cannot guarantee that the RMID was stolen from a task when it was not pulling data into cache or even when it pulled the least data. Also for monitoring llc_occupancy, if we stop using an RMID_x and then start using an RMID_y after we reclaim an RMID from an other event, we miss accounting all the occupancy that was tagged to RMID_x at a later perf_count. 2. Recycling code makes the monitoring code complex including scheduling because the event can lose RMID any time. Since MBM counters count bandwidth for a period of time by taking snap shot of total bytes at two different times, recycling complicates the way we count MBM in a hierarchy. Also we need a spin lock while we do the processing to account for MBM counter overflow. We also currently use a spin lock in scheduling to prevent the RMID from being taken away. 4. Lack of support when we run different kind of event like task, system-wide and cgroup events together. Data mostly prints 0s. This is also because we can have only one RMID tied to a cpu as defined by the cqm hardware but a perf can at the same time tie multiple events during one sched_in. 5. No support of monitoring a group of tasks. There is partial support for cgroup but it does not work once there is a hierarchy of cgroups or if we want to monitor a task in a cgroup and the cgroup itself. 6. No support for monitoring tasks for the lifetime without perf overhead. 7. It reported the aggregate cache occupancy or memory bandwidth over all sockets. But most cloud and VMM based use cases want to know the individual per-socket usage. Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: eranian@google.com Cc: vikas.shivappa@intel.com Cc: ak@linux.intel.com Cc: davidcc@google.com Cc: reinette.chatre@intel.com Link: http://lkml.kernel.org/r/1501017287-28083-2-git-send-email-vikas.shivappa@linux.intel.com	2017-08-01 22:41:18 +02:00
Trond Myklebust	fd40559c86	NFSv4: Fix EXCHANGE_ID corrupt verifier issue The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was changed to be asynchronous by commit `8d89bd70bc`. If we interrrupt the call to rpc_wait_for_completion_task(), we can therefore end up transmitting random stack contents in lieu of the verifier. Fixes: `8d89bd70bc` ("NFS setup async exchange_id") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2017-08-01 16:28:55 -04:00
Archit Taneja	d4cea38ebb	drm/msm/dsi: Calculate link clock rates with updated dsi->lanes After the commit mentioned below, we start computing the byte and pixel clocks (dsi_calc_clk_rate) in the DSI bridge's mode_set() op. The calculation involves the number of DSI lanes being used by the downstream bridge/panel. If the downstream bridge/panel tries to change the number of DSI lanes (as done in the ADV7533 driver) in its mode_set() op, then our DSI host driver will not have the correct number of lanes when computing byte/pixel clocks. Fix this by delaying the clock rate calculation in the DSI bridge enable path. In particular, compute the clock rates in msm_dsi_host_get_phy_clk_req(). This fixes the DSI host error interrupts seen when we try to switch between modes that require different number of lanes (4 to 3 lanes, or vice versa) on db410c. The error interrupts occur since the byte/pixel clock rates aren't according to what the DSI video mode timing engine expects. Fixes: `b62aa70a98` ("drm/msm/dsi: Move PHY operations out of host") Signed-off-by: Archit Taneja <architt@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-08-01 16:26:01 -04:00
Rob Clark	af1f5f12c2	drm/msm/mdp5: fix unclocked register access in _cursor_set() Fixes an insta-reboot when screen-blanking kicks in, due to cursor updates without clocks enabled. Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-08-01 16:25:48 -04:00
Dan Carpenter	71e3dfa167	drm/msm: unlock on error in msm_gem_get_iova() We recently added locking to this function but there was a direct return that was overlooked where we need to unlock. Fixes: `0e08270a1f` ("drm/msm: Separate locking of buffer resources from struct_mutex") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-08-01 16:24:21 -04:00
Wanpeng Li	337c017ccd	KVM: async_pf: make rcu irq exit if not triggered from idle task WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0 CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ #1 RIP: 0010:rcu_note_context_switch+0x207/0x6b0 Call Trace: __schedule+0xda/0xba0 ? kvm_async_pf_task_wait+0x1b2/0x270 schedule+0x40/0x90 kvm_async_pf_task_wait+0x1cc/0x270 ? prepare_to_swait+0x22/0x70 do_async_page_fault+0x77/0xb0 ? do_async_page_fault+0x77/0xb0 async_page_fault+0x28/0x30 RIP: 0010:__d_lookup_rcu+0x90/0x1e0 I encounter this when trying to stress the async page fault in L1 guest w/ L2 guests running. Commit `9b132fbe54` (Add rcu user eqs exception hooks for async page fault) adds rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit cpu idle eqs when needed, to protect the code that needs use rcu. However, we need to call the pair even if the function calls schedule(), as seen from the above backtrace. This patch fixes it by informing the RCU subsystem exit/enter the irq towards/away from idle for both n.halted and !n.halted. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-08-01 22:24:18 +02:00
Paolo Bonzini	b96fb43977	KVM: nVMX: fixes to nested virt interrupt injection There are three issues in nested_vmx_check_exception: 1) it is not taking PFEC_MATCH/PFEC_MASK into account, as reported by Wanpeng Li; 2) it should rebuild the interruption info and exit qualification fields from scratch, as reported by Jim Mattson, because the values from the L2->L0 vmexit may be invalid (e.g. if an emulated instruction causes a page fault, the EPT misconfig's exit qualification is incorrect). 3) CR2 and DR6 should not be written for exception intercept vmexits (CR2 only for AMD). This patch fixes the first two and adds a comment about the last, outlining the fix. Cc: Jim Mattson <jmattson@google.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-08-01 22:24:17 +02:00
Dan Carpenter	65e9310889	drm/msm: fix an integer overflow test We recently added an integer overflow check but it needs an additional tweak to work properly on 32 bit systems. The problem is that we're doing the right hand side of the assignment as type unsigned long so the max it will have an integer overflow instead of being larger than SIZE_MAX. That means the "sz > SIZE_MAX" condition is never true even on 32 bit systems. We need to first cast it to u64 and then do the math. Fixes: `4a630fadbb` ("drm/msm: Fix potential buffer overflow issue") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-08-01 16:23:55 -04:00
Viresh Kumar	d490c9cd2f	drm/msm/mdp5: Fix compilation warnings Following compilation warnings were observed for these files: CC [M] drivers/gpu/drm/msm/mdp/mdp5/mdp5_mdss.o drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c: In function 'blend_setup': drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c:223:7: warning: missing braces around initializer [-Wmissing-braces] enum mdp5_pipe stage[STAGE_MAX + 1][MAX_PIPE_STAGE] = { SSPP_NONE }; ^ drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c:223:7: warning: (near initialization for 'stage[0]') [-Wmissing-braces] drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c:224:7: warning: missing braces around initializer [-Wmissing-braces] enum mdp5_pipe r_stage[STAGE_MAX + 1][MAX_PIPE_STAGE] = { SSPP_NONE }; ^ drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c:224:7: warning: (near initialization for 'r_stage[0]') [-Wmissing-braces] drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c: In function 'mdp5_plane_mode_set': drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c:892:9: warning: missing braces around initializer [-Wmissing-braces] struct phase_step step = { 0 }; ^ drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c:892:9: warning: (near initialization for 'step.x') [-Wmissing-braces] drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c:893:9: warning: missing braces around initializer [-Wmissing-braces] struct pixel_ext pe = { 0 }; ^ drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c:893:9: warning: (near initialization for 'pe.left') [-Wmissing-braces] This happens because in the first case we were initializing a two dimensional array with {0} and in the second case we were initializing a struct containing two arrays with {0}. Fix them by adding another pair of {}. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rob Clark <robdclark@gmail.com>	2017-08-01 16:23:33 -04:00
Paolo Bonzini	7313c69805	KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 Do this in the caller of nested_vmx_vmexit instead. nested_vmx_check_exception was doing a vmwrite to the vmcs02's VM_EXIT_INTR_ERROR_CODE field, so that prepare_vmcs12 would move the field to vmcs12->vm_exit_intr_error_code. However that isn't possible on pre-Haswell machines. Moving the vmcs12 write to the callers fixes it. Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Changed nested_vmx_reflect_vmexit() return type to (int)1 from (bool)1, thanks to fengguang.wu@intel.com] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-08-01 22:23:25 +02:00
Linus Torvalds	26c5cebfdb	Merge branch 'parisc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parsic fixes from Helge Deller: - Our cache flushing code ran into a BUG in case context is not current. Fix it by flushing the whole cache in such rare situations (by Dave Anglin). - Fix a "sleeping function called from invalid context BUG" in our pdc_stable driver by rearranging our locks (by James Bottomley) - The thread and irq stacks require more than 16 KB since kernel 4.11. Increase both to 32 KB. - Define CONFIG_CPU_BIG_ENDIAN unconditionally on parisc to avoid wrong behaviour in qrwlock functions (by Babu Moger). * 'parisc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Define CONFIG_CPU_BIG_ENDIAN parisc: pdc_stable: Fix locking when creating sysfs links parisc: Increase thread and stack size to 32kb parisc: Handle vma's whose context is not current in flush_cache_range	2017-08-01 13:20:24 -07:00
Neil Armstrong	12ada0513d	ARM64: dts: meson-gxbb-nanopi-k2: Add GPIO lines names This patch describes the GPIO lines usage on the Nanopi K2 board. This is useful in the debugfs gpio file and using the cdev gpio API. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Kevin Hilman <khilman@baylibre.com>	2017-08-01 12:57:41 -07:00
Neil Armstrong	60795933b7	ARM64: dts: meson-gxl-khadas-vim: Add GPIO lines names This patch describes the GPIO lines usage on the Khadas VIM board. This is useful in the debugfs gpio file and using the cdev gpio API. Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> [khilman: minor whitespace fix] Signed-off-by: Kevin Hilman <khilman@baylibre.com>	2017-08-01 12:57:26 -07:00
Jerome Brunet	5149616e7a	ARM64: dts: meson-gxbb: p20x: add card regulator settle times Changing the card voltage on the p200 is not instantaneous, especially when switching from 3.3v to 1.8v. I take at least 70ms for the regulator to go from 3.3v to 1.8v. Add margin to that to make sure we don't upset the sdcard during the voltage switch Fixes: `ef8d2ffedf` ("ARM64: dts: meson-gxbb: add MMC support") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Signed-off-by: Kevin Hilman <khilman@baylibre.com>	2017-08-01 12:46:45 -07:00
Martin Blumenstingl	45631ea8b5	ARM: dts: meson: mark the clock controller also as reset controller The clock controller provides a few reset lines as well. Add the corresponding CPU cores. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Kevin Hilman <khilman@baylibre.com>	2017-08-01 12:33:47 -07:00
Fabio Estevam	eb353aaa7c	mtd: mtk-quadspi: Remove unneeded pinctrl header There is no need to include <linux/pinctrl/consumer.h> as no pinctrl function is used in this driver, so just remove it. Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>	2017-08-01 21:24:47 +02:00
Fabio Estevam	7fd0db5b6e	mtd: atmel-quadspi: Remove unneeded pinctrl header There is no need to include <linux/pinctrl/consumer.h> as no pinctrl function is used in this driver, so just remove it. Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>	2017-08-01 21:23:21 +02:00
Hector Martin	fd1b8668af	USB: serial: option: add D-Link DWM-222 device ID Add device id for D-Link DWM-222. Cc: stable@vger.kernel.org Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Johan Hovold <johan@kernel.org>	2017-08-01 21:22:59 +02:00
Kevin Hilman	7e8634e821	dt-bindings: amlogic: add unstable statement Due to the lack of documentation on this SoC family, discovery of new features, and correcting of previous misunderstandings is expected, so it's unrealistic to expect any form of stable bindings for this SoC family. Make that clear inthe binding documentation. Signed-off-by: Kevin Hilman <khilman@baylibre.com> Acked-by: Jerome Brunet <jbrunet@baylibre.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Kevin Hilman <khilman@baylibre.com>	2017-08-01 12:20:40 -07:00
Logan Gunthorpe	0eb4634536	ntb: ntb_test: ensure the link is up before trying to configure the mws After the link tests, there is a race on one side of the test for the link coming up. It's possible, in some cases, for the test script to write to the 'peer_trans' files before the link has come up. To fix this, we simply use the link event file to ensure both sides see the link as up before continuning. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> Fixes: `a9c59ef774` ("ntb_test: Add a selftest script for the NTB subsystem")	2017-08-01 15:18:59 -04:00
Alexander Sverdlin	c4b3eacc1d	mtd: spi-nor: Recover from Spansion/Cypress errors S25FL{128\|256\|512}S datasheets say: "When P_ERR or E_ERR bits are set to one, the WIP bit will remain set to one indicating the device remains busy and unable to receive new operation commands. A Clear Status Register (CLSR) command must be received to return the device to standby mode." Current spi-nor code works until first error occurs, but write/erase errors are not just rare hardware failures, they also occur if user tries to flash write-protected areas. After such attempt no SPI command can be executed any more and even read fails. This patch adds support for P_ERR and E_ERR bits in Status Register 1 (so that operation fails immediately and not after a long timeout) and proper recovery from the error condition. Tested on Spansion S25FS128S, which is supported by S25FL129P entry. Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@wedev4u.fr>	2017-08-01 21:15:33 +02:00
Kees Cook	fe8993b3a0	exec: Consolidate pdeath_signal clearing Instead of an additional secureexec check for pdeath_signal, just move it up into the initial secureexec test. Neither perf nor arch code touches pdeath_signal, so the relocation shouldn't change anything. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge@hallyn.com>	2017-08-01 12:03:14 -07:00
Kees Cook	64701dee41	exec: Use sane stack rlimit under secureexec For a secureexec, before memory layout selection has happened, reset the stack rlimit to something sane to avoid the caller having control over the resulting layouts. $ ulimit -s 8192 $ ulimit -s unlimited $ /bin/sh -c 'ulimit -s' unlimited $ sudo /bin/sh -c 'ulimit -s' 8192 Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: James Morris <james.l.morris@oracle.com> Acked-by: Serge Hallyn <serge@hallyn.com>	2017-08-01 12:03:14 -07:00
Kees Cook	473d89639d	exec: Consolidate dumpability logic Since it's already valid to set dumpability in the early part of setup_new_exec(), we can consolidate the logic into a single place. The BINPRM_FLAGS_ENFORCE_NONDUMP is set during would_dump() calls before setup_new_exec(), so its test is safe to move as well. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com>	2017-08-01 12:03:13 -07:00
Kees Cook	35b372b76f	smack: Remove redundant pdeath_signal clearing This removes the redundant pdeath_signal clearing in Smack: the check in smack_bprm_committing_creds() matches the check in smack_bprm_set_creds() (which used to be in the now-removed smack_bprm_securexec() hook) and since secureexec is now being checked for clearing pdeath_signal, this is redundant to the common exec code. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>	2017-08-01 12:03:12 -07:00
Kees Cook	a70423dfbc	exec: Use secureexec for clearing pdeath_signal Like dumpability, clearing pdeath_signal happens both in setup_new_exec() and later in commit_creds(). The test in setup_new_exec() is different from all other privilege comparisons, though: it is checking the new cred (bprm) uid vs the old cred (current) euid. This appears to be a bug, introduced by commit `a6f76f23d2` ("CRED: Make execve() take advantage of copy-on-write credentials"): - if (bprm->e_uid != current_euid() \|\| - bprm->e_gid != current_egid()) { - set_dumpable(current->mm, suid_dumpable); + if (bprm->cred->uid != current_euid() \|\| + bprm->cred->gid != current_egid()) { It was bprm euid vs current euid (and egids), but the effective got dropped. Nothing in the exec flow changes bprm->cred->uid (nor gid). The call traces are: prepare_bprm_creds() prepare_exec_creds() prepare_creds() memcpy(new_creds, old_creds, ...) security_prepare_creds() (unimplemented by commoncap) ... prepare_binprm() bprm_fill_uid() resets euid/egid to current euid/egid sets euid/egid on bprm based on set*id file bits security_bprm_set_creds() cap_bprm_set_creds() handle all caps-based manipulations so this test is effectively a test of current_uid() vs current_euid(), which is wrong, just like the prior dumpability tests were wrong. The commit log says "Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds()." This may be meaning the earlier old euid vs new euid (and egid) test that got changed. Luckily, as with dumpability, this is all masked by commit_creds() which performs old/new euid and egid tests and clears pdeath_signal. And again, like dumpability, we should include LSM secureexec logic for pdeath_signal clearing. For example, Smack goes out of its way to clear pdeath_signal when it finds a secureexec condition. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com>	2017-08-01 12:03:11 -07:00
Kees Cook	e37fdb785a	exec: Use secureexec for setting dumpability The examination of "current" to decide dumpability is wrong. This was a check of and euid/uid (or egid/gid) mismatch in the existing process, not the newly created one. This appears to stretch back into even the "history.git" tree. Luckily, dumpability is later set in commit_creds(). In earlier kernel versions before creds existed, similar checks also existed late in the exec flow, covering up the mistake as far back as I could find. Note that because the commit_creds() check examines differences of euid, uid, egid, gid, and capabilities between the old and new creds, it would look like the setup_new_exec() dumpability test could be entirely removed. However, the secureexec test may cover a different set of tests (specific to the LSMs) than what commit_creds() checks for. So, fix this test to use secureexec (the removed euid tests are redundant to the commoncap secureexec checks now). Cc: David Howells <dhowells@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com>	2017-08-01 12:03:10 -07:00
Kees Cook	2af6228026	LSM: drop bprm_secureexec hook This removes the bprm_secureexec hook since the logic has been folded into the bprm_set_creds hook for all LSMs now. Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: James Morris <james.l.morris@oracle.com> Acked-by: Serge Hallyn <serge@hallyn.com>	2017-08-01 12:03:10 -07:00
Kees Cook	ee67ae7ef6	commoncap: Move cap_elevated calculation into bprm_set_creds Instead of a separate function, open-code the cap_elevated test, which lets us entirely remove bprm->cap_effective (to use the local "effective" variable instead), and more accurately examine euid/egid changes via the existing local "is_setid". The following LTP tests were run to validate the changes: # ./runltp -f syscalls -s cap # ./runltp -f securebits # ./runltp -f cap_bounds # ./runltp -f filecaps All kernel selftests for capabilities and exec continue to pass as well. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: James Morris <james.l.morris@oracle.com> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Andy Lutomirski <luto@kernel.org>	2017-08-01 12:03:09 -07:00
Kees Cook	46d98eb4e1	commoncap: Refactor to remove bprm_secureexec hook The commoncap implementation of the bprm_secureexec hook is the only LSM that depends on the final call to its bprm_set_creds hook (since it may be called for multiple files, it ignores bprm->called_set_creds). As a result, it cannot safely _clear_ bprm->secureexec since other LSMs may have set it. Instead, remove the bprm_secureexec hook by introducing a new flag to bprm specific to commoncap: cap_elevated. This is similar to cap_effective, but that is used for a specific subset of elevated privileges, and exists solely to track state from bprm_set_creds to bprm_secureexec. As such, it will be removed in the next patch. Here, set the new bprm->cap_elevated flag when setuid/setgid has happened from bprm_fill_uid() or fscapabilities have been prepared. This temporarily moves the bprm_secureexec hook to a static inline. The helper will be removed in the next patch; this makes the step easier to review and bisect, since this does not introduce any changes to inputs nor outputs to the "elevated privileges" calculation. The new flag is merged with the bprm->secureexec flag in setup_new_exec() since this marks the end of any further prepare_binprm() calls. Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Andy Lutomirski <luto@kernel.org> Acked-by: James Morris <james.l.morris@oracle.com> Acked-by: Serge Hallyn <serge@hallyn.com>	2017-08-01 12:03:08 -07:00
Kees Cook	ccbb6e1065	smack: Refactor to remove bprm_secureexec hook The Smack bprm_secureexec hook can be merged with the bprm_set_creds hook since it's dealing with the same information, and all of the details are finalized during the first call to the bprm_set_creds hook via prepare_binprm() (subsequent calls due to binfmt_script, etc, are ignored via bprm->called_set_creds). Here, the test can just happen at the end of the bprm_set_creds hook, and the bprm_secureexec hook can be dropped. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>	2017-08-01 12:03:07 -07:00
Kees Cook	62874c3adf	selinux: Refactor to remove bprm_secureexec hook The SELinux bprm_secureexec hook can be merged with the bprm_set_creds hook since it's dealing with the same information, and all of the details are finalized during the first call to the bprm_set_creds hook via prepare_binprm() (subsequent calls due to binfmt_script, etc, are ignored via bprm->called_set_creds). Here, the test can just happen at the end of the bprm_set_creds hook, and the bprm_secureexec hook can be dropped. Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Paul Moore <paul@paul-moore.com> Tested-by: Paul Moore <paul@paul-moore.com> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Reviewed-by: Andy Lutomirski <luto@kernel.org>	2017-08-01 12:03:07 -07:00
Kees Cook	993b3ab064	apparmor: Refactor to remove bprm_secureexec hook The AppArmor bprm_secureexec hook can be merged with the bprm_set_creds hook since it's dealing with the same information, and all of the details are finalized during the first call to the bprm_set_creds hook via prepare_binprm() (subsequent calls due to binfmt_script, etc, are ignored via bprm->called_set_creds). Here, all the comments describe how secureexec is actually calculated during bprm_set_creds, so this actually does it, drops the bprm flag that was being used internally by AppArmor, and drops the bprm_secureexec hook. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: John Johansen <john.johansen@canonical.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Acked-by: Serge Hallyn <serge@hallyn.com>	2017-08-01 12:03:06 -07:00
Kees Cook	c425e189ff	binfmt: Introduce secureexec flag The bprm_secureexec hook can be moved earlier. Right now, it is called during create_elf_tables(), via load_binary(), via search_binary_handler(), via exec_binprm(). Nearly all (see exception below) state used by bprm_secureexec is created during the bprm_set_creds hook, called from prepare_binprm(). For all LSMs (except commoncaps described next), only the first execution of bprm_set_creds takes any effect (they all check bprm->called_set_creds which prepare_binprm() sets after the first call to the bprm_set_creds hook). However, all these LSMs also only do anything with bprm_secureexec when they detected a secure state during their first run of bprm_set_creds. Therefore, it is functionally identical to move the detection into bprm_set_creds, since the results from secureexec here only need to be based on the first call to the LSM's bprm_set_creds hook. The single exception is that the commoncaps secureexec hook also examines euid/uid and egid/gid differences which are controlled by bprm_fill_uid(), via prepare_binprm(), which can be called multiple times (e.g. binfmt_script, binfmt_misc), and may clear the euid/egid for the final load (i.e. the script interpreter). However, while commoncaps specifically ignores bprm->cred_prepared, and runs its bprm_set_creds hook each time prepare_binprm() may get called, it needs to base the secureexec decision on the final call to bprm_set_creds. As a result, it will need special handling. To begin this refactoring, this adds the secureexec flag to the bprm struct, and calls the secureexec hook during setup_new_exec(). This is safe since all the cred work is finished (and past the point of no return). This explicit call will be removed in later patches once the hook has been removed. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: James Morris <james.l.morris@oracle.com>	2017-08-01 12:03:05 -07:00

... 181 182 183 184 185 ...

704772 Commits