linux

Author	SHA1	Message	Date
Joerg Roedel	186dfc9d69	x86/swiotlb: Try coherent allocations with __GFP_NOWARN When we boot a kdump kernel in high memory, there is by default only 72MB of low memory available. The swiotlb code takes 64MB of it (by default) so that there are only 8MB left to allocate from. On systems with many devices this causes page allocator warnings from dma_generic_alloc_coherent(): systemd-udevd: page allocation failure: order:0, mode:0x280d4 CPU: 0 PID: 197 Comm: systemd-udevd Tainted: G W 3.12.28-4-default #1 Hardware name: HP ProLiant DL980 G7, BIOS P66 07/30/2012 ffff8800781335e0 ffffffff8150b1db 00000000000280d4 ffffffff8113af90 0000000000000000 0000000000000000 ffff88007efdbb00 0000000100000000 0000000000000000 0000000000000000 0000000000000000 0000000000000001 Call Trace: dump_trace+0x7d/0x2d0 show_stack_log_lvl+0x94/0x170 show_stack+0x21/0x50 dump_stack+0x41/0x51 warn_alloc_failed+0xf0/0x160 __alloc_pages_slowpath+0x72f/0x796 __alloc_pages_nodemask+0x1ea/0x210 dma_generic_alloc_coherent+0x96/0x140 x86_swiotlb_alloc_coherent+0x1c/0x50 ttm_dma_pool_alloc_new_pages+0xab/0x320 [ttm] ttm_dma_populate+0x3ce/0x640 [ttm] ttm_tt_bind+0x36/0x60 [ttm] ttm_bo_handle_move_mem+0x55f/0x5c0 [ttm] ttm_bo_move_buffer+0x105/0x130 [ttm] ttm_bo_validate+0xc1/0x130 [ttm] ttm_bo_init+0x24b/0x400 [ttm] radeon_bo_create+0x16c/0x200 [radeon] radeon_ring_init+0x11e/0x2b0 [radeon] r100_cp_init+0x123/0x5b0 [radeon] r100_startup+0x194/0x230 [radeon] r100_init+0x223/0x410 [radeon] radeon_device_init+0x6af/0x830 [radeon] radeon_driver_load_kms+0x89/0x180 [radeon] drm_get_pci_dev+0x121/0x2f0 [drm] local_pci_probe+0x39/0x60 pci_device_probe+0xa9/0x120 driver_probe_device+0x9d/0x3d0 __driver_attach+0x8b/0x90 bus_for_each_dev+0x5b/0x90 bus_add_driver+0x1f8/0x2c0 driver_register+0x5b/0xe0 do_one_initcall+0xf2/0x1a0 load_module+0x1207/0x1c70 SYSC_finit_module+0x75/0xa0 system_call_fastpath+0x16/0x1b 0x7fac533d2788 After these warnings the code enters a fall-back path and allocated directly from the swiotlb aperture in the end. So remove these warnings as this is not a fatal error. Signed-off-by: Joerg Roedel <jroedel@suse.de> [ Simplify, reflow comment. ] Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Baoquan He <bhe@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jörg Rödel <joro@8bytes.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/1433500202-25531-3-git-send-email-joro@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-11 08:28:38 +02:00
Alexey Kardashevskiy	e633bc86a9	vfio: powerpc/spapr: Support Dynamic DMA windows This adds create/remove window ioctls to create and remove DMA windows. sPAPR defines a Dynamic DMA windows capability which allows para-virtualized guests to create additional DMA windows on a PCI bus. The existing linux kernels use this new window to map the entire guest memory and switch to the direct DMA operations saving time on map/unmap requests which would normally happen in a big amounts. This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows. Up to 2 windows are supported now by the hardware and by this driver. This changes VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional information such as a number of supported windows and maximum number levels of TCE tables. DDW is added as a capability, not as a SPAPR TCE IOMMU v2 unique feature as we still want to support v2 on platforms which cannot do DDW for the sake of TCE acceleration in KVM (coming soon). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:55 +10:00
Alexey Kardashevskiy	2157e7b82f	vfio: powerpc/spapr: Register memory and define IOMMU v2 The existing implementation accounts the whole DMA window in the locked_vm counter. This is going to be worse with multiple containers and huge DMA windows. Also, real-time accounting would requite additional tracking of accounted pages due to the page size difference - IOMMU uses 4K pages and system uses 4K or 64K pages. Another issue is that actual pages pinning/unpinning happens on every DMA map/unmap request. This does not affect the performance much now as we spend way too much time now on switching context between guest/userspace/host but this will start to matter when we add in-kernel DMA map/unmap acceleration. This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU. New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces 2 new ioctls to register/unregister DMA memory - VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY - which receive user space address and size of a memory region which needs to be pinned/unpinned and counted in locked_vm. New IOMMU splits physical pages pinning and TCE table update into 2 different operations. It requires: 1) guest pages to be registered first 2) consequent map/unmap requests to work only with pre-registered memory. For the default single window case this means that the entire guest (instead of 2GB) needs to be pinned before using VFIO. When a huge DMA window is added, no additional pinning will be required, otherwise it would be guest RAM + 2GB. The new memory registration ioctls are not supported by VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration will require memory to be preregistered in order to work. The accounting is done per the user process. This advertises v2 SPAPR TCE IOMMU and restricts what the userspace can do with v1 or v2 IOMMUs. In order to support memory pre-registration, we need a way to track the use of every registered memory region and only allow unregistration if a region is not in use anymore. So we need a way to tell from what region the just cleared TCE was from. This adds a userspace view of the TCE table into iommu_table struct. It contains userspace address, one per TCE entry. The table is only allocated when the ownership over an IOMMU group is taken which means it is only used from outside of the powernv code (such as VFIO). As v2 IOMMU supports IODA2 and pre-IODA2 IOMMUs (which do not support DDW API), this creates a default DMA window for IODA2 for consistency. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:55 +10:00
Alexey Kardashevskiy	15b244a88e	powerpc/mmu: Add userspace-to-physical addresses translation cache We are adding support for DMA memory pre-registration to be used in conjunction with VFIO. The idea is that the userspace which is going to run a guest may want to pre-register a user space memory region so it all gets pinned once and never goes away. Having this done, a hypervisor will not have to pin/unpin pages on every DMA map/unmap request. This is going to help with multiple pinning of the same memory. Another use of it is in-kernel real mode (mmu off) acceleration of DMA requests where real time translation of guest physical to host physical addresses is non-trivial and may fail as linux ptes may be temporarily invalid. Also, having cached host physical addresses (compared to just pinning at the start and then walking the page table again on every H_PUT_TCE), we can be sure that the addresses which we put into TCE table are the ones we already pinned. This adds a list of memory regions to mm_context_t. Each region consists of a header and a list of physical addresses. This adds API to: 1. register/unregister memory regions; 2. do final cleanup (which puts all pre-registered pages); 3. do userspace to physical address translation; 4. manage usage counters; multiple registration of the same memory is allowed (once per container). This implements 2 counters per registered memory region: - @mapped: incremented on every DMA mapping; decremented on unmapping; initialized to 1 when a region is just registered; once it becomes zero, no more mappings allowe; - @used: incremented on every "register" ioctl; decremented on "unregister"; unregistration is allowed for DMA mapped regions unless it is the very last reference. For the very last reference this checks that the region is still mapped and returns -EBUSY so the userspace gets to know that memory is still pinned and unregistration needs to be retried; @used remains 1. Host physical addresses are stored in vmalloc'ed array. In order to access these in the real mode (mmu off), there is a real_vmalloc_addr() helper. In-kernel acceleration patchset will move it from KVM to MMU code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:54 +10:00
Alexey Kardashevskiy	46d3e1e162	vfio: powerpc/spapr: powerpc/powernv/ioda2: Use DMA windows API in ownership control Before the IOMMU user (VFIO) would take control over the IOMMU table belonging to a specific IOMMU group. This approach did not allow sharing tables between IOMMU groups attached to the same container. This introduces a new IOMMU ownership flavour when the user can not just control the existing IOMMU table but remove/create tables on demand. If an IOMMU implements take/release_ownership() callbacks, this lets the user have full control over the IOMMU group. When the ownership is taken, the platform code removes all the windows so the caller must create them. Before returning the ownership back to the platform code, VFIO unprograms and removes all the tables it created. This changes IODA2's onwership handler to remove the existing table rather than manipulating with the existing one. From now on, iommu_take_ownership() and iommu_release_ownership() are only called from the vfio_iommu_spapr_tce driver. Old-style ownership is still supported allowing VFIO to run on older P5IOC2 and IODA IO controllers. No change in userspace-visible behaviour is expected. Since it recreates TCE tables on each ownership change, related kernel traces will appear more often. This adds a pnv_pci_ioda2_setup_default_config() which is called when PE is being configured at boot time and when the ownership is passed from VFIO to the platform code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:54 +10:00
Alexey Kardashevskiy	0054719386	powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table This adds a way for the IOMMU user to know how much a new table will use so it can be accounted in the locked_vm limit before allocation happens. This stores the allocated table size in pnv_pci_ioda2_get_table_size() so the locked_vm counter can be updated correctly when a table is being disposed. This defines an iommu_table_group_ops callback to let VFIO know how much memory will be locked if a table is created. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:53 +10:00
Alexey Kardashevskiy	c035e37b58	powerpc/powernv/ioda2: Use new helpers to do proper cleanup on PE release The existing code programmed TVT#0 with some address and then immediately released that memory. This makes use of pnv_pci_ioda2_unset_window() and pnv_pci_ioda2_set_bypass() which do correct resource release and TVT update. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:53 +10:00
Alexey Kardashevskiy	4793d65d1a	vfio: powerpc/spapr: powerpc/powernv/ioda: Define and implement DMA windows API This extends iommu_table_group_ops by a set of callbacks to support dynamic DMA windows management. create_table() creates a TCE table with specific parameters. it receives iommu_table_group to know nodeid in order to allocate TCE table memory closer to the PHB. The exact format of allocated multi-level table might be also specific to the PHB model (not the case now though). This callback calculated the DMA window offset on a PCI bus from @num and stores it in a just created table. set_window() sets the window at specified TVT index + @num on PHB. unset_window() unsets the window from specified TVT. This adds a free() callback to iommu_table_ops to free the memory (potentially a tree of tables) allocated for the TCE table. create_table() and free() are supposed to be called once per VFIO container and set_window()/unset_window() are supposed to be called for every group in a container. This adds IOMMU capabilities to iommu_table_group such as default 32bit window parameters and others. This makes use of new values in vfio_iommu_spapr_tce. IODA1/P5IOC2 do not support DDW so they do not advertise pagemasks to the userspace. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:52 +10:00
Alexey Kardashevskiy	bbb845c4ba	powerpc/powernv: Implement multilevel TCE tables TCE tables might get too big in case of 4K IOMMU pages and DDW enabled on huge guests (hundreds of GB of RAM) so the kernel might be unable to allocate contiguous chunk of physical memory to store the TCE table. To address this, POWER8 CPU (actually, IODA2) supports multi-level TCE tables, up to 5 levels which splits the table into a tree of smaller subtables. This adds multi-level TCE tables support to pnv_pci_ioda2_table_alloc_pages() and pnv_pci_ioda2_table_free_pages() helpers. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:51 +10:00
Alexey Kardashevskiy	43cb60ab7f	powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window This is a part of moving DMA window programming to an iommu_ops callback. pnv_pci_ioda2_set_window() takes an iommu_table_group as a first parameter (not pnv_ioda_pe) as it is going to be used as a callback for VFIO DDW code. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:51 +10:00
Alexey Kardashevskiy	aca6913f55	powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages This is a part of moving TCE table allocation into an iommu_ops callback to support multiple IOMMU groups per one VFIO container. This moves the code which allocates the actual TCE tables to helpers: pnv_pci_ioda2_table_alloc_pages() and pnv_pci_ioda2_table_free_pages(). These do not allocate/free the iommu_table struct. This enforces window size to be a power of two. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:50 +10:00
Alexey Kardashevskiy	e5aad1e678	powerpc/powernv/ioda2: Rework iommu_table creation This moves iommu_table creation to the beginning to make following changes easier to review. This starts using table parameters from the iommu_table struct. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:50 +10:00
Alexey Kardashevskiy	05c6cfb9dc	powerpc/iommu/powernv: Release replaced TCE At the moment writing new TCE value to the IOMMU table fails with EBUSY if there is a valid entry already. However PAPR specification allows the guest to write new TCE value without clearing it first. Another problem this patch is addressing is the use of pool locks for external IOMMU users such as VFIO. The pool locks are to protect DMA page allocator rather than entries and since the host kernel does not control what pages are in use, there is no point in pool locks and exchange()+put_page(oldtce) is sufficient to avoid possible races. This adds an exchange() callback to iommu_table_ops which does the same thing as set() plus it returns replaced TCE and DMA direction so the caller can release the pages afterwards. The exchange() receives a physical address unlike set() which receives linear mapping address; and returns a physical address as the clear() does. This implements exchange() for P5IOC2/IODA/IODA2. This adds a requirement for a platform to have exchange() implemented in order to support VFIO. This replaces iommu_tce_build() and iommu_clear_tce() with a single iommu_tce_xchg(). This makes sure that TCE permission bits are not set in TCE passed to IOMMU API as those are to be calculated by platform code from DMA direction. This moves SetPageDirty() to the IOMMU code to make it work for both VFIO ioctl interface in in-kernel TCE acceleration (when it becomes available later). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:49 +10:00
Alexey Kardashevskiy	c5bb44edee	powerpc/powernv: Implement accessor to TCE entry This replaces direct accesses to TCE table with a helper which returns an TCE entry address. This does not make difference now but will when multi-level TCE tables get introduces. No change in behavior is expected. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:49 +10:00
Alexey Kardashevskiy	e57080f17d	powerpc/powernv/ioda2: Add TCE invalidation for all attached groups The iommu_table struct keeps a list of IOMMU groups it is used for. At the moment there is just a single group attached but further patches will add TCE table sharing. When sharing is enabled, TCE cache in each PE needs to be invalidated so does the patch. This does not change pnv_pci_ioda1_tce_invalidate() as there is no plan to enable TCE table sharing on PHBs older than IODA2. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:48 +10:00
Alexey Kardashevskiy	5780fb0426	powerpc/powernv/ioda2: Move TCE kill register address to PE At the moment the DMA setup code looks for the "ibm,opal-tce-kill" property which contains the TCE kill register address. Writing to this register invalidates TCE cache on IODA/IODA2 hub. This moves the register address from iommu_table to pnv_pnb as this register belongs to PHB and invalidates TCE cache for all tables of all attached PEs. This moves the property reading/remapping code to a helper which is called when DMA is being configured for PE and which does DMA setup for both IODA1 and IODA2. This adds a new pnv_pci_ioda2_tce_invalidate_entire() helper which invalidates cache for the entire table. It should be called after every call to opal_pci_map_pe_dma_window(). It was not required before because there was just a single TCE table and 64bit DMA was handled via bypass window (which has no table so no cache was used) but this is going to change with Dynamic DMA windows (DDW). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:48 +10:00
Alexey Kardashevskiy	b82c75bfbe	powerpc/iommu: Fix IOMMU ownership control functions This adds missing locks in iommu_take_ownership()/ iommu_release_ownership(). This marks all pages busy in iommu_table::it_map in order to catch errors if there is an attempt to use this table while ownership over it is taken. This only clears TCE content if there is no page marked busy in it_map. Clearing must be done outside of the table locks as iommu_clear_tce() called from iommu_clear_tces_and_put_pages() does this. In order to use bitmap_empty(), the existing code clears bit#0 which is set even in an empty table if it is bus-mapped at 0 as iommu_init_table() reserves page#0 to prevent buggy drivers from crashing when allocated page is bus-mapped at zero (which is correct). This restores the bit in the case of failure to bring the it_map to the state it was in when we called iommu_take_ownership(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:47 +10:00
Alexey Kardashevskiy	f87a88642e	vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control This adds tce_iommu_take_ownership() and tce_iommu_release_ownership which call in a loop iommu_take_ownership()/iommu_release_ownership() for every table on the group. As there is just one now, no change in behaviour is expected. At the moment the iommu_table struct has a set_bypass() which enables/ disables DMA bypass on IODA2 PHB. This is exposed to POWERPC IOMMU code which calls this callback when external IOMMU users such as VFIO are about to get over a PHB. The set_bypass() callback is not really an iommu_table function but IOMMU/PE function. This introduces a iommu_table_group_ops struct and adds take_ownership()/release_ownership() callbacks to it which are called when an external user takes/releases control over the IOMMU. This replaces set_bypass() with ownership callbacks as it is not necessarily just bypass enabling, it can be something else/more so let's give it more generic name. The callbacks is implemented for IODA2 only. Other platforms (P5IOC2, IODA1) will use the old iommu_take_ownership/iommu_release_ownership API. The following patches will replace iommu_take_ownership/ iommu_release_ownership calls in IODA2 with full IOMMU table release/ create. As we here and touching bypass control, this removes pnv_pci_ioda2_setup_bypass_pe() as it does not do much more compared to pnv_pci_ioda2_set_bypass. This moves tce_bypass_base initialization to pnv_pci_ioda2_setup_dma_pe. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:47 +10:00
Alexey Kardashevskiy	0eaf4defc7	powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group So far one TCE table could only be used by one IOMMU group. However IODA2 hardware allows programming the same TCE table address to multiple PE allowing sharing tables. This replaces a single pointer to a group in a iommu_table struct with a linked list of groups which provides the way of invalidating TCE cache for every PE when an actual TCE table is updated. This adds pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group() helpers to manage the list. However without VFIO, it is still going to be a single IOMMU group per iommu_table. This changes iommu_add_device() to add a device to a first group from the group list of a table as it is only called from the platform init code or PCI bus notifier and at these moments there is only one group per table. This does not change TCE invalidation code to loop through all attached groups in order to simplify this patch and because it is not really needed in most cases. IODA2 is fixed in a later patch. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:16:15 +10:00
Alexey Kardashevskiy	b348aa6529	powerpc/spapr: vfio: Replace iommu_table with iommu_table_group Modern IBM POWERPC systems support multiple (currently two) TCE tables per IOMMU group (a.k.a. PE). This adds a iommu_table_group container for TCE tables. Right now just one table is supported. This defines iommu_table_group struct which stores pointers to iommu_group and iommu_table(s). This replaces iommu_table with iommu_table_group where iommu_table was used to identify a group: - iommu_register_group(); - iommudata of generic iommu_group; This removes @data from iommu_table as it_table_group provides same access to pnv_ioda_pe. For IODA, instead of embedding iommu_table, the new iommu_table_group keeps pointers to those. The iommu_table structs are allocated dynamically. For P5IOC2, both iommu_table_group and iommu_table are embedded into PE struct. As there is no EEH and SRIOV support for P5IOC2, iommu_free_table() should not be called on iommu_table struct pointers so we can keep it embedded in pnv_phb::p5ioc2. For pSeries, this replaces multiple calls of kzalloc_node() with a new iommu_pseries_alloc_group() helper and stores the table group struct pointer into the pci_dn struct. For release, a iommu_table_free_group() helper is added. This moves iommu_table struct allocation from SR-IOV code to the generic DMA initialization code in pnv_pci_ioda_setup_dma_pe and pnv_pci_ioda2_setup_dma_pe as this is where DMA is actually initialized. This change is here because those lines had to be changed anyway. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:57 +10:00
Alexey Kardashevskiy	decbda2572	powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free() The pnv_pci_ioda_tce_invalidate() helper invalidates TCE cache. It is supposed to be called on IODA1/2 and not called on p5ioc2. It receives start and end host addresses of TCE table. IODA2 actually needs PCI addresses to invalidate the cache. Those can be calculated from host addresses but since we are going to implement multi-level TCE tables, calculating PCI address from a host address might get either tricky or ugly as TCE table remains flat on PCI bus but not in RAM. This moves pnv_pci_ioda_tce_invalidate() from generic pnv_tce_build/ pnt_tce_free and defines IODA1/2-specific callbacks which call generic ones and do PHB-model-specific TCE cache invalidation. P5IOC2 keeps using generic callbacks as before. This changes pnv_pci_ioda2_tce_invalidate() to receives TCE index and number of pages which are PCI addresses shifted by IOMMU page shift. No change in behaviour is expected. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy	da004c3600	powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table This adds a iommu_table_ops struct and puts pointer to it into the iommu_table struct. This moves tce_build/tce_free/tce_get/tce_flush callbacks from ppc_md to the new struct where they really belong to. This adds the requirement for @it_ops to be initialized before calling iommu_init_table() to make sure that we do not leave any IOMMU table with iommu_table_ops uninitialized. This is not a parameter of iommu_init_table() though as there will be cases when iommu_init_table() will not be called on TCE tables, for example - VFIO. This does s/tce_build/set/, s/tce_free/clear/ and removes "tce_" redundant prefixes. This removes tce_xxx_rm handlers from ppc_md but does not add them to iommu_table_ops as this will be done later if we decide to support TCE hypercalls in real mode. This removes _vm callbacks as only virtual mode is supported by now so this also removes @rm parameter. For pSeries, this always uses tce_buildmulti_pSeriesLP/ tce_buildmulti_pSeriesLP. This changes multi callback to fall back to tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not present. The reason for this is we still have to support "multitce=off" boot parameter in disable_multitce() and we do not want to walk through all IOMMU tables in the system and replace "multi" callbacks with single ones. For powernv, this defines _ops per PHB type which are P5IOC2/IODA1/IODA2. This makes the callbacks for them public. Later patches will extend callbacks for IODA1/2. No change in behaviour is expected. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy	10b35b2b74	powerpc/powernv: Do not set "read" flag if direction==DMA_NONE Normally a bitmap from the iommu_table is used to track what TCE entry is in use. Since we are going to use iommu_table without its locks and do xchg() instead, it becomes essential not to put bits which are not implied in the direction flag as the old TCE value (more precisely - the permission bits) will be used to decide whether to put the page or not. This adds iommu_direction_to_tce_perm() (its counterpart is there already) and uses it for powernv's pnv_tce_build(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:56 +10:00
Alexey Kardashevskiy	9b14a1ff86	vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver This moves page pinning (get_user_pages_fast()/put_page()) code out of the platform IOMMU code and puts it to VFIO IOMMU driver where it belongs to as the platform code does not deal with page pinning. This makes iommu_take_ownership()/iommu_release_ownership() deal with the IOMMU table bitmap only. This removes page unpinning from iommu_take_ownership() as the actual TCE table might contain garbage and doing put_page() on it is undefined behaviour. Besides the last part, the rest of the patch is mechanical. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for the vfio related changes] Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy	8aca92d82d	powerpc/iommu: Always release iommu_table in iommu_free_table() At the moment iommu_free_table() only releases memory if the table was initialized for the platform code use, i.e. it had it_map initialized (which purpose is to track DMA memory space use). With dynamic DMA windows, we will need to be able to release iommu_table even if it was used for VFIO in which case it_map is NULL so does the patch. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy	ac9a58891a	powerpc/iommu: Put IOMMU group explicitly So far an iommu_table lifetime was the same as PE. Dynamic DMA windows will change this and iommu_free_table() will not always require the group to be released. This moves iommu_group_put() out of iommu_free_table(). This adds a iommu_pseries_free_table() helper which does iommu_group_put() and iommu_free_table(). Later it will be changed to receive a table_group and we will have to change less lines then. This should cause no behavioural change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:55 +10:00
Alexey Kardashevskiy	c5773822c0	powerpc/powernv/ioda: Clean up IOMMU group registration The existing code has 3 calls to iommu_register_group() and all 3 branches actually cover all possible cases. This replaces 3 calls with one and moves the registration earlier; the latter will make more sense when we add TCE table sharing. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:54 +10:00
Alexey Kardashevskiy	4617082ec0	powerpc/iommu/powernv: Get rid of set_iommu_table_base_and_group The set_iommu_table_base_and_group() name suggests that the function sets table base and add a device to an IOMMU group. The actual purpose for table base setting is to put some reference into a device so later iommu_add_device() can get the IOMMU group reference and the device to the group. At the moment a group cannot be explicitly passed to iommu_add_device() as we want it to work from the bus notifier, we can fix it later and remove confusing calls of set_iommu_table_base(). This replaces set_iommu_table_base_and_group() with a couple of set_iommu_table_base() + iommu_add_device() which makes reading the code easier. This adds few comments why set_iommu_table_base() and iommu_add_device() are called where they are called. For IODA1/2, this essentially removes iommu_add_device() call from the pnv_pci_ioda_dma_dev_setup() as it will always fail at this particular place: - for physical PE, the device is already attached by iommu_add_device() in pnv_pci_ioda_setup_dma_pe(); - for virtual PE, the sysfs entries are not ready to create all symlinks so actual adding is happening in tce_iommu_bus_notifier. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:54 +10:00
Alexey Kardashevskiy	ea30e99e8e	powerpc/eeh/ioda2: Use device::iommu_group to check IOMMU group This relies on the fact that a PCI device always has an IOMMU table which may not be the case when we get dynamic DMA windows so let's use more reliable check for IOMMU group here. As we do not rely on the table presence here, remove the workaround from pnv_pci_ioda2_set_bypass(); also remove the @add_to_iommu_group parameter from pnv_ioda_setup_bus_dma(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-11 15:14:54 +10:00
Linus Torvalds	2bfc60dd28	Fix build errors affecting blackfin and score architectures. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVeHo5AAoJEMsfJm/On5mBU8oQAJIAMPvFs0VqrzkgzILjDpmO sDhNwrEMbRSyTHZMvcuvAbMpeSYAeGu4KZ1oQlxxr18Sw0QJ9Asw79drhWI9+5ic pt6DBNaPbLlfAdbOgZaIdcETIUPE9SGPcmIMJLpVtgJ+9LmJnmSOWI358d1WFEJQ u45OQfWM5tLxMFwFeMj9dFrAGP9f6Z+lSbdp5RoOqFZ4S6Ad/5mwy3i09Hr+n5pY HUoSFd+Z/4WT2pFW3//x/Mwejy4FqMIp6OEk8XsrGz3S0wqKRyeuEuypLqBuRAjj IgfVtUwlIlmRs1on3/20qOb7O/O9BYceRtFYZ7OraIf/ljaGn97tUlkcBq0BW2IK MGLWppTngnpW5EJ6+h4qtF/KnWRPVNZKDC6hA4KMwBm8PVAdKr1bFxkG2dS3Ycz/ Tn24s8EtaSszQLNWacUhP8JblVh7OQk7QXEKTN8QT31qJHcHpS+ZY+Fk7KHHTDud kta42WPau5GHQC4sVjmcqxdth1BpP2IWZAEUNdANTK8T1g1HOAcZqWGbYCBBiM2Q ClP1pJ54+W8uhw3W44hm/bIs20YQ6rKYZColK8xOL1ykNUjV6HVxqN9q8+clrlo0 SQMWUtsCQn33SM48P+zE2V9NYLHaO4SSdSo6Li+KxUdU0u2IZG8IigJ+OzIWi43n G6Lub3GQQTZg1qix7PmQ =PGHg -----END PGP SIGNATURE----- Merge tag 'misc-for-linus-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull misc fixes from Guenter Roeck: "There are two patches here. One fixes a build error affecting the blackfin architecture, the other fixes a build error affecting the score architecture. The score maintainer (Lennox Wu) has a hard time sending you the score patch, and the blackfin maintainer (Steven Miao) has been silent since -rc1. Since 4.1 is about to be released, I figured it would be useful to get the patches upstream to avoid the related build failures in the final release" * tag 'misc-for-linus-4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: score: Fix exception handler label blackfin: Fix build error	2015-06-10 17:16:32 -07:00
Andrew Morton	5ec45a192f	arch/x86/kvm/mmu.c: work around gcc-4.4.4 bug Fix this compile issue with gcc-4.4.4: arch/x86/kvm/mmu.c: In function 'kvm_mmu_pte_write': arch/x86/kvm/mmu.c:4256: error: unknown field 'cr0_wp' specified in initializer arch/x86/kvm/mmu.c:4257: error: unknown field 'cr4_pae' specified in initializer arch/x86/kvm/mmu.c:4257: warning: excess elements in union initializer ... gcc-4.4.4 (at least) has issues when using anonymous unions in initializers. Fixes: `edc90b7dc4` ("KVM: MMU: fix SMAP virtualization") Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-06-10 16:43:43 -07:00
Luis R. Rodriguez	e55645ec57	ia64: remove paravirt code All the ia64 pvops code is now dead code since both xen and kvm support have been ripped out [0] [1]. Just that no one had troubled to rip this stuff out. The only useful remaining pieces were the old pvops docs but that was recently also generalized and moved out from ia64 [2]. This has been run time tested on an ia64 Madison system. [0] `003f7de625` "KVM: ia64: remove" since v3.19-rc1 [1] `d52eefb47d` "ia64/xen: Remove Xen support for ia64" since v3.14-rc1 [2] "virtual: Documentation: simplify and generalize paravirt_ops.txt" Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2015-06-10 14:26:32 -07:00
Guenter Roeck	80524e9336	score: Fix exception handler label The latest version of modinfo fails to compile score architecture targets with the following error. FATAL: The relocation at __ex_table+0x634 references section "__ex_table" which is not executable, IOW the kernel will fault if it ever tries to jump to it. Something is seriously wrong and should be fixed. The probem is caused by a bad label in an __ex_table entry. Acked-by: Lennox Wu <lennox.wu@gmail.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>	2015-06-10 10:19:47 -07:00
Guenter Roeck	5eea900389	blackfin: Fix build error Fix include/asm-generic/io.h: In function 'readb': include/asm-generic/io.h:113:2: error: implicit declaration of function 'bfin_read8' include/asm-generic/io.h: In function 'readw': include/asm-generic/io.h:121:2: error: implicit declaration of function 'bfin_read16' include/asm-generic/io.h: In function 'readl': include/asm-generic/io.h:129:2: error: implicit declaration of function 'bfin_read32' include/asm-generic/io.h: In function 'writeb': include/asm-generic/io.h:147:2: error: implicit declaration of function 'bfin_write8' include/asm-generic/io.h: In function 'writew': include/asm-generic/io.h:155:2: error: implicit declaration of function 'bfin_write16' include/asm-generic/io.h: In function 'writel': include/asm-generic/io.h:163:2: error: implicit declaration of function 'bfin_write32' Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: `1a3372bc52` ("blackfin: io: define __raw_readx/writex with bfin_readx/writex") Cc: Steven Miao <realmz6@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>	2015-06-10 10:19:24 -07:00
Ralf Baechle	9cc719ab3f	MIPS: MSA: bugfix - disable MSA correctly for new threads/processes. Due to the slightly odd way that new threads and processes start execution when scheduled for the very first time they were bypassing the required disable_msa call. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2015-06-10 11:24:53 +02:00
Ralf Baechle	d9fb566045	MIPS: Loongson: Do not register 8250 platform device from module. If CONFIG_SERIAL_8250 is set to m, the Loongson seria.ko module might get unloaded while the serial driver modules are still loaded resulting in stale references to the destroyed platform_device instance. Anyway, platform devices should always be registered indicated what devices are present, _not_ what drivers have been configured. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Patchwork: https://patchwork.linux-mips.org/patch/10538/	2015-06-10 10:43:07 +02:00
Ralf Baechle	34b1252bd9	MIPS: Cobalt: Do not build MTD platform device registration code as module. If CONFIG_MTD_PHYSMAP is set to m, the Cobalt mtd.ko module might get unloaded while the drivers/mtd modules are still loaded resulting in stale references to the destroyed platform_device instance. Anyway, platform devices should always be registered indicated what devices are present, _not_ what drivers have been configured. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2015-06-10 10:39:30 +02:00
Andy Lutomirski	539f511365	x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code The error_entry/error_exit code to handle gsbase and whether we return to user mdoe was a mess: - error_sti was misnamed. In particular, it did not enable interrupts. - Error handling for gs_change was hopelessly tangled the normal usermode path. Separate it out. This saves a branch in normal entries from kernel mode. - The comments were bad. Fix it up. As a nice side effect, there's now a code path that happens on error entries from user mode. We'll use it soon. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f1be898ab93360169fb845ab85185948832209ee.1433878454.git.luto@kernel.org [ Prettified it, clarified comments some more. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-10 08:47:57 +02:00
Denys Vlasenko	a92fde2523	x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation We use three MOVs to swap edx and ecx. We can use one XCHG instead. Expand the comments. It's difficult to keep track which arg# every register corresponds to, so spell it out. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433876051-26604-3-git-send-email-dvlasenk@redhat.com [ Expanded the comments some more. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-10 08:42:13 +02:00
Denys Vlasenko	1536bb46fa	x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry() Here it is not obvious why we load pt_regs->cx to %esi etc. Lets improve comments. Explain that here we combine two things: first, we reload registers since some of them are clobbered by the C function we just called; and we also convert 32-bit syscall params to 64-bit C ABI, because we are going to jump back to syscall dispatch code. Move reloading of 6th argument into the macro instead of having it after each of two macro invocations. No actual code changes here. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433876051-26604-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-10 08:42:13 +02:00
Denys Vlasenko	aee4b013a7	x86/asm/entry/32: Fix fallout from the R9 trick removal in the SYSCALL code I put %ebp restoration code too late. Under strace, it is not reached and %ebp is not restored upon return to userspace. This is the fix. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433876051-26604-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-10 08:42:12 +02:00
Aneesh Kumar K.V	cfcb3d80a2	powerpc/mm: Add trace point for tracking hash pte fault This enables us to understand how many hash fault we are taking when running benchmarks. For ex: -bash-4.2# ./perf stat -e powerpc:hash_fault -e page-faults /tmp/ebizzy.ppc64 -S 30 -P -n 1000 ... Performance counter stats for '/tmp/ebizzy.ppc64 -S 30 -P -n 1000': 1,10,04,075 powerpc:hash_fault 1,10,03,429 page-faults 30.865978991 seconds time elapsed NOTE: The impact of the tracepoint was not noticeable when running test. It was within the run-time variance of the test. For ex: without-patch: -------------- Performance counter stats for './a.out 3000 300': 643 page-faults # 0.089 M/sec 7.236562 task-clock (msec) # 0.928 CPUs utilized 2,179,213 stalled-cycles-frontend # 0.00% frontend cycles idle 17,174,367 stalled-cycles-backend # 0.00% backend cycles idle 0 context-switches # 0.000 K/sec 0.007794658 seconds time elapsed And with-patch: --------------- Performance counter stats for './a.out 3000 300': 643 page-faults # 0.089 M/sec 7.233746 task-clock (msec) # 0.921 CPUs utilized 0 context-switches # 0.000 K/sec 0.007854876 seconds time elapsed Performance counter stats for './a.out 3000 300': 643 page-faults # 0.087 M/sec 649 powerpc:hash_fault # 0.087 M/sec 7.430376 task-clock (msec) # 0.938 CPUs utilized 2,347,174 stalled-cycles-frontend # 0.00% frontend cycles idle 17,524,282 stalled-cycles-backend # 0.00% backend cycles idle 0 context-switches # 0.000 K/sec 0.007920284 seconds time elapsed Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-10 14:06:29 +10:00
Bjorn Helgaas	1dace0116d	x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A The Foxconn K8M890-8237A has two PCI host bridges, and we can't assign resources correctly without the information from _CRS that tells us which address ranges are claimed by which bridge. In the bugs mentioned below, we incorrectly assign a sound card address (this example is from 1033299): bus: 00 index 2 [mem 0x80000000-0xfcffffffff] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7f]) pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xbfefffff] (ignored) pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xdfffffff] (ignored) pci_root PNP0A08:00: host bridge window [mem 0xf0000000-0xfebfffff] (ignored) ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-ff]) pci_root PNP0A08:01: host bridge window [mem 0xbff00000-0xbfffffff] (ignored) pci 0000:80:01.0: [1106:3288] type 0 class 0x000403 pci 0000:80:01.0: reg 10: [mem 0xbfffc000-0xbfffffff 64bit] pci 0000:80:01.0: address space collision: [mem 0xbfffc000-0xbfffffff 64bit] conflicts with PCI Bus #00 [mem 0x80000000-0xfcffffffff] pci 0000:80:01.0: BAR 0: assigned [mem 0xfd00000000-0xfd00003fff 64bit] BUG: unable to handle kernel paging request at ffffc90000378000 IP: [<ffffffffa0345f63>] azx_create+0x37c/0x822 [snd_hda_intel] We assigned 0xfd_0000_0000, but that is not in any of the host bridge windows, and the sound card doesn't work. Turn on pci=use_crs automatically for this system. Link: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/931368 Link: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1033299 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org	2015-06-09 18:54:07 -05:00
Michael Holzheu	6651ee070b	s390/bpf: implement bpf_tail_call() helper bpf_tail_call() arguments: - ctx......: Context pointer - jmp_table: One of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table - index....: Index in the jump table In this implementation s390x JIT does stack unwinding and jumps into the callee program prologue. Caller and callee use the same stack. With this patch a tail call generates the following code on s390x: if (index >= array->map.max_entries) goto out 000003ff8001c7e4: e31030100016 llgf %r1,16(%r3) 000003ff8001c7ea: ec41001fa065 clgrj %r4,%r1,10,3ff8001c828 if (tail_call_cnt++ > MAX_TAIL_CALL_CNT) goto out; 000003ff8001c7f0: a7080001 lhi %r0,1 000003ff8001c7f4: eb10f25000fa laal %r1,%r0,592(%r15) 000003ff8001c7fa: ec120017207f clij %r1,32,2,3ff8001c828 prog = array->prog[index]; if (prog == NULL) goto out; 000003ff8001c800: eb140003000d sllg %r1,%r4,3 000003ff8001c806: e31310800004 lg %r1,128(%r3,%r1) 000003ff8001c80c: ec18000e007d clgij %r1,0,8,3ff8001c828 Restore registers before calling function 000003ff8001c812: eb68f2980004 lmg %r6,%r8,664(%r15) 000003ff8001c818: ebbff2c00004 lmg %r11,%r15,704(%r15) goto *(prog->bpf_func + tail_call_start); 000003ff8001c81e: e31100200004 lg %r1,32(%r1,%r0) 000003ff8001c824: 47f01006 bc 15,6(%r1) Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-09 11:47:10 -07:00
Firo Yang	a5f56ba3b4	ARM: KVM: Remove pointless void pointer cast No need to cast the void pointer returned by kmalloc() in arch/arm/kvm/mmu.c::kvm_alloc_stage2_pgd(). Signed-off-by: Firo Yang <firogm@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-06-09 18:05:17 +01:00
Herbert Xu	05dee9c7eb	nios2: Export get_cycles nios2 is the only architecture that does not inline get_cycles and does not export it. This breaks crypto as it uses get_cycles in a number of modules. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2015-06-09 22:26:00 +08:00
Dave Hansen	97ac46a508	x86/mpx: Allow 32-bit binaries on 64-bit kernels again Now that the bugs in mixed mode MPX handling are fixed, re-allow 32-bit binaries on 64-bit kernels again. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183706.70277DAD@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:34 +02:00
Dave Hansen	bea03c50b8	x86/mpx: Do not count MPX VMAs as neighbors when unmapping The comment pretty much says it all. I wrote a test program that does lots of random allocations and forces bounds tables to be created. It came up with a layout like this: .... \| BOUNDS DIRECTORY ENTRY COVERS \| .... \| BOUNDS TABLE COVERS \| \| BOUNDS TABLE \| REAL ALLOC \| BOUNDS TABLE \| Unmapping "REAL ALLOC" should have been able to free the bounds table "covering" the "REAL ALLOC" because it was the last real user. But, the neighboring VMA bounds tables were found, considered as real neighbors, and we declined to free the bounds table covering the area. Doing this over and over left a small but significant number of these orphans. Handling them is fairly straighforward. All we have to do is walk the VMAs and skip all of the MPX ones when looking for neighbors. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183706.A6BD90BF@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:34 +02:00
Dave Hansen	3ceaccdf92	x86/mpx: Rewrite the unmap code The MPX code needs to clear out bounds tables for memory which is no longer in use. We do this when a userspace mapping is torn down (unmapped). There are two modes: 1. An entire bounds table becomes unused, and can be freed and its pointer removed from the bounds directory. This happens either when a large mapping is torn down, or when a small mapping is torn down and it is the last mapping "covered" by a bounds table. 2. Only part of a bounds table becomes unused, in which case we free the backing memory as if MADV_DONTNEED was called. The old code was a spaghetti mess of "edge" bounds tables where the edges were handled specially, even if we were unmapping an entire one. Non-edge bounds tables are always fully unmapped, but share a different code path from the edge ones. The old code had a bug where it was unmapping too much memory. I worked on fixing it for two days and gave up. I didn't write the original code. I didn't particularly like it, but it worked, so I left it. After my debug session, I realized it was undebuggagle and buggy, so out it went. I also wrote a new unmapping test program which uncovers bugs pretty nicely. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183706.DCAEC67D@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:34 +02:00
Dave Hansen	613fcb7d3c	x86/mpx: Support 32-bit binaries on 64-bit kernels Right now, the kernel can only switch between 64-bit and 32-bit binaries at compile time. This patch adds support for 32-bit binaries on 64-bit kernels when we support ia32 emulation. We essentially choose which set of table sizes to use when doing arithmetic for the bounds table calculations. This also uses a different approach for calculating the table indexes than before. I think the new one makes it much more clear what is going on, and allows us to share more code between the 32-bit and 64-bit cases. Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183705.E01F21E2@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:34 +02:00
Dave Hansen	6ac52bb491	x86/mpx: Use 32-bit-only cmpxchg() for 32-bit apps user_atomic_cmpxchg_inatomic() actually looks at sizeof(ptr) to figure out how many bytes to copy. If we run it on a 64-bit kernel with a 64-bit pointer, it will copy a 64-bit bounds directory entry. That's fine, except when we have 32-bit programs with 32-bit bounds directory entries and we only want* 32-bits. This patch breaks the cmpxchg() operation out in to its own function and performs the 32-bit type swizzling in there. Note, the "64-bit" version of this code _would_ work on a 32-bit-only kernel. The issue this patch addresses is only for when the kernel's 'long' is mismatched from the size of the bounds directory entry of the process we are working on. The new helper modifies 'actual_old_val' or returns an error. But gcc doesn't know this, so it warns about 'actual_old_val' being unused. Shut it up with an uninitialized_var(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183705.672B115E@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:33 +02:00
Dave Hansen	5458765390	x86/mpx: Introduce new 'directory entry' to 'addr' helper function Currently, to get from a bounds directory entry to the virtual address of a bounds table, we simply mask off a few low bits. However, the set of bits we mask off is different for 32-bit and 64-bit binaries. This breaks the operation out in to a helper function and also adds a temporary variable to store the result until we are sure we are returning one. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.007686CE@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:33 +02:00
Dave Hansen	a1149fc83a	x86/mpx: Add temporary variable to reduce masking When we allocate a bounds table, we call mmap(), then add a "valid" bit to the value before storing it in to the bounds directory. If we fail along the way, we go and mask that valid bit _back_ out. That seems a little silly, and this makes it much more clear when we have a plain address versus an actual table _entry_. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.3D69D5F4@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:33 +02:00
Dave Hansen	b0e9b09b3b	x86: Make is_64bit_mm() widely available The uprobes code has a nice helper, is_64bit_mm(), that consults both the runtime and compile-time flags for 32-bit support. Instead of reinventing the wheel, pull it in to an x86 header so we can use it for MPX. I prefer passing the 'mm' around to test_thread_flag(TIF_IA32) because it makes it explicit where the context is coming from. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.F0209999@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:32 +02:00
Dave Hansen	cd4996dce1	x86/mpx: Trace allocation of new bounds tables Bounds tables are a significant consumer of memory. It is important to know when they are being allocated. Add a trace point to trace whenever an allocation occurs and also its virtual address. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183704.EC23A93E@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:32 +02:00
Dave Hansen	2a1dcb1f79	x86/mpx: Trace the attempts to find bounds tables There are two different events being traced here. They are doing similar things so share a trace "EVENT_CLASS" and are presented together. 1. Trace when MPX is zapping pages "mpx_unmap_zap": When MPX can not free an entire bounds table, it will instead try to zap unused parts of a bounds table to free the backing memory. This decreases RSS (resident set size) without decreasing the virtual space allocated for bounds tables. 2. Trace attempts to find bounds tables "mpx_unmap_search": This event traces any time we go looking to unmap a bounds table for a given virtual address range. This is useful to ensure that the kernel actually "tried" to free a bounds table versus times it succeeded in finding one. It might try and fail if it realized that a table was shared with an adjacent VMA which is not being unmapped. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.B9D2468B@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:32 +02:00
Dave Hansen	97efebf1bc	x86/mpx: Trace entry to bounds exception paths There are two basic things that can happen as the result of a bounds exception (#BR): 1. We allocate a new bounds table 2. We pass up a bounds exception to userspace. This patch adds a trace point for the case where we are passing the exception up to userspace with a signal. We are also explicit that we're printing out the inverse of the 'upper' that we encounter. If you want to filter, for instance, you need to ~ the value first. The reason we do this is because of how 'upper' is stored in the bounds table. If a pointer's range is: 0x1000 -> 0x2000 it is stored in the bounds table as (32-bits here for brevity): lower: 0x00001000 upper: 0xffffdfff That is so that an all 0's entry: lower: 0x00000000 upper: 0x00000000 corresponds to the "init" bounds which store a range of: 0x00000000 -> 0xffffffff That is, by far, the common case, and that lets us use the zero page, or deduplicate the memory, etc... The 'upper' stored in the table is gibberish to print by itself, so we print ~upper to get the actual, logical, human-readable value printed out. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.027BB9B0@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:32 +02:00
Dave Hansen	e7126cf5f1	x86/mpx: Trace #BR exceptions This is the first in a series of MPX tracing patches. I've found these extremely useful in the process of debugging applications and the kernel code itself. This exception hooks in to the bounds (#BR) exception very early and allows capturing the key registers which would influence how the exception is handled. Note that bndcfgu/bndstatus are technically still 64-bit registers even in 32-bit mode. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183703.5FE2619A@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:31 +02:00
Dave Hansen	8c3641e957	x86/mpx: Introduce a boot-time disable flag MPX has the _potential_ to cause some issues. Say part of your init system tried to protect one of its components from buffer overflows with MPX. If there were a false positive, it's possible that MPX could keep a system from booting. MPX could also potentially cause performance issues since it is present in hot paths like the unmap path. Allow it to be disabled at boot time. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150607183702.2E8B77AB@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:31 +02:00
Dave Hansen	eb099e5bc5	x86/mpx: Restrict the mmap() size check to bounds tables The comment and code here are confusing. We do not currently allocate the bounds directory in the kernel. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183702.222CEC2A@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:31 +02:00
Qiaowei Ren	3c1d323009	x86/mpx: Remove redundant MPX_BNDCFG_ADDR_MASK MPX_BNDCFG_ADDR_MASK is defined two times, so this patch removes redundant one. Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183702.5F129376@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:30 +02:00
Dave Hansen	46a6e0cf1c	x86/mpx: Clean up the code by not passing a task pointer around when unnecessary The MPX code can only work on the current task. You can not, for instance, enable MPX management in another process or thread. You can also not handle a fault for another process or thread. Despite this, we pass a task_struct around prolifically. This patch removes all of the task struct passing for code paths where the code can not deal with another task (which turns out to be all of them). This has no functional changes. It's just a cleanup. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183702.6A81DA2C@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:30 +02:00
Dave Hansen	a84eeaa96b	x86/mpx: Use the new get_xsave_field_ptr()API The MPX registers (bndcsr/bndcfgu/bndstatus) are not directly accessible via normal instructions. They essentially act as if they were floating point registers and are saved/restored along with those registers. There are two main paths in the MPX code where we care about the contents of these registers: 1. #BR (bounds) faults 2. the prctl() code where we are setting MPX up Both of those paths _might_ be called without the FPU having been used. That means that 'tsk->thread.fpu.state' might never be allocated. Also, fpu_save_init() is not preempt-safe. It was a bug to call it without disabling preemption. The new get_xsave_addr() calls unlazy_fpu() instead and properly disables preemption. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183701.BC0D37CF@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:30 +02:00
Dave Hansen	04cd027bcb	x86/fpu/xstate: Wrap get_xsave_addr() to make it safer The MPX code appears is calling a low-level FPU function (copy_fpregs_to_fpstate()). This function is not able to be called in all contexts, although it is safe to call directly in some cases. Although probably correct, the current code is ugly and potentially error-prone. So, add a wrapper that calls the (slightly) higher-level fpu__save() (which is preempt- safe) and also ensures that we even have an FPU context (in the case that this was called when in lazy FPU mode). Ingo had this to say about the details about when we need preemption disabled: > it's indeed generally unsafe to access/copy FPU registers with preemption enabled, > for two reasons: > > - on older systems that use FSAVE the instruction destroys FPU register > contents, which has to be handled carefully > > - even on newer systems if we copy to FPU registers (which this code doesn't) > then we don't want a context switch to occur in the middle of it, because a > context switch will write to the fpstate, potentially overwriting our new data > with old FPU state. > > But it's safe to access FPU registers with preemption enabled in a couple of > special cases: > > - potentially destructively saving FPU registers: the signal handling code does > this in copy_fpstate_to_sigframe(), because it can rely on the signal restore > side to restore the original FPU state. > > - reading FPU registers on modern systems: we don't do this anywhere at the > moment, mostly to keep symmetry with older systems where FSAVE is > destructive. > > - initializing FPU registers on modern systems: fpu__clear() does this. Here > it's safe because we don't copy from the fpstate. > > - directly writing FPU registers from user-space memory (!). We do this in > fpu__restore_sig(), and it's safe because neither context switches nor > irq-handler FPU use can corrupt the source context of the copy (which is > user-space memory). > > Note that the MPX code's current use of copy_fpregs_to_fpstate() was safe I think, > because: > > - MPX is predicated on eagerfpu, so the destructive F[N]SAVE instruction won't be > used. > > - the code was only reading FPU registers, and was doing it only in places that > guaranteed that an FPU state was already active (i.e. didn't do it in > kthreads) Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: bp@alien8.de Link: http://lkml.kernel.org/r/20150607183700.AA881696@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:29 +02:00
Dave Hansen	0c4109bec0	x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions get_xsave_addr() assumes that if an xsave bit is present in the hardware (pcntxt_mask) that it is present in a given xsave buffer. Due to an bug in the xsave code on all of the systems that have MPX (and thus all the users of this code), that has been a true assumption. But, the bug is getting fixed, so our assumption is not going to hold any more. It's quite possible (and normal) for an enabled state to be present on 'pcntxt_mask', but not in 'xstate_bv'. We need to consult 'xstate_bv'. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150607183700.1E739B34@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 12:24:29 +02:00
Denys Vlasenko	9b47feb708	x86/asm/entry: Clean up entry*.S style, final bits A few bits were missed. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 11:48:24 +02:00
Ingo Molnar	15c1247953	Revert "perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization" This reverts commit `c05199e5a5`. Vince Weaver reported the following crash while perf fuzzing: [ 79.473121] kernel BUG at mm/vmalloc.c:1335! [ 79.694391] Call Trace: [ 79.696997] <IRQ> [ 79.699090] [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50 [ 79.705505] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90 [ 79.712414] [<ffffffff810635e5>] __ioremap_caller+0x195/0x350 [ 79.718610] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90 [ 79.725462] [<ffffffff81427f6b>] ? debug_object_activate+0x14b/0x1e0 [ 79.732346] [<ffffffff810637b7>] ioremap_nocache+0x17/0x20 [ 79.738283] [<ffffffff81039f4d>] snb_uncore_imc_init_box+0x6d/0x90 [ 79.744945] [<ffffffff81039cf7>] snb_uncore_imc_event_start+0xb7/0x110 [ 79.752020] [<ffffffff81039d97>] snb_uncore_imc_event_add+0x47/0x60 [ 79.758832] [<ffffffff81162cbb>] event_sched_in.isra.85+0xfb/0x330 [ 79.765519] [<ffffffff81162f5f>] group_sched_in+0x6f/0x1e0 [ 79.771481] [<ffffffff8101df1a>] ? native_sched_clock+0x2a/0x90 [ 79.777858] [<ffffffff811637bc>] __perf_event_enable+0x25c/0x2a0 [ 79.784418] [<ffffffff810f3e69>] ? tick_nohz_irq_exit+0x29/0x30 [ 79.790820] [<ffffffff8115ef30>] ? cpu_clock_event_start+0x40/0x40 [ 79.797546] [<ffffffff8115ef80>] remote_function+0x50/0x60 [ 79.803535] [<ffffffff810f8cd1>] flush_smp_call_function_queue+0x81/0x180 [ 79.810840] [<ffffffff810f9763>] generic_smp_call_function_single_interrupt+0x13/0x60 [ 79.819328] [<ffffffff8104b5e8>] smp_trace_call_function_single_interrupt+0x38/0xc0 [ 79.827614] [<ffffffff816de9be>] trace_call_function_single_interrupt+0x6e/0x80 [ 79.835465] <EOI> [ 79.837543] [<ffffffff8156e8b5>] ? cpuidle_enter_state+0x65/0x160 [ 79.844377] [<ffffffff8156e8a1>] ? cpuidle_enter_state+0x51/0x160 [ 79.851015] [<ffffffff8156e9e7>] cpuidle_enter+0x17/0x20 [ 79.856791] [<ffffffff810b6e39>] cpu_startup_entry+0x399/0x440 [ 79.863165] [<ffffffff816c9ddb>] rest_init+0xbb/0xd0 The offending commit is clearly confused as it moves heavy initialization work into IPI context. Revert it. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Yan, Zheng <zheng.z.yan@intel.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-09 11:44:37 +02:00
Markos Chandras	d7b631419b	MIPS: pgtable-bits: Fix XPA damage to R6 definitions. Commit `be0c37c985` ("MIPS: Rearrange PTE bits into fixed positions.") rearranged the PTE bits into fixed positions in preparation for the XPA support. However, this patch broke R6 since it only took R2 cores into consideration for the RI/XI bits leading to boot failures. We fix this by adding the missing CONFIG_CPU_MIPSR6 definitions Fixes: `be0c37c985` ("MIPS: Rearrange PTE bits into fixed positions.") Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/10208/ Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2015-06-09 10:45:05 +02:00
David S. Miller	941742f497	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-06-08 20:06:56 -07:00
Linus Torvalds	5879ae5fd0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix stack allocation in s390 BPF JIT, from Michael Holzheu. 2) Disable LRO on openvswitch paths, from Jiri Benc. 3) UDP early demux doesn't handle multicast group membership properly, fix from Shawn Bohrer. 4) Fix TX queue hang due to incorrect handling of mixed sized fragments and linearlization in i40e driver, from Anjali Singhai Jain. 5) Cannot use disable_irq() in timer handler of AMD xgbe driver, from Thomas Lendacky. 6) b2net driver improperly assumes pci_alloc_consistent() gives zero'd out memory, use dma_zalloc_coherent(). From Sriharsha Basavapatna. 7) Fix use-after-free in MPLS and ipv6, from Robert Shearman. 8) Missing neif_napi_del() calls in cleanup paths of b44 driver, from Hauke Mehrtens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: replace last open coded skb_orphan_frags with function call net: bcmgenet: power on MII block for all MII modes ipv6: Fix protocol resubmission ipv6: fix possible use after free of dev stats b44: call netif_napi_del() bridge: disable softirqs around br_fdb_update to avoid lockup Revert "bridge: use _bh spinlock variant for br_fdb_update to avoid lockup" mpls: fix possible use after free of device be2net: Replace dma/pci_alloc_coherent() calls with dma_zalloc_coherent() bridge: use _bh spinlock variant for br_fdb_update to avoid lockup amd-xgbe: Use disable_irq_nosync from within timer function rhashtable: add missing import <linux/export.h> i40e: Make sure to be in VEB mode if SRIOV is enabled at probe i40e: start up in VEPA mode by default i40e/i40evf: Fix mixed size frags and linearization ipv4/udp: Verify multicast group is ours in upd_v4_early_demux() openvswitch: disable LRO s390/bpf: fix bpf frame pointer setup s390/bpf: fix stack allocation	2015-06-08 17:41:04 -07:00
Ingo Molnar	bace7117d3	x86/asm/entry: (Re-)rename __NR_entry_INT80_compat_max to __NR_syscall_compat_max Brian Gerst noticed that I did a weird rename in the following commit: `b2502b418e` ("x86/asm/entry: Untangle 'system_call' into two entry points: entry_SYSCALL_64 and entry_INT80_32") which renamed __NR_ia32_syscall_max to __NR_entry_INT80_compat_max. Now the original name was a misnomer, but the new one is a misnomer as well, as all the 32-bit compat syscall entry points (sysenter, syscall) share the system call table, not just the INT80 based one. Rename it to __NR_syscall_compat_max. Reported-by: Brian Gerst <brgerst@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 23:43:38 +02:00
Denys Vlasenko	eb47854415	x86/asm/entry/32: Reinstate clearing of pt_regs->r8..r11 on EFAULT path I broke this recently when I changed pt_regs->r8..r11 clearing logic in INT 80 code path. There is a branch from SYSENTER/SYSCALL code to INT 80 code: if we fail to retrieve arg6, we return EFAULT. Before this patch, in this case we don't clear pt_regs->r8..r11. This patch fixes this. The resulting code is smaller and simpler. While at it, remove incorrect comment about syscall dispatching CALL insn: it does not use RIP-relative addressing form (the comment was meant to be "TODO: make this rip-relative", and morphed since then, dropping "TODO"). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433701470-28800-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 23:43:37 +02:00
Linus Torvalds	40b985fbe9	ARM: SoC fixes for v4.1-rc About 10 days worth of small bug fixes, and the (hopefully) final round fixes for from arm-soc land for the -rc cycle. Nothing special to note, but here's a brief summary of fixes by SoC type: - OMAP: small set of misc. DT fixes; boot fix for THUMB2 kernel - mediatek: PMIC fixes; DT fix for model name - exynos: wakeup interupt fixes for 3250 - mvebu: revert mbus patch which broke DMA masters -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVddmWAAoJEFk3GJrT+8Zlk2EP/RFjkrOZYIvO99h6vUQC2YFF aHZxuKqg5PzcBhj5qVl+VlnNyGR29KtHnrCRgcG0Ap8Tm+FN5Vf+AGpuf1NobosX iIkDtcmbtXcfHUTF+oEsYwrSkAW1EjYoQRZu3RGxZ+tXStIauP/b1K8sexkeL5/2 pqyECWGhy7zLWP0p4afl4EbKAgGGPI5VdpPMfvagcwcyoQ1beB3ULHX7tKbFqPEZ CIVcs8s0Y+ENTIYYjXy2o4SjBxmh3Wb2mrRX7yni/AjZ0Z0opBmGZ4VcOTlca0gB wBtVKyQR18BtJCQPW3hKzC4+Y/FtOxkR2fQ32/HbYFXwMzkso4YcSg/ARVdGj3xo tXKVJtcwWIk9P+sUD6hU/9bT/ZKX/timXr8Dpjj522wTTHKb0Rf53kfer5uwHhJ+ mY3NNxotldn/6bb7NH0Q8HflhgPDWM66i92QYLlb24+Ngsm81aZClrLxDS+dUaW+ NRpPVlK7JWR/LHVFXcH28xbOudwSPri5+MDBdsPPui42s6WZl+4qWVdsiDKssA3w mGtNEV2+nq17MXyLoaAfhM0vqDnAbt82DlfudJDGExCmHLVkfZV73CK84NA5IjwY yK0HErz9FkpIye+FI6m8HnfxbEeTSyX5djpO8TjVSPoXsi873jGeXPdVQbYwiT8M EAbvopBzN0puhDp0QDYj =gSjy -----END PGP SIGNATURE----- Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Kevin Hilman: "About 10 days worth of small bug fixes, and the (hopefully) final round fixes for from arm-soc land for the -rc cycle. Nothing special to note, but here's a brief summary of fixes by SoC type: - OMAP: small set of misc DT fixes; boot fix for THUMB2 kernel - mediatek: PMIC fixes; DT fix for model name - exynos: wakeup interupt fixes for 3250 - mvebu: revert mbus patch which broke DMA masters * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: am335x-boneblack: disable RTC-only sleep to avoid hardware damage ARM: dts: AM35xx: fix system control module clocks arm64: dts: mt8173-evb: fix model name ARM: exynos: Fix wake-up interrupts for Exynos3250 ARM: dts: Fix n900 dts file to work around 4.1 touchscreen regression on n900 ARM: dts: Fix dm816x to use right compatible flag for MUSB ARM: OMAP3: Fix booting with thumb2 kernel Revert "bus: mvebu-mbus: make sure SDRAM CS for DMA don't overlap the MBus bridge window" bus: mvebu-mbus: do not set WIN_CTRL_SYNCBARRIER on non io-coherent platforms. ARM: mvebu: armada-xp-linksys-mamba: Disable internal RTC soc: mediatek: Add compile dependency to pmic-wrapper soc: mediatek: PMIC wrap: Fix register state machine handling soc: mediatek: PMIC wrap: Fix clock rate handling	2015-06-08 13:21:58 -07:00
Ingo Molnar	4d7321381e	x86/asm/entry/64: Clean up entry_64.S Make the 64-bit syscall entry code a bit more readable: - use consistent assembly coding style similar to the other entry_*.S files - remove old comments that are not true anymore - eliminate whitespace noise - use consistent vertical spacing - fix various comments - reorganize entry point generation tables to be more readable No code changed: # arch/x86/entry/entry_64.o: text data bss dec hex filename 12282 0 0 12282 2ffa entry_64.o.before 12282 0 0 12282 2ffa entry_64.o.after md5: cbab1f2d727a2a8a87618eeb79f391b7 entry_64.o.before.asm cbab1f2d727a2a8a87618eeb79f391b7 entry_64.o.after.asm Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 20:48:43 +02:00
Ingo Molnar	9dda1658a9	Merge branch 'x86/asm' into x86/core, to prepare for new patch Collect all changes to arch/x86/entry/entry_64.S, before applying patch that changes most of the file. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 20:48:20 +02:00
Josh Stone	04d7e098f5	arm64: fix missing syscall trace exit If a syscall is entered without TIF_SYSCALL_TRACE set, then it goes on the fast path. It's then possible to have TIF_SYSCALL_TRACE added in the middle of the syscall, but ret_fast_syscall doesn't check this flag again. This causes a ptrace syscall-exit-stop to be missed. For instance, from a PTRACE_EVENT_FORK reported during do_fork, the tracer might resume with PTRACE_SYSCALL, setting TIF_SYSCALL_TRACE. Now the completion of the fork should have a syscall-exit-stop. Russell King fixed this on arm by re-checking _TIF_SYSCALL_WORK in the fast exit path. Do the same on arm64. Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Josh Stone <jistone@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-08 18:34:21 +01:00
Kevin Hilman	0a68c6bc7c	Omap fixes for the -rc cycle, including a fix for potential hardware breakage on BeagleBones: - BeagleBones don't support RTC-only mode, it can cause hardware damage if system-power-controller is specified without ti,pmic-shutdown-controller - Fix a recent regression to am3517 SoCs caused by the recent clock move that was not noticed until now despite automated boot testing - Fix a regression for n900 touchscreen triggered by recent recent input changes - Fix compatible property for dm816x USB to avoid errors with USB Ethernet - Fix oops for omap3 when built with CONFIG_THUMB2_KERNEL -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVbLwXAAoJEBvUPslcq6VzGnoQANbB1fMm74/aMuyVW4LoErio ii83kk2pX8EWbePle0rUrrre55LzRhJRr2Mcj4bSniafNqkYr36dQrxErnqwmBtA XawjoVPQB3mG/tBD0oVkzmJtaAXW3GA8IkmQrVe4jUqCh7AjnHYZ5IYjFtGxbJey oyHI48jcxQE1hhNfeTwHOlLhIIPGpRfdE8vYWOlM+rvm/7ZmKCNmnfZzx0XAyLjq rXw3IEgyIMbrbHy8fvdE/t2paWV+kb7urVzS/eu7Zn60CpZ9gwWFz4uENvvN2mDk L78Jz6uxxNrSmGCY+A1LBNWdt7KgiK1GqX/NkI9yc3vvkaJ/aYUh/1zae3pcofpY HNPcGWNAszDpP1xxCvwhNdJWaKWytZbHadTuVcyU86bKnEiu8Ph3Nh//EizNk/fK gpSEzRNPP8oVKY3iUIwtG8CfeiKZHI3EjyYTYM+Z9wg0OMpospX4A6VAiyUpuNO+ DeuAMGC46OhxOperErl4R+qomx2nf7d2FLvJnes/cp5sxM97Qeu7XYqzoqKVyJyU uLKNKmRS71Q1yddQKBVT5nCh5lTw/Mm+qCTHmeelRj52HbtxoT44EzdOr/OhY/0z td07Y+B1SANLrq5r/tBPBrYc0imJPKzD9Woyab7PASE0KxhNqkyAG0zA/9b/zLYo cqnk7D22Hs5ZOm4LyY3Q =Saui -----END PGP SIGNATURE----- Merge tag 'omap-for-v4.1/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Merge omap fixes for v4.1, urgent fix to avoid potential hardware damage From Tony Lindgren: Omap fixes for the -rc cycle, including a fix for potential hardware breakage on BeagleBones: - BeagleBones don't support RTC-only mode, it can cause hardware damage if system-power-controller is specified without ti,pmic-shutdown-controller - Fix a recent regression to am3517 SoCs caused by the recent clock move that was not noticed until now despite automated boot testing - Fix a regression for n900 touchscreen triggered by recent recent input changes - Fix compatible property for dm816x USB to avoid errors with USB Ethernet - Fix oops for omap3 when built with CONFIG_THUMB2_KERNEL * tag 'omap-for-v4.1/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: am335x-boneblack: disable RTC-only sleep to avoid hardware damage ARM: dts: AM35xx: fix system control module clocks ARM: dts: Fix n900 dts file to work around 4.1 touchscreen regression on n900 ARM: dts: Fix dm816x to use right compatible flag for MUSB ARM: OMAP3: Fix booting with thumb2 kernel	2015-06-08 10:32:55 -07:00
Bjorn Helgaas	01d72a9518	PCI: Remove unused pci_dma_burst_advice() pci_dma_burst_advice() was added by `e24c2d963a` ("[PATCH] PCI: DMA bursting advice") but apparently never used. Remove it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Michal Simek <monstr@monstr.eu> # microblaze CC: David S. Miller <davem@davemloft.net>	2015-06-08 07:56:43 -05:00
Bjorn Helgaas	d59d36a7fc	PCI: Remove unused pcibios_select_root() (again) `a6c140969b` ("Delete pcibios_select_root") removed pcibios_select_root(). But `a7db504052` ("PCI: remove pcibios_scan_all_fns()") added a few copies back, probably with some incorrect merge conflict resolutions. Remove the still-unused pcibios_select_root() definitions. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2015-06-08 07:56:21 -05:00
Bjorn Helgaas	633adc711d	PCI: Remove unnecessary #includes of <asm/pci.h> In include/linux/pci.h, we already #include <asm/pci.h>, so we don't need to include <asm/pci.h> directly. Remove the unnecessary includes. All the files here already include <linux/pci.h>. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> # sh Acked-by: Ralf Baechle <ralf@linux-mips.org>	2015-06-08 07:56:09 -05:00
Bjorn Helgaas	cd11433eda	PCI: Include <linux/pci.h>, not <asm/pci.h> We already include <asm/pci.h> from <linux/pci.h>, so just include <linux/pci.h> directly. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: linuxppc-dev@lists.ozlabs.org CC: linux-s390@vger.kernel.org	2015-06-08 07:55:03 -05:00
Bjorn Helgaas	c4fa9a6ae2	microblaze/PCI: Remove unnecessary struct pci_dev declaration All users of arch/microblaze/include/asm/pci.h get it by including include/linux/pci.h, which in turn includes <asm/pci.h> after it declares struct pci_dev. The forward declaration here is unnecessary, so remove it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Michal Simek <monstr@monstr.eu> Signed-off-by: Michal Simek <michal.simek@xilinx.com>	2015-06-08 10:03:14 +02:00
Bjorn Helgaas	99fdc6b9bd	microblaze/PCI: Remove unnecessary pci_bus_find_capability() declaration pci_bus_find_capability() is declared in include/linux/pci.h. Remove the pci_bus_find_capability() declaration from arch/microblaze/include/asm/pci.h. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Michal Simek <monstr@monstr.eu> Signed-off-by: Michal Simek <michal.simek@xilinx.com>	2015-06-08 10:03:06 +02:00
Bjorn Helgaas	de093d3ea1	microblaze/PCI: Remove unused declarations The following declarations were copied from powerpc but are unused on microblaze: struct pci_controller init_phb_dynamic(struct device_node dn) int remove_phb_dynamic(struct pci_controller phb) struct pci_dev of_create_pci_dev(struct device_node node, ...) void of_scan_pci_bridge(struct device_node node, ...) void of_scan_bus(struct device_node node, struct pci_bus bus) void of_rescan_bus(struct device_node node, struct pci_bus bus) Remove them. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Michal Simek <monstr@monstr.eu> Signed-off-by: Michal Simek <michal.simek@xilinx.com>	2015-06-08 10:02:55 +02:00
Michal Simek	55ae2f3b88	microblaze: Label local function static Warnings found by sparse: arch/microblaze/kernel/dma.c:157:5: warning: symbol 'dma_direct_mmap_coherent' was not declared. Should it be static? arch/microblaze/kernel/kgdb.c:35:14: warning: symbol 'pvr' was not declared. Should it be static? Signed-off-by: Michal Simek <michal.simek@xilinx.com>	2015-06-08 09:57:08 +02:00
Michal Simek	60587dbbe6	microblaze: Add missing release version code Add missing release version code for v9.4 and v9.5. Signed-off-by: Michal Simek <michal.simek@xilinx.com>	2015-06-08 09:57:07 +02:00
Ingo Molnar	a49976d14f	x86/asm/entry/32: Clean up entry_32.S Make the 32-bit syscall entry code a bit more readable: - use consistent assembly coding style similar to entry_64.S - remove old comments that are not true anymore - eliminate whitespace noise - use consistent vertical spacing - fix various comments No code changed: # arch/x86/entry/entry_32.o: text data bss dec hex filename 6025 0 0 6025 1789 entry_32.o.before 6025 0 0 6025 1789 entry_32.o.after md5: f3fa16b2b0dca804f052deb6b30ba6cb entry_32.o.before.asm f3fa16b2b0dca804f052deb6b30ba6cb entry_32.o.after.asm Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 09:54:24 +02:00
Ingo Molnar	b2502b418e	x86/asm/entry: Untangle 'system_call' into two entry points: entry_SYSCALL_64 and entry_INT80_32 The 'system_call' entry points differ starkly between native 32-bit and 64-bit kernels: on 32-bit kernels it defines the INT 0x80 entry point, while on 64-bit it's the SYSCALL entry point. This is pretty confusing when looking at generic code, and it also obscures the nature of the entry point at the assembly level. So unangle this by splitting the name into its two uses: system_call (32) -> entry_INT80_32 system_call (64) -> entry_SYSCALL_64 As per the generic naming scheme for x86 system call entry points: entry_MNEMONIC_qualifier where 'qualifier' is one of _32, _64 or _compat. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 09:14:21 +02:00
Ingo Molnar	4c8cd0c50d	x86/asm/entry: Untangle 'ia32_sysenter_target' into two entry points: entry_SYSENTER_32 and entry_SYSENTER_compat So the SYSENTER instruction is pretty quirky and it has different behavior depending on bitness and CPU maker. Yet we create a false sense of coherency by naming it 'ia32_sysenter_target' in both of the cases. Split the name into its two uses: ia32_sysenter_target (32) -> entry_SYSENTER_32 ia32_sysenter_target (64) -> entry_SYSENTER_compat As per the generic naming scheme for x86 system call entry points: entry_MNEMONIC_qualifier where 'qualifier' is one of _32, _64 or _compat. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 08:47:46 +02:00
Ingo Molnar	2cd23553b4	x86/asm/entry: Rename compat syscall entry points Rename the following system call entry points: ia32_cstar_target -> entry_SYSCALL_compat ia32_syscall -> entry_INT80_compat The generic naming scheme for x86 system call entry points is: entry_MNEMONIC_qualifier where 'qualifier' is one of _32, _64 or _compat. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-08 08:47:36 +02:00
Linus Torvalds	866e6441a4	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus Pull MIPS updates from Ralf Baechle: "Eight fixes across arch/mips. Nothing stands particuarly out nor is complicated but fixes keep coming in at a higher than comfortable rate" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: KVM: Do not sign extend on unsigned MMIO load MIPS: BPF: Fix stack pointer allocation MIPS: Loongson-3: Fix a cpu-hotplug issue in loongson3_ipi_interrupt() MIPS: Fix enabling of DEBUG_STACKOVERFLOW MIPS: c-r4k: Fix typo in probe_scache() MIPS: Avoid an FPE exception in FCSR mask probing MIPS: ath79: Add a missing new line in log message MIPS: ralink: Fix clearing the illegal access interrupt	2015-06-07 16:56:10 -07:00
Peter Zijlstra	a3d86542de	perf/x86/intel/pebs: Add PEBSv3 decoding PEBSv3 as present on Skylake fixed the long standing issue of the status bits. They now really reflect the events that generated the record. Tested-by: Andi Kleen <ak@linux.intel.com> Tested-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 16:09:16 +02:00
Kan Liang	f38b0dbb49	perf/x86/intel: Introduce PERF_RECORD_LOST_SAMPLES After enlarging the PEBS interrupt threshold, there may be some mixed up PEBS samples which are discarded by the kernel. This patch makes the kernel emit a PERF_RECORD_LOST_SAMPLES record with the number of possible discarded records when it is impossible to demux the samples. It makes sure the user is not left in the dark about such discards. Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1431285195-14269-8-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 16:09:02 +02:00
Yan, Zheng	156174999d	perf/intel/x86: Enlarge the PEBS buffer Currently the PEBS buffer size is 4k, it can only hold about 21 PEBS records. This patch enlarges the PEBS buffer size to 64k (the same as the BTS buffer). 64k memory can hold about 330 PEBS records. This will significantly reduce the number of PMIs when batched PEBS interrupts are enabled. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-7-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 16:08:57 +02:00
Yan, Zheng	9c964efa43	perf/x86/intel: Drain the PEBS buffer during context switches Flush the PEBS buffer during context switches if PEBS interrupt threshold is larger than one. This allows perf to supply TID for sample outputs. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-6-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 16:08:54 +02:00
Yan, Zheng	3569c0d7c5	perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold) PEBS always had the capability to log samples to its buffers without an interrupt. Traditionally perf has not used this but always set the PEBS threshold to one. For frequently occurring events (like cycles or branches or load/store) this in term requires using a relatively high sampling period to avoid overloading the system, by only processing PMIs. This in term increases sampling error. For the common cases we still need to use the PMI because the PEBS hardware has various limitations. The biggest one is that it can not supply a callgraph. It also requires setting a fixed period, as the hardware does not support adaptive period. Another issue is that it cannot supply a time stamp and some other options. To supply a TID it requires flushing on context switch. It can however supply the IP, the load/store address, TSX information, registers, and some other things. So we can make PEBS work for some specific cases, basically as long as you can do without a callgraph and can set the period you can use this new PEBS mode. The main benefit is the ability to support much lower sampling period (down to -c 1000) without extensive overhead. One use cases is for example to increase the resolution of the c2c tool. Another is double checking when you suspect the standard sampling has too much sampling error. Some numbers on the overhead, using cycle soak, comparing the elapsed time from "kernbench -M -H" between plain (threshold set to one) and multi (large threshold). The test command for plain: "perf record --time -e cycles:p -c $period -- kernbench -M -H" The test command for multi: "perf record --no-time -e cycles:p -c $period -- kernbench -M -H" ( The only difference of test command between multi and plain is time stamp options. Since time stamp is not supported by large PEBS threshold, it can be used as a flag to indicate if large threshold is enabled during the test. ) period plain(Sec) multi(Sec) Delta 10003 32.7 16.5 16.2 20003 30.2 16.2 14.0 40003 18.6 14.1 4.5 80003 16.8 14.6 2.2 100003 16.9 14.1 2.8 800003 15.4 15.7 -0.3 1000003 15.3 15.2 0.2 2000003 15.3 15.1 0.1 With periods below 100003, plain (threshold one) cause much more overhead. With 10003 sampling period, the Elapsed Time for multi is even 2X faster than plain. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-5-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 16:08:49 +02:00
Yan, Zheng	21509084f9	perf/x86/intel: Handle multiple records in the PEBS buffer When the PEBS interrupt threshold is larger than one record and the machine supports multiple PEBS events, the records of these events are mixed up and we need to demultiplex them. Demuxing the records is hard because the hardware is deficient. The hardware has two issues that, when combined, create impossible scenarios to demux. The first issue is that the 'status' field of the PEBS record is a copy of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a problem let us first describe the regular PEBS cycle: A) the CTRn value reaches 0: - the corresponding bit in GLOBAL_STATUS gets set - we start arming the hardware assist < some unspecified amount of time later -- this could cover multiple events of interest > B) the hardware assist is armed, any next event will trigger it C) a matching event happens: - the hardware assist triggers and generates a PEBS record this includes a copy of GLOBAL_STATUS at this moment - if we auto-reload we (re)set CTRn - we clear the relevant bit in GLOBAL_STATUS Now consider the following chain of events: A0, B0, A1, C0 The event generated for counter 0 will include a status with counter 1 set, even though its not at all related to the record. A similar thing can happen with a !PEBS event if it just happens to overflow at the right moment. The second issue is that the hardware will only emit one record for two or more counters if the event that triggers the assist is 'close'. The 'close' can be several cycles. In some cases even the complete assist, if the event is something that doesn't need retirement. For instance, consider this chain of events: A0, B0, A1, B1, C01 Where C01 is an event that triggers both hardware assists, we will generate but a single record, but again with both counters listed in the status field. This time the record pertains to both events. Note that these two cases are different but undistinguishable with the data as generated. Therefore demuxing records with multiple PEBS bits (we can safely ignore status bits for !PEBS counters) is impossible. Furthermore we cannot emit the record to both events because that might cause a data leak -- the events might not have the same privileges -- so what this patch does is discard such events. The assumption/hope is that such discards will be rare. Here lists some possible ways you may get high discard rate. - when you count the same thing multiple times. But it is not a useful configuration. - you can be unfortunate if you measure with a userspace only PEBS event along with either a kernel or unrestricted PEBS event. Imagine the event triggering and setting the overflow flag right before entering the kernel. Then all kernel side events will end up with multiple bits set. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> [ Changelog improvements. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-4-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 16:08:45 +02:00
Yan, Zheng	43cf76312f	perf/x86/intel: Introduce setup_pebs_sample_data() Move code that sets up the PEBS sample data to a separate function. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-3-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 16:08:40 +02:00
Yan, Zheng	851559e35f	perf/x86/intel: Use the PEBS auto reload mechanism when possible When a fixed period is specified, this patch makes perf use the PEBS auto reload mechanism. This makes normal profiling faster, because it avoids one costly MSR write in the PMI handler. However, the reset value will be loaded by hardware assist. There is a small delay compared to the previous non-auto-reload mechanism. The delay time is arbitrary, but very small. The assist cost is 400-800 cycles, assuming common cases with everything cached. The minimum period the patch currently uses is 10000. In that extreme case it can be ~10% if cycles are used. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-2-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 16:08:35 +02:00
Stephane Eranian	7b74cfb2ec	perf/x86/intel: add support for PERF_SAMPLE_BRANCH_IND_JUMP This patch enables support for branch sampling filter for indirect jumps (IND_JUMP). It enables LBR IND_JMP filtering where available. There is also software filtering support. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@redhat.com Cc: dsahern@gmail.com Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: namhyung@kernel.org Link: http://lkml.kernel.org/r/1431637800-31061-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 16:08:27 +02:00
Frederic Weisbecker	4eaca0a887	preempt: Use preempt_schedule_context() as the official tracing preemption point preempt_schedule_context() is a tracing safe preemption point but it's only used when CONFIG_CONTEXT_TRACKING=y. Other configs have tracing recursion issues since commit: `b30f0e3ffe` ("sched/preempt: Optimize preemption operations on __schedule() callers") introduced function based preemp_count_*() ops. Lets make it available on all configs and give it a more appropriate name for its new position. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433432349-1021-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:57:42 +02:00
Kan Liang	8cf1a3de97	perf/x86/intel/uncore: Fix CBOX bit wide and UBOX reg on Haswell-EP CBOX counters are increased to 48b on HSX. Correct the MSR address for HSWEP_U_MSR_PMON_CTR0 and HSWEP_U_MSR_PMON_CTL0. See specification in: http://www.intel.com/content/www/us/en/processors/xeon/ xeon-e5-v3-uncore-performance-monitoring.html Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1432645835-7918-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:46:50 +02:00
Andy Shevchenko	7b179b8feb	x86/microcode: Correct CPU family related variable types Change the type of variables and function prototypes to be in alignment with what the x86_() / __x86_() family/model functions return. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-21-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:38:15 +02:00
Borislav Petkov	ee38a90709	x86/microcode: Disable builtin microcode loading on 32-bit for now Andy Shevchenko reported machine freezes when booting latest tip on 32-bit setups. Problem is, the builtin microcode handling cannot really work that early, when we haven't even enabled paging. A proper fix would involve handling that case specially as every other early 32-bit boot case in the microcode loader and would require much more involved changes for which it is too late now, more than a week before the upcoming merge window. So, disable the builtin microcode loading on 32-bit for now. Reported-and-tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-20-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:38:14 +02:00
Borislav Petkov	b72e7464e4	x86/uapi: Do not export <asm/msr-index.h> as part of the user API headers This header containing all MSRs and respective bit definitions got exported to userspace in conjunction with the big UAPI shuffle. But, it doesn't belong in the UAPI headers because userspace can do its own MSR defines and exporting them from the kernel blocks us from doing cleanups/renames in that header. Which is ridiculous - it is not kernel's job to export such a header and keep MSRs list and their names stable. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-19-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:36:04 +02:00
Ingo Molnar	c2f9b0af8b	Merge branch 'x86/ras' into x86/core, to fix conflicts Conflicts: arch/x86/include/asm/irq_vectors.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:35:27 +02:00
Borislav Petkov	c8e56d20f2	x86: Kill CONFIG_X86_HT In talking to Aravind recently about making certain AMD topology attributes available to the MCE injection module, it seemed like that CONFIG_X86_HT thing is more or less superfluous. It is def_bool y, depends on SMP and gets enabled in the majority of .configs - distro and otherwise - out there. So let's kill it and make code behind it depend directly on SMP. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Walter <dwalter@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-18-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:33:44 +02:00
Ashok Raj	243d657eaf	x86/mce: Handle Local MCE events Add the necessary changes to do_machine_check() to be able to process MCEs signaled as local MCEs. Typically, only recoverable errors (SRAR type) will be Signaled as LMCE. The architecture does not restrict to only those errors, however. When errors are signaled as LMCE, there is no need for the MCE handler to perform rendezvous with other logical processors unlike earlier processors that would broadcast machine check errors. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1433436928-31903-17-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:33:15 +02:00
Ashok Raj	88d538672e	x86/mce: Add infrastructure to support Local MCE Initialize and prepare for handling LMCEs. Add a boot-time option to disable LMCEs. Signed-off-by: Ashok Raj <ashok.raj@intel.com> [ Simplify stuff, align statements for better readability, reflow comments; kill unused lmce_clear(); save us an MSR write if LMCE is already enabled. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1433436928-31903-16-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:33:14 +02:00
Ashok Raj	bc12edb873	x86/mce: Add Local MCE definitions Add required definitions to support Local Machine Check Exceptions. Historically, machine check exceptions on Intel x86 processors have been broadcast to all logical processors in the system. Upcoming CPUs will support an opt-in mechanism to request some machine check exceptions be delivered to a single logical processor experiencing the fault. See http://www.intel.com/sdm Volume 3, System Programming Guide, chapter 15 for more information on MSRs and documentation on Local MCE. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1433436928-31903-15-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:33:13 +02:00
Toshi Kani	623dffb2a2	x86/mm/pat: Add set_memory_wt() for Write-Through type Now that reserve_ram_pages_type() accepts the WT type, add set_memory_wt(), set_memory_array_wt() and set_pages_array_wt() in order to be able to set memory to Write-Through page cache mode. Also, extend ioremap_change_attr() to accept the WT type. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-13-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:29:00 +02:00
Toshi Kani	35a5a10411	x86/mm/pat: Extend set_page_memtype() to support Write-Through type As set_memory_wb() calls free_ram_pages_type(), which then calls set_page_memtype() with -1, _PGMT_DEFAULT is used for tracking the WB type. _PGMT_WB is defined but unused. Thus, rename _PGMT_DEFAULT to _PGMT_WB to clarify the usage, and release the slot used by _PGMT_WB. Furthermore, change free_ram_pages_type() to call set_page_memtype() with _PGMT_WB, and get_page_memtype() to return _PAGE_CACHE_MODE_WB for _PGMT_WB. Then, define _PGMT_WT in the freed slot. This allows set_page_memtype() to track the WT type. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-12-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:59 +02:00
Toshi Kani	d1b4bfbfac	x86/mm/pat: Add pgprot_writethrough() Add pgprot_writethrough() for setting page protection flags to Write-Through mode. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-11-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:58 +02:00
Toshi Kani	c7c95f19f3	video/fbdev, asm/io.h: Remove ioremap_writethrough() Replace all calls to ioremap_writethrough() with ioremap_wt(). Remove ioremap_writethrough() too. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-10-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:57 +02:00
Toshi Kani	556269c138	arch/*/io.h: Add ioremap_wt() to all architectures Add ioremap_wt() to all arch-specific asm/io.h headers which define ioremap_wc() locally. These headers do not include <asm-generic/iomap.h>. Some of them include <asm-generic/io.h>, but ioremap_wt() is defined for consistency since they define all ioremap_xxx locally. In all architectures without Write-Through support, ioremap_wt() is defined indentical to ioremap_nocache(). frv and m68k already have ioremap_writethrough(). On those we add ioremap_wt() indetical to ioremap_writethrough() and defines ARCH_HAS_IOREMAP_WT in both architectures. The ioremap_wt() interface is exported to drivers. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-9-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:57 +02:00
Toshi Kani	d838270e25	x86/mm, asm-generic: Add ioremap_wt() for creating Write-Through mappings Add ioremap_wt() for creating Write-Through mappings on x86. It follows the same model as ioremap_wc() for multi-arch support. Define ARCH_HAS_IOREMAP_WT in the x86 version of io.h to indicate that ioremap_wt() is implemented on x86. Also update the PAT documentation file to cover ioremap_wt(). Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-8-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:56 +02:00
Toshi Kani	ecb2febaaa	x86/mm: Teach is_new_memtype_allowed() about Write-Through type __ioremap_caller() calls reserve_memtype() and the passed down @new_pcm contains the actual page cache type it reserved in the success case. is_new_memtype_allowed() verifies if converting to the new page cache type is allowed when @pcm (the requested type) is different from @new_pcm. When WT is requested, the caller expects that writes are ordered and uncached. Therefore, enhance is_new_memtype_allowed() to disallow the following cases: - If the request is WT, mapping type cannot be WB - If the request is WT, mapping type cannot be WC Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-7-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:55 +02:00
Toshi Kani	0d69bdff45	x86/mm/pat: Change reserve_memtype() for Write-Through type When a target range is in RAM, reserve_ram_pages_type() verifies the requested type. Change it to fail WT and WP requests with -EINVAL since set_page_memtype() is limited to handle three types: WB, WC and UC-. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-6-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:55 +02:00
Toshi Kani	d79a40caf8	x86/mm/pat: Use 7th PAT MSR slot for Write-Through PAT type Assign Write-Through type to the PA7 slot in the PAT MSR when the processor is not affected by PAT errata. The PA7 slot is chosen to improve robustness in the presence of errata that might cause the high PAT bit to be ignored. This way a buggy PA7 slot access will hit the PA3 slot, which is UC, so at worst we lose performance without causing a correctness issue. The following Intel processors are affected by the PAT errata. Errata CPUID ---------------------------------------------------- Pentium 2, A52 family 0x6, model 0x5 Pentium 3, E27 family 0x6, model 0x7, 0x8 Pentium 3 Xenon, G26 family 0x6, model 0x7, 0x8, 0xa Pentium M, Y26 family 0x6, model 0x9 Pentium M 90nm, X9 family 0x6, model 0xd Pentium 4, N46 family 0xf, model 0x0 Instead of making sharp boundary checks, we remain conservative and exclude all Pentium 2, 3, M and 4 family processors. For those, _PAGE_CACHE_MODE_WT is redirected to UC- per the default setup in __cachemode2pte_tbl[]. Signed-off-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: https://lkml.kernel.org/r/1433187393-22688-2-git-send-email-toshi.kani@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:54 +02:00
Borislav Petkov	7202fdb1b3	x86/mm/pat: Remove pat_enabled() checks Now that we emulate a PAT table when PAT is disabled, there's no need for those checks anymore as the PAT abstraction will handle those cases too. Based on a conglomerate patch from Toshi Kani. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-4-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:53 +02:00
Borislav Petkov	9cd25aac1f	x86/mm/pat: Emulate PAT when it is disabled In the case when PAT is disabled on the command line with "nopat" or when virtualization doesn't support PAT (correctly) - see `9d34cfdf47` ("x86: Don't rely on VMWare emulating PAT MSR correctly"). we emulate it using the PWT and PCD cache attribute bits. Get rid of boot_pat_state while at it. Based on a conglomerate patch from Toshi Kani. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:52 +02:00
Borislav Petkov	9dac629094	x86/mm/pat: Untangle pat_init() Split it into a BSP and AP version which makes the PAT initialization path actually readable again. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: jgross@suse.com Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 15:28:52 +02:00
Ingo Molnar	138bd56a21	x86/asm/entry/64/compat: Rename ia32entry.S -> entry_64_compat.S So we now have the following system entry code related files, which define the following system call instruction and other entry paths: entry_32.S # 32-bit binaries on 32-bit kernels entry_64.S # 64-bit binaries on 64-bit kernels entry_64_compat.S # 32-bit binaries on 64-bit kernels Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-07 14:56:50 +02:00
Anshuman Khandual	d3cb06e0cd	powerpc/dscr: Add some in-code documentation This patch adds some in-code documentation to the DSCR related code to make it more readable without having any functional change to it. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-07 19:29:15 +10:00
Anshuman Khandual	1db365258a	powerpc/kernel: Rename PACA_DSCR to PACA_DSCR_DEFAULT PACA_DSCR offset macro tracks dscr_default element in the paca structure. Better change the name of this macro to match that of the data element it tracks. Makes the code more readable. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-07 19:29:00 +10:00
Anshuman Khandual	280e109992	powerpc/kernel: Remove the unused extern dscr_default The process context switch code no longer uses dscr_default variable from the sysfs.c file. The variable became unused when we started storing the CPU specific DSCR value in the PACA structure instead. This patch just removes this extern declaration. It was originally added by the following commit. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-07 19:27:26 +10:00
Anshuman Khandual	c952c1c482	powerpc: Fix handling of DSCR related facility unavailable exception Currently DSCR (Data Stream Control Register) can be accessed with mfspr or mtspr instructions inside a thread via two different SPR numbers. One being the user accessible problem state SPR number 0x03 and the other being the privilege state SPR number 0x11. All access through the privilege state SPR number get emulated through illegal instruction exception. Any access through the problem state SPR number raises one facility unavailable exception which sets the thread based dscr_inherit bit and enables DSCR facility through FSCR register thus allowing direct access to DSCR without going through this exception in the future. We set the thread.dscr_inherit bit whether the access was with mfspr or mtspr instruction which is neither correct nor does it match the behaviour through the instruction emulation code path driven from privilege state SPR number. User currently observes two different kind of behaviour when accessing the DSCR through these two SPR numbers. This problem can be observed through these two test cases by replacing the privilege state SPR number with the problem state SPR number. (1) http://ozlabs.org/~anton/junkcode/dscr_default_test.c (2) http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c This patch fixes the problem by making sure that the behaviour visible to the user remains the same irrespective of which SPR number is being used. Inside facility unavailable exception, we check whether it was cuased by a mfspr or a mtspr isntrucction. In case of mfspr instruction, just emulate the instruction. In case of mtspr instruction, set the thread based dscr_inherit bit and also enable the facility through FSCR. All user SPR based mfspr instruction will be emulated till one user SPR based mtspr has been executed. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-07 19:19:57 +10:00
David Gibson	502f159c02	powerpc/eeh: Fix trivial error in eeh_restore_dev_state() Commit `28158cd` "powerpc/eeh: Enhance pcibios_set_pcie_reset_state()" introduced a fix for a problem where certain configurations could lead to pci_reset_function() destroying the state of PCI devices other than the one specified. Unfortunately, the fix has a trivial bug - it calls pci_save_state() again, when it should be calling pci_restore_state(). This corrects the problem. Cc: Gavin Shan <gwshan@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-07 19:11:49 +10:00
Nicholas Mc Guire	ed9244e6c5	MIPS: KVM: Do not sign extend on unsigned MMIO load Fix possible unintended sign extension in unsigned MMIO loads by casting to uint16_t in the case of mmio_needed != 2. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Reviewed-by: James Hogan <james.hogan@imgtec.com> Tested-by: James Hogan <james.hogan@imgtec.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/9985/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2015-06-06 10:21:10 +02:00
Markos Chandras	8833bc308b	MIPS: BPF: Fix stack pointer allocation Fix stack pointer offset which could potentially corrupt argument registers in the previous frame. The calculated offset reflects the size of all the registers we need to preserve so there is no need for this erroneous subtraction. [ralf@linux-mips.org: Fixed conflict due to only applying this fix part of the entire series as part of 4.1 fixes.] Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Cc: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: http://patchwork.linux-mips.org/patch/10527/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2015-06-06 10:21:10 +02:00
Huacai Chen	e1fb96e064	MIPS: Loongson-3: Fix a cpu-hotplug issue in loongson3_ipi_interrupt() setup_per_cpu_areas() only setup __per_cpu_offset[] for each possible cpu, but loongson_sysconf.nr_cpus can be greater than possible cpus (due to reserved_cpus_mask). So in loongson3_ipi_interrupt(), percpu access will touch the original varible in .data..percpu section which has been freed. Without this patch, cpu-hotplug will cause memery corruption. Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: John Crispin <john@phrozen.org> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Patchwork: http://patchwork.linux-mips.org/patch/10524/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2015-06-06 10:21:10 +02:00
James Hogan	5f35b9cd55	MIPS: Fix enabling of DEBUG_STACKOVERFLOW Commit `334c86c494` ("MIPS: IRQ: Add stackoverflow detection") added kernel stack overflow detection, however it only enabled it conditional upon the preprocessor definition DEBUG_STACKOVERFLOW, which is never actually defined. The Kconfig option is called DEBUG_STACKOVERFLOW, which manifests to the preprocessor as CONFIG_DEBUG_STACKOVERFLOW, so switch it to using that definition instead. Fixes: `334c86c494` ("MIPS: IRQ: Add stackoverflow detection") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Adam Jiang <jiang.adam@gmail.com> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 2.6.37+ Patchwork: http://patchwork.linux-mips.org/patch/10531/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2015-06-06 10:21:09 +02:00
Joshua Kinard	755af33b00	MIPS: c-r4k: Fix typo in probe_scache() Fixes a typo in arch/mips/mm/c-r4k.c's probe_scache(). Signed-off-by: Joshua Kinard <kumba@gentoo.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2015-06-06 10:21:09 +02:00
Duc Dang	e1e6e5c4de	arm64: dts: Add APM X-Gene PCIe MSI nodes There is a single MSI block in X-Gene v1 SOC which serves all 5 PCIe ports. Signed-off-by: Duc Dang <dhdang@apm.com> Signed-off-by: Tanmay Inamdar <tinamdar@apm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>	2015-06-05 16:02:53 -05:00
Linus Torvalds	51d0f0cb3a	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - early_idt_handlers[] fix that fixes the build with bleeding edge tooling - build warning fix on GCC 5.1 - vm86 fix plus self-test to make it harder to break it again" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers x86/asm/entry/32, selftests: Add a selftest for kernel entries from VM86 mode x86/boot: Add CONFIG_PARAVIRT_SPINLOCKS quirk to arch/x86/boot/compressed/misc.h x86/asm/entry/32: Really make user_mode() work correctly for VM86 mode	2015-06-05 10:03:48 -07:00
Linus Torvalds	a0e9c6efa5	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "The biggest chunk of the changes are two regression fixes: a HT workaround fix and an event-group scheduling fix. It's been verified with 5 days of fuzzer testing. Other fixes: - eBPF fix - a BIOS breakage detection fix - PMU driver fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/pt: Fix a refactoring bug perf/x86: Tweak broken BIOS rules during check_hw_exists() perf/x86/intel/pt: Untangle pt_buffer_reset_markers() perf: Disallow sparse AUX allocations for non-SG PMUs in overwrite mode perf/x86: Improve HT workaround GP counter constraint perf/x86: Fix event/group validation perf: Fix race in BPF program unregister	2015-06-05 10:00:53 -07:00
Paolo Bonzini	e80a4a9426	KVM: x86: mark legacy PCI device assignment as deprecated Follow up to commit `e194bbdf36`. Suggested-by: Bandan Das <bsd@redhat.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:39 +02:00
Paolo Bonzini	6d396b5520	KVM: x86: advertise KVM_CAP_X86_SMM ... and we're done. :) Because SMBASE is usually relocated above 1M on modern chipsets, and SMM handlers might indeed rely on 4G segment limits, we only expose it if KVM is able to run the guest in big real mode. This includes any of VMX+emulate_invalid_guest_state, VMX+unrestricted_guest, or SVM. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:38 +02:00
Paolo Bonzini	699023e239	KVM: x86: add SMM to the MMU role, support SMRAM address space This is now very simple to do. The only interesting part is a simple trick to find the right memslot in gfn_to_rmap, retrieving the address space from the spte role word. The same trick is used in the auditing code. The comment on top of union kvm_mmu_page_role has been stale forever, so remove it. Speaking of stale code, remove pad_for_nice_hex_output too: it was splitting the "access" bitfield across two bytes and thus had effectively turned into pad_for_ugly_hex_output. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:37 +02:00
Paolo Bonzini	9da0e4d5ac	KVM: x86: work on all available address spaces This patch has no semantic change, but it prepares for the introduction of a second address space for system management mode. A new function x86_set_memory_region (and the "slots_lock taken" counterpart __x86_set_memory_region) is introduced in order to operate on all address spaces when adding or deleting private memory slots. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:37 +02:00
Paolo Bonzini	54bf36aac5	KVM: x86: use vcpu-specific functions to read/write/translate GFNs We need to hide SMRAM from guests not running in SMM. Therefore, all uses of kvm_read_guest* and kvm_write_guest* must be changed to check whether the VCPU is in system management mode and use a different set of memslots. Switch from kvm_* to the newly-introduced kvm_vcpu_*, which call into kvm_arch_vcpu_memslots_id. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:36 +02:00
Paolo Bonzini	e4cd1da944	KVM: x86: pass struct kvm_mmu_page to gfn_to_rmap This is always available (with one exception in the auditing code), and with the same auditing exception the level was coming from sp->role.level. Later, the spte's role will also be used to look up the right memslots array. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:35 +02:00
Paolo Bonzini	f481b069e6	KVM: implement multiple address spaces Only two ioctls have to be modified; the address space id is placed in the higher 16 bits of their slot id argument. As of this patch, no architecture defines more than one address space; x86 will be the first. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:35 +02:00
Tomi Valkeinen	8dc0a56529	Merge branch 'ti-dra7-dss' into 4.2/fbdev Merge arch/ changes for TI's DRA7 SoC Display Subsystem.	2015-06-05 16:55:52 +03:00
Denys Vlasenko	7a5a9824c1	x86/asm/entry/32: Remove unnecessary optimization in stub32_clone Really swap arguments #4 and #5 in stub32_clone instead of "optimizing" it into a move. Yes, tls_val is currently unused. Yes, on some CPUs XCHG is a little bit more expensive than MOV. But a cycle or two on an expensive syscall like clone() is way below noise floor, and this optimization is simply not worth the obfuscation of logic. [ There's also ongoing work on the clone() ABI by Josh Triplett that will depend on this change later on. ] Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433339930-20880-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-05 13:41:28 +02:00
Denys Vlasenko	5cdc683b7d	x86/asm/entry/32: Explain the stub32_clone logic The reason for copying of %r8 to %rcx is quite non-obvious. Add a comment which explains why it is done. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433339930-20880-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-05 13:41:27 +02:00
Ingo Molnar	54ad726c51	x86/asm/entry/32: Improve code readability Make the 64-bit compat 32-bit syscall entry code a bit more readable: - eliminate whitespace noise - use consistent vertical spacing - use consistent assembly coding style similar to entry_64.S - fix various comments No code changed: arch/x86/entry/ia32entry.o: text data bss dec hex filename 1391 0 0 1391 56f ia32entry.o.before 1391 0 0 1391 56f ia32entry.o.after md5: f28501dcc366e68b557313942c6496d6 ia32entry.o.before.asm f28501dcc366e68b557313942c6496d6 ia32entry.o.after.asm Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-05 13:22:22 +02:00
Denys Vlasenko	53e9accf0f	x86/asm/entry/32: Do not use R9 in SYSCALL32 entry point SYSENTER and SYSCALL 32-bit entry points differ in handling of arg2 and arg6. SYSENTER: * ecx arg2 * ebp user stack * 0(%ebp) arg6 SYSCALL: * ebp arg2 * esp user stack * 0(%esp) arg6 Sysenter code loads 0(%ebp) to %ebp right away. (This destroys %ebp. It means we do not preserve it on return. It's not causing problems since userspace VDSO code does not depend on it, and SYSENTER insn can't be sanely used outside of VDSO). Syscall code loads 0(%ebp) to %r9. This allows to eliminate one MOV insn (r9 is a register where arg6 should be for 64-bit ABI), but on audit/ptrace code paths this requires juggling of r9 and ebp: (1) ptrace expects arg6 to be in pt_regs->bp; (2) r9 is callee-clobbered register and needs to be saved/restored around calls to C functions. This patch changes syscall code to load 0(%ebp) to %ebp, making it more similar to sysenter code. It's a bit smaller: text data bss dec hex filename 1407 0 0 1407 57f ia32entry.o.before 1391 0 0 1391 56f ia32entry.o To preserve ABI compat, we restore ebp on exit. Run-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433336169-18964-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-05 13:22:22 +02:00
Denys Vlasenko	73cbf68791	x86/asm/entry/32: Open-code LOAD_ARGS32 This macro is small, has only three callsites, and one of them is slightly different using a conditional parameter. A few saved lines aren't worth the resulting obfuscation. Generated machine code is identical. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433271842-9139-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-05 13:22:22 +02:00
Denys Vlasenko	ef0cd5dc25	x86/asm/entry/32: Open-code CLEAR_RREGS This macro is small, has only four callsites, and one of them is slightly different using a conditional parameter. A few saved lines aren't worth the resulting obfuscation. Generated machine code is identical. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> [ Added comments. ] Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433271842-9139-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-05 13:22:22 +02:00
Denys Vlasenko	61b1e3e782	x86/asm/entry/32: Simplify the zeroing of pt_regs->r8..r11 in the int80 code path 32-bit syscall entry points do not save the complete pt_regs struct, they leave some fields uninitialized. However, they must be careful to not leak uninitialized data in pt_regs->r8..r11 to ptrace users. CLEAR_RREGS macro is used to zero these fields out when needed. However, in the int80 code path this zeroing is unconditional. This patch simplifies it by storing zeroes there right away, when pt_regs is constructed on stack. This uses shorter instructions: text data bss dec hex filename 1423 0 0 1423 58f ia32entry.o.before 1407 0 0 1407 57f ia32entry.o Compile-tested. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1433266510-2938-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-05 13:22:21 +02:00
Andy Lutomirski	5ca6f70f38	x86/asm/entry/64: Remove pointless jump to irq_return INTERRUPT_RETURN turns into a jmp instruction. There's no need for extra indirection. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: <linux-kernel@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2f2318653dbad284a59311f13f08cea71298fd7c.1433449436.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-05 12:42:41 +02:00
Catalin Marinas	addc8120a7	Merge branch 'arm64/psci-rework' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux * 'arm64/psci-rework' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux: arm64: psci: remove ACPI coupling arm64: psci: kill psci_power_state arm64: psci: account for Trusted OS instances arm64: psci: support unsigned return values arm64: psci: remove unnecessary id indirection arm64: smp: consistently use error codes arm64: smp_plat: add get_logical_index arm/arm64: kvm: add missing PSCI include Conflicts: arch/arm64/kernel/smp.c	2015-06-05 11:21:23 +01:00
Marc Zyngier	eb7c11ee3c	arm64: alternative: Work around .inst assembler bugs AArch64 toolchains suffer from the following bug: $ cat blah.S 1: .inst 0x01020304 .if ((. - 1b) != 4) .error "blah" .endif $ aarch64-linux-gnu-gcc -c blah.S blah.S: Assembler messages: blah.S:3: Error: non-constant expression in ".if" statement which precludes the use of msr_s and co as part of alternatives. We workaround this issue by not directly testing the labels themselves, but by moving the current output pointer by a value that should always be zero. If this value is not null, then we will trigger a backward move, which is expclicitely forbidden. This triggers the error we're after: AS arch/arm64/kvm/hyp.o arch/arm64/kvm/hyp.S: Assembler messages: arch/arm64/kvm/hyp.S:1377: Error: attempt to move .org backwards scripts/Makefile.build:294: recipe for target 'arch/arm64/kvm/hyp.o' failed make[1]: *** [arch/arm64/kvm/hyp.o] Error 1 Makefile:946: recipe for target 'arch/arm64/kvm' failed Not pretty, but at least works on the current toolchains. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-05 10:38:54 +01:00
Marc Zyngier	8d883b23ae	arm64: alternative: Merge alternative-asm.h into alternative.h asm/alternative-asm.h and asm/alternative.h are extremely similar, and really deserve to live in the same file (as this makes further modufications a bit easier). Fold the content of alternative-asm.h into alternative.h, and update the few users. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-05 10:38:53 +01:00
Marc Zyngier	7616fc8bcd	arm64: alternative: Allow immediate branch as alternative instruction Since all branches are PC-relative on AArch64, these instructions cannot be used as an alternative with the simplistic approach we currently have (the immediate has been computed from the .altinstr_replacement section, and end-up being completely off if the target is outside of the replacement sequence). This patch handles the branch instructions in a different way, using the insn framework to recompute the immediate, and generate the right displacement in the above case. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-05 10:38:53 +01:00
Marc Zyngier	b0dd9c02d4	arm64: Rework alternate sequence for ARM erratum 845719 The workaround for erratum 845719 is currently using a branch between two alternate sequences, which is quite fragile, and that we are going to break as we rework the alternative code. This patch reworks the workaround to fit in a single alternative sequence. The generated code itself is unchanged. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-05 10:38:52 +01:00
Andy Lutomirski	cf991de2f6	x86/asm/msr: Make wrmsrl_safe() a function The wrmsrl_safe macro performs invalid shifts if the value argument is 32 bits. This makes it unnecessarily awkward to write code that puts an unsigned long into an MSR. Convert it to a real inline function. For inspiration, see: `7c74d5b7b7` ("x86/asm/entry/64: Fix MSR_IA32_SYSENTER_CS MSR value"). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: <linux-kernel@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> [ Applied small improvements. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-05 09:41:22 +02:00
Geert Uytterhoeven	f1b3b4450d	powerpc/85xx: Replace CONFIG_USB_ISP1760_HCD by CONFIG_USB_ISP1760 Since commit `100832abf0` ("usb: isp1760: Make HCD support optional"), CONFIG_USB_ISP1760_HCD is automatically selected when needed. Enabling that option in the defconfig is now a no-op, and no longer enables ISP1760 HCD support. Re-enable the ISP1760 driver in the defconfig by enabling USB_ISP1760_HOST_ROLE instead. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Scott Wood <scottwood@freescale.com>	2015-06-05 01:54:00 -05:00
Jeremy Kerr	0d7cd8550d	powerpc/powernv: Add opal-prd channel This change adds a char device to access the "PRD" (processor runtime diagnostics) channel to OPAL firmware. Includes contributions from Vaidyanathan Srinivasan, Neelesh Gupta & Vishal Kulkarni. Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-05 08:32:21 +10:00
Jeremy Kerr	594fcb9ec9	powerpc/powernv: Expose OPAL APIs required by PRD interface The (upcoming) opal-prd driver needs to access the message notifier and xscom code, so add EXPORT_SYMBOL_GPL macros for these. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-05 08:32:20 +10:00
Jeremy Kerr	48c0615495	powerpc/powernv: Merge common platform device initialisation opal_ipmi_init and opal_flash_init are equivalent, except for the compatbile string. Merge these two into a common opal_pdev_init function. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-05 08:32:20 +10:00
Paolo Bonzini	660a5d517a	KVM: x86: save/load state on SMM switch The big ugly one. This patch adds support for switching in and out of system management mode, respectively upon receiving KVM_REQ_SMI and upon executing a RSM instruction. Both 32- and 64-bit formats are supported for the SMM state save area. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:17:46 +02:00
Alexander Shishkin	b44a2b53be	perf/x86/intel/pt: Fix a refactoring bug Commit `066450be41` ("perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()") changed attribute initialization so that only the first attribute gets initialized using sysfs_attr_init(), which upsets lockdep. This patch fixes the glitch so that all allocated attributes are properly initialized thus fixing the lockdep warning reported by Tvrtko and Imre. Reported-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reported-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: <linux-kernel@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-04 16:07:51 +02:00
Paolo Bonzini	cd7764fe9f	KVM: x86: latch INITs while in system management mode Do not process INITs immediately while in system management mode, keep it instead in apic->pending_events. Tell userspace if an INIT is pending when they issue GET_VCPU_EVENTS, and similarly handle the new field in SET_VCPU_EVENTS. Note that the same treatment should be done while in VMX non-root mode. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:51 +02:00
Paolo Bonzini	64d6067057	KVM: x86: stubs for SMM support This patch adds the interface between x86.c and the emulator: the SMBASE register, a new emulator flag, the RSM instruction. It also adds a new request bit that will be used by the KVM_SMI ioctl. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:45 +02:00
Paolo Bonzini	f077825a87	KVM: x86: API changes for SMM support This patch includes changes to the external API for SMM support. Userspace can predicate the availability of the new fields and ioctls on a new capability, KVM_CAP_X86_SMM, which is added at the end of the patch series. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:11 +02:00
Paolo Bonzini	a584539b24	KVM: x86: pass the whole hflags field to emulator and back The hflags field will contain information about system management mode and will be useful for the emulator. Pass the entire field rather than just the guest-mode information. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:05 +02:00
Paolo Bonzini	609e36d372	KVM: x86: pass host_initiated to functions that read MSRs SMBASE is only readable from SMM for the VCPU, but it must be always accessible if userspace is accessing it. Thus, all functions that read MSRs are changed to accept a struct msr_data; the host_initiated and index fields are pre-initialized, while the data field is filled on return. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:00 +02:00
Paolo Bonzini	62ef68bb4d	KVM: x86: introduce num_emulated_msrs We will want to filter away MSR_IA32_SMBASE from the emulated_msrs if the host CPU does not support SMM virtualization. Introduce the logic to do that, and also move paravirt MSRs to emulated_msrs for simplicity and to get rid of KVM_SAVE_MSRS_BEGIN. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:00:46 +02:00
Anton Blanchard	18725226af	powerpc/config: Enable bnx2x on ppc64 and pseries defconfigs Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-04 22:33:27 +10:00
Cédric Le Goater	14aae78f08	powerpc/powernv: convert OPAL codes returned by sysparam calls The opal_{get,set}_param calls return internal error codes which need to be translated in errnos in Linux. Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2015-06-04 22:27:56 +10:00
Paolo Bonzini	e69fab5df4	KVM: x86: clear hidden CPU state at reset time This was noticed by Radim while reviewing the implementation of system management mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 10:44:44 +02:00
Paolo Bonzini	ce40cd3fc7	kvm: x86: fix kvm_apic_has_events to check for NULL pointer Malicious (or egregiously buggy) userspace can trigger it, but it should never happen in normal operation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 10:16:17 +02:00
Paolo Bonzini	e194bbdf36	kvm: x86: default legacy PCI device assignment support to "n" VFIO has proved itself a much better option than KVM's built-in device assignment. It is mature, provides better isolation because it enforces ACS, and even the userspace code is being tested on a wider variety of hardware these days than the legacy support. Disable legacy device assignment by default. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 09:51:50 +02:00
Tomi Valkeinen	0c53493866	arm/dts: am57xx-beagle-x15.dts: add HDMI AM57xx Beagle X15 has a HDMI output. This patch adds the device tree nodes required for HDMI. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: devicetree@vger.kernel.org Acked-by: Tony Lindgren <tony@atomide.com>	2015-06-04 09:02:15 +03:00
Tomi Valkeinen	fadf0d0bba	arm/dts: dra72-evm.dts: add HDMI DRA72 EVM has a HDMI output. This patch adds the device tree nodes required for HDMI. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: devicetree@vger.kernel.org Acked-by: Tony Lindgren <tony@atomide.com>	2015-06-04 09:02:14 +03:00
Tomi Valkeinen	95c1cd1392	arm/dts: dra7.dtsi: add DSS support DRA7xxx contains a very similar DSS to OMAP5. The main differences are: * no DSI or RFBI support. * 1 or 2 dedicated video PLLs. * need to do additional configuration to the DRA7 CONTROL module. DRA72xx has only one video PLL, and DRA74xx has two. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: devicetree@vger.kernel.org Acked-by: Tony Lindgren <tony@atomide.com>	2015-06-04 09:02:14 +03:00
Tomi Valkeinen	403ee909e4	ARM: OMAP2+: display: detect DRA7 DSS Add platform code to detect DRA7 DSS. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Tony Lindgren <tony@atomide.com>	2015-06-04 09:02:14 +03:00
Tomi Valkeinen	5b5992ac64	ARM: OMAP: display: change compat names to array Simplify the DSS detection logic by creating a list of the omapdss compat strings, instead of checking each separately with an 'if'. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Tony Lindgren <tony@atomide.com>	2015-06-04 09:02:09 +03:00
Tomi Valkeinen	a3818c6d57	ARM: DRA7: hwmod: set DSS submodule parent hwmods Set DSS core hwmod as the parent for all the DSS submodules. This ensures that the parent hwmods are enabled before any DSS submodules are accessed. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Paul Walmsley <paul@pwsan.com>	2015-06-04 09:01:23 +03:00
Tomi Valkeinen	42121688f9	ARM: DRA7: hwmod: add DMM hwmod description Add DMM hwmod entries for DRA7. This is identical to DMM on OMAP5. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Paul Walmsley <paul@pwsan.com>	2015-06-04 09:01:23 +03:00
Tomi Valkeinen	b21a9c3ee8	arm/dts: dra7xx: add 'ti,set-rate-parent' for dss_dss_clk We need set-rate-parent flags for the display's clock path so that the DSS driver can change the clock rate of the PLL. This patchs adds the ti,set-rate-parent flag to 'dss_dss_clk' clock node, which is only a gate clock, allowing the setting of the clock rate to propagate to the PLL. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: devicetree@vger.kernel.org Acked-by: Tero Kristo <t-kristo@ti.com>	2015-06-04 09:01:23 +03:00
Ingo Molnar	00398a0018	x86/asm/entry: Move the vsyscall code to arch/x86/entry/vsyscall/ The vsyscall code is entry code too, so move it to arch/x86/entry/vsyscall/. Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-04 07:37:37 +02:00
Ingo Molnar	1f57d5d85b	x86/asm/entry: Move the arch/x86/syscalls/ definitions to arch/x86/entry/syscalls/ The build time generated syscall definitions are entry code related, move them into the arch/x86/entry/ directory. Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-04 07:37:37 +02:00
Ingo Molnar	d36f947904	x86/asm/entry: Move arch/x86/include/asm/calling.h to arch/x86/entry/ asm/calling.h is private to the entry code, make this more apparent by moving it to the new arch/x86/entry/ directory. Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-04 07:37:36 +02:00
Ingo Molnar	e6b93f4e48	x86/asm/entry: Move the 'thunk' functions to arch/x86/entry/ These are all calling x86 entry code functions, so move them close to other entry code. Change lib-y to obj-y: there's no real difference between the two as we don't really drop any of them during the linking stage, and obj-y is the more common approach for core kernel object code. Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-04 07:37:33 +02:00
Michael Holzheu	88aeca15d6	s390/bpf: fix bpf frame pointer setup Currently the bpf frame pointer is set to the old r15. This is wrong because of packed stack. Fix this and adjust the frame pointer to respect packed stack. This now generates a prolog like the following: 3ff8001c3fa: eb67f0480024 stmg %r6,%r7,72(%r15) 3ff8001c400: ebcff0780024 stmg %r12,%r15,120(%r15) 3ff8001c406: b904001f lgr %r1,%r15 <- load backchain 3ff8001c40a: 41d0f048 la %r13,72(%r15) <- load adjusted bfp 3ff8001c40e: a7fbfd98 aghi %r15,-616 3ff8001c412: e310f0980024 stg %r1,152(%r15) <- save backchain Fixes: `0546231057` ("s390/bpf: Add s390x eBPF JIT compiler backend") Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-03 19:31:39 -07:00
Michael Holzheu	bbac1c9488	s390/bpf: fix stack allocation On s390x we have to provide 160 bytes stack space before we can call the next function. From the 160 bytes that we got from the previous function we only use 11 * 8 bytes and have 160 - 11 * 8 bytes left. Currently for BPF we allocate additional 160 - 11 * 8 bytes for the next function. This is wrong because then the next function only gets: (160 - 11 * 8) + (160 - 11 * 8) = 2 * 72 = 144 bytes Fix this and allocate enough memory for the next function. Fixes: `0546231057` ("s390/bpf: Add s390x eBPF JIT compiler backend") Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-03 19:31:39 -07:00
Ingo Molnar	d603c8e184	x86/asm/entry, x86/vdso: Move the vDSO code to arch/x86/entry/vdso/ Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-03 18:51:37 +02:00
Ingo Molnar	19a433f451	x86/asm/entry: Move the compat syscall entry code to arch/x86/entry/ Move the ia32entry.S file over into arch/x86/entry/. Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-03 18:51:32 +02:00
Ingo Molnar	905a36a285	x86/asm/entry: Move entry_64.S and entry_32.S to arch/x86/entry/ Create a new directory hierarchy for the low level x86 entry code: arch/x86/entry/* This will host all the low level glue that is currently scattered all across arch/x86/. Start with entry_64.S and entry_32.S. Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-03 18:51:28 +02:00
Marc Zyngier	10b48f7ef2	arm64: insn: Add aarch64_{get,set}_branch_offset In order to deal with branches located in alternate sequences, but pointing to the main kernel text, it is required to extract the relative displacement encoded in the instruction, and to be able to update said instruction with a new offset (once it is known). For this, we introduce three new helpers: - aarch64_insn_is_branch_imm is a predicate indicating if the instruction is an immediate branch - aarch64_get_branch_offset returns a signed value representing the byte offset encoded in a branch instruction - aarch64_set_branch_offset takes an instruction and an offset, and returns the corresponding updated instruction. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2015-06-03 15:43:24 +01:00
Paolo Bonzini	f71f81d70a	KVM: s390: Fix and cleanup for 4.2 (kvm/next) One small fix for a commit targetted for 4.2 and one cleanup regarding our printks. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJVbWMvAAoJEBF7vIC1phx8G9IP/1VxA0aXRyUEA0KgHIbkWYJb DPXpznubJqgWTYGMWM9R8/frIwH3rnJZMdWDNtu3SMULQTC9HzTMcl5PAe+XEe9F +oxU4IgUdvcbZkI49rnHn+3n99vhnQS5emmX7ivPV2YQrWFVC36gAOMlNe+S40fJ SJ4iXANo+3LT6MaeD67Kcb+nLsrGTTP+6RtNthc4yV14fYPLdafy8+5BAvMZfLRn xWS9In8zqQtCnaB4eJ08C4D7MuNL6yIu3s54PLunKVlvCayxThsFNk+al/QwyS74 6vJZLCFX55RLSBZLkEYH6b2k1ckF//ZgLOL29sLIHwi2Ry01guZ43PjjRa/jdkbj cOq5rDsfcfKp8sIMJhGF5Y/UneaqBW+/vAfQrIHANDwUcCkfFh95/Gv/nF5KrcsO 0pvzi+SSnu7Y2hWL5iJIvrHAclMazEHewWnubur7UTgkzPWxA35gfBqwZir5q/pI cG2AELzjERWYWIip4hT2z1UGSKZQNYOddrmZxN6noj0MCyIdnq/wuklOr9y5HTif ei+k0xtaViEES4vlc0H6Jo5Cplgv28nxXBemAtNwCCL8iGVJbM7JJcbclrcIkqgc AgIWSTd8ZsUqBZUWLX37CXqhdym1LmyE3r1A/eV42NXFWatZnDbCzy9k10y9RVGX /i5OFil2B640rmFUEMHM =t6ws -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-20150602' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next KVM: s390: Fix and cleanup for 4.2 (kvm/next) One small fix for a commit targetted for 4.2 and one cleanup regarding our printks.	2015-06-03 14:51:02 +02:00
Tomi Valkeinen	2d5a3c803d	arm: dra7: add DESHDCP clock Add a new Linux clock for DRA7 based SoCs to control DESHDCP clock. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Tero Kristo <t-kristo@ti.com>	2015-06-03 14:23:25 +03:00
Stephen Rothwell	d6472302f2	x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h> Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so remove it from there and fix up the resulting build problems triggered on x86 {64\|32}-bit {def\|allmod\|allno}configs. The breakages were triggering in places where x86 builds relied on vmalloc() facilities but did not include <linux/vmalloc.h> explicitly and relied on the implicit inclusion via <asm/io.h>. Also add: - <linux/init.h> to <linux/io.h> - <asm/pgtable_types> to <asm/io.h> ... which were two other implicit header file dependencies. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> [ Tidied up the changelog. ] Acked-by: David Miller <davem@davemloft.net> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Vinod Koul <vinod.koul@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Colin Cross <ccross@android.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: James E.J. Bottomley <JBottomley@odin.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Suma Ramars <sramars@cisco.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-03 12:02:00 +02:00
Ingo Molnar	6471b825c4	x86/kconfig: Reorganize arch feature Kconfig select's Peter Zijstra noticed that in arch/x86/Kconfig there are a lot of X86_{32,64} clauses in the X86 symbol, plus there are a number of similar selects in the X86_32 and X86_64 config definitions as well - which all overlap in an inconsistent mess. So: - move all select's from X86_32 and X86_64 to the X64 config option - sort their names, so that duplications are easier to spot - align their if clauses, so that they are easier to identify at a glance - and so that weirdnesses stand out more No change in functionality: 105 insertions(+) 105 deletions(-) Originally-from: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150602153027.GU3644@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-03 10:08:52 +02:00
Ingo Molnar	71966f3a0b	Merge branch 'locking/core' into x86/core, to prepare for dependent patch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-03 10:07:35 +02:00
Ingo Molnar	34e7724c07	Merge branches 'x86/mm', 'x86/build', 'x86/apic' and 'x86/platform' into x86/core, to apply dependent patch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-06-03 10:05:18 +02:00
Maciej W. Rozycki	90b712ddab	MIPS: Avoid an FPE exception in FCSR mask probing Use the default FCSR value in mask probing, avoiding an FPE exception where reset has left any exception enable and their corresponding cause bits set and the register is then rewritten with these bits active. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Joshua Kinard <kumba@gentoo.org> Cc: linux-mips@linux-mips.org Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2015-06-03 09:50:29 +02:00

... 2 3 4 5 6 ...

110203 Commits