Commit Graph

14537 Commits

Author SHA1 Message Date
Linus Torvalds
f689b742f2 powerpc updates for 4.5
- Ground work for the new Power9 MMU from Aneesh Kumar K.V
  - Optimise FP/VMX/VSX context switching from Anton Blanchard
 
  - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica Gupta,
    Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling, Andrew Donnellan
  - Allow wrapper to work on non-english system from Laurent Vivier
  - Add rN aliases to the pt_regs_offset table from Rashmica Gupta
  - Fix module autoload for rackmeter & axonram drivers from Luis de Bethencourt
  - Include KVM guest test in all interrupt vectors from Paul Mackerras
  - Fix DSCR inheritance over fork() from Anton Blanchard
  - Make value-returning atomics & {cmp}xchg* & their atomic_ versions fully ordered from Boqun Feng
  - Print MSR TM bits in oops messages from Michael Neuling
  - Add TM signal return & invalid stack selftests from Michael Neuling
  - Limit EPOW reset event warnings from Vipin K Parashar
  - Remove the Cell QPACE code from Rashmica Gupta
  - Append linux_banner to exception information in xmon from Rashmica Gupta
  - Add selftest to check if VSRs are corrupted from Rashmica Gupta
  - Remove broken GregorianDay() from Daniel Axtens
  - Import Anton's context_switch2 benchmark into selftests from Michael Ellerman
  - Add selftest script to test HMI functionality from Daniel Axtens
  - Remove obsolete OPAL v2 support from Stewart Smith
  - Make enter_rtas() private from Michael Ellerman
  - PPR exception cleanups from Michael Ellerman
  - Add page soft dirty tracking from Laurent Dufour
  - Add support for Nvlink NPUs from Alistair Popple
  - Add support for kexec on 476fpe from Alistair Popple
  - Enable kernel CPU dlpar from sysfs from Nathan Fontenot
  - Copy only required pieces of the mm_context_t to the paca from Michael Neuling
  - Add a kmsg_dumper that flushes OPAL console output on panic from Russell Currey
  - Implement save_stack_trace_regs() to enable kprobe stack tracing from Steven Rostedt
  - Add HWCAP bits for Power9 from Michael Ellerman
  - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V
  - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins
  - scripts/recordmcount.pl: support data in text section on powerpc from Ulrich Weigand
  - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand
 
  - cxl: Fix possible idr warning when contexts are released from Vaibhav Jain
  - cxl: use correct operator when writing pcie config space values from Andrew Donnellan
  - cxl: Fix DSI misses when the context owning task exits from Vaibhav Jain
  - cxl: fix build for GCC 4.6.x from Brian Norris
  - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris
  - cxl: Enable PCI device ID for future IBM CXL adapter from Uma Krishnan
 
  - Freescale updates from Scott: Highlights include moving QE code out of
    arch/powerpc (to be shared with arm), device tree updates, and minor fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWmIxeAAoJEFHr6jzI4aWAA+cQAIXAw4WfVWJ2V4ZK+1eKfB57
 fdXG71PuXG+WYIWy71ly8keLHdzzD1NQ2OUB64bUVRq202nRgVc15ZYKRJ/FE/sP
 SkxaQ2AG/2kI2EflWshOi0Lu9qaZ+LMHJnszIqE/9lnGSB2kUI/cwsSXgziiMKXR
 XNci9v14SdDd40YV/6BSZXoxApwyq9cUbZ7rnzFLmz4hrFuKmB/L3LABDF8QcpH7
 sGt/YaHGOtqP0UX7h5KQTFLGe1OPvK6NWixSXeZKQ71ED6cho1iKUEOtBA9EZeIN
 QM5JdHFWgX8MMRA0OHAgidkSiqO38BXjmjkVYWoIbYz7Zax3ThmrDHB4IpFwWnk3
 l7WBykEXY7KEqpZzbh0GFGehZWzVZvLnNgDdvpmpk/GkPzeYKomBj7ZZfm3H1yGD
 BTHPwuWCTX+/K75yEVNO8aJO12wBg7DRl4IEwBgqhwU8ga4FvUOCJkm+SCxA1Dnn
 qlpS7qPwTXNIEfKMJcxp5X0KiwDY1EoOotd4glTN0jbeY5GEYcxe+7RQ302GrYxP
 zcc8EGLn8h6BtQvV3ypNHF5l6QeTW/0ZlO9c236tIuUQ5gQU39SQci7jQKsYjSzv
 BB1XdLHkbtIvYDkmbnr1elbeJCDbrWL9rAXRUTRyfuCzaFWTfZmfVNe8c8qwDMLk
 TUxMR/38aI7bLcIQjwj9
 =R5bX
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Core:
   - Ground work for the new Power9 MMU from Aneesh Kumar K.V
   - Optimise FP/VMX/VSX context switching from Anton Blanchard

  Misc:
   - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica
     Gupta, Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling,
     Andrew Donnellan
   - Allow wrapper to work on non-english system from Laurent Vivier
   - Add rN aliases to the pt_regs_offset table from Rashmica Gupta
   - Fix module autoload for rackmeter & axonram drivers from Luis de
     Bethencourt
   - Include KVM guest test in all interrupt vectors from Paul Mackerras
   - Fix DSCR inheritance over fork() from Anton Blanchard
   - Make value-returning atomics & {cmp}xchg* & their atomic_ versions
     fully ordered from Boqun Feng
   - Print MSR TM bits in oops messages from Michael Neuling
   - Add TM signal return & invalid stack selftests from Michael Neuling
   - Limit EPOW reset event warnings from Vipin K Parashar
   - Remove the Cell QPACE code from Rashmica Gupta
   - Append linux_banner to exception information in xmon from Rashmica
     Gupta
   - Add selftest to check if VSRs are corrupted from Rashmica Gupta
   - Remove broken GregorianDay() from Daniel Axtens
   - Import Anton's context_switch2 benchmark into selftests from
     Michael Ellerman
   - Add selftest script to test HMI functionality from Daniel Axtens
   - Remove obsolete OPAL v2 support from Stewart Smith
   - Make enter_rtas() private from Michael Ellerman
   - PPR exception cleanups from Michael Ellerman
   - Add page soft dirty tracking from Laurent Dufour
   - Add support for Nvlink NPUs from Alistair Popple
   - Add support for kexec on 476fpe from Alistair Popple
   - Enable kernel CPU dlpar from sysfs from Nathan Fontenot
   - Copy only required pieces of the mm_context_t to the paca from
     Michael Neuling
   - Add a kmsg_dumper that flushes OPAL console output on panic from
     Russell Currey
   - Implement save_stack_trace_regs() to enable kprobe stack tracing
     from Steven Rostedt
   - Add HWCAP bits for Power9 from Michael Ellerman
   - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V
   - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins
   - scripts/recordmcount.pl: support data in text section on powerpc
     from Ulrich Weigand
   - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand

  cxl:
   - cxl: Fix possible idr warning when contexts are released from
     Vaibhav Jain
   - cxl: use correct operator when writing pcie config space values
     from Andrew Donnellan
   - cxl: Fix DSI misses when the context owning task exits from Vaibhav
     Jain
   - cxl: fix build for GCC 4.6.x from Brian Norris
   - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris
   - cxl: Enable PCI device ID for future IBM CXL adapter from Uma
     Krishnan

  Freescale:
   - Freescale updates from Scott: Highlights include moving QE code out
     of arch/powerpc (to be shared with arm), device tree updates, and
     minor fixes"

* tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (149 commits)
  powerpc/module: Handle R_PPC64_ENTRY relocations
  scripts/recordmcount.pl: support data in text section on powerpc
  powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages
  powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff
  powerpc/mm: Fix _PAGE_PTE breaking swapoff
  cxl: Enable PCI device ID for future IBM CXL adapter
  cxl: use -Werror only with CONFIG_PPC_WERROR
  cxl: fix build for GCC 4.6.x
  powerpc: Add HWCAP bits for Power9
  powerpc/powernv: Reserve PE#0 on NPU
  powerpc/powernv: Change NPU PE# assignment
  powerpc/powernv: Fix update of NVLink DMA mask
  powerpc/powernv: Remove misleading comment in pci.c
  powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing
  powerpc: Fix build break due to paca mm_context_t changes
  cxl: Fix DSI misses when the context owning task exits
  MAINTAINERS: Update Scott Wood's e-mail address
  powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()
  powerpc: Fix style of self-test config prompts
  powerpc/powernv: Only delay opal_rtc_read() retry when necessary
  ...
2016-01-15 13:18:47 -08:00
Linus Torvalds
875fc4f5dd Merge branch 'akpm' (patches from Andrew)
Merge first patch-bomb from Andrew Morton:

 - A few hotfixes which missed 4.4 becasue I was asleep.  cc'ed to
   -stable

 - A few misc fixes

 - OCFS2 updates

 - Part of MM.  Including pretty large changes to page-flags handling
   and to thp management which have been buffered up for 2-3 cycles now.

  I have a lot of MM material this time.

[ It turns out the THP part wasn't quite ready, so that got dropped from
  this series  - Linus ]

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
  zsmalloc: reorganize struct size_class to pack 4 bytes hole
  mm/zbud.c: use list_last_entry() instead of list_tail_entry()
  zram/zcomp: do not zero out zcomp private pages
  zram: pass gfp from zcomp frontend to backend
  zram: try vmalloc() after kmalloc()
  zram/zcomp: use GFP_NOIO to allocate streams
  mm: add tracepoint for scanning pages
  drivers/base/memory.c: fix kernel warning during memory hotplug on ppc64
  mm/page_isolation: use macro to judge the alignment
  mm: fix noisy sparse warning in LIBCFS_ALLOC_PRE()
  mm: rework virtual memory accounting
  include/linux/memblock.h: fix ordering of 'flags' argument in comments
  mm: move lru_to_page to mm_inline.h
  Documentation/filesystems: describe the shared memory usage/accounting
  memory-hotplug: don't BUG() in register_memory_resource()
  hugetlb: make mm and fs code explicitly non-modular
  mm/swapfile.c: use list_for_each_entry_safe in free_swap_count_continuations
  mm: /proc/pid/clear_refs: no need to clear VM_SOFTDIRTY in clear_soft_dirty_pmd()
  mm: make sure isolate_lru_page() is never called for tail page
  vmstat: make vmstat_updater deferrable again and shut down on idle
  ...
2016-01-15 11:41:44 -08:00
Paolo Bonzini
171b5682aa Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD 2016-01-15 17:49:39 +01:00
Linus Torvalds
0f0836b7eb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina:

 - RO/NX attribute fixes for patch module relocations from Josh
   Poimboeuf.  As part of this effort, module.c has been cleaned up as
   well and livepatching is piggy-backing on this cleanup.  Rusty is OK
   with this whole lot going through livepatching tree.

 - symbol disambiguation support from Chris J Arges.  That series is
   also

        Reviewed-by: Miroslav Benes <mbenes@suse.cz>

   but this came in only after I've alredy pushed out.  Didn't want to
   rebase because of that, hence I am mentioning it here.

 - symbol lookup fix from Miroslav Benes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: Cleanup module page permission changes
  module: keep percpu symbols in module's symtab
  module: clean up RO/NX handling.
  module: use a structure to encapsulate layout.
  gcov: use within_module() helper.
  module: Use the same logic for setting and unsetting RO/NX
  livepatch: function,sympos scheme in livepatch sysfs directory
  livepatch: add sympos as disambiguator field to klp_reloc
  livepatch: add old_sympos as disambiguator field to klp_func
2016-01-14 16:38:02 -08:00
Vladimir Davydov
5d097056c9 kmemcg: account certain kmem allocations to memcg
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg.  For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because
   most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds.  Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14 16:00:49 -08:00
Greg Kurz
b4d7f161fe KVM: PPC: Fix ONE_REG AltiVec support
The get and set operations got exchanged by mistake when moving the
code from book3s.c to powerpc.c.

Fixes: 3840edc803
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-01-14 14:43:35 +11:00
Linus Torvalds
d080827f85 libnvdimm for 4.5
1/ Media error handling: The 'badblocks' implementation that originated
    in md-raid is up-levelled to a generic capability of a block device.
    This initial implementation is limited to being consulted in the pmem
    block-i/o path.  Later, 'badblocks' will be consulted when creating
    dax mappings.
 
 2/ Raw block device dax: For virtualization and other cases that want
    large contiguous mappings of persistent memory, add the capability to
    dax-mmap a block device directly.
 
 3/ Increased /dev/mem restrictions: Add an option to treat all io-memory
    as IORESOURCE_EXCLUSIVE, i.e. disable /dev/mem access while a driver is
    actively using an address range.  This behavior is controlled via the
    new CONFIG_IO_STRICT_DEVMEM option and can be overridden by the
    existing "iomem=relaxed" kernel command line option.
 
 4/ Miscellaneous fixes include a 'pfn'-device huge page alignment fix,
    block device shutdown crash fix, and other small libnvdimm fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWlrhjAAoJEB7SkWpmfYgCFbAQALKsQfFwT6JFS+zlPgiNpbqw
 2VMNKEH0AfGYGj96mT02j2q+vSUmXLMIDMTsbe0sDdtwFZtQbFmhmryzPWUVppSu
 KGTlLPW8vuEhQVs91+UI3BQKkvpi0+tbR8hPOh9W6QhjpRT+lyHFKnsNR5HZy5wB
 K4/VMaT5ffd5/pXRTjkYiPQYTwWyfcvNjICj0YtqhPvOwS031m77JpFsWJ8HSpEX
 K99VlzNUPMXd1pYkHmFNXWw52fhRGNhwAEomLeKMdQfKms+KnbKp8BOSA0aCqU8E
 kpujQcilDXJwykFQZOFI3Z5Dxvrv8lxFTU8HRMBvo3ESzfTWjfqcvyjGOjDUcruw
 ihESFSJtdZzhrBiMnf9RRqSpMFJvAT8MVT6Q4D3mZUHCMPbUqFJsQjMPt9hEH3ho
 4F0D2lesOCkubUKFTZmjMoDb+szuKbVhYK8TeFVVEhizinc/Aj0NKuazJqi+CXB/
 xh0ER4ZxD8wvzqFFWvS5UvR1G9I5fr7+3jGRUrqGLHlSdeXP9dkEg28ao3QbWk3x
 1dPOen6ZqQ9WJ/E7eGmXbVEz2R4Xd79hMXQzdQwmKDk/KbxRoAp7hyU8BslAyrBf
 HCdmVt+RAgrxZYfFRXuLhqwEBThJnNrgZA3qu74FUpkpFg6xRUu1bAYBiF7N+bFi
 82b5UbMkveBTtkXjJoiR
 =7V5r
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "The bulk of this has appeared in -next and independently received a
  build success notification from the kbuild robot.  The 'for-4.5/block-
  dax' topic branch was rebased over the weekend to drop the "block
  device end-of-life" rework that Al would like to see re-implemented
  with a notifier, and to address bug reports against the badblocks
  integration.

  There is pending feedback against "libnvdimm: Add a poison list and
  export badblocks" received last week.  Linda identified some localized
  fixups that we will handle incrementally.

  Summary:

   - Media error handling: The 'badblocks' implementation that
     originated in md-raid is up-levelled to a generic capability of a
     block device.  This initial implementation is limited to being
     consulted in the pmem block-i/o path.  Later, 'badblocks' will be
     consulted when creating dax mappings.

   - Raw block device dax: For virtualization and other cases that want
     large contiguous mappings of persistent memory, add the capability
     to dax-mmap a block device directly.

   - Increased /dev/mem restrictions: Add an option to treat all
     io-memory as IORESOURCE_EXCLUSIVE, i.e. disable /dev/mem access
     while a driver is actively using an address range.  This behavior
     is controlled via the new CONFIG_IO_STRICT_DEVMEM option and can be
     overridden by the existing "iomem=relaxed" kernel command line
     option.

   - Miscellaneous fixes include a 'pfn'-device huge page alignment fix,
     block device shutdown crash fix, and other small libnvdimm fixes"

* tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (32 commits)
  block: kill disk_{check|set|clear|alloc}_badblocks
  libnvdimm, pmem: nvdimm_read_bytes() badblocks support
  pmem, dax: disable dax in the presence of bad blocks
  pmem: fail io-requests to known bad blocks
  libnvdimm: convert to statically allocated badblocks
  libnvdimm: don't fail init for full badblocks list
  block, badblocks: introduce devm_init_badblocks
  block: clarify badblocks lifetime
  badblocks: rename badblocks_free to badblocks_exit
  libnvdimm, pmem: move definition of nvdimm_namespace_add_poison to nd.h
  libnvdimm: Add a poison list and export badblocks
  nfit_test: Enable DSMs for all test NFITs
  md: convert to use the generic badblocks code
  block: Add badblock management for gendisks
  badblocks: Add core badblock management code
  block: fix del_gendisk() vs blkdev_ioctl crash
  block: enable dax for raw block devices
  block: introduce bdev_file_inode()
  restrict /dev/mem to idle io memory ranges
  arch: consolidate CONFIG_STRICT_DEVM in lib/Kconfig.debug
  ...
2016-01-13 19:15:14 -08:00
Michael Ellerman
be6bfc29bc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include moving QE code out of arch/powerpc (to be shared with
arm), device tree updates, and minor fixes."
2016-01-14 09:55:01 +11:00
Linus Torvalds
aee3bfa330 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from Davic Miller:

 1) Support busy polling generically, for all NAPI drivers.  From Eric
    Dumazet.

 2) Add byte/packet counter support to nft_ct, from Floriani Westphal.

 3) Add RSS/XPS support to mvneta driver, from Gregory Clement.

 4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes
    Frederic Sowa.

 5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai.

 6) Add support for VLAN device bridging to mlxsw switch driver, from
    Ido Schimmel.

 7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski.

 8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko.

 9) Reorganize wireless drivers into per-vendor directories just like we
    do for ethernet drivers.  From Kalle Valo.

10) Provide a way for administrators "destroy" connected sockets via the
    SOCK_DESTROY socket netlink diag operation.  From Lorenzo Colitti.

11) Add support to add/remove multicast routes via netlink, from Nikolay
    Aleksandrov.

12) Make TCP keepalive settings per-namespace, from Nikolay Borisov.

13) Add forwarding and packet duplication facilities to nf_tables, from
    Pablo Neira Ayuso.

14) Dead route support in MPLS, from Roopa Prabhu.

15) TSO support for thunderx chips, from Sunil Goutham.

16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon.

17) Rationalize, consolidate, and more completely document the checksum
    offloading facilities in the networking stack.  From Tom Herbert.

18) Support aborting an ongoing scan in mac80211/cfg80211, from
    Vidyullatha Kanchanapally.

19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits)
  net: bnxt: always return values from _bnxt_get_max_rings
  net: bpf: reject invalid shifts
  phonet: properly unshare skbs in phonet_rcv()
  dwc_eth_qos: Fix dma address for multi-fragment skbs
  phy: remove an unneeded condition
  mdio: remove an unneed condition
  mdio_bus: NULL dereference on allocation error
  net: Fix typo in netdev_intersect_features
  net: freescale: mac-fec: Fix build error from phy_device API change
  net: freescale: ucc_geth: Fix build error from phy_device API change
  bonding: Prevent IPv6 link local address on enslaved devices
  IB/mlx5: Add flow steering support
  net/mlx5_core: Export flow steering API
  net/mlx5_core: Make ipv4/ipv6 location more clear
  net/mlx5_core: Enable flow steering support for the IB driver
  net/mlx5_core: Initialize namespaces only when supported by device
  net/mlx5_core: Set priority attributes
  net/mlx5_core: Connect flow tables
  net/mlx5_core: Introduce modify flow table command
  net/mlx5_core: Managing root flow table
  ...
2016-01-12 18:57:02 -08:00
Linus Torvalds
c597b6bcd5 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Algorithms:
   - Add RSA padding algorithm

  Drivers:
   - Add GCM mode support to atmel
   - Add atmel support for SAMA5D2 devices
   - Add cipher modes to talitos
   - Add rockchip driver for rk3288
   - Add qat support for C3XXX and C62X"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (103 commits)
  crypto: hifn_795x, picoxcell - use ablkcipher_request_cast
  crypto: qat - fix SKU definiftion for c3xxx dev
  crypto: qat - Fix random config build issue
  crypto: ccp - use to_pci_dev and to_platform_device
  crypto: qat - Rename dh895xcc mmp firmware
  crypto: 842 - remove WARN inside printk
  crypto: atmel-aes - add debug facilities to monitor register accesses.
  crypto: atmel-aes - add support to GCM mode
  crypto: atmel-aes - change the DMA threshold
  crypto: atmel-aes - fix the counter overflow in CTR mode
  crypto: atmel-aes - fix atmel-ctr-aes driver for RFC 3686
  crypto: atmel-aes - create sections to regroup functions by usage
  crypto: atmel-aes - fix typo and indentation
  crypto: atmel-aes - use SIZE_IN_WORDS() helper macro
  crypto: atmel-aes - improve performances of data transfer
  crypto: atmel-aes - fix atmel_aes_remove()
  crypto: atmel-aes - remove useless AES_FLAGS_DMA flag
  crypto: atmel-aes - reduce latency of DMA completion
  crypto: atmel-aes - remove unused 'err' member of struct atmel_aes_dev
  crypto: atmel-aes - rework crypto request completion
  ...
2016-01-12 18:51:14 -08:00
Ulrich Weigand
a61674bdfc powerpc/module: Handle R_PPC64_ENTRY relocations
GCC 6 will include changes to generated code with -mcmodel=large,
which is used to build kernel modules on powerpc64le.  This was
necessary because the large model is supposed to allow arbitrary
sizes and locations of the code and data sections, but the ELFv2
global entry point prolog still made the unconditional assumption
that the TOC associated with any particular function can be found
within 2 GB of the function entry point:

func:
	addis r2,r12,(.TOC.-func)@ha
	addi  r2,r2,(.TOC.-func)@l
	.localentry func, .-func

To remove this assumption, GCC will now generate instead this global
entry point prolog sequence when using -mcmodel=large:

	.quad .TOC.-func
func:
	.reloc ., R_PPC64_ENTRY
	ld    r2, -8(r12)
	add   r2, r2, r12
	.localentry func, .-func

The new .reloc triggers an optimization in the linker that will
replace this new prolog with the original code (see above) if the
linker determines that the distance between .TOC. and func is in
range after all.

Since this new relocation is now present in module object files,
the kernel module loader is required to handle them too.  This
patch adds support for the new relocation and implements the
same optimization done by the GNU linker.

Cc: stable@vger.kernel.org
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-13 12:37:05 +11:00
Russell Currey
c88c5d4373 powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages
The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no
parameters and returned nothing.  The call was updated to accept the
terminal number to flush, and returned various values depending on the
state of the output buffer.

The prototype has been updated and its usage in the OPAL kmsg dumper has
been modified to support its new behaviour as an incremental flush.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-13 12:35:17 +11:00
Linus Torvalds
33caf82acf Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "All kinds of stuff.  That probably should've been 5 or 6 separate
  branches, but by the time I'd realized how large and mixed that bag
  had become it had been too close to -final to play with rebasing.

  Some fs/namei.c cleanups there, memdup_user_nul() introduction and
  switching open-coded instances, burying long-dead code, whack-a-mole
  of various kinds, several new helpers for ->llseek(), assorted
  cleanups and fixes from various people, etc.

  One piece probably deserves special mention - Neil's
  lookup_one_len_unlocked().  Similar to lookup_one_len(), but gets
  called without ->i_mutex and tries to avoid ever taking it.  That, of
  course, means that it's not useful for any directory modifications,
  but things like getting inode attributes in nfds readdirplus are fine
  with that.  I really should've asked for moratorium on lookup-related
  changes this cycle, but since I hadn't done that early enough...  I
  *am* asking for that for the coming cycle, though - I'm going to try
  and get conversion of i_mutex to rwsem with ->lookup() done under lock
  taken shared.

  There will be a patch closer to the end of the window, along the lines
  of the one Linus had posted last May - mechanical conversion of
  ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
  inode_is_locked()/inode_lock_nested().  To quote Linus back then:

    -----
    |    This is an automated patch using
    |
    |        sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    |        sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    |        sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[     ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    |        sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    |        sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    |    with a very few manual fixups
    -----

  I'm going to send that once the ->i_mutex-affecting stuff in -next
  gets mostly merged (or when Linus says he's about to stop taking
  merges)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  nfsd: don't hold i_mutex over userspace upcalls
  fs:affs:Replace time_t with time64_t
  fs/9p: use fscache mutex rather than spinlock
  proc: add a reschedule point in proc_readfd_common()
  logfs: constify logfs_block_ops structures
  fcntl: allow to set O_DIRECT flag on pipe
  fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
  fs: xattr: Use kvfree()
  [s390] page_to_phys() always returns a multiple of PAGE_SIZE
  nbd: use ->compat_ioctl()
  fs: use block_device name vsprintf helper
  lib/vsprintf: add %*pg format specifier
  fs: use gendisk->disk_name where possible
  poll: plug an unused argument to do_poll
  amdkfd: don't open-code memdup_user()
  cdrom: don't open-code memdup_user()
  rsxx: don't open-code memdup_user()
  mtip32xx: don't open-code memdup_user()
  [um] mconsole: don't open-code memdup_user_nul()
  [um] hostaudio: don't open-code memdup_user()
  ...
2016-01-12 17:11:47 -08:00
Linus Torvalds
1baa5efbeb * s390: Support for runtime instrumentation within guests,
support of 248 VCPUs.
 
 * ARM: rewrite of the arm64 world switch in C, support for
 16-bit VM identifiers.  Performance counter virtualization
 missed the boat.
 
 * x86: Support for more Hyper-V features (synthetic interrupt
 controller), MMU cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWlSKwAAoJEL/70l94x66DY0UIAK5vp4zfQoQOJC4KP4Xgxwdu
 kpnK2Boz3/74o1b0y5+eJZoUZCsXCVLtmP5uhmMxUYWDgByFG2X8ZDhPFwB5FYLT
 2dN+Lr4tsolgIfRdHZtrT6Svp9SDL039bWTdscnbR6l37/j9FRWvpKdhI3orloFD
 /i4CSW2dVIq1/9Xctwu/rtcOEesEx4Cad+6YV3/530eVAXFzE908nXfmqJNZTocY
 YCGcmrMVCOu0ng5QM4xSzmmYjKMLUcRs+QzZWkVBzdJtTgwZUr09yj7I2dZ1yj/i
 cxYrJy6shSwE74XkXsmvG+au3C5u3vX4tnXjBFErnPJ99oqzHatVnFWNRhj4dLQ=
 =PIj1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC changes will come next week.

   - s390: Support for runtime instrumentation within guests, support of
     248 VCPUs.

   - ARM: rewrite of the arm64 world switch in C, support for 16-bit VM
     identifiers.  Performance counter virtualization missed the boat.

   - x86: Support for more Hyper-V features (synthetic interrupt
     controller), MMU cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (115 commits)
  kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL
  kvm/x86: Hyper-V SynIC timers tracepoints
  kvm/x86: Hyper-V SynIC tracepoints
  kvm/x86: Update SynIC timers on guest entry only
  kvm/x86: Skip SynIC vector check for QEMU side
  kvm/x86: Hyper-V fix SynIC timer disabling condition
  kvm/x86: Reorg stimer_expiration() to better control timer restart
  kvm/x86: Hyper-V unify stimer_start() and stimer_restart()
  kvm/x86: Drop stimer_stop() function
  kvm/x86: Hyper-V timers fix incorrect logical operation
  KVM: move architecture-dependent requests to arch/
  KVM: renumber vcpu->request bits
  KVM: document which architecture uses each request bit
  KVM: Remove unused KVM_REQ_KICK to save a bit in vcpu->requests
  kvm: x86: Check kvm_write_guest return value in kvm_write_wall_clock
  KVM: s390: implement the RI support of guest
  kvm/s390: drop unpaired smp_mb
  kvm: x86: fix comment about {mmu,nested_mmu}.gva_to_gpa
  KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table()
  arm/arm64: KVM: Detect vGIC presence at runtime
  ...
2016-01-12 13:22:12 -08:00
Michael S. Tsirkin
003472a93a powerpc: define __smp_xxx
This defines __smp_xxx barriers for powerpc
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barriers.h

This reduces the amount of arch-specific boiler-plate code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12 20:46:53 +02:00
Michael S. Tsirkin
fbd7ec0236 powerpc: reuse asm-generic/barrier.h
On powerpc read_barrier_depends, smp_read_barrier_depends
smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.

This is in preparation to refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-01-12 20:46:48 +02:00
Davidlohr Bueso
5a1b26d7c6 lcoking/barriers, arch: Use smp barriers in smp_store_release()
With commit b92b8b35a2 ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-01-12 20:46:46 +02:00
Linus Torvalds
d870a9d5e3 * hide EDAC workqueue from users (Borislav Petkov)
* edac_subsys init/teardown cleanup (Borislav Petkov)
 
 * make mpc85xx-pci-edac a platform device (Scott Wood)
 
 * sb_edac KNL gen2 support (Jim Snow)
 
 * other small cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWk2YzAAoJEBLB8Bhh3lVKzeAQAJ2XHbErMLzqIiFHpLCcCGTP
 49pjv/i2C1+lVcGld4ddFcyGei5TU4h/+TqaI2QxrheN8kLjmFpaT+kb8v4nT09+
 oW9cMa3DiY79cNdRmYegaJfkEk+CLu9JEHrFw5by8RShy83LliN/HsvQ9bupxL93
 5TMvVOEWdtxcEA5y7/7YCVfvRbjEmQFJJByZjyXTOxTXXpqDdRy0mleDAaIa7c35
 o3HbR3DXm5YL0Nb0gul5IkCgzhBfAzMXS2vaOK4Jnqlbn9kjE6NtP1Ks6NNFSLQ+
 xd189BQwqOuQnMqczwnskaWhxySTlP36DUlomJKgr3IePveUnhTg0mtYMHluYPZe
 rnXykwF+CLHNztPsYya1u3y3xiYR1jnHy3UHUxygOJxqOFocfniBTFf3Q/Kd36DX
 9bFWTpuBn+grjiIXnZYtzALOnj+I9BZLFsUARSv1PrbBMcXDwGfkjNWmnQHqQyQ0
 Nbgs9+h+b17mG8kSbrYBYGmgxp1lmiZ+B7CUb1jOEjgPyD8ypNeaZUY2Hf+Sh+26
 1cF8HSaGc7oEcfJGqKqeg6AYnxMl3FQBFKsy2bnX9IsdTihqKOAxYqih409/cj9r
 HwQSn0ipFDXFq/dns3icwtvBSzIAK8BjKRyCtnD5KOvKzZuoRgEB2Wyhq3yFY34q
 ON6y6wDmIveCRpEvqFKN
 =9d1r
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:

 - hide EDAC workqueue from users (Borislav Petkov)

 - edac_subsys init/teardown cleanup (Borislav Petkov)

 - make mpc85xx-pci-edac a platform device (Scott Wood)

 - sb_edac KNL gen2 support (Jim Snow)

 - other small cleanups all over the place

* tag 'edac_for_4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, i5100: Use to_delayed_work()
  MAINTAINERS: Fix EDAC repo URLs format
  EDAC, sb_edac: Set fixed DIMM width on Xeon Knights Landing
  EDAC: Rework workqueue handling
  EDAC: Make edac_device workqueue setup/teardown functions static
  EDAC: Remove edac_get_sysfs_subsys() error handling
  EDAC: Unexport and make edac_subsys static
  EDAC: Rip out the edac_subsys reference counting
  EDAC: Robustify workqueues destruction
  EDAC, mc_sysfs: Fix freeing bus' name
  EDAC, mpc85xx: Make mpc85xx-pci-edac a platform device
  EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support
  EDAC, sb_edac: Add support for duplicate device IDs
  EDAC, sb_edac: Virtualize several hard-coded functions
  EDAC, mv64x60: Use platform_register/unregister_drivers()
  EDAC, mpc85xx: Use platform_register/unregister_drivers()
  EDAC: Add DDR4 flag
  EDAC: Remove references to bluesmoke.sourceforge.net
  EDAC, pci: Remove old disabled code
2016-01-11 20:36:20 -08:00
Hugh Dickins
2f10f1a788 powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff
Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
_PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
recognized.

(I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
attempt to solve this problem.)

It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
and CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
should be made more robust; but nevertheless, fix up the powerpc ifdefs
as x86_64 and s390 (which met the same problem) have them, defining the
bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.

Fixes: 7207f43665 ("powerpc/mm: Add page soft dirty tracking")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-12 11:03:53 +11:00
Aneesh Kumar K.V
44734f23de powerpc/mm: Fix _PAGE_PTE breaking swapoff
Core kernel expects swp_entry_t to consist of only swap type and swap
offset. We should not leak pte bits into swp_entry_t. This breaks
swapoff which use the swap type and offset to build a swp_entry_t and
later compare that to the swp_entry_t obtained from linux page table
pte. Leaking pte bits into swp_entry_t breaks that comparison and
results in us looping in try_to_unuse.

The stack trace can be anywhere below try_to_unuse() in mm/swapfile.c,
since swapoff is circling around and around that function, reading from
each used swap block into a page, then trying to find where that page
belongs, looking at every non-file pte of every mm that ever swapped.

Fixes: 6a119eae94 ("powerpc/mm: Add a _PAGE_PTE bit")
Reported-by: Hugh Dickins <hughd@google.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-12 11:02:35 +11:00
Linus Torvalds
24af98c4cf Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "So we have a laundry list of locking subsystem changes:

   - continuing barrier API and code improvements

   - futex enhancements

   - atomics API improvements

   - pvqspinlock enhancements: in particular lock stealing and adaptive
     spinning

   - qspinlock micro-enhancements"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
  futex: Cleanup the goto confusion in requeue_pi()
  futex: Remove pointless put_pi_state calls in requeue()
  futex: Document pi_state refcounting in requeue code
  futex: Rename free_pi_state() to put_pi_state()
  futex: Drop refcount if requeue_pi() acquired the rtmutex
  locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation
  lcoking/barriers, arch: Use smp barriers in smp_store_release()
  locking/cmpxchg, arch: Remove tas() definitions
  locking/pvqspinlock: Queue node adaptive spinning
  locking/pvqspinlock: Allow limited lock stealing
  locking/pvqspinlock: Collect slowpath lock statistics
  sched/core, locking: Document Program-Order guarantees
  locking, sched: Introduce smp_cond_acquire() and use it
  locking/pvqspinlock, x86: Optimize the PV unlock code path
  locking/qspinlock: Avoid redundant read of next pointer
  locking/qspinlock: Prefetch the next node cacheline
  locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
  atomics: Add test for atomic operations with _relaxed variants
2016-01-11 14:18:38 -08:00
Michael Ellerman
e708c24cd0 powerpc: Add HWCAP bits for Power9
In order to support Power9 we need two new HWCAP bits. We are merging
these ahead of the cputable entry so that glibc can start referring to
them.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-11 20:30:52 +11:00
Alistair Popple
08f48f3234 powerpc/powernv: Reserve PE#0 on NPU
P8+ hardware reports all errors on PE#0. This patch ensures PE#0 is
not assigned to NPU devices so that it can be used for EEH.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-11 20:30:52 +11:00
Alistair Popple
b521549a09 powerpc/powernv: Change NPU PE# assignment
The P8+ hardware supports four partitionable endpoints (PEs) however
the hardware reports all errors as occurring on PE#0. This means we
need to reserve this PE for error handling (EEH) and not assign it to
a NPU device, implying that some devices will need to share PEs.

This patch changes the PE assignment for NPU devices such that NPU
devices which connect to the same GPU are assigned to the same
PE#.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-11 20:30:51 +11:00
Alistair Popple
419dbd5e1f powerpc/powernv: Fix update of NVLink DMA mask
The emulated NVLink PCI devices share the same IODA2 TCE tables but only
support a single TVT (instead of the normal two for PCI devices). This
requires the kernel to manually replace windows with either the bypass
or non-bypass window depending on what the driver has requested.

Unfortunately an incorrect optimisation was made in
pnv_pci_ioda_dma_set_mask() which caused updating of some NPU device PEs
to be skipped in certain configurations due to an incorrect assumption
that a NULL peer PE in the array indicated there were no more peers
present. This patch fixes the problem by ensuring all peer PEs are
updated.

Fixes: 5d2aa710e6 ("powerpc/powernv: Add support for Nvlink NPUs")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-11 20:30:48 +11:00
Russell Currey
b0eab5b29a powerpc/powernv: Remove misleading comment in pci.c
PCI in powernv now supports quite a bit more than p5ioc2, so remove the
outdated comment.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-11 16:56:01 +11:00
Steven Rostedt
35de3b1aa1 powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing
It has come to my attention that kprobe event stack tracing does not
work on powerpc. You can see with the following:

  # cd /sys/kernel/debug/tracing
  # echo stacktrace > trace_options
  # echo 'p kfree' > kprobe_events
  # echo 1 > events/kprobes/enable

Will print the following warning:
  save_stack_trace_regs() not implemented yet.

Although save_stack_trace() (which normal event stack traces use) is
implemented, save_stack_trace_regs() which kprobe events use is not.
This is a cheap attempt to implement that function.

Note, This may have issues if a task tries to get a stack trace from
another task with its regs, because it just passes in "current" to
save_context_stack(). But this does solve the issue with stack tracing
kprobe events.

Reported-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-11 14:27:28 +11:00
Dan Williams
21266be9ed arch: consolidate CONFIG_STRICT_DEVM in lib/Kconfig.debug
Let all the archs that implement devmem_is_allowed() opt-in to a common
definition of CONFIG_STRICT_DEVM in lib/Kconfig.debug.

Cc: Kees Cook <keescook@chromium.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
[heiko: drop 'default y' for s390]
Acked-by: Ingo Molnar <mingo@redhat.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-09 06:30:49 -08:00
Michael Ellerman
c33e54fafa powerpc: Fix build break due to paca mm_context_t changes
Commit 2fc251a8dd ("powerpc: Copy only required pieces of the
mm_context_t to the paca") broke the build for CONFIG_PPC_STD_MMU_64=y
and CONFIG_PPC_MM_SLICES=n.

That only happens for a kernel built with 4K pages and HUGETLB disabled,
which is why we missed it.

Fix it by adding a mm_ctx_user_psize member to the paca and populating
it in the appropriate places.

Fixes: 2fc251a8dd ("powerpc: Copy only required pieces of the mm_context_t to the paca")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-09 08:28:44 +11:00
Paolo Bonzini
2860c4b167 KVM: move architecture-dependent requests to arch/
Since the numbers now overlap, it makes sense to enumerate
them in asm/kvm_host.h rather than linux/kvm_host.h.  Functions
that refer to architecture-specific requests are also moved
to arch/.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08 19:04:36 +01:00
Andrew Lunn
e7f4dc3536 mdio: Move allocation of interrupts into core
Have mdio_alloc() create the array of interrupt numbers, and
initialize it to POLLING. This is what most MDIO drivers want, so
allowing code to be removed from the drivers.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-07 14:31:26 -05:00
Paolo Bonzini
def840ede3 KVM/ARM changes for Linux v4.5
- Complete rewrite of the arm64 world switch in C, hopefully
   paving the way for more sharing with the 32bit code, better
   maintainability and easier integration of new features.
   Also smaller and slightly faster in some cases...
 - Support for 16bit VM identifiers
 - Various cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWe80IAAoJECPQ0LrRPXpDdt8P/ittxzklIT7rsZxdOiIY6vLQ
 i2hWGo1KdZR+8rsNyQEeGyg2Ocdja0Vld9ciBKgXKeKtc6x6AHfq0x6eyGRbF2jJ
 Wdkd2a9lLJVJJIf1LBhOQuwjNiEvAgvqO5nwXL77s/rqx3Ur5OlyohSvRFBy7Pqp
 8cdnCV/43I/y7k0iGhitFVrEC9TL0cfeJmM7YhXkt8IcpkcpCDfgdI7wgIb0ntvv
 dqvrRmfp+Q3/hJ6SVRsy6uzOrjFjRE8hIIG6TiqfRd/FgI5x2xvGkSAE3Wx2YdRM
 myPDiAuY2wOyALZpn9zD7qFMOfI2wX8kaX5S/ctnbvLelkmQblI39/zfYuxJ36xC
 Mo2yMKcvT37AIMz2fxx3mGnIR7NZBNXVQGJmv/1p9vMQ8RbUXUhT0w6hP5SH9S7m
 RDoOkfd37wQugQ7bgI7cqg4hodMRlybGPq8QaKp80y0Ej3cPblM+y0fbR153SSbS
 6nCwYceFLdWJEV+tTFpKD5cvxOGeYfoC/8LcVRYRcWg6nlr4+qo61MHevyCe/Qxw
 Pw+z9wFpoVKumRT72KmzFUxFQqRnTshE3KJqNJdsqMPM8ZuW5TJ/MtUa+JdAWmSH
 dEAqd7Hy5W1CqgGEk+los+QlKw+5uFepcZ7SOuQ2ehME/Ae3xI8s9+UE6yqeHx6Y
 PV2s5lyJRCPfm3qOu7TS
 =wuir
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next

KVM/ARM changes for Linux v4.5

- Complete rewrite of the arm64 world switch in C, hopefully
  paving the way for more sharing with the 32bit code, better
  maintainability and easier integration of new features.
  Also smaller and slightly faster in some cases...
- Support for 16bit VM identifiers
- Various cleanups
2016-01-07 11:00:57 +01:00
David S. Miller
9e0efaf6b4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-01-06 22:54:18 -05:00
Rabin Vincent
55795ef546 net: filter: make JITs zero A for SKF_AD_ALU_XOR_X
The SKF_AD_ALU_XOR_X ancillary is not like the other ancillary data
instructions since it XORs A with X while all the others replace A with
some loaded value.  All the BPF JITs fail to clear A if this is used as
the first instruction in a filter.  This was found using american fuzzy
lop.

Add a helper to determine if A needs to be cleared given the first
instruction in a filter, and use this in the JITs.  Except for ARM, the
rest have only been compile-tested.

Fixes: 3480593131 ("net: filter: get rid of BPF_S_* enum")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-06 00:43:52 -05:00
Craig Gallek
538950a1b7 soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:49:59 -05:00
Al Viro
7812bf173a ppc: get rid of the remnants of __get_user64()
When __get_user64() had been removed, its helper (__get_user64_nocheck)
got missed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-04 10:28:12 -05:00
Al Viro
7e935c7ca1 Merge branch 'memdup_user_nul' into work.misc 2016-01-04 10:25:34 -05:00
David S. Miller
c07f30ad68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-12-31 18:20:10 -05:00
Thomas Falcon
032c5e8284 Driver for IBM System i/p VNIC protocol
This is a new device driver for a high performance SR-IOV assisted virtual
network for IBM System p and IBM System i systems.  The SR-IOV VF will be
attached to the VIOS partition and mapped to the Linux client via the
hypervisor's VNIC protocol that this driver implements.

This driver is able to perform basic tx and rx, new features
and improvements will be added as they are being developed and tested.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-28 00:12:13 -05:00
Andrew Donnellan
dc3799bb9a powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()
Fix off-by-one error in opal_mce_check_early_recovery() when checking
whether the NIP falls within OPAL space.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-27 19:12:41 +11:00
Andrew Donnellan
759fb100b2 powerpc: Fix style of self-test config prompts
A few of the config prompts for powerpc self-tests have periods at the
end, which is inconsistent with the rest of the prompts. Remove the
periods.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-27 19:12:40 +11:00
Michael Neuling
57a9039052 powerpc/powernv: Only delay opal_rtc_read() retry when necessary
Only delay opal_rtc_read() when busy and are going to retry.

This has the advantage of possibly saving a massive 10ms off booting!

Kudos to Stewart for noticing.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-27 19:12:40 +11:00
Russell Currey
affddff69c powerpc/powernv: Add a kmsg_dumper that flushes console output on panic
On BMC machines, console output is controlled by the OPAL firmware and is
only flushed when its pollers are called.  When the kernel is in a panic
state, it no longer calls these pollers and thus console output does not
completely flush, causing some output from the panic to be lost.

Output is only actually lost when the kernel is configured to not power off
or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
flushes the console buffer as part of its power down routines.  Before this
patch, however, only partial output would be printed during the timeout wait.

This patch adds a new kmsg_dumper which gets called at panic time to ensure
panic output is not lost.  It accomplishes this by calling OPAL_CONSOLE_FLUSH
in the OPAL API, and if that is not available, the pollers are called enough
times to (hopefully) completely flush the buffer.

The flushing mechanism will only affect output printed at and before the
kmsg_dump call in kernel/panic.c:panic().  As such, the "end Kernel panic"
message may still be truncated as follows:

>Call Trace:
>[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable)
>[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4
>[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c
>[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254
>[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350
>[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130
>[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
>---[ end Kernel panic - not

This functionality is implemented as a kmsg_dumper as it seems to be the
most sensible way to introduce platform-specific functionality to the
panic function.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-27 19:12:40 +11:00
Michael Neuling
2fc251a8dd powerpc: Copy only required pieces of the mm_context_t to the paca
Currently we copy the whole mm_context_t to the paca but only access a
few bits of it.  This is wasteful of space paca and also takes quite
some time in the hot path of context switching.

This patch pulls in only the required bits from the mm_context_t to
the paca and on context switch, copies only those.

Benchmarking this (On top of Anton's recent MSR context switching
changes [1]) using processes and yield shows an improvement of almost
3% on POWER8:

  http://ozlabs.org/~anton/junkcode/context_switch2.c
  ./context_switch2 --test=yield --process 0 0

1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.html

Signed-off-by: Michael Neuling <mikey@neuling.org>
[mpe: Rename paca fields to be mm_ctx_foo rather than context_foo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-27 19:12:14 +11:00
Hongtao Jia
3045e409e4 powerpc/mpc85xx: Add TMU device tree support for T1023/T1024
Also add nodes and properties for thermal management support. Meanwhile
preprocessor support is needed using thermal of framework.

Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-23 22:21:11 -06:00
Hongtao Jia
be489a3936 powerpc/mpc85xx: Add TMU device tree support for T1040/T1042
Also add nodes and properties for thermal management support. Meanwhile
preprocessor support is needed using thermal of framework.

Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
Reviewed-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-23 22:21:11 -06:00
Raghav Dogra
479f6a7fc6 powerpc/fsl_lbc: removal of dead code
The condition check is not used.

Signed-off-by: Raghav Dogra <raghav@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-23 15:02:00 -06:00
Zhao Qiang
b0417a2870 powerpc/p1010rdb: Update dts for pcie interrupt-map
p1010rdb uses the irq[4:5] for inta and intb to pcie,
it is active-high, so set it.

Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 18:23:22 -06:00
Scott Wood
2749539b4b powerpc/e6500: add locking to hugetlb
e6500 has threads but does not have TLB write conditional.  Thus,
the hugetlb code needs to take the same lock that the normal TLB miss
handlers take, to ensure that the tlbsx and tlbwe are atomic.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 18:23:22 -06:00
li pengbo
9d68e7accf powerpc/85xx: Enable TWR_P102x in mpc85xx_basic_defconfig
Enable TWR_P102x option by default in mpc85xx_basic_defconfig to support
p1025twr board.

Signed-off-by: Pengbo Li <Pengbo.Li@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 18:23:21 -06:00
Daniel Walker
433c858a61 powerpc/85xx: mpc85xx ADS: remove pci exclude
This code was reworked in commit,

905e75c46d

This change removed the fsl_add_bridge() which originally was above
the addition of the pci_exclude_device function. I think the assumption was that
the pci_exclude_device would prevent changes to the bridge PCI config after
it's been added. It seems it wasn't fully tested on MPC85xx ADS because
if you move the fsl_add_bridge() the pci_exclude_device is set in the machine
description then you can never update the PCI Config since the exclude
prevents it. This disrupts things like DMA.

This issue was extensively debugged by David Beazley.

Cc: xe-kernel@external.cisco.com
Cc: dbeazley@cisco.com
Cc: dwalker@fifo99.com
Signed-off-by: Daniel Walker <danielwa@cisco.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 18:23:21 -06:00
Igal Liberman
4e9de5e970 powerpc/mpc85xx: Update B4 FMan MURAM size
FMan V3H has 2 different MURAM sizes:
    In B4860/4420 the MURAM size is 512KB.
    In T4240 and T2080 the MURAM size is 384KB.

The MURAM size in FMan V3H device tree is 384KB.
This patch updates the MURAM size for B4 to 512KB.

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 18:23:11 -06:00
Harninder Rai
720d7aebcd powerpc/85xx: Add PCIe controller support for bsc9132qds
1. Use machine_arch_initcall to hook mpc85xx_common_publish_devices
This can ensure before pcibios_init() is called, pci controllers have
been probed and added to the hose_list.
2. Add a workaround for errata A-005434
For the BSC9132, PEX_PEXIWARn[TRGT] for all windows defaults to 0xF,
which is mapped to CCSRBAR. However, for other products, 0xF is
mapped to the local memory. Therefore, for the BSC9132, any default
PCI Express access to the local memory (DDR) will now access the
CCSRBAR. This patch changes the mapping of targets of inbound windows
PEX_PEXIWARn[TRGT] to the Local address space – 0x0 (from 0xF).

Signed-off-by: Harninder Rai <harninder.rai@freescale.com>
Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
Signed-off-by: Hou Zhiqiang <B48286@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 18:17:15 -06:00
Harninder Rai
230dd6059a powerpc/fsl: Add PCI node in device tree of bsc9132qds
Signed-off-by: Harninder Rai <harninder.rai@freescale.com>
Signed-off-by: Minghuan Lian <Minghuan.Lian@freescale.com>
Signed-off-by: Hou Zhiqiang <B48286@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 18:17:14 -06:00
Linus Torvalds
e73a31778a KVM/ARM fixes for v4.4-rc7
- A series of fixes to the MTRR emulation, tested in the BZ by several users
   so they should be safe this late
 
 - A fix for a division by zero
 
 - Two very simple ARM and PPC fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWeV/6AAoJEL/70l94x66DqzsH/05YnLi2GsX5WeZHMfIUgzgT
 S/GoIkA7A4E2eXVoGg824MWppSViUzZkWgYFQTG4+KY9WPXzm9z2ij7DIlUHCD6n
 QfevgQx1kIu1obyhm6bYM2xUdM3f7NCsQgw9bXZObB0ay+b/+GjR9/RbCbx60EO5
 K1P+kveK6PFlS9/hc0PLztu6WkPV9BCO1RJUbeAEdnrMbpuQfHC+coR7MHRCiv2V
 iy8f1CqrGaO5YPm9/3GbdH1xMKew4OZShOxTXwtvUThdrLkks2c8sk6FoLzqkznH
 LMHVIpkm4mrIgThZG7VqZMXOrWvBtsCt04Vr9MzCM6QetB02b/Uz0xKvMYx2kZQ=
 =pmYz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 - A series of fixes to the MTRR emulation, tested in the BZ by several
   users so they should be safe this late

 - A fix for a division by zero

 - Two very simple ARM and PPC fixes

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Reload pit counters for all channels when restoring state
  KVM: MTRR: treat memory as writeback if MTRR is disabled in guest CPUID
  KVM: MTRR: observe maxphyaddr from guest CPUID, not host
  KVM: MTRR: fix fixed MTRR segment look up
  KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX
  KVM: arm/arm64: vgic: Fix kvm_vgic_map_is_active's dist check
  kvm: x86: move tracepoints outside extended quiescent state
  KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR
2015-12-22 15:47:39 -08:00
Zhao Qiang
7aa1aa6ece QE: Move QE from arch/powerpc to drivers/soc
ls1 has qe and ls1 has arm cpu.
move qe from arch/powerpc to drivers/soc/fsl
to adapt to powerpc and arm

Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 17:12:56 -06:00
Zhao Qiang
302c059f2e QE: use subsys_initcall to init qe
Use subsys_initcall to init qe to adapt ARM architecture.
Remove qe_reset from PowerPC platform file.

Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 17:10:18 -06:00
Zhao Qiang
1291e49e89 QE/CPM: move muram management functions to qe_common
QE and CPM have the same muram, they use the same management
functions. Now QE support both ARM and PowerPC, it is necessary
to move QE to "driver/soc", so move the muram management functions
from cpm_common to qe_common for preparing to move QE code to "driver/soc"

Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 17:10:18 -06:00
Zhao Qiang
0e6e01ff69 CPM/QE: use genalloc to manage CPM/QE muram
Use genalloc to manage CPM/QE muram instead of rheap.

Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 17:10:18 -06:00
Michael Neuling
c395465da6 powerpc: Add function to copy mm_context_t to the paca
This adds a function to copy the mm->context to the paca.  This is
only a basic conversion for now but will be used more extensively in
the next patch.

This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's
not used elsewhere.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-19 22:13:12 +11:00
Alistair Popple
036592fbbe powerpc/opal-irqchip: Fix deadlock introduced by "Fix double endian conversion"
Commit 25642e1459 ("powerpc/opal-irqchip: Fix double endian
conversion") fixed an endian bug by calling opal_handle_events() in
opal_event_unmask().

However this introduced a deadlock if we find an event is active
during unmasking and call opal_handle_events() again. The bad call
sequence is:

  opal_interrupt()
  -> opal_handle_events()
     -> generic_handle_irq()
        -> handle_level_irq()
           -> raw_spin_lock(&desc->lock)
              handle_irq_event(desc)
              unmask_irq(desc)
              -> opal_event_unmask()
                 -> opal_handle_events()
                    -> generic_handle_irq()
                       -> handle_level_irq()
                          -> raw_spin_lock(&desc->lock)	(BOOM)

When generating multiple opal events in quick succession this would lead
to the following stall warnings:

EEH: Fenced PHB#0 detected, location: U78C9.001.WZS09XA-P1-C32
INFO: rcu_sched detected stalls on CPUs/tasks:

         12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=2065
         15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=2065
         (detected by 13, t=2102 jiffies, g=1325, c=1324, q=602)
NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [irqbalance:2696]
INFO: rcu_sched detected stalls on CPUs/tasks:
         12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=8371
         15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=8371
         (detected by 20, t=8407 jiffies, g=1325, c=1324, q=1290)

This patch corrects the problem by queuing the work if an event is
active during unmasking, which is similar to the pre-endian fix
behaviour.

Fixes: 25642e1459 ("powerpc/opal-irqchip: Fix double endian conversion")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-18 22:24:15 +11:00
Daniel Axtens
1b855e167b powerpc: Add missing calls to va_end()
cppcheck picked up that there were a couple of missing va_end()
calls in functions using va_start().

Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 23:23:22 +11:00
Nathan Fontenot
e9d764f803 powerpc/pseries: Enable kernel CPU dlpar from sysfs
Enable new kernel cpu hotplug functionality by allowing cpu dlpar requests
to be initiated from sysfs.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:41:03 +11:00
Nathan Fontenot
90edf184b9 powerpc/pseries: Add CPU dlpar add functionality
Add the ability to hotplug add cpus via rtas hotplug events by either
specifying the drc index of the CPU to add, or providing a count of the
number of CPUs to add.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:41:02 +11:00
Nathan Fontenot
ac71380071 powerpc/pseries: Add CPU dlpar remove functionality
Add the ability to dlpar remove CPUs via hotplug rtas events, either by
specifying the drc-index of the CPU to remove or providing a count of cpus
to remove.

To remove multiple cpus in a single request we create a list of possible
DR (Dynamic Reconfiguration) cpus and their drc indexes that can be
removed.  We can then traverse the list remove each cpu and easily clean
up in any cases of failure.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:41:02 +11:00
Nathan Fontenot
e666ae0b10 powerpc/pseries: Update CPU hotplug error recovery
Update the cpu dlpar add/remove paths to do better error recovery when
a failure occurs during the add/remove operation.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:41:02 +11:00
Nathan Fontenot
d98389f375 powerpc/pseries: Factor out common cpu hotplug code
Re-factor the cpu hotplug code to support doing cpu hotplug completely in
the kernel and using the existing sysfs probe/release interfaces. This
patch pulls out pieces of existing cpu hotplug code into common routines,
dlpar_cpu_add() and dlpar_cpu_remove(), to be used by both interfaces.
There are no functional changes introduced.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:41:01 +11:00
Nathan Fontenot
183deeea58 powerpc/pseries: Consolidate CPU hotplug code to hotplug-cpu.c
No functional changes, this patch is simply a move of the cpu hotplug
code from pseries/dlpar.c to pseries/hotplug-cpu.c. This is in an effort
to consolidate all of the cpu hotplug code in a common place.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:41:01 +11:00
Nathan Fontenot
1f859adb92 powerpc/pseries: Verify CPU doesn't exist before adding
When DLPAR adding a CPU we should verify that the CPU does not already
exist. Failure to do so can generate a kernel oops;

[    9.465585] kernel BUG at arch/powerpc/platforms/pseries/dlpar.c:382!
[    9.465796] Oops: Exception in kernel mode, sig: 5 [#1]

This oops can be generated by causing a probe to be performed on a cpu
by writing to the sysfs cpu probe file (/sys/devices/system/cpu/probe).
This patch adds a check for the existence of cpu prior to probing the cpu
so userspace doing the wrong thing won't trigger a BUG_ON().

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:41:01 +11:00
Alistair Popple
4450022b49 powerpc/476fpe: Add support for kexec
PPC476FPE has a different PVR from previous PPC476 processors. The
kexec code checks the PVR in order to correctly setup the MMU. When
the initial support for 476FPE processors was added the corresponding
change in the kexec code was missed. This patch simply adds the check
and solves the following bug on kexec:

kexec: Starting new kernel
Bye!
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0xee9a50f8
cpu 0x0: Vector: 400 (Instruction Access) at [ee9d7d20]
    pc: ee9a50f8
    lr: ee9a50e4
    sp: ee9d7dd0
    msr: 21020
    current = 0xee40f000
    pid   = 960, comm = kexec
enter ? for help
[link register   ] ee9a50e4
[ee9d7dd0] c0013748 default_machine_kexec+0x58/0x70 (unreliable)
[ee9d7df0] c0012f04 machine_kexec+0x34/0x40
[ee9d7e00] c00aa1ec kernel_kexec+0x9c/0xb0
[ee9d7e20] c005d704 SyS_reboot+0x1f4/0x220
[ee9d7f40] c000db68 ret_from_syscall+0x0/0x3c

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:41:00 +11:00
Alistair Popple
5d2aa710e6 powerpc/powernv: Add support for Nvlink NPUs
NVLink is a high speed interconnect that is used in conjunction with a
PCI-E connection to create an interface between CPU and GPU that
provides very high data bandwidth. A PCI-E connection to a GPU is used
as the control path to initiate and report status of large data
transfers sent via the NVLink.

On IBM Power systems the NVLink processing unit (NPU) is similar to
the existing PHB3. This patch adds support for a new NPU PHB type. DMA
operations on the NPU are not supported as this patch sets the TCE
translation tables to be the same as the related GPU PCIe device for
each NVLink. Therefore all DMA operations are setup and controlled via
the PCIe device.

EEH is not presently supported for the NPU devices, although it may be
added in future.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:41:00 +11:00
Alistair Popple
a84bf32140 powerpc: Add __raw_rm_writeq() function
Move __raw_rm_writeq() from platforms/powernv/pci-ioda.c to
include/asm/io.h so that it can be used by other code.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:59 +11:00
Alistair Popple
94973b24d6 Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field"
This commit removed the pcidev field from struct pci_dn as it was no
longer in use by the kernel. However to support finding the
association of Nvlink devices to GPU devices from the device-tree this
field is required.

This reverts commit 250c7b277c ("powerpc/pci: Remove unused struct
pci_dn.pcidev field").

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:59 +11:00
Gavin Shan
e80c4e7ca5 powerpc/powernv: Fix M64 resource name in /proc/iomem
The name of PCI root bus's M64 resource isn't initialized properly.
When dumping "/proc/iomem", "<BAD>" is seen for those M64 resources
on PCI root buses.

   ~# cat /proc/iomem | grep -e "BAD"
   3b0000000000-3b0fefffffff : <BAD>
   3b1000000000-3b1fefffffff : <BAD>
   3c0000000000-3c0fefffffff : <BAD>
   3c1000000000-3c1fefffffff : <BAD>
   3c2000000000-3c2fefffffff : <BAD>

This fixes the issue by setting the name of PCI root bus's M64
resource to that of PHB's device node full name. With the patch,
no "<BAD>" is seen from "/proc/iomem".

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:59 +11:00
Laurent Dufour
7207f43665 powerpc/mm: Add page soft dirty tracking
User space checkpoint and restart tool (CRIU) needs the page's change
to be soft tracked. This allows to do a pre checkpoint and then dump
only touched pages.

This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when
the page is backed in memory, and a new _PAGE_SWP_SOFT_DIRTY bit when
the page is swapped out.

To introduce a new PTE _PAGE_SOFT_DIRTY bit value common to hash 4k
and hash 64k pte, the bits already defined in hash-*4k.h should be
shifted left by one.

The _PAGE_SWP_SOFT_DIRTY bit is dynamically put after the swap type in
the swap pte. A check is added to ensure that the bit is not
overwritten by _PAGE_HPTEFLAGS.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:58 +11:00
Michael Ellerman
2613265cb5 powerpc/kernel: Combine vec/loc for STD_EXCEPTION_PSERIES
The STD_EXCEPTION_PSERIES macro takes both a vector number, and a
location (memory address). However both are always identical, so combine
them to save repeating ourselves.

This does mean an exception handler must always exist at the location in
memory that matches its vector number. But that's OK because this is the
"STD" macro (standard), which does exactly that. We have other macros
for the other cases, eg. STD_EXCEPTION_PSERIES_OOL (out of line).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:58 +11:00
Michael Ellerman
d8725ce86c powerpc/kernel: Open code SET_DEFAULT_THREAD_PPR
This is only used in one location, open code it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:57 +11:00
Michael Ellerman
d030a4b5eb powerpc/kernel: Open code HMT_MEDIUM_LOW_HAS_PPR
HMT_MEDIUM_LOW_HAS_PPR is only used in once place, open code it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:57 +11:00
Michael Ellerman
d6265aeaf8 powerpc/kernel: Drop HMT_MEDIUM_PPR_DISCARD
HMT_MEDIUM_PPR_DISCARD is a macro which is present at the start of most
of our first level exception handlers. It conditionally executes a
HMT_MEDIUM instruction, which sets the processor priority to medium.

On on modern systems, ie. Power7 and later, it is nop'ed out at boot.
All it does is make the exception vectors more cramped, and consume 4
bytes of icache.

On old systems it has the effect of boosting the processor priority at
the start of exception processing. If we were previously in the idle
loop for example, we may be at low or very low priority. This is
desirable as we want to process the exception as fast as possible.

However looking closely at the generated code, we see that in all cases
we execute another HMT_MEDIUM just four instructions later. With code
patching applied, the final code on an old (Power6) system will look
like, eg:

  c000000000000300 <data_access_pSeries>:
  c000000000000300:	7c 42 13 78	mr	r2,r2		<-
  c000000000000304:	7d b2 43 a6	mtsprg	2,r13
  c000000000000308:	7d b1 42 a6	mfsprg	r13,1
  c00000000000030c:	f9 2d 00 80	std	r9,128(r13)
  c000000000000310:	60 00 00 00	nop
  c000000000000314:	7c 42 13 78	mr	r2,r2		<-

So I suggest that the added code complexity of HMT_MEDIUM_PPR_DISCARD is
not justified by the benefit of boosting the processor priority for the
duration of four instructions, and therefore we drop it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:57 +11:00
Michael Ellerman
cd5cdeb6c8 powerpc/rtas: Make enter_rtas() private
There are no longer any users of enter_rtas() outside of rtas.c, so make
it "private", by moving the declaration inside rtas.c. Hopefully this
will encourage people to use one of the wrappers which takes the sharp
edges off the RTAS calling sequence.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:56 +11:00
Michael Ellerman
4456f45246 powerpc/rtas: Use rtas_call_unlocked() in call_rtas_display_status()
Although call_rtas_display_status() does actually want to use the
regular RTAS locking, it doesn't want the extra logic that is in
rtas_call(), so currently it open codes the logic.

Instead we can use rtas_call_unlocked(), after taking the RTAS lock.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:56 +11:00
Michael Ellerman
b2e8590fa1 powerpc/pseries: Use rtas_call_unlocked() in pseries hotplug
Avoid open coding the logic by using rtas_call_unlocked().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:55 +11:00
Michael Ellerman
08eb105a7c powerpc/xmon: Use rtas_call_unlocked() in xmon
Avoid open coding the logic by using rtas_call_unlocked().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:55 +11:00
Michael Ellerman
209eb4e5cb powerpc/rtas: Add rtas_call_unlocked()
Most users of RTAS (Run-Time Abstraction Services) use rtas_call(),
which deals with locking as well as endian handling.

However we have two users outside of rtas.c that can't use rtas_call()
because they have different locking requirements.

The hotplug CPU code can't take the RTAS lock because the CPU would go
offline with the lock held and no other CPUs would be able to call RTAS
until the CPU came back online.

The xmon code doesn't want to take the lock because it would risk dead
locking when we are trying to recover from a crash.

Both sites required multiple patches when we added little endian
support, proving that programmers can't do endian right.

Although that ship has sailed, we can still clean the code up by
providing an unlocked version of rtas_call() which avoids the need to
open code the logic elsewhere.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:55 +11:00
Stewart Smith
e4d54f71d2 powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL
Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is
just OPALv3, with nobody ever expecting anything on pre-OPALv3 to
be cared about or supported by mainline kernels.

So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL
exclusively.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:54 +11:00
Stewart Smith
7261aafc09 powerpc/powernv: Remove OPALv2 firmware define and references
OPALv2 only ever existed in the lab and didn't escape to the world.
All OPAL systems in the wild are OPALv3.

The probability of there being an OPALv2 system still powered on
anywhere inside IBM is approximately zero, let alone anyone
expecting to run mainline kernels.

So, start to remove references to OPALv2.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:54 +11:00
Stewart Smith
786842b62f powerpc/powernv: panic() on OPAL < V3
The OpenPower Abstraction Layer firmware went through a couple
of iterations in the lab before being released. What we now know
as OPAL advertises itself as OPALv3.

OPALv2 and OPALv1 never made it outside the lab, and the possibility
of anyone at all ever building a mainline kernel today and expecting
it to boot on such hardware is zero.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:53 +11:00
Stewart Smith
98da62b716 powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type
When running on newer OPAL firmware that supports sending extra
OPAL_MSG types, we would print a warning on *every* message received.

This could be a problem for kernels that don't support OPAL_MSG_OCC
on machines that are running real close to thermal limits and the
OCC is throttling the chip. For a kernel that is paying attention to
the message queue, we could get these notifications quite often.

Conceivably, future message types could also come fairly often,
and printing that we didn't understand them 10,000 times provides
no further information than printing them once.

Cc: stable@vger.kernel.org
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 20:42:13 +11:00
Haren Myneni
6333ed8f26 crypto: nx-842 - Mask XERS0 bit in return value
NX842 coprocessor sets 3rd bit in CR register with XER[S0] which is
nothing to do with NX request. Since this bit can be set with other
valuable return status, mast this bit.

One of other bits (INITIATED, BUSY or REJECTED) will be returned for
any given NX request.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2015-12-17 16:42:12 +08:00
Michael Ellerman
2475c36213 Partial revert of "powerpc: Individual System V IPC system calls"
This partially reverts commit a34236155a.

While reviewing the glibc patch to exploit the individual IPC calls,
Arnd & Andreas noticed that we were still requiring userspace to pass
IPC_64 in order to get the new style IPC API.

With a bit of cleanup in the kernel we can drop that requirement, and
instead only provide the new style API, which will simplify things for
userspace.

Rather than try and sneak that patch into 4.4, instead we will drop the
individual IPC calls for powerpc, and merge them again in 4.5 once the
cleanup patch has gone in.

Because we've already added sys_mlock2() as syscall #378, we don't do a
full revert of the IPC calls. Instead we drop the __NR #defines, and
send those now undefined syscall numbers to sys_ni_syscall(). This
leaves a gap in the syscall numbers, but we'll reuse them when we merge
the individual IPC calls.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2015-12-16 21:52:32 +11:00
Daniel Axtens
00b912b0c8 powerpc: Remove broken GregorianDay()
GregorianDay() is supposed to calculate the day of the week
(tm->tm_wday) for a given day/month/year. In that calcuation it
indexed into an array called MonthOffset using tm->tm_mon-1. However
tm_mon is zero-based, not one-based, so this is off-by-one. It also
means that every January, GregoiranDay() will access element -1 of
the MonthOffset array.

It also doesn't appear to be a correct algorithm either: see in
contrast kernel/time/timeconv.c's time_to_tm function.

It's been broken forever, which suggests no-one in userland uses
this. It looks like no-one in the kernel uses tm->tm_wday either
(see e.g. drivers/rtc/rtc-ds1305.c:319).

tm->tm_wday is conventionally set to -1 when not available in
hardware so we can simply set it to -1 and drop the function.
(There are over a dozen other drivers in drivers/rtc that do
this.)

Found using UBSAN.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrew Morton <akpm@linux-foundation.org> # as an example of what UBSan finds.
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Cc: rtc-linux@googlegroups.com
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-16 12:54:04 +11:00
Rashmica Gupta
eb925d6460 powerpc/xmon: Append linux_banner to exception information in xmon.
Currently if you are in xmon without an oops etc. to view the kernel
version you have to type "d $linux_banner" - not necessarily obvious. As
this is useful information, append to the output of "e" command.

Example output:
  $mon> e
  cpu 0x1: Vector: 0  at [c0000000f879ba80]
      pc: c000000000081718: sysrq_handle_xmon+0x68/0x80
      lr: c000000000081718: sysrq_handle_xmon+0x68/0x80
      sp: c0000000f879bbe0
     msr: 8000000000009033
    current = 0xc0000000f604d5c0
    paca    = 0xc00000000fdc0480	 softe: 0	 irq_happened: 0x01
      pid   = 2467, comm = bash
  Linux version 4.4.0-rc2-00008-gc51af91c3ab3-dirty (rashmica@circle) (gcc
  version 5.1.1 20150629 (GCC) ) #45 SMP Wed Nov 25 10:25:12 AEDT 2015

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 20:41:50 +11:00
Rashmica Gupta
24ad1648ed powerpc/cell: Remove the Cell QPACE code
All users of QPACE have upgraded to QPACE2 so remove the Cell QPACE code.

Signed-off-by: Rashmica Gupta <rashmicy@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 20:41:50 +11:00
Vipin K Parashar
b4af279a7c powerpc/pseries: Limit EPOW reset event warnings
Kernel prints respective warnings about various EPOW events for
user information/action after parsing EPOW interrupts. At times
below EPOW reset event warning is seen to be flooding kernel log
over a period of time.

May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared
May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared
May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared
May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared
May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared
May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared

These EPOW reset events are spurious in nature and are triggered by
firmware without an actual EPOW event being reset. This patch avoids these
multiple EPOW reset warnings by using a counter variable. This variable
is incremented every time an EPOW event is reported. Upon receiving a EPOW
reset event the same variable is checked to filter out spurious events and
decremented accordingly.

This patch also improves log messages to better describe EPOW event being
reported. Merged adjacent log messages into single one to reduce number of
lines printed per event.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 20:41:49 +11:00
Michael Ellerman
1901d8bb45 powerpc fixes for 4.4 #2
- tm: Block signal return from setting invalid MSR state from Michael Neuling
  - tm: Check for already reclaimed tasks from Michael Neuling
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWVul+AAoJEFHr6jzI4aWAUZAQAK2s+E4WTtcExXSG1bqq05o6
 5wCLtIq6M91h5HDpgBfN7S5OXpRd72ZeIVdzC0HkFeLBF3y7NSHSEezUw4g/GGfz
 K2xGV1CCXC3Rb3qyHSdyi6+c1AnLVRPBVzVxPVmlXigrXeFiQ4613YW9rzf9b8fs
 oktUciwW9aHbrIv7g8f82gpuk9jwwhp/sF+1H/7fGOozT4CFsKo4wj4HOOCBwH4y
 ODEjs6Z+9Uwb6Kfvi/rn3k4XA1wC36WFq3ORI6KrmK/ZB1eR0Kwf0IELYpMj8cOX
 q5ZtCH7t68f9vmEK2B34AUijf/amm+2vLwvF6xAuZJFPUPZtgMBdRcqkLalbtPAO
 8hlyPPgoZcgR/Of+lEYxUobcL0SMNufXwmfwRO35ktkm9Z9Ee96C8NNbpybBSDXL
 YRa6is5MeO4GL8Gbcc0TA50hGjok7o3acGE6HSAReyzf0guQ4xqcif7+6lfWZPkk
 P3aM02ajp2qoqyjhT/Ei6JlMptAiuQY+HvELFneqn5s9nDbv6cGuYZNNap0c1fK+
 74W0p7MiZh7+IF5HpyUIeYV836inXMDIoKzjA6H3OWitk/1lbcrbF34Qpz9zAWZn
 YF3w786ZzzQLw0jcALaqZejm58MLGIakO4MNDB0/ZBh0nKKfEV8WvPOjev78OAp1
 +pDrJh0iQzrujN6OMKQ0
 =IL8A
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.4-3' into next

Merge the two TM fixes we merged in 4.4. We are about to merge selftests
for these, and without the fixes the selftests will oops.

powerpc fixes for 4.4 #2

 - tm: Block signal return from setting invalid MSR state from Michael Neuling
 - tm: Check for already reclaimed tasks from Michael Neuling
2015-12-14 20:40:32 +11:00
Michael Neuling
801c0b2c4d powerpc: Print MSR TM bits in oops messages
Print MSR TM bits in oops messages.  This appends them to the end
like this:

    MSR: 8000000502823031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[TE]>

You get the TM[] only if at least one TM MSR bit is set.  Inside the
TM[], E means Enabled (bit 32), S means Suspended (bit 33), and T
means Transactional (bit 34)

If no bits are set, you get no TM[] output.

Include rework of printbits() to handle this case.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 20:40:26 +11:00
Boqun Feng
81d7a3294d powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_
versions all need to be fully ordered, however they are now just
RELEASE+ACQUIRE, which are not fully ordered.

So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
__{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics
of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit
b97021f855 ("powerpc: Fix atomic_xxx_return barrier semantics")

This patch depends on patch "powerpc: Make value-returning atomics fully
ordered" for PPC_ATOMIC_ENTRY_BARRIER definition.

Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 20:39:01 +11:00
Boqun Feng
49e9cf3f0c powerpc: Make value-returning atomics fully ordered
According to memory-barriers.txt:

> Any atomic operation that modifies some state in memory and returns
> information about the state (old or new) implies an SMP-conditional
> general memory barrier (smp_mb()) on each side of the actual
> operation ...

Which mean these operations should be fully ordered. However on PPC,
PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation,
which is currently "lwsync" if SMP=y. The leading "lwsync" can not
guarantee fully ordered atomics, according to Paul Mckenney:

https://lkml.org/lkml/2015/10/14/970

To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee
the fully-ordered semantics.

This also makes futex atomics fully ordered, which can avoid possible
memory ordering problems if userspace code relies on futex system call
for fully ordered semantics.

Fixes: b97021f855 ("powerpc: Fix atomic_xxx_return barrier semantics")
Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 20:38:18 +11:00
Aneesh Kumar K.V
4dcbd88eb6 powerpc/mm: Don't open code pgtable_t size
The slot information of base page size hash pte is stored in the
pgtable_t w.r.t transparent hugepage. We need to make sure we don't
index beyond pgtable_t size.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:17 +11:00
Aneesh Kumar K.V
4ad90c8649 powerpc/mm: Use H_READ with H_READ_4
This will bulk read 4 hash pte slot entries and should reduce the loop

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:17 +11:00
Aneesh Kumar K.V
45949ebe6c powerpc/nohash: we don't use real_pte_t for nohash
Remove the related functions and #defines

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:16 +11:00
Aneesh Kumar K.V
cc50380db3 powerpc/nohash: Update 64K nohash config to have 32 pte fragement
They don't need to track 4k subpage slot details and hence don't need
second half of pgtable_t.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:16 +11:00
Aneesh Kumar K.V
4d9057c39a powerpc/mm: Don't hardcode the hash pte slot shift
Use the #define instead of open-coding the same

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:15 +11:00
Aneesh Kumar K.V
62607bc64c powerpc/mm: Don't hardcode page table size
pte and pmd table size are dependent on config items. Don't
hard code the same. This make sure we use the right value
when masking pmd entries and also while checking pmd_bad

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:15 +11:00
Aneesh Kumar K.V
6a119eae94 powerpc/mm: Add a _PAGE_PTE bit
For a pte entry we will have _PAGE_PTE set. Our pte page
address have a minimum alignment requirement of HUGEPD_SHIFT_MASK + 1.
We use the lower 7 bits to indicate hugepd. ie.

For pmd and pgd we can find:
1) _PAGE_PTE set pte -> indicate PTE
2) bits [2..6] non zero -> indicate hugepd.
   They also encode the size. We skip bit 1 (_PAGE_PRESENT).
3) othewise pointer to next table.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:14 +11:00
Aneesh Kumar K.V
e34aa03ca4 powerpc/mm: Move THP headers around
We support THP only with book3s_64 and 64K page size. Move
THP details to hash64-64k.h to clarify the same.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:14 +11:00
Aneesh Kumar K.V
26a344aea4 powerpc/mm: Move hugetlb related headers
W.r.t hugetlb, we support two format for pmd. With book3s_64 and
64K linux page size, we can have pte at the pmd level. Hence we
don't need to support hugepd there. For everything else hugepd
is supported and pmd_huge is (0).

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:13 +11:00
Aneesh Kumar K.V
40e8550afc powerpc/mm: Move WIMG update to helper.
Only difference here is, we apply the WIMG mapping early, so rflags
passed to updatepp will also be changed.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:13 +11:00
Aneesh Kumar K.V
c6a3c495f0 powerpc/mm: Add helper for converting pte bit to hpte bits
Instead of open coding it in multiple code paths, export the helper
and add more documentation. Also make sure we don't make assumption
regarding pte bit position

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:12 +11:00
Aneesh Kumar K.V
a43c0eb836 powerpc/mm: Convert 4k insert from asm to C
This is similar to 64K insert. May be we want to consolidate

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:12 +11:00
Aneesh Kumar K.V
89ff725051 powerpc/mm: Convert __hash_page_64K to C
Convert from asm to C

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:11 +11:00
Aneesh Kumar K.V
227fdbee5a powerpc/mm: Increase the width of #define
No real change, only style changes

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:11 +11:00
Aneesh Kumar K.V
506b863c68 powerpc/mm: Remove pte_val usage for the second half of pgtable_t
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:10 +11:00
Aneesh Kumar K.V
bf680d5160 powerpc/mm: Don't track subpage valid bit in pte_t
This free up 11 bits in pte_t. In the later patch we also change
the pte_t format so that we can start supporting migration pte
at pmd level. We now track 4k subpage valid bit as below

If we have _PAGE_COMBO set, we override the _PAGE_F_GIX_SHIFT
and _PAGE_F_SECOND. Together we have 4 bits, each of them
used to indicate whether any of the 4 4k subpage in that group
is valid. ie,

[ group 1 bit ]   [ group 2 bit ]  ..... [ group 4 ]
[ subpage 1 - 4]  [ subpage 5- 8]  ..... [ subpage 13 - 16]

We still track each 4k subpage slot number and secondary hash
information in the second half of pgtable_t. Removing the subpage
tracking have some significant overhead on aim9 and ebizzy benchmark and
to support THP with 4K subpage, we do need a pgtable_t of 4096 bytes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:10 +11:00
Aneesh Kumar K.V
106713a145 powerpc/mm: Remove the dependency on pte bit position in asm code
We should not expect pte bit position in asm code. Simply
by moving part of that to C

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:10 +11:00
Aneesh Kumar K.V
91f1da9979 powerpc/mm: Convert 4k hash insert to C
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:09 +11:00
Aneesh Kumar K.V
17ed9e3192 powerpc/booke: Move nohash headers
Move the booke related headers below booke/32 or booke/64

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:09 +11:00
Aneesh Kumar K.V
1ca7212932 powerpc/mm: Move PTE bits from generic functions to hash64 functions.
functions which operate on pte bits are moved to hash*.h and other
generic functions are moved to pgtable.h

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:08 +11:00
Aneesh Kumar K.V
371352ca0e powerpc/mm: Move hash64 PTE bits from book3s/64/pgtable.h to hash.h
This enables us to keep hash64 related bits together, and makes it easy
to follow.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:08 +11:00
Aneesh Kumar K.V
f281b5d50c powerpc/mm: Don't use pmd_val, pud_val and pgd_val as lvalue
We convert them static inline function here as we did with pte_val in
the previous patch

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:07 +11:00
Aneesh Kumar K.V
10bd3808df powerpc/mm: Don't use pte_val as lvalue
We also convert few #define to static inline in this patch for better
type checking

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:07 +11:00
Aneesh Kumar K.V
b0412ea94b powerpc/mm: Drop pte-common.h from BOOK3S 64
We copy only needed PTE bits define from pte-common.h to respective
hash related header. This should greatly simply later patches in which
we are going to change the pte format for hash config

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:06 +11:00
Aneesh Kumar K.V
ee4889c7bc powerpc/mm: Don't have generic headers introduce functions touching pte bits
We are going to drop pte_common.h in the later patch. The idea is to
enable hash code not require to define all PTE bits. Having PTE bits
defined in pte_common.h made the code unnecessarily complex.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:06 +11:00
Aneesh Kumar K.V
cbbb8683fb powerpc/mm: Delete booke bits from book3s
We also move __ASSEMBLY__ towards the end of header. This avoid
having #ifndef __ASSEMBLY___ all over the header

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:05 +11:00
Aneesh Kumar K.V
ab537dca2f powerpc/mm: Move hash specific pte width and other defines to book3s
This further make a copy of pte defines to book3s/64/hash*.h. This
remove the dependency on pgtable-ppc64-4k.h and pgtable-ppc64-64k.h

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:05 +11:00
Aneesh Kumar K.V
3dfcb315d8 powerpc/mm: make a separate copy for book3s
In this patch we do:
cp pgtable-ppc32.h book3s/32/pgtable.h
cp pgtable-ppc64.h book3s/64/pgtable.h

This enable us to do further changes to hash specific config.
We will change the page table format for 64bit hash in later patches.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:05 +11:00
Aneesh Kumar K.V
26b6a3d9bb powerpc/mm: move pte headers to book3s directory
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:04 +11:00
Aneesh Kumar K.V
0863d7f213 powerpc/mm: Fix infinite loop in hash fault with 4K page size
This is the same bug we fixed as part of 09567e7fd4
("powerpc/mm: Check paca psize is up to date for huge mappings"). Please
check that for details. The difference here is that faults were
happening on a 4K page at an address previously mapped by hugetlb.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:04 +11:00
Scott Wood
666db563d3 EDAC, mpc85xx: Make mpc85xx-pci-edac a platform device
Originally the mpc85xx-pci-edac driver bound directly to the PCI
controller node.

Commit

  905e75c46d ("powerpc/fsl-pci: Unify pci/pcie initialization code")

turned the PCI controller code into a platform device. Since we can't
have two drivers binding to the same device, the EDAC code was changed
to be called into as a library-style submodule. However, this doesn't
work if the EDAC driver is built as a module.

Commit

  8d8fcba6d1ea ("EDAC: Rip out the edac_subsys reference counting")

exposed another problem with this approach -- mpc85xx_pci_err_probe()
was being called in the same early boot phase that the PCI controller
is initialized, rather than in the device_initcall phase that the EDAC
layer expects. This caused a crash on boot.

To fix this, the PCI controller code now creates a child platform device
specifically for EDAC, which the mpc85xx-pci-edac driver binds to.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Jia Hongtao <B38951@freescale.com>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Kim Phillips <kim.phillips@freescale.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Masanari Iida <standby24x7@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Link: http://lkml.kernel.org/r/1449774432-18593-1-git-send-email-scottwood@freescale.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-12-11 16:56:16 +01:00
Bjorn Helgaas
800e07b609 Merge branches 'pci/aspm', 'pci/hotplug', 'pci/misc' and 'pci/msi' into next
* pci/aspm:
  PCI/ASPM: Make sysfs link_state_store() consistent with link_state_show()

* pci/hotplug:
  PCI: pciehp: Always protect pciehp_disable_slot() with hotplug mutex

* pci/misc:
  x86/PCI: Simplify pci_bios_{read,write}
  PCI: Simplify config space size computation
  PCI: Limit config space size for Netronome NFP6000 family
  PCI: Add Netronome vendor and device IDs
  PCI: Support PCIe devices with short cfg_size
  x86/PCI: Clarify AMD Fam10h config access restrictions comment
  PCI: Print warnings for all invalid expansion ROM headers
  PCI: Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmask

* pci/msi:
  PCI/MSI: Remove empty pci_msi_init_pci_dev()
  PCI/MSI: Initialize MSI capability for all architectures
2015-12-10 19:40:14 -06:00
Bjorn Helgaas
93de690176 PCI: Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmask
Bit 7 of the "Header Type" register indicates a multi-function device when
set.  Bits 0-6 contain encoded values, where 0x1 indicates a PCI-PCI
bridge.  It is incorrect to test this as though it were a mask.

For example, while the PCI 3.0 spec only defines values 0x0, 0x1, and 0x2,
it's conceivable that a future spec could define 0x3 to mean something
else; then tests for "(hdr_type & 0x7f) & PCI_HEADER_TYPE_BRIDGE" would
incorrectly succeed for this new 0x3 header type.

Test bits 0-6 of the Header Type for equality with PCI_HEADER_TYPE_BRIDGE.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2015-12-10 19:38:06 -06:00
Anton Blanchard
db1231dcdb powerpc: Fix DSCR inheritance over fork()
Two DSCR tests have a hack in them:

	/*
	 * XXX: Force a context switch out so that DSCR
	 * current value is copied into the thread struct
	 * which is required for the child to inherit the
	 * changed value.
	 */
	sleep(1);

We should not be working around this in the testcase, it is a kernel bug.
Fix it by copying the current DSCR to the child, instead of what we
had in the thread struct at last context switch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-10 21:11:13 +11:00
Anton Blanchard
20dbe67062 powerpc: Call restore_sprs() before _switch()
commit 152d523e63 ("powerpc: Create context switch helpers save_sprs()
and restore_sprs()") moved the restore of SPRs after the call to _switch().

There is an issue with this approach - new tasks do not return through
_switch(), they are set up by copy_thread() to directly return through
ret_from_fork() or ret_from_kernel_thread(). This means restore_sprs() is
not getting called for new tasks.

Fix this by moving restore_sprs() before _switch().

Fixes: 152d523e63 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-10 21:10:55 +11:00
Anton Blanchard
d64d02ce4e powerpc: Call check_if_tm_restore_required() in enable_kernel_*()
Commit a0e72cf12b ("powerpc: Create msr_check_and_{set,clear}()")
removed a call to check_if_tm_restore_required() in the
enable_kernel_*() functions. Add them back in.

Fixes: a0e72cf12b ("powerpc: Create msr_check_and_{set,clear}()")
Reported-by: Rashmica Gupta <rashmicy@gmail.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-10 20:10:53 +11:00
Thomas Huth
696066f875 KVM: PPC: Increase memslots to 512
Only using 32 memslots for KVM on powerpc is way too low, you can
nowadays hit this limit quite fast by adding a couple of PCI devices
and/or pluggable memory DIMMs to the guest.

x86 already increased the KVM_USER_MEM_SLOTS to 509, to satisfy 256
pluggable DIMM slots, 3 private slots and 253 slots for other things
like PCI devices (i.e. resulting in 256 + 3 + 253 = 512 slots in
total). We should do something similar for powerpc, and since we do
not use private slots here, we can set the value to 512 directly.

While we're at it, also remove the KVM_MEM_SLOTS_NUM definition
from the powerpc-specific header since this gets defined in the
generic kvm_host.h header anyway.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-12-10 11:36:24 +11:00
Paul Mackerras
c20875a3e6 KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR
Currently it is possible for userspace (e.g. QEMU) to set a value
for the MSR for a guest VCPU which has both of the TS bits set,
which is an illegal combination.  The result of this is that when
we execute a hrfid (hypervisor return from interrupt doubleword)
instruction to enter the guest, the CPU will take a TM Bad Thing
type of program interrupt (vector 0x700).

Now, if PR KVM is configured in the kernel along with HV KVM, we
actually handle this without crashing the host or giving hypervisor
privilege to the guest; instead what happens is that we deliver a
program interrupt to the guest, with SRR0 reflecting the address
of the hrfid instruction and SRR1 containing the MSR value at that
point.  If PR KVM is not configured in the kernel, then we try to
run the host's program interrupt handler with the MMU set to the
guest context, which almost certainly causes a host crash.

This closes the hole by making kvmppc_set_msr_hv() check for the
illegal combination and force the TS field to a safe value (00,
meaning non-transactional).

Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-12-10 11:34:27 +11:00
Al Viro
b808b1d632 don't open-code generic_file_llseek_size()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-09 13:00:45 -05:00
Geyslan G. Bem
edfaff269f KVM: PPC: Book3S PR: Remove unused variable 'vcpu_book3s'
The vcpu_book3s variable is assigned but never used. So remove it.
Found using cppcheck.

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-12-09 16:06:32 +11:00
Thomas Huth
760a7364f2 KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
In the old DABR register, the BT (Breakpoint Translation) bit
is bit number 61. In the new DAWRX register, the WT (Watchpoint
Translation) bit is bit number 59. So to move the DABR-BT bit
into the position of the DAWRX-WT bit, it has to be shifted by
two, not only by one. This fixes hardware watchpoints in gdb of
older guests that only use the H_SET_DABR/X interface instead
of the new H_SET_MODE interface.

Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-12-09 16:05:01 +11:00
Paul Mackerras
1c9e3d51d5 KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better
As we saw with the TM Bad Thing type of program interrupt occurring
on the hrfid that enters the guest, it is not completely impossible
to have a trap occurring in the guest entry/exit code, despite the
fact that the code has been written to avoid taking any traps.

This adds a check in the kvmppc_handle_exit_hv() function to detect
the case when a trap has occurred in the hypervisor-mode code, and
instead of treating it just like a trap in guest code, we now print
a message and return to userspace with a KVM_EXIT_INTERNAL_ERROR
exit reason.

Of the various interrupts that get handled in the assembly code in
the guest exit path and that can return directly to the guest, the
only one that can occur when MSR.HV=1 and MSR.EE=0 is machine check
(other than system call, which we can avoid just by not doing a sc
instruction).  Therefore this adds code to the machine check path to
ensure that if the MCE occurred in hypervisor mode, we exit to the
host rather than trying to continue the guest.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-12-09 15:46:14 +11:00
Andrew Donnellan
dc9c41bd9e Revert "powerpc/eeh: Don't unfreeze PHB PE after reset"
This reverts commit 527d10ef3a.

The reverted commit breaks cxlflash devices following an EEH reset (and
possibly other cxl devices, however this has not been tested).

The reverted commit changed the behaviour of eeh_reset_device() so that PHB
PEs are not unfrozen following the completion of the reset. This should not
be problematic, as no device resources should have been associated with the
PHB PE.

However, when attempting to load the cxlflash driver after a reset, the
driver attempts to read Vital Product Data through a call to
pci_read_vpd() (which is called on the physical cxl device, not on the
virtual AFU device). pci_read_vpd() in turn attempts to read from the cxl
device's config space. This fails, as the PE it's trying to read from is
still frozen. In turn, the driver gets an -ENODEV and fails to initialise.

It appears this issue only affects some parts of the VPD area, as "lspci
-vvv", which only reads a subset of the VPD bytes, is not broken by the
original patch.

At this stage, we don't fully understand why we're trying to read a frozen
PE, and we don't know how this affects other cxl devices. It is possible
that there is an underlying bug in the cxl driver or the powerpc CAPI
support code, or alternatively a bug in the PCI resource allocation/mapping
code that is incorrectly mapping resources to PE#0.

As such, this fix is incomplete, however it is necessary to prevent a
serious regression in CAPI support.

In the meantime, revert the commit, especially as it was intended to be a
non-functional change.

Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Ian Munsie <imunsie@au1.ibm.com>
Cc: Daniel Axtens <dja@axtens.net>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-09 14:05:10 +11:00
Paul Gortmaker
5b01310cfc powerpc/sbc8641: drop bogus PHY IRQ entries from DTS file
This file was originally cloned off of the MPC8641D-HPCN reference
platform, which actually had a PHY IRQ line connected. However this
board does not. The bogus entry was largely inert and went undetected
until commit 321beec504 ("net: phy: Use
interrupts when available in NOLINK state") was added to the tree.

With the above commit, the board fails to NFS boot since it sits waiting
for a PHY IRQ event that of course never arrives. Removing the bogus
entries from the DTS file fixes the issue.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-09 14:00:39 +11:00
Alistair Popple
25642e1459 powerpc/opal-irqchip: Fix double endian conversion
The OPAL event calls return a mask of events that are active in big
endian format. This is checked when unmasking the events in the
irqchip by comparison with a cached value. The cached value was stored
in big endian format but should've been converted to CPU endian
first.

This bug leads to OPAL event delivery being delayed or dropped on some
systems. Symptoms may include a non-functional console.

The bug is fixed by calling opal_handle_events(...) instead of
duplicating code in opal_event_unmask(...).

Fixes: 9f0fd0499d ("powerpc/powernv: Add a virtual irqchip for opal events")
Cc: stable@vger.kernel.org # v4.2+
Reported-by: Douglas L Lehr <dllehr@us.ibm.com>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-08 16:53:31 +11:00
Rusty Russell
7523e4dc50 module: use a structure to encapsulate layout.
Makes it easier to handle init vs core cleanly, though the change is
fairly invasive across random architectures.

It simplifies the rbtree code immediately, however, while keeping the
core data together in the same cachline (now iff the rbtree code is
enabled).

Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-12-04 22:46:25 +01:00
Davidlohr Bueso
d5a73cadf3 lcoking/barriers, arch: Use smp barriers in smp_store_release()
With commit b92b8b35a2 ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 11:39:51 +01:00
Anton Blanchard
d1e1cf2e38 powerpc: clean up asm/switch_to.h
Remove a bunch of unnecessary fallback functions and group
things in a more logical way.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-02 19:34:41 +11:00
Anton Blanchard
f3d885ccba powerpc: Rearrange __switch_to()
Most of __switch_to() is housekeeping, TLB batching, timekeeping etc.
Move these away from the more complex and critical context switching
code.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-02 19:34:41 +11:00
Anton Blanchard
579e633e76 powerpc: create flush_all_to_thread()
Create a single function that flushes everything (FP, VMX, VSX, SPE).
Doing this all at once means we only do one MSR write.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-02 19:34:40 +11:00
Anton Blanchard
c208505900 powerpc: create giveup_all()
Create a single function that gives everything up (FP, VMX, VSX, SPE).
Doing this all at once means we only do one MSR write.

A context switch microbenchmark using yield():

http://ozlabs.org/~anton/junkcode/context_switch2.c

./context_switch2 --test=yield --fp --altivec --vector 0 0

shows an improvement of 3% on POWER8.

Signed-off-by: Anton Blanchard <anton@samba.org>
[mpe: giveup_all() needs to be EXPORT_SYMBOL'ed]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-02 19:34:26 +11:00
Anton Blanchard
1f2e25b2d5 powerpc: Remove fp_enable() and vec_enable(), use msr_check_and_{set, clear}()
More consolidation of our MSR available bit handling.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-01 13:52:26 +11:00