Commit Graph

14300 Commits

Author SHA1 Message Date
Gavin Shan
4d6186ca6f powerpc/powernv: Remove pnv_eeh_cap_start()
This moves the logic of pnv_eeh_cap_start() to pnv_eeh_find_cap()
as the function is only called by pnv_eeh_find_cap(). The logic
of both functions are pretty simple. No need to have separate
functions.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:42:15 +11:00
Gavin Shan
608fb9c296 powerpc/powernv: Cleanup on EEH comments
This applies cleanup on eeh-powernv.c, no functional changes:

   * Remove unnecessary comments and empty line.
   * Correct inaccurate comments.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:42:15 +11:00
Gavin Shan
00ba05a12b powerpc/pseries: Cleanup on pseries_eeh_get_state()
This cleans up pseries_eeh_get_state(), no functional changes:

   * Return EEH_STATE_NOT_SUPPORT early when the 2nd RTAS output
     argument is zero to avoid nested if statements.
   * Skip clearing bits in the PE state represented by variable
     "result" to simplify the code.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:42:14 +11:00
Gavin Shan
872ee2d652 powerpc/eeh: More relaxed condition for enabled IO path
When one or both of the below two flags are marked in the PE state, the
PE's IO path is regarded as enabled: EEH_STATE_MMIO_ACTIVE or
EEH_STATE_MMIO_ENABLED.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:41:43 +11:00
Gavin Shan
8234fcedf1 powerpc/eeh: Force reset on fenced PHB
On fenced PHB, the error handlers in the drivers of its subordinate
devices could return PCI_ERS_RESULT_CAN_RECOVER, indicating no reset
will be issued during the recovery. It's conflicting with the fact
that fenced PHB won't be recovered without reset.

This limits the return value from the error handlers in the drivers
of the fenced PHB's subordinate devices to PCI_ERS_RESULT_NEED_NONE
or PCI_ERS_RESULT_NEED_RESET, to ensure reset will be issued during
recovery.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:41:43 +11:00
Gavin Shan
f2da4ccf8b powerpc/eeh: More relaxed hotplug criterion
Currently, we rely on the existence of struct pci_driver::err_handler
to decide if the corresponding PCI device should be unplugged during
EEH recovery (partially hotplug case). However that check is not
sufficient. Some device drivers implement only some of the EEH error
handlers to collect diag-data. That means the driver still expects a
hotplug to recover from the EEH error.

This makes the hotplug criterion more relaxed: if the device driver
doesn't provide all necessary EEH error handlers, it will experience
hotplug during EEH recovery.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
[mpe: Minor change log rewording]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:39:07 +11:00
Gavin Shan
527d10ef3a powerpc/eeh: Don't unfreeze PHB PE after reset
On PowerNV platform, the PE is kept in frozen state until the PE
reset is completed to avoid recursive EEH error caused by MMIO
access during the period of EEH reset. The PE's frozen state is
cleared after BARs of PCI device included in the PE are restored
and enabled. However, we needn't clear the frozen state for PHB PE
explicitly at this point as there is no real PE for PHB PE. As the
PHB PE is always binding with PE#0, we actually clear PE#0, which
is wrong. It doesn't incur any problem though.

This checks if the PE is PHB PE and doesn't clear the frozen state
if it is.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:06:57 +11:00
Geoff Levand
879c26d4f6 powerpc/ps3: Quieten boot wrapper output with run_cmd
Add a boot wrapper script function run_cmd which will run a shell command
quietly and only print the output if either V=1 or an error occurs.

Also, run the ps3 dd commands with run_cmd to clean up the build output.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 20:06:56 +11:00
Gautham R. Shenoy
70aa3961a1 KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path
Currently a CPU running a guest can receive a H_DOORBELL in the
following two cases:
1) When the CPU is napping due to CEDE or there not being a guest
vcpu.
2) The CPU is running the guest vcpu.

Case 1), the doorbell message is not cleared since we were waking up
from nap. Hence when the EE bit gets set on transition from guest to
host, the H_DOORBELL interrupt is delivered to the host and the
corresponding handler is invoked.

However in Case 2), the message gets cleared by the action of taking
the H_DOORBELL interrupt. Since the CPU was running a guest, instead
of invoking the doorbell handler, the code invokes the second-level
interrupt handler to switch the context from the guest to the host. At
this point the setting of the EE bit doesn't result in the CPU getting
the doorbell interrupt since it has already been delivered once. So,
the handler for this doorbell is never invoked!

This causes softlockups if the missed DOORBELL was an IPI sent from a
sibling subcore on the same CPU.

This patch fixes it by explitly invoking the doorbell handler on the
exit path if the exit reason is H_DOORBELL similar to the way an
EXTERNAL interrupt is handled. Since this will also handle Case 1), we
can unconditionally clear the doorbell message in
kvmppc_check_wake_reason.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-21 16:31:52 +11:00
Nikunj A Dadhania
bfec5c2cc0 KVM: PPC: Implement extension to report number of memslots
QEMU assumes 32 memslots if this extension is not implemented. Although,
current value of KVM_USER_MEM_SLOTS is 32, once KVM_USER_MEM_SLOTS
changes QEMU would take a wrong value.

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-21 16:31:46 +11:00
Paul Mackerras
c64dfe2af3 KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent HPTEs
This fixes a bug where the old HPTE value returned by H_REMOVE has
the valid bit clear if the HPTE was an absent HPTE, as happens for
HPTEs for emulated MMIO pages and for RAM pages that have been paged
out by the host.  If the absent bit is set, we clear it and set the
valid bit, because from the guest's point of view, the HPTE is valid.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-21 16:25:06 +11:00
Paul Mackerras
572abd563b KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation ioctl
Currently the KVM_PPC_ALLOCATE_HTAB will try to allocate the requested
size of HPT, and if that is not possible, then try to allocate smaller
sizes (by factors of 2) until either a minimum is reached or the
allocation succeeds.  This is not ideal for userspace, particularly in
migration scenarios, where the destination VM really does require the
size requested.  Also, the minimum HPT size of 256kB may be
insufficient for the guest to run successfully.

This removes the fallback to smaller sizes on allocation failure for
the KVM_PPC_ALLOCATE_HTAB ioctl.  The fallback still exists for the
case where the HPT is allocated at the time the first VCPU is run, if
no HPT has been allocated by ioctl by that time.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-21 16:24:57 +11:00
Christophe Jaillet
1856f50c66 powerpc/prom: Avoid reference to potentially freed memory
of_get_property() is used inside the loop, but then the reference to the
node is dropped before dereferencing the prop pointer, which could by then
point to junk if the node has been freed.

Instead use of_property_read_u32() to actually read the property
value before dropping the reference.

of_property_read_u32() requires at least one cell (u32) to be present,
which is stricter than the old logic which would happily dereference a
property of any size. However we believe all device trees in the wild
have at least one cell.

Skiboot may produce memory nodes with more than one cell, but that is
OK, of_property_read_u32() will return the first one.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
[mpe: Expand change log with device tree details]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-21 15:31:25 +11:00
David S. Miller
26440c835f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	net/ipv4/inet_connection_sock.c
	net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-20 06:08:27 -07:00
Stephane Eranian
24f1a79a5f perf/powerpc: Add support for PERF_SAMPLE_BRANCH_CALL
The patch catches PERF_SAMPLE_BRANCH_CALL because it is not clear whether
this is actually supported by the hardware.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: khandual@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1444720151-10275-4-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:30:54 +02:00
Michael Ellerman
bed08b7e1f powerpc/cell: Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU
The TUNE_CELL option allows you to build a kernel that runs on multiple
CPUs but is tuned (ie. optimised) to run on Cell CPUs. Now days no one
is building a distro in that fashion, and any users who are building
custom kernels for their Cell machines are better off building with
CONFIG_CELL_CPU, which builds a kernel that only runs on Cell and
therefore can be optimised even more aggresively.

Dropping the option also avoids confusing other users, who are presented
with an option to tune for Cell when they are not building for a Cell
CPU at all.

Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-19 19:51:18 +11:00
Hon Ching \(Vicky\) Lo
9e5d4af458 vTPM: get the buffer allocated for event log instead of the actual log
The OS should ask Power Firmware (PFW) for the size of the buffer
allocated for the event log, instead of the size of the actual
event log.  It then passes the buffer adddress and size to PFW in
the handover process, into which PFW copies the log.

Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2015-10-19 01:01:23 +02:00
Hon Ching \(Vicky\) Lo
b4ed0469d0 vTPM: reformat event log to be byte-aligned
The event log generated by OpenFirmware in PowerPC is 4-byte aligned.
This patch reformats the log to be byte-aligned for the Linux client.

Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2015-10-19 01:01:23 +02:00
Hon Ching \(Vicky\) Lo
2f82e98265 vTPM: fix searching for the right vTPM node in device tree
Replace all occurrences of '/ibm,vtpm' with '/vdevice/vtpm',
as only the latter is guanranteed to be available for the client OS.
The '/ibm,vtpm' node should only be used by Open Firmware, which
is susceptible to changes.

Signed-off-by: Hon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
2015-10-19 01:01:22 +02:00
Scott Wood
1930bb5ccf powerpc/fsl_pci: Don't set up inbound windows in kdump crash kernel
Otherwise, because the top end of the crash kernel is treated as the
absolute top of memory rather than the beginning of a reserved region,
in-flight DMA from the previous kernel that targets areas above the
crash kernel can trigger a storm of PCI errors.  We only do this for
kdump, not normal kexec, in case kexec is being used to upgrade to a
kernel that wants a different inbound memory map.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mingkai Hu <Mingkai.hu@freescale.com>
2015-10-17 00:36:37 -05:00
Scott Wood
1112450a18 powerpc/85xx: Don't use generic timebase sync on 64-bit
85xx currently uses the generic timebase sync mechanism when
CONFIG_KEXEC is enabled, because 32-bit 85xx kexec support does a hard
reset of each core.  64-bit 85xx kexec does not do this, so we neither
need nor want this (nor is the generic timebase sync code built on
ppc64).

FWIW, I don't like the fact that the hard reset is done on 32-bit
kexec, and I especially don't like the timebase sync being triggered
only on the presence of CONFIG_KEXEC rather than actually booting in
that environment, but that's beyond the scope of this patch...

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-17 00:36:36 -05:00
Scott Wood
9f640bf532 powerpc/fsl-corenet: Disable coreint if kexec is enabled
Problems have been observed in coreint (EPR) mode if interrupts are
left pending (due to the lack of device quiescence with kdump) after
having tried to deliver to a CPU but unable to deliver due to MSR[EE]
-- interrupts no longer get reliably delivered in the new kernel.  I
tried various ways of fixing it up inside the crash kernel itself, and
none worked (including resetting the entire mpic).  Masking all
interrupts and issuing EOIs in the crashing kernel did help a lot of
the time, but the behavior was not consistent.

Thus, stick to standard IACK mode when kdump is a possibility.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-17 00:36:36 -05:00
Scott Wood
01c593d749 powerpc/fsl-booke-64: Allow booting from the secondary thread
This allows SMP kernels to work as kdump crash kernels.  While crash
kernels don't really need to be SMP, this prevents things from breaking
if a user does it anyway (which is not something you want to only find
out once the main kernel has crashed in the field, especially if
whether it works or not depends on which cpu crashed).

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-17 00:36:35 -05:00
poonam aggrwal
5224644551 powerpc/b4860: Renamed the L2 caches
To make provision for more than one L2 caches in the system, change the
name from L2 to L2_1; same as in T4 platforms.
* Also remove the L2 entry from common file
  "arch/powerpc/boot/dts/fsl/b4si-post.dtsi"
  Keep them only in separate files for b4860 and b4420.

Signed-off-by: Shaveta Leekha <shaveta@freescale.com>
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-17 00:36:34 -05:00
Hongtao Jia
dc37374b9c powerpc/fsl: Move Freescale device tree files into fsl folder
It makes no sense that some Freescale device tree files are in fsl
directory while some others not. This patch move Freescale device tree
files into fsl folder. To do that the following two steps are made:
- Move Freescale device tree files into fsl folder.
- Update the include path in these files from "fsl/*.dtsi" to "*.dtsi".

Please add "fsl/" prefix when you make dtb using Makefile.

Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com>
[scottwood: fixed cuImage rule]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-17 00:36:34 -05:00
poonam aggrwal
2d80f651c8 powerpc/b4860: Removed LIODN register from sRIO node
In case of B4860 LIODN register for sRIO is not in GUTs block but in
the sRIO register space.

Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-17 00:36:33 -05:00
Andy Fleming
c383ee84e1 powerpc/85xx: Add support for Varisys Cyrus board
This board uses a P5020 chip, and boots just fine using
the corenet_generic code. The device tree is very similar to the
P5020DS, except that there is no Flash memory. The environment is,
instead, stored on an MMC card on the motherboard.

Signed-off-by: Andy Fleming <afleming@gmail.com>
[scottwood: fixed trailing whitespace]
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-17 00:36:33 -05:00
Scott Wood
072daeed55 powerpc/fsl_pci: Check for get_user/probe_kernel_address failure
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hongtao Jia <hongtao.jia@freescale.com>
2015-10-17 00:35:41 -05:00
Zhao Qiang
a109752ea6 powerpc/t104xd4rdb: add DS26522 nodes to device tree
DS26522 is used for tdm, configured by SPI bus.
Add nodes under spi node to t104xd4rdb.dtsi.

Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-16 19:32:11 -05:00
Uwe Kleine-König
78c102c24c powerpc/dts: don't fall back to fsl,pq3-gpio for fsl,mpc8572-gpio
While the handling of fsl,pq3-gpio and fsl,mpc8572-gpio is done in the
same driver and the two hardly differ, the latter controller needs a
workaround for an erratum in the gpio_get callback. To make this
difference more explicit remove fsl,pq3-gpio from the list of
compatibles for mpc8572 machines.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-16 18:46:00 -05:00
Yangbo Lu
07e9117e43 powerpc/dts: Add and fix 1588 timer node for eTSEC
Add 1588 timer node in files:
arch/powerpc/boot/dts/bsc9131rdb.dtsi
arch/powerpc/boot/dts/bsc9132qds.dtsi
arch/powerpc/boot/dts/p1010rdb.dtsi
arch/powerpc/boot/dts/p1020rdb-pd.dts
arch/powerpc/boot/dts/p1021rdb-pc.dtsi
arch/powerpc/boot/dts/p1022ds.dtsi
arch/powerpc/boot/dts/p1025twr.dtsi
For P2020RDB-PC, registers' values should be calculated
based on default 1588 reference clock(300MHz) not 250MHz,
and fix this in file:
arch/powerpc/boot/dts/p2020rdb-pc.dtsi

Signed-off-by: Yangbo Lu <yangbo.lu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-16 18:45:11 -05:00
chenhui zhao
881ea7d3f5 powerpc/corenet: use the mixed mode of MPIC when enabling CPU hotplug
Core reset may cause issue if using the proxy mode of MPIC.
Use the mixed mode of MPIC if enabling CPU hotplug.

Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-16 18:44:27 -05:00
Linus Torvalds
ebb65c81e1 powerpc fixes for 4.3 #3
- Re-enable CONFIG_SCSI_DH in our defconfigs
  - Remove unused os_area_db_id_video_mode
  - cxl: fix leak of IRQ names in cxl_free_afu_irqs() from Andrew
  - cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API from Andrew
  - cxl: fix leak of ctx->mapping when releasing kernel API contexts from Andrew
  - cxl: Workaround malformed pcie packets on some cards from Philippe
  - cxl: Fix number of allocated pages in SPA from Christophe Lombard
  - Fix checkstop in native_hpte_clear() with lockdep from Cyril
  - Panic on unhandled Machine Check on powernv from Daniel
  - selftests/powerpc: Fix build failure of load_unaligned_zeropad test
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWIM0xAAoJEFHr6jzI4aWAsCIP/04uAiPCqWOwHjr8/eAlNAmJ
 GaA6b91QUUpBlyXgzYZShS/FQEnyukbGTUzaS3KwijOdRJtCHxvl2eG7pOCws+GS
 2YeA9mBm7MgYT0BJ+KLGCgrF5C/sc+LN3udO9Kf1LimLpp+fIILHgEmhrfy00wUp
 f7tJ/Rvpt23PmcCDX0PhA7NuOrRu5hQOQ9rsqJfzc7XObZAG1AfISPgALgaeAINc
 XqQfWiNFLmDJyhV9K39rUXSTvHYl6pPnfDj4GelfjQD2l/csH0M4MeGW2tHNkgVy
 CakLWOP3zdZVTYTcB8wypnoZxATPhEsHehJmQ4fu3n0WR1vHfCqh4rFZuPaaX0NG
 P3In0eOV285RIpNLcwkchN+07Ops1Fvi5XonaQpgHCcI9c4H7IAGPbQau2DhR9sU
 DyZQ+/6wNzpXbM7llM3VyTA2zvvyiuEzuIZI78XWexO/Ny6TCItRtEqJEXMA+ChX
 lKbLluRnQcnn5sizK0yj4mtkffAbu7Za1KGl1nm1Q/5pBQWsC40wFcRLNNdzqVmH
 7tSp8cIEYunCYKy5bAheWJTzpUgGD55EEcUkQFHVm5LKBXyA73qJRSMuLZqtnB3z
 g6eTiEKhZvVFedNMDNFnNWrvOnd8JpyjGLRAbqgwMhN+lgVvmwwSSB6V2SefMnuL
 HCSGqR40vPA9bH0Cz/ND
 =3ze+
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 - Re-enable CONFIG_SCSI_DH in our defconfigs
 - Remove unused os_area_db_id_video_mode
 - cxl: fix leak of IRQ names in cxl_free_afu_irqs() from Andrew
 - cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API from Andrew
 - cxl: fix leak of ctx->mapping when releasing kernel API contexts from Andrew
 - cxl: Workaround malformed pcie packets on some cards from Philippe
 - cxl: Fix number of allocated pages in SPA from Christophe Lombard
 - Fix checkstop in native_hpte_clear() with lockdep from Cyril
 - Panic on unhandled Machine Check on powernv from Daniel
 - selftests/powerpc: Fix build failure of load_unaligned_zeropad test

* tag 'powerpc-4.3-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  selftests/powerpc: Fix build failure of load_unaligned_zeropad test
  powerpc/powernv: Panic on unhandled Machine Check
  powerpc: Fix checkstop in native_hpte_clear() with lockdep
  cxl: Fix number of allocated pages in SPA
  cxl: Workaround malformed pcie packets on some cards
  cxl: fix leak of ctx->mapping when releasing kernel API contexts
  cxl: fix leak of ctx->irq_bitmap when releasing context via kernel API
  cxl: fix leak of IRQ names in cxl_free_afu_irqs()
  powerpc/ps3: Remove unused os_area_db_id_video_mode
  powerpc/configs: Re-enable CONFIG_SCSI_DH
2015-10-16 12:07:43 -07:00
Mahesh Salgaonkar
966d713e86 KVM: PPC: Book3S HV: Deliver machine check with MSR(RI=0) to guest as MCE
For the machine check interrupt that happens while we are in the guest,
kvm layer attempts the recovery, and then delivers the machine check interrupt
directly to the guest if recovery fails. On successful recovery we go back to
normal functioning of the guest. But there can be cases where a machine check
interrupt can happen with MSR(RI=0) while we are in the guest. This means
MC interrupt is unrecoverable and we have to deliver a machine check to the
guest since the machine check interrupt might have trashed valid values in
SRR0/1. The current implementation do not handle this case, causing guest
to crash with Bad kernel stack pointer instead of machine check oops message.

[26281.490060] Bad kernel stack pointer 3fff9ccce5b0 at c00000000000490c
[26281.490434] Oops: Bad kernel stack pointer, sig: 6 [#1]
[26281.490472] SMP NR_CPUS=2048 NUMA pSeries

This patch fixes this issue by checking MSR(RI=0) in KVM layer and forwarding
unrecoverable interrupt to guest which then panics with proper machine check
Oops message.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-16 11:53:47 +11:00
Michael Ellerman
ad987fc8ea powerpc/xmon: Add some more elements to the existing PACA dump list
This patch adds a set of new elements to the existing PACA dump list
inside an xmon session which can be listed below improving the overall
xmon debug support.

With this patch, a typical xmon PACA dump looks something like this.

paca for cpu 0x0 @ c00000000fdc0000:
 possible             = yes
 present              = yes
 online               = yes
 lock_token           = 0x8000            	(0xa)
 paca_index           = 0x0               	(0x8)
 kernel_toc           = 0xc000000001393200	(0x10)
 kernelbase           = 0xc000000000000000	(0x18)
 kernel_msr           = 0xb000000000001033	(0x20)
 emergency_sp         = 0xc00000003fff0000	(0x28)
 mc_emergency_sp      = 0xc00000003ffec000	(0x2e0)
 in_mce               = 0x0               	(0x2e8)
 hmi_event_available  = 0x0               	(0x2ea)
 data_offset          = 0x1fe7b0000       	(0x30)
 hw_cpu_id            = 0x0               	(0x38)
 cpu_start            = 0x1               	(0x3a)
 kexec_state          = 0x0               	(0x3b)
 slb_shadow[0]:       = 0xc000000008000000 0x40016e7779000510
 slb_shadow[1]:       = 0xd000000008000001 0x400142add1000510
 vmalloc_sllp         = 0x510             	(0x1b8)
 slb_cache_ptr        = 0x4               	(0x1ba)
 slb_cache[0]:        = 0x000000000003f000
 slb_cache[1]:        = 0x0000000000000001
 slb_cache[2]:        = 0x0000000000000003
 slb_cache[3]:        = 0x0000000000001000
 slb_cache[4]:        = 0x0000000000001000
 slb_cache[5]:        = 0x0000000000000000
 slb_cache[6]:        = 0x0000000000000000
 slb_cache[7]:        = 0x0000000000000000
 dscr_default         = 0x0               	(0x58)
 __current            = 0xc000000001331e80	(0x290)
 kstack               = 0xc000000001393e30	(0x298)
 stab_rr              = 0x11              	(0x2a0)
 saved_r1             = 0xc0000001fffef5e0	(0x2a8)
 trap_save            = 0x0               	(0x2b8)
 soft_enabled         = 0x0               	(0x2ba)
 irq_happened         = 0x1               	(0x2bb)
 io_sync              = 0x0               	(0x2bc)
 irq_work_pending     = 0x0               	(0x2bd)
 nap_state_lost       = 0x0               	(0x2be)
 sprg_vdso            = 0x0               	(0x2c0)
 tm_scratch           = 0x8000000100009033	(0x2c8)
 core_idle_state_ptr  = (null)            	(0x2d0)
 thread_idle_state    = 0x0               	(0x2d8)
 thread_mask          = 0x0               	(0x2d9)
 subcore_sibling_mask = 0x0               	(0x2da)
 user_time            = 0x0               	(0x2f0)
 system_time          = 0x0               	(0x2f8)
 user_time_scaled     = 0x0               	(0x300)
 starttime            = 0x3f462418b5cf4   	(0x308)
 starttime_user       = 0x3f4622a57092a   	(0x310)
 startspurr           = 0xd62a5718        	(0x318)
 utime_sspurr         = 0x0               	(0x320)
 stolen_time          = 0x0               	(0x328)

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Endian swap slb_shadow before display, minor formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:32:02 +11:00
Sam bobroff
0c23a88ccc powerpc/xmon: Paginate kernel log buffer display
The kernel log buffer is often much longer than the size of a terminal
so paginate it's output.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:32:02 +11:00
Sam bobroff
958b7c8050 powerpc/xmon: Paged output for paca display
The paca display is already more than 24 lines, which can be problematic
if you have an old school 80x24 terminal, or more likely you are on a
virtual terminal which does not scroll for whatever reason.

This patch adds a new command "#", which takes a single (hex) numeric
argument: lines per page. It will cause the output of "dp" and "dpa"
to be broken into pages, if necessary.

Sample output:

0:mon> # 10
0:mon> dp1
paca for cpu 0x1 @ c00000000fdc0480:
 possible         = yes
 present          = yes
 online           = yes
 lock_token       = 0x8000            	(0x8)
 paca_index       = 0x1               	(0xa)
 kernel_toc       = 0xc000000000eb2400	(0x10)
 kernelbase       = 0xc000000000000000	(0x18)
 kernel_msr       = 0xb000000000001032	(0x20)
 emergency_sp     = 0xc00000003ffe8000	(0x28)
 mc_emergency_sp  = 0xc00000003ffe4000	(0x2e0)
 in_mce           = 0x0               	(0x2e8)
 data_offset      = 0x7f170000        	(0x30)
 hw_cpu_id        = 0x8               	(0x38)
 cpu_start        = 0x1               	(0x3a)
 kexec_state      = 0x0               	(0x3b)
[Hit a key (a:all, q:truncate, any:next page)]
0:mon>
 __current        = 0xc00000007e696620	(0x290)
 kstack           = 0xc00000007e6ebe30	(0x298)
 stab_rr          = 0xb               	(0x2a0)
 saved_r1         = 0xc00000007ef37860	(0x2a8)
 trap_save        = 0x0               	(0x2b8)
 soft_enabled     = 0x0               	(0x2ba)
 irq_happened     = 0x1               	(0x2bb)
 io_sync          = 0x0               	(0x2bc)
 irq_work_pending = 0x0               	(0x2bd)
 nap_state_lost   = 0x0               	(0x2be)
0:mon>

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[mpe: Use bool, make some variables static]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:32:01 +11:00
Christophe Jaillet
b340587e68 powerpc/mpc5xxx: Use of_get_next_parent to simplify code
of_get_next_parent can be used to simplify the while() loop and
avoid the need of a temp variable.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:32:01 +11:00
Christophe Jaillet
1def37586f powerpc/numa: Use of_get_next_parent to simplify code
of_get_next_parent can be used to simplify the while() loop and
avoid the need of a temp variable.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:32:00 +11:00
Paul Gortmaker
5fab1d1cb1 powerpc: Delete old orphaned PrPMC 280/2800 DTS and boot file.
In commit 3c8464a9b1 ("powerpc:
Delete old PrPMC 280/2800 support") we got rid of most of the C
code, and the Makefile/Kconfig hooks, but it seems I left the
platform's DTS file orphaned in the tree as well as the boot code.
Here we get rid of them both.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:32:00 +11:00
Stephen Rothwell
4c8123181d powerpc: discard .exit.data at runtime
.exit.text is discarded at run time and there are some references from
that to .exit.data, so we need to discard .exit.data at run time as well.

Fixes these errors:

`.exit.data' referenced in section `.exit.text' of drivers/built-in.o: defined in discarded section `.exit.data' of drivers/built-in.o
`.exit.data' referenced in section `.exit.text' of drivers/built-in.o: defined in discarded section `.exit.data' of drivers/built-in.o

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:31:59 +11:00
Gavin Shan
54f9a64a36 powerpc/eeh: atomic_dec_if_positive() to update passthru count
No need to have two atomic opertions (update and fetch/check) when
decreasing PE's number of passed devices as one atomic operation
is enough.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:31:58 +11:00
Andrew Donnellan
6b8b252f40 powerpc/pci: export pcibios_free_controller()
Export pcibios_free_controller(), so it can be used by the cxl module to
free virtual PHBs.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:31:57 +11:00
Sam bobroff
a34236155a powerpc: Individual System V IPC system calls
This patch provides individual system call numbers for the following
System V IPC system calls, on PowerPC, so that they do not need to be
multiplexed:
* semop, semget, semctl, semtimedop
* msgsnd, msgrcv, msgget, msgctl
* shmat, shmdt, shmget, shmctl

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:31:57 +11:00
Michael Ellerman
f7688056cb powerpc/pseries: Drop always true CONFIG_PSERIES_MSI
Now that pseries selects PCI_MSI && PCI, EEH will always be true, and
therefore CONFIG_PSERIES_MSI will always be true. So drop it, and move
msi.o to obj-y.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:31:56 +11:00
Michael Ellerman
44f2aecfd0 powerpc/pseries: Move PCI objects to obj-y
Make it entirely clear in the Makefile that we always build the pci
related files by moving them to obj-y.

Note that CONFIG_EEH is now always enabled on pseries, because it
depends on PSERIES && PCI.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:31:56 +11:00
Michael Ellerman
84eb9e612b powerpc/pseries: Remove use of CONFIG_PCI
Now that we always have CONFIG_PCI=y for pseries, we can stop guarding
code with CONFIG_PCI ifdefs.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:31:55 +11:00
Michael Ellerman
4c9cd468b3 powerpc/pseries: Make PCI non-optional
The pseries build with PCI=n looks to have been broken for at least 5
years, and no one's noticed or cared.

Following the obvious breakages backward, the first commit I can find
that builds is the parent of 2eb4afb69f ("powerpc/pci: Move pseries
code into pseries platform specific area") from April 2009.

A distro would never ship a PCI=n kernel, so it is only useful for folks
building custom kernels. Also on KVM the virtio devices appear on PCI,
so it would only be useful if you were building kernels specifically to
run on PowerVM and with no PCI devices.

The added code complexity, and testing load (which we've clearly not
been doing), is not justified by the small reduction in kernel size for
such a niche use case.

So just make PCI non-optional on pseries.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:31:54 +11:00
Tudor Laurentiu
224f363246 KVM: PPC: e500: fix couple of shift operations on 64 bits
Fix couple of cases where we shift left a 32-bit
value thus might get truncated results on 64-bit
targets.

Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
Suggested-by: Scott Wood <scotttwood@freescale.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-15 15:59:19 +11:00
Tudor Laurentiu
2daab50e17 KVM: PPC: e500: Emulate TMCFG0 TMRN register
Emulate TMCFG0 TMRN register exposing one HW thread per vcpu.

Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[Laurentiu.Tudor@freescale.com: rebased on latest kernel, use
 define instead of hardcoded value, moved code in own function]
Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
Acked-by: Scott Wood <scotttwood@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-15 15:58:16 +11:00
Andrzej Hajda
d4cd4f9586 KVM: PPC: e500: fix handling local_sid_lookup result
The function can return negative value.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-15 15:58:16 +11:00
Tudor Laurentiu
6a14c22224 powerpc/e6500: add TMCFG0 register definition
The register is not currently used in the base kernel
but will be in a forthcoming kvm patch.

Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-10-15 15:58:16 +11:00
Marc Zyngier
5d4c9bc776 irqdomain: Use irq_domain_get_of_node() instead of direct field access
The struct irq_domain contains a "struct device_node *" field
(of_node) that is almost the only link between the irqdomain
and the device tree infrastructure.

In order to prepare for the removal of that field, convert all
users to use irq_domain_get_of_node() instead.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-and-tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Graeme Gregory <graeme@xora.org.uk>
Cc: Jake Oshins <jakeo@microsoft.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Link: http://lkml.kernel.org/r/1444737105-31573-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-13 19:01:23 +02:00
Aneesh Kumar K.V
891121e6c0 powerpc/mm: Differentiate between hugetlb and THP during page walk
We need to properly identify whether a hugepage is an explicit or
a transparent hugepage in follow_huge_addr(). We used to depend
on hugepage shift argument to do that. But in some case that can
result in wrong results. For ex:

On finding a transparent hugepage we set hugepage shift to PMD_SHIFT.
But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
We do prevent reusing the pfn page via the usage of
kick_all_cpus_sync(). But that happens after we updated the pte to 0.
Hence in follow_huge_addr() we can find hugepage shift set, but transparent
huge page check fail for a thp pte.

NOTE: We fixed a variant of this race against thp split in commit
691e95fd73
("powerpc/mm/thp: Make page table walk safe against thp split/collapse")

Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
follow_page_mask occasionally.

In the long term, we may want to switch ppc64 64k page size config to
enable CONFIG_ARCH_WANT_GENERAL_HUGETLB

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-12 15:30:09 +11:00
Aneesh Kumar K.V
ec2640b114 powerpc/mm: Disable hugepd for 64K page size.
After commit e2b3d202d1
("powerpc: Switch 16GB and 16MB explicit hugepages to a
different page table format"), we don't need to support
is_hugepd() for 64K page size.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-12 15:29:59 +11:00
Daniel Axtens
f2dd80ecca powerpc/powernv: Panic on unhandled Machine Check
All unrecovered machine check errors on PowerNV should cause an
immediate panic. There are 2 reasons that this is the right policy:
it's not safe to continue, and we're already trying to reboot.

Firstly, if we go through the recovery process and do not successfully
recover, we can't be sure about the state of the machine, and it is
not safe to recover and proceed.

Linux knows about the following sources of Machine Check Errors:
- Uncorrectable Errors (UE)
- Effective - Real Address Translation (ERAT)
- Segment Lookaside Buffer (SLB)
- Translation Lookaside Buffer (TLB)
- Unknown/Unrecognised

In the SLB, TLB and ERAT cases, we can further categorise these as
parity errors, multihit errors or unknown/unrecognised.

We can handle SLB errors by flushing and reloading the SLB. We can
handle TLB and ERAT multihit errors by flushing the TLB. (It appears
we may not handle TLB and ERAT parity errors: I will investigate
further and send a followup patch if appropriate.)

This leaves us with uncorrectable errors. Uncorrectable errors are
usually the result of ECC memory detecting an error that it cannot
correct, but they also crop up in the context of PCI cards failing
during DMA writes, and during CAPI error events.

There are several types of UE, and there are 3 places a UE can occur:
Skiboot, the kernel, and userspace. For Skiboot errors, we have the
facility to make some recoverable. For userspace, we can simply kill
(SIGBUS) the affected process. We have no meaningful way to deal with
UEs in kernel space or in unrecoverable sections of Skiboot.

Currently, these unrecovered UEs fall through to
machine_check_expection() in traps.c, which calls die(), which OOPSes
and sends SIGBUS to the process. This sometimes allows us to stumble
onwards. For example we've seen UEs kill the kernel eehd and
khugepaged. However, the process killed could have held a lock, or it
could have been a more important process, etc: we can no longer make
any assertions about the state of the machine. Similarly if we see a
UE in skiboot (and again we've seen this happen), we're not in a
position where we can make any assertions about the state of the
machine.

Likewise, for unknown or unrecognised errors, we're not able to say
anything about the state of the machine.

Therefore, if we have an unrecovered MCE, the most appropriate thing
to do is to panic.

The second reason is that since e784b6499d ("powerpc/powernv: Invoke
opal_cec_reboot2() on unrecoverable machine check errors."), we
attempt a special OPAL reboot on an unhandled MCE. This is so the
hardware can record error data for later debugging.

The comments in that commit assert that we are heading down the panic
path anyway. At the moment this is not always true. With UEs in kernel
space, for instance, they are marked as recoverable by the hardware,
so if the attempt to reboot failed (e.g. old Skiboot), we wouldn't
panic() but would simply die() and OOPS. It doesn't make sense to be
staggering on if we've just tried to reboot: we should panic().

Explicitly panic() on unrecovered MCEs on PowerNV.
Update the comments appropriately.

This fixes some hangs following EEH events on cxlflash setups.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-09 08:07:19 +11:00
Colin Ian King
c13e1c05b2 powerpc/pseries/hvcserver: don't memset pi_buff if it is null
pi_buff is being memset before it is sanity checked. Move the
memset after the null pi_buff sanity check to avoid an oops.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-09 08:03:03 +11:00
Aneesh Kumar K.V
f78f7ed726 powerpc: Fix _ALIGN_* errors due to type difference.
This avoid errors like

        unsigned int usize = 1 << 30;
        int size = 1 << 30;
        unsigned long addr = 64UL << 30 ;

        value = _ALIGN_DOWN(addr, usize); -> 0
        value = _ALIGN_DOWN(addr, size);  -> 0x1000000000

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-09 08:02:25 +11:00
Cyril Bur
fdf880a608 powerpc: Fix checkstop in native_hpte_clear() with lockdep
native_hpte_clear() is called in real mode from two places:
- Early in boot during htab initialisation if firmware assisted dump is
  active.
- Late in the kexec path.

In both contexts there is no need to disable interrupts are they are
already disabled. Furthermore, locking around the tlbie() is only required
for pre POWER5 hardware.

On POWER5 or newer hardware concurrent tlbie()s work as expected and on pre
POWER5 hardware concurrent tlbie()s could result in deadlock. This code
would only be executed at crashdump time, during which all bets are off,
concurrent tlbie()s are unlikely and taking locks is unsafe therefore the
best course of action is to simply do nothing. Concurrent tlbie()s are not
possible in the first case as secondary CPUs have not come up yet.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-09 08:01:38 +11:00
Chris Metcalf
7a5692e6e5 arch/powerpc: provide zero_bytemask() for big-endian
For some reason, only the little-endian flavor of
powerpc provided the zero_bytemask() implementation.

Reported-by: Michal Sojka <sojkam1@fel.cvut.cz>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2015-10-08 11:44:12 -04:00
Ingo Molnar
d3df65c198 Merge branch 'perf/urgent' into perf/core, before pulling new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-08 10:52:18 +02:00
Claudiu Manoil
70963d245e powerpc: dts: p1022si: Add fsl,wake-on-filer for eTSEC
Enable the "wake-on-filer" (aka. wake on user defined packet)
wake on lan capability for the eTSEC ethernet nodes.

Cc: Li Yang <leoli@freescale.com>
Cc: Zhao Chenhui <chenhui.zhao@freescale.com>

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07 04:19:43 -07:00
Chris Metcalf
19c22f3a29 word-at-a-time.h: fix some Kbuild files
arch/tile added word-at-a-time.h after the patch that added generic-y
entries; the generic-y entry is now stale.

arch/h8300 is newer than the generic-y patch for word-at-a-time.h,
and needs a generic-y entry.

arch/powerpc seems to have gotten a generic-y entry by mistake in
the first patch; this change removes it.

Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
2015-10-06 14:52:48 -04:00
Samuel Mendoza-Jonas
1b70386c99 powerpc/kexec: Wait 1s for secondaries to enter OPAL
Always include a timeout when waiting for secondary cpus to enter OPAL
in the kexec path, rather than only when crashing.

Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-06 21:57:42 +11:00
Christophe Leroy
255e8e046c powerpc/8xx: Shorten irq_chip name for the SIU
show_interrupts() expects the irq_chip name to be max 8 characters
otherwise everything get misaligned

# cat /proc/interrupts
           CPU0
 17:          0   CPM PIC   0 Level     error
 19:          0  MPC8XX SIU  15 Level     tbint
 20:         90   CPM PIC   4 Level     cpm_uart
 38:      29746  MPC8XX SIU   5 Level     fs_enet-mac
 39:          0  MPC8XX SIU   7 Level     fs_enet-mac
 47:        401   CPM PIC   5 Level     fsl_spi
 68:          1  MPC8XX SIU   2 Level     phy_interrupt, phy_interrupt, phy_interrupt
LOC:    7225485   Local timer interrupts for timer event device
LOC:          9   Local timer interrupts for others
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
MCE:          0   Machine check exceptions

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-06 21:54:42 +11:00
Denis Kirjanov
cb2d3883c6 powerpc/msi: Free the bitmap if it was slab allocated
During the MSI bitmap test on boot kmemleak spews the following trace:

unreferenced object 0xc00000016e86c900 (size 64):
    comm "swapper/0", pid 1, jiffies 4294893173 (age 518.024s)
    hex dump (first 32 bytes):
	00 00 01 ff 7f ff 7f 37 00 00 00 00 00 00 00 00
	.......7........
	ff ff ff ff ff ff ff ff 01 ff ff ff ff
	ff ff ff
	................
	backtrace:
	[<c00000000003eebc>] .zalloc_maybe_bootmem+0x3c/0x380
	[<c000000000042d6c>] .msi_bitmap_alloc+0x3c/0xb0
	[<c000000000a9aff8>] .msi_bitmap_selftest+0x30/0x2b4
	[<c0000000000090f4>] .do_one_initcall+0xd4/0x270
	[<c000000000a8e250>] .kernel_init_freeable+0x1a0/0x280
	[<c000000000009b5c>] .kernel_init+0x1c/0x120
	[<c000000000007fbc>] .ret_from_kernel_thread+0x58/0x9c

Add a flag to msi_bitmap for tracking allocations from slab and memblock
so we can properly free/handle memory in msi_bitmap_free().

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
[mpe: Reword changelog & use bitmap_from_slab in the if]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-05 21:32:50 +11:00
Andy Shevchenko
06bacefcbd powerpc/pseries: re-use code from of_helpers module
The derive_parent() has similar semantics to what we have in newly introduced
of_helpers module. The replacement reduces code base and propagates the actual
error code to the caller.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-05 21:11:28 +11:00
Andy Shevchenko
a46d988409 powerpc/pseries: handle nodes without '/'
In case we have node without '/' strrchr() returns NULL which might lead to
crash. Replace strrchr() by kbasename() and modify condition to avoid such
behaviour.

Suggested-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-05 21:11:27 +11:00
Andy Shevchenko
a030e1e4bb powerpc/pseries: replace kmalloc + strlcpy
The helper kstrndup() will do the same in one line.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-05 21:11:26 +11:00
Andy Shevchenko
dc85aaed64 powerpc/pseries: fix a potential memory leak
In case we have a full node name like /foo/bar and /foo is not found the
parent_path left unfreed. So, free a memory before return to a caller.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-05 21:11:25 +11:00
Andy Shevchenko
948ad1acaf powerpc/pseries: extract of_helpers module
Extract a new module to share the code between other modules.

There is no functional change.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-05 21:11:24 +11:00
Linus Torvalds
30c44659f4 Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull strscpy string copy function implementation from Chris Metcalf.

Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.

The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.

strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result.  To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.

strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string.  Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated.  It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.

strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG.  It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.

So why did I waffle about this for so long?

Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.

And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.

So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches.  Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.

* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: use global strscpy() rather than private copy
  string: provide strscpy()
  Make asm/word-at-a-time.h available on all architectures
2015-10-04 16:31:13 +01:00
Daniel Borkmann
a91263d520 ebpf: migrate bpf_prog's flags to bitfield
As we need to add further flags to the bpf_prog structure, lets migrate
both bools to a bitfield representation. The size of the base structure
(excluding insns) remains unchanged at 40 bytes.

Add also tags for the kmemchecker, so that it doesn't throw false
positives. Even in case gcc would generate suboptimal code, it's not
being accessed in performance critical paths.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03 05:02:39 -07:00
Christophe Jaillet
b6080db4f4 powerpc/nvram: Fix function name in some errors messages.
'nvram_create_os_partition' should be 'nvram_create_partition'.
Use __func__ to have it right, as done elsewhere in this file.

Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-02 22:55:05 +10:00
Christophe Jaillet
7d52318717 powerpc/nvram: Add missing kfree in error path
If 'nvram_write_header' fails, then 'new_part' should be freed, otherwise,
there is a memory leak.

Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-02 22:54:55 +10:00
Michael Ellerman
2adc48a691 powerpc: Add ppc64le_defconfig
Based directly on ppc64_defconfig using merge_config.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-01 16:52:03 +10:00
Aneesh Kumar K.V
65d3223a85 powerpc/mm: Add virt_to_pfn and use this instead of opencoding
This add helper virt_to_pfn and remove the opencoded usage of the
same.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-01 16:52:02 +10:00
Michael Neuling
c974809a26 powerpc/vdso: Avoid link stack corruption in __get_datapage()
powerpc has a link register (lr) used for calling functions. We "bl
<func>" to call a function, and "blr" to return back to the call site.

The lr is only a single register, so if we call another function from
inside this function (ie. nested calls), software must save away the
lr on the software stack before calling the new function. Before
returning (ie. before the "blr"), the lr is restored by software from
the software stack.

This makes branch prediction quite difficult for the processor as it
will only know the branch target just before the "blr".

To help with this, modern powerpc processors keep a (non-architected)
hardware stack of lr called a "link stack". When a "bl <func>" is
run, the lr is pushed onto this stack. When a "blr" is called, the
branch predictor pops the lr value from the top of the link stack, and
uses it to predict the branch target. Hence the processor pipeline
knows a lot earlier the branch target.

This works great but there are some cases where you call "bl" but
without a matching "blr". Once such case is when trying to determine
the program counter (which can't be read directly). Here you "bl+4;
mflr" to get the program counter. If you do this, the link stack will
get out of sync with reality, causing the branch predictor to
mis-predict subsequent function returns.

To avoid this, modern micro-architectures have a special case of bl.
Using the form "bcl 20,31,+4", ensures the processor doesn't push to
the link stack.

The 32 and 64 bit variants of __get_datapage() use a "bl; mflr" to
determine the loaded address of the VDSO. The current versions of
these attempt to use this special bl variant.

Unfortunately they use +8 rather than the required +4. Hence the
current code results in the link stack getting out of sync with
reality and hence the resulting performance degradation.

This patch moves it to bcl+4 by moving __kernel_datapage_offset out of
__get_datapage().

With this patch, running a gettimeofday() (which uses
__get_datapage()) microbenchmark we get a decent bump in performance
on POWER7/8.

For the benchmark in tools/testing/selftests/powerpc/benchmarks/gettimeofday.c
  POWER8:
    64bit gets ~4% improvement
    32bit gets ~9% improvement
  POWER7:
    64bit gets ~7% improvement

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: Aaron Sawdey <sawdey@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-01 16:52:02 +10:00
Michael Ellerman
26cd835ef8 powerpc/slb: Use a local to avoid multiple calls to get_slb_shadow()
For no reason other than it looks ugly.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-01 16:52:01 +10:00
Anshuman Khandual
1d15010c34 powerpc/slb: Define an enum for the bolted indexes
This patch defines macros for the three bolted SLB indexes we use.
Switch the functions that take the indexes as an argument to use the
enum.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-01 16:52:01 +10:00
Michael Ellerman
787b393c9f powerpc/vdso: Emit GNU & SysV hashes
Andy Lutomirski says:

  Some dynamic loaders may be slightly faster if a GNU hash is
  available.

  This is unlikely to have any measurable effect on the time it takes
  to resolve vdso symbols (since there are so few of them).  In some
  contexts, it can be a win for a different reason: if every DSO has a
  GNU hash section, then libc can avoid calculating SysV hashes at
  all. Both musl and glibc appear to have this optimization.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-01 16:52:00 +10:00
Michael Ellerman
4fa9a3f6b6 powerpc/ps3: Remove unused os_area_db_id_video_mode
This struct is unused, which is now a build error with gcc 6:

  error: 'os_area_db_id_video_mode' defined but not used

There doesn't seem to be any good reason to keep it around so remove it,
it's in the history if anyone needs it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-30 10:33:02 +10:00
Geoff Levand
336382c78b powerpc/ps3: Refresh ps3_defconfig
Refresh and remove obsolete CONFIG_EXT3_FS.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-29 23:01:04 +10:00
Boqun Feng
e5e16d8f3e powerpc: Kconfig: remove BE-only platforms from LE kernel build
Currently, little endian is only supported on powernv and pseries,
however, Kconfigs still allow us to include other platforms in a LE
kernel, this may result in space wasting or even build error if some
BE-only platforms always assume they are built for a BE kernel. So just
modify the Kconfigs of BE-only platforms to remove them from being built
for a LE kernel.

For 32bit only platforms, nothing needs to be done, because
CPU_LITTLE_ENDIAN depends on PPC64. For 64bit supported platforms, add
CPU_BIG_ENDIAN to dependencies explicitly, so that these platforms will
be disabled for LE [Suggested-by: Cédric Le Goater <clg@fr.ibm.com>].

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-29 22:57:00 +10:00
Michael Ellerman
6b98f2bec5 powerpc/configs: Re-enable CONFIG_SCSI_DH
Commit 086b91d052 ("scsi_dh: integrate into the core SCSI code")
changed CONFIG_SCSI_DH from tristate to bool.

Our defconfigs have CONFIG_SCSI_DH=m, which the kconfig machinery warns
us is invalid, but instead of converting it to =y it leaves it unset.
This means we loose the CONFIG_SCSI_DH code and everything that depends
on it.

So convert the values in the defconfigs to =y.

Fixes: 086b91d052 ("scsi_dh: integrate into the core SCSI code")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-29 20:34:03 +10:00
Ingo Molnar
6afc0c269c Merge branch 'linus' into perf/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-28 08:06:57 +02:00
Linus Torvalds
966966a630 PCI updates for v4.3:
Resource management
     - Revert pci_read_bridge_bases() unification (Bjorn Helgaas)
     - Clear IORESOURCE_UNSET when clipping a bridge window (Bjorn Helgaas)
 
   MSI
     - Fix MSI IRQ domains for VFs on virtual buses (Alex Williamson)
 
   Renesas R-Car host bridge driver
     - Add R8A7794 support (Sergei Shtylyov)
 
   Miscellaneous
     - Fix devfn for VPD access through function 0 (Alex Williamson)
     - Use function 0 VPD only for identical functions (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWBUq2AAoJEFmIoMA60/r8wbwP/0D/+fKEPYJlB6hx1wLHpVk3
 K//vEwH0RgA3v2X53QUoHg94gTYhSZLKX0zdAFshbphE0HCZ6AO3UO+/ZJ3cui6J
 PYvKOnhby2ErNotZqrs3DQIM8rGgl0ZVgoFQrAWEvwiHRHI/r2ArK/oR4PiBjxJT
 StYuJoTkZIlJyHXza6tvHDcWi+Jc8t8r0HC4Vs32BlaVBQM0SH3CMxHfhJw/Q9xP
 WHFif1sH0N+p7WDyHH71C1T8POOgXY73BsD2AC0se3lRYZ9SVkOVy9ECGUucx8F6
 LDAuFelwRvW2Dr9kh38+5f8Xp155E+eZ6zRWW9/JlrUKVEtHhOFhtrRfDNKHuDCt
 B9ETrxDiSUFAdQ2weye9BK6aXK0CHF6YP3PCbvK77qFUUsN8csFSKktanKrFAbML
 CdjkVkEoeLHw+aXzyDg0pSBRZMQ24dTQDh7YqOFZGuEjCLPXOEQ8nitf0IzBB0KI
 4QetT/QK3bKkgtVKTwPP+s9f4g+fA/oiwJ21ZTV9hi/9upywTa/umCUvH9Fmb8Fp
 VZeTzugSht0+ioXpaF/6/KO0Ccp/t5uAHYeuBBMqiHX7ks8DdnfPCwbWNRKkg25O
 Qy7Y8VnnOtesRCAqBq5y/hHlLUluMkjYpEblYFiD6HBWcjUh6xE6LlIO4mwnjqWI
 zjB3w7+0GOrvS7dBSx0N
 =f4c3
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "These are fixes for things we merged for v4.3 (VPD, MSI, and bridge
  window management), and a new Renesas R8A7794 SoC device ID.

  Details:

  Resource management:
   - Revert pci_read_bridge_bases() unification (Bjorn Helgaas)
   - Clear IORESOURCE_UNSET when clipping a bridge window (Bjorn
     Helgaas)

  MSI:
   - Fix MSI IRQ domains for VFs on virtual buses (Alex Williamson)

  Renesas R-Car host bridge driver:
   - Add R8A7794 support (Sergei Shtylyov)

  Miscellaneous:
   - Fix devfn for VPD access through function 0 (Alex Williamson)
   - Use function 0 VPD only for identical functions (Alex Williamson)"

* tag 'pci-v4.3-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: rcar: Add R8A7794 support
  PCI: Use function 0 VPD for identical functions, regular VPD for others
  PCI: Fix devfn for VPD access through function 0
  PCI/MSI: Fix MSI IRQ domains for VFs on virtual buses
  PCI: Clear IORESOURCE_UNSET when clipping a bridge window
  PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code"
2015-09-25 11:16:53 -07:00
Linus Torvalds
b6d980f493 AMD fixes for bugs introduced in the 4.2 merge window,
and a few PPC bug fixes too.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWBSn7AAoJEL/70l94x66Dxd4H/RT6kWWj9x4grEYUkcJUDyK2
 AXm7XcKQm04auwAic8Otr+ts/Qix/50kWmBe/TU0QLgqb8rj5Dj3yGFK6Z1y6mAz
 KvaxqMJd4tZGTqN0DDvC2ItEdzjfAdeJZo/FHXqPHVspG0G14T7STLna02LTBBEJ
 tNzY9qor8nFhg2fT2szqKaudUNgTqkCTpo57o2BrHE96SHG+m0WdpQCV1F5hPVpg
 Te0Pb7qX9xng5n3sQ7IV/t3QYbrza1ACwNQS9XJa0Yu6iEz7JdmVmzHQASK9ynn6
 hUHhsNYGx4IsPjPtfJk2GroNaRDZL+VMzw07tfcOvPx8xkS9hS63pwzmSBqfLrM=
 =Ywqn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC
  bug fixes too"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: disable halt_poll_ns as default for s390x
  KVM: x86: fix off-by-one in reserved bits check
  KVM: x86: use correct page table format to check nested page table reserved bits
  KVM: svm: do not call kvm_set_cr0 from init_vmcb
  KVM: x86: trap AMD MSRs for the TSeg base and mask
  KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()
  KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit
  KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs
  kvm: svm: reset mmu on VCPU reset
2015-09-25 10:51:40 -07:00
David Hildenbrand
920552b213 KVM: disable halt_poll_ns as default for s390x
We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns as default only for supported architectures.

Architectures are now free to set their own halt_poll_ns
default value.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25 10:31:30 +02:00
Michael Ellerman
793b8bf9ca powerpc: Wire up sys_membarrier()
The selftest passes on 64-bit LE & BE, and 32-bit.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-21 17:27:08 +10:00
Thomas Huth
3eb4ee6825 KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()
Access to the kvm->buses (like with the kvm_io_bus_read() and -write()
functions) has to be protected via the kvm->srcu lock.
The kvmppc_h_logical_ci_load() and -store() functions are missing
this lock so far, so let's add it there, too.
This fixes the problem that the kernel reports "suspicious RCU usage"
when lock debugging is enabled.

Cc: stable@vger.kernel.org # v4.1+
Fixes: 99342cf804
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-09-21 09:05:15 +10:00
Gautham R. Shenoy
7e022e717f KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit
In guest_exit_cont we call kvmhv_commence_exit which expects the trap
number as the argument. However r3 doesn't contain the trap number at
this point and as a result we would be calling the function with a
spurious trap number.

Fix this by copying r12 into r3 before calling kvmhv_commence_exit as
r12 contains the trap number.

Cc: stable@vger.kernel.org # v4.1+
Fixes: eddb60fb14
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-09-21 09:05:12 +10:00
Paul Mackerras
5fc3e64f94 KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs
This fixes a bug which results in stale vcore pointers being left in
the per-cpu preempted vcore lists when a VM is destroyed.  The result
of the stale vcore pointers is usually either a crash or a lockup
inside collect_piggybacks() when another VM is run.  A typical
lockup message looks like:

[  472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039]
[  472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv]
[  472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49
[  472.162187] task: c000001e38512750 ti: c000001e41bfc000 task.ti: c000001e41bfc000
[  472.162262] NIP: c00000000096b094 LR: c00000000096b08c CTR: c000000000111130
[  472.162337] REGS: c000001e41bff520 TRAP: 0901   Not tainted  (4.2.0-kvm+)
[  472.162399] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24848844  XER: 00000000
[  472.162588] CFAR: c00000000096b0ac SOFTE: 1
GPR00: c000000000111170 c000001e41bff7a0 c00000000127df00 0000000000000001
GPR04: 0000000000000003 0000000000000001 0000000000000000 0000000000874821
GPR08: c000001e41bff8e0 0000000000000001 0000000000000000 d00000000efde740
GPR12: c000000000111130 c00000000fdae400
[  472.163053] NIP [c00000000096b094] _raw_spin_lock_irqsave+0xa4/0x130
[  472.163117] LR [c00000000096b08c] _raw_spin_lock_irqsave+0x9c/0x130
[  472.163179] Call Trace:
[  472.163206] [c000001e41bff7a0] [c000001e41bff7f0] 0xc000001e41bff7f0 (unreliable)
[  472.163295] [c000001e41bff7e0] [c000000000111170] __wake_up+0x40/0x90
[  472.163375] [c000001e41bff830] [d00000000efd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv]
[  472.163465] [c000001e41bffa30] [d00000000efd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv]
[  472.163559] [c000001e41bffb70] [d00000000e9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm]
[  472.163653] [c000001e41bffba0] [d00000000e92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[  472.163745] [c000001e41bffbe0] [d00000000e9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm]
[  472.163834] [c000001e41bffd40] [c0000000002d0f50] do_vfs_ioctl+0x480/0x7c0
[  472.163910] [c000001e41bffde0] [c0000000002d1364] SyS_ioctl+0xd4/0xf0
[  472.163986] [c000001e41bffe30] [c000000000009260] system_call+0x38/0xd0
[  472.164060] Instruction dump:
[  472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 60420000 8bad02e2
[  472.164224] 7fc3f378 4b6a57c1 60000000 7c210b78 <e92d0000> 89290009 792affe3 40820070

The bug is that kvmppc_run_vcpu does not correctly handle the case
where a vcpu task receives a signal while its guest vcpu is executing
in the guest as a result of being piggy-backed onto the execution of
another vcore.  In that case we need to wait for the vcpu to finish
executing inside the guest, and then remove this vcore from the
preempted vcores list.  That way, we avoid leaving this vcpu's vcore
on the preempted vcores list when the vcpu gets interrupted.

Fixes: ec25716508
Reported-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-09-21 09:00:47 +10:00
Linus Torvalds
3ae839454e Mostly stable material, a lot of ARM fixes.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJV+/ucAAoJEL/70l94x66DV8YH/1KDym/1GJ+/Br/YkHZnM53l
 3Q0PwSLu9cNcIL9lUuDLwGTaVj+y8ud1Hjr/uzvKwivktmUYVZhkdtnZmnanvGOM
 qKB9K3nFXCPx8uqy8Dn7fOwEKcg9FmDOTTkWy13HDnXO+V4crSVVt+rPw+6FUMld
 NV5tYdw9Lu7y3XrveDebPWaPtyDL7OAagzmeK47eMffxG7X9Hf1H2aT7HueRi7x/
 SkLIe3gmiOWmHVJDPE9TOmFYIj19gywDFysKes1gdVJLVUIXiELMT7SrvAYnToVB
 zISIEj7Zx4SINPxpf2dUn8REm7NsmJY+PffLIl/Nv+ozGggFQGFH0SMZ08p0bxw=
 =tfmn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Mostly stable material, a lot of ARM fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  sched: access local runqueue directly in single_task_running
  arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'
  arm64: KVM: Remove all traces of the ThumbEE registers
  arm: KVM: Disable virtual timer even if the guest is not using it
  arm64: KVM: Disable virtual timer even if the guest is not using it
  arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources
  KVM: s390: Replace incorrect atomic_or with atomic_andnot
  arm: KVM: Fix incorrect device to IPA mapping
  arm64: KVM: Fix user access for debug registers
  KVM: vmx: fix VPID is 0000H in non-root operation
  KVM: add halt_attempted_poll to VCPU stats
  kvm: fix zero length mmio searching
  kvm: fix double free for fast mmio eventfd
  kvm: factor out core eventfd assign/deassign logic
  kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd
  KVM: make the declaration of functions within 80 characters
  KVM: arm64: add workaround for Cortex-A57 erratum #852523
  KVM: fix polling for guest halt continued even if disable it
  arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores
  arm64: KVM: set {v,}TCR_EL2 RES1 bits
  ...
2015-09-18 09:23:08 -07:00
Linus Torvalds
fadb97b089 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "This is a rather large update post rc1 due to the final steps of
  cleanups and API changes which had to wait for the preparatory patches
  to hit your tree.

   - Regression fixes for ARM GIC irqchips

   - Regression fixes and lockdep anotations for renesas irq chips

   - The leftovers of the cleanup and preparatory patches which have
     been ignored by maintainers

   - Final conversions of the newly merged users of obsolete APIs

   - Final removal of obsolete APIs

   - Final removal of ARM artifacts which had been introduced during the
     conversion of ARM to the generic interrupt code.

   - Final split of the irq_data into chip specific and common data to
     reflect the needs of hierarchical irq domains.

   - Treewide removal of the first argument of interrupt flow handlers,
     i.e. the irq number, which is not used by the majority of handlers
     and simple to retrieve from the other argument the irq descriptor.

   - A few comment updates and build warning fixes"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  arm64: Remove ununsed set_irq_flags
  ARM: Remove ununsed set_irq_flags
  sh: Kill off set_irq_flags usage
  irqchip: Kill off set_irq_flags usage
  gpu/drm: Kill off set_irq_flags usage
  genirq: Remove irq argument from irq flow handlers
  genirq: Move field 'msi_desc' from irq_data into irq_common_data
  genirq: Move field 'affinity' from irq_data into irq_common_data
  genirq: Move field 'handler_data' from irq_data into irq_common_data
  genirq: Move field 'node' from irq_data into irq_common_data
  irqchip/gic-v3: Use IRQD_FORWARDED_TO_VCPU flag
  irqchip/gic: Use IRQD_FORWARDED_TO_VCPU flag
  genirq: Provide IRQD_FORWARDED_TO_VCPU status flag
  genirq: Simplify irq_data_to_desc()
  genirq: Remove __irq_set_handler_locked()
  pinctrl/pistachio: Use irq_set_handler_locked
  gpio: vf610: Use irq_set_handler_locked
  powerpc/mpc8xx: Use irq_set_handler_locked()
  powerpc/ipic: Use irq_set_handler_locked()
  powerpc/cpm2: Use irq_set_handler_locked()
  ...
2015-09-18 08:11:42 -07:00
Linus Torvalds
f240bdd2a5 powerpc fixes for 4.3
- Fix 32-bit TCE table init in kdump kernel from Nish
  - Fix kdump with non-power-of-2 crashkernel= from Nish
  - Abort cxl_pci_enable_device_hook() if PCI channel is offline from Andrew
  - Fix to release DRC when configure_connector() fails from Bharata
  - Wire up sys_userfaultfd()
  - Fix race condition in tearing down MSI interrupts from Paul
  - Fix unbalanced pci_dev_get() in cxl_probe() from Daniel
  - Fix cxl build failure due to -Wunused-variable gcc behaviour change from Ian
  - Tell the toolchain to use ABI v2 when building an LE boot wrapper from Benh
  - Fix THP to recompute hash value after a failed update from Aneesh
  - 32-bit memcpy/memset: only use dcbz once cache is enabled from Christophe
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV+h5EAAoJEFHr6jzI4aWADboP/3jc6vUtOmFZ1GBGwXrftOc0
 PGtkTtYnGbaNsp+ZBl39Y+CFsSfhFVaDgylm/8G5NMKZaSBVdcFJwfU1w6Ymn49R
 nZkAT5PC9KgT5RTuRTZ3DO/Y2RC9vg2T9pXjEn8NGYcV8GgUkc3dZAn48S3AFgnV
 4jQI5sbxvwU12XkCUn+DkETh13g3gLYtRxwehBu/S/ovED5iNHKJwnXRzxyAA969
 dARNriSeyLBVMLamJ+rJB1S5hVTZTMbughFVVFbgriyIGuC/C1g9b9GN86dCGS6w
 T6VrKveK/iVCLUB16KV8+inbfvUrXItOxhGJWPHw9uAJGLZTz5G+yLHRRPX8onyC
 pgDesJDDpP/P7sAnKto3tF1Vzi7lwVtVPC1dT1Fc9VAWJGPYC/d16EKGIpNqqlnc
 mAIJ7wcI5c/HxvqXR2rdRV6fMer+aY7utwMsh4o/gDs7ArQUcuCrOKSW0jvHGmyr
 MumARXnUGDwPnGD8IfYI2vDOvwisv9g6XACwsM+pi499SfowaiLuK3utsMccagGZ
 INFtaqS7gpcj8TTu3kymw1TbKW/tqG9T81RRZ0rFH1q3aSfvwQ9QvsXctSeqOl/n
 lkxR3Mk0CT0oupXNKV6pjqsRwLroA0AF5tGxuw4ap1Rt4i8G7WHaPgRp7SPZxtkR
 qVBryfK5bKqYtWp4ztVK
 =Dgn5
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix 32-bit TCE table init in kdump kernel from Nish

 - Fix kdump with non-power-of-2 crashkernel= from Nish

 - Abort cxl_pci_enable_device_hook() if PCI channel is offline from
   Andrew

 - Fix to release DRC when configure_connector() fails from Bharata

 - Wire up sys_userfaultfd()

 - Fix race condition in tearing down MSI interrupts from Paul

 - Fix unbalanced pci_dev_get() in cxl_probe() from Daniel

 - Fix cxl build failure due to -Wunused-variable gcc behaviour change
   from Ian

 - Tell the toolchain to use ABI v2 when building an LE boot wrapper
   from Benh

 - Fix THP to recompute hash value after a failed update from Aneesh

 - 32-bit memcpy/memset: only use dcbz once cache is enabled from
   Christophe

* tag 'powerpc-4.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc32: memset: only use dcbz once cache is enabled
  powerpc32: memcpy: only use dcbz once cache is enabled
  powerpc/mm: Recompute hash value after a failed update
  powerpc/boot: Specify ABI v2 when building an LE boot wrapper
  cxl: Fix build failure due to -Wunused-variable behaviour change
  cxl: Fix unbalanced pci_dev_get in cxl_probe
  powerpc/MSI: Fix race condition in tearing down MSI interrupts
  powerpc: Wire up sys_userfaultfd()
  powerpc/pseries: Release DRC when configure_connector fails
  cxl: abort cxl_pci_enable_device_hook() if PCI channel is offline
  powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel=
  powerpc/powernv/pci-ioda: fix 32-bit TCE table init in kdump kernel
2015-09-18 08:01:06 -07:00
Marc Zyngier
705a7b474e powerpc/PCI: Fix lookup of linux,pci-probe-only property
When find_and_init_phbs() looks for the probe-only property, it seems to
trust the firmware to be correctly written, and assumes that there is a
parameter to the property.

It is conceivable that the firmware could not be that perfect, and it could
expose this property naked (at least one arm64 platform seems to exhibit
this exact behaviour).  The setup code the ends up making a decision based
on whatever the property pointer points to, which is likely to be junk.

Instead, switch to the common of_pci.c implementation that doesn't suffer
from this problem and ignore the property if the firmware couldn't make up
its mind.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-17 12:21:02 -05:00
LEROY Christophe
400c47d81c powerpc32: memset: only use dcbz once cache is enabled
memset() uses instruction dcbz to speed up clearing by not wasting time
loading cache line with data that will be overwritten.
Some platform like mpc52xx do no have cache active at startup and
can therefore not use memset(). Allthough no part of the code
explicitly uses memset(), GCC may make calls to it.

This patch modifies memset() such that at startup, memset()
unconditionally skip the optimised bloc that uses dcbz instruction.

Once the initial MMU is set up, in machine_init() we patch memset()
by replacing this inconditional jump by a NOP

Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-17 10:36:53 +10:00
LEROY Christophe
1cd03890ea powerpc32: memcpy: only use dcbz once cache is enabled
memcpy() uses instruction dcbz to speed up copy by not wasting time
loading cache line with data that will be overwritten.
Some platform like mpc52xx do no have cache active at startup and
can therefore not use memcpy(). Allthough no part of the code
explicitly uses memcpy(), GCC makes calls to it.

This patch modifies memcpy() such that at startup, memcpy()
unconditionally jumps to generic_memcpy() which doesn't use
the dcbz instruction.

Once the initial MMU is set up, in machine_init() we patch memcpy()
by replacing this inconditional jump by a NOP

Reported-by: Michal Sojka <sojkam1@fel.cvut.cz>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-17 10:36:44 +10:00
Thomas Gleixner
bd0b9ac405 genirq: Remove irq argument from irq flow handlers
Most interrupt flow handlers do not use the irq argument. Those few
which use it can retrieve the irq number from the irq descriptor.

Remove the argument.

Search and replace was done with coccinelle and some extra helper
scripts around it. Thanks to Julia for her help!

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
2015-09-16 15:47:51 +02:00
Thomas Gleixner
9ca86b204b powerpc/mpc8xx: Use irq_set_handler_locked()
Use irq_set_handler_locked() as it avoids a redundant lookup of the
irq descriptor.

Search and replacement was done with coccinelle:

@@
struct irq_data *d;
expression E1;
@@

-__irq_set_handler_locked(d->irq, E1);
+irq_set_handler_locked(d, E1);

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
2015-09-16 15:43:11 +02:00
Thomas Gleixner
9758a7b0e5 powerpc/ipic: Use irq_set_handler_locked()
Use irq_set_handler_locked() as it avoids a redundant lookup of the
irq descriptor.

Search and replacement was done with coccinelle:

@@
struct irq_data *d;
expression E1;
@@

-__irq_set_handler_locked(d->irq, E1);
+irq_set_handler_locked(d, E1);

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
2015-09-16 15:43:11 +02:00
Thomas Gleixner
e9e879a3d6 powerpc/cpm2: Use irq_set_handler_locked()
Use irq_set_handler_locked() as it avoids a redundant lookup of the
irq descriptor.

Search and replacement was done with coccinelle:

@@
struct irq_data *d;
expression E1;
@@

-__irq_set_handler_locked(d->irq, E1);
+irq_set_handler_locked(d, E1);

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
2015-09-16 15:43:10 +02:00
Thomas Gleixner
6b83bd9414 powerpc/mpc52xx: Use irq_set_handler_locked()
Use irq_set_handler_locked() as it avoids a redundant lookup of the
irq descriptor.

Search and replacement was done with coccinelle:

@@
struct irq_data *d;
expression E1;
@@

-__irq_set_handler_locked(d->irq, E1);
+irq_set_handler_locked(d, E1);

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
2015-09-16 15:43:10 +02:00
Aneesh Kumar K.V
36b35d5d80 powerpc/mm: Recompute hash value after a failed update
If we had secondary hash flag set, we ended up modifying hash value in
the updatepp code path. Hence with a failed updatepp we will be using
a wrong hash value for the following hash insert. Fix this by
recomputing hash before insert.

Without this patch we can end up with using wrong slot number in linux
pte. That can result in us missing an hash pte update or invalidate
which can cause memory corruption or even machine check.

Fixes: 6d492ecc64 ("powerpc/THP: Add code to handle HPTE faults for hugepages")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-16 22:06:03 +10:00
Benjamin Herrenschmidt
655471f54c powerpc/boot: Specify ABI v2 when building an LE boot wrapper
The kernel does it, not the boot wrapper, which breaks with some
cross compilers that still default to ABI v1.

Fixes: 147c05168f ("powerpc/boot: Add support for 64bit little endian wrapper")
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-16 22:05:55 +10:00
Paolo Bonzini
62bea5bff4 KVM: add halt_attempted_poll to VCPU stats
This new statistic can help diagnosing VCPUs that, for any reason,
trigger bad behavior of halt_poll_ns autotuning.

For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
10+20+40+80+160+320+480 = 1110 microseconds out of every
479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
is consuming about 30% more CPU than it would use without
polling.  This would show as an abnormally high number of
attempted polling compared to the successful polls.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-16 12:17:00 +02:00
Bjorn Helgaas
237865f195 PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code"
Revert dff22d2054 ("PCI: Call pci_read_bridge_bases() from core instead
of arch code").

Reading PCI bridge windows is not arch-specific in itself, but there is PCI
core code that doesn't work correctly if we read them too early.  For
example, Hannes found this case on an ARM Freescale i.mx6 board:

  pci_bus 0000:00: root bus resource [mem 0x01000000-0x01efffff]
  pci 0000:00:00.0: PCI bridge to [bus 01-ff]
  pci 0000:00:00.0: BAR 8: no space for [mem size 0x01000000] (mem window)
  pci 0000:01:00.0: BAR 2: failed to assign [mem size 0x00200000]
  pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x00004000]
  pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x00000100]

The 00:00.0 mem window needs to be at least 3MB: the 01:00.0 device needs
0x204100 of space, and mem windows are megabyte-aligned.

Bus sizing can increase a bridge window size, but never *decrease* it (see
d65245c329 ("PCI: don't shrink bridge resources")).  Prior to
dff22d2054, ARM didn't read bridge windows at all, so the "original size"
was zero, and we assigned a 3MB window.

After dff22d2054, we read the bridge windows before sizing the bus.  The
firmware programmed a 16MB window (size 0x01000000) in 00:00.0, and since
we never decrease the size, we kept 16MB even though we only needed 3MB.
But 16MB doesn't fit in the host bridge aperture, so we failed to assign
space for the window and the downstream devices.

I think this is a defect in the PCI core: we shouldn't rely on the firmware
to assign sensible windows.

Ray reported a similar problem, also on ARM, with Broadcom iProc.

Issues like this are too hard to fix right now, so revert dff22d2054.

Reported-by: Hannes <oe5hpm@gmail.com>
Reported-by: Ray Jui <rjui@broadcom.com>
Link: http://lkml.kernel.org/r/CAAa04yFQEUJm7Jj1qMT57-LG7ZGtnhNDBe=PpSRa70Mj+XhW-A@mail.gmail.com
Link: http://lkml.kernel.org/r/55F75BB8.4070405@broadcom.com
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
2015-09-15 13:18:04 -05:00
Jiang Liu
da92b4eb7e powerpc, irq: Use access helper irq_data_get_affinity_mask()
Use access helper irq_data_get_affinity_mask() so we can move the
affinity mask to irq_common_data.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1433145945-789-25-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-15 17:06:28 +02:00
Thomas Gleixner
391de7f9ef powerpc/cell: Prepare irq handler for irq argument removal
The irq argument of most interrupt flow handlers is unused or merily
used instead of a local variable. The handlers which need the irq
argument can retrieve the irq number from the irq descriptor.
    
Search and update was done with coccinelle and the invaluable help of
Julia Lawall.
    
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
2015-09-14 10:30:05 +02:00
Thomas Gleixner
0a0dbd9258 powerpc/85xx: Prepare irq handlers for irq argument removal
The irq argument of most interrupt flow handlers is unused or merily
used instead of a local variable. The handlers which need the irq
argument can retrieve the irq number from the irq descriptor.
    
Search and update was done with coccinelle and the invaluable help of
Julia Lawall.
    
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Scott Wood <scottwood@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org
2015-09-14 10:30:05 +02:00
Thomas Gleixner
5aac2d3368 powerpc/mpc5121_ads_cpld: Prepare irq handler for irq argument removal
The irq argument of most interrupt flow handlers is unused or merily
used instead of a local variable. The handlers which need the irq
argument can retrieve the irq number from the irq descriptor.
    
Search and update was done with coccinelle and the invaluable help of
Julia Lawall.
    
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: linuxppc-dev@lists.ozlabs.org
2015-09-14 10:30:04 +02:00
Sukadev Bhattiprolu
8f3e5684d3 perf/core: Drop PERF_EVENT_TXN
We currently use PERF_EVENT_TXN flag to determine if we are in the middle
of a transaction. If in a transaction, we defer the schedulability checks
from pmu->add() operation to the pmu->commit() operation.

Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ)
we can use the type to determine if we are in a transaction and drop the
PERF_EVENT_TXN flag.

When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures
becomes unused, so drop that field as well.

This is an extension of the Powerpc patch from Peter Zijlstra to s390,
Sparc and x86 architectures.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:30 +02:00
Sukadev Bhattiprolu
88a486132d powerpc, perf/powerpc/hv-24x7: Use PMU_TXN_READ interface
The 24x7 counters in Powerpc allow monitoring a large number of counters
simultaneously. They also allow reading several counters in a single
HCALL so we can get a more consistent snapshot of the system.

Use the PMU's transaction interface to monitor and read several event
counters at once. The idea is that users can group several 24x7 events
into a single group of events. We use the following logic to submit
the group of events to the PMU and read the values:

	pmu->start_txn()		// Initialize before first event

	for each event in group
		pmu->read(event);	// Queue each event to be read

	pmu->commit_txn()		// Read/update all queuedcounters

The ->commit_txn() also updates the event counts in the respective
perf_event objects.  The perf subsystem can then directly get the
event counts from the perf_event and can avoid submitting a new
->read() request to the PMU.

Thanks to input from Peter Zijlstra.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-10-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:29 +02:00
Sukadev Bhattiprolu
fbbe070115 perf/core: Add a 'flags' parameter to the PMU transactional interfaces
Currently, the PMU interface allows reading only one counter at a time.
But some PMUs like the 24x7 counters in Power, support reading several
counters at once. To leveage this functionality, extend the transaction
interface to support a "transaction type".

The first type, PERF_PMU_TXN_ADD, refers to the existing transactions,
i.e. used to _schedule_ all the events on the PMU as a group. A second
transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch,
by the 24x7 counters to read several counters at once.

Extend the transaction interfaces to the PMU to accept a 'txn_flags'
parameter and use this parameter to ignore any transactions that are
not of type PERF_PMU_TXN_ADD.

Thanks to Peter Zijlstra for his input.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[peterz: s390 compile fix]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 11:27:25 +02:00
Linus Torvalds
33e247c7e5 Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:

 - even more of the rest of MM

 - lib/ updates

 - checkpatch updates

 - small changes to a few scruffy filesystems

 - kmod fixes/cleanups

 - kexec updates

 - a dma-mapping cleanup series from hch

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
  dma-mapping: consolidate dma_set_mask
  dma-mapping: consolidate dma_supported
  dma-mapping: cosolidate dma_mapping_error
  dma-mapping: consolidate dma_{alloc,free}_noncoherent
  dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
  mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd()
  mm: make sure all file VMAs have ->vm_ops set
  mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()
  mm: mark most vm_operations_struct const
  namei: fix warning while make xmldocs caused by namei.c
  ipc: convert invalid scenarios to use WARN_ON
  zlib_deflate/deftree: remove bi_reverse()
  lib/decompress_unlzma: Do a NULL check for pointer
  lib/decompressors: use real out buf size for gunzip with kernel
  fs/affs: make root lookup from blkdev logical size
  sysctl: fix int -> unsigned long assignments in INT_MIN case
  kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo
  kexec: align crash_notes allocation to make it be inside one physical page
  kexec: remove unnecessary test in kimage_alloc_crash_control_pages()
  kexec: split kexec_load syscall from kexec core code
  ...
2015-09-10 18:19:42 -07:00
Linus Torvalds
519f526d39 ARM:
- Full debug support for arm64
 - Active state switching for timer interrupts
 - Lazy FP/SIMD save/restore for arm64
 - Generic ARMv8 target
 
 PPC:
 - Book3S: A few bug fixes
 - Book3S: Allow micro-threading on POWER8
 
 x86:
 - Compiler warnings
 
 Generic:
 - Adaptive polling for guest halt
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJV7qd/AAoJEL/70l94x66DDBcH/2OLomKHjDOGXqJ/dpkqf4UU
 FYI1pVjs2zP4z3L7RYV/DeuEsD6XaWzS7EXQOS3mcb9d8GWahPrdofeVmpmhg/8y
 jmkuUEFHl2Ut6imk8qDlG3m42c86Mk8/1k38l1bp8S3lL0/Q7IyADyYAlHdwzpOx
 yEyOAE4VU4n+VyQH5dbnzc12QRTeHfRQc/dI3eQq238gf37SF/1qzOzeLIdbEa+N
 DCzqQ8SExbctiRaLzCY5Ogan+unZBQbFfhrDrUSryywrzo/8WRFVmbjuf5O5Ucxa
 +UTLMvmm1YgxvBvWhlcmA+HSzSVeWNvaHQ9illgE5+74G5CzaD2ukurmoz/+r+A=
 =XtrL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
 "ARM:
   - Full debug support for arm64
   - Active state switching for timer interrupts
   - Lazy FP/SIMD save/restore for arm64
   - Generic ARMv8 target

  PPC:
   - Book3S: A few bug fixes
   - Book3S: Allow micro-threading on POWER8

  x86:
   - Compiler warnings

  Generic:
   - Adaptive polling for guest halt"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (49 commits)
  kvm: irqchip: fix memory leak
  kvm: move new trace event outside #ifdef CONFIG_KVM_ASYNC_PF
  KVM: trace kvm_halt_poll_ns grow/shrink
  KVM: dynamic halt-polling
  KVM: make halt_poll_ns per-vCPU
  Silence compiler warning in arch/x86/kvm/emulate.c
  kvm: compile process_smi_save_seg_64() only for x86_64
  KVM: x86: avoid uninitialized variable warning
  KVM: PPC: Book3S: Fix typo in top comment about locking
  KVM: PPC: Book3S: Fix size of the PSPB register
  KVM: PPC: Book3S HV: Exit on H_DOORBELL if HOST_IPI is set
  KVM: PPC: Book3S HV: Fix race in starting secondary threads
  KVM: PPC: Book3S: correct width in XER handling
  KVM: PPC: Book3S HV: Fix preempted vcore stolen time calculation
  KVM: PPC: Book3S HV: Fix preempted vcore list locking
  KVM: PPC: Book3S HV: Implement H_CLEAR_REF and H_CLEAR_MOD
  KVM: PPC: Book3S HV: Fix bug in dirty page tracking
  KVM: PPC: Book3S HV: Fix race in reading change bit when removing HPTE
  KVM: PPC: Book3S HV: Implement dynamic micro-threading on POWER8
  KVM: PPC: Book3S HV: Make use of unused threads when running guests
  ...
2015-09-10 16:42:49 -07:00
Christoph Hellwig
452e06af1f dma-mapping: consolidate dma_set_mask
Almost everyone implements dma_set_mask the same way, although some time
that's hidden in ->set_dma_mask methods.

This patch consolidates those into a common implementation that either
calls ->set_dma_mask if present or otherwise uses the default
implementation.  Some architectures used to only call ->set_dma_mask
after the initial checks, and those instance have been fixed to do the
full work.  h8300 implemented dma_set_mask bogusly as a no-ops and has
been fixed.

Unfortunately some architectures overload unrelated semantics like changing
the dma_ops into it so we still need to allow for an architecture override
for now.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
ee196371d5 dma-mapping: consolidate dma_supported
Most architectures just call into ->dma_supported, but some also return 1
if the method is not present, or 0 if no dma ops are present (although
that should never happeb). Consolidate this more broad version into
common code.

Also fix h8300 which inorrectly always returned 0, which would have been
a problem if it's dma_set_mask implementation wasn't a similarly buggy
noop.

As a few architectures have much more elaborate implementations, we
still allow for arch overrides.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
efa21e432c dma-mapping: cosolidate dma_mapping_error
Currently there are three valid implementations of dma_mapping_error:

 (1) call ->mapping_error
 (2) check for a hardcoded error code
 (3) always return 0

This patch provides a common implementation that calls ->mapping_error
if present, then checks for DMA_ERROR_CODE if defined or otherwise
returns 0.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
1e8937526e dma-mapping: consolidate dma_{alloc,free}_noncoherent
Most architectures do not support non-coherent allocations and either
define dma_{alloc,free}_noncoherent to their coherent versions or stub
them out.

Openrisc uses dma_{alloc,free}_attrs to implement them, and only Mips
implements them directly.

This patch moves the Openrisc version to common code, and handles the
DMA_ATTR_NON_CONSISTENT case in the mips dma_map_ops instance.

Note that actual non-coherent allocations require a dma_cache_sync
implementation, so if non-coherent allocations didn't work on
an architecture before this patch they still won't work after it.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
6894258eda dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
Since 2009 we have a nice asm-generic header implementing lots of DMA API
functions for architectures using struct dma_map_ops, but unfortunately
it's still missing a lot of APIs that all architectures still have to
duplicate.

This series consolidates the remaining functions, although we still need
arch opt outs for two of them as a few architectures have very
non-standard implementations.

This patch (of 5):

The coherent DMA allocator works the same over all architectures supporting
dma_map operations.

This patch consolidates them and converges the minor differences:

 - the debug_dma helpers are now called from all architectures, including
   those that were previously missing them
 - dma_alloc_from_coherent and dma_release_from_coherent are now always
   called from the generic alloc/free routines instead of the ops
   dma-mapping-common.h always includes dma-coherent.h to get the defintions
   for them, or the stubs if the architecture doesn't support this feature
 - checks for ->alloc / ->free presence are removed.  There is only one
   magic instead of dma_map_ops without them (mic_dma_ops) and that one
   is x86 only anyway.

Besides that only x86 needs special treatment to replace a default devices
if none is passed and tweak the gfp_flags.  An optional arch hook is provided
for that.

[linux@roeck-us.net: fix build]
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Dave Young
2965faa5e0 kexec: split kexec_load syscall from kexec core code
There are two kexec load syscalls, kexec_load another and kexec_file_load.
 kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
split kexec_load syscall code to kernel/kexec.c.

And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.

The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
kexec_load syscall can bypass the checking.

Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.

Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Paul Mackerras
e297c939b7 powerpc/MSI: Fix race condition in tearing down MSI interrupts
This fixes a race which can result in the same virtual IRQ number
being assigned to two different MSI interrupts.  The most visible
consequence of that is usually a warning and stack trace from the
sysfs code about an attempt to create a duplicate entry in sysfs.

The race happens when one CPU (say CPU 0) is disposing of an MSI
while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
(for example) pnv_teardown_msi_irqs(), which calls
msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
to calling irq_dispose_mapping() to free up the virtal IRQ number,
CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
MSI, and gets the same hardware IRQ number that CPU 0 just freed.
CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
which sees that there is currently a mapping for that hardware IRQ
number and returns the corresponding virtual IRQ number (which is
the same virtual IRQ number that CPU 0 was using).  CPU 0 then
calls irq_dispose_mapping() and frees that virtual IRQ number.
Now, if another CPU comes along and calls irq_create_mapping(), it
is likely to get the virtual IRQ number that was just freed,
resulting in the same virtual IRQ number apparently being used for
two different hardware interrupts.

To fix this race, we just move the call to msi_bitmap_free_hwirqs()
to after the call to irq_dispose_mapping().  Since virq_to_hw()
doesn't work for the virtual IRQ number after irq_dispose_mapping()
has been called, we need to call it before irq_dispose_mapping() and
remember the result for the msi_bitmap_free_hwirqs() call.

The pattern of calling msi_bitmap_free_hwirqs() before
irq_dispose_mapping() appears in 5 places under arch/powerpc, and
appears to have originated in commit 05af7bd2d7 ("[POWERPC] MPIC
U3/U4 MSI backend") from 2007.

Fixes: 05af7bd2d7 ("[POWERPC] MPIC U3/U4 MSI backend")
Cc: stable@vger.kernel.org # v2.6.22+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-10 17:27:08 +10:00
Michael Ellerman
b855d45dc3 powerpc: Wire up sys_userfaultfd()
The selftest passes on 64-bit LE and BE.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-09 12:51:15 +10:00
Linus Torvalds
f6f7a63692 Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:
 "Almost all of the rest of MM.  There was an unusually large amount of
  MM material this time"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
  zpool: remove no-op module init/exit
  mm: zbud: constify the zbud_ops
  mm: zpool: constify the zpool_ops
  mm: swap: zswap: maybe_preload & refactoring
  zram: unify error reporting
  zsmalloc: remove null check from destroy_handle_cache()
  zsmalloc: do not take class lock in zs_shrinker_count()
  zsmalloc: use class->pages_per_zspage
  zsmalloc: consider ZS_ALMOST_FULL as migrate source
  zsmalloc: partial page ordering within a fullness_list
  zsmalloc: use shrinker to trigger auto-compaction
  zsmalloc: account the number of compacted pages
  zsmalloc/zram: introduce zs_pool_stats api
  zsmalloc: cosmetic compaction code adjustments
  zsmalloc: introduce zs_can_compact() function
  zsmalloc: always keep per-class stats
  zsmalloc: drop unused variable `nr_to_migrate'
  mm/memblock.c: fix comment in __next_mem_range()
  mm/page_alloc.c: fix type information of memoryless node
  memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node()
  ...
2015-09-08 17:52:23 -07:00
Vlastimil Babka
96db800f5d mm: rename alloc_pages_exact_node() to __alloc_pages_node()
alloc_pages_exact_node() was introduced in commit 6484eb3e2a ("page
allocator: do not check NUMA node ID when the caller knows the node is
valid") as an optimized variant of alloc_pages_node(), that doesn't
fallback to current node for nid == NUMA_NO_NODE.  Unfortunately the
name of the function can easily suggest that the allocation is
restricted to the given node and fails otherwise.  In truth, the node is
only preferred, unless __GFP_THISNODE is passed among the gfp flags.

The misleading name has lead to mistakes in the past, see for example
commits 5265047ac3 ("mm, thp: really limit transparent hugepage
allocation to local node") and b360edb43f ("mm, mempolicy:
migrate_to_node should only migrate to node").

Another issue with the name is that there's a family of
alloc_pages_exact*() functions where 'exact' means exact size (instead
of page order), which leads to more confusion.

To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that
it's an optimized variant of alloc_pages_node() not intended for general
usage.  Both functions get described in comments.

It has been also considered to really provide a convenience function for
allocations restricted to a node, but the major opinion seems to be that
__GFP_THISNODE already provides that functionality and we shouldn't
duplicate the API needlessly.  The number of users would be small
anyway.

Existing callers of alloc_pages_exact_node() are simply converted to
call __alloc_pages_node(), with the exception of sba_alloc_coherent()
which open-codes the check for NUMA_NO_NODE, so it is converted to use
alloc_pages_node() instead.  This means it no longer performs some
VM_BUG_ON checks, and since the current check for nid in
alloc_pages_node() uses a 'nid < 0' comparison (which includes
NUMA_NO_NODE), it may hide wrong values which would be previously
exposed.

Both differences will be rectified by the next patch.

To sum up, this patch makes no functional changes, except temporarily
hiding potentially buggy callers.  Restricting the checks in
alloc_pages_node() is left for the next patch which can in turn expose
more existing buggy callers.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Robin Holt <robinmholt@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cliff Whickman <cpw@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08 15:35:28 -07:00
Linus Torvalds
12f03ee606 libnvdimm for 4.3:
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
    mechanism for adding device-driver-discovered memory regions to the
    kernel's direct map.  This facility is used by the pmem driver to
    enable pfn_to_page() operations on the page frames returned by DAX
    ('direct_access' in 'struct block_device_operations'). For now, the
    'memmap' allocation for these "device" pages comes from "System
    RAM".  Support for allocating the memmap from device memory will
    arrive in a later kernel.
 
 2/ Introduce memremap() to replace usages of ioremap_cache() and
    ioremap_wt().  memremap() drops the __iomem annotation for these
    mappings to memory that do not have i/o side effects.  The
    replacement of ioremap_cache() with memremap() is limited to the
    pmem driver to ease merging the api change in v4.3.  Completion of
    the conversion is targeted for v4.4.
 
 3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
    driver, update the VFS DAX implementation and PMEM api to provide
    persistence guarantees for kernel operations on a DAX mapping.
 
 4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
    cacheable to improve performance.
 
 5/ Miscellaneous updates and fixes to libnvdimm including support
    for issuing "address range scrub" commands, clarifying the optimal
    'sector size' of pmem devices, a clarification of the usage of the
    ACPI '_STA' (status) property for DIMM devices, and other minor
    fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
 JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
 OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
 nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
 NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
 zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
 1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
 sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
 bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
 o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
 dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
 slsw6DkrWT60CRE42nbK
 =o57/
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This update has successfully completed a 0day-kbuild run and has
  appeared in a linux-next release.  The changes outside of the typical
  drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
  removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
  the introduction of ZONE_DEVICE + devm_memremap_pages().

  Summary:

   - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
     mechanism for adding device-driver-discovered memory regions to the
     kernel's direct map.

     This facility is used by the pmem driver to enable pfn_to_page()
     operations on the page frames returned by DAX ('direct_access' in
     'struct block_device_operations').

     For now, the 'memmap' allocation for these "device" pages comes
     from "System RAM".  Support for allocating the memmap from device
     memory will arrive in a later kernel.

   - Introduce memremap() to replace usages of ioremap_cache() and
     ioremap_wt().  memremap() drops the __iomem annotation for these
     mappings to memory that do not have i/o side effects.  The
     replacement of ioremap_cache() with memremap() is limited to the
     pmem driver to ease merging the api change in v4.3.

     Completion of the conversion is targeted for v4.4.

   - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
     driver, update the VFS DAX implementation and PMEM api to provide
     persistence guarantees for kernel operations on a DAX mapping.

   - Convert the ACPI NFIT 'BLK' driver to map the block apertures as
     cacheable to improve performance.

   - Miscellaneous updates and fixes to libnvdimm including support for
     issuing "address range scrub" commands, clarifying the optimal
     'sector size' of pmem devices, a clarification of the usage of the
     ACPI '_STA' (status) property for DIMM devices, and other minor
     fixes"

* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
  libnvdimm, pmem: direct map legacy pmem by default
  libnvdimm, pmem: 'struct page' for pmem
  libnvdimm, pfn: 'struct page' provider infrastructure
  x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
  add devm_memremap_pages
  mm: ZONE_DEVICE for "device memory"
  mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
  dax: drop size parameter to ->direct_access()
  nd_blk: change aperture mapping from WC to WB
  nvdimm: change to use generic kvfree()
  pmem, dax: have direct_access use __pmem annotation
  dax: update I/O path to do proper PMEM flushing
  pmem: add copy_from_iter_pmem() and clear_pmem()
  pmem, x86: clean up conditional pmem includes
  pmem: remove layer when calling arch_has_wmb_pmem()
  pmem, x86: move x86 PMEM API to new pmem.h header
  libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
  pmem: switch to devm_ allocations
  devres: add devm_memremap
  libnvdimm, btt: write and validate parent_uuid
  ...
2015-09-08 14:35:59 -07:00
Linus Torvalds
dab3c3cc4f Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull core kbuild updates from Michal Marek:
 - modpost portability fix
 - linker script fix
 - genksyms segfault fix
 - fixdep cleanup
 - fix for clang detection

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: Fix clang detection
  kbuild: fixdep: drop meaningless hash table initialization
  kbuild: fixdep: optimize code slightly
  genksyms: Regenerate parser
  genksyms: Duplicate function pointer type definitions segfault
  kbuild: Fix .text.unlikely placement
  Avoid conflict with host definitions when cross-compiling
2015-09-08 14:12:19 -07:00
Linus Torvalds
59a47fff02 Mostly this is just clean ups and micro optimizations.
The changes with more meat are:
 
  o Allowing the trace event filters to filter on CPU number and process ids
 
  o Two new markers for trace output latency were added
     (10 and 100 msec latencies)
 
  o Have tracing_thresh filter function profiling time
 
 I also worked on modifying the ring buffer code for some future
 work, and moved the adding of the timestamp around. One of my changes
 caused a regression, and since other changes were built on top of it
 and already tested, I had to operate a revert of that change. Instead
 of rebasing, this change set has the code that caused a regression
 as well as the code to revert that change without touching the other
 changes that were made on top of it.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV6aZEAAoJEEjnJuOKh9ldrR4H/A1RcQf1prLLoUibPP4w3lat
 dmQcdpS1NY+cqyiKuKPAOkFDGQL7qWzRqZ8whcPSJIsHq57ufqNSLf+0bbQYPzg9
 g3CgGL7OApmGi5ulj0sNxhadvc9TFm/SAN0nVJlNuUWdm8e1UWHLsrJZaMfopu2r
 RDEtkOhg619mhDL4rktNdS6rk0B92Fhu2o2PwLZPVlUl1NNEt4WJU+ejitXUVO1A
 Nb70/rTGGJKtyHbW+74on4LnEN5Uu0Viu6rMwGfYyIgRmC2otdBDvE4xfKMiTUKr
 SzBjzrhIoMIRn4Vl0vElfulkpYaw7pcC2BdpZ4d9VpIOiLSlZs0x/TgCtpFEv5M=
 =baZ3
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing update from Steven Rostedt:
 "Mostly this is just clean ups and micro optimizations.

  The changes with more meat are:

   - Allowing the trace event filters to filter on CPU number and
     process ids

   - Two new markers for trace output latency were added (10 and 100
     msec latencies)

   - Have tracing_thresh filter function profiling time

  I also worked on modifying the ring buffer code for some future work,
  and moved the adding of the timestamp around.  One of my changes
  caused a regression, and since other changes were built on top of it
  and already tested, I had to operate a revert of that change.  Instead
  of rebasing, this change set has the code that caused a regression as
  well as the code to revert that change without touching the other
  changes that were made on top of it"

* tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated"
  tracing: Don't make assumptions about length of string on task rename
  tracing: Allow triggers to filter for CPU ids and process names
  ftrace: Format MCOUNT_ADDR address as type unsigned long
  tracing: Introduce two additional marks for delay
  ftrace: Fix function_graph duration spacing with 7-digits
  ftrace: add tracing_thresh to function profile
  tracing: Clean up stack tracing and fix fentry updates
  ring-buffer: Reorganize function locations
  ring-buffer: Make sure event has enough room for extend and padding
  ring-buffer: Get timestamp after event is allocated
  ring-buffer: Move the adding of the extended timestamp out of line
  ring-buffer: Add event descriptor to simplify passing data
  ftrace: correct the counter increment for trace_buffer data
  tracing: Fix for non-continuous cpu ids
  tracing: Prefer kcalloc over kzalloc with multiply
2015-09-08 14:04:14 -07:00
Bharata B Rao
daebaabb5c powerpc/pseries: Release DRC when configure_connector fails
Commit f32393c943 ("powerpc/pseries: Correct cpu affinity for
dlpar added cpus") moved dlpar_acquire_drc() call to before
dlpar_configure_connector() call in dlpar_cpu_probe(), but missed
to release the DRC if dlpar_configure_connector() failed.
During CPU hotplug, if configure-connector fails for any reason,
then this will result in subsequent CPU hotplug attempts to fail.

Release the acquired DRC if dlpar_configure_connector() call fails
so that the DRC is left in right isolation and allocation state
for the subsequent hotplug operation to succeed.

Fixes: f32393c943 ("powerpc/pseries: Correct cpu affinity for dlpar added cpus")
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-08 14:07:58 +10:00
Nishanth Aravamudan
fa14486979 powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel=
The 32-bit TCE table initialization relies on the DMA window having a
size equal to a power of 2 (and checks for it explicitly). But
crashkernel= has no constraint that requires a power-of-2 be specified.
This causes the kdump kernel to fail to boot as none of the PCI devices
(including the disk controller) are successfully initialized.

After this change, the PCI devices successfully set up the 32-bit TCE
table and kdump succeeds.

Fixes: aca6913f55 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages")
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 4.2
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-07 20:14:18 +10:00
Nishanth Aravamudan
bb0054552d powerpc/powernv/pci-ioda: fix 32-bit TCE table init in kdump kernel
When attempting to kdump with the 4.2 kernel, we see for each PCI
device:

 pci 0003:01     : [PE# 000] Assign DMA32 space
 pci 0003:01     : [PE# 000] Setting up 32-bit TCE table at 0..80000000
 pci 0003:01     : [PE# 000] Failed to create 32-bit TCE table, err -22
 PCI: Domain 0004 has 8 available 32-bit DMA segments
 PCI: 4 PE# for a total weight of 70
 pci 0004:01     : [PE# 002] Assign DMA32 space
 pci 0004:01     : [PE# 002] Setting up 32-bit TCE table at 0..80000000
 pci 0004:01     : [PE# 002] Failed to create 32-bit TCE table, err -22
 pci 0004:0d     : [PE# 005] Assign DMA32 space
 pci 0004:0d     : [PE# 005] Setting up 32-bit TCE table at 0..80000000
 pci 0004:0d     : [PE# 005] Failed to create 32-bit TCE table, err -22
 pci 0004:0e     : [PE# 006] Assign DMA32 space
 pci 0004:0e     : [PE# 006] Setting up 32-bit TCE table at 0..80000000
 pci 0004:0e     : [PE# 006] Failed to create 32-bit TCE table, err -22
 pci 0004:10     : [PE# 008] Assign DMA32 space
 pci 0004:10     : [PE# 008] Setting up 32-bit TCE table at 0..80000000
 pci 0004:10     : [PE# 008] Failed to create 32-bit TCE table, err -22

and eventually the kdump kernel fails to boot as none of the PCI devices
(including the disk controller) are successfully initialized.

The EINVAL response is because the DMA window (the 2GB base window) is
larger than the kdump kernel's reserved memory (crashkernel=, in this
case specified to be 1024M). The check in question,

 if ((window_size > memory_hotplug_max()) || !is_power_of_2(window_size))

is a valid sanity check for pnv_pci_ioda2_table_alloc_pages(), so adjust
the caller to pass in a smaller window size if our maximum memory value
is smaller than the DMA window.

After this change, the PCI devices successfully set up the 32-bit TCE
table and kdump succeeds.

The problem was seen on a Firestone machine originally.

Fixes: aca6913f55 ("powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages")
Cc: stable@vger.kernel.org # 4.2
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Coding style pedantry, use u64, change the indentation]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-07 20:13:14 +10:00
Michal Marek
5631d9c429 kbuild: Fix clang detection
We cannot detect clang before including the arch Makefile, because that
can set the default cross compiler. We also cannot detect clang after
including the arch Makefile, because powerpc wants to know about clang.
Solve this by using an deferred variable. This costs us a few shell
invocations, but this is only a constant number.

Reported-by: Behan Webster <behanw@converseincode.com>
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michal Marek <mmarek@suse.com>
2015-09-04 13:14:10 +02:00
Linus Torvalds
ff474e8ca8 powerpc updates for 4.3
- Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask from Benjamin Herrenschmidt
  - EEH fixes for SRIOV from Gavin
  - Introduce rtas_get_sensor_fast() for IRQ handlers from Thomas Huth
  - Use hardware RNG for arch_get_random_seed_* not arch_get_random_* from Paul Mackerras
  - Seccomp filter support from Michael Ellerman
  - opal_cec_reboot2() handling for HMIs & machine checks from Mahesh Salgaonkar
  - Add powerpc timebase as a trace clock source from Naveen N. Rao
  - Misc cleanups in the xmon, signal & SLB code from Anshuman Khandual
  - Add an inline function to update POWER8 HID0 from Gautham R. Shenoy
  - Fix pte_pagesize_index() crash on 4K w/64K hash from Michael Ellerman
  - Drop support for 64K local store on 4K kernels from Michael Ellerman
  - move dma_get_required_mask() from pnv_phb to pci_controller_ops from Andrew Donnellan
  - Initialize distance lookup table from drconf path from Nikunj A Dadhania
  - Enable RTC class support from Vaibhav Jain
  - Disable automatically blocked PCI config from Gavin Shan
  - Add LEDs driver for PowerNV platform from Vasant Hegde
  - Fix endianness issues in the HVSI driver from Laurent Dufour
  - Kexec endian fixes from Samuel Mendoza-Jonas
  - Fix corrupted pdn list from Gavin Shan
  - Fix fenced PHB caused by eeh_slot_error_detail() from Gavin Shan
 
  - Freescale updates from Scott: Highlights include 32-bit memcpy/memset
    optimizations, checksum optimizations, 85xx config fragments and updates,
    device tree updates, e6500 fixes for non-SMP, and misc cleanup and minor
    fixes.
 
  - A ton of cxl updates & fixes:
   - Add explicit precision specifiers from Rasmus Villemoes
   - use more common format specifier from Rasmus Villemoes
   - Destroy cxl_adapter_idr on module_exit from Johannes Thumshirn
   - Destroy afu->contexts_idr on release of an afu from Johannes Thumshirn
   - Compile with -Werror from Daniel Axtens
   - EEH support from Daniel Axtens
   - Plug irq_bitmap getting leaked in cxl_context from Vaibhav Jain
   - Add alternate MMIO error handling from Ian Munsie
   - Allow release of contexts which have been OPENED but not STARTED from Andrew Donnellan
   - Remove use of macro DEFINE_PCI_DEVICE_TABLE from Vaishali Thakkar
   - Release irqs if memory allocation fails from Vaibhav Jain
   - Remove racy attempt to force EEH invocation in reset from Daniel Axtens
   - Fix + cleanup error paths in cxl_dev_context_init from Ian Munsie
   - Fix force unmapping mmaps of contexts allocated through the kernel api from Ian Munsie
   - Set up and enable PSL Timebase from Philippe Bergheaud
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV5+GzAAoJEFHr6jzI4aWA0iAP/jcd0kNaNBzLgcDKKygKdgz4
 xn4EWu81vfMfZYWesb0ATrjlH0hLsRxSXoFUqUMhtJTa5kNAoCIaz/M8WBALS50h
 aT+i7br4WEU2j2FcaMyP3iAZx/2hl+2utODJSHPRWPkec1fUDBfEyBf++e520RWM
 HUQGIGZXh8yq7KMA96Pwhsvls9vOB8hS2UdU/NS8ff3J5jFvXC1/WmF2qfzJBS1V
 8iHyz26Jl8+dJ+et7iC2oD5XQAjIH1oJgOyPVPBzAQttfi8RjuVzRA30TfPBAUwI
 lC9nlmPy6bCe4kiQYWVB1z7GegHyW/9vkeuMj/u8mZbqpaayMEMZmd2C3iNDXNHx
 i2NSvdln539t4qWYsV2v6lVCfa/ayDHD73Wackj5Dk394tzXnpCPhxNzc2yKEd5v
 h7vwYc9jBhsbfSCSogaM+gSHJ1APgCidggHJMYYNA2nN2u6V62RpsMB7zp/1+Q2v
 yqYdD8oYF4Dm21x/ujaNFrlizROD46WS0UqdJ3yP6HAqRYIpRXtibmpECJgt1n5h
 HjADEci4hQ2UQxdMdp/Q5KZnPTJebBtrZrmkW5r6cZBUaTB5TVkFaEWN44CT/Loh
 tMNeA3qOBN06CaQS2WL3UUUWpbZq9fSbWuUZ5lWZDb5AOyRxe5eWVYNLkiyIXozY
 L24l1bYdBhXahnjoS/kc
 =n9+X
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask
   from Benjamin Herrenschmidt

 - EEH fixes for SRIOV from Gavin

 - introduce rtas_get_sensor_fast() for IRQ handlers from Thomas Huth

 - use hardware RNG for arch_get_random_seed_* not arch_get_random_*
   from Paul Mackerras

 - seccomp filter support from Michael Ellerman

 - opal_cec_reboot2() handling for HMIs & machine checks from Mahesh
   Salgaonkar

 - add powerpc timebase as a trace clock source from Naveen N.  Rao

 - misc cleanups in the xmon, signal & SLB code from Anshuman Khandual

 - add an inline function to update POWER8 HID0 from Gautham R.  Shenoy

 - fix pte_pagesize_index() crash on 4K w/64K hash from Michael Ellerman

 - drop support for 64K local store on 4K kernels from Michael Ellerman

 - move dma_get_required_mask() from pnv_phb to pci_controller_ops from
   Andrew Donnellan

 - initialize distance lookup table from drconf path from Nikunj A
   Dadhania

 - enable RTC class support from Vaibhav Jain

 - disable automatically blocked PCI config from Gavin Shan

 - add LEDs driver for PowerNV platform from Vasant Hegde

 - fix endianness issues in the HVSI driver from Laurent Dufour

 - kexec endian fixes from Samuel Mendoza-Jonas

 - fix corrupted pdn list from Gavin Shan

 - fix fenced PHB caused by eeh_slot_error_detail() from Gavin Shan

 - Freescale updates from Scott: Highlights include 32-bit memcpy/memset
   optimizations, checksum optimizations, 85xx config fragments and
   updates, device tree updates, e6500 fixes for non-SMP, and misc
   cleanup and minor fixes.

 - a ton of cxl updates & fixes:
    - add explicit precision specifiers from Rasmus Villemoes
    - use more common format specifier from Rasmus Villemoes
    - destroy cxl_adapter_idr on module_exit from Johannes Thumshirn
    - destroy afu->contexts_idr on release of an afu from Johannes
      Thumshirn
    - compile with -Werror from Daniel Axtens
    - EEH support from Daniel Axtens
    - plug irq_bitmap getting leaked in cxl_context from Vaibhav Jain
    - add alternate MMIO error handling from Ian Munsie
    - allow release of contexts which have been OPENED but not STARTED
      from Andrew Donnellan
    - remove use of macro DEFINE_PCI_DEVICE_TABLE from Vaishali Thakkar
    - release irqs if memory allocation fails from Vaibhav Jain
    - remove racy attempt to force EEH invocation in reset from Daniel
      Axtens
    - fix + cleanup error paths in cxl_dev_context_init from Ian Munsie
    - fix force unmapping mmaps of contexts allocated through the kernel
      api from Ian Munsie
    - set up and enable PSL Timebase from Philippe Bergheaud

* tag 'powerpc-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (140 commits)
  cxl: Set up and enable PSL Timebase
  cxl: Fix force unmapping mmaps of contexts allocated through the kernel api
  cxl: Fix + cleanup error paths in cxl_dev_context_init
  powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail()
  powerpc/pseries: Cleanup on pci_dn_reconfig_notifier()
  powerpc/pseries: Fix corrupted pdn list
  powerpc/powernv: Enable LEDS support
  powerpc/iommu: Set default DMA offset in dma_dev_setup
  cxl: Remove racy attempt to force EEH invocation in reset
  cxl: Release irqs if memory allocation fails
  cxl: Remove use of macro DEFINE_PCI_DEVICE_TABLE
  powerpc/powernv: Fix mis-merge of OPAL support for LEDS driver
  powerpc/powernv: Reset HILE before kexec_sequence()
  powerpc/kexec: Reset secondary cpu endianness before kexec
  powerpc/hvsi: Fix endianness issues in the HVSI driver
  leds/powernv: Add driver for PowerNV platform
  powerpc/powernv: Create LED platform device
  powerpc/powernv: Add OPAL interfaces for accessing and modifying system LED states
  powerpc/powernv: Fix the log message when disabling VF
  cxl: Allow release of contexts which have been OPENED but not STARTED
  ...
2015-09-03 16:41:38 -07:00
Linus Torvalds
ca520cab25 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and atomic updates from Ingo Molnar:
 "Main changes in this cycle are:

   - Extend atomic primitives with coherent logic op primitives
     (atomic_{or,and,xor}()) and deprecate the old partial APIs
     (atomic_{set,clear}_mask())

     The old ops were incoherent with incompatible signatures across
     architectures and with incomplete support.  Now every architecture
     supports the primitives consistently (by Peter Zijlstra)

   - Generic support for 'relaxed atomics':

       - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
       - atomic_read_acquire()
       - atomic_set_release()

     This came out of porting qwrlock code to arm64 (by Will Deacon)

   - Clean up the fragile static_key APIs that were causing repeat bugs,
     by introducing a new one:

       DEFINE_STATIC_KEY_TRUE(name);
       DEFINE_STATIC_KEY_FALSE(name);

     which define a key of different types with an initial true/false
     value.

     Then allow:

       static_branch_likely()
       static_branch_unlikely()

     to take a key of either type and emit the right instruction for the
     case.  To be able to know the 'type' of the static key we encode it
     in the jump entry (by Peter Zijlstra)

   - Static key self-tests (by Jason Baron)

   - qrwlock optimizations (by Waiman Long)

   - small futex enhancements (by Davidlohr Bueso)

   - ... and misc other changes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
  jump_label/x86: Work around asm build bug on older/backported GCCs
  locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
  locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
  locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
  locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
  locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
  locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
  locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
  locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
  locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
  locking/static_keys: Make verify_keys() static
  jump label, locking/static_keys: Update docs
  locking/static_keys: Provide a selftest
  jump_label: Provide a self-test
  s390/uaccess, locking/static_keys: employ static_branch_likely()
  x86, tsc, locking/static_keys: Employ static_branch_likely()
  locking/static_keys: Add selftest
  locking/static_keys: Add a new static_key interface
  locking/static_keys: Rework update logic
  locking/static_keys: Add static_key_{en,dis}able() helpers
  ...
2015-09-03 15:46:07 -07:00
Greg Kurz
4e33d1f0a1 KVM: PPC: Book3S: Fix typo in top comment about locking
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-09-04 07:28:05 +10:00
Thomas Huth
f35f3a48d6 KVM: PPC: Book3S: Fix size of the PSPB register
The size of the Problem State Priority Boost Register is only
32 bits, but the kvm_vcpu_arch->pspb variable is declared as
"ulong", ie. 64-bit. However, the assembler code accesses this
variable with 32-bit accesses, and the KVM_REG_PPC_PSPB macro
is defined with SIZE_U32, too, so that the current code is
broken on big endian hosts: kvmppc_get_one_reg_hv() will only
return zero for this register since it is using the wrong half
of the pspb variable. Let's fix this problem by adjusting the
size of the pspb field in the kvm_vcpu_arch structure.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-09-03 16:09:09 +10:00
Gautham R. Shenoy
06554d9f6c KVM: PPC: Book3S HV: Exit on H_DOORBELL if HOST_IPI is set
The code that handles the case when we receive a H_DOORBELL interrupt
has a comment which says "Hypervisor doorbell - exit only if host IPI
flag set".  However, the current code does not actually check if the
host IPI flag is set.  This is due to a comparison instruction that
got missed.

As a result, the current code performs the exit to host only
if some sibling thread or a sibling sub-core is exiting to the
host.  This implies that, an IPI sent to a sibling core in
(subcores-per-core != 1) mode will be missed by the host unless the
sibling core is on the exit path to the host.

This patch adds the missing comparison operation which will ensure
that when HOST_IPI flag is set, we unconditionally exit to the host.

Fixes: 66feed61cd
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-09-03 16:08:34 +10:00
Gautham R. Shenoy
7f23532866 KVM: PPC: Book3S HV: Fix race in starting secondary threads
The current dynamic micro-threading code has a race due to which a
secondary thread naps when it is supposed to be running a vcpu. As a
side effect of this, on a guest exit, the primary thread in
kvmppc_wait_for_nap() finds that this secondary thread hasn't cleared
its vcore pointer. This results in "CPU X seems to be stuck!"
warnings.

The race is possible since the primary thread on exiting the guests
only waits for all the secondaries to clear its vcore pointer. It
subsequently expects the secondary threads to enter nap while it
unsplits the core. A secondary thread which hasn't yet entered the nap
will loop in kvm_no_guest until its vcore pointer and the do_nap flag
are unset. Once the core has been unsplit, a new vcpu thread can grab
the core and set the do_nap flag *before* setting the vcore pointers
of the secondary. As a result, the secondary thread will now enter nap
via kvm_unsplit_nap instead of running the guest vcpu.

Fix this by setting the do_nap flag after setting the vcore pointer in
the PACA of the secondary in kvmppc_run_core. Also, ensure that a
secondary thread doesn't nap in kvm_unsplit_nap when the vcore pointer
in its PACA struct is set.

Fixes: b4deba5c41
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2015-09-03 16:07:42 +10:00
Linus Torvalds
1081230b74 Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
 "This first core part of the block IO changes contains:

   - Cleanup of the bio IO error signaling from Christoph.  We used to
     rely on the uptodate bit and passing around of an error, now we
     store the error in the bio itself.

   - Improvement of the above from myself, by shrinking the bio size
     down again to fit in two cachelines on x86-64.

   - Revert of the max_hw_sectors cap removal from a revision again,
     from Jeff Moyer.  This caused performance regressions in various
     tests.  Reinstate the limit, bump it to a more reasonable size
     instead.

   - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
     Most devices have huge trim limits, which can cause nasty latencies
     when deleting files.  Enable the admin to configure the size down.
     We will look into having a more sane default instead of UINT_MAX
     sectors.

   - Improvement of the SGP gaps logic from Keith Busch.

   - Enable the block core to handle arbitrarily sized bios, which
     enables a nice simplification of bio_add_page() (which is an IO hot
     path).  From Kent.

   - Improvements to the partition io stats accounting, making it
     faster.  From Ming Lei.

   - Also from Ming Lei, a basic fixup for overflow of the sysfs pending
     file in blk-mq, as well as a fix for a blk-mq timeout race
     condition.

   - Ming Lin has been carrying Kents above mentioned patches forward
     for a while, and testing them.  Ming also did a few fixes around
     that.

   - Sasha Levin found and fixed a use-after-free problem introduced by
     the bio->bi_error changes from Christoph.

   - Small blk cgroup cleanup from Viresh Kumar"

* 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
  blk: Fix bio_io_vec index when checking bvec gaps
  block: Replace SG_GAPS with new queue limits mask
  block: bump BLK_DEF_MAX_SECTORS to 2560
  Revert "block: remove artifical max_hw_sectors cap"
  blk-mq: fix race between timeout and freeing request
  blk-mq: fix buffer overflow when reading sysfs file of 'pending'
  Documentation: update notes in biovecs about arbitrarily sized bios
  block: remove bio_get_nr_vecs()
  fs: use helper bio_add_page() instead of open coding on bi_io_vec
  block: kill merge_bvec_fn() completely
  md/raid5: get rid of bio_fits_rdev()
  md/raid5: split bio for chunk_aligned_read
  block: remove split code in blkdev_issue_{discard,write_same}
  btrfs: remove bio splitting and merge_bvec_fn() calls
  bcache: remove driver private bio splitting code
  block: simplify bio_add_page()
  block: make generic_make_request handle arbitrarily sized bios
  blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
  block: don't access bio->bi_error after bio_put()
  block: shrink struct bio down to 2 cache lines again
  ...
2015-09-02 13:10:25 -07:00
Linus Torvalds
ae98207309 Power management and ACPI material for v4.3-rc1
- ACPICA update to upstream revision 20150818 including method
    tracing extensions to allow more in-depth AML debugging in the
    kernel and a number of assorted fixes and cleanups (Bob Moore,
    Lv Zheng, Markus Elfring).
 
  - ACPI sysfs code updates and a documentation update related to
    AML method tracing (Lv Zheng).
 
  - ACPI EC driver fix related to serialized evaluations of _Qxx
    methods and ACPI tools updates allowing the EC userspace tool
    to be built from the kernel source (Lv Zheng).
 
  - ACPI processor driver updates preparing it for future
    introduction of CPPC support and ACPI PCC mailbox driver
    updates (Ashwin Chaugule).
 
  - ACPI interrupts enumeration fix for a regression related
    to the handling of IRQ attribute conflicts between MADT
    and the ACPI namespace (Jiang Liu).
 
  - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi Kasagar).
 
  - ACPI device registration code reorganization to separate the
    sysfs-related code and bus type operations from the rest (Rafael
    J Wysocki).
 
  - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
    Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).
 
  - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups
    (Pan Xinhui, Rafael J Wysocki).
 
  - cpufreq core cleanups on top of the previous changes allowing it
    to preseve its sysfs directories over system suspend/resume (Viresh
    Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).
 
  - cpufreq fixes and cleanups related to governors (Viresh Kumar).
 
  - cpufreq updates (core and the cpufreq-dt driver) related to the
    turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).
 
  - New DT bindings for Operating Performance Points (OPP), support
    for them in the OPP framework and in the cpufreq-dt driver plus
    related OPP framework fixes and cleanups (Viresh Kumar).
 
  - cpufreq powernv driver updates (Shilpasri G Bhat).
 
  - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).
 
  - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
    and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).
 
  - intel_pstate driver updates including Skylake-S support, support
    for enabling HW P-states per CPU and an additional vendor bypass
    list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).
 
  - cpuidle core fixes related to the handling of coupled idle states
    (Xunlei Pang).
 
  - intel_idle driver updates including Skylake Client support and
    support for freeze-mode-specific idle states (Len Brown).
 
  - Driver core updates related to power management (Andy Shevchenko,
    Rafael J Wysocki).
 
  - Generic power domains framework fixes and cleanups (Jon Hunter,
    Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).
 
  - Device PM QoS framework update to allow the latency tolerance
    setting to be exposed to user space via sysfs (Mika Westerberg).
 
  - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
    exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).
 
  - System sleep support updates (Alan Stern, Len Brown, SungEun Kim).
 
  - rockchip-io AVS support updates (Heiko Stuebner).
 
  - PM core clocks support fixup (Colin Ian King).
 
  - Power capping RAPL driver update including support for Skylake H/S
    and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).
 
  - Generic device properties framework fixes related to the handling
    of static (driver-provided) property sets (Andy Shevchenko).
 
  - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
    Shreyas B Prabhu).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJV5hhGAAoJEILEb/54YlRxs+EQAK51iFk48+IbpHYaZZ50Yo4m
 ZZc2zBcbwRcBlU9vKERrhG+jieSl8J/JJNxT8vBjKqyvNw038mCjewQh02ol0HuC
 R7nlDiVJkmZ50sLO4xwE/1UBZr/XqbddwCUnYzvFMkMTA0ePzFtf8BrJ1FXpT8S/
 fkwSXQty6hvJDwxkfrbMSaA730wMju9lahx8D6MlmUAedWYZOJDMQKB4WKa/St5X
 9uckBPHUBB2KiKlXxdbFPwKLNxHvLROq5SpDLc6cM/7XZB+QfNFy85CUjCUtYo1O
 1W8k0qnztvZ6UEv27qz5dejGyAGOarMWGGNsmL9evoeGeHRpQL+dom7HcTnbAfUZ
 walyhYSm/zKkdy7Vl3xWUUQkMG48+PviMI6K0YhHXb3Rm5wlR/yBNZTwNIty9SX/
 fKCHEa8QynWwLxgm53c3xRkiitJxMsHNK03moLD9zQMjshTyTNvpNbZoahyKQzk6
 H+9M1DBRHhkkREDWSwGutukxfEMtWe2vcZcyERrFiY7l5k1j58DwDBMPqjPhRv6q
 P/1NlCzr0XYf83Y86J18LbDuPGDhTjjIEn6CqbtI2mmWqTg3+rF7zvS2ux+FzMnA
 gisv8l6GT9JiWhxKFqqL/rrVpwtyHebWLYE/RpNUW6fEzLziRNj1qyYO9dqI/GGi
 I3rfxlXoc/5xJWCgNB8f
 =fTgI
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "From the number of commits perspective, the biggest items are ACPICA
  and cpufreq changes with the latter taking the lead (over 50 commits).

  On the cpufreq front, there are many cleanups and minor fixes in the
  core and governors, driver updates etc.  We also have a new cpufreq
  driver for Mediatek MT8173 chips.

  ACPICA mostly updates its debug infrastructure and adds a number of
  fixes and cleanups for a good measure.

  The Operating Performance Points (OPP) framework is updated with new
  DT bindings and support for them among other things.

  We have a few updates of the generic power domains framework and a
  reorganization of the ACPI device enumeration code and bus type
  operations.

  And a lot of fixes and cleanups all over.

  Included is one branch from the MFD tree as it contains some
  PM-related driver core and ACPI PM changes a few other commits are
  based on.

  Specifics:

   - ACPICA update to upstream revision 20150818 including method
     tracing extensions to allow more in-depth AML debugging in the
     kernel and a number of assorted fixes and cleanups (Bob Moore, Lv
     Zheng, Markus Elfring).

   - ACPI sysfs code updates and a documentation update related to AML
     method tracing (Lv Zheng).

   - ACPI EC driver fix related to serialized evaluations of _Qxx
     methods and ACPI tools updates allowing the EC userspace tool to be
     built from the kernel source (Lv Zheng).

   - ACPI processor driver updates preparing it for future introduction
     of CPPC support and ACPI PCC mailbox driver updates (Ashwin
     Chaugule).

   - ACPI interrupts enumeration fix for a regression related to the
     handling of IRQ attribute conflicts between MADT and the ACPI
     namespace (Jiang Liu).

   - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi
     Kasagar).

   - ACPI device registration code reorganization to separate the
     sysfs-related code and bus type operations from the rest (Rafael J
     Wysocki).

   - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
     Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).

   - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups (Pan
     Xinhui, Rafael J Wysocki).

   - cpufreq core cleanups on top of the previous changes allowing it to
     preseve its sysfs directories over system suspend/resume (Viresh
     Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).

   - cpufreq fixes and cleanups related to governors (Viresh Kumar).

   - cpufreq updates (core and the cpufreq-dt driver) related to the
     turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).

   - New DT bindings for Operating Performance Points (OPP), support for
     them in the OPP framework and in the cpufreq-dt driver plus related
     OPP framework fixes and cleanups (Viresh Kumar).

   - cpufreq powernv driver updates (Shilpasri G Bhat).

   - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).

   - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
     and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).

   - intel_pstate driver updates including Skylake-S support, support
     for enabling HW P-states per CPU and an additional vendor bypass
     list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).

   - cpuidle core fixes related to the handling of coupled idle states
     (Xunlei Pang).

   - intel_idle driver updates including Skylake Client support and
     support for freeze-mode-specific idle states (Len Brown).

   - Driver core updates related to power management (Andy Shevchenko,
     Rafael J Wysocki).

   - Generic power domains framework fixes and cleanups (Jon Hunter,
     Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).

   - Device PM QoS framework update to allow the latency tolerance
     setting to be exposed to user space via sysfs (Mika Westerberg).

   - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
     exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).

   - System sleep support updates (Alan Stern, Len Brown, SungEun Kim).

   - rockchip-io AVS support updates (Heiko Stuebner).

   - PM core clocks support fixup (Colin Ian King).

   - Power capping RAPL driver update including support for Skylake H/S
     and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).

   - Generic device properties framework fixes related to the handling
     of static (driver-provided) property sets (Andy Shevchenko).

   - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
     Shreyas B Prabhu)"

* tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (180 commits)
  cpufreq: speedstep-lib: Use monotonic clock
  cpufreq: powernv: Increase the verbosity of OCC console messages
  cpufreq: sfi: use kmemdup rather than duplicating its implementation
  cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor()
  cpufreq: rename cpufreq_real_policy as cpufreq_user_policy
  cpufreq: remove redundant 'policy' field from user_policy
  cpufreq: remove redundant 'governor' field from user_policy
  cpufreq: update user_policy.* on success
  cpufreq: use memcpy() to copy policy
  cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
  cpufreq: mediatek: Add MT8173 cpufreq driver
  dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings
  PM / Domains: Fix typo in description of genpd_dev_pm_detach()
  PM / Domains: Remove unusable governor dummies
  PM / Domains: Make pm_genpd_init() available to modules
  PM / domains: Align column headers and data in pm_genpd_summary output
  powercap / RAPL: disable the 2nd power limit properly
  tools: cpupower: Fix error when running cpupower monitor
  PM / OPP: Drop unlikely before IS_ERR(_OR_NULL)
  PM / OPP: Fix static checker warning (broken 64bit big endian systems)
  ...
2015-09-01 19:45:46 -07:00
Linus Torvalds
089b669506 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree for 4.3 (kerneldoc updates, printk()
  fixes, Documentation and MAINTAINERS updates)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
  MAINTAINERS: update my e-mail address
  mod_devicetable: add space before */
  scsi: a100u2w: trivial typo in printk
  i2c: Fix typo in i2c-bfin-twi.c
  treewide: fix typos in comment blocks
  Doc: fix trivial typo in SubmittingPatches
  proportions: Spelling s/consitent/consistent/
  dm: Spelling s/consitent/consistent/
  aic7xxx: Fix typo in error message
  pcmcia: Fix typo in locking documentation
  scsi/arcmsr: Fix typos in error log
  drm/nouveau/gr: Fix typo in nv10.c
  [SCSI] Fix printk typos in drivers/scsi
  staging: comedi: Grammar s/Enable support a/Enable support for a/
  Btrfs: Spelling s/consitent/consistent/
  README: GTK+ is a acronym
  ASoC: omap: Fix typo in config option description
  mm: tlb.c: Fix error message
  ntfs: super.c: Fix error log
  fix typo in Documentation/SubmittingPatches
  ...
2015-09-01 18:46:42 -07:00
Linus Torvalds
17e6b00ac4 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "This updated pull request does not contain the last few GIC related
  patches which were reported to cause a regression.  There is a fix
  available, but I let it breed for a couple of days first.

  The irq departement provides:

   - new infrastructure to support non PCI based MSI interrupts
   - a couple of new irq chip drivers
   - the usual pile of fixlets and updates to irq chip drivers
   - preparatory changes for removal of the irq argument from interrupt
     flow handlers
   - preparatory changes to remove IRQF_VALID"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  irqchip/imx-gpcv2: IMX GPCv2 driver for wakeup sources
  irqchip: Add bcm2836 interrupt controller for Raspberry Pi 2
  irqchip: Add documentation for the bcm2836 interrupt controller
  irqchip/bcm2835: Add support for being used as a second level controller
  irqchip/bcm2835: Refactor handle_IRQ() calls out of MAKE_HWIRQ
  PCI: xilinx: Fix typo in function name
  irqchip/gic: Ensure gic_cpu_if_up/down() programs correct GIC instance
  irqchip/gic: Only allow the primary GIC to set the CPU map
  PCI/MSI: pci-xgene-msi: Consolidate chained IRQ handler install/remove
  unicore32/irq: Prepare puv3_gpio_handler for irq argument removal
  tile/pci_gx: Prepare trio_handle_level_irq for irq argument removal
  m68k/irq: Prepare irq handlers for irq argument removal
  C6X/megamode-pic: Prepare megamod_irq_cascade for irq argument removal
  blackfin: Prepare irq handlers for irq argument removal
  arc/irq: Prepare idu_cascade_isr for irq argument removal
  sparc/irq: Use access helper irq_data_get_affinity_mask()
  sparc/irq: Use helper irq_data_get_irq_handler_data()
  parisc/irq: Use access helper irq_data_get_affinity_mask()
  mn10300/irq: Use access helper irq_data_get_affinity_mask()
  irqchip/i8259: Prepare i8259_irq_dispatch for irq argument removal
  ...
2015-09-01 14:33:35 -07:00
Linus Torvalds
5e359bf221 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Rather large, but nothing exiting:

   - new range check for settimeofday() to prevent that boot time
     becomes negative.
   - fix for file time rounding
   - a few simplifications of the hrtimer code
   - fix for the proc/timerlist code so the output of clock realtime
     timers is accurate
   - more y2038 work
   - tree wide conversion of clockevent drivers to the new callbacks"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (88 commits)
  hrtimer: Handle failure of tick_init_highres() gracefully
  hrtimer: Unconfuse switch_hrtimer_base() a bit
  hrtimer: Simplify get_target_base() by returning current base
  hrtimer: Drop return code of hrtimer_switch_to_hres()
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  time: Introduce current_kernel_time64()
  time: Introduce struct itimerspec64
  time: Add the common weak version of update_persistent_clock()
  time: Always make sure wall_to_monotonic isn't positive
  time: Fix nanosecond file time rounding in timespec_trunc()
  timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers
  cris/time: Migrate to new 'set-state' interface
  kernel: broadcast-hrtimer: Migrate to new 'set-state' interface
  xtensa/time: Migrate to new 'set-state' interface
  unicore/time: Migrate to new 'set-state' interface
  um/time: Migrate to new 'set-state' interface
  sparc/time: Migrate to new 'set-state' interface
  sh/localtimer: Migrate to new 'set-state' interface
  score/time: Migrate to new 'set-state' interface
  s390/time: Migrate to new 'set-state' interface
  ...
2015-09-01 14:04:50 -07:00
Linus Torvalds
25525bea46 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The dominant change in this cycle was the continued work to isolate
  kernel drivers from MTRR legacies: this tree gets rid of all kernel
  internal driver interfaces to MTRRs (mostly by rewriting it to proper
  PAT interfaces), the only access left is the /proc/mtrr ABI.

  This work was done by Luis R Rodriguez.

  There's also some related PCI interface additions for which I've
  Cc:-ed Bjorn"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del()
  s390/io: Add pci_iomap_wc() and pci_iomap_wc_range()
  drivers/dma/iop-adma: Use dma_alloc_writecombine() kernel-style
  drivers/video/fbdev/vt8623fb: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/video/fbdev/s3fb: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/video/fbdev/arkfb.c: Use arch_phys_wc_add() and pci_iomap_wc()
  PCI: Add pci_iomap_wc() variants
  drivers/video/fbdev/gxt4500: Use pci_ioremap_wc_bar() to map framebuffer
  drivers/video/fbdev/kyrofb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
  drivers/video/fbdev/i740fb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
  PCI: Add pci_ioremap_wc_bar()
  x86/mm: Make kernel/check.c explicitly non-modular
  x86/mm/pat: Make mm/pageattr[-test].c explicitly non-modular
  x86/mm/pat: Add comments to cachemode translation tables
  arch/*/io.h: Add ioremap_uc() to all architectures
  drivers/video/fbdev/atyfb: Use arch_phys_wc_add() and ioremap_wc()
  drivers/video/fbdev/atyfb: Replace MTRR UC hole with strong UC
  drivers/video/fbdev/atyfb: Clarify ioremap() base and length used
  drivers/video/fbdev/atyfb: Carve out framebuffer length fudging into a helper
  x86/mm, asm-generic: Add IOMMU ioremap_uc() variant default
  ...
2015-09-01 10:07:40 -07:00
Rafael J. Wysocki
4ffe18c255 Merge branch 'pm-cpufreq'
* pm-cpufreq: (53 commits)
  cpufreq: speedstep-lib: Use monotonic clock
  cpufreq: powernv: Increase the verbosity of OCC console messages
  cpufreq: sfi: use kmemdup rather than duplicating its implementation
  cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor()
  cpufreq: rename cpufreq_real_policy as cpufreq_user_policy
  cpufreq: remove redundant 'policy' field from user_policy
  cpufreq: remove redundant 'governor' field from user_policy
  cpufreq: update user_policy.* on success
  cpufreq: use memcpy() to copy policy
  cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
  cpufreq: mediatek: Add MT8173 cpufreq driver
  dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings
  intel_pstate: append more Oracle OEM table id to vendor bypass list
  intel_pstate: Add SKY-S support
  intel_pstate: Fix possible overflow complained by Coverity
  cpufreq: Correct a freq check in cpufreq_set_policy()
  cpufreq: Lock CPU online/offline in cpufreq_register_driver()
  cpufreq: Replace recover_policy with new_policy in cpufreq_online()
  cpufreq: Separate CPU device registration from CPU online
  cpufreq: powernv: Restore cpu frequency to policy->cur on unthrottling
  ...
2015-09-01 15:52:35 +02:00
Linus Torvalds
a1d8561172 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The biggest change in this cycle is the rewrite of the main SMP load
  balancing metric: the CPU load/utilization.  The main goal was to make
  the metric more precise and more representative - see the changelog of
  this commit for the gory details:

    9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking")

  It is done in a way that significantly reduces complexity of the code:

    5 files changed, 249 insertions(+), 494 deletions(-)

  and the performance testing results are encouraging.  Nevertheless we
  need to keep an eye on potential regressions, since this potentially
  affects every SMP workload in existence.

  This work comes from Yuyang Du.

  Other changes:

   - SCHED_DL updates.  (Andrea Parri)

   - Simplify architecture callbacks by removing finish_arch_switch().
     (Peter Zijlstra et al)

   - cputime accounting: guarantee stime + utime == rtime.  (Peter
     Zijlstra)

   - optimize idle CPU wakeups some more - inspired by Facebook server
     loads.  (Mike Galbraith)

   - stop_machine fixes and updates.  (Oleg Nesterov)

   - Introduce the 'trace_sched_waking' tracepoint.  (Peter Zijlstra)

   - sched/numa tweaks.  (Srikar Dronamraju)

   - misc fixes and small cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  sched/deadline: Fix comment in enqueue_task_dl()
  sched/deadline: Fix comment in push_dl_tasks()
  sched: Change the sched_class::set_cpus_allowed() calling context
  sched: Make sched_class::set_cpus_allowed() unconditional
  sched: Fix a race between __kthread_bind() and sched_setaffinity()
  sched: Ensure a task has a non-normalized vruntime when returning back to CFS
  sched/numa: Fix NUMA_DIRECT topology identification
  tile: Reorganize _switch_to()
  sched, sparc32: Update scheduler comments in copy_thread()
  sched: Remove finish_arch_switch()
  sched, tile: Remove finish_arch_switch
  sched, sh: Fold finish_arch_switch() into switch_to()
  sched, score: Remove finish_arch_switch()
  sched, avr32: Remove finish_arch_switch()
  sched, MIPS: Get rid of finish_arch_switch()
  sched, arm: Remove finish_arch_switch()
  sched/fair: Clean up load average references
  sched/fair: Provide runnable_load_avg back to cfs_rq
  sched/fair: Remove task and group entity load when they are dead
  sched/fair: Init cfs_rq's sched_entity load average
  ...
2015-08-31 20:26:22 -07:00
Linus Torvalds
7073bc6612 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle are:

   - the combination of tree geometry-initialization simplifications and
     OS-jitter-reduction changes to expedited grace periods.  These two
     are stacked due to the large number of conflicts that would
     otherwise result.

   - privatize smp_mb__after_unlock_lock().

     This commit moves the definition of smp_mb__after_unlock_lock() to
     kernel/rcu/tree.h, in recognition of the fact that RCU is the only
     thing using this, that nothing else is likely to use it, and that
     it is likely to go away completely.

   - documentation updates.

   - torture-test updates.

   - misc fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  rcu,locking: Privatize smp_mb__after_unlock_lock()
  rcu: Silence lockdep false positive for expedited grace periods
  rcu: Don't disable CPU hotplug during OOM notifiers
  scripts: Make checkpatch.pl warn on expedited RCU grace periods
  rcu: Update MAINTAINERS entry
  rcu: Clarify CONFIG_RCU_EQS_DEBUG help text
  rcu: Fix backwards RCU_LOCKDEP_WARN() in synchronize_rcu_tasks()
  rcu: Rename rcu_lockdep_assert() to RCU_LOCKDEP_WARN()
  rcu: Make rcu_is_watching() really notrace
  cpu: Wait for RCU grace periods concurrently
  rcu: Create a synchronize_rcu_mult()
  rcu: Fix obsolete priority-boosting comment
  rcu: Use WRITE_ONCE in RCU_INIT_POINTER
  rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT
  rcu: Add RCU-sched flavors of get-state and cond-sync
  rcu: Add fastpath bypassing funnel locking
  rcu: Rename RCU_GP_DONE_FQS to RCU_GP_DOING_FQS
  rcu: Pull out wait_event*() condition into helper function
  documentation: Describe new expedited stall warnings
  rcu: Add stall warnings to synchronize_sched_expedited()
  ...
2015-08-31 18:12:07 -07:00
Linus Torvalds
d4c90396ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.3:

  API:

   - the AEAD interface transition is now complete.
   - add top-level skcipher interface.

  Drivers:

   - x86-64 acceleration for chacha20/poly1305.
   - add sunxi-ss Allwinner Security System crypto accelerator.
   - add RSA algorithm to qat driver.
   - add SRIOV support to qat driver.
   - add LS1021A support to caam.
   - add i.MX6 support to caam"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (163 commits)
  crypto: algif_aead - fix for multiple operations on AF_ALG sockets
  crypto: qat - enable legacy VFs
  MPI: Fix mpi_read_buffer
  crypto: qat - silence a static checker warning
  crypto: vmx - Fixing opcode issue
  crypto: caam - Use the preferred style for memory allocations
  crypto: caam - Propagate the real error code in caam_probe
  crypto: caam - Fix the error handling in caam_probe
  crypto: caam - fix writing to JQCR_MS when using service interface
  crypto: hash - Add AHASH_REQUEST_ON_STACK
  crypto: testmgr - Use new skcipher interface
  crypto: skcipher - Add top-level skcipher interface
  crypto: cmac - allow usage in FIPS mode
  crypto: sahara - Use dmam_alloc_coherent
  crypto: caam - Add support for LS1021A
  crypto: qat - Don't move data inside output buffer
  crypto: vmx - Fixing GHASH Key issue on little endian
  crypto: vmx - Fixing AES-CTR counter bug
  crypto: null - Add missing Kconfig tristate for NULL2
  crypto: nx - Add forward declaration for struct crypto_aead
  ...
2015-08-31 17:38:39 -07:00