Create sysfs interface to export data from H_BEST_ENERGY hcall
that can be used by administrative tools on supported pseries
platforms for energy management optimizations.
sys/device/system/cpu/pseries_(de)activate_hint_list and
sys/device/system/cpu/cpuN/pseries_(de)activate_hint will provide
hints for activation and deactivation of cpus respectively.
These hints are abstract number given by the hypervisor based
on the extended knowledge the hypervisor has regarding the
system topology and resource mappings.
The activate and the deactivate sysfs entry is for the two
distinct operations that we could do for energy savings. When
we have more capacity than required, we could deactivate few
core to save energy. The choice of the core to deactivate
will be based on /sys/devices/system/cpu/deactivate_hint_list.
The comma separated list of cpus (cores) will be the preferred
choice. If we have to activate some of the deactivated cores,
then /sys/devices/system/cpu/activate_hint_list will be used.
The per-cpu file
/sys/device/system/cpu/cpuN/pseries_(de)activate_hint further
provide more fine grain information by exporting the value of
the hint itself.
Added new driver module
arch/powerpc/platforms/pseries/pseries_energy.c
under new config option CONFIG_PSERIES_ENERGY
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These APIs take logical cpu number as input
Change cpu_first_thread_in_core() to cpu_first_thread_sibling()
Change cpu_last_thread_in_core() to cpu_last_thread_sibling()
These APIs convert core number (index) to logical cpu/thread numbers
Add cpu_first_thread_of_core(int core)
Changed cpu_thread_to_core() to cpu_core_index_of_thread(int cpu)
The goal is to make 'threads_per_core' accessible to the
pseries_energy module. Instead of making an API to read
threads_per_core, this is a higher level wrapper function to
convert from logical cpu number to core number.
The current APIs cpu_first_thread_in_core() and
cpu_last_thread_in_core() returns logical CPU number while
cpu_thread_to_core() returns core number or index which is
not a logical CPU number. The new APIs are now clearly named to
distinguish 'core number' versus first and last 'logical cpu
number' in that core.
The new APIs cpu_{first,last}_thread_sibling() work on
logical cpu numbers. While cpu_first_thread_of_core() and
cpu_core_index_of_thread() work on core index.
Example usage: (4 threads per core system)
cpu_first_thread_sibling(5) = 4
cpu_last_thread_sibling(5) = 7
cpu_core_index_of_thread(5) = 1
cpu_first_thread_of_core(1) = 4
cpu_core_index_of_thread() is used in cpu_to_drc_index() in the
module and cpu_first_thread_of_core() is used in
drc_index_to_cpu() in the module.
Make API changes to few callers. Export symbols for use in modules.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This introduces a pair of kernel parameters that can be used to disable
the MULTITCE and BULK_REMOVE h-calls.
By default, those hcalls are enabled, active, and good for throughput
and performance. The ability to disable them will be useful for some of
the PREEMPT_RT related investigation and work occurring on Power.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
cc: Olof Johansson <olof@lixom.net>
cc: Anton Blanchard <anton@samba.org>
cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The __KERNEL__ ifdef isn't necessary at this point, because it is
checked in an outer ifdef level already and has no effect here.
Signed-off-by: Christian Dietrich <qy03fugy@stud.informatik.uni-erlangen.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The crashkernel region will almost always overlap RTAS. If we free the
crashkernel region via "echo 0 > /sys/kernel/kexec_crash_size" then we will
free RTAS and the machine will crash in confusing and exciting ways.
Override crash_free_reserved_phys_range and check for overlap with RTAS.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
POWER5 added popcntb, and POWER7 added popcntw and popcntd. As a first step
this patch does all the work out of line, but it would be nice to implement
them as inlines with an out of line fallback.
The performance issue with hweight was noticed when disabling SMT on a large
(192 thread) POWER7 box. The patch improves that testcase by about 8%.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The perf hardware pmu got initialized at various points in the boot,
some before early_initcall() some after (notably arch_initcall).
The problem is that the NMI lockup detector is ran from early_initcall()
and expects the hardware pmu to be present.
Sanitize this by moving all architecture hardware pmu implementations to
initialize at early_initcall() and move the lockup detector to an explicit
initcall right after that.
Cc: paulus <paulus@samba.org>
Cc: davem <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290707759.2145.119.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb,ppc: Fix regression in evr register handling
kgdb,x86: fix regression in detach handling
kdb: fix crash when KDB_BASE_CMD_MAX is exceeded
kdb: fix memory leak in kdb_main.c
The commit 5e3d20a remove bkl from startup code so setup_arch() it isn't called
with bkl held anymore. Update the comment on top of that function.
Fix also a typo.
This work was supported by a hardware donation from the CE Linux Forum.
Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We were seeing oops like the following when we did an rmmod on a module:
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x8000000000008010
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2 P5020 DS
last sysfs file: /sys/devices/qman-portals.2/qman-pool.9/uevent
Modules linked in: qman_tester(-)
NIP: 8000000000008010 LR: c000000000074858 CTR: 8000000000008010
REGS: c00000002e29bab0 TRAP: 0400 Not tainted
(2.6.34.6-00744-g2d21f14)
MSR: 0000000080029000 <EE,ME,CE> CR: 24000448 XER: 00000000
TASK = c00000007a8be600[4987] 'rmmod' THREAD: c00000002e298000 CPU: 1
GPR00: 8000000000008010 c00000002e29bd30 8000000000012798 c00000000035fb28
GPR04: 0000000000000002 0000000000000002 0000000024022428 c000000000009108
GPR08: fffffffffffffffe 800000000000a618 c0000000003c13c8 0000000000000000
GPR12: 0000000022000444 c00000000fffed00 0000000000000000 0000000000000000
GPR16: 00000000100c0000 0000000000000000 00000000100dabc8 0000000010099688
GPR20: 0000000000000000 00000000100cfc28 0000000000000000 0000000010011a44
GPR24: 00000000100017b2 0000000000000000 0000000000000000 0000000000000880
GPR28: c00000000035fb28 800000000000a7b8 c000000000376d80 c0000000003cce50
NIP [8000000000008010] .test_exit+0x0/0x10 [qman_tester]
LR [c000000000074858] .SyS_delete_module+0x1f8/0x2f0
Call Trace:
[c00000002e29bd30] [c0000000000748b4] .SyS_delete_module+0x254/0x2f0 (unreliable)
[c00000002e29be30] [c000000000000580] syscall_exit+0x0/0x2c
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
38600000 4e800020 60000000 60000000 <4e800020> 60000000 60000000 60000000
---[ end trace 4f57124939a84dc8 ]---
This appears to be due to checking the wrong permission bits in the
instruction_tlb_miss handling if the address that faulted was in vmalloc
space. We need to look at the supervisor execute (_PAGE_BAP_SX) bit and
not the user bit (_PAGE_BAP_UX/_PAGE_EXEC).
Also removed a branch level since it did not appear to be used.
Reported-by: Jeffrey Ladouceur <Jeffrey.Ladouceur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In:
powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT
commit d28513bc7f
Author: David Gibson <david@gibson.dropbear.id.au>
subpage_protection() was changed to to take an mm rather a pgdir but it
didn't change calling site in hashpage_preload(). The change wasn't
noticed at compile time since hashpage_preload() used a void* as the
parameter to subpage_protection().
This is obviously wrong and can trigger the following crash when
CONFIG_SLAB, CONFIG_DEBUG_SLAB, CONFIG_PPC_64K_PAGES
CONFIG_PPC_SUBPAGE_PROT are enabled.
Freeing unused kernel memory: 704k freed
Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6c49b7
Faulting instruction address: 0xc0000000000410f4
cpu 0x2: Vector: 300 (Data Access) at [c00000004233f590]
pc: c0000000000410f4: .hash_preload+0x258/0x338
lr: c000000000041054: .hash_preload+0x1b8/0x338
sp: c00000004233f810
msr: 8000000000009032
dar: 6b6b6b6b6b6c49b7
dsisr: 40000000
current = 0xc00000007e2c0070
paca = 0xc000000007fe0500
pid = 1, comm = init
enter ? for help
[c00000004233f810] c000000000041020 .hash_preload+0x184/0x338 (unreliable)
[c00000004233f8f0] c00000000003ed98 .update_mmu_cache+0xb0/0xd0
[c00000004233f990] c000000000157754 .__do_fault+0x48c/0x5dc
[c00000004233faa0] c000000000158fd0 .handle_mm_fault+0x508/0xa8c
[c00000004233fb90] c0000000006acdd4 .do_page_fault+0x428/0x6ac
[c00000004233fe30] c000000000005260 handle_page_fault+0x20/0x74
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
commit ffe8018c34 of the -mm tree
fixes the initramfs size calculation for e.g. s390 but breaks it
for 32bit architectures which do not define CONFIG_32BIT.
This patch fix the problem for PPC32 which will elsewise end up
with a __initramfs_size of 0.
Signed-off-by: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
arch/powerpc/mm/tlb_nohash.c: In function 'setup_initial_memory_limit':
arch/powerpc/mm/tlb_nohash.c:588:29: error: 'ppc64_memblock_base' undeclared (first use in this function)
arch/powerpc/mm/tlb_nohash.c:588:29: note: each undeclared identifier is reported only once for each function it appears in
Due to a copy/paste typo with the following commit:
commit cd3db0c4ca
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Tue Jul 6 15:39:02 2010 -0700
memblock: Remove rmo_size, burry it in arch/powerpc where it belongs
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
EEH and pci_dlpar #undef DEBUG, but I think they were added before the
ability to control this from Kconfig. It's really annoying to only get
some of the debug messages from these files. Leave the lpar.c #undef
alone as it produces so much output as to make the kernel unusable.
Update the Kconfig text to indicate this particular quirk :)
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The code is missing a fix that went into the main kernel variant
(we should try to share that code again at some stage)
Reported-by: Albert Cahalan <acahalan@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit ff10b88b5a (kgdb,ppc: Individual
register get/set for ppc) introduced a problem where memcpy was used
incorrectly to read and write the evr registers with a kernel that
has:
CONFIG_FSL_BOOKE=y
CONFIG_SPE=y
CONFIG_KGDB=y
This patch also fixes the following compilation problems:
arch/powerpc/kernel/kgdb.c: In function 'dbg_get_reg':
arch/powerpc/kernel/kgdb.c:341: error: passing argument 2 of 'memcpy' makes pointer from integer without a cast
arch/powerpc/kernel/kgdb.c: In function 'dbg_set_reg':
arch/powerpc/kernel/kgdb.c:366: error: passing argument 1 of 'memcpy' makes pointer from integer without a cast
[jason.wessel@windriver.com: Remove void * casts and fix patch header]
Reported-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
CC: linuxppc-dev@lists.ozlabs.org
The big kernel lock has been removed from all these files at some point,
leaving only the #include.
Remove this too as a cleanup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While at it, fix two checkpatch errors.
Several non-const struct instances constified by this patch were added after
the introduction of platform_suspend_ops in checkpatch.pl's list of "should
be const" structs (79404849e9).
Patch against mainline.
Inspired by hunks of the grsecurity patch, updated for newer kernels.
Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
This was preventing the guest from setting any bits in the
hardware MSR which aren't forced on, such as MSR[SPE].
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
It is not legal to call mutex_lock() with interrupts disabled.
This will assert with debug checks enabled.
If there's a real need to disable interrupts here, it could be done
after the mutex is acquired -- but I don't see why it's needed at all.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The VCPU uninit calls some TLB functions, and the TLB uninit function
frees the memory used by them.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Fix an unresolved symbol with CONFIG_KVM_GUEST plus CONFIG_RELOCATABLE on
Book E.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Structure kvm_ppc_pvinfo is copied to userland with flags and
pad fields unitialized. It leads to leaking of contents of
kernel stack memory.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Unnecessary cast from void* in assignment.
Signed-off-by: matt mooney <mfm@muteddisk.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
There are two identical implementations of of_get_mac_address(), one
each in arch/powerpc/kernel/prom_parse.c and
arch/microblaze/kernel/prom_parse.c. Move this function to a new
common file of_net.{c,h} and adjust all the callers to include the new
header.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
[grant.likely@secretlab.ca: protect header with #ifdef]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* git://git.infradead.org/mtd-2.6: (82 commits)
mtd: fix build error in m25p80.c
mtd: Remove redundant mutex from mtd_blkdevs.c
MTD: Fix wrong check register_blkdev return value
Revert "mtd: cleanup Kconfig dependencies"
mtd: cfi_cmdset_0002: make sector erase command variable
mtd: cfi_cmdset_0002: add CFI detection for SST 38VF640x chips
mtd: cfi_util: add support for switching SST 39VF640xB chips into QRY mode
mtd: cfi_cmdset_0001: use defined value of P_ID_INTEL_PERFORMANCE instead of hardcoded one
block2mtd: dubious assignment
P4080/mtd: Fix the freescale lbc issue with 36bit mode
P4080/eLBC: Make Freescale elbc interrupt common to elbc devices
mtd: phram: use KBUILD_MODNAME
mtd: OneNAND: S5PC110: Fix double call suspend & resume function
mtd: nand: fix MTD_MODE_RAW writes
jffs2: use kmemdup
mtd: sm_ftl: cosmetic, use bool when possible
mtd: r852: remove useless pci powerup/down from suspend/resume routines
mtd: blktrans: fix a race vs kthread_stop
mtd: blktrans: kill BKL
mtd: allow to unload the mtdtrans module if its block devices aren't open
...
Fix up trivial whitespace-introduced conflict in drivers/mtd/mtdchar.c
Conflicts:
drivers/mtd/mtd_blkdevs.c
Merge Grant's device-tree bits so that we can apply the subsequent fixes.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
commit 534af1082329392bc29f6badf815e69ae2ae0f4c(kgdb,kdb: individual
register set and and get API) introduce dbg_get_reg/dbg_set_reg API
for individual register get and set.
This patch implement those APIs for ppc.
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (38 commits)
kbuild: convert `arch/tile' to the kconfig mainmenu upgrade
README: cite nconfig
Revert "kconfig: Temporarily disable dependency warnings"
kconfig: Use PATH_MAX instead of 128 for path buffer sizes.
kconfig: Fix realloc usage()
kconfig: Propagate const
kconfig: Don't go out from read config loop when you read new symbol
kconfig: fix menuconfig on debian lenny
kbuild: migrate all arch to the kconfig mainmenu upgrade
kconfig: expand file names
kconfig: use the file's name of sourced file
kconfig: constify file name
kconfig: don't emit warning upon rootmenu's prompt redefinition
kconfig: replace KERNELVERSION usage by the mainmenu's prompt
kconfig: delay gconf window initialization
kconfig: expand by default the rootmenu's prompt
kconfig: add a symbol string expansion helper
kconfig: regen parser
kconfig: implement the `mainmenu' directive
kconfig: allow PACKAGE to be defined on the compiler's command-line
...
Fix up trivial conflict in arch/mn10300/Kconfig
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (48 commits)
DMAENGINE: move COH901318 to arch_initcall
dma: imx-dma: fix signedness bug
dma/timberdale: simplify conditional
ste_dma40: remove channel_type
ste_dma40: remove enum for endianess
ste_dma40: remove TIM_FOR_LINK option
ste_dma40: move mode_opt to separate config
ste_dma40: move channel mode to a separate field
ste_dma40: move priority to separate field
ste_dma40: add variable to indicate valid dma_cfg
async_tx: make async_tx channel switching opt-in
move async raid6 test to lib/Kconfig.debug
dmaengine: Add Freescale i.MX1/21/27 DMA driver
intel_mid_dma: change the slave interface
intel_mid_dma: fix the WARN_ONs
intel_mid_dma: Add sg list support to DMA driver
intel_mid_dma: Allow DMAC2 to share interrupt
intel_mid_dma: Allow IRQ sharing
intel_mid_dma: Add runtime PM support
DMAENGINE: define a dummy filter function for ste_dma40
...
The taskstats interface uses microsecond granularity for the user and
system time values. The conversion from cputime to the taskstats values
uses the cputime_to_msecs primitive which effectively limits the
granularity to milliseconds. Add the cputime_to_usecs primitive for
architectures that have better, more precise CPU time values. Remove
cputime_to_msecs primitive because there are no more users left.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Luck Tony <tony.luck@intel.com>
Cc: Shailabh Nagar <nagar1234@in.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Shailabh Nagar <nagar@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Detects RIO link to the already enumerated device and properly sets links
between device objects. Changes to the enumeration/discovery logic:
1. Use Master Enable bit to signal end of the enumeration - agents may
start their discovery process as soon as they see this bit set
(Component Tag register was used before for this purpose).
2. Enumerator sets Component Tag (!= 0) immediately during device
setup. This allows to identify the device if the redundant route
exists in a RIO system.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Rearrange RIO port-write interrupt handling to perform message
buffering as soon as possible.
- Modify to disable port-write controller when clearing Transaction
Error (TE) bit.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use new 'datavp' and 'datalp' variables in order to remove unnecessary
castings.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix up the arguments to arch_ptrace() to take account of the fact that
@addr and @data are now unsigned long rather than long as of a preceding
patch in this series.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph reported a nice splat which illustrated a race in the new stack
based kmap_atomic implementation.
The problem is that we pop our stack slot before we're completely done
resetting its state -- in particular clearing the PTE (sometimes that's
CONFIG_DEBUG_HIGHMEM). If an interrupt happens before we actually clear
the PTE used for the last slot, that interrupt can reuse the slot in a
dirty state, which triggers a BUG in kmap_atomic().
Fix this by introducing kmap_atomic_idx() which reports the current slot
index without actually releasing it and use that to find the PTE and delay
the _pop() until after we're completely done.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Christoph Hellwig <hch@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the new {max,min}3 macros to save some cycles and bytes on the stack.
This patch substitutes trivial nested macros with their counterpart.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since we no longer need to provide KM_type, the whole pte_*map_nested()
API is now redundant, remove it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keep the current interface but ignore the KM_type and use a stack based
approach.
The advantage is that we get rid of crappy code like:
#define __KM_PTE \
(in_nmi() ? KM_NMI_PTE : \
in_irq() ? KM_IRQ_PTE : \
KM_PTE0)
and in general can stop worrying about what context we're in and what kmap
slots might be appropriate for that.
The downside is that FRV kmap_atomic() gets more expensive.
For now we use a CPP trick suggested by Andrew:
#define kmap_atomic(page, args...) __kmap_atomic(page)
to avoid having to touch all kmap_atomic() users in a single patch.
[ not compiled on:
- mn10300: the arch doesn't actually build with highmem to begin with ]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c]
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (365 commits)
ALSA: hda - Disable sticky PCM stream assignment for AD codecs
ALSA: usb - Creative USB X-Fi volume knob support
ALSA: ca0106: Use card specific dac id for mute controls.
ALSA: ca0106: Allow different sound cards to use different SPI channel mappings.
ALSA: ca0106: Create a nice spot for mapping channels to dacs.
ALSA: ca0106: Move enabling of front dac out of hardcoded setup sequence.
ALSA: ca0106: Pull out dac powering routine into separate function.
ALSA: ca0106 - add Sound Blaster 5.1vx info.
ASoC: tlv320dac33: Use usleep_range for delays
ALSA: usb-audio: add Novation Launchpad support
ALSA: hda - Add workarounds for CT-IBG controllers
ALSA: hda - Fix wrong TLV mute bit for STAC/IDT codecs
ASoC: tpa6130a2: Error handling for broken chip
ASoC: max98088: Staticise m98088_eq_band
ASoC: soc-core: Fix codec->name memory leak
ALSA: hda - Apply ideapad quirk to Acer laptops with Cxt5066
ALSA: hda - Add some workarounds for Creative IBG
ALSA: hda - Fix wrong SPDIF NID assignment for CA0110
ALSA: hda - Fix codec rename rules for ALC662-compatible codecs
ALSA: hda - Add alc_init_jacks() call to other codecs
...
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6:
mtd/m25p80: add support to parse the partitions by OF node
of/irq: of_irq.c needs to include linux/irq.h
of/mips: Cleanup some include directives/files.
of/mips: Add device tree support to MIPS
of/flattree: Eliminate need to provide early_init_dt_scan_chosen_arch
of/device: Rework to use common platform_device_alloc() for allocating devices
of/xsysace: Fix OF probing on little-endian systems
of: use __be32 types for big-endian device tree data
of/irq: remove references to NO_IRQ in drivers/of/platform.c
of/promtree: add package-to-path support to pdt
of/promtree: add of_pdt namespace to pdt code
of/promtree: no longer call prom_ functions directly; use an ops structure
of/promtree: make drivers/of/pdt.c no longer sparc-only
sparc: break out some PROM device-tree building code out into drivers/of
of/sparc: convert various prom_* functions to use phandle
sparc: stop exporting openprom.h header
powerpc, of_serial: Endianness issues setting up the serial ports
of: MTD: Fix OF probing on little-endian systems
of: GPIO: Fix OF probing on little-endian systems
When system uses 36bit physical address, res.start is 36bit
physical address. But the function of in_be32 returns 32bit
physical address. Then both of them compared each other is
wrong. So by converting the address of res.start into
the right format fixes this issue.
Signed-off-by: Lan Chunhe-B25806 <b25806@freescale.com>
Signed-off-by: Roy Zang <tie-fei.zang@freescale.com>
Reviewed-by: Anton Vorontsov <cbouatmailru@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Move Freescale elbc interrupt from nand driver to elbc driver.
Then all elbc devices can use the interrupt instead of ONLY nand.
For former nand driver, it had the two functions:
1. detecting nand flash partitions;
2. registering elbc interrupt.
Now, second function is removed to fsl_lbc.c.
Signed-off-by: Lan Chunhe-B25806 <b25806@freescale.com>
Signed-off-by: Roy Zang <tie-fei.zang@freescale.com>
Reviewed-by: Anton Vorontsov <cbouatmailru@gmail.com>
Cc: Wood Scott-B07421 <B07421@freescale.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits)
KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages
KVM: Fix signature of kvm_iommu_map_pages stub
KVM: MCE: Send SRAR SIGBUS directly
KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED
KVM: fix typo in copyright notice
KVM: Disable interrupts around get_kernel_ns()
KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address
KVM: MMU: move access code parsing to FNAME(walk_addr) function
KVM: MMU: audit: check whether have unsync sps after root sync
KVM: MMU: audit: introduce audit_printk to cleanup audit code
KVM: MMU: audit: unregister audit tracepoints before module unloaded
KVM: MMU: audit: fix vcpu's spte walking
KVM: MMU: set access bit for direct mapping
KVM: MMU: cleanup for error mask set while walk guest page table
KVM: MMU: update 'root_hpa' out of loop in PAE shadow path
KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()
KVM: x86: Fix constant type in kvm_get_time_scale
KVM: VMX: Add AX to list of registers clobbered by guest switch
KVM guest: Move a printk that's using the clock before it's ready
KVM: x86: TSC catchup mode
...
We have to protect the include for linux/of.h by __KERNEL__ so it doesn't
accidently get referenced outside.
This patch fixes this and makes the tree compile again.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The e500_tlb.c file didn't compile for me due to the following error:
arch/powerpc/kvm/e500_tlb.c: In function ‘kvmppc_e500_shadow_map’:
arch/powerpc/kvm/e500_tlb.c:300: error: format ‘%lx’ expects type ‘long unsigned int’, but argument 2 has type ‘gfn_t’
So let's explicitly cast the argument to make printk happy.
Signed-off-by: Alexander Graf <agraf@suse.de>
The kvmppc_e500_stlbe_invalidate() function was trying to pass too many
parameters to trace_kvm_stlb_inval(). This appears to be a bad
copy-paste from a call to trace_kvm_stlb_write().
Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
BookE also wants to support level based interrupts, so let's implement
all the necessary logic there. We need to trick a bit here because the
irqprios are 1:1 assigned to architecture defined values. But since there
is some space left there, we can just pick a random one and move it later
on - it's internal anyways.
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that we have all the level interrupt magic in place, let's
expose the capability to user space, so it can make use of it!
Signed-off-by: Alexander Graf <agraf@suse.de>
The current interrupt logic is just completely broken. We get a notification
from user space, telling us that an interrupt is there. But then user space
expects us that we just acknowledge an interrupt once we deliver it to the
guest.
This is not how real hardware works though. On real hardware, the interrupt
controller pulls the external interrupt line until it gets notified that the
interrupt was received.
So in reality we have two events: pulling and letting go of the interrupt line.
To maintain backwards compatibility, I added a new request for the pulling
part. The letting go part was implemented earlier already.
With this in place, we can now finally start guests that do not randomly stall
and stop to work at random times.
This patch implements above logic for Book3S.
Signed-off-by: Alexander Graf <agraf@suse.de>
Before I incorrectly enabled napping also for BookE, which would result in
needless dcache flushes. Since we only need to force enable napping on
Book3s_64 because it doesn't go into MSR_POW otherwise, we can just #ifdef
that code to this particular platform.
Reported-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Match only the first part of cur_cpu_spec->platform.
440GP (the first 440 processor) is identified by the string "ppc440gp", while
all later 440 processors use simply "ppc440".
Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Developers can now tell at a glace the exact type of the premature interrupt,
instead of just knowing that there was some premature interrupt.
Signed-off-by: Hollis Blanchard <hollis_blanchard@mentor.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
On Book3S a mtmsr with the MSR_POW bit set indicates that the OS is in
idle and only needs to be waked up on the next interrupt.
Now, unfortunately we let that bit slip into the stored MSR value which
is not what the real CPU does, so that we ended up executing code like
this:
r = mfmsr();
/* r containts MSR_POW */
mtmsr(r | MSR_EE);
This obviously breaks, as we're going into idle mode in code sections that
don't expect to be idling.
This patch masks MSR_POW out of the stored MSR value on wakeup, making
guests happy again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Up until now we were doing segment mappings wrong on Book3s_32. For Book3s_64
we were using a trick where we know that a single mmu_context gives us 16 bits
of context ids.
The mm system on Book3s_32 instead uses a clever algorithm to distribute VSIDs
across the available range, so a context id really only gives us 16 available
VSIDs.
To keep at least a few guest processes in the SID shadow, let's map a number of
contexts that we can use as VSID pool. This makes the code be actually correct
and shouldn't hurt performance too much.
Signed-off-by: Alexander Graf <agraf@suse.de>
There are some heuristics in the PPC power management code that try to find
out if the particular hardware we're running on supports proper power management
or just hangs the machine when going into nap mode.
Since we know that KVM is safe with nap, let's force enable it in the PV code
once we're certain that we are on a KVM VM.
Signed-off-by: Alexander Graf <agraf@suse.de>
We had an arbitrary limitation in mtmsrd L=1 that kept us from using r30 and
r31 as input registers. Let's get rid of that and get more potential speedups!
Signed-off-by: Alexander Graf <agraf@suse.de>
When having a decrementor interrupt pending, the dequeuing happens manually
through an mtdec instruction. This instruction simply calls dequeue on that
interrupt, so the int_pending hint doesn't get updated.
This patch enables updating the int_pending hint also on dequeue, thus
correctly enabling guests to stay in guest contexts more often.
Signed-off-by: Alexander Graf <agraf@suse.de>
So far we've been restricting ourselves to r0-r29 as registers an mtmsr
instruction could use. This was bad, as there are some code paths in
Linux actually using r30.
So let's instead handle all registers gracefully and get rid of that
stupid limitation
Signed-off-by: Alexander Graf <agraf@suse.de>
This is the guest side of the mtsr acceleration. Using this a guest can now
call mtsrin with almost no overhead as long as it ensures that it only uses
it with (MSR_IR|MSR_DR) == 0. Linux does that, so we're good.
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that the actual mtsr doesn't do anything anymore, we can move the sr
contents over to the shared page, so a guest can directly read and write
its sr contents from guest context.
Signed-off-by: Alexander Graf <agraf@suse.de>
Right now we're examining the contents of Book3s_32's segment registers when
the register is written and put the interpreted contents into a struct.
There are two reasons this is bad. For starters, the struct has worse real-time
performance, as it occupies more ram. But the more important part is that with
segment registers being interpreted from their raw values, we can put them in
the shared page, allowing guests to mess with them directly.
This patch makes the internal representation of SRs be u32s.
Signed-off-by: Alexander Graf <agraf@suse.de>
The current approach duplicates the spr->bat finding logic and makes it harder
to reuse the actually used variables. So let's move everything down to the spr
handler.
Signed-off-by: Alexander Graf <agraf@suse.de>
We will soon add SR PV support to the shared page, so we need some
infrastructure that allows the guest to query for features KVM exports.
This patch adds a second return value to the magic mapping that
indicated to the guest which features are available.
Signed-off-by: Alexander Graf <agraf@suse.de>
It turns out the in-kernel hash function is sub-optimal for our subtle
hash inputs where every bit is significant. So let's revert to the original
hash functions.
This reverts commit 05340ab4f9a6626f7a2e8f9fe5397c61d494f445.
Signed-off-by: Alexander Graf <agraf@suse.de>
There is a race condition in the pte invalidation code path where we can't
be sure if a pte was invalidated already. So let's move the spin lock around
to get rid of the race.
Signed-off-by: Alexander Graf <agraf@suse.de>
When hitting a no-execute or read-only data/inst storage interrupt we were
flushing the respective PTE so we're sure it gets properly overwritten next.
According to the spec, this is unnecessary though. The guest issues a tlbie
anyways, so we're safe to just keep the PTE around and have it manually removed
from the guest, saving us a flush.
Signed-off-by: Alexander Graf <agraf@suse.de>
When the guest jumps into kernel mode and has the magic page mapped, theres a
very high chance that it will also use it. So let's detect that scenario and
map the segment accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>
The different ways of flusing shadow ptes have their own debug prints which use
stupid old printk.
Let's move them to tracepoints, making them easier available, faster and
possible to activate on demand
Signed-off-by: Alexander Graf <agraf@suse.de>
After a flush the sid map contained lots of entries with 0 for their gvsid and
hvsid value. Unfortunately, 0 can be a real value the guest searches for when
looking up a vsid so it would incorrectly find the host's 0 hvsid mapping which
doesn't belong to our sid space.
So let's also check for the valid bit that indicated that the sid we're
looking at actually contains useful data.
Signed-off-by: Alexander Graf <agraf@suse.de>
We have a debug printk on every exit that is usually #ifdef'ed out. Using
tracepoints makes a lot more sense here though, as they can be dynamically
enabled.
This patch converts the most commonly used debug printks of EXIT_DEBUG to
tracepoints.
Signed-off-by: Alexander Graf <agraf@suse.de>
When CONFIG_KVM_GUEST is selected, but CONFIG_KVM is not, we were missing
some defines in asm-offsets.c and included too many headers at other places.
This patch makes above configuration work.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Add kvm_release_page_clean() after is_error_page() to avoid
leakage of error page.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When using a relocatable kernel we need to make sure that the trampline code
and the interrupt handlers are both copied to low memory. The only way to do
this reliably is to put them in the copied section.
This patch should make relocated kernels work with KVM.
KVM-Stable-Tag
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On Book3S KVM we directly expose some asm pointers to C code as
variables. These need to be relocated and thus break on relocatable
kernels.
To make sure we can at least build, let's mark them as long instead
of u32 where 64bit relocations don't work.
This fixes the following build error:
WARNING: 2 bad relocations^M
> c000000000008590 R_PPC64_ADDR32 .text+0x4000000000008460^M
> c000000000008594 R_PPC64_ADDR32 .text+0x4000000000008598^M
Please keep in mind that actually using KVM on a relocated kernel
might still break. This only fixes the compile problem.
Reported-by: Subrata Modak <subrata@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Book3S_32 requires MSR_DR to be disabled during load_up_xxx while on Book3S_64
it's supposed to be enabled. I misread the code and disabled it in both cases,
potentially breaking the PS3 which has a really small RMA.
This patch makes KVM work on the PS3 again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On Book3s_32 the tlbie instruction flushed effective addresses by the mask
0x0ffff000. This is pretty hard to reflect with a hash that hashes ~0xfff, so
to speed up that target we should also keep a special hash around for it.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On failure gfn_to_pfn returns bad_page so use correct function to check
for that.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
So far we've been running all code without locking of any sort. This wasn't
really an issue because I didn't see any parallel access to the shadow MMU
code coming.
But then I started to implement dirty bitmapping to MOL which has the video
code in its own thread, so suddenly we had the dirty bitmap code run in
parallel to the shadow mmu code. And with that came trouble.
So I went ahead and made the MMU modifying functions as parallelizable as
I could think of. I hope I didn't screw up too much RCU logic :-). If you
know your way around RCU and locking and what needs to be done when, please
take a look at this patch.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Due to previous changes, the Book3S_32 guest MMU code didn't compile properly
when enabling debugging.
This patch repairs the broken code paths, making it possible to define DEBUG_MMU
and friends again.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We need to tell the guest the opcodes that make up a hypercall through
interfaces that are controlled by userspace. So we need to add a call
for userspace to allow it to query those opcodes so it can pass them
on.
This is required because the hypercall opcodes can change based on
the hypervisor conditions. If we're running in hardware accelerated
hypervisor mode, a hypercall looks different from when we're running
without hardware acceleration.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On BookE the preferred way to write the EE bit is the wrteei instruction. It
already encodes the EE bit in the instruction.
So in order to get BookE some speedups as well, let's also PV'nize thati
instruction.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
There is also a form of mtmsr where all bits need to be addressed. While the
PPC64 Linux kernel behaves resonably well here, on PPC32 we do not have an
L=1 form. It does mtmsr even for simple things like only changing EE.
So we need to hook into that one as well and check for a mask of bits that we
deem safe to change from within guest context.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The PowerPC ISA has a special instruction for mtmsr that only changes the EE
and RI bits, namely the L=1 form.
Since that one is reasonably often occuring and simple to implement, let's
go with this first. Writing EE=0 is always just a store. Doing EE=1 also
requires us to check for pending interrupts and if necessary exit back to the
hypervisor.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When we hook an instruction we need to make sure we don't clobber any of
the registers at that point. So we write them out to scratch space in the
magic page. To make sure we don't fall into a race with another piece of
hooked code, we need to disable interrupts.
To make the later patches and code in general easier readable, let's introduce
a set of defines that save and restore r30, r31 and cr. Let's also define some
helpers to read the lower 32 bits of a 64 bit field on 32 bit systems.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will need to patch several instruction streams over to a different
code path, so we need a way to patch a single instruction with a branch
somewhere else.
This patch adds a helper to facilitate this patching.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will soon require more sophisticated methods to replace single instructions
with multiple instructions. We do that by branching to a memory region where we
write replacement code for the instruction to.
This region needs to be within 32 MB of the patched instruction though, because
that's the furthest we can jump with immediate branches.
So we keep 1MB of free space around in bss. After we're done initing we can just
tell the mm system that the unused pages are free, but until then we have enough
space to fit all our code in.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
With our current MMU scheme we don't need to know about the tlbsync instruction.
So we can just nop it out.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Some instructions can simply be replaced by load and store instructions to
or from the magic page.
This patch replaces often called instructions that fall into the above category.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will soon start and replace instructions from the text section with
other, paravirtualized versions. To ease the readability of those patches
I split out the generic looping and magic page mapping code out.
This patch still only contains stubs. But at least it loops through the
text section :).
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We have all the hypervisor pieces in place now, but the guest parts are still
missing.
This patch implements basic awareness of KVM when running Linux as guest. It
doesn't do anything with it yet though.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
Now that we have the shared page in place and the MMU code knows about
the magic page, we can expose that capability to the guest!
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We need to override EA as well as PA lookups for the magic page. When the guest
tells us to project it, the magic page overrides any guest mappings.
In order to reflect that, we need to hook into all the MMU layers of KVM to
force map the magic page if necessary.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
We will be introducing a method to project the shared page in guest context.
As soon as we're talking about this coupling, the shared page is colled magic
page.
This patch introduces simple defines, so the follow-up patches are easier to
read.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
On PowerPC it's very normal to not support all of the physical RAM in real mode.
To check if we're matching on the shared page or not, we need to know the limits
so we can restrain ourselves to that range.
So let's make it a define instead of open-coding it. And while at it, let's also
increase it.
Signed-off-by: Alexander Graf <agraf@suse.de>
v2 -> v3:
- RMO -> PAM (non-magic page)
Signed-off-by: Avi Kivity <avi@redhat.com>
When the guest turns on interrupts again, it needs to know if we have an
interrupt pending for it. Because if so, it should rather get out of guest
context and get the interrupt.
So we introduce a new field in the shared page that we use to tell the guest
that there's a pending interrupt lying around.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
While running in hooked code we need to store register contents out because
we must not clobber any registers.
So let's add some fields to the shared page we can just happily write to.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When running in hooked code we need a way to disable interrupts without
clobbering any interrupts or exiting out to the hypervisor.
To achieve this, we have an additional critical field in the shared page. If
that field is equal to the r1 register of the guest, it tells the hypervisor
that we're in such a critical section and thus may not receive any interrupts.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
To communicate with KVM directly we need to plumb some sort of interface
between the guest and KVM. Usually those interfaces use hypercalls.
This hypercall implementation is described in the last patch of the series
in a special documentation file. Please read that for further information.
This patch implements stubs to handle KVM PPC hypercalls on the host and
guest side alike.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
When in kernel mode there are 4 additional registers available that are
simple data storage. Instead of exiting to the hypervisor to read and
write those, we can just share them with the guest using the page.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The SRR0 and SRR1 registers contain cached values of the PC and MSR
respectively. They get written to by the hypervisor when an interrupt
occurs or directly by the kernel. They are also used to tell the rfi(d)
instruction where to jump to.
Because it only gets touched on defined events that, it's very simple to
share with the guest. Hypervisor and guest both have full r/w access.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The DAR register contains the address a data page fault occured at. This
register behaves pretty much like a simple data storage register that gets
written to on data faults. There is no hypervisor interaction required on
read or write.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
The DSISR register contains information about a data page fault. It is fully
read/write from inside the guest context and we don't need to worry about
interacting based on writes of this register.
This patch converts all users of the current field to the shared page.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
One of the most obvious registers to share with the guest directly is the
MSR. The MSR contains the "interrupts enabled" flag which the guest has to
toggle in critical sections.
So in order to bring the overhead of interrupt en- and disabling down, let's
put msr into the shared page. Keep in mind that even though you can fully read
its contents, writing to it doesn't always update all state. There are a few
safe fields that don't require hypervisor interaction. See the documentation
for a list of MSR bits that are safe to be set from inside the guest.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
For transparent variable sharing between the hypervisor and guest, I introduce
a shared page. This shared page will contain all the registers the guest can
read and write safely without exiting guest context.
This patch only implements the stubs required for the basic structure of the
shared page. The actual register moving follows.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (141 commits)
USB: mct_u232: fix broken close
USB: gadget: amd5536udc.c: fix error path
USB: imx21-hcd - fix off by one resource size calculation
usb: gadget: fix Kconfig warning
usb: r8a66597-udc: Add processing when USB was removed.
mxc_udc: add workaround for ENGcm09152 for i.MX35
USB: ftdi_sio: add device ids for ScienceScope
USB: musb: AM35x: Workaround for fifo read issue
USB: musb: add musb support for AM35x
USB: AM35x: Add musb support
usb: Fix linker errors with CONFIG_PM=n
USB: ohci-sh - use resource_size instead of defining its own resource_len macro
USB: isp1362-hcd - use resource_size instead of defining its own resource_len macro
USB: isp116x-hcd - use resource_size instead of defining its own resource_len macro
USB: xhci: Fix compile error when CONFIG_PM=n
USB: accept some invalid ep0-maxpacket values
USB: xHCI: PCI power management implementation
USB: xHCI: bus power management implementation
USB: xHCI: port remote wakeup implementation
USB: xHCI: port power management implementation
...
Manually fix up (non-data) conflict: the SCSI merge gad renamed the
'hw_sector_size' member to 'physical_block_size', and the USB tree
brought a new use of it.
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
vfs: make no_llseek the default
vfs: don't use BKL in default_llseek
llseek: automatically add .llseek fop
libfs: use generic_file_llseek for simple_attr
mac80211: disallow seeks in minstrel debug code
lirc: make chardev nonseekable
viotape: use noop_llseek
raw: use explicit llseek file operations
ibmasmfs: use generic_file_llseek
spufs: use llseek in all file operations
arm/omap: use generic_file_llseek in iommu_debug
lkdtm: use generic_file_llseek in debugfs
net/wireless: use generic_file_llseek in debugfs
drm: use noop_llseek
Replace FSL USB platform code by simple platform driver for
creation of FSL USB platform devices.
The driver creates platform devices based on the information
from USB nodes in the flat device tree. This is the replacement
for old arch fsl_soc usb code removed by this patch. The driver
uses usual of-style binding, available EHCI-HCD and UDC
drivers can be bound to the created devices. The new of-style
driver additionaly instantiates USB OTG platform device, as the
appropriate USB OTG driver will be added soon.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (71 commits)
powerpc/44x: Update ppc44x_defconfig
powerpc/watchdog: Make default timeout for Book-E watchdog a Kconfig option
fsl_rio: Add comments for sRIO registers.
powerpc/fsl-booke: Add e55xx (64-bit) smp defconfig
powerpc/fsl-booke: Add p5020 DS board support
powerpc/fsl-booke64: Use TLB CAMs to cover linear mapping on FSL 64-bit chips
powerpc/fsl-booke: Add support for FSL Arch v1.0 MMU in setup_page_sizes
powerpc/fsl-booke: Add support for FSL 64-bit e5500 core
powerpc/85xx: add cache-sram support
powerpc/85xx: add ngPIXIS FPGA device tree node to the P1022DS board
powerpc: Fix compile error with paca code on ppc64e
powerpc/fsl-booke: Add p3041 DS board support
oprofile/fsl emb: Don't set MSR[PMM] until after clearing the interrupt.
powerpc/fsl-booke: Add PCI device ids for P2040/P3041/P5010/P5020 QoirQ chips
powerpc/mpc8xxx_gpio: Add support for 'qoriq-gpio' controllers
powerpc/fsl_booke: Add support to boot from core other than 0
powerpc/p1022: Add probing for individual DMA channels
powerpc/fsl_soc: Search all global-utilities nodes for rstccr
powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT
powerpc/mpc83xx: Support for MPC8308 P1M board
...
Fix up conflict with the generic irq_work changes in arch/powerpc/kernel/time.c
* 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits)
x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S
xen: Cope with unmapped pages when initializing kernel pagetable
memblock, bootmem: Round pfn properly for memory and reserved regions
memblock: Annotate memblock functions with __init_memblock
memblock: Allow memblock_init to be called early
memblock/arm: Fix memblock_region_is_memory() typo
x86, memblock: Remove __memblock_x86_find_in_range_size()
memblock: Fix wraparound in find_region()
x86-32, memblock: Make add_highpages honor early reserved ranges
x86, memblock: Fix crashkernel allocation
arm, memblock: Fix the sparsemem build
memblock: Fix section mismatch warnings
powerpc, memblock: Fix memblock API change fallout
memblock, microblaze: Fix memblock API change fallout
x86: Remove old bootmem code
x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve
x86: Remove not used early_res code
x86, memblock: Replace e820_/_early string with memblock_
x86: Use memblock to replace early_res
x86, memblock: Use memblock_debug to control debug message print out
...
Fix up trivial conflicts in arch/x86/kernel/setup.c and kernel/Makefile
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags:
Fix IRQ flag handling naming
MIPS: Add missing #inclusions of <linux/irq.h>
smc91x: Add missing #inclusion of <linux/irq.h>
Drop a couple of unnecessary asm/system.h inclusions
SH: Add missing consts to sys_execve() declaration
Blackfin: Rename IRQ flags handling functions
Blackfin: Add missing dep to asm/irqflags.h
Blackfin: Rename DES PC2() symbol to avoid collision
Blackfin: Split the BF532 BFIN_*_FIO_FLAG() functions to their own header
Blackfin: Split PLL code from mach-specific cdef headers
* 'next-spi' of git://git.secretlab.ca/git/linux-2.6: (53 commits)
spi/omap2_mcspi: Verify TX reg is empty after TX only xfer with DMA
spi/omap2_mcspi: disable channel after TX_ONLY transfer in PIO mode
spi/bfin_spi: namespace local structs
spi/bfin_spi: init early
spi/bfin_spi: check per-transfer bits_per_word
spi/bfin_spi: warn when CS is driven by hardware (CPHA=0)
spi/bfin_spi: cs should be always low when a new transfer begins
spi/bfin_spi: fix typo in comment
spi/bfin_spi: reject unsupported SPI modes
spi/bfin_spi: use dma_disable_irq_nosync() in irq handler
spi/bfin_spi: combine duplicate SPI_CTL read/write logic
spi/bfin_spi: reset ctl_reg bits when setup is run again on a device
spi/bfin_spi: push all size checks into the transfer function
spi/bfin_spi: use nosync when disabling the IRQ from the IRQ handler
spi/bfin_spi: sync hardware state before reprogramming everything
spi/bfin_spi: save/restore state when suspending/resuming
spi/bfin_spi: redo GPIO CS handling
Blackfin: SPI: expand SPI bitmasks
spi/bfin_spi: use the SPI namespaced bit names
spi/bfin_spi: drop extra memory we don't need
...
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (96 commits)
apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets
apic, x86: Check if EILVT APIC registers are available (AMD only)
x86: ioapic: Call free_irte only if interrupt remapping enabled
arm: Use ARCH_IRQ_INIT_FLAGS
genirq, ARM: Fix boot on ARM platforms
genirq: Fix CONFIG_GENIRQ_NO_DEPRECATED=y build
x86: Switch sparse_irq allocations to GFP_KERNEL
genirq: Switch sparse_irq allocator to GFP_KERNEL
genirq: Make sparse_lock a mutex
x86: lguest: Use new irq allocator
genirq: Remove the now unused sparse irq leftovers
genirq: Sanitize dynamic irq handling
genirq: Remove arch_init_chip_data()
x86: xen: Sanitise sparse_irq handling
x86: Use sane enumeration
x86: uv: Clean up the direct access to irq_desc
x86: Make io_apic.c local functions static
genirq: Remove irq_2_iommu
x86: Speed up the irq_remapped check in hot pathes
intr_remap: Simplify the code further
...
Fix up trivial conflicts in arch/x86/Kconfig
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits)
sched: Export account_system_vtime()
sched: Call tick_check_idle before __irq_enter
sched: Remove irq time from available CPU power
sched: Do not account irq time to current task
x86: Add IRQ_TIME_ACCOUNTING
sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time
sched: Add a PF flag for ksoftirqd identification
sched: Consolidate account_system_vtime extern declaration
sched: Fix softirq time accounting
sched: Drop group_capacity to 1 only if local group has extra capacity
sched: Force balancing on newidle balance if local group has capacity
sched: Set group_imb only a task can be pulled from the busiest cpu
sched: Do not consider SCHED_IDLE tasks to be cache hot
sched: Drop all load weight manipulation for RT tasks
sched: Create special class for stop/migrate work
sched: Unindent labels
sched: Comment updates: fix default latency and granularity numbers
tracing/sched: Add sched_pi_setprio tracepoint
sched: Give CPU bound RT tasks preference
sched: Try not to migrate higher priority RT tasks
...
This patch refactors the early init parsing of the chosen node so that
architectures aren't forced to provide an empty implementation of
early_init_dt_scan_chosen_arch. Instead, if an architecture wants to
do something different, it can either use a wrapper function around
early_init_dt_scan_chosen(), or it can replace it altogether.
This patch was written in preparation to adding device tree support to
both x86 ad MIPS.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: David Daney <ddaney@caviumnetworks.com>
The current code allocates and manages platform_devices created from
the device tree manually. It also uses an unsafe shortcut for
allocating the platform_device and the resource table at the same
time. (which I added in the last rework; sorry).
This patch refactors the code to use platform_device_alloc() for
allocating new devices. This reduces the amount of custom code
implemented by of_platform, eliminates the unsafe alloc trick, and has
the side benefit of letting the platform_bus code manage freeing the
device data and resources when the device is freed.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michal Simek <monstr@monstr.eu>
Commit c3f00c70 ("perf: Separate find_get_context() from event
initialization") changed the generic perf_event code to call
perf_event_alloc, which calls the arch-specific event_init code,
before looking up the context for the new event. Unfortunately,
power_pmu_event_init uses event->ctx->task to see whether the
new event is a per-task event or a system-wide event, and thus
crashes since event->ctx is NULL at the point where
power_pmu_event_init gets called.
(The reason it needs to know whether it is a per-task event is
because there are some hardware events on Power systems which
only count when the processor is not idle, and there are some
fixed-function counters which count such events. For example,
the "run cycles" event counts cycles when the processor is not
idle. If the user asks to count cycles, we can use "run cycles"
if this is a per-task event, since the processor is running when
the task is running, by definition. We can't use "run cycles"
if the user asks for "cycles" on a system-wide counter.)
Fortunately the information we need is in the
event->attach_state field, so we just use that instead.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101019055535.GA10398@drongo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Alexey Kardashevskiy <aik@au1.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Just a minor cleanup patch that makes things easier to the following patches.
No functionality change in this patch.
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-3-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.
Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.
The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.
Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[ various fixes ]
Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch below updates broken web addresses in the arch directory.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
The Freescale P1022DS has an on-chip video controller called the DIU, and a
driver for this device already exists. Update the platform file for the
P1022DS reference board to enable the driver, and update the defconfig for
Freescale MPC85xx boards to add the driver.
[Edited to resolve header add/add conflict and drop #define DEBUG.
-- broonie]
Signed-off-by: Timur Tabi <timur@freescale.com>
Acked-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Make sure the new bluestone board is selected for the multiplatform defconfig.
Also build logfs and squashfs as modules.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Add some comments to make sRIO registers map better readable.
Signed-off-by: Shaohui Xie <b21989@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The p5020 SoC from Freescale is the first 64-bit Book-E processor and
utilizes the two e5500 cores. Adding a defconfig that enables basic kernel
for e5500 based processors.
Also added the p5020 / e5500 support to the ppc64e defconfig.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
On Freescale parts typically have TLB array for large mappings that we can
bolt the linear mapping into. We utilize the code that already exists
on PPC32 on the 64-bit side to setup the linear mapping to be cover by
bolted TLB entries. We utilize a quarter of the variable size TLB array
for this purpose.
Additionally, we limit the amount of memory to what we can cover via
bolted entries so we don't get secondary faults in the TLB miss
handlers. We should fix this limitation in the future.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Update setup_page_sizes() to support for a MMU v1.0 FSL style MMU
implementation. In such a processor, we don't have TLB0PS or EPTCFG
registers (and access to these registers may cause exceptions). We need
to parse the older format of TLBnCFG for page size support. Additionaly,
assume since we are an FSL implementation that we have 2 TLB arrays and
the second array contains the variable size pages.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The new e5500 core is similar to the e500mc core but adds 64-bit
support. We support running it in 32-bit mode as it is identical to the
e500mc.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
It adds cache-sram support in P1/P2 QorIQ platforms as under:
* A small abstraction over powerpc's remote heap allocator
* Exports mpc85xx_cache_sram_alloc()/free() APIs
* Supports only one contiguous SRAM window
* Drivers can do the following in Kconfig to use these APIs
"select FSL_85XX_CACHE_SRAM if MPC85xx"
* Required SRAM size and the offset where SRAM should be mapped must be
provided at kernel command line as :
cache-sram-size=<value>
cache-sram-offset=<offset>
Signed-off-by: Harninder Rai <harninder.rai@freescale.com>
Signed-off-by: Vivek Mahajan <vivek.mahajan@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The device tree for Freescale's P1022DS reference board is missing the node
for the ngPIXIS FPGA.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
arch/powerpc/kernel/paca.c: In function 'allocate_lppacas':
arch/powerpc/kernel/paca.c:111:1: error: parameter name omitted
arch/powerpc/kernel/paca.c:111:1: error: parameter name omitted
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
On an arch 2.06 hypervisor, a pending perfmon interrupt will be delivered
to the hypervisor at any point the guest is running, regardless of
MSR[EE]. In order to reflect this interrupt, the hypervisor has to mask
the interrupt in PMGC0 -- and set MSRP[PMMP] to intercept futher guest
accesses to the PMRs to detect when to unmask (and prevent the guest from
unmasking early, or seeing inconsistent state).
This has the side effect of ignoring any changes the guest makes to
MSR[PMM], so wait until after the interrupt is clear, and thus the
hypervisor should have cleared MSRP[PMMP], before setting MSR[PMM]. The
counters wil not actually run until PMGC0[FAC] is cleared in
pmc_start_ctrs(), so this will not reduce the effectiveness of PMM.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add 'fsl,qoriq-gpio' compatiable to the list we search for to bind
against for mpc8xxx_gpio. This compatiable will be used on P1-P5xxx
QorIQ devices like P4080.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>