We plan to change mm->cpu_vm_mask definition later. Thus, this patch convert
it into proper macro.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Since v2.6.20 "Pass struct dev pointer to dma_cache_sync()"
(d3fa72e455), dma_cache_sync() takes a
struct dev pointer, but these appear to be missing from the tile and
mn10300 implementations, so add them.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
[cmetcalf@tilera.com: took only the "tile" portion as I don't maintain mn10300]
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
They are only applicable for locally-homecached memory ranges, so
change their names to {flush,finv}_buffer_local(). Change inv_buffer()
to just do an mf instead of any kind of fancier barrier, since you're
obviously not going to be waiting for anything once the local homecache
is invalidated.
Fix tilepro.c network driver not to bother calling finv_buffer when
stopping the EPP, but just mf after memset to ensure that it will not
see any packet data after we finish stopping; use finv_buffer_remote()
when doing exit-time cleanup.
This also fixes a (not very interesting) generic Linux build failure
where drivers/scsi/st.c declares its own flush_buffer().
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
User space code has been able to discover the static page size
by including a special <hv/pagesize.h> file. In the current release,
that file is now gone, and <asm/page.h> doesn't rely on it. The
getpagesize() API is now the only way for userspace to get the page size.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This change adds a number of missing headers in asm (fb.h, parport.h,
serial.h, and vga.h) using the minimal generic versions.
It also adds a number of missing interfaces that showed up as build
failures when trying to build various drivers not normally included in the
"tile" distribution: ioremap_wc(), memset_io(), io{read,write}{16,32}be(),
virt_to_bus(), bus_to_virt(), irq_canonicalize(), __pte(), __pgd(),
and __pmd(). I also added a cast in virt_to_page() since not all callers
pass a pointer.
I fixed <asm/stat.h> to properly include a __KERNEL__ guard for the
__ARCH_WANT_STAT64 symbol, and <asm/swab.h> to use __builtin_bswap32()
even for our 64-bit architecture, since the same code is produced.
I added an export for get_cycles(), since it's used in some modules.
And I made <arch/spr_def.h> properly include the __KERNEL__ guard,
even though it's not yet exported, since it likely will be soon.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Otherwise, it's possible to end up with the prefetcher pulling
data into cache that the code believes has been flushed.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Otherwise, in principle, there could be stale I$ data present
next time the page that previously held the kernel module code was
used to run some new code.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This allows processes to spread more effectively to multiple cores
(particularly important on 64-core chips!).
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This semantic was already true for atomic operations within the kernel,
and this change makes it true for the fast atomic syscalls (__NR_cmpxchg
and __NR_atomic_update) as well. Previously, user-space had to use
the fast atomic syscalls exclusively to update memory, since raw stores
could lose a race with the atomic update code even when the atomic update
hadn't actually modified the value.
With this change, we no longer write back the value to memory if it
hasn't changed. This allows certain types of idioms in user space to
work as expected, e.g. "atomic exchange" to acquire a spinlock, followed
by a raw store of zero to release the lock.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This support is required for CONFIG_KEYS, NFSv4 kernel DNS, etc.
The change is slightly more complex than the minimal thing, since
I took advantage of having to go into the assembly code to just
move a bunch of stuff into C code: specifically, the schedule(),
do_async_page_fault(), do_signal(), and single_step_once() support,
in addition to the TIF_NOTIFY_RESUME support.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
This change is the result of some work to make the backtrace code more
shareable between kernel, libc, and gdb.
For the kernel, some good effects are to eliminate the hacky
"VirtualAddress" typedef in favor of "unsigned long", to eliminate a
bunch of spurious kernel doc comments, to remove the dead "bt_read_memory"
function, and to use "__tilegx__" in #ifdefs instead of "TILE_CHIP".
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] kvm-390: Let kernel exit SIE instruction on work
[S390] dasd: check sense type in device change handler
[S390] pfault: fix token handling
[S390] qdio: reset error states immediately
[S390] fix page table walk for changing page attributes
[S390] prng: prevent access beyond end of stack
[S390] dasd: fix race between open and offline
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - Fix unused warnings when !SND_HDA_NEEDS_RESUME
ALSA: hda - Add a fix-up for Acer dmic with ALC271x codec
ASoC: add a module alias to the FSI driver
ALSA: emu10k1 - Fix "Music" controls to "Synth" controls in documents
ARM: s3c2440: gta02; Register dfbmcs320 device for BT audio interface
ASoC: codecs: JZ4740: Fix OOPS
ASoC: Fix output PGA enabling in wm_hubs CODECs
ASoC: sn95031: decorate function with __devexit_p()
ASoC: SAMSUNG: Fix the inverted clocks handling for pcm driver
ASoC: sst_platform: Fix lock acquring
ASoC: fsi: driver safely remove for against irq
ASoC: fsi: modify vague PM control on probe
ASoC: fsi: take care in failing case of dai register
MAINTAINERS: Update Samsung ASoC maintainer's id
ASoC: WM8903: HP and Line out PGA/mixer DAPM fixes
ASoC: Set left channel volume update bits for WM8994
ASoC: fix config error path
ASoC: check channel mismatch between cpu_dai and codec_dai
ASoC: Tegra: Suspend/resume support
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf, x86: Update/fix Intel Nehalem cache events
perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
x86, perf event: Turn off unstructured raw event access to offcore registers
perf: Support Xeon E7's via the Westmere PMU driver
It's not enough to simply disable event on overflow the
cpuc->active_mask should be cleared as well otherwise counter
may stall in "active" even in real being already disabled (which
potentially may lead to the situation that user may not use this
counter further).
Don pointed out that:
" I also noticed this patch fixed some unknown NMIs
on a P4 when I stressed the box".
Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support to the functionality:
|
| The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
| user space bits were not. This made it impossible to set the extra mask
| and actually do the OFFCORE profiling
|
Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:
|
| Some raw events -- like the Intel OFFCORE events -- support additional
| parameters. These can be appended after a ':'.
|
| For example on a multi socket Intel Nehalem:
|
| perf stat -e r1b7:20ff -a sleep 1
|
| Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
| that measures any access to DRAM on another socket.
|
But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.
The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using, they can use the conceptual event and not some
model specific quirky hexa number.
We already have such generalization in place for CPU cache events,
and it's all very extensible.
"Offcore" events measure general DRAM access patters along various
parameters. They are particularly useful in NUMA systems.
We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.
That way user-space support would be very obvious, memory access
profiling could be done via self-explanatory commands like:
perf record -e dram ./myapp
perf record -e dram-remote ./myapp
... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.
These generalized events would work on all CPUs and architectures that
have comparable PMU features.
( Note, these are just examples: actual implementation could have more
sophistication and more parameter - as long as they center around
similarly simple usecases. )
Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:
e994d7d23a: perf: Fix LLC-* events on Intel Nehalem/Westmere
But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.
( Note: after generalization has been implemented raw offcore events can be
supported as well: there can always be an odd event that is marginally
useful but not useful enough to generalize. DRAM profiling is definitely
*not* such a category so generalization must be done first. )
Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d23a commit, not mentioned in the changelog.
As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.
Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.
Reported-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y
when NUMA emulation is enabled is currently broken because it does
not iterate through every emulated node and bind cpus that have
affinity to it.
NUMA emulation should bind each cpu to every local node to
accurately represent the true NUMA topology of the underlying
machine.
debug_cpumask_set_cpu() needs to be fixed at the same time so
that the debugging information that it emits shows the new
cpumask of the node being assigned when the cpu is being added
or removed.
It can now take responsibility of setting or clearing the cpu
itself to remove the need for duplicate code.
Also change its last parameter, "enable", to have the correct bool
type since it can only be true or false.
-v2: Fix the return statements, by Kosaki Motohiro
Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Andreas Herrmann reported that 7d6b46707f ("x86, NUMA: Fix fakenuma
boot failure") causes certain physical NUMA topologies (for example
AMD Magny-Cours) to move sibling cpus to a single node when in reality
they are in separate domains.
This may result in some nodes being completely void of cpus, which
doesn't accurately represent the correct topology. The system will
boot, but will have suboptimal NUMA performance.
This commit was intended as a fix for NUMA emulation, but should
not cause a regression for real NUMA machines as a side effect.
( There will be a separate fix for the numa-debug code, which
will not affect physical topologies. )
Reported-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918110.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: 6881/1: cputype.h uses __attribute_const__ which requires including kernel.h
ARM: Add new syscalls
* 'stable/bug-fixes-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: mask_rw_pte: do not apply the early_ioremap checks on x86_32
xen: do not create the extra e820 region at an addr lower than 4G
Adding two sets of I2C devices to the same bus doesn't quite work,
atleast not anymore. Stash one array and determine how much of it
shall be added instead.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The two "is_early_ioremap_ptep" checks in mask_rw_pte are only used on
x86_64, in fact early_ioremap is not used at all to setup the initial
pagetable on x86_32.
Moreover on x86_32 the two checks are wrong because the range
pgt_buf_start..pgt_buf_end initially should be mapped RW because
the pages in the range are not pagetable pages yet and haven't been
cleared yet. Afterwards considering the pgt_buf_start..pgt_buf_end is
part of the initial mapping, xen_alloc_pte is capable of turning
the ptes RO when they become pagetable pages.
Fix the issue and improve the readability of the code providing two
different implementation of mask_rw_pte for x86_32 and x86_64.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Do not add the extra e820 region at a physical address lower than 4G
because it breaks e820_end_of_low_ram_pfn().
It is OK for us to move the xen_extra_mem_start up and down because this
is the index of the memory that can be ballooned in/out - it is memory
not available to the kernel during bootup.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
From: Christian Borntraeger <borntraeger@de.ibm.com>
This patch fixes the sie exit on interrupts. The low level
interrupt handler returns to the PSW address in pt_regs and not
to the PSW address in the lowcore.
Without this fix a cpu bound guest might never leave guest state
since the host interrupt handler would blindly return to the
SIE instruction, even on need_resched and friends.
Cc: stable@kernel.org
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
f6649a7e "[S390] cleanup lowcore access from external interrupts" changed
handling of external interrupts. Instead of letting the external interrupt
handlers accessing the per cpu lowcore the entry code of the kernel reads
already all fields that are necessary and passes them to the handlers.
The pfault interrupt handler was incorrectly converted. It tries to
dereference a value which used to be a pointer to a lowcore field. After
the conversion however it is not anymore the pointer to the field but its
content. So instead of a dereference only a cast is needed to get the
task pointer that caused the pfault.
Fixes a NULL pointer dereference and a subsequent kernel crash:
Unable to handle kernel pointer dereference at virtual kernel address (null)
Oops: 0004 [#1] SMP
Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc
loop qeth_l3 qeth vmur ccwgroup ext3 jbd mbcache dm_mod
dasd_eckd_mod dasd_diag_mod dasd_mod
CPU: 0 Not tainted 2.6.38-2-s390x #1
Process cron (pid: 1106, task: 000000001f962f78, ksp: 000000001fa0f9d0)
Krnl PSW : 0404200180000000 000000000002c03e (pfault_interrupt+0xa2/0x138)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000000001 0000000000000000 0000000000000001
000000001f962f78 0000000000518968 0000000090000002 000000001ff03280
0000000000000000 000000000064f000 000000001f962f78 0000000000002603
0000000006002603 0000000000000000 000000001ff7fe68 000000001ff7fe48
Krnl Code: 000000000002c036: 5820d010 l %r2,16(%r13)
000000000002c03a: 1832 lr %r3,%r2
000000000002c03c: 1a31 ar %r3,%r1
>000000000002c03e: ba23d010 cs %r2,%r3,16(%r13)
000000000002c042: a744fffc brc 4,2c03a
000000000002c046: a7290002 lghi %r2,2
000000000002c04a: e320d0000024 stg %r2,0(%r13)
000000000002c050: 07f0 bcr 15,%r0
Call Trace:
([<000000001f962f78>] 0x1f962f78)
[<000000000001acda>] do_extint+0xf6/0x138
[<000000000039b6ca>] ext_no_vtime+0x30/0x34
[<000000007d706e04>] 0x7d706e04
Last Breaking-Event-Address:
[<0000000000000000>] 0x0
For stable maintainers:
the first kernel which contains this bug is 2.6.37.
Reported-by: Stephen Powell <zlinuxman@wowway.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The page table walk for changing page attributes used the wrong
address for pgd/pud/pmd lookups if the range was bigger than
a pmd entry. Fix the lookup by using the correct address.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
While initializing the state of the prng only the first 8 bytes of
random data where used, the second 8 bytes were read from the memory
after the stack. If only 64 bytes of the kernel stack are used and
CONFIG_DEBUG_PAGEALLOC is enabled a kernel panic may occur because of
the invalid page access. Use the correct multiplicator to stay within
the random data buffer.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Device suspend/resume infrastructure is used not only by the suspend
and hibernate code in kernel/power, but also by APM, Xen and the
kexec jump feature. However, commit 40dc166cb5
(PM / Core: Introduce struct syscore_ops for core subsystems PM)
failed to add syscore_suspend() and syscore_resume() calls to that
code, which generally leads to breakage when the features in question
are used.
To fix this problem, add the missing syscore_suspend() and
syscore_resume() calls to arch/x86/kernel/apm_32.c, kernel/kexec.c
and drivers/xen/manage.c.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Some unnamed moron fatfingered the arguments of the irq chip callbacks
to irq_chip instead of irq_data.
While at it remove the nmi_count() print in arch_show_interrupts()
which has been broken before the irq conversion already.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, gart: Make sure GART does not map physmem above 1TB
x86, gart: Set DISTLBWALKPRB bit always
x86, gart: Convert spaces to tabs in enable_gart_translation
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf, x86: Fix AMD family 15h FPU event constraints
perf, x86: Fix pre-defined cache-misses event for AMD family 15h cpus
perf evsel: Fix use of inherit
perf hists browser: Fix seg fault when annotate null symbol
Depending on the unit mask settings some FPU events may be scheduled
only on cpu counter #3. This patch fixes this.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@googlemail.com>
Link: http://lkml.kernel.org/r/1302913676-14352-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With AMD cpu family 15h a unit mask was introduced for the Data Cache
Miss event (0x041/L1-dcache-load-misses). We need to enable bit 0
(first data cache miss or streaming store to a 64 B cache line) of
this mask to proper count data cache misses.
Now we set this bit for all families and models. In case a PMU does
not implement a unit mask for event 0x041 the bit is ignored.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1302913676-14352-2-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The GART can only map physical memory below 1TB. Make sure
the gart driver in the kernel does not try to map memory
above 1TB.
Cc: <stable@kernel.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1303134346-5805-5-git-send-email-joerg.roedel@amd.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The DISTLBWALKPRB bit must be set for the GART because the
gatt table is mapped UC. But the current code does not set
the bit at boot when the BIOS setup the aperture correctly.
Fix that by setting this bit when enabling the GART instead
of the other places.
Cc: <stable@kernel.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Link: http://lkml.kernel.org/r/1303134346-5805-4-git-send-email-joerg.roedel@amd.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Because of speculative event roll back, it is possible for some event coutners
to decrease between reads on POWER7. This causes a problem with the way that
counters are updated. Delta calues are calculated in a 64 bit value and the
top 32 bits are masked. If the register value has decreased, this leaves us
with a very large positive value added to the kernel counters. This patch
protects against this by skipping the update if the delta would be negative.
This can lead to a lack of precision in the coutner values, but from my testing
the value is typcially fewer than 10 samples at a time.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Cc: stable@kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This problem was noticed on an MPC855T platform. Ftrace did oops
when trying to write to the kernel text segment.
Many thanks to Joakim for finding the root cause of this problem.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We currently enable interrupts before the dispatch log for the boot
cpu is setup. If a timer interrupt comes in early enough we oops in
scan_dispatch_log:
Unable to handle kernel paging request for data at address 0x00000010
...
.scan_dispatch_log+0xb0/0x170
.account_system_vtime+0xa0/0x220
.irq_enter+0x88/0xc0
.do_IRQ+0x48/0x230
The patch below adds a check to scan_dispatch_log to ensure the
dispatch log has been allocated.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PAPR specifies that DTL buffers can not cross AMS environments (aka CMO
in the PAPR) and can not cross a memory entitlement granule boundary
(4k). This is found in section 14.11.3.2 H_REGISTER_VPA of the PAPR.
kmalloc does not guarantee an alignment of the allocation, though,
beyond 8 bytes (at least in my understanding). Create a special kmem
cache for DTL buffers with the alignment requirement.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>