In xen_set_pte() if batching is unavailable (because the caller is in
an interrupt context such as handling a page fault) it would fall back
to using native_set_pte() and trapping and emulating the PTE write.
On 32-bit guests this requires two traps for each PTE write (one for
each dword of the PTE). Instead, do one mmu_update hypercall
directly.
During construction of the initial page tables, continue to use
native_set_pte() because most of the PTEs being set are in writable
and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and
using a hypercall for this is very expensive.
This significantly improves page fault performance in 32-bit PV
guests.
lmbench3 test Before After Improvement
----------------------------------------------
lat_pagefault 3.18 us 2.32 us 27%
lat_proc fork 356 us 313.3 us 11%
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
x86 has no flush_tlb_range support in instruction level. Currently the
flush_tlb_range just implemented by flushing all page table. That is not
the best solution for all scenarios. In fact, if we just use 'invlpg' to
flush few lines from TLB, we can get the performance gain from later
remain TLB lines accessing.
But the 'invlpg' instruction costs much of time. Its execution time can
compete with cr3 rewriting, and even a bit more on SNB CPU.
So, on a 512 4KB TLB entries CPU, the balance points is at:
(512 - X) * 100ns(assumed TLB refill cost) =
X(TLB flush entries) * 100ns(assumed invlpg cost)
Here, X is 256, that is 1/2 of 512 entries.
But with the mysterious CPU pre-fetcher and page miss handler Unit, the
assumed TLB refill cost is far lower then 100ns in sequential access. And
2 HT siblings in one core makes the memory access more faster if they are
accessing the same memory. So, in the patch, I just do the change when
the target entries is less than 1/16 of whole active tlb entries.
Actually, I have no data support for the percentage '1/16', so any
suggestions are welcomed.
As to hugetlb, guess due to smaller page table, and smaller active TLB
entries, I didn't see benefit via my benchmark, so no optimizing now.
My micro benchmark show in ideal scenarios, the performance improves 70
percent in reading. And in worst scenario, the reading/writing
performance is similar with unpatched 3.4-rc4 kernel.
Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
'always':
multi thread testing, '-t' paramter is thread number:
with patch unpatched 3.4-rc4
./mprotect -t 1 14ns 24ns
./mprotect -t 2 13ns 22ns
./mprotect -t 4 12ns 19ns
./mprotect -t 8 14ns 16ns
./mprotect -t 16 28ns 26ns
./mprotect -t 32 54ns 51ns
./mprotect -t 128 200ns 199ns
Single process with sequencial flushing and memory accessing:
with patch unpatched 3.4-rc4
./mprotect 7ns 11ns
./mprotect -p 4096 -l 8 -n 10240
21ns 21ns
[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
has additional performance numbers. ]
Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* Extend the APIC ops implementation and add IRQ_WORKER vector support so that 'perf' can work properly.
* Fix self-ballooning code, and balloon logic when booting as initial domain.
* Move array printing code to generic debugfs
* Support XenBus domains.
* Lazily free grants when a domain is dead/non-existent.
* In M2P code use batching calls
Bug-fixes:
* Fix NULL dereference in allocation failure path (hvc_xen)
* Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
* Fix HVM guest resume - we would leak an PIRQ value instead of reusing the existing one.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPu9MpAAoJEFjIrFwIi8fJaNQH/RylThiO+O+LBpPrO8VRUw+2
/Io98T7ZK2ggoUeaJx0C8irM0JMFAkxGMcfX3w9fwNt/BTec4s++4JhbN1jYN0da
6a0PqINo+M8y73So6CBfuJDCunaRLGKVG/ibIO3Y3WAff51/H+DMvO7uYYDAE0aA
mikyOxnaty0DiG5i4JGDHGmCzDASfK/jgGccZ03m6522mDx5ZIbTzZWONLfz8dqT
rbxnn9vrNLgEYWuzyLMwW0GymToUtt01xBQvwJLAbhn8lr1WBRBLpxXA+5iYNQrn
Ri25G7keYJhG4uwZfaHnR+4HTrmhlGzK1Z96dkqpGUaeIcdyWmPMp22VtBBiwG8=
=uyRr
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen updates from Konrad Rzeszutek Wilk:
"Features:
* Extend the APIC ops implementation and add IRQ_WORKER vector
support so that 'perf' can work properly.
* Fix self-ballooning code, and balloon logic when booting as initial
domain.
* Move array printing code to generic debugfs
* Support XenBus domains.
* Lazily free grants when a domain is dead/non-existent.
* In M2P code use batching calls
Bug-fixes:
* Fix NULL dereference in allocation failure path (hvc_xen)
* Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug
* Fix HVM guest resume - we would leak an PIRQ value instead of
reusing the existing one."
Fix up add-add onflicts in arch/x86/xen/enlighten.c due to addition of
apic ipi interface next to the new apic_id functions.
* tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: do not map the same GSI twice in PVHVM guests.
hvc_xen: NULL dereference on allocation failure
xen: Add selfballoning memory reservation tunable.
xenbus: Add support for xenbus backend in stub domain
xen/smp: unbind irqworkX when unplugging vCPUs.
xen: enter/exit lazy_mmu_mode around m2p_override calls
xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep
xen: implement IRQ_WORK_VECTOR handler
xen: implement apic ipi interface
xen/setup: update VA mapping when releasing memory during setup
xen/setup: Combine the two hypercall functions - since they are quite similar.
xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
xen/gnttab: add deferred freeing logic
debugfs: Add support to print u32 array in debugfs
xen/p2m: An early bootup variant of set_phys_to_machine
xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
xen/p2m: Move code around to allow for better re-usage.
Pull x86/apic changes from Ingo Molnar:
"Most of the changes are about helping virtualized guest kernels
achieve better performance."
Fix up trivial conflicts with the iommu updates to arch/x86/kernel/apic/io_apic.c
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Implement EIO micro-optimization
x86/apic: Add apic->eoi_write() callback
x86/apic: Use symbolic APIC_EOI_ACK
x86/apic: Fix typo EIO_ACK -> EOI_ACK and document it
x86/xen/apic: Add missing #include <xen/xen.h>
x86/apic: Only compile local function if used with !CONFIG_GENERIC_PENDING_IRQ
x86/apic: Fix UP boot crash
x86: Conditionally update time when ack-ing pending irqs
xen/apic: implement io apic read with hypercall
Revert "xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'"
xen/x86: Implement x86_apic_ops
x86/apic: Replace io_apic_ops with x86_io_apic_ops.
* stable/autoballoon.v5.2:
xen/setup: update VA mapping when releasing memory during setup
xen/setup: Combine the two hypercall functions - since they are quite similar.
xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM
xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0
xen/p2m: An early bootup variant of set_phys_to_machine
xen/p2m: Collapse early_alloc_p2m_middle redundant checks.
xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument
xen/p2m: Move code around to allow for better re-usage.
In xen_memory_setup(), if a page that is being released has a VA
mapping this must also be updated. Otherwise, the page will be not
released completely -- it will still be referenced in Xen and won't be
freed util the mapping is removed and this prevents it from being
reallocated at a different PFN.
This was already being done for the ISA memory region in
xen_ident_map_ISA() but on many systems this was omitting a few pages
as many systems marked a few pages below the ISA memory region as
reserved in the e820 map.
This fixes errors such as:
(XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152
(XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17)
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If I try to do "cat /sys/kernel/debug/kernel_page_tables"
I end up with:
BUG: unable to handle kernel paging request at ffffc7fffffff000
IP: [<ffffffff8106aa51>] ptdump_show+0x221/0x480
PGD 0
Oops: 0000 [#1] SMP
CPU 0
.. snip..
RAX: 0000000000000000 RBX: ffffc00000000fff RCX: 0000000000000000
RDX: 0000800000000000 RSI: 0000000000000000 RDI: ffffc7fffffff000
which is due to the fact we are trying to access a PFN that is not
accessible to us. The reason (at least in this case) was that
PGD[256] is set to __HYPERVISOR_VIRT_START which was setup (by the
hypervisor) to point to a read-only linear map of the MFN->PFN array.
During our parsing we would get the MFN (a valid one), try to look
it up in the MFN->PFN tree and find it invalid and return ~0 as PFN.
Then pte_mfn_to_pfn would happilly feed that in, attach the flags
and return it back to the caller. 'ptdump_show' bitshifts it and
gets and invalid value that it tries to dereference.
Instead of doing all of that, we detect the ~0 case and just
return !_PAGE_PRESENT.
This bug has been in existence .. at least until 2.6.37 (yikes!)
CC: stable@kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This reverts commit 2531d64b6f.
The two patches:
x86/apic: Replace io_apic_ops with x86_io_apic_ops.
xen/x86: Implement x86_apic_ops
take care of fixing it properly.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* one is a workaround that will be removed in v3.5 with proper fix in the tip/x86 tree,
* the other is to fix drivers to load on PV (a previous patch made them only
load in PVonHVM mode).
The rest are just minor fixes in the various drivers and some cleanup in the
core code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPfyVUAAoJEFjIrFwIi8fJUjUH/jbY5JavRqSlNELZW2A4Ta76
8p00LqLHw/C56iHZcWKke8mqtWNb+ZfcQt7ZYcxDIYa4QWBL28x0OLAO2tOBIt37
ZjYESWSdFJaJvmpADluWtFyGyZ9TYJllDTBm/jWj1ZtKSZvR1YkhuMXCS0f4AmGQ
xFzSWJZUDdiOAqpN+VQD8wP00gfR8knQLg16XE2fvFdQo4XwpCtqLfHV/5pMMGdy
Cs/ep6rq/7cdv/nshKOcBnw7RW8l3Xoi/28ht8k3DvAQ2VtFq1Tugv2G9pcCHwQG
DIBkB3SOU6/v6P5at5+egKS5xR1fJetCWlkMd8kkbcdz2NPI4UDMkvOW6Q8yQls=
=6Ve+
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull xen fixes from Konrad Rzeszutek Wilk:
"Two fixes for regressions:
* one is a workaround that will be removed in v3.5 with proper fix in
the tip/x86 tree,
* the other is to fix drivers to load on PV (a previous patch made
them only load in PVonHVM mode).
The rest are just minor fixes in the various drivers and some cleanup
in the core code."
* tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success
xen/pciback: fix XEN_PCI_OP_enable_msix result
xen/smp: Remove unnecessary call to smp_processor_id()
xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'
xen: only check xen_platform_pci_unplug if hvm
The above mentioned patch checks the IOAPIC and if it contains
-1, then it unmaps said IOAPIC. But under Xen we get this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
PGD 0
Oops: 0002 [#1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron
1525 /0U990C
RIP: e030:[<ffffffff8134e51f>] [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
RSP: e02b: ffff8800d42cbb70 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000ffffffef RCX: 0000000000000001
RDX: 0000000000000040 RSI: 00000000ffffffef RDI: 0000000000000001
RBP: ffff8800d42cbb80 R08: ffff8800d6400000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffef
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000010
FS: 0000000000000000(0000) GS:ffff8800df5fe000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0:000000008005003b
CR2: 0000000000000040 CR3: 0000000001a05000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 1, threadinfo ffff8800d42ca000, task ffff8800d42d0000)
Stack:
00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157
ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384
0000000000000002 0000000000000010 00000000ffffffff 0000000000000000
Call Trace:
[<ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230
[<ffffffff8100a9b2>] ? check_events+0x12+0x20
[<ffffffff814bab42>] xen_register_pirq+0x82/0xe0
[<ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0
[<ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30
[<ffffffff8103036f>] acpi_register_gsi+0xf/0x20
[<ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202
[<ffffffff814bc849>] pcibios_enable_device+0x39/0x40
[<ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70
[<ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0
[<ffffffff812dc8d3>] pci_enable_device+0x13/0x20
The reason we are dying is b/c the call acpi_get_override_irq() is used,
which returns the polarity and trigger for the IRQs. That function calls
mp_find_ioapics to get the 'struct ioapic' structure - which along with the
mp_irq[x] is used to figure out the default values and the polarity/trigger
overrides. Since the mp_find_ioapics now returns -1 [b/c the IOAPIC is filled
with 0xffffffff], the acpi_get_override_irq() stops trying to lookup in the
mp_irq[x] the proper INT_SRV_OVR and we can't install the SCI interrupt.
The proper fix for this is going in v3.5 and adds an x86_io_apic_ops
struct so that platforms can override it. But for v3.4 lets carry this
work-around. This patch does that by providing a slightly different variant
of the fake IOAPIC entries.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
- PV multiconsole support, so that there can be hvc1, hvc2, etc;
- P-state and C-state power management driver that uploads said
power management data to the hypervisor. It also inhibits cpufreq
scaling drivers to load so that only the hypervisor can make power
management decisions - fixing a weird perf bug.
- Function Level Reset (FLR) support in the Xen PCI backend.
Fixes:
- Kconfig dependencies for Xen PV keyboard and video
- Compile warnings and constify fixes
- Change over to use percpu_xxx instead of this_cpu_xxx
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJPZ0qkAAoJEFjIrFwIi8fJjCgH/jeJ39E8ML8DP9tCS2HQnMqM
uTEjLcqvoJ7sEhHvtBLPeG2p0jyBvOWjLbSc7P8nESBAMPvSYol8L6WqfWrdSU4r
lHrma2sg9UYzRog5NyxAgkp7bBsBBFOnhVL3Cxb5Ig78cPWzeSWGpqGZ8M/d51Wf
1iE0tHuU4DpN+fg1SZqPqEm8ecEJ/eSrVTnyTx/Qo2Ak+Zw98SqzX7SV5lo8mudd
WFL1F2K9FyTNk79ndGhqFt36x6nEbFgMLbmCDWumLuWN6bMd1Uq0wNkCqW4F1h28
3yqnY+rfQh4y3eXK1B9nttCUTs+/66U5ZWrT6B1IJumGTAIqcWfgeUX/Vn/HVC4=
=tfMc
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull xen updates from Konrad Rzeszutek Wilk:
"which has three neat features:
- PV multiconsole support, so that there can be hvc1, hvc2, etc; This
can be used in HVM and in PV mode.
- P-state and C-state power management driver that uploads said power
management data to the hypervisor. It also inhibits cpufreq
scaling drivers to load so that only the hypervisor can make power
management decisions - fixing a weird perf bug.
There is one thing in the Kconfig that you won't like: "default y
if (X86_ACPI_CPUFREQ = y || X86_POWERNOW_K8 = y)" (note, that it
all depends on CONFIG_XEN which depends on CONFIG_PARAVIRT which by
default is off). I've a fix to convert that boolean expression
into "default m" which I am going to post after the cpufreq git
pull - as the two patches to make this work depend on a fix in Dave
Jones's tree.
- Function Level Reset (FLR) support in the Xen PCI backend.
Fixes:
- Kconfig dependencies for Xen PV keyboard and video
- Compile warnings and constify fixes
- Change over to use percpu_xxx instead of this_cpu_xxx"
Fix up trivial conflicts in drivers/tty/hvc/hvc_xen.c due to changes to
a removed commit.
* tag 'stable/for-linus-3.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
xen/acpi-processor: C and P-state driver that uploads said data to hypervisor.
xen: constify all instances of "struct attribute_group"
xen/xenbus: ignore console/0
hvc_xen: introduce HVC_XEN_FRONTEND
hvc_xen: implement multiconsole support
hvc_xen: support PV on HVM consoles
xenbus: don't free other end details too early
xen/enlighten: Expose MWAIT and MWAIT_LEAF if hypervisor OKs it.
xen/setup/pm/acpi: Remove the call to boot_option_idle_override.
xenbus: address compiler warnings
xen: use this_cpu_xxx replace percpu_xxx funcs
xen/pciback: Support pci_reset_function, aka FLR or D3 support.
pci: Introduce __pci_reset_function_locked to be used when holding device_lock.
xen: Utilize the restore_msi_irqs hook.
[Pls also look at https://lkml.org/lkml/2012/2/10/228]
Using of PAT to change pages from WB to WC works quite nicely.
Changing it back to WB - not so much. The crux of the matter is
that the code that does this (__page_change_att_set_clr) has only
limited information so when it tries to the change it gets
the "raw" unfiltered information instead of the properly filtered one -
and the "raw" one tell it that PSE bit is on (while infact it
is not). As a result when the PTE is set to be WB from WC, we get
tons of:
:WARNING: at arch/x86/xen/mmu.c:475 xen_make_pte+0x67/0xa0()
:Hardware name: HP xw4400 Workstation
.. snip..
:Pid: 27, comm: kswapd0 Tainted: G W 3.2.2-1.fc16.x86_64 #1
:Call Trace:
: [<ffffffff8106dd1f>] warn_slowpath_common+0x7f/0xc0
: [<ffffffff8106dd7a>] warn_slowpath_null+0x1a/0x20
: [<ffffffff81005a17>] xen_make_pte+0x67/0xa0
: [<ffffffff810051bd>] __raw_callee_save_xen_make_pte+0x11/0x1e
: [<ffffffff81040e15>] ? __change_page_attr_set_clr+0x9d5/0xc00
: [<ffffffff8114c2e8>] ? __purge_vmap_area_lazy+0x158/0x1d0
: [<ffffffff8114cca5>] ? vm_unmap_aliases+0x175/0x190
: [<ffffffff81041168>] change_page_attr_set_clr+0x128/0x4c0
: [<ffffffff81041542>] set_pages_array_wb+0x42/0xa0
: [<ffffffff8100a9b2>] ? check_events+0x12/0x20
: [<ffffffffa0074d4c>] ttm_pages_put+0x1c/0x70 [ttm]
: [<ffffffffa0074e98>] ttm_page_pool_free+0xf8/0x180 [ttm]
: [<ffffffffa0074f78>] ttm_pool_mm_shrink+0x58/0x90 [ttm]
: [<ffffffff8112ba04>] shrink_slab+0x154/0x310
: [<ffffffff8112f17a>] balance_pgdat+0x4fa/0x6c0
: [<ffffffff8112f4b8>] kswapd+0x178/0x3d0
: [<ffffffff815df134>] ? __schedule+0x3d4/0x8c0
: [<ffffffff81090410>] ? remove_wait_queue+0x50/0x50
: [<ffffffff8112f340>] ? balance_pgdat+0x6c0/0x6c0
: [<ffffffff8108fb6c>] kthread+0x8c/0xa0
for every page. The proper fix for this is has been posted
and is https://lkml.org/lkml/2012/2/10/228
"x86/cpa: Use pte_attrs instead of pte_flags on CPA/set_p.._wb/wc operations."
along with a detailed description of the problem and solution.
But since that posting has gone nowhere I am proposing
this band-aid solution so that at least users don't get
the page corruption (the pages that are WC don't get changed to WB
and end up being recycled for filesystem or other things causing
mysterious crashes).
The negative impact of this patch is that users of WC flag
(which are InfiniBand, radeon, nouveau drivers) won't be able
to set that flag - so they are going to see performance degradation.
But stability is more important here.
Fixes RH BZ# 742032, 787403, and 745574
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
percpu_xxx funcs are duplicated with this_cpu_xxx funcs, so replace them
for further code clean up.
I don't know much of xen code. But, since the code is in x86 architecture,
the percpu_xxx is exactly same as this_cpu_xxx serials functions. So, the
change is safe.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts & resolutions:
* arch/x86/xen/setup.c
dc91c728fd "xen: allow extra memory to be in multiple regions"
24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."
conflicted on xen_add_extra_mem() updates. The resolution is
trivial as the latter just want to replace
memblock_x86_reserve_range() with memblock_reserve().
* drivers/pci/intel-iommu.c
166e9278a3 "x86/ia64: intel-iommu: move to drivers/iommu/"
5dfe8660a3 "bootmem: Replace work_with_active_regions() with..."
conflicted as the former moved the file under drivers/iommu/.
Resolved by applying the chnages from the latter on the moved
file.
* mm/Kconfig
6661672053 "memblock: add NO_BOOTMEM config symbol"
c378ddd53f "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"
conflicted trivially. Both added config options. Just
letting both add their own options resolves the conflict.
* mm/memblock.c
d1f0ece6cd "mm/memblock.c: small function definition fixes"
ed7b56a799 "memblock: Remove memblock_memory_can_coalesce()"
confliected. The former updates function removed by the
latter. Resolution is trivial.
Signed-off-by: Tejun Heo <tj@kernel.org>
* 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/p2m/debugfs: Make type_name more obvious.
xen/p2m/debugfs: Fix potential pointer exception.
xen/enlighten: Fix compile warnings and set cx to known value.
xen/xenbus: Remove the unnecessary check.
xen/irq: If we fail during msi_capability_init return proper error code.
xen/events: Don't check the info for NULL as it is already done.
xen/events: BUG() when we can't allocate our event->irq array.
* 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: Fix selfballooning and ensure it doesn't go too far
xen/gntdev: Fix sleep-inside-spinlock
xen: modify kernel mappings corresponding to granted pages
xen: add an "highmem" parameter to alloc_xenballooned_pages
xen/p2m: Use SetPagePrivate and its friends for M2P overrides.
xen/p2m: Make debug/xen/mmu/p2m visible again.
Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
We dropped a lot of the MMU debugfs in favour of using
tracing API - but there is one which just provides
mostly static information that was made invisible by this change.
Bring it back.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* 'stable/bug.fixes' of git://oss.oracle.com/git/kwilk/xen:
xen/i386: follow-up to "replace order-based range checking of M2P table by linear one"
xen/irq: Alter the locking to use a mutex instead of a spinlock.
xen/e820: if there is no dom0_mem=, don't tweak extra_pages.
xen: disable PV spinlocks on HVM
The numbers obtained from the hypervisor really can't ever lead to an
overflow here, only the original calculation going through the order
of the range could have. This avoids the (as Jeremy points outs)
somewhat ugly NULL-based calculation here.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/tracing: Fix tracing config option properly
xen: Do not enable PV IPIs when vector callback not present
xen/x86: replace order-based range checking of M2P table by linear one
xen: xen-selfballoon.c needs more header files
The order-based approach is not only less efficient (requiring a shift
and a compare, typical generated code looking like this
mov eax, [machine_to_phys_order]
mov ecx, eax
shr ebx, cl
test ebx, ebx
jnz ...
whereas a direct check requires just a compare, like in
cmp ebx, [machine_to_phys_nr]
jae ...
), but also slightly dangerous in the 32-on-64 case - the element
address calculation can wrap if the next power of two boundary is
sufficiently far away from the actual upper limit of the table, and
hence can result in user space addresses being accessed (with it being
unknown what may actually be mapped there).
Additionally, the elimination of the mistaken use of fls() here (should
have been __fls()) fixes a latent issue on x86-64 that would trigger
if the code was run on a system with memory extending beyond the 44-bit
boundary.
CC: stable@kernel.org
Signed-off-by: Jan Beulich <jbeulich@novell.com>
[v1: Based on Jeremy's feedback]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip:
x86-64: Rework vsyscall emulation and add vsyscall= parameter
x86-64: Wire up getcpu syscall
x86: Remove unnecessary compile flag tweaks for vsyscall code
x86-64: Add vsyscall:emulate_vsyscall trace event
x86-64: Add user_64bit_mode paravirt op
x86-64, xen: Enable the vvar mapping
x86-64: Work around gold bug 13023
x86-64: Move the "user" vsyscall segment out of the data segment.
x86-64: Pad vDSO to a page boundary
We don' use it anymore and there are more false positives.
This reverts commit fc25151d9a.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Xen needs to handle VVAR_PAGE, introduced in git commit:
9fd67b4ed0
x86-64: Give vvars their own page
Otherwise we die during bootup with a message like:
(XEN) mm.c:940:d10 Error getting mfn 1888 (pfn 1e3e48) from L1 entry
8000000001888465 for l1e_owner=10, pg_owner=10
(XEN) mm.c:5049:d10 ptwr_emulate: could not get_page_from_l1e()
[ 0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 0.000000] IP: [<ffffffff8103a930>] xen_set_pte+0x20/0xe0
Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/4659478ed2f3480938f96491c2ecbe2b2e113a23.1312378163.git.luto@mit.edu
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Make sure the fastpath code is inlined. Batch the page permission change
and the pin/unpin, and make sure that it can be batched with any
adjacent set_pte/pmd/etc operations.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Other than sanity check and debug message, the x86 specific version of
memblock reserve/free functions are simple wrappers around the generic
versions - memblock_reserve/free().
This patch adds debug messages with caller identification to the
generic versions and replaces x86 specific ones and kills them.
arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty
after this change and removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310462166-31469-14-git-send-email-tj@kernel.org
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Simple enough - we use an extern defined symbol which is not
defined when CONFIG_SMP is not defined. This fixes the linker
dying.
CC: stable@kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The MAXSMP config option requires CPUMASK_OFFSTACK, which in turn
requires we init the memory for the maps while we bring up the cpus.
MAXSMP also increases NR_CPUS to 4096. This increase in size exposed an
issue in the argument construction for multicalls from
xen_flush_tlb_others. The args should only need space for the actual
number of cpus.
Also in 2.6.39 it exposes a bootup problem.
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8157a1d3>] set_cpu_sibling_map+0x123/0x30d
...
Call Trace:
[<ffffffff81039a3f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[<ffffffff819dc4db>] xen_smp_prepare_cpus+0x36/0x135
..
CC: stable@kernel.org
Signed-off-by: Andrew Jones <drjones@redhat.com>
[v2: Updated to compile on 3.0]
[v3: Updated to compile when CONFIG_SMP is not defined]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We only need to set max_pfn_mapped to the last pfn mapped on x86_64 to
make sure that cleanup_highmap doesn't remove important mappings at
_end.
We don't need to do this on x86_32 because cleanup_highmap is not called
on x86_32. Besides lowering max_pfn_mapped on x86_32 has the unwanted
side effect of limiting the amount of memory available for the 1:1
kernel pagetable allocation.
This patch reverts the x86_32 part of the original patch.
CC: stable@kernel.org
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* 'upstream/tidy-xen-mmu-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
xen: fix compile without CONFIG_XEN_DEBUG_FS
Use arbitrary_virt_to_machine() to deal with ioremapped pud updates.
Use arbitrary_virt_to_machine() to deal with ioremapped pmd updates.
xen/mmu: remove all ad-hoc stats stuff
xen: use normal virt_to_machine for ptes
xen: make a pile of mmu pvop functions static
vmalloc: remove vmalloc_sync_all() from alloc_vm_area()
xen: condense everything onto xen_set_pte
xen: use mmu_update for xen_set_pte_at()
xen: drop all the special iomap pte paths.
We no longer support HIGHPTE allocations, so ptes should always be
within the kernel's direct map, and don't need pagetable walks
to convert to machine addresses.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
xen_set_pte_at and xen_clear_pte are essentially identical to
xen_set_pte, so just make them all common.
When batched set_pte and pte_clear are the same, but the unbatch operation
must be different: they need to update the two halves of the pte in
different order.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
In principle update_va_mapping is a good match for set_pte_at, since
it gets the address being mapped, which allows Xen to use its linear
pagetable mapping.
However that assumes that the pmd for the address is attached to the
current pagetable, which may not be true for a given user address space
because the kernel pmd is not shared (at least on 32-bit guests).
Normally the kernel will automatically sync a missing part of the
pagetable with the init_mm pagetable transparently via faults, but that
fails when a missing address is passed to Xen.
And while the linear pagetable mapping is very useful for 32-bit Xen
(as it avoids an explicit domain mapping), 32-bit Xen is deprecated.
64-bit Xen has all memory mapped all the time, so it makes no real
difference.
The upshot is that we should use mmu_update, since it can operate on
non-current pagetables or detached pagetables.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Xen can work out when we're doing IO mappings for itself, so we don't
need to do anything special, and the extra tests just clog things up.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
* 'stable/irq' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: do not clear and mask evtchns in __xen_evtchn_do_upcall
* 'stable/p2m.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/p2m: Create entries in the P2M_MFN trees's to track 1-1 mappings
* 'stable/e820.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/setup: Fix for incorrect xen_extra_mem_start initialization under 32-bit
xen/setup: Ignore E820_UNUSABLE when setting 1-1 mappings.
* 'stable/mmu.bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen mmu: fix a race window causing leave_mm BUG()
Cleanup code/data sections definitions
accordingly to include/linux/init.h.
Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
[v1: Rebased on top of latest linus's to include fixes in mmu.c]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Introduce a new x86_init hook called pagetable_reserve that at the end
of init_memory_mapping is used to reserve a range of memory addresses for
the kernel pagetable pages we used and free the other ones.
On native it just calls memblock_x86_reserve_range while on xen it also
takes care of setting the spare memory previously allocated
for kernel pagetable pages from RO to RW, so that it can be used for
other purposes.
A detailed explanation of the reason why this hook is needed follows.
As a consequence of the commit:
commit 4b239f458c
Author: Yinghai Lu <yinghai@kernel.org>
Date: Fri Dec 17 16:58:28 2010 -0800
x86-64, mm: Put early page table high
at some point init_memory_mapping is going to reach the pagetable pages
area and map those pages too (mapping them as normal memory that falls
in the range of addresses passed to init_memory_mapping as argument).
Some of those pages are already pagetable pages (they are in the range
pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW. When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).
In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).
For this reason we need a hook to reserve the kernel pagetable pages we
used and free the other ones so that they can be reused for other
purposes.
On native it just means calling memblock_x86_reserve_range, on Xen it
also means marking RW the pagetable pages that we allocated before but
that haven't been used before.
Another way to fix this is without using the hook is by adding a 'if
(xen_pv_domain)' in the 'init_memory_mapping' code and calling the Xen
counterpart, but that is just nasty.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
mask_rw_pte is currently checking if a pfn is a pagetable page if it
falls in the range pgt_buf_start - pgt_buf_end but that is incorrect
because pgt_buf_end is a moving target: pgt_buf_top is the real
boundary.
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
As a consequence of the commit:
commit 4b239f458c
Author: Yinghai Lu <yinghai@kernel.org>
Date: Fri Dec 17 16:58:28 2010 -0800
x86-64, mm: Put early page table high
it causes the Linux kernel to crash under Xen:
mapping kernel into physical memory
Xen: setup ISA identity maps
about to get started...
(XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
(XEN) mm.c:3027:d0 Error while pinning mfn b1d89
(XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
...
The reason is that at some point init_memory_mapping is going to reach
the pagetable pages area and map those pages too (mapping them as normal
memory that falls in the range of addresses passed to init_memory_mapping
as argument). Some of those pages are already pagetable pages (they are
in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
mapped RO and everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW. When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).
In order to fix the issue we mark all the pages in the entire range
pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
is completed only the range pgt_buf_start-pgt_buf_end is reserved by
init_memory_mapping. Hence the kernel is going to crash as soon as one
of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
ranges are RO).
For this reason, this function is introduced which is called _after_
the init_memory_mapping has completed (in a perfect world we would
call this function from init_memory_mapping, but lets ignore that).
Because we are called _after_ init_memory_mapping the pgt_buf_[start,
end,top] have all changed to new values (b/c another init_memory_mapping
is called). Hence, the first time we enter this function, we save
away the pgt_buf_start value and update the pgt_buf_[end,top].
When we detect that the "old" pgt_buf_start through pgt_buf_end
PFNs have been reserved (so memblock_x86_reserve_range has been called),
we immediately set out to RW the "old" pgt_buf_end through pgt_buf_top.
And then we update those "old" pgt_buf_[end|top] with the new ones
so that we can redo this on the next pagetable.
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
[v1: Updated with Jeremy's comments]
[v2: Added the crash output]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The two "is_early_ioremap_ptep" checks in mask_rw_pte are only used on
x86_64, in fact early_ioremap is not used at all to setup the initial
pagetable on x86_32.
Moreover on x86_32 the two checks are wrong because the range
pgt_buf_start..pgt_buf_end initially should be mapped RW because
the pages in the range are not pagetable pages yet and haven't been
cleared yet. Afterwards considering the pgt_buf_start..pgt_buf_end is
part of the initial mapping, xen_alloc_pte is capable of turning
the ptes RO when they become pagetable pages.
Fix the issue and improve the readability of the code providing two
different implementation of mask_rw_pte for x86_32 and x86_64.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
There are valid situations in which this error is not
a warning. Mainly when QEMU maps a guest memory and uses
the VM_IO flag to set the MFNs. For right now make the
WARN be WARN_ONCE. In the future we will:
1). Remove the VM_IO code handling..
2). .. which will also remove this debug facility.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
xen: update mask_rw_pte after kernel page tables init changes
xen: set max_pfn_mapped to the last pfn mapped
x86: Cleanup highmap after brk is concluded
Fix up trivial onflict (added header file includes) in
arch/x86/mm/init_64.c
After "x86-64, mm: Put early page table high" already existing kernel
page table pages can be mapped using early_ioremap too so we need to
update mask_rw_pte to make sure these pages are still mapped RO.
The reason why we have to do that is explain by the commit message of
fef5ba7979:
"Xen requires that all pages containing pagetable entries to be mapped
read-only. If pages used for the initial pagetable are already mapped
then we can change the mapping to RO. However, if they are initially
unmapped, we need to make sure that when they are later mapped, they
are also mapped RO.
..SNIP..
the pagetable setup code early_ioremaps the pages to write their
entries, so we must make sure that mappings created in the early_ioremap
fixmap area are mapped RW. (Those mappings are removed before the pages
are presented to Xen as pagetable pages.)"
We accomplish all this in mask_rw_pte by mapping RO all the pages mapped
using early_ioremap apart from the last one that has been allocated
because it is not a page table page yet (it has not been hooked into the
page tables yet).
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Do not set max_pfn_mapped to the end of the initial memory mappings,
that also contain pages that don't belong in pfn space (like the mfn
list).
Set max_pfn_mapped to the last real pfn mapped in the initial memory
mappings that is the pfn backing _end.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Flush TLB if PGD entry is changed in i386 PAE mode
x86, dumpstack: Correct stack dump info when frame pointer is available
x86: Clean up csum-copy_64.S a bit
x86: Fix common misspellings
x86: Fix misspelling and align params
x86: Use PentiumPro-optimized partial_csum() on VIA C7
They were generated by 'codespell' and then manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: trivial@kernel.org
LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'stable/hvc-console' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/hvc: Disable probe_irq_on/off from poking the hvc-console IRQ line.
* 'stable/gntalloc.v6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: gntdev: fix build warning
xen/p2m/m2p/gnttab: do not add failed grant maps to m2p override
xen-gntdev: Add cast to pointer
xen-gntdev: Fix incorrect use of zero handle
xen: change xen/[gntdev/gntalloc] to default m
xen-gntdev: prevent using UNMAP_NOTIFY_CLEAR_BYTE on read-only mappings
xen-gntdev: Avoid double-mapping memory
xen-gntdev: Avoid unmapping ranges twice
xen-gntdev: Use map->vma for checking map validity
xen-gntdev: Fix unmap notify on PV domains
xen-gntdev: Fix memory leak when mmap fails
xen/gntalloc,gntdev: Add unmap notify ioctl
xen-gntalloc: Userspace grant allocation driver
xen-gntdev: Support mapping in HVM domains
xen-gntdev: Add reference counting to maps
xen-gntdev: Use find_vma rather than iterating our vma list manually
xen-gntdev: Change page limit to be global instead of per-open
* 'stable/balloon' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (24 commits)
xen-gntdev: Use ballooned pages for grant mappings
xen-balloon: Add interface to retrieve ballooned pages
xen-balloon: Move core balloon functionality out of module
xen/balloon: Remove pr_info's and don't alter retry_count
xen/balloon: Protect against CPU exhaust by event/x process
xen/balloon: Migration from mod_timer() to schedule_delayed_work()
xen/balloon: Removal of driver_pages
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (93 commits)
x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others()
x86-64, NUMA: Don't call numa_set_distanc() for all possible node combinations during emulation
x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation()
x86-64, NUMA: Clean up initmem_init()
x86-64, NUMA: Fix numa_emulation code with node0 without RAM
x86-64, NUMA: Revert NUMA affine page table allocation
x86: Work around old gas bug
x86-64, NUMA: Better explain numa_distance handling
x86-64, NUMA: Fix distance table handling
mm: Move early_node_map[] reverse scan helpers under HAVE_MEMBLOCK
x86-64, NUMA: Fix size of numa_distance array
x86: Rename e820_table_* to pgt_buf_*
bootmem: Move __alloc_memory_core_early() to nobootmem.c
bootmem: Move contig_page_data definition to bootmem.c/nobootmem.c
bootmem: Separate out CONFIG_NO_BOOTMEM code into nobootmem.c
x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance()
x86-64, NUMA: Add proper function comments to global functions
x86-64, NUMA: Move NUMA emulation into numa_emulation.c
x86-64, NUMA: Prepare numa_emulation() for moving NUMA emulation into a separate file
x86-64, NUMA: Do not scan two times for setup_node_bootmem()
...
Fix up conflicts in arch/x86/kernel/smpboot.c
* 'stable/p2m-identity.v4.9.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/m2p: Check whether the MFN has IDENTITY_FRAME bit set..
xen/m2p: No need to catch exceptions when we know that there is no RAM
xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set.
xen/debugfs: Add 'p2m' file for printing out the P2M layout.
xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.
xen/mmu: WARN_ON when racing to swap middle leaf.
xen/mmu: Set _PAGE_IOMAP if PFN is an identity PFN.
xen/mmu: Add the notion of identity (1-1) mapping.
xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY.
* 'stable/e820' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest and fix overflow.
xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps.
Removal of driver_pages (I do not have seen any references to it).
Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Only enabled if XEN_DEBUG is enabled. We print a warning
when:
pfn_to_mfn(pfn) == pfn, but no VM_IO (_PAGE_IOMAP) flag set
(and pfn is an identity mapped pfn)
pfn_to_mfn(pfn) != pfn, and VM_IO flag is set.
(ditto, pfn is an identity mapped pfn)
[v2: Make it dependent on CONFIG_XEN_DEBUG instead of ..DEBUG_FS]
[v3: Fix compiler warning]
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We walk over the whole P2M tree and construct a simplified view of
which PFN regions belong to what level and what type they are.
Only enabled if CONFIG_XEN_DEBUG_FS is set.
[v2: UNKN->UNKNOWN, use uninitialized_var]
[v3: Rebased on top of mmu->p2m code split]
[v4: Fixed the else if]
Reviewed-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If we find that the PFN is within the P2M as an identity
PFN make sure to tack on the _PAGE_IOMAP flag.
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
It's forbidden to take the page_table_lock with the irq disabled
or if there's contention the IPIs (for tlb flushes) sent with
the page_table_lock held will never run leading to a deadlock.
Nobody takes the pgd_lock from irq context so the _irqsave can be
removed.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With this patch, we diligently set regions that will be used by the
balloon driver to be INVALID_P2M_ENTRY and under the ownership
of the balloon driver. We are OK using the __set_phys_to_machine
as we do not expect to be allocating any P2M middle or entries pages.
The set_phys_to_machine has the side-effect of potentially allocating
new pages and we do not want that at this stage.
We can do this because xen_build_mfn_list_list will have already
allocated all such pages up to xen_max_p2m_pfn.
We also move the check for auto translated physmap down the
stack so it is present in __set_phys_to_machine.
[v2: Rebased with mmu->p2m code split]
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
e820_table_{start|end|top}, which are used to buffer page table
allocation during early boot, are now derived from memblock and don't
have much to do with e820. Change the names so that they reflect what
they're used for.
This patch doesn't introduce any behavior change.
-v2: Ingo found that earlier patch "x86: Use early pre-allocated page
table buffer top-down" caused crash on 32bit and needed to be
dropped. This patch was updated to reflect the change.
-tj: Updated commit description.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
On stock 2.6.37-rc4, running:
# mount lilith:/export /mnt/lilith
# find /mnt/lilith/ -type f -print0 | xargs -0 file
crashes the machine fairly quickly under Xen. Often it results in oops
messages, but the couple of times I tried just now, it just hung quietly
and made Xen print some rude messages:
(XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp
3000000000000000) for mfn 1d7058 (pfn 18fa7)
(XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
(XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp
1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
(XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04
Which means the domain tried to map a pagetable page RW, which would
allow it to map arbitrary memory, so Xen stopped it. This is because
vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
finished with them, and those pages got recycled as pagetable pages
while still having these RW aliases.
Removing those mappings immediately removes the Xen-visible aliases, and
so it has no problem with those pages being reused as pagetable pages.
Deferring the TLB flush doesn't upset Xen because it can flush the TLB
itself as needed to maintain its invariants.
When unmapping a region in the vmalloc space, clear the ptes
immediately. There's no point in deferring this because there's no
amortization benefit.
The TLBs are left dirty, and they are flushed lazily to amortize the
cost of the IPIs.
This specific motivation for this patch is an oops-causing regression
since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
of vm_map_ram() introduced in 56e4ebf877 ("NFS: readdir with vmapped
pages") . XFS also uses vm_map_ram() and could cause similar problems.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Only make swapper_pg_dir readonly and pinned when generic x86 architecture code
(which also starts on initial_page_table) switches to it. This helps ensure
that the generic setup paths work on Xen unmodified. In particular
clone_pgd_range writes directly to the destination pgd entries and is used to
initialise swapper_pg_dir so we need to ensure that it remains writeable until
the last possible moment during bring up.
This is complicated slightly by the need to avoid sharing kernel PMD entries
when running under Xen, therefore the Xen implementation must make a copy of
the kernel PMD (which is otherwise referred to by both intial_page_table and
swapper_pg_dir) before switching to swapper_pg_dir.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* upstream/core:
xen/events: Use PIRQ instead of GSI value when unmapping MSI/MSI-X irqs.
xen: set IO permission early (before early_cpu_init())
xen: re-enable boot-time ballooning
xen/balloon: make sure we only include remaining extra ram
xen/balloon: the balloon_lock is useless
xen: add extra pages to balloon
xen/events: use locked set|clear_bit() for cpu_evtchn_mask
xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore
xen: implement XENMEM_machphys_mapping
* upstream/xenfs:
Revert "xen/privcmd: create address space to allow writable mmaps"
xen/xenfs: update xenfs_mount for new prototype
xen: fix header export to userspace
xen: set vma flag VM_PFNMAP in the privcmd mmap file_op
xen: xenfs: privcmd: check put_user() return code
* upstream/evtchn:
xen: make evtchn's name less generic
xen/evtchn: the evtchn device is non-seekable
xen/evtchn: add missing static
xen/evtchn: Fix name of Xen event-channel device
xen/evtchn: don't do unbind_from_irqhandler under spinlock
xen/evtchn: remove spurious barrier
xen/evtchn: ports start enabled
xen/evtchn: dynamically allocate port_user array
xen/evtchn: track enabled state for each port
* commit 'v2.6.37-rc2': (10093 commits)
Linux 2.6.37-rc2
capabilities/syslog: open code cap_syslog logic to fix build failure
i2c: Sanity checks on adapter registration
i2c: Mark i2c_adapter.id as deprecated
i2c: Drivers shouldn't include <linux/i2c-id.h>
i2c: Delete unused adapter IDs
i2c: Remove obsolete cleanup for clientdata
include/linux/kernel.h: Move logging bits to include/linux/printk.h
Fix gcc 4.5.1 miscompiling drivers/char/i8k.c (again)
hwmon: (w83795) Check for BEEP pin availability
hwmon: (w83795) Clear intrusion alarm immediately
hwmon: (w83795) Read the intrusion state properly
hwmon: (w83795) Print the actual temperature channels as sources
hwmon: (w83795) List all usable temperature sources
hwmon: (w83795) Expose fan control method
hwmon: (w83795) Fix fan control mode attributes
hwmon: (lm95241) Check validity of input values
hwmon: Change mail address of Hans J. Koch
PCI: sysfs: fix printk warnings
GFS2: Fix inode deallocation race
...
This hypercall allows Xen to specify a non-default location for the
machine to physical mapping. This capability is used when running a 32
bit domain 0 on a 64 bit hypervisor to shrink the hypervisor hole to
exactly the size required.
[ Impact: add Xen hypercall definitions ]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Set VM_PFNMAP in the privcmd mmap file_op, rather than later in
xen_remap_domain_mfn_range when it is too late because
vma_wants_writenotify has already been called and vm_page_prot has
already been modified.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
sizeof(pmd_t *) is 4 bytes on 32-bit PAE leading to an allocation of
only 2048 bytes. The correct size is sizeof(pmd_t) giving us a full
page allocation.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
and branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
xen: register xen pci notifier
xen: initialize cpu masks for pv guests in xen_smp_init
xen: add a missing #include to arch/x86/pci/xen.c
xen: mask the MTRR feature from the cpuid
xen: make hvc_xen console work for dom0.
xen: add the direct mapping area for ISA bus access
xen: Initialize xenbus for dom0.
xen: use vcpu_ops to setup cpu masks
xen: map a dummy page for local apic and ioapic in xen_set_fixmap
xen: remap MSIs into pirqs when running as initial domain
xen: remap GSIs as pirqs when running as initial domain
xen: introduce XEN_DOM0 as a silent option
xen: map MSIs into pirqs
xen: support GSI -> pirq remapping in PV on HVM guests
xen: add xen hvm acpi_register_gsi variant
acpi: use indirect call to register gsi in different modes
xen: implement xen_hvm_register_pirq
xen: get the maximum number of pirqs from xen
xen: support pirq != irq
* 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (27 commits)
X86/PCI: Remove the dependency on isapnp_disable.
xen: Update Makefile with CONFIG_BLOCK dependency for biomerge.c
MAINTAINERS: Add myself to the Xen Hypervisor Interface and remove Chris Wright.
x86: xen: Sanitse irq handling (part two)
swiotlb-xen: On x86-32 builts, select SWIOTLB instead of depending on it.
MAINTAINERS: Add myself for Xen PCI and Xen SWIOTLB maintainer.
xen/pci: Request ACS when Xen-SWIOTLB is activated.
xen-pcifront: Xen PCI frontend driver.
xenbus: prevent warnings on unhandled enumeration values
xenbus: Xen paravirtualised PCI hotplug support.
xen/x86/PCI: Add support for the Xen PCI subsystem
x86: Introduce x86_msi_ops
msi: Introduce default_[teardown|setup]_msi_irqs with fallback.
x86/PCI: Export pci_walk_bus function.
x86/PCI: make sure _PAGE_IOMAP it set on pci mappings
x86/PCI: Clean up pci_cache_line_size
xen: fix shared irq device passthrough
xen: Provide a variant of xen_poll_irq with timeout.
xen: Find an unbound irq number in reverse order (high to low).
xen: statically initialize cpu_evtchn_mask_p
...
Fix up trivial conflicts in drivers/pci/Makefile
* 'upstream/xenfs' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
xen/privcmd: make privcmd visible in domU
xen/privcmd: move remap_domain_mfn_range() to core xen code and export.
privcmd: MMAPBATCH: Fix error handling/reporting
xenbus: export xen_store_interface for xenfs
xen/privcmd: make sure vma is ours before doing anything to it
xen/privcmd: print SIGBUS faults
xen/xenfs: set_page_dirty is supposed to return true if it dirties
xen/privcmd: create address space to allow writable mmaps
xen: add privcmd driver
xen: add variable hypercall caller
xen: add xen_set_domain_pte()
xen: add /proc/xen/xsd_{kva,port} to xenfs
* 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (29 commits)
xen: include xen/xen.h for definition of xen_initial_domain()
xen: use host E820 map for dom0
xen: correctly rebuild mfn list list after migration.
xen: improvements to VIRQ_DEBUG output
xen: set up IRQ before binding virq to evtchn
xen: ensure that all event channels start off bound to VCPU 0
xen/hvc: only notify if we actually sent something
xen: don't add extra_pages for RAM after mem_end
xen: add support for PAT
xen: make sure xen_max_p2m_pfn is up to date
xen: limit extra memory to a certain ratio of base
xen: add extra pages for E820 RAM regions, even if beyond mem_end
xen: make sure xen_extra_mem_start is beyond all non-RAM e820
xen: implement "extra" memory to reserve space for pages not present at boot
xen: Use host-provided E820 map
xen: don't map missing memory
xen: defer building p2m mfn structures until kernel is mapped
xen: add return value to set_phys_to_machine()
xen: convert p2m to a 3 level tree
xen: make install_p2mtop_page() static
...
Fix up trivial conflict in arch/x86/xen/mmu.c, and fix the use of
'reserve_early()' - in the new memblock world order it is now
'memblock_x86_reserve_range()' instead. Pointed out by Jeremy.
add the direct mapping area for ISA bus access when running as initial
domain
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Otherwise the second migration attempt fails because the mfn_list_list
still refers to all the old mfns.
We need to update the entires in both p2m_top_mfn and the mid_mfn
pages which p2m_top_mfn refers to.
In order to do this we need to keep track of the virtual addresses
mapping the p2m_mid_mfn pages since we cannot rely on
mfn_to_virt(p2m_top_mfn[idx]) since p2m_top_mfn[idx] will still
contain the old MFN after a migration, which may now belong to another
domain and hence have a different mapping in the m2p.
Therefore add and maintain a third top level page, p2m_top_mfn_p[],
which tracks the virtual addresses of the mfns contained in
p2m_top_mfn[].
We also need to update the content of the p2m_mid_missing_mfn page on
resume to refer to the page's new mfn.
p2m_missing does not need updating since the migration process takes
care of the leaf p2m pages for us.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Convert Linux PAT entries into Xen ones when constructing ptes. Linux
doesn't use _PAGE_PAT for ptes, so the only difference in the first 4
entries is that Linux uses _PAGE_PWT for WC, whereas Xen (and default)
use it for WT.
xen_pte_val does the inverse conversion.
We hard-code assumptions about Linux's current PAT layout, but a
warning on the wrmsr to MSR_IA32_CR_PAT should point out any problems.
If necessary we could go to a more general table-based conversion between
Linux and Xen PAT entries.
hugetlbfs poses a problem at the moment, the x86 architecture uses the
same flag for _PAGE_PAT and _PAGE_PSE, which changes meaning depending
on which pagetable level we're using. At the moment this should be OK
so long as nobody tries to do a pte_val on a hugetlbfs pte.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Keep xen_max_p2m_pfn up to date with the end of the extra memory
we're adding. It is possible that it will be too high since memory
may be truncated by a "mem=" option on the kernel command line, but
that won't matter.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When setting up a pte for a missing pfn (no matching mfn), just create
an empty pte rather than a junk mapping.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When building mfn parts of p2m structure, we rely on being able to
use mfn_to_virt, which in turn requires kernel to be mapped into
the linear area (which is distinct from the kernel image mapping
on 64-bit). Defer calling xen_build_mfn_list_list() until after
xen_setup_kernel_pagetable();
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
set_phys_to_machine() can return false on failure, which means a memory
allocation failure for the p2m structure. It can only fail if setting
the mfn for a pfn in previously unused address space. It is guaranteed
to succeed if you're setting a mapping to INVALID_P2M_ENTRY or updating
the mfn for an existing pfn.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Make the p2m structure a 3 level tree which covers the full possible
physical space.
The p2m structure contains mappings from the domain's pfns to system-wide
mfns. The structure has 3 levels and two roots. The first root is for
the domain's own use, and is linked with virtual addresses. The second
is all mfn references, and is used by Xen on save/restore to allow it to
update the p2m mapping for the domain.
At boot, the domain builder provides a simple flat p2m array for all the
initially present pages. We construct the two levels above that using
the early_brk allocator. After early boot time, set_phys_to_machine()
will allocate any missing levels using the normal kernel allocator
(at GFP_KERNEL, so it must be called in a normal blocking context).
Because the early_brk() API requires us to pre-reserve the maximum amount
of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY
config option, but its only negative side-effect is to increase the
kernel's apparent bss size. However, since all unused brk memory is
returned to the heap, there's no real downside to making it large.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Allocate p2m tables based on the actual runtime maximum pfn rather than
the static config-time limit.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Use early brk mechanism to allocate p2m tables, to save memory when
booting non-Xen.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
* 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits)
x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S
xen: Cope with unmapped pages when initializing kernel pagetable
memblock, bootmem: Round pfn properly for memory and reserved regions
memblock: Annotate memblock functions with __init_memblock
memblock: Allow memblock_init to be called early
memblock/arm: Fix memblock_region_is_memory() typo
x86, memblock: Remove __memblock_x86_find_in_range_size()
memblock: Fix wraparound in find_region()
x86-32, memblock: Make add_highpages honor early reserved ranges
x86, memblock: Fix crashkernel allocation
arm, memblock: Fix the sparsemem build
memblock: Fix section mismatch warnings
powerpc, memblock: Fix memblock API change fallout
memblock, microblaze: Fix memblock API change fallout
x86: Remove old bootmem code
x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve
x86: Remove not used early_res code
x86, memblock: Replace e820_/_early string with memblock_
x86: Use memblock to replace early_res
x86, memblock: Use memblock_debug to control debug message print out
...
Fix up trivial conflicts in arch/x86/kernel/setup.c and kernel/Makefile
This allows xenfs to be built as a module, previously it required flush_tlb_all
and arbitrary_virt_to_machine to be exported.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Add xen_set_domain_pte() to allow setting a pte mapping a page from
another domain. The common case is to map from DOMID_IO, the pseudo
domain which owns all IO pages, but will also be used in the privcmd
interface to map other domain pages.
[ Impact: new Xen-internal API for cross-domain mappings ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Xen requires that all pages containing pagetable entries to be mapped
read-only. If pages used for the initial pagetable are already mapped
then we can change the mapping to RO. However, if they are initially
unmapped, we need to make sure that when they are later mapped, they
are also mapped RO.
We do this by knowing that the kernel pagetable memory is pre-allocated
in the range e820_table_start - e820_table_end, so any pfn within this
range should be mapped read-only. However, the pagetable setup code
early_ioremaps the pages to write their entries, so we must make sure
that mappings created in the early_ioremap fixmap area are mapped RW.
(Those mappings are removed before the pages are presented to Xen
as pagetable pages.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
LKML-Reference: <4CB63A80.8060702@goop.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
1.include linux/memblock.h directly. so later could reduce e820.h reference.
2 this patch is done by sed scripts mainly
-v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
VMI was the only user of the alloc_pmd_clone hook, given that VMI
is now removed we can also remove this hook.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1282608357.19396.36.camel@ank32.eng.vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
x86: Detect whether we should use Xen SWIOTLB.
pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions.
swiotlb-xen: SWIOTLB library for Xen PV guest with PCI passthrough.
xen/mmu: inhibit vmap aliases rather than trying to clear them out
vmap: add flag to allow lazy unmap to be disabled at runtime
xen: Add xen_create_contiguous_region
xen: Rename the balloon lock
xen: Allow unprivileged Xen domains to create iomap pages
xen: use _PAGE_IOMAP in ioremap to do machine mappings
Fix up trivial conflicts (adding both xen swiotlb and xen pci platform
driver setup close to each other) in drivers/xen/{Kconfig,Makefile} and
include/xen/xen-ops.h
This patch introduce a CONFIG_XEN_PVHVM compile time option to
enable/disable Xen PV on HVM support.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Rather than trying to deal with aliases once they appear, just completely
inhibit them. Mostly the removal of aliases was managable, but it comes
unstuck in xen_create_contiguous_region() because it gets executed at
interrupt time (as a result of dma_alloc_coherent()), which causes all
sorts of confusion in the vmap code, as it was never intended to be run
in interrupt context.
This has the unfortunate side effect of removing all the unmap batching
the vmap code so carefully added, but that can't be helped.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When a pagetable is about to be destroyed, we notify Xen so that the
hypervisor can clear the related shadow pagetable.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
A memory region must be physically contiguous in order to be accessed
through DMA. This patch adds xen_create_contiguous_region, which
ensures a region of contiguous virtual memory is also physically
contiguous.
Based on Stephen Tweedie's port of the 2.6.18-xen version.
Remove contiguous_bitmap[] as it's no longer needed.
Ported from linux-2.6.18-xen.hg 707:e410857fd83c
[ Impact: add Xen-internal API to make pages phys-contig ]
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen_create_contiguous_region needs access to the balloon lock to
ensure memory doesn't change under its feet, so expose the balloon
lock
* Change the name of the lock to xen_reservation_lock, to imply it's
now less-specific usage.
[ Impact: cleanup ]
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
PV DomU domains are allowed to map hardware MFNs for PCI passthrough,
but are not generally allowed to map raw machine pages. In particular,
various pieces of code try to map DMI and ACPI tables in the ISA ROM
range. We disallow _PAGE_IOMAP for those mappings, so that they are
redirected to a set of local zeroed pages we reserve for that purpose.
[ Impact: prevent passthrough of ISA space, as we only allow PCI ]
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In a Xen domain, ioremap operates on machine addresses, not
pseudo-physical addresses. We use _PAGE_IOMAP to determine whether a
mapping is intended for machine addresses.
[ Impact: allow Xen domain to map real hardware ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Now that both Xen and VMI disable allocations of PTE pages from high
memory this paravirt op serves no further purpose.
This effectively reverts ce6234b5 "add kmap_atomic_pte for mapping
highpte pages".
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1267204562-11844-3-git-send-email-ian.campbell@citrix.com>
Acked-by: Alok Kataria <akataria@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
There's a path in the pagefault code where the kernel deliberately
breaks its own locking rules by kmapping a high pte page without
holding the pagetable lock (in at least page_check_address). This
breaks Xen's ability to track the pinned/unpinned state of the
page. There does not appear to be a viable workaround for this
behaviour so simply disable HIGHPTE for all Xen guests.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1267204562-11844-1-git-send-email-ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Pasi Kärkkäinen <pasik@iki.fi>
Cc: <stable@kernel.org> # .32.x: 14315592: Allow highmem user page tables to be disabled at boot time
Cc: <stable@kernel.org> # .32.x
Cc: <xen-devel@lists.xensource.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
pvops kernels >= 2.6.30 can currently only be saved and restored once. The
second attempt to save results in:
ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
ERROR Internal error: Failed to map/save the p2m frame list
I finally narrowed it down to:
commit cdaead6b4e
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Fri Feb 27 15:34:59 2009 -0800
xen: split construction of p2m mfn tables from registration
Build the p2m_mfn_list_list early with the rest of the p2m table, but
register it later when the real shared_info structure is in place.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
The unforeseen side-effect of this change was to cause the mfn list list to not
be rebuilt on resume. Prior to this change it would have been rebuilt via
xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list().
Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend().
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Makes code futureproof against the impending change to mm->cpu_vm_mask (to be a pointer).
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We really do not need two paravirt/x86_init_ops functions which are
called in two consecutive source lines. Move the only user of
post_allocator_init into the already existing pagetable_setup_done
function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits)
xen: cache cr0 value to avoid trap'n'emulate for read_cr0
xen/x86-64: clean up warnings about IST-using traps
xen/x86-64: fix breakpoints and hardware watchpoints
xen: reserve Xen start_info rather than e820 reserving
xen: add FIX_TEXT_POKE to fixmap
lguest: update lazy mmu changes to match lguest's use of kvm hypercalls
xen: honour VCPU availability on boot
xen: add "capabilities" file
xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet
xen/sys/hypervisor: change writable_pt to features
xen: add /sys/hypervisor support
xen/xenbus: export xenbus_dev_changed
xen: use device model for suspending xenbus devices
xen: remove suspend_cancel hook
xen/dev-evtchn: clean up locking in evtchn
xen: export ioctl headers to userspace
xen: add /dev/xen/evtchn driver
xen: add irq_from_evtchn
xen: clean up gate trap/interrupt constants
xen: set _PAGE_NX in __supported_pte_mask before pagetable construction
...
mmu.c needs to #include module.h to prevent these warnings:
arch/x86/xen/mmu.c:239: warning: data definition has no type or storage class
arch/x86/xen/mmu.c:239: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/x86/xen/mmu.c:239: warning: parameter names (without types) in function declaration
[ Impact: cleanup ]
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/frv/include/asm/pgtable.h
arch/x86/include/asm/required-features.h
arch/x86/xen/mmu.c
Merge reason: x86/xen was on a .29 base still, move it to a fresher
branch and pick up Xen fixes as well, plus resolve
conflicts
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The Xen pagetables are no longer implicitly reserved as part of the other
i386_start_kernel reservations, so make sure we explicitly reserve them.
This prevents them from being released into the general kernel free page
pool and reused.
[ Impact: fix Xen guest crash ]
Also-Bisected-by: Bryan Donlan <bdonlan@gmail.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4A032EEC.30509@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-rc1/xen/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
xen: add FIX_TEXT_POKE to fixmap
xen: honour VCPU availability on boot
xen: clean up gate trap/interrupt constants
xen: set _PAGE_NX in __supported_pte_mask before pagetable construction
xen: resume interrupts before system devices.
xen/mmu: weaken flush_tlb_other test
xen/mmu: some early pagetable cleanups
Xen: Add virt_to_pfn helper function
x86-64: remove PGE from must-have feature list
xen: mask XSAVE from cpuid
NULL noise: arch/x86/xen/smp.c
xen: remove xen_load_gdt debug
xen: make xen_load_gdt simpler
xen: clean up xen_load_gdt
xen: split construction of p2m mfn tables from registration
xen: separate p2m allocation from setting
xen: disable preempt for leave_lazy_mmu
Use phys_addr_t for receiving a physical address argument instead of
unsigned long. This allows fixmap to handle pages higher than 4GB on
x86-32.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FIX_TEXT_POKE[01] are used to map kernel addresses, so they're mapping
pfns, not mfns.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
FIX_TEXT_POKE[01] are used to map kernel addresses, so they're mapping
pfns, not mfns.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Impact: fixes crashing bug
There's no particular problem with getting an empty cpu mask,
so just shortcut-return if we get one.
Avoids crash reported by Christophe Saout <christophe@saout.de>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
1. make sure early-allocated ptes are pinned, so they can be later
unpinned
2. don't pin pmd+pud, just make them RO
3. scatter some __inits around
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Build the p2m_mfn_list_list early with the rest of the p2m table, but
register it later when the real shared_info structure is in place.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When doing very early p2m setting, we need to separate setting
from allocation, so split things up accordingly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
xen_mc_flush() requires preemption to be disabled for its own sanity,
so disable it while we're flushing.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
* commit 'origin/master': (4825 commits)
Fix build errors due to CONFIG_BRANCH_TRACER=y
parport: Use the PCI IRQ if offered
tty: jsm cleanups
Adjust path to gpio headers
KGDB_SERIAL_CONSOLE check for module
Change KCONFIG name
tty: Blackin CTS/RTS
Change hardware flow control from poll to interrupt driven
Add support for the MAX3100 SPI UART.
lanana: assign a device name and numbering for MAX3100
serqt: initial clean up pass for tty side
tty: Use the generic RS485 ioctl on CRIS
tty: Correct inline types for tty_driver_kref_get()
splice: fix deadlock in splicing to file
nilfs2: support nanosecond timestamp
nilfs2: introduce secondary super block
nilfs2: simplify handling of active state of segments
nilfs2: mark minor flag for checkpoint created by internal operation
nilfs2: clean up sketch file
nilfs2: super block operations fix endian bug
...
Conflicts:
arch/x86/include/asm/thread_info.h
arch/x86/lguest/boot.c
drivers/xen/manage.c
Impact: fixes crashing bug
There's no particular problem with getting an empty cpu mask,
so just shortcut-return if we get one.
Avoids crash reported by Christophe Saout <christophe@saout.de>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
1. make sure early-allocated ptes are pinned, so they can be later
unpinned
2. don't pin pmd+pud, just make them RO
3. scatter some __inits around
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Build the p2m_mfn_list_list early with the rest of the p2m table, but
register it later when the real shared_info structure is in place.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When doing very early p2m setting, we need to separate setting
from allocation, so split things up accordingly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
xen_mc_flush() requires preemption to be disabled for its own sanity,
so disable it while we're flushing.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Impact: remove obsolete checks, simplification
Lift restrictions on preemption with lazy mmu mode, as it is now allowed.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Impact: allow preemption during lazy mmu updates
If we're in lazy mmu mode when context switching, leave
lazy mmu mode, but remember the task's state in
TIF_LAZY_MMU_UPDATES. When we resume the task, check this
flag and re-enter lazy mmu mode if its set.
This sets things up for allowing lazy mmu mode while preemptible,
though that won't actually be active until the next change.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Impact: simplification, prepare for later changes
Make lazy cpu mode more specific to context switching, so that
it makes sense to do more context-switch specific things in
the callbacks.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Impact: new interface
Add a brk()-like allocator which effectively extends the bss in order
to allow very early code to do dynamic allocations. This is better than
using statically allocated arrays for data in subsystems which may never
get used.
The space for brk allocations is in the bss ELF segment, so that the
space is mapped properly by the code which maps the kernel, and so
that bootloaders keep the space free rather than putting a ramdisk or
something into it.
The bss itself, delimited by __bss_stop, ends before the brk area
(__brk_base to __brk_limit). The kernel text, data and bss is reserved
up to __bss_stop.
Any brk-allocated data is reserved separately just before the kernel
pagetable is built, as that code allocates from unreserved spaces
in the e820 map, potentially allocating from any unused brk memory.
Ultimately any unused memory in the brk area is used in the general
kernel memory pool.
Initially the brk space is set to 1MB, which is probably much larger
than any user needs (the largest current user is i386 head_32.S's code
to build the pagetables to map the kernel, which can get fairly large
with a big kernel image and no PSE support). So long as the system
has sufficient memory for the bootloader to reserve the kernel+1MB brk,
there are no bad effects resulting from an over-large brk.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The virtually mapped percpu space causes us two problems:
- for hypercalls which take an mfn, we need to do a full pagetable
walk to convert the percpu va into an mfn, and
- when a hypercall requires a page to be mapped RO via all its aliases,
we need to make sure its RO in both the percpu mapping and in the
linear mapping
This primarily affects the gdt and the vcpu info structure.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The commit
commit 4595f9620c
Author: Rusty Russell <rusty@rustcorp.com.au>
Date: Sat Jan 10 21:58:09 2009 -0800
x86: change flush_tlb_others to take a const struct cpumask
causes xen_flush_tlb_others to allocate a multicall and then issue it
without initializing it in the case where the cpumask is empty,
leading to:
[ 8.354898] 1 multicall(s) failed: cpu 1
[ 8.354921] Pid: 2213, comm: bootclean Not tainted 2.6.29-rc3-x86_32p-xenU-tip #135
[ 8.354937] Call Trace:
[ 8.354955] [<c01036e3>] xen_mc_flush+0x133/0x1b0
[ 8.354971] [<c0105d2a>] ? xen_force_evtchn_callback+0x1a/0x30
[ 8.354988] [<c0105a60>] xen_flush_tlb_others+0xb0/0xd0
[ 8.355003] [<c0126643>] flush_tlb_page+0x53/0xa0
[ 8.355018] [<c0176a80>] do_wp_page+0x2a0/0x7c0
[ 8.355034] [<c0238f0a>] ? notify_remote_via_irq+0x3a/0x70
[ 8.355049] [<c0178950>] handle_mm_fault+0x7b0/0xa50
[ 8.355065] [<c0131a3e>] ? wake_up_new_task+0x8e/0xb0
[ 8.355079] [<c01337b5>] ? do_fork+0xe5/0x320
[ 8.355095] [<c0121919>] do_page_fault+0xe9/0x240
[ 8.355109] [<c0121830>] ? do_page_fault+0x0/0x240
[ 8.355125] [<c032457a>] error_code+0x72/0x78
[ 8.355139] call 1/1: op=2863311530 arg=[aaaaaaaa] result=-38 xen_flush_tlb_others+0x41/0xd0
Since empty cpumasks are rare and undoing an xen_mc_entry() is tricky
just issue such requests normally.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Moving the mmu code from enlighten.c to mmu.c inadvertently broke the
32-bit build. Fix it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: Optimization
In the native case, pte_val, make_pte, etc are all just identity
functions, so there's no need to clobber a lot of registers over them.
(This changes the 32-bit callee-save calling convention to return both
EAX and EDX so functions can return 64-bit values.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Cleanup
Move remaining mmu-related stuff into mmu.c.
A general cleanup, and lay the groundwork for later patches.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
It is an optimization and a cleanup, and adds the following new
generic percpu methods:
percpu_read()
percpu_write()
percpu_add()
percpu_sub()
percpu_and()
percpu_or()
percpu_xor()
and implements support for them on x86. (other architectures will fall
back to a default implementation)
The advantage is that for example to read a local percpu variable,
instead of this sequence:
return __get_cpu_var(var);
ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx
ffffffff8102ca32: 81
ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax
ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax
We can get a single instruction by using the optimized variants:
return percpu_read(var);
ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax
I also cleaned up the x86-specific APIs and made the x86 code use
these new generic percpu primitives.
tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
* added percpu_and() for completeness's sake
* made generic percpu ops atomic against preemption
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>