Currently __assign_irq_vector() goes through each cpu in the
specified mask until it finds a free vector in all the cpu's
that are part of the same interrupt domain. We visit all the
interrupt domain sibling cpus to reserve the free vector. So,
when we fail to find a free vector in an interrupt domain, it is
safe to continue our search with a cpu belonging to a new
interrupt domain. No need to go through each cpu, if the domain
containing that cpu is already visited.
Use the irq_cfg's old_domain to track the visited domains and
optimize the cpu traversal while finding a free vector in the
given cpumask.
NOTE: We can also optimize the search by using for_each_cpu() and
skip the current cpu, if it is not the first cpu in the mask
returned by the vector_allocation_domain(). But re-using the
cfg->old_domain to track the visited domains will be slightly
faster.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/r/1340656709-11423-2-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the ->irq_set_affinity() routines out of the #ifdef CONFIG_SMP
sections and use config_enabled(CONFIG_SMP) checks inside those
routines. Thus making those routines simple null stubs for
!CONFIG_SMP and retaining those routines with no additional
runtime overhead for CONFIG_SMP kernels.
Cleans up the ifdef CONFIG_SMP in and around routines related to
irq_set_affinity in io_apic and irq_remapping subsystems.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: torvalds@linux-foundation.org
Cc: joerg.roedel@amd.com
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1339723729.3475.63.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cpu_mask_to_apicid_and() always returns apicid of a single CPU,
even in case multiple CPUs were requested. This update fixes a
typo and forces apicid of a cluster to be returned.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120614075043.GI3383@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In case of invalid parameters cpu_mask_to_apicid_and() might
return apicid value of 0 (on Summit) or a uninitialized value
(on ES7000), although it is supposed to return apicid of cpu-0
at least. Fix the operation to always return a valid apicid.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120614075026.GH3383@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since there are only two locations where cpu_mask_to_apicid() is
called from, remove the operation and use only
cpu_mask_to_apicid_and() instead.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Suggested-and-acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120614074935.GE3383@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since commit 8637e38 ("x86/apic: Avoid useless scanning thru a
cpumask in assign_irq_vector()") vector_allocation_domain()
operation indicates if a cpumask is dynamic or static. This
update fixes the oversight and makes the operation to return a
value.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120614103933.GJ3383@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently cpu_mask_to_apicid() should not get a offline CPU with
the cpumask. Otherwise some apic drivers might try to access
non-existent per-cpu variables (i.e. x2apic). In that regard
cpu_mask_to_apicid() and cpu_mask_to_apicid_and() operations are
inconsistent.
This fix makes the two operations do not rely on calling
functions and always return the apicid for only online CPUs. As
result, the meaning and implementations of cpu_mask_to_apicid()
and cpu_mask_to_apicid_and() operations become straight.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131624.GG4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Current cpu_mask_to_apicid() and cpu_mask_to_apicid_and()
implementations have few shortcomings:
1. A value returned by cpu_mask_to_apicid() is written to
hardware registers unconditionally. Should BAD_APICID get ever
returned it will be written to a hardware too. But the value of
BAD_APICID is not universal across all hardware in all modes and
might cause unexpected results, i.e. interrupts might get routed
to CPUs that are not configured to receive it.
2. Because the value of BAD_APICID is not universal it is
counter- intuitive to return it for a hardware where it does not
make sense (i.e. x2apic).
3. cpu_mask_to_apicid_and() operation is thought as an
complement to cpu_mask_to_apicid() that only applies a AND mask
on top of a cpumask being passed. Yet, as consequence of 18374d8
commit the two operations are inconsistent in that of:
cpu_mask_to_apicid() should not get a offline CPU with the cpumask
cpu_mask_to_apicid_and() should not fail and return BAD_APICID
These limitations are impossible to realize just from looking at
the operations prototypes.
Most of these shortcomings are resolved by returning a error
code instead of BAD_APICID. As the result, faults are reported
back early rather than possibilities to cause a unexpected
behaviour exist (in case of [1]).
The only exception is setup_timer_IRQ0_pin() routine. Although
obviously controversial to this fix, its existing behaviour is
preserved to not break the fragile check_timer() and would
better addressed in a separate fix.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131559.GF4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In case of static vector allocation domains (i.e. flat) if all
vector numbers are exhausted, an attempt to assign a new vector
will lead to useless scans through all CPUs in the cpumask, even
though it is known that each new pass would fail. Make this
corner case less painful by letting report whether the vector
allocation domain depends on passed arguments or not and stop
scanning early.
The same could have been achived by introducing a static flag to
the apic operations. But let's allow vector_allocation_domain()
have more intelligence here and decide dynamically, in case we
would need it in the future.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131542.GE4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When assigning a new vector it is primarially done by adding 8
to the previously given out vector number. Hence, two
consequently allocated vector numbers would likely fall into the
same priority level. Try to spread vector numbers to different
priority levels better by changing the step from 8 to 16.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131514.GD4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If the HW implements round-robin interrupt delivery, this
enables multiple cpu's (which are part of the user specified
interrupt smp_affinity mask and belong to the same x2apic
cluster) to service the interrupt.
Also if the platform supports Power Aware Interrupt Routing,
then this enables the interrupt to be routed to an idle cpu or a
busy cpu depending on the perf/power bias tunable.
We are now grouping all the cpu's in a cluster to one vector
domain. So that will limit the total number of interrupt sources
handled by Linux. Previously we support "cpu-count *
available-vectors-per-cpu" interrupt sources but this will now
reduce to "cpu-count/16 * available-vectors-per-cpu".
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: gorcunov@openvz.org
Cc: agordeev@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337644682-19854-2-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Until now, irq_cfg domain is mostly static. Either all CPU's
(used by flat mode) or one CPU (first CPU in the irq afffinity
mask) to which irq is being migrated (this is used by the rest
of apic modes).
Upcoming x2apic cluster mode optimization patch allows the irq
to be sent to any CPU in the x2apic cluster (if supported by the
HW). So irq_cfg domain changes on the fly (depending on which
CPU in the x2apic cluster is online).
Instead of checking for any intersection between the new irq
affinity mask and the current irq_cfg domain, check if the new
irq affinity mask is a subset of the current irq_cfg domain.
Otherwise proceed with updating the irq_cfg domain aswell as
assigning vector's on all the CPUs specified in the new mask.
This also cleans up a workaround in updating irq_cfg domain for
legacy irq's that are handled by the IO-APIC.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: gorcunov@openvz.org
Cc: agordeev@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337644682-19854-1-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some subarchitectures (such as vSMP) need to slightly adjust the
underlying APIC structure. Add an APIC post-initialization callback
to 'struct x86_platform_ops' for this purpose and use it for
adjusting the APIC structure on vSMP systems.
Signed-off-by: Ido Yariv <ido@wizery.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Link: http://lkml.kernel.org/r/1338675095-27260-1-git-send-email-ido@wizery.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The interrupt chip irq_set_affinity() functions copy the affinity mask
to irq_data->affinity but return 0, i.e. IRQ_SET_MASK_OK.
IRQ_SET_MASK_OK causes the core code to do another redundant copy.
Return IRQ_SET_MASK_OK_NOCOPY to avoid this.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Cliff Wickman <cpw@sgi.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Keping Chen <chenkeping@huawei.com>
Link: http://lkml.kernel.org/r/1333120296-13563-4-git-send-email-jiang.liu@huawei.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull the MCA deletion branch from Paul Gortmaker:
"It was good that we could support MCA machines back in the day, but
realistically, nobody is using them anymore. They were mostly limited
to 386-sx 16MHz CPU and some 486 class machines and never more than
64MB of RAM. Even the enthusiast hobbyist community seems to have
dried up close to ten years ago, based on what you can find searching
various websites dedicated to the relatively short lived hardware.
So lets remove the support relating to CONFIG_MCA. There is no point
carrying this forward, wasting cycles doing routine maintenance on it;
wasting allyesconfig build time on validating it, wasting I/O on git
grep'ping over it, and so on."
Let's see if anybody screams. It generally has compiled, and James
Bottomley pointed out that there was a MCA extension from NCR that
allowed for up to 4GB of memory and PPro-class machines. So in *theory*
there may be users out there.
But even James (technically listed as a maintainer) doesn't actually
have a system, and while Alan Cox claims to have a machine in his cellar
that he offered to anybody who wants to take it off his hands, he didn't
argue for keeping MCA support either.
So we could bring it back. But somebody had better speak up and talk
about how they have actually been using said MCA hardware with modern
kernels for us to do that. And David already took the patch to delete
all the networking driver code (commit a5e371f61a: "drivers/net:
delete all code/drivers depending on CONFIG_MCA").
* 'delete-mca' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
MCA: delete all remaining traces of microchannel bus support.
scsi: delete the MCA specific drivers and driver code
serial: delete the MCA specific 8250 support.
arm: remove ability to select CONFIG_MCA
Pull x86/apic changes from Ingo Molnar:
"Most of the changes are about helping virtualized guest kernels
achieve better performance."
Fix up trivial conflicts with the iommu updates to arch/x86/kernel/apic/io_apic.c
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Implement EIO micro-optimization
x86/apic: Add apic->eoi_write() callback
x86/apic: Use symbolic APIC_EOI_ACK
x86/apic: Fix typo EIO_ACK -> EOI_ACK and document it
x86/xen/apic: Add missing #include <xen/xen.h>
x86/apic: Only compile local function if used with !CONFIG_GENERIC_PENDING_IRQ
x86/apic: Fix UP boot crash
x86: Conditionally update time when ack-ing pending irqs
xen/apic: implement io apic read with hypercall
Revert "xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'"
xen/x86: Implement x86_apic_ops
x86/apic: Replace io_apic_ops with x86_io_apic_ops.
We know both register and value for eoi beforehand,
so there's no need to check it and no need to do math
to calculate the msr. Saves instructions/branches
on each EOI when using x2apic.
I looked at the objdump output to verify that the
generated code looks right and actually is shorter.
The real improvemements will be on the KVM guest side
though, those come in a later patch.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/e019d1a125316f10d3e3a4b2f6bda41473f4fb72.1337184153.git.mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add eoi_write callback so that kvm can override
eoi accesses without touching the rest of the apic.
As a side-effect, this will enable a micro-optimization
for apics using msr.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/0df425d746c49ac2ecc405174df87752869629d2.1337184153.git.mst@redhat.com
[ tidied it up a bit ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Hardware with MCA bus is limited to 386 and 486 class machines
that are now 20+ years old and typically with less than 32MB
of memory. A quick search on the internet, and you see that
even the MCA hobbyist/enthusiast community has lost interest
in the early 2000 era and never really even moved ahead from
the 2.4 kernels to the 2.6 series.
This deletes anything remaining related to CONFIG_MCA from core
kernel code and from the x86 architecture. There is no point in
carrying this any further into the future.
One complication to watch for is inadvertently scooping up
stuff relating to machine check, since there is overlap in
the TLA name space (e.g. arch/x86/boot/mca.c).
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The local function io_apic_level_ack_pending() is only called
from io_apic_level_ack_pending(). The later function is only
compiled if CONFIG_GENERIC_PENDING_IRQ is defined. Move the
io_apic_level_ack_pending() to the existing #ifdef
CONFIG_GENERIC_PENDING_IRQ code block.
This will remove the following warning message during compiling
without CONFIG_GENERIC_PENDING_IRQ defined:
* arch/x86/kernel/apic/io_apic.c:382: warning: ‘io_apic_level_ack_pending’ defined but not used
Signed-off-by: Márton Németh <nm127@freemail.hu>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/1336461860.2296.3.camel@sbsiddha-mobl2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On virtual environments, apic_read could take a long time. As a
result, under certain conditions the ack pending loop may exit
without any queued irqs left, but after more than one second. A
warning will be printed needlessly in this case.
If the loop is about to exit regardless of max_loops, don't
update it.
Signed-off-by: Shai Fultheim <shai@scalemp.com>
[ rebased and reworded the commit message]
Signed-off-by: Ido Yariv <ido@wizery.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1334873552-31346-1-git-send-email-ido@wizery.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Make the file names consistent with the naming conventions of irq subsystem.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Make the code consistent with the naming conventions of irq subsystem.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Remove the Intel specific interfaces from dmar.h and remove
asm/irq_remapping.h which is only used for io_apic.c anyway.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The operation for releasing a remapping entry is iommu
specific too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The function to set interrupt affinity with interrupt
remapping enabled is Intel specific too. So move it to the
irq_remap_ops too.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The IOAPIC setup routine for interrupt remapping is VT-d
specific. Move it to the irq_remap_ops and add a call helper
function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Convert these calls too:
* Disable of remapping hardware
* Reenable of remapping hardware
* Enable fault handling
With that all of arch/x86/kernel/apic/apic.c is converted to
use the generic intr-remapping interface.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This patch introduces irq_remap_ops to hold implementation
specific function pointer to handle interrupt remapping. As
the first part the initialization functions for VT-d are
converted to these ops.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Which makes the code fit within the rest of the x86_ops functions.
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
[v1: Changed x86_apic -> x86_ioapic per Yinghai Lu <yinghai@kernel.org> suggestion]
[v2: Rebased on tip/x86/urgent and redid to match Ingo's syntax style]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Provide systems that do not support x2apic cluster mode
a mechanism to select x2apic physical mode using the
FADT FORCE_APIC_PHYSICAL_DESTINATION_MODE bit.
Changes from v1: (based on Suresh's comments)
- removed #ifdef CONFIG_ACPI
- removed #include <linux/acpi.h>
Signed-off-by: Greg Pearson <greg.pearson@hp.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1335313436-32020-1-git-send-email-greg.pearson@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Current APIC code assumes MSR_IA32_APICBASE is present for all systems.
Pentium Classic P5 and friends didn't have this MSR. MSR_IA32_APICBASE
was introduced as an architectural MSR by Intel @ P6.
Code paths that can touch this MSR invalidly are when vendor == Intel &&
cpu-family == 5 and APIC bit is set in CPUID - or when you simply pass
lapic on the kernel command line, on a P5.
The below patch stops Linux incorrectly interfering with the
MSR_IA32_APICBASE for P5 class machines. Other code paths exist that
touch the MSR - however those paths are not currently reachable for a
conformant P5.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linux.intel.com>
Link: http://lkml.kernel.org/r/4F8EEDD3.1080404@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
It's only called from amd.c:srat_detect_node(). The introduced
condition for calling the fixup code is true for all AMD
multi-node processors, e.g. Magny-Cours and Interlagos. There we
have 2 NUMA nodes on one socket. Thus there are cores having
different numa-node-id but with equal phys_proc_id.
There is no point to print error messages in such a situation.
The confusing/misleading error message was introduced with
commit 64be4c1c24 ("x86: Add
x86_init platform override to fix up NUMA core numbering").
Remove the default fixup function (especially the error message)
and replace it by a NULL pointer check, move the
Numascale-specific condition for calling the fixup into the
fixup-function itself and slightly adapt the comment.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: <stable@kernel.org>
Cc: <sp@numascale.com>
Cc: <bp@amd64.org>
Cc: <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/20120402160648.GR27684@alberich.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add information about LVT offset assignments to better debug firmware
bugs related to this. See following examples.
# dmesg | grep -i 'offset\|ibs'
LVT offset 0 assigned for vector 0xf9
[Firmware Bug]: cpu 0, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
[Firmware Bug]: cpu 0, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100)
Failed to setup IBS, -22
In this case the BIOS assigns both offsets for MCE (0xf9) and IBS
(0x400) vectors to offset 0, which is why the second APIC setup (IBS)
failed.
With correct setup you get:
# dmesg | grep -i 'offset\|ibs'
LVT offset 0 assigned for vector 0xf9
LVT offset 1 assigned for vector 0x400
IBS: LVT offset 1 assigned
perf: AMD IBS detected (0x00000007)
oprofile: AMD IBS detected (0x00000007)
Note: The vector includes also the message type to handle also NMIs
(0x400). In the firmware bug message the format is the same as of the
APIC500 register and includes the mask bit (bit 16) in addition.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Xen dom0 needs to paravirtualize IO operations to the IO APIC,
so add a io_apic_ops for it to intercept. Do this as ops
structure because there's at least some chance that another
paravirtualized environment may want to intercept these.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: jwboyer@redhat.com
Cc: yinghai@kernel.org
Link: http://lkml.kernel.org/r/1332385090-18056-2-git-send-email-konrad.wilk@oracle.com
[ Made all the affected code easier on the eyes ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch removes dead code from certain .config variations.
When CONFIG_GENERIC_PENDING_IRQ=n irq move and reenable code is
never get executed, nor do_unmask_irq variable updates its init
value. Move the code under CONFIG_GENERIC_PENDING_IRQ macro.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120320141935.GA24806@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As suggested by Suresh Siddha and Yinghai Lu:
For x2apic pre-enabled systems, apic driver is set already early
through early_acpi_boot_init()/early_acpi_process_madt()/
acpi_parse_madt()/default_acpi_madt_oem_check() path so that
apic_id_valid() checking will be sufficient during MADT and SRAT
parsing.
For non-x2apic pre-enabled systems, all apic ids should be less
than 255.
This allows us to substitute the checks in
arch/x86/kernel/acpi/boot.c::acpi_parse_x2apic() and
arch/x86/mm/srat.c::acpi_numa_x2apic_affinity_init() with
apic->apic_id_valid().
In addition we can avoid feigning the x2apic cpu feature in the
NumaChip apic code.
The following apic drivers have separate apic_id_valid()
functions which will accept x2apic type IDs :
x2apic_phys
x2apic_cluster
x2apic_uv_x
apic_numachip
Signed-off-by: Steffen Persvold <sp@numascale.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Daniel J Blueman <daniel@numascale-asia.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/1331925935-13372-1-git-send-email-sp@numascale.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 "urgent" leftovers from Ingo Molnar:
"Pending x86/urgent bits that were not high prio enough to warrant
-rc-less v3.3-final inclusion."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, efi: Fix pointer math issue in handle_ramdisks()
x86/ioapic: Add register level checks to detect bogus io-apic entries
x86, mce: Fix rcu splat in drain_mce_log_buffer()
x86, memblock: Move mem_hole_size() to .init
Move APIC ID validity check into platform APIC code, so it can
be overridden when needed. For NumaChip systems, always trust
MADT, as it's constructed with high APIC IDs.
Behaviour verifies on standard x86 systems and on NumaChip
systems with this, and compile-tested with allyesconfig.
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Reviewed-by: Steffen Persvold <sp@numascale.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1331709454-27966-1-git-send-email-daniel@numascale-asia.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
With the recent changes to clear_IO_APIC_pin() which tries to
clear remoteIRR bit explicitly, some of the users started to see
"Unable to reset IRR for apic .." messages.
Close look shows that these are related to bogus IO-APIC entries
which return's all 1's for their io-apic registers. And the
above mentioned error messages are benign. But kernel should
have ignored such io-apic's in the first place.
Check if register 0, 1, 2 of the listed io-apic are all 1's and
ignore such io-apic.
Reported-by: Álvaro Castillo <midgoon@gmail.com>
Tested-by: Jon Dufresne <jon@jondufresne.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: kernel-team@fedoraproject.org
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1331577393.31585.94.camel@sbsiddha-desk.sc.intel.com
[ Performed minor cleanup of affected code. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We use MP IRQs for SFI presented timer interrupts, we should
also set mp_bus_not_pci for MP_ISA_BUS so that pin_2_irq mapping
is correct.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com>
Link: http://lkml.kernel.org/n/tip-8h3rc1igpp8ir94aas69qmhk@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>