Commit Graph

5852 Commits

Author SHA1 Message Date
Ingo Molnar
3ddeb51d9c Merge branch 'linus' into core/percpu
Conflicts:
	arch/x86/kernel/setup_percpu.c
2009-01-27 12:01:51 +01:00
Linus Torvalds
cfb901bf84 Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  i2c: Warn on deprecated binding model use
  eeprom: More consistent symbol names
  eeprom: Move 93cx6 eeprom driver to /drivers/misc/eeprom
  spi: Move at25 (for SPI eeproms) to /drivers/misc/eeprom
  i2c: Move old eeprom driver to /drivers/misc/eeprom
  i2c: Move at24 to drivers/misc/eeprom
  i2c: Quilt tree has moved
  i2c: Delete many unused adapter IDs
  i2c: Delete 10 unused driver IDs
2009-01-26 15:11:41 -08:00
Linus Torvalds
2034563ca3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
  kbuild: fix kbuild.txt typos
  kbuild: print usage with no arguments in scripts/config
  Revert "kbuild: strip generated symbols from *.ko"
2009-01-26 15:10:37 -08:00
Jean Delvare
dd7f8dbe2b eeprom: More consistent symbol names
Now that all EEPROM drivers live in the same place, let's harmonize
their symbol names.

Also fix eeprom's dependencies, it definitely needs sysfs, and is no
longer experimental after many years in the kernel tree.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: David Brownell <dbrownell@users.sourceforge.net>
2009-01-26 21:19:57 +01:00
Linus Torvalds
3386c05bdb Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  debugobjects: add and use INIT_WORK_ON_STACK
  rcu: remove duplicate CONFIG_RCU_CPU_STALL_DETECTOR
  relay: fix lock imbalance in relay_late_setup_files
  oprofile: fix uninitialized use of struct op_entry
  rcu: move Kconfig menu
  softlock: fix false panic which can occur if softlockup_thresh is reduced
  rcu: add __cpuinit to rcu_init_percpu_data()
2009-01-26 09:47:56 -08:00
Linus Torvalds
1e70c7f7a9 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  hrtimers: fix inconsistent lock state on resume in hres_timers_resume
  time-sched.c: tick_nohz_update_jiffies should be static
  locking, hpet: annotate false positive warning
  kernel/fork.c: unused variable 'ret'
  itimers: remove the per-cpu-ish-ness
2009-01-26 09:47:43 -08:00
Linus Torvalds
810ee58de2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits)
  xen: unitialised return value in xenbus_write_transaction
  x86: fix section mismatch warning
  x86: unmask CPUID levels on Intel CPUs, fix
  x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn.
  x86: use standard PIT frequency
  xen: handle highmem pages correctly when shrinking a domain
  x86, mm: fix pte_free()
  xen: actually release memory when shrinking domain
  x86: unmask CPUID levels on Intel CPUs
  x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h>
  x86: fix PTE corruption issue while mapping RAM using /dev/mem
  x86: mtrr fix debug boot parameter
  x86: fix page attribute corruption with cpa()
  Revert "x86: signal: change type of paramter for sys_rt_sigreturn()"
  x86: use early clobbers in usercopy*.c
  x86: remove kernel_physical_mapping_init() from init section
  fix: crash: IP: __bitmap_intersects+0x48/0x73
  cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
  work_on_cpu: Use our own workqueue.
  work_on_cpu: don't try to get_online_cpus() in work_on_cpu.
  ...
2009-01-26 09:47:28 -08:00
Rakib Mullick
659d2618b3 x86: fix section mismatch warning
Here function vmi_activate calls a init function activate_vmi , which
causes the following section mismatch warnings:

  LD      arch/x86/kernel/built-in.o
WARNING: arch/x86/kernel/built-in.o(.text+0x13ba9): Section mismatch
in reference from the function vmi_activate() to the function
.init.text:vmi_time_init()
The function vmi_activate() references
the function __init vmi_time_init().
This is often because vmi_activate lacks a __init
annotation or the annotation of vmi_time_init is wrong.

WARNING: arch/x86/kernel/built-in.o(.text+0x13bd1): Section mismatch
in reference from the function vmi_activate() to the function
.devinit.text:vmi_time_bsp_init()
The function vmi_activate() references
the function __devinit vmi_time_bsp_init().
This is often because vmi_activate lacks a __devinit
annotation or the annotation of vmi_time_bsp_init is wrong.

WARNING: arch/x86/kernel/built-in.o(.text+0x13bdb): Section mismatch
in reference from the function vmi_activate() to the function
.devinit.text:vmi_time_ap_init()
The function vmi_activate() references
the function __devinit vmi_time_ap_init().
This is often because vmi_activate lacks a __devinit
annotation or the annotation of vmi_time_ap_init is wrong.

Fix it by marking vmi_activate() as __init too.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-26 14:27:18 +01:00
Ingo Molnar
99fb4d349d x86: unmask CPUID levels on Intel CPUs, fix
Impact: fix boot hang on pre-model-15 Intel CPUs

rdmsrl_safe() does not work in very early bootup code yet, because we
dont have the pagefault handler installed yet so exception section
does not get parsed. rdmsr_safe() will just crash and hang the bootup.

So limit the MSR_IA32_MISC_ENABLE MSR read to those CPU types that
support it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-26 12:36:24 +01:00
Eric Anholt
ef5fa0ab24 x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn.
In the absence of PAT, PAGE_KERNEL_WC ends up mapping to a memory type that
gets UC behavior even in the presence of a WC MTRR covering the area in
question.  By swapping to PAGE_KERNEL_UC_MINUS, we can get the actual
behavior the caller wanted (WC if you can manage it, UC otherwise).

This recovers the 40% performance improvement of using WC in the DRM
to upload vertex data.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-01-26 11:14:27 +01:00
Ingo Molnar
e1b4d11436 x86: use standard PIT frequency
the RDC and ELAN platforms use slighly different PIT clocks, resulting in
a timex.h hack that changes PIT_TICK_RATE during build time. But if a
tester enables any of these platform support .config options, the PIT
will be miscalibrated on standard PC platforms.

So use one frequency - in a subsequent patch we'll add a quirk to allow
x86 platforms to define different PIT frequencies.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-25 16:57:47 +01:00
Peter Zijlstra
42ef73fe13 x86, mm: fix pte_free()
On -rt we were seeing spurious bad page states like:

Bad page state in process 'firefox'
page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3
[<c043d0f3>] ? printk+0x14/0x19
[<c0272d4e>] bad_page+0x4e/0x79
[<c0273831>] free_hot_cold_page+0x5b/0x1d3
[<c02739f6>] free_hot_page+0xf/0x11
[<c0273a18>] __free_pages+0x20/0x2b
[<c027d170>] __pte_alloc+0x87/0x91
[<c027d25e>] handle_mm_fault+0xe4/0x733
[<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
[<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
[<c0218875>] do_page_fault+0x36f/0x88a

This is the case where a concurrent fault already installed the PTE and
we get to free the newly allocated one.

This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl)
which is overlaid with the {private, mapping} struct.

union {
    struct {
        unsigned long private;
        struct address_space *mapping;
    };
    spinlock_t ptl;
    struct kmem_cache *slab;
    struct page *first_page;
};

Normally the spinlock is small enough to not stomp on page->mapping, but
PREEMPT_RT=y has huge 'spin'locks.

But lockdep kernels should also be able to trigger this splat, as the
lock tracking code grows the spinlock to cover page->mapping.

The obvious fix is calling pgtable_page_dtor() like the regular pte free
path __pte_free_tlb() does.

It seems all architectures except x86 and nm10300 already do this, and
nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it
doesn't do SMP or simply doesnt do MMU at all or something.

Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
2009-01-23 18:42:06 +01:00
Ingo Molnar
99d0000f71 x86, xen: fix hardirq.h merge fallout
Impact: build fix

This build error:

 arch/x86/xen/suspend.c:22: error: implicit declaration of function 'fix_to_virt'
 arch/x86/xen/suspend.c:22: error: 'FIX_PARAVIRT_BOOTMAP' undeclared (first use in this function)
 arch/x86/xen/suspend.c:22: error: (Each undeclared identifier is reported only once
 arch/x86/xen/suspend.c:22: error: for each function it appears in.)

triggers because the hardirq.h unification removed an implicit fixmap.h
include - on which arch/x86/xen/suspend.c depended. Add the fixmap.h
include explicitly.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-23 11:09:15 +01:00
Brian Gerst
2de3a5f795 x86: make irq_cpustat_t fields conditional
Impact: shrink size of irq_cpustat_t when possible

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-23 11:03:32 +09:00
Brian Gerst
22da7b3df3 x86: merge hardirq_{32,64}.h into hardirq.h
Impact: cleanup

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-23 11:03:31 +09:00
Brian Gerst
658a9a2c34 x86: sync hardirq_{32,64}.h
Impact: better code generation and removal of unused field for 32bit

In general, use the 64-bit version.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-23 11:03:31 +09:00
Brian Gerst
3819cd489e x86: remove include of apic.h from hardirq_64.h
Impact: cleanup

APIC definitions aren't needed here.  Remove the include and fix
up the fallout.

tj: added include to mce_intel_64.c.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-23 11:03:29 +09:00
Brian Gerst
03d2989df9 x86: remove idle_timestamp from 32bit irq_cpustat_t
Impact: bogus irq_cpustat field removed

idle_timestamp is left over from the removed irqbalance code.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-23 11:03:28 +09:00
Thomas Gleixner
336f6c322d debugobjects: add and use INIT_WORK_ON_STACK
Impact: Fix debugobjects warning

debugobject enabled kernels spit out a warning in hpet code due to a
workqueue which is initialized on stack.

Add INIT_WORK_ON_STACK() which calls init_timer_on_stack() and use it
in hpet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-01-22 10:02:07 +01:00
H. Peter Anvin
066941bd4e x86: unmask CPUID levels on Intel CPUs
Impact: Fixes crashes with misconfigured BIOSes on XSAVE hardware

Avuton Olrich reported early boot crashes with v2.6.28 and
bisected it down to dc1e35c6e9
("x86, xsave: enable xsave/xrstor on cpus with xsave support").

If the CPUID limit bit in MSR_IA32_MISC_ENABLE is set, clear it to
make all CPUID information available.  This is required for some
features to work, in particular XSAVE.

Reported-and-bisected-by: Avuton Olrich <avuton@gmail.com>
Tested-by: Avuton Olrich <avuton@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-01-22 09:24:02 +01:00
H. Peter Anvin
bdf21a49ba x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h>
Impact: None (new bit definitions currently unused)

Add bit definitions for the MSR_IA32_MISC_ENABLE MSRs to
<asm/msr-index.h>.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-01-21 15:13:53 -08:00
Suresh Siddha
9597134218 x86: fix PTE corruption issue while mapping RAM using /dev/mem
Beschorner Daniel reported:
> hwinfo problem since 2.6.28, showing this in the oops:
>	Corrupted page table at address 7fd04de3ec00

Also, PaX Team reported a regression with this commit:

>	commit 9542ada803
>	Author: Suresh Siddha <suresh.b.siddha@intel.com>
>	Date:   Wed Sep 24 08:53:33 2008 -0700
>
>	    x86: track memtype for RAM in page struct

This commit breaks mapping any RAM page through /dev/mem, as the
reserve_memtype() was not initializing the return attribute type and as such
corrupting the PTE entry that was setup with the return attribute type.

Because of this bug, application mapping this RAM page through /dev/mem
will die with "Corrupted page table at address xxxx" message in the kernel
log and also the kernel identity mapping which maps the underlying RAM
page gets converted to UC.

Fix this by initializing the return attribute type before calling
reserve_ram_pages_type()

Reported-by: PaX Team <pageexec@freemail.hu>
Reported-and-tested-by: Beschorner Daniel <Daniel.Beschorner@facton.com>
Tested-and-Acked-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 18:42:32 +01:00
Nick Piggin
03b486322e x86: make UV support configurable
Make X86 SGI Ultraviolet support configurable. Saves about 13K of text size
on my modest config.

   text    data     bss     dec     hex filename
6770537 1158680  694356 8623573  8395d5 vmlinux
6757492 1157664  694228 8609384  835e68 vmlinux.nouv

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 13:00:42 +01:00
Thomas Renninger
731f1872f4 x86: mtrr fix debug boot parameter
while looking at:

  http://bugzilla.kernel.org/show_bug.cgi?id=11541

I realized that the mtrr.show param cannot work, because
the code is processed much too early.

This patch:
 - Declares mtrr.show as early_param
 - Stays consistent with the previous param (which I doubt
   that it ever worked), so mtrr.show=1 would still work
 - Declares mtrr_show as initdata

Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 12:26:42 +01:00
Suresh Siddha
a1e46212a4 x86: fix page attribute corruption with cpa()
Impact: fix sporadic slowdowns and warning messages

This patch fixes a performance issue reported by Linus on his
Nehalem system. While Linus reverted the PAT patch (commit
58dab916df) which exposed the issue,
existing cpa() code can potentially still cause wrong(page attribute
corruption) behavior.

This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that
various people reported.

In 64bit kernel, kernel identity mapping might have holes depending
on the available memory and how e820 reports the address range
covering the RAM, ACPI, PCI reserved regions. If there is a 2MB/1GB hole
in the address range that is not listed by e820 entries, kernel identity
mapping will have a corresponding hole in its 1-1 identity mapping.

If cpa() happens on the kernel identity mapping which falls into these holes,
existing code fails like this:

	__change_page_attr_set_clr()
		__change_page_attr()
			returns 0 because of if (!kpte). But doesn't
			set cpa->numpages and cpa->pfn.
		cpa_process_alias()
			uses uninitialized cpa->pfn (random value)
			which can potentially lead to changing the page
			attribute of kernel text/data, kernel identity
			mapping of RAM pages etc. oops!

This bug was easily exposed by another PAT patch which was doing
cpa() more often on kernel identity mapping holes (physical range between
max_low_pfn_mapped and 4GB), where in here it was setting the
cache disable attribute(PCD) for kernel identity mappings aswell.

Fix cpa() to handle the kernel identity mapping holes. Retain
the WARN() for cpa() calls to other not present address ranges
(kernel-text/data, ioremap() addresses)

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 12:24:54 +01:00
Ingo Molnar
ace6c6c840 x86: make x86_32 use tlb_64.c, build fix, clean up X86_L1_CACHE_BYTES
Fix:

  arch/x86/mm/tlb.c:47: error: ‘CONFIG_X86_INTERNODE_CACHE_BYTES’ undeclared here (not in a function)

The CONFIG_X86_INTERNODE_CACHE_BYTES symbol is only defined on 64-bit,
because vsmp support is 64-bit only. Define it on 32-bit too - where it
will always be equal to X86_L1_CACHE_BYTES.

Also move the default of X86_L1_CACHE_BYTES (which is separate from the
more commonly used L1_CACHE_SHIFT kconfig symbol) from 128 bytes to
64 bytes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 11:17:11 +01:00
Ingo Molnar
198030782c Merge branch 'x86/mm' into core/percpu
Conflicts:
	arch/x86/mm/fault.c
2009-01-21 10:39:51 +01:00
Ingo Molnar
4ec71fa2d2 x86: uv cleanup, build fix
Fix:

 arch/x86/mm/srat_64.c: In function ‘acpi_numa_processor_affinity_init’:
 arch/x86/mm/srat_64.c:141: error: implicit declaration of function ‘get_uv_system_type’
 arch/x86/mm/srat_64.c:141: error: ‘UV_X2APIC’ undeclared (first use in this function)
 arch/x86/mm/srat_64.c:141: error: (Each undeclared identifier is reported only once
 arch/x86/mm/srat_64.c:141: error: for each function it appears in.)

A couple of UV definitions were moved to asm/uv/uv.h, but srat_64.c did
not include that header. Add it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 10:24:27 +01:00
Ingo Molnar
55f4949f57 x86, mm: move tlb.c to arch/x86/mm/
Impact: cleanup

Now that it's unified, move the (SMP) TLB flushing code from arch/x86/kernel/
to arch/x86/mm/, where it belongs logically.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 10:16:19 +01:00
Ingo Molnar
3eb3963fd1 Merge branch 'cpus4096' into core/percpu
Conflicts:
	arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
	arch/x86/kernel/tlb_32.c

Merge it here because both the cpumask changes and the ongoing percpu
work is touching the TLB code. The percpu changes take precedence, as
they eliminate tlb_32.c altogether.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 10:14:17 +01:00
Ingo Molnar
552b8aa4d1 Revert "x86: signal: change type of paramter for sys_rt_sigreturn()"
This reverts commit 4217458daf.

Justin Madru bisected this commit, it was causing weird Firefox
crashes.

The reason is that GCC mis-optimizes (re-uses) the on-stack parameters of
the calling frame, which corrupts the syscall return pt_regs state and
thus corrupts user-space register state.

So we go back to the slightly less clean but more optimization-safe
method of getting to pt_regs. Also add a comment to explain this.

Resolves: http://bugzilla.kernel.org/show_bug.cgi?id=12505

Reported-and-bisected-by: Justin Madru <jdm64@gawab.com>
Tested-by: Justin Madru <jdm64@gawab.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 09:43:18 +01:00
Andi Kleen
e0a96129db x86: use early clobbers in usercopy*.c
Impact: fix rare (but currently harmless) miscompile with certain configs and gcc versions

Hugh Dickins noticed that strncpy_from_user() was miscompiled
in some circumstances with gcc 4.3.

Thanks to Hugh's excellent analysis it was easy to track down.

Hugh writes:

> Try building an x86_64 defconfig 2.6.29-rc1 kernel tree,
> except not quite defconfig, switch CONFIG_PREEMPT_NONE=y
> and CONFIG_PREEMPT_VOLUNTARY off (because it expands a
> might_fault() there, which hides the issue): using a
> gcc 4.3.2 (I've checked both openSUSE 11.1 and Fedora 10).
>
> It generates the following:
>
> 0000000000000000 <__strncpy_from_user>:
>    0:   48 89 d1                mov    %rdx,%rcx
>    3:   48 85 c9                test   %rcx,%rcx
>    6:   74 0e                   je     16 <__strncpy_from_user+0x16>
>    8:   ac                      lods   %ds:(%rsi),%al
>    9:   aa                      stos   %al,%es:(%rdi)
>    a:   84 c0                   test   %al,%al
>    c:   74 05                   je     13 <__strncpy_from_user+0x13>
>    e:   48 ff c9                dec    %rcx
>   11:   75 f5                   jne    8 <__strncpy_from_user+0x8>
>   13:   48 29 c9                sub    %rcx,%rcx
>   16:   48 89 c8                mov    %rcx,%rax
>   19:   c3                      retq
>
> Observe that "sub %rcx,%rcx; mov %rcx,%rax", whereas gcc 4.2.1
> (and many other configs) say "sub %rcx,%rdx; mov %rdx,%rax".
> Isn't it returning 0 when it ought to be returning strlen?

The asm constraints for the strncpy_from_user() result were missing an
early clobber, which tells gcc that the last output arguments
are written before all input arguments are read.

Also add more early clobbers in the rest of the file and fix 32-bit
usercopy.c in the same way.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
[ since this API is rarely used and no in-kernel user relies on a 'len'
  return value (they only rely on negative return values) this miscompile
  was never noticed in the field. But it's worth fixing it nevertheless. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-21 09:43:17 +01:00
Tejun Heo
16c2d3f895 x86: rename tlb_64.c to tlb.c
Impact: file rename

tlb_64.c is now the tlb code for both 32 and 64.  Rename it to tlb.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21 17:26:06 +09:00
Tejun Heo
02cf94c370 x86: make x86_32 use tlb_64.c
Impact: less contention when issuing invalidate IPI, cleanup

Make x86_32 use the same tlb code as 64bit.  The 64bit code uses
multiple IPI vectors for tlb shootdown to reduce contention.  This
patch makes x86_32 allocate the same 8 IPIs as x86_64 and share the
code paths.

Note that the usage of asmlinkage is inconsistent for x86_32 and 64
and calls for further cleanup.  This has been noted with a FIXME
comment in tlb_64.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21 17:26:06 +09:00
Tejun Heo
6dd01bedee x86: prepare for tlb merge
Impact: clean up, ipi vector number reordering for x86_32

Make the following changes to prepare for tlb merge.

* reorder x86_32 ip vectors

* adjust tlb_32.c and tlb_64.c such that their logics coincide exactly
	- on spurious invalidate ipi, tlb_32 acks the irq
	- tlb_64 now has proper memory barriers around clearing
          flush_cpumask (no change in generated code)

* unexport flush_tlb_page from tlb_32.c, there's no user

* use unsigned int for cpu id

* drop unnecessary includes from tlb_64.c

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21 17:26:06 +09:00
Tejun Heo
bdbcdd4888 x86: uv cleanup
Impact: cleanup

Make the following uv related cleanups.

* collect visible uv related definitions and interfaces into uv/uv.h
  and use it.  this cleans up the messy situation where on 64bit, uv
  is defined properly, on 32bit generic it's dummy and on the rest
  undefined.  after this clean up, uv is defined on 64 and dummy on
  32.

* update uv_flush_tlb_others() such that it takes cpumask of
  to-be-flushed cpus as argument, instead of that minus self, and
  returns yet-to-be-flushed cpumask, instead of modifying the passed
  in parameter.  this interface change will ease dummy implementation
  of uv_flush_tlb_others() and makes uv tlb flush related stuff
  defined in tlb_uv proper.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21 17:26:06 +09:00
Brian Gerst
d650a51485 x86: merge irq_regs.h
Impact: cleanup, better irq_regs code generation for x86_64

Make 64-bit use the same optimizations as 32-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21 17:26:06 +09:00
Brian Gerst
6826c8ff07 x86: merge mmu_context.h
Impact: cleanup

tj: * changed cpu to unsigned as was done on mmu_context_64.h as cpu
      id is officially unsigned int
    * added missing ';' to 32bit version of deactivate_mm()

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21 17:26:06 +09:00
Brian Gerst
0dd76d736e x86: set %fs to __KERNEL_PERCPU unconditionally for x86_32
Impact: cleanup

%fs is currently set to __KERNEL_DS at boot, and conditionally
switched to __KERNEL_PERCPU for secondary cpus.  Instead, initialize
GDT_ENTRY_PERCPU to the same attributes as GDT_ENTRY_KERNEL_DS and
set %fs to __KERNEL_PERCPU unconditionally.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21 17:26:05 +09:00
Brian Gerst
299e26992a x86: fix percpu_write with 64-bit constants
Impact: slightly better code generation for percpu_to_op()

The processor will sign-extend 32-bit immediate values in 64-bit
operations.  Use the 'e' constraint ("32-bit signed integer constant,
or a symbolic reference known to fit that range") for 64-bit constants.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21 17:26:05 +09:00
Brian Gerst
06deef892c x86: clean up gdt_page definition
Impact: cleanup && more compact percpu area layout with future changes

Move 64-bit GDT to page-aligned section and clean up comment
formatting.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21 17:26:05 +09:00
Tejun Heo
67e68bde02 x86: update canary handling during switch
Impact: cleanup

In switch_to(), instead of taking offset to irq_stack_union.stack,
make it a proper percpu access using __percpu_arg() and per_cpu_var().

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21 17:26:05 +09:00
Nick Piggin
92181f190b x86: optimise x86's do_page_fault (C entry point for the page fault path)
Impact: cleanup, restructure code to improve assembly

gcc isn't _all_ that smart about spilling registers to stack or reusing
stack slots, even with branch annotations. do_page_fault contained a lot
of functionality, so split unlikely paths into their own functions, and
mark them as noinline just to be sure. I consider this actually to be
somewhat of a cleanup too: the main function now contains about half
the number of lines so the normal path is easier to read, while the error
cases are also nicely split away.

Also, ensure the order of arguments to functions is always the same: regs,
addr, error_code. This can reduce code size a tiny bit, and just looks neater
too.

And add a couple of branch annotations.

Before:
  do_page_fault:
          subq    $360, %rsp      #,

After:
  do_page_fault:
          subq    $56, %rsp       #,

bloat-o-meter:
  add/remove: 8/0 grow/shrink: 0/1 up/down: 2222/-1680 (542)
  function                                     old     new   delta
  __bad_area_nosemaphore                         -     506    +506
  no_context                                     -     474    +474
  vmalloc_fault                                  -     424    +424
  spurious_fault                                 -     358    +358
  mm_fault_error                                 -     272    +272
  bad_area_access_error                          -      89     +89
  bad_area                                       -      89     +89
  bad_area_nosemaphore                           -      10     +10
  do_page_fault                               2464     784   -1680

Yes, the total size increases by 542 bytes, due to the extra function calls.
But these will very rarely be called (except for vmalloc_fault) in a normal
workload. Importantly, do_page_fault is less than 1/3rd it's original size,
and touches far less stack.

Existing gotos and branch hints did move a lot of the infrequently used text
out of the fastpath, but that's even further improved after this patch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-20 13:14:23 +01:00
Ingo Molnar
0ce1c38368 Merge commit 'v2.6.29-rc2' into x86/mm 2009-01-20 09:23:28 +01:00
Ingo Molnar
5766b842b2 x86, cpumask: fix tlb flush race
Impact: fix bootup crash

The cpumask is now passed in as a reference to mm->cpu_vm_mask, not on
the stack - hence it is not constant anymore during the TLB flush.

That way it could race and some static sanity checks would trigger:

[  238.154287] ------------[ cut here ]------------
[  238.156039] kernel BUG at arch/x86/kernel/tlb_32.c:130!
[  238.156039] invalid opcode: 0000 [#1] SMP
[  238.156039] last sysfs file: /sys/class/net/eth2/address
[  238.156039] Modules linked in:
[  238.156039]
[  238.156039] Pid: 6493, comm: ifup-eth Not tainted (2.6.29-rc2-tip #1) P4DC6
[  238.156039] EIP: 0060:[<c0118f87>] EFLAGS: 00010202 CPU: 2
[  238.156039] EIP is at native_flush_tlb_others+0x35/0x158
[  238.156039] EAX: c0ef972c EBX: f6143301 ECX: 00000000 EDX: 00000000
[  238.156039] ESI: f61433a8 EDI: f6143200 EBP: f34f3e00 ESP: f34f3df0
[  238.156039]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  238.156039] Process ifup-eth (pid: 6493, ti=f34f2000 task=f399ab00 task.ti=f34f2000)
[  238.156039] Stack:
[  238.156039]  ffffffff f61433a8 ffffffff f6143200 f34f3e18 c0118e9c 00000000 f6143200
[  238.156039]  f61433a8 f5bec738 f34f3e28 c0119435 c2b5b830 f6143200 f34f3e34 c01c2dc3
[  238.156039]  bffd9000 f34f3e60 c01c3051 00000000 ffffffff f34f3e4c 00000000 00000071
[  238.156039] Call Trace:
[  238.156039]  [<c0118e9c>] ? flush_tlb_others+0x52/0x5b
[  238.156039]  [<c0119435>] ? flush_tlb_mm+0x7f/0x8b
[  238.156039]  [<c01c2dc3>] ? tlb_finish_mmu+0x2d/0x55
[  238.156039]  [<c01c3051>] ? exit_mmap+0x124/0x170
[  238.156039]  [<c013e965>] ? mmput+0x40/0xf5
[  238.156039]  [<c01e4788>] ? flush_old_exec+0x640/0x94b
[  238.156039]  [<c01ddb4e>] ? fsnotify_access+0x37/0x39
[  238.156039]  [<c01e3435>] ? kernel_read+0x39/0x4b
[  238.156039]  [<c021bc8a>] ? load_elf_binary+0x4a1/0x11bb
[  238.156039]  [<c01c0af9>] ? might_fault+0x51/0x9c
[  238.156039]  [<c010a2cc>] ? paravirt_read_tsc+0x20/0x4f
[  238.156039]  [<c010a406>] ? native_sched_clock+0x5d/0x60
[  238.156039]  [<c01e2fda>] ? search_binary_handler+0xab/0x2c4
[  238.156039]  [<c021b7e9>] ? load_elf_binary+0x0/0x11bb
[  238.156039]  [<c04ae9a5>] ? _raw_read_unlock+0x21/0x46
[  238.156039]  [<c021b7e9>] ? load_elf_binary+0x0/0x11bb
[  238.156039]  [<c01e2fe1>] ? search_binary_handler+0xb2/0x2c4
[  238.156039]  [<c01e4076>] ? do_execve+0x21c/0x2ee
[  238.156039]  [<c01029b7>] ? sys_execve+0x51/0x8c
[  238.156039]  [<c0103eaf>] ? sysenter_do_call+0x12/0x43

Fix it by not assuming that the cpumask is constant.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-20 09:13:15 +01:00
Ingo Molnar
8f5d36ed5b Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu 2009-01-20 08:23:45 +01:00
Brian Gerst
0d974d4592 x86: remove pda.h
Impact: cleanup

Signed-off-by: Brian Gerst <brgerst@gmail.com>
2009-01-20 12:29:20 +09:00
Brian Gerst
947e76cdc3 x86: move stack_canary into irq_stack
Impact: x86_64 percpu area layout change, irq_stack now at the beginning

Now that the PDA is empty except for the stack canary, it can be removed.
The irqstack is moved to the start of the per-cpu section.  If the stack
protector is enabled, the canary overlaps the bottom 48 bytes of the irqstack.

tj: * updated subject
    * dropped asm relocation of irq_stack_ptr
    * updated comments a bit
    * rebased on top of stack canary changes

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-20 12:29:20 +09:00
Brian Gerst
8c7e58e690 x86: rework __per_cpu_load adjustments
Impact: cleanup

Use cpu_number to determine if the adjustment is necessary.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-20 12:29:20 +09:00
Brian Gerst
8ce031972b x86: remove pda_init()
Impact: cleanup

Copy the code to cpu_init() to satisfy the requirement that the cpu
be reinitialized.  Remove all other calls, since the segments are
already initialized in head_64.S.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-20 12:29:19 +09:00