Commit Graph

687 Commits

Author SHA1 Message Date
Masami Hiramatsu
311ac88fd2 [PATCH] x86: kprobes-booster
Current kprobe copies the original instruction at the probe point and replaces
it with a breakpoint instruction (int3).  When the kernel hits the probe
point, kprobe handler is invoked.  And the copied instruction is single-step
executed on the copied buffer (not on the original address) by kprobe.  After
that, the kprobe checks registers and modify it (if need) as if the
instructions was executed on the original address.

My proposal is based on the fact there are many instructions which do NOT
require the register modification after the single-step execution.  When the
copied instruction is a kind of them, kprobe just jumps back to the next
instruction after single-step execution.  If so, why don't we execute those
instructions directly?

With kprobe-booster patch, kprobes will execute a copied instruction directly
and (if need) jump back to original code.  This direct execution is executed
when the kprobe don't have both post_handler and break_handler, and the copied
instruction can be executed directly.

I sorted instructions which can be executed directly or not;

- Call instructions are NG(can not be executed directly).
  We should correct the return address pushed into top of stack.
- Indirect instructions except for absolute indirect-jumps
  are NG. Those instructions changes EIP randomly. We should
  check EIP and correct it.
- Instructions that change EIP beyond the range of the
  instruction buffer are NG.
- Instructions that change EIP to tail 5 bytes of the
  instruction buffer (it is the size of a jump instruction).
  We must write a jump instruction which backs to original
  kernel code in the instruction buffer.
- Break point instruction is NG. We should not touch EIP and
  pass to other handlers.
- Absolute direct/indirect jumps are OK.- Conditional Jumps are NG.
- Halt and software-interruptions are NG. Because it will stay on
  the instruction buffer of kprobes.
- Prefixes are NG.
- Unknown/reserved opcode is NG.
- Other 1 byte instructions are OK. But those instructions need a
  jump back code.
- 2 bytes instructions are mapped sparsely. So, in this release,
  this patch don't boost those instructions.

>From Intel's IA-32 opcode map described in IA-32 Intel Architecture Software
Developer's Manual Vol.2 B, I determined that following opcodes are not
boostable.

- 0FH (2byte escape)
- 70H - 7FH (Jump on condition)
- 9AH (Call) and 9CH (Pushf)
- C0H-C1H (Grp 2: includes reserved opcode)
- C6H-C7H (Grp11: includes reserved opcode)
- CCH-CEH (Software-interrupt)
- D0H-D3H (Grp2: includes reserved opcode)
- D6H (Reserved)
- D8H-DFH (Coprocessor)
- E0H-E3H (loop/conditional jump)
- E8H (Call)
- F0H-F3H (Prefixes and reserved)
- F4H (Halt)
- F6H-F7H (Grp3: includes reserved opcode)
- FEH-FFH(Grp4,5: includes reserved opcode)

Kprobe-booster checks whether target instruction can be boosted (can be
executed directly) at arch_copy_kprobe() function.  If the target instruction
can be boosted, it clears "boostable" flag.  If not, it sets "boostable" flag
-1.  This is disabled status.  In resume_execution() function, If "boostable"
flag is cleared, kprobe-booster measures the size of the target instruction
and sets "boostable" flag 1.

In kprobe_handler(), kprobe checks the "boostable" flag.  If the flag is 1, it
resets current kprobe and executes instruction buffer directly instead of
single stepping.

When unregistering a boosted kprobe, it calls synchronize_sched()
after "int3" is removed. So we can ensure followings after
the synchronize_sched() called.
- interrupt handlers are finished on all CPUs.
- instruction buffer is not executed on all CPUs.
And we can release the boosted kprobe safely.

And also, on preemptible kernel, the booster is not enabled where the kernel
preemption is enabled.  So, there are no preempted threads on the instruction
buffer.

The description of kretprobe-booster:
====================================

In the normal operation, kretprobe make a target function return to trampoline
code.  And a kprobe (called trampoline_probe) have been inserted at the
trampoline code.  When the kernel hits this kprobe, it calls kretprobe's
handler and it returns to original return address.

Kretprobe-booster patch removes the trampoline_probe.  It allows the
trampoline code to call kretprobe's handler directly instead of invoking
kprobe.  And tranpoline code returns to original return address.

This new trampoline code stores and restores registers, so the kretprobe
handler is still able to access those registers.

Current kprobe has about 1.3 usec/probe(*) overhead, and kprobe-booster patch
reduces it to 0.6 usec/probe(*).  Also current kretprobe has about 2.0
usec/probe(*) overhead.  Kprobe-booster patch reduces it to 1.3 usec/probe(*),
and the combination of both kprobe-booster patch and kretprobe-booster patch
reduces it to 0.9 usec/probe(*).

I expect the combination of both patches can reduce half of a probing
overhead.

Performance numbers strongly depend on the processor model.

Andrew Morton wrote:
> These preempt tricks look rather nasty.  Can you please describe what the
> problem is, precisely?  And how this code avoids it?  Perhaps we can find
> something cleaner.

The problem is how to remove the copied instructions of the
kprobe *safely* on the preemptable kernel (CONFIG_PREEMPT=y).

Kprobes basically executes the following actions;

(1)int3
(2)preempt_disable()
(3)kprobe_prehandler()
(4)copied instructioin(single step)
(5)kprobe_posthandler()
(6)preempt_enable()
(7)return to the original code

During the execution of copied instruction, preemption is
disabled (from step (2) to (6)).
When unregistering the probes, Kprobe waits for RCU
quiescent state by using synchronize_sched() after removing
int3 instruction.
Thus we can ensure the copied instruction is not executed.

On the other hand, kprobe-booster executes the following actions;

(1)int3
(2)preempt_disable()
(3)kprobe_prehandler()
(4)preempt_enable()             <-- this one is added by my patch
(5)copied instruction(direct execution)
(6)jmp back to the original code

The problem is that we have no way to prevent preemption on
step (5) or (6). We cannot call preempt_disable() after step (6),
because there are no rooms to do that. Thus, some other
processes may be preempted at step(5) or (6) on preemptable kernel.
And I couldn't find the easy way to ensure that other processes'
stack do *not* have the address of them. (I thought some way
to do that, but those are very costly.)

So currently, I simply boost the kprobe only when the probe
point is already preemption disabled.

> Also, the patch adds a preempt_enable() but I don't see a corresponding
> preempt_disable().  Am I missing something?

It is corresponding to the preempt_disable() in the top of
kprobe_handler().
I copied the code of kprobe_handler() here:

static int __kprobes kprobe_handler(struct pt_regs *regs)
{
        struct kprobe *p;
        int ret = 0;
        kprobe_opcode_t *addr = NULL;
        unsigned long *lp;
        struct kprobe_ctlblk *kcb;

        /*
         * We don't want to be preempted for the entire
         * duration of kprobe processing
         */
        preempt_disable();             <-- HERE
        kcb = get_kprobe_ctlblk();

Signed-off-by: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:04 -08:00
Masami Hiramatsu
b50ea74c7b [PATCH] kprobes: clean up resume_execute()
Clean up kprobe's resume_execute() for i386 arch.

Signed-off-by: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:57:03 -08:00
Darren Jenkins
d6d21dfdd3 [PATCH] fix array overrun in efi.c
Coverity found an over-run @ line 364 of efi.c

This is due to the loop checking the size correctly, then adding a '\0'
after possibly hitting the end of the array.

Ensure the loop exits with one space left in the array.

Signed-off-by: Darren Jenkins <darrenrjenkins@gmail.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:57 -08:00
Ingo Molnar
14cc3e2b63 [PATCH] sem2mutex: misc static one-file mutexes
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jens Axboe <axboe@suse.de>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:55 -08:00
Tolentino, Matthew E
23dd842c00 [PATCH] EFI fixes
Here's a patch that fixes EFI boot for x86 on 2.6.16-rc5-mm3.  The
off-by-one is admittedly my fault, but the other two fix up the rest.

Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Bjorn Helgaas
b2c99e3c70 [PATCH] EFI: keep physical table addresses in efi structure
Almost all users of the table addresses from the EFI system table want
physical addresses.  So rather than doing the pa->va->pa conversion, just keep
physical addresses in struct efi.

This fixes a DMI bug: the efi structure contained the physical SMBIOS address
on x86 but the virtual address on ia64, so dmi_scan_machine() used ioremap()
on a virtual address on ia64.

This is essentially the same as an earlier patch by Matt Tolentino:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=112130292316281&w=2
except that this changes all table addresses, not just ACPI addresses.

Matt's original patch was backed out because it caused MCAs on HP sx1000
systems.  That problem is resolved by the ioremap() attribute checking added
for ia64.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Bjorn Helgaas
27d8e3d15b [PATCH] DMI: only ioremap stuff we actually need
dmi_scan_machine() tries to ioremap 0x10000 (64K) bytes, even though it only
looks at the first 32 bytes or so.  If the SMBIOS table is near the end of a
memory region, the ioremap() may fail when it shouldn't.

This is in the efi_enabled path, so it really only affects ia64 at the moment.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Matt Domsch
3ed3bce846 [PATCH] ia64: use i386 dmi_scan.c
Enable DMI table parsing on ia64.

Andi Kleen has a patch in his x86_64 tree which enables the use of i386
dmi_scan.c on x86_64.  dmi_scan.c functions are being used by the
drivers/char/ipmi/ipmi_si_intf.c driver for autodetecting the ports or
memory spaces where the IPMI controllers may be found.

This patch adds equivalent changes for ia64 as to what is in the x86_64
tree.  In addition, I reworked the DMI detection, such that on EFI-capable
systems, it uses the efi.smbios pointer to find the table, rather than
brute-force searching from 0xF0000.  On non-EFI systems, it continues the
brute-force search.

My test system, an Intel S870BN4 'Tiger4', aka Dell PowerEdge 7250, with
latest BIOS, does not list the IPMI controller in the ACPI namespace, nor
does it have an ACPI SPMI table.  Also note, currently shipping Dell x8xx
EM64T servers don't have these either, so DMI is the only method for
obtaining the address of the IPMI controller.

Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Vivek Goyal
10dbe196a8 [PATCH] i386: export: memory more than 4G through /proc/iomem
Currently /proc/iomem exports physical memory also apart from io device
memory.  But on i386, it truncates any memory more than 4GB.  This leads to
problems for kexec/kdump.

Kexec reads /proc/iomem to determine the system memory layout and prepares a
memory map based on that and passes it to the kernel being kexeced.  Given the
fact that memory more than 4GB has been truncated, new kernel never gets to
see and use that memory.

Kdump also reads /proc/iomem to determine the physical memory layout of the
system and encodes this informaiton in ELF headers.  After a crash new kernel
parses these ELF headers being used by previous kernel and vmcore is prepared
accordingly.  As memory more than 4GB has been truncated, kdump never sees
that memory and never prepares ELF headers for it.  Hence vmcore is truncated
and limited to 4GB even if there is more physical memory in the system.

This patch exports memory more than 4GB through /proc/iomem on i386.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:54 -08:00
Jan Beulich
20c0d2d440 [PATCH] i386: pass proper trap numbers to die chain handlers
Pass the trap number causing the call to notify_die() to the die
notification handler chain in a number of instances.  Also, honor the
return value from the handler chain invocation in die() as, through a
debugger, the fault may have been fixed.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-By: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:53 -08:00
H. Peter Anvin
841b8a46bf [PATCH] x86: "make isoimage" support; FDINITRD= support; minor cleanups
Add a "make isoimage" to i386 and x86-64, which allows the automatic
creation of a bootable CD image.  It also adds an option FDINITRD= to
include an initrd of the user's choice in generated floppy- or CD boot
images.  Finally, some minor cleanups of the image generation code.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-26 08:56:53 -08:00
Linus Torvalds
1b9a391736 Merge branch 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current
* 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits)
  [PATCH] fix audit_init failure path
  [PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format
  [PATCH] sem2mutex: audit_netlink_sem
  [PATCH] simplify audit_free() locking
  [PATCH] Fix audit operators
  [PATCH] promiscuous mode
  [PATCH] Add tty to syscall audit records
  [PATCH] add/remove rule update
  [PATCH] audit string fields interface + consumer
  [PATCH] SE Linux audit events
  [PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c
  [PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL
  [PATCH] Fix IA64 success/failure indication in syscall auditing.
  [PATCH] Miscellaneous bug and warning fixes
  [PATCH] Capture selinux subject/object context information.
  [PATCH] Exclude messages by message type
  [PATCH] Collect more inode information during syscall processing.
  [PATCH] Pass dentry, not just name, in fsnotify creation hooks.
  [PATCH] Define new range of userspace messages.
  [PATCH] Filter rule comparators
  ...

Fixed trivial conflict in security/selinux/hooks.c
2006-03-25 09:24:53 -08:00
Andi Kleen
ad90573f93 [PATCH] x86_64: Initialize powernow_data[] for all siblings
I got an oops on a dual core system because the lost tick handler
called cpufreq_get() on core 1 and powernow tried to follow
a NULL powernow_data[] pointer there.

Initialize powernow_data for all cores of a CPU.

Cc: Jacob Shin <jacob.shin@amd.com>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:14:39 -08:00
Andi Kleen
9d95dd849c [PATCH] i386/x86-64: List Intel LaGrange AKA SMX in /proc/cpuinfo
Spec just got published so we know the CPUID bit.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:57 -08:00
Andi Kleen
2ab7f1833b [PATCH] x86_64: Quieten down microcode update driver
Only log data in microcode driver when something is changed Otherwise it
was far too noisy on large systems.

Also remove the printk when it is unloaded.

Cc: tigran@veritas.com

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:56 -08:00
Andi Kleen
f2d3efedbe [PATCH] x86_64: Implement early DMI scanning
There are more and more cases where we need to know DMI information
early to work around bugs.  i386 already had early DMI scanning, but
x86-64 didn't.  Implement this now.

This required some cleanup in the i386 code.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:55 -08:00
Andi Kleen
f083a329e6 [PATCH] x86_64: Clean up and tweak ACPI blacklist year code
- Move the core parser into dmi_scan.c.  It can be useful for other
   subsystems too.
 - Differentiate between field doesn't exist and field is 0 or
   unparseable.  The first case is likely an old BIOS with broken ACPI,
   the later is likely a slightly buggy BIOS where someone forget to
   edit the date.  Don't blacklist in the later case.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:54 -08:00
Andi Kleen
6edfba1b33 [PATCH] x86_64: Don't define string functions to builtin
gcc should handle this anyways, and it causes problems when
sprintf is turned into strcpy by gcc behind our backs and
the C fallback version of strcpy is actually defining __builtin_strcpy

Then drop -ffreestanding from the main Makefile because it isn't
needed anymore and implies -fno-builtin, which is wrong now.
(it was only added for x86-64, so dropping it should be safe)

Noticed by Roman Zippel

Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:53 -08:00
Andi Kleen
dca99a38bc [PATCH] x86-64: Use -mtune=generic for generic kernels
The upcomming gcc 4.2 got a new option -mtune=generic to tune
code for both common AMD and Intel CPUs. Use this option
when available for generic kernels.

On x86-64 it is used with CONFIG_GENERIC_CPU. On i386 it is
enabled with CONFIG_X86_GENERIC.  It won't affect the base
line CPU support in any ways and also not the minimum supported CPU.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 09:10:52 -08:00
Linus Torvalds
be9bf30c73 Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] kzalloc conversion for gx-suspmod
  [CPUFREQ] Whitespace cleanup
  [CPUFREQ] Mark longhaul driver as broken.
  [PATCH] cpufreq: fix section mismatch warnings
  [CPUFREQ] Fix the p4-clockmod N60 errata workaround.
  [CPUFREQ] Fix handling for CPU hotplug
  [CPUFREQ] powernow-k8: Let cpufreq driver handle affected CPUs
  [CPUFREQ] Lots of whitespace & CodingStyle cleanup.
  [CPUFREQ] Remove duplicate cpuinfo struct
  [CPUFREQ] Silence powernow-k8 warning on k7's.
2006-03-25 08:52:23 -08:00
Linus Torvalds
2e1ca21d46 Merge master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild
* master.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild: (46 commits)
  kbuild: remove obsoleted scripts/reference_* files
  kbuild: fix make help & make *pkg
  kconfig: fix time ordering of writes to .kconfig.d and include/linux/autoconf.h
  Kconfig: remove the CONFIG_CC_ALIGN_* options
  kbuild: add -fverbose-asm to i386 Makefile
  kbuild: clean-up genksyms
  kbuild: Lindent genksyms.c
  kbuild: fix genksyms build error
  kbuild: in makefile.txt note that Makefile is preferred name for kbuild files
  kbuild: replace PHONY with FORCE
  kbuild: Fix bug in crc symbol generating of kernel and modules
  kbuild: change kbuild to not rely on incorrect GNU make behavior
  kbuild: when warning symbols exported twice now tell user this is the problem
  kbuild: fix make dir/file.xx when asm symlink is missing
  kbuild: in the section mismatch check try harder to find symbols
  kbuild: fix section mismatch check for unwind on IA64
  kbuild: kill false positives from section mismatch warnings for powerpc
  kbuild: kill trailing whitespace in modpost & friends
  kbuild: small update of allnoconfig description
  kbuild: make namespace.pl CROSS_COMPILE happy
  ...

Trivial conflict in arch/ppc/boot/Makefile manually fixed up
2006-03-25 08:48:48 -08:00
Andrew Morton
f081a529f8 [PATCH] cpufreq: speedstep-smi asm fix
Fix bug identified by Linus Torvalds <torvalds@osdl.org>: the `out'
instruction depends upon the state of memory_data[], so we need to tell gcc
that before executing it. (The opcode, not gcc).

Fixes http://bugzilla.kernel.org/show_bug.cgi?id=5553

Thanks to Antonio Ospite <ospite@studenti.unina.it> for testing.

Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:42:45 -08:00
Ashok Raj
34f361ade2 [PATCH] Check if cpu can be onlined before calling smp_prepare_cpu()
- Moved check for online cpu out of smp_prepare_cpu()

- Moved default declaration of smp_prepare_cpu() to kernel/cpu.c

- Removed lock_cpu_hotplug() from smp_prepare_cpu() to around it, since
  its called from cpu_up() as well now.

- Removed clearing from cpu_present_map during cpu_offline as it breaks
  using cpu_up() directly during a subsequent online operation.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: "Li, Shaohua" <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:23:01 -08:00
Andrew Morton
4a2f0acf0f [PATCH] kconfig: clarify memory debug options
The Kconfig text for CONFIG_DEBUG_SLAB and CONFIG_DEBUG_PAGEALLOC have always
seemed a bit confusing.  Change them to:

CONFIG_DEBUG_SLAB: "Debug slab memory allocations"
CONFIG_DEBUG_PAGEALLOC: "Debug page memory allocations"

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:54 -08:00
Andrey Borzenkov
2c3ca07d2f [PATCH] Fix EDD to properly ignore signature of non-existing drives
Some BIOSes do not always set CF on error before return from int13.  The
patch adds additional check for status being zero (AH == 0).

Signed-off-by: Andrey Borzenkov <arvidjaar@mail.ru>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:52 -08:00
Chen, Kenneth W
a7d06ca7b6 [PATCH] x86: HUGETLBFS and DEBUG_PAGEALLOC are incompatible
DEBUG_PAGEALLOC is not compatible with hugetlb page support.  That debug
option turns off PSE.  Once it is turned off in CR4, the cpu will ignore
pse bit in the pmd and causing infinite page-not- present faults.

So disable DEBUG_PAGEALLOC if the user selected hugetlbfs.

Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:50 -08:00
Ashok Raj
d3f4aaa3d7 [PATCH] x86: make CONFIG_HOTPLUG_CPU depend on !X86_PC
Make CONFIG_HOTPLUG_CPU depend on !X86_PC, so we need to turn on either
CONFIG_GENERICARCH, CONFIG_BIGSMP or any other subarch except X86_PC when
CONFIG_HOTPLUG_CPU=y

With 2.6.15+ kernels when CONFIG_HOTPLUG_CPU is turned on we switch to
bigsmp mode for sending IPI's and ioapic configurations that caused the
following error message.

>> More than 8 CPUs detected and CONFIG_X86_PC cannot handle it.
>> Use CONFIG_X86_GENERICARCH or CONFIG_X86_BIGSMP.

Originally bigsmp was added just to handle >8 cpus, but now with hotplug
cpu support we need to use bigsmp mode (why?  see below), that cause the
above error message even if there were less than 8 cpus in the system.

The message is bogus, but we are cannot use logical flat mode due to issues
with broadcast IPI can confuse a CPU just comming up.  We use flat physical
mode just like x86_64 case.  More details on why bigsmp now uses flat
physical mode (vs.  cluster mode) in following link.

http://marc.theaimsgroup.com/?l=linux-kernel&m=113261865814107&w=2

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:50 -08:00
Andrey Panin
bc83455bc8 [PATCH] fix DMI onboard device discovery
Attached patch fixes invalid pointer arithmetic in DMI code to make onboard
device discovery working again.

akpm: bug has been present since dmi_find_device() was added in 2.6.14.
Affects ipmi only (I think) - the symptoms weren't described.

akpm: changed to use pointer arithmetic rather than open-coded sizeof.

Signed-off-by: Andrey Panin <pazke@donpac.ru>
Cc: Corey Minyard <minyard@acm.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-25 08:22:48 -08:00
Adrian Bunk
cdb0452789 [PATCH] kill include/linux/platform.h, default_idle() cleanup
include/linux/platform.h contained nothing that was actually used except
the default_idle() prototype, and is therefore removed by this patch.

This patch does the following with the platform specific default_idle()
functions on different architectures:
- remove the unused function:
  - parisc
  - sparc64
- make the needlessly global function static:
  - arm
  - h8300
  - m68k
  - m68knommu
  - s390
  - v850
  - x86_64
- add a prototype in asm/system.h:
  - cris
  - i386
  - ia64

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Patrick Mochel <mochel@digitalimplant.org>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:21 -08:00
Andrew Morton
a720115678 [PATCH] more-for_each_cpu-conversions fix
I screwed up this conversion - we should be iterating across online CPUs, not
possible ones.

Spotted by Joe Perches <joe@perches.com>

Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24 07:33:15 -08:00
Bernhard Kaindl
8c4b2cf9af [PATCH] PCI: PCI/Cardbus cards hidden, needs pci=assign-busses to fix
"In some cases, especially on modern laptops with a lot of PCI and cardbus
bridges, we're unable to assign correct secondary/subordinate bus numbers
to all cardbus bridges due to BIOS limitations unless we are using
"pci=assign-busses" boot option." -- Ivan Kokshaysky (from a patch comment)

Without it, Cardbus cards inserted are never seen by PCI because the parent
PCI-PCI Bridge of the Cardbus bridge will not pass and translate Type 1 PCI
configuration cycles correctly and the system will fail to find and
initialise the PCI devices in the system.

Reference: PCI-PCI Bridges: PCI Configuration Cycles and PCI Bus Numbering:
http://www.science.unitn.it/~fiorella/guidelinux/tlk/node72.html

The reason for this is that:
 ``All PCI busses located behind a PCI-PCI bridge must reside between the
secondary bus number and the subordinate bus number (inclusive).''

"pci=assign-busses" makes pcibios_assign_all_busses return 1 and this
turns on PCI renumbering during PCI probing.

Alan suggested to use DMI automatically set assign-busses on problem systems.

The only question for me was where to put it.  I put it directly before
scanning PCI bus into pcibios_scan_root() because it's called from legacy,
acpi and numa and so it can be one place for all systems and configurations
which may need it.

AMD64 Laptops are also affected and fixed by assign-busses, and the code is
also incuded from arch/x86_64/pci/ that place will also work for x86_64
kernels, I only ifdef'-ed the x86-only Laptop in this example.

Affected and known or assumed to be fixed with it are (found by googling):

* ASUS Z71V and L3s
* Samsung X20
* Compaq R3140us and all Compaq R3000 series laptops with TI1620 Controller,
  also Compaq R4000 series (from a kernel.org bugreport)
* HP zv5000z (AMD64 3700+, known that fixup_parent_subordinate_busnr fixes it)
* HP zv5200z
* IBM ThinkPad 240
* An IBM ThinkPad (1.8 GHz Pentium M) debugged by Pavel Machek
  gives the correspondig message which detects the possible problem.
* MSI S260 / Medion SIM 2100 MD 95600

The patch also expands the "try pci=assign-busses" warning so testers will
help us to update the DMI table.

Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-23 14:35:14 -08:00
Linus Torvalds
b408cbc704 [PATCH] PCI: resource address mismatch
On Tue, 21 Feb 2006, Ivan Kokshaysky wrote:
> There are two bogus entries in the BIOS memory map table which are
> conflicting with a prefetchable memory range of the AGP bridge:
>
>  BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
>  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
>
> 0000:00:02.0 PCI bridge: Silicon Integrated Systems [SiS] Virtual PCI-to-PCI bridge (AGP) (prog-if 00 [Normal decode])
> 	Flags: bus master, fast devsel, latency 0
> 	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> 	I/O behind bridge: 0000c000-0000cfff
> 	Memory behind bridge: e7e00000-e7efffff
> 	Prefetchable memory behind bridge: fec00000-ffcfffff
> 					   ^^^^^^^^^^^^^^^^^

Yes. However, it's pretty clear that the e820 entries are there for a
reason. Probably they are a hack by the BIOS maintainers to keep Windows
from stomping/moving that region, exactly because they want to keep the
bridge where it is (or, it's actually for the BIOS itself - the BIOS
tables are a horrid mess, and BIOS engineers are pretty hacky people:
they'll add random entries to make their own broken algorithms do the
"right thing").

> Starting from 2.6.13, kernel tries to resolve that sort of conflicts,
> so that prefetch window of the bridge and the framebuffer memory behind
> it get moved to 0x10000000.

I think we could (and probably should) solve this another way: consider
the ACPI "reserved regions" from the e820 map exactly the same way that we
do other ACPI hints - they should restrict _new_ allocations, but not
impact stuff we figure out on our own.

Basically, right now we assign _unassigned_ resources at "fs_initcall"
time. If we were to add in the e820 "reserved region" stuff before that
(but after we've done PCI discovery), we'd probably do the right thing.

Right now we do the e820 reserved regions very early indeed: we call
"register_memory()" from setup_arch(). We could move at least part of it
(the part that registers the resources) down a bit.

Here's a test-patch. I'm not saying we should absolutely do this, but it
might be interesting to try...

Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: <bjk@luxsci.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-23 14:35:14 -08:00
Andi Kleen
92c05fc1a3 [PATCH] PCI: Give PCI config access initialization a defined ordering
I moved it to a separate function which is safer.

This avoids problems with the linker reordering them and the
less useful PCI config space access methods taking priority
over the better ones.

Fixes some problems with broken MMCONFIG

Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-03-23 14:35:12 -08:00
Andrew Morton
394e3902c5 [PATCH] more for_each_cpu() conversions
When we stop allocating percpu memory for not-possible CPUs we must not touch
the percpu data for not-possible CPUs at all.  The correct way of doing this
is to test cpu_possible() or to use for_each_cpu().

This patch is a kernel-wide sweep of all instances of NR_CPUS.  I found very
few instances of this bug, if any.  But the patch converts lots of open-coded
test to use the preferred helper macros.

Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Kyle McMartin <kyle@parisc-linux.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Christian Zankel <chris@zankel.net>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Jens Axboe <axboe@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:17 -08:00
Andrew Morton
dd287796d6 [PATCH] pause_on_oops command line option
Attempt to fix the problem wherein people's oops reports scroll off the screen
due to repeated oopsing or to oopses on other CPUs.

If this happens the user can reboot with the `pause_on_oops=<seconds>' option.
It will allow the first oopsing CPU to print an oops record just a single
time.  Second oopsing attempts, or oopses on other CPUs will cause those CPUs
to enter a tight loop until the specified number of seconds have elapsed.

The patch implements the infrastructure generically in the expectation that
architectures other than x86 will find it useful.

Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:16 -08:00
Ingo Molnar
91368d73e4 [PATCH] make bug messages more consistent
Consolidate all kernel bug printouts to begin with the "BUG: " string.
Makes it easier to find them in large bootup logs.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:16 -08:00
Ingo Molnar
7a7d1cf954 [PATCH] sem2mutex: kprobes
Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:12 -08:00
Rafael J. Wysocki
fc558a7496 [PATCH] swsusp: finally solve mysqld problem
This patch from Pavel moves userland freeze signals handling into more logical
place.  It now hits even with mysqld running.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:08 -08:00
Ashok Raj
bdaff4a331 [PATCH] x86 topology: don;t create a control file for BSP that cannot be removed
Don't create "online" control file for BSP (i386/x86_64) since its
not removable.

We originally added this to support ppc64 if the kernel has support but
BIOS indicated no offline support, we just didnt create online files for
them.

We used the same method in ia64 as well, if we have a cpu taking platform
interrupts but cannot be removed if those interrupts cannot be re-targeted
to another cpu.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:07 -08:00
Adrian Bunk
905c399594 [PATCH] x86: some fixups for the X86_NUMAQ dependencies
You must always ensure to fulfill the dependencies of what you are
select'ing.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Martin Bligh <mbligh@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:06 -08:00
Shaohua Li
7c5c1e427b [PATCH] x86: deterine xapic using apic version
Checking APIC version instead of CPU family to determine XAPIC. Family 6
CPU could have xapic as well.

Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:06 -08:00
Shaohua Li
f2d0d263b5 [PATCH] x86: cpuid.4 doesn't need cpu level 5
Detecting cache line using cpuid.4, cpuid level 4 is enough.

Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:06 -08:00
Chuck Ebbert
75874d5cc8 [PATCH] i386: fix dump_stack()
i386 has a small bug in the stack dump code where it prints an extra log
level code.  Remove that and fix the alignment of normal stack dump
printout.  Also remove some unnecessary printk() calls.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:06 -08:00
Jan Beulich
4ef0652a74 [PATCH] i386: cleanup after cpu_gdt_descr conversion to per-cpu data
With cpu_gdt_descr having been converted to per-CPU data, the old object
(in head.S) no longer needs to reserve space for each CPU's instance.  With
cpu_gdt_table not being used for CPU 0 anymore, it doesn't seem to need
page alignment (or if in fact there is a need for it to retain that
alignment, the whole object should go into .data.page_align).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:06 -08:00
Jesper Juhl
52f4a91afd [PATCH] Fix the imlicit declaration of mtrr_centaur_report_mcr in arch/i386/kernel/cpu/centaur.c
arch/i386/kernel/cpu/centaur.c: In function `centaur_mcr_insert':
arch/i386/kernel/cpu/centaur.c:33: warning: implicit declaration of function `mtrr_centaur_report_mcr'

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:06 -08:00
Jesper Juhl
382dbd07c9 [PATCH] fix implicit declaration of GET_APIC_ID in arch/i386/kernel/apic.c
arch/i386/kernel/apic.c:840: warning: implicit declaration of function `GET_APIC_ID'

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:06 -08:00
Chuck Ebbert
be0a39120c [PATCH] i386: more vsyscall documentation
Document a limitation of vsyscall-sysenter, since patches to fix it have
been rejected.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:06 -08:00
Chuck Ebbert
635cf99a80 [PATCH] i386: fix singlestep through an int80 syscall
Using PTRACE_SINGLESTEP on a child that does an int80 syscall misses the
SIGTRAP that should be delivered upon syscall exit.  Fix that by setting
TIF_SINGLESTEP when entering the kernel via int80 with TF set.

/* Test whether singlestep through an int80 syscall works.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <asm/user.h>

static int child, status;
static struct user_regs_struct regs;

static void do_child()
{
	ptrace(PTRACE_TRACEME, 0, 0, 0);
	kill(getpid(), SIGUSR1);
	asm ("int $0x80" : : "a" (20)); /* getpid */
}

static void do_parent()
{
	unsigned long eip, expected = 0;
again:
	waitpid(child, &status, 0);
	if (WIFEXITED(status) || WIFSIGNALED(status))
		return;

	if (WIFSTOPPED(status)) {
		ptrace(PTRACE_GETREGS, child, 0, &regs);
		eip = regs.eip;
		if (expected)
			fprintf(stderr, "child stop @ %08x, expected %08x %s\n",
					eip, expected,
					eip == expected ? "" : " <== ERROR");

		if (*(unsigned short *)eip == 0x80cd) {
			fprintf(stderr, "int 0x80 at %08x\n", (unsigned int)eip);
			expected = eip + 2;
		} else
			expected = 0;

		ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
	}
	goto again;
}

int main(int argc, char * const argv[])
{
	child = fork();
	if (child)
		do_parent();
	else
		do_child();
	return 0;
}

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:05 -08:00
Jan Beulich
db753bdfc2 [PATCH] i386: fix uses of user_mode() vs. user_mode_vm()
>commit 76381fee7e
>Author: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
>Date:   Thu Jun 23 00:08:46 2005 -0700
>
>    [PATCH] xen: x86_64: use more usermode macro
>
>    Make use of the user_mode macro where it's possible.  This is useful for Xen
>    because it will need only to redefine only the macro to a hypervisor call.

I am of the opinion that the above changeset is incomplete, i.e.  it missed
converting some previous uses of user_mode to user_mode_vm.  While most of
them could be considered just cosmetical, at least the one in die_nmi
doesn't appear to be.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:05 -08:00
Jan Beulich
101f12af16 [PATCH] i386: actively synchronize vmalloc area when registering certain callbacks
Registering a callback handler through register_die_notifier() is obviously
primarily intended for use by modules.  However, the way these currently
get called it is basically impossible for them to actually be used by
modules, as there is, on non-PAE configurationes, a good chance (the larger
the module, the better) for the system to crash as a result.

This is because the callback gets invoked

(a) in the page fault path before the top level page table propagation
    gets carried out (hence a fault to propagate the top level page table
    entry/entries mapping to module's code/data would nest infinitly) and

(b) in the NMI path, where nested faults must absolutely not happen,
    since otherwise the IRET from the nested fault re-enables NMIs,
    potentially resulting in nested NMI occurences.

Besides the modular aspect, similar problems would even arise for in-
kernel consumers of the API if they touched ioremap()ed or vmalloc()ed
memory inside their handlers.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-23 07:38:05 -08:00