linux

Author	SHA1	Message	Date
Peter Zijlstra	a866374aec	[PATCH] mm: pagefault_{disable,enable}() Introduce pagefault_{disable,enable}() and use these where previously we did manual preempt increments/decrements to make the pagefault handler do the atomic thing. Currently they still rely on the increased preempt count, but do not rely on the disabled preemption, this might go away in the future. (NOTE: the extra barrier() in pagefault_disable might fix some holes on machines which have too many registers for their own good) [heiko.carstens@de.ibm.com: s390 fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Chen, Kenneth W	39dde65c99	[PATCH] shared page table for hugetlb page Following up with the work on shared page table done by Dave McCracken. This set of patch target shared page table for hugetlb memory only. The shared page table is particular useful in the situation of large number of independent processes sharing large shared memory segments. In the normal page case, the amount of memory saved from process' page table is quite significant. For hugetlb, the saving on page table memory is not the primary objective (as hugetlb itself already cuts down page table overhead significantly), instead, the purpose of using shared page table on hugetlb is to allow faster TLB refill and smaller cache pollution upon TLB miss. With PT sharing, pte entries are shared among hundreds of processes, the cache consumption used by all the page table is smaller and in return, application gets much higher cache hit ratio. One other effect is that cache hit ratio with hardware page walker hitting on pte in cache will be higher and this helps to reduce tlb miss latency. These two effects contribute to higher application performance. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Dave McCracken <dmccr@us.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Adam Litke <agl@us.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Nick Piggin	cc10250907	[PATCH] mm: add arch_alloc_page Add an arch_alloc_page to match arch_free_page. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Ashwin Chaugule	7602bdf2fd	[PATCH] new scheme to preempt swap token The new swap token patches replace the current token traversal algo. The old algo had a crude timeout parameter that was used to handover the token from one task to another. This algo, transfers the token to the tasks that are in need of the token. The urgency for the token is based on the number of times a task is required to swap-in pages. Accordingly, the priority of a task is incremented if it has been badly affected due to swap-outs. To ensure that the token doesnt bounce around rapidly, the token holders are given a priority boost. The priority of tasks is also decremented, if their rate of swap-in's keeps reducing. This way, the condition to check whether to pre-empt the swap token, is a matter of comparing two task's priority fields. [akpm@osdl.org: cleanups] Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:21 -08:00
Paul Jackson	7253f4ef04	[PATCH] memory page_alloc zonelist caching reorder structure Rearrange the struct members in the 'struct zonelist_cache' structure, so as to put the readonly (once initialized) z_to_n[] array first, where it will come right after the zones[] array in struct zonelist. This pretty much eliminates the chance that the two frequently written elements of 'struct zonelist_cache', the fullzones bitmap and last_full_zap times, will end up on the same cache line as the performance sensitive, frequently read, never (after init) written zones[] array. Keeping frequently written data off frequently read cache lines is good for performance. Thanks to Rohit Seth for the suggestion. Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Paul Jackson	9276b1bc96	[PATCH] memory page_alloc zonelist caching speedup Optimize the critical zonelist scanning for free pages in the kernel memory allocator by caching the zones that were found to be full recently, and skipping them. Remembers the zones in a zonelist that were short of free memory in the last second. And it stashes a zone-to-node table in the zonelist struct, to optimize that conversion (minimize its cache footprint.) Recent changes: This differs in a significant way from a similar patch that I posted a week ago. Now, instead of having a nodemask_t of recently full nodes, I have a bitmask of recently full zones. This solves a problem that last weeks patch had, which on systems with multiple zones per node (such as DMA zone) would take seeing any of these zones full as meaning that all zones on that node were full. Also I changed names - from "zonelist faster" to "zonelist cache", as that seemed to better convey what we're doing here - caching some of the key zonelist state (for faster access.) See below for some performance benchmark results. After all that discussion with David on why I didn't need them, I went and got some ;). I wanted to verify that I had not hurt the normal case of memory allocation noticeably. At least for my one little microbenchmark, I found (1) the normal case wasn't affected, and (2) workloads that forced scanning across multiple nodes for memory improved up to 10% fewer System CPU cycles and lower elapsed clock time ('sys' and 'real'). Good. See details, below. I didn't have the logic in get_page_from_freelist() for various full nodes and zone reclaim failures correct. That should be fixed up now - notice the new goto labels zonelist_scan, this_zone_full, and try_next_zone, in get_page_from_freelist(). There are two reasons I persued this alternative, over some earlier proposals that would have focused on optimizing the fake numa emulation case by caching the last useful zone: 1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems) have seen real customer loads where the cost to scan the zonelist was a problem, due to many nodes being full of memory before we got to a node we could use. Or at least, I think we have. This was related to me by another engineer, based on experiences from some time past. So this is not guaranteed. Most likely, though. The following approach should help such real numa systems just as much as it helps fake numa systems, or any combination thereof. 2) The effort to distinguish fake from real numa, using node_distance, so that we could cache a fake numa node and optimize choosing it over equivalent distance fake nodes, while continuing to properly scan all real nodes in distance order, was going to require a nasty blob of zonelist and node distance munging. The following approach has no new dependency on node distances or zone sorting. See comment in the patch below for a description of what it actually does. Technical details of note (or controversy): - See the use of "zlc_active" and "did_zlc_setup" below, to delay adding any work for this new mechanism until we've looked at the first zone in zonelist. I figured the odds of the first zone having the memory we needed were high enough that we should just look there, first, then get fancy only if we need to keep looking. - Some odd hackery was needed to add items to struct zonelist, while not tripping up the custom zonelists built by the mm/mempolicy.c code for MPOL_BIND. My usual wordy comments below explain this. Search for "MPOL_BIND". - Some per-node data in the struct zonelist is now modified frequently, with no locking. Multiple CPU cores on a node could hit and mangle this data. The theory is that this is just performance hint data, and the memory allocator will work just fine despite any such mangling. The fields at risk are the struct 'zonelist_cache' fields 'fullzones' (a bitmask) and 'last_full_zap' (unsigned long jiffies). It should all be self correcting after at most a one second delay. - This still does a linear scan of the same lengths as before. All I've optimized is making the scan faster, not algorithmically shorter. It is now able to scan a compact array of 'unsigned short' in the case of many full nodes, so one cache line should cover quite a few nodes, rather than each node hitting another one or two new and distinct cache lines. - If both Andi and Nick don't find this too complicated, I will be (pleasantly) flabbergasted. - I removed the comment claiming we only use one cachline's worth of zonelist. We seem, at least in the fake numa case, to have put the lie to that claim. - I pay no attention to the various watermarks and such in this performance hint. A node could be marked full for one watermark, and then skipped over when searching for a page using a different watermark. I think that's actually quite ok, as it will tend to slightly increase the spreading of memory over other nodes, away from a memory stressed node. =============== Performance - some benchmark results and analysis: This benchmark runs a memory hog program that uses multiple threads to touch alot of memory as quickly as it can. Multiple runs were made, touching 12, 38, 64 or 90 GBytes out of the total 96 GBytes on the system, and using 1, 19, 37, or 55 threads (on a 56 CPU system.) System, user and real (elapsed) timings were recorded for each run, shown in units of seconds, in the table below. Two kernels were tested - 2.6.18-mm3 and the same kernel with this zonelist caching patch added. The table also shows the percentage improvement the zonelist caching sys time is over (lower than) the stock -mm kernel. number 2.6.18-mm3 zonelist-cache delta (< 0 good) percent GBs N ------------ -------------- ---------------- systime mem threads sys user real sys user real sys user real better 12 1 153 24 177 151 24 176 -2 0 -1 1% 12 19 99 22 8 99 22 8 0 0 0 0% 12 37 111 25 6 112 25 6 1 0 0 -0% 12 55 115 25 5 110 23 5 -5 -2 0 4% 38 1 502 74 576 497 73 570 -5 -1 -6 0% 38 19 426 78 48 373 76 39 -53 -2 -9 12% 38 37 544 83 36 547 82 36 3 -1 0 -0% 38 55 501 77 23 511 80 24 10 3 1 -1% 64 1 917 125 1042 890 124 1014 -27 -1 -28 2% 64 19 1118 138 119 965 141 103 -153 3 -16 13% 64 37 1202 151 94 1136 150 81 -66 -1 -13 5% 64 55 1118 141 61 1072 140 58 -46 -1 -3 4% 90 1 1342 177 1519 1275 174 1450 -67 -3 -69 4% 90 19 2392 199 192 2116 189 176 -276 -10 -16 11% 90 37 3313 238 175 2972 225 145 -341 -13 -30 10% 90 55 1948 210 104 1843 213 100 -105 3 -4 5% Notes: 1) This test ran a memory hog program that started a specified number N of threads, and had each thread allocate and touch 1/N'th of the total memory to be used in the test run in a single loop, writing a constant word to memory, one store every 4096 bytes. Watching this test during some earlier trial runs, I would see each of these threads sit down on one CPU and stay there, for the remainder of the pass, a different CPU for each thread. 2) The 'real' column is not comparable to the 'sys' or 'user' columns. The 'real' column is seconds wall clock time elapsed, from beginning to end of that test pass. The 'sys' and 'user' columns are total CPU seconds spent on that test pass. For a 19 thread test run, for example, the sum of 'sys' and 'user' could be up to 19 times the number of 'real' elapsed wall clock seconds. 3) Tests were run on a fresh, single-user boot, to minimize the amount of memory already in use at the start of the test, and to minimize the amount of background activity that might interfere. 4) Tests were done on a 56 CPU, 28 Node system with 96 GBytes of RAM. 5) Notice that the 'real' time gets large for the single thread runs, even though the measured 'sys' and 'user' times are modest. I'm not sure what that means - probably something to do with it being slow for one thread to be accessing memory along ways away. Perhaps the fake numa system, running ostensibly the same workload, would not show this substantial degradation of 'real' time for one thread on many nodes -- lets hope not. 6) The high thread count passes (one thread per CPU - on 55 of 56 CPUs) ran quite efficiently, as one might expect. Each pair of threads needed to allocate and touch the memory on the node the two threads shared, a pleasantly parallizable workload. 7) The intermediate thread count passes, when asking for alot of memory forcing them to go to a few neighboring nodes, improved the most with this zonelist caching patch. Conclusions: This zonelist cache patch probably makes little difference one way or the other for most workloads on real numa hardware, if those workloads avoid heavy off node allocations. * For memory intensive workloads requiring substantial off-node allocations on real numa hardware, this patch improves both kernel and elapsed timings up to ten per-cent. * For fake numa systems, I'm optimistic, but will have to leave that up to Rohit Seth to actually test (once I get him a 2.6.18 backport.) Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: David Rientjes <rientjes@cs.washington.edu> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Christoph Lameter	89689ae7f9	[PATCH] Get rid of zone_table[] The zone table is mostly not needed. If we have a node in the page flags then we can get to the zone via NODE_DATA() which is much more likely to be already in the cpu cache. In case of SMP and UP NODE_DATA() is a constant pointer which allows us to access an exact replica of zonetable in the node_zones field. In all of the above cases there will be no need at all for the zone table. The only remaining case is if in a NUMA system the node numbers do not fit into the page flags. In that case we make sparse generate a table that maps sections to nodes and use that table to to figure out the node number. This table is sized to fit in a single cache line for the known 32 bit NUMA platform which makes it very likely that the information can be obtained without a cache miss. For sparsemem the zone table seems to be have been fairly large based on the maximum possible number of sections and the number of zones per node. There is some memory saving by removing zone_table. The main benefit is to reduce the cache foootprint of the VM from the frequent lookups of zones. Plus it simplifies the page allocator. [akpm@osdl.org: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Andrew Morton	676dcb8bc2	[PATCH] add bottom_half.h With CONFIG_SMP=n: drivers/input/ff-memless.c:384: warning: implicit declaration of function 'local_bh_disable' drivers/input/ff-memless.c:393: warning: implicit declaration of function 'local_bh_enable' Really linux/spinlock.h should include linux/interrupt.h. But interrupt.h includes sched.h which will need spinlock.h. So the patch breaks the _bh declarations out into a separate header and includes it in both interrupt.h and spinlock.h. Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Andi Kleen <ak@suse.de> Cc: <stable@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:20 -08:00
Pavel Pisa	86987d5bf4	[ARM] 3991/1: i.MX/MX1 high resolution time source Enhanced resolution for time measurement functions. Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:24:16 +00:00
Ben Dooks	9073341c2b	[ARM] 3986/1: H1940: suspend to RAM support Add support to suspend and resume, using the H1940's bootloader Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:17:49 +00:00
Rod Whitby	a47d08e2e3	[ARM] 3984/1: ixp4xx/nslu2: Fix disk LED numbering (take 2) This patch fixes an error in the numbering of the disk LEDs on the Linksys NSLU2. The error crept in because the physical location of the LEDs has the Disk 2 LED above the Disk 1 LED. Thanks to Gordon Farquharson for reporting this. Signed-off-by: Rod Whitby <rod@whitby.id.au> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:17:06 +00:00
Nicolas Pitre	838ccbc35e	[ARM] 3978/1: macro to provide a 63-bit value from a 32-bit hardware counter This is done in a completely lockless fashion. Bits 0 to 31 of the count are provided by the hardware while bits 32 to 62 are stored in memory. The top bit in memory is used to synchronize with the hardware count half-period. When the top bit of both counters (hardware and in memory) differ then the memory is updated with a new value, incrementing it when the hardware counter wraps around. Because a word store in memory is atomic then the incremented value will always be in synch with the top bit indicating to any potential concurrent reader if the value in memory is up to date or not wrt the needed increment. And any race in updating the value in memory is harmless as the same value would be stored more than once. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:06:45 +00:00
Nicolas Pitre	fa4adc6149	[ARM] 3611/4: optimize do_div() when divisor is constant On ARM all divisions have to be performed "manually". For 64-bit divisions that may take more than a hundred cycles in many cases. With 32-bit divisions gcc already use the recyprocal of constant divisors to perform a multiplication, but not with 64-bit divisions. Since the kernel is increasingly relying upon 64-bit divisions it is worth optimizing at least those cases where the divisor is a constant. This is what this patch does using plain C code that gets optimized away at compile time. For example, despite the amount of added C code, do_div(x, 10000) now produces the following assembly code (where x is assigned to r0-r1): adr r4, .L0 ldmia r4, {r4-r5} umull r2, r3, r4, r0 mov r2, #0 umlal r3, r2, r5, r0 umlal r3, r2, r4, r1 mov r3, #0 umlal r2, r3, r5, r1 mov r0, r2, lsr #11 orr r0, r0, r3, lsl #21 mov r1, r3, lsr #11 ... .L0: .word 948328779 .word 879609302 which is the fastest that can be done for any value of x in that case, many times faster than the __do_div64 code (except for the small x value space for which the result ends up being zero or a single bit). The fact that this code is generated inline produces a tiny increase in .text size, but not significant compared to the needed code around each __do_div64 call site this code is replacing. The algorithm used has been validated on a 16-bit scale for all possible values, and then recodified for 64-bit values. Furthermore I've been running it with the final BUG_ON() uncommented for over two months now with no problem. Note that this new code is compiled with gcc versions 4.0 or later. Earlier gcc versions proved themselves too problematic and only the original code is used with them. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2006-12-07 16:06:09 +00:00
Michael Chan	9d26e21342	[TG3]: Add TG3_FLG2_IS_NIC flag. Add Tg3_FLG2_IS_NIC flag to unambiguously determine whether the device is NIC or onboard. Previously, the EEPROM_WRITE_PROT flag was overloaded to also mean onboard. With the separation, we can support some devices that are onboard but do not use eeprom write protect. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-07 00:21:14 -08:00
Michael Chan	676917d488	[TG3]: Add 5787F device ID. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-07 00:20:22 -08:00
Joy Latten	c9204d9ca7	audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n Disables auditing in ipsec when CONFIG_AUDITSYSCALL is disabled in the kernel. Also includes a bug fix for xfrm_state.c as a result of original ipsec audit patch. Signed-off-by: Joy Latten <latten@austin.ibm.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 20:14:23 -08:00
Joy Latten	161a09e737	audit: Add auditing to ipsec An audit message occurs when an ipsec SA or ipsec policy is created/deleted. Signed-off-by: Joy Latten <latten@austin.ibm.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 20:14:22 -08:00
Randy Dunlap	95b99a670d	[IRDA] irlan: Fix compile warning when CONFIG_PROC_FS=n include/net/irda/irlan_filter.h:31: warning: 'struct seq_file' declared inside parameter list include/net/irda/irlan_filter.h:31: warning: its scope is only this definition or declaration, which is probably not what you want Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 20:10:07 -08:00
Yasuyuki Kozakai	9ee0779e99	[NETFILTER]: nf_conntrack: fix warning in PPTP helper Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:39:04 -08:00
Rik Snel	c494e0705d	[CRYPTO] lib: table driven multiplications in GF(2^128) A lot of cypher modes need multiplications in GF(2^128). LRW, ABL, GCM... I use functions from this library in my LRW implementation and I will also use them in my ABL (Arbitrary Block Length, an unencumbered (correct me if I am wrong, wide block cipher mode). Elements of GF(2^128) must be presented as u128 *, it encourages automatic and proper alignment. The library contains support for two different representations of GF(2^128), see the comment in gf128mul.h. There different levels of optimization (memory/speed tradeoff). The code is based on work by Dr Brian Gladman. Notable changes: - deletion of two optimization modes - change from u32 to u64 for faster handling on 64bit machines - support for 'bbe' representation in addition to the, already implemented, 'lle' representation. - move 'inline void' functions from header to 'static void' in the source file - update to use the linux coding style conventions The original can be found at: http://fp.gladman.plus.com/AES/modes.vc8.19-06-06.zip The copyright (and GPL statement) of the original author is preserved. Signed-off-by: Rik Snel <rsnel@cube.dyndns.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2006-12-06 18:38:55 -08:00
Rik Snel	aec3694b98	[CRYPTO] lib: some common 128-bit block operations, nicely centralized 128bit is a common blocksize in linux kernel cryptography, so it helps to centralize some common operations. The code, while mostly trivial, is based on a header file mode_hdr.h in http://fp.gladman.plus.com/AES/modes.vc8.19-06-06.zip The original copyright (and GPL statement) of the original author, Dr Brian Gladman, is preserved. Signed-off-by: Rik Snel <rsnel@cube.dyndns.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2006-12-06 18:38:55 -08:00
Adrian Bunk	cc44215eaa	[CRYPTO] api: Remove unused functions This patch removes the following no longer used functions: - api.c: crypto_alg_available() - digest.c: crypto_digest_init() - digest.c: crypto_digest_update() - digest.c: crypto_digest_final() - digest.c: crypto_digest_digest() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2006-12-06 18:38:54 -08:00
Kazunori MIYAZAWA	7cf4c1a5fd	[IPSEC]: Add support for AES-XCBC-MAC The glue of xfrm. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2006-12-06 18:38:51 -08:00
Jamal Hadi Salim	334c29a645	[GENETLINK]: Move command capabilities to flags. This patch moves command capabilities to command flags. Other than being cleaner, saves several bytes. We increment the nlctrl version so as to signal to user space that to not expect the attributes. We will try to be careful not to do this too often ;-> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:38:41 -08:00
Jan Beulich	b65780e123	[PATCH] unwinder: move .eh_frame to RODATA The .eh_frame section contents is never written to, so it can as well benefit from CONFIG_DEBUG_RODATA. Diff-ed against firstfloor tree. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:19 +01:00
Burman Yan	116780fc04	[PATCH] i386: replace kmalloc+memset with kzalloc Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:19 +01:00
Adrian Bunk	d7fb027128	[PATCH] x86-64: remove remaining pc98 code Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:19 +01:00
Andi Kleen	9dc452ba2d	[PATCH] x86-64: Fix constraints in atomic_add_return() Following i386 from Duncan Sands Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:13 +01:00
Duncan Sands	e4b522d7ef	[PATCH] x86-64: fix asm constraints in i386 atomic_add_return Since v->counter is both read and written, it should be an output as well as an input for the asm. The current code only gets away with this because counter is volatile. Also, according to Documents/atomic_ops.txt, atomic_add_return should provide a memory barrier, in particular a compiler barrier, so the asm should be marked as clobbering memory. Test case: #include <stdio.h> typedef struct { int counter; } atomic_t; /* NB: no "volatile" / #define ATOMIC_INIT(i) { (i) } #define atomic_read(v) ((v)->counter) static __inline__ int atomic_add_return(int i, atomic_t v) { int __i = i; __asm__ __volatile__( "lock; xaddl %0, %1;" :"=r"(i) :"m"(v->counter), "0"(i)); /* __asm__ __volatile__( "lock; xaddl %0, %1" :"+r" (i), "+m" (v->counter) : : "memory"); */ return i + __i; } int main (void) { atomic_t a = ATOMIC_INIT(0); int x; x = atomic_add_return (1, &a); if ((x!=1) \|\| (atomic_read(&a)!=1)) printf("fail: %i, %i\n", x, atomic_read(&a)); } Signed-off-by: Duncan Sands <baldrick@free.fr> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:13 +01:00
Adrian Bunk	9ee4016888	[PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header Nothing in include/asm-x86_64/cpufeature.h is part of the userspace<->kernel interface. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:13 +01:00
Venkatesh Pallipadi	d331e739f5	[PATCH] x86-64: Fix interrupt race in idle callback (3rd try) Idle callbacks has some races when enter_idle() sets isidle and subsequent interrupts that can happen on that CPU, before CPU goes to idle. Due to this, an IDLE_END can get called before IDLE_START. To avoid these races, disable interrupts before enter_idle and make sure that all idle routines do not enable interrupts before entering idle. Note that poll_idle() still has a this race as it has to enable interrupts before going to idle. But, all other idle routines have the race fixed. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:13 +01:00
Jan Beulich	359ad0d401	[PATCH] unwinder: more sanity checks in Dwarf2 unwinder Tighten the requirements on both input to and output from the Dwarf2 unwinder. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:13 +01:00
Adrian Bunk	a1a70c25be	[PATCH] i386: always enable regparm -mregparm=3 has been enabled by default for some time on i386, and AFAIK there aren't any problems with it left. This patch removes the REGPARM config option and sets -mregparm=3 unconditionally. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:12 +01:00
Chuck Ebbert	0741f4d207	[PATCH] x86: add sysctl for kstack_depth_to_print Add sysctl for kstack_depth_to_print. This lets users change the amount of raw stack data printed in dump_stack() without having to reboot. Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:11 +01:00
Wink Saville	c7a3392e9e	[PATCH] x86-64: Fix comments for MSR_FS_BASE and MSR_GS_BASE. The comments for MSR_FS_BASE & MSR_GS_BASE were transposed. Signed-off-by: Wink Saville <wink@saville.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:11 +01:00
Artiom Myaskouvskey	bf7e6a1963	[PATCH] i386: Preserve EFI run time regions with memmap parameter When using memmap kernel parameter in EFI boot we should also add to memory map memory regions of runtime services to enable their mapping later. AK: merged and cleaned up the patch Signed-off-by: Artiom Myaskouvskey <artiom.myaskouvskey@intel.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:11 +01:00
Stephane Eranian	538f188e03	[PATCH] i386: i386 add Intel BTS cpufeature bit and detection (take 2) Here is a small patch for i386 which adds a cpufeature flag and detection code for Intel's Branch Trace Store (BTS) feature. This feature can be found on Intel P4 and Core 2 processors among others. It can also be used by perfmon. changelog: - add CPU_FEATURE_BTS - add Branch Trace Store detection signed-off-by: stephane eranian <eranian@hpl.hp.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:11 +01:00
Stephane Eranian	ee58fad51a	[PATCH] x86-64: x86-64 add Intel BTS cpufeature bit and detection (take 2) Here is a small patch for x86-64 which adds a cpufeature flag and detection code for Intel's Branch Trace Store (BTS) feature. This feature can be found on Intel P4 and Core 2 processors among others. It can also be used by perfmon. changelog: - add CPU_FEATURE_BTS - add Branch Trace Store detection signed-off-by: stephane eranian <eranian@hpl.hp.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:11 +01:00
Artiom Myaskouvskey	e1cccf48b1	[PATCH] i386: call efi_get_time during suspend Function efi_get_time called not only during init kernel phase but also during suspend (from get_cmos_time). When it is called from get_cmos_time the corresponding runtime service should be called in virtual and not in physical mode. Signed-off-by: Artiom Myaskouvskey <artiom.myaskouvskey@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: "Narayanan, Chandramouli" <chandramouli.narayanan@intel.com> Cc: "Jiossy, Rami" <rami.jiossy@intel.com> Cc: "Satt, Shai" <shai.satt@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Matt Domsch <Matt_Domsch@dell.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:11 +01:00
Siddha, Suresh B	b0d0a4ba45	[PATCH] x86: fix the irqbalance quirk for E7320/E7520/E7525 Move the irqbalance quirks for E7320/E7520/E7525(Errata 23 in http://download.intel.com/design/chipsets/specupdt/30304203.pdf) to early quirks. And add a PCI quirk for these platforms to check(which happens very late during the boot) if the APIC routing is indeed set to default flat mode. This fixes the breakage(in x86_64) of this quirk due to cpu hotplug which selects physical mode instead of the logical flat(as needed for this errata workaround). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:10 +01:00
Siddha, Suresh B	9899f826fc	[PATCH] x86-64: add genapic_force Add genapic_force. Used by the next Intel quirks patch. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:10 +01:00
Siddha, Suresh B	72486f1f8f	[PATCH] i386: change the 'no_control' field to 'hotpluggable' in the struct cpu Change the 'no_control' field in the cpu struct to a more positive and better term 'hotpluggable'. And change(/cleanup) the logic accordingly. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:10 +01:00
Siddha, Suresh B	fd6d7d2689	[PATCH] i386: introduce the mechanism of disabling cpu hotplug control Add 'enable_cpu_hotplug' flag and when cleared, the hotplug control file ("online") will not be added under /sys/devices/system/cpu/cpuX/ Next patch doing PCI quirks will use this. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:10 +01:00
Siddha, Suresh B	274e1bbdee	[PATCH] x86: add write_pci_config_byte() to direct PCI access routines Mechanism of selecting physical mode in genapic when cpu hotplug is enabled on x86_64, broke the quirk(quirk_intel_irqbalance()) introduced for working around the transposing interrupt message errata in E7520/E7320/E7525 (revision ID 0x9 and below. errata #23 in http://download.intel.com/design/chipsets/specupdt/30304203.pdf). This errata requires the mode to be in logical flat, so that interrupts can be directed to more than one cpu(and thus use hardware IRQ balancing enabled by BIOS on these platforms). Following four patches fixes this by moving the quirk to early quirk and forcing the x86_64 genapic selection to logical flat on these platforms. Thanks to Shaohua for pointing out the breakage. This patch: Add write_pci_config_byte() to direct PCI access routines Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:10 +01:00
Jan Beulich	eab724e5df	[PATCH] x86-64: adjust pmd_bad() Make pmd_bad() symmetrical to pgd_bad() and pud_bad(). At once, simplify them all. TBD: tighten down the checks again as suggested by Hugh D. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:09 +01:00
Jan Beulich	4a1c422750	[PATCH] x86-64: remove prototype of free_bootmem_generic() The function doesn't exist (anymore). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:09 +01:00
Ernie Petrides	103efcd9aa	[PATCH] x86-64: fix perms/range of vsyscall vma in /proc/*/maps The final line of /proc/<pid>/maps on x86_64 for native 64-bit tasks shows an incorrect ending address and incorrect permissions. There is only a single page mapped in this vsyscall region, and it is accessible for both read and execute. The patch below fixes this. (Since 32-bit-compat tasks have a real vma with correct perms/range, no change is necessary for that scenario.) Before the patch, a "cat /proc/self/maps \| tail -1" shows this: ffffffffff600000-ffffffffffe00000 ---p 00000000 [...] After the patch, this is the output: ffffffffff600000-ffffffffff601000 r-xp 00000000 [...] Signed-off-by: Ernie Petrides <petrides@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:09 +01:00
Andi Kleen	c55d92d141	[PATCH] i386: Add support for compilation for Core2 gcc doesn't support -mtune=core2 yet, but will be soon. Use -mtune=generic or -mtune=i686 as fallback TBD need benchmarking for INTEL_USERCOPY etc. So far I used the same defaults as MPENTIUMM Signed-off-by: Andi Kleen <ak@suse.de>	2006-12-07 02:14:09 +01:00
Zachary Amsden	8ecb895069	[PATCH] paravirt: fix missing pte update The function ptep_get_and_clear uses an atomic instruction sequence to get and clear an active pte. Rather than add such an atomic operator to all virtual machine implementations in paravirt-ops, it is easier to support the raw atomic sequence and use either a trapping writable pagetable approach, or a post-update notification. For the post update notification, we require the pte_update function to be called after the access. Combine the 2-level and 3-level paging operators into one common function which does the post-update notification, and rename the actual atomic sequences to raw_ptep_xxx operators. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@muc.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:09 +01:00
Zachary Amsden	dfbea0ad50	[PATCH] paravirt: fix parameter names in mmu operations Make parameter names match function argument names for the yet to be defined pte_update_defer accessor. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@muc.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org>	2006-12-07 02:14:08 +01:00

... 188 189 190 191 192 ...

20300 Commits