Commit Graph

564 Commits

Author SHA1 Message Date
Linus Torvalds
d895cb1af1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile (part one) from Al Viro:
 "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
  locking violations, etc.

  The most visible changes here are death of FS_REVAL_DOT (replaced with
  "has ->d_weak_revalidate()") and a new helper getting from struct file
  to inode.  Some bits of preparation to xattr method interface changes.

  Misc patches by various people sent this cycle *and* ocfs2 fixes from
  several cycles ago that should've been upstream right then.

  PS: the next vfs pile will be xattr stuff."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  saner proc_get_inode() calling conventions
  proc: avoid extra pde_put() in proc_fill_super()
  fs: change return values from -EACCES to -EPERM
  fs/exec.c: make bprm_mm_init() static
  ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
  ocfs2: fix possible use-after-free with AIO
  ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
  get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
  target: writev() on single-element vector is pointless
  export kernel_write(), convert open-coded instances
  fs: encode_fh: return FILEID_INVALID if invalid fid_type
  kill f_vfsmnt
  vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
  nfsd: handle vfs_getattr errors in acl protocol
  switch vfs_getattr() to struct path
  default SET_PERSONALITY() in linux/elf.h
  ceph: prepopulate inodes only when request is aborted
  d_hash_and_lookup(): export, switch open-coded instances
  9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
  9p: split dropping the acls from v9fs_set_create_acl()
  ...
2013-02-26 20:16:07 -08:00
Wen Congyang
24d335ca36 memory-hotplug: introduce new arch_remove_memory() for removing page table
For removing memory, we need to remove page tables.  But it depends on
architecture.  So the patch introduce arch_remove_memory() for removing
page table.  Now it only calls __remove_pages().

Note: __remove_pages() for some archtecuture is not implemented
      (I don't know how to implement it for s390).

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00
Al Viro
496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Kees Cook
0d57af1ec7 arch/sh: remove depends on CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.

CC: Paul Mundt <lethal@linux-sh.org>
CC: Tejun Heo <tj@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-21 14:43:13 -08:00
Linus Torvalds
3d59eebc5e Automatic NUMA Balancing V11
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQIcBAABAgAGBQJQx0kQAAoJEHzG/DNEskfi4fQP/R5PRovayroZALBMLnVJDaLD
 Ttr9p40VNXbiJ+MfRgatJjSSJZ4Jl+fC3NEqBhcwVZhckZZb9R2s0WtrSQo5+ZbB
 vdRfiuKoCaKM4cSZ08C12uTvsF6xjhjd27CTUlMkyOcDoKxMEFKelv0hocSxe4Wo
 xqlv3eF+VsY7kE1BNbgBP06SX4tDpIHRxXfqJPMHaSKQmre+cU0xG2GcEu3QGbHT
 DEDTI788YSaWLmBfMC+kWoaQl1+bV/FYvavIAS8/o4K9IKvgR42VzrXmaFaqrbgb
 72ksa6xfAi57yTmZHqyGmts06qYeBbPpKI+yIhCMInxA9CY3lPbvHppRf0RQOyzj
 YOi4hovGEMJKE+BCILukhJcZ9jCTtS3zut6v1rdvR88f4y7uhR9RfmRfsxuW7PNj
 3Rmh191+n0lVWDmhOs2psXuCLJr3LEiA0dFffN1z8REUTtTAZMsj8Rz+SvBNAZDR
 hsJhERVeXB6X5uQ5rkLDzbn1Zic60LjVw7LIp6SF2OYf/YKaF8vhyWOA8dyCEu8W
 CGo7AoG0BO8tIIr8+LvFe8CweypysZImx4AjCfIs4u9pu/v11zmBvO9NO5yfuObF
 BreEERYgTes/UITxn1qdIW4/q+Nr0iKO3CTqsmu6L1GfCz3/XzPGs3U26fUhllqi
 Ka0JKgnWvsa6ez6FSzKI
 =ivQa
 -----END PGP SIGNATURE-----

Merge tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma

Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
 "There are three implementations for NUMA balancing, this tree
  (balancenuma), numacore which has been developed in tip/master and
  autonuma which is in aa.git.

  In almost all respects balancenuma is the dumbest of the three because
  its main impact is on the VM side with no attempt to be smart about
  scheduling.  In the interest of getting the ball rolling, it would be
  desirable to see this much merged for 3.8 with the view to building
  scheduler smarts on top and adapting the VM where required for 3.9.

  The most recent set of comparisons available from different people are

    mel:    https://lkml.org/lkml/2012/12/9/108
    mingo:  https://lkml.org/lkml/2012/12/7/331
    tglx:   https://lkml.org/lkml/2012/12/10/437
    srikar: https://lkml.org/lkml/2012/12/10/397

  The results are a mixed bag.  In my own tests, balancenuma does
  reasonably well.  It's dumb as rocks and does not regress against
  mainline.  On the other hand, Ingo's tests shows that balancenuma is
  incapable of converging for this workloads driven by perf which is bad
  but is potentially explained by the lack of scheduler smarts.  Thomas'
  results show balancenuma improves on mainline but falls far short of
  numacore or autonuma.  Srikar's results indicate we all suffer on a
  large machine with imbalanced node sizes.

  My own testing showed that recent numacore results have improved
  dramatically, particularly in the last week but not universally.
  We've butted heads heavily on system CPU usage and high levels of
  migration even when it shows that overall performance is better.
  There are also cases where it regresses.  Of interest is that for
  specjbb in some configurations it will regress for lower numbers of
  warehouses and show gains for higher numbers which is not reported by
  the tool by default and sometimes missed in treports.  Recently I
  reported for numacore that the JVM was crashing with
  NullPointerExceptions but currently it's unclear what the source of
  this problem is.  Initially I thought it was in how numacore batch
  handles PTEs but I'm no longer think this is the case.  It's possible
  numacore is just able to trigger it due to higher rates of migration.

  These reports were quite late in the cycle so I/we would like to start
  with this tree as it contains much of the code we can agree on and has
  not changed significantly over the last 2-3 weeks."

* tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
  mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
  mm/rmap: Convert the struct anon_vma::mutex to an rwsem
  mm: migrate: Account a transhuge page properly when rate limiting
  mm: numa: Account for failed allocations and isolations as migration failures
  mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
  mm: numa: Add THP migration for the NUMA working set scanning fault case.
  mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
  mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
  mm: sched: numa: Control enabling and disabling of NUMA balancing
  mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
  mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships
  mm: numa: migrate: Set last_nid on newly allocated page
  mm: numa: split_huge_page: Transfer last_nid on tail page
  mm: numa: Introduce last_nid to the page frame
  sched: numa: Slowly increase the scanning period as NUMA faults are handled
  mm: numa: Rate limit setting of pte_numa if node is saturated
  mm: numa: Rate limit the amount of memory that is migrated between nodes
  mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
  mm: numa: Migrate pages handled during a pmd_numa hinting fault
  mm: numa: Migrate on reference policy
  ...
2012-12-16 15:18:08 -08:00
David Rientjes
c2d23f919b mm, oom: remove statically defined arch functions of same name
out_of_memory() is a globally defined function to call the oom killer.
x86, sh, and powerpc all use a function of the same name within file scope
in their respective fault.c unnecessarily.  Inline the functions into the
pagefault handlers to clean the code up.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Linus Torvalds
608ff1a210 Merge branch 'akpm' (Andrew's patchbomb)
Merge misc updates from Andrew Morton:
 "About half of most of MM.  Going very early this time due to
  uncertainty over the coreautounifiednumasched things.  I'll send the
  other half of most of MM tomorrow.  The rest of MM awaits a slab merge
  from Pekka."

* emailed patches from Andrew Morton: (71 commits)
  memory_hotplug: ensure every online node has NORMAL memory
  memory_hotplug: handle empty zone when online_movable/online_kernel
  mm, memory-hotplug: dynamic configure movable memory and portion memory
  drivers/base/node.c: cleanup node_state_attr[]
  bootmem: fix wrong call parameter for free_bootmem()
  avr32, kconfig: remove HAVE_ARCH_BOOTMEM
  mm: cma: remove watermark hacks
  mm: cma: skip watermarks check for already isolated blocks in split_free_page()
  mm, oom: fix race when specifying a thread as the oom origin
  mm, oom: change type of oom_score_adj to short
  mm: cleanup register_node()
  mm, mempolicy: remove duplicate code
  mm/vmscan.c: try_to_freeze() returns boolean
  mm: introduce putback_movable_pages()
  virtio_balloon: introduce migration primitives to balloon pages
  mm: introduce compaction and migration for ballooned pages
  mm: introduce a common interface for balloon pages mobility
  mm: redefine address_space.assoc_mapping
  mm: adjust address_space_operations.migratepage() return code
  arch/sparc/kernel/sys_sparc_64.c: s/COLOUR/COLOR/
  ...
2012-12-11 18:05:37 -08:00
Michel Lespinasse
b4265f1234 mm: use vm_unmapped_area() on sh architecture
Update the sh arch_get_unmapped_area[_topdown] functions to make use of
vm_unmapped_area() instead of implementing a brute force search.

[akpm@linux-foundation.org: remove now-unused COLOUR_ALIGN_DOWN()]
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:26 -08:00
Peter Zijlstra
cbee9f88ec mm: numa: Add fault driven placement and migration
NOTE: This patch is based on "sched, numa, mm: Add fault driven
	placement and migration policy" but as it throws away all the policy
	to just leave a basic foundation I had to drop the signed-offs-by.

This patch creates a bare-bones method for setting PTEs pte_numa in the
context of the scheduler that when faulted later will be faulted onto the
node the CPU is running on.  In itself this does nothing useful but any
placement policy will fundamentally depend on receiving hints on placement
from fault context and doing something intelligent about it.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
2012-12-11 14:42:45 +00:00
Cyril Chemparathy
7e6735c357 /dev/mem: use phys_addr_t for physical addresses
This patch fixes the /dev/mem driver to use phys_addr_t for physical
addresses.  This is required on PAE systems, especially those that run
entirely out of >4G physical memory space.

Signed-off-by: Cyril Chemparathy <cyril@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-10-24 15:32:50 -07:00
Shaohua Li
45cac65b0f readahead: fault retry breaks mmap file read random detection
.fault now can retry.  The retry can break state machine of .fault.  In
filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
try, since the page is in page cache now, ra->mmap_miss is decreased.  And
these are done in one fault, so we can't detect random mmap file access.

Add a new flag to indicate .fault is tried once.  In the second try, skip
ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.

I only tested x86, didn't test other archs, but looks the change for other
archs is obvious, but who knows :)

Signed-off-by: Shaohua Li <shaohua.li@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:47 +09:00
Paul Mundt
90eed7d87b sh: Fix up recursive fault in oops with unset TTB.
Presently the oops code looks for the pgd either from the mm context or
the cached TTB value. There are presently cases where the TTB can be
unset or otherwise cleared by hardware, which we weren't handling,
resulting in recursive faults on the NULL pgd. In these cases we can
simply reload from swapper_pg_dir and continue on as normal.

Cc: stable@vger.kernel.org
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-07-25 13:11:13 +09:00
Paul Mundt
0412ddc822 sh64: Fix up section mismatch warnings.
WARNING: vmlinux.o(.cpuinit.text+0x280): Section mismatch in reference from the function cpu_probe() to the function .init.text:sh64_tlb_init()
The function __cpuinit cpu_probe() references
a function __init sh64_tlb_init().
If sh64_tlb_init is only used by cpu_probe then
annotate sh64_tlb_init with a matching annotation.

sh64_tlb_init() simply needs to be __cpuinit annotated, so fix that up.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-06-14 15:05:53 +09:00
Paul Mundt
d8fd35fc58 sh64: Fix up vmalloc fault range check.
With the previous attempt reverted this switches to conditionalizing the
end address. Nominally VMALLOC_END, but extended for P3_ADDR_MAX in the
store queue case.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-18 20:01:16 +09:00
Paul Mundt
c3e0af9879 Revert "sh: Ensure fixmap and store queue space can co-exist."
This reverts commit 20e7c297ef.
With store queues enabled the area above P4SEG has special properties
from the MMU's point of view, which was causing fixmap failure. We'll
have to do something else to satisfy the vmalloc range check.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-18 19:30:05 +09:00
Paul Mundt
fd37e75ed5 sh64: Set additional fault code values.
The SSR.MD status amongst other things are already made available, which
can be used for encoding a more precise fault code value.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 17:46:49 +09:00
Paul Mundt
392c3822a6 sh64: Tidy up and consolidate the TLB miss fast path.
This unifies the fast-path TLB miss handler, allowing for further cleanup
and eventual utilization of a shared _32/_64 handler.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 17:24:21 +09:00
Paul Mundt
2ec08e141f sh64: Fix up caller-save register settings for fast-path.
Now that the fast-path handler has been moved, we also need to update the
Makefile to ensure that the same restrictions for caller-save registers
are observed.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 16:46:07 +09:00
Paul Mundt
4de5185629 sh64: Invert page fault fast-path error path values.
This brings the sh64 version in line with the sh32 one with regards to
how errors are handled. Base work for further unification of the
implementations.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 16:44:45 +09:00
Paul Mundt
c06fd28387 sh64: Migrate to __update_tlb() API.
Now that we have a method for finding out if we're handling an ITLB fault
or not without passing it all the way down the chain, it's possible to
use the __update_tlb() interface in place of a special __do_tlb_refill().

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 15:52:28 +09:00
Paul Mundt
28080329ed sh: Enable shared page fault handler for _32/_64.
This moves the now generic _32 page fault handling code to a shared place
and adapts the _64 implementation to make use of it.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 15:33:28 +09:00
Paul Mundt
e45af0e083 sh64: Kill off unused fixed I/O mapping window.
This was reworked some time ago to go through fixmaps instead, leaving
the range itself unused. As such, kill off the remaining references and
hand over the remaining space for fixmaps directly. This also makes it
possible to simplify the vmalloc fault case as we no longer have to care
about the special section.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 15:16:11 +09:00
Paul Mundt
20e7c297ef sh: Ensure fixmap and store queue space can co-exist.
At the moment the top of the fixmap space is calculated from P4SEG, which
places it at the end of the store queue space when that API is enabled.
Make sure we use P3_ADDR_MAX here instead to find the proper address
limit. With this done, it's also possible to switch to the generic
vmalloc address range check now that VMALLOC_START/END encapsulate the
translatable areas that we care about.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 15:11:35 +09:00
Paul Mundt
9a7b7739f9 sh64: Utilize thread fault code encoding.
This plugs in fault code encoding for the sh64 page fault, too.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 15:07:52 +09:00
Paul Mundt
5a1dc78a38 sh: Support thread fault code encoding.
This provides a simple interface modelled after sparc64/m32r to encode
the error code in the upper byte of thread_info for finer-grained
handling in the page fault path.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 14:57:28 +09:00
Paul Mundt
dbdb4e9f3f sh: Tidy up and generalize page fault error paths.
This follows the x86 changes for tidying up the page fault error paths.
We'll build on top of this for _32/_64 unification.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-05-14 10:27:34 +09:00
Paul Mundt
b2212ea41d sh64: Kill off unused trap_no/error_code from thread_struct.
While the trap number and error code are passed around for debugging
purposes, this occurs wholly independently of the thread struct values.
These values were never part of the sigcontext ABI and are thus never
passed anywhere, so we can just kill them off across the board.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-04-19 17:52:20 +09:00
Paul Mundt
fb56a91922 Merge branches 'sh/st-integration' and 'sh/stackprotector' into sh-latest 2012-04-19 17:31:59 +09:00
Stuart Menefy
45c0e0e25e sh: Improve oops error reporting
In some cases the opps error reporting doesn't give enough information
to diagnose the problem, only printing information if it is thought
to be valid. Replace the current code with more detailed output.

This code is based on the ARM reporting, with minor changes for the SH.

[lethal@linux-sh.org: fixed up for 64-bit PTEs and pte_offset_kernel()]
Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-04-19 17:25:03 +09:00
Stuart Menefy
8d9a784d1e sh: Fix error synchronising kernel page tables
The problem is caused by the interaction of two features in the Linux
memory management code.

A processes address space is described by a struct mm_struct, and
every thread has a pointer to the mm it should run in. The exception
to this are kernel threads, which don't have an mm, and so borrow
the mm from the last thread which ran. The system is bootstrapped
by the initial kernel thread using init's mm (even though init hasn't
been created yet, its mm is the static init_mm).

The other feature is how the kernel handles the page table which
describes the portion of the address space which is only visible when
executing inside the kernel, and which is shared by all threads. On
the SH4 the only portion of the kernel's address space which described
using the page table is called P3, from 0xc0000000 to 0xdfffffff. This
portion of the address space is divided into three:
  - mappings for dma_alloc_coherent()
  - mappings for vmalloc() and ioremap()
  - fixmap mappings, primarily used in copy_user_pages() to create
    kernel mappings of user pages with the correct cache colour.

To optimise the TLB miss handler we don't want to add an additional
condition which checks whether the faulting address is in the user or
the kernel portion of the address space, and so all page tables have a
common portion which describes the kernel part of the address
space. As the SH4 uses a two level page table, only the kernel portion
of first level page table (the pgd entries) is duplicated. These all
point to the same second level entries (the pte's), and so no memory
is wasted.

The reference page table for the kernel is called the swapper_pg_dir,
and when a new page table is created for a new process the kernel
portion of the page table is copied from swapper_pg_dir. This works
fine when changes only occur in the second level of the kernel's page
table, or the first level entries are created before any new user
processes. However if a change occurs to the first level of the page
table, and there are existing processes which don't have this entry in
their page table, this new entry needs to be added. This is done on
demand, when the kernel accesses a P3 address which isn't mapped using
the current page table, the code in vmalloc_fault() copies the entry
from the reference page table (swapper_pg_dir) into the current
processes page table.

The bug which this patch addresses is that the code in vmalloc_fault()
was not copying addresses which fell in the dma_alloc_coherent()
portion of the address space, and it should have been copying any P3
address.

Why we hadn't seen this before, and what made this hard to reproduce,
is that normally the kernel will have called dma_alloc_coherent(), and
accessed the memory mapping created, before any user process
runs. Typically drivers such as USB or SATA will have created and used
mappings of this type during the kernel initialisation, when probing
for the attached devices, before init runs. Ethernet is slightly
different, as it normally only creates and accesses
dma_alloc_coherent() mappings when the network is brought up, but if
kernel level IP configuration is used this will also occur before any
user space process runs. So the first reproduction of this problem
which we saw was occurred when USB and SATA were removed from the
kernel, and then bring up Ethernet from user space using ifconfig.
I'd like to thank Joseph Bormolini who did the hard work reducing the
problem to this simple to reproduce criteria.

In your case the situation is slightly different, and turns out to
depends on the exact kernel configuration (which we had) and your
ramdisk contents (which we didn't - hence the need for some assumptions).

In this case the problem is a side effect of kernel level module
loading. Kernel subsystems sometimes trigger the load of kernel
modules directly, for example the crypto subsystem tries to load the
cryptomgr and MTD tries to load modules for Flash partitioning if
these are not built into the kernel. This is done by the kernel
creating a user process which runs insmod to try and load the
appropriate module.

In order for this to cause problems the system must be running with a
initrd or initramfs, which contains an insmod executable - if the
kernel can't find an insmod to run, no user process is created, and
the problem doesn't occur.  If an insmod is found, a process is
created to run it, which will inherit the kernel portion of the
swapper_pg_dir first level page table. It doesn't matter whether the
inmod is successful or not, but when the the kernel scheduler context
switches back to the kernel initialisation thread, the insmod's mm is
'borrowed' by the kernel thread, as it doesn't have an address space
of its own. (Reference counting is used to ensure this mm is not
destroyed, even though the user process which caused its creation may no
longer exist.) If this address space doesn't have a first level page
table entry for the consistent mappings, and a driver tries to access
such a mapping, we are in the same situation as described above,
except this time in a kernel thread rather than a user thread
executing inside the kernel.

See bugzilla: 15425, 15836, 15862, 16106, 16793

Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-04-19 15:57:44 +09:00
Paul Mundt
ba2a3cdf76 sh64: Kill off dead page fault debug cruft.
In the future we'll be unifying some of the 32/64 page fault path, so
start to tidy up the _64 one by killing off some of the unused debug
cruft.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-04-11 12:53:06 +09:00
Paul Mundt
a1e2030122 sh64: Port OOM changes to do_page_fault
Reflect the sh32 OOM changes for the sh64 page fault handler, too.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-04-11 12:44:50 +09:00
Kautuk Consul
11fd982400 sh/mm/fault_32.c: Port OOM changes to do_page_fault
Commit d065bd810b
(mm: retry page fault when blocking on disk transfer) and
commit 37b23e0525
(x86,mm: make pagefault killable)

The above commits introduced changes into the x86 pagefault handler
for making the page fault handler retryable as well as killable.

These changes reduce the mmap_sem hold time, which is crucial
during OOM killer invocation.

Port these changes to the 32-bit SH platform.

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-04-11 12:37:54 +09:00
Linus Torvalds
664481ed45 SuperH updates for 3.4-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iEYEABECAAYFAk99uBgACgkQGkmNcg7/o7hglwCgqi6CE7i5gyneNYBn2ocRps4O
 y1UAoMSIscO6YWcHPuxOiNBbJYUy/jMI
 =SEO8
 -----END PGP SIGNATURE-----

Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh

Pull SuperH fixes from Paul Mundt.

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
  sh: fix clock-sh7757 for the latest sh_mobile_sdhi driver
  serial: sh-sci: use serial_port_in/out vs sci_in/out.
  sh: vsyscall: Fix up .eh_frame generation.
  sh: dma: Fix up device attribute mismatch from sysdev fallout.
  sh: dwarf unwinder depends on SHcompact.
  sh: fix up fallout from system.h disintegration.
2012-04-07 09:52:46 -07:00
Linus Torvalds
58bca4a8fa Merge branch 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull DMA mapping branch from Marek Szyprowski:
 "Short summary for the whole series:

  A few limitations have been identified in the current dma-mapping
  design and its implementations for various architectures.  There exist
  more than one function for allocating and freeing the buffers:
  currently these 3 are used dma_{alloc, free}_coherent,
  dma_{alloc,free}_writecombine, dma_{alloc,free}_noncoherent.

  For most of the systems these calls are almost equivalent and can be
  interchanged.  For others, especially the truly non-coherent ones
  (like ARM), the difference can be easily noticed in overall driver
  performance.  Sadly not all architectures provide implementations for
  all of them, so the drivers might need to be adapted and cannot be
  easily shared between different architectures.  The provided patches
  unify all these functions and hide the differences under the already
  existing dma attributes concept.  The thread with more references is
  available here:

    http://www.spinics.net/lists/linux-sh/msg09777.html

  These patches are also a prerequisite for unifying DMA-mapping
  implementation on ARM architecture with the common one provided by
  dma_map_ops structure and extending it with IOMMU support.  More
  information is available in the following thread:

    http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819

  More works on dma-mapping framework are planned, especially in the
  area of buffer sharing and managing the shared mappings (together with
  the recently introduced dma_buf interface: commit d15bd7ee44
  "dma-buf: Introduce dma buffer sharing mechanism").

  The patches in the current set introduce a new alloc/free methods
  (with support for memory attributes) in dma_map_ops structure, which
  will later replace dma_alloc_coherent and dma_alloc_writecombine
  functions."

People finally started piping up with support for merging this, so I'm
merging it as the last of the pending stuff from the merge window.
Looks like pohmelfs is going to wait for 3.5 and more external support
for merging.

* 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  common: DMA-mapping: add NON-CONSISTENT attribute
  common: DMA-mapping: add WRITE_COMBINE attribute
  common: dma-mapping: introduce mmap method
  common: dma-mapping: remove old alloc_coherent and free_coherent methods
  Hexagon: adapt for dma_map_ops changes
  Unicore32: adapt for dma_map_ops changes
  Microblaze: adapt for dma_map_ops changes
  SH: adapt for dma_map_ops changes
  Alpha: adapt for dma_map_ops changes
  SPARC: adapt for dma_map_ops changes
  PowerPC: adapt for dma_map_ops changes
  MIPS: adapt for dma_map_ops changes
  X86 & IA64: adapt for dma_map_ops changes
  common: dma-mapping: introduce generic alloc() and free() methods
2012-04-04 17:13:43 -07:00
Paul Mundt
f03c4866d3 sh: fix up fallout from system.h disintegration.
Quite a bit of fallout all over the place, nothing terribly exciting.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-03-30 19:29:57 +09:00
David Howells
e839ca5287 Disintegrate asm/system.h for SH
Disintegrate asm/system.h for SH.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-sh@vger.kernel.org
2012-03-28 18:30:03 +01:00
Andrzej Pietrasiewicz
552c0d3ea6 SH: adapt for dma_map_ops changes
Adapt core SH architecture code for dma_map_ops changes: replace
alloc/free_coherent with generic alloc/free methods.

Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Paul Mundt <lethal@linux-sh.org>
2012-03-28 16:36:37 +02:00
Cong Wang
bc3e11be88 sh: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:15 +08:00
Phil Edworthy
1ae911cba4 sh: Fix sh2a build error for CONFIG_CACHE_WRITETHROUGH
Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-02-24 13:21:46 +09:00
Phil Edworthy
c1537b4863 sh: sh2a: Improve cache flush/invalidate functions
The cache functions lock out interrupts for long periods; this patch
reduces the impact when operating on large address ranges. In such
cases it will:
- Invalidate the entire cache rather than individual addresses.
- Do nothing when flushing the operand cache in write-through mode.
- When flushing the operand cache in write-back mdoe, index the
  search for matching addresses on the cache entires instead of the
  addresses to flush

Note: sh2a__flush_purge_region was only invalidating the operand
cache, this adds flush.

Signed-off-by: Phil Edworthy <phil.edworthy@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2012-01-12 13:11:02 +09:00
Tejun Heo
0ee332c145 memblock: Kill early_node_map[]
Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEBLOCK_NODE_MAP -
there's no user of early_node_map[] left.  Kill early_node_map[] and
replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP.  Also,
relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h
as page_alloc.c would no longer host an alternative implementation.

This change is ultimately one to one mapping and shouldn't cause any
observable difference; however, after the recent changes, there are
some functions which now would fit memblock.c better than page_alloc.c
and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK
doesn't make much sense on some of them.  Further cleanups for
functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice.

-v2: Fix compile bug introduced by mis-spelling
 CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in
 mmzone.h.  Reported by Stephen Rothwell.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08 10:22:09 -08:00
Tejun Heo
1aadc0560f memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users
The only function of memblock_analyze() is now allowing resize of
memblock region arrays.  Rename it to memblock_allow_resize() and
update its users.

* The following users remain the same other than renaming.

  arm/mm/init.c::arm_memblock_init()
  microblaze/kernel/prom.c::early_init_devtree()
  powerpc/kernel/prom.c::early_init_devtree()
  openrisc/kernel/prom.c::early_init_devtree()
  sh/mm/init.c::paging_init()
  sparc/mm/init_64.c::paging_init()
  unicore32/mm/init.c::uc32_memblock_init()

* In the following users, analyze was used to update total size which
  is no longer necessary.

  powerpc/kernel/machine_kexec.c::reserve_crashkernel()
  powerpc/kernel/prom.c::early_init_devtree()
  powerpc/mm/init_32.c::MMU_init()
  powerpc/mm/tlb_nohash.c::__early_init_mmu()  
  powerpc/platforms/ps3/mm.c::ps3_mm_add_memory()
  powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups()
  sh/kernel/machine_kexec.c::reserve_crashkernel()

* x86/kernel/e820.c::memblock_x86_fill() was directly setting
  memblock_can_resize before populating memblock and calling analyze
  afterwards.  Call memblock_allow_resize() before start populating.

memblock_can_resize is now static inside memblock.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08 10:22:08 -08:00
Tejun Heo
fe091c208a memblock: Kill memblock_init()
memblock_init() initializes arrays for regions and memblock itself;
however, all these can be done with struct initializers and
memblock_init() can be removed.  This patch kills memblock_init() and
initializes memblock with struct initializer.

The only difference is that the first dummy entries don't have .nid
set to MAX_NUMNODES initially.  This doesn't cause any behavior
difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08 10:22:07 -08:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Paul Gortmaker
f7be345515 sh: Add export.h to arch/sh specific files as required.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:05 -04:00
Simon Horman
e66ac3f26a sh: kexec: Add PHYSICAL_START
Add PHYSICAL_START kernel configuration parameter to set the address at
which the kernel should be loaded.

It has been observed on an sh7757lcr that simply modifying MEMORY_START
does not achieve this goal for 32bit sh. This is due to MEMORY_OFFSET in
arch/sh/kernel/vmlinux.lds.S bot being based on MEMORY_START on such
systems.

Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2011-10-28 15:03:43 +09:00
Linus Torvalds
4d4abdcb1d Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits)
  perf: Remove the nmi parameter from the oprofile_perf backend
  x86, perf: Make copy_from_user_nmi() a library function
  perf: Remove perf_event_attr::type check
  x86, perf: P4 PMU - Fix typos in comments and style cleanup
  perf tools: Make test use the preset debugfs path
  perf tools: Add automated tests for events parsing
  perf tools: De-opt the parse_events function
  perf script: Fix display of IP address for non-callchain path
  perf tools: Fix endian conversion reading event attr from file header
  perf tools: Add missing 'node' alias to the hw_cache[] array
  perf probe: Support adding probes on offline kernel modules
  perf probe: Add probed module in front of function
  perf probe: Introduce debuginfo to encapsulate dwarf information
  perf-probe: Move dwarf library routines to dwarf-aux.{c, h}
  perf probe: Remove redundant dwarf functions
  perf probe: Move strtailcmp to string.c
  perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END
  tracing/kprobe: Update symbol reference when loading module
  tracing/kprobes: Support module init function probing
  kprobes: Return -ENOENT if probe point doesn't exist
  ...
2011-07-22 16:44:39 -07:00
Peter Zijlstra
a8b0ca17b8 perf: Remove the nmi parameter from the swevent and overflow interface
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.

For the various event classes:

  - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
  - tracepoint: nmi=0; since tracepoint could be from NMI context.
  - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01 11:06:35 +02:00
Paul Mundt
9ab3a15d95 sh: use printk_ratelimited instead of printk_ratelimit
Follows the powerpc change, for much the same rationale.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2011-06-30 15:10:06 +09:00