Merge branch 'common/mmcif' into rmobile-latest

This commit is contained in:
Paul Mundt 2011-01-14 16:06:31 +09:00
commit c488a4731a
181 changed files with 9949 additions and 2141 deletions

View File

@ -385,6 +385,10 @@ mapped_file - # of bytes of mapped file (includes tmpfs/shmem)
pgpgin - # of pages paged in (equivalent to # of charging events).
pgpgout - # of pages paged out (equivalent to # of uncharging events).
swap - # of bytes of swap usage
dirty - # of bytes that are waiting to get written back to the disk.
writeback - # of bytes that are actively being written back to the disk.
nfs_unstable - # of bytes sent to the NFS server, but not yet committed to
the actual storage.
inactive_anon - # of bytes of anonymous memory and swap cache memory on
LRU list.
active_anon - # of bytes of anonymous and swap cache memory on active
@ -406,6 +410,9 @@ total_mapped_file - sum of all children's "cache"
total_pgpgin - sum of all children's "pgpgin"
total_pgpgout - sum of all children's "pgpgout"
total_swap - sum of all children's "swap"
total_dirty - sum of all children's "dirty"
total_writeback - sum of all children's "writeback"
total_nfs_unstable - sum of all children's "nfs_unstable"
total_inactive_anon - sum of all children's "inactive_anon"
total_active_anon - sum of all children's "active_anon"
total_inactive_file - sum of all children's "inactive_file"
@ -453,6 +460,73 @@ memory under it will be reclaimed.
You can reset failcnt by writing 0 to failcnt file.
# echo 0 > .../memory.failcnt
5.5 dirty memory
Control the maximum amount of dirty pages a cgroup can have at any given time.
Limiting dirty memory is like fixing the max amount of dirty (hard to reclaim)
page cache used by a cgroup. So, in case of multiple cgroup writers, they will
not be able to consume more than their designated share of dirty pages and will
be forced to perform write-out if they cross that limit.
The interface is equivalent to the procfs interface: /proc/sys/vm/dirty_*. It
is possible to configure a limit to trigger both a direct writeback or a
background writeback performed by per-bdi flusher threads. The root cgroup
memory.dirty_* control files are read-only and match the contents of
the /proc/sys/vm/dirty_* files.
Per-cgroup dirty limits can be set using the following files in the cgroupfs:
- memory.dirty_ratio: the amount of dirty memory (expressed as a percentage of
cgroup memory) at which a process generating dirty pages will itself start
writing out dirty data.
- memory.dirty_limit_in_bytes: the amount of dirty memory (expressed in bytes)
in the cgroup at which a process generating dirty pages will start itself
writing out dirty data. Suffix (k, K, m, M, g, or G) can be used to indicate
that value is kilo, mega or gigabytes.
Note: memory.dirty_limit_in_bytes is the counterpart of memory.dirty_ratio.
Only one of them may be specified at a time. When one is written it is
immediately taken into account to evaluate the dirty memory limits and the
other appears as 0 when read.
- memory.dirty_background_ratio: the amount of dirty memory of the cgroup
(expressed as a percentage of cgroup memory) at which background writeback
kernel threads will start writing out dirty data.
- memory.dirty_background_limit_in_bytes: the amount of dirty memory (expressed
in bytes) in the cgroup at which background writeback kernel threads will
start writing out dirty data. Suffix (k, K, m, M, g, or G) can be used to
indicate that value is kilo, mega or gigabytes.
Note: memory.dirty_background_limit_in_bytes is the counterpart of
memory.dirty_background_ratio. Only one of them may be specified at a time.
When one is written it is immediately taken into account to evaluate the dirty
memory limits and the other appears as 0 when read.
A cgroup may contain more dirty memory than its dirty limit. This is possible
because of the principle that the first cgroup to touch a page is charged for
it. Subsequent page counting events (dirty, writeback, nfs_unstable) are also
counted to the originally charged cgroup.
Example: If page is allocated by a cgroup A task, then the page is charged to
cgroup A. If the page is later dirtied by a task in cgroup B, then the cgroup A
dirty count will be incremented. If cgroup A is over its dirty limit but cgroup
B is not, then dirtying a cgroup A page from a cgroup B task may push cgroup A
over its dirty limit without throttling the dirtying cgroup B task.
When use_hierarchy=0, each cgroup has dirty memory usage and limits.
System-wide dirty limits are also consulted. Dirty memory consumption is
checked against both system-wide and per-cgroup dirty limits.
The current implementation does not enforce per-cgroup dirty limits when
use_hierarchy=1. System-wide dirty limits are used for processes in such
cgroups. Attempts to read memory.dirty_* files return the system-wide
values. Writes to the memory.dirty_* files return error. An enhanced
implementation is needed to check the chain of parents to ensure that no
dirty limit is exceeded.
6. Hierarchy support
The memory controller supports a deep hierarchy and hierarchical accounting.

View File

@ -8,7 +8,7 @@ Parameters: <cipher> <key> <iv_offset> <device path> <offset>
<cipher>
Encryption cipher and an optional IV generation mode.
(In format cipher-chainmode-ivopts:ivmode).
(In format cipher[:keycount]-chainmode-ivopts:ivmode).
Examples:
des
aes-cbc-essiv:sha256
@ -20,6 +20,11 @@ Parameters: <cipher> <key> <iv_offset> <device path> <offset>
Key used for encryption. It is encoded as a hexadecimal number.
You can only use key sizes that are valid for the selected cipher.
<keycount>
Multi-key compatibility mode. You can define <keycount> keys and
then sectors are encrypted according to their offsets (sector 0 uses key0;
sector 1 uses key1 etc.). <keycount> must be a power of two.
<iv_offset>
The IV offset is a sector count that is added to the sector number
before creating the IV.

View File

@ -0,0 +1,70 @@
Device-mapper RAID (dm-raid) is a bridge from DM to MD. It
provides a way to use device-mapper interfaces to access the MD RAID
drivers.
As with all device-mapper targets, the nominal public interfaces are the
constructor (CTR) tables and the status outputs (both STATUSTYPE_INFO
and STATUSTYPE_TABLE). The CTR table looks like the following:
1: <s> <l> raid \
2: <raid_type> <#raid_params> <raid_params> \
3: <#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>
Line 1 contains the standard first three arguments to any device-mapper
target - the start, length, and target type fields. The target type in
this case is "raid".
Line 2 contains the arguments that define the particular raid
type/personality/level, the required arguments for that raid type, and
any optional arguments. Possible raid types include: raid4, raid5_la,
raid5_ls, raid5_rs, raid6_zr, raid6_nr, and raid6_nc. (raid1 is
planned for the future.) The list of required and optional parameters
is the same for all the current raid types. The required parameters are
positional, while the optional parameters are given as key/value pairs.
The possible parameters are as follows:
<chunk_size> Chunk size in sectors.
[[no]sync] Force/Prevent RAID initialization
[rebuild <idx>] Rebuild the drive indicated by the index
[daemon_sleep <ms>] Time between bitmap daemon work to clear bits
[min_recovery_rate <kB/sec/disk>] Throttle RAID initialization
[max_recovery_rate <kB/sec/disk>] Throttle RAID initialization
[max_write_behind <sectors>] See '-write-behind=' (man mdadm)
[stripe_cache <sectors>] Stripe cache size for higher RAIDs
Line 3 contains the list of devices that compose the array in
metadata/data device pairs. If the metadata is stored separately, a '-'
is given for the metadata device position. If a drive has failed or is
missing at creation time, a '-' can be given for both the metadata and
data drives for a given position.
NB. Currently all metadata devices must be specified as '-'.
Examples:
# RAID4 - 4 data drives, 1 parity
# No metadata devices specified to hold superblock/bitmap info
# Chunk size of 1MiB
# (Lines separated for easy reading)
0 1960893648 raid \
raid4 1 2048 \
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
# RAID4 - 4 data drives, 1 parity (no metadata devices)
# Chunk size of 1MiB, force RAID initialization,
# min recovery rate at 20 kiB/sec/disk
0 1960893648 raid \
raid4 4 2048 min_recovery_rate 20 sync\
5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
Performing a 'dmsetup table' should display the CTR table used to
construct the mapping (with possible reordering of optional
parameters).
Performing a 'dmsetup status' will yield information on the state and
health of the array. The output is as follows:
1: <s> <l> raid \
2: <raid_type> <#devices> <1 health char for each dev> <resync_ratio>
Line 1 is standard DM output. Line 2 is best shown by example:
0 1960893648 raid raid4 5 AAAAA 2/490221568
Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with recovery.

View File

@ -375,6 +375,7 @@ Anonymous: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 374 kB
The first of these lines shows the same information as is displayed for the
mapping in /proc/PID/maps. The remaining lines show the size of the mapping
@ -670,6 +671,8 @@ varies by architecture and compile options. The following is from a
> cat /proc/meminfo
The "Locked" indicates whether the mapping is locked in memory or not.
MemTotal: 16344972 kB
MemFree: 13634064 kB
@ -1320,6 +1323,10 @@ scaled linearly with /proc/<pid>/oom_score_adj.
Writing to /proc/<pid>/oom_score_adj or /proc/<pid>/oom_adj will change the
other with its scaled value.
The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
requires CAP_SYS_RESOURCE.
NOTICE: /proc/<pid>/oom_adj is deprecated and will be removed, please see
Documentation/feature-removal-schedule.txt.

View File

@ -135,7 +135,7 @@ setting up a platform_device using the GPIO, is mark its direction:
int gpio_direction_input(unsigned gpio);
int gpio_direction_output(unsigned gpio, int value);
The return value is zero for success, else a negative errno. It must
The return value is zero for success, else a negative errno. It should
be checked, since the get/set calls don't have error returns and since
misconfiguration is possible. You should normally issue these calls from
a task context. However, for spinlock-safe GPIOs it's OK to use them

View File

@ -0,0 +1,298 @@
= Transparent Hugepage Support =
== Objective ==
Performance critical computing applications dealing with large memory
working sets are already running on top of libhugetlbfs and in turn
hugetlbfs. Transparent Hugepage Support is an alternative means of
using huge pages for the backing of virtual memory with huge pages
that supports the automatic promotion and demotion of page sizes and
without the shortcomings of hugetlbfs.
Currently it only works for anonymous memory mappings but in the
future it can expand over the pagecache layer starting with tmpfs.
The reason applications are running faster is because of two
factors. The first factor is almost completely irrelevant and it's not
of significant interest because it'll also have the downside of
requiring larger clear-page copy-page in page faults which is a
potentially negative effect. The first factor consists in taking a
single page fault for each 2M virtual region touched by userland (so
reducing the enter/exit kernel frequency by a 512 times factor). This
only matters the first time the memory is accessed for the lifetime of
a memory mapping. The second long lasting and much more important
factor will affect all subsequent accesses to the memory for the whole
runtime of the application. The second factor consist of two
components: 1) the TLB miss will run faster (especially with
virtualization using nested pagetables but almost always also on bare
metal without virtualization) and 2) a single TLB entry will be
mapping a much larger amount of virtual memory in turn reducing the
number of TLB misses. With virtualization and nested pagetables the
TLB can be mapped of larger size only if both KVM and the Linux guest
are using hugepages but a significant speedup already happens if only
one of the two is using hugepages just because of the fact the TLB
miss is going to run faster.
== Design ==
- "graceful fallback": mm components which don't have transparent
hugepage knowledge fall back to breaking a transparent hugepage and
working on the regular pages and their respective regular pmd/pte
mappings
- if a hugepage allocation fails because of memory fragmentation,
regular pages should be gracefully allocated instead and mixed in
the same vma without any failure or significant delay and without
userland noticing
- if some task quits and more hugepages become available (either
immediately in the buddy or through the VM), guest physical memory
backed by regular pages should be relocated on hugepages
automatically (with khugepaged)
- it doesn't require memory reservation and in turn it uses hugepages
whenever possible (the only possible reservation here is kernelcore=
to avoid unmovable pages to fragment all the memory but such a tweak
is not specific to transparent hugepage support and it's a generic
feature that applies to all dynamic high order allocations in the
kernel)
- this initial support only offers the feature in the anonymous memory
regions but it'd be ideal to move it to tmpfs and the pagecache
later
Transparent Hugepage Support maximizes the usefulness of free memory
if compared to the reservation approach of hugetlbfs by allowing all
unused memory to be used as cache or other movable (or even unmovable
entities). It doesn't require reservation to prevent hugepage
allocation failures to be noticeable from userland. It allows paging
and all other advanced VM features to be available on the
hugepages. It requires no modifications for applications to take
advantage of it.
Applications however can be further optimized to take advantage of
this feature, like for example they've been optimized before to avoid
a flood of mmap system calls for every malloc(4k). Optimizing userland
is by far not mandatory and khugepaged already can take care of long
lived page allocations even for hugepage unaware applications that
deals with large amounts of memory.
In certain cases when hugepages are enabled system wide, application
may end up allocating more memory resources. An application may mmap a
large region but only touch 1 byte of it, in that case a 2M page might
be allocated instead of a 4k page for no good. This is why it's
possible to disable hugepages system-wide and to only have them inside
MADV_HUGEPAGE madvise regions.
Embedded systems should enable hugepages only inside madvise regions
to eliminate any risk of wasting any precious byte of memory and to
only run faster.
Applications that gets a lot of benefit from hugepages and that don't
risk to lose memory by using hugepages, should use
madvise(MADV_HUGEPAGE) on their critical mmapped regions.
== sysfs ==
Transparent Hugepage Support can be entirely disabled (mostly for
debugging purposes) or only enabled inside MADV_HUGEPAGE regions (to
avoid the risk of consuming more memory resources) or enabled system
wide. This can be achieved with one of:
echo always >/sys/kernel/mm/transparent_hugepage/enabled
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/enabled
It's also possible to limit defrag efforts in the VM to generate
hugepages in case they're not immediately free to madvise regions or
to never try to defrag memory and simply fallback to regular pages
unless hugepages are immediately available. Clearly if we spend CPU
time to defrag memory, we would expect to gain even more by the fact
we use hugepages later instead of regular pages. This isn't always
guaranteed, but it may be more likely in case the allocation is for a
MADV_HUGEPAGE region.
echo always >/sys/kernel/mm/transparent_hugepage/defrag
echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
echo never >/sys/kernel/mm/transparent_hugepage/defrag
khugepaged will be automatically started when
transparent_hugepage/enabled is set to "always" or "madvise, and it'll
be automatically shutdown if it's set to "never".
khugepaged runs usually at low frequency so while one may not want to
invoke defrag algorithms synchronously during the page faults, it
should be worth invoking defrag at least in khugepaged. However it's
also possible to disable defrag in khugepaged:
echo yes >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo no >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
You can also control how many pages khugepaged should scan at each
pass:
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
and how many milliseconds to wait in khugepaged between each pass (you
can set this to 0 to run khugepaged at 100% utilization of one core):
/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
and how many milliseconds to wait in khugepaged if there's an hugepage
allocation failure to throttle the next allocation attempt.
/sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs
The khugepaged progress can be seen in the number of pages collapsed:
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed
for each pass:
/sys/kernel/mm/transparent_hugepage/khugepaged/full_scans
== Boot parameter ==
You can change the sysfs boot time defaults of Transparent Hugepage
Support by passing the parameter "transparent_hugepage=always" or
"transparent_hugepage=madvise" or "transparent_hugepage=never"
(without "") to the kernel command line.
== Need of application restart ==
The transparent_hugepage/enabled values only affect future
behavior. So to make them effective you need to restart any
application that could have been using hugepages. This also applies to
the regions registered in khugepaged.
== get_user_pages and follow_page ==
get_user_pages and follow_page if run on a hugepage, will return the
head or tail pages as usual (exactly as they would do on
hugetlbfs). Most gup users will only care about the actual physical
address of the page and its temporary pinning to release after the I/O
is complete, so they won't ever notice the fact the page is huge. But
if any driver is going to mangle over the page structure of the tail
page (like for checking page->mapping or other bits that are relevant
for the head page and not the tail page), it should be updated to jump
to check head page instead (while serializing properly against
split_huge_page() to avoid the head and tail pages to disappear from
under it, see the futex code to see an example of that, hugetlbfs also
needed special handling in futex code for similar reasons).
NOTE: these aren't new constraints to the GUP API, and they match the
same constrains that applies to hugetlbfs too, so any driver capable
of handling GUP on hugetlbfs will also work fine on transparent
hugepage backed mappings.
In case you can't handle compound pages if they're returned by
follow_page, the FOLL_SPLIT bit can be specified as parameter to
follow_page, so that it will split the hugepages before returning
them. Migration for example passes FOLL_SPLIT as parameter to
follow_page because it's not hugepage aware and in fact it can't work
at all on hugetlbfs (but it instead works fine on transparent
hugepages thanks to FOLL_SPLIT). migration simply can't deal with
hugepages being returned (as it's not only checking the pfn of the
page and pinning it during the copy but it pretends to migrate the
memory in regular page sizes and with regular pte/pmd mappings).
== Optimizing the applications ==
To be guaranteed that the kernel will map a 2M page immediately in any
memory region, the mmap region has to be hugepage naturally
aligned. posix_memalign() can provide that guarantee.
== Hugetlbfs ==
You can use hugetlbfs on a kernel that has transparent hugepage
support enabled just fine as always. No difference can be noted in
hugetlbfs other than there will be less overall fragmentation. All
usual features belonging to hugetlbfs are preserved and
unaffected. libhugetlbfs will also work fine as usual.
== Graceful fallback ==
Code walking pagetables but unware about huge pmds can simply call
split_huge_page_pmd(mm, pmd) where the pmd is the one returned by
pmd_offset. It's trivial to make the code transparent hugepage aware
by just grepping for "pmd_offset" and adding split_huge_page_pmd where
missing after pmd_offset returns the pmd. Thanks to the graceful
fallback design, with a one liner change, you can avoid to write
hundred if not thousand of lines of complex code to make your code
hugepage aware.
If you're not walking pagetables but you run into a physical hugepage
but you can't handle it natively in your code, you can split it by
calling split_huge_page(page). This is what the Linux VM does before
it tries to swapout the hugepage for example.
Example to make mremap.c transparent hugepage aware with a one liner
change:
diff --git a/mm/mremap.c b/mm/mremap.c
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -41,6 +41,7 @@ static pmd_t *get_old_pmd(struct mm_stru
return NULL;
pmd = pmd_offset(pud, addr);
+ split_huge_page_pmd(mm, pmd);
if (pmd_none_or_clear_bad(pmd))
return NULL;
== Locking in hugepage aware code ==
We want as much code as possible hugepage aware, as calling
split_huge_page() or split_huge_page_pmd() has a cost.
To make pagetable walks huge pmd aware, all you need to do is to call
pmd_trans_huge() on the pmd returned by pmd_offset. You must hold the
mmap_sem in read (or write) mode to be sure an huge pmd cannot be
created from under you by khugepaged (khugepaged collapse_huge_page
takes the mmap_sem in write mode in addition to the anon_vma lock). If
pmd_trans_huge returns false, you just fallback in the old code
paths. If instead pmd_trans_huge returns true, you have to take the
mm->page_table_lock and re-run pmd_trans_huge. Taking the
page_table_lock will prevent the huge pmd to be converted into a
regular pmd from under you (split_huge_page can run in parallel to the
pagetable walk). If the second pmd_trans_huge returns false, you
should just drop the page_table_lock and fallback to the old code as
before. Otherwise you should run pmd_trans_splitting on the pmd. In
case pmd_trans_splitting returns true, it means split_huge_page is
already in the middle of splitting the page. So if pmd_trans_splitting
returns true it's enough to drop the page_table_lock and call
wait_split_huge_page and then fallback the old code paths. You are
guaranteed by the time wait_split_huge_page returns, the pmd isn't
huge anymore. If pmd_trans_splitting returns false, you can proceed to
process the huge pmd and the hugepage natively. Once finished you can
drop the page_table_lock.
== compound_lock, get_user_pages and put_page ==
split_huge_page internally has to distribute the refcounts in the head
page to the tail pages before clearing all PG_head/tail bits from the
page structures. It can do that easily for refcounts taken by huge pmd
mappings. But the GUI API as created by hugetlbfs (that returns head
and tail pages if running get_user_pages on an address backed by any
hugepage), requires the refcount to be accounted on the tail pages and
not only in the head pages, if we want to be able to run
split_huge_page while there are gup pins established on any tail
page. Failure to be able to run split_huge_page if there's any gup pin
on any tail page, would mean having to split all hugepages upfront in
get_user_pages which is unacceptable as too many gup users are
performance critical and they must work natively on hugepages like
they work natively on hugetlbfs already (hugetlbfs is simpler because
hugetlbfs pages cannot be splitted so there wouldn't be requirement of
accounting the pins on the tail pages for hugetlbfs). If we wouldn't
account the gup refcounts on the tail pages during gup, we won't know
anymore which tail page is pinned by gup and which is not while we run
split_huge_page. But we still have to add the gup pin to the head page
too, to know when we can free the compound page in case it's never
splitted during its lifetime. That requires changing not just
get_page, but put_page as well so that when put_page runs on a tail
page (and only on a tail page) it will find its respective head page,
and then it will decrease the head page refcount in addition to the
tail page refcount. To obtain a head page reliably and to decrease its
refcount without race conditions, put_page has to serialize against
__split_huge_page_refcount using a special per-page lock called
compound_lock.

View File

@ -6592,13 +6592,12 @@ F: Documentation/i2c/busses/i2c-viapro
F: drivers/i2c/busses/i2c-viapro.c
VIA SD/MMC CARD CONTROLLER DRIVER
M: Joseph Chan <JosephChan@via.com.tw>
M: Bruce Chang <brucechang@via.com.tw>
M: Harald Welte <HaraldWelte@viatech.com>
S: Maintained
F: drivers/mmc/host/via-sdmmc.c
VIA UNICHROME(PRO)/CHROME9 FRAMEBUFFER DRIVER
M: Joseph Chan <JosephChan@via.com.tw>
M: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
L: linux-fbdev@vger.kernel.org
S: Maintained

View File

@ -53,6 +53,9 @@
#define MADV_MERGEABLE 12 /* KSM may merge identical pages */
#define MADV_UNMERGEABLE 13 /* KSM may not merge identical pages */
#define MADV_HUGEPAGE 14 /* Worth backing with hugepages */
#define MADV_NOHUGEPAGE 15 /* Not worth backing with hugepages */
/* compatibility flags */
#define MAP_FILE 0

View File

@ -38,17 +38,9 @@
#ifdef CONFIG_MMU
void *module_alloc(unsigned long size)
{
struct vm_struct *area;
size = PAGE_ALIGN(size);
if (!size)
return NULL;
area = __get_vm_area(size, VM_ALLOC, MODULES_VADDR, MODULES_END);
if (!area)
return NULL;
return __vmalloc_area(area, GFP_KERNEL, PAGE_KERNEL_EXEC);
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL_EXEC, -1,
__builtin_return_address(0));
}
#else /* CONFIG_MMU */
void *module_alloc(unsigned long size)

View File

@ -50,7 +50,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
if (!new_pmd)
goto no_pmd;
new_pte = pte_alloc_map(mm, new_pmd, 0);
new_pte = pte_alloc_map(mm, NULL, new_pmd, 0);
if (!new_pte)
goto no_pte;

View File

@ -188,7 +188,7 @@ static void __init set_hw_addr(struct platform_device *pdev)
*/
regs = (void __iomem __force *)res->start;
pclk = clk_get(&pdev->dev, "pclk");
if (!pclk)
if (IS_ERR(pclk))
return;
clk_enable(pclk);

View File

@ -203,7 +203,7 @@ static void __init set_hw_addr(struct platform_device *pdev)
*/
regs = (void __iomem __force *)res->start;
pclk = clk_get(&pdev->dev, "pclk");
if (!pclk)
if (IS_ERR(pclk))
return;
clk_enable(pclk);

View File

@ -206,7 +206,7 @@ static void __init set_hw_addr(struct platform_device *pdev)
*/
regs = (void __iomem __force *)res->start;
pclk = clk_get(&pdev->dev, "pclk");
if (!pclk)
if (IS_ERR(pclk))
return;
clk_enable(pclk);

View File

@ -150,7 +150,7 @@ static void __init set_hw_addr(struct platform_device *pdev)
regs = (void __iomem __force *)res->start;
pclk = clk_get(&pdev->dev, "pclk");
if (!pclk)
if (IS_ERR(pclk))
return;
clk_enable(pclk);

View File

@ -134,7 +134,7 @@ static void __init set_hw_addr(struct platform_device *pdev)
regs = (void __iomem __force *)res->start;
pclk = clk_get(&pdev->dev, "pclk");
if (!pclk)
if (IS_ERR(pclk))
return;
clk_enable(pclk);

View File

@ -162,7 +162,7 @@ static void __init set_hw_addr(struct platform_device *pdev)
*/
regs = (void __iomem __force *)res->start;
pclk = clk_get(&pdev->dev, "pclk");
if (!pclk)
if (IS_ERR(pclk))
return;
clk_enable(pclk);

View File

@ -2,20 +2,17 @@ CONFIG_EXPERIMENTAL=y
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_BASE_FULL is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_NO_HZ=y
@ -29,6 +26,7 @@ CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_AT32AP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
@ -72,8 +70,8 @@ CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_MISC_DEVICES=y
CONFIG_ATMEL_TCLIB=y
CONFIG_EEPROM_AT24=m
CONFIG_NETDEVICES=y
CONFIG_TUN=m
CONFIG_NET_ETHERNET=y
@ -106,6 +104,7 @@ CONFIG_GPIO_SYSFS=y
CONFIG_WATCHDOG=y
CONFIG_AT32AP700X_WDT=y
CONFIG_USB_GADGET=y
CONFIG_USB_GADGET_VBUS_DRAW=350
CONFIG_USB_ZERO=m
CONFIG_USB_ETH=m
CONFIG_USB_GADGETFS=m
@ -115,14 +114,12 @@ CONFIG_USB_CDC_COMPOSITE=m
CONFIG_MMC=y
CONFIG_MMC_TEST=m
CONFIG_MMC_ATMELMCI=y
CONFIG_MMC_SPI=m
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_AT32AP700X=y
CONFIG_DMADEVICES=y
@ -130,21 +127,23 @@ CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_XATTR is not set
# CONFIG_DNOTIFY is not set
CONFIG_FUSE_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=m
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
CONFIG_UFS_FS=y
CONFIG_UBIFS_FS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_ROOT_NFS=y
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_SMB_FS=m
CONFIG_CIFS=m
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
@ -155,5 +154,3 @@ CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_FRAME_POINTER=y
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
CONFIG_CRYPTO_PCBC=m

View File

@ -2,20 +2,17 @@ CONFIG_EXPERIMENTAL=y
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_BASE_FULL is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_NO_HZ=y
@ -31,6 +28,7 @@ CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_AT32AP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
@ -74,8 +72,10 @@ CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_MISC_DEVICES=y
CONFIG_ATMEL_TCLIB=y
CONFIG_NETDEVICES=y
CONFIG_TUN=m
CONFIG_NET_ETHERNET=y
CONFIG_MACB=y
# CONFIG_NETDEV_1000 is not set
@ -104,6 +104,7 @@ CONFIG_I2C_GPIO=m
CONFIG_SPI=y
CONFIG_SPI_ATMEL=y
CONFIG_SPI_SPIDEV=m
CONFIG_GPIO_SYSFS=y
# CONFIG_HWMON is not set
CONFIG_WATCHDOG=y
CONFIG_AT32AP700X_WDT=y
@ -127,6 +128,7 @@ CONFIG_USB_FILE_STORAGE=m
CONFIG_USB_G_SERIAL=m
CONFIG_USB_CDC_COMPOSITE=m
CONFIG_MMC=y
CONFIG_MMC_TEST=m
CONFIG_MMC_ATMELMCI=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
@ -141,11 +143,14 @@ CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_XATTR is not set
# CONFIG_DNOTIFY is not set
CONFIG_FUSE_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
@ -155,7 +160,6 @@ CONFIG_NFS_V3=y
CONFIG_ROOT_NFS=y
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_SMB_FS=m
CONFIG_CIFS=m
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
@ -166,4 +170,3 @@ CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_FRAME_POINTER=y
# CONFIG_RCU_CPU_STALL_DETECTOR is not set

View File

@ -2,20 +2,17 @@ CONFIG_EXPERIMENTAL=y
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_BASE_FULL is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_NO_HZ=y
@ -30,6 +27,7 @@ CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_AT32AP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
@ -73,8 +71,10 @@ CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_MISC_DEVICES=y
CONFIG_ATMEL_TCLIB=y
CONFIG_NETDEVICES=y
CONFIG_TUN=m
CONFIG_NET_ETHERNET=y
CONFIG_MACB=y
# CONFIG_NETDEV_1000 is not set
@ -103,6 +103,7 @@ CONFIG_I2C_GPIO=m
CONFIG_SPI=y
CONFIG_SPI_ATMEL=y
CONFIG_SPI_SPIDEV=m
CONFIG_GPIO_SYSFS=y
# CONFIG_HWMON is not set
CONFIG_WATCHDOG=y
CONFIG_AT32AP700X_WDT=y
@ -126,6 +127,7 @@ CONFIG_USB_FILE_STORAGE=m
CONFIG_USB_G_SERIAL=m
CONFIG_USB_CDC_COMPOSITE=m
CONFIG_MMC=y
CONFIG_MMC_TEST=m
CONFIG_MMC_ATMELMCI=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
@ -140,11 +142,14 @@ CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_XATTR is not set
# CONFIG_DNOTIFY is not set
CONFIG_FUSE_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
@ -154,7 +159,6 @@ CONFIG_NFS_V3=y
CONFIG_ROOT_NFS=y
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_SMB_FS=m
CONFIG_CIFS=m
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
@ -165,4 +169,3 @@ CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_FRAME_POINTER=y
# CONFIG_RCU_CPU_STALL_DETECTOR is not set

View File

@ -2,20 +2,17 @@ CONFIG_EXPERIMENTAL=y
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_BASE_FULL is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_NO_HZ=y
@ -29,6 +26,7 @@ CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_AT32AP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
@ -74,6 +72,7 @@ CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_MISC_DEVICES=y
CONFIG_ATMEL_TCLIB=y
CONFIG_NETDEVICES=y
CONFIG_TUN=m
@ -107,6 +106,7 @@ CONFIG_GPIO_SYSFS=y
CONFIG_WATCHDOG=y
CONFIG_AT32AP700X_WDT=y
CONFIG_USB_GADGET=y
CONFIG_USB_GADGET_VBUS_DRAW=350
CONFIG_USB_ZERO=m
CONFIG_USB_ETH=m
CONFIG_USB_GADGETFS=m
@ -116,14 +116,12 @@ CONFIG_USB_CDC_COMPOSITE=m
CONFIG_MMC=y
CONFIG_MMC_TEST=m
CONFIG_MMC_ATMELMCI=y
CONFIG_MMC_SPI=m
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_AT32AP700X=y
CONFIG_DMADEVICES=y
@ -131,21 +129,23 @@ CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_XATTR is not set
# CONFIG_DNOTIFY is not set
CONFIG_FUSE_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=m
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
CONFIG_UFS_FS=y
CONFIG_UBIFS_FS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_ROOT_NFS=y
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_SMB_FS=m
CONFIG_CIFS=m
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
@ -156,5 +156,3 @@ CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_FRAME_POINTER=y
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
CONFIG_CRYPTO_PCBC=m

View File

@ -2,20 +2,17 @@ CONFIG_EXPERIMENTAL=y
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_BASE_FULL is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_NO_HZ=y
@ -32,6 +29,7 @@ CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_AT32AP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
@ -77,8 +75,10 @@ CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_MISC_DEVICES=y
CONFIG_ATMEL_TCLIB=y
CONFIG_NETDEVICES=y
CONFIG_TUN=m
CONFIG_NET_ETHERNET=y
CONFIG_MACB=y
# CONFIG_NETDEV_1000 is not set
@ -107,6 +107,7 @@ CONFIG_I2C_GPIO=m
CONFIG_SPI=y
CONFIG_SPI_ATMEL=y
CONFIG_SPI_SPIDEV=m
CONFIG_GPIO_SYSFS=y
# CONFIG_HWMON is not set
CONFIG_WATCHDOG=y
CONFIG_AT32AP700X_WDT=y
@ -130,6 +131,7 @@ CONFIG_USB_FILE_STORAGE=m
CONFIG_USB_G_SERIAL=m
CONFIG_USB_CDC_COMPOSITE=m
CONFIG_MMC=y
CONFIG_MMC_TEST=m
CONFIG_MMC_ATMELMCI=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
@ -144,11 +146,14 @@ CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_XATTR is not set
# CONFIG_DNOTIFY is not set
CONFIG_FUSE_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
@ -158,7 +163,6 @@ CONFIG_NFS_V3=y
CONFIG_ROOT_NFS=y
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_SMB_FS=m
CONFIG_CIFS=m
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
@ -169,4 +173,3 @@ CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_FRAME_POINTER=y
# CONFIG_RCU_CPU_STALL_DETECTOR is not set

View File

@ -2,20 +2,17 @@ CONFIG_EXPERIMENTAL=y
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_BASE_FULL is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_NO_HZ=y
@ -31,6 +28,7 @@ CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_AT32AP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
@ -76,8 +74,10 @@ CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_MISC_DEVICES=y
CONFIG_ATMEL_TCLIB=y
CONFIG_NETDEVICES=y
CONFIG_TUN=m
CONFIG_NET_ETHERNET=y
CONFIG_MACB=y
# CONFIG_NETDEV_1000 is not set
@ -106,6 +106,7 @@ CONFIG_I2C_GPIO=m
CONFIG_SPI=y
CONFIG_SPI_ATMEL=y
CONFIG_SPI_SPIDEV=m
CONFIG_GPIO_SYSFS=y
# CONFIG_HWMON is not set
CONFIG_WATCHDOG=y
CONFIG_AT32AP700X_WDT=y
@ -129,6 +130,7 @@ CONFIG_USB_FILE_STORAGE=m
CONFIG_USB_G_SERIAL=m
CONFIG_USB_CDC_COMPOSITE=m
CONFIG_MMC=y
CONFIG_MMC_TEST=m
CONFIG_MMC_ATMELMCI=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
@ -143,11 +145,14 @@ CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_XATTR is not set
# CONFIG_DNOTIFY is not set
CONFIG_FUSE_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
@ -157,7 +162,6 @@ CONFIG_NFS_V3=y
CONFIG_ROOT_NFS=y
CONFIG_NFSD=m
CONFIG_NFSD_V3=y
CONFIG_SMB_FS=m
CONFIG_CIFS=m
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
@ -168,4 +172,3 @@ CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_FRAME_POINTER=y
# CONFIG_RCU_CPU_STALL_DETECTOR is not set

View File

@ -3,7 +3,6 @@ CONFIG_EXPERIMENTAL=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_SYSCTL_SYSCALL is not set
@ -11,7 +10,7 @@ CONFIG_BLK_DEV_INITRD=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
@ -26,6 +25,7 @@ CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_AT32AP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
@ -35,6 +35,7 @@ CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IPGRE=m
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
@ -58,16 +59,14 @@ CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_DATAFLASH=m
CONFIG_MTD_M25P80=m
CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_MISC_DEVICES=y
CONFIG_ATMEL_PWM=m
CONFIG_ATMEL_TCLIB=y
CONFIG_ATMEL_SSC=m
CONFIG_EEPROM_AT24=m
# CONFIG_SCSI_PROC_FS is not set
CONFIG_BLK_DEV_SD=m
CONFIG_BLK_DEV_SR=m
@ -120,7 +119,6 @@ CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
# CONFIG_SND_SUPPORT_OLD_API is not set
# CONFIG_SND_VERBOSE_PROCFS is not set
# CONFIG_SND_DRIVERS is not set
CONFIG_SND_AT73C213=m
# CONFIG_HID_SUPPORT is not set
CONFIG_USB_GADGET=y
@ -131,16 +129,15 @@ CONFIG_USB_FILE_STORAGE=m
CONFIG_USB_G_SERIAL=m
CONFIG_USB_CDC_COMPOSITE=m
CONFIG_MMC=y
CONFIG_MMC_TEST=m
CONFIG_MMC_ATMELMCI=y
CONFIG_MMC_SPI=m
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=m
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_ATMEL_PWM=m
CONFIG_LEDS_GPIO=m
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_LEDS_TRIGGER_DEFAULT_ON=m
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_AT32AP700X=y
CONFIG_DMADEVICES=y
@ -149,20 +146,23 @@ CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_XATTR is not set
# CONFIG_DNOTIFY is not set
CONFIG_FUSE_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
# CONFIG_JFFS2_FS_WRITEBUFFER is not set
CONFIG_UBIFS_FS=y
CONFIG_MINIX_FS=m
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_ROOT_NFS=y
CONFIG_CIFS=m
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
CONFIG_MAGIC_SYSRQ=y
@ -170,6 +170,3 @@ CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_FRAME_POINTER=y
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
# CONFIG_CRYPTO_HW is not set
CONFIG_CRC_T10DIF=m

View File

@ -2,22 +2,15 @@ CONFIG_EXPERIMENTAL=y
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_AUDIT=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_BASE_FULL is not set
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
@ -33,6 +26,7 @@ CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_AT32AP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
@ -54,18 +48,18 @@ CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_DATAFLASH=m
CONFIG_MTD_M25P80=m
CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_MISC_DEVICES=y
CONFIG_ATMEL_PWM=m
CONFIG_ATMEL_TCLIB=y
CONFIG_ATMEL_SSC=m
CONFIG_EEPROM_AT24=m
# CONFIG_SCSI_PROC_FS is not set
CONFIG_BLK_DEV_SD=m
CONFIG_BLK_DEV_SR=m
# CONFIG_SCSI_LOWLEVEL is not set
CONFIG_ATA=m
# CONFIG_SATA_PMP is not set
CONFIG_PATA_AT32=m
@ -77,6 +71,7 @@ CONFIG_PPP_ASYNC=m
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m
CONFIG_INPUT=m
CONFIG_INPUT_EVDEV=m
# CONFIG_KEYBOARD_ATKBD is not set
CONFIG_KEYBOARD_GPIO=m
# CONFIG_MOUSE_PS2 is not set
@ -106,7 +101,6 @@ CONFIG_SND_PCM_OSS=m
CONFIG_SND_AT73C213=m
# CONFIG_HID_SUPPORT is not set
CONFIG_USB_GADGET=y
CONFIG_USB_GADGET_DEBUG_FS=y
CONFIG_USB_ZERO=m
CONFIG_USB_ETH=m
CONFIG_USB_GADGETFS=m
@ -116,36 +110,39 @@ CONFIG_USB_CDC_COMPOSITE=m
CONFIG_MMC=y
CONFIG_MMC_TEST=m
CONFIG_MMC_ATMELMCI=y
CONFIG_MMC_SPI=m
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_ATMEL_PWM=m
CONFIG_LEDS_GPIO=y
CONFIG_LEDS_GPIO=m
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=y
CONFIG_LEDS_TRIGGER_DEFAULT_ON=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_AT32AP700X=y
CONFIG_DMADEVICES=y
CONFIG_DW_DMAC=y
CONFIG_EXT2_FS=m
CONFIG_EXT3_FS=m
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_XATTR is not set
# CONFIG_DNOTIFY is not set
CONFIG_FUSE_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=m
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
CONFIG_UBIFS_FS=y
# CONFIG_NETWORK_FILESYSTEMS is not set
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_FRAME_POINTER=y
CONFIG_CRC_T10DIF=m

View File

@ -1,19 +1,32 @@
CONFIG_EXPERIMENTAL=y
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_SYSCTL_SYSCALL is not set
# CONFIG_BASE_FULL is not set
# CONFIG_FUTEX is not set
# CONFIG_EPOLL is not set
# CONFIG_SIGNALFD is not set
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SLOB=y
# CONFIG_BLOCK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_IOSCHED_DEADLINE is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_BOARD_ATSTK1004=y
# CONFIG_OWNERSHIP_TRACE is not set
CONFIG_NMI_DEBUGGING=y
CONFIG_PM=y
CONFIG_CPU_FREQ=y
# CONFIG_CPU_FREQ_STAT is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_AT32AP=y
CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
@ -31,40 +44,104 @@ CONFIG_MTD=y
CONFIG_MTD_PARTITIONS=y
CONFIG_MTD_CMDLINE_PARTS=y
CONFIG_MTD_CHAR=y
CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_PHYSMAP=y
# CONFIG_MISC_DEVICES is not set
# CONFIG_INPUT is not set
CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_MISC_DEVICES=y
CONFIG_ATMEL_PWM=m
CONFIG_ATMEL_TCLIB=y
CONFIG_ATMEL_SSC=m
# CONFIG_SCSI_PROC_FS is not set
CONFIG_BLK_DEV_SD=m
CONFIG_BLK_DEV_SR=m
# CONFIG_SCSI_LOWLEVEL is not set
CONFIG_ATA=m
# CONFIG_SATA_PMP is not set
CONFIG_PATA_AT32=m
CONFIG_NETDEVICES=y
# CONFIG_NETDEV_1000 is not set
# CONFIG_NETDEV_10000 is not set
CONFIG_PPP=m
CONFIG_PPP_ASYNC=m
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m
CONFIG_INPUT=m
CONFIG_INPUT_EVDEV=m
# CONFIG_KEYBOARD_ATKBD is not set
CONFIG_KEYBOARD_GPIO=m
# CONFIG_MOUSE_PS2 is not set
CONFIG_MOUSE_GPIO=m
# CONFIG_SERIO is not set
# CONFIG_VT is not set
# CONFIG_DEVKMEM is not set
CONFIG_SERIAL_ATMEL=y
CONFIG_SERIAL_ATMEL_CONSOLE=y
# CONFIG_SERIAL_ATMEL_PDC is not set
# CONFIG_LEGACY_PTYS is not set
# CONFIG_HW_RANDOM is not set
CONFIG_I2C=m
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_GPIO=m
CONFIG_SPI=y
CONFIG_SPI_ATMEL=y
CONFIG_SPI_SPIDEV=m
CONFIG_GPIO_SYSFS=y
# CONFIG_HWMON is not set
CONFIG_WATCHDOG=y
CONFIG_AT32AP700X_WDT=y
CONFIG_FB=y
CONFIG_FB_ATMEL=y
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=y
CONFIG_LCD_LTV350QV=y
# CONFIG_BACKLIGHT_CLASS_DEVICE is not set
CONFIG_USB_GADGET=y
CONFIG_USB_ETH=y
# CONFIG_USB_ETH_RNDIS is not set
CONFIG_USB_ZERO=m
CONFIG_USB_ETH=m
CONFIG_USB_GADGETFS=m
CONFIG_USB_FILE_STORAGE=m
CONFIG_USB_G_SERIAL=m
CONFIG_USB_CDC_COMPOSITE=m
CONFIG_MMC=y
CONFIG_MMC_TEST=m
CONFIG_MMC_ATMELMCI=y
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_ATMEL_PWM=m
CONFIG_LEDS_GPIO=m
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_RTC_CLASS=y
# CONFIG_RTC_INTF_PROC is not set
CONFIG_RTC_DRV_AT32AP700X=y
CONFIG_DMADEVICES=y
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_EXT4_FS=y
# CONFIG_EXT4_FS_XATTR is not set
# CONFIG_DNOTIFY is not set
CONFIG_FUSE_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
CONFIG_PROC_KCORE=y
# CONFIG_PROC_PAGE_MONITOR is not set
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
# CONFIG_JFFS2_FS_WRITEBUFFER is not set
CONFIG_UBIFS_FS=y
# CONFIG_NETWORK_FILESYSTEMS is not set
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_FRAME_POINTER=y

View File

@ -3,7 +3,6 @@ CONFIG_EXPERIMENTAL=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_SYSCTL_SYSCALL is not set
@ -11,7 +10,7 @@ CONFIG_BLK_DEV_INITRD=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set
@ -37,6 +36,7 @@ CONFIG_INET=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IPGRE=m
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
@ -60,15 +60,13 @@ CONFIG_MTD_BLOCK=y
CONFIG_MTD_CFI=y
CONFIG_MTD_CFI_AMDSTD=y
CONFIG_MTD_PHYSMAP=y
CONFIG_MTD_DATAFLASH=m
CONFIG_MTD_DATAFLASH_OTP=y
CONFIG_MTD_M25P80=m
CONFIG_MTD_NAND=y
CONFIG_MTD_NAND_ATMEL=y
CONFIG_MTD_UBI=y
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_MISC_DEVICES=y
CONFIG_ATMEL_PWM=m
CONFIG_ATMEL_TCLIB=y
CONFIG_ATMEL_SSC=m
@ -132,17 +130,17 @@ CONFIG_USB_ETH=m
CONFIG_USB_GADGETFS=m
CONFIG_USB_FILE_STORAGE=m
CONFIG_USB_G_SERIAL=m
CONFIG_USB_CDC_COMPOSITE=m
CONFIG_MMC=y
CONFIG_MMC_TEST=m
CONFIG_MMC_ATMELMCI=y
CONFIG_MMC_SPI=m
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=m
CONFIG_LEDS_CLASS=y
CONFIG_LEDS_ATMEL_PWM=m
CONFIG_LEDS_GPIO=m
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
CONFIG_LEDS_TRIGGER_DEFAULT_ON=m
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_AT32AP700X=y
CONFIG_DMADEVICES=y
@ -156,15 +154,18 @@ CONFIG_EXT4_FS=y
CONFIG_FUSE_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=850
CONFIG_PROC_KCORE=y
CONFIG_TMPFS=y
CONFIG_CONFIGFS_FS=y
CONFIG_JFFS2_FS=y
CONFIG_UBIFS_FS=y
CONFIG_MINIX_FS=m
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_ROOT_NFS=y
CONFIG_CIFS=m
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
CONFIG_MAGIC_SYSRQ=y
@ -172,7 +173,3 @@ CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
CONFIG_FRAME_POINTER=y
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
CONFIG_CRYPTO_FIPS=y
# CONFIG_CRYPTO_HW is not set
CONFIG_CRC_T10DIF=m

View File

@ -11,7 +11,7 @@ CONFIG_BLK_DEV_INITRD=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_BLK_DEV_BSG is not set

View File

@ -12,7 +12,7 @@ CONFIG_BLK_DEV_INITRD=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_OPROFILE=m
CONFIG_KPROBES=y
# CONFIG_KPROBES is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y

View File

@ -15,20 +15,6 @@
#include <linux/types.h>
#include <linux/signal.h>
/* kernel/process.c */
asmlinkage int sys_fork(struct pt_regs *);
asmlinkage int sys_clone(unsigned long, unsigned long,
unsigned long, unsigned long,
struct pt_regs *);
asmlinkage int sys_vfork(struct pt_regs *);
asmlinkage int sys_execve(const char __user *, char __user *__user *,
char __user *__user *, struct pt_regs *);
/* kernel/signal.c */
asmlinkage int sys_sigaltstack(const stack_t __user *, stack_t __user *,
struct pt_regs *);
asmlinkage int sys_rt_sigreturn(struct pt_regs *);
/* mm/cache.c */
asmlinkage int sys_cacheflush(int, void __user *, size_t);

View File

@ -367,14 +367,13 @@ asmlinkage int sys_fork(struct pt_regs *regs)
}
asmlinkage int sys_clone(unsigned long clone_flags, unsigned long newsp,
unsigned long parent_tidptr,
unsigned long child_tidptr, struct pt_regs *regs)
void __user *parent_tidptr, void __user *child_tidptr,
struct pt_regs *regs)
{
if (!newsp)
newsp = regs->sp;
return do_fork(clone_flags, newsp, regs, 0,
(int __user *)parent_tidptr,
(int __user *)child_tidptr);
return do_fork(clone_flags, newsp, regs, 0, parent_tidptr,
child_tidptr);
}
asmlinkage int sys_vfork(struct pt_regs *regs)

View File

@ -35,7 +35,6 @@ static struct clocksource counter = {
.rating = 50,
.read = read_cycle_count,
.mask = CLOCKSOURCE_MASK(32),
.shift = 16,
.flags = CLOCK_SOURCE_IS_CONTINUOUS,
};
@ -123,9 +122,7 @@ void __init time_init(void)
/* figure rate for counter */
counter_hz = clk_get_rate(boot_cpu_data.clk);
counter.mult = clocksource_hz2mult(counter_hz, counter.shift);
ret = clocksource_register(&counter);
ret = clocksource_register_hz(&counter, counter_hz);
if (ret)
pr_debug("timer: could not register clocksource: %d\n", ret);

View File

@ -618,7 +618,7 @@ pfm_get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
}
/* forward declaration */
static static const struct dentry_operations pfmfs_dentry_operations;
static const struct dentry_operations pfmfs_dentry_operations;
static struct dentry *
pfmfs_mount(struct file_system_type *fs_type, int flags, const char *dev_name, void *data)

View File

@ -38,7 +38,7 @@ huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
if (pud) {
pmd = pmd_alloc(mm, pud, taddr);
if (pmd)
pte = pte_alloc_map(mm, pmd, taddr);
pte = pte_alloc_map(mm, NULL, pmd, taddr);
}
return pte;
}

View File

@ -77,6 +77,9 @@
#define MADV_UNMERGEABLE 13 /* KSM may not merge identical pages */
#define MADV_HWPOISON 100 /* poison a page for testing */
#define MADV_HUGEPAGE 14 /* Worth backing with hugepages */
#define MADV_NOHUGEPAGE 15 /* Not worth backing with hugepages */
/* compatibility flags */
#define MAP_FILE 0

View File

@ -46,17 +46,9 @@ static DEFINE_SPINLOCK(dbe_lock);
void *module_alloc(unsigned long size)
{
#ifdef MODULE_START
struct vm_struct *area;
size = PAGE_ALIGN(size);
if (!size)
return NULL;
area = __get_vm_area(size, VM_ALLOC, MODULE_START, MODULE_END);
if (!area)
return NULL;
return __vmalloc_area(area, GFP_KERNEL, PAGE_KERNEL);
return __vmalloc_node_range(size, 1, MODULE_START, MODULE_END,
GFP_KERNEL, PAGE_KERNEL, -1,
__builtin_return_address(0));
#else
if (size == 0)
return NULL;

View File

@ -59,6 +59,9 @@
#define MADV_MERGEABLE 65 /* KSM may merge identical pages */
#define MADV_UNMERGEABLE 66 /* KSM may not merge identical pages */
#define MADV_HUGEPAGE 67 /* Worth backing with hugepages */
#define MADV_NOHUGEPAGE 68 /* Not worth backing with hugepages */
/* compatibility flags */
#define MAP_FILE 0
#define MAP_VARIABLE 0

View File

@ -16,6 +16,16 @@
#ifdef __HAVE_ARCH_PTE_SPECIAL
static inline void get_huge_page_tail(struct page *page)
{
/*
* __split_huge_page_refcount() cannot run
* from under us.
*/
VM_BUG_ON(atomic_read(&page->_count) < 0);
atomic_inc(&page->_count);
}
/*
* The performance critical leaf functions are made noinline otherwise gcc
* inlines everything into a single function which results in too much
@ -47,6 +57,8 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
put_page(page);
return 0;
}
if (PageTail(page))
get_huge_page_tail(page);
pages[*nr] = page;
(*nr)++;

View File

@ -35,7 +35,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
if (pud) {
pmd = pmd_alloc(mm, pud, addr);
if (pmd)
pte = pte_alloc_map(mm, pmd, addr);
pte = pte_alloc_map(mm, NULL, pmd, addr);
}
}

View File

@ -23,17 +23,11 @@
static void *module_map(unsigned long size)
{
struct vm_struct *area;
size = PAGE_ALIGN(size);
if (!size || size > MODULES_LEN)
if (PAGE_ALIGN(size) > MODULES_LEN)
return NULL;
area = __get_vm_area(size, VM_ALLOC, MODULES_VADDR, MODULES_END);
if (!area)
return NULL;
return __vmalloc_area(area, GFP_KERNEL, PAGE_KERNEL);
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL, PAGE_KERNEL, -1,
__builtin_return_address(0));
}
static char *dot2underscore(char *name)

View File

@ -50,7 +50,7 @@ static inline int io_remap_pmd_range(struct mm_struct *mm, pmd_t * pmd, unsigned
end = PGDIR_SIZE;
offset -= address;
do {
pte_t * pte = pte_alloc_map(mm, pmd, address);
pte_t *pte = pte_alloc_map(mm, NULL, pmd, address);
if (!pte)
return -ENOMEM;
io_remap_pte_range(mm, pte, address, end - address, address + offset, prot, space);

View File

@ -92,7 +92,7 @@ static inline int io_remap_pmd_range(struct mm_struct *mm, pmd_t * pmd, unsigned
end = PGDIR_SIZE;
offset -= address;
do {
pte_t * pte = pte_alloc_map(mm, pmd, address);
pte_t *pte = pte_alloc_map(mm, NULL, pmd, address);
if (!pte)
return -ENOMEM;
io_remap_pte_range(mm, pte, address, end - address, address + offset, prot, space);

View File

@ -214,7 +214,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
if (pud) {
pmd = pmd_alloc(mm, pud, addr);
if (pmd)
pte = pte_alloc_map(mm, pmd, addr);
pte = pte_alloc_map(mm, NULL, pmd, addr);
}
return pte;
}

View File

@ -31,7 +31,7 @@ static int init_stub_pte(struct mm_struct *mm, unsigned long proc,
if (!pmd)
goto out_pmd;
pte = pte_alloc_map(mm, pmd, proc);
pte = pte_alloc_map(mm, NULL, pmd, proc);
if (!pte)
goto out_pte;

View File

@ -822,6 +822,7 @@ extern bool kvm_rebooting;
#define KVM_ARCH_WANT_MMU_NOTIFIER
int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
int kvm_age_hva(struct kvm *kvm, unsigned long hva);
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);

View File

@ -435,6 +435,11 @@ static inline void pte_update(struct mm_struct *mm, unsigned long addr,
{
PVOP_VCALL3(pv_mmu_ops.pte_update, mm, addr, ptep);
}
static inline void pmd_update(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp)
{
PVOP_VCALL3(pv_mmu_ops.pmd_update, mm, addr, pmdp);
}
static inline void pte_update_defer(struct mm_struct *mm, unsigned long addr,
pte_t *ptep)
@ -442,6 +447,12 @@ static inline void pte_update_defer(struct mm_struct *mm, unsigned long addr,
PVOP_VCALL3(pv_mmu_ops.pte_update_defer, mm, addr, ptep);
}
static inline void pmd_update_defer(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp)
{
PVOP_VCALL3(pv_mmu_ops.pmd_update_defer, mm, addr, pmdp);
}
static inline pte_t __pte(pteval_t val)
{
pteval_t ret;
@ -543,6 +554,20 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
PVOP_VCALL4(pv_mmu_ops.set_pte_at, mm, addr, ptep, pte.pte);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
{
#if PAGETABLE_LEVELS >= 3
if (sizeof(pmdval_t) > sizeof(long))
/* 5 arg words */
pv_mmu_ops.set_pmd_at(mm, addr, pmdp, pmd);
else
PVOP_VCALL4(pv_mmu_ops.set_pmd_at, mm, addr, pmdp, pmd.pmd);
#endif
}
#endif
static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
{
pmdval_t val = native_pmd_val(pmd);

View File

@ -265,10 +265,16 @@ struct pv_mmu_ops {
void (*set_pte_at)(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pteval);
void (*set_pmd)(pmd_t *pmdp, pmd_t pmdval);
void (*set_pmd_at)(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmdval);
void (*pte_update)(struct mm_struct *mm, unsigned long addr,
pte_t *ptep);
void (*pte_update_defer)(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
void (*pmd_update)(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp);
void (*pmd_update_defer)(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp);
pte_t (*ptep_modify_prot_start)(struct mm_struct *mm, unsigned long addr,
pte_t *ptep);

View File

@ -46,6 +46,15 @@ static inline pte_t native_ptep_get_and_clear(pte_t *xp)
#define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp)
#endif
#ifdef CONFIG_SMP
static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
{
return __pmd(xchg((pmdval_t *)xp, 0));
}
#else
#define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp)
#endif
/*
* Bits _PAGE_BIT_PRESENT, _PAGE_BIT_FILE and _PAGE_BIT_PROTNONE are taken,
* split up the 29 bits of offset into this range:

View File

@ -104,6 +104,29 @@ static inline pte_t native_ptep_get_and_clear(pte_t *ptep)
#define native_ptep_get_and_clear(xp) native_local_ptep_get_and_clear(xp)
#endif
#ifdef CONFIG_SMP
union split_pmd {
struct {
u32 pmd_low;
u32 pmd_high;
};
pmd_t pmd;
};
static inline pmd_t native_pmdp_get_and_clear(pmd_t *pmdp)
{
union split_pmd res, *orig = (union split_pmd *)pmdp;
/* xchg acts as a barrier before setting of the high bits */
res.pmd_low = xchg(&orig->pmd_low, 0);
res.pmd_high = orig->pmd_high;
orig->pmd_high = 0;
return res.pmd;
}
#else
#define native_pmdp_get_and_clear(xp) native_local_pmdp_get_and_clear(xp)
#endif
/*
* Bits 0, 6 and 7 are taken in the low part of the pte,
* put the 32 bits of offset into the high part.

View File

@ -35,6 +35,7 @@ extern struct mm_struct *pgd_page_get_mm(struct page *page);
#else /* !CONFIG_PARAVIRT */
#define set_pte(ptep, pte) native_set_pte(ptep, pte)
#define set_pte_at(mm, addr, ptep, pte) native_set_pte_at(mm, addr, ptep, pte)
#define set_pmd_at(mm, addr, pmdp, pmd) native_set_pmd_at(mm, addr, pmdp, pmd)
#define set_pte_atomic(ptep, pte) \
native_set_pte_atomic(ptep, pte)
@ -59,6 +60,8 @@ extern struct mm_struct *pgd_page_get_mm(struct page *page);
#define pte_update(mm, addr, ptep) do { } while (0)
#define pte_update_defer(mm, addr, ptep) do { } while (0)
#define pmd_update(mm, addr, ptep) do { } while (0)
#define pmd_update_defer(mm, addr, ptep) do { } while (0)
#define pgd_val(x) native_pgd_val(x)
#define __pgd(x) native_make_pgd(x)
@ -94,6 +97,11 @@ static inline int pte_young(pte_t pte)
return pte_flags(pte) & _PAGE_ACCESSED;
}
static inline int pmd_young(pmd_t pmd)
{
return pmd_flags(pmd) & _PAGE_ACCESSED;
}
static inline int pte_write(pte_t pte)
{
return pte_flags(pte) & _PAGE_RW;
@ -142,6 +150,23 @@ static inline int pmd_large(pmd_t pte)
(_PAGE_PSE | _PAGE_PRESENT);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline int pmd_trans_splitting(pmd_t pmd)
{
return pmd_val(pmd) & _PAGE_SPLITTING;
}
static inline int pmd_trans_huge(pmd_t pmd)
{
return pmd_val(pmd) & _PAGE_PSE;
}
static inline int has_transparent_hugepage(void)
{
return cpu_has_pse;
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
static inline pte_t pte_set_flags(pte_t pte, pteval_t set)
{
pteval_t v = native_pte_val(pte);
@ -216,6 +241,55 @@ static inline pte_t pte_mkspecial(pte_t pte)
return pte_set_flags(pte, _PAGE_SPECIAL);
}
static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
{
pmdval_t v = native_pmd_val(pmd);
return __pmd(v | set);
}
static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
{
pmdval_t v = native_pmd_val(pmd);
return __pmd(v & ~clear);
}
static inline pmd_t pmd_mkold(pmd_t pmd)
{
return pmd_clear_flags(pmd, _PAGE_ACCESSED);
}
static inline pmd_t pmd_wrprotect(pmd_t pmd)
{
return pmd_clear_flags(pmd, _PAGE_RW);
}
static inline pmd_t pmd_mkdirty(pmd_t pmd)
{
return pmd_set_flags(pmd, _PAGE_DIRTY);
}
static inline pmd_t pmd_mkhuge(pmd_t pmd)
{
return pmd_set_flags(pmd, _PAGE_PSE);
}
static inline pmd_t pmd_mkyoung(pmd_t pmd)
{
return pmd_set_flags(pmd, _PAGE_ACCESSED);
}
static inline pmd_t pmd_mkwrite(pmd_t pmd)
{
return pmd_set_flags(pmd, _PAGE_RW);
}
static inline pmd_t pmd_mknotpresent(pmd_t pmd)
{
return pmd_clear_flags(pmd, _PAGE_PRESENT);
}
/*
* Mask out unsupported bits in a present pgprot. Non-present pgprots
* can use those bits for other purposes, so leave them be.
@ -256,6 +330,16 @@ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
return __pte(val);
}
static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
{
pmdval_t val = pmd_val(pmd);
val &= _HPAGE_CHG_MASK;
val |= massage_pgprot(newprot) & ~_HPAGE_CHG_MASK;
return __pmd(val);
}
/* mprotect needs to preserve PAT bits when updating vm_page_prot */
#define pgprot_modify pgprot_modify
static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
@ -350,7 +434,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
#define pmd_page(pmd) pfn_to_page(pmd_val(pmd) >> PAGE_SHIFT)
#define pmd_page(pmd) pfn_to_page((pmd_val(pmd) & PTE_PFN_MASK) >> PAGE_SHIFT)
/*
* the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@ -524,12 +608,26 @@ static inline pte_t native_local_ptep_get_and_clear(pte_t *ptep)
return res;
}
static inline pmd_t native_local_pmdp_get_and_clear(pmd_t *pmdp)
{
pmd_t res = *pmdp;
native_pmd_clear(pmdp);
return res;
}
static inline void native_set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep , pte_t pte)
{
native_set_pte(ptep, pte);
}
static inline void native_set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp , pmd_t pmd)
{
native_set_pmd(pmdp, pmd);
}
#ifndef CONFIG_PARAVIRT
/*
* Rules for using pte_update - it must be called after any PTE update which
@ -607,6 +705,49 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
#define flush_tlb_fix_spurious_fault(vma, address)
#define mk_pmd(page, pgprot) pfn_pmd(page_to_pfn(page), (pgprot))
#define __HAVE_ARCH_PMDP_SET_ACCESS_FLAGS
extern int pmdp_set_access_flags(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp,
pmd_t entry, int dirty);
#define __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG
extern int pmdp_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp);
#define __HAVE_ARCH_PMDP_CLEAR_YOUNG_FLUSH
extern int pmdp_clear_flush_young(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp);
#define __HAVE_ARCH_PMDP_SPLITTING_FLUSH
extern void pmdp_splitting_flush(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp);
#define __HAVE_ARCH_PMD_WRITE
static inline int pmd_write(pmd_t pmd)
{
return pmd_flags(pmd) & _PAGE_RW;
}
#define __HAVE_ARCH_PMDP_GET_AND_CLEAR
static inline pmd_t pmdp_get_and_clear(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp)
{
pmd_t pmd = native_pmdp_get_and_clear(pmdp);
pmd_update(mm, addr, pmdp);
return pmd;
}
#define __HAVE_ARCH_PMDP_SET_WRPROTECT
static inline void pmdp_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
{
clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
pmd_update(mm, addr, pmdp);
}
/*
* clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
*

View File

@ -59,6 +59,16 @@ static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
native_set_pte(ptep, pte);
}
static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
{
*pmdp = pmd;
}
static inline void native_pmd_clear(pmd_t *pmd)
{
native_set_pmd(pmd, native_make_pmd(0));
}
static inline pte_t native_ptep_get_and_clear(pte_t *xp)
{
#ifdef CONFIG_SMP
@ -72,14 +82,17 @@ static inline pte_t native_ptep_get_and_clear(pte_t *xp)
#endif
}
static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
{
*pmdp = pmd;
}
static inline void native_pmd_clear(pmd_t *pmd)
{
native_set_pmd(pmd, native_make_pmd(0));
#ifdef CONFIG_SMP
return native_make_pmd(xchg(&xp->pmd, 0));
#else
/* native_local_pmdp_get_and_clear,
but duplicated because of cyclic dependency */
pmd_t ret = *xp;
native_pmd_clear(xp);
return ret;
#endif
}
static inline void native_set_pud(pud_t *pudp, pud_t pud)
@ -168,6 +181,7 @@ extern void cleanup_highmap(void);
#define kc_offset_to_vaddr(o) ((o) | ~__VIRTUAL_MASK)
#define __HAVE_ARCH_PTE_SAME
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_PGTABLE_64_H */

View File

@ -22,6 +22,7 @@
#define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */
#define _PAGE_BIT_SPECIAL _PAGE_BIT_UNUSED1
#define _PAGE_BIT_CPA_TEST _PAGE_BIT_UNUSED1
#define _PAGE_BIT_SPLITTING _PAGE_BIT_UNUSED1 /* only valid on a PSE pmd */
#define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */
/* If _PAGE_BIT_PRESENT is clear, we use these: */
@ -45,6 +46,7 @@
#define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
#define _PAGE_SPECIAL (_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
#define _PAGE_CPA_TEST (_AT(pteval_t, 1) << _PAGE_BIT_CPA_TEST)
#define _PAGE_SPLITTING (_AT(pteval_t, 1) << _PAGE_BIT_SPLITTING)
#define __HAVE_ARCH_PTE_SPECIAL
#ifdef CONFIG_KMEMCHECK
@ -70,6 +72,7 @@
/* Set of bits not changed in pte_modify */
#define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
_PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY)
#define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
#define _PAGE_CACHE_MASK (_PAGE_PCD | _PAGE_PWT)
#define _PAGE_CACHE_WB (0)

View File

@ -42,6 +42,11 @@ extern unsigned int machine_to_phys_order;
extern unsigned long get_phys_to_machine(unsigned long pfn);
extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
extern int m2p_add_override(unsigned long mfn, struct page *page);
extern int m2p_remove_override(struct page *page);
extern struct page *m2p_find_override(unsigned long mfn);
extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
static inline unsigned long pfn_to_mfn(unsigned long pfn)
{
unsigned long mfn;
@ -72,9 +77,6 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
if (xen_feature(XENFEAT_auto_translated_physmap))
return mfn;
if (unlikely((mfn >> machine_to_phys_order) != 0))
return ~0;
pfn = 0;
/*
* The array access can fail (e.g., device space beyond end of RAM).
@ -83,6 +85,14 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
*/
__get_user(pfn, &machine_to_phys_mapping[mfn]);
/*
* If this appears to be a foreign mfn (because the pfn
* doesn't map back to the mfn), then check the local override
* table to see if there's a better pfn to use.
*/
if (get_phys_to_machine(pfn) != mfn)
pfn = m2p_find_override_pfn(mfn, pfn);
return pfn;
}

View File

@ -37,20 +37,11 @@
void *module_alloc(unsigned long size)
{
struct vm_struct *area;
if (!size)
if (PAGE_ALIGN(size) > MODULES_LEN)
return NULL;
size = PAGE_ALIGN(size);
if (size > MODULES_LEN)
return NULL;
area = __get_vm_area(size, VM_ALLOC, MODULES_VADDR, MODULES_END);
if (!area)
return NULL;
return __vmalloc_area(area, GFP_KERNEL | __GFP_HIGHMEM,
PAGE_KERNEL_EXEC);
return __vmalloc_node_range(size, 1, MODULES_VADDR, MODULES_END,
GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL_EXEC,
-1, __builtin_return_address(0));
}
/* Free memory returned from module_alloc */

View File

@ -421,8 +421,11 @@ struct pv_mmu_ops pv_mmu_ops = {
.set_pte = native_set_pte,
.set_pte_at = native_set_pte_at,
.set_pmd = native_set_pmd,
.set_pmd_at = native_set_pmd_at,
.pte_update = paravirt_nop,
.pte_update_defer = paravirt_nop,
.pmd_update = paravirt_nop,
.pmd_update_defer = paravirt_nop,
.ptep_modify_prot_start = __ptep_modify_prot_start,
.ptep_modify_prot_commit = __ptep_modify_prot_commit,

View File

@ -133,7 +133,7 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
pmd = pmd_alloc(&tboot_mm, pud, vaddr);
if (!pmd)
return -1;
pte = pte_alloc_map(&tboot_mm, pmd, vaddr);
pte = pte_alloc_map(&tboot_mm, NULL, pmd, vaddr);
if (!pte)
return -1;
set_pte_at(&tboot_mm, vaddr, pte, pfn_pte(pfn, prot));

View File

@ -179,6 +179,7 @@ static void mark_screen_rdonly(struct mm_struct *mm)
if (pud_none_or_clear_bad(pud))
goto out;
pmd = pmd_offset(pud, 0xA0000);
split_huge_page_pmd(mm, pmd);
if (pmd_none_or_clear_bad(pmd))
goto out;
pte = pte_offset_map_lock(mm, pmd, 0xA0000, &ptl);

View File

@ -554,14 +554,18 @@ static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
return ret;
}
static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
static bool mapping_level_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t large_gfn)
{
struct kvm_memory_slot *slot;
int host_level, level, max_level;
slot = gfn_to_memslot(vcpu->kvm, large_gfn);
if (slot && slot->dirty_bitmap)
return PT_PAGE_TABLE_LEVEL;
return true;
return false;
}
static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
{
int host_level, level, max_level;
host_level = host_mapping_level(vcpu->kvm, large_gfn);
@ -941,6 +945,35 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
return young;
}
static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
unsigned long data)
{
u64 *spte;
int young = 0;
/*
* If there's no access bit in the secondary pte set by the
* hardware it's up to gup-fast/gup to set the access bit in
* the primary pte or in the page structure.
*/
if (!shadow_accessed_mask)
goto out;
spte = rmap_next(kvm, rmapp, NULL);
while (spte) {
u64 _spte = *spte;
BUG_ON(!(_spte & PT_PRESENT_MASK));
young = _spte & PT_ACCESSED_MASK;
if (young) {
young = 1;
break;
}
spte = rmap_next(kvm, rmapp, spte);
}
out:
return young;
}
#define RMAP_RECYCLE_THRESHOLD 1000
static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
@ -961,6 +994,11 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva)
return kvm_handle_hva(kvm, hva, 0, kvm_age_rmapp);
}
int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
{
return kvm_handle_hva(kvm, hva, 0, kvm_test_age_rmapp);
}
#ifdef MMU_DEBUG
static int is_empty_shadow_page(u64 *spt)
{
@ -2281,6 +2319,48 @@ static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn)
return 1;
}
static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
gfn_t *gfnp, pfn_t *pfnp, int *levelp)
{
pfn_t pfn = *pfnp;
gfn_t gfn = *gfnp;
int level = *levelp;
/*
* Check if it's a transparent hugepage. If this would be an
* hugetlbfs page, level wouldn't be set to
* PT_PAGE_TABLE_LEVEL and there would be no adjustment done
* here.
*/
if (!is_error_pfn(pfn) && !kvm_is_mmio_pfn(pfn) &&
level == PT_PAGE_TABLE_LEVEL &&
PageTransCompound(pfn_to_page(pfn)) &&
!has_wrprotected_page(vcpu->kvm, gfn, PT_DIRECTORY_LEVEL)) {
unsigned long mask;
/*
* mmu_notifier_retry was successful and we hold the
* mmu_lock here, so the pmd can't become splitting
* from under us, and in turn
* __split_huge_page_refcount() can't run from under
* us and we can safely transfer the refcount from
* PG_tail to PG_head as we switch the pfn to tail to
* head.
*/
*levelp = level = PT_DIRECTORY_LEVEL;
mask = KVM_PAGES_PER_HPAGE(level) - 1;
VM_BUG_ON((gfn & mask) != (pfn & mask));
if (pfn & mask) {
gfn &= ~mask;
*gfnp = gfn;
kvm_release_pfn_clean(pfn);
pfn &= ~mask;
if (!get_page_unless_zero(pfn_to_page(pfn)))
BUG();
*pfnp = pfn;
}
}
}
static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
gva_t gva, pfn_t *pfn, bool write, bool *writable);
@ -2289,20 +2369,25 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
{
int r;
int level;
int force_pt_level;
pfn_t pfn;
unsigned long mmu_seq;
bool map_writable;
level = mapping_level(vcpu, gfn);
force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn);
if (likely(!force_pt_level)) {
level = mapping_level(vcpu, gfn);
/*
* This path builds a PAE pagetable - so we can map
* 2mb pages at maximum. Therefore check if the level
* is larger than that.
*/
if (level > PT_DIRECTORY_LEVEL)
level = PT_DIRECTORY_LEVEL;
/*
* This path builds a PAE pagetable - so we can map 2mb pages at
* maximum. Therefore check if the level is larger than that.
*/
if (level > PT_DIRECTORY_LEVEL)
level = PT_DIRECTORY_LEVEL;
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
} else
level = PT_PAGE_TABLE_LEVEL;
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
@ -2318,6 +2403,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
kvm_mmu_free_some_pages(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
@ -2655,6 +2742,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
pfn_t pfn;
int r;
int level;
int force_pt_level;
gfn_t gfn = gpa >> PAGE_SHIFT;
unsigned long mmu_seq;
int write = error_code & PFERR_WRITE_MASK;
@ -2667,9 +2755,12 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
if (r)
return r;
level = mapping_level(vcpu, gfn);
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn);
if (likely(!force_pt_level)) {
level = mapping_level(vcpu, gfn);
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
} else
level = PT_PAGE_TABLE_LEVEL;
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
@ -2684,6 +2775,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
kvm_mmu_free_some_pages(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
r = __direct_map(vcpu, gpa, write, map_writable,
level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);

View File

@ -550,6 +550,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
int r;
pfn_t pfn;
int level = PT_PAGE_TABLE_LEVEL;
int force_pt_level;
unsigned long mmu_seq;
bool map_writable;
@ -577,7 +578,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
return 0;
}
if (walker.level >= PT_DIRECTORY_LEVEL) {
if (walker.level >= PT_DIRECTORY_LEVEL)
force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
else
force_pt_level = 1;
if (!force_pt_level) {
level = min(walker.level, mapping_level(vcpu, walker.gfn));
walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
}
@ -599,6 +604,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
kvm_mmu_free_some_pages(vcpu);
if (!force_pt_level)
transparent_hugepage_adjust(vcpu, &walker.gfn, &pfn, &level);
sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
level, &write_pt, pfn, map_writable, prefault);
(void)sptep;

View File

@ -8,6 +8,7 @@
#include <linux/mm.h>
#include <linux/vmstat.h>
#include <linux/highmem.h>
#include <linux/swap.h>
#include <asm/pgtable.h>
@ -89,6 +90,7 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
get_page(page);
SetPageReferenced(page);
pages[*nr] = page;
(*nr)++;
@ -103,6 +105,17 @@ static inline void get_head_page_multiple(struct page *page, int nr)
VM_BUG_ON(page != compound_head(page));
VM_BUG_ON(page_count(page) == 0);
atomic_add(nr, &page->_count);
SetPageReferenced(page);
}
static inline void get_huge_page_tail(struct page *page)
{
/*
* __split_huge_page_refcount() cannot run
* from under us.
*/
VM_BUG_ON(atomic_read(&page->_count) < 0);
atomic_inc(&page->_count);
}
static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr,
@ -128,6 +141,8 @@ static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr,
do {
VM_BUG_ON(compound_head(page) != head);
pages[*nr] = page;
if (PageTail(page))
get_huge_page_tail(page);
(*nr)++;
page++;
refs++;
@ -148,7 +163,18 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
pmd_t pmd = *pmdp;
next = pmd_addr_end(addr, end);
if (pmd_none(pmd))
/*
* The pmd_trans_splitting() check below explains why
* pmdp_splitting_flush has to flush the tlb, to stop
* this gup-fast code from running while we set the
* splitting bit in the pmd. Returning zero will take
* the slow path that will call wait_split_huge_page()
* if the pmd is still in splitting state. gup-fast
* can't because it has irq disabled and
* wait_split_huge_page() would never return as the
* tlb flush IPI wouldn't run.
*/
if (pmd_none(pmd) || pmd_trans_splitting(pmd))
return 0;
if (unlikely(pmd_large(pmd))) {
if (!gup_huge_pmd(pmd, addr, next, write, pages, nr))

View File

@ -320,6 +320,25 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
return changed;
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
int pmdp_set_access_flags(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp,
pmd_t entry, int dirty)
{
int changed = !pmd_same(*pmdp, entry);
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
if (changed && dirty) {
*pmdp = entry;
pmd_update_defer(vma->vm_mm, address, pmdp);
flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
}
return changed;
}
#endif
int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
@ -335,6 +354,23 @@ int ptep_test_and_clear_young(struct vm_area_struct *vma,
return ret;
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
int pmdp_test_and_clear_young(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
{
int ret = 0;
if (pmd_young(*pmdp))
ret = test_and_clear_bit(_PAGE_BIT_ACCESSED,
(unsigned long *)pmdp);
if (ret)
pmd_update(vma->vm_mm, addr, pmdp);
return ret;
}
#endif
int ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
{
@ -347,6 +383,36 @@ int ptep_clear_flush_young(struct vm_area_struct *vma,
return young;
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
int pmdp_clear_flush_young(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp)
{
int young;
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
young = pmdp_test_and_clear_young(vma, address, pmdp);
if (young)
flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return young;
}
void pmdp_splitting_flush(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmdp)
{
int set;
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
set = !test_and_set_bit(_PAGE_BIT_SPLITTING,
(unsigned long *)pmdp);
if (set) {
pmd_update(vma->vm_mm, address, pmdp);
/* need tlb flush only to serialize against gup-fast */
flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
}
}
#endif
/**
* reserve_top_address - reserves a hole in the top of kernel address space
* @reserve - size of hole to reserve

View File

@ -12,7 +12,8 @@ CFLAGS_mmu.o := $(nostackp)
obj-y := enlighten.o setup.o multicalls.o mmu.o irq.o \
time.o xen-asm.o xen-asm_$(BITS).o \
grant-table.o suspend.o platform-pci-unplug.o
grant-table.o suspend.o platform-pci-unplug.o \
p2m.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o

View File

@ -173,371 +173,6 @@ DEFINE_PER_CPU(unsigned long, xen_current_cr3); /* actual vcpu cr3 */
*/
#define USER_LIMIT ((STACK_TOP_MAX + PGDIR_SIZE - 1) & PGDIR_MASK)
/*
* Xen leaves the responsibility for maintaining p2m mappings to the
* guests themselves, but it must also access and update the p2m array
* during suspend/resume when all the pages are reallocated.
*
* The p2m table is logically a flat array, but we implement it as a
* three-level tree to allow the address space to be sparse.
*
* Xen
* |
* p2m_top p2m_top_mfn
* / \ / \
* p2m_mid p2m_mid p2m_mid_mfn p2m_mid_mfn
* / \ / \ / /
* p2m p2m p2m p2m p2m p2m p2m ...
*
* The p2m_mid_mfn pages are mapped by p2m_top_mfn_p.
*
* The p2m_top and p2m_top_mfn levels are limited to 1 page, so the
* maximum representable pseudo-physical address space is:
* P2M_TOP_PER_PAGE * P2M_MID_PER_PAGE * P2M_PER_PAGE pages
*
* P2M_PER_PAGE depends on the architecture, as a mfn is always
* unsigned long (8 bytes on 64-bit, 4 bytes on 32), leading to
* 512 and 1024 entries respectively.
*/
unsigned long xen_max_p2m_pfn __read_mostly;
#define P2M_PER_PAGE (PAGE_SIZE / sizeof(unsigned long))
#define P2M_MID_PER_PAGE (PAGE_SIZE / sizeof(unsigned long *))
#define P2M_TOP_PER_PAGE (PAGE_SIZE / sizeof(unsigned long **))
#define MAX_P2M_PFN (P2M_TOP_PER_PAGE * P2M_MID_PER_PAGE * P2M_PER_PAGE)
/* Placeholders for holes in the address space */
static RESERVE_BRK_ARRAY(unsigned long, p2m_missing, P2M_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_missing, P2M_MID_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long, p2m_mid_missing_mfn, P2M_MID_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long **, p2m_top, P2M_TOP_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long, p2m_top_mfn, P2M_TOP_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long *, p2m_top_mfn_p, P2M_TOP_PER_PAGE);
RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
RESERVE_BRK(p2m_mid_mfn, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
static inline unsigned p2m_top_index(unsigned long pfn)
{
BUG_ON(pfn >= MAX_P2M_PFN);
return pfn / (P2M_MID_PER_PAGE * P2M_PER_PAGE);
}
static inline unsigned p2m_mid_index(unsigned long pfn)
{
return (pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE;
}
static inline unsigned p2m_index(unsigned long pfn)
{
return pfn % P2M_PER_PAGE;
}
static void p2m_top_init(unsigned long ***top)
{
unsigned i;
for (i = 0; i < P2M_TOP_PER_PAGE; i++)
top[i] = p2m_mid_missing;
}
static void p2m_top_mfn_init(unsigned long *top)
{
unsigned i;
for (i = 0; i < P2M_TOP_PER_PAGE; i++)
top[i] = virt_to_mfn(p2m_mid_missing_mfn);
}
static void p2m_top_mfn_p_init(unsigned long **top)
{
unsigned i;
for (i = 0; i < P2M_TOP_PER_PAGE; i++)
top[i] = p2m_mid_missing_mfn;
}
static void p2m_mid_init(unsigned long **mid)
{
unsigned i;
for (i = 0; i < P2M_MID_PER_PAGE; i++)
mid[i] = p2m_missing;
}
static void p2m_mid_mfn_init(unsigned long *mid)
{
unsigned i;
for (i = 0; i < P2M_MID_PER_PAGE; i++)
mid[i] = virt_to_mfn(p2m_missing);
}
static void p2m_init(unsigned long *p2m)
{
unsigned i;
for (i = 0; i < P2M_MID_PER_PAGE; i++)
p2m[i] = INVALID_P2M_ENTRY;
}
/*
* Build the parallel p2m_top_mfn and p2m_mid_mfn structures
*
* This is called both at boot time, and after resuming from suspend:
* - At boot time we're called very early, and must use extend_brk()
* to allocate memory.
*
* - After resume we're called from within stop_machine, but the mfn
* tree should alreay be completely allocated.
*/
void xen_build_mfn_list_list(void)
{
unsigned long pfn;
/* Pre-initialize p2m_top_mfn to be completely missing */
if (p2m_top_mfn == NULL) {
p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_mid_mfn_init(p2m_mid_missing_mfn);
p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_top_mfn_p_init(p2m_top_mfn_p);
p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_top_mfn_init(p2m_top_mfn);
} else {
/* Reinitialise, mfn's all change after migration */
p2m_mid_mfn_init(p2m_mid_missing_mfn);
}
for (pfn = 0; pfn < xen_max_p2m_pfn; pfn += P2M_PER_PAGE) {
unsigned topidx = p2m_top_index(pfn);
unsigned mididx = p2m_mid_index(pfn);
unsigned long **mid;
unsigned long *mid_mfn_p;
mid = p2m_top[topidx];
mid_mfn_p = p2m_top_mfn_p[topidx];
/* Don't bother allocating any mfn mid levels if
* they're just missing, just update the stored mfn,
* since all could have changed over a migrate.
*/
if (mid == p2m_mid_missing) {
BUG_ON(mididx);
BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
p2m_top_mfn[topidx] = virt_to_mfn(p2m_mid_missing_mfn);
pfn += (P2M_MID_PER_PAGE - 1) * P2M_PER_PAGE;
continue;
}
if (mid_mfn_p == p2m_mid_missing_mfn) {
/*
* XXX boot-time only! We should never find
* missing parts of the mfn tree after
* runtime. extend_brk() will BUG if we call
* it too late.
*/
mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_mid_mfn_init(mid_mfn_p);
p2m_top_mfn_p[topidx] = mid_mfn_p;
}
p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
mid_mfn_p[mididx] = virt_to_mfn(mid[mididx]);
}
}
void xen_setup_mfn_list_list(void)
{
BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info);
HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list =
virt_to_mfn(p2m_top_mfn);
HYPERVISOR_shared_info->arch.max_pfn = xen_max_p2m_pfn;
}
/* Set up p2m_top to point to the domain-builder provided p2m pages */
void __init xen_build_dynamic_phys_to_machine(void)
{
unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
unsigned long pfn;
xen_max_p2m_pfn = max_pfn;
p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_init(p2m_missing);
p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_mid_init(p2m_mid_missing);
p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_top_init(p2m_top);
/*
* The domain builder gives us a pre-constructed p2m array in
* mfn_list for all the pages initially given to us, so we just
* need to graft that into our tree structure.
*/
for (pfn = 0; pfn < max_pfn; pfn += P2M_PER_PAGE) {
unsigned topidx = p2m_top_index(pfn);
unsigned mididx = p2m_mid_index(pfn);
if (p2m_top[topidx] == p2m_mid_missing) {
unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_mid_init(mid);
p2m_top[topidx] = mid;
}
p2m_top[topidx][mididx] = &mfn_list[pfn];
}
}
unsigned long get_phys_to_machine(unsigned long pfn)
{
unsigned topidx, mididx, idx;
if (unlikely(pfn >= MAX_P2M_PFN))
return INVALID_P2M_ENTRY;
topidx = p2m_top_index(pfn);
mididx = p2m_mid_index(pfn);
idx = p2m_index(pfn);
return p2m_top[topidx][mididx][idx];
}
EXPORT_SYMBOL_GPL(get_phys_to_machine);
static void *alloc_p2m_page(void)
{
return (void *)__get_free_page(GFP_KERNEL | __GFP_REPEAT);
}
static void free_p2m_page(void *p)
{
free_page((unsigned long)p);
}
/*
* Fully allocate the p2m structure for a given pfn. We need to check
* that both the top and mid levels are allocated, and make sure the
* parallel mfn tree is kept in sync. We may race with other cpus, so
* the new pages are installed with cmpxchg; if we lose the race then
* simply free the page we allocated and use the one that's there.
*/
static bool alloc_p2m(unsigned long pfn)
{
unsigned topidx, mididx;
unsigned long ***top_p, **mid;
unsigned long *top_mfn_p, *mid_mfn;
topidx = p2m_top_index(pfn);
mididx = p2m_mid_index(pfn);
top_p = &p2m_top[topidx];
mid = *top_p;
if (mid == p2m_mid_missing) {
/* Mid level is missing, allocate a new one */
mid = alloc_p2m_page();
if (!mid)
return false;
p2m_mid_init(mid);
if (cmpxchg(top_p, p2m_mid_missing, mid) != p2m_mid_missing)
free_p2m_page(mid);
}
top_mfn_p = &p2m_top_mfn[topidx];
mid_mfn = p2m_top_mfn_p[topidx];
BUG_ON(virt_to_mfn(mid_mfn) != *top_mfn_p);
if (mid_mfn == p2m_mid_missing_mfn) {
/* Separately check the mid mfn level */
unsigned long missing_mfn;
unsigned long mid_mfn_mfn;
mid_mfn = alloc_p2m_page();
if (!mid_mfn)
return false;
p2m_mid_mfn_init(mid_mfn);
missing_mfn = virt_to_mfn(p2m_mid_missing_mfn);
mid_mfn_mfn = virt_to_mfn(mid_mfn);
if (cmpxchg(top_mfn_p, missing_mfn, mid_mfn_mfn) != missing_mfn)
free_p2m_page(mid_mfn);
else
p2m_top_mfn_p[topidx] = mid_mfn;
}
if (p2m_top[topidx][mididx] == p2m_missing) {
/* p2m leaf page is missing */
unsigned long *p2m;
p2m = alloc_p2m_page();
if (!p2m)
return false;
p2m_init(p2m);
if (cmpxchg(&mid[mididx], p2m_missing, p2m) != p2m_missing)
free_p2m_page(p2m);
else
mid_mfn[mididx] = virt_to_mfn(p2m);
}
return true;
}
/* Try to install p2m mapping; fail if intermediate bits missing */
bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
{
unsigned topidx, mididx, idx;
if (unlikely(pfn >= MAX_P2M_PFN)) {
BUG_ON(mfn != INVALID_P2M_ENTRY);
return true;
}
topidx = p2m_top_index(pfn);
mididx = p2m_mid_index(pfn);
idx = p2m_index(pfn);
if (p2m_top[topidx][mididx] == p2m_missing)
return mfn == INVALID_P2M_ENTRY;
p2m_top[topidx][mididx][idx] = mfn;
return true;
}
bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
{
if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
return true;
}
if (unlikely(!__set_phys_to_machine(pfn, mfn))) {
if (!alloc_p2m(pfn))
return false;
if (!__set_phys_to_machine(pfn, mfn))
return false;
}
return true;
}
unsigned long arbitrary_virt_to_mfn(void *vaddr)
{
xmaddr_t maddr = arbitrary_virt_to_machine(vaddr);

510
arch/x86/xen/p2m.c Normal file
View File

@ -0,0 +1,510 @@
/*
* Xen leaves the responsibility for maintaining p2m mappings to the
* guests themselves, but it must also access and update the p2m array
* during suspend/resume when all the pages are reallocated.
*
* The p2m table is logically a flat array, but we implement it as a
* three-level tree to allow the address space to be sparse.
*
* Xen
* |
* p2m_top p2m_top_mfn
* / \ / \
* p2m_mid p2m_mid p2m_mid_mfn p2m_mid_mfn
* / \ / \ / /
* p2m p2m p2m p2m p2m p2m p2m ...
*
* The p2m_mid_mfn pages are mapped by p2m_top_mfn_p.
*
* The p2m_top and p2m_top_mfn levels are limited to 1 page, so the
* maximum representable pseudo-physical address space is:
* P2M_TOP_PER_PAGE * P2M_MID_PER_PAGE * P2M_PER_PAGE pages
*
* P2M_PER_PAGE depends on the architecture, as a mfn is always
* unsigned long (8 bytes on 64-bit, 4 bytes on 32), leading to
* 512 and 1024 entries respectively.
*/
#include <linux/init.h>
#include <linux/module.h>
#include <linux/list.h>
#include <linux/hash.h>
#include <linux/sched.h>
#include <asm/cache.h>
#include <asm/setup.h>
#include <asm/xen/page.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/hypervisor.h>
#include "xen-ops.h"
static void __init m2p_override_init(void);
unsigned long xen_max_p2m_pfn __read_mostly;
#define P2M_PER_PAGE (PAGE_SIZE / sizeof(unsigned long))
#define P2M_MID_PER_PAGE (PAGE_SIZE / sizeof(unsigned long *))
#define P2M_TOP_PER_PAGE (PAGE_SIZE / sizeof(unsigned long **))
#define MAX_P2M_PFN (P2M_TOP_PER_PAGE * P2M_MID_PER_PAGE * P2M_PER_PAGE)
/* Placeholders for holes in the address space */
static RESERVE_BRK_ARRAY(unsigned long, p2m_missing, P2M_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long *, p2m_mid_missing, P2M_MID_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long, p2m_mid_missing_mfn, P2M_MID_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long **, p2m_top, P2M_TOP_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long, p2m_top_mfn, P2M_TOP_PER_PAGE);
static RESERVE_BRK_ARRAY(unsigned long *, p2m_top_mfn_p, P2M_TOP_PER_PAGE);
RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
RESERVE_BRK(p2m_mid_mfn, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER_PAGE)));
static inline unsigned p2m_top_index(unsigned long pfn)
{
BUG_ON(pfn >= MAX_P2M_PFN);
return pfn / (P2M_MID_PER_PAGE * P2M_PER_PAGE);
}
static inline unsigned p2m_mid_index(unsigned long pfn)
{
return (pfn / P2M_PER_PAGE) % P2M_MID_PER_PAGE;
}
static inline unsigned p2m_index(unsigned long pfn)
{
return pfn % P2M_PER_PAGE;
}
static void p2m_top_init(unsigned long ***top)
{
unsigned i;
for (i = 0; i < P2M_TOP_PER_PAGE; i++)
top[i] = p2m_mid_missing;
}
static void p2m_top_mfn_init(unsigned long *top)
{
unsigned i;
for (i = 0; i < P2M_TOP_PER_PAGE; i++)
top[i] = virt_to_mfn(p2m_mid_missing_mfn);
}
static void p2m_top_mfn_p_init(unsigned long **top)
{
unsigned i;
for (i = 0; i < P2M_TOP_PER_PAGE; i++)
top[i] = p2m_mid_missing_mfn;
}
static void p2m_mid_init(unsigned long **mid)
{
unsigned i;
for (i = 0; i < P2M_MID_PER_PAGE; i++)
mid[i] = p2m_missing;
}
static void p2m_mid_mfn_init(unsigned long *mid)
{
unsigned i;
for (i = 0; i < P2M_MID_PER_PAGE; i++)
mid[i] = virt_to_mfn(p2m_missing);
}
static void p2m_init(unsigned long *p2m)
{
unsigned i;
for (i = 0; i < P2M_MID_PER_PAGE; i++)
p2m[i] = INVALID_P2M_ENTRY;
}
/*
* Build the parallel p2m_top_mfn and p2m_mid_mfn structures
*
* This is called both at boot time, and after resuming from suspend:
* - At boot time we're called very early, and must use extend_brk()
* to allocate memory.
*
* - After resume we're called from within stop_machine, but the mfn
* tree should alreay be completely allocated.
*/
void xen_build_mfn_list_list(void)
{
unsigned long pfn;
/* Pre-initialize p2m_top_mfn to be completely missing */
if (p2m_top_mfn == NULL) {
p2m_mid_missing_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_mid_mfn_init(p2m_mid_missing_mfn);
p2m_top_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_top_mfn_p_init(p2m_top_mfn_p);
p2m_top_mfn = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_top_mfn_init(p2m_top_mfn);
} else {
/* Reinitialise, mfn's all change after migration */
p2m_mid_mfn_init(p2m_mid_missing_mfn);
}
for (pfn = 0; pfn < xen_max_p2m_pfn; pfn += P2M_PER_PAGE) {
unsigned topidx = p2m_top_index(pfn);
unsigned mididx = p2m_mid_index(pfn);
unsigned long **mid;
unsigned long *mid_mfn_p;
mid = p2m_top[topidx];
mid_mfn_p = p2m_top_mfn_p[topidx];
/* Don't bother allocating any mfn mid levels if
* they're just missing, just update the stored mfn,
* since all could have changed over a migrate.
*/
if (mid == p2m_mid_missing) {
BUG_ON(mididx);
BUG_ON(mid_mfn_p != p2m_mid_missing_mfn);
p2m_top_mfn[topidx] = virt_to_mfn(p2m_mid_missing_mfn);
pfn += (P2M_MID_PER_PAGE - 1) * P2M_PER_PAGE;
continue;
}
if (mid_mfn_p == p2m_mid_missing_mfn) {
/*
* XXX boot-time only! We should never find
* missing parts of the mfn tree after
* runtime. extend_brk() will BUG if we call
* it too late.
*/
mid_mfn_p = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_mid_mfn_init(mid_mfn_p);
p2m_top_mfn_p[topidx] = mid_mfn_p;
}
p2m_top_mfn[topidx] = virt_to_mfn(mid_mfn_p);
mid_mfn_p[mididx] = virt_to_mfn(mid[mididx]);
}
}
void xen_setup_mfn_list_list(void)
{
BUG_ON(HYPERVISOR_shared_info == &xen_dummy_shared_info);
HYPERVISOR_shared_info->arch.pfn_to_mfn_frame_list_list =
virt_to_mfn(p2m_top_mfn);
HYPERVISOR_shared_info->arch.max_pfn = xen_max_p2m_pfn;
}
/* Set up p2m_top to point to the domain-builder provided p2m pages */
void __init xen_build_dynamic_phys_to_machine(void)
{
unsigned long *mfn_list = (unsigned long *)xen_start_info->mfn_list;
unsigned long max_pfn = min(MAX_DOMAIN_PAGES, xen_start_info->nr_pages);
unsigned long pfn;
xen_max_p2m_pfn = max_pfn;
p2m_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_init(p2m_missing);
p2m_mid_missing = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_mid_init(p2m_mid_missing);
p2m_top = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_top_init(p2m_top);
/*
* The domain builder gives us a pre-constructed p2m array in
* mfn_list for all the pages initially given to us, so we just
* need to graft that into our tree structure.
*/
for (pfn = 0; pfn < max_pfn; pfn += P2M_PER_PAGE) {
unsigned topidx = p2m_top_index(pfn);
unsigned mididx = p2m_mid_index(pfn);
if (p2m_top[topidx] == p2m_mid_missing) {
unsigned long **mid = extend_brk(PAGE_SIZE, PAGE_SIZE);
p2m_mid_init(mid);
p2m_top[topidx] = mid;
}
p2m_top[topidx][mididx] = &mfn_list[pfn];
}
m2p_override_init();
}
unsigned long get_phys_to_machine(unsigned long pfn)
{
unsigned topidx, mididx, idx;
if (unlikely(pfn >= MAX_P2M_PFN))
return INVALID_P2M_ENTRY;
topidx = p2m_top_index(pfn);
mididx = p2m_mid_index(pfn);
idx = p2m_index(pfn);
return p2m_top[topidx][mididx][idx];
}
EXPORT_SYMBOL_GPL(get_phys_to_machine);
static void *alloc_p2m_page(void)
{
return (void *)__get_free_page(GFP_KERNEL | __GFP_REPEAT);
}
static void free_p2m_page(void *p)
{
free_page((unsigned long)p);
}
/*
* Fully allocate the p2m structure for a given pfn. We need to check
* that both the top and mid levels are allocated, and make sure the
* parallel mfn tree is kept in sync. We may race with other cpus, so
* the new pages are installed with cmpxchg; if we lose the race then
* simply free the page we allocated and use the one that's there.
*/
static bool alloc_p2m(unsigned long pfn)
{
unsigned topidx, mididx;
unsigned long ***top_p, **mid;
unsigned long *top_mfn_p, *mid_mfn;
topidx = p2m_top_index(pfn);
mididx = p2m_mid_index(pfn);
top_p = &p2m_top[topidx];
mid = *top_p;
if (mid == p2m_mid_missing) {
/* Mid level is missing, allocate a new one */
mid = alloc_p2m_page();
if (!mid)
return false;
p2m_mid_init(mid);
if (cmpxchg(top_p, p2m_mid_missing, mid) != p2m_mid_missing)
free_p2m_page(mid);
}
top_mfn_p = &p2m_top_mfn[topidx];
mid_mfn = p2m_top_mfn_p[topidx];
BUG_ON(virt_to_mfn(mid_mfn) != *top_mfn_p);
if (mid_mfn == p2m_mid_missing_mfn) {
/* Separately check the mid mfn level */
unsigned long missing_mfn;
unsigned long mid_mfn_mfn;
mid_mfn = alloc_p2m_page();
if (!mid_mfn)
return false;
p2m_mid_mfn_init(mid_mfn);
missing_mfn = virt_to_mfn(p2m_mid_missing_mfn);
mid_mfn_mfn = virt_to_mfn(mid_mfn);
if (cmpxchg(top_mfn_p, missing_mfn, mid_mfn_mfn) != missing_mfn)
free_p2m_page(mid_mfn);
else
p2m_top_mfn_p[topidx] = mid_mfn;
}
if (p2m_top[topidx][mididx] == p2m_missing) {
/* p2m leaf page is missing */
unsigned long *p2m;
p2m = alloc_p2m_page();
if (!p2m)
return false;
p2m_init(p2m);
if (cmpxchg(&mid[mididx], p2m_missing, p2m) != p2m_missing)
free_p2m_page(p2m);
else
mid_mfn[mididx] = virt_to_mfn(p2m);
}
return true;
}
/* Try to install p2m mapping; fail if intermediate bits missing */
bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn)
{
unsigned topidx, mididx, idx;
if (unlikely(pfn >= MAX_P2M_PFN)) {
BUG_ON(mfn != INVALID_P2M_ENTRY);
return true;
}
topidx = p2m_top_index(pfn);
mididx = p2m_mid_index(pfn);
idx = p2m_index(pfn);
if (p2m_top[topidx][mididx] == p2m_missing)
return mfn == INVALID_P2M_ENTRY;
p2m_top[topidx][mididx][idx] = mfn;
return true;
}
bool set_phys_to_machine(unsigned long pfn, unsigned long mfn)
{
if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
return true;
}
if (unlikely(!__set_phys_to_machine(pfn, mfn))) {
if (!alloc_p2m(pfn))
return false;
if (!__set_phys_to_machine(pfn, mfn))
return false;
}
return true;
}
#define M2P_OVERRIDE_HASH_SHIFT 10
#define M2P_OVERRIDE_HASH (1 << M2P_OVERRIDE_HASH_SHIFT)
static RESERVE_BRK_ARRAY(struct list_head, m2p_overrides, M2P_OVERRIDE_HASH);
static DEFINE_SPINLOCK(m2p_override_lock);
static void __init m2p_override_init(void)
{
unsigned i;
m2p_overrides = extend_brk(sizeof(*m2p_overrides) * M2P_OVERRIDE_HASH,
sizeof(unsigned long));
for (i = 0; i < M2P_OVERRIDE_HASH; i++)
INIT_LIST_HEAD(&m2p_overrides[i]);
}
static unsigned long mfn_hash(unsigned long mfn)
{
return hash_long(mfn, M2P_OVERRIDE_HASH_SHIFT);
}
/* Add an MFN override for a particular page */
int m2p_add_override(unsigned long mfn, struct page *page)
{
unsigned long flags;
unsigned long pfn;
unsigned long address;
unsigned level;
pte_t *ptep = NULL;
pfn = page_to_pfn(page);
if (!PageHighMem(page)) {
address = (unsigned long)__va(pfn << PAGE_SHIFT);
ptep = lookup_address(address, &level);
if (WARN(ptep == NULL || level != PG_LEVEL_4K,
"m2p_add_override: pfn %lx not mapped", pfn))
return -EINVAL;
}
page->private = mfn;
page->index = pfn_to_mfn(pfn);
__set_phys_to_machine(pfn, FOREIGN_FRAME(mfn));
if (!PageHighMem(page))
/* Just zap old mapping for now */
pte_clear(&init_mm, address, ptep);
spin_lock_irqsave(&m2p_override_lock, flags);
list_add(&page->lru, &m2p_overrides[mfn_hash(mfn)]);
spin_unlock_irqrestore(&m2p_override_lock, flags);
return 0;
}
int m2p_remove_override(struct page *page)
{
unsigned long flags;
unsigned long mfn;
unsigned long pfn;
unsigned long address;
unsigned level;
pte_t *ptep = NULL;
pfn = page_to_pfn(page);
mfn = get_phys_to_machine(pfn);
if (mfn == INVALID_P2M_ENTRY || !(mfn & FOREIGN_FRAME_BIT))
return -EINVAL;
if (!PageHighMem(page)) {
address = (unsigned long)__va(pfn << PAGE_SHIFT);
ptep = lookup_address(address, &level);
if (WARN(ptep == NULL || level != PG_LEVEL_4K,
"m2p_remove_override: pfn %lx not mapped", pfn))
return -EINVAL;
}
spin_lock_irqsave(&m2p_override_lock, flags);
list_del(&page->lru);
spin_unlock_irqrestore(&m2p_override_lock, flags);
__set_phys_to_machine(pfn, page->index);
if (!PageHighMem(page))
set_pte_at(&init_mm, address, ptep,
pfn_pte(pfn, PAGE_KERNEL));
/* No tlb flush necessary because the caller already
* left the pte unmapped. */
return 0;
}
struct page *m2p_find_override(unsigned long mfn)
{
unsigned long flags;
struct list_head *bucket = &m2p_overrides[mfn_hash(mfn)];
struct page *p, *ret;
ret = NULL;
spin_lock_irqsave(&m2p_override_lock, flags);
list_for_each_entry(p, bucket, lru) {
if (p->private == mfn) {
ret = p;
break;
}
}
spin_unlock_irqrestore(&m2p_override_lock, flags);
return ret;
}
unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn)
{
struct page *p = m2p_find_override(mfn);
unsigned long ret = pfn;
if (p)
ret = page_to_pfn(p);
return ret;
}
EXPORT_SYMBOL_GPL(m2p_find_override_pfn);

View File

@ -83,6 +83,9 @@
#define MADV_MERGEABLE 12 /* KSM may merge identical pages */
#define MADV_UNMERGEABLE 13 /* KSM may not merge identical pages */
#define MADV_HUGEPAGE 14 /* Worth backing with hugepages */
#define MADV_NOHUGEPAGE 15 /* Not worth backing with hugepages */
/* compatibility flags */
#define MAP_FILE 0

View File

@ -117,12 +117,21 @@ static ssize_t node_read_meminfo(struct sys_device * dev,
"Node %d WritebackTmp: %8lu kB\n"
"Node %d Slab: %8lu kB\n"
"Node %d SReclaimable: %8lu kB\n"
"Node %d SUnreclaim: %8lu kB\n",
"Node %d SUnreclaim: %8lu kB\n"
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
"Node %d AnonHugePages: %8lu kB\n"
#endif
,
nid, K(node_page_state(nid, NR_FILE_DIRTY)),
nid, K(node_page_state(nid, NR_WRITEBACK)),
nid, K(node_page_state(nid, NR_FILE_PAGES)),
nid, K(node_page_state(nid, NR_FILE_MAPPED)),
nid, K(node_page_state(nid, NR_ANON_PAGES)),
nid, K(node_page_state(nid, NR_ANON_PAGES)
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
HPAGE_PMD_NR
#endif
),
nid, K(node_page_state(nid, NR_SHMEM)),
nid, node_page_state(nid, NR_KERNEL_STACK) *
THREAD_SIZE / 1024,
@ -133,7 +142,13 @@ static ssize_t node_read_meminfo(struct sys_device * dev,
nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE) +
node_page_state(nid, NR_SLAB_UNRECLAIMABLE)),
nid, K(node_page_state(nid, NR_SLAB_RECLAIMABLE)),
nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE)));
nid, K(node_page_state(nid, NR_SLAB_UNRECLAIMABLE))
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
, nid,
K(node_page_state(nid, NR_ANON_TRANSPARENT_HUGEPAGES) *
HPAGE_PMD_NR)
#endif
);
n += hugetlb_report_node_meminfo(nid, buf + n);
return n;
}

View File

@ -240,6 +240,30 @@ config DM_MIRROR
Allow volume managers to mirror logical volumes, also
needed for live data migration tools such as 'pvmove'.
config DM_RAID
tristate "RAID 4/5/6 target (EXPERIMENTAL)"
depends on BLK_DEV_DM && EXPERIMENTAL
select MD_RAID456
select BLK_DEV_MD
---help---
A dm target that supports RAID4, RAID5 and RAID6 mappings
A RAID-5 set of N drives with a capacity of C MB per drive provides
the capacity of C * (N - 1) MB, and protects against a failure
of a single drive. For a given sector (row) number, (N - 1) drives
contain data sectors, and one drive contains the parity protection.
For a RAID-4 set, the parity blocks are present on a single drive,
while a RAID-5 set distributes the parity across the drives in one
of the available parity distribution methods.
A RAID-6 set of N drives with a capacity of C MB per drive
provides the capacity of C * (N - 2) MB, and protects
against a failure of any two drives. For a given sector
(row) number, (N - 2) drives contain data sectors, and two
drives contains two independent redundancy syndromes. Like
RAID-5, RAID-6 distributes the syndromes across the drives
in one of the available parity distribution methods.
config DM_LOG_USERSPACE
tristate "Mirror userspace logging (EXPERIMENTAL)"
depends on DM_MIRROR && EXPERIMENTAL && NET

View File

@ -36,6 +36,7 @@ obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o
obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o
obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o
obj-$(CONFIG_DM_ZERO) += dm-zero.o
obj-$(CONFIG_DM_RAID) += dm-raid.o
ifeq ($(CONFIG_DM_UEVENT),y)
dm-mod-objs += dm-uevent.o

View File

@ -210,11 +210,11 @@ static struct page *read_sb_page(mddev_t *mddev, loff_t offset,
|| test_bit(Faulty, &rdev->flags))
continue;
target = rdev->sb_start + offset + index * (PAGE_SIZE/512);
target = offset + index * (PAGE_SIZE/512);
if (sync_page_io(rdev, target,
roundup(size, bdev_logical_block_size(rdev->bdev)),
page, READ)) {
page, READ, true)) {
page->index = index;
attach_page_buffers(page, NULL); /* so that free_buffer will
* quietly no-op */
@ -264,14 +264,18 @@ static mdk_rdev_t *next_active_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
{
mdk_rdev_t *rdev = NULL;
struct block_device *bdev;
mddev_t *mddev = bitmap->mddev;
while ((rdev = next_active_rdev(rdev, mddev)) != NULL) {
int size = PAGE_SIZE;
loff_t offset = mddev->bitmap_info.offset;
bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev;
if (page->index == bitmap->file_pages-1)
size = roundup(bitmap->last_page_size,
bdev_logical_block_size(rdev->bdev));
bdev_logical_block_size(bdev));
/* Just make sure we aren't corrupting data or
* metadata
*/
@ -1542,7 +1546,7 @@ void bitmap_cond_end_sync(struct bitmap *bitmap, sector_t sector)
wait_event(bitmap->mddev->recovery_wait,
atomic_read(&bitmap->mddev->recovery_active) == 0);
bitmap->mddev->curr_resync_completed = bitmap->mddev->curr_resync;
bitmap->mddev->curr_resync_completed = sector;
set_bit(MD_CHANGE_CLEAN, &bitmap->mddev->flags);
sector &= ~((1ULL << CHUNK_BLOCK_SHIFT(bitmap)) - 1);
s = 0;

File diff suppressed because it is too large Load Diff

View File

@ -352,7 +352,7 @@ static int __init dm_delay_init(void)
{
int r = -ENOMEM;
kdelayd_wq = create_workqueue("kdelayd");
kdelayd_wq = alloc_workqueue("kdelayd", WQ_MEM_RECLAIM, 0);
if (!kdelayd_wq) {
DMERR("Couldn't start kdelayd");
goto bad_queue;

View File

@ -295,19 +295,55 @@ retry:
DMWARN("remove_all left %d open device(s)", dev_skipped);
}
/*
* Set the uuid of a hash_cell that isn't already set.
*/
static void __set_cell_uuid(struct hash_cell *hc, char *new_uuid)
{
mutex_lock(&dm_hash_cells_mutex);
hc->uuid = new_uuid;
mutex_unlock(&dm_hash_cells_mutex);
list_add(&hc->uuid_list, _uuid_buckets + hash_str(new_uuid));
}
/*
* Changes the name of a hash_cell and returns the old name for
* the caller to free.
*/
static char *__change_cell_name(struct hash_cell *hc, char *new_name)
{
char *old_name;
/*
* Rename and move the name cell.
*/
list_del(&hc->name_list);
old_name = hc->name;
mutex_lock(&dm_hash_cells_mutex);
hc->name = new_name;
mutex_unlock(&dm_hash_cells_mutex);
list_add(&hc->name_list, _name_buckets + hash_str(new_name));
return old_name;
}
static struct mapped_device *dm_hash_rename(struct dm_ioctl *param,
const char *new)
{
char *new_name, *old_name;
char *new_data, *old_name = NULL;
struct hash_cell *hc;
struct dm_table *table;
struct mapped_device *md;
unsigned change_uuid = (param->flags & DM_UUID_FLAG) ? 1 : 0;
/*
* duplicate new.
*/
new_name = kstrdup(new, GFP_KERNEL);
if (!new_name)
new_data = kstrdup(new, GFP_KERNEL);
if (!new_data)
return ERR_PTR(-ENOMEM);
down_write(&_hash_lock);
@ -315,13 +351,19 @@ static struct mapped_device *dm_hash_rename(struct dm_ioctl *param,
/*
* Is new free ?
*/
hc = __get_name_cell(new);
if (change_uuid)
hc = __get_uuid_cell(new);
else
hc = __get_name_cell(new);
if (hc) {
DMWARN("asked to rename to an already-existing name %s -> %s",
DMWARN("Unable to change %s on mapped device %s to one that "
"already exists: %s",
change_uuid ? "uuid" : "name",
param->name, new);
dm_put(hc->md);
up_write(&_hash_lock);
kfree(new_name);
kfree(new_data);
return ERR_PTR(-EBUSY);
}
@ -330,22 +372,30 @@ static struct mapped_device *dm_hash_rename(struct dm_ioctl *param,
*/
hc = __get_name_cell(param->name);
if (!hc) {
DMWARN("asked to rename a non-existent device %s -> %s",
param->name, new);
DMWARN("Unable to rename non-existent device, %s to %s%s",
param->name, change_uuid ? "uuid " : "", new);
up_write(&_hash_lock);
kfree(new_name);
kfree(new_data);
return ERR_PTR(-ENXIO);
}
/*
* rename and move the name cell.
* Does this device already have a uuid?
*/
list_del(&hc->name_list);
old_name = hc->name;
mutex_lock(&dm_hash_cells_mutex);
hc->name = new_name;
mutex_unlock(&dm_hash_cells_mutex);
list_add(&hc->name_list, _name_buckets + hash_str(new_name));
if (change_uuid && hc->uuid) {
DMWARN("Unable to change uuid of mapped device %s to %s "
"because uuid is already set to %s",
param->name, new, hc->uuid);
dm_put(hc->md);
up_write(&_hash_lock);
kfree(new_data);
return ERR_PTR(-EINVAL);
}
if (change_uuid)
__set_cell_uuid(hc, new_data);
else
old_name = __change_cell_name(hc, new_data);
/*
* Wake up any dm event waiters.
@ -729,7 +779,7 @@ static int dev_remove(struct dm_ioctl *param, size_t param_size)
hc = __find_device_hash_cell(param);
if (!hc) {
DMWARN("device doesn't appear to be in the dev hash table.");
DMDEBUG_LIMIT("device doesn't appear to be in the dev hash table.");
up_write(&_hash_lock);
return -ENXIO;
}
@ -741,7 +791,7 @@ static int dev_remove(struct dm_ioctl *param, size_t param_size)
*/
r = dm_lock_for_deletion(md);
if (r) {
DMWARN("unable to remove open device %s", hc->name);
DMDEBUG_LIMIT("unable to remove open device %s", hc->name);
up_write(&_hash_lock);
dm_put(md);
return r;
@ -774,21 +824,24 @@ static int invalid_str(char *str, void *end)
static int dev_rename(struct dm_ioctl *param, size_t param_size)
{
int r;
char *new_name = (char *) param + param->data_start;
char *new_data = (char *) param + param->data_start;
struct mapped_device *md;
unsigned change_uuid = (param->flags & DM_UUID_FLAG) ? 1 : 0;
if (new_name < param->data ||
invalid_str(new_name, (void *) param + param_size) ||
strlen(new_name) > DM_NAME_LEN - 1) {
DMWARN("Invalid new logical volume name supplied.");
if (new_data < param->data ||
invalid_str(new_data, (void *) param + param_size) ||
strlen(new_data) > (change_uuid ? DM_UUID_LEN - 1 : DM_NAME_LEN - 1)) {
DMWARN("Invalid new mapped device name or uuid string supplied.");
return -EINVAL;
}
r = check_name(new_name);
if (r)
return r;
if (!change_uuid) {
r = check_name(new_data);
if (r)
return r;
}
md = dm_hash_rename(param, new_name);
md = dm_hash_rename(param, new_data);
if (IS_ERR(md))
return PTR_ERR(md);
@ -885,7 +938,7 @@ static int do_resume(struct dm_ioctl *param)
hc = __find_device_hash_cell(param);
if (!hc) {
DMWARN("device doesn't appear to be in the dev hash table.");
DMDEBUG_LIMIT("device doesn't appear to be in the dev hash table.");
up_write(&_hash_lock);
return -ENXIO;
}
@ -1212,7 +1265,7 @@ static int table_clear(struct dm_ioctl *param, size_t param_size)
hc = __find_device_hash_cell(param);
if (!hc) {
DMWARN("device doesn't appear to be in the dev hash table.");
DMDEBUG_LIMIT("device doesn't appear to be in the dev hash table.");
up_write(&_hash_lock);
return -ENXIO;
}

View File

@ -37,6 +37,13 @@ struct dm_kcopyd_client {
unsigned int nr_pages;
unsigned int nr_free_pages;
/*
* Block devices to unplug.
* Non-NULL pointer means that a block device has some pending requests
* and needs to be unplugged.
*/
struct block_device *unplug[2];
struct dm_io_client *io_client;
wait_queue_head_t destroyq;
@ -308,6 +315,31 @@ static int run_complete_job(struct kcopyd_job *job)
return 0;
}
/*
* Unplug the block device at the specified index.
*/
static void unplug(struct dm_kcopyd_client *kc, int rw)
{
if (kc->unplug[rw] != NULL) {
blk_unplug(bdev_get_queue(kc->unplug[rw]));
kc->unplug[rw] = NULL;
}
}
/*
* Prepare block device unplug. If there's another device
* to be unplugged at the same array index, we unplug that
* device first.
*/
static void prepare_unplug(struct dm_kcopyd_client *kc, int rw,
struct block_device *bdev)
{
if (likely(kc->unplug[rw] == bdev))
return;
unplug(kc, rw);
kc->unplug[rw] = bdev;
}
static void complete_io(unsigned long error, void *context)
{
struct kcopyd_job *job = (struct kcopyd_job *) context;
@ -345,7 +377,7 @@ static int run_io_job(struct kcopyd_job *job)
{
int r;
struct dm_io_request io_req = {
.bi_rw = job->rw | REQ_SYNC | REQ_UNPLUG,
.bi_rw = job->rw,
.mem.type = DM_IO_PAGE_LIST,
.mem.ptr.pl = job->pages,
.mem.offset = job->offset,
@ -354,10 +386,16 @@ static int run_io_job(struct kcopyd_job *job)
.client = job->kc->io_client,
};
if (job->rw == READ)
if (job->rw == READ) {
r = dm_io(&io_req, 1, &job->source, NULL);
else
prepare_unplug(job->kc, READ, job->source.bdev);
} else {
if (job->num_dests > 1)
io_req.bi_rw |= REQ_UNPLUG;
r = dm_io(&io_req, job->num_dests, job->dests, NULL);
if (!(io_req.bi_rw & REQ_UNPLUG))
prepare_unplug(job->kc, WRITE, job->dests[0].bdev);
}
return r;
}
@ -435,10 +473,18 @@ static void do_work(struct work_struct *work)
* Pages jobs when successful will jump onto the io jobs
* list. io jobs call wake when they complete and it all
* starts again.
*
* Note that io_jobs add block devices to the unplug array,
* this array is cleared with "unplug" calls. It is thus
* forbidden to run complete_jobs after io_jobs and before
* unplug because the block device could be destroyed in
* job completion callback.
*/
process_jobs(&kc->complete_jobs, kc, run_complete_job);
process_jobs(&kc->pages_jobs, kc, run_pages_job);
process_jobs(&kc->io_jobs, kc, run_io_job);
unplug(kc, READ);
unplug(kc, WRITE);
}
/*
@ -619,12 +665,15 @@ int dm_kcopyd_client_create(unsigned int nr_pages,
INIT_LIST_HEAD(&kc->io_jobs);
INIT_LIST_HEAD(&kc->pages_jobs);
memset(kc->unplug, 0, sizeof(kc->unplug));
kc->job_pool = mempool_create_slab_pool(MIN_JOBS, _job_cache);
if (!kc->job_pool)
goto bad_slab;
INIT_WORK(&kc->kcopyd_work, do_work);
kc->kcopyd_wq = create_singlethread_workqueue("kcopyd");
kc->kcopyd_wq = alloc_workqueue("kcopyd",
WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0);
if (!kc->kcopyd_wq)
goto bad_workqueue;

View File

@ -12,12 +12,22 @@
#include "dm-log-userspace-transfer.h"
#define DM_LOG_USERSPACE_VSN "1.1.0"
struct flush_entry {
int type;
region_t region;
struct list_head list;
};
/*
* This limit on the number of mark and clear request is, to a degree,
* arbitrary. However, there is some basis for the choice in the limits
* imposed on the size of data payload by dm-log-userspace-transfer.c:
* dm_consult_userspace().
*/
#define MAX_FLUSH_GROUP_COUNT 32
struct log_c {
struct dm_target *ti;
uint32_t region_size;
@ -37,8 +47,15 @@ struct log_c {
*/
uint64_t in_sync_hint;
/*
* Mark and clear requests are held until a flush is issued
* so that we can group, and thereby limit, the amount of
* network traffic between kernel and userspace. The 'flush_lock'
* is used to protect these lists.
*/
spinlock_t flush_lock;
struct list_head flush_list; /* only for clear and mark requests */
struct list_head mark_list;
struct list_head clear_list;
};
static mempool_t *flush_entry_pool;
@ -169,7 +186,8 @@ static int userspace_ctr(struct dm_dirty_log *log, struct dm_target *ti,
strncpy(lc->uuid, argv[0], DM_UUID_LEN);
spin_lock_init(&lc->flush_lock);
INIT_LIST_HEAD(&lc->flush_list);
INIT_LIST_HEAD(&lc->mark_list);
INIT_LIST_HEAD(&lc->clear_list);
str_size = build_constructor_string(ti, argc - 1, argv + 1, &ctr_str);
if (str_size < 0) {
@ -181,8 +199,11 @@ static int userspace_ctr(struct dm_dirty_log *log, struct dm_target *ti,
r = dm_consult_userspace(lc->uuid, lc->luid, DM_ULOG_CTR,
ctr_str, str_size, NULL, NULL);
if (r == -ESRCH) {
DMERR("Userspace log server not found");
if (r < 0) {
if (r == -ESRCH)
DMERR("Userspace log server not found");
else
DMERR("Userspace log server failed to create log");
goto out;
}
@ -214,10 +235,9 @@ out:
static void userspace_dtr(struct dm_dirty_log *log)
{
int r;
struct log_c *lc = log->context;
r = dm_consult_userspace(lc->uuid, lc->luid, DM_ULOG_DTR,
(void) dm_consult_userspace(lc->uuid, lc->luid, DM_ULOG_DTR,
NULL, 0,
NULL, NULL);
@ -338,6 +358,71 @@ static int userspace_in_sync(struct dm_dirty_log *log, region_t region,
return (r) ? 0 : (int)in_sync;
}
static int flush_one_by_one(struct log_c *lc, struct list_head *flush_list)
{
int r = 0;
struct flush_entry *fe;
list_for_each_entry(fe, flush_list, list) {
r = userspace_do_request(lc, lc->uuid, fe->type,
(char *)&fe->region,
sizeof(fe->region),
NULL, NULL);
if (r)
break;
}
return r;
}
static int flush_by_group(struct log_c *lc, struct list_head *flush_list)
{
int r = 0;
int count;
uint32_t type = 0;
struct flush_entry *fe, *tmp_fe;
LIST_HEAD(tmp_list);
uint64_t group[MAX_FLUSH_GROUP_COUNT];
/*
* Group process the requests
*/
while (!list_empty(flush_list)) {
count = 0;
list_for_each_entry_safe(fe, tmp_fe, flush_list, list) {
group[count] = fe->region;
count++;
list_del(&fe->list);
list_add(&fe->list, &tmp_list);
type = fe->type;
if (count >= MAX_FLUSH_GROUP_COUNT)
break;
}
r = userspace_do_request(lc, lc->uuid, type,
(char *)(group),
count * sizeof(uint64_t),
NULL, NULL);
if (r) {
/* Group send failed. Attempt one-by-one. */
list_splice_init(&tmp_list, flush_list);
r = flush_one_by_one(lc, flush_list);
break;
}
}
/*
* Must collect flush_entrys that were successfully processed
* as a group so that they will be free'd by the caller.
*/
list_splice_init(&tmp_list, flush_list);
return r;
}
/*
* userspace_flush
*
@ -360,31 +445,25 @@ static int userspace_flush(struct dm_dirty_log *log)
int r = 0;
unsigned long flags;
struct log_c *lc = log->context;
LIST_HEAD(flush_list);
LIST_HEAD(mark_list);
LIST_HEAD(clear_list);
struct flush_entry *fe, *tmp_fe;
spin_lock_irqsave(&lc->flush_lock, flags);
list_splice_init(&lc->flush_list, &flush_list);
list_splice_init(&lc->mark_list, &mark_list);
list_splice_init(&lc->clear_list, &clear_list);
spin_unlock_irqrestore(&lc->flush_lock, flags);
if (list_empty(&flush_list))
if (list_empty(&mark_list) && list_empty(&clear_list))
return 0;
/*
* FIXME: Count up requests, group request types,
* allocate memory to stick all requests in and
* send to server in one go. Failing the allocation,
* do it one by one.
*/
r = flush_by_group(lc, &mark_list);
if (r)
goto fail;
list_for_each_entry(fe, &flush_list, list) {
r = userspace_do_request(lc, lc->uuid, fe->type,
(char *)&fe->region,
sizeof(fe->region),
NULL, NULL);
if (r)
goto fail;
}
r = flush_by_group(lc, &clear_list);
if (r)
goto fail;
r = userspace_do_request(lc, lc->uuid, DM_ULOG_FLUSH,
NULL, 0, NULL, NULL);
@ -395,7 +474,11 @@ fail:
* Calling code will receive an error and will know that
* the log facility has failed.
*/
list_for_each_entry_safe(fe, tmp_fe, &flush_list, list) {
list_for_each_entry_safe(fe, tmp_fe, &mark_list, list) {
list_del(&fe->list);
mempool_free(fe, flush_entry_pool);
}
list_for_each_entry_safe(fe, tmp_fe, &clear_list, list) {
list_del(&fe->list);
mempool_free(fe, flush_entry_pool);
}
@ -425,7 +508,7 @@ static void userspace_mark_region(struct dm_dirty_log *log, region_t region)
spin_lock_irqsave(&lc->flush_lock, flags);
fe->type = DM_ULOG_MARK_REGION;
fe->region = region;
list_add(&fe->list, &lc->flush_list);
list_add(&fe->list, &lc->mark_list);
spin_unlock_irqrestore(&lc->flush_lock, flags);
return;
@ -462,7 +545,7 @@ static void userspace_clear_region(struct dm_dirty_log *log, region_t region)
spin_lock_irqsave(&lc->flush_lock, flags);
fe->type = DM_ULOG_CLEAR_REGION;
fe->region = region;
list_add(&fe->list, &lc->flush_list);
list_add(&fe->list, &lc->clear_list);
spin_unlock_irqrestore(&lc->flush_lock, flags);
return;
@ -684,7 +767,7 @@ static int __init userspace_dirty_log_init(void)
return r;
}
DMINFO("version 1.0.0 loaded");
DMINFO("version " DM_LOG_USERSPACE_VSN " loaded");
return 0;
}
@ -694,7 +777,7 @@ static void __exit userspace_dirty_log_exit(void)
dm_ulog_tfr_exit();
mempool_destroy(flush_entry_pool);
DMINFO("version 1.0.0 unloaded");
DMINFO("version " DM_LOG_USERSPACE_VSN " unloaded");
return;
}

View File

@ -198,6 +198,7 @@ resend:
memset(tfr, 0, DM_ULOG_PREALLOCED_SIZE - sizeof(struct cn_msg));
memcpy(tfr->uuid, uuid, DM_UUID_LEN);
tfr->version = DM_ULOG_REQUEST_VERSION;
tfr->luid = luid;
tfr->seq = dm_ulog_seq++;

View File

@ -455,7 +455,7 @@ static int create_log_context(struct dm_dirty_log *log, struct dm_target *ti,
r = PTR_ERR(lc->io_req.client);
DMWARN("couldn't allocate disk io client");
kfree(lc);
return -ENOMEM;
return r;
}
lc->disk_header = vmalloc(buf_size);

View File

@ -23,6 +23,8 @@
#define DM_MSG_PREFIX "multipath"
#define MESG_STR(x) x, sizeof(x)
#define DM_PG_INIT_DELAY_MSECS 2000
#define DM_PG_INIT_DELAY_DEFAULT ((unsigned) -1)
/* Path properties */
struct pgpath {
@ -33,8 +35,7 @@ struct pgpath {
unsigned fail_count; /* Cumulative failure count */
struct dm_path path;
struct work_struct deactivate_path;
struct work_struct activate_path;
struct delayed_work activate_path;
};
#define path_to_pgpath(__pgp) container_of((__pgp), struct pgpath, path)
@ -65,11 +66,15 @@ struct multipath {
const char *hw_handler_name;
char *hw_handler_params;
unsigned nr_priority_groups;
struct list_head priority_groups;
wait_queue_head_t pg_init_wait; /* Wait for pg_init completion */
unsigned pg_init_required; /* pg_init needs calling? */
unsigned pg_init_in_progress; /* Only one pg_init allowed at once */
wait_queue_head_t pg_init_wait; /* Wait for pg_init completion */
unsigned pg_init_delay_retry; /* Delay pg_init retry? */
unsigned nr_valid_paths; /* Total number of usable paths */
struct pgpath *current_pgpath;
@ -82,6 +87,7 @@ struct multipath {
unsigned saved_queue_if_no_path;/* Saved state during suspension */
unsigned pg_init_retries; /* Number of times to retry pg_init */
unsigned pg_init_count; /* Number of times pg_init called */
unsigned pg_init_delay_msecs; /* Number of msecs before pg_init retry */
struct work_struct process_queued_ios;
struct list_head queued_ios;
@ -116,7 +122,6 @@ static struct workqueue_struct *kmultipathd, *kmpath_handlerd;
static void process_queued_ios(struct work_struct *work);
static void trigger_event(struct work_struct *work);
static void activate_path(struct work_struct *work);
static void deactivate_path(struct work_struct *work);
/*-----------------------------------------------
@ -129,8 +134,7 @@ static struct pgpath *alloc_pgpath(void)
if (pgpath) {
pgpath->is_active = 1;
INIT_WORK(&pgpath->deactivate_path, deactivate_path);
INIT_WORK(&pgpath->activate_path, activate_path);
INIT_DELAYED_WORK(&pgpath->activate_path, activate_path);
}
return pgpath;
@ -141,14 +145,6 @@ static void free_pgpath(struct pgpath *pgpath)
kfree(pgpath);
}
static void deactivate_path(struct work_struct *work)
{
struct pgpath *pgpath =
container_of(work, struct pgpath, deactivate_path);
blk_abort_queue(pgpath->path.dev->bdev->bd_disk->queue);
}
static struct priority_group *alloc_priority_group(void)
{
struct priority_group *pg;
@ -199,6 +195,7 @@ static struct multipath *alloc_multipath(struct dm_target *ti)
INIT_LIST_HEAD(&m->queued_ios);
spin_lock_init(&m->lock);
m->queue_io = 1;
m->pg_init_delay_msecs = DM_PG_INIT_DELAY_DEFAULT;
INIT_WORK(&m->process_queued_ios, process_queued_ios);
INIT_WORK(&m->trigger_event, trigger_event);
init_waitqueue_head(&m->pg_init_wait);
@ -238,14 +235,19 @@ static void free_multipath(struct multipath *m)
static void __pg_init_all_paths(struct multipath *m)
{
struct pgpath *pgpath;
unsigned long pg_init_delay = 0;
m->pg_init_count++;
m->pg_init_required = 0;
if (m->pg_init_delay_retry)
pg_init_delay = msecs_to_jiffies(m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT ?
m->pg_init_delay_msecs : DM_PG_INIT_DELAY_MSECS);
list_for_each_entry(pgpath, &m->current_pg->pgpaths, list) {
/* Skip failed paths */
if (!pgpath->is_active)
continue;
if (queue_work(kmpath_handlerd, &pgpath->activate_path))
if (queue_delayed_work(kmpath_handlerd, &pgpath->activate_path,
pg_init_delay))
m->pg_init_in_progress++;
}
}
@ -793,8 +795,9 @@ static int parse_features(struct arg_set *as, struct multipath *m)
const char *param_name;
static struct param _params[] = {
{0, 3, "invalid number of feature args"},
{0, 5, "invalid number of feature args"},
{1, 50, "pg_init_retries must be between 1 and 50"},
{0, 60000, "pg_init_delay_msecs must be between 0 and 60000"},
};
r = read_param(_params, shift(as), &argc, &ti->error);
@ -821,6 +824,14 @@ static int parse_features(struct arg_set *as, struct multipath *m)
continue;
}
if (!strnicmp(param_name, MESG_STR("pg_init_delay_msecs")) &&
(argc >= 1)) {
r = read_param(_params + 2, shift(as),
&m->pg_init_delay_msecs, &ti->error);
argc--;
continue;
}
ti->error = "Unrecognised multipath feature request";
r = -EINVAL;
} while (argc && !r);
@ -931,7 +942,7 @@ static void flush_multipath_work(struct multipath *m)
flush_workqueue(kmpath_handlerd);
multipath_wait_for_pg_init_completion(m);
flush_workqueue(kmultipathd);
flush_scheduled_work();
flush_work_sync(&m->trigger_event);
}
static void multipath_dtr(struct dm_target *ti)
@ -995,7 +1006,6 @@ static int fail_path(struct pgpath *pgpath)
pgpath->path.dev->name, m->nr_valid_paths);
schedule_work(&m->trigger_event);
queue_work(kmultipathd, &pgpath->deactivate_path);
out:
spin_unlock_irqrestore(&m->lock, flags);
@ -1034,7 +1044,7 @@ static int reinstate_path(struct pgpath *pgpath)
m->current_pgpath = NULL;
queue_work(kmultipathd, &m->process_queued_ios);
} else if (m->hw_handler_name && (m->current_pg == pgpath->pg)) {
if (queue_work(kmpath_handlerd, &pgpath->activate_path))
if (queue_work(kmpath_handlerd, &pgpath->activate_path.work))
m->pg_init_in_progress++;
}
@ -1169,6 +1179,7 @@ static void pg_init_done(void *data, int errors)
struct priority_group *pg = pgpath->pg;
struct multipath *m = pg->m;
unsigned long flags;
unsigned delay_retry = 0;
/* device or driver problems */
switch (errors) {
@ -1193,8 +1204,9 @@ static void pg_init_done(void *data, int errors)
*/
bypass_pg(m, pg, 1);
break;
/* TODO: For SCSI_DH_RETRY we should wait a couple seconds */
case SCSI_DH_RETRY:
/* Wait before retrying. */
delay_retry = 1;
case SCSI_DH_IMM_RETRY:
case SCSI_DH_RES_TEMP_UNAVAIL:
if (pg_init_limit_reached(m, pgpath))
@ -1227,6 +1239,7 @@ static void pg_init_done(void *data, int errors)
if (!m->pg_init_required)
m->queue_io = 0;
m->pg_init_delay_retry = delay_retry;
queue_work(kmultipathd, &m->process_queued_ios);
/*
@ -1241,7 +1254,7 @@ out:
static void activate_path(struct work_struct *work)
{
struct pgpath *pgpath =
container_of(work, struct pgpath, activate_path);
container_of(work, struct pgpath, activate_path.work);
scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
pg_init_done, pgpath);
@ -1382,11 +1395,14 @@ static int multipath_status(struct dm_target *ti, status_type_t type,
DMEMIT("2 %u %u ", m->queue_size, m->pg_init_count);
else {
DMEMIT("%u ", m->queue_if_no_path +
(m->pg_init_retries > 0) * 2);
(m->pg_init_retries > 0) * 2 +
(m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT) * 2);
if (m->queue_if_no_path)
DMEMIT("queue_if_no_path ");
if (m->pg_init_retries)
DMEMIT("pg_init_retries %u ", m->pg_init_retries);
if (m->pg_init_delay_msecs != DM_PG_INIT_DELAY_DEFAULT)
DMEMIT("pg_init_delay_msecs %u ", m->pg_init_delay_msecs);
}
if (!m->hw_handler_name || type == STATUSTYPE_INFO)
@ -1655,7 +1671,7 @@ out:
*---------------------------------------------------------------*/
static struct target_type multipath_target = {
.name = "multipath",
.version = {1, 1, 1},
.version = {1, 2, 0},
.module = THIS_MODULE,
.ctr = multipath_ctr,
.dtr = multipath_dtr,
@ -1687,7 +1703,7 @@ static int __init dm_multipath_init(void)
return -EINVAL;
}
kmultipathd = create_workqueue("kmpathd");
kmultipathd = alloc_workqueue("kmpathd", WQ_MEM_RECLAIM, 0);
if (!kmultipathd) {
DMERR("failed to create workqueue kmpathd");
dm_unregister_target(&multipath_target);
@ -1701,7 +1717,8 @@ static int __init dm_multipath_init(void)
* old workqueue would also create a bottleneck in the
* path of the storage hardware device activation.
*/
kmpath_handlerd = create_singlethread_workqueue("kmpath_handlerd");
kmpath_handlerd = alloc_ordered_workqueue("kmpath_handlerd",
WQ_MEM_RECLAIM);
if (!kmpath_handlerd) {
DMERR("failed to create workqueue kmpath_handlerd");
destroy_workqueue(kmultipathd);

697
drivers/md/dm-raid.c Normal file
View File

@ -0,0 +1,697 @@
/*
* Copyright (C) 2010-2011 Neil Brown
* Copyright (C) 2010-2011 Red Hat, Inc. All rights reserved.
*
* This file is released under the GPL.
*/
#include <linux/slab.h>
#include "md.h"
#include "raid5.h"
#include "dm.h"
#include "bitmap.h"
#define DM_MSG_PREFIX "raid"
/*
* If the MD doesn't support MD_SYNC_STATE_FORCED yet, then
* make it so the flag doesn't set anything.
*/
#ifndef MD_SYNC_STATE_FORCED
#define MD_SYNC_STATE_FORCED 0
#endif
struct raid_dev {
/*
* Two DM devices, one to hold metadata and one to hold the
* actual data/parity. The reason for this is to not confuse
* ti->len and give more flexibility in altering size and
* characteristics.
*
* While it is possible for this device to be associated
* with a different physical device than the data_dev, it
* is intended for it to be the same.
* |--------- Physical Device ---------|
* |- meta_dev -|------ data_dev ------|
*/
struct dm_dev *meta_dev;
struct dm_dev *data_dev;
struct mdk_rdev_s rdev;
};
/*
* Flags for rs->print_flags field.
*/
#define DMPF_DAEMON_SLEEP 0x1
#define DMPF_MAX_WRITE_BEHIND 0x2
#define DMPF_SYNC 0x4
#define DMPF_NOSYNC 0x8
#define DMPF_STRIPE_CACHE 0x10
#define DMPF_MIN_RECOVERY_RATE 0x20
#define DMPF_MAX_RECOVERY_RATE 0x40
struct raid_set {
struct dm_target *ti;
uint64_t print_flags;
struct mddev_s md;
struct raid_type *raid_type;
struct dm_target_callbacks callbacks;
struct raid_dev dev[0];
};
/* Supported raid types and properties. */
static struct raid_type {
const char *name; /* RAID algorithm. */
const char *descr; /* Descriptor text for logging. */
const unsigned parity_devs; /* # of parity devices. */
const unsigned minimal_devs; /* minimal # of devices in set. */
const unsigned level; /* RAID level. */
const unsigned algorithm; /* RAID algorithm. */
} raid_types[] = {
{"raid4", "RAID4 (dedicated parity disk)", 1, 2, 5, ALGORITHM_PARITY_0},
{"raid5_la", "RAID5 (left asymmetric)", 1, 2, 5, ALGORITHM_LEFT_ASYMMETRIC},
{"raid5_ra", "RAID5 (right asymmetric)", 1, 2, 5, ALGORITHM_RIGHT_ASYMMETRIC},
{"raid5_ls", "RAID5 (left symmetric)", 1, 2, 5, ALGORITHM_LEFT_SYMMETRIC},
{"raid5_rs", "RAID5 (right symmetric)", 1, 2, 5, ALGORITHM_RIGHT_SYMMETRIC},
{"raid6_zr", "RAID6 (zero restart)", 2, 4, 6, ALGORITHM_ROTATING_ZERO_RESTART},
{"raid6_nr", "RAID6 (N restart)", 2, 4, 6, ALGORITHM_ROTATING_N_RESTART},
{"raid6_nc", "RAID6 (N continue)", 2, 4, 6, ALGORITHM_ROTATING_N_CONTINUE}
};
static struct raid_type *get_raid_type(char *name)
{
int i;
for (i = 0; i < ARRAY_SIZE(raid_types); i++)
if (!strcmp(raid_types[i].name, name))
return &raid_types[i];
return NULL;
}
static struct raid_set *context_alloc(struct dm_target *ti, struct raid_type *raid_type, unsigned raid_devs)
{
unsigned i;
struct raid_set *rs;
sector_t sectors_per_dev;
if (raid_devs <= raid_type->parity_devs) {
ti->error = "Insufficient number of devices";
return ERR_PTR(-EINVAL);
}
sectors_per_dev = ti->len;
if (sector_div(sectors_per_dev, (raid_devs - raid_type->parity_devs))) {
ti->error = "Target length not divisible by number of data devices";
return ERR_PTR(-EINVAL);
}
rs = kzalloc(sizeof(*rs) + raid_devs * sizeof(rs->dev[0]), GFP_KERNEL);
if (!rs) {
ti->error = "Cannot allocate raid context";
return ERR_PTR(-ENOMEM);
}
mddev_init(&rs->md);
rs->ti = ti;
rs->raid_type = raid_type;
rs->md.raid_disks = raid_devs;
rs->md.level = raid_type->level;
rs->md.new_level = rs->md.level;
rs->md.dev_sectors = sectors_per_dev;
rs->md.layout = raid_type->algorithm;
rs->md.new_layout = rs->md.layout;
rs->md.delta_disks = 0;
rs->md.recovery_cp = 0;
for (i = 0; i < raid_devs; i++)
md_rdev_init(&rs->dev[i].rdev);
/*
* Remaining items to be initialized by further RAID params:
* rs->md.persistent
* rs->md.external
* rs->md.chunk_sectors
* rs->md.new_chunk_sectors
*/
return rs;
}
static void context_free(struct raid_set *rs)
{
int i;
for (i = 0; i < rs->md.raid_disks; i++)
if (rs->dev[i].data_dev)
dm_put_device(rs->ti, rs->dev[i].data_dev);
kfree(rs);
}
/*
* For every device we have two words
* <meta_dev>: meta device name or '-' if missing
* <data_dev>: data device name or '-' if missing
*
* This code parses those words.
*/
static int dev_parms(struct raid_set *rs, char **argv)
{
int i;
int rebuild = 0;
int metadata_available = 0;
int ret = 0;
for (i = 0; i < rs->md.raid_disks; i++, argv += 2) {
rs->dev[i].rdev.raid_disk = i;
rs->dev[i].meta_dev = NULL;
rs->dev[i].data_dev = NULL;
/*
* There are no offsets, since there is a separate device
* for data and metadata.
*/
rs->dev[i].rdev.data_offset = 0;
rs->dev[i].rdev.mddev = &rs->md;
if (strcmp(argv[0], "-")) {
rs->ti->error = "Metadata devices not supported";
return -EINVAL;
}
if (!strcmp(argv[1], "-")) {
if (!test_bit(In_sync, &rs->dev[i].rdev.flags) &&
(!rs->dev[i].rdev.recovery_offset)) {
rs->ti->error = "Drive designated for rebuild not specified";
return -EINVAL;
}
continue;
}
ret = dm_get_device(rs->ti, argv[1],
dm_table_get_mode(rs->ti->table),
&rs->dev[i].data_dev);
if (ret) {
rs->ti->error = "RAID device lookup failure";
return ret;
}
rs->dev[i].rdev.bdev = rs->dev[i].data_dev->bdev;
list_add(&rs->dev[i].rdev.same_set, &rs->md.disks);
if (!test_bit(In_sync, &rs->dev[i].rdev.flags))
rebuild++;
}
if (metadata_available) {
rs->md.external = 0;
rs->md.persistent = 1;
rs->md.major_version = 2;
} else if (rebuild && !rs->md.recovery_cp) {
/*
* Without metadata, we will not be able to tell if the array
* is in-sync or not - we must assume it is not. Therefore,
* it is impossible to rebuild a drive.
*
* Even if there is metadata, the on-disk information may
* indicate that the array is not in-sync and it will then
* fail at that time.
*
* User could specify 'nosync' option if desperate.
*/
DMERR("Unable to rebuild drive while array is not in-sync");
rs->ti->error = "RAID device lookup failure";
return -EINVAL;
}
return 0;
}
/*
* Possible arguments are...
* RAID456:
* <chunk_size> [optional_args]
*
* Optional args:
* [[no]sync] Force or prevent recovery of the entire array
* [rebuild <idx>] Rebuild the drive indicated by the index
* [daemon_sleep <ms>] Time between bitmap daemon work to clear bits
* [min_recovery_rate <kB/sec/disk>] Throttle RAID initialization
* [max_recovery_rate <kB/sec/disk>] Throttle RAID initialization
* [max_write_behind <sectors>] See '-write-behind=' (man mdadm)
* [stripe_cache <sectors>] Stripe cache size for higher RAIDs
*/
static int parse_raid_params(struct raid_set *rs, char **argv,
unsigned num_raid_params)
{
unsigned i, rebuild_cnt = 0;
unsigned long value;
char *key;
/*
* First, parse the in-order required arguments
*/
if ((strict_strtoul(argv[0], 10, &value) < 0) ||
!is_power_of_2(value) || (value < 8)) {
rs->ti->error = "Bad chunk size";
return -EINVAL;
}
rs->md.new_chunk_sectors = rs->md.chunk_sectors = value;
argv++;
num_raid_params--;
/*
* Second, parse the unordered optional arguments
*/
for (i = 0; i < rs->md.raid_disks; i++)
set_bit(In_sync, &rs->dev[i].rdev.flags);
for (i = 0; i < num_raid_params; i++) {
if (!strcmp(argv[i], "nosync")) {
rs->md.recovery_cp = MaxSector;
rs->print_flags |= DMPF_NOSYNC;
rs->md.flags |= MD_SYNC_STATE_FORCED;
continue;
}
if (!strcmp(argv[i], "sync")) {
rs->md.recovery_cp = 0;
rs->print_flags |= DMPF_SYNC;
rs->md.flags |= MD_SYNC_STATE_FORCED;
continue;
}
/* The rest of the optional arguments come in key/value pairs */
if ((i + 1) >= num_raid_params) {
rs->ti->error = "Wrong number of raid parameters given";
return -EINVAL;
}
key = argv[i++];
if (strict_strtoul(argv[i], 10, &value) < 0) {
rs->ti->error = "Bad numerical argument given in raid params";
return -EINVAL;
}
if (!strcmp(key, "rebuild")) {
if (++rebuild_cnt > rs->raid_type->parity_devs) {
rs->ti->error = "Too many rebuild drives given";
return -EINVAL;
}
if (value > rs->md.raid_disks) {
rs->ti->error = "Invalid rebuild index given";
return -EINVAL;
}
clear_bit(In_sync, &rs->dev[value].rdev.flags);
rs->dev[value].rdev.recovery_offset = 0;
} else if (!strcmp(key, "max_write_behind")) {
rs->print_flags |= DMPF_MAX_WRITE_BEHIND;
/*
* In device-mapper, we specify things in sectors, but
* MD records this value in kB
*/
value /= 2;
if (value > COUNTER_MAX) {
rs->ti->error = "Max write-behind limit out of range";
return -EINVAL;
}
rs->md.bitmap_info.max_write_behind = value;
} else if (!strcmp(key, "daemon_sleep")) {
rs->print_flags |= DMPF_DAEMON_SLEEP;
if (!value || (value > MAX_SCHEDULE_TIMEOUT)) {
rs->ti->error = "daemon sleep period out of range";
return -EINVAL;
}
rs->md.bitmap_info.daemon_sleep = value;
} else if (!strcmp(key, "stripe_cache")) {
rs->print_flags |= DMPF_STRIPE_CACHE;
/*
* In device-mapper, we specify things in sectors, but
* MD records this value in kB
*/
value /= 2;
if (rs->raid_type->level < 5) {
rs->ti->error = "Inappropriate argument: stripe_cache";
return -EINVAL;
}
if (raid5_set_cache_size(&rs->md, (int)value)) {
rs->ti->error = "Bad stripe_cache size";
return -EINVAL;
}
} else if (!strcmp(key, "min_recovery_rate")) {
rs->print_flags |= DMPF_MIN_RECOVERY_RATE;
if (value > INT_MAX) {
rs->ti->error = "min_recovery_rate out of range";
return -EINVAL;
}
rs->md.sync_speed_min = (int)value;
} else if (!strcmp(key, "max_recovery_rate")) {
rs->print_flags |= DMPF_MAX_RECOVERY_RATE;
if (value > INT_MAX) {
rs->ti->error = "max_recovery_rate out of range";
return -EINVAL;
}
rs->md.sync_speed_max = (int)value;
} else {
DMERR("Unable to parse RAID parameter: %s", key);
rs->ti->error = "Unable to parse RAID parameters";
return -EINVAL;
}
}
/* Assume there are no metadata devices until the drives are parsed */
rs->md.persistent = 0;
rs->md.external = 1;
return 0;
}
static void do_table_event(struct work_struct *ws)
{
struct raid_set *rs = container_of(ws, struct raid_set, md.event_work);
dm_table_event(rs->ti->table);
}
static int raid_is_congested(struct dm_target_callbacks *cb, int bits)
{
struct raid_set *rs = container_of(cb, struct raid_set, callbacks);
return md_raid5_congested(&rs->md, bits);
}
static void raid_unplug(struct dm_target_callbacks *cb)
{
struct raid_set *rs = container_of(cb, struct raid_set, callbacks);
md_raid5_unplug_device(rs->md.private);
}
/*
* Construct a RAID4/5/6 mapping:
* Args:
* <raid_type> <#raid_params> <raid_params> \
* <#raid_devs> { <meta_dev1> <dev1> .. <meta_devN> <devN> }
*
* ** metadata devices are not supported yet, use '-' instead **
*
* <raid_params> varies by <raid_type>. See 'parse_raid_params' for
* details on possible <raid_params>.
*/
static int raid_ctr(struct dm_target *ti, unsigned argc, char **argv)
{
int ret;
struct raid_type *rt;
unsigned long num_raid_params, num_raid_devs;
struct raid_set *rs = NULL;
/* Must have at least <raid_type> <#raid_params> */
if (argc < 2) {
ti->error = "Too few arguments";
return -EINVAL;
}
/* raid type */
rt = get_raid_type(argv[0]);
if (!rt) {
ti->error = "Unrecognised raid_type";
return -EINVAL;
}
argc--;
argv++;
/* number of RAID parameters */
if (strict_strtoul(argv[0], 10, &num_raid_params) < 0) {
ti->error = "Cannot understand number of RAID parameters";
return -EINVAL;
}
argc--;
argv++;
/* Skip over RAID params for now and find out # of devices */
if (num_raid_params + 1 > argc) {
ti->error = "Arguments do not agree with counts given";
return -EINVAL;
}
if ((strict_strtoul(argv[num_raid_params], 10, &num_raid_devs) < 0) ||
(num_raid_devs >= INT_MAX)) {
ti->error = "Cannot understand number of raid devices";
return -EINVAL;
}
rs = context_alloc(ti, rt, (unsigned)num_raid_devs);
if (IS_ERR(rs))
return PTR_ERR(rs);
ret = parse_raid_params(rs, argv, (unsigned)num_raid_params);
if (ret)
goto bad;
ret = -EINVAL;
argc -= num_raid_params + 1; /* +1: we already have num_raid_devs */
argv += num_raid_params + 1;
if (argc != (num_raid_devs * 2)) {
ti->error = "Supplied RAID devices does not match the count given";
goto bad;
}
ret = dev_parms(rs, argv);
if (ret)
goto bad;
INIT_WORK(&rs->md.event_work, do_table_event);
ti->split_io = rs->md.chunk_sectors;
ti->private = rs;
mutex_lock(&rs->md.reconfig_mutex);
ret = md_run(&rs->md);
rs->md.in_sync = 0; /* Assume already marked dirty */
mutex_unlock(&rs->md.reconfig_mutex);
if (ret) {
ti->error = "Fail to run raid array";
goto bad;
}
rs->callbacks.congested_fn = raid_is_congested;
rs->callbacks.unplug_fn = raid_unplug;
dm_table_add_target_callbacks(ti->table, &rs->callbacks);
return 0;
bad:
context_free(rs);
return ret;
}
static void raid_dtr(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
list_del_init(&rs->callbacks.list);
md_stop(&rs->md);
context_free(rs);
}
static int raid_map(struct dm_target *ti, struct bio *bio, union map_info *map_context)
{
struct raid_set *rs = ti->private;
mddev_t *mddev = &rs->md;
mddev->pers->make_request(mddev, bio);
return DM_MAPIO_SUBMITTED;
}
static int raid_status(struct dm_target *ti, status_type_t type,
char *result, unsigned maxlen)
{
struct raid_set *rs = ti->private;
unsigned raid_param_cnt = 1; /* at least 1 for chunksize */
unsigned sz = 0;
int i;
sector_t sync;
switch (type) {
case STATUSTYPE_INFO:
DMEMIT("%s %d ", rs->raid_type->name, rs->md.raid_disks);
for (i = 0; i < rs->md.raid_disks; i++) {
if (test_bit(Faulty, &rs->dev[i].rdev.flags))
DMEMIT("D");
else if (test_bit(In_sync, &rs->dev[i].rdev.flags))
DMEMIT("A");
else
DMEMIT("a");
}
if (test_bit(MD_RECOVERY_RUNNING, &rs->md.recovery))
sync = rs->md.curr_resync_completed;
else
sync = rs->md.recovery_cp;
if (sync > rs->md.resync_max_sectors)
sync = rs->md.resync_max_sectors;
DMEMIT(" %llu/%llu",
(unsigned long long) sync,
(unsigned long long) rs->md.resync_max_sectors);
break;
case STATUSTYPE_TABLE:
/* The string you would use to construct this array */
for (i = 0; i < rs->md.raid_disks; i++)
if (rs->dev[i].data_dev &&
!test_bit(In_sync, &rs->dev[i].rdev.flags))
raid_param_cnt++; /* for rebuilds */
raid_param_cnt += (hweight64(rs->print_flags) * 2);
if (rs->print_flags & (DMPF_SYNC | DMPF_NOSYNC))
raid_param_cnt--;
DMEMIT("%s %u %u", rs->raid_type->name,
raid_param_cnt, rs->md.chunk_sectors);
if ((rs->print_flags & DMPF_SYNC) &&
(rs->md.recovery_cp == MaxSector))
DMEMIT(" sync");
if (rs->print_flags & DMPF_NOSYNC)
DMEMIT(" nosync");
for (i = 0; i < rs->md.raid_disks; i++)
if (rs->dev[i].data_dev &&
!test_bit(In_sync, &rs->dev[i].rdev.flags))
DMEMIT(" rebuild %u", i);
if (rs->print_flags & DMPF_DAEMON_SLEEP)
DMEMIT(" daemon_sleep %lu",
rs->md.bitmap_info.daemon_sleep);
if (rs->print_flags & DMPF_MIN_RECOVERY_RATE)
DMEMIT(" min_recovery_rate %d", rs->md.sync_speed_min);
if (rs->print_flags & DMPF_MAX_RECOVERY_RATE)
DMEMIT(" max_recovery_rate %d", rs->md.sync_speed_max);
if (rs->print_flags & DMPF_MAX_WRITE_BEHIND)
DMEMIT(" max_write_behind %lu",
rs->md.bitmap_info.max_write_behind);
if (rs->print_flags & DMPF_STRIPE_CACHE) {
raid5_conf_t *conf = rs->md.private;
/* convert from kiB to sectors */
DMEMIT(" stripe_cache %d",
conf ? conf->max_nr_stripes * 2 : 0);
}
DMEMIT(" %d", rs->md.raid_disks);
for (i = 0; i < rs->md.raid_disks; i++) {
DMEMIT(" -"); /* metadata device */
if (rs->dev[i].data_dev)
DMEMIT(" %s", rs->dev[i].data_dev->name);
else
DMEMIT(" -");
}
}
return 0;
}
static int raid_iterate_devices(struct dm_target *ti, iterate_devices_callout_fn fn, void *data)
{
struct raid_set *rs = ti->private;
unsigned i;
int ret = 0;
for (i = 0; !ret && i < rs->md.raid_disks; i++)
if (rs->dev[i].data_dev)
ret = fn(ti,
rs->dev[i].data_dev,
0, /* No offset on data devs */
rs->md.dev_sectors,
data);
return ret;
}
static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
{
struct raid_set *rs = ti->private;
unsigned chunk_size = rs->md.chunk_sectors << 9;
raid5_conf_t *conf = rs->md.private;
blk_limits_io_min(limits, chunk_size);
blk_limits_io_opt(limits, chunk_size * (conf->raid_disks - conf->max_degraded));
}
static void raid_presuspend(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
md_stop_writes(&rs->md);
}
static void raid_postsuspend(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
mddev_suspend(&rs->md);
}
static void raid_resume(struct dm_target *ti)
{
struct raid_set *rs = ti->private;
mddev_resume(&rs->md);
}
static struct target_type raid_target = {
.name = "raid",
.version = {1, 0, 0},
.module = THIS_MODULE,
.ctr = raid_ctr,
.dtr = raid_dtr,
.map = raid_map,
.status = raid_status,
.iterate_devices = raid_iterate_devices,
.io_hints = raid_io_hints,
.presuspend = raid_presuspend,
.postsuspend = raid_postsuspend,
.resume = raid_resume,
};
static int __init dm_raid_init(void)
{
return dm_register_target(&raid_target);
}
static void __exit dm_raid_exit(void)
{
dm_unregister_target(&raid_target);
}
module_init(dm_raid_init);
module_exit(dm_raid_exit);
MODULE_DESCRIPTION(DM_NAME " raid4/5/6 target");
MODULE_ALIAS("dm-raid4");
MODULE_ALIAS("dm-raid5");
MODULE_ALIAS("dm-raid6");
MODULE_AUTHOR("Neil Brown <dm-devel@redhat.com>");
MODULE_LICENSE("GPL");

View File

@ -261,7 +261,7 @@ static int mirror_flush(struct dm_target *ti)
struct dm_io_request io_req = {
.bi_rw = WRITE_FLUSH,
.mem.type = DM_IO_KMEM,
.mem.ptr.bvec = NULL,
.mem.ptr.addr = NULL,
.client = ms->io_client,
};
@ -637,6 +637,12 @@ static void do_write(struct mirror_set *ms, struct bio *bio)
.client = ms->io_client,
};
if (bio->bi_rw & REQ_DISCARD) {
io_req.bi_rw |= REQ_DISCARD;
io_req.mem.type = DM_IO_KMEM;
io_req.mem.ptr.addr = NULL;
}
for (i = 0, m = ms->mirror; i < ms->nr_mirrors; i++, m++)
map_region(dest++, m, bio);
@ -670,7 +676,8 @@ static void do_writes(struct mirror_set *ms, struct bio_list *writes)
bio_list_init(&requeue);
while ((bio = bio_list_pop(writes))) {
if (bio->bi_rw & REQ_FLUSH) {
if ((bio->bi_rw & REQ_FLUSH) ||
(bio->bi_rw & REQ_DISCARD)) {
bio_list_add(&sync, bio);
continue;
}
@ -1076,8 +1083,10 @@ static int mirror_ctr(struct dm_target *ti, unsigned int argc, char **argv)
ti->private = ms;
ti->split_io = dm_rh_get_region_size(ms->rh);
ti->num_flush_requests = 1;
ti->num_discard_requests = 1;
ms->kmirrord_wq = create_singlethread_workqueue("kmirrord");
ms->kmirrord_wq = alloc_workqueue("kmirrord",
WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0);
if (!ms->kmirrord_wq) {
DMERR("couldn't start kmirrord");
r = -ENOMEM;
@ -1130,7 +1139,7 @@ static void mirror_dtr(struct dm_target *ti)
del_timer_sync(&ms->timer);
flush_workqueue(ms->kmirrord_wq);
flush_scheduled_work();
flush_work_sync(&ms->trigger_event);
dm_kcopyd_client_destroy(ms->kcopyd_client);
destroy_workqueue(ms->kmirrord_wq);
free_context(ms, ti, ms->nr_mirrors);
@ -1406,7 +1415,7 @@ static int mirror_iterate_devices(struct dm_target *ti,
static struct target_type mirror_target = {
.name = "mirror",
.version = {1, 12, 0},
.version = {1, 12, 1},
.module = THIS_MODULE,
.ctr = mirror_ctr,
.dtr = mirror_dtr,

View File

@ -256,7 +256,7 @@ static int chunk_io(struct pstore *ps, void *area, chunk_t chunk, int rw,
*/
INIT_WORK_ONSTACK(&req.work, do_metadata);
queue_work(ps->metadata_wq, &req.work);
flush_workqueue(ps->metadata_wq);
flush_work(&req.work);
return req.result;
}
@ -818,7 +818,7 @@ static int persistent_ctr(struct dm_exception_store *store,
atomic_set(&ps->pending_count, 0);
ps->callbacks = NULL;
ps->metadata_wq = create_singlethread_workqueue("ksnaphd");
ps->metadata_wq = alloc_workqueue("ksnaphd", WQ_MEM_RECLAIM, 0);
if (!ps->metadata_wq) {
kfree(ps);
DMERR("couldn't start header metadata update thread");

View File

@ -19,7 +19,6 @@
#include <linux/vmalloc.h>
#include <linux/log2.h>
#include <linux/dm-kcopyd.h>
#include <linux/workqueue.h>
#include "dm-exception-store.h"
@ -80,9 +79,6 @@ struct dm_snapshot {
/* Origin writes don't trigger exceptions until this is set */
int active;
/* Whether or not owning mapped_device is suspended */
int suspended;
atomic_t pending_exceptions_count;
mempool_t *pending_pool;
@ -106,10 +102,6 @@ struct dm_snapshot {
struct dm_kcopyd_client *kcopyd_client;
/* Queue of snapshot writes for ksnapd to flush */
struct bio_list queued_bios;
struct work_struct queued_bios_work;
/* Wait for events based on state_bits */
unsigned long state_bits;
@ -160,9 +152,6 @@ struct dm_dev *dm_snap_cow(struct dm_snapshot *s)
}
EXPORT_SYMBOL(dm_snap_cow);
static struct workqueue_struct *ksnapd;
static void flush_queued_bios(struct work_struct *work);
static sector_t chunk_to_sector(struct dm_exception_store *store,
chunk_t chunk)
{
@ -1110,7 +1099,6 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
s->ti = ti;
s->valid = 1;
s->active = 0;
s->suspended = 0;
atomic_set(&s->pending_exceptions_count, 0);
init_rwsem(&s->lock);
INIT_LIST_HEAD(&s->list);
@ -1153,9 +1141,6 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
spin_lock_init(&s->tracked_chunk_lock);
bio_list_init(&s->queued_bios);
INIT_WORK(&s->queued_bios_work, flush_queued_bios);
ti->private = s;
ti->num_flush_requests = num_flush_requests;
@ -1279,8 +1264,6 @@ static void snapshot_dtr(struct dm_target *ti)
struct dm_snapshot *s = ti->private;
struct dm_snapshot *snap_src = NULL, *snap_dest = NULL;
flush_workqueue(ksnapd);
down_read(&_origins_lock);
/* Check whether exception handover must be cancelled */
(void) __find_snapshots_sharing_cow(s, &snap_src, &snap_dest, NULL);
@ -1342,20 +1325,6 @@ static void flush_bios(struct bio *bio)
}
}
static void flush_queued_bios(struct work_struct *work)
{
struct dm_snapshot *s =
container_of(work, struct dm_snapshot, queued_bios_work);
struct bio *queued_bios;
unsigned long flags;
spin_lock_irqsave(&s->pe_lock, flags);
queued_bios = bio_list_get(&s->queued_bios);
spin_unlock_irqrestore(&s->pe_lock, flags);
flush_bios(queued_bios);
}
static int do_origin(struct dm_dev *origin, struct bio *bio);
/*
@ -1760,15 +1729,6 @@ static void snapshot_merge_presuspend(struct dm_target *ti)
stop_merge(s);
}
static void snapshot_postsuspend(struct dm_target *ti)
{
struct dm_snapshot *s = ti->private;
down_write(&s->lock);
s->suspended = 1;
up_write(&s->lock);
}
static int snapshot_preresume(struct dm_target *ti)
{
int r = 0;
@ -1783,7 +1743,7 @@ static int snapshot_preresume(struct dm_target *ti)
DMERR("Unable to resume snapshot source until "
"handover completes.");
r = -EINVAL;
} else if (!snap_src->suspended) {
} else if (!dm_suspended(snap_src->ti)) {
DMERR("Unable to perform snapshot handover until "
"source is suspended.");
r = -EINVAL;
@ -1816,7 +1776,6 @@ static void snapshot_resume(struct dm_target *ti)
down_write(&s->lock);
s->active = 1;
s->suspended = 0;
up_write(&s->lock);
}
@ -2194,7 +2153,7 @@ static int origin_iterate_devices(struct dm_target *ti,
static struct target_type origin_target = {
.name = "snapshot-origin",
.version = {1, 7, 0},
.version = {1, 7, 1},
.module = THIS_MODULE,
.ctr = origin_ctr,
.dtr = origin_dtr,
@ -2207,13 +2166,12 @@ static struct target_type origin_target = {
static struct target_type snapshot_target = {
.name = "snapshot",
.version = {1, 9, 0},
.version = {1, 10, 0},
.module = THIS_MODULE,
.ctr = snapshot_ctr,
.dtr = snapshot_dtr,
.map = snapshot_map,
.end_io = snapshot_end_io,
.postsuspend = snapshot_postsuspend,
.preresume = snapshot_preresume,
.resume = snapshot_resume,
.status = snapshot_status,
@ -2222,14 +2180,13 @@ static struct target_type snapshot_target = {
static struct target_type merge_target = {
.name = dm_snapshot_merge_target_name,
.version = {1, 0, 0},
.version = {1, 1, 0},
.module = THIS_MODULE,
.ctr = snapshot_ctr,
.dtr = snapshot_dtr,
.map = snapshot_merge_map,
.end_io = snapshot_end_io,
.presuspend = snapshot_merge_presuspend,
.postsuspend = snapshot_postsuspend,
.preresume = snapshot_preresume,
.resume = snapshot_merge_resume,
.status = snapshot_status,
@ -2291,17 +2248,8 @@ static int __init dm_snapshot_init(void)
goto bad_tracked_chunk_cache;
}
ksnapd = create_singlethread_workqueue("ksnapd");
if (!ksnapd) {
DMERR("Failed to create ksnapd workqueue.");
r = -ENOMEM;
goto bad_pending_pool;
}
return 0;
bad_pending_pool:
kmem_cache_destroy(tracked_chunk_cache);
bad_tracked_chunk_cache:
kmem_cache_destroy(pending_cache);
bad_pending_cache:
@ -2322,8 +2270,6 @@ bad_register_snapshot_target:
static void __exit dm_snapshot_exit(void)
{
destroy_workqueue(ksnapd);
dm_unregister_target(&snapshot_target);
dm_unregister_target(&origin_target);
dm_unregister_target(&merge_target);

View File

@ -39,23 +39,20 @@ struct stripe_c {
struct dm_target *ti;
/* Work struct used for triggering events*/
struct work_struct kstriped_ws;
struct work_struct trigger_event;
struct stripe stripe[0];
};
static struct workqueue_struct *kstriped;
/*
* An event is triggered whenever a drive
* drops out of a stripe volume.
*/
static void trigger_event(struct work_struct *work)
{
struct stripe_c *sc = container_of(work, struct stripe_c, kstriped_ws);
struct stripe_c *sc = container_of(work, struct stripe_c,
trigger_event);
dm_table_event(sc->ti->table);
}
static inline struct stripe_c *alloc_context(unsigned int stripes)
@ -160,7 +157,7 @@ static int stripe_ctr(struct dm_target *ti, unsigned int argc, char **argv)
return -ENOMEM;
}
INIT_WORK(&sc->kstriped_ws, trigger_event);
INIT_WORK(&sc->trigger_event, trigger_event);
/* Set pointer to dm target; used in trigger_event */
sc->ti = ti;
@ -211,7 +208,7 @@ static void stripe_dtr(struct dm_target *ti)
for (i = 0; i < sc->stripes; i++)
dm_put_device(ti, sc->stripe[i].dev);
flush_workqueue(kstriped);
flush_work_sync(&sc->trigger_event);
kfree(sc);
}
@ -367,7 +364,7 @@ static int stripe_end_io(struct dm_target *ti, struct bio *bio,
atomic_inc(&(sc->stripe[i].error_count));
if (atomic_read(&(sc->stripe[i].error_count)) <
DM_IO_ERROR_THRESHOLD)
queue_work(kstriped, &sc->kstriped_ws);
schedule_work(&sc->trigger_event);
}
return error;
@ -401,7 +398,7 @@ static void stripe_io_hints(struct dm_target *ti,
static struct target_type stripe_target = {
.name = "striped",
.version = {1, 3, 0},
.version = {1, 3, 1},
.module = THIS_MODULE,
.ctr = stripe_ctr,
.dtr = stripe_dtr,
@ -422,20 +419,10 @@ int __init dm_stripe_init(void)
return r;
}
kstriped = create_singlethread_workqueue("kstriped");
if (!kstriped) {
DMERR("failed to create workqueue kstriped");
dm_unregister_target(&stripe_target);
return -ENOMEM;
}
return r;
}
void dm_stripe_exit(void)
{
dm_unregister_target(&stripe_target);
destroy_workqueue(kstriped);
return;
}

View File

@ -71,6 +71,8 @@ struct dm_table {
void *event_context;
struct dm_md_mempools *mempools;
struct list_head target_callbacks;
};
/*
@ -204,6 +206,7 @@ int dm_table_create(struct dm_table **result, fmode_t mode,
return -ENOMEM;
INIT_LIST_HEAD(&t->devices);
INIT_LIST_HEAD(&t->target_callbacks);
atomic_set(&t->holders, 0);
t->discards_supported = 1;
@ -1225,10 +1228,17 @@ int dm_table_resume_targets(struct dm_table *t)
return 0;
}
void dm_table_add_target_callbacks(struct dm_table *t, struct dm_target_callbacks *cb)
{
list_add(&cb->list, &t->target_callbacks);
}
EXPORT_SYMBOL_GPL(dm_table_add_target_callbacks);
int dm_table_any_congested(struct dm_table *t, int bdi_bits)
{
struct dm_dev_internal *dd;
struct list_head *devices = dm_table_get_devices(t);
struct dm_target_callbacks *cb;
int r = 0;
list_for_each_entry(dd, devices, list) {
@ -1243,6 +1253,10 @@ int dm_table_any_congested(struct dm_table *t, int bdi_bits)
bdevname(dd->dm_dev.bdev, b));
}
list_for_each_entry(cb, &t->target_callbacks, list)
if (cb->congested_fn)
r |= cb->congested_fn(cb, bdi_bits);
return r;
}
@ -1264,6 +1278,7 @@ void dm_table_unplug_all(struct dm_table *t)
{
struct dm_dev_internal *dd;
struct list_head *devices = dm_table_get_devices(t);
struct dm_target_callbacks *cb;
list_for_each_entry(dd, devices, list) {
struct request_queue *q = bdev_get_queue(dd->dm_dev.bdev);
@ -1276,6 +1291,10 @@ void dm_table_unplug_all(struct dm_table *t)
dm_device_name(t->md),
bdevname(dd->dm_dev.bdev, b));
}
list_for_each_entry(cb, &t->target_callbacks, list)
if (cb->unplug_fn)
cb->unplug_fn(cb);
}
struct mapped_device *dm_table_get_md(struct dm_table *t)

View File

@ -32,7 +32,6 @@
#define DM_COOKIE_ENV_VAR_NAME "DM_COOKIE"
#define DM_COOKIE_LENGTH 24
static DEFINE_MUTEX(dm_mutex);
static const char *_name = DM_NAME;
static unsigned int major = 0;
@ -328,7 +327,6 @@ static int dm_blk_open(struct block_device *bdev, fmode_t mode)
{
struct mapped_device *md;
mutex_lock(&dm_mutex);
spin_lock(&_minor_lock);
md = bdev->bd_disk->private_data;
@ -346,7 +344,6 @@ static int dm_blk_open(struct block_device *bdev, fmode_t mode)
out:
spin_unlock(&_minor_lock);
mutex_unlock(&dm_mutex);
return md ? 0 : -ENXIO;
}
@ -355,10 +352,12 @@ static int dm_blk_close(struct gendisk *disk, fmode_t mode)
{
struct mapped_device *md = disk->private_data;
mutex_lock(&dm_mutex);
spin_lock(&_minor_lock);
atomic_dec(&md->open_count);
dm_put(md);
mutex_unlock(&dm_mutex);
spin_unlock(&_minor_lock);
return 0;
}
@ -1638,13 +1637,15 @@ static void dm_request_fn(struct request_queue *q)
if (map_request(ti, clone, md))
goto requeued;
spin_lock_irq(q->queue_lock);
BUG_ON(!irqs_disabled());
spin_lock(q->queue_lock);
}
goto out;
requeued:
spin_lock_irq(q->queue_lock);
BUG_ON(!irqs_disabled());
spin_lock(q->queue_lock);
plug_and_out:
if (!elv_queue_empty(q))
@ -1884,7 +1885,8 @@ static struct mapped_device *alloc_dev(int minor)
add_disk(md->disk);
format_dev_t(md->name, MKDEV(_major, minor));
md->wq = create_singlethread_workqueue("kdmflush");
md->wq = alloc_workqueue("kdmflush",
WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0);
if (!md->wq)
goto bad_thread;
@ -1992,13 +1994,14 @@ static void event_callback(void *context)
wake_up(&md->eventq);
}
/*
* Protected by md->suspend_lock obtained by dm_swap_table().
*/
static void __set_size(struct mapped_device *md, sector_t size)
{
set_capacity(md->disk, size);
mutex_lock(&md->bdev->bd_inode->i_mutex);
i_size_write(md->bdev->bd_inode, (loff_t)size << SECTOR_SHIFT);
mutex_unlock(&md->bdev->bd_inode->i_mutex);
}
/*

View File

@ -288,10 +288,12 @@ static int md_make_request(struct request_queue *q, struct bio *bio)
int rv;
int cpu;
if (mddev == NULL || mddev->pers == NULL) {
if (mddev == NULL || mddev->pers == NULL
|| !mddev->ready) {
bio_io_error(bio);
return 0;
}
smp_rmb(); /* Ensure implications of 'active' are visible */
rcu_read_lock();
if (mddev->suspended) {
DEFINE_WAIT(__wait);
@ -703,9 +705,9 @@ static struct mdk_personality *find_pers(int level, char *clevel)
}
/* return the offset of the super block in 512byte sectors */
static inline sector_t calc_dev_sboffset(struct block_device *bdev)
static inline sector_t calc_dev_sboffset(mdk_rdev_t *rdev)
{
sector_t num_sectors = i_size_read(bdev->bd_inode) / 512;
sector_t num_sectors = i_size_read(rdev->bdev->bd_inode) / 512;
return MD_NEW_SIZE_SECTORS(num_sectors);
}
@ -763,7 +765,7 @@ void md_super_write(mddev_t *mddev, mdk_rdev_t *rdev,
*/
struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, mddev);
bio->bi_bdev = rdev->bdev;
bio->bi_bdev = rdev->meta_bdev ? rdev->meta_bdev : rdev->bdev;
bio->bi_sector = sector;
bio_add_page(bio, page, size, 0);
bio->bi_private = rdev;
@ -793,7 +795,7 @@ static void bi_complete(struct bio *bio, int error)
}
int sync_page_io(mdk_rdev_t *rdev, sector_t sector, int size,
struct page *page, int rw)
struct page *page, int rw, bool metadata_op)
{
struct bio *bio = bio_alloc_mddev(GFP_NOIO, 1, rdev->mddev);
struct completion event;
@ -801,8 +803,12 @@ int sync_page_io(mdk_rdev_t *rdev, sector_t sector, int size,
rw |= REQ_SYNC | REQ_UNPLUG;
bio->bi_bdev = rdev->bdev;
bio->bi_sector = sector;
bio->bi_bdev = (metadata_op && rdev->meta_bdev) ?
rdev->meta_bdev : rdev->bdev;
if (metadata_op)
bio->bi_sector = sector + rdev->sb_start;
else
bio->bi_sector = sector + rdev->data_offset;
bio_add_page(bio, page, size, 0);
init_completion(&event);
bio->bi_private = &event;
@ -827,7 +833,7 @@ static int read_disk_sb(mdk_rdev_t * rdev, int size)
return 0;
if (!sync_page_io(rdev, rdev->sb_start, size, rdev->sb_page, READ))
if (!sync_page_io(rdev, 0, size, rdev->sb_page, READ, true))
goto fail;
rdev->sb_loaded = 1;
return 0;
@ -989,7 +995,7 @@ static int super_90_load(mdk_rdev_t *rdev, mdk_rdev_t *refdev, int minor_version
*
* It also happens to be a multiple of 4Kb.
*/
rdev->sb_start = calc_dev_sboffset(rdev->bdev);
rdev->sb_start = calc_dev_sboffset(rdev);
ret = read_disk_sb(rdev, MD_SB_BYTES);
if (ret) return ret;
@ -1330,7 +1336,7 @@ super_90_rdev_size_change(mdk_rdev_t *rdev, sector_t num_sectors)
return 0; /* component must fit device */
if (rdev->mddev->bitmap_info.offset)
return 0; /* can't move bitmap */
rdev->sb_start = calc_dev_sboffset(rdev->bdev);
rdev->sb_start = calc_dev_sboffset(rdev);
if (!num_sectors || num_sectors > rdev->sb_start)
num_sectors = rdev->sb_start;
md_super_write(rdev->mddev, rdev, rdev->sb_start, rdev->sb_size,
@ -2465,6 +2471,10 @@ slot_store(mdk_rdev_t *rdev, const char *buf, size_t len)
if (rdev2->raid_disk == slot)
return -EEXIST;
if (slot >= rdev->mddev->raid_disks &&
slot >= rdev->mddev->raid_disks + rdev->mddev->delta_disks)
return -ENOSPC;
rdev->raid_disk = slot;
if (test_bit(In_sync, &rdev->flags))
rdev->saved_raid_disk = slot;
@ -2482,7 +2492,8 @@ slot_store(mdk_rdev_t *rdev, const char *buf, size_t len)
/* failure here is OK */;
/* don't wakeup anyone, leave that to userspace. */
} else {
if (slot >= rdev->mddev->raid_disks)
if (slot >= rdev->mddev->raid_disks &&
slot >= rdev->mddev->raid_disks + rdev->mddev->delta_disks)
return -ENOSPC;
rdev->raid_disk = slot;
/* assume it is working */
@ -3107,7 +3118,7 @@ level_store(mddev_t *mddev, const char *buf, size_t len)
char nm[20];
if (rdev->raid_disk < 0)
continue;
if (rdev->new_raid_disk > mddev->raid_disks)
if (rdev->new_raid_disk >= mddev->raid_disks)
rdev->new_raid_disk = -1;
if (rdev->new_raid_disk == rdev->raid_disk)
continue;
@ -3736,6 +3747,8 @@ action_show(mddev_t *mddev, char *page)
return sprintf(page, "%s\n", type);
}
static void reap_sync_thread(mddev_t *mddev);
static ssize_t
action_store(mddev_t *mddev, const char *page, size_t len)
{
@ -3750,9 +3763,7 @@ action_store(mddev_t *mddev, const char *page, size_t len)
if (cmd_match(page, "idle") || cmd_match(page, "frozen")) {
if (mddev->sync_thread) {
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
md_unregister_thread(mddev->sync_thread);
mddev->sync_thread = NULL;
mddev->recovery = 0;
reap_sync_thread(mddev);
}
} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
@ -3904,7 +3915,7 @@ static struct md_sysfs_entry md_sync_speed = __ATTR_RO(sync_speed);
static ssize_t
sync_completed_show(mddev_t *mddev, char *page)
{
unsigned long max_sectors, resync;
unsigned long long max_sectors, resync;
if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
return sprintf(page, "none\n");
@ -3915,7 +3926,7 @@ sync_completed_show(mddev_t *mddev, char *page)
max_sectors = mddev->dev_sectors;
resync = mddev->curr_resync_completed;
return sprintf(page, "%lu / %lu\n", resync, max_sectors);
return sprintf(page, "%llu / %llu\n", resync, max_sectors);
}
static struct md_sysfs_entry md_sync_completed = __ATTR_RO(sync_completed);
@ -4002,19 +4013,24 @@ suspend_lo_store(mddev_t *mddev, const char *buf, size_t len)
{
char *e;
unsigned long long new = simple_strtoull(buf, &e, 10);
unsigned long long old = mddev->suspend_lo;
if (mddev->pers == NULL ||
mddev->pers->quiesce == NULL)
return -EINVAL;
if (buf == e || (*e && *e != '\n'))
return -EINVAL;
if (new >= mddev->suspend_hi ||
(new > mddev->suspend_lo && new < mddev->suspend_hi)) {
mddev->suspend_lo = new;
mddev->suspend_lo = new;
if (new >= old)
/* Shrinking suspended region */
mddev->pers->quiesce(mddev, 2);
return len;
} else
return -EINVAL;
else {
/* Expanding suspended region - need to wait */
mddev->pers->quiesce(mddev, 1);
mddev->pers->quiesce(mddev, 0);
}
return len;
}
static struct md_sysfs_entry md_suspend_lo =
__ATTR(suspend_lo, S_IRUGO|S_IWUSR, suspend_lo_show, suspend_lo_store);
@ -4031,20 +4047,24 @@ suspend_hi_store(mddev_t *mddev, const char *buf, size_t len)
{
char *e;
unsigned long long new = simple_strtoull(buf, &e, 10);
unsigned long long old = mddev->suspend_hi;
if (mddev->pers == NULL ||
mddev->pers->quiesce == NULL)
return -EINVAL;
if (buf == e || (*e && *e != '\n'))
return -EINVAL;
if ((new <= mddev->suspend_lo && mddev->suspend_lo >= mddev->suspend_hi) ||
(new > mddev->suspend_lo && new > mddev->suspend_hi)) {
mddev->suspend_hi = new;
mddev->suspend_hi = new;
if (new <= old)
/* Shrinking suspended region */
mddev->pers->quiesce(mddev, 2);
else {
/* Expanding suspended region - need to wait */
mddev->pers->quiesce(mddev, 1);
mddev->pers->quiesce(mddev, 0);
return len;
} else
return -EINVAL;
}
return len;
}
static struct md_sysfs_entry md_suspend_hi =
__ATTR(suspend_hi, S_IRUGO|S_IWUSR, suspend_hi_show, suspend_hi_store);
@ -4422,7 +4442,9 @@ int md_run(mddev_t *mddev)
* We don't want the data to overlap the metadata,
* Internal Bitmap issues have been handled elsewhere.
*/
if (rdev->data_offset < rdev->sb_start) {
if (rdev->meta_bdev) {
/* Nothing to check */;
} else if (rdev->data_offset < rdev->sb_start) {
if (mddev->dev_sectors &&
rdev->data_offset + mddev->dev_sectors
> rdev->sb_start) {
@ -4556,7 +4578,8 @@ int md_run(mddev_t *mddev)
mddev->safemode_timer.data = (unsigned long) mddev;
mddev->safemode_delay = (200 * HZ)/1000 +1; /* 200 msec delay */
mddev->in_sync = 1;
smp_wmb();
mddev->ready = 1;
list_for_each_entry(rdev, &mddev->disks, same_set)
if (rdev->raid_disk >= 0) {
char nm[20];
@ -4693,13 +4716,12 @@ static void md_clean(mddev_t *mddev)
mddev->plug = NULL;
}
void md_stop_writes(mddev_t *mddev)
static void __md_stop_writes(mddev_t *mddev)
{
if (mddev->sync_thread) {
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
md_unregister_thread(mddev->sync_thread);
mddev->sync_thread = NULL;
reap_sync_thread(mddev);
}
del_timer_sync(&mddev->safemode_timer);
@ -4713,10 +4735,18 @@ void md_stop_writes(mddev_t *mddev)
md_update_sb(mddev, 1);
}
}
void md_stop_writes(mddev_t *mddev)
{
mddev_lock(mddev);
__md_stop_writes(mddev);
mddev_unlock(mddev);
}
EXPORT_SYMBOL_GPL(md_stop_writes);
void md_stop(mddev_t *mddev)
{
mddev->ready = 0;
mddev->pers->stop(mddev);
if (mddev->pers->sync_request && mddev->to_remove == NULL)
mddev->to_remove = &md_redundancy_group;
@ -4736,7 +4766,7 @@ static int md_set_readonly(mddev_t *mddev, int is_open)
goto out;
}
if (mddev->pers) {
md_stop_writes(mddev);
__md_stop_writes(mddev);
err = -ENXIO;
if (mddev->ro==1)
@ -4773,7 +4803,7 @@ static int do_md_stop(mddev_t * mddev, int mode, int is_open)
if (mddev->ro)
set_disk_ro(disk, 0);
md_stop_writes(mddev);
__md_stop_writes(mddev);
md_stop(mddev);
mddev->queue->merge_bvec_fn = NULL;
mddev->queue->unplug_fn = NULL;
@ -5151,9 +5181,10 @@ static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info)
/* set saved_raid_disk if appropriate */
if (!mddev->persistent) {
if (info->state & (1<<MD_DISK_SYNC) &&
info->raid_disk < mddev->raid_disks)
info->raid_disk < mddev->raid_disks) {
rdev->raid_disk = info->raid_disk;
else
set_bit(In_sync, &rdev->flags);
} else
rdev->raid_disk = -1;
} else
super_types[mddev->major_version].
@ -5230,7 +5261,7 @@ static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info)
printk(KERN_INFO "md: nonpersistent superblock ...\n");
rdev->sb_start = i_size_read(rdev->bdev->bd_inode) / 512;
} else
rdev->sb_start = calc_dev_sboffset(rdev->bdev);
rdev->sb_start = calc_dev_sboffset(rdev);
rdev->sectors = rdev->sb_start;
err = bind_rdev_to_array(rdev, mddev);
@ -5297,7 +5328,7 @@ static int hot_add_disk(mddev_t * mddev, dev_t dev)
}
if (mddev->persistent)
rdev->sb_start = calc_dev_sboffset(rdev->bdev);
rdev->sb_start = calc_dev_sboffset(rdev);
else
rdev->sb_start = i_size_read(rdev->bdev->bd_inode) / 512;
@ -5510,7 +5541,6 @@ static int update_size(mddev_t *mddev, sector_t num_sectors)
* sb_start or, if that is <data_offset, it must fit before the size
* of each device. If num_sectors is zero, we find the largest size
* that fits.
*/
if (mddev->sync_thread)
return -EBUSY;
@ -6033,7 +6063,8 @@ static int md_thread(void * arg)
|| kthread_should_stop(),
thread->timeout);
if (test_and_clear_bit(THREAD_WAKEUP, &thread->flags))
clear_bit(THREAD_WAKEUP, &thread->flags);
if (!kthread_should_stop())
thread->run(thread->mddev);
}
@ -6799,7 +6830,7 @@ void md_do_sync(mddev_t *mddev)
desc, mdname(mddev));
mddev->curr_resync = j;
}
mddev->curr_resync_completed = mddev->curr_resync;
mddev->curr_resync_completed = j;
while (j < max_sectors) {
sector_t sectors;
@ -6817,8 +6848,7 @@ void md_do_sync(mddev_t *mddev)
md_unplug(mddev);
wait_event(mddev->recovery_wait,
atomic_read(&mddev->recovery_active) == 0);
mddev->curr_resync_completed =
mddev->curr_resync;
mddev->curr_resync_completed = j;
set_bit(MD_CHANGE_CLEAN, &mddev->flags);
sysfs_notify(&mddev->kobj, NULL, "sync_completed");
}
@ -7023,6 +7053,45 @@ static int remove_and_add_spares(mddev_t *mddev)
}
return spares;
}
static void reap_sync_thread(mddev_t *mddev)
{
mdk_rdev_t *rdev;
/* resync has finished, collect result */
md_unregister_thread(mddev->sync_thread);
mddev->sync_thread = NULL;
if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
!test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
/* success...*/
/* activate any spares */
if (mddev->pers->spare_active(mddev))
sysfs_notify(&mddev->kobj, NULL,
"degraded");
}
if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
mddev->pers->finish_reshape)
mddev->pers->finish_reshape(mddev);
md_update_sb(mddev, 1);
/* if array is no-longer degraded, then any saved_raid_disk
* information must be scrapped
*/
if (!mddev->degraded)
list_for_each_entry(rdev, &mddev->disks, same_set)
rdev->saved_raid_disk = -1;
clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
/* flag recovery needed just to double check */
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
sysfs_notify_dirent_safe(mddev->sysfs_action);
md_new_event(mddev);
}
/*
* This routine is regularly called by all per-raid-array threads to
* deal with generic issues like resync and super-block update.
@ -7047,9 +7116,6 @@ static int remove_and_add_spares(mddev_t *mddev)
*/
void md_check_recovery(mddev_t *mddev)
{
mdk_rdev_t *rdev;
if (mddev->bitmap)
bitmap_daemon_work(mddev);
@ -7117,34 +7183,7 @@ void md_check_recovery(mddev_t *mddev)
goto unlock;
}
if (mddev->sync_thread) {
/* resync has finished, collect result */
md_unregister_thread(mddev->sync_thread);
mddev->sync_thread = NULL;
if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
!test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
/* success...*/
/* activate any spares */
if (mddev->pers->spare_active(mddev))
sysfs_notify(&mddev->kobj, NULL,
"degraded");
}
if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
mddev->pers->finish_reshape)
mddev->pers->finish_reshape(mddev);
md_update_sb(mddev, 1);
/* if array is no-longer degraded, then any saved_raid_disk
* information must be scrapped
*/
if (!mddev->degraded)
list_for_each_entry(rdev, &mddev->disks, same_set)
rdev->saved_raid_disk = -1;
mddev->recovery = 0;
/* flag recovery needed just to double check */
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
sysfs_notify_dirent_safe(mddev->sysfs_action);
md_new_event(mddev);
reap_sync_thread(mddev);
goto unlock;
}
/* Set RUNNING before clearing NEEDED to avoid
@ -7202,7 +7241,11 @@ void md_check_recovery(mddev_t *mddev)
" thread...\n",
mdname(mddev));
/* leave the spares where they are, it shouldn't hurt */
mddev->recovery = 0;
clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
} else
md_wakeup_thread(mddev->sync_thread);
sysfs_notify_dirent_safe(mddev->sysfs_action);

View File

@ -60,6 +60,12 @@ struct mdk_rdev_s
mddev_t *mddev; /* RAID array if running */
int last_events; /* IO event timestamp */
/*
* If meta_bdev is non-NULL, it means that a separate device is
* being used to store the metadata (superblock/bitmap) which
* would otherwise be contained on the same device as the data (bdev).
*/
struct block_device *meta_bdev;
struct block_device *bdev; /* block device handle */
struct page *sb_page;
@ -148,7 +154,8 @@ struct mddev_s
* are happening, so run/
* takeover/stop are not safe
*/
int ready; /* See when safe to pass
* IO requests down */
struct gendisk *gendisk;
struct kobject kobj;
@ -497,8 +504,8 @@ extern void md_flush_request(mddev_t *mddev, struct bio *bio);
extern void md_super_write(mddev_t *mddev, mdk_rdev_t *rdev,
sector_t sector, int size, struct page *page);
extern void md_super_wait(mddev_t *mddev);
extern int sync_page_io(mdk_rdev_t *rdev, sector_t sector, int size,
struct page *page, int rw);
extern int sync_page_io(mdk_rdev_t *rdev, sector_t sector, int size,
struct page *page, int rw, bool metadata_op);
extern void md_do_sync(mddev_t *mddev);
extern void md_new_event(mddev_t *mddev);
extern int md_allow_write(mddev_t *mddev);

View File

@ -1027,8 +1027,9 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
} else
set_bit(Faulty, &rdev->flags);
set_bit(MD_CHANGE_DEVS, &mddev->flags);
printk(KERN_ALERT "md/raid1:%s: Disk failure on %s, disabling device.\n"
KERN_ALERT "md/raid1:%s: Operation continuing on %d devices.\n",
printk(KERN_ALERT
"md/raid1:%s: Disk failure on %s, disabling device.\n"
"md/raid1:%s: Operation continuing on %d devices.\n",
mdname(mddev), bdevname(rdev->bdev, b),
mdname(mddev), conf->raid_disks - mddev->degraded);
}
@ -1364,10 +1365,10 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
*/
rdev = conf->mirrors[d].rdev;
if (sync_page_io(rdev,
sect + rdev->data_offset,
sect,
s<<9,
bio->bi_io_vec[idx].bv_page,
READ)) {
READ, false)) {
success = 1;
break;
}
@ -1390,10 +1391,10 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
rdev = conf->mirrors[d].rdev;
atomic_add(s, &rdev->corrected_errors);
if (sync_page_io(rdev,
sect + rdev->data_offset,
sect,
s<<9,
bio->bi_io_vec[idx].bv_page,
WRITE) == 0)
WRITE, false) == 0)
md_error(mddev, rdev);
}
d = start;
@ -1405,10 +1406,10 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
continue;
rdev = conf->mirrors[d].rdev;
if (sync_page_io(rdev,
sect + rdev->data_offset,
sect,
s<<9,
bio->bi_io_vec[idx].bv_page,
READ) == 0)
READ, false) == 0)
md_error(mddev, rdev);
}
} else {
@ -1488,10 +1489,8 @@ static void fix_read_error(conf_t *conf, int read_disk,
rdev = conf->mirrors[d].rdev;
if (rdev &&
test_bit(In_sync, &rdev->flags) &&
sync_page_io(rdev,
sect + rdev->data_offset,
s<<9,
conf->tmppage, READ))
sync_page_io(rdev, sect, s<<9,
conf->tmppage, READ, false))
success = 1;
else {
d++;
@ -1514,9 +1513,8 @@ static void fix_read_error(conf_t *conf, int read_disk,
rdev = conf->mirrors[d].rdev;
if (rdev &&
test_bit(In_sync, &rdev->flags)) {
if (sync_page_io(rdev,
sect + rdev->data_offset,
s<<9, conf->tmppage, WRITE)
if (sync_page_io(rdev, sect, s<<9,
conf->tmppage, WRITE, false)
== 0)
/* Well, this device is dead */
md_error(mddev, rdev);
@ -1531,9 +1529,8 @@ static void fix_read_error(conf_t *conf, int read_disk,
rdev = conf->mirrors[d].rdev;
if (rdev &&
test_bit(In_sync, &rdev->flags)) {
if (sync_page_io(rdev,
sect + rdev->data_offset,
s<<9, conf->tmppage, READ)
if (sync_page_io(rdev, sect, s<<9,
conf->tmppage, READ, false)
== 0)
/* Well, this device is dead */
md_error(mddev, rdev);

View File

@ -1051,8 +1051,9 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
}
set_bit(Faulty, &rdev->flags);
set_bit(MD_CHANGE_DEVS, &mddev->flags);
printk(KERN_ALERT "md/raid10:%s: Disk failure on %s, disabling device.\n"
KERN_ALERT "md/raid10:%s: Operation continuing on %d devices.\n",
printk(KERN_ALERT
"md/raid10:%s: Disk failure on %s, disabling device.\n"
"md/raid10:%s: Operation continuing on %d devices.\n",
mdname(mddev), bdevname(rdev->bdev, b),
mdname(mddev), conf->raid_disks - mddev->degraded);
}
@ -1559,9 +1560,9 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
rcu_read_unlock();
success = sync_page_io(rdev,
r10_bio->devs[sl].addr +
sect + rdev->data_offset,
sect,
s<<9,
conf->tmppage, READ);
conf->tmppage, READ, false);
rdev_dec_pending(rdev, mddev);
rcu_read_lock();
if (success)
@ -1598,8 +1599,8 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
atomic_add(s, &rdev->corrected_errors);
if (sync_page_io(rdev,
r10_bio->devs[sl].addr +
sect + rdev->data_offset,
s<<9, conf->tmppage, WRITE)
sect,
s<<9, conf->tmppage, WRITE, false)
== 0) {
/* Well, this device is dead */
printk(KERN_NOTICE
@ -1635,9 +1636,9 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
rcu_read_unlock();
if (sync_page_io(rdev,
r10_bio->devs[sl].addr +
sect + rdev->data_offset,
sect,
s<<9, conf->tmppage,
READ) == 0) {
READ, false) == 0) {
/* Well, this device is dead */
printk(KERN_NOTICE
"md/raid10:%s: unable to read back "

View File

@ -1721,7 +1721,6 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
set_bit(Faulty, &rdev->flags);
printk(KERN_ALERT
"md/raid:%s: Disk failure on %s, disabling device.\n"
KERN_ALERT
"md/raid:%s: Operation continuing on %d devices.\n",
mdname(mddev),
bdevname(rdev->bdev, b),
@ -4237,7 +4236,7 @@ static sector_t reshape_request(mddev_t *mddev, sector_t sector_nr, int *skipped
wait_event(conf->wait_for_overlap,
atomic_read(&conf->reshape_stripes)==0);
mddev->reshape_position = conf->reshape_progress;
mddev->curr_resync_completed = mddev->curr_resync;
mddev->curr_resync_completed = sector_nr;
conf->reshape_checkpoint = jiffies;
set_bit(MD_CHANGE_DEVS, &mddev->flags);
md_wakeup_thread(mddev->thread);
@ -4338,7 +4337,7 @@ static sector_t reshape_request(mddev_t *mddev, sector_t sector_nr, int *skipped
wait_event(conf->wait_for_overlap,
atomic_read(&conf->reshape_stripes) == 0);
mddev->reshape_position = conf->reshape_progress;
mddev->curr_resync_completed = mddev->curr_resync + reshape_sectors;
mddev->curr_resync_completed = sector_nr;
conf->reshape_checkpoint = jiffies;
set_bit(MD_CHANGE_DEVS, &mddev->flags);
md_wakeup_thread(mddev->thread);
@ -5339,7 +5338,7 @@ static int raid5_spare_active(mddev_t *mddev)
&& !test_bit(Faulty, &tmp->rdev->flags)
&& !test_and_set_bit(In_sync, &tmp->rdev->flags)) {
count++;
sysfs_notify_dirent(tmp->rdev->sysfs_state);
sysfs_notify_dirent_safe(tmp->rdev->sysfs_state);
}
}
spin_lock_irqsave(&conf->device_lock, flags);
@ -5528,8 +5527,8 @@ static int raid5_start_reshape(mddev_t *mddev)
return -ENOSPC;
list_for_each_entry(rdev, &mddev->disks, same_set)
if (rdev->raid_disk < 0 &&
!test_bit(Faulty, &rdev->flags))
if ((rdev->raid_disk < 0 || rdev->raid_disk >= conf->raid_disks)
&& !test_bit(Faulty, &rdev->flags))
spares++;
if (spares - mddev->degraded < mddev->delta_disks - conf->max_degraded)
@ -5589,6 +5588,11 @@ static int raid5_start_reshape(mddev_t *mddev)
/* Failure here is OK */;
} else
break;
} else if (rdev->raid_disk >= conf->previous_raid_disks
&& !test_bit(Faulty, &rdev->flags)) {
/* This is a spare that was manually added */
set_bit(In_sync, &rdev->flags);
added_devices++;
}
/* When a reshape changes the number of devices, ->degraded

View File

@ -1732,6 +1732,11 @@ static int __devinit atmel_serial_probe(struct platform_device *pdev)
device_init_wakeup(&pdev->dev, 1);
platform_set_drvdata(pdev, port);
if (port->rs485.flags & SER_RS485_ENABLED) {
UART_PUT_MR(&port->uart, ATMEL_US_USMODE_NORMAL);
UART_PUT_CR(&port->uart, ATMEL_US_RTSEN);
}
return 0;
err_add_port:

View File

@ -71,11 +71,18 @@ config XEN_SYS_HYPERVISOR
but will have no xen contents.
config XEN_XENBUS_FRONTEND
tristate
tristate
config XEN_GNTDEV
tristate "userspace grant access device driver"
depends on XEN
select MMU_NOTIFIER
help
Allows userspace processes to use grants.
config XEN_PLATFORM_PCI
tristate "xen platform pci device driver"
depends on XEN_PVHVM
depends on XEN_PVHVM && PCI
default m
help
Driver for the Xen PCI Platform device: it is responsible for

View File

@ -9,11 +9,14 @@ obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
obj-$(CONFIG_XEN_BALLOON) += balloon.o
obj-$(CONFIG_XEN_DEV_EVTCHN) += xen-evtchn.o
obj-$(CONFIG_XEN_GNTDEV) += xen-gntdev.o
obj-$(CONFIG_XENFS) += xenfs/
obj-$(CONFIG_XEN_SYS_HYPERVISOR) += sys-hypervisor.o
obj-$(CONFIG_XEN_PLATFORM_PCI) += platform-pci.o
obj-$(CONFIG_XEN_PLATFORM_PCI) += xen-platform-pci.o
obj-$(CONFIG_SWIOTLB_XEN) += swiotlb-xen.o
obj-$(CONFIG_XEN_DOM0) += pci.o
xen-evtchn-y := evtchn.o
xen-gntdev-y := gntdev.o
xen-platform-pci-y := platform-pci.o

665
drivers/xen/gntdev.c Normal file
View File

@ -0,0 +1,665 @@
/******************************************************************************
* gntdev.c
*
* Device for accessing (in user-space) pages that have been granted by other
* domains.
*
* Copyright (c) 2006-2007, D G Murray.
* (c) 2009 Gerd Hoffmann <kraxel@redhat.com>
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#undef DEBUG
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/mmu_notifier.h>
#include <linux/types.h>
#include <linux/uaccess.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/slab.h>
#include <xen/xen.h>
#include <xen/grant_table.h>
#include <xen/gntdev.h>
#include <asm/xen/hypervisor.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/page.h>
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Derek G. Murray <Derek.Murray@cl.cam.ac.uk>, "
"Gerd Hoffmann <kraxel@redhat.com>");
MODULE_DESCRIPTION("User-space granted page access driver");
static int limit = 1024;
module_param(limit, int, 0644);
MODULE_PARM_DESC(limit, "Maximum number of grants that may be mapped at "
"once by a gntdev instance");
struct gntdev_priv {
struct list_head maps;
uint32_t used;
uint32_t limit;
/* lock protects maps from concurrent changes */
spinlock_t lock;
struct mm_struct *mm;
struct mmu_notifier mn;
};
struct grant_map {
struct list_head next;
struct gntdev_priv *priv;
struct vm_area_struct *vma;
int index;
int count;
int flags;
int is_mapped;
struct ioctl_gntdev_grant_ref *grants;
struct gnttab_map_grant_ref *map_ops;
struct gnttab_unmap_grant_ref *unmap_ops;
struct page **pages;
};
/* ------------------------------------------------------------------ */
static void gntdev_print_maps(struct gntdev_priv *priv,
char *text, int text_index)
{
#ifdef DEBUG
struct grant_map *map;
pr_debug("maps list (priv %p, usage %d/%d)\n",
priv, priv->used, priv->limit);
list_for_each_entry(map, &priv->maps, next)
pr_debug(" index %2d, count %2d %s\n",
map->index, map->count,
map->index == text_index && text ? text : "");
#endif
}
static struct grant_map *gntdev_alloc_map(struct gntdev_priv *priv, int count)
{
struct grant_map *add;
int i;
add = kzalloc(sizeof(struct grant_map), GFP_KERNEL);
if (NULL == add)
return NULL;
add->grants = kzalloc(sizeof(add->grants[0]) * count, GFP_KERNEL);
add->map_ops = kzalloc(sizeof(add->map_ops[0]) * count, GFP_KERNEL);
add->unmap_ops = kzalloc(sizeof(add->unmap_ops[0]) * count, GFP_KERNEL);
add->pages = kzalloc(sizeof(add->pages[0]) * count, GFP_KERNEL);
if (NULL == add->grants ||
NULL == add->map_ops ||
NULL == add->unmap_ops ||
NULL == add->pages)
goto err;
for (i = 0; i < count; i++) {
add->pages[i] = alloc_page(GFP_KERNEL | __GFP_HIGHMEM);
if (add->pages[i] == NULL)
goto err;
}
add->index = 0;
add->count = count;
add->priv = priv;
if (add->count + priv->used > priv->limit)
goto err;
return add;
err:
if (add->pages)
for (i = 0; i < count; i++) {
if (add->pages[i])
__free_page(add->pages[i]);
}
kfree(add->pages);
kfree(add->grants);
kfree(add->map_ops);
kfree(add->unmap_ops);
kfree(add);
return NULL;
}
static void gntdev_add_map(struct gntdev_priv *priv, struct grant_map *add)
{
struct grant_map *map;
list_for_each_entry(map, &priv->maps, next) {
if (add->index + add->count < map->index) {
list_add_tail(&add->next, &map->next);
goto done;
}
add->index = map->index + map->count;
}
list_add_tail(&add->next, &priv->maps);
done:
priv->used += add->count;
gntdev_print_maps(priv, "[new]", add->index);
}
static struct grant_map *gntdev_find_map_index(struct gntdev_priv *priv,
int index, int count)
{
struct grant_map *map;
list_for_each_entry(map, &priv->maps, next) {
if (map->index != index)
continue;
if (map->count != count)
continue;
return map;
}
return NULL;
}
static struct grant_map *gntdev_find_map_vaddr(struct gntdev_priv *priv,
unsigned long vaddr)
{
struct grant_map *map;
list_for_each_entry(map, &priv->maps, next) {
if (!map->vma)
continue;
if (vaddr < map->vma->vm_start)
continue;
if (vaddr >= map->vma->vm_end)
continue;
return map;
}
return NULL;
}
static int gntdev_del_map(struct grant_map *map)
{
int i;
if (map->vma)
return -EBUSY;
for (i = 0; i < map->count; i++)
if (map->unmap_ops[i].handle)
return -EBUSY;
map->priv->used -= map->count;
list_del(&map->next);
return 0;
}
static void gntdev_free_map(struct grant_map *map)
{
int i;
if (!map)
return;
if (map->pages)
for (i = 0; i < map->count; i++) {
if (map->pages[i])
__free_page(map->pages[i]);
}
kfree(map->pages);
kfree(map->grants);
kfree(map->map_ops);
kfree(map->unmap_ops);
kfree(map);
}
/* ------------------------------------------------------------------ */
static int find_grant_ptes(pte_t *pte, pgtable_t token,
unsigned long addr, void *data)
{
struct grant_map *map = data;
unsigned int pgnr = (addr - map->vma->vm_start) >> PAGE_SHIFT;
u64 pte_maddr;
BUG_ON(pgnr >= map->count);
pte_maddr = arbitrary_virt_to_machine(pte).maddr;
gnttab_set_map_op(&map->map_ops[pgnr], pte_maddr,
GNTMAP_contains_pte | map->flags,
map->grants[pgnr].ref,
map->grants[pgnr].domid);
gnttab_set_unmap_op(&map->unmap_ops[pgnr], pte_maddr,
GNTMAP_contains_pte | map->flags,
0 /* handle */);
return 0;
}
static int map_grant_pages(struct grant_map *map)
{
int i, err = 0;
pr_debug("map %d+%d\n", map->index, map->count);
err = gnttab_map_refs(map->map_ops, map->pages, map->count);
if (err)
return err;
for (i = 0; i < map->count; i++) {
if (map->map_ops[i].status)
err = -EINVAL;
map->unmap_ops[i].handle = map->map_ops[i].handle;
}
return err;
}
static int unmap_grant_pages(struct grant_map *map, int offset, int pages)
{
int i, err = 0;
pr_debug("map %d+%d [%d+%d]\n", map->index, map->count, offset, pages);
err = gnttab_unmap_refs(map->unmap_ops + offset, map->pages, pages);
if (err)
return err;
for (i = 0; i < pages; i++) {
if (map->unmap_ops[offset+i].status)
err = -EINVAL;
map->unmap_ops[offset+i].handle = 0;
}
return err;
}
/* ------------------------------------------------------------------ */
static void gntdev_vma_close(struct vm_area_struct *vma)
{
struct grant_map *map = vma->vm_private_data;
pr_debug("close %p\n", vma);
map->is_mapped = 0;
map->vma = NULL;
vma->vm_private_data = NULL;
}
static int gntdev_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
pr_debug("vaddr %p, pgoff %ld (shouldn't happen)\n",
vmf->virtual_address, vmf->pgoff);
vmf->flags = VM_FAULT_ERROR;
return 0;
}
static struct vm_operations_struct gntdev_vmops = {
.close = gntdev_vma_close,
.fault = gntdev_vma_fault,
};
/* ------------------------------------------------------------------ */
static void mn_invl_range_start(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long start, unsigned long end)
{
struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
struct grant_map *map;
unsigned long mstart, mend;
int err;
spin_lock(&priv->lock);
list_for_each_entry(map, &priv->maps, next) {
if (!map->vma)
continue;
if (!map->is_mapped)
continue;
if (map->vma->vm_start >= end)
continue;
if (map->vma->vm_end <= start)
continue;
mstart = max(start, map->vma->vm_start);
mend = min(end, map->vma->vm_end);
pr_debug("map %d+%d (%lx %lx), range %lx %lx, mrange %lx %lx\n",
map->index, map->count,
map->vma->vm_start, map->vma->vm_end,
start, end, mstart, mend);
err = unmap_grant_pages(map,
(mstart - map->vma->vm_start) >> PAGE_SHIFT,
(mend - mstart) >> PAGE_SHIFT);
WARN_ON(err);
}
spin_unlock(&priv->lock);
}
static void mn_invl_page(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long address)
{
mn_invl_range_start(mn, mm, address, address + PAGE_SIZE);
}
static void mn_release(struct mmu_notifier *mn,
struct mm_struct *mm)
{
struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
struct grant_map *map;
int err;
spin_lock(&priv->lock);
list_for_each_entry(map, &priv->maps, next) {
if (!map->vma)
continue;
pr_debug("map %d+%d (%lx %lx)\n",
map->index, map->count,
map->vma->vm_start, map->vma->vm_end);
err = unmap_grant_pages(map, /* offset */ 0, map->count);
WARN_ON(err);
}
spin_unlock(&priv->lock);
}
struct mmu_notifier_ops gntdev_mmu_ops = {
.release = mn_release,
.invalidate_page = mn_invl_page,
.invalidate_range_start = mn_invl_range_start,
};
/* ------------------------------------------------------------------ */
static int gntdev_open(struct inode *inode, struct file *flip)
{
struct gntdev_priv *priv;
int ret = 0;
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
INIT_LIST_HEAD(&priv->maps);
spin_lock_init(&priv->lock);
priv->limit = limit;
priv->mm = get_task_mm(current);
if (!priv->mm) {
kfree(priv);
return -ENOMEM;
}
priv->mn.ops = &gntdev_mmu_ops;
ret = mmu_notifier_register(&priv->mn, priv->mm);
mmput(priv->mm);
if (ret) {
kfree(priv);
return ret;
}
flip->private_data = priv;
pr_debug("priv %p\n", priv);
return 0;
}
static int gntdev_release(struct inode *inode, struct file *flip)
{
struct gntdev_priv *priv = flip->private_data;
struct grant_map *map;
int err;
pr_debug("priv %p\n", priv);
spin_lock(&priv->lock);
while (!list_empty(&priv->maps)) {
map = list_entry(priv->maps.next, struct grant_map, next);
err = gntdev_del_map(map);
if (WARN_ON(err))
gntdev_free_map(map);
}
spin_unlock(&priv->lock);
mmu_notifier_unregister(&priv->mn, priv->mm);
kfree(priv);
return 0;
}
static long gntdev_ioctl_map_grant_ref(struct gntdev_priv *priv,
struct ioctl_gntdev_map_grant_ref __user *u)
{
struct ioctl_gntdev_map_grant_ref op;
struct grant_map *map;
int err;
if (copy_from_user(&op, u, sizeof(op)) != 0)
return -EFAULT;
pr_debug("priv %p, add %d\n", priv, op.count);
if (unlikely(op.count <= 0))
return -EINVAL;
if (unlikely(op.count > priv->limit))
return -EINVAL;
err = -ENOMEM;
map = gntdev_alloc_map(priv, op.count);
if (!map)
return err;
if (copy_from_user(map->grants, &u->refs,
sizeof(map->grants[0]) * op.count) != 0) {
gntdev_free_map(map);
return err;
}
spin_lock(&priv->lock);
gntdev_add_map(priv, map);
op.index = map->index << PAGE_SHIFT;
spin_unlock(&priv->lock);
if (copy_to_user(u, &op, sizeof(op)) != 0) {
spin_lock(&priv->lock);
gntdev_del_map(map);
spin_unlock(&priv->lock);
gntdev_free_map(map);
return err;
}
return 0;
}
static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
struct ioctl_gntdev_unmap_grant_ref __user *u)
{
struct ioctl_gntdev_unmap_grant_ref op;
struct grant_map *map;
int err = -ENOENT;
if (copy_from_user(&op, u, sizeof(op)) != 0)
return -EFAULT;
pr_debug("priv %p, del %d+%d\n", priv, (int)op.index, (int)op.count);
spin_lock(&priv->lock);
map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
if (map)
err = gntdev_del_map(map);
spin_unlock(&priv->lock);
if (!err)
gntdev_free_map(map);
return err;
}
static long gntdev_ioctl_get_offset_for_vaddr(struct gntdev_priv *priv,
struct ioctl_gntdev_get_offset_for_vaddr __user *u)
{
struct ioctl_gntdev_get_offset_for_vaddr op;
struct grant_map *map;
if (copy_from_user(&op, u, sizeof(op)) != 0)
return -EFAULT;
pr_debug("priv %p, offset for vaddr %lx\n", priv, (unsigned long)op.vaddr);
spin_lock(&priv->lock);
map = gntdev_find_map_vaddr(priv, op.vaddr);
if (map == NULL ||
map->vma->vm_start != op.vaddr) {
spin_unlock(&priv->lock);
return -EINVAL;
}
op.offset = map->index << PAGE_SHIFT;
op.count = map->count;
spin_unlock(&priv->lock);
if (copy_to_user(u, &op, sizeof(op)) != 0)
return -EFAULT;
return 0;
}
static long gntdev_ioctl_set_max_grants(struct gntdev_priv *priv,
struct ioctl_gntdev_set_max_grants __user *u)
{
struct ioctl_gntdev_set_max_grants op;
if (copy_from_user(&op, u, sizeof(op)) != 0)
return -EFAULT;
pr_debug("priv %p, limit %d\n", priv, op.count);
if (op.count > limit)
return -E2BIG;
spin_lock(&priv->lock);
priv->limit = op.count;
spin_unlock(&priv->lock);
return 0;
}
static long gntdev_ioctl(struct file *flip,
unsigned int cmd, unsigned long arg)
{
struct gntdev_priv *priv = flip->private_data;
void __user *ptr = (void __user *)arg;
switch (cmd) {
case IOCTL_GNTDEV_MAP_GRANT_REF:
return gntdev_ioctl_map_grant_ref(priv, ptr);
case IOCTL_GNTDEV_UNMAP_GRANT_REF:
return gntdev_ioctl_unmap_grant_ref(priv, ptr);
case IOCTL_GNTDEV_GET_OFFSET_FOR_VADDR:
return gntdev_ioctl_get_offset_for_vaddr(priv, ptr);
case IOCTL_GNTDEV_SET_MAX_GRANTS:
return gntdev_ioctl_set_max_grants(priv, ptr);
default:
pr_debug("priv %p, unknown cmd %x\n", priv, cmd);
return -ENOIOCTLCMD;
}
return 0;
}
static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
{
struct gntdev_priv *priv = flip->private_data;
int index = vma->vm_pgoff;
int count = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
struct grant_map *map;
int err = -EINVAL;
if ((vma->vm_flags & VM_WRITE) && !(vma->vm_flags & VM_SHARED))
return -EINVAL;
pr_debug("map %d+%d at %lx (pgoff %lx)\n",
index, count, vma->vm_start, vma->vm_pgoff);
spin_lock(&priv->lock);
map = gntdev_find_map_index(priv, index, count);
if (!map)
goto unlock_out;
if (map->vma)
goto unlock_out;
if (priv->mm != vma->vm_mm) {
printk(KERN_WARNING "Huh? Other mm?\n");
goto unlock_out;
}
vma->vm_ops = &gntdev_vmops;
vma->vm_flags |= VM_RESERVED|VM_DONTCOPY|VM_DONTEXPAND|VM_PFNMAP;
vma->vm_private_data = map;
map->vma = vma;
map->flags = GNTMAP_host_map | GNTMAP_application_map;
if (!(vma->vm_flags & VM_WRITE))
map->flags |= GNTMAP_readonly;
spin_unlock(&priv->lock);
err = apply_to_page_range(vma->vm_mm, vma->vm_start,
vma->vm_end - vma->vm_start,
find_grant_ptes, map);
if (err) {
printk(KERN_WARNING "find_grant_ptes() failure.\n");
return err;
}
err = map_grant_pages(map);
if (err) {
printk(KERN_WARNING "map_grant_pages() failure.\n");
return err;
}
map->is_mapped = 1;
return 0;
unlock_out:
spin_unlock(&priv->lock);
return err;
}
static const struct file_operations gntdev_fops = {
.owner = THIS_MODULE,
.open = gntdev_open,
.release = gntdev_release,
.mmap = gntdev_mmap,
.unlocked_ioctl = gntdev_ioctl
};
static struct miscdevice gntdev_miscdev = {
.minor = MISC_DYNAMIC_MINOR,
.name = "xen/gntdev",
.fops = &gntdev_fops,
};
/* ------------------------------------------------------------------ */
static int __init gntdev_init(void)
{
int err;
if (!xen_domain())
return -ENODEV;
err = misc_register(&gntdev_miscdev);
if (err != 0) {
printk(KERN_ERR "Could not register gntdev device\n");
return err;
}
return 0;
}
static void __exit gntdev_exit(void)
{
misc_deregister(&gntdev_miscdev);
}
module_init(gntdev_init);
module_exit(gntdev_exit);
/* ------------------------------------------------------------------ */

View File

@ -447,6 +447,52 @@ unsigned int gnttab_max_grant_frames(void)
}
EXPORT_SYMBOL_GPL(gnttab_max_grant_frames);
int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
struct page **pages, unsigned int count)
{
int i, ret;
pte_t *pte;
unsigned long mfn;
ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, map_ops, count);
if (ret)
return ret;
for (i = 0; i < count; i++) {
/* m2p override only supported for GNTMAP_contains_pte mappings */
if (!(map_ops[i].flags & GNTMAP_contains_pte))
continue;
pte = (pte_t *) (mfn_to_virt(PFN_DOWN(map_ops[i].host_addr)) +
(map_ops[i].host_addr & ~PAGE_MASK));
mfn = pte_mfn(*pte);
ret = m2p_add_override(mfn, pages[i]);
if (ret)
return ret;
}
return ret;
}
EXPORT_SYMBOL_GPL(gnttab_map_refs);
int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
struct page **pages, unsigned int count)
{
int i, ret;
ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap_ops, count);
if (ret)
return ret;
for (i = 0; i < count; i++) {
ret = m2p_remove_override(pages[i]);
if (ret)
return ret;
}
return ret;
}
EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
static int gnttab_map(unsigned int start_idx, unsigned int end_idx)
{
struct gnttab_setup_table setup;

View File

@ -105,7 +105,7 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
int i, ret;
long ioaddr, iolen;
long ioaddr;
long mmio_addr, mmio_len;
unsigned int max_nr_gframes;
@ -114,7 +114,6 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
return i;
ioaddr = pci_resource_start(pdev, 0);
iolen = pci_resource_len(pdev, 0);
mmio_addr = pci_resource_start(pdev, 1);
mmio_len = pci_resource_len(pdev, 1);
@ -125,19 +124,13 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
goto pci_out;
}
if (request_mem_region(mmio_addr, mmio_len, DRV_NAME) == NULL) {
dev_err(&pdev->dev, "MEM I/O resource 0x%lx @ 0x%lx busy\n",
mmio_addr, mmio_len);
ret = -EBUSY;
ret = pci_request_region(pdev, 1, DRV_NAME);
if (ret < 0)
goto pci_out;
}
if (request_region(ioaddr, iolen, DRV_NAME) == NULL) {
dev_err(&pdev->dev, "I/O resource 0x%lx @ 0x%lx busy\n",
iolen, ioaddr);
ret = -EBUSY;
ret = pci_request_region(pdev, 0, DRV_NAME);
if (ret < 0)
goto mem_out;
}
platform_mmio = mmio_addr;
platform_mmiolen = mmio_len;
@ -169,9 +162,9 @@ static int __devinit platform_pci_init(struct pci_dev *pdev,
return 0;
out:
release_region(ioaddr, iolen);
pci_release_region(pdev, 0);
mem_out:
release_mem_region(mmio_addr, mmio_len);
pci_release_region(pdev, 1);
pci_out:
pci_disable_device(pdev);
return ret;

View File

@ -141,13 +141,12 @@ int ecryptfs_init_persistent_file(struct dentry *ecryptfs_dentry)
return rc;
}
static inode *ecryptfs_get_inode(struct inode *lower_inode,
static struct inode *ecryptfs_get_inode(struct inode *lower_inode,
struct super_block *sb)
{
struct inode *inode;
int rc = 0;
lower_inode = lower_dentry->d_inode;
if (lower_inode->i_sb != ecryptfs_superblock_to_lower(sb)) {
rc = -EXDEV;
goto out;
@ -202,7 +201,7 @@ int ecryptfs_interpose(struct dentry *lower_dentry, struct dentry *dentry,
{
struct inode *lower_inode = lower_dentry->d_inode;
struct inode *inode = ecryptfs_get_inode(lower_inode, sb);
if (IS_ERR(inode)
if (IS_ERR(inode))
return PTR_ERR(inode);
if (flags & ECRYPTFS_INTERPOSE_FLAG_D_ADD)
d_add(dentry, inode);

View File

@ -84,13 +84,9 @@ static inline struct inode *wb_inode(struct list_head *head)
return list_entry(head, struct inode, i_wb_list);
}
static void bdi_queue_work(struct backing_dev_info *bdi,
struct wb_writeback_work *work)
/* Wakeup flusher thread or forker thread to fork it. Requires bdi->wb_lock. */
static void bdi_wakeup_flusher(struct backing_dev_info *bdi)
{
trace_writeback_queue(bdi, work);
spin_lock_bh(&bdi->wb_lock);
list_add_tail(&work->list, &bdi->work_list);
if (bdi->wb.task) {
wake_up_process(bdi->wb.task);
} else {
@ -98,15 +94,26 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
* The bdi thread isn't there, wake up the forker thread which
* will create and run it.
*/
trace_writeback_nothread(bdi, work);
wake_up_process(default_backing_dev_info.wb.task);
}
}
static void bdi_queue_work(struct backing_dev_info *bdi,
struct wb_writeback_work *work)
{
trace_writeback_queue(bdi, work);
spin_lock_bh(&bdi->wb_lock);
list_add_tail(&work->list, &bdi->work_list);
if (!bdi->wb.task)
trace_writeback_nothread(bdi, work);
bdi_wakeup_flusher(bdi);
spin_unlock_bh(&bdi->wb_lock);
}
static void
__bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
bool range_cyclic, bool for_background)
bool range_cyclic)
{
struct wb_writeback_work *work;
@ -126,7 +133,6 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
work->sync_mode = WB_SYNC_NONE;
work->nr_pages = nr_pages;
work->range_cyclic = range_cyclic;
work->for_background = for_background;
bdi_queue_work(bdi, work);
}
@ -144,7 +150,7 @@ __bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
*/
void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
{
__bdi_start_writeback(bdi, nr_pages, true, false);
__bdi_start_writeback(bdi, nr_pages, true);
}
/**
@ -152,13 +158,21 @@ void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages)
* @bdi: the backing device to write from
*
* Description:
* This does WB_SYNC_NONE background writeback. The IO is only
* started when this function returns, we make no guarentees on
* completion. Caller need not hold sb s_umount semaphore.
* This makes sure WB_SYNC_NONE background writeback happens. When
* this function returns, it is only guaranteed that for given BDI
* some IO is happening if we are over background dirty threshold.
* Caller need not hold sb s_umount semaphore.
*/
void bdi_start_background_writeback(struct backing_dev_info *bdi)
{
__bdi_start_writeback(bdi, LONG_MAX, true, true);
/*
* We just wake up the flusher thread. It will perform background
* writeback as soon as there is no other work to do.
*/
trace_writeback_wake_background(bdi);
spin_lock_bh(&bdi->wb_lock);
bdi_wakeup_flusher(bdi);
spin_unlock_bh(&bdi->wb_lock);
}
/*
@ -616,6 +630,7 @@ static long wb_writeback(struct bdi_writeback *wb,
};
unsigned long oldest_jif;
long wrote = 0;
long write_chunk;
struct inode *inode;
if (wbc.for_kupdate) {
@ -628,6 +643,24 @@ static long wb_writeback(struct bdi_writeback *wb,
wbc.range_end = LLONG_MAX;
}
/*
* WB_SYNC_ALL mode does livelock avoidance by syncing dirty
* inodes/pages in one big loop. Setting wbc.nr_to_write=LONG_MAX
* here avoids calling into writeback_inodes_wb() more than once.
*
* The intended call sequence for WB_SYNC_ALL writeback is:
*
* wb_writeback()
* __writeback_inodes_sb() <== called only once
* write_cache_pages() <== called once for each inode
* (quickly) tag currently dirty pages
* (maybe slowly) sync all tagged pages
*/
if (wbc.sync_mode == WB_SYNC_NONE)
write_chunk = MAX_WRITEBACK_PAGES;
else
write_chunk = LONG_MAX;
wbc.wb_start = jiffies; /* livelock avoidance */
for (;;) {
/*
@ -636,6 +669,16 @@ static long wb_writeback(struct bdi_writeback *wb,
if (work->nr_pages <= 0)
break;
/*
* Background writeout and kupdate-style writeback may
* run forever. Stop them if there is other work to do
* so that e.g. sync can proceed. They'll be restarted
* after the other works are all done.
*/
if ((work->for_background || work->for_kupdate) &&
!list_empty(&wb->bdi->work_list))
break;
/*
* For background writeout, stop when we are below the
* background dirty threshold
@ -644,7 +687,7 @@ static long wb_writeback(struct bdi_writeback *wb,
break;
wbc.more_io = 0;
wbc.nr_to_write = MAX_WRITEBACK_PAGES;
wbc.nr_to_write = write_chunk;
wbc.pages_skipped = 0;
trace_wbc_writeback_start(&wbc, wb->bdi);
@ -654,8 +697,8 @@ static long wb_writeback(struct bdi_writeback *wb,
writeback_inodes_wb(wb, &wbc);
trace_wbc_writeback_written(&wbc, wb->bdi);
work->nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;
work->nr_pages -= write_chunk - wbc.nr_to_write;
wrote += write_chunk - wbc.nr_to_write;
/*
* If we consumed everything, see if we have more
@ -670,7 +713,7 @@ static long wb_writeback(struct bdi_writeback *wb,
/*
* Did we write something? Try for more
*/
if (wbc.nr_to_write < MAX_WRITEBACK_PAGES)
if (wbc.nr_to_write < write_chunk)
continue;
/*
* Nothing written. Wait for some inode to
@ -718,6 +761,23 @@ static unsigned long get_nr_dirty_pages(void)
get_nr_dirty_inodes();
}
static long wb_check_background_flush(struct bdi_writeback *wb)
{
if (over_bground_thresh()) {
struct wb_writeback_work work = {
.nr_pages = LONG_MAX,
.sync_mode = WB_SYNC_NONE,
.for_background = 1,
.range_cyclic = 1,
};
return wb_writeback(wb, &work);
}
return 0;
}
static long wb_check_old_data_flush(struct bdi_writeback *wb)
{
unsigned long expired;
@ -787,6 +847,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait)
* Check for periodic writeback, kupdated() style
*/
wrote += wb_check_old_data_flush(wb);
wrote += wb_check_background_flush(wb);
clear_bit(BDI_writeback_running, &wb->bdi->state);
return wrote;
@ -873,7 +934,7 @@ void wakeup_flusher_threads(long nr_pages)
list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
if (!bdi_has_dirty_io(bdi))
continue;
__bdi_start_writeback(bdi, nr_pages, false, false);
__bdi_start_writeback(bdi, nr_pages, false);
}
rcu_read_unlock();
}
@ -1164,7 +1225,7 @@ EXPORT_SYMBOL(writeback_inodes_sb_nr_if_idle);
* @sb: the superblock
*
* This function writes and waits on any dirty inode belonging to this
* super_block. The number of pages synced is returned.
* super_block.
*/
void sync_inodes_sb(struct super_block *sb)
{
@ -1242,11 +1303,11 @@ int sync_inode(struct inode *inode, struct writeback_control *wbc)
EXPORT_SYMBOL(sync_inode);
/**
* sync_inode - write an inode to disk
* sync_inode_metadata - write an inode to disk
* @inode: the inode to sync
* @wait: wait for I/O to complete.
*
* Write an inode to disk and adjust it's dirty state after completion.
* Write an inode to disk and adjust its dirty state after completion.
*
* Note: only writes the actual inode, no associated data or other metadata.
*/

View File

@ -40,7 +40,7 @@
* status of that page is hard. See end_buffer_async_read() for the details.
* There is no point in duplicating all that complexity.
*/
static void mpage_end_io_read(struct bio *bio, int err)
static void mpage_end_io(struct bio *bio, int err)
{
const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
@ -50,44 +50,29 @@ static void mpage_end_io_read(struct bio *bio, int err)
if (--bvec >= bio->bi_io_vec)
prefetchw(&bvec->bv_page->flags);
if (uptodate) {
SetPageUptodate(page);
} else {
ClearPageUptodate(page);
SetPageError(page);
if (bio_data_dir(bio) == READ) {
if (uptodate) {
SetPageUptodate(page);
} else {
ClearPageUptodate(page);
SetPageError(page);
}
unlock_page(page);
} else { /* bio_data_dir(bio) == WRITE */
if (!uptodate) {
SetPageError(page);
if (page->mapping)
set_bit(AS_EIO, &page->mapping->flags);
}
end_page_writeback(page);
}
unlock_page(page);
} while (bvec >= bio->bi_io_vec);
bio_put(bio);
}
static void mpage_end_io_write(struct bio *bio, int err)
{
const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
do {
struct page *page = bvec->bv_page;
if (--bvec >= bio->bi_io_vec)
prefetchw(&bvec->bv_page->flags);
if (!uptodate){
SetPageError(page);
if (page->mapping)
set_bit(AS_EIO, &page->mapping->flags);
}
end_page_writeback(page);
} while (bvec >= bio->bi_io_vec);
bio_put(bio);
}
static struct bio *mpage_bio_submit(int rw, struct bio *bio)
{
bio->bi_end_io = mpage_end_io_read;
if (rw == WRITE)
bio->bi_end_io = mpage_end_io_write;
bio->bi_end_io = mpage_end_io;
submit_bio(rw, bio);
return NULL;
}

View File

@ -1579,6 +1579,7 @@ static int nfs_create(struct inode *dir, struct dentry *dentry, int mode,
{
struct iattr attr;
int error;
int open_flags = 0;
dfprintk(VFS, "NFS: create(%s/%ld), %s\n",
dir->i_sb->s_id, dir->i_ino, dentry->d_name.name);
@ -1586,7 +1587,10 @@ static int nfs_create(struct inode *dir, struct dentry *dentry, int mode,
attr.ia_mode = mode;
attr.ia_valid = ATTR_MODE;
error = NFS_PROTO(dir)->create(dir, dentry, &attr, 0, NULL);
if ((nd->flags & LOOKUP_CREATE) != 0)
open_flags = nd->intent.open.flags;
error = NFS_PROTO(dir)->create(dir, dentry, &attr, open_flags, NULL);
if (error != 0)
goto out_err;
return 0;

View File

@ -1151,7 +1151,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
goto err_task_lock;
}
if (oom_score_adj < task->signal->oom_score_adj &&
if (oom_score_adj < task->signal->oom_score_adj_min &&
!capable(CAP_SYS_RESOURCE)) {
err = -EACCES;
goto err_sighand;
@ -1164,6 +1164,8 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
atomic_dec(&task->mm->oom_disable_count);
}
task->signal->oom_score_adj = oom_score_adj;
if (has_capability_noaudit(current, CAP_SYS_RESOURCE))
task->signal->oom_score_adj_min = oom_score_adj;
/*
* Scale /proc/pid/oom_adj appropriately ensuring that OOM_DISABLE is
* always attainable.

View File

@ -100,6 +100,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
"VmallocChunk: %8lu kB\n"
#ifdef CONFIG_MEMORY_FAILURE
"HardwareCorrupted: %5lu kB\n"
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
"AnonHugePages: %8lu kB\n"
#endif
,
K(i.totalram),
@ -128,7 +131,12 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
K(i.freeswap),
K(global_page_state(NR_FILE_DIRTY)),
K(global_page_state(NR_WRITEBACK)),
K(global_page_state(NR_ANON_PAGES)),
K(global_page_state(NR_ANON_PAGES)
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
HPAGE_PMD_NR
#endif
),
K(global_page_state(NR_FILE_MAPPED)),
K(global_page_state(NR_SHMEM)),
K(global_page_state(NR_SLAB_RECLAIMABLE) +
@ -150,6 +158,10 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
vmi.largest_chunk >> 10
#ifdef CONFIG_MEMORY_FAILURE
,atomic_long_read(&mce_bad_pages) << (PAGE_SHIFT - 10)
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
,K(global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
HPAGE_PMD_NR)
#endif
);

Some files were not shown because too many files have changed in this diff Show More