linux/Documentation
Minchan Kim bda807d444 mm: migrate: support non-lru movable page migration
We have allowed migration for only LRU pages until now and it was enough
to make high-order pages.  But recently, embedded system(e.g., webOS,
android) uses lots of non-movable pages(e.g., zram, GPU memory) so we
have seen several reports about troubles of small high-order allocation.
For fixing the problem, there were several efforts (e,g,.  enhance
compaction algorithm, SLUB fallback to 0-order page, reserved memory,
vmalloc and so on) but if there are lots of non-movable pages in system,
their solutions are void in the long run.

So, this patch is to support facility to change non-movable pages with
movable.  For the feature, this patch introduces functions related to
migration to address_space_operations as well as some page flags.

If a driver want to make own pages movable, it should define three
functions which are function pointers of struct
address_space_operations.

1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);

What VM expects on isolate_page function of driver is to return *true*
if driver isolates page successfully.  On returing true, VM marks the
page as PG_isolated so concurrent isolation in several CPUs skip the
page for isolation.  If a driver cannot isolate the page, it should
return *false*.

Once page is successfully isolated, VM uses page.lru fields so driver
shouldn't expect to preserve values in that fields.

2. int (*migratepage) (struct address_space *mapping,
		struct page *newpage, struct page *oldpage, enum migrate_mode);

After isolation, VM calls migratepage of driver with isolated page.  The
function of migratepage is to move content of the old page to new page
and set up fields of struct page newpage.  Keep in mind that you should
indicate to the VM the oldpage is no longer movable via
__ClearPageMovable() under page_lock if you migrated the oldpage
successfully and returns 0.  If driver cannot migrate the page at the
moment, driver can return -EAGAIN.  On -EAGAIN, VM will retry page
migration in a short time because VM interprets -EAGAIN as "temporal
migration failure".  On returning any error except -EAGAIN, VM will give
up the page migration without retrying in this time.

Driver shouldn't touch page.lru field VM using in the functions.

3. void (*putback_page)(struct page *);

If migration fails on isolated page, VM should return the isolated page
to the driver so VM calls driver's putback_page with migration failed
page.  In this function, driver should put the isolated page back to the
own data structure.

4. non-lru movable page flags

There are two page flags for supporting non-lru movable page.

* PG_movable

Driver should use the below function to make page movable under
page_lock.

	void __SetPageMovable(struct page *page, struct address_space *mapping)

It needs argument of address_space for registering migration family
functions which will be called by VM.  Exactly speaking, PG_movable is
not a real flag of struct page.  Rather than, VM reuses page->mapping's
lower bits to represent it.

	#define PAGE_MAPPING_MOVABLE 0x2
	page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;

so driver shouldn't access page->mapping directly.  Instead, driver
should use page_mapping which mask off the low two bits of page->mapping
so it can get right struct address_space.

For testing of non-lru movable page, VM supports __PageMovable function.
However, it doesn't guarantee to identify non-lru movable page because
page->mapping field is unified with other variables in struct page.  As
well, if driver releases the page after isolation by VM, page->mapping
doesn't have stable value although it has PAGE_MAPPING_MOVABLE (Look at
__ClearPageMovable).  But __PageMovable is cheap to catch whether page
is LRU or non-lru movable once the page has been isolated.  Because LRU
pages never can have PAGE_MAPPING_MOVABLE in page->mapping.  It is also
good for just peeking to test non-lru movable pages before more
expensive checking with lock_page in pfn scanning to select victim.

For guaranteeing non-lru movable page, VM provides PageMovable function.
Unlike __PageMovable, PageMovable functions validates page->mapping and
mapping->a_ops->isolate_page under lock_page.  The lock_page prevents
sudden destroying of page->mapping.

Driver using __SetPageMovable should clear the flag via
__ClearMovablePage under page_lock before the releasing the page.

* PG_isolated

To prevent concurrent isolation among several CPUs, VM marks isolated
page as PG_isolated under lock_page.  So if a CPU encounters PG_isolated
non-lru movable page, it can skip it.  Driver doesn't need to manipulate
the flag because VM will set/clear it automatically.  Keep in mind that
if driver sees PG_isolated page, it means the page have been isolated by
VM so it shouldn't touch page.lru field.  PG_isolated is alias with
PG_reclaim flag so driver shouldn't use the flag for own purpose.

[opensource.ganesh@gmail.com: mm/compaction: remove local variable is_lru]
  Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test
Link: http://lkml.kernel.org/r/1464736881-24886-3-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
..
ABI iio:core: timestamping clock selection support 2016-06-30 19:41:38 +01:00
accounting taskstats: fix nl parsing in accounting/getdelays.c 2016-04-27 12:50:14 -04:00
acpi ACPI / tables: Convert initrd table override to table upgrade mechanism 2016-04-18 23:59:09 +02:00
aoe
arm Documentation: fix common spelling mistakes 2016-04-28 07:51:59 -06:00
arm64 irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144 2016-06-02 18:01:07 +01:00
auxdisplay
backlight
blackfin
block The most interesting thing (IMO) this time around is some beginning 2016-05-19 18:07:25 -07:00
blockdev zram: cosmetic: cleanup documentation 2016-07-26 16:19:19 -07:00
bus-devices
cdrom
cgroup-v1 Documentation/memcg: update kmem limit doc as codes behavior 2016-05-14 10:12:45 -06:00
cma
connector samples: connector: from Documentation to samples directory 2016-04-28 07:47:35 -06:00
console
cpu-freq Documentation: cpufreq: intel_pstate: fix typo 2016-02-18 20:31:53 +01:00
cpuidle
cris
crypto crypto: doc - Use ahash 2016-02-06 15:33:11 +08:00
development-process
device-mapper dm stats: fix spelling mistake in Documentation 2016-05-05 15:25:54 -04:00
devicetree Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 21:35:03 -07:00
dmaengine Merge branch 'topic/async' into for-linus 2016-01-06 15:17:47 +05:30
DocBook iio:core: timestamping clock selection support 2016-06-30 19:41:38 +01:00
driver-model Staging and IIO driver update for 4.7-rc1 2016-05-20 22:20:48 -07:00
dvb [media] media: change email address 2016-01-25 12:01:08 -02:00
early-userspace
EDID
extcon
fault-injection net: Add support for CHANGEUPPER notifier error injection 2015-12-03 11:49:23 -05:00
fb Documentation: fb: fix spelling mistakes 2016-05-10 12:05:27 +03:00
features powerpc: Add HAVE_PERF_USER_STACK_DUMP support 2016-05-11 21:54:05 +10:00
filesystems mm: migrate: support non-lru movable page migration 2016-07-26 16:19:19 -07:00
firmware_class Documentation: fix common spelling mistakes 2016-04-28 07:51:59 -06:00
fmc
fpga
frv
gpio gpio: clarify open drain/source docs 2016-04-27 10:23:44 +02:00
hid
hwmon hwmon: Add driver for FTS BMC chip "Teutates" 2016-07-20 06:29:54 -07:00
i2c [media] rtl2832: change the i2c gate to be mux-locked 2016-05-04 22:40:02 +02:00
ia64
ide
iio iio: Documentation: Add IIO configfs documentation 2015-12-03 18:19:28 +00:00
infiniband Round two of 4.7 merge window patches 2016-05-28 11:04:16 -07:00
input Input: clarify we want BTN_TOOL_<name> on proximity 2016-04-06 10:23:09 -07:00
ioctl gpio: uapi: use 0xB4 as ioctl() major 2016-03-10 16:02:52 +07:00
isdn isdn: i4l: move active-isdn drivers to staging 2016-03-05 15:00:38 -08:00
ja_JP Documentatio: HOWTO: remove regression postings info from translations 2016-04-16 10:49:08 -06:00
kbuild kbuild, x86: Track generated headers with generated-y 2016-07-07 15:58:44 +02:00
kdump kdump: fix dmesg gdbmacro to work with record based printk 2016-06-03 15:06:22 -07:00
ko_KR Documentation: HOWTO: update git home URL in translations 2016-04-16 10:49:18 -06:00
laptops Documentation: laptops: fix spelling mistake 2016-04-28 07:21:48 -06:00
leds leds: core: Fix brightness setting upon hardware blinking enabled 2016-06-08 11:47:06 +02:00
livepatch Merge branches 'for-4.7/core', 'for-4.7/livepatching-doc' and 'for-4.7/livepatching-ppc64' into for-linus 2016-05-17 12:06:35 +02:00
locking locking/Documentation/lockdep: Fix spelling mistakes 2016-04-28 10:40:57 +02:00
m68k
memory-devices
metag
mic Char/Misc patches for 4.6-rc1 2016-03-17 13:47:50 -07:00
mips
misc-devices Merge char-misc-next into staging-next 2016-02-22 14:46:24 -08:00
mmc Documentation: mmc: Add the introduction for mmc-utils 2016-03-31 00:54:22 -06:00
mn10300
mtd Documentation: mtd: improve nand_ecc.txt for readability and correctness 2015-11-17 17:05:14 -08:00
namespaces
netlabel
networking Documentation: ip-sysctl.txt: clarify secure_redirects 2016-05-29 22:40:53 -07:00
nfc
nios2
nvdimm libnvdimm: documentation clarifications 2015-11-12 09:55:23 -08:00
nvmem
parisc
PCI
pcmcia
phy
platform
power PM / runtime: Document steps for device removal 2016-04-05 03:46:59 +02:00
powerpc powerpc/eeh: rename EEH from "extended" to "enhanced" error handling 2016-04-11 20:30:42 +10:00
pps Documentation: pps: fix spelling mistake 2016-04-28 07:23:59 -06:00
prctl Documentation: Fix int/unsigned int comparison 2016-02-17 14:09:43 -07:00
pti
ptp Another relatively boring cycle for the docs tree: typo fixes, translation 2016-03-17 12:09:35 -07:00
rapidio rapidio: add mport char device driver 2016-03-22 15:36:02 -07:00
RCU Documentation: Fix spelling mistake 2016-06-14 16:01:00 -07:00
s390 s390/zcore: remove /sys/kernel/debug/zcore/mem 2015-11-27 09:24:12 +01:00
scheduler
scsi Merge remote-tracking branch 'mkp-scsi/4.7/scsi-fixes' into fixes 2016-06-18 11:59:01 -07:00
security KEYS: Add placeholder for KDF usage with DH 2016-06-03 16:14:34 +10:00
serial TTY and Serial driver update for 4.7-rc1 2016-05-20 20:57:27 -07:00
sh
sound ALSA: compress: fix some typo 2016-05-08 11:27:44 +02:00
spi spi: tools: move spidev_test metadata 2015-11-30 12:14:12 +00:00
sysctl rcu: sysctl: Panic on RCU Stall 2016-06-15 16:00:05 -07:00
target target: make close_session optional 2016-05-10 01:19:26 -07:00
thermal thermal: Syntactic and factual errors in the API document 2016-05-17 07:28:24 -07:00
timers Documentation: fix common spelling mistakes 2016-04-28 07:51:59 -06:00
tpm
trace Char / Misc driver update for 4.7-rc1 2016-05-20 21:20:31 -07:00
usb Hi Greg, below are changes for chipidea and OTG FSM, no major changes. 2016-05-04 10:25:58 -07:00
vDSO
video4linux The most interesting thing (IMO) this time around is some beginning 2016-05-19 18:07:25 -07:00
virtual kvm: introduce KVM_MAX_VCPU_ID 2016-05-11 22:37:54 +02:00
vm mm: migrate: support non-lru movable page migration 2016-07-26 16:19:19 -07:00
w1 w1: add ability to set (SRAM) and store (EEPROM) configuration for temp sensors like DS18B20 2016-05-01 14:37:49 -07:00
watchdog Documentation: Add ebc-c384_wdt watchdog-parameters.txt entry 2016-05-14 18:28:29 +02:00
wimax
x86 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 17:32:28 -07:00
xtensa
zh_CN Documentation:Update Documentation/zh_CN/arm64/booting.txt 2016-04-28 07:15:36 -06:00
00-INDEX
adding-syscalls.txt documentation: trivial typo: adding-syscalls.txt: s/statat/fstatat/ 2016-04-18 11:31:49 -06:00
applying-patches.txt
assoc_array.txt
atomic_ops.txt
bad_memory.txt
basic_profiling.txt
bcache.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt
cachetlb.txt
cgroup-v2.txt Merge branch 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2016-03-21 10:05:13 -07:00
Changes There is a nice new document from Neil on how pathname lookups work and 2015-11-05 15:59:24 -08:00
circular-buffers.txt
clk.txt
coccinelle.txt
CodeOfConflict
CodingStyle Documentation/CodingStyle: add space before parenthesis in example macro 2016-01-25 12:36:28 -07:00
cpu-hotplug.txt Documentation: cpu-hotplug: Fix sysfs mount instructions 2015-12-10 11:35:30 -07:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
devices.txt Documentation: update the devices.txt documentation 2016-03-29 10:11:44 -07:00
digsig.txt
DMA-API-HOWTO.txt dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
DMA-API.txt DMA-API: fix confusing sentence in Documentation/DMA-API.txt 2016-01-11 18:29:00 -07:00
DMA-attributes.txt ARM: 8506/1: common: DMA-mapping: add DMA_ATTR_ALLOC_SINGLE_PAGES attribute 2016-02-11 15:33:38 +00:00
dma-buf-sharing.txt dma-buf: Update docs for SYNC ioctl 2016-03-21 09:26:45 +01:00
DMA-ISA-LPC.txt
dontdiff Documentation: dontdiff: remove media from dontdiff 2015-11-11 10:08:07 -07:00
dynamic-debug-howto.txt
edac.txt EDAC: Remove references to bluesmoke.sourceforge.net 2015-11-26 14:46:06 +01:00
efi-stub.txt doc: efi-stub.txt: Fix arm64 paths 2015-12-14 15:24:03 +00:00
eisa.txt
email-clients.txt A few more documentation patches that wandered in and have no reason to 2015-11-13 09:19:05 -08:00
flexible-arrays.txt
futex-requeue-pi.txt
gcov.txt
gdb-kernel-debugging.txt Revert "scripts/gdb: add documentation example for radix tree" 2016-07-15 14:54:27 +09:00
highuid.txt
HOWTO Documentation: Howto: Fixed subtitles style 2016-03-09 15:30:03 -07:00
hsi.txt
hw_random.txt
hwspinlock.txt
init.txt
initrd.txt
intel_txt.txt
Intel-IOMMU.txt iommu/vt-d: Fix link to Intel IOMMU Specification 2016-01-29 12:32:12 +01:00
io_ordering.txt
io-mapping.txt
iostats.txt
IPMI.txt ipmi watchdog : add panic_wdt_timeout parameter 2015-11-16 06:28:43 -06:00
IRQ-affinity.txt
IRQ-domain.txt Documentation/IRQ-domain.txt: Document irq_domain_create_{linear, tree} 2016-03-31 00:32:59 -06:00
IRQ.txt
irqflags-tracing.txt
isa.txt Documentation: Add ISA bus driver documentation 2016-05-02 09:32:04 -07:00
isapnp.txt
java.txt
kasan.txt mm, kasan: SLAB support 2016-03-25 16:37:42 -07:00
kcov.txt kernel: add kcov code coverage 2016-03-22 15:36:02 -07:00
kernel-doc-nano-HOWTO.txt
kernel-docs.txt Documentation: update Michael K. Johnson's work 2016-04-15 15:37:25 -06:00
kernel-parameters.txt Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 20:43:12 -07:00
kernel-per-CPU-kthreads.txt irq_poll: make blk-iopoll available outside the block layer 2015-12-11 11:52:24 -08:00
kmemcheck.txt
kmemleak.txt
kobject.txt
kprobes.txt
kref.txt
kselftest.txt Documentation: kselftest: Remove duplicate word 2016-03-09 15:33:38 -07:00
ldm.txt
local_ops.txt
lockup-watchdogs.txt kernel/watchdog.c: add sysctl knob hardlockup_panic 2015-11-05 19:34:48 -08:00
logo.gif
logo.txt
lzo.txt Documentation: lzo: fix spelling mistakes 2016-04-28 07:23:11 -06:00
magic-number.txt
mailbox.txt
Makefile [media] samples: v4l: from Documentation to samples directory 2016-05-09 18:34:37 -03:00
ManagementStyle
md-cluster.txt md-cluster: change array_sectors and update size are not supported 2016-05-04 12:39:35 -07:00
md.txt
memory-barriers.txt locking/Documentation: Clarify limited control-dependency scope 2016-06-17 09:54:45 +02:00
memory-hotplug.txt memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE 2016-05-19 19:12:14 -07:00
men-chameleon-bus.txt
module-signing.txt modsign: Fix documentation on module signing enforcement parameter. 2016-03-12 01:48:11 -07:00
mono.txt
nommu-mmap.txt
ntb.txt
numastat.txt
oops-tracing.txt
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
phy.txt phy: core: Allow children node to be overridden 2016-04-29 16:39:39 +02:00
pi-futex.txt
pinctrl.txt
pnp.txt
preempt-locking.txt
printk-formats.txt mm, printk: introduce new format string for flags 2016-03-15 16:55:16 -07:00
pwm.txt pwm: Update documentation 2016-05-17 14:48:04 +02:00
ramoops.txt
rbtree.txt
remoteproc.txt
rfkill.txt rfkill: Add documentation about LED triggers 2016-02-24 09:13:12 +01:00
robust-futex-ABI.txt
robust-futexes.txt Documentation: robust-futexes: fix spelling mistakes 2016-04-28 07:26:41 -06:00
rpmsg.txt rpmsg: use module_rpmsg_driver in existing drivers and examples 2016-05-06 11:09:01 -07:00
rtc.txt rtc: implement a sysfs interface for clock offset 2016-03-14 17:08:16 +01:00
SAK.txt
SecurityBugs
serial-console.txt
sgi-ioc4.txt
SM501.txt
smsc_ece1099.txt
sparse.txt
stable_api_nonsense.txt
stable_kernel_rules.txt stable_kernel_rules.txt: Remove extra space after Cc: 2015-11-20 16:54:57 -07:00
static-keys.txt
SubmitChecklist
SubmittingDrivers
SubmittingPatches SubmittingPatches: fix spelling of "git send-email" 2016-01-25 12:30:18 -07:00
svga.txt
sync_file.txt Documentation: add Sync File doc 2016-04-29 17:37:10 -07:00
sysfs-rules.txt
sysrq.txt Doc: correct the location of sysrq.c 2016-04-28 08:02:36 -06:00
this_cpu_ops.txt
ubsan.txt UBSAN: run-time undefined behavior sanity checker 2016-01-20 17:09:18 -08:00
unaligned-memory-access.txt
unicode.txt
unshare.txt
vfio.txt
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
vme_api.txt
volatile-considered-harmful.txt
workqueue.txt
xillybus.txt Documentation: xillybus: fix spelling mistake 2016-04-28 07:44:54 -06:00
xz.txt
zorro.txt