linux/drivers/hv
Vitaly Kuznetsov 5abbbb75d7 Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural
Memory blocks can be onlined in random order. When this order is not natural,
some memory pages are not onlined because of a redundant check in
hv_online_page().

Here is a real world scenario:
1) Host tries to hot-add the following (process_hot_add):
  pg_start=rg_start=0x48000, pfn_cnt=111616, rg_size=262144

2) This results in adding 4 memory blocks:
[  109.057866] init_memory_mapping: [mem 0x48000000-0x4fffffff]
[  114.102698] init_memory_mapping: [mem 0x50000000-0x57ffffff]
[  119.168039] init_memory_mapping: [mem 0x58000000-0x5fffffff]
[  124.233053] init_memory_mapping: [mem 0x60000000-0x67ffffff]
The last one is incomplete, but we have a special has->covered_end_pfn counter
to avoid onlining non-backed frames and the hv_bring_pgs_online() function to
bring them online later on.
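
The block count and the partially backed last block follow from the numbers in
the request. The sketch below redoes that arithmetic. It assumes 4 KiB pages
and 128 MiB memory blocks, both inferred from the init_memory_mapping lines
rather than stated in the commit, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Values taken from the hot-add request quoted above. */
#define PG_START 0x48000UL  /* first PFN of the request */
#define PFN_CNT  111616UL   /* frames the host actually backs */

/* Assumption: 128 MiB memory blocks and 4 KiB pages, as the
 * init_memory_mapping ranges suggest (0x8000000 bytes per block). */
#define PAGES_PER_BLOCK (0x8000000UL >> 12) /* 32768 frames */

/* Memory blocks needed to cover pfn_cnt frames, rounded up.
 * Assumes the request starts on a block boundary, as 0x48000 does. */
static unsigned long blocks_needed(unsigned long pfn_cnt)
{
    return (pfn_cnt + PAGES_PER_BLOCK - 1) / PAGES_PER_BLOCK;
}

/* Index N of the memoryN sysfs block that contains pfn. */
static unsigned long block_index(unsigned long pfn)
{
    return pfn / PAGES_PER_BLOCK;
}

/* Number of backed frames in the last block of the request. */
static unsigned long backed_in_last_block(unsigned long pfn_cnt)
{
    unsigned long rem = pfn_cnt % PAGES_PER_BLOCK;

    return rem ? rem : PAGES_PER_BLOCK;
}

/* Print the layout derived from the request. */
static void print_layout(void)
{
    printf("blocks: %lu (memory%lu..memory%lu), last block backs %lu frames\n",
           blocks_needed(PFN_CNT), block_index(PG_START),
           block_index(PG_START + PFN_CNT - 1),
           backed_in_last_block(PFN_CNT));
}
```

Under these assumptions the request spans memory9 through memory12, and only
13312 frames (53248 kB) of memory12 are backed. That agrees with the MemTotal
increase seen when memory12 is onlined in step 4 (1019596 - 966348 = 53248).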

3) Now we have 4 offline memory blocks: /sys/devices/system/memory/memory9-12
$ for f in /sys/devices/system/memory/memory*/state; do echo $f `cat $f`; done | grep -v onlin
/sys/devices/system/memory/memory10/state offline
/sys/devices/system/memory/memory11/state offline
/sys/devices/system/memory/memory12/state offline
/sys/devices/system/memory/memory9/state offline

4) We bring them online in non-natural order:
$grep MemTotal /proc/meminfo
MemTotal:         966348 kB
$echo online > /sys/devices/system/memory/memory12/state && grep MemTotal /proc/meminfo
MemTotal:        1019596 kB
$echo online > /sys/devices/system/memory/memory11/state && grep MemTotal /proc/meminfo
MemTotal:        1150668 kB
$echo online > /sys/devices/system/memory/memory9/state && grep MemTotal /proc/meminfo
MemTotal:        1150668 kB

As you can see, the memory9 block gives us zero additional memory. We can also
observe a huge discrepancy between host- and guest-reported memory sizes.

The root cause of the issue is the redundant pg >= covered_start_pfn check (and
covered_start_pfn advancing) in hv_online_page(). When an upper memory block is
onlined before a lower one (memory12 and memory11 in the above case), we
advance the covered_start_pfn pointer, and none of the memory9 pages pass the
check. If the host always gives us requests in sequential order and pg_start
always equals rg_start when the first request for a new HA region is received
(that's the case in my testing), then we can get rid of covered_start_pfn, and
the pg >= start_pfn check in hv_online_page() is sufficient.
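
The effect of the two checks can be reproduced in user space. The following is
a simplified model, not the driver code: struct ha_region, online_block() and
replay() are hypothetical names, only the PFN bounds are modeled, and 32768
frames per memory block is assumed (128 MiB blocks, 4 KiB pages).

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the hot-add region state; only the PFN
 * bounds discussed in the commit message are modeled. */
struct ha_region {
    unsigned long start_pfn;         /* first PFN of the region */
    unsigned long covered_start_pfn; /* advanced by the old check */
    unsigned long covered_end_pfn;   /* end of host-backed frames */
};

#define BLOCK_PFNS 0x8000UL /* assumed: 128 MiB blocks, 4 KiB pages */

/* Online one memory block and return how many frames pass the check.
 * old_check selects the covered_start_pfn bound (before the patch);
 * otherwise the fixed start_pfn bound (after the patch) is used. */
static unsigned long online_block(struct ha_region *has,
                                  unsigned long block, bool old_check)
{
    unsigned long first = block * BLOCK_PFNS;
    unsigned long onlined = 0;

    for (unsigned long pfn = first; pfn < first + BLOCK_PFNS; pfn++) {
        unsigned long lower = old_check ? has->covered_start_pfn
                                        : has->start_pfn;

        if (pfn >= lower && pfn < has->covered_end_pfn) {
            onlined++;
            if (old_check)
                has->covered_start_pfn++; /* the advancing at fault */
        }
    }
    return onlined;
}

/* Replay an onlining order against the region from the scenario above
 * and store the number of frames onlined per block in out[]. */
static void replay(const unsigned long *order, size_t n, bool old_check,
                   unsigned long *out)
{
    struct ha_region has = {
        .start_pfn = 0x48000UL,
        .covered_start_pfn = 0x48000UL,
        .covered_end_pfn = 0x48000UL + 111616UL,
    };

    for (size_t i = 0; i < n; i++)
        out[i] = online_block(&has, order[i], old_check);
}
```

Replaying the order 12, 11, 9 with old_check set gives 13312, 32768 and 0
frames, i.e. 53248 kB, 131072 kB and 0 kB, which matches the MemTotal deltas in
step 4. With old_check cleared the same order gives 13312, 32768 and 32768
frames, so memory9 is no longer lost.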

The current char-next branch is broken and this patch fixes
the bug.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 11:53:54 +01:00
channel_mgmt.c Drivers: hv: vmbus: Perform device register in the per-channel work element 2015-03-25 11:53:53 +01:00
channel.c Drivers: hv: vmbus: Suport an API to send packet with additional control 2015-03-01 19:31:47 -08:00
connection.c Drivers: hv: vmbus: Perform device register in the per-channel work element 2015-03-25 11:53:53 +01:00
hv_balloon.c Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural 2015-03-25 11:53:54 +01:00
hv_fcopy.c hv: hv_fcopy: drop the obsolete message on transfer failure 2015-01-25 09:17:58 -08:00
hv_kvp.c Drivers: hv: kvp,vss: Fast propagation of userspace communication failure 2014-11-26 19:00:32 -08:00
hv_snapshot.c Drivers: hv: kvp,vss: Fast propagation of userspace communication failure 2014-11-26 19:00:32 -08:00
hv_util.c Drivers: hv: util: On device remove, close the channel after de-initializing the service 2015-03-01 19:31:02 -08:00
hv.c Drivers: hv: vmbus: Teardown clockevent devices on module unload 2015-03-01 19:30:07 -08:00
hyperv_vmbus.h Drivers: hv: vmbus: Perform device register in the per-channel work element 2015-03-25 11:53:53 +01:00
Kconfig x86: Make Linux guest support optional 2013-03-04 13:14:25 -08:00
Makefile Drivers: hv: Implement the file copy service 2014-02-18 10:53:48 -08:00
ring_buffer.c Drivers: hv: vmbus: Enable interrupt driven flow control 2014-09-23 23:31:22 -07:00
vmbus_drv.c mei: bus: () can be static 2015-03-01 21:43:37 -08:00