Move zram sysfs code to zram drv and remove zram_sysfs.c
file. This gives ability to make static a number of previously
exported zram functions, used from zram sysfs, e.g. internal zram
zram_meta_alloc/free(). We also can drop zram_drv wrapper
functions, used from zram sysfs:
e.g. zram_reset_device()/__zram_reset_device() pair.
v2: as suggested by Greg K-H, move MODULE description to the
bottom of the file.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use atomic64_xxx() to replace open-coded zram_stat64_xxx().
Some architectures have native support of atomic64 operations,
so we can get rid of the spin_lock() in zram_stat64_xxx().
On the other hand, for platforms use generic version of atomic64
implement, it may cause an extra save/restore of the interrupt
flag. So it's a tradeoff.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some architectures provides architecture-specific, optimized version of
clear_page()/copy_page(), which may have better performance than
memset()/memcpy(). So use clear_page()/copy_page() to optimize zram
performance if possible.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now there's no caller of zram_get_num_devices(), so kill it.
And change zram_devices to static because it's only used in zram_drv.c.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Simplify and optimize dev_to_zram() without walking the zram_devices
array.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use zram->init_lock to protect access to zram->meta, otherwise it
may cause invalid memory access if zram->meta has been freed by
zram_reset_device().
This issue may be triggered by:
Thread 1:
while true; do cat mem_used_total; done
Thread 2:
while true; do echo 8M > disksize; echo 1 > reset; done
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Function valid_io_request() should verify the entire request are within
the zram device address range. Otherwise it may cause invalid memory
access when accessing/modifying zram->meta->table[index] because the
'index' is out of range. Then it may access non-exist memory, randomly
modify memory belong to other subsystems, which is hard to track down.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When doing a patial write and the whole page is filled with zero,
zram_bvec_write() will free uncmem twice.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On error recovery path of zram_init(), it leaks the zram device object
causing the failure. So change create_device() to free allocated
resources on error path.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zram_slot_free_notify() is free-running without any protection from
concurrent operations. So there are race conditions between
zram_bvec_read()/zram_bvec_write() and zram_slot_free_notify(),
and possible consequences include:
1) Trigger BUG_ON(!handle) on zram_bvec_write() side.
2) Access to freed pages on zram_bvec_read() side.
3) Break some fields (bad_compress, good_compress, pages_stored)
in zram->stats if the swap layer makes concurrently call to
zram_slot_free_notify().
So enhance zram_slot_free_notify() to acquire writer lock on zram->lock
before calling zram_free_page().
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Memory for zram->disk object may have already been freed after returning
from destroy_device(zram), then it's unsafe for zram_reset_device(zram)
to access zram->disk again.
We can't solve this bug by flipping the order of destroy_device(zram)
and zram_reset_device(zram), that will cause deadlock issues to the
zram sysfs handler.
So fix it by holding an extra reference to zram->disk before calling
destroy_device(zram).
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When zram decompress fails, the code unnecessarily dumps failure messages and
does stat accumulation in function zram_decompress_page(), this work is already
done in function zram_decompress_page, the patch skips the redundant work.
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
alloc failures already get standardized OOM
messages and a dump_stack.
For the affected mallocs around these OOM messages:
Converted kzallocs with multiplies to kcalloc.
Converted kmallocs with multiplies to kmalloc_array.
Converted a kmalloc/strlen/strncpy to kstrdup.
Moved a spin_lock below a removed OOM message and
removed a now unnecessary spin_unlock.
Neatened alignment and whitespace.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lockdep complains about recursive deadlock of zram->init_lock.
[1] made it false positive because we can't request IO to zram
before setting disksize. Anyway, we should shut lockdep up to
avoid many reporting from user.
[1] : zram: force disksize setting before using zram
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kbuild bot whinges due to print format mistmatch caused by
zram: force disksize setting before using zram.
This patch fixes it.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1) User of zram normally do mkfs.xxx or mkswap before using
the zram block device(ex, normally, do it at booting time)
It ends up allocating such metadata of zram before real usage so
benefit of lazy initialzation would be mitigated.
2) Some user want to use zram when memory pressure is high.(ie, load zram
dynamically, NOT booting time). It does make sense because people don't
want to waste memory until memory pressure is high(ie, where zram is really
helpful time). In this case, lazy initialzation could be failed easily
because we will use GFP_NOIO instead of GFP_KERNEL for avoiding deadlock.
So the benefit of lazy initialzation would be mitigated, too.
3) Metadata overhead is not critical and Nitin has a plan to diet it.
4K : 12 byte(64bit machine) -> 64G : 192M so 0.3% isn't big overhead
If insane user use such big zram device up to 20, it could consume 6% of ram
but efficieny of zram will cover the waste.
So this patch gives up lazy initialization and instead we initialize metadata
at disksize setting time.
Acked-by: Jerome Marchand <jmarchand@redhat.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now zram document syas "set disksize is optional"
but partly it's wrong. When you try to use zram firstly after
booting, you must set disksize, otherwise zram can't work because
zram gendisk's size is 0. But once you do it, you can use zram freely
after reset because reset doesn't reset to zero paradoxically.
So in this time, disksize setting is optional.:(
It's inconsitent for user behavior and not straightforward.
This patch forces always setting disksize firstly before using zram.
Yes. It changes current behavior so someone could complain when
he upgrades zram. Apparently it could be a problem if zram is mainline
but it still lives in staging so behavior could be changed for right
way to go. Let them excuse.
Acked-by: Jerome Marchand <jmarchand@redhat.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now zram allocates new page with GFP_KERNEL in zram I/O path
if IO is partial. Unfortunately, It may cause deadlock with
reclaim path like below.
write_page from fs
fs_lock
allocation(GFP_KERNEL)
reclaim
pageout
write_page from fs
fs_lock <-- deadlock
This patch fixes it by using GFP_NOIO. In read path, we
reorganize code flow so that kmap_atomic is called after the
GFP_NOIO allocation.
Cc: stable@vger.kernel.org
Acked-by: Jerome Marchand <jmarchand@redhat.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
[ penberg@kernel.org: don't use GFP_ATOMIC ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zs_create_pool() currently takes a name argument which is
never used in any useful way.
This patch removes it.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It seems like an overkill to have adding and subtracting
1 functions from the 32bit counters. Just do it directly.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ->disksize variable stores values in units of bytes,
print the correct size in Kb
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Simplify dealing with num_devices when initializing zram.
Also cleanup some of the output messages.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes a bug introduced by commit c8f2f0db1 ("zram: Fix handling
of incompressible pages") which caused invalid memory references
during disk write. Invalid references could occur in two cases:
- Incoming data expands on compression: In this case, reference was
made to kunmap()'ed bio page.
- Partial (non PAGE_SIZE) write with incompressible data: In this
case, reference was made to a kfree()'ed buffer.
Fixes bug 50081:
https://bugzilla.kernel.org/show_bug.cgi?id=50081
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Cc: stable <stable@vger.kernel.org>
Reported-by: Mihail Kasadjikov <hamer.mk@gmail.com>
Reported-by: Tomas M <tomas@slax.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add missing angle bracket before and after the URL.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zram_bvec_read() shared decompress functionality with zram_read_before_write() function.
Factor-out and make commonly used zram_decompress_page() function, which also simplified
error handling in zram_bvec_read().
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This resolves the conflict with:
drivers/staging/comedi/drivers/amplc_dio200.c
and syncs up the changes that happened in the staging directory for
3.7-rc3.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change 130f315a (staging: zram: remove special handle of uncompressed page)
introduced a bug in the handling of incompressible pages which resulted in
memory allocation failure for such pages.
When a page expands on compression, say from 4K to 4K+30, we were trying to
do zsmalloc(pool, 4K+30). However, the maximum size which zsmalloc can
allocate is PAGE_SIZE (for obvious reasons), so such allocation requests
always return failure (0).
For a page that has compressed size larger than the original size (this may
happen with already compressed or random data), there is no point storing
the compressed version as that would take more space and would also require
time for decompression when needed again. So, the fix is to store any page,
whose compressed size exceeds a threshold (max_zpage_size), as-it-is i.e.
without compression. Memory required for storing this uncompressed page can
then be requested from zsmalloc which supports PAGE_SIZE sized allocations.
Lastly, the fix checks that we do not attempt to "decompress" the page which
we stored in the uncompressed form -- we just memcpy() out such pages.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Reported-by: viechweg@gmail.com
Reported-by: paerley@gmail.com
Reported-by: wu.tommy@gmail.com
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zram doesn't use xv_malloc any more so it doesn't have
limitation about zobj_header.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch improves mapping performance in zsmalloc by getting
usage information from the user in the form of a "mapping mode"
and using it to avoid unnecessary copying for objects that span
pages.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch switches zcache and zram dependency to ZSMALLOC
rather than X86. There is no net change since ZSMALLOC
depends on X86, however, this prevent further changes to
these files as zsmalloc dependencies change.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using the __aligned() attribute in favor of __attribute__((aligned(size)))
Signed-off-by: Sam Hansen <solid.se7en@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Porting zram to use the pr_warn() function instead of the deprecated
pr_warning().
Signed-off-by: Sam Hansen <solid.se7en@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xvmalloc can't handle PAGE_SIZE page so that zram have to
handle it specially but zsmalloc can do it so let's remove
unnecessary special handling code.
Quote from Nitin
"I think page vs handle distinction was added since xvmalloc could not
handle full page allocation. Now that zsmalloc allows full page
allocation, we can just use it for both cases. This would also allow
removing the ZRAM_UNCOMPRESSED flag. The only downside will be slightly
slower code path for full page allocation but this event is anyways
supposed to be rare, so should be fine."
1. This patch reduces code very much.
drivers/staging/zram/zram_drv.c | 104 +++++--------------------------------
drivers/staging/zram/zram_drv.h | 17 +-----
drivers/staging/zram/zram_sysfs.c | 6 +--
3 files changed, 15 insertions(+), 112 deletions(-)
2. change pages_expand with bad_compress so it can count
bad compression(above 75%) ratio.
3. remove zobj_header which is for back-reference for defragmentation
because firstly, it's not used at the moment and zsmalloc can't handle
bigger size than PAGE_SIZE so zram can't do it any more without redesign.
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fd1a30de makes a bug that it uses (struct page *) as zsmalloc's handle
although it's a uncompressed page so that it can access random page,
return random data or even crashed by get_first_page in zs_map_object.
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We should use unsigned long as handle instead of void * to avoid any
confusion. Without this, users may just treat zs_malloc return value as
a pointer and try to deference it.
This patch passed compile test(zram, zcache and ramster) and zram is
tested on qemu.
changelog
* from v2
- remove hval pointed out by Nitin
- based on next-20120607
* from v1
- change zcache's zv_create return value
- baesd on next-20120604
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull kmap_atomic cleanup from Cong Wang.
It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().
Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.
* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
drbd: remove the second argument of k[un]map_atomic()
zcache: remove the second argument of k[un]map_atomic()
gma500: remove the second argument of k[un]map_atomic()
dm: remove the second argument of k[un]map_atomic()
tomoyo: remove the second argument of k[un]map_atomic()
sunrpc: remove the second argument of k[un]map_atomic()
rds: remove the second argument of k[un]map_atomic()
net: remove the second argument of k[un]map_atomic()
mm: remove the second argument of k[un]map_atomic()
lib: remove the second argument of k[un]map_atomic()
power: remove the second argument of k[un]map_atomic()
kdb: remove the second argument of k[un]map_atomic()
udf: remove the second argument of k[un]map_atomic()
ubifs: remove the second argument of k[un]map_atomic()
squashfs: remove the second argument of k[un]map_atomic()
reiserfs: remove the second argument of k[un]map_atomic()
ocfs2: remove the second argument of k[un]map_atomic()
ntfs: remove the second argument of k[un]map_atomic()
...
zram accepts number of devices to be created
as a module parameter. This was renamed from
num_devices to zram_num_devices (without updating
the documentation!) since num_devices was declared
as a non-static global variable, polluting the global
namespace. Now, we declare it as a static variable
and revert back the name change.
The documentation (zram.txt) already mentions
num_devices as the module parameter name.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
linux/vmalloc.h added to zsmalloc-main.c to resolve implicit
declaration errors.
X86 dependency added to zsmalloc and dependent drivers zcache and zram.
This X86 only requirement is not ideal. Working to find portable
functions for __flush_tlb_one and set_pte.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The allocation of zram->compress_buffer is misssing a GFP_* specifier.
This is equivalent to GFP_NOWAIT but it is more likely a omission.
Since the allocation just above it uses GFP_KERNEL, there is no reason
to use GFP_NOWAIT here. Therefore, add GFP_KERNEL.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As reported by checkpatch.pl strict_strtoX is obsolet and should be
replaced by kstrtoX.
Signed-off-by: Sergey Datsevich <srgdts@gmail.com>
Signed-off-by: Bjoern Meier <bjoernmeier@hotmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
block: don't call blk_drain_queue() if elevator is not up
blk-throttle: use queue_is_locked() instead of lockdep_is_held()
blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
blk-throttle: Free up policy node associated with deleted rule
block: warn if tag is greater than real_max_depth.
block: make gendisk hold a reference to its queue
blk-flush: move the queue kick into
blk-flush: fix invalid BUG_ON in blk_insert_flush
block: Remove the control of complete cpu from bio.
block: fix a typo in the blk-cgroup.h file
block: initialize the bounce pool if high memory may be added later
block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
block: drop @tsk from attempt_plug_merge() and explain sync rules
block: make get_request[_wait]() fail if queue is dead
block: reorganize throtl_get_tg() and blk_throtl_bio()
block: reorganize queue draining
block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
block: pass around REQ_* flags instead of broken down booleans during request alloc/free
block: move blk_throtl prototypes to block/blk.h
block: fix genhd refcounting in blkio_policy_parse_and_set()
...
Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
- drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
- drivers/staging/zram/zram_drv.c
There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.
Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Fixes sparse warning:
zram_drv.c:666:6: warning: symbol 'zram_slot_free_notify' was not
declared. Should it be static?
Also, max_zpage_size is now size_t just to be consistent with data-type
of other variables maintaining sizes of various kinds.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>