Now zram allocates new page with GFP_KERNEL in zram I/O path
if IO is partial. Unfortunately, It may cause deadlock with
reclaim path like below.
write_page from fs
fs_lock
allocation(GFP_KERNEL)
reclaim
pageout
write_page from fs
fs_lock <-- deadlock
This patch fixes it by using GFP_NOIO. In read path, we
reorganize code flow so that kmap_atomic is called after the
GFP_NOIO allocation.
Cc: stable@vger.kernel.org
Acked-by: Jerome Marchand <jmarchand@redhat.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
[ penberg@kernel.org: don't use GFP_ATOMIC ]
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zs_create_pool() currently takes a name argument which is
never used in any useful way.
This patch removes it.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It seems like an overkill to have adding and subtracting
1 functions from the 32bit counters. Just do it directly.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ->disksize variable stores values in units of bytes,
print the correct size in Kb
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Simplify dealing with num_devices when initializing zram.
Also cleanup some of the output messages.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes a bug introduced by commit c8f2f0db1 ("zram: Fix handling
of incompressible pages") which caused invalid memory references
during disk write. Invalid references could occur in two cases:
- Incoming data expands on compression: In this case, reference was
made to kunmap()'ed bio page.
- Partial (non PAGE_SIZE) write with incompressible data: In this
case, reference was made to a kfree()'ed buffer.
Fixes bug 50081:
https://bugzilla.kernel.org/show_bug.cgi?id=50081
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Cc: stable <stable@vger.kernel.org>
Reported-by: Mihail Kasadjikov <hamer.mk@gmail.com>
Reported-by: Tomas M <tomas@slax.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add missing angle bracket before and after the URL.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zram_bvec_read() shared decompress functionality with zram_read_before_write() function.
Factor-out and make commonly used zram_decompress_page() function, which also simplified
error handling in zram_bvec_read().
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This resolves the conflict with:
drivers/staging/comedi/drivers/amplc_dio200.c
and syncs up the changes that happened in the staging directory for
3.7-rc3.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change 130f315a (staging: zram: remove special handle of uncompressed page)
introduced a bug in the handling of incompressible pages which resulted in
memory allocation failure for such pages.
When a page expands on compression, say from 4K to 4K+30, we were trying to
do zsmalloc(pool, 4K+30). However, the maximum size which zsmalloc can
allocate is PAGE_SIZE (for obvious reasons), so such allocation requests
always return failure (0).
For a page that has compressed size larger than the original size (this may
happen with already compressed or random data), there is no point storing
the compressed version as that would take more space and would also require
time for decompression when needed again. So, the fix is to store any page,
whose compressed size exceeds a threshold (max_zpage_size), as-it-is i.e.
without compression. Memory required for storing this uncompressed page can
then be requested from zsmalloc which supports PAGE_SIZE sized allocations.
Lastly, the fix checks that we do not attempt to "decompress" the page which
we stored in the uncompressed form -- we just memcpy() out such pages.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Reported-by: viechweg@gmail.com
Reported-by: paerley@gmail.com
Reported-by: wu.tommy@gmail.com
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Zram doesn't use xv_malloc any more so it doesn't have
limitation about zobj_header.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch improves mapping performance in zsmalloc by getting
usage information from the user in the form of a "mapping mode"
and using it to avoid unnecessary copying for objects that span
pages.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch switches zcache and zram dependency to ZSMALLOC
rather than X86. There is no net change since ZSMALLOC
depends on X86, however, this prevent further changes to
these files as zsmalloc dependencies change.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Using the __aligned() attribute in favor of __attribute__((aligned(size)))
Signed-off-by: Sam Hansen <solid.se7en@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Porting zram to use the pr_warn() function instead of the deprecated
pr_warning().
Signed-off-by: Sam Hansen <solid.se7en@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xvmalloc can't handle PAGE_SIZE page so that zram have to
handle it specially but zsmalloc can do it so let's remove
unnecessary special handling code.
Quote from Nitin
"I think page vs handle distinction was added since xvmalloc could not
handle full page allocation. Now that zsmalloc allows full page
allocation, we can just use it for both cases. This would also allow
removing the ZRAM_UNCOMPRESSED flag. The only downside will be slightly
slower code path for full page allocation but this event is anyways
supposed to be rare, so should be fine."
1. This patch reduces code very much.
drivers/staging/zram/zram_drv.c | 104 +++++--------------------------------
drivers/staging/zram/zram_drv.h | 17 +-----
drivers/staging/zram/zram_sysfs.c | 6 +--
3 files changed, 15 insertions(+), 112 deletions(-)
2. change pages_expand with bad_compress so it can count
bad compression(above 75%) ratio.
3. remove zobj_header which is for back-reference for defragmentation
because firstly, it's not used at the moment and zsmalloc can't handle
bigger size than PAGE_SIZE so zram can't do it any more without redesign.
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fd1a30de makes a bug that it uses (struct page *) as zsmalloc's handle
although it's a uncompressed page so that it can access random page,
return random data or even crashed by get_first_page in zs_map_object.
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We should use unsigned long as handle instead of void * to avoid any
confusion. Without this, users may just treat zs_malloc return value as
a pointer and try to deference it.
This patch passed compile test(zram, zcache and ramster) and zram is
tested on qemu.
changelog
* from v2
- remove hval pointed out by Nitin
- based on next-20120607
* from v1
- change zcache's zv_create return value
- baesd on next-20120604
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull kmap_atomic cleanup from Cong Wang.
It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().
Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.
* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
drbd: remove the second argument of k[un]map_atomic()
zcache: remove the second argument of k[un]map_atomic()
gma500: remove the second argument of k[un]map_atomic()
dm: remove the second argument of k[un]map_atomic()
tomoyo: remove the second argument of k[un]map_atomic()
sunrpc: remove the second argument of k[un]map_atomic()
rds: remove the second argument of k[un]map_atomic()
net: remove the second argument of k[un]map_atomic()
mm: remove the second argument of k[un]map_atomic()
lib: remove the second argument of k[un]map_atomic()
power: remove the second argument of k[un]map_atomic()
kdb: remove the second argument of k[un]map_atomic()
udf: remove the second argument of k[un]map_atomic()
ubifs: remove the second argument of k[un]map_atomic()
squashfs: remove the second argument of k[un]map_atomic()
reiserfs: remove the second argument of k[un]map_atomic()
ocfs2: remove the second argument of k[un]map_atomic()
ntfs: remove the second argument of k[un]map_atomic()
...
zram accepts number of devices to be created
as a module parameter. This was renamed from
num_devices to zram_num_devices (without updating
the documentation!) since num_devices was declared
as a non-static global variable, polluting the global
namespace. Now, we declare it as a static variable
and revert back the name change.
The documentation (zram.txt) already mentions
num_devices as the module parameter name.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
linux/vmalloc.h added to zsmalloc-main.c to resolve implicit
declaration errors.
X86 dependency added to zsmalloc and dependent drivers zcache and zram.
This X86 only requirement is not ideal. Working to find portable
functions for __flush_tlb_one and set_pte.
Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The allocation of zram->compress_buffer is misssing a GFP_* specifier.
This is equivalent to GFP_NOWAIT but it is more likely a omission.
Since the allocation just above it uses GFP_KERNEL, there is no reason
to use GFP_NOWAIT here. Therefore, add GFP_KERNEL.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As reported by checkpatch.pl strict_strtoX is obsolet and should be
replaced by kstrtoX.
Signed-off-by: Sergey Datsevich <srgdts@gmail.com>
Signed-off-by: Bjoern Meier <bjoernmeier@hotmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-3.2/core' of git://git.kernel.dk/linux-block: (29 commits)
block: don't call blk_drain_queue() if elevator is not up
blk-throttle: use queue_is_locked() instead of lockdep_is_held()
blk-throttle: Take blkcg->lock while traversing blkcg->policy_list
blk-throttle: Free up policy node associated with deleted rule
block: warn if tag is greater than real_max_depth.
block: make gendisk hold a reference to its queue
blk-flush: move the queue kick into
blk-flush: fix invalid BUG_ON in blk_insert_flush
block: Remove the control of complete cpu from bio.
block: fix a typo in the blk-cgroup.h file
block: initialize the bounce pool if high memory may be added later
block: fix request_queue lifetime handling by making blk_queue_cleanup() properly shutdown
block: drop @tsk from attempt_plug_merge() and explain sync rules
block: make get_request[_wait]() fail if queue is dead
block: reorganize throtl_get_tg() and blk_throtl_bio()
block: reorganize queue draining
block: drop unnecessary blk_get/put_queue() in scsi_cmd_ioctl() and blk_get_tg()
block: pass around REQ_* flags instead of broken down booleans during request alloc/free
block: move blk_throtl prototypes to block/blk.h
block: fix genhd refcounting in blkio_policy_parse_and_set()
...
Fix up trivial conflicts due to "mddev_t" -> "struct mddev" conversion
and making the request functions be of type "void" instead of "int" in
- drivers/md/{faulty.c,linear.c,md.c,md.h,multipath.c,raid0.c,raid1.c,raid10.c,raid5.c}
- drivers/staging/zram/zram_drv.c
There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.
Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Fixes sparse warning:
zram_drv.c:666:6: warning: symbol 'zram_slot_free_notify' was not
declared. Should it be static?
Also, max_zpage_size is now size_t just to be consistent with data-type
of other variables maintaining sizes of various kinds.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When the allocation of zram->table fails, we set zram->disksize to zero
to prevent accessing the unallocated table entries during cleanup.
However, we currently don't take this precaution when the initialization
fails earlier.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently init_lock only prevents concurrent execution of zram_init_device()
and zram_reset_device() but not zram_make_request() nor sysfs store functions.
This patch changes init_lock into a rw_semaphore. A write lock is taken by
init, reset and store functions, a read lock is taken by zram_make_request().
Also, avoids to release the lock before calling __zram_reset_device() for
cleaning after a failed init, thus preventing any concurrent task to see an
inconsistent state of zram.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The global variable "num_devices" is too general to be
global. This patch switches the name to be "zram_num_devices".
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The global variable "devices" is too general to be global.
This patch switches the name to be "zram_devices".
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes the unmapping order of KM_USER0/1 in
handle_uncompressed_page() and zram_read() so that kmap()/kunmap() calls
are correctly nested.
Reported-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Reviewed-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, nothing protects zram table from concurrent access.
For instance, ZRAM_UNCOMPRESSED bit can be cleared by zram_free_page()
called from a concurrent write between the time ZRAM_UNCOMPRESSED has
been set and the time it is tested to unmap KM_USER0 in
zram_bvec_write(). This ultimately leads to kernel panic.
Also, a read request can occurs when the page has been freed by a
running write request and before it has been updated, leading to
zero filled block being incorrectly read and "Read before write"
error message.
This patch replace the current mutex by a rw_semaphore. It extends
the protection to zram table (currently, only compression buffers are
protected) and read requests (currently, only write requests are
protected).
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit 7b19b8d45b (zram: Prevent overflow
in logical block size) introduced ZRAM_LOGICAL_BLOCK_SIZE constant to
prevent overflow of logical block size on 64k page kernel.
However, the current implementation of zram only allow operation on block
of the same size as a page. That makes theorically legit 4k requests fail
on 64k page kernel.
This patch makes zram allow operation on partial pages. Basically, it
means we still do operations on full pages internally, but only copy the
relevent segments from/to the user memory.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch refactor the code of zram_read/write() functions. It does
not removes a lot of duplicate code alone, but is mostly a helper for
the third patch of this series (Staging: zram: allow partial page
operations).
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The offset of uncompressed page is always zero: handle_uncompressed_page()
doesn't have to care about it.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Both zram and zcache use xvmalloc allocator. If xvmalloc
is compiled separately for both of them, we will get linker
error if they are both selected as "built-in". We can also
get linker error regarding missing xvmalloc symbols if zram
is not built.
So, we now compile xvmalloc separately and export its symbols
which are then used by both of zram and zcache.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently the device is initialized when first write is done to the
device. Any read attempt before the first write would fail, including
"hidden" read the user may not know about (as for example if he tries
to write a partial block).
This patch initializes the device on first request, whether read or
write.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is to resolve a merge conflict with:
drivers/staging/zram/zram_drv.c
as pointed out by Stephen Rothwell
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In zram_read() and zram_write() we were not incrementing the
index number and thus were reading/writing values from/to
incorrect sectors on zram disk, resulting in data corruption.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch eliminates duplicate code. The remove_block_head function
is a special case of remove_block which can be contained in remove_block
without confusion.
The portion of code in remove_block_head which was noted as "DEBUG ONLY"
is now mandatory. Doing this provides consistent management of the double
linked list of blocks under a freelist and makes this consolidation
of delete block code safe. The first and last blocks will have NULL
pointers in their previous and next page pointers respectively.
Additionally, any time a block is removed from a free list the next and
previous pointers will be set to NULL to avoid misuse outside xvmalloc.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently zram will do nothing to the page in the bvec when that page
has not been previously written. This allows random data to leak to
user space. That can be seen by doing the following:
## Load the module and create a 256Mb zram device called /dev/zram0
# modprobe zram
# echo $((256*1024*1024)) > /sys/class/block/zram0/disksize
## Initialize the device by writing zero to the first block
# dd if=/dev/zero of=/dev/zram0 bs=512 count=1
## Read ~256Mb of memory into a file and hope for something interesting
# dd if=/dev/zram0 of=file
This patch will treat an unwritten page as a zero-filled page. If a
page is read before a write has occurred the data returned is all 0's.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
By swapping the total_pages statistic with the lock we close a
hole in the structure for 64-bit CPUs.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add a debug config flag to enable debug printk output and future
debug code.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This change is in a conditional block which is entered only when there is
an existing data block on the freelist where the insert has taken place.
The new block is pushed onto the freelist stack and this conditional block
is updating links in the prior stack head to point to the new stack head.
After this conditional block the first-/second-level indices are updated
to indicate that there is a free block at this location.
This patch adds an immediate return from the conditional block to avoid
setting bits again to indicate a free block on this freelist. The bits
would already be set because there was an existing free block on this
freelist.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On a 64K page kernel, the value PAGE_SIZE passed to
blk_queue_logical_block_size would overflow the logical block size
argument (resulting in setting it to 0).
This patch sets the logical block size to 4096, using a new
ZRAM_LOGICAL_BLOCK_SIZE constant.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>