Fix a number of issues with the per-MM VMA patch:
(1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
a NOMMU system with more than 2G pages. Makes no difference on a 32-bit
system.
(2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
lest it overflow.
(3) Move the allocation of the vm_area_struct slab back for fork.c.
(4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.
(5) Use BUG_ON() rather than if () BUG().
(6) Make the default validate_nommu_regions() a static inline rather than a
#define.
(7) Make free_page_series()'s objection to pages with a refcount != 1 more
informative.
(8) Adjust the __put_nommu_region() banner comment to indicate that the
semaphore must be held for writing.
(9) Limit the number of warnings about munmaps of non-mmapped regions.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a build failure with generic debug pagealloc:
mm/debug-pagealloc.c: In function 'set_page_poison':
mm/debug-pagealloc.c:8: error: 'struct page' has no member named 'debug_flags'
mm/debug-pagealloc.c: In function 'clear_page_poison':
mm/debug-pagealloc.c:13: error: 'struct page' has no member named 'debug_flags'
mm/debug-pagealloc.c: In function 'page_poison':
mm/debug-pagealloc.c:18: error: 'struct page' has no member named 'debug_flags'
mm/debug-pagealloc.c: At top level:
mm/debug-pagealloc.c:120: error: redefinition of 'kernel_map_pages'
include/linux/mm.h:1278: error: previous definition of 'kernel_map_pages' was here
mm/debug-pagealloc.c: In function 'kernel_map_pages':
mm/debug-pagealloc.c:122: error: 'debug_pagealloc_enabled' undeclared (first use in this function)
by fixing
- debug_flags should be in struct page
- define DEBUG_PAGEALLOC config option for all architectures
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Copy/paste error. The RV670 microcode should work ok, so it's
not a show stopper.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add Start-of-Frame and End-of-Frame debugging to ipu_idmac.c, in the
future it might also be needed for the actual video processing in
mx3-camera, at which point, the ISRs will have to be transferred to
mx3_camera.c, for which ipu_irq_map() and ipu_irq_unmap() functions will
have to be exported.
Also simplify a couple of pointer-dereferences.
Signed-off-by: Guennadi Liakhovetski <lg@denx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Remove unused #include <version.h> in drivers/net/qlge/qlge_ethtool.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unused #include <version.h> in drivers/net/dnet.c.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems that trivial reset of pcount to one was not sufficient
in tcp_retransmit_skb. Multiple counters experience a positive
miscount when skb's pcount gets lowered without the necessary
adjustments (depending on skb's sacked bits which exactly), at
worst a packets_out miscount can crash at RTO if the write queue
is empty!
Triggering this requires mss change, so bidir tcp or mtu probe or
like.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Uwe Bugla <uwe.bugla@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need full-scale adjustment to fix a TCP miscount in the next
patch, so just move it into a helper and call for that from the
other places.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some devices cannot send very short usb transfers. To get around this the
firmware adds a known pattern and flags the driver that it should check for
this pattern on short transfers. This flag was not taken into account by
the driver.
Signed-off-by: Jan Dumon <j.dumon@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changed the order in which things are freed. This fixes an oops when
unplugging the device while network traffic is ongoing.
Signed-off-by: Jan Dumon <j.dumon@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
EDIDs should be backward compatible, so don't bail if we see a version
of 3 (which is out there now) and print a message if we see something
newer, but allow it to be parsed.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Check whether the INTERLACE/DBLSCAN is supported by output device. If
not, the mode containing the flag of INTERLACE/DBLSCAN will be marked
as unsupported.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Should be,
edid_vendor[2] = (edid->mfg_id[1] & 0x1f) + '@';
Since vendor ID has only two bytes only, I am somewhat surprised why gcc
doesn't complain this.
Reported-by: Guo, Chaohong <chaohong.guo@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Remove an include that isn't actually needed to prevent needless
rebuilds.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The readq/writeq really need to be static inline on the arches which
don't provide them.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The commit:
platform driver: fix incorrect use of 'platform_bus_type' with 'struct device_driver'
contains this:
-static int __exit pxa2xx_flash_remove(struct device *dev)
+static int __exit pxa2xx_flash_remove(struct platform_device *dev)
...
- .remove = __exit_p(pxa2xx_flash_remove),
+ .remove = __devexit_p(pxa2xx_flash_remove),
which leads to the following build error:
`pxa2xx_flash_remove' referenced in section `.data' of drivers/built-in.o: defined in discarded section `.exit.text' of drivers/built-in.o
This is not the only instance of it in this patch - all __exit_p's
touched by this patch have been converted to __devexit_p's without
regard to the original function.
Let's revert this change and, if we are going to convert functions
to be __devexit/__devinit, lets have that as a _separate_ patch doing
just that change.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Otherwise, the PAGE_CACHE_WC would end up getting us a UC-only mapping, and
the write performance of GTT maps dropped 10x.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[anholt: cleaned up unused var]
Signed-off-by: Eric Anholt <eric@anholt.net>
Add EXPORT_SYMBOL_GPL(fsl_pq_mdio_bus_name) for module builds
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove an unneeded return statement and conditional
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We get this on 32 builds:
fs/built-in.o: In function `extent_fiemap':
(.text+0x1019f2): undefined reference to `__ucmpdi2'
Happens because of a switch statement with a 64 bit argument.
Convert this to an if statement to fix this.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btrfs_new_inode doesn't call iput to free the inode
when it fails.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Need to check kthread_should_stop after schedule_timeout() before calling
schedule(). This causes threads to sleep with potentially no one to wake them
up causing mount(2) to hang in btrfs_stop_workers waiting for threads to stop.
Signed-off-by: Amit Gud <gud@ksu.edu>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The 'flushoncommit' mount option forces any data dirtied by a write in a
prior transaction to commit as part of the current commit. This makes
the committed state a fully consistent view of the file system from the
application's perspective (i.e., it includes all completed file system
operations). This was previously the behavior only when a snapshot is
created.
This is used by Ceph to ensure that completed writes make it to the
platter along with the metadata operations they are bound to (by
BTRFS_IOC_TRANS_{START,END}).
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Add a 'notreelog' mount option to disable the tree log (used by fsync,
O_SYNC writes). This is much slower, but the tree logging produces
inconsistent views into the FS for ceph.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btrfs options can change at times other than mount, yet /proc/mounts shows the
options string used when the fs was mounted (an example would be when btrfs
determines that barriers aren't useful and turns them off.) This patch
instead outputs the actual options in use by btrfs.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Because btrfs is copy-on-write, we end up picking new locations for
blocks very often. This makes it fairly difficult to maintain perfect
read patterns over time, but we can at least do some optimizations
for writes.
This is done today by remembering the last place we allocated and
trying to find a free space hole big enough to hold more than just one
allocation. The end result is that we tend to write sequentially to
the drive.
This happens all the time for metadata and it happens for data
when mounted -o ssd. But, the way we record it is fairly racey
and it tends to fragment the free space over time because we are trying
to allocate fairly large areas at once.
This commit gets rid of the races by adding a free space cluster object
with dedicated locking to make sure that only one process at a time
is out replacing the cluster.
The free space fragmentation is somewhat solved by allowing a cluster
to be comprised of smaller free space extents. This part definitely
adds some CPU time to the cluster allocations, but it allows the allocator
to consume the small holes left behind by cow.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btrfs_next_leaf was using blocking locks when it could have been using
faster spinning ones instead. This adds a few extra checks around
the pieces that block and switches over to spinning locks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch removes the pinned_mutex. The extent io map has an internal tree
lock that protects the tree itself, and since we only copy the extent io map
when we are committing the transaction we don't need it there. We also don't
need it when caching the block group since searching through the tree is also
protected by the internal map spin lock.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
This patch removes the block group alloc mutex used to protect the free space
tree for allocations and replaces it with a spin lock which is used only to
protect the free space rb tree. This means we only take the lock when we are
directly manipulating the tree, which makes us a touch faster with
multi-threaded workloads.
This patch also gets rid of btrfs_find_free_space and replaces it with
btrfs_find_space_for_alloc, which takes the number of bytes you want to
allocate, and empty_size, which is used to indicate how much free space should
be at the end of the allocation.
It will return an offset for the allocator to use. If we don't end up using it
we _must_ call btrfs_add_free_space to put it back. This is the tradeoff to
kill the alloc_mutex, since we need to make sure nobody else comes along and
takes our space.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
I've replaced the strange looping constructs with a list_for_each_entry on
space_info->block_groups. If we have a hint we just jump into the loop with
the block group and start looking for space. If we don't find anything we
start at the beginning and start looking. We never come out of the loop with a
ref on the block_group _unless_ we found space to use, then we drop it after we
set the trans block_group.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
This patch cleans up the free space cache code a bit. It better documents the
idiosyncrasies of tree_search_offset and makes the code make a bit more sense.
I took out the info allocation at the start of __btrfs_add_free_space and put it
where it makes more sense. This was left over cruft from when alloc_mutex
existed. Also all of the re-searches we do to make sure we inserted properly.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Btrfs pages being written get set to writeback, and then may go through
a number of steps before they hit the block layer. This includes compression,
checksumming and async bio submission.
The end result is that someone who writes a page and then does
wait_on_page_writeback is likely to unplug the queue before the bio they
cared about got there.
We could fix this by marking bios sync, or by doing more frequent unplugs,
but this commit just changes the async bio submission code to unplug
after it has processed all the bios for a device. The async bio submission
does a fair job of collection bios, so this shouldn't be a huge problem
for reducing merging at the elevator.
For streaming O_DIRECT writes on a 5 drive array, it boosts performance
from 386MB/s to 460MB/s.
Thanks to Hisashi Hifumi for helping with this work.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Btrfs uses async helper threads to submit write bios so the checksumming
helper threads don't block on the disk.
The submit bio threads may process bios for more than one block device,
so when they find one device congested they try to move on to other
devices instead of blocking in get_request_wait for one device.
This does a pretty good job of keeping multiple devices busy, but the
congested flag has a number of problems. A congested device may still
give you a request, and other procs that aren't backing off the congested
device may starve you out.
This commit uses the io_context stored in current to decide if our process
has been made a batching process by the block layer. If so, it keeps
sending IO down for at least one batch. This helps make sure we do
a good amount of work each time we visit a bdev, and avoids large IO
stalls in multi-device workloads.
It's also very ugly. A better solution is in the works with Jens Axboe.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Set queue ordered mode. It doesn't really matter what we set here
because we don't ever put any requests on the queue. But we need to set
something other than QUEUE_ORDERED_NONE so that __generic_make_request
passes barrier requests to us.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Move wait queue declaration and unplug to dm_wait_for_completion.
The purpose is to minimize duplicate code in the further patches.
The patch reorders functions a little bit. It doesn't change any
functionality. For proper non-deadlock operation, add_wait_queue must
happen before set_current_state(interruptible) and before the test for
!atomic_read(&md->pending).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Merge pushback and deferred lists into one list - use deferred list
for both deferred and pushed-back bios.
This will be needed for proper support of barrier bios: it is impossible to
support ordering correctly with two lists because the requests on both lists
will be mixed up.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Allow uninterruptible wait for pending IOs.
Add argument "interruptible" to dm_wait_for_completion that specifies
either interruptible or uninterruptible waiting.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Merge __flush_deferred_io() into the only caller, dm_wq_work().
There's no need to have a function that has only one caller.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Move the bio_io_error() calls directly into __split_and_process_bio().
This avoids some code duplication in later patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Rename __split_bio() to __split_and_process_bio() because it not only splits
the bio to serveral parts, but also submits them to target drivers.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Remove struct dm_wq_req and move "work" directly into struct mapped_device.
In the revised implementation, the thread will do just one type of work
(processing the queue).
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>