Move option parsing out of rbd_get_client() and into its caller.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
A Boolean field "snap_exists" in an rbd mapping is used to indicate
whether a mapped snapshot has been removed from an image's snapshot
context, to stop sending requests for that snapshot as soon as we
know it's gone.
Generalize the interpretation of this field so it applies to
non-snapshot (i.e. "head") mappings. That is, define its value
to be false until the mapping has been set, and then define it to be
true for both snapshot mappings or head mappings.
Rename the field "exists" to reflect the broader interpretation.
The rbd_mapping structure is on its way out, so move the field
back into the rbd_device structure.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Moving the snap_id and snap_name fields into the separate
rbd_mapping structure was misguided. (And in time, perhaps
we'll do away with that structure altogether...)
Move these fields back into struct rbd_device.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
If a format 2 image has a parent, its pool id will be specified
using a 64-bit value. Change the pool id we save for an image to
match that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
If rbd_dev_snaps_update() has ever been called for an rbd device
structure there could be snapshot structures on its snaps list.
In rbd_add(), this function is called but a subsequent error
path neglected to clean up any of these snapshots.
Add a call to rbd_remove_all_snaps() in the appropriate spot to
remedy this. Change a couple of error labels to be a little
clearer while there.
Drop the leading underscores from the function name; there's nothing
special about that function that they might signify. As suggested
in review, the leading underscores in __rbd_remove_snap_dev() have
been removed as well.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When processing a request, rbd_rq_fn() makes clones of the bio's in
the request's bio chain and submits the results to osd's to be
satisfied. If a request bio straddles the boundary between objects
backing the rbd image, it must be represented by two cloned bio's,
one for the first part (at the end of one object) and one for the
second (at the beginning of the next object).
This has been handled by a function bio_chain_clone(), which
includes an interface only a mother could love, and which has
been found to have other problems.
This patch defines two new fairly generic bio functions (one which
replaces bio_chain_clone()) to help out the situation, and then
revises rbd_rq_fn() to make use of them.
First, bio_clone_range() clones a portion of a single bio, starting
at a given offset within the bio and including only as many bytes
as requested. As a convenience, a request to clone the entire bio
is passed directly to bio_clone().
Second, bio_chain_clone_range() performs a similar function,
producing a chain of cloned bio's covering a sub-range of the
source chain. No bio_pair structures are used, and if successful
the result will represent exactly the specified range.
Using bio_chain_clone_range() makes bio_rq_fn() a little easier
to understand, because it avoids the need to pass very much
state information between consecutive calls. By avoiding the need
to track a bio_pair structure, it also eliminates the problem
described here: http://tracker.newdream.net/issues/2933
Note that a block request (and therefore the complete length of
a bio chain processed in rbd_rq_fn()) is an unsigned int, while
the result of rbd_segment_length() is u64. This change makes
this range trunctation explicit, and trips a bug if the the
segment boundary is too far off.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
If we detach due to local read-error (which sets a bit in the bitmap),
stay Primary, and then re-attach (which re-reads the bitmap from disk),
we potentially lost the "out-of-sync" (or, "bad block") information in
the bitmap.
Always (try to) write out the changed bitmap pages before going diskless.
That way, we don't lose the bit for the bad block,
the next resync will fetch it from the peer, and rewrite
it locally, which may result in block reallocation in some
lower layer (or the hardware), and thereby "heal" the bad blocks.
If the bitmap writeout errors out as well, we will (again: try to)
mark the "we need a full sync" bit in our super block,
if it was a READ error; writes are covered by the activity log already.
If that superblock does not make it to disk either, we are sorry.
Maybe we just lost an entire disk or controller (or iSCSI connection),
and there actually are no bad blocks at all, so we don't need to
re-fetch from the peer, there is no "auto-healing" necessary.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- struct drbd_conf { ... unsigned long flags; ... }
+ struct drbd_conf { ... unsigned long drbd_flags[N]; ... }
And introduce wrapper functions for test/set/clear bit operations
on this member.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The intention of force-detach is to be able to deal with a completely
unresponsive lower level IO stack, which does not even deliver error
completions anymore, but no completion at all.
In all other cases, we must still wait for the meta data IO completion.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This has not yet been observed, but conceivably, when using GFP_KERNEL
allocations from drbd_md_sync(), drbd_flush_after_epoch() or
receive_SyncParam(), we could trigger additional IO to our own device,
or an other device in a criss-cross setup, and end up in a local
deadlock, or potentially a distributed deadlock in a criss-cross setup
involving the peer blocked in a similar way waiting for us to make
progress.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The former comment arguing that GFP_KERNEL was good enough was wrong: it
did not take resize into account at all, and assumed the only path
leading here was the normal attach on a still secondary device, so no
deadlock would be possible.
Both resize on a Primary, or attach on a diskless Primary,
could potentially deadlock.
drbd_bm_resize() is called while IO to the respective device is
suspended, so we must use GFP_NOIO to avoid potential deadlock.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
"aborting" requests, or force-detaching the disk, is intended for
completely blocked/hung local backing devices which do no longer
complete requests at all, not even do error completions. In this
situation, usually a hard-reset and failover is the only way out.
By "aborting", basically faking a local error-completion,
we allow for a more graceful swichover by cleanly migrating services.
Still the affected node has to be rebooted "soon".
By completing these requests, we allow the upper layers to re-use
the associated data pages.
If later the local backing device "recovers", and now DMAs some data
from disk into the original request pages, in the best case it will
just put random data into unused pages; but typically it will corrupt
meanwhile completely unrelated data, causing all sorts of damage.
Which means delayed successful completion,
especially for READ requests,
is a reason to panic().
We assume that a delayed *error* completion is OK,
though we still will complain noisily about it.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Disconnecting is a cluster wide state change. In case the peer node agrees
to the state transition, it sends back the fact on the meta-data connection
and closes both sockets.
In case the node node that initiated the state transfer sees the closing
action on the data-socket, before the P_STATE_CHG_REPLY packet, it was
going into one of the network failure states.
At least with the fencing option set to something else thatn "dont-care",
the unclean shutdown of the connection causes a short IO freeze or
a fence operation.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The DISCARD_CONCURRENT flag should be set on one node and cleared on the
other node.
As the code was before it was theoretical possible that a node accepts the
meta socket, but has to close it later on, and keeps the DISCARD_CONCURRENT
flag.
Correct this by moving the clear_bit(DISCARD_CONCURRENT) where the packet
gets sent.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We now can schedule only a specific range of sectors for online verify,
or interrupt a running verify without interrupting the connection.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is at least the worker context, the receiver context, the context of
receiving netlink packts and processes reading a sysfs attribute that access
the uuids.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
xfstests has always had random failures of tests due to loop devices
failing to be torn down and hence leaving filesytems that cannot be
unmounted. This causes test runs to immediately stop.
Over the past 6 or 7 years we've added hacks like explicit unmount
-d commands for loop mounts, losetup -d after unmount -d fails, etc,
but still the problems persist. Recently, the frequency of loop
related failures increased again to the point that xfstests 259 will
reliably fail with a stray loop device that was not torn down.
That is despite the fact the test is above as simple as it gets -
loop 5 or 6 times running mkfs.xfs with different paramters:
lofile=$(losetup -f)
losetup $lofile "$testfile"
"$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null || echo "mkfs failed!"
sync
losetup -d $lofile
And losteup -d $lofile is failing with EBUSY on 1-3 of these loops
every time the test is run.
Turns out that blkid is running simultaneously with losetup -d, and
so it sees an elevated reference count and returns EBUSY. But why
is blkid running? It's obvious, isn't it? udev has decided to try
and find out what is on the block device as a result of a creation
notification. And it is racing with mkfs, so might still be scanning
the device when mkfs finishes and we try to tear it down.
So, make losetup -d force autoremove behaviour. That is, when the
last reference goes away, tear down the device. xfstests wants it
*gone*, not causing random teardown failures when we know that all
the operations the tests have specifically run on the device have
completed and are no longer referencing the loop device.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Added appropriate timeout value for secure erase based on identify device data
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Selvan Mani <smani@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Changing the type of bdev parameters to be unsigned int :1, rather than bool.
This is more consistent with the types of other features in the block drivers.
Signed-off-by: Oliver Chick <oliver.chick@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This is a small cleanup, that also may turn error handling of
unitialized disks more readable. We don't need a separate variable to
track allocated disks, remove dr and reuse drive variable instead.
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The same checks to see if a drive can be or is registered are
repeated through the code, factor out the checks in a common function
and replace the repeated checks with it.
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
On floppy initialization, if something failed inside the loop we call
add_disk, there was no cleanup of previous iterations in the error
handling.
Cc: stable@vger.kernel.org
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since commit 070ad7e ("floppy: convert to delayed work and single-thread
wq"), we end up calling alloc_ordered_workqueue multiple times inside
the loop, which shouldn't be intended. Besides the leak, other side
effect in the current code is if blk_init_queue fails, we would end up
calling unregister_blkdev even if we didn't call yet register_blkdev.
Just moved the allocation of floppy_wq before the loop, and adjusted the
code accordingly.
Cc: stable@vger.kernel.org # 3.5+
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
drivers/block/xen-blkback/xenbus.c:260:5: warning: symbol 'xenvbd_sysfs_addif' was not declared. Should it be static?
drivers/block/xen-blkback/xenbus.c:284:6: warning: symbol 'xenvbd_sysfs_delif' was not declared. Should it be static?
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The rbd_device structure has an embedded rbd_options structure.
Such a structure is needed to work with the generic ceph argument
parsing code, but there's no need to keep it around once argument
parsing is done.
Use a local variable to hold the rbd options used in parsing in
rbd_get_client(), and just transfer its content (it's just a
read_only flag) into the field in the rbd_mapping sub-structure
that requires that information.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
The aim of this patch is to make what's going on rbd_merge_bvec() a
bit more obvious than it was before. This was an issue when a
recent btrfs bug led us to question whether the merge function was
working correctly.
Use "obj" rather than "chunk" to indicate the units whose boundaries
we care about we call (rados) "objects".
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Change RBD_MAX_SNAP_NAME_LEN to be based on NAME_MAX. That is a
practical limit for the length of a snapshot name (based on the
presence of a directory using the name under /sys/bus/rbd to
represent the snapshot).
The /sys entry is created by prefixing it with "snap_"; define that
prefix symbolically, and take its length into account in defining
the snapshot name length limit.
Enforce the limit in rbd_add_parse_args(). Also delete a dout()
call in that function that was not meant to be committed.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This adds a verification that an rbd image's object order is
within the upper and lower bounds supported by this implementation.
It must be at least 9 (SECTOR_SHIFT), because the Linux bio system
assumes that minimum granularity.
It also must be less than 32 (at the moment anyway) because there
exist spots in the code that store the size of a "segment" (object
backing an rbd image) in a signed int variable, which can be 32 bits
including the sign. We should be able to relax this limit once
we've verified the code uses 64-bit types where needed.
Note that the CLI tool already limits the order to the range 12-25.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The two calls to rbd_do_op() from rbd_rq_fn() differ only in the
value passed for the snapshot id and the snapshot context.
For reads the snapshot always comes from the mapping, and for writes
the snapshot id is always CEPH_NOSNAP.
The snapshot context is always null for reads. For writes, the
snapshot context always comes from the rbd header, but it is
acquired under protection of header semaphore and could change
thereafter, so we can't simply use what's available inside
rbd_do_op().
Eliminate the snapid parameter from rbd_do_op(), and set it
based on the I/O direction inside that function instead. Always
pass the snapshot context acquired in the caller, but reset it
to a null pointer inside rbd_do_op() if the operation is a read.
As a result, there is no difference in the read and write calls
to rbd_do_op() made in rbd_rq_fn(), so just call it unconditionally.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The only callers of rbd_do_op() are in rbd_rq_fn(), where call one
is used for writes and the other used for reads. The request passed
to rbd_do_op() already encodes the I/O direction, and that
information can be used inside the function to set the opcode and
flags value (rather than passing them in as arguments).
So get rid of the opcode and flags arguments to rbd_do_op().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Both rbd_req_read() and rbd_req_write() are simple wrapper routines
for rbd_do_op(), and each is only called once. Replace each wrapper
call with a direct call to rbd_do_op(), and get rid of the wrapper
functions.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
The name of the "read-only" mapping option was inadvertently changed
in this commit:
f84344f3 rbd: separate mapping info in rbd_dev
Revert that hunk to return it to what it should be.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
When rbd_dev_probe() calls rbd_dev_image_id() it expects to get
a 0 return code if successful, but it is getting a positive value.
The reason is that rbd_dev_image_id() returns the value it gets from
rbd_req_sync_exec(), which returns the number of bytes read in as a
result of the request. (This ultimately comes from
ceph_copy_from_page_vector() in rbd_req_sync_op()).
Force the return value to 0 when successful in rbd_dev_image_id().
Do the same in rbd_dev_v2_object_prefix().
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
In rbd_dev_id_put(), there's a loop that's intended to determine
the maximum device id in use. But it isn't doing that at all,
the effect of how it's written is to simply use the just-put id
number, which ignores whole purpose of this function.
Fix the bug.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Pull block IO update from Jens Axboe:
"Core block IO bits for 3.7. Not a huge round this time, it contains:
- First series from Kent cleaning up and generalizing bio allocation
and freeing.
- WRITE_SAME support from Martin.
- Mikulas patches to prevent O_DIRECT crashes when someone changes
the block size of a device.
- Make bio_split() work on data-less bio's (like trim/discards).
- A few other minor fixups."
Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
Morton. It is due to the VM no longer using a prio-tree (see commit
6b2dbba8b6: "mm: replace vma prio_tree with an interval tree").
So make set_blocksize() use mapping_mapped() instead of open-coding the
internal VM knowledge that has changed.
* 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
block: makes bio_split support bio without data
scatterlist: refactor the sg_nents
scatterlist: add sg_nents
fs: fix include/percpu-rwsem.h export error
percpu-rw-semaphore: fix documentation typos
fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
blockdev: turn a rw semaphore into a percpu rw semaphore
Fix a crash when block device is read and block size is changed at the same time
block: fix request_queue->flags initialization
block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
block: ioctl to zero block ranges
block: Make blkdev_issue_zeroout use WRITE SAME
block: Implement support for WRITE SAME
block: Consolidate command flag and queue limit checks for merges
block: Clean up special command handling logic
block/blk-tag.c: Remove useless kfree
block: remove the duplicated setting for congestion_threshold
block: reject invalid queue attribute values
block: Add bio_clone_bioset(), bio_clone_kmalloc()
block: Consolidate bio_alloc_bioset(), bio_kmalloc()
...
Now that v2 images support is fully implemented, have
rbd_dev_v2_probe() return 0 to indicate a successful probe.
(Note that an image that implements layering will fail
the probe early because of the feature chekc.)
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Version 2 images have two sets of feature bit fields. The first
indicates features possibly used by the image. The second indicates
features that the client *must* support in order to use the image.
When an image (or snapshot) is first examined, we need to make sure
that the local implementation supports the image's required
features. If not, fail the probe for the image.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Define a new function rbd_dev_v2_refresh() to update/refresh the
snapshot context for a format version 2 rbd image. This function
will update anything that is not fixed for the life of an rbd
image--at the moment this is mainly the snapshot context and (for
a base mapping) the size.
Update rbd_refresh_header() so it selects which function to use
based on the image format.
Rename __rbd_refresh_header() to be rbd_dev_v1_refresh()
to be consistent with the naming of its version 2 counterpart.
Similarly rename rbd_refresh_header() to be rbd_dev_refresh().
Unrelated--we use rbd_image_format_valid() here. Delete the other
use of it, which was primarily put in place to ensure that function
was referenced at the time it was defined.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Encapsulate the code that handles updating the size of a mapping
after an rbd image has been refreshed. This is done in anticipation
of the next patch, which will make this common code for format 1 and
2 images.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Pull ceph updates from Sage Weil:
"The bulk of this pull is a series from Alex that refactors and cleans
up the RBD code to lay the groundwork for supporting the new image
format and evolving feature set. There are also some cleanups in
libceph, and for ceph there's fixed validation of file striping
layouts and a bugfix in the code handling a shrinking MDS cluster."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
ceph: avoid 32-bit page index overflow
ceph: return EIO on invalid layout on GET_DATALOC ioctl
rbd: BUG on invalid layout
ceph: propagate layout error on osd request creation
libceph: check for invalid mapping
ceph: convert to use le32_add_cpu()
ceph: Fix oops when handling mdsmap that decreases max_mds
rbd: update remaining header fields for v2
rbd: get snapshot name for a v2 image
rbd: get the snapshot context for a v2 image
rbd: get image features for a v2 image
rbd: get the object prefix for a v2 rbd image
rbd: add code to get the size of a v2 rbd image
rbd: lay out header probe infrastructure
rbd: encapsulate code that gets snapshot info
rbd: add an rbd features field
rbd: don't use index in __rbd_add_snap_dev()
rbd: kill create_snap sysfs entry
rbd: define rbd_dev_image_id()
rbd: define some new format constants
...
Pull virtio changes from Rusty Russell:
"New workflow: same git trees pulled by linux-next get sent straight to
Linus. Git is awkward at shuffling patches compared with quilt or mq,
but that doesn't happen often once things get into my -next branch."
* 'virtio-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (24 commits)
lguest: fix occasional crash in example launcher.
virtio-blk: Disable callback in virtblk_done()
virtio_mmio: Don't attempt to create empty virtqueues
virtio_mmio: fix off by one error allocating queue
drivers/virtio/virtio_pci.c: fix error return code
virtio: don't crash when device is buggy
virtio: remove CONFIG_VIRTIO_RING
virtio: add help to CONFIG_VIRTIO option.
virtio: support reserved vqs
virtio: introduce an API to set affinity for a virtqueue
virtio-ring: move queue_index to vring_virtqueue
virtio_balloon: not EXPERIMENTAL any more.
virtio-balloon: dependency fix
virtio-blk: fix NULL checking in virtblk_alloc_req()
virtio-blk: Add REQ_FLUSH and REQ_FUA support to bio path
virtio-blk: Add bio-based IO path for virtio-blk
virtio: console: fix error handling in init() function
tools: Fix pthread flag for Makefile of trace-agent used by virtio-trace
tools: Add guest trace agent as a user tool
virtio/console: Allocate scatterlist according to the current pipe size
...
Pull ADM Xen support from Konrad Rzeszutek Wilk:
Features:
* Allow a Linux guest to boot as initial domain and as normal guests
on Xen on ARM (specifically ARMv7 with virtualized extensions). PV
console, block and network frontend/backends are working.
Bug-fixes:
* Fix compile linux-next fallout.
* Fix PVHVM bootup crashing.
The Xen-unstable hypervisor (so will be 4.3 in a ~6 months), supports
ARMv7 platforms.
The goal in implementing this architecture is to exploit the hardware
as much as possible. That means use as little as possible of PV
operations (so no PV MMU) - and use existing PV drivers for I/Os
(network, block, console, etc). This is similar to how PVHVM guests
operate in X86 platform nowadays - except that on ARM there is no need
for QEMU. The end result is that we share a lot of the generic Xen
drivers and infrastructure.
Details on how to compile/boot/etc are available at this Wiki:
http://wiki.xen.org/wiki/Xen_ARMv7_with_Virtualization_Extensions
and this blog has links to a technical discussion/presentations on the
overall architecture:
http://blog.xen.org/index.php/2012/09/21/xensummit-sessions-new-pvh-virtualisation-mode-for-arm-cortex-a15arm-servers-and-x86/
* tag 'stable/for-linus-3.7-arm-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (21 commits)
xen/xen_initial_domain: check that xen_start_info is initialized
xen: mark xen_init_IRQ __init
xen/Makefile: fix dom-y build
arm: introduce a DTS for Xen unprivileged virtual machines
MAINTAINERS: add myself as Xen ARM maintainer
xen/arm: compile netback
xen/arm: compile blkfront and blkback
xen/arm: implement alloc/free_xenballooned_pages with alloc_pages/kfree
xen/arm: receive Xen events on ARM
xen/arm: initialize grant_table on ARM
xen/arm: get privilege status
xen/arm: introduce CONFIG_XEN on ARM
xen: do not compile manage, balloon, pci, acpi, pcpu and cpu_hotplug on ARM
xen/arm: Introduce xen_ulong_t for unsigned long
xen/arm: Xen detection and shared_info page mapping
docs: Xen ARM DT bindings
xen/arm: empty implementation of grant_table arch specific functions
xen/arm: sync_bitops
xen/arm: page.h definitions
xen/arm: hypercalls
...