Commit Graph

2270 Commits

Author SHA1 Message Date
Matthew Wilcox
d534df3c73 NVMe: Rename nvme_req_info to nvme_bio
There are too many things called 'info' in this driver.  This data
structure is auxiliary information for a struct bio, so call it nvme_bio,
or nbio when used as a variable.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Shane Michael Matthews
e025344c56 NVMe: Initial PRP List support
Add a pointer to the nvme_req_info to hold a new data structure
(nvme_prps) which contains a list of the pages allocated to this
particular request for holding PRP list entries.  nvme_setup_prps()
now returns this pointer.

To allocate and free the memory used for PRP lists, we need a struct
device, so we need to pass the nvme_queue pointer to many functions
which didn't use to need it.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox
51882d00f0 NVMe: Advance the sg pointer when filling in an sg list
For multipage BIOs, we were always using sg[0] instead of advancing
through the list.  Oops :-)

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox
d2d8703481 NVMe: Renumber the special context values
If POISON_POINTER_DELTA isn't defined, ensure they're in page 0 which
should never be mapped.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox
9294bbed78 NVMe: Handle the congestion list a little better
In the bio completion handler, check for bios on the congestion list
for this NVM queue.  Also, lock the congestion list in the make_request
function as the queue may end up being shared between multiple CPUs.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox
e85248e516 NVMe: Record the timeout for each command
In addition to recording the completion data for each command, record
the anticipated completion time.  Choose a timeout of 5 seconds for
normal I/Os and 60 seconds for admin I/Os.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox
ec6ce618d6 NVMe: Need to lock queue during interrupt handling
If we're sharing a queue between multiple CPUs and we cancel a sync I/O,
we must have the queue locked to avoid corrupting the stack of the thread
that submitted the I/O.  It turns out this is the same locking that's needed
for the threaded irq handler, so share that code.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:56 -04:00
Matthew Wilcox
48e3d39816 NVMe: Detect command IDs completing that are out of range
If the adapter completes a command ID that is outside the bounds of
the array, return CMD_CTX_INVALID instead of random data, and print a
message in the sync_completion handler (which is rapidly becoming the
misc completion handler :-)

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox
b36235df01 NVMe: Detect commands that are completed twice
Set the context value to CMD_CTX_COMPLETED, and print a message in the
sync_completion handler if we see it.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox
be7b62754e NVMe: Use a symbolic name to represent cancelled commands instead of 0
I have plans for other special values in sync_completion.  Plus, this
is more self-documenting, and lets us detect bogus usages.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox
58ffacb545 NVMe: Add a module parameter to use a threaded interrupt
We're currently calling bio_endio from hard interrupt context.  This is
not a good idea for preemptible kernels as it will cause longer latencies.
Using a threaded interrupt will run the entire queue processing mechanism
(including bio_endio) in a thread, which can be preempted.  Unfortuantely,
it also adds about 7us of latency to the single-I/O case, so make it a
module parameter for the moment.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox
b1ad37efca NVMe: Call put_nvmeq() before calling nvme_submit_sync_cmd()
We can't have preemption disabled when we call schedule().  Accept the
possibility that we'll get preempted, and it'll cost us some cacheline
bounces.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox
3c0cf138d7 NVMe: Allow fatal signals to interrupt I/O
If the user sends a fatal signal, sleeping in the TASK_KILLABLE state
permits the task to be aborted.  The only wrinkle is making sure that
if/when the command completes later that it doesn't upset anything.
Handle this by setting the data pointer to 0, and checking the value
isn't NULL in the sync completion path.  Eventually, bios can be cancelled
through this path too.  Note that the cmdid isn't freed to prevent reuse.

We should also abort the command in the future, but this is a good start.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:55 -04:00
Matthew Wilcox
db5d0c198d NVMe: Release 0.2
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:54 -04:00
Matthew Wilcox
6ee44cdced NVMe: Add download / activate firmware ioctls
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:54 -04:00
Matthew Wilcox
388f037f4e NVMe: Move sysfs entries to the right place
Because I wasn't setting driverfs_dev, the devices were showing up under
/sys/devices/virtual/block.  Now they appear underneath the PCI device
which they belong to.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:54 -04:00
Shane Michael Matthews
5911f20039 NVMe: Disable the device before we write the admin queues
In case the card has been left in a partially-configured state,
write 0 to the Enable bit.

Signed-off-by: Shane Michael Matthews <shane.matthews@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:54 -04:00
Matthew Wilcox
574e8b95bc NVMe: Request I/O regions
Calling pci_request_selected_regions() reserves these regions for our use.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:54 -04:00
Matthew Wilcox
2930353f9f NVMe: Allow queues to be allocated above 4GB
Need to call dma_set_coherent_mask() to allow queues to be allocated
above 4GB.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox
f64d3365a3 NVMe: Enable device DMA
Need to call pci_set_master() to enable device DMA

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Shane Michael Matthews
0ee5a7d7cb NVMe: Enable and disable the PCI device
Call pci_enable_device_mem() at initialisation and pci_disable_device
at exit.

Signed-off-by: Shane Michael Matthews <shane.matthews@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox
3f85d50b60 NVMe: Check returns from nvme_alloc_queue()
It can return NULL, so handle that.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox
8e9f0e7115 NVMe: Remove 'node' from nvme_dev
We don't keep a list of nvme_dev any more

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox
51814232ec NVMe: Read the model, serial & firmware rev from the controller
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox
a53295b699 NVMe: Add NVME_IOCTL_SUBMIT_IO
Allow userspace to submit synchronous I/O like the SCSI sg interface does.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:53 -04:00
Matthew Wilcox
7fc3cdabba NVMe: Create nvme_map_user_pages() and nvme_unmap_user_pages()
These are generalisations of the code that was in
nvme_submit_user_admin_command().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox
bd38c5557c NVMe: Change NVME_IOCTL_GET_RANGE_TYPE to return all the ranges
Factor out most of nvme_identify() into a new nvme_submit_user_admin_command()
function.  Change nvme_get_range_type() to call it and change nvme_ioctl to
realise that it's getting back all 64 ranges.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox
b8deb62cf2 NVMe: Zero the command before we send it
Make sure there's no left-over bits set from previous commands that used
this slot.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox
ff22b54fda NVMe: Add nvme_setup_prps()
Generalise the code from nvme_identify() that sets PRP1 & PRP2 so that
it's usable for commands sent by nvme_submit_bio_queue().

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox
36c14ed9ca NVMe: Use PRP2 for the nvme_identify ioctl
DMA the result straight to userspace instead of bounce-buffering in the
kernel.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:52 -04:00
Matthew Wilcox
53c9577e9c NVMe: Fix admin IRQ claim on real hardware
The admin IRQ is supposed to use the pin-based (or single message MSI)
interrupt.  Accomplish this by filling in entry[0]'s vector with the
INTx irq number.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox
821234603b NVMe: Rename 'cycle' to 'phase'
It's called the phase bit in the current draft

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox
1b23484bd0 NVMe: Implement per-CPU queues
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox
b3b06812e1 NVMe: Reduce set_queue_count arguments by one
sq_count and cq_count are always the same, so just call it 'count'.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox
3001082cac NVMe: Factor out queue_request_irq()
Two callers with an almost identical long string of arguments, and
introducing a third soon.  Time to factor out the commonalities.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Matthew Wilcox
b60503ba43 NVMe: New driver
This driver is for devices that follow the NVM Express standard

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2011-11-04 15:52:51 -04:00
Michael S. Tsirkin
5087a50e66 virtio-blk: use ida to allocate disk index
Based on a patch by Mark Wu <dwu@redhat.com>

Current index allocation in virtio-blk is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while.  It also could cause confusion about the disk
name in the case of hot-plugging disks.
Change virtio-blk to use ida to allocate index, instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-11-02 11:41:02 +10:30
Paul Gortmaker
0c8d44f239 block: Fix files that are modules and hence need module.h
We want to remove the implicit everywhere presence of module.h
so fix up the people relying on that implicit presence in advance.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:13 -04:00
Paul Gortmaker
d5decd3b95 block: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros
These files were getting <linux/module.h> via an implicit include
path, but we want to crush those out of existence since they cost
time during compiles of processing thousands of lines of headers
for no reason.  Give them the lightweight header that just contains
the EXPORT_SYMBOL infrastructure.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:12 -04:00
Michael S. Tsirkin
a0eda62552 virtio-blk: use ida to allocate disk index
Based on a patch by Mark Wu <dwu@redhat.com>

Current index allocation in virtio-blk is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while.  It also could cause confusion about the disk
name in the case of hot-plugging disks.
Change virtio-blk to use ida to allocate index, instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-31 08:05:36 +01:00
Linus Torvalds
97d2eb13a0 Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client
* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
  libceph: fix double-free of page vector
  ceph: fix 32-bit ino numbers
  libceph: force resend of osd requests if we skip an osdmap
  ceph: use kernel DNS resolver
  ceph: fix ceph_monc_init memory leak
  ceph: let the set_layout ioctl set single traits
  Revert "ceph: don't truncate dirty pages in invalidate work thread"
  ceph: replace leading spaces with tabs
  libceph: warn on msg allocation failures
  libceph: don't complain on msgpool alloc failures
  libceph: always preallocate mon connection
  libceph: create messenger with client
  ceph: document ioctls
  ceph: implement (optional) max read size
  ceph: rename rsize -> rasize
  ceph: make readpages fully async
2011-10-28 16:42:18 -07:00
David Vrabel
2d073846b8 block: xen-blkback: use API provided by xenbus module to map rings
The xenbus module provides xenbus_map_ring_valloc() and
xenbus_map_ring_vfree().  Use these to map the ring pages granted by
the frontend.

Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-26 10:02:35 -04:00
Sage Weil
6ab00d465a libceph: create messenger with client
This simplifies the init/shutdown paths, and makes client->msgr available
during the rest of the setup process.

Signed-off-by: Sage Weil <sage@newdream.net>
2011-10-25 16:10:15 -07:00
Linus Torvalds
59e5253417 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
  MAINTAINERS: linux-m32r is moderated for non-subscribers
  linux@lists.openrisc.net is moderated for non-subscribers
  Drop default from "DM365 codec select" choice
  parisc: Kconfig: cleanup Kernel page size default
  Kconfig: remove redundant CONFIG_ prefix on two symbols
  cris: remove arch/cris/arch-v32/lib/nand_init.S
  microblaze: add missing CONFIG_ prefixes
  h8300: drop puzzling Kconfig dependencies
  MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
  tty: drop superfluous dependency in Kconfig
  ARM: mxc: fix Kconfig typo 'i.MX51'
  Fix file references in Kconfig files
  aic7xxx: fix Kconfig references to READMEs
  Fix file references in drivers/ide/
  thinkpad_acpi: Fix printk typo 'bluestooth'
  bcmring: drop commented out line in Kconfig
  btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
  doc: raw1394: Trivial typo fix
  CIFS: Don't free volume_info->UNC until we are entirely done with it.
  treewide: Correct spelling of successfully in comments
  ...
2011-10-25 12:11:02 +02:00
Linus Torvalds
31018acd4c Merge branches 'stable/bug.fixes-3.2' and 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/p2m/debugfs: Make type_name more obvious.
  xen/p2m/debugfs: Fix potential pointer exception.
  xen/enlighten: Fix compile warnings and set cx to known value.
  xen/xenbus: Remove the unnecessary check.
  xen/irq: If we fail during msi_capability_init return proper error code.
  xen/events: Don't check the info for NULL as it is already done.
  xen/events: BUG() when we can't allocate our event->irq array.

* 'stable/mmu.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix selfballooning and ensure it doesn't go too far
  xen/gntdev: Fix sleep-inside-spinlock
  xen: modify kernel mappings corresponding to granted pages
  xen: add an "highmem" parameter to alloc_xenballooned_pages
  xen/p2m: Use SetPagePrivate and its friends for M2P overrides.
  xen/p2m: Make debug/xen/mmu/p2m visible again.
  Revert "xen/debug: WARN_ON when identity PFN has no _PAGE_IOMAP flag set."
2011-10-25 09:17:47 +02:00
Jens Axboe
83157223de Merge branch 'for-linus' into for-3.2/core 2011-10-24 16:24:38 +02:00
Mike Miller
ab5dbebe33 cciss: add small delay when using PCI Power Management to reset for kump
The P600 requires a small delay when changing states. Otherwise we may think
the board did not reset and we bail. This for kdump only and is particular
to the P600.

Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-20 22:21:52 +02:00
Jens Axboe
b8d8bdfe31 Merge branch 'stable/for-jens-3.2' of git://oss.oracle.com/git/kwilk/xen into for-3.2/drivers 2011-10-20 15:10:59 +02:00
Jens Axboe
5c04b426f2 Merge branch 'v3.1-rc10' into for-3.2/core
Conflicts:
	block/blk-core.c
	include/linux/blkdev.h

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2011-10-19 14:30:42 +02:00
Konrad Rzeszutek Wilk
6927d92091 xen/blkback: Fix two races in the handling of barrier requests.
There are two windows of opportunity to cause a race when
processing a barrier request. This patch fixes this.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-10-17 14:28:57 -04:00