Commit Graph

250 Commits

Author SHA1 Message Date
Linus Torvalds
d3b5d35290 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The main x86 MM changes in this cycle were:

   - continued native kernel PCID support preparation patches to the TLB
     flushing code (Andy Lutomirski)

   - various fixes related to 32-bit compat syscall returning address
     over 4Gb in applications, launched from 64-bit binaries - motivated
     by C/R frameworks such as Virtuozzo. (Dmitry Safonov)

   - continued Intel 5-level paging enablement: in particular the
     conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov)

   - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel)

   - ... plus misc updates, fixes and cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
  x86/mm: Fix flush_tlb_page() on Xen
  x86/mm: Make flush_tlb_mm_range() more predictable
  x86/mm: Remove flush_tlb() and flush_tlb_current_task()
  x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
  x86/mm/64: Fix crash in remove_pagetable()
  Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"
  x86/boot/e820: Remove a redundant self assignment
  x86/mm: Fix dump pagetables for 4 levels of page tables
  x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow
  x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
  Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
  x86/espfix: Add support for 5-level paging
  x86/kasan: Extend KASAN to support 5-level paging
  x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
  x86/paravirt: Add 5-level support to the paravirt code
  x86/mm: Define virtual memory map for 5-level paging
  x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
  x86/boot: Detect 5-level paging support
  x86/mm/numa: Remove numa_nodemask_from_meminfo()
  ...
2017-05-01 23:54:56 -07:00
Dan Williams
7138970383 mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
The x86 conversion to the generic GUP code included a small change which causes
crashes and data corruption in the pmem code - not good.

The root cause is that the /dev/pmem driver code implicitly relies on the x86
get_user_pages() implementation doing a get_page() on the page refcount, because
get_page() does a get_zone_device_page() which properly refcounts pmem's separate
page struct arrays that are not present in the regular page struct structures.
(The pmem driver does this because it can cover huge memory areas.)

But the x86 conversion to the generic GUP code changed the get_page() to
page_cache_get_speculative() which is faster but doesn't do the
get_zone_device_page() call the pmem code relies on.

One way to solve the regression would be to change the generic GUP code to use
get_page(), but that would slow things down a bit and punish other generic-GUP
using architectures for an x86-ism they did not care about. (Arguably the pmem
driver was probably not working reliably for them: but nvdimm is an Intel
feature, so non-x86 exposure is probably still limited.)

So restructure the pmem code's interface with the MM instead: get rid of the
get/put_zone_device_page() distinction, integrate put_zone_device_page() into
__put_page() and and restructure the pmem completion-wait and teardown machinery:

Kirill points out that the calls to {get,put}_dev_pagemap() can be
removed from the mm fast path if we take a single get_dev_pagemap()
reference to signify that the page is alive and use the final put of the
page to drop that reference.

This does require some care to make sure that any waits for the
percpu_ref to drop to zero occur *after* devm_memremap_page_release(),
since it now maintains its own elevated reference.

This speeds up things while also making the pmem refcounting more robust going
forward.

Suggested-by: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/149339998297.24933.1129582806028305912.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-05-01 09:15:53 +02:00
Dan Williams
4aa5615e08 libnvdimm: band aid btt vs clear poison locking
The following warning results from holding a lane spinlock,
preempt_disable(), or the btt map spinlock and then trying to take the
reconfig_mutex to walk the poison list and potentially add new entries.

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
 in_atomic(): 1, irqs_disabled(): 0, pid: 17159, name: dd
 [..]
 Call Trace:
  dump_stack+0x85/0xc8
  ___might_sleep+0x184/0x250
  __might_sleep+0x4a/0x90
  __mutex_lock+0x58/0x9b0
  ? nvdimm_bus_lock+0x21/0x30 [libnvdimm]
  ? __nvdimm_bus_badblocks_clear+0x2f/0x60 [libnvdimm]
  ? acpi_nfit_forget_poison+0x79/0x80 [nfit]
  ? _raw_spin_unlock+0x27/0x40
  mutex_lock_nested+0x1b/0x20
  nvdimm_bus_lock+0x21/0x30 [libnvdimm]
  nvdimm_forget_poison+0x25/0x50 [libnvdimm]
  nvdimm_clear_poison+0x106/0x140 [libnvdimm]
  nsio_rw_bytes+0x164/0x270 [libnvdimm]
  btt_write_pg+0x1de/0x3e0 [nd_btt]
  ? blk_queue_enter+0x30/0x290
  btt_make_request+0x11a/0x310 [nd_btt]
  ? blk_queue_enter+0xb7/0x290
  ? blk_queue_enter+0x30/0x290
  generic_make_request+0x118/0x3b0

As a minimal fix, disable error clearing when the BTT is enabled for the
namespace. For the final fix a larger rework of the poison list locking
is needed.

Note that this is not a problem in the blk case since that path never
calls nvdimm_clear_poison().

Cc: <stable@vger.kernel.org>
Fixes: 82bf1037f2 ("libnvdimm: check and clear poison before writing to pmem")
Cc: Dave Jiang <dave.jiang@intel.com>
[jeff: dynamically disable error clearing in the btt case]
Suggested-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-10 17:21:45 -07:00
Dan Williams
0beb2012a1 libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
Holding the reconfig_mutex over a potential userspace fault sets up a
lockdep dependency chain between filesystem-DAX and the libnvdimm ioctl
path. Move the user access outside of the lock.

     [ INFO: possible circular locking dependency detected ]
     4.11.0-rc3+ #13 Tainted: G        W  O
     -------------------------------------------------------
     fallocate/16656 is trying to acquire lock:
      (&nvdimm_bus->reconfig_mutex){+.+.+.}, at: [<ffffffffa00080b1>] nvdimm_bus_lock+0x21/0x30 [libnvdimm]
     but task is already holding lock:
      (jbd2_handle){++++..}, at: [<ffffffff813b4944>] start_this_handle+0x104/0x460

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #2 (jbd2_handle){++++..}:
            lock_acquire+0xbd/0x200
            start_this_handle+0x16a/0x460
            jbd2__journal_start+0xe9/0x2d0
            __ext4_journal_start_sb+0x89/0x1c0
            ext4_dirty_inode+0x32/0x70
            __mark_inode_dirty+0x235/0x670
            generic_update_time+0x87/0xd0
            touch_atime+0xa9/0xd0
            ext4_file_mmap+0x90/0xb0
            mmap_region+0x370/0x5b0
            do_mmap+0x415/0x4f0
            vm_mmap_pgoff+0xd7/0x120
            SyS_mmap_pgoff+0x1c5/0x290
            SyS_mmap+0x22/0x30
            entry_SYSCALL_64_fastpath+0x1f/0xc2

    -> #1 (&mm->mmap_sem){++++++}:
            lock_acquire+0xbd/0x200
            __might_fault+0x70/0xa0
            __nd_ioctl+0x683/0x720 [libnvdimm]
            nvdimm_ioctl+0x8b/0xe0 [libnvdimm]
            do_vfs_ioctl+0xa8/0x740
            SyS_ioctl+0x79/0x90
            do_syscall_64+0x6c/0x200
            return_from_SYSCALL_64+0x0/0x7a

    -> #0 (&nvdimm_bus->reconfig_mutex){+.+.+.}:
            __lock_acquire+0x16b6/0x1730
            lock_acquire+0xbd/0x200
            __mutex_lock+0x88/0x9b0
            mutex_lock_nested+0x1b/0x20
            nvdimm_bus_lock+0x21/0x30 [libnvdimm]
            nvdimm_forget_poison+0x25/0x50 [libnvdimm]
            nvdimm_clear_poison+0x106/0x140 [libnvdimm]
            pmem_do_bvec+0x1c2/0x2b0 [nd_pmem]
            pmem_make_request+0xf9/0x270 [nd_pmem]
            generic_make_request+0x118/0x3b0
            submit_bio+0x75/0x150

Cc: <stable@vger.kernel.org>
Fixes: 62232e45f4 ("libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices")
Cc: Dave Jiang <dave.jiang@intel.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-10 17:21:45 -07:00
Dan Williams
fe514739d8 libnvdimm: fix blk free space accounting
Commit a1f3e4d6a0 "libnvdimm, region: update nd_region_available_dpa()
for multi-pmem support" reworked blk dpa (DIMM Physical Address)
accounting to comprehend multiple pmem namespace allocations aliasing
with a given blk-dpa range.

The following call trace is a result of failing to account for allocated
blk capacity.

 WARNING: CPU: 1 PID: 2433 at tools/testing/nvdimm/../../../drivers/nvdimm/names
4 size_store+0x6f3/0x930 [libnvdimm]
 nd_region region5: allocation underrun: 0x0 of 0x1000000 bytes
 [..]
 Call Trace:
  dump_stack+0x86/0xc3
  __warn+0xcb/0xf0
  warn_slowpath_fmt+0x5f/0x80
  size_store+0x6f3/0x930 [libnvdimm]
  dev_attr_store+0x18/0x30

If a given blk-dpa allocation does not alias with any pmem ranges then
the full allocation should be accounted as busy space, not the size of
the current pmem contribution to the region.

The thinkos that led to this confusion was not realizing that the struct
resource management is already guaranteeing no collisions between pmem
allocations and blk allocations on the same dimm. Also, we do not try to
support blk allocations in aliased pmem holes.

This patch also fixes a case where the available blk goes negative.

Cc: <stable@vger.kernel.org>
Fixes: a1f3e4d6a0 ("libnvdimm, region: update nd_region_available_dpa() for multi-pmem support").
Reported-by: Dariusz Dokupil <dariusz.dokupil@intel.com>
Reported-by: Dave Jiang <dave.jiang@intel.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-04 15:08:36 -07:00
Dan Williams
86ef58a4e3 nfit, libnvdimm: fix interleave set cookie calculation
The interleave-set cookie is a sum that sanity checks the composition of
an interleave set has not changed from when the namespace was initially
created.  The checksum is calculated by sorting the DIMMs by their
location in the interleave-set. The comparison for the sort must be
64-bit wide, not byte-by-byte as performed by memcmp() in the broken
case.

Fix the implementation to accept correct cookie values in addition to
the Linux "memcmp" order cookies, but only allow correct cookies to be
generated going forward. It does mean that namespaces created by
third-party-tooling, or created by newer kernels with this fix, will not
validate on older kernels. However, there are a couple mitigating
conditions:

    1/ platforms with namespace-label capable NVDIMMs are not widely
       available.

    2/ interleave-sets with a single-dimm are by definition not affected
       (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case.

The cookie stored in the namespace label will be fixed by any write the
namespace label, the most straightforward way to achieve this is to
write to the "alt_name" attribute of a namespace in sysfs.

Cc: <stable@vger.kernel.org>
Fixes: eaf961536e ("libnvdimm, nfit: add interleave-set state-tracking infrastructure")
Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-03-01 00:49:42 -08:00
Dan Williams
bfb34527a3 libnvdimm, pfn: fix memmap reservation size versus 4K alignment
When vmemmap_populate() allocates space for the memmap it does so in 2MB
sized chunks. The libnvdimm-pfn driver incorrectly accounts for this
when the alignment of the device is set to 4K. When this happens we
trigger memory allocation failures in altmap_alloc_block_buf() and
trigger warnings of the form:

 WARNING: CPU: 0 PID: 3376 at arch/x86/mm/init_64.c:656 arch_add_memory+0xe4/0xf0
 [..]
 Call Trace:
  dump_stack+0x86/0xc3
  __warn+0xcb/0xf0
  warn_slowpath_null+0x1d/0x20
  arch_add_memory+0xe4/0xf0
  devm_memremap_pages+0x29b/0x4e0

Fixes: 315c562536 ("libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZE")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-02-04 14:47:31 -08:00
Dan Williams
9d032f4201 libnvdimm, namespace: do not delete namespace-id 0
Given that the naming of pmem devices changes from the pmemX form to the
pmemX.Y form when namespace id is greater than 0, arrange for namespaces
with id-0 to be exempt from deletion. Otherwise a simple reconfiguration
of an existing namespace to a new mode results in a name change of the
resulting block device:

    # ndctl list --namespace=namespace1.0
    {
      "dev":"namespace1.0",
      "mode":"raw",
      "size":2147483648,
      "uuid":"3dadf3dc-89b9-4b24-b20e-abc8a4707ce3",
      "blockdev":"pmem1"
    }

    # ndctl create-namespace --reconfig=namespace1.0 --mode=memory --force
    {
      "dev":"namespace1.1",
      "mode":"memory",
      "size":2111832064,
      "uuid":"7b4a6341-7318-4219-a02c-fb57c0bbf613",
      "blockdev":"pmem1.1"
    }

This change does require tooling changes to explicitly look for
namespaceX.0 if the seed has already advanced to another namespace.

Cc: <stable@vger.kernel.org>
Fixes: 98a29c39dc ("libnvdimm, namespace: allow creation of multiple pmem-namespaces per region")
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-01-31 18:18:21 -08:00
Bhumika Goyal
970d14e398 nvdimm: constify device_type structures
Declare device_type structure as const as it is only stored in the
type field of a device structure. This field is of type const, so add
const to declaration of device_type structure.

File size before:
  text	   data	    bss	    dec	    hex	filename
  19278	   3199	     16	  22493	   57dd	nvdimm/namespace_devs.o

File size after:
  text	   data	    bss	    dec	    hex	filename
  19929	   3160	     16	  23105	   5a41	nvdimm/namespace_devs.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-01-31 18:16:30 -08:00
Dan Williams
1f19b983a8 libnvdimm, namespace: fix pmem namespace leak, delete when size set to zero
Commit 98a29c39dc ("libnvdimm, namespace: allow creation of multiple
pmem-namespaces per region") added support for establishing additional
pmem namespace beyond the seed device, similar to blk namespaces.
However, it neglected to delete the namespace when the size is set to
zero.

Fixes: 98a29c39dc ("libnvdimm, namespace: allow creation of multiple pmem-namespaces per region")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-01-13 09:50:33 -08:00
Stefan Hajnoczi
d47d1d27fd pmem: return EIO on read_pmem() failure
The read_pmem() function uses memcpy_mcsafe() on x86 where an EFAULT
error code indicates a failed read.  Block I/O should use EIO to
indicate failure.  Other pmem code paths (like bad blocks) already use
EIO so let's be consistent.

This fixes compatibility with consumers like btrfs that try to parse the
specific error code rather than treat all errors the same.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-01-12 16:40:29 -08:00
Linus Torvalds
3be134e515 libnvdimm for 4.10
* Dynamic label support: To date namespace label support has been
 limited to disambiguating cases where PMEM (direct load/store) and BLK
 (mmio aperture) accessed-capacity alias on the same DIMM. Since 4.9 added
 support for multiple namespaces per PMEM-region there is value to
 support namespace labels even in the non-aliasing case. The presence of
 a valid namespace index block force-enables label support when the
 kernel would otherwise rely on region boundaries, and permits the region
 to be sub-divided.
 
 * Handle media errors in namespace metadata: Complement the error
 handling for media errors in namespace data areas with support for
 clearing errors on writes, and downgrading potential machine-check
 exceptions to simple i/o errors on read.
 
 * Device-DAX region attributes: Add 'align', 'id', and 'size' as
 attributes for device-dax regions. In particular this enables userspace
 tooling to generically size memory mapping and i/o operations. Prevent
 userspace from growing assumptions / dependencies about the parent
 device topology for a dax region. A libnvdimm namespace may not always
 be the parent device of a dax region.
 
 * Various cleanups and small fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYVxRcAAoJEB7SkWpmfYgCSnoQAKNs7IJRtGqKFCkMB5VDAW79
 Lifi35Hqm+mfFaLMIyRoKJMnhXKQBsthfcHZIy5t63kDWEV/tJ+riZ+m2Ibz4GPE
 AUn07jxge2q9v0RwMylqpZ/EHyaK/xD1z0+SdIwVF2VaEDoVdnmu2WEqjuir/lfM
 t70B/YoWLg6CcHeTzaV5vdxlRKyG4pzOV3UzPlYLROr3wFJBHD88j/4zcGy3F1ue
 DpzGThhtEQXZUZCJLp54E0ch7V2cz/DNUDYwsjbCZrdYYmUJFqjZQxNyftn28aEJ
 HI/BERXLhm2HF2qRqC+0C1lJmUylYk4wXUGco+b/u1GYN59eFe/JT3dJgxEHFKL2
 nVOUmR8XsRE6gkFWQGcFesRjjI1Rqz2Wgql7oqkSL9grGOwY4rvKX0LhJxgvfuZ0
 Itb5h0dMFUriAP0DaaJFMYfANIOx+mqr4RxDMwP5ZADku6+Ga0LXxvcBt06K/mYO
 /zLo27MhsuuFy/odXm1A21OjFiH2pM3pSCaytKvDjy7DwzgE7ZzXmixXQ7j8YVp6
 OtLYzMjIt8nt4xh0hmav5tb0r9l6mgNqlifacMC9lEM/7SDAiBXXBLuQngJN/j0s
 iXllBk0pYVWibf3VcD6oY+qKdmBWvXxPjgj1lHE6j9A7Gw2jwIt/W2NWzV94nKmC
 f6d+faHiRokqyVhdgfx3
 =YWfP
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "The libnvdimm pull request is relatively small this time around due to
  some development topics being deferred to 4.11.

  As for this pull request the bulk of it has been in -next for several
  releases leading to one late fix being added (commit 868f036fee
  ("libnvdimm: fix mishandled nvdimm_clear_poison() return value")). It
  has received a build success notification from the 0day-kbuild robot
  and passes the latest libnvdimm unit tests.

  Summary:

   - Dynamic label support: To date namespace label support has been
     limited to disambiguating cases where PMEM (direct load/store) and
     BLK (mmio aperture) accessed-capacity alias on the same DIMM. Since
     4.9 added support for multiple namespaces per PMEM-region there is
     value to support namespace labels even in the non-aliasing case.
     The presence of a valid namespace index block force-enables label
     support when the kernel would otherwise rely on region boundaries,
     and permits the region to be sub-divided.

   - Handle media errors in namespace metadata: Complement the error
     handling for media errors in namespace data areas with support for
     clearing errors on writes, and downgrading potential machine-check
     exceptions to simple i/o errors on read.

   - Device-DAX region attributes: Add 'align', 'id', and 'size' as
     attributes for device-dax regions. In particular this enables
     userspace tooling to generically size memory mapping and i/o
     operations. Prevent userspace from growing assumptions /
     dependencies about the parent device topology for a dax region. A
     libnvdimm namespace may not always be the parent device of a dax
     region.

   - Various cleanups and small fixes"

* tag 'libnvdimm-for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  dax: add region 'id', 'size', and 'align' attributes
  libnvdimm: fix mishandled nvdimm_clear_poison() return value
  libnvdimm: replace mutex_is_locked() warnings with lockdep_assert_held
  libnvdimm, pfn: fix align attribute
  libnvdimm, e820: use module_platform_driver
  libnvdimm, namespace: use octal for permissions
  libnvdimm, namespace: avoid multiple sector calculations
  libnvdimm: remove else after return in nsio_rw_bytes()
  libnvdimm, namespace: fix the type of name variable
  libnvdimm: use consistent naming for request_mem_region()
  nvdimm: use the right length of "pmem"
  libnvdimm: check and clear poison before writing to pmem
  tools/testing/nvdimm: dynamic label support
  libnvdimm: allow a platform to force enable label support
  libnvdimm: use generic iostat interfaces
2016-12-18 15:49:10 -08:00
Dan Williams
c44ef859ce Merge branch 'for-4.10/libnvdimm' into libnvdimm-for-next 2016-12-17 15:08:10 -08:00
Dan Williams
868f036fee libnvdimm: fix mishandled nvdimm_clear_poison() return value
Colin, via static analysis, reports that the length could be negative
from nvdimm_clear_poison() in the error case. There was a similar
problem with commit 0a3f27b9a6 "libnvdimm, namespace: avoid multiple
sector calculations" that I noticed when merging the for-4.10/libnvdimm
topic branch into libnvdimm-for-next, but I missed this one. Fix both of
them to the following procedure:

* if we clear a block's worth of media, clear that many blocks in
  badblocks

* if we clear less than the requested size of the transfer return an
  error

* always invalidate cache after any non-error / non-zero
  nvdimm_clear_poison result

Fixes: 82bf1037f2 ("libnvdimm: check and clear poison before writing to pmem")
Fixes: 0a3f27b9a6 ("libnvdimm, namespace: avoid multiple sector calculations")
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Dave Jiang <dave.jiang@intel.com>
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-16 08:10:31 -08:00
Dan Williams
9cf8bd529c libnvdimm: replace mutex_is_locked() warnings with lockdep_assert_held
For warnings that should only ever trigger during development and
testing replace WARN statements with lockdep_assert_held. The lockdep
pattern is prevalent, and these paths are are well covered by libnvdimm
unit tests.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-15 20:04:31 -08:00
Linus Torvalds
e7aa8c2eb1 These are the documentation changes for 4.10.
It's another busy cycle for the docs tree, as the sphinx conversion
 continues.  Highlights include:
 
  - Further work on PDF output, which remains a bit of a pain but should be
    more solid now.
 
  - Five more DocBook template files converted to Sphinx.  Only 27 to go...
    Lots of plain-text files have also been converted and integrated.
 
  - Images in binary formats have been replaced with more source-friendly
    versions.
 
  - Various bits of organizational work, including the renaming of various
    files discussed at the kernel summit.
 
  - New documentation for the device_link mechanism.
 
 ...and, of course, lots of typo fixes and small updates.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYTbl7AAoJEI3ONVYwIuV63NIP/REwzThnGWFJMRSuq8Ieq2r9
 sFSQsaGTGlhyKiDoEooo+SO/Za3uTonjK+e7WZg8mhdiEdamta5aociU/71C1Yy/
 T9ur0FhcGblrvZ1NidSDvCLwuECZOMMei7mgLZ9a+KCpc4ANqqTVZSUm1blKcqhF
 XelhVXxBa0ar35l/pVzyCxkdNXRWXv+MJZE8hp5XAdTdr11DS7UY9zrZdH31axtf
 BZlbYJrvB8WPydU6myTjRpirA17Hu7uU64MsL3bNIEiRQ+nVghEzQC8uxeUCvfVx
 r0H5AgGGQeir+e8GEv2T20SPZ+dumXs+y/HehKNb3jS3gV0mo+pKPeUhwLIxr+Zh
 QY64gf+jYf5ISHwAJRnU0Ima72ehObzSbx9Dko10nhq2OvbR5f83gjz9t9jKYFU7
 RDowICA8lwqyRbHRoVfyoW8CpVhWFpMFu3yNeJMckeTish3m7ANqzaWslbsqIP5G
 zxgFMIrVVSbeae+sUeygtEJAnWI09aZ4tuaUXYtGWwu6ikC/3aV6DryP4bthG2LF
 A19uV4nMrLuuh8g2wiTHHjMfjYRwvSn+f9yaolwJhwyNDXQzRPy+ZJ3W/6olOkXC
 bAxTmVRCW5GA/fmSrfXmW1KbnxlWfP2C62hzZQ09UHxzTHdR97oFLDQdZhKo1uwf
 pmSJR0hVeRUmA4uw6+Su
 =A0EV
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.10' of git://git.lwn.net/linux

Pull documentation update from Jonathan Corbet:
 "These are the documentation changes for 4.10.

  It's another busy cycle for the docs tree, as the sphinx conversion
  continues. Highlights include:

   - Further work on PDF output, which remains a bit of a pain but
     should be more solid now.

   - Five more DocBook template files converted to Sphinx. Only 27 to
     go... Lots of plain-text files have also been converted and
     integrated.

   - Images in binary formats have been replaced with more
     source-friendly versions.

   - Various bits of organizational work, including the renaming of
     various files discussed at the kernel summit.

   - New documentation for the device_link mechanism.

  ... and, of course, lots of typo fixes and small updates"

* tag 'docs-4.10' of git://git.lwn.net/linux: (193 commits)
  dma-buf: Extract dma-buf.rst
  Update Documentation/00-INDEX
  docs: 00-INDEX: document directories/files with no docs
  docs: 00-INDEX: remove non-existing entries
  docs: 00-INDEX: add missing entries for documentation files/dirs
  docs: 00-INDEX: consolidate process/ and admin-guide/ description
  scripts: add a script to check if Documentation/00-INDEX is sane
  Docs: change sh -> awk in REPORTING-BUGS
  Documentation/core-api/device_link: Add initial documentation
  core-api: remove an unexpected unident
  ppc/idle: Add documentation for powersave=off
  Doc: Correct typo, "Introdution" => "Introduction"
  Documentation/atomic_ops.txt: convert to ReST markup
  Documentation/local_ops.txt: convert to ReST markup
  Documentation/assoc_array.txt: convert to ReST markup
  docs-rst: parse-headers.pl: cleanup the documentation
  docs-rst: fix media cleandocs target
  docs-rst: media/Makefile: reorganize the rules
  docs-rst: media: build SVG from graphviz files
  docs-rst: replace bayer.png by a SVG image
  ...
2016-12-12 21:58:13 -08:00
Dan Williams
af7d9f0c57 libnvdimm, pfn: fix align attribute
Fix the format specifier so that the attribute can be parsed correctly.
Currently it returns decimal 1000 for a 4096-byte alignment.

Cc: <stable@vger.kernel.org>
Reported-by: Dave Jiang <dave.jiang@intel.com>
Fixes: 315c562536 ("libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZE")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-10 08:12:05 -08:00
Dan Williams
efda1b5d87 acpi, nfit, libnvdimm: fix / harden ars_status output length handling
Given ambiguities in the ACPI 6.1 definition of the "Output (Size)"
field of the ARS (Address Range Scrub) Status command, a firmware
implementation may in practice return 0, 4, or 8 to indicate that there
is no output payload to process.

The specification states "Size of Output Buffer in bytes, including this
field.". However, 'Output Buffer' is also the name of the entire
payload, and earlier in the specification it states "Max Query ARS
Status Output Buffer Size: Maximum size of buffer (including the Status
and Extended Status fields)".

Without this fix if the BIOS happens to return 0 it causes memory
corruption as evidenced by this result from the acpi_nfit_ctl() unit
test.

 ars_status00000000: 00020000 00000000                    ........
 BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff)
 kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC
 task: ffff8803332d2ec0 task.stack: ffffc9000174c000
 RIP: 0010:[<ffffffff814cfe72>]  [<ffffffff814cfe72>] __memcpy+0x12/0x20
 RSP: 0018:ffffc9000174f9a8  EFLAGS: 00010246
 RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56
 RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000
 RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8
 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0
 FS:  00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0
 Stack:
  ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac
  0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000
  0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246
 Call Trace:
  [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit]
  [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test]

Cc: <stable@vger.kernel.org>
Fixes: 747ffe11b4 ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-06 16:08:10 -08:00
Johannes Thumshirn
3a71b3c894 libnvdimm, e820: use module_platform_driver
Use module_platform_driver for the e820 driver instead of open-coding it.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-05 08:52:21 -08:00
Fabian Frederick
b44fe76043 libnvdimm, namespace: use octal for permissions
According to commit f90774e1fd
("checkpatch: look for symbolic permissions and suggest octal instead")

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-04 10:54:08 -08:00
Fabian Frederick
0a3f27b9a6 libnvdimm, namespace: avoid multiple sector calculations
Use sector_t for cleared

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-04 10:48:58 -08:00
Fabian Frederick
d37806dc37 libnvdimm: remove else after return in nsio_rw_bytes()
else after return is not needed.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
[djbw: removed some now unnecessary newlines]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-12-04 10:45:13 -08:00
Nicolas Iooss
238b323a68 libnvdimm, namespace: fix the type of name variable
In create_namespace_blk(), the local variable "name" is defined as an
array of NSLABEL_NAME_LEN pointers:

    char *name[NSLABEL_NAME_LEN];

This variable is then used in calls to memcpy() and kmemdup() as if it
were char[NSLABEL_NAME_LEN]. Remove the star in the variable definition
to makes it look right.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-11-28 13:41:17 -08:00
Dan Williams
450c6633e8 libnvdimm: use consistent naming for request_mem_region()
Here is an example /proc/iomem listing for a system with 2 namespaces,
one in "sector" mode and one in "memory" mode:

  1fc000000-2fbffffff : Persistent Memory (legacy)
    1fc000000-2fbffffff : namespace1.0
  340000000-34fffffff : Persistent Memory
    340000000-34fffffff : btt0.1

Here is the corresponding ndctl listing:

  # ndctl list
  [
    {
      "dev":"namespace1.0",
      "mode":"memory",
      "size":4294967296,
      "blockdev":"pmem1"
    },
    {
      "dev":"namespace0.0",
      "mode":"sector",
      "size":267091968,
      "uuid":"f7594f86-badb-4592-875f-ded577da2eaf",
      "sector_size":4096,
      "blockdev":"pmem0s"
    }
  ]

Notice that the ndctl listing is purely in terms of namespace devices,
while the iomem listing leaks the internal "btt0.1" implementation
detail. Given that ndctl requires the namespace device name to change
the mode, for example:

  # ndctl create-namespace --reconfig=namespace0.0 --mode=raw --force

...use the namespace name in the iomem listing to keep the claiming
device name consistent across different mode settings.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-11-28 11:15:18 -08:00
Jonathan Corbet
917fef6f7e Linux 4.9-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYHmoCAAoJEHm+PkMAQRiG7RMIAI2i7Y5hpL5yCxK5AFaL4u/G
 KxXfp1B1UanUTgjOmd7zGqtDYcFX9t7GTTUFixQ7/9Opr4PD9qbnatoDGSc3xjbT
 msDgA1B78F1/Q3kHWfeGq32MihQ4mj5NwUCo+igUcUvvWG7mHgzErj/Nh5RoobQX
 p/izdpTbrw3GX6xXB8olbG7XWHaVye/+TT3q6+gmgm8I/QEujcLeGoycE0zlhPN8
 FG/JX76At/+ZM2Py7Oxo3k+oKL9CHrtOQYDp/wN0uslV5eYvvkZz0/M1HMOGZt+c
 gZU5jzM17K7C4Nzo06WAuBU9wUBGc25m+cPicLlOmljnzfU+f50SKaDjZq3p7QI=
 =2KUF
 -----END PGP SIGNATURE-----

Merge tag 'v4.9-rc4' into sound

Bring in -rc4 patches so I can successfully merge the sound doc changes.
2016-11-18 16:13:41 -07:00
Nicolas Iooss
2d9a02744f nvdimm: use the right length of "pmem"
In order to test that the name of a resource begins with "pmem", call
strncmp() with 4 as length instead of 3 to match the whole prefix.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-11-11 20:37:42 -08:00
Dave Jiang
82bf1037f2 libnvdimm: check and clear poison before writing to pmem
We need to clear any poison when we are writing to pmem. The granularity
will be sector size. If it's less then we can't do anything about it
barring corruption.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
[djbw: fixup 0-length write request to succeed]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-11-11 20:35:31 -08:00
Arnd Bergmann
867dfe3421 nvdimm: make CONFIG_NVDIMM_DAX 'bool'
A bugfix just tried to address a randconfig build problem and introduced
a variant of the same problem: with CONFIG_LIBNVDIMM=y and
CONFIG_NVDIMM_DAX=m, the nvdimm module now fails to link:

drivers/nvdimm/built-in.o: In function `to_nd_device_type':
bus.c:(.text+0x1b5d): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nd_region_notify_driver_action.constprop.2':
region_devs.c:(.text+0x6b6c): undefined reference to `is_nd_dax'
region_devs.c:(.text+0x6b8c): undefined reference to `to_nd_dax'
drivers/nvdimm/built-in.o: In function `nd_region_probe':
region.c:(.text+0x70f3): undefined reference to `nd_dax_create'
drivers/nvdimm/built-in.o: In function `mode_show':
namespace_devs.c:(.text+0xa196): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
(.text+0xa55f): undefined reference to `is_nd_dax'
drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe':
(.text+0xa56e): undefined reference to `to_nd_dax'

This reverts the earlier fix, making NVDIMM_DAX a 'bool' option again
as it should be (it gets linked into the libnvdimm module). To fix
the original problem, I'm adding a dependency on LIBNVDIMM to
DEV_DAX_PMEM, which ensures we can't have that one built-in if the
rest is a module.

Fixes: 4e65e9381c ("/dev/dax: fix Kconfig dependency build breakage")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-27 16:16:21 -07:00
Mauro Carvalho Chehab
8c27ceff36 docs: fix locations of several documents that got moved
The previous patch renamed several files that are cross-referenced
along the Kernel documentation. Adjust the links to point to
the right places.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2016-10-24 08:12:35 -02:00
Toshi Kani
3115bb02b5 pmem: report error on clear poison failure
ACPI Clear Uncorrectable Error DSM function may fail or may be
unsupported on a platform.  pmem_clear_poison() returns without clearing
badblocks in such cases.  This failure is detected at the next read
(-EIO).

This behavior can lead to an issue when user keeps writing but does not
read immediately.  For instance, flight recorder file may be only read
when it is necessary for troubleshooting.

Change pmem_do_bvec() and pmem_clear_poison() to return -EIO so that
filesystem can log an error message on a write error.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-19 10:35:52 -07:00
Dan Carpenter
75d29713b7 libnvdimm, namespace: potential NULL deref on allocation error
If the kcalloc() fails then "devs" can be NULL and we dereference it
checking "devs[i]".

Fixes: 1b40e09a12 ('libnvdimm: blk labels and namespace instantiation')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-19 10:35:51 -07:00
Dan Williams
42237e393f libnvdimm: allow a platform to force enable label support
Platforms like QEMU-KVM implement an NFIT table and label DSMs.
However, since that environment does not define an aliased
configuration, the labels are currently ignored and the kernel registers
a single full-sized pmem-namespace per region. Now that the kernel
supports sub-divisions of pmem regions the labels have a purpose.
Arrange for the labels to be honored when we find an existing / valid
namespace index block.

Cc: <qemu-devel@nongnu.org>
Cc: Haozhong Zhang <haozhong.zhang@intel.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-19 08:57:33 -07:00
Toshi Kani
8d7c22ac0c libnvdimm: use generic iostat interfaces
nd_iostat_start() and nd_iostat_end() implement the same functionality
that generic_start_io_acct() and generic_end_io_acct() already provide.

Change nd_iostat_start() and nd_iostat_end() to call the generic iostat
interfaces.  There is no change in the nd interfaces.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-19 08:53:26 -07:00
Dan Williams
e476f94482 Merge branch 'for-4.9/dax' into libnvdimm-for-next 2016-10-07 16:46:30 -07:00
Dan Williams
178d6f4be8 Merge branch 'for-4.9/libnvdimm' into libnvdimm-for-next 2016-10-07 16:46:24 -07:00
Ross Zwisler
4e65e9381c /dev/dax: fix Kconfig dependency build breakage
The function dax_pmem_probe() in drivers/dax/pmem.c is compiled under the
CONFIG_DEV_DAX_PMEM tri-state config option.  This config option currently
only depends on CONFIG_NVDIMM_DAX, a bool, which means that the following
configuration is possible:

CONFIG_LIBNVDIMM=m
...
CONFIG_NVDIMM_DAX=y
CONFIG_DEV_DAX=y
CONFIG_DEV_DAX_PMEM=y

With this config LIBNVDIMM is compiled as a module with NVDIMM_DAX=y just
meaning that we will compile drivers/nvdimm/dax_devs.c into that module.
However, dax_pmem_probe() depends on several symbols defined in
drivers/nvdimm/dax_devs.c, which results in the following build errors:

drivers/built-in.o: In function `dax_pmem_probe':
linux/drivers/dax/pmem.c:70: undefined reference to `to_nd_dax'
linux/drivers/dax/pmem.c:74: undefined reference to
`nvdimm_namespace_common_probe'
linux/drivers/dax/pmem.c:80: undefined reference to `devm_nsio_enable'
linux/drivers/dax/pmem.c:81: undefined reference to `nvdimm_setup_pfn'
linux/drivers/dax/pmem.c:84: undefined reference to `devm_nsio_disable'
linux/drivers/dax/pmem.c:122: undefined reference to `to_nd_region'
drivers/built-in.o: In function `dax_pmem_init':
linux/drivers/dax/pmem.c:147: undefined reference to `__nd_driver_register'

Fix this by making NVDIMM_DAX a tristate.  DEV_DAX_PMEM depends on
NVDIMM_DAX which depends on LIBNVDIMM.  Since they are all now tristates,
if LIBNVDIMM is built as a kernel module DEV_DAX_PMEM will be as well.
This prevents dax_devs.c from being built as a built-in while its
dependencies are in the libnvdimm.ko module.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 16:46:00 -07:00
Dan Williams
98a29c39dc libnvdimm, namespace: allow creation of multiple pmem-namespaces per region
Similar to BLK regions, publish new seed namespace devices to allow
unused PMEM region capacity to be consumed by additional namespaces.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:22:53 -07:00
Dan Williams
991d9020f3 libnvdimm, namespace: lift single pmem limit in scan_labels()
Now that the rest of the infrastructure has been converted to handle
multi-pmem configurations, lift the artificial barrier at scan time.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:22:53 -07:00
Dan Williams
c969e24c1b libnvdimm, namespace: filter out of range labels in scan_labels()
Short-circuit doomed-to-fail label validation attempts by skipping
labels that are outside the given region.  For example a DIMM that has
multiple PMEM regions will waste time attempting to create namespaces
only to find that the interleave-set-cookie does not validate, e.g.:

    nd_region region6: invalid cookie in label: 73e608dc-47b9-4b2a-b5c7-2d55a32e0c2

Similar to how we skip BLK labels when performing PMEM validation we can
skip out-of-range labels early.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:22:53 -07:00
Dan Williams
762d067dba libnvdimm, namespace: enable allocation of multiple pmem namespaces
Now that we have nd_region_available_dpa() able to handle the presence
of multiple PMEM allocations in aliased PMEM regions, reuse that same
infrastructure to track allocations from free space.  In particular
handle allocating from an aliased PMEM region in the case where there
are dis-contiguous holes.  The allocation for BLK and PMEM are
documented in the space_valid() helper:

    BLK-space is valid as long as it does not precede a PMEM
    allocation in a given region. PMEM-space must be contiguous
    and adjacent to an existing existing allocation (if one
    exists).

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:22:53 -07:00
Dan Williams
16660eaea0 libnvdimm, namespace: update label implementation for multi-pmem
Instead of assuming that there will only ever be one allocated range at
the start of the region, account for additional namespaces that might
start at an offset from the region base.

After this change pmem namespaces now have a reason to carry an array of
resources similar to blk.  Unifying the resource tracking infrastructure
in nd_namespace_common is a future cleanup candidate.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:22:53 -07:00
Dan Williams
012207334a libnvdimm, namespace: expand pmem device naming scheme for multi-pmem
pmem devices are currently named /dev/pmem<region-index>. Preserve the
naming of the 0th device, but add a ".<namespace-index>" for other
devices.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:22:53 -07:00
Dan Williams
a1f3e4d6a0 libnvdimm, region: update nd_region_available_dpa() for multi-pmem support
The free dpa (dimm-physical-address) space calculation reports how much
free space is available with consideration for aliased BLK + PMEM
regions.  Recall that BLK capacity is allocated from high addresses and
PMEM is allocated from low addresses in their respective regions.

nd_region_available_dpa() accounts for the fact that the largest
encroachment (lowest starting address) into PMEM capacity by a BLK
allocation limits the available capacity to that point, regardless if
there is BLK allocation hole at a higher address.  Similarly, for the
multi-pmem case we need to track the largest encroachment (highest
 ending address) of a PMEM allocation in BLK capacity regardless of
whether there is an allocation hole that a BLK allocation could fill at
a lower address.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:20:53 -07:00
Dan Williams
6ff3e912d3 libnvdimm, namespace: sort namespaces by dpa at init
Add more determinism to initial namespace device-name assignments by
sorting the namespaces by starting dpa.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:20:53 -07:00
Dan Williams
0e3b0d123c libnvdimm, namespace: allow multiple pmem-namespaces per region at scan time
If label scanning finds multiple valid pmem namespaces allow them to be
surfaced rather than fail namespace scanning. Support for creating
multiple namespaces per region is saved for a later patch.

Note that this adds some new error messages to clarify which of the pmem
namespaces in the set are potentially impacted by invalid labels.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-07 09:20:53 -07:00
Dan Williams
8a5f50d3b7 libnvdimm, namespace: unify blk and pmem label scanning
In preparation for allowing multiple namespace per pmem region, unify
blk and pmem label scanning.  Given that blk regions already support
multiple namespaces, teaching that path how to do pmem namespace
scanning is an incremental step towards multiple pmem namespace support.
This should be functionally equivalent to the previous state in that
stops after finding the first valid pmem label set.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-05 20:24:18 -07:00
Dan Williams
f95b4bca9e libnvdimm, namespace: refactor uuid_show() into a namespace_to_uuid() helper
The ability to translate a generic struct device pointer into a
namespace uuid is a useful utility as we go to unify the blk and pmem
label scanning paths.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-10-05 20:24:18 -07:00
Dan Williams
ae8219f186 libnvdimm, label: convert label tracking to a linked list
In preparation for enabling multiple namespaces per pmem region, convert
the label tracking to use a linked list.  In particular this will allow
select_pmem_id() to move labels from the unvalidated state to the
validated state.  Currently we only track one validated set per-region.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-30 19:13:42 -07:00
Dan Williams
44c462eb9e libnvdimm, region: move region-mapping input-paramters to nd_mapping_desc
Before we add more libnvdimm-private fields to nd_mapping make it clear
which parameters are input vs libnvdimm internals. Use struct
nd_mapping_desc instead of struct nd_mapping in nd_region_desc and make
struct nd_mapping private to libnvdimm.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-30 19:13:42 -07:00
Dave Jiang
db58028ee4 nvdimm: reduce duplicated wpq flushes
Existing implemenetation writes to all the flush hint addresses for a
given ND region. This is not necessary as the flushes are per imc and
not per DIMM. Search the mappings and clear out the duplicates at init
to avoid multiple flush to the same imc.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-09-30 19:12:52 -07:00