Add information about LVT offset assignments to better debug firmware
bugs related to this. See following examples.
# dmesg | grep -i 'offset\|ibs'
LVT offset 0 assigned for vector 0xf9
[Firmware Bug]: cpu 0, try to use APIC500 (LVT offset 0) for vector 0x10400, but the register is already in use for vector 0xf9 on another cpu
[Firmware Bug]: cpu 0, IBS interrupt offset 0 not available (MSRC001103A=0x0000000000000100)
Failed to setup IBS, -22
In this case the BIOS assigns both offsets for MCE (0xf9) and IBS
(0x400) vectors to offset 0, which is why the second APIC setup (IBS)
failed.
With correct setup you get:
# dmesg | grep -i 'offset\|ibs'
LVT offset 0 assigned for vector 0xf9
LVT offset 1 assigned for vector 0x400
IBS: LVT offset 1 assigned
perf: AMD IBS detected (0x00000007)
oprofile: AMD IBS detected (0x00000007)
Note: The vector includes also the message type to handle also NMIs
(0x400). In the firmware bug message the format is the same as of the
APIC500 register and includes the mask bit (bit 16) in addition.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This device-mapper target creates a read-only device that transparently
validates the data on one underlying device against a pre-generated tree
of cryptographic checksums stored on a second device.
Two checksum device formats are supported: version 0 which is already
shipping in Chromium OS and version 1 which incorporates some
improvements.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Elly Jones <ellyjones@chromium.org>
Cc: Milan Broz <mbroz@redhat.com>
Cc: Olof Johansson <olofj@chromium.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch introduces a new function dm_bufio_prefetch. It prefetches
the specified range of blocks into dm-bufio cache without waiting
for i/o completion.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add dm thin target arguments to control discard support.
ignore_discard: Disables discard support
no_discard_passdown: Don't pass discards down to the underlying data
device, but just remove the mapping within the thin provisioning target.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Support discards in the thin target.
On discard the corresponding mapping(s) are removed from the thin
device. If the associated block(s) are no longer shared the discard
is passed to the underlying device.
All bios other than discards now have an associated deferred_entry
that is saved to the 'all_io_entry' in endio_hook. When non-discard
IO completes and associated mappings are quiesced any discards that
were deferred, via ds_add_work() in process_discard(), will be queued
for processing by the worker thread.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
drivers/md/dm-thin.c | 173 ++++++++++++++++++++++++++++++++++++++++++++++----
drivers/md/dm-thin.c | 172 ++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 158 insertions(+), 14 deletions(-)
This patch contains the ground work needed for dm-thin to support discard.
- Adds endio function that replaces shared_read_endio.
- Introduce an explicit 'quiesced' flag into the new_mapping structure.
Before, this was implicitly indicated by m->list being empty.
- The map_info->ptr remains constant for the duration of a bio's trip
through the thin target. Make it easier to reason about it.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Support the use of an external _read only_ device as an origin for a thin
device.
Any read to an unprovisioned area of the thin device will be passed
through to the origin. Writes trigger allocation of new blocks as
usual.
One possible use case for this would be VM hosts that want to run
guests on thinly-provisioned volumes but have the base image on another
device (possibly shared between many VMs).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The thin metadata format can only make use of a device that is <=
THIN_METADATA_MAX_SECTORS (currently 15.9375 GB). Therefore, there is no
practical benefit to using a larger device.
However, it may be that other factors impose a certain granularity for
the space that is allocated to a device (E.g. lvm2 can impose a coarse
granularity through the use of large, >= 1 GB, physical extents).
Rather than reject a larger metadata device, during thin-pool device
construction, switch to allowing it but issue a warning if a device
larger than THIN_METADATA_MAX_SECTORS_WARNING (16 GB) is
provided. Any space over 15.9375 GB will not be used.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Save space by removing entries from the space map ref_count tree if
they're no longer needed.
Ref counts are stored in two places: a bitmap if the ref_count is
below 3, or a btree of uint32_t if 3 or above.
When a ref_count that was above 3 drops below we can remove it from
the tree and save some metadata space. This removal was commented out
before because I was unsure why this was causing under-populated btree
nodes. Earlier patches have fixed this issue.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Commit unwritten data every second to prevent too much building up.
Released blocks don't become available until after the next commit
(for crash resilience). Prior to this patch commits were only
triggered by a message to the target or a REQ_{FLUSH,FUA} bio. This
allowed far too big a position to build up.
The interval is hard-coded to 1 second. This is a sensible setting.
I'm not making this user configurable, since there isn't much to be
gained by tweaking this - and a lot lost by setting it far too high.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Device mapper uses sscanf to convert arguments to numbers. The problem is that
the way we use it ignores additional unmatched characters in the scanned string.
For example, this `if (sscanf(string, "%d", &number) == 1)' will match a number,
but also it will match number with some garbage appended, like "123abc".
As a result, device mapper accepts garbage after some numbers. For example
the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"'
will pass without an error.
This patch fixes all sscanf uses in device mapper. It appends "%c" with
a pointer to a dummy character variable to every sscanf statement.
The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds
only if string is a null-terminated number (optionally preceded by some
whitespace characters). If there is some character appended after the number,
sscanf matches "%c", writes the character to the dummy variable and returns 2.
We check the return value for 1 and consequently reject numbers with some
garbage appended.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The dm-raid code currently fails to create a RAID array if any of the
superblocks cannot be read. This was an oversight as there is already
code to handle this case if the values ('- -') were provided for the
failed array position.
With this patch, if a superblock cannot be read, the array position's
fields are initialized as though '- -' was set in the table. That is,
the device is failed and the position should not be used, but if there
is sufficient redundancy, the array should still be activated.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Fix a harmless typo.
The root is a chunk of data that gets written to the superblock. This
data is used to recreate the space map when opening a metadata area.
We have two space maps; one tracking space on the metadata device and
one of the data device. Both of these use the same format for their
root, so this typo was harmless.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Now that the value_size is held within every node of the btrees we can
remove this argument from value_ptr().
For the last few months a BUG_ON has been checking this argument is
the same as that held in the node. No issues were reported. So this
is a safe change.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The map_context pointer should always be set. However, we have reports
that upon requeuing it is not set correctly. So add set and clear
functions with a BUG_ON() to track the issue properly.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
As a precaution, set bi_end_io to NULL when failing to remap.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
free_devices in dm_table.c already uses list_for_each(), so we don't
need to check if the list is empty.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Remove documentation for unimplemented 'trim' message.
I'd planned a 'trim' target message for shrinking thin devices, but
this is better handled via the discard ioctl.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
The dm raid module (using md) is becoming the preferred way of creating long-lived
mirrors through userspace LVM so remove the EXPERIMENTAL tag.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Drop EXPERIMENTAL tag from dm-uevent.
It's not changed for a while and some userspace tools are relying upon it.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Update device-mapper MAINTAINERS entry to mention quilt working tree location
and persistent-data subdirectory.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Describe attributes provided by device-mapper in /sys/block.
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
When we remove an entry from a node we sometimes rebalance with it's
two neighbours. This wasn't being done correctly; in some cases
entries have to move all the way from the right neighbour to the left
neighbour, or vice versa. This patch pretty much re-writes the
balancing code to fix it.
This code is barely used currently; only when you delete a thin
device, and then only if you have hundreds of them in the same pool.
Once we have discard support, which removes mappings, this will be used
much more heavily.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Avoid using the bi_next field for the holder of a cell when deferring
bios because a stacked device below might change it. Store the
holder in a new field in struct cell instead.
When a cell is created, the bio that triggered creation (the holder) was
added to the same bio list as subsequent bios. In some cases we pass
this holder bio directly to devices underneath. If those devices use
the bi_next field there will be trouble...
This also simplifies some code that had to work out which bio was the
holder.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Always set io->error to -EIO when an error is detected in dm-crypt.
There were cases where an error code would be set only if we finish
processing the last sector. If there were other encryption operations in
flight, the error would be ignored and bio would be returned with
success as if no error happened.
This bug is present in kcryptd_crypt_write_convert, kcryptd_crypt_read_convert
and kcryptd_async_done.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Reviewed-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch fixes a possible deadlock in dm-crypt's mempool use.
Currently, dm-crypt reserves a mempool of MIN_BIO_PAGES reserved pages.
It allocates first MIN_BIO_PAGES with non-failing allocation (the allocation
cannot fail and waits until the mempool is refilled). Further pages are
allocated with different gfp flags that allow failing.
Because allocations may be done in parallel, this code can deadlock. Example:
There are two processes, each tries to allocate MIN_BIO_PAGES and the processes
run simultaneously.
It may end up in a situation where each process allocates (MIN_BIO_PAGES / 2)
pages. The mempool is exhausted. Each process waits for more pages to be freed
to the mempool, which never happens.
To avoid this deadlock scenario, this patch changes the code so that only
the first page is allocated with non-failing gfp mask. Allocation of further
pages may fail.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Call the correct exit function on failure in dm_exception_store_init.
Signed-off-by: Andrei Warkentin <andrey.warkentin@gmail.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:
perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells <dhowells@redhat.com>
asm/system.h is a cause of circular dependency problems because it contains
commonly used primitive stuff like barrier definitions and uncommonly used
stuff like switch_to() that might require MMU definitions.
asm/system.h has been disintegrated by this point on all arches into the
following common segments:
(1) asm/barrier.h
Moved memory barrier definitions here.
(2) asm/cmpxchg.h
Moved xchg() and cmpxchg() here. #included in asm/atomic.h.
(3) asm/bug.h
Moved die() and similar here.
(4) asm/exec.h
Moved arch_align_stack() here.
(5) asm/elf.h
Moved AT_VECTOR_SIZE_ARCH here.
(6) asm/switch_to.h
Moved switch_to() here.
Signed-off-by: David Howells <dhowells@redhat.com>
Split arch_align_stack() out from asm-generic/system.h into its own header of
asm-generic/exec.h as part of the asm/system.h disintegration.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Split the switch_to() wrapper out of asm-generic/system.h into its own
asm-generic/system.h as part of the asm/system.h disintegration.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
to simplify disintegration of asm/system.h.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Create asm-generic/barrier.h and move the barrier definitions from
asm-generic/system.h to it.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h as all arch
files that #include the former also #include the latter. See:
grep -rl asm-generic/cmpxchg-local[.]h arch/ | sort > b
grep -rl asm-generic/cmpxchg[.]h arch/ | sort > a
comm a b
This simplifies the disintegration of asm-generic/system.h for arches that
don't have their own.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Disintegrate asm/system.h for Unicore32. (Compilation successful)
The implementation details are not changed, but only splitted.
BTW, some codestyles are adjusted.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Disintegrate asm/system.h for PowerPC.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: linuxppc-dev@lists.ozlabs.org