Commit Graph

231 Commits

Author SHA1 Message Date
NeilBrown
0a27ec96b6 [PATCH] md: improve raid10 "IO Barrier" concept
raid10 needs to put up a barrier to new requests while it does resync or other
background recovery.  The code for this is currently open-coded, slighty
obscure by its use of two waitqueues, and not documented.

This patch gathers all the related code into 4 functions, and includes a
comment which (hopefully) explains what is happening.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:02 -08:00
NeilBrown
17999be4aa [PATCH] md: improve raid1 "IO Barrier" concept
raid1 needs to put up a barrier to new requests while it does resync or other
background recovery.  The code for this is currently open-coded, slighty
obscure by its use of two waitqueues, and not documented.

This patch gathers all the related code into 4 functions, and includes a
comment which (hopefully) explains what is happening.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:01 -08:00
Darrick J. Wong
ac81b2ee45 [PATCH] make dm-mirror not issue invalid resync requests
I've been attempting to set up a (Host)RAID mirror with dm_mirror on
2.6.14.3, and I've been having a strange little problem.  The configuration
in question is a set of 9GB SCSI disks that have 17942584 sectors.  I set
up the dm_mirror table as such:

0 17942528 mirror core 2 2048 nosync 2 8:48 0 8:64 0

If I'm not mistaken, this sets up a 9GB RAID1 mriror with 1MB stripes
across both SCSI disks.  The sector count of the dm device is less than the
size of the disks, so we shouldn't fall off the end.  However, I always get
the messages like this in dmesg when I set up the dm table:

attempt to access beyond end of device
sdd: rw=0, want=17958656, limit=17942584

Clearly, something is trying to read sectors past the end of the drive.  I
traced it down to the __rh_recovery_prepare function in dm-raid1.c, which
gets called when we're putting the mirror set together.  This function
calls the dirty region log's get_resync_work function to see if there's any
resync that needs to be done, and queues up any areas that are out of sync.
 The log's get_resync_work function is actually a pointer to the
core_get_resync_work function in dm-log.c.

The core_get_resync_work function queries a bitset lc->sync_bits to find
out if there are any regions that are out of date (i.e.  the bit is 0),
which is where the problem occurs.  If every bit in lc->sync_bits is 1
(which is the case when we've just configured a new RAID1 with the nosync
option), the find_next_zero_bit does NOT return the size parameter
(lc->region_count in this case), it returns the size parameter rounded up
to the nearest multiple of 32!  I don't know if this is intentional, but
i386 and x86_64 both exhibit this behavior.

In any case, the statement "if (*region == lc->region_count)" looks like
it's supposed to catch the case where are no regions to resync and
return 0.  Since find_next_zero_bit apparently has a habit of returning
a value that's larger than lc->region_count, the enclosed patch changes
the equality test to a greater-than test so that we don't try to resync
areas outside of the RAID1 region.  Seeing as the HostRAID metadata
lives just past the end of the RAID1 data, mucking around in that area
is not a good idea.

I suppose another way to fix this would be to amend find_next_zero_bit so
that it doesn't return values larger than "size", but I don't know if
there's a reason for the current behavior.

Signed-Off-By: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:01 -08:00
Stefan Rompf
9d3520a339 [PATCH] dm-crypt: zero key before freeing it
Zap the memory before freeing it so we don't leave crypto information
around in memory.

Signed-off-by: Stefan Rompf <stefan@loplof.de>
Acked-by: Clemens Fruhwirth <clemens@endorphin.org>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:01 -08:00
Adrian Bunk
0b56306e56 [PATCH] drivers/md/kcopyd.c: #if 0 kcopyd_cancel()
This patch #if 0's the not yet implemented global function kcopyd_cancel().

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:01 -08:00
Alasdair G Kergon
6da487dcc0 [PATCH] device-mapper ioctl: add skip lock_fs flag
Add ioctl DM_SKIP_LOCKFS_FLAG for userspace to request that lock_fs is
bypassed when suspending a device.

There's no change to the behaviour of existing code that doesn't know about
the new flag.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:01 -08:00
Alasdair G Kergon
aa8d7c2fbe [PATCH] device-mapper: make lock_fs optional
Devices only needs syncing when creating snapshots, so make this optional when
suspending a device.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:01 -08:00
Alasdair G Kergon
e39e2e95eb [PATCH] device-mapper: rename frozen_bdev
Rename frozen_bdev to suspended_bdev and move the bdget outside lockfs.  (This
prepares for making lockfs optional.)

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
Jonathan E Brassow
a1a1908070 [PATCH] device-mapper raid1: add default mirror
This patch introduces a new field to the mirror_set (default_mirror) to store
the default mirror.

(A subsequent patch will allow us to change the default mirror in the event of
a failure.)

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
Alasdair G Kergon
2d5fe68987 [PATCH] device-mapper: scanf sector format change
Use %llu not %Lu in sscanf/printf format strings.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
Andrew Stribblehill
e6c276159c [PATCH] device-mapper: remove unused definition
This patch removes an unused #define.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
Alasdair G Kergon
2d38fe2044 [PATCH] device-mapper snapshot: metadata reading separation
More snapshot metadata reading into separate function, to prepare for changing
the place it gets called from.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
goggin, edward
81f1777a55 [PATCH] device-mapper ioctl: event on rename
After changing the name of a mapped device, trigger a dm event.  (For
userspace multipath tools.)

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
David Teigland
d229a9589f [PATCH] device-mapper: add dm_get_md
Add dm_get_dev() to get a mapped device given its dev_t.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:34:00 -08:00
David Teigland
637842cfdb [PATCH] device-mapper: add dm_find_md
Abstract dm_find_md() from dm_get_mdptr() to allow use elsewhere.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06 08:33:59 -08:00
Linus Torvalds
25c862cc9e Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild 2006-01-04 16:36:52 -08:00
Linus Torvalds
f61ea1b0c8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2006-01-04 16:30:12 -08:00
Brian Gerst
352dd1df32 gitignore: misc files
Ignore all files generated from *_shipped files, plus a few others.

Signed-off-by: Brian Gerst <bgerst@didntduck.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2006-01-01 22:21:50 +01:00
Neil Brown
bcb97940f3 [PATCH] md: Change case of raid level reported in sys/mdX/md/level
I had thought that keeping the reported tail level clearly different
from the module name was a good idea, but I've changed my mind.

'raid5' is better and probably less confusing than 'RAID-5'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-19 16:47:50 -08:00
Mike Christie
defd94b754 [SCSI] seperate max_sectors from max_hw_sectors
- export __blk_put_request and blk_execute_rq_nowait
needed for async REQ_BLOCK_PC requests
- seperate max_hw_sectors and max_sectors for block/scsi_ioctl.c and
SG_IO bio.c helpers per Jens's last comments. Since block/scsi_ioctl.c SG_IO was
already testing against max_sectors and SCSI-ml was setting max_sectors and
max_hw_sectors to the same value this does not change any scsi SG_IO behavior. It only
prepares ll_rw_blk.c, scsi_ioctl.c and bio.c for when SCSI-ml begins to set
a valid max_hw_sectors for all LLDs. Today if a LLD does not set it
SCSI-ml sets it to a safe default and some LLDs set it to a artificial low
value to overcome memory and feedback issues.

Note: Since we now cap max_sectors to BLK_DEF_MAX_SECTORS, which is 1024,
drivers that used to call blk_queue_max_sectors with a large value of
max_sectors will now see the fs requests capped to BLK_DEF_MAX_SECTORS.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-12-15 15:11:40 -08:00
NeilBrown
5036805be7 [PATCH] md: use correct size of raid5 stripe cache when measuring how full it is
The raid5 stripe cache was recently changed from fixed size (NR_STRIPES) to
variable size (conf->max_nr_stripes).  However there are two places that still
use the constant and as a result, reducing the size of the stripe cache can
result in a deadlock.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 09:06:04 -08:00
NeilBrown
3795bb0fc5 [PATCH] md: fix a use-after-free bug in raid1
Who would submit code with a FIXME like that in it !!!!

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-12-12 09:06:04 -08:00
NeilBrown
6aea114a72 [PATCH] md: fix --re-add for raid1 and raid6
If you have an array with a write-intent-bitmap, and you remove a device, then
re-add it, a full recovery isn't needed.  We detect a re-add by looking at
saved_raid_disk.  For raid1, it doesn't matter which disk it was, only whether
or not it was an active device.  The old code being removed set a value of
'mirror' which was then ignored, so it can go.  The changed code performs the
correct check.

For raid6, if there are two missing devices, make sure we chose the right slot
on --re-add rather than always the first slot.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:26 -08:00
NeilBrown
b2a2703c28 [PATCH] md: set default_bitmap_offset properly in set_array_info
If an array is created using set_array_info, default_bitmap_offset isn't set
properly meaning that an internal bitmap cannot be hot-added until the array
is stopped and re-assembled.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:25 -08:00
NeilBrown
b5ab28a3b8 [PATCH] md: fix problem with raid6 intent bitmap
When doing a recovery, we need to know whether the array will still be
degraded after the recovery has finished, so we can know whether bits can be
clearred yet or not.  This patch performs the required check.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:25 -08:00
NeilBrown
700e432d83 [PATCH] md: fix locking problem in r5/r6
bitmap_unplug actually writes data (bits) to storage, so we shouldn't be
holding a spinlock...

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:25 -08:00
NeilBrown
22dfdf5212 [PATCH] md: improve read speed to raid10 arrays using 'far copies'
raid10 has two different layouts.  One uses near-copies (so multiple
copies of a block are at the same or similar offsets of different
devices) and the other uses far-copies (so multiple copies of a block
are stored a greatly different offsets on different devices).  The point
of far-copies is that it allows the first section (normally first half)
to be layed out in normal raid0 style, and thus provide raid0 sequential
read performance.

Unfortunately, the read balancing in raid10 makes some poor decisions
for far-copies arrays and you don't get the desired performance.  So
turn off that bad bit of read_balance for far-copies arrays.

With this patch, read speed of an 'f2' array is comparable with a raid0
with the same number of devices, though write speed is ofcourse still
very slow.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28 14:42:25 -08:00
Jonathan E Brassow
7692c5dd48 [PATCH] device-mapper raid1: drop mark_region spinlock fix
The spinlock region_lock is held while calling mark_region which can sleep.
Drop the spinlock before calling that function.

A region's state and inclusion in the clean list are altered by rh_inc and
rh_dec.  The state variable is set to RH_CLEAN in rh_dec, but only if
'pending' is zero.  It is set to RH_DIRTY in rh_inc, but not if it is already
so.  The changes to 'pending', the state, and the region's inclusion in the
clean list need to be atomicly.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:31 -08:00
jblunck@suse.de
233886dd32 [PATCH] device-mapper snapshot: bio_list fix
bio_list_merge() should do nothing if the second list is empty - not oops.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:31 -08:00
Stefan Bader
640eb3b045 [PATCH] device-mapper dm-mpath: endio spinlock fix
do_end_io() can be called without interrupts blocked.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:31 -08:00
Alasdair G Kergon
0e56822d30 [PATCH] device-mapper: mirror log bitset fix
The linux bitset operators (test_bit, set_bit etc) work on arrays of "unsigned
long".  dm-log uses such bitsets but treats them as arrays of uint32_t, only
allocating and zeroing a multiple of 4 bytes (as 'clean_bits' is a uint32_t).

The patch below fixes this problem.

The problem is specific to 64-bit big endian machines such as s390x or ppc-64
and can prevent pvmove terminating.

In the simplest case, if "region_count" were (say) 30, then
bitset_size (below) would be 4 and bitset_uint32_count would be 1.
Thus the memory for this butset, after allocation and zeroing would
be
   0 0 0 0 X X X X
On a bigendian 64bit machine, bit 0 for this bitset is in the 8th
byte! (and every bit that dm-log would use would be in the X area).

   0 0 0 0 X X X X
                 ^
                 here

which hasn't been cleared properly.

As the dm-raid1 code only syncs and counts regions which have a 0 in the
'sync_bits' bitset, and only finishes when it has counted high enough, a large
number of 1's among those 'X's will cause the sync to not complete.

It is worth noting that the code uses the same bitsets for in-memory and
on-disk logs.  As these bitsets are host-endian and host-sized, this means
that they cannot safely be moved between computers with

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:31 -08:00
Alasdair G Kergon
c4cc66351a [PATCH] device-mapper: list_versions fix
In some circumstances the LIST_VERSIONS output is truncated because the size
calculation forgets about a 'uint32_t' in each structure - but the inclusion
of the whole of ALIGN_MASK frequently compensates for the omission.

This is a quick workaround to use an upper bound.  (The code ought to be fixed
to supply the actual size.)

Running 'dmsetup targets' may demonstrate the problem: when I run it, the last
line comes out as 'erro' instead of 'error'.  Consequently, 'lvcreate --type
error' doesn't work.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:31 -08:00
Kiyoshi Ueda
b6fcc80d03 [PATCH] device-mapper dm-ioctl: missing put in table load error case
An error path in table_load() forgets to release a table that won't now be
referenced.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-22 09:14:30 -08:00
NeilBrown
c0e485216d [PATCH] md: fix is_mddev_idle calculation now that disk/sector accounting happens when request completes
md needs to monitor the rate of requests to its devices when doing
resync/recovery so that it can back-off when there is non-resync IO.  It
does this by comparing resync IO, which it counts, with total IO which is
taken from disk_stats.

disk_stats were recently changed to account sectors when a request
completes instead of when it is queued.  This upsets md's calculations.

We could do the sync_io accounting at the end of requests too, but that has
problems.  If an underlying device is an md array, the accounting will
still be done when the request is submitted.  This could be changed for
some raid levels, but it cannot be changed for raid0 or linear without
substantial code changes.

So instead, we increase the error that is_mddev_idle allows, up to the
maximum amount of resync IO that can be in flight at any time.  The
calculation is current fragile as each personality as different limits for
in-flight resync.  This should be fixed up.

For now, this simple patch fixes the problem.

Increasing the error margin decreases the sensitivity to non-resync IO.  To
partially compensate for this, the time to wait when non-resync IO is
detected is increased so that less steady IO is required to keep the resync
at bay.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-18 07:49:46 -08:00
Neil Brown
34ef75f09f [PATCH] md: don't pass a NULL file* into ->prepare_write()
Some filesystems go oops.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-18 07:49:46 -08:00
NeilBrown
93588e2284 [PATCH] md: make md threads interruptible again
Despite the fact that md threads don't need to be signalled, and won't
respond to signals anyway, we need to have an 'interruptible' wait, else
they stay in 'D' state and add to the load average.

(akpm: the signal_pending() test is unneeded - we'll fix that up in the next
round.  For now, leave it there because that's how the code used to be).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15 08:59:19 -08:00
NeilBrown
e8a0033451 [PATCH] md: mark START_ARRAY deprecated with a date
This was marked deprecated "after 2.6" back in the 2.5 days.  But now it
seems there isn't going to be any "after 2.6", and we deprecate by date
now.  So set a date.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-15 08:59:19 -08:00
NeilBrown
bb636547b0 [PATCH] md: document sysfs usage of md, and make a couple of small refinements
Document in Documentation/md.txt the files that now appear in sysfs, and make
a couple of small refinements to exactly when 'level' and 'raid_disks' are
empty, to make it match the documentation.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:40 -08:00
NeilBrown
7eec314d75 [PATCH] md: improve 'scan_mode' and rename it to 'sync_action'
The current sync_action for an array can be one of

   idle  - nothing happening
   resync - reduncancy being recalcualted
   recover - missing device being recoverred to spare
   check   - user initiated check of redundancy
   repair  - like resync but user-initiated and ignores
             bitmap optimisation.

Each of these strings can also be written to the 'sync_action' file to cause
that action to happen (if appropriate).

While 'sync' is not technically correct, as a recovery is *not* a 'sync', I
think it is the most servicable word here.  Also 'action' is a strong word
than 'mode'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:40 -08:00
NeilBrown
787453c239 [PATCH] md: complete conversion of md to use kthreads
There are a few loose ends following the conversion of md to use kthreads:

- Some fields in mdk_thread_t that aren't needed (kthreads does it's own
  completion and manages it's own name).

- thread->run is now never NULL, so no need to check

- Some tests for signal_pending that aren't needed (As we don't use signals
  to stop threads any more)

- Some flush_signals are not needed

- Some waits are interruptible and don't need to be.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:40 -08:00
NeilBrown
fd9d49cac4 [PATCH] md: ignore auto-readonly flag for arrays where it isn't meaningful
The 'auto-readonly' flag (which suppresses resync and superblock updates until
the first write) is not meaningful for personalities that don't support resync
or superblock writes (raid0, linear, etc).

So clear the setting early to avoid it confusing anything - e.g.  appearing in
/proc/mdstat

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
8e1b39d623 [PATCH] md: only try to print recovery/resync status for personalities that support recovery
The introduction of 'resync=PENDING' (for read-only devices) caused that
message to appear for non-syncable arrays like raid0 and linear.  Simplest
thing is to not try to print any resync info unless the personality clearly
supports it.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
411036fa19 [PATCH] md: split off some md attributes in sysfs to a separate group
Some, but not all, md array support data redundancy and hence support checking
and restoring that redundancy (resync, rebuild).

Some attributes apply specifically to functions involving this redundancy, and
so should only appear for md arrays for which they are meaningful.  i.e.  they
should not appear for raid0, linear, multpath, faulty.

This patch separates these into a distinct group and creates the group only if
the personality supports sync_request.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
96de1e663c [PATCH] md: fix some locking and module refcounting issues with md's use of sysfs
1/ I really should be using the __ATTR macros for defining attributes, so
   that the .owner field get set properly, otherwise modules can be removed
   while sysfs files are open.  This also involves some name changes of _show
   routines.

2/ Always lock the mddev (against reconfiguration) for all sysfs attribute
   access.  This easily avoid certain races and is completely consistant with
   other interfaces (ioctl and /proc/mdstat both always lock against
   reconfiguration).

3/ raid5 attributes must check that the 'conf' structure actually exists
   (the array could have been stopped while an attribute file was open).

4/ A missing 'kfree' from when the raid5_conf_t was converted to have a
   kobject embedded, and then converted back again.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
3855ad9f39 [PATCH] md: make sure a user-request sync of raid5 ignores intent bitmap
A sync of raid5 usually ignore blocks which the bitmap says are in-sync.  But
a user-request check or repair should not ignore these.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
e5de485f00 [PATCH] md: make manual repair work for raid1
Raid1 currently optimises resync using the intent bitmap etc.  This
optimisation is not wanted when we explicitly request a repair through sysfs,
so add appropriate checks.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
f637b9f9fc [PATCH] md: make sure /block link in /sys/.../md/ goes to correct devices
If a block_device is a partition, then it's kobject is
  bdev->bd_part->kobj
otherwise (if it is a full device), the kobject is
  bdev->bd_disk->kobj

As md wants back-links to the correct object (whether partition or not), we
need to respect this difference...  (Thus current code shows a link to the
whole device, whether we are using a partition or not, which is wrong).

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:39 -08:00
NeilBrown
f91de92ed6 [PATCH] md: allow md arrays to be started read-only (module parameter).
When an md array is started, the superblock will be written, and resync may
commense.  This is not good if you want to be completely read-only as, for
example, when preparing to resume from a suspend-to-disk image.

So introduce a module parameter "start_ro" which can be set
to '1' at boot, at module load, or via
  /sys/module/md_mod/parameters/start_ro

When this is set, new arrays get an 'auto-ro' mode, which disables all
internal io (superblock updates, resync, recovery) and is automatically
switched to 'rw' when the first write request arrives.

The array can be set to true 'ro' mode using 'mdadm -r' before the first
write request, or resync can be started without a write using 'mdadm -w'.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
19133a4298 [PATCH] md: Remove attempt to use dynamic names in sysfs for component devices on an MD array.
With version-0.90 superblock, component devices on an md device to not have
any stable name related to the array -(version-1 assigns a fixed index when
a device is added to an array, and this remains despit any hot-swap).

The intial code for making these devices appear in sysfs used dynamic
names, which would change whenever a hot-spare was swapped for a failed or
missing device.  This turns out not to be practical in sysfs for a number
of reasons.

This patch changes then naming of component devices to be based on the
result of 'bdevname'.  This is stable and should be unique.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
a9701a3047 [PATCH] md: support BIO_RW_BARRIER for md/raid1
We can only accept BARRIER requests if all slaves handle
barriers, and that can, of course, change with time....

So we keep track of whether the whole array seems safe for barriers,
and also whether each individual rdev handles barriers.

We initially assumes barriers are OK.

When writing the superblock we try a barrier, and if that fails, we flag
things for no-barriers.  This will usually clear the flags fairly quickly.

If writing the superblock finds that BIO_RW_BARRIER is -ENOTSUPP, we need to
resubmit, so introduce function "md_super_wait" which waits for requests to
finish, and retries ENOTSUPP requests without the barrier flag.

When writing the real raid1, write requests which were BIO_RW_BARRIER but
which aresn't supported need to be retried.  So raid1d is enhanced to do this,
and when any bio write completes (i.e.  no retry needed) we remove it from the
r1bio, so that devices needing retry are easy to find.

We should hardly ever get -ENOTSUPP errors when writing data to the raid.
It should only happen if:
  1/ the device used to support BARRIER, but now doesn't.  Few devices
     change like this, though raid1 can!
or
  2/ the array has no persistent superblock, so there was no opportunity to
     pre-test for barriers when writing the superblock.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
bd926c63b7 [PATCH] md: make md on-disk bitmaps not host-endian
Current bitmaps use set_bit et.al and so are host-endian, which means
not-portable.  Oops.

Define a new version number (4) for which bitmaps are little-endian.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
b2d444d7ad [PATCH] md: convert 'faulty' and 'in_sync' fields to bits in 'flags' field
This has the advantage of removing the confusion caused by 'rdev_t' and
'mddev_t' both having 'in_sync' fields.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
ba22dcbf10 [PATCH] md: improvements to raid5 handling of read errors
Two refinements to the 'attempt-overwrite-on-read-error' mechanism.
1/ If the array is read-only, don't attempt an over-write.
2/ If there are more than max_nr_stripes read errors on a device with
   no success, fail the drive.  This will make sure a dead
   drive will be eventually kicked even when we aren't trying
   to rewrite (which would normally kick a dead drive more quickly.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
007583c925 [PATCH] md: change raid5 sysfs attribute to not create a new directory
There isn't really a need for raid5 attributes to be an a subdirectory,
so this patch moves them from
  /sys/block/mdX/md/raid5/attribute
to
  /sys/block/mdX/md/attribute

This suggests that all md personalities should co-operate about
namespace usage, but that shouldn't be a problem.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:38 -08:00
NeilBrown
31399d9e56 [PATCH] md: minor MD fixes
1/ Use reduce stack usage, because 'gcc' apparently doesn't overlay
   different variables  that are in separate scopes...

2/ Use test_bit instead of ( .. & 1<< ..) which in this case is buggy.

Thanks to Andrew Morton

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
9c79197761 [PATCH] md: fix ref-counting problems with kobjects in md
Thanks Greg.

Cc:  Greg KH <greg@kroah.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
Suzanne Wood
d6065f7bf8 [PATCH] md: provide proper rcu_dereference / rcu_assign_pointer annotations in md
Acked-by: <paulmck@us.ibm.com>
Signed-off-by: Suzanne Wood <suzannew@cs.pdx.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
9d88883e68 [PATCH] md: teach raid5 the difference between 'check' and 'repair'.
With this, raid5 can be asked to check parity without repairing it.  It also
keeps a count of the number of incorrect parity blocks found (mismatches) and
reports them through sysfs.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
24dd469d72 [PATCH] md: allow a manual resync with md
You can trigger a 'check' with
  echo check > /sys/block/mdX/md/scan_mode
or a check-and-repair errors with
  echo repair > /sys/block/mdX/md/scan_mode

and read the current state from the same file.

Note: personalities need to know the different between 'check' and 'repair',
but don't yet.  Until they do, 'check' will be the same as 'repair' and will
just do a normal resync pass.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
3f294f4fb6 [PATCH] md: add kobject/sysfs support to raid5
/sys/block/mdX/md/raid5/
contains raid5-related attributes.
Currently
  stripe_cache_size
is number of entries in stripe cache, and is settable.
  stripe_cache_active
is number of active entries, and in only readable.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
86e6ffdd24 [PATCH] md: extend md sysfs support to component devices.
Each device in an md array how has a corresponding
  /sys/block/mdX/md/devNN/
directory which can contain attributes.  Currently there is only 'state' which
summarises the state, nd 'super' which has a copy of the superblock, and
'block' which is a symlink to the block device.

Also, /sys/block/mdX/md/rdNN represents slot 'NN' in the array, and is a
symlink to the relevant 'devNN'.  Obviously spare devices do not have a slot
in the array, and so don't have such a symlink.

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:37 -08:00
NeilBrown
eae1701fbd [PATCH] md: initial sysfs support for md
Start using kobjects in mddevs, and provide a couple of simple attributes
(level and disks).  Attributes live in
  /sys/block/mdX/md/attr-name

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:36 -08:00
NeilBrown
4e5314b56a [PATCH] md: better handling of readerrors with raid5.
This patch changes the behaviour of raid5 when it gets a read error.
Instead of just failing the device, it tried to find out what should have
been there, and writes it over the bad block.  For some media-errors, this
has a reasonable chance of fixing the error.  If the write succeeds, and a
subsequent read succeeds as well, raid5 decided the address is OK and
conitnues.

Instead of failing a drive on read-error, we attempt to re-write the block,
and then re-read.  If that all works, we allow the device to remain in the
array.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:56:36 -08:00
Olaf Hering
733482e445 [PATCH] changing CONFIG_LOCALVERSION rebuilds too much, for no good reason
This patch removes almost all inclusions of linux/version.h.  The 3
#defines are unused in most of the touched files.

A few drivers use the simple KERNEL_VERSION(a,b,c) macro, which is
unfortunatly in linux/version.h.

There are also lots of #ifdef for long obsolete kernels, this was not
touched.  In a few places, the linux/version.h include was move to where
the LINUX_VERSION_CODE was used.

quilt vi `find * -type f -name "*.[ch]"|xargs grep -El '(UTS_RELEASE|LINUX_VERSION_CODE|KERNEL_VERSION|linux/version.h)'|grep -Ev '(/(boot|coda|drm)/|~$)'`

search pattern:
/UTS_RELEASE\|LINUX_VERSION_CODE\|KERNEL_VERSION\|linux\/\(utsname\|version\).h

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09 07:55:57 -08:00
Nishanth Aravamudan
66c006a551 [PATCH] drivers/md: fix-up schedule_timeout() usage
Use schedule_timeout_interruptible() instead of
set_current_state()/schedule_timeout() to reduce kernel size.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-07 07:53:57 -08:00
Jens Axboe
a362357b6c [BLOCK] Unify the seperate read/write io stat fields into arrays
Instead of having ->read_sectors and ->write_sectors, combine the two
into ->sectors[2] and similar for the other fields. This saves a branch
several places in the io path, since we don't have to care for what the
actual io direction is. On my x86-64 box, that's 200 bytes less text in
just the core (not counting the various drivers).

Signed-off-by: Jens Axboe <axboe@suse.de>
2005-11-01 09:26:16 +01:00
David Hardeman
378f058cc4 [PATCH] Use sg_set_buf/sg_init_one where applicable
This patch uses sg_set_buf/sg_init_one in some places where it was
duplicated.

Signed-off-by: David Hardeman <david@2gen.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2005-10-30 11:19:43 +11:00
Al Viro
b4e3ca1ab1 [PATCH] gfp_t: remaining bits of drivers/*
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28 08:16:51 -07:00
NeilBrown
8712e55356 [PATCH] md: make sure mdthreads will always respond to kthread_stop
There are still a couple of cases where md threads (the resync/recovery
thread) is not interruptible since the change to use kthreads.  All places
there it tests "signal_pending", it should also test kthread_should_stop,
as with this patch.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-26 10:39:42 -07:00
NeilBrown
6985c43f39 [PATCH] Three one-liners in md.c
The main problem fixes is that in certain situations stopping md arrays may
take longer than you expect, or may require multiple attempts.  This would
only happen when resync/recovery is happening.

This patch fixes three vaguely related bugs.

1/ The recent change to use kthreads got the setting of the
   process name wrong.  This fixes it.
2/ The recent change to use kthreads lost the ability for
   md threads to be signalled with SIG_KILL.  This restores that.
3/ There is a long standing bug in that if:
    - An array needs recovery (onto a hot-spare) and
    - The recovery is being blocked because some other array being
       recovered shares a physical device and
    - The recovery thread is killed with SIG_KILL
   Then the recovery will appear to have completed with no IO being
   done, which can cause data corruption.
   This patch makes sure that incomplete recovery will be treated as
   incomplete.

Note that any kernel affected by bug 2 will not suffer the problem of bug
3, as the signal can never be delivered.  Thus the current 2.6.14-rc
kernels are not susceptible to data corruption.  Note also that if arrays
are shutdown (with "mdadm -S" or "raidstop") then the problem doesn't
occur.  It only happens if a SIGKILL is independently delivered as done by
'init' when shutting down.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-19 23:04:30 -07:00
Al Viro
dd0fc66fb3 [PATCH] gfp flags annotations - part 1
- added typedef unsigned int __nocast gfp_t;

 - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
   the same warnings as far as sparse is concerned, doesn't change
   generated code (from gcc point of view we replaced unsigned int with
   typedef) and documents what's going on far better.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-08 15:00:57 -07:00
Alasdair G Kergon
485ef69ede [PATCH] device-mapper: Fix queue_if_no_path initialisation
When creating a multipath device, if the queue_if_no_path parameter is
specified it gets ignored.

While the queue_if_no_path variable is correctly set to 1, the
saved_queue_if_no_path gets set to 0.  When the device is subsequently made
live (resumed), the saved value (0) always overwrites the live value (1) so
the option *always* gets turned off.

The fix adds a parameter to the queue_if_no_path() function to indicate
whether the previous value should be preserved or not - if not, as when the
device is being set up, the saved value is set to the new value (1).

Signed-Off-By: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:42 -07:00
goggin, edward
269fd2a6f8 [PATCH] device-mapper: Trigger an event when a table is deleted
If anything is waiting on a device's table when the device is removed, we
must first wake it up so it will release its reference.  Otherwise the
table's reference count will not drop to zero and the table will not get
removed.

Signed-Off-By: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 07:46:42 -07:00
H. Peter Anvin
d7e70ba45f [PATCH] RAID6 Altivec fix
This patch fixes a signedness bug with RAID6 for Altivec, and makes the
Altivec code testable in userspace.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:49:58 -07:00
Adrian Bunk
338cec3253 [PATCH] merge some from Rusty's trivial patches
This patch contains the most trivial from Rusty's trivial patches:
- spelling fixes
- remove duplicate includes

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:30 -07:00
Jesper Juhl
f9101210e7 [PATCH] vfree and kfree cleanup in drivers/
This patch does a full cleanup of 'NULL checks before vfree', and a partial
cleanup of calls to kfree for all of drivers/ - the kfree bit is partial in
that I only did the files that also had vfree calls in them.  The patch
also gets rid of some redundant (void *) casts of pointers being passed to
[vk]free, and a some tiny whitespace corrections also crept in.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:30 -07:00
NeilBrown
87fc767b83 [PATCH] md: fix BUG when raid10 rebuilds without enough drives
This shouldn't be a BUG.  We should cope.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:15 -07:00
NeilBrown
6d508242b2 [PATCH] md: fix raid10 assembly when too many devices are missing
If you try to assemble an array with too many missing devices, raid10 will now
reject the attempt, instead of allowing it.

Also check when hot-adding a drive and refuse the hot-add if the array is
beyond hope.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:14 -07:00
NeilBrown
611815651b [PATCH] md: really get sb_size setting right in all cases
There was another case where sb_size wasn't being set, so instead do the
sensible thing and set if when filling in the content of a superblock.  That
ensures that whenever we write a superblock, the sb_size MUST be set.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:14 -07:00
NeilBrown
188c18fd79 [PATCH] md: make sure the new 'sb_size' is set properly device added without pre-existing superblock.
There are two ways to add devices to an md/raid array.

  It can have superblock written to it, and then given to the md driver,
  which will read the superblock (the new way)

or

  md can be told (through SET_ARRAY_INFO) the shape of the array, and
  the told about individual drives, and md will create the required
  superblock (the old way).

The newly introduced sb_size was only set for drives being added the
new way, not the old ways.  Oops :-(

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:14 -07:00
NeilBrown
b325a32e57 [PATCH] md: report spare drives in /proc/mdstat
Just like failed drives have (F), so spare drives now have (S).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:14 -07:00
NeilBrown
1cd6bf19bb [PATCH] md: add information about superblock version to /proc/mdstat
Leave it unchanged if the original (0.90) is used, incase it might be a
compatability problem.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:14 -07:00
NeilBrown
720a3dc39b [PATCH] md: use queue_hardsect_size instead of block_size for md superblock size calc.
Doh.  I want the physical hard-sector-size, not the current block size...

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:13 -07:00
NeilBrown
53e87fbb5d [PATCH] md: choose better default offset for bitmap.
On reflection, a better default location for hot-adding bitmaps with version-1
superblocks is immediately after the superblock.  There might not be much room
there, but there is usually atleast 3k, and that is a good start.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:13 -07:00
NeilBrown
500af87abb [PATCH] md: tidy up daemon stop/start code in md/bitmap.c
The bitmap code used to have two daemons, so there is some 'common' start/stop
code.  But now there is only one, so the common code is just noise.

This patch tidies this up somewhat.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:13 -07:00
NeilBrown
9ba00538ad [PATCH] md: ensure bitmap_writeback_daemon handles shutdown properly.
mddev->bitmap gets clearred before the writeback daemon is stopped.  So the
write_back daemon needs to be careful not to dereference the 'bitmap' if it is
NULL.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:13 -07:00
NeilBrown
a6fb0934f9 [PATCH] md: use kthread infrastructure in md
Switch MD to use the kthread infrastructure, to simplify the code and get rid
of tasklist_lock abuse in md_unregister_thread.

Also don't flush signals in md_thread, as the called thread will always do
that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:13 -07:00
NeilBrown
934ce7c840 [PATCH] md: write-intent bitmap support for raid6
This is a direct port of the raid5 patch.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:12 -07:00
NeilBrown
72626685dc [PATCH] md: add write-intent-bitmap support to raid5
Most awkward part of this is delaying write requests until bitmap updates have
been flushed.

To achieve this, we have a sequence number (seq_flush) which is incremented
each time the raid5 is unplugged.

If the raid thread notices that this has changed, it flushes bitmap changes,
and assigned the value of seq_flush to seq_write.

When a write request arrives, it is given the number from seq_write, and that
write request may not complete until seq_flush is larger than the saved seq
number.

We have a new queue for storing stripes which are waiting for a bitmap flush
and an extra flag for stripes to record if the write was 'degraded' and so
should not clear the a bit in the bitmap.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:12 -07:00
NeilBrown
0002b2718d [PATCH] md: limit size of sb read/written to appropriate amount
version-1 superblocks are not (normally) 4K long, and can be of variable size.
 Writing the full 4K can cause corruption (but only in non-default
configurations).

With this patch the super-block-flavour can choose a size to read, and set a
size to write based on what it finds.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:12 -07:00
NeilBrown
ab904d6346 [PATCH] md: fix bitmap/read_sb_page so that it handles errors properly.
read_sb_page() assumed that if sync_page_io fails, the device would be marked
faultly.  However it isn't.  So in the face of error, read_sb_page would loop
forever.

Redo the logic so that this cannot happen.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:11 -07:00
NeilBrown
71c0805cb4 [PATCH] md: allow md to load a superblock with feature-bit '1' set
As this is used to flag an internal bitmap.

Also, introduce symbolic names for feature bits.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:11 -07:00
NeilBrown
7b1e35f6d6 [PATCH] md: allow hot-adding devices to arrays with non-persistant superblocks.
It is possibly (and occasionally useful) to have a raid1 without persistent
superblocks.  The code in add_new_disk for adding a device to such an array
always tries to read a superblock.

This will obviously fail.

So do the appropriate test and call md_import_device with
appropriate args.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:11 -07:00
NeilBrown
3178b0dbdf [PATCH] md: do not set mddev->bitmap until bitmap is fully initialised
When hot-adding a bitmap, bitmap_daemon_work could get called while the bitmap
is being created, so don't set mddev->bitmap until the bitmap is ready.

This requires freeing the bitmap inside bitmap_create if creation failed
part-way through.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:11 -07:00
NeilBrown
585f0dd5a9 [PATCH] md: make sure bitmap_daemon_work actually does work.
The 'lastrun' time wasn't being initialised, so it could be half a
jiffie-cycle before it seemed to be time to do work again.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:11 -07:00
NeilBrown
9e6603da9b [PATCH] md: raid1_quiesce is back to front, fix it.
A state of 0 mean 'not quiesced'
A state of 1 means 'is quiesced'

The original code got this wrong.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:10 -07:00
NeilBrown
15945fee6f [PATCH] md: support md/linear array with components greater than 2 terabytes.
linear currently uses division by the size of the smallest componenet device
to find which device a request goes to.  If that smallest device is larger
than 2 terabytes, then the division will not work on some systems.

So we introduce a pre-shift, and take care not to make the hash table too
large, much like the code in raid0.

Also get rid of conf->nr_zones, which is not needed.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:10 -07:00
NeilBrown
4b6d287f62 [PATCH] md: add write-behind support for md/raid1
If a device is flagged 'WriteMostly' and the array has a bitmap, and the
bitmap superblock indicates that write_behind is allowed, then write_behind is
enabled for WriteMostly devices.

Write requests will be acknowledges as complete to the caller (via b_end_io)
when all non-WriteMostly devices have completed the write, but will not be
cleared from the bitmap until all devices complete.

This requires memory allocation to make a local copy of the data being
written.  If there is insufficient memory, then we fall-back on normal write
semantics.

Signed-Off-By: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:10 -07:00
NeilBrown
8ddf9efe67 [PATCH] md: support write-mostly device in raid1
This allows a device in a raid1 to be marked as "write mostly".  Read requests
will only be sent if there is no other option.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:10 -07:00
NeilBrown
36fa30636f [PATCH] md: all hot-add and hot-remove of md intent logging bitmaps
Both file-bitmaps and superblock bitmaps are supported.

If you add a bitmap file on the array device, you lose.

This introduces a 'default_bitmap_offset' field in mddev, as the ioctl used
for adding a superblock bitmap doesn't have room for giving an offset.  Later,
this value will be setable via sysfs.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09 16:39:10 -07:00