We cannot delay for the first dispatch of the async queue if it
hasn't dispatched at all, since that could present a local user
DoS attack vector using an app that just did slow timed sync reads
while filling memory.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Not all users of the topology information want to use libblkid. Provide
the topology information through bdev ioctls.
Also clarify sector size comments for existing BLK ioctls.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Don't think that's necessarily a perfect description of what this
option fiddles with, but it's probably better than 'desktop'.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This slowly ramps up the async queue depth based on the time
passed since the sync IO, and doesn't allow async at all until
a sync slice period has passed.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
o Do not allow more than max_dispatch requests from an async queue, if some
sync request has finished recently. This is in the hope that sync activity
is still going on in the system and we might receive a sync request soon.
Most likely from a sync queue which finished a request and we did not enable
idling on it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This is basically identical to what Vivek Goyal posted, but combined
into one and labelled 'desktop' instead of 'fairness'. The goal
is to continue to improve on the latency side of things as it relates
to interactiveness, keeping the questionable bits under this sysfs
tunable so it would be easy for throughput-only people to turn off.
Apart from adding the interactive sysfs knob, it also adds the
behavioural change of allowing slice idling even if the hardware
does tagged command queuing.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Since 2.6.31 now has request-based device-mapper, it's useful to have
a tracepoint for request-remapping as well as bio-remapping.
This patch adds a tracepoint for request-remapping, trace_block_rq_remap().
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Currently we set the bio size to the byte equivalent of the blocks to
be trimmed when submitting the initial DISCARD ioctl. That means it
is subject to the max_hw_sectors limitation of the HBA which is
much lower than the size of a DISCARD request we can support.
Add a separate max_discard_sectors tunable to limit the size for discard
requests.
We limit the max discard request size in bytes to 32bit as that is the
limit for bio->bi_size. This could be much larger if we had a way to pass
that information through the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
prepare_discard_fn() was being called in a place where memory allocation
was effectively impossible. This makes it inappropriate for all but
the most trivial translations of Linux's DISCARD operation to the block
command set. Additionally adding a payload there makes the ownership
of the bio backing unclear as it's now allocated by the device driver
and not the submitter as usual.
It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
the queue supports discard operations or not. blkdev_issue_discard now
allocates a one-page, sector-length payload which is the right thing
for the common ATA and SCSI implementations.
The mtd implementation of prepare_discard_fn() is replaced with simply
checking for the request being a discard.
Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx>
which did the prepare_discard_fn but not the different payload allocation
yet.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
While testing Swap over NFS patchset, I noticed an oops that was triggered
during swapon. Investigating further, the NULL pointer deference is due to the
SSD device check/optimization in the swapon code that assumes s_bdev could never
be NULL.
inode->i_sb->s_bdev could be NULL in a few cases. For e.g. one such case is
loopback NFS mount, there could be others as well. Fix this by ensuring s_bdev
is not NULL before we try to deference s_bdev.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
As mentioned in Documentation/CodingStyle, move EXPORT* macro's
to the line immediately after the closing function brace line.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Fix these build errors when CONFIG_PROC_FS is not set:
drivers/block/cciss.c: In function 'cciss_show_raid_level':
drivers/block/cciss.c:623: error: 'RAID_UNKNOWN' undeclared (first use in this function)
drivers/block/cciss.c:626: error: 'raid_label' undeclared (first use in this function)
drivers/block/cciss.c: In function 'cciss_geometry_inquiry':
drivers/block/cciss.c:2696: error: 'RAID_UNKNOWN' undeclared (first use in this function)
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Stacking devices do not have an inherent max_hw_sector limit. Set the
default to INT_MAX so we are bounded only by capabilities of the
underlying storage.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The topology changes unintentionally caused SAFE_MAX_SECTORS to be set
for stacking devices. Set the default limit to BLK_DEF_MAX_SECTORS and
provide SAFE_MAX_SECTORS in blk_queue_make_request() for legacy hw
drivers that depend on the old behavior.
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
cciss: Dynamically allocate the drive_info_struct for each logical drive.
This reduces the size of the per-hba ctlr_info structure from 106936
bytes to 8132 bytes. That's on 32-bit systems. On 64-bit systems, the
improvement is even bigger. Without this, the ctlr_info struct is so big
that the driver won't even load on a 64 bit system if CISS_MAX_LUN was
at it's current setting of 1024 logical drives.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Add usage_count attribute to each logical drive at
/sys/devices/<dev>/ccissX/cXdY/usage_count for controller X,
logical drive Y. The usage count is the number of times
the device has currently been opened.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
and change get rid of some magic numbers in raid lavel decoding.
Add raid_level attribute to each logical drive at
/sys/devices/<dev>/ccissX/cXdY/raid_level for controller X,
logical drive Y
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
cciss: fix some magic numbers in the raid-level decoding
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Add lunid attribute to each logical drive at
/sys/devices/<dev>/ccissX/cXdY/lunid for controller X,
logical drive Y
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Don't check h->busy_initializing in cciss_open(). Open won't be
called before things are ready, but h->busy_initializing won't be
unset until after the initial rebuild_lun_table is finished. But,
to read the partitions, cciss_open will be called for each logical
drive during rebuild_lun_table. If cciss_open checks h->busy_initializing,
then the reading of the partition information during the initial
rebuild_lun_table will fail, which is especially bad news if it
happens to be your boot device.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Preserve all 8 bytes of the LunID field returned
by CCISS_REPORT_LOGICAL instead of only saving 4 bytes.
This fixes a bug with logical volume addressing encountered on
an MSA2012.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Fix bug that free_hba was calling put_disk for all gendisk[]
pointers -- all 1024 of them -- regardless of whether the were
used or not (NULL). This bug could cause rmmod to oops if logical
drives had been deleted during the driver's lifetime.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
When rebuild_lun_table is reached via sysfs, the usage count that
is checked prior to messing with c0d0 has different constraints
(must be zero) than if rebuild_lun_table is reached via ioctl
(must be one.) Fix rebuild_lun_table to take that into account.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
When removing a logical drive, clear all the information that is
now exposed by sysfs (e.g. vendor, model, serial number.)
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
For c0dx where x is not 0, we handle deletion and addition simply,
but for c0d0, there is the special case that even when there's no
disk, the device node exists so that the controller may be accessed.
So, for c0d0, we only create the sysfs entries once, when a controller
is added, and only remove them once, when a controller is being
taken down.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Handle cases when cciss_add_disk fails.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Handle failure of blk_init_queue gracefully in cciss_add_disk.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Rearrange logical drive sysfs code to make the "changing a disk" path work.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Dynamically allocate struct device for each logical drive as needed
instead of allocating the maximum we would ever need at driver init time.
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Remove some unused code in rebuild_lun_table()
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Added /sys/bus/pci/devices/<dev>/ccissX/rescan sysfs entry used
to kick off a rescan that discovers logical drive topology changes.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Replace the use of one scan kthread per controller with one per driver.
Use a queue to hold a list of controllers that need to be rescanned with
routines to add and remove controllers from the queue.
Fix locking and completion handling to prevent a hang during rmmod.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Sysfs entries for logical drives need to be removed when a drive is
deleted during driver cleanup.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Change schedule_timeout() parameter to not be specific to HZ=1000.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Mike Miller <mike.miller@hp.com>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: "Cameron, Steve" <Steve.Cameron@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
.. duplicated by merging the same fix twice, for details see commit
0d9df2515d ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following commit made console open fails while booting:
commit b50989dc44
Author: Alan Cox <alan@linux.intel.com>
Date: Sat Sep 19 13:13:22 2009 -0700
tty: make the kref destructor occur asynchronously
Due to tty release routines run in a workqueue now, error like the
following will be reported while booting:
INIT open /dev/console Input/output error
It also causes hibernation regression to appear as reported at
http://bugzilla.kernel.org/show_bug.cgi?id=14229
The reason is that now there's latency issue with closing, but when
we open a "closing not finished" tty, -EIO will be returned.
Fix it as per the following Alan's suggestion:
Fun but it's actually not a bug and the fix is wrong in itself as
the port may be closing but not yet being destructed, in which case
it seems to do the wrong thing. Opening a tty that is closing (and
could be closing for long periods) is supposed to return -EIO.
I suspect a better way to deal with this and keep the old console
timing is to split tty->shutdown into two functions.
tty->shutdown() - called synchronously just before we dump the tty
onto the waitqueue for destruction
tty->cleanup() - called when the destructor runs.
We would then do the shutdown part which can occur in IRQ context
fine, before queueing the rest of the release (from tty->magic = 0
... the end) to occur asynchronously
The USB update in -next would then need a call like
if (tty->cleanup)
tty->cleanup(tty);
at the top of the async function and the USB shutdown to be split
between shutdown and cleanup as the USB resource cleanup and final
tidy cannot occur synchronously as it needs to sleep.
In other words the logic becomes
final kref put
make object unfindable
async
clean it up
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
[ rjw: Rebased on top of 2.6.31-git, reworked the changelog. ]
Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
[ Changed serial naming to match new rules, dropped tty_shutdown as per
comments from Alan Stern - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 3d5b6fb47a ("ACPI: Kill overly
verbose "power state" log messages") removed the actual use of this
variable, but didn't remove the variable itself, resulting in build
warnings like
drivers/acpi/processor_idle.c: In function ‘acpi_processor_power_init’:
drivers/acpi/processor_idle.c:1169: warning: unused variable ‘i’
Just get rid of the now unused variable.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code
But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix hwpoison code related build failure on 32-bit NUMAQ
ia64's sim_defconfig uses CONFIG_ACPI=n
which now #define's acpi_disabled in <linux/acpi.h>
So we shouldn't re-define it here in <asm/acpi.h>
Signed-off-by: Len Brown <len.brown@intel.com>