linux

Author	SHA1	Message	Date
Ed Cashin	2292a7e109	aoe: err device: include MAC addresses for unexpected responses The /dev/etherd/err character device provides low-level information about normal but sometimes interesting AoE command retransmits and "unexpected responses", i.e., responses for packets that have already been retransmitted. This change adds MAC addresses to the messages about unexpected responses, so that when they occur, it's more easy to determine the network paths to which they belong. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	3a0c40d2d2	aoe: improve network congestion handling The aoe driver already had some congestion handling, but it was limited in its ability to cope with the kind of congestion that can arise on more complex networks such as those involving paths through multiple ethernet switches. Some of the lessons from TCP's history of development can be applied to improving the congestion control and avoidance on AoE storage networks. These changes use familar concepts from Van Jacobson's "Congestion Avoidance and Control" paper from '88, without adding significant overhead. This patch depends on an upcoming patch that covers the failover case when AoE commands being retransmitted are transferred from one retransmit queue to another. Another upcoming patch increases the timing accuracy. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	667be1e757	aoe: provide ATA identify device content to user on request Make the aoe driver follow expected behavior when the user uses ioctl to get the ATA device identify information, allowing access to model, serial number, etc. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	cd220bf51f	aoe: update driver-internal version number to 60 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	a04b41cd2c	aoe: whitespace cleanup Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	d437962504	aoe: cleanup: remove unused ata_scnt function Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:24 -08:00
Ed Cashin	90a2508d01	aoe: "payload" sysfs file exports per-AoE-command data transfer size The userland aoetools package includes an "aoe-stat" command that can display a "payload size" column when the aoe driver exports this information. Users can quickly see what amount of user data is transferred inside each AoE command on the network, network headers excluded. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	aa304fdefa	aoe: support larger I/O requests via aoe_maxsectors module param The GPFS filesystem is an example of an aoe user that requires the aoe driver to support I/O request sizes larger than the default. Most users will not need large I/O request sizes, because they would need to be split up into multiple AoE commands anyway. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	4ba9aa7f98	aoe: support the forgetting (flushing) of a user-specified AoE target Users sometimes want to cause the aoe driver to forget a particular previously discovered device when it is no longer online. The aoetools provide an "aoe-flush" command that users run to perform this administrative task. The changes below provide the support needed in the driver. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	1b8a1636ce	aoe: update cap on outstanding commands based on config query response The ATA over Ethernet config query response contains a "buffer count" field reflecting the AoE target's capacity to buffer incoming AoE commands. By taking the current value of this field into accound, we increase performance throughput or avoid network congestion, when the value has increased or decreased, respectively. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	4e78dd144b	aoe: print warning regarding a common reason for dropped transmits Dropped transmits are not common, but when they do occur, increasing the transmit queue length often helps. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	662a889608	aoe: describe the behavior of the "err" character device Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-17 17:15:23 -08:00
Ed Cashin	11cfb6ff73	aoe: avoid running request handler on plugged queue Calling the request handler directly on a plugged queue defeats the performance improvements provided by the plugging mechanism. Use the __blk_run_queue function instead of calling the request handler directly, so that we don't interfere with the block layer's ability to plug the queue. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-11-23 14:32:55 +01:00
Ed Cashin	322c9ec009	aoe: update aoe-internal version number to 50 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:30 +09:00
Ed Cashin	1ac9e60262	aoe: remove unused code Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:30 +09:00
Ed Cashin	08b6062351	aoe: make dynamic block minor numbers the default Because udev use is so widespread, making the old static mapping the default is too conservative, given the severe limitations it places on usable AoE addresses. Storage virtualization and larger shelves have made the old limitations too confining. These changes make the dynamic block device minor numbers the default, removing the limitations on usable AoE addresses. The static arrangement is still available with aoe_dyndevs=0, and the aoe-stat tool from the userland aoetools package, the user space counterpart to the aoe driver, recognizes the case where there is a mismatch between the minor number in sysfs and the minor number in a special device file. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	7159e969d1	aoe: update and specify AoE address guards and error messages In general, specific is better when it comes to messages about AoE usage problems. Also, explicit checks for the AoE broadcast addresses are added. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	4bcce1a355	aoe: retain static block device numbers for backwards compatibility The old mapping between AoE target shelf and slot addresses and the block device minor number is retained as a backwards-compatible feature, with a new "aoe_dyndevs" module parameter available for enabling dynamic block device minor numbers. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:29 +09:00
Ed Cashin	0c96621458	aoe: support more AoE addresses with dynamic block device minor numbers The ATA over Ethernet protocol uses a major (shelf) and minor (slot) address to identify a particular storage target. These changes remove an artificial limitation the aoe driver imposes on the use of AoE addresses. For example, without these changes, the slot address has a maximum of 15, but users commonly use slot numbers much greater than that. The AoE shelf and slot address space is often used sparsely. Instead of using a static mapping between AoE addresses and the block device minor number, the block device minor numbers are now allocated on demand. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:28 +09:00
Ed Cashin	fea05a26c3	aoe: update copyright year in touched files Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:28 +09:00
Ed Cashin	7392fbe5ad	aoe: update internal version number to 49 The internal version number of the aoe driver appears in a console message when the driver loads and is usually obtained by the user with the userland aoe-version tool, part of the aoetools.[1] Although this patchset includes bugfixes backported from higher-numbered versions published on the coraid.com website, it is a form of version 49. 1. http://aoetools.sourceforge.net/ Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	b21faa25c6	aoe: remove unused code and add cosmetic improvements This change removes some unused code and attempts to increase code consistency. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	1b86fda9ad	aoe: increase net_device reference count while using it This change eliminates the danger that the user could rmmod the driver for a network interface that is being used for AoE by the aoe driver. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	64a80f5ac7	aoe: associate frames with the AoE storage target In the driver code, "target" and aoetgt refer to a particular remote interface on the AoE storage target. The latter is identified by its AoE major and minor addresses. Commands that are being sent to an AoE storage target {major, minor} can be sent or retransmitted to any of the remote MAC addresses associated with the AoE storage target. That is, frames are naturally associated with not an aoetgt (AoE major, AoE minor, remote MAC address) but an aoedev (AoE major, AoE minor). Making the code reflect that reality simplifies the driver, especially when the path to a remote MAC address becomes unusable. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:27 +09:00
Ed Cashin	6583303c5e	aoe: disallow unsupported AoE minor addresses A guard is inserted to prevent AoE minor addresses (slot addresses) higher than 15 to be used, as they are not yet supported by the driver. There is a change coming that will allow the aoe driver to overcome this limit by using system device minor numbers dynamically, but until then, this guard prevents unexpected targets from being used by the driver when AoE targets with high minor numbers are on the AoE network. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	25f4d75ea4	aoe: do revalidation steps in order The discovery process begins with an optional AoE config query command and an AoE config query response. Normally when an aoe device is already open, the config query response does not trigger an ATA identify device command to be sent out, since the response contains storage capacity information that, if changed, could surprise the user of the device. The userland "aoe-revalidate" tool uses a character device to trigger an AoE config query for a particular AoE storage target and an ATA device identify command, even when the device is open. This change causes the config query to go out first, reflecting the normal discovery sequence. The responses could come back in any order, so this change is fairly cosmetic. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	d54d35ac66	aoe: failover remote interface based on aoe_deadsecs parameter The aoe_deadsecs module parameter allows the user to specify a hard limit on the number of seconds an AoE command can be retransmitted before the AoE block device is considered to have failed. Using aoe_deadsecs to determine the time we try using a different remote interface helps to ensure that the hard limit is not reached before we've tried to recover by sending to a different remote port. As a data storage target, the AoE target is unambiguously identified by its {major, minor} AoE address tuple, and an AoE target can have multiple MAC addresses. However, note that "target" in the driver code and comments means a {major, minor, MAC address} tuple, as in "somewhere to send packets". Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:26 +09:00
Ed Cashin	3f0f013374	aoe: use packets that work with the smallest-MTU local interface Users with several network interfaces dedicated to AoE generally do not configure them to support different-sized AoE data payloads on purpose. For a given AoE target, there will be a set of local network interfaces that can reach it. Using only the payload that will fit in the smallest-sized MTU of all those local interfaces greatly simplifies the driver, especially in failure scenarios. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	eb086ec596	aoe: use a kernel thread for transmissions The dev_queue_xmit function needs to have interrupts enabled, so the most simple way to get the locking right but still fulfill that requirement is to use a process that can call dev_queue_xmit serially over queued transmissions. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	69cf2d85de	aoe: become I/O request queue handler for increased user control To allow users to choose an elevator algorithm for their particular workloads, change from a make_request-style driver to an I/O-request-queue-handler-style driver. We have to do a couple of things that might be surprising. We manipulate the page _count directly on the assumption that we still have no guarantee that users of the block layer are prohibited from submitting bios containing pages with zero reference counts.[1] If such a prohibition now exists, I can get rid of the _count manipulation. Just as before this patch, we still keep track of the sk_buffs that the network layer still hasn't finished yet and cap the resources we use with a "pool" of skbs.[2] Now that the block layer maintains the disk stats, the aoe driver's diskstats function can go away. 1. https://lkml.org/lkml/2007/3/1/374 2. https://lkml.org/lkml/2007/7/6/241 Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:25 +09:00
Ed Cashin	896831f590	aoe: kernel thread handles I/O completions for simple locking Make the frames the aoe driver uses to track the relationship between bios and packets more flexible and detached, so that they can be passed to an "aoe_ktio" thread for completion of I/O. The frames are handled much like skbs, with a capped amount of preallocation so that real-world use cases are likely to run smoothly and degenerate gracefully even under memory pressure. Decoupling I/O completion from the receive path and serializing it in a process makes it easier to think about the correctness of the locking in the driver, especially in the case of a remote MAC address becoming unusable. [dan.carpenter@oracle.com: cleanup an allocation a bit] Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Ed Cashin	3d5b06051c	aoe: for performance support larger packet payloads tAdd adds the ability to work with large packets composed of a number of segments, using the scatter gather feature of the block layer (biovecs) and the network layer (skb frag array). The motivation is the performance gained by using a packet data payload greater than a page size and by using the network card's scatter gather feature. Users of the out-of-tree aoe driver already had these changes, but since early 2011, they have complained of increased memory utilization and higher CPU utilization during heavy writes.[1] The commit below appears related, as it disables scatter gather on non-IP protocols inside the harmonize_features function, even when the NIC supports sg. commit `f01a5236bd` Author: Jesse Gross <jesse@nicira.com> Date: Sun Jan 9 06:23:31 2011 +0000 net offloading: Generalize netif_get_vlan_features(). With that regression in place, transmits always linearize sg AoE packets, but in-kernel users did not have this patch. Before 2.6.38, though, these changes were working to allow sg to increase performance. 1. http://www.spinics.net/lists/linux-mm/msg15184.html Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-06 03:05:24 +09:00
Ed Cashin	8babe8cc65	aoe: assert AoE packets marked as requiring no checksum In order for the network layer to see that AoE requires no checksumming in a generic way, the packets must be marked as requiring no checksum, so we make this requirement explicit with the assertion. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-20 22:23:40 -04:00
Al Viro	2c9ede55ec	switch device_get_devnode() and ->devnode() to umode_t * both callers of device_get_devnode() are only interested in lower 16bits and nobody tries to return anything wider than 16bit anyway. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:55 -05:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Paul Gortmaker	d5decd3b95	block: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros These files were getting <linux/module.h> via an implicit include path, but we want to crush those out of existence since they cost time during compiles of processing thousands of lines of headers for no reason. Give them the lightweight header that just contains the EXPORT_SYMBOL infrastructure. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:12 -04:00
Christoph Hellwig	5a7bbad27a	block: remove support for bio remapping from ->make_request There is very little benefit in allowing to let a ->make_request instance update the bios device and sector and loop around it in __generic_make_request when we can archive the same through calling generic_make_request from the driver and letting the loop in generic_make_request handle it. Note that various drivers got the return value from ->make_request and returned non-zero values for errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-09-12 12:12:01 +02:00
Tracey Dent	a0700bdd0b	drivers/block/aoe/Makefile: replace the use of <module>-objs with <module>-y Change Makefile to use <modules>-y instead of <modules>-objs because -objs is deprecated and should now be switched. According to (documentation/kbuild/makefiles.txt). Signed-off-by: Tracey Dent <tdent48227@gmail.com> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-01-19 08:25:02 -07:00
David S. Miller	c25ecd0a21	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-11-14 11:57:05 -08:00
Christoph Hellwig	02e031cbc8	block: remove REQ_HARDBARRIER REQ_HARDBARRIER is dead now, so remove the leftovers. What's left at this point is: - various checks inside the block layer. - sanity checks in bio based drivers. - now unused bio_empty_barrier helper. - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while, but Xen really needs to sort out it's barrier situaton. - setting of ordered tags in uas - dead code copied from old scsi drivers. - scsi different retry for barriers - it's dead and should have been removed when flushes were converted to FS requests. - blktrace handling of barriers - removed. Someone who knows blktrace better should add support for REQ_FLUSH and REQ_FUA, though. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-11-10 14:54:09 +01:00
Eric Dumazet	840a185ddd	aoe: remove dev_base_lock use from aoecmd_cfg_pkts() dev_base_lock is the legacy way to lock the device list, and is planned to disappear. (writers hold RTNL, readers hold RCU lock) Convert aoecmd_cfg_pkts() to RCU locking. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 13:50:07 -08:00
Andrew Morton	027b180d74	drivers/block/aoe/aoeblk.c: ratelimit a warning printk As described in https://bugzilla.kernel.org/show_bug.cgi?id=19922 : I had an AoE device go down overnight, and while a server was trying to : write to it, it was also writing this message to its logs: : : 209 printk(KERN_INFO "aoe: device %ld.%d is not up\n", : 210 d->aoemajor, d->aoeminor); : : The message appeared many times per second, and over several hours : produced about 7.5 gigabytes of log files, filling up all free space on : the root filesystem. Cc: "Ed L. Cashin" <ecashin@coraid.com> Suggested-by: Roman Mamedov <roman@rm.pp.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-28 06:15:26 -06:00
Tejun Heo	5ad21a3374	aoe: don't use flush_scheduled_work() flush_scheduled_work() is deprecated and scheduled to be removed. Directly cancel aoedev->work on free instead of depending on flush_scheduled_works(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-28 06:15:26 -06:00
Linus Torvalds	092e0e7e52	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek	2010-10-22 10:52:56 -07:00
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) \| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
Arnd Bergmann	2a48fc0ab2	block: autoconvert trivial BKL users to private mutex The block device drivers have all gained new lock_kernel calls from a recent pushdown, and some of the drivers were already using the BKL before. This turns the BKL into a set of per-driver mutexes. Still need to check whether this is safe to do. file=$1 name=$2 if grep -q lock_kernel ${file} ; then if grep -q 'include.linux.mutex.h' ${file} ; then sed -i '/include.<linux\/smp_lock.h>/d' ${file} else sed -i 's/include.<linux\/smp_lock.h>.$/include <linux\/mutex.h>/g' ${file} fi sed -i ${file} \ -e "/^#include.linux.mutex.h/,$ { 1,/^$static\\|int\\|long$/ { /^$static\\|int\\|long$/istatic DEFINE_MUTEX(${name}_mutex); } }" \ -e "s/$un$lock_kernel\>[ ]()/mutex_\1lock(\&${name}_mutex)/g" \ -e '/[ ]cycle_kernel_lock();/d' else sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \ -e '/cycle_kernel_lock()/d' fi Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2010-10-05 15:01:10 +02:00
Arnd Bergmann	6e9624b8ca	block: push down BKL into .open and .release The open and release block_device_operations are currently called with the BKL held. In order to change that, we must first make sure that all drivers that currently rely on this have no regressions. This blindly pushes the BKL into all .open and .release operations for all block drivers to prepare for the next step. The drivers can subsequently replace the BKL with their own locks or remove it completely when it can be shown that it is not needed. The functions blkdev_get and blkdev_put are the only remaining users of the big kernel lock in the block layer, besides a few uses in the ioctl code, none of which need to serialize with blkdev_{get,put}. Most of these two functions is also under the protection of bdev->bd_mutex, including the actual calls to ->open and ->release, and the common code does not access any global data structures that need the BKL. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:25:34 +02:00
Christoph Hellwig	7b6d91daee	block: unify flags for struct bio and struct request Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:20:39 +02:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Andrew Morton	6ec1480d85	aoe: switch to the new bio_flush_dcache_pages() interface Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Ilya Loginov <isloginov@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Horton <phorton@bitbox.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-12-22 09:12:48 +01:00
Peter Horton	0a1f127a05	aoe: prevent cache aliases Prevent the AoE block driver from creating cache aliases of page cache pages on machines with virtually indexed caches. Building kernels on an AT91SAM9G20 board without this patch fails with segmentation faults after a couple of passes. Signed-off-by: Peter Horton <zero@colonel-panic.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-01 16:32:20 -08:00
Alexey Dobriyan	83d5cde47d	const: make block_device_operations const Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:25 -07:00
Kay Sievers	e454cea20b	Driver-Core: extend devnode callbacks to provide permissions This allows subsytems to provide devtmpfs with non-default permissions for the device node. Instead of the default mode of 0600, null, zero, random, urandom, full, tty, ptmx now have a mode of 0666, which allows non-privileged processes to access standard device nodes in case no other userspace process applies the expected permissions. This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-09-19 12:50:38 -07:00
Linus Torvalds	355bbd8cb8	Merge branch 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits) block: use blkdev_issue_discard in blk_ioctl_discard Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads block: don't assume device has a request list backing in nr_requests store block: Optimal I/O limit wrapper cfq: choose a new next_req when a request is dispatched Seperate read and write statistics of in_flight requests aoe: end barrier bios with EOPNOTSUPP block: trace bio queueing trial only when it occurs block: enable rq CPU completion affinity by default cfq: fix the log message after dispatched a request block: use printk_once cciss: memory leak in cciss_init_one() splice: update mtime and atime on files block: make blk_iopoll_prep_sched() follow normal 0/1 return convention cfq-iosched: get rid of must_alloc flag block: use interrupts disabled version of raise_softirq_irqoff() block: fix comment in blk-iopoll.c block: adjust default budget for blk-iopoll block: fix long lines in block/blk-iopoll.c block: add blk-iopoll, a NAPI like approach for block devices ...	2009-09-14 17:55:15 -07:00
Ed Cashin	18d8217bc4	aoe: end barrier bios with EOPNOTSUPP BugLink: http://bugzilla.kernel.org/show_bug.cgi?id=13942 Bruno Premont noticed that aoe throws a BUG during umount of an XFS in 2.6.31: [ 5259.349897] aoe: bi_io_vec is NULL [ 5259.349940] ------------[ cut here ]------------ [ 5259.349958] kernel BUG at /usr/src/linux-2.6/drivers/block/aoe/aoeblk.c:177! [ 5259.349990] invalid opcode: 0000 [#1] The bio in question is a barrier. Jens Axboe suggested that such bios need to be recognized and ended with -EOPNOTSUPP by any driver that provides its own ->make_request_fn handler and does not handle barriers. In testing the changes below eliminate the BUG. (Better would be real barrier support, something that Ed says he'll add for later in the .32 cycle. For now, this at least gets rid of a bug with crashing on an empty barrier. Jens) Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-14 08:24:52 +02:00
Jens Axboe	d993831fa7	writeback: add name to backing_dev_info This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 09:20:26 +02:00
Ed Cashin	7135a71b19	aoe: allocate unused request_queue for sysfs Andy Whitcroft reported an oops in aoe triggered by use of an incorrectly initialised request_queue object: [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add an uninitialized object, something is seriously wrong. [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu [ 2645.959107] Call Trace: [ 2645.959139] [<ffffffff8126ca2f>] kobject_add+0x5f/0x70 [ 2645.959151] [<ffffffff8125b4ab>] blk_register_queue+0x8b/0xf0 [ 2645.959155] [<ffffffff8126043f>] add_disk+0x8f/0x160 [ 2645.959161] [<ffffffffa01673c4>] aoeblk_gdalloc+0x164/0x1c0 [aoe] The request queue of an aoe device is not used but can be allocated in code that does not sleep. Bruno bisected this regression down to `cd43e26f07` block: Expose stacked device queues in sysfs "This seems to generate /sys/block/$device/queue and its contents for everyone who is using queues, not just for those queues that have a non-NULL queue->request_fn." Addresses http://bugs.launchpad.net/bugs/410198 Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13942 Note that embedding a queue inside another object has always been an illegal construct, since the queues are reference counted and must persist until the last reference is dropped. So aoe was always buggy in this respect (Jens). Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Bruno Premont <bonbons@linux-vserver.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-09 14:10:18 +02:00
Kay Sievers	1ce8a0d396	Driver Core: aoe: add nodename for aoe devices This adds support to the AOE core to report the proper device name to userspace for the AOE devices. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jan Blunck <jblunck@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-06-15 21:30:26 -07:00
David S. Miller	438263ac58	aoe: Remove superfluous clearing of skb fields in new_skb(). This code uses alloc_skb() which clears them out for us. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-27 17:09:44 -07:00
Bartlomiej Zolnierkiewicz	04b3ab52a0	aoe: WIN_* -> ATA_CMD_* * Use ATA_CMD_* defines instead of WIN_* ones. * Include <linux/ata.h> directly instead of through <linux/hdreg.h>. Cc: Ed L. Cashin <ecashin@coraid.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2009-04-01 21:42:24 +02:00
Stephen Hemminger	7546dd97d2	net: convert usage of packet_type to read_mostly Protocols that use packet_type can be __read_mostly section for better locality. Elminate any unnecessary initializations of NULL. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-10 05:22:43 -07:00
Roel Kluin	9487311157	aoe: error printed 1 too early with while (i-- > 0); i reaches -1 after the loop, so the test below is printed one too early: 0 still means success. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-04 00:11:52 -08:00
Ed Cashin	b6d6c51758	aoe: ignore vendor extension AoE responses The Welland ME-747K-SI AoE target generates unsolicited AoE responses that are marked as vendor extensions. Instead of ignoring these packets, the aoe driver was generating kernel messages for each unrecognized response received. This patch corrects the behavior. Signed-off-by: Ed Cashin <ecashin@coraid.com> Reported-by: <karaluh@karaluh.pl> Tested-by: <karaluh@karaluh.pl> Cc: <stable@kernel.org> Cc: Alex Buell <alex.buell@munted.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:53 -08:00
Harvey Harrison	411c41eea5	aoe: remove private mac address format function Add %pm to omit the colons when printing a mac address. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 00:40:37 -08:00
Al Viro	94562c1751	[PATCH] switch aoeblk Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:40 -04:00
Al Viro	d4430d62fa	[PATCH] beginning of methods conversion To keep the size of changesets sane we split the switch by drivers; to keep the damn thing bisectable we do the following: 1) rename the affected methods, add ones with correct prototypes, make (few) callers handle both. That's this changeset. 2) for each driver convert to new methods. ALL drivers are converted in this series. 3) kill the old (renamed) methods. Note that it _is_ a flagday; all in-tree drivers are converted and by the end of this series no trace of old methods remain. The only reason why we do that this way is to keep the damn thing bisectable and allow per-driver debugging if anything goes wrong. New methods: open(bdev, mode) release(disk, mode) ioctl(bdev, mode, cmd, arg) /* Called without BKL / compat_ioctl(bdev, mode, cmd, arg) locked_ioctl(bdev, mode, cmd, arg) / Called with BKL, legacy */ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-10-21 07:47:32 -04:00
Parag Warudkar	01e8ef11bc	x86: sysfs: kill owner field from attribute Tejun's commit `7b595756ec` made sysfs attribute->owner unnecessary. But the field was left in the structure to ease the merge. It's been over a year since that change and it is now time to start killing attribute->owner along with its users - one arch at a time! This patch is attempt #1 to get rid of attribute->owner only for CONFIG_X86_64 or CONFIG_X86_32 . We will deal with other arches later on as and when possible - avr32 will be the next since that is something I can test. Compile (make allyesconfig / make allmodconfig / custom config) and boot tested. akpm: the idea is that we put the declaration of sttribute.owner inside `#ifndef CONFIG_X86'. But that proved to be too ambitious for now because new usages kept on turning up in subsystem trees. [akpm: remove the ifdef for now] Signed-off-by: Parag Warudkar <parag.lkml@gmail.com> Cc: Greg KH <greg@kroah.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tejun Heo <htejun@gmail.com> Cc: Len Brown <lenb@kernel.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: David Brownell <david-b@pacbell.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:42 -07:00
Greg Kroah-Hartman	1ff9f542e5	device create: block: convert device_create_drvdata to device_create Now that device_create() has been audited, rename things back to the original call to be sane. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-10-16 09:24:41 -07:00
Linus Torvalds	4dd9ec4946	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1075 commits) myri10ge: update driver version number to 1.4.3-1.369 r8169: add shutdown handler r8169: preliminary 8168d support r8169: support additional 8168cp chipset r8169: change default behavior for mildly identified 8168c chipsets r8169: add a new 8168cp flavor r8169: add a new 8168c flavor (bis) r8169: add a new 8168c flavor r8169: sync existing 8168 device hardware start sequences with vendor driver r8169: 8168b Tx performance tweak r8169: make room for more specific 8168 hardware start procedure r8169: shuffle some registers handling around (8168 operation only) r8169: new phy init parameters for the 8168b r8169: update phy init parameters r8169: wake up the PHY of the 8168 af_key: fix SADB_X_SPDDELETE response ath9k: Fix return code when ath9k_hw_setpower() fails on reset ath9k: remove nasty FAIL macro from ath9k_hw_reset() gre: minor cleanups in netlink interface gre: fix copy and paste error ...	2008-10-11 09:33:18 -07:00
Tejun Heo	074a7aca7a	block: move stats from disk to part0 Move stats related fields - stamp, in_flight, dkstats - from disk to part0 and unify stat handling such that... * part_stat_() now updates part0 together if the specified partition is not part0. ie. part_stat_() are now essentially all_stat_(). {disk\|all}_stat_() are gone. part_round_stats() is updated similary. It handles part0 stats automatically and disk_round_stats() is killed. * part_{inc\|dec}_in_fligh() is implemented which automatically updates part0 stats for parts other than part0. * disk_map_sector_rcu() is updated to return part0 if no part matches. Combined with the above changes, this makes NULL special case handling in callers unnecessary. * Separate stats show code paths for disk are collapsed into part stats show code paths. * Rename disk_stat_lock/unlock() to part_stat_lock/unlock() While at it, reposition stat handling macros a bit and add missing parentheses around macro parameters. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:08 +02:00
Tejun Heo	80795aefb7	block: move capacity from disk to part0 Move disk->capacity to part0->nr_sects and convert all users who directly accessed the field to use {get\|set}_capacity(). This is done early to allow the __dev field to be moved. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:07 +02:00
Tejun Heo	ed9e198234	block: implement and use {disk\|part}_to_dev() Implement {disk\|part}_to_dev() and use them to access generic device instead of directly dereferencing {disk\|part}->dev. To make sure no user is left behind, rename generic devices fields to __dev. This is in preparation of unifying partition 0 handling with other partitions. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:07 +02:00
Tejun Heo	c995905916	block: fix diskstats access There are two variants of stat functions - ones prefixed with double underbars which don't care about preemption and ones without which disable preemption before manipulating per-cpu counters. It's unclear whether the underbarred ones assume that preemtion is disabled on entry as some callers don't do that. This patch unifies diskstats access by implementing disk_stat_lock() and disk_stat_unlock() which take care of both RCU (for partition access) and preemption (for per-cpu counter access). diskstats access should always be enclosed between the two functions. As such, there's no need for the versions which disables preemption. They're removed and double underbars ones are renamed to drop the underbars. As an extra argument is added, there's no danger of using the old version unconverted. disk_stat_lock() uses get_cpu() and returns the cpu index and all diskstat functions which access per-cpu counters now has @cpu argument to help RT. This change adds RCU or preemption operations at some places but also collapses several preemption ops into one at others. Overall, the performance difference should be negligible as all involved ops are very lightweight per-cpu ones. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:06 +02:00
Tejun Heo	e71bf0d0ee	block: fix disk->part[] dereferencing race disk->part[] is protected by its matching bdev's lock. However, non-critical accesses like collecting stats and printing out sysfs and proc information used to be performed without any locking. As partitions can come and go dynamically, partitions can go away underneath those non-critical accesses. As some of those accesses are writes, this theoretically can lead to silent corruption. This patch fixes the race by using RCU for the partition array and dev reference counter to hold partitions. * Rename disk->part[] to disk->__part[] to make sure no one outside genhd layer proper accesses it directly. * Use RCU for disk->__part[] dereferencing. * Implement disk_{get\|put}_part() which can be used to get and put partitions from gendisk respectively. * Iterators are implemented to help iterate through all partitions safely. * Functions which require RCU readlock are marked with _rcu suffix. * Use disk_put_part() in __blkdev_put() instead of directly putting the contained kobject. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:06 +02:00
Tejun Heo	310a2c1012	block: misc updates This patch makes the following misc updates in preparation for disk->part dereference fix and extended block devt support. * implment part_to_disk() * fix comment about gendisk->part indexing * rename get_part() to disk_map_sector() * don't use n which is always zero while printing disk information in diskstats_show() Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:04 +02:00
David S. Miller	d87798450a	aoe: Fix OOPS after SKB queue changes. Reported by Thomas Graf. If we don't unlink the SKB from the queue when we send it out in aoenet_xmit(), dev_hard_start_xmit() will see skb->next as non-NULL and interpret this to mean the SKB is part of a GSO segment list. Add __skb_unlink() call to fix that. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-23 20:47:22 -07:00
David S. Miller	e9bb8fb0b6	aoe: Use SKB interfaces for list management instead of home-grown stuff. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-21 22:36:49 -07:00
Niels de Vos	61a2d07d3f	Remove newline from the description of module parameters Some module parameters with only one line have the '\n' at the end of the description. This is not needed nor wanted as after the description the type (i.e. int) is followed by a newline. Some modules contain a multi-line description, these are not affected by this patch. Signed-off-by: Niels de Vos <niels.devos@wincor-nixdorf.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: John W. Linville <linville@tuxdriver.com> Cc: Ed L. Cashin <ecashin@coraid.com> Cc: Dave Airlie <airlied@linux.ie> Cc: Roland Dreier <rolandd@cisco.com> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-08-01 12:46:41 -07:00
Matthias Kaehlcke	24879a8e3e	aoe: convert emsgs_sema into a completion ATA over Ethernet: The semaphore emsgs_sema is used for signalling an event, convert it in a completion. Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:45 -07:00
Greg Kroah-Hartman	f79f060561	device create: block: convert device_create to device_create_drvdata device_create() is race-prone, so use the race-free device_create_drvdata() instead as device_create() is going away. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-07-21 21:54:41 -07:00
Linus Torvalds	d1794f2c5b	Merge branch 'bkl-removal' of git://git.lwn.net/linux-2.6 * 'bkl-removal' of git://git.lwn.net/linux-2.6: (146 commits) IB/umad: BKL is not needed for ib_umad_open() IB/uverbs: BKL is not needed for ib_uverbs_open() bf561-coreb: BKL unneeded for open() Call fasync() functions without the BKL snd/PCM: fasync BKL pushdown ipmi: fasync BKL pushdown ecryptfs: fasync BKL pushdown Bluetooth VHCI: fasync BKL pushdown tty_io: fasync BKL pushdown tun: fasync BKL pushdown i2o: fasync BKL pushdown mpt: fasync BKL pushdown Remove BKL from remote_llseek v2 Make FAT users happier by not deadlocking x86-mce: BKL pushdown vmwatchdog: BKL pushdown vmcp: BKL pushdown via-pmu: BKL pushdown uml-random: BKL pushdown uml-mmapper: BKL pushdown ...	2008-07-14 14:48:31 -07:00
Harvey Harrison	823ed72e8f	block: use get_unaligned_* helpers Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-07-04 09:28:32 +02:00
Jonathan Corbet	579174a55f	AoE: cdev lock_kernel() pushdown Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2008-06-20 14:03:43 -06:00
Jens Axboe	28f13702f0	block: avoid duplicate calls to get_part() in disk stat code get_part() is fairly expensive, as it O(N) loops over partitions to find the right one. In lots of normal IO paths we end up looking up the partition twice, to make matters even worse. Change the stat add code to accept a passed in partition instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-05-07 10:15:46 +02:00
Harvey Harrison	f885f8d127	drivers/block: use get_unaligned_* helpers Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Ed L. Cashin <ecashin@coraid.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:27 -07:00
Adrian Bunk	0302190411	remove aoedev_isbusy() Remove the no longer used aoedev_isbusy(). Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:24 -07:00
YOSHIFUJI Hideaki	c346dca108	[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:53 +09:00
Linus Torvalds	03054de1e0	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: Enhanced partition statistics: documentation update Enhanced partition statistics: remove old partition statistics Enhanced partition statistics: procfs Enhanced partition statistics: sysfs Enhanced partition statistics: aoe fix Enhanced partition statistics: update partition statitics Enhanced partition statistics: core statistics block: fixup rq_init() a bit Manually fixed conflict in drivers/block/aoe/aoecmd.c due to statistics support.	2008-02-08 09:42:46 -08:00
Andrew Morton	476aed3870	aoe: statically initialise devlist_lock I guess aoedev_init() can go away now. Cc: Greg KH <greg@kroah.com> Cc: "Ed L. Cashin" <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	52e112b3ab	aoe: update copyright date Update the year in the copyright notices. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	578c4aa0b4	aoe: make error messages more specific Andrew Morton pointed out that the "too many targets" message in patch 2 could be printed for failing GFP_ATOMIC allocations. This patch makes the messages more specific. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	1d75981a80	aoe: the aoeminor doesn't need a long format The aoedev aoeminor member doesn't need a long format. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	7df620d852	aoe: add module parameter for users who need more outstanding I/O An AoE target provides an estimate of the number of outstanding commands that the AoE initiator can send before getting a response. The aoe_maxout parameter provides a way to set an even lower limit. It will not allow a user to use more outstanding commands than the target permits. If a user discovers a problem with a large setting, this parameter provides a way for us to work with them to debug the problem. We expect to improve the dynamic window sizing algorithm and drop this parameter. For the time being, it is a debugging aid. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	6b9699bbd2	aoe: only install new AoE device once An aoe driver user who had about 70 AoE targets found that he was hitting a BUG in sysfs_create_file because the aoe driver was trying to tell the kernel about an AoE device more than once. Each AoE device was reachable by several local network interfaces, and multiple ATA device indentify responses were returning from that single device. This patch eliminates a race condition so that aoe always informs the block layer of a new AoE device once in the presence of multiple incoming ATA device identify responses. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	9bb237b6a6	aoe: dynamically allocate a capped number of skbs when necessary What this Patch Does Even before this recent series of 12 patches to 2.6.22-rc4, the aoe driver was reusing a small set of skbs that were allocated once and were only used for outbound AoE commands. The network layer cannot be allowed to put_page on the data that is still associated with a bio we haven't returned to the block layer, so the aoe driver (even before the patch under discussion) is still the owner of skbs that have been handed to the network layer for transmission. We need to keep track of these skbs so that we can free them, but by tracking them, we can also easily re-use them. The new patch was a response to the behavior of certain network drivers. We cannot reuse an skb that the network driver still has in its transmit ring. Network drivers can defer transmit ring cleanup and then use the state in the skb to determine how many data segments to clean up in its transmit ring. The tg3 driver is one driver that behaves in this way. When the network driver defers cleanup of its transmit ring, the aoe driver can find itself in a situation where it would like to send an AoE command, and the AoE target is ready for more work, but the network driver still has all of the pre-allocated skbs. In that case, the new patch just calls alloc_skb, as you'd expect. We don't want to get carried away, though. We try not to do excessive allocation in the write path, so we cap the number of skbs we dynamically allocate. Probably calling it a "dynamic pool" is misleading. We were already trying to use a small fixed-size set of pre-allocated skbs before this patch, and this patch just provides a little headroom (with a ceiling, though) to accomodate network drivers that hang onto skbs, by allocating when needed. The d->skbpool_hd list of allocated skbs is necessary so that we can free them later. We didn't notice the need for this headroom until AoE targets got fast enough. Alternatives If the network layer never did a put_page on the pages in the bio's we get from the block layer, then it would be possible for us to hand skbs to the network layer and forget about them, allowing the network layer to free skbs itself (and thereby calling our own skb->destructor callback function if we needed that). In that case we could get rid of the pre-allocated skbs and also the d->skbpool_hd, instead just calling alloc_skb every time we wanted to transmit a packet. The slab allocator would effectively maintain the list of skbs. Besides a loss of CPU cache locality, the main concern with that approach the danger that it would increase the likelihood of deadlock when VM is trying to free pages by writing dirty data from the page cache through the aoe driver out to persistent storage on an AoE device. Right now we have a situation where we have pre-allocation that corresponds to how much we use, which seems ideal. Of course, there's still the separate issue of receiving the packets that tell us that a write has successfully completed on the AoE target. When memory is low and VM is using AoE to flush dirty data to free up pages, it would be perfect if there were a way for us to register a fast callback that could recognize write command completion responses. But I don't think the current problems with the receive side of the situation are a justification for exacerbating the problem on the transmit side. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:32 -08:00
Ed L. Cashin	262bf54144	aoe: user can ask driver to forget previously detected devices When an AoE device is detected, the kernel is informed, and a new block device is created. If the device is unused, the block device corresponding to remote device that is no longer available may be removed from the system by telling the aoe driver to "flush" its list of devices. Without this patch, software like GPFS and LVM may attempt to read from AoE devices that were discovered earlier but are no longer present, blocking until the I/O attempt times out. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00
Ed L. Cashin	cf446f0dba	aoe: eliminate goto and improve readability Adam Richter suggested eliminating this goto. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00
Ed L. Cashin	1eb0da4cea	aoe: mac_addr: avoid 64-bit arch compiler warnings By returning unsigned long long, mac_addr does not generate compiler warnings on 64-bit architectures. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00
Ed L. Cashin	68e0d42f39	aoe: handle multiple network paths to AoE device A remote AoE device is something can process ATA commands and is identified by an AoE shelf number and an AoE slot number. Such a device might have more than one network interface, and it might be reachable by more than one local network interface. This patch tracks the available network paths available to each AoE device, allowing them to be used more efficiently. Andrew Morton asked about the call to msleep_interruptible in the revalidate function. Yes, if a signal is pending, then msleep_interruptible will not return 0. That means we will not loop but will call aoenet_xmit with a NULL skb, which is a noop. If the system is too low on memory or the aoe driver is too low on frames, then the user can hit control-C to interrupt the attempt to do a revalidate. I have added a comment to the code summarizing that. Andrew Morton asked whether the allocation performed inside addtgt could use a more relaxed allocation like GFP_KERNEL, but addtgt is called when the aoedev lock has been locked with spin_lock_irqsave. It would be nice to allocate the memory under fewer restrictions, but targets are only added when the device is being discovered, and if the target can't be added right now, we can try again in a minute when then next AoE config query broadcast goes out. Andrew Morton pointed out that the "too many targets" message could be printed for failing GFP_ATOMIC allocations. The last patch in this series makes the messages more specific. Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00
Ed L. Cashin	8911ef4dc9	aoe: bring driver version number to 47 Signed-off-by: Ed L. Cashin <ecashin@coraid.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-08 09:22:31 -08:00

1 2 3 4 5

224 Commits