Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block
Pull block IO core bits from Jens Axboe:
 "Below are the core block IO bits for 3.9.  It was delayed a few days
  since my workstation kept crashing every 2-8h after pulling it into
  current -git, but turns out it is a bug in the new pstate code (divide
  by zero, will report separately).  In any case, it contains:
   - The big cfq/blkcg update from Tejun and Vivek.
   - Additional block and writeback tracepoints from Tejun.
   - Improvement of the should sort (based on queues) logic in the plug
     flushing.
   - _io() variants of the wait_for_completion() interface, using
     io_schedule() instead of schedule() to contribute to io wait
     properly.
   - Various little fixes.
  You'll get two trivial merge conflicts, which should be easy enough to
  fix up"
Fix up the trivial conflicts due to hlist traversal cleanups (commit
b67bfe0d42: "hlist: drop the node parameter from iterators").
* 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
  block: remove redundant check to bd_openers()
  block: use i_size_write() in bd_set_size()
  cfq: fix lock imbalance with failed allocations
  drivers/block/swim3.c: fix null pointer dereference
  block: don't select PERCPU_RWSEM
  block: account iowait time when waiting for completion of IO request
  sched: add wait_for_completion_io[_timeout]
  writeback: add more tracepoints
  block: add block_{touch|dirty}_buffer tracepoint
  buffer: make touch_buffer() an exported function
  block: add @req to bio_{front|back}_merge tracepoints
  block: add missing block_bio_complete() tracepoint
  block: Remove should_sort judgement when flush blk_plug
  block,elevator: use new hashtable implementation
  cfq-iosched: add hierarchical cfq_group statistics
  cfq-iosched: collect stats from dead cfqgs
  cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
  blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
  block: RCU free request_queue
  blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
  ...
			
			
commit ee89f81252
| @ -102,6 +102,64 @@ processing of request. Therefore, increasing the value can improve the | ||||
| performance although this can cause the latency of some I/O to increase due | ||||
| to a larger number of requests. | ||||
| 
 | ||||
| CFQ Group scheduling | ||||
| ==================== | ||||
| 
 | ||||
| CFQ supports blkio cgroup and has "blkio." prefixed files in each | ||||
| blkio cgroup directory. It is weight-based and there are four knobs | ||||
| for configuration - weight[_device] and leaf_weight[_device]. | ||||
| Internal cgroup nodes (the ones with children) can also have tasks in | ||||
| them, so the former two configure what proportion the cgroup as a | ||||
| whole is entitled to at its parent's level, while the latter two | ||||
| configure what proportion the tasks in the cgroup get compared to | ||||
| its direct children. | ||||
| 
 | ||||
| Another way to think about it is assuming that each internal node has | ||||
| an implicit leaf child node which hosts all the tasks whose weight is | ||||
| configured by leaf_weight[_device]. Let's assume a blkio hierarchy | ||||
| composed of five cgroups - root, A, B, AA and AB - with the following | ||||
| weights where the names represent the hierarchy. | ||||
| 
 | ||||
|         weight leaf_weight | ||||
|  root :  125    125 | ||||
|  A    :  500    750 | ||||
|  B    :  250    500 | ||||
|  AA   :  500    500 | ||||
|  AB   : 1000    500 | ||||
| 
 | ||||
| root never has a parent, making its weight meaningless. For backward | ||||
| compatibility, weight is always kept in sync with leaf_weight. B, AA | ||||
| and AB have no children and thus their tasks have no child cgroups to | ||||
| compete with. They always get 100% of what the cgroup won at the | ||||
| parent level. Considering only the weights which matter, the hierarchy | ||||
| looks like the following. | ||||
| 
 | ||||
|           root | ||||
|        /    |   \ | ||||
|       A     B    leaf | ||||
|      500   250   125 | ||||
|    /  |  \ | ||||
|   AA  AB  leaf | ||||
|  500 1000 750 | ||||
| 
 | ||||
| If all cgroups have active IOs and are competing with each other, disk | ||||
| time will be distributed like the following. | ||||
| 
 | ||||
| Distribution below root. The total active weight at this level is | ||||
| A:500 + B:250 + root-leaf:125 = 875. | ||||
| 
 | ||||
|  root-leaf :   125 /  875      =~ 14% | ||||
|  A         :   500 /  875      =~ 57% | ||||
|  B(-leaf)  :   250 /  875      =~ 28% | ||||
| 
 | ||||
| A has children and further distributes its 57% among the children and | ||||
| the implicit leaf node. The total active weight at this level is | ||||
| AA:500 + AB:1000 + A-leaf:750 = 2250. | ||||
| 
 | ||||
|  A-leaf    : ( 750 / 2250) * A =~ 19% | ||||
|  AA(-leaf) : ( 500 / 2250) * A =~ 12% | ||||
|  AB(-leaf) : (1000 / 2250) * A =~ 25% | ||||
| 
 | ||||
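To make the arithmetic above easy to check, here is a small illustrative user-space program (plain C, not kernel code and not part of this patch; the numbers are just the example weights from the table):

#include <stdio.h>

int main(void)
{
	double root_level = 500.0 + 250.0 + 125.0;   /* A + B + root-leaf = 875 */
	double a_share    = 500.0 / root_level;      /* A's share of the disk   */
	double a_level    = 500.0 + 1000.0 + 750.0;  /* AA + AB + A-leaf = 2250 */

	printf("root-leaf : %.1f%%\n", 100.0 * 125.0 / root_level);          /* 14.3 */
	printf("B         : %.1f%%\n", 100.0 * 250.0 / root_level);          /* 28.6 */
	printf("A         : %.1f%%\n", 100.0 * a_share);                     /* 57.1 */
	printf("A-leaf    : %.1f%%\n", 100.0 * a_share *  750.0 / a_level);  /* 19.0 */
	printf("AA        : %.1f%%\n", 100.0 * a_share *  500.0 / a_level);  /* 12.7 */
	printf("AB        : %.1f%%\n", 100.0 * a_share * 1000.0 / a_level);  /* 25.4 */
	return 0;
}

The percentages in the text are these values truncated; what is being split is disk time, and the split only holds while every group has IO pending.
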
| CFQ IOPS Mode for group scheduling | ||||
| =================================== | ||||
| Basic CFQ design is to provide priority based time slices. Higher priority | ||||
|  | ||||
| @ -94,13 +94,11 @@ Throttling/Upper Limit policy | ||||
| 
 | ||||
| Hierarchical Cgroups | ||||
| ==================== | ||||
| - Currently none of the IO control policy supports hierarchical groups. But | ||||
|   cgroup interface does allow creation of hierarchical cgroups and internally | ||||
|   IO policies treat them as flat hierarchy. | ||||
| - Currently only CFQ supports hierarchical groups. For throttling, | ||||
|   cgroup interface does allow creation of hierarchical cgroups and | ||||
|   internally it treats them as flat hierarchy. | ||||
| 
 | ||||
|   So this patch will allow creation of cgroup hierarchcy but at the backend | ||||
|   everything will be treated as flat. So if somebody created a hierarchy like | ||||
|   as follows. | ||||
|   If somebody created a hierarchy as follows. | ||||
| 
 | ||||
| 			root | ||||
| 			/  \ | ||||
| @ -108,16 +106,20 @@ Hierarchical Cgroups | ||||
| 			| | ||||
| 		     test3 | ||||
| 
 | ||||
|   CFQ and throttling will practically treat all groups at same level. | ||||
|   CFQ will handle the hierarchy correctly but throttling will | ||||
|   practically treat all groups at the same level. For details on CFQ | ||||
|   hierarchy support, refer to Documentation/block/cfq-iosched.txt. | ||||
|   Throttling will treat the hierarchy as if it looks like the | ||||
|   following. | ||||
| 
 | ||||
| 				pivot | ||||
| 			     /  /   \  \ | ||||
| 			root  test1 test2  test3 | ||||
| 
 | ||||
|   Down the line we can implement hierarchical accounting/control support | ||||
|   and also introduce a new cgroup file "use_hierarchy" which will control | ||||
|   whether cgroup hierarchy is viewed as flat or hierarchical by the policy.. | ||||
|   This is how memory controller also has implemented the things. | ||||
|   Nesting cgroups, while allowed, isn't officially supported and blkio | ||||
|   generates a warning when cgroups nest. Once throttling implements | ||||
|   hierarchy support, hierarchy will be supported and the warning will | ||||
|   be removed. | ||||
| 
 | ||||
| Various user visible config options | ||||
| =================================== | ||||
| @ -172,6 +174,12 @@ Proportional weight policy files | ||||
| 	  dev     weight | ||||
| 	  8:16    300 | ||||
| 
 | ||||
| - blkio.leaf_weight[_device] | ||||
| 	- Equivalents of blkio.weight[_device] for the purpose of | ||||
|           deciding how much weight tasks in the given cgroup have while | ||||
|           competing with the cgroup's child cgroups. For details, | ||||
|           please refer to Documentation/block/cfq-iosched.txt. | ||||
| 
 | ||||
| - blkio.time | ||||
| 	- disk time allocated to cgroup per device in milliseconds. First | ||||
| 	  two fields specify the major and minor number of the device and | ||||
| @ -279,6 +287,11 @@ Proportional weight policy files | ||||
| 	  and minor number of the device and third field specifies the number | ||||
| 	  of times a group was dequeued from a particular device. | ||||
| 
 | ||||
| - blkio.*_recursive | ||||
| 	- Recursive version of various stats. These files show the | ||||
|           same information as their non-recursive counterparts but | ||||
|           include stats from all the descendant cgroups. | ||||
| 
 | ||||
| Throttling/Upper limit policy files | ||||
| ----------------------------------- | ||||
| - blkio.throttle.read_bps_device | ||||
|  | ||||
| @ -4,7 +4,6 @@ | ||||
| menuconfig BLOCK | ||||
|        bool "Enable the block layer" if EXPERT | ||||
|        default y | ||||
|        select PERCPU_RWSEM | ||||
|        help | ||||
| 	 Provide block layer support for the kernel. | ||||
| 
 | ||||
|  | ||||
| @ -26,11 +26,32 @@ | ||||
| 
 | ||||
| static DEFINE_MUTEX(blkcg_pol_mutex); | ||||
| 
 | ||||
| struct blkcg blkcg_root = { .cfq_weight = 2 * CFQ_WEIGHT_DEFAULT }; | ||||
| struct blkcg blkcg_root = { .cfq_weight = 2 * CFQ_WEIGHT_DEFAULT, | ||||
| 			    .cfq_leaf_weight = 2 * CFQ_WEIGHT_DEFAULT, }; | ||||
| EXPORT_SYMBOL_GPL(blkcg_root); | ||||
| 
 | ||||
| static struct blkcg_policy *blkcg_policy[BLKCG_MAX_POLS]; | ||||
| 
 | ||||
| static struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, | ||||
| 				      struct request_queue *q, bool update_hint); | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_for_each_descendant_pre - pre-order walk of a blkg's descendants | ||||
|  * @d_blkg: loop cursor pointing to the current descendant | ||||
|  * @pos_cgrp: used for iteration | ||||
|  * @p_blkg: target blkg to walk descendants of | ||||
|  * | ||||
|  * Walk @d_blkg through the descendants of @p_blkg.  Must be used with RCU | ||||
|  * read locked.  If called under either blkcg or queue lock, the iteration | ||||
|  * is guaranteed to include all and only online blkgs.  The caller may | ||||
|  * update @pos_cgrp by calling cgroup_rightmost_descendant() to skip | ||||
|  * subtree. | ||||
|  */ | ||||
| #define blkg_for_each_descendant_pre(d_blkg, pos_cgrp, p_blkg)		\ | ||||
| 	cgroup_for_each_descendant_pre((pos_cgrp), (p_blkg)->blkcg->css.cgroup) \ | ||||
| 		if (((d_blkg) = __blkg_lookup(cgroup_to_blkcg(pos_cgrp), \ | ||||
| 					      (p_blkg)->q, false))) | ||||
| 
 | ||||
| static bool blkcg_policy_enabled(struct request_queue *q, | ||||
| 				 const struct blkcg_policy *pol) | ||||
| { | ||||
| @ -112,9 +133,10 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q, | ||||
| 
 | ||||
| 		blkg->pd[i] = pd; | ||||
| 		pd->blkg = blkg; | ||||
| 		pd->plid = i; | ||||
| 
 | ||||
| 		/* invoke per-policy init */ | ||||
| 		if (blkcg_policy_enabled(blkg->q, pol)) | ||||
| 		if (pol->pd_init_fn) | ||||
| 			pol->pd_init_fn(blkg); | ||||
| 	} | ||||
| 
 | ||||
| @ -125,8 +147,19 @@ err_free: | ||||
| 	return NULL; | ||||
| } | ||||
| 
 | ||||
| /**
 | ||||
|  * __blkg_lookup - internal version of blkg_lookup() | ||||
|  * @blkcg: blkcg of interest | ||||
|  * @q: request_queue of interest | ||||
|  * @update_hint: whether to update lookup hint with the result or not | ||||
|  * | ||||
|  * This is internal version and shouldn't be used by policy | ||||
|  * implementations.  Looks up blkgs for the @blkcg - @q pair regardless of | ||||
|  * @q's bypass state.  If @update_hint is %true, the caller should be | ||||
|  * holding @q->queue_lock and lookup hint is updated on success. | ||||
|  */ | ||||
| static struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, | ||||
| 				      struct request_queue *q) | ||||
| 				      struct request_queue *q, bool update_hint) | ||||
| { | ||||
| 	struct blkcg_gq *blkg; | ||||
| 
 | ||||
| @ -135,14 +168,19 @@ static struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, | ||||
| 		return blkg; | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * Hint didn't match.  Look up from the radix tree.  Note that we | ||||
| 	 * may not be holding queue_lock and thus are not sure whether | ||||
| 	 * @blkg from blkg_tree has already been removed or not, so we | ||||
| 	 * can't update hint to the lookup result.  Leave it to the caller. | ||||
| 	 * Hint didn't match.  Look up from the radix tree.  Note that the | ||||
| 	 * hint can only be updated under queue_lock as otherwise @blkg | ||||
| 	 * could have already been removed from blkg_tree.  The caller is | ||||
| 	 * responsible for grabbing queue_lock if @update_hint. | ||||
| 	 */ | ||||
| 	blkg = radix_tree_lookup(&blkcg->blkg_tree, q->id); | ||||
| 	if (blkg && blkg->q == q) | ||||
| 	if (blkg && blkg->q == q) { | ||||
| 		if (update_hint) { | ||||
| 			lockdep_assert_held(q->queue_lock); | ||||
| 			rcu_assign_pointer(blkcg->blkg_hint, blkg); | ||||
| 		} | ||||
| 		return blkg; | ||||
| 	} | ||||
| 
 | ||||
| 	return NULL; | ||||
| } | ||||
| @ -162,7 +200,7 @@ struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, struct request_queue *q) | ||||
| 
 | ||||
| 	if (unlikely(blk_queue_bypass(q))) | ||||
| 		return NULL; | ||||
| 	return __blkg_lookup(blkcg, q); | ||||
| 	return __blkg_lookup(blkcg, q, false); | ||||
| } | ||||
| EXPORT_SYMBOL_GPL(blkg_lookup); | ||||
| 
 | ||||
| @ -170,75 +208,129 @@ EXPORT_SYMBOL_GPL(blkg_lookup); | ||||
|  * If @new_blkg is %NULL, this function tries to allocate a new one as | ||||
|  * necessary using %GFP_ATOMIC.  @new_blkg is always consumed on return. | ||||
|  */ | ||||
| static struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg, | ||||
| 					     struct request_queue *q, | ||||
| 					     struct blkcg_gq *new_blkg) | ||||
| static struct blkcg_gq *blkg_create(struct blkcg *blkcg, | ||||
| 				    struct request_queue *q, | ||||
| 				    struct blkcg_gq *new_blkg) | ||||
| { | ||||
| 	struct blkcg_gq *blkg; | ||||
| 	int ret; | ||||
| 	int i, ret; | ||||
| 
 | ||||
| 	WARN_ON_ONCE(!rcu_read_lock_held()); | ||||
| 	lockdep_assert_held(q->queue_lock); | ||||
| 
 | ||||
| 	/* lookup and update hint on success, see __blkg_lookup() for details */ | ||||
| 	blkg = __blkg_lookup(blkcg, q); | ||||
| 	if (blkg) { | ||||
| 		rcu_assign_pointer(blkcg->blkg_hint, blkg); | ||||
| 		goto out_free; | ||||
| 	} | ||||
| 
 | ||||
| 	/* blkg holds a reference to blkcg */ | ||||
| 	if (!css_tryget(&blkcg->css)) { | ||||
| 		blkg = ERR_PTR(-EINVAL); | ||||
| 		goto out_free; | ||||
| 		ret = -EINVAL; | ||||
| 		goto err_free_blkg; | ||||
| 	} | ||||
| 
 | ||||
| 	/* allocate */ | ||||
| 	if (!new_blkg) { | ||||
| 		new_blkg = blkg_alloc(blkcg, q, GFP_ATOMIC); | ||||
| 		if (unlikely(!new_blkg)) { | ||||
| 			blkg = ERR_PTR(-ENOMEM); | ||||
| 			goto out_put; | ||||
| 			ret = -ENOMEM; | ||||
| 			goto err_put_css; | ||||
| 		} | ||||
| 	} | ||||
| 	blkg = new_blkg; | ||||
| 
 | ||||
| 	/* insert */ | ||||
| 	/* link parent and insert */ | ||||
| 	if (blkcg_parent(blkcg)) { | ||||
| 		blkg->parent = __blkg_lookup(blkcg_parent(blkcg), q, false); | ||||
| 		if (WARN_ON_ONCE(!blkg->parent)) { | ||||
| 			blkg = ERR_PTR(-EINVAL); | ||||
| 			goto err_put_css; | ||||
| 		} | ||||
| 		blkg_get(blkg->parent); | ||||
| 	} | ||||
| 
 | ||||
| 	spin_lock(&blkcg->lock); | ||||
| 	ret = radix_tree_insert(&blkcg->blkg_tree, q->id, blkg); | ||||
| 	if (likely(!ret)) { | ||||
| 		hlist_add_head_rcu(&blkg->blkcg_node, &blkcg->blkg_list); | ||||
| 		list_add(&blkg->q_node, &q->blkg_list); | ||||
| 
 | ||||
| 		for (i = 0; i < BLKCG_MAX_POLS; i++) { | ||||
| 			struct blkcg_policy *pol = blkcg_policy[i]; | ||||
| 
 | ||||
| 			if (blkg->pd[i] && pol->pd_online_fn) | ||||
| 				pol->pd_online_fn(blkg); | ||||
| 		} | ||||
| 	} | ||||
| 	blkg->online = true; | ||||
| 	spin_unlock(&blkcg->lock); | ||||
| 
 | ||||
| 	if (!ret) | ||||
| 		return blkg; | ||||
| 
 | ||||
| 	blkg = ERR_PTR(ret); | ||||
| out_put: | ||||
| 	/* @blkg failed to be fully initialized, use the usual release path */ | ||||
| 	blkg_put(blkg); | ||||
| 	return ERR_PTR(ret); | ||||
| 
 | ||||
| err_put_css: | ||||
| 	css_put(&blkcg->css); | ||||
| out_free: | ||||
| err_free_blkg: | ||||
| 	blkg_free(new_blkg); | ||||
| 	return blkg; | ||||
| 	return ERR_PTR(ret); | ||||
| } | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_lookup_create - lookup blkg, try to create one if not there | ||||
|  * @blkcg: blkcg of interest | ||||
|  * @q: request_queue of interest | ||||
|  * | ||||
|  * Lookup blkg for the @blkcg - @q pair.  If it doesn't exist, try to | ||||
|  * create one.  blkg creation is performed recursively from blkcg_root such | ||||
|  * that all non-root blkg's have access to the parent blkg.  This function | ||||
|  * should be called under RCU read lock and @q->queue_lock. | ||||
|  * | ||||
|  * Returns pointer to the looked up or created blkg on success, ERR_PTR() | ||||
|  * value on error.  If @q is dead, returns ERR_PTR(-EINVAL).  If @q is not | ||||
|  * dead and bypassing, returns ERR_PTR(-EBUSY). | ||||
|  */ | ||||
| struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg, | ||||
| 				    struct request_queue *q) | ||||
| { | ||||
| 	struct blkcg_gq *blkg; | ||||
| 
 | ||||
| 	WARN_ON_ONCE(!rcu_read_lock_held()); | ||||
| 	lockdep_assert_held(q->queue_lock); | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * This could be the first entry point of blkcg implementation and | ||||
| 	 * we shouldn't allow anything to go through for a bypassing queue. | ||||
| 	 */ | ||||
| 	if (unlikely(blk_queue_bypass(q))) | ||||
| 		return ERR_PTR(blk_queue_dying(q) ? -EINVAL : -EBUSY); | ||||
| 	return __blkg_lookup_create(blkcg, q, NULL); | ||||
| 
 | ||||
| 	blkg = __blkg_lookup(blkcg, q, true); | ||||
| 	if (blkg) | ||||
| 		return blkg; | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * Create blkgs walking down from blkcg_root to @blkcg, so that all | ||||
| 	 * non-root blkgs have access to their parents. | ||||
| 	 */ | ||||
| 	while (true) { | ||||
| 		struct blkcg *pos = blkcg; | ||||
| 		struct blkcg *parent = blkcg_parent(blkcg); | ||||
| 
 | ||||
| 		while (parent && !__blkg_lookup(parent, q, false)) { | ||||
| 			pos = parent; | ||||
| 			parent = blkcg_parent(parent); | ||||
| 		} | ||||
| 
 | ||||
| 		blkg = blkg_create(pos, q, NULL); | ||||
| 		if (pos == blkcg || IS_ERR(blkg)) | ||||
| 			return blkg; | ||||
| 	} | ||||
| } | ||||
| EXPORT_SYMBOL_GPL(blkg_lookup_create); | ||||
| 
 | ||||
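As a purely illustrative caller-side sketch of the locking the kernel-doc above requires (a hypothetical policy helper, not part of this patch), a lookup/create call would be wrapped like this:

/* Hypothetical caller-side sketch: the locking blkg_lookup_create() expects. */
static struct blkcg_gq *get_blkg_sketch(struct blkcg *blkcg,
					struct request_queue *q)
{
	struct blkcg_gq *blkg;

	rcu_read_lock();
	spin_lock_irq(q->queue_lock);

	blkg = blkg_lookup_create(blkcg, q);
	if (IS_ERR(blkg))
		blkg = NULL;	/* e.g. fall back to the root group */

	spin_unlock_irq(q->queue_lock);
	rcu_read_unlock();

	return blkg;
}

Falling back to the root group on error is just one plausible policy choice for the sketch, not something this patch mandates.
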
| static void blkg_destroy(struct blkcg_gq *blkg) | ||||
| { | ||||
| 	struct blkcg *blkcg = blkg->blkcg; | ||||
| 	int i; | ||||
| 
 | ||||
| 	lockdep_assert_held(blkg->q->queue_lock); | ||||
| 	lockdep_assert_held(&blkcg->lock); | ||||
| @ -247,6 +339,14 @@ static void blkg_destroy(struct blkcg_gq *blkg) | ||||
| 	WARN_ON_ONCE(list_empty(&blkg->q_node)); | ||||
| 	WARN_ON_ONCE(hlist_unhashed(&blkg->blkcg_node)); | ||||
| 
 | ||||
| 	for (i = 0; i < BLKCG_MAX_POLS; i++) { | ||||
| 		struct blkcg_policy *pol = blkcg_policy[i]; | ||||
| 
 | ||||
| 		if (blkg->pd[i] && pol->pd_offline_fn) | ||||
| 			pol->pd_offline_fn(blkg); | ||||
| 	} | ||||
| 	blkg->online = false; | ||||
| 
 | ||||
| 	radix_tree_delete(&blkcg->blkg_tree, blkg->q->id); | ||||
| 	list_del_init(&blkg->q_node); | ||||
| 	hlist_del_init_rcu(&blkg->blkcg_node); | ||||
| @ -301,8 +401,10 @@ static void blkg_rcu_free(struct rcu_head *rcu_head) | ||||
| 
 | ||||
| void __blkg_release(struct blkcg_gq *blkg) | ||||
| { | ||||
| 	/* release the extra blkcg reference this blkg has been holding */ | ||||
| 	/* release the blkcg and parent blkg refs this blkg has been holding */ | ||||
| 	css_put(&blkg->blkcg->css); | ||||
| 	if (blkg->parent) | ||||
| 		blkg_put(blkg->parent); | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * A group is freed in rcu manner. But having an rcu lock does not | ||||
| @ -401,8 +503,9 @@ static const char *blkg_dev_name(struct blkcg_gq *blkg) | ||||
|  * | ||||
|  * This function invokes @prfill on each blkg of @blkcg if pd for the | ||||
|  * policy specified by @pol exists.  @prfill is invoked with @sf, the | ||||
|  * policy data and @data.  If @show_total is %true, the sum of the return | ||||
|  * values from @prfill is printed with "Total" label at the end. | ||||
|  * policy data and @data and the matching queue lock held.  If @show_total | ||||
|  * is %true, the sum of the return values from @prfill is printed with | ||||
|  * "Total" label at the end. | ||||
|  * | ||||
|  * This is to be used to construct print functions for | ||||
|  * cftype->read_seq_string method. | ||||
| @ -416,11 +519,14 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg, | ||||
| 	struct blkcg_gq *blkg; | ||||
| 	u64 total = 0; | ||||
| 
 | ||||
| 	spin_lock_irq(&blkcg->lock); | ||||
| 	hlist_for_each_entry(blkg, &blkcg->blkg_list, blkcg_node) | ||||
| 	rcu_read_lock(); | ||||
| 	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) { | ||||
| 		spin_lock_irq(blkg->q->queue_lock); | ||||
| 		if (blkcg_policy_enabled(blkg->q, pol)) | ||||
| 			total += prfill(sf, blkg->pd[pol->plid], data); | ||||
| 	spin_unlock_irq(&blkcg->lock); | ||||
| 		spin_unlock_irq(blkg->q->queue_lock); | ||||
| 	} | ||||
| 	rcu_read_unlock(); | ||||
| 
 | ||||
| 	if (show_total) | ||||
| 		seq_printf(sf, "Total %llu\n", (unsigned long long)total); | ||||
| @ -479,6 +585,7 @@ u64 __blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd, | ||||
| 	seq_printf(sf, "%s Total %llu\n", dname, (unsigned long long)v); | ||||
| 	return v; | ||||
| } | ||||
| EXPORT_SYMBOL_GPL(__blkg_prfill_rwstat); | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_prfill_stat - prfill callback for blkg_stat | ||||
| @ -511,6 +618,82 @@ u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd, | ||||
| } | ||||
| EXPORT_SYMBOL_GPL(blkg_prfill_rwstat); | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_stat_recursive_sum - collect hierarchical blkg_stat | ||||
|  * @pd: policy private data of interest | ||||
|  * @off: offset to the blkg_stat in @pd | ||||
|  * | ||||
|  * Collect the blkg_stat specified by @off from @pd and all its online | ||||
|  * descendants and return the sum.  The caller must be holding the queue | ||||
|  * lock for online tests. | ||||
|  */ | ||||
| u64 blkg_stat_recursive_sum(struct blkg_policy_data *pd, int off) | ||||
| { | ||||
| 	struct blkcg_policy *pol = blkcg_policy[pd->plid]; | ||||
| 	struct blkcg_gq *pos_blkg; | ||||
| 	struct cgroup *pos_cgrp; | ||||
| 	u64 sum; | ||||
| 
 | ||||
| 	lockdep_assert_held(pd->blkg->q->queue_lock); | ||||
| 
 | ||||
| 	sum = blkg_stat_read((void *)pd + off); | ||||
| 
 | ||||
| 	rcu_read_lock(); | ||||
| 	blkg_for_each_descendant_pre(pos_blkg, pos_cgrp, pd_to_blkg(pd)) { | ||||
| 		struct blkg_policy_data *pos_pd = blkg_to_pd(pos_blkg, pol); | ||||
| 		struct blkg_stat *stat = (void *)pos_pd + off; | ||||
| 
 | ||||
| 		if (pos_blkg->online) | ||||
| 			sum += blkg_stat_read(stat); | ||||
| 	} | ||||
| 	rcu_read_unlock(); | ||||
| 
 | ||||
| 	return sum; | ||||
| } | ||||
| EXPORT_SYMBOL_GPL(blkg_stat_recursive_sum); | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_rwstat_recursive_sum - collect hierarchical blkg_rwstat | ||||
|  * @pd: policy private data of interest | ||||
|  * @off: offset to the blkg_stat in @pd | ||||
|  * | ||||
|  * Collect the blkg_rwstat specified by @off from @pd and all its online | ||||
|  * descendants and return the sum.  The caller must be holding the queue | ||||
|  * lock for online tests. | ||||
|  */ | ||||
| struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkg_policy_data *pd, | ||||
| 					     int off) | ||||
| { | ||||
| 	struct blkcg_policy *pol = blkcg_policy[pd->plid]; | ||||
| 	struct blkcg_gq *pos_blkg; | ||||
| 	struct cgroup *pos_cgrp; | ||||
| 	struct blkg_rwstat sum; | ||||
| 	int i; | ||||
| 
 | ||||
| 	lockdep_assert_held(pd->blkg->q->queue_lock); | ||||
| 
 | ||||
| 	sum = blkg_rwstat_read((void *)pd + off); | ||||
| 
 | ||||
| 	rcu_read_lock(); | ||||
| 	blkg_for_each_descendant_pre(pos_blkg, pos_cgrp, pd_to_blkg(pd)) { | ||||
| 		struct blkg_policy_data *pos_pd = blkg_to_pd(pos_blkg, pol); | ||||
| 		struct blkg_rwstat *rwstat = (void *)pos_pd + off; | ||||
| 		struct blkg_rwstat tmp; | ||||
| 
 | ||||
| 		if (!pos_blkg->online) | ||||
| 			continue; | ||||
| 
 | ||||
| 		tmp = blkg_rwstat_read(rwstat); | ||||
| 
 | ||||
| 		for (i = 0; i < BLKG_RWSTAT_NR; i++) | ||||
| 			sum.cnt[i] += tmp.cnt[i]; | ||||
| 	} | ||||
| 	rcu_read_unlock(); | ||||
| 
 | ||||
| 	return sum; | ||||
| } | ||||
| EXPORT_SYMBOL_GPL(blkg_rwstat_recursive_sum); | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_conf_prep - parse and prepare for per-blkg config update | ||||
|  * @blkcg: target block cgroup | ||||
| @ -656,6 +839,7 @@ static struct cgroup_subsys_state *blkcg_css_alloc(struct cgroup *cgroup) | ||||
| 		return ERR_PTR(-ENOMEM); | ||||
| 
 | ||||
| 	blkcg->cfq_weight = CFQ_WEIGHT_DEFAULT; | ||||
| 	blkcg->cfq_leaf_weight = CFQ_WEIGHT_DEFAULT; | ||||
| 	blkcg->id = atomic64_inc_return(&id_seq); /* root is 0, start from 1 */ | ||||
| done: | ||||
| 	spin_lock_init(&blkcg->lock); | ||||
| @ -775,7 +959,7 @@ int blkcg_activate_policy(struct request_queue *q, | ||||
| 			  const struct blkcg_policy *pol) | ||||
| { | ||||
| 	LIST_HEAD(pds); | ||||
| 	struct blkcg_gq *blkg; | ||||
| 	struct blkcg_gq *blkg, *new_blkg; | ||||
| 	struct blkg_policy_data *pd, *n; | ||||
| 	int cnt = 0, ret; | ||||
| 	bool preloaded; | ||||
| @ -784,19 +968,27 @@ int blkcg_activate_policy(struct request_queue *q, | ||||
| 		return 0; | ||||
| 
 | ||||
| 	/* preallocations for root blkg */ | ||||
| 	blkg = blkg_alloc(&blkcg_root, q, GFP_KERNEL); | ||||
| 	if (!blkg) | ||||
| 	new_blkg = blkg_alloc(&blkcg_root, q, GFP_KERNEL); | ||||
| 	if (!new_blkg) | ||||
| 		return -ENOMEM; | ||||
| 
 | ||||
| 	preloaded = !radix_tree_preload(GFP_KERNEL); | ||||
| 
 | ||||
| 	blk_queue_bypass_start(q); | ||||
| 
 | ||||
| 	/* make sure the root blkg exists and count the existing blkgs */ | ||||
| 	/*
 | ||||
| 	 * Make sure the root blkg exists and count the existing blkgs.  As | ||||
| 	 * @q is bypassing at this point, blkg_lookup_create() can't be | ||||
| 	 * used.  Open code it. | ||||
| 	 */ | ||||
| 	spin_lock_irq(q->queue_lock); | ||||
| 
 | ||||
| 	rcu_read_lock(); | ||||
| 	blkg = __blkg_lookup_create(&blkcg_root, q, blkg); | ||||
| 	blkg = __blkg_lookup(&blkcg_root, q, false); | ||||
| 	if (blkg) | ||||
| 		blkg_free(new_blkg); | ||||
| 	else | ||||
| 		blkg = blkg_create(&blkcg_root, q, new_blkg); | ||||
| 	rcu_read_unlock(); | ||||
| 
 | ||||
| 	if (preloaded) | ||||
| @ -844,6 +1036,7 @@ int blkcg_activate_policy(struct request_queue *q, | ||||
| 
 | ||||
| 		blkg->pd[pol->plid] = pd; | ||||
| 		pd->blkg = blkg; | ||||
| 		pd->plid = pol->plid; | ||||
| 		pol->pd_init_fn(blkg); | ||||
| 
 | ||||
| 		spin_unlock(&blkg->blkcg->lock); | ||||
| @ -890,6 +1083,8 @@ void blkcg_deactivate_policy(struct request_queue *q, | ||||
| 		/* grab blkcg lock too while removing @pd from @blkg */ | ||||
| 		spin_lock(&blkg->blkcg->lock); | ||||
| 
 | ||||
| 		if (pol->pd_offline_fn) | ||||
| 			pol->pd_offline_fn(blkg); | ||||
| 		if (pol->pd_exit_fn) | ||||
| 			pol->pd_exit_fn(blkg); | ||||
| 
 | ||||
|  | ||||
| @ -54,6 +54,7 @@ struct blkcg { | ||||
| 
 | ||||
| 	/* TODO: per-policy storage in blkcg */ | ||||
| 	unsigned int			cfq_weight;	/* belongs to cfq */ | ||||
| 	unsigned int			cfq_leaf_weight; | ||||
| }; | ||||
| 
 | ||||
| struct blkg_stat { | ||||
| @ -80,8 +81,9 @@ struct blkg_rwstat { | ||||
|  * beginning and pd_size can't be smaller than pd. | ||||
|  */ | ||||
| struct blkg_policy_data { | ||||
| 	/* the blkg this per-policy data belongs to */ | ||||
| 	/* the blkg and policy id this per-policy data belongs to */ | ||||
| 	struct blkcg_gq			*blkg; | ||||
| 	int				plid; | ||||
| 
 | ||||
| 	/* used during policy activation */ | ||||
| 	struct list_head		alloc_node; | ||||
| @ -94,17 +96,27 @@ struct blkcg_gq { | ||||
| 	struct list_head		q_node; | ||||
| 	struct hlist_node		blkcg_node; | ||||
| 	struct blkcg			*blkcg; | ||||
| 
 | ||||
| 	/* all non-root blkcg_gq's are guaranteed to have access to parent */ | ||||
| 	struct blkcg_gq			*parent; | ||||
| 
 | ||||
| 	/* request allocation list for this blkcg-q pair */ | ||||
| 	struct request_list		rl; | ||||
| 
 | ||||
| 	/* reference count */ | ||||
| 	int				refcnt; | ||||
| 
 | ||||
| 	/* is this blkg online? protected by both blkcg and q locks */ | ||||
| 	bool				online; | ||||
| 
 | ||||
| 	struct blkg_policy_data		*pd[BLKCG_MAX_POLS]; | ||||
| 
 | ||||
| 	struct rcu_head			rcu_head; | ||||
| }; | ||||
| 
 | ||||
| typedef void (blkcg_pol_init_pd_fn)(struct blkcg_gq *blkg); | ||||
| typedef void (blkcg_pol_online_pd_fn)(struct blkcg_gq *blkg); | ||||
| typedef void (blkcg_pol_offline_pd_fn)(struct blkcg_gq *blkg); | ||||
| typedef void (blkcg_pol_exit_pd_fn)(struct blkcg_gq *blkg); | ||||
| typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkcg_gq *blkg); | ||||
| 
 | ||||
| @ -117,6 +129,8 @@ struct blkcg_policy { | ||||
| 
 | ||||
| 	/* operations */ | ||||
| 	blkcg_pol_init_pd_fn		*pd_init_fn; | ||||
| 	blkcg_pol_online_pd_fn		*pd_online_fn; | ||||
| 	blkcg_pol_offline_pd_fn		*pd_offline_fn; | ||||
| 	blkcg_pol_exit_pd_fn		*pd_exit_fn; | ||||
| 	blkcg_pol_reset_pd_stats_fn	*pd_reset_stats_fn; | ||||
| }; | ||||
| @ -150,6 +164,10 @@ u64 blkg_prfill_stat(struct seq_file *sf, struct blkg_policy_data *pd, int off); | ||||
| u64 blkg_prfill_rwstat(struct seq_file *sf, struct blkg_policy_data *pd, | ||||
| 		       int off); | ||||
| 
 | ||||
| u64 blkg_stat_recursive_sum(struct blkg_policy_data *pd, int off); | ||||
| struct blkg_rwstat blkg_rwstat_recursive_sum(struct blkg_policy_data *pd, | ||||
| 					     int off); | ||||
| 
 | ||||
| struct blkg_conf_ctx { | ||||
| 	struct gendisk			*disk; | ||||
| 	struct blkcg_gq			*blkg; | ||||
| @ -180,6 +198,19 @@ static inline struct blkcg *bio_blkcg(struct bio *bio) | ||||
| 	return task_blkcg(current); | ||||
| } | ||||
| 
 | ||||
| /**
 | ||||
|  * blkcg_parent - get the parent of a blkcg | ||||
|  * @blkcg: blkcg of interest | ||||
|  * | ||||
|  * Return the parent blkcg of @blkcg.  Can be called anytime. | ||||
|  */ | ||||
| static inline struct blkcg *blkcg_parent(struct blkcg *blkcg) | ||||
| { | ||||
| 	struct cgroup *pcg = blkcg->css.cgroup->parent; | ||||
| 
 | ||||
| 	return pcg ? cgroup_to_blkcg(pcg) : NULL; | ||||
| } | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_to_pdata - get policy private data | ||||
|  * @blkg: blkg of interest | ||||
| @ -386,6 +417,18 @@ static inline void blkg_stat_reset(struct blkg_stat *stat) | ||||
| 	stat->cnt = 0; | ||||
| } | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_stat_merge - merge a blkg_stat into another | ||||
|  * @to: the destination blkg_stat | ||||
|  * @from: the source | ||||
|  * | ||||
|  * Add @from's count to @to. | ||||
|  */ | ||||
| static inline void blkg_stat_merge(struct blkg_stat *to, struct blkg_stat *from) | ||||
| { | ||||
| 	blkg_stat_add(to, blkg_stat_read(from)); | ||||
| } | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_rwstat_add - add a value to a blkg_rwstat | ||||
|  * @rwstat: target blkg_rwstat | ||||
| @ -434,14 +477,14 @@ static inline struct blkg_rwstat blkg_rwstat_read(struct blkg_rwstat *rwstat) | ||||
| } | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_rwstat_sum - read the total count of a blkg_rwstat | ||||
|  * blkg_rwstat_total - read the total count of a blkg_rwstat | ||||
|  * @rwstat: blkg_rwstat to read | ||||
|  * | ||||
|  * Return the total count of @rwstat regardless of the IO direction.  This | ||||
|  * function can be called without synchronization and takes care of u64 | ||||
|  * atomicity. | ||||
|  */ | ||||
| static inline uint64_t blkg_rwstat_sum(struct blkg_rwstat *rwstat) | ||||
| static inline uint64_t blkg_rwstat_total(struct blkg_rwstat *rwstat) | ||||
| { | ||||
| 	struct blkg_rwstat tmp = blkg_rwstat_read(rwstat); | ||||
| 
 | ||||
| @ -457,6 +500,25 @@ static inline void blkg_rwstat_reset(struct blkg_rwstat *rwstat) | ||||
| 	memset(rwstat->cnt, 0, sizeof(rwstat->cnt)); | ||||
| } | ||||
| 
 | ||||
| /**
 | ||||
|  * blkg_rwstat_merge - merge a blkg_rwstat into another | ||||
|  * @to: the destination blkg_rwstat | ||||
|  * @from: the source | ||||
|  * | ||||
|  * Add @from's counts to @to. | ||||
|  */ | ||||
| static inline void blkg_rwstat_merge(struct blkg_rwstat *to, | ||||
| 				     struct blkg_rwstat *from) | ||||
| { | ||||
| 	struct blkg_rwstat v = blkg_rwstat_read(from); | ||||
| 	int i; | ||||
| 
 | ||||
| 	u64_stats_update_begin(&to->syncp); | ||||
| 	for (i = 0; i < BLKG_RWSTAT_NR; i++) | ||||
| 		to->cnt[i] += v.cnt[i]; | ||||
| 	u64_stats_update_end(&to->syncp); | ||||
| } | ||||
| 
 | ||||
| #else	/* CONFIG_BLK_CGROUP */ | ||||
| 
 | ||||
| struct cgroup; | ||||
|  | ||||
| @ -39,7 +39,6 @@ | ||||
| 
 | ||||
| EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap); | ||||
| EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap); | ||||
| EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_complete); | ||||
| EXPORT_TRACEPOINT_SYMBOL_GPL(block_unplug); | ||||
| 
 | ||||
| DEFINE_IDA(blk_queue_ida); | ||||
| @ -1348,7 +1347,7 @@ static bool bio_attempt_back_merge(struct request_queue *q, struct request *req, | ||||
| 	if (!ll_back_merge_fn(q, req, bio)) | ||||
| 		return false; | ||||
| 
 | ||||
| 	trace_block_bio_backmerge(q, bio); | ||||
| 	trace_block_bio_backmerge(q, req, bio); | ||||
| 
 | ||||
| 	if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) | ||||
| 		blk_rq_set_mixed_merge(req); | ||||
| @ -1370,7 +1369,7 @@ static bool bio_attempt_front_merge(struct request_queue *q, | ||||
| 	if (!ll_front_merge_fn(q, req, bio)) | ||||
| 		return false; | ||||
| 
 | ||||
| 	trace_block_bio_frontmerge(q, bio); | ||||
| 	trace_block_bio_frontmerge(q, req, bio); | ||||
| 
 | ||||
| 	if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff) | ||||
| 		blk_rq_set_mixed_merge(req); | ||||
| @ -1553,13 +1552,6 @@ get_rq: | ||||
| 		if (list_empty(&plug->list)) | ||||
| 			trace_block_plug(q); | ||||
| 		else { | ||||
| 			if (!plug->should_sort) { | ||||
| 				struct request *__rq; | ||||
| 
 | ||||
| 				__rq = list_entry_rq(plug->list.prev); | ||||
| 				if (__rq->q != q) | ||||
| 					plug->should_sort = 1; | ||||
| 			} | ||||
| 			if (request_count >= BLK_MAX_REQUEST_COUNT) { | ||||
| 				blk_flush_plug_list(plug, false); | ||||
| 				trace_block_plug(q); | ||||
| @ -2890,7 +2882,6 @@ void blk_start_plug(struct blk_plug *plug) | ||||
| 	plug->magic = PLUG_MAGIC; | ||||
| 	INIT_LIST_HEAD(&plug->list); | ||||
| 	INIT_LIST_HEAD(&plug->cb_list); | ||||
| 	plug->should_sort = 0; | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * If this is a nested plug, don't actually assign it. It will be | ||||
| @ -2992,10 +2983,7 @@ void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule) | ||||
| 
 | ||||
| 	list_splice_init(&plug->list, &list); | ||||
| 
 | ||||
| 	if (plug->should_sort) { | ||||
| 		list_sort(NULL, &list, plug_rq_cmp); | ||||
| 		plug->should_sort = 0; | ||||
| 	} | ||||
| 	list_sort(NULL, &list, plug_rq_cmp); | ||||
| 
 | ||||
| 	q = NULL; | ||||
| 	depth = 0; | ||||
|  | ||||
| @ -121,9 +121,9 @@ int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk, | ||||
| 	/* Prevent hang_check timer from firing at us during very long I/O */ | ||||
| 	hang_check = sysctl_hung_task_timeout_secs; | ||||
| 	if (hang_check) | ||||
| 		while (!wait_for_completion_timeout(&wait, hang_check * (HZ/2))); | ||||
| 		while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2))); | ||||
| 	else | ||||
| 		wait_for_completion(&wait); | ||||
| 		wait_for_completion_io(&wait); | ||||
| 
 | ||||
| 	if (rq->errors) | ||||
| 		err = -EIO; | ||||
|  | ||||
| @ -436,7 +436,7 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask, | ||||
| 
 | ||||
| 	bio_get(bio); | ||||
| 	submit_bio(WRITE_FLUSH, bio); | ||||
| 	wait_for_completion(&wait); | ||||
| 	wait_for_completion_io(&wait); | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * The driver must store the error location in ->bi_sector, if | ||||
|  | ||||
| @ -126,7 +126,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, | ||||
| 
 | ||||
| 	/* Wait for bios in-flight */ | ||||
| 	if (!atomic_dec_and_test(&bb.done)) | ||||
| 		wait_for_completion(&wait); | ||||
| 		wait_for_completion_io(&wait); | ||||
| 
 | ||||
| 	if (!test_bit(BIO_UPTODATE, &bb.flags)) | ||||
| 		ret = -EIO; | ||||
| @ -200,7 +200,7 @@ int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, | ||||
| 
 | ||||
| 	/* Wait for bios in-flight */ | ||||
| 	if (!atomic_dec_and_test(&bb.done)) | ||||
| 		wait_for_completion(&wait); | ||||
| 		wait_for_completion_io(&wait); | ||||
| 
 | ||||
| 	if (!test_bit(BIO_UPTODATE, &bb.flags)) | ||||
| 		ret = -ENOTSUPP; | ||||
| @ -262,7 +262,7 @@ int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, | ||||
| 
 | ||||
| 	/* Wait for bios in-flight */ | ||||
| 	if (!atomic_dec_and_test(&bb.done)) | ||||
| 		wait_for_completion(&wait); | ||||
| 		wait_for_completion_io(&wait); | ||||
| 
 | ||||
| 	if (!test_bit(BIO_UPTODATE, &bb.flags)) | ||||
| 		/* One of bios in the batch was completed with error.*/ | ||||
|  | ||||
| @ -497,6 +497,13 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr, | ||||
| 	return res; | ||||
| } | ||||
| 
 | ||||
| static void blk_free_queue_rcu(struct rcu_head *rcu_head) | ||||
| { | ||||
| 	struct request_queue *q = container_of(rcu_head, struct request_queue, | ||||
| 					       rcu_head); | ||||
| 	kmem_cache_free(blk_requestq_cachep, q); | ||||
| } | ||||
| 
 | ||||
| /**
 | ||||
|  * blk_release_queue: - release a &struct request_queue when it is no longer needed | ||||
|  * @kobj:    the kobj belonging to the request queue to be released | ||||
| @ -538,7 +545,7 @@ static void blk_release_queue(struct kobject *kobj) | ||||
| 	bdi_destroy(&q->backing_dev_info); | ||||
| 
 | ||||
| 	ida_simple_remove(&blk_queue_ida, q->id); | ||||
| 	kmem_cache_free(blk_requestq_cachep, q); | ||||
| 	call_rcu(&q->rcu_head, blk_free_queue_rcu); | ||||
| } | ||||
| 
 | ||||
| static const struct sysfs_ops queue_sysfs_ops = { | ||||
|  | ||||
| @ -61,7 +61,7 @@ static inline void blk_clear_rq_complete(struct request *rq) | ||||
| /*
 | ||||
|  * Internal elevator interface | ||||
|  */ | ||||
| #define ELV_ON_HASH(rq)		(!hlist_unhashed(&(rq)->hash)) | ||||
| #define ELV_ON_HASH(rq) hash_hashed(&(rq)->hash) | ||||
| 
 | ||||
| void blk_insert_flush(struct request *rq); | ||||
| void blk_abort_flushes(struct request_queue *q); | ||||
|  | ||||
										
											
File diff suppressed because it is too large
							| @ -46,11 +46,6 @@ static LIST_HEAD(elv_list); | ||||
| /*
 | ||||
|  * Merge hash stuff. | ||||
|  */ | ||||
| static const int elv_hash_shift = 6; | ||||
| #define ELV_HASH_BLOCK(sec)	((sec) >> 3) | ||||
| #define ELV_HASH_FN(sec)	\ | ||||
| 		(hash_long(ELV_HASH_BLOCK((sec)), elv_hash_shift)) | ||||
| #define ELV_HASH_ENTRIES	(1 << elv_hash_shift) | ||||
| #define rq_hash_key(rq)		(blk_rq_pos(rq) + blk_rq_sectors(rq)) | ||||
| 
 | ||||
| /*
 | ||||
| @ -158,7 +153,6 @@ static struct elevator_queue *elevator_alloc(struct request_queue *q, | ||||
| 				  struct elevator_type *e) | ||||
| { | ||||
| 	struct elevator_queue *eq; | ||||
| 	int i; | ||||
| 
 | ||||
| 	eq = kmalloc_node(sizeof(*eq), GFP_KERNEL | __GFP_ZERO, q->node); | ||||
| 	if (unlikely(!eq)) | ||||
| @ -167,14 +161,7 @@ static struct elevator_queue *elevator_alloc(struct request_queue *q, | ||||
| 	eq->type = e; | ||||
| 	kobject_init(&eq->kobj, &elv_ktype); | ||||
| 	mutex_init(&eq->sysfs_lock); | ||||
| 
 | ||||
| 	eq->hash = kmalloc_node(sizeof(struct hlist_head) * ELV_HASH_ENTRIES, | ||||
| 					GFP_KERNEL, q->node); | ||||
| 	if (!eq->hash) | ||||
| 		goto err; | ||||
| 
 | ||||
| 	for (i = 0; i < ELV_HASH_ENTRIES; i++) | ||||
| 		INIT_HLIST_HEAD(&eq->hash[i]); | ||||
| 	hash_init(eq->hash); | ||||
| 
 | ||||
| 	return eq; | ||||
| err: | ||||
| @ -189,7 +176,6 @@ static void elevator_release(struct kobject *kobj) | ||||
| 
 | ||||
| 	e = container_of(kobj, struct elevator_queue, kobj); | ||||
| 	elevator_put(e->type); | ||||
| 	kfree(e->hash); | ||||
| 	kfree(e); | ||||
| } | ||||
| 
 | ||||
| @ -261,7 +247,7 @@ EXPORT_SYMBOL(elevator_exit); | ||||
| 
 | ||||
| static inline void __elv_rqhash_del(struct request *rq) | ||||
| { | ||||
| 	hlist_del_init(&rq->hash); | ||||
| 	hash_del(&rq->hash); | ||||
| } | ||||
| 
 | ||||
| static void elv_rqhash_del(struct request_queue *q, struct request *rq) | ||||
| @ -275,7 +261,7 @@ static void elv_rqhash_add(struct request_queue *q, struct request *rq) | ||||
| 	struct elevator_queue *e = q->elevator; | ||||
| 
 | ||||
| 	BUG_ON(ELV_ON_HASH(rq)); | ||||
| 	hlist_add_head(&rq->hash, &e->hash[ELV_HASH_FN(rq_hash_key(rq))]); | ||||
| 	hash_add(e->hash, &rq->hash, rq_hash_key(rq)); | ||||
| } | ||||
| 
 | ||||
| static void elv_rqhash_reposition(struct request_queue *q, struct request *rq) | ||||
| @ -287,11 +273,10 @@ static void elv_rqhash_reposition(struct request_queue *q, struct request *rq) | ||||
| static struct request *elv_rqhash_find(struct request_queue *q, sector_t offset) | ||||
| { | ||||
| 	struct elevator_queue *e = q->elevator; | ||||
| 	struct hlist_head *hash_list = &e->hash[ELV_HASH_FN(offset)]; | ||||
| 	struct hlist_node *next; | ||||
| 	struct request *rq; | ||||
| 
 | ||||
| 	hlist_for_each_entry_safe(rq, next, hash_list, hash) { | ||||
| 	hash_for_each_possible_safe(e->hash, rq, next, hash, offset) { | ||||
| 		BUG_ON(!ELV_ON_HASH(rq)); | ||||
| 
 | ||||
| 		if (unlikely(!rq_mergeable(rq))) { | ||||
|  | ||||
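For context on the conversion above, here is a small self-contained sketch of the generic linux/hashtable.h API the elevator hash now uses (identifiers below are illustrative, not from this patch). In the elevator the table is embedded in struct elevator_queue via DECLARE_HASHTABLE() and set up with hash_init(); the sketch uses a static DEFINE_HASHTABLE() to stay self-contained.

#include <linux/hashtable.h>
#include <linux/types.h>

#define EXAMPLE_HASH_BITS	6	/* 64 buckets, like ELV_HASH_BITS */

struct example_item {
	sector_t		key;
	struct hlist_node	node;
};

static DEFINE_HASHTABLE(example_table, EXAMPLE_HASH_BITS);

static void example_insert(struct example_item *item)
{
	/* bucket is chosen by hashing the key; no open-coded ELV_HASH_FN */
	hash_add(example_table, &item->node, item->key);
}

static struct example_item *example_find_and_del(sector_t key)
{
	struct example_item *cur;
	struct hlist_node *tmp;

	/* walks only the bucket that 'key' hashes to */
	hash_for_each_possible_safe(example_table, cur, tmp, node, key) {
		if (cur->key == key) {
			hash_del(&cur->node);
			return cur;
		}
	}
	return NULL;
}
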
| @ -1090,10 +1090,13 @@ static const struct block_device_operations floppy_fops = { | ||||
| static void swim3_mb_event(struct macio_dev* mdev, int mb_state) | ||||
| { | ||||
| 	struct floppy_state *fs = macio_get_drvdata(mdev); | ||||
| 	struct swim3 __iomem *sw = fs->swim3; | ||||
| 	struct swim3 __iomem *sw; | ||||
| 
 | ||||
| 	if (!fs) | ||||
| 		return; | ||||
| 
 | ||||
| 	sw = fs->swim3; | ||||
| 
 | ||||
| 	if (mb_state != MB_FD) | ||||
| 		return; | ||||
| 
 | ||||
|  | ||||
| @ -626,7 +626,6 @@ static void dec_pending(struct dm_io *io, int error) | ||||
| 			queue_io(md, bio); | ||||
| 		} else { | ||||
| 			/* done with normal IO or empty flush */ | ||||
| 			trace_block_bio_complete(md->queue, bio, io_error); | ||||
| 			bio_endio(bio, io_error); | ||||
| 		} | ||||
| 	} | ||||
|  | ||||
| @ -184,8 +184,6 @@ static void return_io(struct bio *return_bi) | ||||
| 		return_bi = bi->bi_next; | ||||
| 		bi->bi_next = NULL; | ||||
| 		bi->bi_size = 0; | ||||
| 		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev), | ||||
| 					 bi, 0); | ||||
| 		bio_endio(bi, 0); | ||||
| 		bi = return_bi; | ||||
| 	} | ||||
| @ -3916,8 +3914,6 @@ static void raid5_align_endio(struct bio *bi, int error) | ||||
| 	rdev_dec_pending(rdev, conf->mddev); | ||||
| 
 | ||||
| 	if (!error && uptodate) { | ||||
| 		trace_block_bio_complete(bdev_get_queue(raid_bi->bi_bdev), | ||||
| 					 raid_bi, 0); | ||||
| 		bio_endio(raid_bi, 0); | ||||
| 		if (atomic_dec_and_test(&conf->active_aligned_reads)) | ||||
| 			wake_up(&conf->wait_for_stripe); | ||||
| @ -4376,8 +4372,6 @@ static void make_request(struct mddev *mddev, struct bio * bi) | ||||
| 		if ( rw == WRITE ) | ||||
| 			md_write_end(mddev); | ||||
| 
 | ||||
| 		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev), | ||||
| 					 bi, 0); | ||||
| 		bio_endio(bi, 0); | ||||
| 	} | ||||
| } | ||||
| @ -4754,11 +4748,8 @@ static int  retry_aligned_read(struct r5conf *conf, struct bio *raid_bio) | ||||
| 		handled++; | ||||
| 	} | ||||
| 	remaining = raid5_dec_bi_active_stripes(raid_bio); | ||||
| 	if (remaining == 0) { | ||||
| 		trace_block_bio_complete(bdev_get_queue(raid_bio->bi_bdev), | ||||
| 					 raid_bio, 0); | ||||
| 	if (remaining == 0) | ||||
| 		bio_endio(raid_bio, 0); | ||||
| 	} | ||||
| 	if (atomic_dec_and_test(&conf->active_aligned_reads)) | ||||
| 		wake_up(&conf->wait_for_stripe); | ||||
| 	return handled; | ||||
|  | ||||
							
								
								
									
fs/bio.c
							| @ -1428,6 +1428,8 @@ void bio_endio(struct bio *bio, int error) | ||||
| 	else if (!test_bit(BIO_UPTODATE, &bio->bi_flags)) | ||||
| 		error = -EIO; | ||||
| 
 | ||||
| 	trace_block_bio_complete(bio, error); | ||||
| 
 | ||||
| 	if (bio->bi_end_io) | ||||
| 		bio->bi_end_io(bio, error); | ||||
| } | ||||
|  | ||||
| @ -1033,7 +1033,9 @@ void bd_set_size(struct block_device *bdev, loff_t size) | ||||
| { | ||||
| 	unsigned bsize = bdev_logical_block_size(bdev); | ||||
| 
 | ||||
| 	bdev->bd_inode->i_size = size; | ||||
| 	mutex_lock(&bdev->bd_inode->i_mutex); | ||||
| 	i_size_write(bdev->bd_inode, size); | ||||
| 	mutex_unlock(&bdev->bd_inode->i_mutex); | ||||
| 	while (bsize < PAGE_CACHE_SIZE) { | ||||
| 		if (size & bsize) | ||||
| 			break; | ||||
| @ -1118,7 +1120,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part) | ||||
| 				} | ||||
| 			} | ||||
| 
 | ||||
| 			if (!ret && !bdev->bd_openers) { | ||||
| 			if (!ret) { | ||||
| 				bd_set_size(bdev,(loff_t)get_capacity(disk)<<9); | ||||
| 				bdi = blk_get_backing_dev_info(bdev); | ||||
| 				if (bdi == NULL) | ||||
|  | ||||
							
								
								
									
fs/buffer.c
							| @ -41,6 +41,7 @@ | ||||
| #include <linux/bitops.h> | ||||
| #include <linux/mpage.h> | ||||
| #include <linux/bit_spinlock.h> | ||||
| #include <trace/events/block.h> | ||||
| 
 | ||||
| static int fsync_buffers_list(spinlock_t *lock, struct list_head *list); | ||||
| 
 | ||||
| @ -53,6 +54,13 @@ void init_buffer(struct buffer_head *bh, bh_end_io_t *handler, void *private) | ||||
| } | ||||
| EXPORT_SYMBOL(init_buffer); | ||||
| 
 | ||||
| inline void touch_buffer(struct buffer_head *bh) | ||||
| { | ||||
| 	trace_block_touch_buffer(bh); | ||||
| 	mark_page_accessed(bh->b_page); | ||||
| } | ||||
| EXPORT_SYMBOL(touch_buffer); | ||||
| 
 | ||||
| static int sleep_on_buffer(void *word) | ||||
| { | ||||
| 	io_schedule(); | ||||
| @ -1113,6 +1121,8 @@ void mark_buffer_dirty(struct buffer_head *bh) | ||||
| { | ||||
| 	WARN_ON_ONCE(!buffer_uptodate(bh)); | ||||
| 
 | ||||
| 	trace_block_dirty_buffer(bh); | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * Very *carefully* optimize the it-is-already-dirty case. | ||||
| 	 * | ||||
|  | ||||
| @ -318,8 +318,14 @@ static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work) | ||||
| 
 | ||||
| static int write_inode(struct inode *inode, struct writeback_control *wbc) | ||||
| { | ||||
| 	if (inode->i_sb->s_op->write_inode && !is_bad_inode(inode)) | ||||
| 		return inode->i_sb->s_op->write_inode(inode, wbc); | ||||
| 	int ret; | ||||
| 
 | ||||
| 	if (inode->i_sb->s_op->write_inode && !is_bad_inode(inode)) { | ||||
| 		trace_writeback_write_inode_start(inode, wbc); | ||||
| 		ret = inode->i_sb->s_op->write_inode(inode, wbc); | ||||
| 		trace_writeback_write_inode(inode, wbc); | ||||
| 		return ret; | ||||
| 	} | ||||
| 	return 0; | ||||
| } | ||||
| 
 | ||||
| @ -450,6 +456,8 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) | ||||
| 
 | ||||
| 	WARN_ON(!(inode->i_state & I_SYNC)); | ||||
| 
 | ||||
| 	trace_writeback_single_inode_start(inode, wbc, nr_to_write); | ||||
| 
 | ||||
| 	ret = do_writepages(mapping, wbc); | ||||
| 
 | ||||
| 	/*
 | ||||
| @ -1150,8 +1158,12 @@ void __mark_inode_dirty(struct inode *inode, int flags) | ||||
| 	 * dirty the inode itself | ||||
| 	 */ | ||||
| 	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) { | ||||
| 		trace_writeback_dirty_inode_start(inode, flags); | ||||
| 
 | ||||
| 		if (sb->s_op->dirty_inode) | ||||
| 			sb->s_op->dirty_inode(inode, flags); | ||||
| 
 | ||||
| 		trace_writeback_dirty_inode(inode, flags); | ||||
| 	} | ||||
| 
 | ||||
| 	/*
 | ||||
|  | ||||
| @ -19,6 +19,7 @@ | ||||
| #include <linux/gfp.h> | ||||
| #include <linux/bsg.h> | ||||
| #include <linux/smp.h> | ||||
| #include <linux/rcupdate.h> | ||||
| 
 | ||||
| #include <asm/scatterlist.h> | ||||
| 
 | ||||
| @ -437,6 +438,7 @@ struct request_queue { | ||||
| 	/* Throttle data */ | ||||
| 	struct throtl_data *td; | ||||
| #endif | ||||
| 	struct rcu_head		rcu_head; | ||||
| }; | ||||
| 
 | ||||
| #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */ | ||||
| @ -974,7 +976,6 @@ struct blk_plug { | ||||
| 	unsigned long magic; /* detect uninitialized use-cases */ | ||||
| 	struct list_head list; /* requests */ | ||||
| 	struct list_head cb_list; /* md requires an unplug callback */ | ||||
| 	unsigned int should_sort; /* list to be sorted before flushing? */ | ||||
| }; | ||||
| #define BLK_MAX_REQUEST_COUNT 16 | ||||
| 
 | ||||
|  | ||||
| @ -12,6 +12,7 @@ | ||||
| 
 | ||||
| struct blk_trace { | ||||
| 	int trace_state; | ||||
| 	bool rq_based; | ||||
| 	struct rchan *rchan; | ||||
| 	unsigned long __percpu *sequence; | ||||
| 	unsigned char __percpu *msg_data; | ||||
|  | ||||
| @ -126,7 +126,6 @@ BUFFER_FNS(Write_EIO, write_io_error) | ||||
| BUFFER_FNS(Unwritten, unwritten) | ||||
| 
 | ||||
| #define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK) | ||||
| #define touch_buffer(bh)	mark_page_accessed(bh->b_page) | ||||
| 
 | ||||
| /* If we *know* page->private refers to buffer_heads */ | ||||
| #define page_buffers(page)					\ | ||||
| @ -142,6 +141,7 @@ BUFFER_FNS(Unwritten, unwritten) | ||||
| 
 | ||||
| void mark_buffer_dirty(struct buffer_head *bh); | ||||
| void init_buffer(struct buffer_head *, bh_end_io_t *, void *); | ||||
| void touch_buffer(struct buffer_head *bh); | ||||
| void set_bh_page(struct buffer_head *bh, | ||||
| 		struct page *page, unsigned long offset); | ||||
| int try_to_free_buffers(struct page *); | ||||
|  | ||||
| @ -77,10 +77,13 @@ static inline void init_completion(struct completion *x) | ||||
| } | ||||
| 
 | ||||
| extern void wait_for_completion(struct completion *); | ||||
| extern void wait_for_completion_io(struct completion *); | ||||
| extern int wait_for_completion_interruptible(struct completion *x); | ||||
| extern int wait_for_completion_killable(struct completion *x); | ||||
| extern unsigned long wait_for_completion_timeout(struct completion *x, | ||||
| 						   unsigned long timeout); | ||||
| extern unsigned long wait_for_completion_io_timeout(struct completion *x, | ||||
| 						    unsigned long timeout); | ||||
| extern long wait_for_completion_interruptible_timeout( | ||||
| 	struct completion *x, unsigned long timeout); | ||||
| extern long wait_for_completion_killable_timeout( | ||||
|  | ||||
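The only difference between these _io variants and the plain ones is how the task sleeps while waiting. A minimal conceptual sketch of that difference (not the actual kernel/sched implementation, just the usual completion wait loop with the sleep call swapped):

/*
 * Conceptual sketch only -- the real implementation lives in kernel/sched.
 * The _io variants sleep via io_schedule_timeout() instead of
 * schedule_timeout(), so the waiting task is accounted as iowait.
 */
static long sketch_wait_for_completion(struct completion *x, bool io)
{
	long timeout = MAX_SCHEDULE_TIMEOUT;

	spin_lock_irq(&x->wait.lock);
	while (!x->done) {
		DECLARE_WAITQUEUE(wq, current);

		__add_wait_queue_tail_exclusive(&x->wait, &wq);
		__set_current_state(TASK_UNINTERRUPTIBLE);
		spin_unlock_irq(&x->wait.lock);

		/* the one line that differs between the two flavours */
		timeout = io ? io_schedule_timeout(timeout)
			     : schedule_timeout(timeout);

		spin_lock_irq(&x->wait.lock);
		__remove_wait_queue(&x->wait, &wq);
	}
	x->done--;
	spin_unlock_irq(&x->wait.lock);
	return timeout;
}

blk_execute_rq(), blkdev_issue_flush() and the other block-layer waiters converted earlier in this diff switch to the _io variants so the wait shows up as iowait instead of idle.
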
| @ -2,6 +2,7 @@ | ||||
| #define _LINUX_ELEVATOR_H | ||||
| 
 | ||||
| #include <linux/percpu.h> | ||||
| #include <linux/hashtable.h> | ||||
| 
 | ||||
| #ifdef CONFIG_BLOCK | ||||
| 
 | ||||
| @ -96,6 +97,8 @@ struct elevator_type | ||||
| 	struct list_head list; | ||||
| }; | ||||
| 
 | ||||
| #define ELV_HASH_BITS 6 | ||||
| 
 | ||||
| /*
 | ||||
|  * each queue has an elevator_queue associated with it | ||||
|  */ | ||||
| @ -105,8 +108,8 @@ struct elevator_queue | ||||
| 	void *elevator_data; | ||||
| 	struct kobject kobj; | ||||
| 	struct mutex sysfs_lock; | ||||
| 	struct hlist_head *hash; | ||||
| 	unsigned int registered:1; | ||||
| 	DECLARE_HASHTABLE(hash, ELV_HASH_BITS); | ||||
| }; | ||||
| 
 | ||||
| /*
 | ||||
|  | ||||
| @ -6,10 +6,61 @@ | ||||
| 
 | ||||
| #include <linux/blktrace_api.h> | ||||
| #include <linux/blkdev.h> | ||||
| #include <linux/buffer_head.h> | ||||
| #include <linux/tracepoint.h> | ||||
| 
 | ||||
| #define RWBS_LEN	8 | ||||
| 
 | ||||
| DECLARE_EVENT_CLASS(block_buffer, | ||||
| 
 | ||||
| 	TP_PROTO(struct buffer_head *bh), | ||||
| 
 | ||||
| 	TP_ARGS(bh), | ||||
| 
 | ||||
| 	TP_STRUCT__entry ( | ||||
| 		__field(  dev_t,	dev			) | ||||
| 		__field(  sector_t,	sector			) | ||||
| 		__field(  size_t,	size			) | ||||
| 	), | ||||
| 
 | ||||
| 	TP_fast_assign( | ||||
| 		__entry->dev		= bh->b_bdev->bd_dev; | ||||
| 		__entry->sector		= bh->b_blocknr; | ||||
| 		__entry->size		= bh->b_size; | ||||
| 	), | ||||
| 
 | ||||
| 	TP_printk("%d,%d sector=%llu size=%zu", | ||||
| 		MAJOR(__entry->dev), MINOR(__entry->dev), | ||||
| 		(unsigned long long)__entry->sector, __entry->size | ||||
| 	) | ||||
| ); | ||||
| 
 | ||||
| /**
 | ||||
|  * block_touch_buffer - mark a buffer accessed | ||||
|  * @bh: buffer_head being touched | ||||
|  * | ||||
|  * Called from touch_buffer(). | ||||
|  */ | ||||
| DEFINE_EVENT(block_buffer, block_touch_buffer, | ||||
| 
 | ||||
| 	TP_PROTO(struct buffer_head *bh), | ||||
| 
 | ||||
| 	TP_ARGS(bh) | ||||
| ); | ||||
| 
 | ||||
| /**
 | ||||
|  * block_dirty_buffer - mark a buffer dirty | ||||
|  * @bh: buffer_head being dirtied | ||||
|  * | ||||
|  * Called from mark_buffer_dirty(). | ||||
|  */ | ||||
| DEFINE_EVENT(block_buffer, block_dirty_buffer, | ||||
| 
 | ||||
| 	TP_PROTO(struct buffer_head *bh), | ||||
| 
 | ||||
| 	TP_ARGS(bh) | ||||
| ); | ||||
| 
 | ||||
| DECLARE_EVENT_CLASS(block_rq_with_error, | ||||
| 
 | ||||
| 	TP_PROTO(struct request_queue *q, struct request *rq), | ||||
| @ -206,7 +257,6 @@ TRACE_EVENT(block_bio_bounce, | ||||
| 
 | ||||
| /**
 | ||||
|  * block_bio_complete - completed all work on the block operation | ||||
|  * @q: queue holding the block operation | ||||
|  * @bio: block operation completed | ||||
|  * @error: io error value | ||||
|  * | ||||
| @ -215,9 +265,9 @@ TRACE_EVENT(block_bio_bounce, | ||||
|  */ | ||||
| TRACE_EVENT(block_bio_complete, | ||||
| 
 | ||||
| 	TP_PROTO(struct request_queue *q, struct bio *bio, int error), | ||||
| 	TP_PROTO(struct bio *bio, int error), | ||||
| 
 | ||||
| 	TP_ARGS(q, bio, error), | ||||
| 	TP_ARGS(bio, error), | ||||
| 
 | ||||
| 	TP_STRUCT__entry( | ||||
| 		__field( dev_t,		dev		) | ||||
| @ -228,7 +278,8 @@ TRACE_EVENT(block_bio_complete, | ||||
| 	), | ||||
| 
 | ||||
| 	TP_fast_assign( | ||||
| 		__entry->dev		= bio->bi_bdev->bd_dev; | ||||
| 		__entry->dev		= bio->bi_bdev ? | ||||
| 					  bio->bi_bdev->bd_dev : 0; | ||||
| 		__entry->sector		= bio->bi_sector; | ||||
| 		__entry->nr_sector	= bio->bi_size >> 9; | ||||
| 		__entry->error		= error; | ||||
| @ -241,11 +292,11 @@ TRACE_EVENT(block_bio_complete, | ||||
| 		  __entry->nr_sector, __entry->error) | ||||
| ); | ||||
| 
 | ||||
| DECLARE_EVENT_CLASS(block_bio, | ||||
| DECLARE_EVENT_CLASS(block_bio_merge, | ||||
| 
 | ||||
| 	TP_PROTO(struct request_queue *q, struct bio *bio), | ||||
| 	TP_PROTO(struct request_queue *q, struct request *rq, struct bio *bio), | ||||
| 
 | ||||
| 	TP_ARGS(q, bio), | ||||
| 	TP_ARGS(q, rq, bio), | ||||
| 
 | ||||
| 	TP_STRUCT__entry( | ||||
| 		__field( dev_t,		dev			) | ||||
| @ -272,31 +323,33 @@ DECLARE_EVENT_CLASS(block_bio, | ||||
| /**
 | ||||
|  * block_bio_backmerge - merging block operation to the end of an existing operation | ||||
|  * @q: queue holding operation | ||||
|  * @rq: request bio is being merged into | ||||
|  * @bio: new block operation to merge | ||||
|  * | ||||
|  * Merging block request @bio to the end of an existing block request | ||||
|  * in queue @q. | ||||
|  */ | ||||
| DEFINE_EVENT(block_bio, block_bio_backmerge, | ||||
| DEFINE_EVENT(block_bio_merge, block_bio_backmerge, | ||||
| 
 | ||||
| 	TP_PROTO(struct request_queue *q, struct bio *bio), | ||||
| 	TP_PROTO(struct request_queue *q, struct request *rq, struct bio *bio), | ||||
| 
 | ||||
| 	TP_ARGS(q, bio) | ||||
| 	TP_ARGS(q, rq, bio) | ||||
| ); | ||||
| 
 | ||||
| /**
 | ||||
|  * block_bio_frontmerge - merging block operation to the beginning of an existing operation | ||||
|  * @q: queue holding operation | ||||
|  * @rq: request bio is being merged into | ||||
|  * @bio: new block operation to merge | ||||
|  * | ||||
|  * Merging block IO operation @bio to the beginning of an existing block | ||||
|  * operation in queue @q. | ||||
|  */ | ||||
| DEFINE_EVENT(block_bio, block_bio_frontmerge, | ||||
| DEFINE_EVENT(block_bio_merge, block_bio_frontmerge, | ||||
| 
 | ||||
| 	TP_PROTO(struct request_queue *q, struct bio *bio), | ||||
| 	TP_PROTO(struct request_queue *q, struct request *rq, struct bio *bio), | ||||
| 
 | ||||
| 	TP_ARGS(q, bio) | ||||
| 	TP_ARGS(q, rq, bio) | ||||
| ); | ||||
| 
 | ||||
| /**
 | ||||
| @ -306,11 +359,32 @@ DEFINE_EVENT(block_bio, block_bio_frontmerge, | ||||
|  * | ||||
|  * About to place the block IO operation @bio into queue @q. | ||||
|  */ | ||||
| DEFINE_EVENT(block_bio, block_bio_queue, | ||||
| TRACE_EVENT(block_bio_queue, | ||||
| 
 | ||||
| 	TP_PROTO(struct request_queue *q, struct bio *bio), | ||||
| 
 | ||||
| 	TP_ARGS(q, bio) | ||||
| 	TP_ARGS(q, bio), | ||||
| 
 | ||||
| 	TP_STRUCT__entry( | ||||
| 		__field( dev_t,		dev			) | ||||
| 		__field( sector_t,	sector			) | ||||
| 		__field( unsigned int,	nr_sector		) | ||||
| 		__array( char,		rwbs,	RWBS_LEN	) | ||||
| 		__array( char,		comm,	TASK_COMM_LEN	) | ||||
| 	), | ||||
| 
 | ||||
| 	TP_fast_assign( | ||||
| 		__entry->dev		= bio->bi_bdev->bd_dev; | ||||
| 		__entry->sector		= bio->bi_sector; | ||||
| 		__entry->nr_sector	= bio->bi_size >> 9; | ||||
| 		blk_fill_rwbs(__entry->rwbs, bio->bi_rw, bio->bi_size); | ||||
| 		memcpy(__entry->comm, current->comm, TASK_COMM_LEN); | ||||
| 	), | ||||
| 
 | ||||
| 	TP_printk("%d,%d %s %llu + %u [%s]", | ||||
| 		  MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs, | ||||
| 		  (unsigned long long)__entry->sector, | ||||
| 		  __entry->nr_sector, __entry->comm) | ||||
| ); | ||||
| 
 | ||||
| DECLARE_EVENT_CLASS(block_get_rq, | ||||
|  | ||||
| @ -32,6 +32,115 @@ | ||||
| 
 | ||||
| struct wb_writeback_work; | ||||
| 
 | ||||
| TRACE_EVENT(writeback_dirty_page, | ||||
| 
 | ||||
| 	TP_PROTO(struct page *page, struct address_space *mapping), | ||||
| 
 | ||||
| 	TP_ARGS(page, mapping), | ||||
| 
 | ||||
| 	TP_STRUCT__entry ( | ||||
| 		__array(char, name, 32) | ||||
| 		__field(unsigned long, ino) | ||||
| 		__field(pgoff_t, index) | ||||
| 	), | ||||
| 
 | ||||
| 	TP_fast_assign( | ||||
| 		strncpy(__entry->name, | ||||
| 			mapping ? dev_name(mapping->backing_dev_info->dev) : "(unknown)", 32); | ||||
| 		__entry->ino = mapping ? mapping->host->i_ino : 0; | ||||
| 		__entry->index = page->index; | ||||
| 	), | ||||
| 
 | ||||
| 	TP_printk("bdi %s: ino=%lu index=%lu", | ||||
| 		__entry->name, | ||||
| 		__entry->ino, | ||||
| 		__entry->index | ||||
| 	) | ||||
| ); | ||||
| 
 | ||||
| DECLARE_EVENT_CLASS(writeback_dirty_inode_template, | ||||
| 
 | ||||
| 	TP_PROTO(struct inode *inode, int flags), | ||||
| 
 | ||||
| 	TP_ARGS(inode, flags), | ||||
| 
 | ||||
| 	TP_STRUCT__entry ( | ||||
| 		__array(char, name, 32) | ||||
| 		__field(unsigned long, ino) | ||||
| 		__field(unsigned long, flags) | ||||
| 	), | ||||
| 
 | ||||
| 	TP_fast_assign( | ||||
| 		struct backing_dev_info *bdi = inode->i_mapping->backing_dev_info; | ||||
| 
 | ||||
| 		/* may be called for files on pseudo FSes w/ unregistered bdi */ | ||||
| 		strncpy(__entry->name, | ||||
| 			bdi->dev ? dev_name(bdi->dev) : "(unknown)", 32); | ||||
| 		__entry->ino		= inode->i_ino; | ||||
| 		__entry->flags		= flags; | ||||
| 	), | ||||
| 
 | ||||
| 	TP_printk("bdi %s: ino=%lu flags=%s", | ||||
| 		__entry->name, | ||||
| 		__entry->ino, | ||||
| 		show_inode_state(__entry->flags) | ||||
| 	) | ||||
| ); | ||||
| 
 | ||||
| DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode_start, | ||||
| 
 | ||||
| 	TP_PROTO(struct inode *inode, int flags), | ||||
| 
 | ||||
| 	TP_ARGS(inode, flags) | ||||
| ); | ||||
| 
 | ||||
| DEFINE_EVENT(writeback_dirty_inode_template, writeback_dirty_inode, | ||||
| 
 | ||||
| 	TP_PROTO(struct inode *inode, int flags), | ||||
| 
 | ||||
| 	TP_ARGS(inode, flags) | ||||
| ); | ||||
| 
 | ||||
| DECLARE_EVENT_CLASS(writeback_write_inode_template, | ||||
| 
 | ||||
| 	TP_PROTO(struct inode *inode, struct writeback_control *wbc), | ||||
| 
 | ||||
| 	TP_ARGS(inode, wbc), | ||||
| 
 | ||||
| 	TP_STRUCT__entry ( | ||||
| 		__array(char, name, 32) | ||||
| 		__field(unsigned long, ino) | ||||
| 		__field(int, sync_mode) | ||||
| 	), | ||||
| 
 | ||||
| 	TP_fast_assign( | ||||
| 		strncpy(__entry->name, | ||||
| 			dev_name(inode->i_mapping->backing_dev_info->dev), 32); | ||||
| 		__entry->ino		= inode->i_ino; | ||||
| 		__entry->sync_mode	= wbc->sync_mode; | ||||
| 	), | ||||
| 
 | ||||
| 	TP_printk("bdi %s: ino=%lu sync_mode=%d", | ||||
| 		__entry->name, | ||||
| 		__entry->ino, | ||||
| 		__entry->sync_mode | ||||
| 	) | ||||
| ); | ||||
| 
 | ||||
| DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode_start, | ||||
| 
 | ||||
| 	TP_PROTO(struct inode *inode, struct writeback_control *wbc), | ||||
| 
 | ||||
| 	TP_ARGS(inode, wbc) | ||||
| ); | ||||
| 
 | ||||
| DEFINE_EVENT(writeback_write_inode_template, writeback_write_inode, | ||||
| 
 | ||||
| 	TP_PROTO(struct inode *inode, struct writeback_control *wbc), | ||||
| 
 | ||||
| 	TP_ARGS(inode, wbc) | ||||
| ); | ||||
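The new writeback events can be consumed the same way; a minimal sketch for writeback_write_inode_start (probe name hypothetical, and it assumes the probe is built in or the tracepoint is exported to modules):

#include <linux/fs.h>
#include <linux/writeback.h>
#include <trace/events/writeback.h>

static void my_write_inode_start_probe(void *data, struct inode *inode,
				       struct writeback_control *wbc)
{
	pr_debug("write_inode start: ino=%lu sync_mode=%d\n",
		 inode->i_ino, wbc->sync_mode);
}

/* hooked up with:
 *	register_trace_writeback_write_inode_start(my_write_inode_start_probe,
 *						   NULL);
 */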
| 
 | ||||
| DECLARE_EVENT_CLASS(writeback_work_class, | ||||
| 	TP_PROTO(struct backing_dev_info *bdi, struct wb_writeback_work *work), | ||||
| 	TP_ARGS(bdi, work), | ||||
| @ -479,6 +588,13 @@ DECLARE_EVENT_CLASS(writeback_single_inode_template, | ||||
| 	) | ||||
| ); | ||||
| 
 | ||||
| DEFINE_EVENT(writeback_single_inode_template, writeback_single_inode_start, | ||||
| 	TP_PROTO(struct inode *inode, | ||||
| 		 struct writeback_control *wbc, | ||||
| 		 unsigned long nr_to_write), | ||||
| 	TP_ARGS(inode, wbc, nr_to_write) | ||||
| ); | ||||
| 
 | ||||
| DEFINE_EVENT(writeback_single_inode_template, writeback_single_inode, | ||||
| 	TP_PROTO(struct inode *inode, | ||||
| 		 struct writeback_control *wbc, | ||||
|  | ||||
| @ -3258,7 +3258,8 @@ void complete_all(struct completion *x) | ||||
| EXPORT_SYMBOL(complete_all); | ||||
| 
 | ||||
| static inline long __sched | ||||
| do_wait_for_common(struct completion *x, long timeout, int state) | ||||
| do_wait_for_common(struct completion *x, | ||||
| 		   long (*action)(long), long timeout, int state) | ||||
| { | ||||
| 	if (!x->done) { | ||||
| 		DECLARE_WAITQUEUE(wait, current); | ||||
| @ -3271,7 +3272,7 @@ do_wait_for_common(struct completion *x, long timeout, int state) | ||||
| 			} | ||||
| 			__set_current_state(state); | ||||
| 			spin_unlock_irq(&x->wait.lock); | ||||
| 			timeout = schedule_timeout(timeout); | ||||
| 			timeout = action(timeout); | ||||
| 			spin_lock_irq(&x->wait.lock); | ||||
| 		} while (!x->done && timeout); | ||||
| 		__remove_wait_queue(&x->wait, &wait); | ||||
| @ -3282,17 +3283,30 @@ do_wait_for_common(struct completion *x, long timeout, int state) | ||||
| 	return timeout ?: 1; | ||||
| } | ||||
| 
 | ||||
| static long __sched | ||||
| wait_for_common(struct completion *x, long timeout, int state) | ||||
| static inline long __sched | ||||
| __wait_for_common(struct completion *x, | ||||
| 		  long (*action)(long), long timeout, int state) | ||||
| { | ||||
| 	might_sleep(); | ||||
| 
 | ||||
| 	spin_lock_irq(&x->wait.lock); | ||||
| 	timeout = do_wait_for_common(x, timeout, state); | ||||
| 	timeout = do_wait_for_common(x, action, timeout, state); | ||||
| 	spin_unlock_irq(&x->wait.lock); | ||||
| 	return timeout; | ||||
| } | ||||
| 
 | ||||
| static long __sched | ||||
| wait_for_common(struct completion *x, long timeout, int state) | ||||
| { | ||||
| 	return __wait_for_common(x, schedule_timeout, timeout, state); | ||||
| } | ||||
| 
 | ||||
| static long __sched | ||||
| wait_for_common_io(struct completion *x, long timeout, int state) | ||||
| { | ||||
| 	return __wait_for_common(x, io_schedule_timeout, timeout, state); | ||||
| } | ||||
| 
 | ||||
| /**
 | ||||
|  * wait_for_completion: - waits for completion of a task | ||||
|  * @x:  holds the state of this particular completion | ||||
| @ -3328,6 +3342,39 @@ wait_for_completion_timeout(struct completion *x, unsigned long timeout) | ||||
| } | ||||
| EXPORT_SYMBOL(wait_for_completion_timeout); | ||||
| 
 | ||||
| /**
 | ||||
|  * wait_for_completion_io: - waits for completion of a task | ||||
|  * @x:  holds the state of this particular completion | ||||
|  * | ||||
|  * This waits to be signaled for completion of a specific task. It is NOT | ||||
|  * interruptible and there is no timeout. The caller is accounted as waiting | ||||
|  * for IO. | ||||
|  */ | ||||
| void __sched wait_for_completion_io(struct completion *x) | ||||
| { | ||||
| 	wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE); | ||||
| } | ||||
| EXPORT_SYMBOL(wait_for_completion_io); | ||||
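As a usage sketch (the driver structure and names are hypothetical; only the completion API calls are real), a driver waiting synchronously for hardware I/O would do something like:

#include <linux/completion.h>

struct my_hw_cmd {			/* hypothetical driver command */
	struct completion done;
	int status;
};

/* called from the driver's IRQ/completion path */
static void my_hw_cmd_finished(struct my_hw_cmd *cmd, int status)
{
	cmd->status = status;
	complete(&cmd->done);
}

static int my_hw_submit_and_wait(struct my_hw_cmd *cmd)
{
	init_completion(&cmd->done);
	/* ... hand the command to the hardware here ... */

	/*
	 * Sleeps uninterruptibly like wait_for_completion(), but the time
	 * spent waiting is charged to iowait via io_schedule_timeout().
	 */
	wait_for_completion_io(&cmd->done);
	return cmd->status;
}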
| 
 | ||||
| /**
 | ||||
|  * wait_for_completion_io_timeout: - waits for completion of a task (w/timeout) | ||||
|  * @x:  holds the state of this particular completion | ||||
|  * @timeout:  timeout value in jiffies | ||||
|  * | ||||
|  * This waits for either a completion of a specific task to be signaled or for a | ||||
|  * specified timeout to expire. The timeout is in jiffies. It is not | ||||
|  * interruptible. The caller is accounted as waiting for IO. | ||||
|  * | ||||
|  * The return value is 0 if timed out, and positive (at least 1, or number of | ||||
|  * jiffies left till timeout) if completed. | ||||
|  */ | ||||
| unsigned long __sched | ||||
| wait_for_completion_io_timeout(struct completion *x, unsigned long timeout) | ||||
| { | ||||
| 	return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE); | ||||
| } | ||||
| EXPORT_SYMBOL(wait_for_completion_io_timeout); | ||||
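And the timed variant, reusing the hypothetical my_hw_cmd from the sketch above (msecs_to_jiffies() and -ETIMEDOUT come from linux/jiffies.h and linux/errno.h; the 5 second budget is an arbitrary value for the sketch):

static int my_hw_submit_and_wait_timeout(struct my_hw_cmd *cmd)
{
	unsigned long left;

	init_completion(&cmd->done);
	/* ... hand the command to the hardware here ... */

	left = wait_for_completion_io_timeout(&cmd->done,
					      msecs_to_jiffies(5000));
	if (!left)
		return -ETIMEDOUT;	/* 0 means the timeout expired */

	return cmd->status;		/* otherwise: jiffies left, >= 1 */
}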
| 
 | ||||
| /**
 | ||||
|  * wait_for_completion_interruptible: - waits for completion of a task (w/intr) | ||||
|  * @x:  holds the state of this particular completion | ||||
|  | ||||
| @ -739,6 +739,12 @@ static void blk_add_trace_rq_complete(void *ignore, | ||||
| 				      struct request_queue *q, | ||||
| 				      struct request *rq) | ||||
| { | ||||
| 	struct blk_trace *bt = q->blk_trace; | ||||
| 
 | ||||
| 	/* if control ever passes through here, it's a request based driver */ | ||||
| 	if (unlikely(bt && !bt->rq_based)) | ||||
| 		bt->rq_based = true; | ||||
| 
 | ||||
| 	blk_add_trace_rq(q, rq, BLK_TA_COMPLETE); | ||||
| } | ||||
| 
 | ||||
| @ -774,15 +780,30 @@ static void blk_add_trace_bio_bounce(void *ignore, | ||||
| 	blk_add_trace_bio(q, bio, BLK_TA_BOUNCE, 0); | ||||
| } | ||||
| 
 | ||||
| static void blk_add_trace_bio_complete(void *ignore, | ||||
| 				       struct request_queue *q, struct bio *bio, | ||||
| 				       int error) | ||||
| static void blk_add_trace_bio_complete(void *ignore, struct bio *bio, int error) | ||||
| { | ||||
| 	struct request_queue *q; | ||||
| 	struct blk_trace *bt; | ||||
| 
 | ||||
| 	if (!bio->bi_bdev) | ||||
| 		return; | ||||
| 
 | ||||
| 	q = bdev_get_queue(bio->bi_bdev); | ||||
| 	bt = q->blk_trace; | ||||
| 
 | ||||
| 	/*
 | ||||
| 	 * Request based drivers will generate both rq and bio completions. | ||||
| 	 * Ignore bio ones. | ||||
| 	 */ | ||||
| 	if (likely(!bt) || bt->rq_based) | ||||
| 		return; | ||||
| 
 | ||||
| 	blk_add_trace_bio(q, bio, BLK_TA_COMPLETE, error); | ||||
| } | ||||
| 
 | ||||
| static void blk_add_trace_bio_backmerge(void *ignore, | ||||
| 					struct request_queue *q, | ||||
| 					struct request *rq, | ||||
| 					struct bio *bio) | ||||
| { | ||||
| 	blk_add_trace_bio(q, bio, BLK_TA_BACKMERGE, 0); | ||||
| @ -790,6 +811,7 @@ static void blk_add_trace_bio_backmerge(void *ignore, | ||||
| 
 | ||||
| static void blk_add_trace_bio_frontmerge(void *ignore, | ||||
| 					 struct request_queue *q, | ||||
| 					 struct request *rq, | ||||
| 					 struct bio *bio) | ||||
| { | ||||
| 	blk_add_trace_bio(q, bio, BLK_TA_FRONTMERGE, 0); | ||||
|  | ||||
| @ -1986,6 +1986,8 @@ int __set_page_dirty_no_writeback(struct page *page) | ||||
|  */ | ||||
| void account_page_dirtied(struct page *page, struct address_space *mapping) | ||||
| { | ||||
| 	trace_writeback_dirty_page(page, mapping); | ||||
| 
 | ||||
| 	if (mapping_cap_account_dirty(mapping)) { | ||||
| 		__inc_zone_page_state(page, NR_FILE_DIRTY); | ||||
| 		__inc_zone_page_state(page, NR_DIRTIED); | ||||
|  | ||||