Fix typos & grammar. Use CPU instead of cpu in text. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
		
			
				
	
	
		
			161 lines
		
	
	
		
			7.3 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			161 lines
		
	
	
		
			7.3 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| The padata parallel execution mechanism
 | |
| Last updated for 2.6.36
 | |
| 
 | |
| Padata is a mechanism by which the kernel can farm work out to be done in
 | |
| parallel on multiple CPUs while retaining the ordering of tasks.  It was
 | |
| developed for use with the IPsec code, which needs to be able to perform
 | |
| encryption and decryption on large numbers of packets without reordering
 | |
| those packets.  The crypto developers made a point of writing padata in a
 | |
| sufficiently general fashion that it could be put to other uses as well.
 | |
| 
 | |
| The first step in using padata is to set up a padata_instance structure for
 | |
| overall control of how tasks are to be run:
 | |
| 
 | |
|     #include <linux/padata.h>
 | |
| 
 | |
|     struct padata_instance *padata_alloc(struct workqueue_struct *wq,
 | |
| 					 const struct cpumask *pcpumask,
 | |
| 					 const struct cpumask *cbcpumask);
 | |
| 
 | |
| The pcpumask describes which processors will be used to execute work
 | |
| submitted to this instance in parallel. The cbcpumask defines which
 | |
| processors are allowed to be used as the serialization callback processor.
 | |
| The workqueue wq is where the work will actually be done; it should be
 | |
| a multithreaded queue, naturally.
 | |
| 
 | |
| To allocate a padata instance with the cpu_possible_mask for both
 | |
| cpumasks this helper function can be used:
 | |
| 
 | |
|     struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq);
 | |
| 
 | |
| Note: Padata maintains two kinds of cpumasks internally. The user supplied
 | |
| cpumasks, submitted by padata_alloc/padata_alloc_possible and the 'usable'
 | |
| cpumasks. The usable cpumasks are always a subset of active CPUs in the
 | |
| user supplied cpumasks; these are the cpumasks padata actually uses. So
 | |
| it is legal to supply a cpumask to padata that contains offline CPUs.
 | |
| Once an offline CPU in the user supplied cpumask comes online, padata
 | |
| is going to use it.
 | |
| 
 | |
| There are functions for enabling and disabling the instance:
 | |
| 
 | |
|     int padata_start(struct padata_instance *pinst);
 | |
|     void padata_stop(struct padata_instance *pinst);
 | |
| 
 | |
| These functions are setting or clearing the "PADATA_INIT" flag;
 | |
| if that flag is not set, other functions will refuse to work.
 | |
| padata_start returns zero on success (flag set) or -EINVAL if the
 | |
| padata cpumask contains no active CPU (flag not set).
 | |
| padata_stop clears the flag and blocks until the padata instance
 | |
| is unused.
 | |
| 
 | |
| The list of CPUs to be used can be adjusted with these functions:
 | |
| 
 | |
|     int padata_set_cpumasks(struct padata_instance *pinst,
 | |
| 			    cpumask_var_t pcpumask,
 | |
| 			    cpumask_var_t cbcpumask);
 | |
|     int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
 | |
| 			   cpumask_var_t cpumask);
 | |
|     int padata_add_cpu(struct padata_instance *pinst, int cpu, int mask);
 | |
|     int padata_remove_cpu(struct padata_instance *pinst, int cpu, int mask);
 | |
| 
 | |
| Changing the CPU masks are expensive operations, though, so it should not be
 | |
| done with great frequency.
 | |
| 
 | |
| It's possible to change both cpumasks of a padata instance with
 | |
| padata_set_cpumasks by specifying the cpumasks for parallel execution (pcpumask)
 | |
| and for the serial callback function (cbcpumask). padata_set_cpumask is used to
 | |
| change just one of the cpumasks. Here cpumask_type is one of PADATA_CPU_SERIAL,
 | |
| PADATA_CPU_PARALLEL and cpumask specifies the new cpumask to use.
 | |
| To simply add or remove one CPU from a certain cpumask the functions
 | |
| padata_add_cpu/padata_remove_cpu are used. cpu specifies the CPU to add or
 | |
| remove and mask is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL.
 | |
| 
 | |
| If a user is interested in padata cpumask changes, he can register to
 | |
| the padata cpumask change notifier:
 | |
| 
 | |
|     int padata_register_cpumask_notifier(struct padata_instance *pinst,
 | |
| 					 struct notifier_block *nblock);
 | |
| 
 | |
| To unregister from that notifier:
 | |
| 
 | |
|     int padata_unregister_cpumask_notifier(struct padata_instance *pinst,
 | |
| 					   struct notifier_block *nblock);
 | |
| 
 | |
| The padata cpumask change notifier notifies about changes of the usable
 | |
| cpumasks, i.e. the subset of active CPUs in the user supplied cpumask.
 | |
| 
 | |
| Padata calls the notifier chain with:
 | |
| 
 | |
|     blocking_notifier_call_chain(&pinst->cpumask_change_notifier,
 | |
| 				 notification_mask,
 | |
| 				 &pd_new->cpumask);
 | |
| 
 | |
| Here cpumask_change_notifier is registered notifier, notification_mask
 | |
| is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL and cpumask is a pointer
 | |
| to a struct padata_cpumask that contains the new cpumask information.
 | |
| 
 | |
| Actually submitting work to the padata instance requires the creation of a
 | |
| padata_priv structure:
 | |
| 
 | |
|     struct padata_priv {
 | |
|         /* Other stuff here... */
 | |
| 	void                    (*parallel)(struct padata_priv *padata);
 | |
| 	void                    (*serial)(struct padata_priv *padata);
 | |
|     };
 | |
| 
 | |
| This structure will almost certainly be embedded within some larger
 | |
| structure specific to the work to be done.  Most of its fields are private to
 | |
| padata, but the structure should be zeroed at initialisation time, and the
 | |
| parallel() and serial() functions should be provided.  Those functions will
 | |
| be called in the process of getting the work done as we will see
 | |
| momentarily.
 | |
| 
 | |
| The submission of work is done with:
 | |
| 
 | |
|     int padata_do_parallel(struct padata_instance *pinst,
 | |
| 		           struct padata_priv *padata, int cb_cpu);
 | |
| 
 | |
| The pinst and padata structures must be set up as described above; cb_cpu
 | |
| specifies which CPU will be used for the final callback when the work is
 | |
| done; it must be in the current instance's CPU mask.  The return value from
 | |
| padata_do_parallel() is zero on success, indicating that the work is in
 | |
| progress. -EBUSY means that somebody, somewhere else is messing with the
 | |
| instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being
 | |
| in that CPU mask or about a not running instance.
 | |
| 
 | |
| Each task submitted to padata_do_parallel() will, in turn, be passed to
 | |
| exactly one call to the above-mentioned parallel() function, on one CPU, so
 | |
| true parallelism is achieved by submitting multiple tasks.  Despite the
 | |
| fact that the workqueue is used to make these calls, parallel() is run with
 | |
| software interrupts disabled and thus cannot sleep.  The parallel()
 | |
| function gets the padata_priv structure pointer as its lone parameter;
 | |
| information about the actual work to be done is probably obtained by using
 | |
| container_of() to find the enclosing structure.
 | |
| 
 | |
| Note that parallel() has no return value; the padata subsystem assumes that
 | |
| parallel() will take responsibility for the task from this point.  The work
 | |
| need not be completed during this call, but, if parallel() leaves work
 | |
| outstanding, it should be prepared to be called again with a new job before
 | |
| the previous one completes.  When a task does complete, parallel() (or
 | |
| whatever function actually finishes the job) should inform padata of the
 | |
| fact with a call to:
 | |
| 
 | |
|     void padata_do_serial(struct padata_priv *padata);
 | |
| 
 | |
| At some point in the future, padata_do_serial() will trigger a call to the
 | |
| serial() function in the padata_priv structure.  That call will happen on
 | |
| the CPU requested in the initial call to padata_do_parallel(); it, too, is
 | |
| done through the workqueue, but with local software interrupts disabled.
 | |
| Note that this call may be deferred for a while since the padata code takes
 | |
| pains to ensure that tasks are completed in the order in which they were
 | |
| submitted.
 | |
| 
 | |
| The one remaining function in the padata API should be called to clean up
 | |
| when a padata instance is no longer needed:
 | |
| 
 | |
|     void padata_free(struct padata_instance *pinst);
 | |
| 
 | |
| This function will busy-wait while any remaining tasks are completed, so it
 | |
| might be best not to call it while there is work outstanding.  Shutting
 | |
| down the workqueue, if necessary, should be done separately.
 |