linux/drivers/crypto/caam/qi.h

205 lines
6.4 KiB
C
Raw Normal View History

crypto: caam - add Queue Interface (QI) backend support CAAM engine supports two interfaces for crypto job submission: -job ring interface - already existing caam/jr driver -Queue Interface (QI) - caam/qi driver added in current patch QI is present in CAAM engines found on DPAA platforms. QI gets its I/O (frame descriptors) from QMan (Queue Manager) queues. This patch adds a platform device for accessing CAAM's queue interface. The requests are submitted to CAAM using one frame queue per cryptographic context. Each crypto context has one shared descriptor. This shared descriptor is attached to frame queue associated with corresponding driver context using context_a. The driver hides the mechanics of FQ creation, initialisation from its applications. Each cryptographic context needs to be associated with driver context which houses the FQ to be used to transport the job to CAAM. The driver provides API for: (a) Context creation (b) Job submission (c) Context deletion (d) Congestion indication - whether path to/from CAAM is congested The driver supports affining its context to a particular CPU. This means that any responses from CAAM for the context in question would arrive at the given CPU. This helps in implementing one CPU per packet round trip in IPsec application. The driver processes CAAM responses under NAPI contexts. NAPI contexts are instantiated only on cores with affined portals since only cores having their own portal can receive responses from DQRR. The responses from CAAM for all cryptographic contexts ride on a fixed set of FQs. We use one response FQ per portal owning core. The response FQ is configured in each core's and thus portal's dedicated channel. This gives the flexibility to direct CAAM's responses for a crypto context on a given core. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: Alex Porosanu <alexandru.porosanu@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-17 10:06:01 +00:00
/*
* Public definitions for the CAAM/QI (Queue Interface) backend.
*
* Copyright 2013-2016 Freescale Semiconductor, Inc.
* Copyright 2016-2017 NXP
*/
#ifndef __QI_H__
#define __QI_H__
#include <soc/fsl/qman.h>
#include "compat.h"
#include "desc.h"
#include "desc_constr.h"
/*
* CAAM hardware constructs a job descriptor which points to a shared descriptor
* (as pointed by context_a of to-CAAM FQ).
* When the job descriptor is executed by DECO, the whole job descriptor
* together with shared descriptor gets loaded in DECO buffer, which is
* 64 words (each 32-bit) long.
*
* The job descriptor constructed by CAAM hardware has the following layout:
*
* HEADER (1 word)
* Shdesc ptr (1 or 2 words)
* SEQ_OUT_PTR (1 word)
* Out ptr (1 or 2 words)
* Out length (1 word)
* SEQ_IN_PTR (1 word)
* In ptr (1 or 2 words)
* In length (1 word)
*
* The shdesc ptr is used to fetch shared descriptor contents into DECO buffer.
*
* Apart from shdesc contents, the total number of words that get loaded in DECO
* buffer are '8' or '11'. The remaining words in DECO buffer can be used for
* storing shared descriptor.
*/
#define MAX_SDLEN ((CAAM_DESC_BYTES_MAX - DESC_JOB_IO_LEN) / CAAM_CMD_SZ)
crypto: caam/qi - handle large number of S/Gs case For more than 16 S/G entries, driver currently corrupts memory on ARMv8, see below KASAN log. Note: this does not reproduce on PowerPC due to different (smaller) cache line size - 64 bytes on PPC vs. 128 bytes on ARMv8. One such use case is one of the cbc(aes) test vectors - with 8 S/G entries and src != dst. Driver needs 1 (IV) + 2 x 8 = 17 entries, which goes over the 16 S/G entries limit: (CAAM_QI_MEMCACHE_SIZE - offsetof(struct ablkcipher_edesc, sgt)) / sizeof(struct qm_sg_entry) = 256 / 16 = 16 S/Gs Fix this by: -increasing object size in caamqicache pool from 512 to 768; this means the maximum number of S/G entries grows from (at least) 16 to 32 (again, for ARMv8 case of 128-byte cache line) -add checks in the driver to fail gracefully (ENOMEM) in case the 32 S/G entries limit is exceeded ================================================================== BUG: KASAN: slab-out-of-bounds in ablkcipher_edesc_alloc+0x4ec/0xf60 Write of size 1 at addr ffff800021cb6003 by task cryptomgr_test/1394 CPU: 3 PID: 1394 Comm: cryptomgr_test Not tainted 4.12.0-rc7-next-20170703-00023-g72badbcc1ea7-dirty #26 Hardware name: LS1046A RDB Board (DT) Call trace: [<ffff20000808ac6c>] dump_backtrace+0x0/0x290 [<ffff20000808b014>] show_stack+0x14/0x1c [<ffff200008d62c00>] dump_stack+0xa4/0xc8 [<ffff200008264e40>] print_address_description+0x110/0x26c [<ffff200008265224>] kasan_report+0x1d0/0x2fc [<ffff2000082637b8>] __asan_store1+0x4c/0x54 [<ffff200008b4884c>] ablkcipher_edesc_alloc+0x4ec/0xf60 [<ffff200008b49304>] ablkcipher_encrypt+0x44/0xcc [<ffff20000848a61c>] skcipher_encrypt_ablkcipher+0x120/0x138 [<ffff200008495014>] __test_skcipher+0xaec/0xe30 [<ffff200008497088>] test_skcipher+0x6c/0xd8 [<ffff200008497154>] alg_test_skcipher+0x60/0xe4 [<ffff2000084974c4>] alg_test.part.13+0x130/0x304 [<ffff2000084976d4>] alg_test+0x3c/0x68 [<ffff2000084938ac>] cryptomgr_test+0x54/0x5c [<ffff20000810276c>] kthread+0x188/0x1c8 [<ffff2000080836c0>] ret_from_fork+0x10/0x50 Allocated by task 1394: save_stack_trace_tsk+0x0/0x1ac save_stack_trace+0x18/0x20 kasan_kmalloc.part.5+0x48/0x110 kasan_kmalloc+0x84/0xa0 kasan_slab_alloc+0x14/0x1c kmem_cache_alloc+0x124/0x1e8 qi_cache_alloc+0x28/0x58 ablkcipher_edesc_alloc+0x244/0xf60 ablkcipher_encrypt+0x44/0xcc skcipher_encrypt_ablkcipher+0x120/0x138 __test_skcipher+0xaec/0xe30 test_skcipher+0x6c/0xd8 alg_test_skcipher+0x60/0xe4 alg_test.part.13+0x130/0x304 alg_test+0x3c/0x68 cryptomgr_test+0x54/0x5c kthread+0x188/0x1c8 ret_from_fork+0x10/0x50 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff800021cb5e00 which belongs to the cache caamqicache of size 512 The buggy address is located 3 bytes to the right of 512-byte region [ffff800021cb5e00, ffff800021cb6000) The buggy address belongs to the page: page:ffff7e0000872d00 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0xfffc00000008100(slab|head) raw: 0fffc00000008100 0000000000000000 0000000000000000 0000000180190019 raw: dead000000000100 dead000000000200 ffff800931268200 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff800021cb5f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff800021cb5f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff800021cb6000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff800021cb6080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff800021cb6100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Fixes: b189817cf789 ("crypto: caam/qi - add ablkcipher and authenc algorithms") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-07-10 05:40:31 +00:00
/* Length of a single buffer in the QI driver memory cache */
#define CAAM_QI_MEMCACHE_SIZE 768
crypto: caam - add Queue Interface (QI) backend support CAAM engine supports two interfaces for crypto job submission: -job ring interface - already existing caam/jr driver -Queue Interface (QI) - caam/qi driver added in current patch QI is present in CAAM engines found on DPAA platforms. QI gets its I/O (frame descriptors) from QMan (Queue Manager) queues. This patch adds a platform device for accessing CAAM's queue interface. The requests are submitted to CAAM using one frame queue per cryptographic context. Each crypto context has one shared descriptor. This shared descriptor is attached to frame queue associated with corresponding driver context using context_a. The driver hides the mechanics of FQ creation, initialisation from its applications. Each cryptographic context needs to be associated with driver context which houses the FQ to be used to transport the job to CAAM. The driver provides API for: (a) Context creation (b) Job submission (c) Context deletion (d) Congestion indication - whether path to/from CAAM is congested The driver supports affining its context to a particular CPU. This means that any responses from CAAM for the context in question would arrive at the given CPU. This helps in implementing one CPU per packet round trip in IPsec application. The driver processes CAAM responses under NAPI contexts. NAPI contexts are instantiated only on cores with affined portals since only cores having their own portal can receive responses from DQRR. The responses from CAAM for all cryptographic contexts ride on a fixed set of FQs. We use one response FQ per portal owning core. The response FQ is configured in each core's and thus portal's dedicated channel. This gives the flexibility to direct CAAM's responses for a crypto context on a given core. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: Alex Porosanu <alexandru.porosanu@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-03-17 10:06:01 +00:00
extern bool caam_congested __read_mostly;
/*
* This is the request structure the driver application should fill while
* submitting a job to driver.
*/
struct caam_drv_req;
/*
* caam_qi_cbk - application's callback function invoked by the driver when the
* request has been successfully processed.
* @drv_req: original request that was submitted
* @status: completion status of request (0 - success, non-zero - error code)
*/
typedef void (*caam_qi_cbk)(struct caam_drv_req *drv_req, u32 status);
enum optype {
ENCRYPT,
DECRYPT,
GIVENCRYPT,
NUM_OP
};
/**
* caam_drv_ctx - CAAM/QI backend driver context
*
* The jobs are processed by the driver against a driver context.
* With every cryptographic context, a driver context is attached.
* The driver context contains data for private use by driver.
* For the applications, this is an opaque structure.
*
* @prehdr: preheader placed before shrd desc
* @sh_desc: shared descriptor
* @context_a: shared descriptor dma address
* @req_fq: to-CAAM request frame queue
* @rsp_fq: from-CAAM response frame queue
* @cpu: cpu on which to receive CAAM response
* @op_type: operation type
* @qidev: device pointer for CAAM/QI backend
*/
struct caam_drv_ctx {
u32 prehdr[2];
u32 sh_desc[MAX_SDLEN];
dma_addr_t context_a;
struct qman_fq *req_fq;
struct qman_fq *rsp_fq;
int cpu;
enum optype op_type;
struct device *qidev;
} ____cacheline_aligned;
/**
* caam_drv_req - The request structure the driver application should fill while
* submitting a job to driver.
* @fd_sgt: QMan S/G pointing to output (fd_sgt[0]) and input (fd_sgt[1])
* buffers.
* @cbk: callback function to invoke when job is completed
* @app_ctx: arbitrary context attached with request by the application
*
* The fields mentioned below should not be used by application.
* These are for private use by driver.
*
* @hdr__: linked list header to maintain list of outstanding requests to CAAM
* @hwaddr: DMA address for the S/G table.
*/
struct caam_drv_req {
struct qm_sg_entry fd_sgt[2];
struct caam_drv_ctx *drv_ctx;
caam_qi_cbk cbk;
void *app_ctx;
} ____cacheline_aligned;
/**
* caam_drv_ctx_init - Initialise a CAAM/QI driver context
*
* A CAAM/QI driver context must be attached with each cryptographic context.
* This function allocates memory for CAAM/QI context and returns a handle to
* the application. This handle must be submitted along with each enqueue
* request to the driver by the application.
*
* @cpu: CPU where the application prefers to the driver to receive CAAM
* responses. The request completion callback would be issued from this
* CPU.
* @sh_desc: shared descriptor pointer to be attached with CAAM/QI driver
* context.
*
* Returns a driver context on success or negative error code on failure.
*/
struct caam_drv_ctx *caam_drv_ctx_init(struct device *qidev, int *cpu,
u32 *sh_desc);
/**
* caam_qi_enqueue - Submit a request to QI backend driver.
*
* The request structure must be properly filled as described above.
*
* @qidev: device pointer for QI backend
* @req: CAAM QI request structure
*
* Returns 0 on success or negative error code on failure.
*/
int caam_qi_enqueue(struct device *qidev, struct caam_drv_req *req);
/**
* caam_drv_ctx_busy - Check if there are too many jobs pending with CAAM
* or too many CAAM responses are pending to be processed.
* @drv_ctx: driver context for which job is to be submitted
*
* Returns caam congestion status 'true/false'
*/
bool caam_drv_ctx_busy(struct caam_drv_ctx *drv_ctx);
/**
* caam_drv_ctx_update - Update QI driver context
*
* Invoked when shared descriptor is required to be change in driver context.
*
* @drv_ctx: driver context to be updated
* @sh_desc: new shared descriptor pointer to be updated in QI driver context
*
* Returns 0 on success or negative error code on failure.
*/
int caam_drv_ctx_update(struct caam_drv_ctx *drv_ctx, u32 *sh_desc);
/**
* caam_drv_ctx_rel - Release a QI driver context
* @drv_ctx: context to be released
*/
void caam_drv_ctx_rel(struct caam_drv_ctx *drv_ctx);
int caam_qi_init(struct platform_device *pdev);
int caam_qi_shutdown(struct device *dev);
/**
* qi_cache_alloc - Allocate buffers from CAAM-QI cache
*
* Invoked when a user of the CAAM-QI (i.e. caamalg-qi) needs data which has
* to be allocated on the hotpath. Instead of using malloc, one can use the
* services of the CAAM QI memory cache (backed by kmem_cache). The buffers
* will have a size of 256B, which is sufficient for hosting 16 SG entries.
*
* @flags: flags that would be used for the equivalent malloc(..) call
*
* Returns a pointer to a retrieved buffer on success or NULL on failure.
*/
void *qi_cache_alloc(gfp_t flags);
/**
* qi_cache_free - Frees buffers allocated from CAAM-QI cache
*
* Invoked when a user of the CAAM-QI (i.e. caamalg-qi) no longer needs
* the buffer previously allocated by a qi_cache_alloc call.
* No checking is being done, the call is a passthrough call to
* kmem_cache_free(...)
*
* @obj: object previously allocated using qi_cache_alloc()
*/
void qi_cache_free(void *obj);
#endif /* __QI_H__ */