Add a VMX optimised xor, used primarily for RAID5. On a POWER7 blade
this is a decent win:
32regs : 17932.800 MB/sec
altivec : 19724.800 MB/sec
The bigger gain is when the same test is run in SMT4 mode, as it
would if there was a lot of work going on:
8regs : 8377.600 MB/sec
altivec : 15801.600 MB/sec
I tested this against an array created without the patch, and also
verified it worked as expected on a little endian kernel.
[ Fix !CONFIG_ALTIVEC build -- BenH ]
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>