Instead of shuffling bits from two adjacent 16-bit words, use one 16-bit
word at the appropriate byte offset in the buffer.

Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>