linux/drivers/virtio
Venkatesh Srinivas f277ec42f3 virtio_ring: shadow available ring flags & index
Improves cacheline transfer flow of available ring header.

Virtqueues are implemented as a pair of rings, one producer->consumer
avail ring and one consumer->producer used ring; preceding the
avail ring in memory are two contiguous u16 fields -- avail->flags
and avail->idx. A producer posts work by writing to avail->idx and
a consumer reads avail->idx.
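
As a rough illustration of the layout described above, here is a minimal sketch in C. The type name `avail_ring_sketch` and the helper `sketch_pending` are hypothetical and are not taken from the kernel sources; only the `flags` and `idx` field names come from the text. Memory barriers, which real ring code needs, are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the avail ring: two u16 header fields
 * followed directly by the ring entries. */
struct avail_ring_sketch {
    uint16_t flags;   /* header field written by the producer */
    uint16_t idx;     /* free-running index written by the producer */
    uint16_t ring[];  /* ring entries posted by the producer */
};

/* The two header fields are contiguous and precede the ring. */
static_assert(offsetof(struct avail_ring_sketch, flags) == 0, "flags first");
static_assert(offsetof(struct avail_ring_sketch, idx) == 2, "idx follows flags");
static_assert(offsetof(struct avail_ring_sketch, ring) == 4, "ring follows header");

/* Consumer side: number of entries posted since last_seen.
 * The cast keeps the subtraction correct across 16-bit wraparound. */
static uint16_t sketch_pending(const struct avail_ring_sketch *a,
                               uint16_t last_seen)
{
    return (uint16_t)(a->idx - last_seen);
}
```

Because both header fields fit in four bytes, they share a single cacheline, which is why the access pattern on that line matters for the rest of this message.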

The flags and idx fields only need to be written by a producer CPU
and only read by a consumer CPU; when the producer and consumer are
running on different CPUs and the virtio_ring code is structured to
only have source writes/sink reads, we can continuously transfer the
avail header cacheline from 'M' state to 'M' state across cores. This
flow optimizes core -> core bandwidth on certain CPUs.

(see: "Software Optimization Guide for AMD Family 15h Processors",
Section 11.6; similar language appears in the 10h guide and should
apply to CPUs w/ exclusive caches, using LLC as a transfer cache)

Unfortunately the existing virtio_ring code issued reads of
avail->idx and read-modify-writes of avail->flags on the producer.

This change shadows the flags and index fields in producer memory;
the vring code now reads from the shadows and only ever writes to
avail->flags and avail->idx, allowing the cacheline to transfer
core -> core optimally.
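
The shadowing idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch itself: the struct and function names are hypothetical, `SKETCH_AVAIL_F_NO_INTERRUPT` stands in for the kernel's `VRING_AVAIL_F_NO_INTERRUPT` flag, and barriers and endianness conversion are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_AVAIL_F_NO_INTERRUPT 1  /* stand-in for VRING_AVAIL_F_NO_INTERRUPT */

/* Shared header, read by the consumer on another CPU. */
struct avail_hdr_sketch {
    volatile uint16_t flags;
    volatile uint16_t idx;
};

/* Producer-private state: shadows live in memory only the producer touches. */
struct producer_sketch {
    struct avail_hdr_sketch *avail;
    uint16_t flags_shadow;
    uint16_t idx_shadow;
};

static void sketch_init(struct producer_sketch *p, struct avail_hdr_sketch *avail)
{
    assert(p && avail);
    p->avail = avail;
    p->flags_shadow = 0;
    p->idx_shadow = 0;
    avail->flags = 0;   /* stores only, no loads from the shared line */
    avail->idx = 0;
}

/* Post one buffer: bump the shadow, then store it to the shared idx. */
static void sketch_publish(struct producer_sketch *p)
{
    p->idx_shadow++;
    p->avail->idx = p->idx_shadow;
}

/* Flag updates test and modify the shadow, then do a plain store. */
static void sketch_disable_cb(struct producer_sketch *p)
{
    if (!(p->flags_shadow & SKETCH_AVAIL_F_NO_INTERRUPT)) {
        p->flags_shadow |= SKETCH_AVAIL_F_NO_INTERRUPT;
        p->avail->flags = p->flags_shadow;
    }
}

static void sketch_enable_cb(struct producer_sketch *p)
{
    if (p->flags_shadow & SKETCH_AVAIL_F_NO_INTERRUPT) {
        p->flags_shadow &= (uint16_t)~SKETCH_AVAIL_F_NO_INTERRUPT;
        p->avail->flags = p->flags_shadow;
    }
}
```

In this sketch the producer never loads from `avail->flags` or `avail->idx`; every decision is made against the private shadows, and the shared header only ever sees stores.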

In a concurrent version of vring_bench, the time required for
10,000,000 buffer checkout/returns was reduced by ~2% (average
across many runs) on an AMD Piledriver (15h) CPU:

(w/o shadowing):
 Performance counter stats for './vring_bench':
     5,451,082,016      L1-dcache-loads
     ...
       2.221477739 seconds time elapsed

(w/ shadowing):
 Performance counter stats for './vring_bench':
     5,405,701,361      L1-dcache-loads
     ...
       2.168405376 seconds time elapsed
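
For readers who want to check the two sample runs above, the relative reduction can be computed as in the sketch below. The helper name `sketch_reduction` is hypothetical. Note that these are single samples, whereas the ~2% figure quoted above is an average across many runs.

```c
#include <stdint.h>

/* Fractional reduction going from `before` to `after`. */
static double sketch_reduction(double before, double after)
{
    return (before - after) / before;
}

/* Elapsed time from the two sample runs: roughly a 2.4% reduction. */
static double sketch_time_reduction(void)
{
    return sketch_reduction(2.221477739, 2.168405376);
}

/* L1-dcache-loads from the two sample runs: roughly a 0.83% reduction. */
static double sketch_load_reduction(void)
{
    return sketch_reduction(5451082016.0, 5405701361.0);
}
```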

The further away (in a NUMA sense) virtio producers and consumers are
from each other, the more we expect to benefit. Physical implementations
of virtio devices and implementations of virtio where the consumer polls
vring avail indexes (vhost) should also benefit.

Signed-off-by: Venkatesh Srinivas <venkateshs@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2015-12-07 17:28:11 +02:00
config.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
Kconfig Add virtio-input driver. 2015-03-29 12:13:52 +10:30
Makefile Add virtio-input driver. 2015-03-29 12:13:52 +10:30
virtio_balloon.c virtio_balloon: do not change memory amount visible via /proc/meminfo 2015-09-08 13:32:11 +03:00
virtio_input.c virtio-input: reset device and detach unused during remove 2015-08-06 10:40:35 +03:00
virtio_mmio.c virtio_mmio: add ACPI probing 2015-09-08 13:30:28 +03:00
virtio_pci_common.c virtio/vhost: cross endian support 2015-07-03 16:02:25 -07:00
virtio_pci_common.h virtio-pci: alloc only resources actually used. 2015-06-24 08:15:09 +02:00
virtio_pci_legacy.c virtio-pci: alloc only resources actually used. 2015-06-24 08:15:09 +02:00
virtio_pci_modern.c virtio-pci: alloc only resources actually used. 2015-06-24 08:15:09 +02:00
virtio_ring.c virtio_ring: shadow available ring flags & index 2015-12-07 17:28:11 +02:00
virtio.c virtio: fix memory leak of virtio ida cache layers 2015-12-07 17:28:01 +02:00