linux/Documentation/ABI
Jens Axboe f35546e072 Merge branch 'stable/for-jens-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.11/drivers
Konrad writes:

It has the 'feature-max-indirect-segments' implemented in both backend
and frontend. The current problem with the backend and frontend is that the
segment size is limited to 11 pages. It means we can at most squeeze in 44kB per
request. The ring can hold 32 (next power of two below 36) requests, meaning we
can do 1.4M of outstanding requests. Nowadays that is not enough.

The problem in the past was addressed in two ways - but neither one went upstream.
The first solution to this proposed by Justin from Spectralogic was to negotiate
the segment size.  This means that the ‘struct blkif_sring_entry’ is now a variable size.
It can expand from 112 bytes (cover 11 pages of data - 44kB) to 1580 bytes
(256 pages of data - so 1MB). It is a simple extension by just making the array in the
request expand from 11 to a variable size negotiated. But it had limits: this extension
still limits the number of segments per request to 255 (as the total number must be
specified in the request, which only has an 8-bit field for that purpose).

The other solution (from Intel - Ronghui) was to create one extra ring that only has the
‘struct blkif_request_segment’ in them. The ‘struct blkif_request’ would be changed to have
an index in said ‘segment ring’. There is only one segment ring. This means that the size of
the initial ring is still the same. The requests would point to the segment and enumerate out
how many of the indexes it wants to use. The limit is of course the size of the segment.
If one assumes a one-page segment this means we can in one request cover ~4MB.

Those patches were posted as RFC and the author never followed up on the ideas on changing
it to be a bit more flexible.

There is yet another mechanism that could be employed  (which these patches implement) - and it
borrows from VirtIO protocol. And that is the ‘indirect descriptors’. This very similar to
what Intel suggests, but with a twist. The twist is to negotiate how many of these
'segment' pages (aka indirect descriptor pages) we want to support (in reality we negotiate
how many entries in the segment we want to cover, and we module the number if it is
bigger than the segment size).

This means that with the existing 36 slots in the ring (single page) we can cover:
32 slots * each blkif_request_indirect covers: 512 * 4096 ~= 64M. Since we ample space
in the blkif_request_indirect to span more than one indirect page, that number (64M)
can be also multiplied by eight = 512MB.

Roger Pau Monne took the idea and implemented them in these patches. They work
great and the corner cases (migration between backends with and without this extension)
work nicely. The backend has a limit right now off how many indirect entries
it can handle: one indirect page, and at maximum 256 entries (out of 512 - so  50% of the page
is used). That comes out to 32 slots * 256 entries in a indirect page * 1 indirect page
per request * 4096 = 32MB.

This is a conservative number that can change in the future. Right now it strikes
a good balance between giving excellent performance, memory usage in the backend, and
balancing the needs of many guests.

In the patchset there is also the split of the blkback structure to be per-VBD.
This means that the spinlock contention we had with many guests trying to do I/O and
all the blkback threads hitting the same lock has been eliminated.

Also there are bug-fixes to deal with oddly sized sectors, insane amounts on
th ring, and also a security fix (posted earlier).
2013-06-28 16:01:14 +02:00
..
obsolete Merge branches 'for-3.7/upstream-fixes', 'for-3.8/hidraw', 'for-3.8/i2c-hid', 'for-3.8/multitouch', 'for-3.8/roccat', 'for-3.8/sensors' and 'for-3.8/upstream' into for-linus 2012-12-12 21:41:55 +01:00
removed netfilter: remove ip_queue support 2012-05-08 20:25:42 +02:00
stable xen-blkback/sysfs: Move the parameters for the persistent grant features 2013-06-04 15:58:43 -04:00
testing Merge branch 'stable/for-jens-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.11/drivers 2013-06-28 16:01:14 +02:00
README Documentation: remove reference to feature-removal-schedule.txt 2012-12-17 17:15:12 -08:00

This directory attempts to document the ABI between the Linux kernel and
userspace, and the relative stability of these interfaces.  Due to the
everchanging nature of Linux, and the differing maturity levels, these
interfaces should be used by userspace programs in different ways.

We have four different levels of ABI stability, as shown by the four
different subdirectories in this location.  Interfaces may change levels
of stability according to the rules described below.

The different levels of stability are:

  stable/
	This directory documents the interfaces that the developer has
	defined to be stable.  Userspace programs are free to use these
	interfaces with no restrictions, and backward compatibility for
	them will be guaranteed for at least 2 years.  Most interfaces
	(like syscalls) are expected to never change and always be
	available.

  testing/
	This directory documents interfaces that are felt to be stable,
	as the main development of this interface has been completed.
	The interface can be changed to add new features, but the
	current interface will not break by doing this, unless grave
	errors or security problems are found in them.  Userspace
	programs can start to rely on these interfaces, but they must be
	aware of changes that can occur before these interfaces move to
	be marked stable.  Programs that use these interfaces are
	strongly encouraged to add their name to the description of
	these interfaces, so that the kernel developers can easily
	notify them if any changes occur (see the description of the
	layout of the files below for details on how to do this.)

  obsolete/
  	This directory documents interfaces that are still remaining in
	the kernel, but are marked to be removed at some later point in
	time.  The description of the interface will document the reason
	why it is obsolete and when it can be expected to be removed.

  removed/
	This directory contains a list of the old interfaces that have
	been removed from the kernel.

Every file in these directories will contain the following information:

What:		Short description of the interface
Date:		Date created
KernelVersion:	Kernel version this feature first showed up in.
Contact:	Primary contact for this interface (may be a mailing list)
Description:	Long description of the interface and how to use it.
Users:		All users of this interface who wish to be notified when
		it changes.  This is very important for interfaces in
		the "testing" stage, so that kernel developers can work
		with userspace developers to ensure that things do not
		break in ways that are unacceptable.  It is also
		important to get feedback for these interfaces to make
		sure they are working in a proper way and do not need to
		be changed further.


How things move between levels:

Interfaces in stable may move to obsolete, as long as the proper
notification is given.

Interfaces may be removed from obsolete and the kernel as long as the
documented amount of time has gone by.

Interfaces in the testing state can move to the stable state when the
developers feel they are finished.  They cannot be removed from the
kernel tree without going through the obsolete state first.

It's up to the developer to place their interfaces in the category they
wish for it to start out in.