If unused data exists in in_vq, ensure we flush that first and then
detach unused buffers, which will ensure all buffers from the in_vq are
removed.
Also ensure we free the buffers after detaching them.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Remove port data; deregister from the hvc core if it's a console port.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If the 'nr_ports' variable in the config space is updated to a higher
value, that means new ports have been hotplugged.
Introduce a new workqueue to handle such updates and create new ports.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Remove any data that we might have in a port's inbuf when closing a port
or when any data is received when a port is closed.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The host can set a name for ports so that they're easily discoverable
instead of going by the /dev/vportNpn naming. This attribute will be
placed in /sys/class/virtio-ports/vportNpn/name. udev scripts can then
create symlinks to the port using the name.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add a guest_connected field that ensures only one process
can have a port open at a time.
This also ensures we don't have a race when we later add support for
dropping buffers when closing the char dev and buffer caching is turned
off for the particular port.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Allow guest userspace applications to open, read from, write to, poll
the ports via the char dev interface.
When a port gets opened, a notification is sent to the host via a
control message indicating a connection has been established. Similarly,
on closing of the port, a notification is sent indicating disconnection.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The char device will be used as an interface by applications on the
guest to communicate with apps on the host.
The devices created are placed in /dev/vportNpn where N is the
virtio-console device number and n is the port number for that device.
One dynamic major device number is allocated for each device and minor
numbers are allocated for the ports contained within that device.
The file operation for the char devs will be added in the following
commits.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When ports get advertised as char devices, the buffers will come from
userspace. Equip the fill_readbuf function with the ability to write
to userspace buffers.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This commit adds a new feature, MULTIPORT. If the host supports this
feature as well, the config space has the number of ports defined for
that device. New ports are spawned according to this information.
The config space also has the maximum number of ports that can be
spawned for a particular device. This is useful in initializing the
appropriate number of virtqueues in advance, as ports might be
hot-plugged in later.
Using this feature, generic ports can be created which are not tied to
hvc consoles.
We also open up a private channel between the host and the guest via
which some "control" messages are exchanged for the ports, like whether
the port being spawned is a console port, resizing the console window,
etc.
Next commits will add support for hotplugging and presenting char
devices in /dev/ for bi-directional guest-host communication.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Adding support for generic ports that will write to userspace will need
some code changes.
Consolidate the write routine into send_buf() and put_chars() now just
calls into the new function.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In preparation for serving data to userspace (generic ports) as well as
in-kernel users (hvc consoles), separate out the functionality common to
both in a 'fill_readbuf()' function.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With support for multiple ports, each port will have its own input and
output vqs. Prepare the probe function for this change.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Console ports could be hot-added. Also, with the new multiport support,
a port is identified as a console port only if the host sends a control
message.
Move the console port init into a separate function so it can be invoked
from other places.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Move out console-specific stuff into a separate struct from 'struct
port' as we need to maintain two lists: one for all the ports (which
includes consoles) and one only for consoles since the hvc callbacks
only give us the vtermno.
This makes console handling cleaner.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When multiple console support is added, ensure each port's size gets
updated when a new one is opened via hvc.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rather than assume a single port, add a 'struct ports_device' which
stores data related to all the ports for that device.
Currently, there's only one port and is hooked up with hvc, but that
will change.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now we can use an allocation function to remove our global console variable.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Keep a list of all ports being used as a console, and provide a lock
and a lookup function. The hvc callbacks only give us a vterm number,
so we need to map this.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Part of removing our "one console" assumptions, use vdev->priv to point
to the port (currently == the global console).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes taking locks around the get_buf vq operation easier, as well
as complements the add_inbuf() operation.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
add_inbuf() assumed one port and one inbuf per port. Remove that
assumption.
Also move the function so that put_chars and get_chars are together.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Collect port buffer, used_len, offset fields into a single structure.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We are heading towards a multiple-"port" system, so as part of weaning off
globals we encapsulate the information into 'struct port'.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We support only one virtio_console device at a time. If multiple are
found, error out if one is already initialized.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is nicer for modern R/O protection. And noone needs it non-const, so
constify the callers as well.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linuxppc-dev@ozlabs.org
That way, we can make it const as is good kernel style. We use a separate
indirection for the early console, rather than mugging ops.put_chars.
We rename it hv_ops, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Remove old lguest-style comments.
[Amit: - wingify comments acc. to kernel style
- indent comments ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
vq operations depend on vq->data[i] being NULL to figure out if the vq
entry is in use (since the previous patch).
We have to initialize them to NULL to ensure we don't work with junk
data and trigger false BUG_ONs.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shirley Ma <xma@us.ibm.com>
There's currently no way for a virtio driver to ask for unused
buffers, so it has to keep a list itself to reclaim them at shutdown.
This is redundant, since virtio_ring stores that information. So
add a new hook to do this.
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Allow reading various alignment values from the config page. This
allows the guest to much better align I/O requests depending on the
storage topology.
Note that the formats for the config values appear a bit messed up,
but we follow the formats used by ATA and SCSI so they are expected in
the storage world.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio is communicating with a virtual "device" that actually runs on
another host processor. Thus SMP barriers can be used to control
memory access ordering.
Where possible, we should use SMP barriers which are more lightweight than
mandatory barriers, because mandatory barriers also control MMIO effects on
accesses through relaxed memory I/O windows (which virtio does not use)
(compare specifically smp_rmb and rmb on x86_64).
We can't just use smp_mb and friends though, because
we must force memory ordering even if guest is UP since host could be
running on another CPU, but SMP barriers are defined to barrier() in
that configuration. So, for UP fall back to mandatory barriers instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
With DEBUG defined, we add an ->in_use flag to detect if the caller
invokes two virtio methods in parallel. The barriers attempt to ensure
timely update of the ->in_use flag.
But they're voodoo: if we need these barriers it implies that the
calling code doesn't have sufficient synchronization to ensure the
code paths aren't invoked at the same time anyway, and we want to
detect it.
Also, adding barriers changes timing, so turning on debug has more
chance of hiding real problems.
Thanks to MST for drawing my attention to this code...
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Two years ago 5bbf89fc26 removed the horrible bzImage unpacking code.
Now it's time to remove the unneeded zlib.h include, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a fix for my earlier patch: "virtio: Add memory statistics reporting to
the balloon driver (V4)".
I discovered that all_vm_events() can sleep and therefore stats collection
cannot be done in interrupt context. One solution is to handle the interrupt
by noting that stats need to be collected and waking the existing vballoon
kthread which will complete the work via stats_handle_request(). Rusty, is
this a saner way of doing business?
There is one issue that I would like a broader opinion on. In stats_request, I
update vb->need_stats_update and then wake up the kthread. The kthread uses
vb->need_stats_update as a condition variable. Do I need a memory barrier
between the update and wake_up to ensure that my kthread sees the correct
value? My testing suggests that it is not needed but I would like some
confirmation from the experts.
Signed-off-by: Adam Litke <agl@us.ibm.com>
To: Rusty Russell <rusty@rustcorp.com.au>
Cc: Anthony Liguori <aliguori@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changes since V3:
- Do not do endian conversions as they will be done in the host
- Report stats that reference a quantity of memory in bytes
- Minor coding style updates
Changes since V2:
- Increase stat field size to 64 bits
- Report all sizes in kb (not pages)
- Drop anon_pages stat and fix endianness conversion
Changes since V1:
- Use a virtqueue instead of the device config space
When using ballooning to manage overcommitted memory on a host, a system for
guests to communicate their memory usage to the host can provide information
that will minimize the impact of ballooning on the guests. The current method
employs a daemon running in each guest that communicates memory statistics to a
host daemon at a specified time interval. The host daemon aggregates this
information and inflates and/or deflates balloons according to the level of
host memory pressure. This approach is effective but overly complex since a
daemon must be installed inside each guest and coordinated to communicate with
the host. A simpler approach is to collect memory statistics in the virtio
balloon driver and communicate them directly to the hypervisor.
This patch enables the guest-side support by adding stats collection and
reporting to the virtio balloon driver.
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor fixes)
This is needed to compile with CONFIG_VIRTIO_PCI=y,
because virtio_pci_remove is marked __devexit.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
803bf5ec25 ("fs/exec.c: restrict initial
stack space expansion to rlimit") attempts to limit the initial stack to
20*PAGE_SIZE. Unfortunately, in attempting ensure the stack is not
reduced in size, we ended up not changing the stack at all.
This size reduction check is not necessary as the expand_stack call does
this already.
This caused a regression in UML resulting in most guest processes being
killed.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jouni Malinen <j@w1.fi>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 4410f39109 ("fbdev: add support for
handoff from firmware to hw framebuffers") didn't add fb_destroy
operation to efifb. Fix it and change aperture_size to match size
passed to request_mem_region.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=15151
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Reported-by: Alex Zhavnerchik <alex.vizor@gmail.com>
Tested-by: Alex Zhavnerchik <alex.vizor@gmail.com>
Acked-by: Peter Jones <pjones@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
geode-mfgpt: restore previous behavior for selecting IRQ
The MFGPT IRQ used to be, in order of decreasing priority,
* IRQ supplied by the user as a boot-time parameter,
* IRQ previously set by the BIOS or another driver,
* default IRQ given at compile time.
Return to this behavior, which got broken when splitting the
MFGPT/clocksource driver for 2.6.33-rc1.
Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de>
Acked-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan.crouse@amd.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is retry of reverted 859ddf0974
("idr: fix a critical misallocation bug") which contained two bugs.
* pa[idp->layers] should be cleared even if it's not used by
sub_alloc() because it's used by mark idr_mark_full().
* The original condition check also assigned pa[l] to p which the new
code didn't do thus leaving p pointing at the wrong layer.
Both problems have been fixed and the idr code has received good amount
testing using userland testing setup where simple bitmap allocator is
run parallel to verify the result of idr allocation.
The bug this patch fixes is caused by sub_alloc() optimization path
bypassing out-of-room condition check and restarting allocation loop
with starting value higher than maximum allowed value. For detailed
description, please read commit message of 859ddf09.
Signed-off-by: Tejun Heo <tj@kernel.org>
Based-on-patch-from: Eric Paris <eparis@redhat.com>
Reported-by: Eric Paris <eparis@redhat.com>
Tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
find_task_by_vpid() is not safe without rcu_read_lock(). 2.6.33-rc7 got
RCU protection for sys_setpriority() but missed it for sys_getpriority().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presently the oom-killer is memcg aware and it finds the worst process
from processes under memcg(s) in oom. Then, it kills victim's child
first.
It may kill a child in another cgroup and may not be any help for
recovery. And it will break the assumption users have.
This patch fixes it.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ignoring the last page when ddr size is 128M. Cached accesses to last page
is causing the processor to prefetch using address above 128M stepping out
of the DDR address space.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: linux-mips@linux-mips.org
Patchwork: http://patchwork.linux-mips.org/patch/981/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
arch/mips/mm/highmem.c: In function 'kmap_init':
arch/mips/mm/highmem.c:130: error: 'init_mm' undeclared (first use in this function)
arch/mips/mm/highmem.c:130: error: (Each undeclared identifier is reported only once
arch/mips/mm/highmem.c:130: error: for each function it appears in.)
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips <linux-mips@linux-mips.org>
Patchwork: http://patchwork.linux-mips.org/patch/980/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>