The lightweight vmexit path avoids saving and reloading certain host
state. However in certain cases lightweight vmexit handling can schedule()
which requires reloading the host state.
So we store the host state in the vcpu structure, and reloaded it if we
relinquish the vcpu.
Signed-off-by: Avi Kivity <avi@qumranet.com>
A typical demand page/copy on write pattern is:
- page fault on vaddr
- kvm propagates fault to guest
- guest handles fault, updates pte
- kvm traps write, clears shadow pte, resumes guest
- guest returns to userspace, re-faults on same vaddr
- kvm installs shadow pte, resumes guest
- guest continues
So, three vmexits for a single guest page fault. But if instead of clearing
the page table entry, we update to correspond to the value that the guest
has just written, we eliminate the third vmexit.
This patch does exactly that, reducing kbuild time by about 10%.
Signed-off-by: Avi Kivity <avi@qumranet.com>
When a guest writes to a page that has an mmu shadow, we have to clear
the shadow pte corresponding to the memory location touched by the guest.
Now, in nonpae mode, a single guest page may have two or four shadow
pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps
2MB or 1GB), so we when we look up the page we find up to three additional
aliases for the page. Since we _clear_ the shadow pte, it doesn't matter
except for a slight performance penalty, but if we want to _update_ the
shadow pte instead of clearing it, it is vital that we don't modify the
aliases.
Fortunately, exactly which page is needed (the "quadrant") is easily
computed, and is accessible in the shadow page header. All we need is
to ignore shadow pages from the wrong quadrants.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Instead of calling two functions and repeating expensive checks, call one
function and provide it with before/after information.
Signed-off-by: Avi Kivity <avi@qumranet.com>
i386 wants fs for accessing the pda even on a lightweight exit, so ensure
we can always restore it. This fixes a regression on i386 introduced by
the lightweight vmexit patch.
Signed-off-by: Avi Kivity <avi@qumranet.com>
The kvm mmu tries to detects forks by looking for repeated writes to a
page table. If it sees a fork, it unshadows the page table so the page
table copying can proceed at native speed instead of being emulated.
However, the detector also triggered on simple demand paging access patterns:
a linear walk of memory would of course cause repeated writes to the same
pagetable page, causing it to unshadow prematurely.
Fix by resetting the fork detector if we detect a demand fault.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Many msrs and the like will only be used by the host if we schedule() or
return to userspace. Therefore, we avoid saving them if we handle the
exit within the kernel, and if a reschedule is not requested.
Based on a patch from Eddie Dong <eddie.dong@intel.com> with a couple of
fixes by me.
Signed-off-by: Yaozu(Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This allows us to remove write protection earlier than otherwise. Should
some mad OS choose to use byte writes to update pagetables, it will suffer
a performance hit, but still work correctly.
Signed-off-by: Avi Kivity <avi@qumranet.com>
The PC debug port is used for IO delay and does not require emulation.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch enables IO bitmaps control on vmx and unmask the 0x80 port to
avoid VMEXITs caused by accessing port 0x80. 0x80 is used as delays (see
include/asm/io.h), and handling VMEXITs on its access is unnecessary but
slows things down. This patch improves kernel build test at around
3%~5%.
Because every VM uses the same io bitmap, it is shared between
all VMs rather than a per-VM data structure.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
* git://git.infradead.org/battery-2.6:
git-battery vs git-acpi
Power supply class and drivers: remove non obligatory return statements
pda_power: clean up irq, timer
MAINTAINERS: Add maintainers for power supply subsystem and drivers
Fixed up trivial conflict in drivers/w1/slaves/w1_ds2760.c manually
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (166 commits)
[SCSI] ibmvscsi: convert to use the data buffer accessors
[SCSI] dc395x: convert to use the data buffer accessors
[SCSI] ncr53c8xx: convert to use the data buffer accessors
[SCSI] sym53c8xx: convert to use the data buffer accessors
[SCSI] ppa: coding police and printk levels
[SCSI] aic7xxx_old: remove redundant GFP_ATOMIC from kmalloc
[SCSI] i2o: remove redundant GFP_ATOMIC from kmalloc from device.c
[SCSI] remove the dead CYBERSTORMIII_SCSI option
[SCSI] don't build scsi_dma_{map,unmap} for !HAS_DMA
[SCSI] Clean up scsi_add_lun a bit
[SCSI] 53c700: Remove printk, which triggers because of low scsi clock on SNI RMs
[SCSI] sni_53c710: Cleanup
[SCSI] qla4xxx: Fix underrun/overrun conditions
[SCSI] megaraid_mbox: use mutex instead of semaphore
[SCSI] aacraid: add 51245, 51645 and 52245 adapters to documentation.
[SCSI] qla2xxx: update version to 8.02.00-k1.
[SCSI] qla2xxx: add support for NPIV
[SCSI] stex: use resid for xfer len information
[SCSI] Add Brownie 1200U3P to blacklist
[SCSI] scsi.c: convert to use the data buffer accessors
...
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits)
[TCP]: Verify the presence of RETRANS bit when leaving FRTO
[IPV6]: Call inet6addr_chain notifiers on link down
[NET_SCHED]: Kill CONFIG_NET_CLS_POLICE
[NET_SCHED]: act_api: qdisc internal reclassify support
[NET_SCHED]: sch_dsmark: act_api support
[NET_SCHED]: sch_atm: act_api support
[NET_SCHED]: sch_atm: Lindent
[IPV6]: MSG_ERRQUEUE messages do not pass to connected raw sockets
[IPV4]: Cleanup call to __neigh_lookup()
[NET_SCHED]: Revert "avoid transmit softirq on watchdog wakeup" optimization
[NETFILTER]: nf_conntrack: UDPLITE support
[NETFILTER]: nf_conntrack: mark protocols __read_mostly
[NETFILTER]: x_tables: add connlimit match
[NETFILTER]: Lower *tables printk severity
[NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
[NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
[NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
[NETFILTER]: nf_conntrack: Increment error count on parsing IPv4 header
[NET]: Add ethtool support for NETIF_F_IPV6_CSUM devices.
[AF_IUCV]: Add lock when updating accept_q
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix a race condition bug in umount which caused a segfault
9p: re-enable mount time debug option
9p: cache meta-data when cache=loose
net/9p: set error to EREMOTEIO if trans->write returns zero
net/9p: change net/9p module name to 9pnet
9p: Reorganization of 9p file system code
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: (37 commits)
[XFS] Fix lockdep annotations for xfs_lock_inodes
[LIB]: export radix_tree_preload()
[XFS] Fix XFS_IOC_FSBULKSTAT{,_SINGLE} & XFS_IOC_FSINUMBERS in compat mode
[XFS] Compat ioctl handler for handle operations
[XFS] Compat ioctl handler for XFS_IOC_FSGEOMETRY_V1.
[XFS] Clean up function name handling in tracing code
[XFS] Quota inode has no parent.
[XFS] Concurrent Multi-File Data Streams
[XFS] Use uninitialized_var macro to stop warning about rtx
[XFS] XFS should not be looking at filp reference counts
[XFS] Use is_power_of_2 instead of open coding checks
[XFS] Reduce shouting by removing unnecessary macros from dir2 code.
[XFS] Simplify XFS min/max macros.
[XFS] Kill off xfs_count_bits
[XFS] Cancel transactions on xfs_itruncate_start error.
[XFS] Use do_div() on 64 bit types.
[XFS] Fix remount,readonly path to flush everything correctly.
[XFS] Cleanup inode extent size hint extraction
[XFS] Prevent ENOSPC from aborting transactions that need to succeed
[XFS] Prevent deadlock when flushing inodes on unmount
...
It depends on tristate I2C and it's trivial to make modular. The
current Kconfig allows I2C=m, I2C_ACORN=y, which doesn't work at
all; alternatives are dependency on I2C=y and making I2C_ACORN
itself a tristate. The latter is the right thing to do...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
s390 is the only 32bit with unsigned long for size_t (usual for those
is unsigned int). Tell sparse...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... so all proud owners of s390-based PDAs will have to live without that one
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Going through the string and waiting for _pointer_ to become '\0'
is not what the authors meant...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Ben Collins <ben.collins@ubuntu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shows how many people are testing coda - the bug had been there for 5 years
and results of stepping on it are not subtle.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/w1/slaves/w1_ds2760.c:85: warning: initialization from incompatible pointer type
The ACPI guys changed the bin_attr APIs
(commit 91a6902958)
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Clean up pda_power interrupt handling:
Prior to this patch, the driver would pass information it needed
to the interrupt handler dev_id pointer, and then prompt forget it
ever did so, recreating that same information after a couple passes
through the timer-based state machine.
This patch removes the redundant checks by passing the
pda_power_supply[] pointer through the state machine. The current
code passed 'irq' through the state machine, as an index to recreate
the pointer, when we could more simply pass around the pointer itself.
This patch makes it easier to remove the 'irq' argument in the future,
in addition to cleaning up the driver today.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Santiago Leon <santil@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
Jens Axboe <jens.axboe@oracle.com> did the for_each_sg cleanup.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jamie Lenehan <lenehan@twibble.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
Jens Axboe <jens.axboe@oracle.com> did the for_each_sg cleanup.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
Jens Axboe <jens.axboe@oracle.com> did the for_each_sg cleanup.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add printk levels
Clean up some oddities of formatting
Fix goto labels
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
drivers/scsi/aic7xxx_old.c:aic7xxx_slave_alloc() unnecessarily passes
GFP_ATOMIC (along with GFP_KERNEL) to kmalloc() from a context that is not
atomic. Remove the pointless GFP_ATOMIC.
Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
drivers/message/i2o/device.c:i2o_parm_field_get() unnecessarily passes
GFP_ATOMIC (along with GFP_KERNEL) to kmalloc() from a context that is not
atomic. Remove the pointless GFP_ATOMIC.
Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Not converted to the 2.6 kconfig system and no code in the tree.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
For yet unknown reason, something cleared SACKED_RETRANS bit
underneath FRTO.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if the link is brought down via ip link or ifconfig down,
the inet6addr_chain notifiers are not called even though all
the addresses are removed from the interface. This caused SCTP
to add duplicate addresses to it's list.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
remove the old code. The config option will be kept around to select
the equivalent NET_CLS_ACT options for a short time to allow easier
upgrades.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The behaviour of NET_CLS_POLICE for TC_POLICE_RECLASSIFY was to return
it to the qdisc, which could handle it internally or ignore it. With
NET_CLS_ACT however, tc_classify starts over at the first classifier
and never returns it to the qdisc. This makes it impossible to support
qdisc-internal reclassification, which in turn makes it impossible to
remove the old NET_CLS_POLICE code without breaking compatibility since
we have two qdiscs (CBQ and ATM) that support this.
This patch adds a tc_classify_compat function that handles
reclassification the old way and changes CBQ and ATM to use it.
This again is of course not fully backwards compatible with the previous
NET_CLS_ACT behaviour. Unfortunately there is no way to fully maintain
compatibility *and* support qdisc internal reclassification with
NET_CLS_ACT, but this seems like the better choice over keeping the two
incompatible options around forever.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>